Why Linux audit may freeze the machine
The netlink interface defines a specialized type of network socket used to connect the Linux kernel with user-space programs. One netlink protocol, named NETLINK_AUDIT, is used to implement the Linux audit infrastructure. The central place for this kernel-to-user-space connectivity is audit.c.
The file audit.c contains the following declaration:
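To make "a netlink socket of type NETLINK_AUDIT" concrete, here is a minimal user-space sketch in C. The helper names open_audit_socket and close_audit_socket are hypothetical and exist only for this illustration; socket(), AF_NETLINK and NETLINK_AUDIT come from the standard Linux headers. Opening the socket does not by itself register the process as the audit daemon, and depending on kernel configuration and privileges the call may fail.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Hypothetical helper: open a raw netlink socket for the audit protocol.
 * Returns a file descriptor, or -1 with errno set if the kernel refuses. */
int open_audit_socket(void)
{
    return socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT);
}

/* Hypothetical helper: release a descriptor obtained from the call above. */
void close_audit_socket(int fd)
{
    assert(fd >= 0);   /* only valid descriptors may be closed */
    close(fd);
}
```

A caller should check the result for -1 before using it, because the sketch deliberately does no error handling of its own.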
/* waitqueue for callers who are blocked on the audit backlog */
static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
System calls such as open() for opening files, execve() for executing programs, or connect() for establishing network connections pass through SELinux control points that want to deliver an audit event. The kernel needs to put the events generated by all system calls in one place (a queue) so that they can be processed by user space. The queue size is limited and cannot grow without bound. The limit is set by the kernel command-line parameter audit_backlog_limit.
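The idea of a bounded event queue can be sketched in plain user-space C. This is not the kernel's implementation (which queues socket buffers). It is a simplified ring buffer in which the limit field plays the role of audit_backlog_limit. All names here (struct backlog, backlog_push, backlog_pop, BACKLOG_CAPACITY) are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_CAPACITY 64   /* storage size of this sketch, not a kernel value */

struct backlog {
    int events[BACKLOG_CAPACITY]; /* stand-in for queued audit records */
    size_t head;                  /* index of the oldest queued event */
    size_t len;                   /* number of events currently queued */
    size_t limit;                 /* plays the role of audit_backlog_limit */
};

void backlog_init(struct backlog *q, size_t limit)
{
    assert(limit <= BACKLOG_CAPACITY);
    q->head = 0;
    q->len = 0;
    q->limit = limit;
}

/* Producer side (a system call emitting an event).
 * Returns false when the limit is reached; the caller would have to wait. */
bool backlog_push(struct backlog *q, int event)
{
    if (q->len >= q->limit)
        return false;
    q->events[(q->head + q->len) % BACKLOG_CAPACITY] = event;
    q->len++;
    return true;
}

/* Consumer side (the thread that forwards events to user space).
 * Returns false when there is nothing to send. */
bool backlog_pop(struct backlog *q, int *event)
{
    if (q->len == 0)
        return false;
    *event = q->events[q->head];
    q->head = (q->head + 1) % BACKLOG_CAPACITY;
    q->len--;
    return true;
}
```

With a limit of 2, the third backlog_push is rejected until backlog_pop has removed an entry, which is exactly the situation in which a caller ends up waiting.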
Callers wait on this wait queue until they are able to insert (append) an audit event. They are released by the kauditd_thread kernel thread, which works in an infinite loop: it takes the read-copy-update (RCU) read lock to reference the netlink connection of the user-space audit daemon, and then sends the pending socket buffers to auditd.
If sending fails 5 times, the buffers are moved to a backup queue. Once the unicast part (sending to auditd) is done, successfully or not, kauditd_thread tries to send the buffer data to multicast receivers (other programs willing to read from NETLINK_AUDIT).
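The delivery order described above (unicast with a bounded number of attempts, then a backup queue on failure, then multicast in either case) can be sketched as follows. This is a user-space model, not kernel code: deliver, struct backup_queue, struct fake_link and the two fake_* senders are hypothetical names, and the record is reduced to an int. Only the retry count of 5 is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEND_ATTEMPTS 5   /* "fails 5 times" in the text above */
#define BACKUP_CAPACITY 16    /* arbitrary size chosen for this sketch */

typedef bool (*send_fn)(int record, void *ctx);

struct backup_queue {
    int records[BACKUP_CAPACITY];
    size_t len;
};

struct delivery_result {
    bool unicast_ok;        /* did the audit daemon receive the record? */
    int attempts;           /* unicast attempts actually made */
    bool moved_to_backup;   /* was the record parked in the backup queue? */
};

struct delivery_result deliver(int record, send_fn unicast, send_fn multicast,
                               void *ctx, struct backup_queue *backup)
{
    struct delivery_result r = { false, 0, false };

    assert(unicast && multicast && backup);

    /* Unicast part: retry until success or the attempt limit is reached. */
    while (r.attempts < MAX_SEND_ATTEMPTS && !r.unicast_ok) {
        r.attempts++;
        r.unicast_ok = unicast(record, ctx);
    }

    /* After repeated failure, keep the record in the backup queue. */
    if (!r.unicast_ok && backup->len < BACKUP_CAPACITY) {
        backup->records[backup->len++] = record;
        r.moved_to_backup = true;
    }

    /* Multicast part runs whether or not the unicast send succeeded. */
    (void)multicast(record, ctx);
    return r;
}

/* Test double standing in for the connection to the daemon and listeners. */
struct fake_link {
    int failures_left;     /* unicast attempts that will still fail */
    int unicast_calls;
    int multicast_calls;
};

bool fake_unicast(int record, void *ctx)
{
    struct fake_link *l = (struct fake_link *)ctx;
    (void)record;
    l->unicast_calls++;
    if (l->failures_left > 0) {
        l->failures_left--;
        return false;
    }
    return true;
}

bool fake_multicast(int record, void *ctx)
{
    struct fake_link *l = (struct fake_link *)ctx;
    (void)record;
    l->multicast_calls++;
    return true;
}
```

With a fake_link whose failures_left is larger than 5, deliver gives up after five attempts, parks the record in the backup queue and still calls the multicast sender once.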
One of the SELinux control points is the audit_log_start function. This function waits for kauditd_thread to finish its work, then waits no more than 17 ms in one loop iteration to acquire exclusive access to the audit_backlog_wait queue. Due to contention and concurrency, the loop in audit_log_start may be executed many times, blocking the audited syscall.
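Why the loop can run many times is easier to see in a small deterministic model. The sketch below is not the kernel's audit_log_start. It is a hypothetical user-space simulation in which each iteration costs one 17 ms slice (the figure from the text), the consumer drains some entries per slice, and competing callers refill some. The function name wait_for_room, its parameters and the overall time budget budget_ms are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAIT_SLICE_MS 17   /* per-iteration wait, as quoted in the text */

struct wait_outcome {
    int iterations;   /* how many times the loop body ran */
    long waited_ms;   /* simulated time the caller spent blocked */
    bool admitted;    /* true if room appeared before the budget ran out */
};

/* Simulate a caller waiting for room in a full backlog queue.
 * drained_per_slice:  entries the consumer removes during one slice
 * refilled_per_slice: entries competing callers add during one slice */
struct wait_outcome wait_for_room(size_t queue_len, size_t limit,
                                  size_t drained_per_slice,
                                  size_t refilled_per_slice,
                                  long budget_ms)
{
    struct wait_outcome out = { 0, 0, false };

    assert(limit > 0 && budget_ms >= 0);

    while (queue_len >= limit && out.waited_ms + WAIT_SLICE_MS <= budget_ms) {
        out.iterations++;
        out.waited_ms += WAIT_SLICE_MS;

        /* The consumer sends some queued records to user space. */
        queue_len -= (drained_per_slice < queue_len) ? drained_per_slice
                                                     : queue_len;

        /* Other blocked callers win the race and refill the queue. */
        queue_len += refilled_per_slice;
        if (queue_len > limit)
            queue_len = limit;
    }

    out.admitted = queue_len < limit;
    return out;
}
```

In this model, a full queue of 64 entries that drains 8 per slice with no competition admits the caller after one iteration (17 ms). If competitors refill 8 per slice, the caller keeps looping until the budget is exhausted: with a budget of 1000 ms that is 58 iterations and 986 ms of blocking without ever being admitted.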