Unix Domain Sockets
Local IPC with file descriptor passing, abstract namespace, and SOCK_SEQPACKET
Unix domain sockets (AF_UNIX) provide full-duplex, connection-oriented or datagram IPC entirely within the kernel — no network stack involved. They are the foundation of D-Bus, systemd socket activation, Docker's control socket, and countless other local daemons.
Socket types
| Type | Semantics | Boundary preserved | Use case |
|---|---|---|---|
| SOCK_STREAM | Reliable byte stream | No | High-throughput pipes, databases |
| SOCK_DGRAM | Connectionless datagrams (reliable and ordered on Linux) | Yes | Short fire-and-forget messages |
| SOCK_SEQPACKET | Reliable, ordered, connection-oriented messages | Yes | RPC, privilege separation |
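SOCK_SEQPACKET is the sweet spot for structured IPC: it combines the reliability of SOCK_STREAM with the message-boundary preservation of SOCK_DGRAM. Each send() is delivered as exactly one message to one recv() on the other side; if the receive buffer is too small, the excess bytes of that message are discarded.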
```c
#include <sys/socket.h>
#include <sys/un.h>

/* Server side */
int srv = socket(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0);
```
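The boundary-preservation claim can be checked with a small experiment. The sketch below is illustrative only: the function name `seqpacket_boundaries` is made up for this example, and it uses a socketpair() rather than a bound server socket to stay short. It queues two messages before reading anything and records the size each recv() reports.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends two messages back to back over a SOCK_SEQPACKET pair and stores
 * the size of each message as seen by recv(). Returns 0 on success. */
int seqpacket_boundaries(ssize_t sizes[2])
{
    int sv[2];
    char buf[64];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) == -1)
        return -1;

    /* Both messages are queued before the receiver reads anything. */
    if (send(sv[0], "hello", 5, 0) != 5 || send(sv[0], "world!!", 7, 0) != 7)
        goto out;

    /* Each recv() returns one whole message, never a merge of both,
     * even though buf is large enough to hold them together. */
    sizes[0] = recv(sv[1], buf, sizeof(buf), 0);
    sizes[1] = recv(sv[1], buf, sizeof(buf), 0);
    rc = (sizes[0] == -1 || sizes[1] == -1) ? -1 : 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With SOCK_STREAM in place of SOCK_SEQPACKET, the first recv() would be free to return all 12 bytes at once, which is why stream protocols need their own framing.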
struct sockaddr_un
```c
#include <sys/un.h>

struct sockaddr_un {
    sa_family_t sun_family;     /* AF_UNIX */
    char        sun_path[108];  /* socket path or abstract name */
};
```
sun_path is 108 bytes total (UNIX_PATH_MAX). For abstract-namespace sockets the first byte is \0 and the remaining 107 bytes form the name.
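Because sun_path is a fixed 108-byte array, a path that is too long has to be rejected rather than silently truncated. The helper below is a minimal sketch of that check; `unix_addr_from_path` is a hypothetical name, not part of any library API.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills *addr with a filesystem path and returns the
 * address length to pass to bind() or connect(). Returns 0 with errno set
 * if the path is empty or does not fit together with its terminating NUL. */
socklen_t unix_addr_from_path(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    if (len == 0) {
        errno = EINVAL;
        return 0;
    }
    if (len >= sizeof(addr->sun_path)) {
        errno = ENAMETOOLONG;
        return 0;
    }
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1);   /* copy the NUL as well */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len + 1);
}
```

Passing sizeof(struct sockaddr_un) as the length also works for filesystem paths; computing the exact length is shown here because abstract names require it.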
Filesystem namespace
Binding to a filesystem path creates a socket inode in the VFS:
```c
struct sockaddr_un addr = {
    .sun_family = AF_UNIX,
    /* null-terminated path */
};
strncpy(addr.sun_path, "/run/myservice.sock", sizeof(addr.sun_path) - 1);
bind(srv, (struct sockaddr *)&addr, sizeof(addr));
listen(srv, SOMAXCONN);
```
Properties:
- Permissions are enforced by the file's mode bits (chmod 0660 /run/myservice.sock)
- The socket file persists after the server exits — must be removed with unlink() before rebinding
- Scoped by the mount namespace: containers with separate mount namespaces cannot see each other's socket files unless the path is shared between them, for example through a bind mount
Abstract namespace
An abstract socket name starts with a null byte (\0). The kernel tracks the binding in memory — there is no filesystem entry:
```c
#include <stddef.h>     /* offsetof */
#include <string.h>     /* memcpy */

struct sockaddr_un addr = { .sun_family = AF_UNIX };
/* First byte \0, then the name */
memcpy(addr.sun_path, "\0myservice", 10);
/* The address length counts the leading \0 plus the 9 name bytes */
socklen_t addrlen = offsetof(struct sockaddr_un, sun_path) + 10;
bind(srv, (struct sockaddr *)&addr, addrlen);
```
Properties:
- Automatically cleaned up when the last file descriptor referencing it is closed — no unlink() needed
- Name is arbitrary bytes, not a C string; can contain null bytes beyond the first
- Visible only within the same network namespace (ip netns or container namespaces provide isolation)
- ss -xlp shows abstract sockets with a @ prefix in the address column
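The automatic cleanup can be observed directly: a second bind() to a name that is still in use fails with EADDRINUSE, and the same name becomes available again as soon as the owning descriptor is closed. The sketch below is Linux-specific; `bind_abstract` is a hypothetical helper and any name passed to it is a made-up example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Binds a new socket to the abstract name `name` (`len` bytes, without the
 * leading NUL). Returns the socket, or -1 on error with errno set. */
int bind_abstract(const char *name, size_t len)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    socklen_t addrlen;
    int fd, saved;

    if (len > sizeof(addr.sun_path) - 1) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* sun_path[0] stays '\0'; the name bytes follow it. */
    memcpy(addr.sun_path + 1, name, len);
    addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);

    fd = socket(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0);
    if (fd == -1)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, addrlen) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Calling `bind_abstract("myservice", 9)` twice in a row makes the second call fail with EADDRINUSE; after close() on the first descriptor, the call succeeds again without any unlink().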
socketpair()
socketpair() creates a connected pair of sockets without binding or listening — the simplest way to create a bidirectional channel between a parent and child process:
```c
int sv[2];
socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv);
/* sv[0] and sv[1] are connected */
if (fork() == 0) {
    close(sv[0]);
    /* child uses sv[1] */
} else {
    close(sv[1]);
    /* parent uses sv[0] */
}
```
A socket pair is preferable to a pipe when the channel must be bidirectional, and a SOCK_SEQPACKET pair is preferable to a SOCK_STREAM pair when the communication is structured and message boundaries must be preserved.
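A complete round trip over such a pair might look like the sketch below. The function name `ping_child` and the "ping"/"pong" payloads are invented for illustration; the point is that both directions travel over the same pair of descriptors.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that answers one "ping" request with "pong". Stores the
 * reply in `reply` and returns its length, or -1 on error. */
ssize_t ping_child(char *reply, size_t cap)
{
    int sv[2];
    pid_t pid;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: uses sv[1] */
        char req[16];

        close(sv[0]);
        if (recv(sv[1], req, sizeof(req), 0) == 4 &&
            memcmp(req, "ping", 4) == 0)
            send(sv[1], "pong", 4, 0);
        _exit(0);
    }

    close(sv[1]);                        /* parent: uses sv[0] */
    n = (send(sv[0], "ping", 4, 0) == 4) ? recv(sv[0], reply, cap, 0) : -1;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

If the child exits without replying, the parent's recv() returns 0 instead of blocking forever, because the child's end of the pair is closed on exit.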
File descriptor passing with SCM_RIGHTS
The SCM_RIGHTS control message lets a process send open file descriptors across a Unix socket. The kernel installs a new descriptor in the receiver's file descriptor table that refers to the same open file description, much as dup() would if it worked across processes. The receiver's fd number usually differs from the sender's, but both descriptors share the file offset, status flags, and access mode.
```c
#include <fcntl.h>      /* open */
#include <string.h>     /* memcpy */
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>     /* close */

/* sock is a connected AF_UNIX socket in both halves */

/* --- Sender --- */
int fd_to_pass = open("/etc/passwd", O_RDONLY);
struct msghdr msg = {0};
/* Control message buffer; the union keeps it aligned for struct cmsghdr */
union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} ctl = {0};
struct iovec iov = { .iov_base = "x", .iov_len = 1 }; /* must send ≥1 data byte */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = ctl.buf;
msg.msg_controllen = sizeof(ctl.buf);
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(int));
memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));
sendmsg(sock, &msg, 0);
close(fd_to_pass); /* sender no longer needs it */

/* --- Receiver --- */
char data[1];
union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} rctl;
struct iovec riov = { .iov_base = data, .iov_len = sizeof(data) };
struct msghdr rmsg = {
    .msg_iov = &riov,
    .msg_iovlen = 1,
    .msg_control = rctl.buf,
    .msg_controllen = sizeof(rctl.buf),
};
recvmsg(sock, &rmsg, 0);
struct cmsghdr *rcmsg = CMSG_FIRSTHDR(&rmsg);
if (rcmsg && rcmsg->cmsg_level == SOL_SOCKET &&
    rcmsg->cmsg_type == SCM_RIGHTS) {
    int received_fd;
    memcpy(&received_fd, CMSG_DATA(rcmsg), sizeof(int));
    /* received_fd is usable immediately */
}
```
The kernel's unix_stream_sendmsg() / unix_scm_to_skb() path (in net/unix/af_unix.c) attaches the file references to the socket buffer. On the receive side, unix_detach_fds() extracts them from the skb, then scm_detach_fds() (in net/core/scm.c) installs them into the receiver's files_struct via receive_fd().
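To see that the received descriptor shares state with the sender's, the sketch below wraps the two halves in functions and passes a descriptor whose offset was moved first. The names `send_fd`, `recv_fd`, and `shared_offset_demo`, and the temporary file template, are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

union fd_ctl {                 /* aligned control buffer for one fd */
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Sends `fd` over the Unix socket `sock` with one dummy data byte. */
int send_fd(int sock, int fd)
{
    union fd_ctl u;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one descriptor from `sock`; returns it, or -1 on error. */
int recv_fd(int sock)
{
    union fd_ctl u;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Demo: returns the file offset observed through the received descriptor
 * after the sender positioned its own descriptor at offset 2. */
off_t shared_offset_demo(void)
{
    char path[] = "/tmp/scm_rights_demo_XXXXXX";
    int sv[2], fd, got;
    off_t off = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) == -1)
        return -1;
    fd = mkstemp(path);
    if (fd != -1) {
        unlink(path);
        if (write(fd, "abcdef", 6) == 6 && lseek(fd, 2, SEEK_SET) == 2 &&
            send_fd(sv[0], fd) == 0 && (got = recv_fd(sv[1])) != -1) {
            off = lseek(got, 0, SEEK_CUR);  /* same open file description */
            close(got);
        }
        close(fd);
    }
    close(sv[0]);
    close(sv[1]);
    return off;
}
```

The demo runs sender and receiver in one process to stay short; the same two helpers work unchanged across fork() or between unrelated processes connected through a bound socket.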
Leak risk
Every received fd must be closed, including on error paths. If the receiver does not close the fd, it accumulates in the process's file descriptor table invisibly. lsof -p <pid> will reveal them, but the damage is done when EMFILE hits.
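Use SOCK_CLOEXEC on the socket to prevent the socket fds themselves from leaking across exec(). For received fds, pass MSG_CMSG_CLOEXEC to recvmsg() so that the kernel sets the close-on-exec flag atomically on each one. Setting it afterwards with fcntl(received_fd, F_SETFD, FD_CLOEXEC) also works, but in a multithreaded program it leaves a window in which another thread can fork() and exec() while the flag is still clear.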
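One way to keep the error paths leak-free is to collect every descriptor that arrived before deciding whether the message is acceptable. The sketch below assumes a protocol that expects exactly one descriptor per message; `recv_one_fd_cloexec`, `cloexec_demo`, and the `MAX_FDS` limit are hypothetical choices made for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 4   /* illustrative limit, an assumption of this sketch */

/* Receives one message and accepts exactly one descriptor. Any other
 * outcome closes every descriptor that arrived and returns -1. */
int recv_one_fd_cloexec(int sock)
{
    union { char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)]; struct cmsghdr align; } u;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    int fds[MAX_FDS], nfds = 0;
    struct cmsghdr *c;

    memset(&u, 0, sizeof(u));
    /* MSG_CMSG_CLOEXEC sets FD_CLOEXEC atomically on every received fd. */
    if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) <= 0)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
            continue;
        size_t n = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
        if (n > (size_t)(MAX_FDS - nfds))
            n = (size_t)(MAX_FDS - nfds);
        memcpy(fds + nfds, CMSG_DATA(c), n * sizeof(int));
        nfds += (int)n;
    }

    if (nfds == 1 && !(msg.msg_flags & MSG_CTRUNC))
        return fds[0];
    for (int i = 0; i < nfds; i++)       /* error path: close everything */
        close(fds[i]);
    return -1;
}

/* Demo: passes the read end of a pipe and reports whether the received
 * copy carries FD_CLOEXEC. Returns 1 if set, 0 if not, -1 on error. */
int cloexec_demo(void)
{
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *c;
    int sv[2], p[2], got, rc = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) == -1)
        return -1;
    if (pipe(p) == -1) {                 /* pipe fds start without CLOEXEC */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    memset(&u, 0, sizeof(u));
    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &p[0], sizeof(int));

    if (sendmsg(sv[0], &msg, 0) == 1 &&
        (got = recv_one_fd_cloexec(sv[1])) != -1) {
        rc = (fcntl(got, F_GETFD) & FD_CLOEXEC) ? 1 : 0;
        close(got);
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Sizing the control buffer for more descriptors than the protocol expects is deliberate: it lets the receiver see and close surplus descriptors from a misbehaving peer instead of having them truncated.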
Peer credential passing with SCM_CREDENTIALS
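SCM_CREDENTIALS lets a sender attach its pid, uid, and gid to a message. The kernel validates the values: an unprivileged process may only specify its own PID and one of its real, effective, or saved UIDs and GIDs, so it cannot forge another identity. A privileged process may supply other values (CAP_SYS_ADMIN for the PID, CAP_SETUID for the UID, CAP_SETGID for the GID).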
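```c
#define _GNU_SOURCE     /* struct ucred */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Receiver: enable credential reception on its socket */
int enable = 1;
setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &enable, sizeof(enable));

/* Sender: attach credentials */
struct ucred cred = {
    .pid = getpid(),
    .uid = getuid(),
    .gid = getgid(),
};
char cbuf[CMSG_SPACE(sizeof(struct ucred))];
struct iovec iov = { .iov_base = "x", .iov_len = 1 };
struct msghdr msg = {
    .msg_iov = &iov,
    .msg_iovlen = 1,
    .msg_control = cbuf,
    .msg_controllen = sizeof(cbuf),
};
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_CREDENTIALS;
cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));
sendmsg(sock, &msg, 0);
```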
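The receiving side reads the credentials back out of the control message. The following sketch is Linux-specific and runs both ends over a socketpair in one process; `recv_creds`, `creds_demo`, and the `cred_ctl` union are names invented for this example.

```c
#define _GNU_SOURCE             /* exposes struct ucred in <sys/socket.h> */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

union cred_ctl {                /* aligned control buffer for one ucred */
    char buf[CMSG_SPACE(sizeof(struct ucred))];
    struct cmsghdr align;
};

/* Receives one message on `sock` and copies the attached credentials to
 * *out. SO_PASSCRED must already be enabled on `sock`. Returns 0 or -1. */
int recv_creds(int sock, struct ucred *out)
{
    union cred_ctl u;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *c;

    memset(&u, 0, sizeof(u));
    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS &&
            c->cmsg_len == CMSG_LEN(sizeof(struct ucred))) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}

/* Demo: the sender attaches its own credentials; the receiver stores what
 * the kernel delivered in *seen. Returns 0 on success, -1 on error. */
int creds_demo(struct ucred *seen)
{
    struct ucred me = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    union cred_ctl u;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *c;
    int sv[2], on = 1, rc = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) == -1)
        return -1;
    /* The receiver opts in before any message is sent. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0) {
        memset(&u, 0, sizeof(u));
        c = CMSG_FIRSTHDR(&msg);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_CREDENTIALS;
        c->cmsg_len = CMSG_LEN(sizeof(struct ucred));
        memcpy(CMSG_DATA(c), &me, sizeof(me));
        if (sendmsg(sv[0], &msg, 0) == 1)
            rc = recv_creds(sv[1], seen);
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

A receiver should treat a message without an SCM_CREDENTIALS control message as unauthenticated rather than falling back to defaults, which is why `recv_creds` returns -1 in that case.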
For a connected socket, the simpler alternative is SO_PEERCRED, which returns the credentials of the connected peer without requiring a per-message control message:
```c
struct ucred peer;
socklen_t len = sizeof(peer);
getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &peer, &len);
printf("peer pid=%d uid=%d gid=%d\n", peer.pid, peer.uid, peer.gid);
```
SO_PEERCRED is used by systemd, D-Bus, and polkit to authenticate the calling process before granting privileged operations.
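A daemon typically turns the SO_PEERCRED result into an allow/deny decision right after accept(). The sketch below shows one possible policy (same UID or root); the function name `peer_is_authorized` and the policy itself are assumptions for illustration, not what any of the projects named above implement.

```c
#define _GNU_SOURCE             /* exposes struct ucred in <sys/socket.h> */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy check: returns 1 if the peer of `sock` runs as
 * `allowed_uid` or as root, 0 if not, and -1 if the lookup fails. */
int peer_is_authorized(int sock, uid_t allowed_uid)
{
    struct ucred peer;
    socklen_t len = sizeof(peer);

    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &peer, &len) == -1)
        return -1;
    if (len != sizeof(peer))
        return -1;
    return peer.uid == allowed_uid || peer.uid == 0;
}
```

The credentials returned are those in effect when the peer called connect() or socketpair(), so a process that later changes its UID is still reported with its original identity. Authorization by PID is also fragile, because PIDs can be reused after the peer exits.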
Performance
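Unix domain sockets bypass the entire TCP/IP stack: there is no routing, checksumming, or protocol state machine. The sender's data is copied into sk_buff structures that are queued directly on the peer's receive queue. The figures below are rough orders of magnitude on modern hardware; actual throughput depends on the machine, kernel version, and message size: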
| Mechanism | Throughput |
|---|---|
| AF_INET loopback (127.0.0.1) | ~5 GB/s |
| AF_UNIX SOCK_STREAM | ~10–15 GB/s |
| Shared memory | ~30–50 GB/s |
AF_UNIX with SOCK_DGRAM avoids connection setup overhead entirely, which is useful for short fire-and-forget control messages between co-located processes.
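The lack of connection setup shows in how little code a sender needs: one socket() and one sendto(). The sketch below is Linux-specific because it uses an abstract address to avoid filesystem cleanup; the name `gr-dgram-demo`, the "reload" payload, and the function `dgram_demo` are made up for this example.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Demo: an unconnected sender delivers one datagram to a bound receiver.
 * Copies the payload to `out` and returns its length, or -1 on error. */
ssize_t dgram_demo(char *out, size_t cap)
{
    static const char name[] = "\0gr-dgram-demo";   /* leading NUL: abstract */
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    socklen_t alen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) +
                                 sizeof(name) - 1);
    ssize_t n = -1;
    int rx, tx;

    memcpy(addr.sun_path, name, sizeof(name) - 1);
    rx = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    tx = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);

    /* No connect(), listen(), or accept(): the receiver only binds and
     * the sender addresses each message individually with sendto(). */
    if (rx != -1 && tx != -1 &&
        bind(rx, (struct sockaddr *)&addr, alen) == 0 &&
        sendto(tx, "reload", 6, 0, (struct sockaddr *)&addr, alen) == 6)
        n = recv(rx, out, cap, 0);

    if (rx != -1)
        close(rx);
    if (tx != -1)
        close(tx);
    return n;
}
```

Because the sender never binds, the receiver has no address to reply to. A sender that expects an answer needs its own bound address or a connected socket type.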
Kernel implementation
The implementation lives in net/unix/af_unix.c. Key structures:
```c
/* include/net/af_unix.h */
struct unix_sock {
    /* WARNING: sk has to be the first member */
    struct sock             sk;
    struct unix_address     *addr;      /* bound address */
    struct path             path;       /* socket file path (filesystem sockets) */
    struct mutex            iolock, bindlock;
    struct sock             *peer;      /* connected peer (SOCK_STREAM/SEQPACKET) */
    struct list_head        link;       /* list of all unix sockets */
    atomic_long_t           inflight;   /* SCM_RIGHTS fds in flight */
    spinlock_t              lock;
    unsigned long           gc_flags;
    struct socket_wq        peer_wq;
    wait_queue_entry_t      peer_wake;
    struct scm_stat         scm_stat;   /* SCM stats for this socket */
    /* ... */
};
```
Messages for SOCK_DGRAM and SOCK_SEQPACKET are stored as sk_buff entries in sk->sk_receive_queue. For SOCK_STREAM, unix_stream_sendmsg() copies data into the peer's receive queue directly.
The inflight counter tracks file descriptors currently in-flight via SCM_RIGHTS. The kernel garbage collector (net/unix/garbage.c) detects cycles where in-flight fds reference the very sockets they are being sent over, preventing reference count leaks.
Further reading
- net/unix/af_unix.c, net/unix/garbage.c: full implementation
- Pipes and FIFOs: unidirectional byte-stream IPC
- Shared Memory: zero-copy data exchange
- eventfd and signalfd: pollable notification fds
- unix(7) man page: complete API reference