This post is about a portable Unix shell script, socklink.sh, written to keep SSH_AUTH_SOCK working within a long-running tmux session—even when switching between multiple clients that have different SSH agents, whether those clients are simultaneously connected or attaching and detaching over time. I also digress into what I’ve learned from writing and testing a cross-platform shell script in 2025.
Target audience
If you’re reading this I’ll assume you have some knowledge of Secure Shell agent forwarding and how the SSH_AUTH_SOCK environment variable works. I also expect you already have SSH clients configured to use agent forwarding.
The problem
I use tmux almost everywhere I SSH to. I’m also a fan of hardware tokens with proof-of-presence for SSH authentication. [1]
But out of the box, this combination introduces major usability problems when I connect to my dev server from multiple clients. SSH agent requests need to be directed to whichever client I’m currently using so that I can provide proof-of-presence, like by touching the contacts on a YubiKey or using Face ID in Termius. However, in practice the request will go to whichever agent was defined when tmux started. Even if that one is no longer connected!
Different workarounds for this scenario have emerged. A previous employer had a wrapper script called tmx that would fix up SSH_AUTH_SOCK when re-attaching a session from a new client. Meanwhile, this popular gist uses ~/.ssh/rc to override SSH_AUTH_SOCK to the path of a symlink that can be updated as new clients connect. Other solutions involve configuring tmux’s update-environment to reconfigure the session-wide environment when new clients join.
But as far as I know, there was no existing solution that would let me keep multiple clients connected simultaneously and move between them at will, ensuring that SSH_AUTH_SOCK points to whichever one I’m using at any given moment, without my taking manual action such as reconnecting the client. This was a papercut, but one I’ve been frustrated enough by over the years to finally try bandaging, with the help of the relatively new client-active tmux hook.
The rest of this post goes into more detail about how this script works, and then provides instructions for setting it up, if it might be useful to you too.
Objectives
There were a few unusual things I wanted out of my solution, and those dictated the shape it would take.
First, I wanted something compatible with just about any Unix-like operating system on arbitrary CPU architectures. [2]
Second, I wanted something that would slot nicely into my pre-existing dotfiles git repo, which is how I synchronize my main configuration files across machines. Ideally, this would be something I could just check into that repo and run directly without a compilation step.
I also hoped that I could avoid adding new requirements outside of the base system, aside from tmux itself. Some of the machines I would use this on lack GNU bash; others don’t have Python or even Perl. Assuming the presence of a Rust toolchain so that I might use cargo-script would be right out.
The combination of these goals made a cross-platform shell script seem like the way to go, despite the headaches this would end up creating for me.
socklink.sh
The result is a script called socklink.sh. It’s a utility for maintaining a two-level map of symlinks to Unix-domain SSH authentication sockets: First from a symlink representing a tmux server instance to one representing a tmux client’s login TTY, and second from that one representing the TTY to the actual SSH authentication socket, if any, that the TTY connected with.
Physically, both levels of links are maintained under /tmp/socklink-$UID:
% tree /tmp/socklink-33464
/tmp/socklink-33464
|-- servers
|   `-- 24994 -> /tmp/socklink-33464/ttys/dev+pts+124
`-- ttys
    |-- dev+pts+124 -> /tmp/ssh-vb39ZeB1Kw/agent.7211
    `-- dev+pts+17 -> /tmp/ssh-4NlQ1EaodH/agent.4678
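Conceptually, maintaining this map comes down to a couple of ln -sf calls. Here’s a minimal sketch of the idea; the function names, the encoding helper, and the details here are my own illustration, not the script’s actual code, which also handles validation and garbage collection:

```shell
#!/bin/sh
# Illustrative sketch of the two-level link map; names are hypothetical.
base="/tmp/socklink-$(id -u)"
mkdir -p "$base/servers" "$base/ttys"

# Encode a TTY path like /dev/pts/124 into a filename like dev+pts+124.
encode_tty() {
    printf '%s\n' "$1" | sed -e 's|^/||' -e 's|/|+|g'
}

# Second level: map a login TTY to the agent socket it connected with.
set_tty_link() {    # $1 = the client's login TTY path
    [ -n "${SSH_AUTH_SOCK:-}" ] || return 0
    ln -sf "$SSH_AUTH_SOCK" "$base/ttys/$(encode_tty "$1")"
}

# First level: point a tmux server's link at the active client's TTY link.
set_server_link() { # $1 = tmux server pid, $2 = active client's TTY path
    ln -sf "$base/ttys/$(encode_tty "$2")" "$base/servers/$1"
}
```

With both levels in place, resolving servers/&lt;pid&gt; through ttys/&lt;tty&gt; always lands on the agent socket of the most recently active client.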
This map is kept up-to-date by hooking invocations of the script into various tmux events and shell initialization. If you’re using zsh or bash, you can just run socklink.sh setup to automatically add the necessary hooks to your .tmux.conf and shell init; then restart tmux and all your shells, and you’re good to go.
If you’re using a different shell, you’ll additionally have to add the logical equivalent of the following to your shell’s init:
if [ -n "$THIS_IS_AN_INTERACTIVE_SESSION" ]; then
    if [ -z "$TMUX" ]; then
        socklink.sh -c shell-init set-tty-link
    else
        export SSH_AUTH_SOCK="$(socklink.sh show-server-link)"
    fi
fi
The invariants that this setup attempts to maintain are essentially:
- If you’re using a tmux client in a tty that connected with a valid SSH_AUTH_SOCK set, then your auth socket in all your session’s tmux panes will point at whatever that was.
- If your tmux client’s tty did not connect with an SSH_AUTH_SOCK, your panes’ auth socket will be left dangling so that OpenSSH may fall back to other methods such as password authentication, without potentially waiting on proof-of-presence from a hardware token somewhere you aren’t currently sitting.
This is orchestrated using the following subcommands:
- set-server-link is invoked when a different tmux client becomes active, or a new client connects. If given an argument, this is interpreted as the newly active client’s tty; otherwise, this queries the most recently active client from tmux itself. Garbage collection of old links is also done here.
- set-server-link-by-name does the same, but takes as its argument a client name instead of a client tty. This way it can use the #{hook_client} variable in the client-active hook, which as it turns out doesn’t race with hook execution in the same way that #{client_tty} does.
- set-tty-link should be called when starting an interactive shell from outside of tmux, in order to set up the ttys/ link to $SSH_AUTH_SOCK if it’s set.
- Finally, show-server-link is used to print the path to the server link that should be configured as SSH_AUTH_SOCK within your tmux session’s panes.
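For the curious, the tmux side of the wiring looks roughly like the following .tmux.conf fragment. This is an illustrative guess at the shape of the hooks, not the exact lines that socklink.sh setup installs:

```
# Illustrative hooks; socklink.sh setup writes the real versions.
set-hook -g client-active   "run-shell 'socklink.sh set-server-link-by-name \"#{hook_client}\"'"
set-hook -g client-attached "run-shell 'socklink.sh set-server-link'"
```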
Most people shouldn’t have to worry about the details of these hooks; if you’re running zsh or bash, you should be able to simply do socklink.sh setup and all the necessary hooks will be installed automatically.
Alternatives considered
I considered writing a small Perl daemon that would listen for client changes as a control mode client. But this wouldn’t actually address the issues I’d encountered with regular hooks, while introducing the additional complexity of managing the daemon’s lifecycle.
I could have also kept the hook-based model while instead writing the script in Python or Perl. While it’s more convenient to have a shell script for this job, because of the issues described below I probably would have taken this approach in hindsight.
Cross-platform testing and surprises
The biggest challenge in all this was making sure the script actually works reliably across platforms: Different Linux distributions may use bash, dash, or ash as /bin/sh, and the BSDs have their own implementations.
So I set up a Python test harness that uses pexpect to put the script through its paces inside tmux in an actual pseudoterminal. GitHub Actions provides test runners for Ubuntu and macOS, and SourceHut Builds nicely complements this with runners for BSD and other Linux distributions.
This uncovered some compatibility issues that I hadn’t anticipated. For a sample:
- Even though the \+ quantifier is commonly supported in grep implementations’ basic regular expressions, it isn’t actually part of the standard and is unimplemented on OpenBSD.
- Busybox’s ps doesn’t support -o uid.
- Exclusively on NetBSD, setting noclobber in /bin/sh causes redirections to >/dev/null to fail.
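The grep issue at least has a standards-blessed fix: interval expressions are part of POSIX BREs, and EREs have + natively, so either of these spellings should work everywhere:

```shell
# '[0-9]\+' is a GNU BRE extension and fails on OpenBSD's grep.
# These two equivalents are specified by POSIX:
printf 'abc123\n' | grep '[0-9]\{1,\}'    # BRE interval expression
printf 'abc123\n' | grep -E '[0-9]+'      # ERE, where + is standard
```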
An aside on Unix shell scripting
I have a long on-again, off-again history with shell scripting. And being introduced to BSD earlier than Linux, I was already aware of the challenges of making a script genuinely portable. Yet this project was still a learning experience for me.
A particular shock was the subtle differences in how set -e works across shells. Take a look at this script:
set -e

bar() {
    false
    echo "hi"
}

foo() {
    echo "In foo()"
    out=$(bar)
    echo "Back in foo(), out = '$out'"
}

echo "Starting..."
foo
echo "Done."
Due to set -e (“errexit”), I would have expected bar‘s call to false to cause the entire script to exit with an error status and the output:
Starting...
In foo()
And that’s what happens in most implementations, including bash in POSIX mode [3], zsh, Debian’s dash, and /bin/sh on the BSDs and OpenIndiana/Illumos. But in the busybox implementation of /bin/sh on Alpine Linux, the script startlingly succeeds with:
Starting...
In foo()
Back in foo(), out = 'hi'
Done.
(This wasn’t a mere oddity, from my point of view. This behavior silently hid a separate error that only reproduced on Alpine Linux, masking a consequential bug. I only happened to catch this when I wrote a more fine-grained unit test for that function on a whim.)
I almost reported this to the Busybox mailing list as a bug, until I stepped back and questioned whether this behavior was even technically wrong, as far as POSIX goes. The specification cautions that “application writers should avoid relying on set -e within functions”, giving the example of a function being called from within an AND-OR list; I assume the same logic applies to one called from a command substitution. I’d already known about the case that v=$(false; echo foo) can be expected to succeed even with set -e in the top-level shell, but I guess that same principle is allowed to apply within functions called in the subshell, too. [4]
Looking at busybox’s code, what’s happening is that when evaluating command substitutions, the -e flag is cleared in subshells in the name of bash compatibility. And indeed, as the comment suggests, bash also exhibits the “incorrect” behavior above—but only when not run in POSIX mode.
Ironically, in copying this behavior I think busybox has actually made itself less compatible with bash. When invoked as /bin/sh, bash—like every other implementation I’ve tested this with—behaves as I had expected above. As far as I know, busybox is unique among major /bin/sh implementations in behaving this way.
What I took away from this all is that writing a cross-platform POSIX script in the abstract is a fantasy. What you can do instead is write a POSIX-ish script targeting various platforms (even if casting a fairly wide net, as I have here), and then test the heck out of it. Programs in normal languages benefit from tests for logic and system integration, and maybe as a crutch in place of a stronger type system. Shell scripts need all of those, but also tests to establish confidence in the interpreter itself. Even down to its error handling mechanisms!
Parting thoughts
This was more of a pain to finish than I’d expected.
I actually had an earlier version of this script in use for months before I decided to properly test it and host it on GitHub. While I expected some difficulties due to implementing expect-style tests across multiple platforms, I still thought testing it would be a bit of a formality. In truth, it turned into the most difficult part of this project by far.
To be clear, implementing tests often revealed real issues with the script, not just false positives, so I’m glad that I did it. Often these were errors that got silently suppressed by the shell’s semantics, not surfacing in the manner I’d have expected in other languages.
I’m still happy with the end result: I have a cross-platform script that I can just drop onto any of the Unix-like platforms that I regularly use, handling this task without introducing extra dependencies. Yay. And with the tests I now have running, I can be fairly confident about keeping it working into the future.
Still, I wouldn’t recommend this approach for new projects. If I were to recreate this from scratch I would probably write it in Python or Perl and deal with the extra dependency, over the complexity of supporting /bin/sh in all of its personalities.
Footnotes
| [1] | This means that if I’m doing work on a remote host like a shared shell server or a dev server, I can authenticate from that host to other services while at least somewhat restricting the dev server’s ability to compromise my keys. If your goal is to open an interactive SSH session elsewhere you should instead use an SSH jump host, which doesn’t expose your auth agent to the intermediate host at all. Agent forwarding, in contrast, is for situations like using SCP or pushing to a private git repository, where you specifically need to authenticate to a third server from the intermediate host. |
| [2] | I regularly use all of Linux, macOS, OpenBSD (on my router), FreeBSD (on my file server), and NetBSD (on sdf.org), so I needed compatibility with all of these. |
| [3] | With either set -o posix or invocation as /bin/sh instead of /bin/bash. |
| [4] | Correct or not, it’s quite unfortunate if you want confidence your script will behave correctly. The only workaround I’ve found is to add a redundant set -e to the top of any function that might be called from within a command substitution. I don’t know if this workaround is any more guaranteed to work by the spec, but it does work in practice on busybox and non-POSIX bash. |