Summary
This issue distills findings from a systematic cross-implementation evaluation of AutoNAT v2, conducted by ProbeLab. The final report of the investigation is available in the following link.
rust-libp2p's TCP transport may produce incorrect PortUse metadata. libp2p-tcp's Transport::dial is called by the swarm with a requested PortUse in DialOpts. The swarm records that value verbatim on ConnectedPoint::Dialer { port_use } and exposes it through NetworkBehaviour::handle_established_outbound_connection. When a caller requests PortUse::Reuse but port reuse is not actually used at the socket layer — because no listener is registered for the target IP family yet, or the listen port's 4-tuple is already in use — the Transport::dial method completes the connection on an OS-assigned ephemeral port but does not update the metadata. The swarm and behaviours see port_use = Reuse even though the actual local port is ephemeral.
The doc comment on ConnectedPoint::Dialer::port_use (core/src/connection.rs#L86-L92 on v0.56.0) already flags the field as best-effort, but existing consumers treat it as a correctness signal. libp2p-identify uses the field to decide whether to translate an observed TCP address: it only records the connection in its ephemeral-port set if port_use == PortUse::New (protocols/identify/src/behaviour.rs#L426-L428 on v0.56.0). A connection that claims Reuse but used an ephemeral port is never inserted, so the membership check at behaviour.rs#L338-L340 returns false, the translation block is skipped entirely, and the raw ephemeral observed address falls through to behaviour.rs#L382-L383, which broadcasts it as a NewExternalAddrCandidate. Downstream, AutoNAT v2 then probes an ephemeral external address that the client's NAT mapping isn't guaranteed to preserve.
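To make the failure mode concrete, here is a minimal standalone model of the consumer logic described above. It is a sketch, not the libp2p-identify code: the PortUse enum is a local stand-in, and IdentifyModel, Candidate, the u64 connection ids, and the single listen_port used for translation are all hypothetical simplifications.

```rust
use std::collections::HashSet;

/// Local stand-in for libp2p's `PortUse` (hypothetical mirror, not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PortUse {
    New,
    Reuse,
}

/// What the model reports for an observed address (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Candidate {
    /// The observed port was replaced with the listen port before reporting.
    Translated(u16),
    /// The observed port was reported unchanged.
    Raw(u16),
}

/// Simplified model of the consumer: tracks outbound connections that were
/// reported as using a fresh local port.
#[derive(Default)]
pub struct IdentifyModel {
    ephemeral_conns: HashSet<u64>,
}

impl IdentifyModel {
    /// Models the insertion check: only `PortUse::New` connections are recorded.
    pub fn on_outbound_established(&mut self, conn_id: u64, port_use: PortUse) {
        if port_use == PortUse::New {
            self.ephemeral_conns.insert(conn_id);
        }
    }

    /// Models the membership check: translate only for recorded connections,
    /// otherwise pass the observed port through unchanged.
    pub fn on_observed(&self, conn_id: u64, observed_port: u16, listen_port: u16) -> Candidate {
        if self.ephemeral_conns.contains(&conn_id) {
            Candidate::Translated(listen_port)
        } else {
            Candidate::Raw(observed_port)
        }
    }
}
```

In this model, a connection reported as Reuse that actually left from an ephemeral port yields Candidate::Raw with the ephemeral observed port, which corresponds to the untranslated NewExternalAddrCandidate described above.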
Expected behavior
After Transport::dial completes, the port_use surfaced to the swarm on ConnectedPoint::Dialer should reflect what actually happened at the socket layer. If the dial ended up on an OS-assigned ephemeral port, port_use should be PortUse::New, matching what libp2p-identify's translation logic already checks for.
Actual behavior
port_use on the ConnectedPoint is set to whatever the caller requested in DialOpts (swarm/src/lib.rs#L540) and is never updated, even when:
- local_dial_addr() returns None (no listener registered for the IP family), so the TCP transport falls through to an OS-assigned ephemeral port (transports/tcp/src/lib.rs#L374-L380 on v0.56.0);
- bind() to the listener port succeeded but connect() returns AddrNotAvailable because the 4-tuple is already in use. The transport drops the socket and retries with PortUse::New internally (transports/tcp/src/lib.rs#L395-L407), but that internal PortUse::New never propagates back to the swarm.
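The two cases above can be summarised with a small decision model. This is only a sketch under simplifying assumptions: model_dial and DialReport are hypothetical names, PortUse is a local stand-in for the libp2p type, and no sockets are opened. The listener_port argument models the result of local_dial_addr(), and reuse_connect models the outcome of connect() on the socket bound to the listener port.

```rust
use std::io::ErrorKind;

/// Local stand-in for libp2p's `PortUse` (hypothetical mirror, not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PortUse {
    New,
    Reuse,
}

/// Requested versus actual port use for one successful dial (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct DialReport {
    /// What ends up on the connected point: the value the caller requested.
    pub recorded: PortUse,
    /// What happened at the socket layer in this model.
    pub actual: PortUse,
}

/// Models the dial paths. Returns `None` when the dial itself fails.
pub fn model_dial(
    requested: PortUse,
    listener_port: Option<u16>,
    reuse_connect: Result<(), ErrorKind>,
) -> Option<DialReport> {
    let actual = match (requested, listener_port) {
        // The caller asked for a fresh port.
        (PortUse::New, _) => PortUse::New,
        // No listener for this IP family: the OS assigns an ephemeral port.
        (PortUse::Reuse, None) => PortUse::New,
        (PortUse::Reuse, Some(_)) => match reuse_connect {
            // Bound to the listener port and connected: reuse really happened.
            Ok(()) => PortUse::Reuse,
            // 4-tuple already in use: internal retry on a fresh port.
            Err(ErrorKind::AddrNotAvailable) => PortUse::New,
            // Any other error fails the dial in this model.
            Err(_) => return None,
        },
    };
    Some(DialReport { recorded: requested, actual })
}
```

In both fallback cases the model returns recorded = Reuse with actual = New, which is the mismatch this issue reports.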
Possible Solution
At the point where libp2p-tcp's dial future is about to return the stream, compute the effective PortUse from the post-connect state: if socket.local_addr().port() isn't one of the listener ports registered for the same IP family in PortReuse::listen_addrs, the dial used an ephemeral port and the effective PortUse is New; otherwise Reuse. This covers all three paths: no listener registered at all, listener registered for the wrong IP family, and the AddrNotAvailable fallback at transports/tcp/src/lib.rs#L395-L407.
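A rough sketch of the check, written against std types only. The function name effective_port_use is hypothetical, PortUse is a local stand-in for the libp2p type, and the listener set is assumed to be a set of (IpAddr, port) pairs, which may not match the exact shape of PortReuse::listen_addrs. How the result would be surfaced back to the swarm is left open here.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

/// Local stand-in for libp2p's `PortUse` (hypothetical mirror, not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PortUse {
    New,
    Reuse,
}

/// Derives the effective port use from the connected socket's local address.
///
/// `listen_addrs` is assumed to hold the (IP, port) pairs of registered
/// listeners. The dial counts as `Reuse` only if the local port equals a
/// listener port registered for the same IP family.
pub fn effective_port_use(local: SocketAddr, listen_addrs: &HashSet<(IpAddr, u16)>) -> PortUse {
    let reused = listen_addrs
        .iter()
        .any(|(ip, port)| ip.is_ipv4() == local.is_ipv4() && *port == local.port());
    if reused {
        PortUse::Reuse
    } else {
        PortUse::New
    }
}
```

Because the check only looks at post-connect state, it does not need to know which internal path the dial took.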
Version
v0.56
Would you like to work on fixing this bug?
Yes