ConnectedPoint.port_use reports Reuse when dial actually used an ephemeral port #6393

@srene

Summary

This issue distills findings from a systematic cross-implementation evaluation of AutoNAT v2, conducted by ProbeLab. The final report of the investigation is available in the following link.

rust-libp2p's TCP transport may produce incorrect PortUse metadata. libp2p-tcp's Transport::dial is called by the swarm with a requested PortUse in DialOpts. The swarm records that value verbatim on ConnectedPoint::Dialer { port_use } and exposes it through NetworkBehaviour::handle_established_outbound_connection. When a caller requests PortUse::Reuse but port reuse is not actually used at the socket layer — because no listener is registered for the target IP family yet, or the listen port's 4-tuple is already in use — the Transport::dial method completes the connection on an OS-assigned ephemeral port but does not update the metadata. The swarm and behaviours see port_use = Reuse even though the actual local port is ephemeral.

The doc comment on ConnectedPoint::Dialer::port_use (core/src/connection.rs#L86-L92 on v0.56.0) already flags the field as best-effort, but existing consumers treat it as a correctness signal. libp2p-identify uses the field to decide whether to translate an observed TCP address: it records the connection in its ephemeral-port set only if port_use == PortUse::New (protocols/identify/src/behaviour.rs#L426-L428 on v0.56.0). A connection that claims Reuse but used an ephemeral port is never inserted, so the membership check at behaviour.rs#L338-L340 returns false and the translation block is skipped entirely. The raw ephemeral observed address then falls through to behaviour.rs#L382-L383, which broadcasts it as a NewExternalAddrCandidate. Downstream, AutoNAT v2 then probes an ephemeral external address that the client's NAT mapping isn't guaranteed to preserve.
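The identify-side consequence can be illustrated with a small model. This is only a sketch under stated assumptions: `PortUse` is a local stand-in for the libp2p type, and `ConnId`, `EphemeralTracker`, `on_outbound_established` and `should_translate` are hypothetical names that mimic the membership logic described above, not the actual libp2p-identify code.

```rust
use std::collections::HashSet;

/// Local stand-in for libp2p's `PortUse` (assumption: two variants only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortUse {
    New,
    Reuse,
}

/// Hypothetical connection identifier used only in this sketch.
pub type ConnId = u32;

/// Hypothetical model of identify's ephemeral-port bookkeeping.
#[derive(Default)]
pub struct EphemeralTracker {
    outbound_with_ephemeral_port: HashSet<ConnId>,
}

impl EphemeralTracker {
    /// Record the connection only when the *reported* value is `New`.
    pub fn on_outbound_established(&mut self, id: ConnId, reported: PortUse) {
        if reported == PortUse::New {
            self.outbound_with_ephemeral_port.insert(id);
        }
    }

    /// The observed address is translated only for tracked connections.
    pub fn should_translate(&self, id: ConnId) -> bool {
        self.outbound_with_ephemeral_port.contains(&id)
    }
}
```

In this model, a connection that really used an ephemeral port but was reported as `Reuse` is never tracked, so `should_translate` returns false for it and the untranslated address would be passed on.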

Expected behavior

After Transport::dial completes, the port_use surfaced to the swarm on ConnectedPoint::Dialer should reflect what actually happened at the socket layer. If the dial ended up on an OS-assigned ephemeral port, port_use should be PortUse::New, matching what libp2p-identify's translation logic already checks for.

Actual behavior

port_use on the ConnectedPoint is set to whatever the caller requested in DialOpts (swarm/src/lib.rs#L540) and is never updated, even when:

  1. local_dial_addr() returns None (no listener registered for the IP family) — the TCP transport falls through to an OS-assigned ephemeral port (transports/tcp/src/lib.rs#L374-L380 on v0.56.0);
  2. bind() to the listener port succeeds but connect() returns AddrNotAvailable because the 4-tuple is already in use: the transport drops the socket and retries with PortUse::New internally (transports/tcp/src/lib.rs#L395-L407), but that internal PortUse::New never propagates back to the swarm.
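The two cases above can be modelled without any sockets. The following is a simplified sketch, not the libp2p-tcp code: `PortUse` is a local stand-in, and `ConnectOutcome`, `DialResult` and `simulate_dial` are hypothetical names introduced purely to show how the reported value and the actual local port can diverge.

```rust
/// Local stand-in for libp2p's `PortUse` (assumption: two variants only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortUse {
    New,
    Reuse,
}

/// Hypothetical outcome of connect() on a socket bound to the listener port.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectOutcome {
    Connected,
    AddrNotAvailable,
}

/// What the swarm is told versus what the socket actually used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DialResult {
    pub reported: PortUse,
    pub local_port: u16,
}

/// Hypothetical model of the dial path described in the issue.
/// `listener_port` is `None` when no listener exists for the target IP family.
pub fn simulate_dial(
    requested: PortUse,
    listener_port: Option<u16>,
    reuse_connect: ConnectOutcome,
    ephemeral_port: u16,
) -> DialResult {
    let local_port = match (requested, listener_port) {
        // Reuse requested and a listener port is available: try it first.
        (PortUse::Reuse, Some(port)) => match reuse_connect {
            ConnectOutcome::Connected => port,
            // Case 2: fall back to a fresh socket on an ephemeral port.
            ConnectOutcome::AddrNotAvailable => ephemeral_port,
        },
        // Case 1 (no listener) or an explicit request for a new port.
        _ => ephemeral_port,
    };
    // The requested value is passed through unchanged, as the issue reports.
    DialResult { reported: requested, local_port }
}
```

With `requested = PortUse::Reuse`, both `listener_port = None` and `ConnectOutcome::AddrNotAvailable` yield a result whose `local_port` is the ephemeral port while `reported` is still `Reuse`.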

Possible Solution

At the point where libp2p-tcp's dial future is about to return the stream, compute the effective PortUse from the post-connect state: if socket.local_addr().port() isn't one of the listener ports registered for the same IP family in PortReuse::listen_addrs, the dial used an ephemeral port and the effective PortUse is New; otherwise Reuse. This covers all three paths: no listener registered at all, listener registered for the wrong IP family, and the AddrNotAvailable fallback at transports/tcp/src/lib.rs#L395-L407.
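A minimal sketch of that check, assuming the registered listen addresses are available as a set of (IP, port) pairs. `effective_port_use` is a hypothetical helper name, `PortUse` is a local stand-in for the libp2p type, and a real patch would read the set from PortReuse::listen_addrs rather than take it as a parameter.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

/// Local stand-in for libp2p's `PortUse` (assumption: two variants only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortUse {
    New,
    Reuse,
}

/// Hypothetical helper: derive the effective `PortUse` from the connected
/// socket's local address and the registered listen addresses.
pub fn effective_port_use(
    local: SocketAddr,
    listen_addrs: &HashSet<(IpAddr, u16)>,
) -> PortUse {
    // The port counts as reused only if a listener of the same IP family
    // is registered on exactly that port.
    let reused = listen_addrs
        .iter()
        .any(|(ip, port)| ip.is_ipv4() == local.ip().is_ipv4() && *port == local.port());

    if reused {
        PortUse::Reuse
    } else {
        PortUse::New
    }
}
```

The helper depends only on post-connect state, so it does not need to know which of the three paths was taken: an empty set, a listener of the other IP family, and a fallback onto an ephemeral port all produce `PortUse::New`.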

Version

v0.56

Would you like to work on fixing this bug?

Yes
