
feat(fetch): add allowed_domains and blocked_domains filters #2572

Open
dgageot wants to merge 3 commits into docker:main from dgageot:board/feature-request-domain-filtering-for-fet-ef2cfffc

dgageot (Member) commented Apr 28, 2026

Summary

Adds two new options to the fetch toolset — allowed_domains and blocked_domains — letting operators restrict which hosts an agent can reach, mirroring Anthropic's web-fetch tool and Claude Code's WebFetch permission model.

The check runs before any network call (including robots.txt), so blocked URLs never leak DNS or TCP traffic. Redirect targets are re-checked against the same lists, closing an SSRF-style bypass.

Configuration

Allow-list example:

toolsets:
  - type: fetch
    allowed_domains:
      - docker.com               # docker.com AND *.docker.com
      - github.com
      - .githubusercontent.com   # leading dot = strict subdomains only

Deny-list example:

toolsets:
  - type: fetch
    blocked_domains:
      - 169.254.169.254          # cloud metadata endpoint
      - internal.example.com

The two lists are mutually exclusive on a single fetch toolset.

Matching rules

  • Bare domain (example.com) — matches the host exactly and any subdomain (docs.example.com); does not match unrelated hosts that share a suffix (badexample.com).
  • Leading dot (.example.com) — matches only strict subdomains, not the apex.
  • IP literal — exact match (169.254.169.254).
  • Trailing dot in FQDN-form URLs (http://example.com./) is stripped before matching, so it can't bypass a deny-list entry.
  • Case-insensitive throughout.
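The rules above can be expressed as a single Go predicate. This is a minimal sketch: the name matchesDomain comes from the PR, but the signature, the normalize helper, and the whitespace trimming are assumptions, not the code in pkg/tools/builtin/fetch.go.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalize lowercases, trims surrounding whitespace (an assumption), and
// strips trailing FQDN dots, so "Example.COM." and "example.com" compare equal.
func normalize(s string) string {
	return strings.TrimRight(strings.ToLower(strings.TrimSpace(s)), ".")
}

// matchesDomain reports whether host satisfies pattern under the rules above.
// Hypothetical sketch: the name mirrors the PR, the body is illustrative.
func matchesDomain(host, pattern string) bool {
	host, pattern = normalize(host), normalize(pattern)
	if host == "" || pattern == "" {
		return false // an empty entry never matches anything
	}
	if net.ParseIP(pattern) != nil {
		return host == pattern // IP literal: exact match only
	}
	if strings.HasPrefix(pattern, ".") {
		return strings.HasSuffix(host, pattern) // strict subdomains, not the apex
	}
	// Bare domain: the host itself, or any label-aligned subdomain of it.
	return host == pattern || strings.HasSuffix(host, "."+pattern)
}

func main() {
	cases := []struct{ host, pattern string }{
		{"docs.example.com", "example.com"},
		{"badexample.com", "example.com"},
		{"example.com", ".example.com"},
		{"Example.COM.", "example.com"},
		{"169.254.169.254", "169.254.169.254"},
	}
	for _, c := range cases {
		fmt.Printf("%-18s %-18s %v\n", c.host, c.pattern, matchesDomain(c.host, c.pattern))
	}
}
```

Prefixing the bare-domain suffix check with "." is what keeps badexample.com from matching example.com.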

Changes

Config & schema (pkg/config/latest)

  • New AllowedDomains and BlockedDomains fields on Toolset.
  • Validation rejects: using either list on a non-fetch toolset, setting both lists at once, and empty/whitespace-only entries (which would silently match nothing and turn the list into a foot-gun).
  • agent-schema.json updated with descriptions and examples.
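The three validation rules can be sketched in Go as follows. The Toolset struct here is a trimmed, hypothetical stand-in for the one in pkg/config/latest, and validateDomainFilters and its error messages are illustrative assumptions. In this sketch an empty list counts as "not set".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Toolset is a hypothetical, trimmed-down config struct; only the fields
// relevant to domain filtering are shown.
type Toolset struct {
	Type           string
	AllowedDomains []string
	BlockedDomains []string
}

// validateDomainFilters applies the three rules listed above.
func validateDomainFilters(t Toolset) error {
	hasAllowed, hasBlocked := len(t.AllowedDomains) > 0, len(t.BlockedDomains) > 0
	if (hasAllowed || hasBlocked) && t.Type != "fetch" {
		return fmt.Errorf("allowed_domains/blocked_domains are only valid on fetch toolsets, not %q", t.Type)
	}
	if hasAllowed && hasBlocked {
		return errors.New("allowed_domains and blocked_domains are mutually exclusive")
	}
	lists := []struct {
		name    string
		entries []string
	}{{"allowed_domains", t.AllowedDomains}, {"blocked_domains", t.BlockedDomains}}
	for _, list := range lists {
		for i, d := range list.entries {
			if strings.TrimSpace(d) == "" {
				// An empty entry would silently match nothing.
				return fmt.Errorf("%s[%d] must not be empty or whitespace-only", list.name, i)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validateDomainFilters(Toolset{Type: "fetch", AllowedDomains: []string{"docker.com"}}))
	fmt.Println(validateDomainFilters(Toolset{Type: "fetch", BlockedDomains: []string{"  "}}))
}
```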

Fetch tool (pkg/tools/builtin/fetch.go)

  • New WithAllowedDomains / WithBlockedDomains options.
  • New checkDomainAllowed enforces the lists on the initial URL.
  • New http.Client.CheckRedirect re-checks every redirect target against the same lists (10-redirect cap mirrors net/http default).
  • New matchesDomain helper implements the rules above.
  • Instructions() advertises the configured lists to the model so it can avoid futile calls.

Wiring (pkg/teamloader/registry.go)

  • createFetchTool propagates the new fields from YAML into the tool options.

Docs & examples

  • docs/tools/fetch/index.md documents the new options, matching rules, redirect re-check, and the IP-encoding limitation (matching is purely string-based on the URL host: it does not normalize alternative IP encodings — decimal, hex, octal, IPv4-mapped IPv6, etc.).
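The IP-encoding limitation can be made concrete in Go. This sketch compares URL hosts against the deny-list entry 169.254.169.254: 2852039166 (decimal) and 0xa9fea9fe (hex) are the same IPv4 address, yet a plain string comparison misses them, as it does the IPv4-mapped IPv6 form. Normalizing with net.ParseIP would catch only the IPv4-mapped form, because it returns nil for decimal and hex spellings. The helper names are hypothetical, and whether a given resolver accepts these encodings is not something this sketch establishes.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// sameAddrByString is the purely string-based comparison the docs describe.
func sameAddrByString(host, blocked string) bool { return host == blocked }

// sameAddrByParseIP normalizes both sides with net.ParseIP; a nil result
// means Go does not recognize the string as an IP literal.
func sameAddrByParseIP(host, blocked string) bool {
	a, b := net.ParseIP(host), net.ParseIP(blocked)
	return a != nil && b != nil && a.Equal(b)
}

// hostOf extracts the host the same way a filter would see it.
func hostOf(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

func main() {
	const blocked = "169.254.169.254"
	for _, raw := range []string{
		"http://169.254.169.254/",
		"http://2852039166/",               // same IPv4 address, decimal form
		"http://0xa9fea9fe/",               // same IPv4 address, hex form
		"http://[::ffff:169.254.169.254]/", // IPv4-mapped IPv6
	} {
		host := hostOf(raw)
		fmt.Printf("%-24s string=%-5v parseIP=%v\n", host,
			sameAddrByString(host, blocked), sameAddrByParseIP(host, blocked))
	}
}
```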
  • New examples/fetch_domain_filtering.yaml shows both an allow-list agent and a deny-list agent.

Security fixes folded in

Three issues were caught during review of the initial implementation and fixed in this PR:

  1. SSRF via redirect (critical): the http.Client had no CheckRedirect, so an allow-listed origin returning a 3xx to a forbidden host would be followed and its body returned to the caller. Now every redirect target is re-checked. Regression test TestFetch_AllowedDomains_RejectsRedirectToBlockedHost uses http://169.254.169.254/ (AWS metadata IP) to demonstrate the fix.
  2. FQDN trailing-dot bypass: url.URL.Hostname() keeps the trailing dot for FQDN-form URLs like http://host./, so such URLs slipped past deny-list patterns like host. The matcher now strips trailing dots from both inputs.
  3. Silent empty entries: an allowed_domains: [""] entry would have rejected every URL, while blocked_domains: [""] would have matched nothing. Validation now rejects empty/whitespace-only entries at config-load time.

Validation

  • mise lint
  • mise test
  • New tests cover: matcher truth table (exact, subdomain, suffix-collision, leading-dot, IP, case, trailing dot, whitespace, empty), allow-list deny + permit, deny-list deny (and that it short-circuits before robots.txt), redirect re-check on both lists, instructions surfacing the lists, validation errors for misuse.

Commits

  1. feat(fetch): add allowed_domains and blocked_domains filters — initial implementation, config plumbing, docs, example, tests.
  2. refactor(fetch): simplify domain matcher and instructions — collapse checkDomainAllowed branches into a switch, drop the matchesAnyDomain helper in favor of inline slices.ContainsFunc, simplify matchesDomain by leveraging the leading dot directly (drop subdomainOnly bool, drop dead IPv6 bracket-strip), tighten Instructions(). No functional change — matcher truth table and integration tests unchanged.
  3. fix(fetch): close redirect/FQDN bypasses in domain filtering — the three security fixes above, plus the IP-encoding limitation note in the user-facing docs.

Assisted-By: docker-agent

dgageot added 3 commits April 28, 2026 11:34
feat(fetch): add allowed_domains and blocked_domains filters

Lets operators restrict the fetch tool to a curated set of hosts (or
deny a few sensitive ones), mirroring Anthropic's web-fetch tool and
Claude Code's WebFetch permission model. Patterns match the host and
any subdomain by default; a leading dot restricts to strict subdomains.

The check runs before any network call (including robots.txt), so blocked
URLs never leak DNS or TCP traffic.

Assisted-By: docker-agent

refactor(fetch): simplify domain matcher and instructions

- Drop the matchesAnyDomain helper; use slices.ContainsFunc inline.
- Collapse checkDomainAllowed branches into a switch.
- Simplify matchesDomain by leveraging the leading dot directly
  (no more subdomainOnly bool, no dead IPv6 bracket-strip).
- Tighten Instructions() with fmt.Fprintf and shorter phrasing.

No functional change; matcher truth table and integration tests unchanged.

Assisted-By: docker-agent

fix(fetch): close redirect/FQDN bypasses in domain filtering

Three issues found while reviewing the domain-filtering feature:

1. SSRF via redirect (critical). The http.Client had no CheckRedirect,
   so an allow-listed origin returning a 3xx to a forbidden host would
   be followed and its body returned to the caller, a classic bypass.
   Now every redirect target is re-checked against the same lists; a
   regression test against http://169.254.169.254/ (AWS metadata IP)
   demonstrates the fix.

2. FQDN trailing-dot bypass. URLs in FQDN form ("http://host./") kept
   the trailing dot in url.URL.Hostname() and slipped past patterns
   like "host". The matcher now strips trailing dots from both inputs.

3. Empty/whitespace domain entries silently rejected every URL in
   allowed_domains and matched nothing in blocked_domains. Validation
   now rejects them at config-load time with a clear error.

Also documented the IP-encoding limitation (decimal/hex/octal IPv4)
in the user-facing fetch tool docs.

Assisted-By: docker-agent
dgageot requested a review from a team as a code owner April 28, 2026 11:25

gtardif (Contributor) left a comment

The description does not match what's in the PR; this description is from #2573.


Labels

None yet

Projects

None yet


2 participants