feat(fetch): add allowed_domains and blocked_domains filters #2572
Open
dgageot wants to merge 3 commits into docker:main from
Lets operators restrict the fetch tool to a curated set of hosts (or deny a few sensitive ones), mirroring Anthropic's web-fetch tool and Claude Code's WebFetch permission model. Patterns match the host and any subdomain by default; a leading dot restricts to strict subdomains. The check runs before any network call (including robots.txt) so blocked URLs never leak DNS or TCP traffic.
Assisted-By: docker-agent
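The two pattern forms described in this commit message can be illustrated with a small truth table. This is a minimal sketch, not the PR's code: the function body is an assumption written from the prose above, and it deliberately leaves out any host normalization.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesDomain is a sketch of the matching rules described above.
// A plain pattern ("example.com") covers the host itself and any subdomain.
// A leading-dot pattern (".example.com") covers strict subdomains only.
func matchesDomain(host, pattern string) bool {
	if strings.HasPrefix(pattern, ".") {
		// The host must be longer than the pattern and end with it,
		// which excludes the apex domain.
		return len(host) > len(pattern) && strings.HasSuffix(host, pattern)
	}
	// Exact host, or a suffix match on a label boundary so that
	// "badexample.com" does not match "example.com".
	return host == pattern || strings.HasSuffix(host, "."+pattern)
}

func main() {
	cases := []struct{ host, pattern string }{
		{"example.com", "example.com"},
		{"docs.example.com", "example.com"},
		{"badexample.com", "example.com"},
		{"example.com", ".example.com"},
		{"docs.example.com", ".example.com"},
	}
	for _, c := range cases {
		fmt.Printf("%-18s vs %-13s -> %v\n", c.host, c.pattern, matchesDomain(c.host, c.pattern))
	}
}
```

Matching on a label boundary (the `"."+pattern` suffix) is what keeps look-alike hosts that merely share trailing characters out of the list.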
- Drop the matchesAnyDomain helper; use slices.ContainsFunc inline.
- Collapse checkDomainAllowed branches into a switch.
- Simplify matchesDomain by leveraging the leading dot directly (no more subdomainOnly bool, no dead IPv6 bracket-strip).
- Tighten Instructions() with fmt.Fprintf and shorter phrasing.
No functional change; matcher truth table and integration tests unchanged.
Assisted-By: docker-agent
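To make the refactor concrete, here is one way a switch-based check with inline slices.ContainsFunc might look. The names checkDomainAllowed and matchesDomain come from the commit message, but the signatures, error messages, and bodies below are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// matchesDomain is a compact stand-in for the PR's matcher (assumed behavior):
// a leading dot means strict subdomains only, otherwise host or any subdomain.
func matchesDomain(host, pattern string) bool {
	if strings.HasPrefix(pattern, ".") {
		return len(host) > len(pattern) && strings.HasSuffix(host, pattern)
	}
	return host == pattern || strings.HasSuffix(host, "."+pattern)
}

// checkDomainAllowed returns nil when host may be fetched.
// The signature is hypothetical; the PR's version may differ.
func checkDomainAllowed(host string, allowed, blocked []string) error {
	match := func(pattern string) bool { return matchesDomain(host, pattern) }
	switch {
	case len(allowed) > 0 && !slices.ContainsFunc(allowed, match):
		// An allow-list is configured and nothing in it covers the host.
		return fmt.Errorf("host %q is not in allowed_domains", host)
	case slices.ContainsFunc(blocked, match):
		// A deny-list entry covers the host.
		return fmt.Errorf("host %q is in blocked_domains", host)
	}
	return nil
}

func main() {
	allowed := []string{"example.com"}
	for _, host := range []string{"docs.example.com", "169.254.169.254"} {
		fmt.Println(host, "->", checkDomainAllowed(host, allowed, nil))
	}
}
```

Because slices.ContainsFunc returns false for an empty or nil slice, a toolset with neither list configured falls through the switch and allows every host.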
Three issues found while reviewing the domain-filtering feature:
1. SSRF via redirect (critical). The http.Client had no CheckRedirect, so an allow-listed origin returning a 3xx to a forbidden host would be followed and its body returned to the caller, a classic bypass. Now every redirect target is re-checked against the same lists; a regression test against http://169.254.169.254/ (AWS metadata IP) demonstrates the fix.
2. FQDN trailing-dot bypass. URLs in FQDN form ("http://host./") kept the trailing dot in url.URL.Hostname() and slipped past patterns like "host". The matcher now strips trailing dots from both inputs.
3. Empty/whitespace domain entries silently rejected every URL in allowed_domains and matched nothing in blocked_domains. Validation now rejects them at config-load time with a clear error.
Also documented the IP-encoding limitation (decimal/hex/octal IPv4) in the user-facing fetch tool docs.
Assisted-By: docker-agent
gtardif
approved these changes
Apr 28, 2026
Summary
Adds two new options to the `fetch` toolset, `allowed_domains` and `blocked_domains`, letting operators restrict which hosts an agent can reach, mirroring Anthropic's web-fetch tool and Claude Code's `WebFetch` permission model.
The check runs before any network call (including `robots.txt`), so blocked URLs never leak DNS or TCP traffic. Redirect targets are re-checked against the same lists, closing an SSRF-style bypass.
Configuration
The two lists are mutually exclusive on a single fetch toolset.
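As an illustration of how this rule, together with the empty-entry check the PR adds at config-load time, might be enforced, consider the sketch below. The Toolset struct is a trimmed-down, hypothetical stand-in for the real config type, and validateDomainFilters is an invented name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Toolset is a hypothetical, trimmed-down stand-in for the config struct;
// only the two new fields are modeled here.
type Toolset struct {
	AllowedDomains []string
	BlockedDomains []string
}

// validateDomainFilters rejects configurations that set both lists or that
// contain empty or whitespace-only entries.
func validateDomainFilters(t Toolset) error {
	if len(t.AllowedDomains) > 0 && len(t.BlockedDomains) > 0 {
		return errors.New("allowed_domains and blocked_domains are mutually exclusive")
	}
	lists := map[string][]string{
		"allowed_domains": t.AllowedDomains,
		"blocked_domains": t.BlockedDomains,
	}
	for name, entries := range lists {
		for i, entry := range entries {
			if strings.TrimSpace(entry) == "" {
				return fmt.Errorf("%s[%d] is empty", name, i)
			}
		}
	}
	return nil
}

func main() {
	both := Toolset{
		AllowedDomains: []string{"example.com"},
		BlockedDomains: []string{"internal.test"},
	}
	fmt.Println(validateDomainFilters(both))
	fmt.Println(validateDomainFilters(Toolset{BlockedDomains: []string{"  "}}))
}
```

Failing at load time means a misconfigured list surfaces as a clear error instead of a filter that silently blocks everything or nothing.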
Matching rules
- Plain pattern (`example.com`): matches the host exactly and any subdomain (`docs.example.com`); does not match unrelated hosts that share a suffix (`badexample.com`).
- Leading-dot pattern (`.example.com`): matches only strict subdomains, not the apex.
- … (`169.254.169.254`).
- A trailing dot (`http://example.com./`) is stripped before matching, so it can't bypass a deny-list entry.
Changes
Config & schema (`pkg/config/latest`)
- `AllowedDomains` and `BlockedDomains` fields on `Toolset`.
- Validation rejects … `fetch` toolset, setting both lists at once, and empty/whitespace-only entries (which would silently match nothing and turn the list into a foot-gun).
- `agent-schema.json` updated with descriptions and examples.
Fetch tool (`pkg/tools/builtin/fetch.go`)
- `WithAllowedDomains` / `WithBlockedDomains` options.
- `checkDomainAllowed` enforces the lists on the initial URL.
- `http.Client.CheckRedirect` re-checks every redirect target against the same lists (10-redirect cap mirrors the `net/http` default).
- `matchesDomain` helper implements the rules above.
- `Instructions()` advertises the configured lists to the model so it can avoid futile calls.
Wiring (`pkg/teamloader/registry.go`)
- `createFetchTool` propagates the new fields from YAML into the tool options.
Docs & examples
- `docs/tools/fetch/index.md` documents the new options, matching rules, redirect re-check, and the IP-encoding limitation (matching is purely string-based on the URL host: it does not normalize alternative IP encodings such as decimal, hex, octal, or IPv4-mapped IPv6).
- `examples/fetch_domain_filtering.yaml` shows both an allow-list agent and a deny-list agent.
Security fixes folded in
Three issues were caught during review of the initial implementation and fixed in this PR:
1. SSRF via redirect (critical). `http.Client` had no `CheckRedirect`, so an allow-listed origin returning a 3xx to a forbidden host would be followed and its body returned to the caller. Now every redirect target is re-checked. Regression test `TestFetch_AllowedDomains_RejectsRedirectToBlockedHost` uses `http://169.254.169.254/` (AWS metadata IP) to demonstrate the fix.
2. FQDN trailing-dot bypass. `url.URL.Hostname()` keeps the trailing dot for FQDN-form URLs like `http://host./`, which slipped past patterns like `host`. The matcher now strips trailing dots from both inputs.
3. Empty/whitespace domain entries. `allowed_domains: [""]` would have rejected every URL; `blocked_domains: [""]` would have matched nothing. Validation now rejects empty/whitespace-only entries at config-load time.
Validation
- `mise lint` ✓
- `mise test` ✓
- … `robots.txt`), redirect re-check on both lists, instructions surfacing the lists, validation errors for misuse.
Commits
- `feat(fetch): add allowed_domains and blocked_domains filters`: initial implementation, config plumbing, docs, example, tests.
- `refactor(fetch): simplify domain matcher and instructions`: collapse `checkDomainAllowed` branches into a switch, drop the `matchesAnyDomain` helper in favor of inline `slices.ContainsFunc`, simplify `matchesDomain` by leveraging the leading dot directly (drop `subdomainOnly` bool, drop dead IPv6 bracket-strip), tighten `Instructions()`. No functional change: matcher truth table and integration tests unchanged.
- `fix(fetch): close redirect/FQDN bypasses in domain filtering`: the three security fixes above, plus the IP-encoding limitation note in the user-facing docs.
Assisted-By: docker-agent