runtime: extract image-stripping into a registered MessageTransform #2573

Merged
dgageot merged 3 commits into docker:main from dgageot:board/extracting-runtime-features-into-builtin-d52e607b
Apr 28, 2026

dgageot commented Apr 28, 2026

Summary

Extracts the inline stripImageContent call from runStreamLoop into a registered, runtime-private message-transform mechanism that opens the door to a family of message-mutating builtins (PII redactors, secret scrubbers, prompt-prefix injectors, …).

Changes

New mechanism — MessageTransform (in-process before_llm_call rewrites)

  • New MessageTransform type and WithMessageTransform("name", fn) option in pkg/runtime/transforms.go.
  • Transforms are intentionally a runtime-private contract: JSON-roundtripping a full conversation through the cross-process hook protocol would be prohibitively expensive, so by design command/model hooks cannot rewrite messages.
  • Transforms run after the standard before_llm_call gate — a hook that wants to abort the call should target the gate, not a transform.
  • Fail-soft: a transform that returns an error logs at warn level and the chain continues with the previous slice. A transform must never break the run loop.
  • Chain order = registration order. Per-agent scoping (if needed) lives in the transform body via hooks.Input.AgentName.
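
The chain semantics above (registration order, fail-soft, per-agent scoping inside the transform body) can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/runtime/transforms.go: the Message and Input types, the MessageTransform signature, and the applyTransforms helper are all simplified assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// Message and Input are simplified stand-ins (assumptions) for the
// runtime's real message and hooks.Input types.
type Message struct {
	Role    string
	Content string
}

type Input struct {
	AgentName string
}

// MessageTransform rewrites the outgoing messages. The signature is assumed.
type MessageTransform func(in Input, msgs []Message) ([]Message, error)

type namedTransform struct {
	name string
	fn   MessageTransform
}

// applyTransforms runs transforms in registration order. A failing transform
// is logged at warn level and skipped, so the chain continues with the slice
// produced by the last successful step.
func applyTransforms(chain []namedTransform, in Input, msgs []Message) []Message {
	for _, t := range chain {
		out, err := t.fn(in, msgs)
		if err != nil {
			slog.Warn("message transform failed", "transform", t.name, "error", err)
			continue
		}
		msgs = out
	}
	return msgs
}

// demoChain wires three hypothetical transforms: one that tags messages,
// one that always fails, and one that is scoped to a single agent.
func demoChain() []Message {
	chain := []namedTransform{
		{name: "tag", fn: func(in Input, msgs []Message) ([]Message, error) {
			out := make([]Message, len(msgs))
			for i, m := range msgs {
				m.Content = "[" + in.AgentName + "] " + m.Content
				out[i] = m
			}
			return out, nil
		}},
		{name: "broken", fn: func(Input, []Message) ([]Message, error) {
			return nil, errors.New("boom")
		}},
		{name: "shout", fn: func(in Input, msgs []Message) ([]Message, error) {
			if in.AgentName != "root" { // per-agent scoping lives in the body
				return msgs, nil
			}
			out := make([]Message, len(msgs))
			for i, m := range msgs {
				m.Content += "!"
				out[i] = m
			}
			return out, nil
		}},
	}
	return applyTransforms(chain, Input{AgentName: "root"}, []Message{{Role: "user", Content: "hi"}})
}

func main() {
	fmt.Println(demoChain()[0].Content)
}
```

In this sketch the "broken" transform is skipped and "shout" still sees the output of "tag", which is the behavior the fail-soft bullet describes.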

First built-in transform — strip_unsupported_modalities

  • New pkg/runtime/strip_modalities.go hosts BuiltinStripUnsupportedModalities, the transform body, and the stripImageContent helper (moved from streaming.go).
  • The inline if m != nil && len(m.Modalities.Input) > 0 && !slices.Contains(...) block in runStreamLoop is gone. The loop now calls executeBeforeLLMCallHooks (gate) followed by applyBeforeLLMCallTransforms (rewrite) — so a transform failure cannot waste the gate's allow verdict.
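
For readers unfamiliar with the strip logic, here is a hedged sketch of the decision it makes. The types, the "image" modality string, and the map-based model lookup are assumptions for illustration; the real helper in pkg/runtime/strip_modalities.go may differ in all of these.

```go
package main

import (
	"fmt"
	"slices"
)

// Part, Message and ModelInfo are hypothetical, simplified types.
type Part struct {
	Type string // "text" or "image"
	Text string
}

type Message struct {
	Role  string
	Parts []Part
}

type ModelInfo struct {
	InputModalities []string
}

// stripImages returns a copy of msgs with every image part removed.
// The input slice and its messages are left untouched.
func stripImages(msgs []Message) []Message {
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		kept := make([]Part, 0, len(m.Parts))
		for _, p := range m.Parts {
			if p.Type != "image" {
				kept = append(kept, p)
			}
		}
		m.Parts = kept
		out = append(out, m)
	}
	return out
}

// stripUnsupported strips images only when the model is known, declares its
// input modalities, and does not list "image" among them. An unknown model
// or an empty modality list passes the messages through unchanged.
func stripUnsupported(modelID string, models map[string]ModelInfo, msgs []Message) []Message {
	info, ok := models[modelID]
	if !ok || len(info.InputModalities) == 0 {
		return msgs
	}
	if slices.Contains(info.InputModalities, "image") {
		return msgs
	}
	return stripImages(msgs)
}

func main() {
	models := map[string]ModelInfo{
		"text-only": {InputModalities: []string{"text"}},
	}
	msgs := []Message{{Role: "user", Parts: []Part{{Type: "text", Text: "look"}, {Type: "image"}}}}
	fmt.Println(len(stripUnsupported("text-only", models, msgs)[0].Parts))
	fmt.Println(len(stripUnsupported("unknown", models, msgs)[0].Parts))
}
```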

Correctness fix — alloy mode + per-tool model override

  • New ModelID field on hooks.Input, populated by runStreamLoop with the model the loop actually picked (post per-tool override, post alloy-mode random selection).
  • The strip transform now keys its modality lookup off in.ModelID instead of calling agent.Model() again — which would re-randomize the alloy pick or miss a per-tool override and consult the wrong modalities.
  • Pinned by TestStripUnsupportedModalitiesTransform_UsesInputModelID, which uses an ID-keyed model store to prove the lookup keys off ModelID rather than the agent.
  • The same ModelID is now also surfaced to user-authored before_llm_call hooks for free.
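
The bug class fixed here can be illustrated with a small sketch. Agent, resolveModel, and modelForLookup are hypothetical names, and the random pick only mimics alloy mode: the point is that the loop resolves the model once and every later consumer reads that resolved value instead of asking the agent again.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Agent is a hypothetical stand-in for an alloy-mode agent.
type Agent struct{ models []string }

// Model mimics alloy mode: every call may return a different model.
func (a Agent) Model() string { return a.models[rand.Intn(len(a.models))] }

// Input is a simplified stand-in for hooks.Input with the new ModelID field.
type Input struct {
	AgentName string
	ModelID   string
}

// resolveModel mimics what the loop does once per call: a per-tool override
// wins, otherwise the agent picks (possibly at random).
func resolveModel(a Agent, toolOverride string) string {
	if toolOverride != "" {
		return toolOverride
	}
	return a.Model()
}

// modelForLookup is what a transform should consult: the already-resolved
// ID carried on the input, never a fresh a.Model() call.
func modelForLookup(in Input) string { return in.ModelID }

func main() {
	a := Agent{models: []string{"model-a", "model-b"}}
	picked := resolveModel(a, "")
	in := Input{AgentName: "root", ModelID: picked}
	// The lookup always agrees with the loop's pick; a second a.Model()
	// call would only agree by chance.
	fmt.Println(modelForLookup(in) == picked)
}
```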

What's preserved

All previous user-facing behavior:

  • Strip-when-text-only: identical decision logic.
  • "Unknown model → pass through": identical fall-through.
  • The add_date / add_environment_info / add_prompt_files / cache_response builtins are untouched.
  • hooks.Input field additions are backward-compatible (omitempty JSON tags; existing handlers ignore unknown fields).
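
As a rough illustration of the backward-compatibility claim, the sketch below shows how an omitempty field behaves with encoding/json. The struct shapes and JSON key names (agent_name, model_id) are assumptions, not the actual hooks.Input wire format, and the "ignores unknown fields" behavior shown is that of Go's default decoder; handlers written in other languages depend on their own parsers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Input is a hypothetical, trimmed-down payload with the newly added field.
type Input struct {
	AgentName string `json:"agent_name,omitempty"`
	ModelID   string `json:"model_id,omitempty"`
}

// legacyInput is what a handler written before the field existed might decode into.
type legacyInput struct {
	AgentName string `json:"agent_name,omitempty"`
}

// encode marshals the payload; an empty ModelID is omitted entirely.
func encode(in Input) string {
	b, err := json.Marshal(in)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// decodeLegacy decodes with the default decoder, which silently skips
// keys the target struct does not declare.
func decodeLegacy(payload string) (legacyInput, error) {
	var in legacyInput
	err := json.Unmarshal([]byte(payload), &in)
	return in, err
}

func main() {
	fmt.Println(encode(Input{AgentName: "root"}))
	payload := encode(Input{AgentName: "root", ModelID: "some-model"})
	old, err := decodeLegacy(payload)
	fmt.Println(old.AgentName, err)
}
```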

What's not preserved (intentional)

An earlier revision of this PR briefly experimented with auto-injecting transforms as {type: builtin, command: name} entries into agent hook configs (with a no-op BuiltinFunc shim and dedup logic). This was simplified away because users couldn't actually control transforms through YAML: auto-injection always won, so the YAML coupling was internal plumbing for a control surface that didn't exist. The simplification dropped ~340 net lines without losing any user-facing capability.

Why this matters

The payoff isn't in code we deleted today (the strip is the only candidate currently inline). The payoff is shrinking the diff for future message-rewriting features:

  • PII redactor: ~30-line transform + WithMessageTransform("redact_pii", fn). 0 lines in the run loop.
  • "Drop large tool outputs from old turns": same shape.
  • "Inject team-policy prefix": same shape.

Without this mechanism, each of those would have grown a new branch in runStreamLoop. With it, the loop's pre-LLM-call section stays at three logical lines: get gate verdict, run transforms, call model.
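
To make the "same shape" claim concrete, here is a hedged sketch of what a PII redactor registered through an option might look like. Everything in it is illustrative: the Runtime, Option, and New definitions are stand-ins, the WithMessageTransform signature is assumed from the description above, and the email regex is a deliberately naive example rather than a real PII detector.

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified, hypothetical stand-ins for the runtime's types.
type Message struct{ Role, Content string }
type Input struct{ AgentName, ModelID string }
type MessageTransform func(in Input, msgs []Message) ([]Message, error)

type Runtime struct {
	names      []string
	transforms []MessageTransform
}

type Option func(*Runtime)

// WithMessageTransform appends a named transform; chain order is registration order.
func WithMessageTransform(name string, fn MessageTransform) Option {
	return func(r *Runtime) {
		r.names = append(r.names, name)
		r.transforms = append(r.transforms, fn)
	}
}

func New(opts ...Option) *Runtime {
	r := &Runtime{}
	for _, o := range opts {
		o(r)
	}
	return r
}

// apply runs the chain, skipping any transform that returns an error.
func (r *Runtime) apply(in Input, msgs []Message) []Message {
	for _, fn := range r.transforms {
		if out, err := fn(in, msgs); err == nil {
			msgs = out
		}
	}
	return msgs
}

// emailRE is a naive pattern used only for illustration.
var emailRE = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// redactPII returns a copy of msgs with email-like substrings masked.
func redactPII(_ Input, msgs []Message) ([]Message, error) {
	out := make([]Message, len(msgs))
	for i, m := range msgs {
		m.Content = emailRE.ReplaceAllString(m.Content, "[redacted-email]")
		out[i] = m
	}
	return out, nil
}

func main() {
	rt := New(WithMessageTransform("redact_pii", redactPII))
	msgs := rt.apply(Input{AgentName: "root"}, []Message{{Role: "user", Content: "mail alice@example.com now"}})
	fmt.Println(msgs[0].Content)
}
```

The run loop in this sketch never mentions redaction; adding or removing the feature is a one-line change at construction time.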

Validation

  • mise lint ✓ (golangci-lint run: 0 issues, internal lint checker: no offenses, go mod tidy --diff: clean)
  • mise test ✓ (full suite passes)
  • New tests cover: text-only / multimodal / unknown-model branches, empty ModelID, registration-order chain semantics, fail-soft contract, end-to-end strip via RunStream, end-to-end transform-error survival, input validation, alloy / per-tool override correctness.

Commits

  1. extract strip_unsupported_modalities into a registered before_llm_call transform
  2. simplify message transforms: drop the YAML auto-injection plumbing
  3. fix strip transform reading wrong model in alloy / per-tool override mode

Assisted-By: docker-agent

dgageot added 3 commits April 28, 2026 11:11
…mode

The transform was calling agent.Model() which re-randomizes alloy picks and ignores per-tool overrides — it could end up consulting modalities for a different model than the one the loop was actually about to call. Pass the resolved modelID through hooks.Input.ModelID instead.
@dgageot dgageot merged commit e59e163 into docker:main Apr 28, 2026
9 checks passed