969 changes: 969 additions & 0 deletions .planning/cbm-cross-repo-proposal.md

Large diffs are not rendered by default.

89 changes: 89 additions & 0 deletions .planning/tier1-extractor-fixes.md
@@ -0,0 +1,89 @@
# Tier 1 cross-repo gRPC: extractor + pipeline fixes

Companion to `cbm-cross-repo-proposal.md` and PR #293. It lists the production-readiness gaps that prevented Tier 1's producer-side detection from firing on a real .NET microservice fleet; this branch addresses all of them.

---

## Gap 1: `pass_idl_scan` not called in the parallel pipeline

**Where:** `src/pipeline/pipeline.c`

The pass was registered only in `seq_passes[]` (sequential path). The parallel path ran extract → registry → resolve → infra → free(cache) → k8s and never invoked `pass_idl_scan`. Repos cross the parallel-pipeline threshold at ~50 files, so production codebases silently skipped Tier 1.

**Fix:** invoke `cbm_pipeline_pass_idl_scan` in `run_parallel_pipeline` after `process_infra_bindings`, before the cache is freed, with `ctx->result_cache` set to the cache pointer.

**Diff:** `src/pipeline/pipeline.c` ~12 lines.
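The ordering constraint can be sketched in C as below. This is a minimal illustration, not the code in `pipeline.c`. `FileResult`, `PipelineCtx`, `pass_idl_scan` and `run_parallel_pipeline` are hypothetical simplified stand-ins for the real `CBMFileResult **` cache, the pipeline context, and `cbm_pipeline_pass_idl_scan`.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for CBMFileResult and the pipeline context. */
typedef struct {
    int file_id;
} FileResult;

typedef struct {
    FileResult **result_cache; /* borrowed; valid only while the cache lives */
    int files_scanned;
} PipelineCtx;

/* Same guard as the real pass: without a cache it silently does nothing,
 * which is how the parallel path ended up skipping Tier 1. */
static int pass_idl_scan(PipelineCtx *ctx, int nfiles) {
    if (!ctx->result_cache) {
        return 0;
    }
    for (int i = 0; i < nfiles; i++) {
        if (ctx->result_cache[i]) {
            ctx->files_scanned++;
        }
    }
    return 1;
}

static void run_parallel_pipeline(PipelineCtx *ctx, int nfiles) {
    FileResult **cache = calloc((size_t)nfiles, sizeof *cache);
    if (!cache) {
        return;
    }
    /* extract -> registry -> resolve -> infra would fill the cache here. */
    for (int i = 0; i < nfiles; i++) {
        cache[i] = malloc(sizeof **cache);
        if (cache[i]) {
            cache[i]->file_id = i;
        }
    }

    /* The fix: attach the cache and run the IDL scan before it is freed. */
    ctx->result_cache = cache;
    pass_idl_scan(ctx, nfiles);
    ctx->result_cache = NULL; /* no dangling pointer for later passes */

    for (int i = 0; i < nfiles; i++) {
        free(cache[i]);
    }
    free(cache);
    /* k8s and later passes run here, without the cache. */
}
```

Detaching `result_cache` before the free is a defensive choice in this sketch; whether the real pipeline clears the pointer afterwards is not stated above.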

---

## Gap 2: C# 12 primary-constructor params not surfaced as Field defs

**Where:** `internal/cbm/extract_defs.c::extract_class_def`

Modern .NET 8+ controllers and services commonly use C# 12 primary-constructor syntax. Parameters declared on the class line bind to implicit captured fields that any instance member can access, but `extract_class_def` only walked body `field_declaration` / `property_declaration` nodes and missed primary-constructor parameters entirely. As a result, Tier 1c (ctor params) and Tier 1f (class fields) couldn't fire: there was no Method "ctor" def with `param_names`/`param_types`, and no Field def for the captured parameter.

**Fix:** after the existing class extraction, when `language == CBM_LANG_CSHARP`, locate the primary `parameter_list` (try `child_by_field_name("parameters")` first, fall back to direct child walk for grammars that don't surface the field name) and emit a `Field` def per param with `parent_class` and `return_type` set.

**Diff:** `internal/cbm/extract_defs.c` ~35 lines.
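A rough sketch of the emission rule, assuming parameter names and types have already been pulled out of the `parameter_list`. The real code calls `resolve_param_name` / `resolve_param_type_text` on tree-sitter nodes and pushes arena-allocated `CBMDefinition`s; `PrimaryParam`, `FieldDef` and `emit_primary_ctor_fields` are hypothetical. For `class OrdersController(IOrderRepo repo, IClock clock)` it yields Fields `<class_qn>.repo` and `<class_qn>.clock`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one primary-constructor parameter. */
typedef struct {
    const char *name;
    const char *type;
} PrimaryParam;

/* Hypothetical subset of CBMDefinition relevant to captured fields. */
typedef struct {
    char qualified_name[128];
    const char *label;
    const char *parent_class;
    const char *return_type; /* the field's declared type */
} FieldDef;

/* Emit one Field def per named, typed primary-constructor parameter.
 * Returns the number of defs written (at most `cap`). */
static size_t emit_primary_ctor_fields(const char *class_qn,
                                       const PrimaryParam *params, size_t nparams,
                                       FieldDef *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < nparams && n < cap; i++) {
        const PrimaryParam *p = &params[i];
        /* Same skip rules as the patch: missing name or type -> no def. */
        if (!p->name || !p->name[0] || !p->type || !p->type[0]) {
            continue;
        }
        FieldDef *d = &out[n++];
        snprintf(d->qualified_name, sizeof d->qualified_name, "%s.%s",
                 class_qn, p->name);
        d->label = "Field";
        d->parent_class = class_qn;
        d->return_type = p->type;
    }
    return n;
}
```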

---

## Gap 3: protobuf rpc Functions not linked to their service Class

**Where:** `src/pipeline/pass_idl_scan.c::idl_proto_class_visitor`

The visitor used `cbm_gbuf_find_edges_by_source_type(class.id, "DEFINES_METHOD", ...)` to find rpc methods of each proto service. tree-sitter-protobuf emits rpc Functions as **flat siblings** of the service Class (not children), so `DEFINES_METHOD` returned empty for every proto Class and zero `__route__grpc__` Routes were created. `pass_route_nodes` did emit Routes but in the old `__grpc__<svc>/<method>` format, which doesn't match Tier 1's `__route__grpc__<svc>/<method>` consumer-side QN.

**Fix:** when `DEFINES_METHOD` is empty, fall back to scanning proto Functions in the same file whose `start_line`/`end_line` fall within the service Class's line range. A single pre-pass collects all proto Classes and Functions into flat arrays, which keeps the fallback at O(N+F).

**Diff:** `src/pipeline/pass_idl_scan.c` ~85 lines (pre-collection helpers + refactored visitor).
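One way to keep the containment fallback linear is a two-pointer sweep. This sketch assumes both flat arrays hold a single file's entries sorted by `start_line`, and that service ranges don't overlap (proto services don't nest). `LineRange` and `assign_rpcs_to_services` are hypothetical names, and the actual pre-pass in `pass_idl_scan.c` may be structured differently.

```c
#include <stddef.h>

/* Hypothetical line span of a proto Class (service) or Function (rpc). */
typedef struct {
    int start_line;
    int end_line;
} LineRange;

/* For each rpc Function, store the index of the service Class whose line
 * range contains it, or -1 if none does. Both inputs must be sorted by
 * start_line; the class cursor only moves forward, so this is O(C + F). */
static void assign_rpcs_to_services(const LineRange *classes, size_t nclasses,
                                    const LineRange *fns, size_t nfns,
                                    int *owner) {
    size_t c = 0;
    for (size_t f = 0; f < nfns; f++) {
        /* Skip services that end before this function starts. */
        while (c < nclasses && classes[c].end_line < fns[f].start_line) {
            c++;
        }
        if (c < nclasses &&
            classes[c].start_line <= fns[f].start_line &&
            fns[f].end_line <= classes[c].end_line) {
            owner[f] = (int)c;
        } else {
            owner[f] = -1;
        }
    }
}
```

Each matched Function's owning service then supplies the `<svc>` half of the `__route__grpc__<svc>/<method>` key.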

---

## Gap 4: graph UI dropped `linked_projects` so cross-galaxy never rendered

**Where:** `graph-ui/src/components/GraphTab.tsx`

`/api/layout` returns `{nodes, edges, total_nodes, linked_projects}` where each linked-project entry carries the satellite's nodes + edges + `cross_edges` (primary→linked id pairs). `GraphScene` already knew how to render satellites, but `GraphTab` rebuilt `filteredData` as `{nodes, edges, total_nodes}` and silently dropped `linked_projects`, so the scene received `data.linked_projects === undefined` on every render.

**Fix:** pass `linked_projects` through the `useMemo` and apply the same enabled-labels / enabled-edge-types filter inside each satellite. Both the filter initialization and `enableAll` now union in labels and edge types from the satellites, so they're visible by default. The binary embeds the built UI via `scripts/embed-frontend.sh`; rebuild with `scripts/build.sh --with-ui`.

**Diff:** `graph-ui/src/components/GraphTab.tsx` ~25 lines.

---

## Adversarial-review follow-ups

`/codex:adversarial-review` surfaced three additional findings. Two are fixed in this branch; the third (route-key uniqueness) is mitigated rather than fully resolved.

### Gap 5: incremental indexing skipped `pass_idl_scan`

**Where:** `src/pipeline/pipeline_incremental.c::run_extract_resolve`

The sequential incremental path called `cbm_pipeline_pass_idl_scan` without attaching a cache, so the pass returned early at `if (!ctx->result_cache)`. The parallel incremental path built a cache for extract+resolve but never called the pass. As a result, producer-side edges were refreshed only on a full reindex.

**Fix:** mirror the full-pipeline pattern in both branches — allocate a `CBMFileResult **` cache, attach to `ctx->result_cache`, run the pass, free.

**Diff:** `src/pipeline/pipeline_incremental.c` ~25 lines.

### Gap 6: project-wide stub-var name-only fallback could misattribute calls

**Where:** `src/pipeline/pass_idl_scan.c::idl_stub_var_arr_find`

The lookup tried an exact function-scope match, then a class-scope match, then a name-only fallback. The fallback is safe for `file_vars` (one TU) but unsafe for `class_vars` (project-wide): two unrelated classes with a `_client` field would silently bind to each other.

**Fix:** thread an `allow_name_only_fallback` flag through the lookup. The `class_vars` call site passes `false` (fail closed); `file_vars` lookups keep it `true`.

**Diff:** `src/pipeline/pass_idl_scan.c` ~20 lines.
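A minimal sketch of the tiered lookup with the new flag, assuming a flat array of hypothetical `StubVar` records. The real `idl_stub_var_arr_find` signature and record layout are not shown in this note, so `StubVar`, `str_eq` and `stub_var_find` are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for a variable known to hold a gRPC client stub. */
typedef struct {
    const char *name;     /* e.g. "_client" */
    const char *func_qn;  /* enclosing function, or NULL for fields */
    const char *class_qn; /* enclosing class */
    const char *service;  /* proto service the stub targets */
} StubVar;

static bool str_eq(const char *a, const char *b) {
    return a && b && strcmp(a, b) == 0;
}

/* Tiered lookup: exact function scope, then class scope, then (only if
 * allowed) name only. Project-wide tables pass false and fail closed. */
static const StubVar *stub_var_find(const StubVar *vars, size_t n,
                                    const char *name, const char *func_qn,
                                    const char *class_qn,
                                    bool allow_name_only_fallback) {
    for (size_t i = 0; i < n; i++) {
        if (str_eq(vars[i].name, name) && str_eq(vars[i].func_qn, func_qn)) {
            return &vars[i];
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (str_eq(vars[i].name, name) && str_eq(vars[i].class_qn, class_qn)) {
            return &vars[i];
        }
    }
    if (allow_name_only_fallback) {
        for (size_t i = 0; i < n; i++) {
            if (str_eq(vars[i].name, name)) {
                return &vars[i];
            }
        }
    }
    return NULL;
}
```

With the flag set to `false`, a `_client` field in an unrelated class misses both scoped tiers and gets `NULL` instead of borrowing another class's stub.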

### Gap 7 (mitigation): gRPC route-key collisions across proto packages

**Where:** `src/pipeline/pass_idl_scan.c::idl_emit_route_for_rpc`

Routes are keyed `__route__grpc__<service>/<method>` using the bare service name. Two `.proto` files in different proto packages with the same `service` + `rpc` names will upsert to the same Route node. A symmetric FQN fix needs both producer and consumer to derive the same fully-qualified key, which the consumer side can't do from typed-client class names alone.

**Mitigation:** log `idl_scan.route_collision` when an existing Route's `file_path` differs from the incoming emission, and write the proto Class's `qualified_name` as a `service_qn` Route property so a future FQN-aware matcher can recover provenance.

**Full fix path:** tracked as Tier 1g in `cbm-cross-repo-proposal.md` §5.7. The four-piece sequence (producer dual emission, AST-time package extraction, deterministic collision resolution, NuGet/Maven consumer-side scan) will ship as a focused follow-on PR if and when a fleet hits an actual collision. Five rounds of `/codex:adversarial-review` on a 1g.1 prototype showed that it is a preventive defense against a scenario that did not occur in real-world validation, and that shipping 1g.1 alone, without consumer-side derivation, produces dormant nodes.

**Diff:** `src/pipeline/pass_idl_scan.c` ~15 lines.
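The key format and collision check can be sketched as follows. `Route`, `RouteTable` and `emit_route_for_rpc` are hypothetical stand-ins for the graph-buffer upsert in `idl_emit_route_for_rpc`, and the fields printed after `idl_scan.route_collision` are illustrative, not the real log format. This sketch keeps the first emitter's `service_qn`; which value the real pass retains on a collision is not specified above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory Route table; the real pass upserts graph nodes. */
typedef struct {
    char key[160];
    const char *file_path;  /* .proto that first emitted the Route */
    const char *service_qn; /* proto Class qualified_name, kept for provenance */
} Route;

typedef struct {
    Route items[16];
    size_t n;
    int collisions;
} RouteTable;

/* Upsert `__route__grpc__<service>/<method>`; flag a collision when the same
 * key arrives from a different file. Returns the Route, or NULL if full. */
static Route *emit_route_for_rpc(RouteTable *t, const char *service,
                                 const char *method, const char *file_path,
                                 const char *service_qn) {
    char key[160];
    snprintf(key, sizeof key, "__route__grpc__%s/%s", service, method);
    for (size_t i = 0; i < t->n; i++) {
        Route *r = &t->items[i];
        if (strcmp(r->key, key) != 0) {
            continue;
        }
        if (strcmp(r->file_path, file_path) != 0) {
            t->collisions++;
            fprintf(stderr, "idl_scan.route_collision key=%s existing=%s incoming=%s\n",
                    key, r->file_path, file_path);
        }
        return r; /* same node: bare service names can't tell packages apart */
    }
    if (t->n == sizeof t->items / sizeof t->items[0]) {
        return NULL;
    }
    Route *r = &t->items[t->n++];
    memcpy(r->key, key, sizeof key);
    r->file_path = file_path;
    r->service_qn = service_qn;
    return r;
}
```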
15 changes: 15 additions & 0 deletions Makefile.cbm
@@ -186,6 +186,7 @@ PIPELINE_SRCS = \
src/pipeline/pass_configures.c \
src/pipeline/pass_configlink.c \
src/pipeline/pass_route_nodes.c \
src/pipeline/pass_idl_scan.c \
src/pipeline/pass_enrichment.c \
src/pipeline/pass_envscan.c \
src/pipeline/pass_compile_commands.c \
@@ -503,6 +504,20 @@ $(BUILD_DIR)/codebase-memory-mcp: $(MAIN_SRC) $(PROD_SRCS) $(EXTRACTION_SRCS) $(
cbm: $(BUILD_DIR)/codebase-memory-mcp
@echo "Built: $(BUILD_DIR)/codebase-memory-mcp"

# Standalone debug tool: dump C# extraction results for one file.
$(BUILD_DIR)/dump_csharp: $(OBJS_VENDORED_PROD) | $(BUILD_DIR)
$(CC) $(CFLAGS_PROD) -o $@ \
tests/dump_csharp.c \
$(FOUNDATION_SRCS) \
$(SIMHASH_SRCS) $(SEMANTIC_SRCS) \
src/pipeline/worker_pool.c \
$(EXTRACTION_SRCS) $(AC_LZ4_SRCS) $(ZSTD_SRCS) \
$(OBJS_VENDORED_PROD) \
$(LDFLAGS)

dump-csharp: $(BUILD_DIR)/dump_csharp
@echo "Built: $(BUILD_DIR)/dump_csharp"

# ── Build with embedded UI (requires Node.js) ───────────────────

# Swap embedded_stub.c for the generated embedded_assets.c
40 changes: 33 additions & 7 deletions graph-ui/src/components/EdgeLines.tsx
@@ -7,6 +7,10 @@ interface EdgeLinesProps {
edges: GraphEdge[];
highlightedIds: Set<number> | null;
opacity?: number;
/* Optional: when set, edge.target is looked up in this array instead of
* `nodes`. Used for cross-galaxy edges where source lives in the primary
* graph and target lives in a linked project's offset-adjusted nodes. */
targetNodes?: GraphNode[];
}

function getClusterKey(fp?: string): string {
@@ -28,17 +32,39 @@ const EDGE_TYPE_COLORS: Record<string, string> = {
IMPLEMENTS: "#f97316",
HTTP_CALLS: "#e11d48",
ASYNC_CALLS: "#ec4899",
GRPC_CALLS: "#f59e0b",
GRAPHQL_CALLS: "#e879f9",
TRPC_CALLS: "#a78bfa",
CROSS_HTTP_CALLS: "#fb923c",
CROSS_ASYNC_CALLS: "#fb7185",
CROSS_GRPC_CALLS: "#fbbf24",
CROSS_GRAPHQL_CALLS: "#f0abfc",
CROSS_TRPC_CALLS: "#c4b5fd",
CROSS_CHANNEL: "#fdba74",
MEMBER_OF: "#64748b",
TESTS_FILE: "#06b6d4",
};

const DEFAULT_EDGE_COLOR = "#1C8585";

-export function EdgeLines({ nodes, edges, highlightedIds, opacity = 1.0 }: EdgeLinesProps) {
export function EdgeLines({
nodes,
edges,
highlightedIds,
opacity = 1.0,
targetNodes,
}: EdgeLinesProps) {
const geometry = useMemo(() => {
-const idMap = new Map<number, number>();
const srcMap = new Map<number, number>();
for (let i = 0; i < nodes.length; i++) {
-idMap.set(nodes[i].id, i);
srcMap.set(nodes[i].id, i);
}
const tgtArr = targetNodes ?? nodes;
const tgtMap = targetNodes ? new Map<number, number>() : srcMap;
if (targetNodes) {
for (let i = 0; i < targetNodes.length; i++) {
tgtMap.set(targetNodes[i].id, i);
}
}

const hasHighlight = highlightedIds && highlightedIds.size > 0;
@@ -47,12 +73,12 @@ export function EdgeLines({ nodes, edges, highlightedIds, opacity = 1.0 }: EdgeL
let validCount = 0;

for (const edge of edges) {
-const si = idMap.get(edge.source);
-const ti = idMap.get(edge.target);
const si = srcMap.get(edge.source);
const ti = tgtMap.get(edge.target);
if (si === undefined || ti === undefined) continue;

const s = nodes[si];
-const t = nodes[ti];
const t = tgtArr[ti];

const sHL = !hasHighlight || highlightedIds.has(s.id);
const tHL = !hasHighlight || highlightedIds.has(t.id);
@@ -99,7 +125,7 @@ export function EdgeLines({ nodes, edges, highlightedIds, opacity = 1.0 }: EdgeL
new THREE.BufferAttribute(colors.slice(0, validCount * 6), 3),
);
return geo;
-}, [nodes, edges, highlightedIds]);
}, [nodes, edges, highlightedIds, targetNodes]);

return (
<lineSegments geometry={geometry}>
11 changes: 11 additions & 0 deletions graph-ui/src/components/GraphScene.tsx
@@ -156,6 +156,17 @@ export function GraphScene({
onClick={onNodeClick}
opacity={0.5}
/>
{/* Inter-galaxy CROSS_* edges: source is in primary, target in
* this linked project's offset nodes. */}
{lp.cross_edges && lp.cross_edges.length > 0 && (
<EdgeLines
nodes={data.nodes}
targetNodes={offsetNodes}
edges={lp.cross_edges}
highlightedIds={highlightedIds}
opacity={0.85}
/>
)}
</group>
);
})}
32 changes: 29 additions & 3 deletions graph-ui/src/components/GraphTab.tsx
@@ -48,6 +48,11 @@ export function GraphTab({ project }: GraphTabProps) {
if (!data) return;
const labels = new Set(data.nodes.map((n) => n.label));
const types = new Set(data.edges.map((e) => e.type));
for (const lp of data.linked_projects ?? []) {
for (const n of lp.nodes) labels.add(n.label);
for (const e of lp.edges) types.add(e.type);
for (const e of lp.cross_edges) types.add(e.type);
}
setEnabledLabels(labels);
setEnabledEdgeTypes(types);
}, [data]);
@@ -65,7 +70,21 @@ export function GraphTab({ project }: GraphTabProps) {
nodeIds.has(e.target),
);

-return { nodes, edges, total_nodes: data.total_nodes };
const linked_projects = data.linked_projects?.map((lp) => {
const lpNodes = lp.nodes.filter((n) => enabledLabels.has(n.label));
const lpIds = new Set(lpNodes.map((n) => n.id));
const lpEdges = lp.edges.filter(
(e) =>
enabledEdgeTypes.has(e.type) && lpIds.has(e.source) && lpIds.has(e.target),
);
const crossEdges = lp.cross_edges.filter(
(e) =>
enabledEdgeTypes.has(e.type) && nodeIds.has(e.source) && lpIds.has(e.target),
);
return { ...lp, nodes: lpNodes, edges: lpEdges, cross_edges: crossEdges };
});

return { nodes, edges, total_nodes: data.total_nodes, linked_projects };
}, [data, enabledLabels, enabledEdgeTypes]);

useEffect(() => {
@@ -136,8 +155,15 @@ export function GraphTab({ project }: GraphTabProps) {

const enableAll = useCallback(() => {
if (!data) return;
-setEnabledLabels(new Set(data.nodes.map((n) => n.label)));
-setEnabledEdgeTypes(new Set(data.edges.map((e) => e.type)));
const labels = new Set(data.nodes.map((n) => n.label));
const types = new Set(data.edges.map((e) => e.type));
for (const lp of data.linked_projects ?? []) {
for (const n of lp.nodes) labels.add(n.label);
for (const e of lp.edges) types.add(e.type);
for (const e of lp.cross_edges) types.add(e.type);
}
setEnabledLabels(labels);
setEnabledEdgeTypes(types);
}, [data]);

const disableAll = useCallback(() => {
87 changes: 85 additions & 2 deletions internal/cbm/extract_defs.c
@@ -1893,6 +1893,55 @@ static void extract_class_def(CBMExtractCtx *ctx, TSNode node, const CBMLangSpec

// Extract class-level variables (field declarations)
extract_class_variables(ctx, node, spec);

// C# 12 primary-constructor parameters: declared on the class line
// (`class Foo(IBar bar, IBaz baz) : Base { ... }`) and bound to implicit
// captured fields accessible from any instance member. Tree-sitter c-sharp
// wraps them inside the hidden _class_declaration_initializer node, so the
// `parameters` field on class_declaration may not always resolve directly;
// iterate top-level children for parameter_list as a robust fallback.
if (ctx->language == CBM_LANG_CSHARP) {
TSNode primary_params = ts_node_child_by_field_name(node, TS_FIELD("parameters"));
if (ts_node_is_null(primary_params)) {
uint32_t total = ts_node_child_count(node);
for (uint32_t i = 0; i < total; i++) {
TSNode c = ts_node_child(node, i);
if (!ts_node_is_null(c) && strcmp(ts_node_type(c), "parameter_list") == 0) {
primary_params = c;
break;
}
}
}
if (!ts_node_is_null(primary_params)) {
uint32_t pcount = ts_node_child_count(primary_params);
for (uint32_t k = 0; k < pcount; k++) {
TSNode p = ts_node_child(primary_params, k);
if (ts_node_is_null(p) || !ts_node_is_named(p)) {
continue;
}
char *pname = resolve_param_name(a, p, ctx->source);
if (!pname || !pname[0]) {
continue;
}
char *ptype = resolve_param_type_text(a, p, ctx->source, ctx->language);
if (!ptype || !ptype[0]) {
continue;
}
CBMDefinition pdef;
memset(&pdef, 0, sizeof(pdef));
pdef.name = pname;
pdef.qualified_name = cbm_arena_sprintf(a, "%s.%s", class_qn, pname);
pdef.label = "Field";
pdef.file_path = ctx->rel_path;
pdef.parent_class = class_qn;
pdef.return_type = ptype;
pdef.start_line = ts_node_start_point(p).row + TS_LINE_OFFSET;
pdef.end_line = ts_node_end_point(p).row + TS_LINE_OFFSET;
pdef.is_exported = false;
cbm_defs_push(&ctx->result->defs, a, pdef);
}
}
}
}

// Find the body/members node inside a class node
@@ -2049,6 +2098,7 @@ static void push_method_def(CBMExtractCtx *ctx, TSNode child, const char *class_
TSNode params = ts_node_child_by_field_name(child, TS_FIELD("parameters"));
if (!ts_node_is_null(params)) {
def.signature = cbm_node_text(a, params, ctx->source);
def.param_names = extract_param_names(a, params, ctx->source, ctx->language);
def.param_types = extract_param_types(a, params, ctx->source, ctx->language);
}

@@ -2207,6 +2257,7 @@ static void extract_rust_impl(CBMExtractCtx *ctx, TSNode node, const CBMLangSpec
TSNode params = ts_node_child_by_field_name(child, TS_FIELD("parameters"));
if (!ts_node_is_null(params)) {
def.signature = cbm_node_text(a, params, ctx->source);
def.param_names = extract_param_names(a, params, ctx->source, ctx->language);
def.param_types = extract_param_types(a, params, ctx->source, ctx->language);
}

@@ -3203,8 +3254,41 @@ static void extract_class_fields(CBMExtractCtx *ctx, TSNode class_node, const ch
continue;
}

-// Extract type from "type" field
/* Locate the field's "type" + name node. Two shapes:
* - direct (Java/Go/Rust/C/C++):
* field_declaration .type=identifier .declarator=variable_declarator(.name)
* - nested (C#):
* field_declaration > variable_declaration(.type=identifier,
* variable_declarator(.name))
* For the nested case, the child has no "type" field directly. Detect by
* walking named children for a variable_declaration. */
TSNode type_node = ts_node_child_by_field_name(child, TS_FIELD("type"));
TSNode name_node = ts_node_is_null(type_node) ? (TSNode){0} : resolve_field_name_node(child);

if (ts_node_is_null(type_node)) {
uint32_t cnc = ts_node_named_child_count(child);
for (uint32_t k = 0; k < cnc; k++) {
TSNode inner = ts_node_named_child(child, k);
if (strcmp(ts_node_type(inner), "variable_declaration") != 0) {
continue;
}
type_node = ts_node_child_by_field_name(inner, TS_FIELD("type"));
/* Find first variable_declarator child for the name. */
uint32_t nc = ts_node_named_child_count(inner);
for (uint32_t j = 0; j < nc; j++) {
TSNode vd = ts_node_named_child(inner, j);
if (strcmp(ts_node_type(vd), "variable_declarator") == 0) {
TSNode nm = ts_node_child_by_field_name(vd, TS_FIELD("name"));
if (!ts_node_is_null(nm)) {
name_node = nm;
break;
}
}
}
break;
}
}

if (ts_node_is_null(type_node)) {
continue;
}
@@ -3213,7 +3297,6 @@ static void extract_class_fields(CBMExtractCtx *ctx, TSNode class_node, const ch
continue;
}

-TSNode name_node = resolve_field_name_node(child);
if (ts_node_is_null(name_node)) {
continue;
}