Tested on real repos.
Results you can verify.
Four benchmarks on public open-source repositories across three stacks: Angular, Django, and Flutter. Every prompt, evaluation criterion, result, and limitation is available for review.
Tessra does more than find files. It gives AI agents structured context to understand relationships, impact, and architecture in real repos.
pp = percentage points; some benchmarks use points or case-level criteria.
Four benchmarks. Three stacks. Fully reviewable.
Each benchmark ran on a real, public repository: ThingsBoard, NetBox, ngrx-platform, and Ente Photos.
| Repo | Stack | Size (files) | Model / setup | Without Tessra | With Tessra | Improvement | What it tested |
|---|---|---|---|---|---|---|---|
| ThingsBoard | Angular 17+ | ~8,000 | Sonnet 4.6 | 7% | 84% | +77 pp | inject() DI, lazy routes, non-obvious callers |
| NetBox | Django / Python | ~1,165 | Haiku 4.5 | 49.5% | 96% | +46.5 pp | Signals, QuerySet internals, SearchIndex weights |
| ngrx-platform | Angular + NgRx (Nx monorepo) | ~1,379 | Conservative local comparison | 8 / 10 | 9 / 10 | +1 pt | NgRx internals, workflow leverage, architectural tracing |
| Ente Photos | Flutter / Dart | ~4,061 | Validated case | 2 / 3 | 3 / 3 | +1 criterion | Cognitive leverage, architectural tracing, cross-module flows |
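The ThingsBoard and NetBox rows show normalized benchmark scores. The ngrx-platform row is a conservative local comparison scored out of 10, and Ente Photos is a focused validated case: 2/3 to 3/3 criteria on SelectionState, plus directional evidence of stronger architectural tracing in cross-module flows.
Improvement is reported in the unit that fits each benchmark: percentage points (pp) for normalized scores, points for ngrx-platform, and criteria for Ente Photos.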
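As a worked example using the first two table rows, a pp improvement is the absolute difference between the two scores, not a relative increase:

$$
\begin{aligned}
\text{ThingsBoard: } & 84\% - 7\% = 77\ \text{pp} \\
\text{NetBox: } & 96\% - 49.5\% = 46.5\ \text{pp}
\end{aligned}
$$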
In these benchmarks, Tessra helped models produce more complete architecture-level answers, reduce blind exploration, and in several cases approach or exceed premium-model baselines.
Some public repos may already be well represented in frontier-model training data. When a case is already solved or mostly solved without Tessra, it is marked as saturated or directional and excluded from strong lift claims.
Angular on 8,000 TypeScript files
Real lazy routing, modern inject() DI, and deeply nested routes: questions where text search alone is not enough to explain the full relationship.
Django internals across 9 apps
Cross-app signal receivers, QuerySet permission logic, and SearchIndex weights: internal implementation details that public documentation alone does not explain.
NgRx internals: conservative +1 lift, stronger navigation
On this public NgRx monorepo, Tessra reached 9/10 in a verified local-code run. Against a conservative no-Tessra baseline of 8/10, Tessra shows a +1 point lift and clearer navigation through effects internals, specs, entity adapters, and router-store state behavior.
More cognitive leverage for architectural tracing
Cross-module flows spanning four or more module boundaries in Ente Photos, an actively developed Flutter app. This benchmark measures context completeness and cognitive leverage, not failure recovery.
The baseline found the core mechanism; Tessra improved answer quality, focus, and completeness. This is not a zero-to-perfect case.
Cases where file search is not enough.
A model can answer basic questions when it knows the public API. These cases test something harder: following internal relationships across modules, services, routes, signals, effects, tests, and dependencies.
What this means for developers.
Large repos are not hard because files are hidden. They are hard because the answer is spread across routes, services, state, effects, signals, serializers, querysets, widgets, and APIs. Tessra gives that structure to the agent so it knows what connects to what before touching code. The point is not memorizing these repos; the point is giving the model real code relationships when it needs them.
Inspect the evidence.
Every benchmark has a public report with tested questions, per-case results, key findings, and known limitations. No sign-up required.
These benchmarks measure architectural navigation and context quality, not code generation. Results may vary across repos with different structural patterns. Model responses are non-deterministic, so individual scores may differ across runs.
See what Tessra surfaces in your repo.
Index an Angular, Django, or Flutter repo and try local context for 7 days.