June 2026
Flutter · Ente Photos
Open source end-to-end encrypted photo storage app with a Flutter frontend. Tested on the main branch available in June 2026. The benchmark focuses on cross-module flows across UI, services, gateway calls, local DB access, and state propagation.
Results summary
Claude Haiku 4.5
Core flow → connected risk (directional)
Baseline search found several core mechanisms. Tessra helped connect those findings to cross-module context, callers, and product risk.
Validated case · SelectionState
2 / 3 → 3 / 3 +1 pt
The baseline found the core mechanism with normal repo search. Tessra added targeted symbol context and produced a cleaner architecture-level explanation, raising the score from 2 / 3 to 3 / 3. This is not a zero-to-perfect case.
Results per case
| Case | Question | Baseline result | With Tessra | Observation |
|---|---|---|---|---|
| 01 | How does DeduplicationService._getDuplicateFiles() decide whether to make the HTTP call, and how does it group files in two different ways? | Solved with normal repo search | Solved | Simple local lookup; not a strong differentiator |
| 02 | In DeleteSuggestionsPage, trace the complete chain from asyncLoader to the local DB: service, gateway, HTTP endpoint, intermediate Dart type, and final DB flag. | Partial chain | More complete chain | Cross-module tracing improved |
| 03 | In trashFilesOnServer(), what ownership validation is performed? What fallback runs if collectionID is not owned? What happens if the fallback also fails? | Core fallback found | Fallback + downstream risk | Tessra connected callers, local deletion, and user-visible risk |
| 04 | What HTTP endpoint and batch size does the hasMigratedSizes backfill use? | Endpoint or batch size may be found by search | Endpoint + batch context | Concrete implementation detail |
| 05 | Why does SelectionState's InheritedWidget have updateShouldNotify=false? How does the state actually propagate? | 2 / 3 — found the core mechanism with normal repo search | 3 / 3 — stronger architecture-level answer with targeted symbol context | Tessra improved answer quality, focus, and usefulness. This is not a zero-to-perfect case. |
Key findings
Context completeness and cognitive leverage for architectural tracing
Baseline search found core mechanisms in several cases. The harder part was connecting those findings to the full flow: UI, services, gateway, local DB, state propagation, and user-visible behavior. Tessra added cognitive leverage: it gave the agent a clearer working map of the repo, connected relevant symbols, and produced more useful engineering explanations.
Example: in Case 03, normal search found the fallback. Tessra connected that fallback to callers, local deletion, and consistency risk. In Case 05, the baseline reached 2/3; Tessra raised it to 3/3 with targeted symbol context.
The win is not that Tessra finds a file. The win is that it helps the agent turn scattered code paths into an engineering-level explanation.
Silent file drop — risk beyond the first lookup
Normal search found the core fallback: ownership validation, search for another owned collection, and severe logging when no fallback exists. Tessra went further: it connected that flow to callers, local DB deletion, and user-visible risk. The point was not finding the function; it was understanding what could become inconsistent afterward.
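The three steps named above (ownership check, fallback to another owned collection, severe log when none exists) can be sketched in plain Dart. This is a minimal illustration under stated assumptions, not the Ente implementation: `FileRef`, `TrashPlan`, `planTrash`, and every parameter name are hypothetical, and the real `trashFilesOnServer()` may differ in structure and behavior.

```dart
// Hypothetical sketch: none of these names are taken from the Ente codebase.

/// Minimal stand-in for a file that belongs to a collection.
class FileRef {
  final int id;
  final int collectionId;
  const FileRef(this.id, this.collectionId);
}

/// What would be sent to the server, and what was dropped.
class TrashPlan {
  final Map<int, int> toTrash = {}; // fileId -> collectionId sent to the server
  final List<int> skipped = []; // fileIds with no owned collection
}

TrashPlan planTrash(
  List<FileRef> files,
  Set<int> ownedCollectionIds,
  Map<int, Set<int>> collectionsByFile, // fileId -> every collection holding it
  void Function(String message) logSevere,
) {
  final plan = TrashPlan();
  for (final file in files) {
    // 1. Ownership validation on the collection attached to the file.
    if (ownedCollectionIds.contains(file.collectionId)) {
      plan.toTrash[file.id] = file.collectionId;
      continue;
    }
    // 2. Fallback: look for another owned collection that contains the file.
    final candidates = (collectionsByFile[file.id] ?? const <int>{})
        .where(ownedCollectionIds.contains);
    if (candidates.isNotEmpty) {
      plan.toTrash[file.id] = candidates.first;
      continue;
    }
    // 3. No fallback exists: log severely and leave the file out of the request.
    logSevere('No owned collection for file ${file.id}; skipping');
    plan.skipped.add(file.id);
  }
  return plan;
}

void main() {
  final logs = <String>[];
  final plan = planTrash(
    const [FileRef(1, 10), FileRef(2, 20), FileRef(3, 30)],
    {10, 11},
    {2: {20, 11}, 3: {30}},
    logs.add,
  );
  print(plan.toTrash); // {1: 10, 2: 11}
  print(plan.skipped); // [3]
  print(logs.length); // 1
}
```

In this sketch, one way the inconsistency could arise is a caller that deletes local rows for every requested file instead of only the keys of `plan.toTrash`: file 3 would then disappear locally while still existing on the server. Whether any Ente caller behaves this way is exactly what the benchmark question asks the agent to establish.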
Known limitations
This benchmark should be read as evidence of improved context completeness and cognitive leverage, not as a promise of perfect answers. Some cases are partially or fully solvable with normal repository search. The value is in evaluating whether the agent can connect core mechanisms to cross-module navigation, fallbacks, callers, and downstream risk. Results may vary across repositories and model runs.