May 2026

Django · NetBox

Django internals where confident guesses fail. NetBox is a real Django codebase with cross-app signals, object-level permission QuerySets, GenericRelations, and weighted search indexes. This benchmark tests whether an agent can follow the implementation — not just recall Django patterns.

Stack: Django / Python · ~1,165 Python files, 9 apps
Tessra version: v2.19.x
Date evaluated: May 2026
Branch / snapshot: main branch, May 2026 snapshot
| Model | Without Tessra | With Tessra | Δ |
|---|---|---|---|
| Claude Haiku 4.5 | 12.375 / 25 (49.5%) | 24.0 / 25 (96%) | +11.625 pts (+46.5 pp) |
| Claude Sonnet 4.6 | 13.25 / 25 (53%) | 23.125 / 25 (92.5%) | +9.875 pts (+39.5 pp) |
Tessra improved both lower-cost and premium models by grounding answers in the current implementation instead of generic Django or NetBox assumptions.
| Case | Question | Without Tessra | With Tessra | Δ |
|---|---|---|---|---|
| 01 | What signal receivers fire on Site.save()? Which models in other apps update? | Haiku 1.25 / 5 · Sonnet 1.25 / 5 | Haiku 5.0 / 5 · Sonnet 3.125 / 5 | Haiku +3.75 · Sonnet +1.875 |
| 03 | What are Device's direct parents? How many mixins compose NetBoxFeatureSet? | Haiku 4.0 / 5 · Sonnet 4.0 / 5 | Haiku 5.0 / 5 · Sonnet 5.0 / 5 | Haiku +1.0 · Sonnet +1.0 |
| 05 | Where does Interface._site update when a Rack changes Site? | Haiku 4.125 / 5 · Sonnet 4.0 / 5 | Haiku 5.0 / 5 · Sonnet 5.0 / 5 | Haiku +0.875 · Sonnet +1.0 |
| 07 | How does Device.objects.restrict(user) apply object-level permissions? | Haiku 0.625 / 5 · Sonnet 1.625 / 5 | Haiku 5.0 / 5 · Sonnet 5.0 / 5 | Haiku +4.375 · Sonnet +3.375 |
| 08 | What fields does DeviceIndex index for global search, and with what weights? | Haiku 2.375 / 5 · Sonnet 2.375 / 5 | Haiku 4.0 / 5 · Sonnet 5.0 / 5 | Haiku +1.625 · Sonnet +2.625 |
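The headline totals can be reproduced from the per-case scores. The snippet below is a small arithmetic check, not part of the benchmark harness; the `CASE_SCORES` layout and the `summarize` helper are names made up for this illustration, and the numbers are copied from the per-case results above (cases 01, 03, 05, 07, 08 in order).

```python
# Per-case scores copied from the results table (cases 01, 03, 05, 07, 08).
# The dictionary layout and helper name are illustrative assumptions.
CASE_SCORES = {
    "Haiku": {
        "without": [1.25, 4.0, 4.125, 0.625, 2.375],
        "with": [5.0, 5.0, 5.0, 5.0, 4.0],
    },
    "Sonnet": {
        "without": [1.25, 4.0, 4.0, 1.625, 2.375],
        "with": [3.125, 5.0, 5.0, 5.0, 5.0],
    },
}

MAX_SCORE = 25  # five cases, five points each


def summarize(model):
    """Return (without_total, with_total, delta_points, delta_percentage_points)."""
    without_total = sum(CASE_SCORES[model]["without"])
    with_total = sum(CASE_SCORES[model]["with"])
    without_pct = without_total * 100 / MAX_SCORE
    with_pct = with_total * 100 / MAX_SCORE
    return without_total, with_total, with_total - without_total, with_pct - without_pct


for model in CASE_SCORES:
    print(model, summarize(model))
```

The delta is reported twice: once in raw points out of 25 and once in percentage points (pp), which is the difference between the two percentages rather than a relative improvement.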
restrict() uses pk__in subquery, not .distinct()
Both models confidently guessed the common Django answer: .filter(...).distinct(). The actual NetBox code does something more surgical: it builds a separate pk__in subquery to avoid duplicate rows without applying DISTINCT to the outer query. Tessra surfaced the exact implementation and the inline comment explaining why.
Without Tessra
# What all models guessed without Tessra:
return self.filter(constraints_q).distinct()
With Tessra
# Actual code in utilities/querysets.py:64-68:
allowed_objects = self.model.objects.filter(attrs)
return self.filter(pk__in=allowed_objects)
# "#8715: Avoid duplicates when JOIN on M2M without DISTINCT"
.distinct() affects the entire outer query. The pk__in subquery is surgical and avoids side effects on M2M joins.
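To see why a primary-key subquery sidesteps duplicates, the sketch below reproduces the effect at the SQL level with Python's built-in `sqlite3`. It is not NetBox code: the `device`, `tag`, and `device_tags` tables and their rows are invented for illustration, and the two queries only approximate what an ORM filter across a many-to-many join and a `pk__in` subquery might generate.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory schema with a many-to-many relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, slug TEXT);
        CREATE TABLE device_tags (device_id INTEGER, tag_id INTEGER);
        INSERT INTO device VALUES (1, 'edge-1'), (2, 'core-1');
        INSERT INTO tag VALUES (1, 'prod'), (2, 'critical');
        -- edge-1 carries both tags, core-1 carries one.
        INSERT INTO device_tags VALUES (1, 1), (1, 2), (2, 1);
        """
    )
    return conn


# Filtering through the M2M join: one output row per matching tag.
JOIN_SQL = """
SELECT device.name FROM device
JOIN device_tags ON device_tags.device_id = device.id
JOIN tag ON tag.id = device_tags.tag_id
WHERE tag.slug IN ('prod', 'critical')
ORDER BY device.id
"""

# Same condition moved into a subquery on the primary key:
# the outer query never joins, so each device appears once.
SUBQUERY_SQL = """
SELECT device.name FROM device
WHERE device.id IN (
    SELECT device_tags.device_id FROM device_tags
    JOIN tag ON tag.id = device_tags.tag_id
    WHERE tag.slug IN ('prod', 'critical')
)
ORDER BY device.id
"""


def names(conn, sql):
    return [row[0] for row in conn.execute(sql)]


conn = build_demo_db()
joined = names(conn, JOIN_SQL)
restricted = names(conn, SUBQUERY_SQL)
print(joined)
print(restricted)
```

In this toy data the joined query lists `edge-1` twice because it matches two tags, while the subquery form returns each device once. The outer query stays free of DISTINCT, so later ordering, slicing, or annotation on it is unaffected.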
Cross-app signal cascade hidden from the origin model
Case 01 asks what updates when Site.save() fires signals. The answer is not visible from the Site model alone: a receiver propagates cached field updates to Prefix, Cluster, and WirelessLAN via bulk_update. Without Tessra, the models missed or confused the cascade. With Tessra, they identified the receiver, the affected apps, and the updated fields.
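The reason the cascade is invisible from the origin model can be shown with a framework-free sketch. Everything here is a hypothetical stand-in: the `Signal` class imitates the connect/send shape of Django signals, `cached_region` is an invented denormalized field, and the in-memory `PREFIXES` list plays the role of a table that a real receiver would update with `bulk_update`. It does not reproduce NetBox's actual receiver.

```python
from dataclasses import dataclass


class Signal:
    """Tiny stand-in for a Django signal: receivers register, senders broadcast."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver  # allows use as a decorator

    def send(self, sender, **kwargs):
        return [receiver(sender=sender, **kwargs) for receiver in self._receivers]


post_save = Signal()


# --- "origin app": nothing in Site mentions Prefix ---
@dataclass
class Site:
    pk: int
    region: str

    def save(self):
        post_save.send(sender=Site, instance=self)


# --- "other app": owns the model and the receiver ---
@dataclass
class Prefix:
    pk: int
    site_id: int
    cached_region: str  # hypothetical denormalized copy of Site.region


PREFIXES = [Prefix(1, 10, "emea"), Prefix(2, 10, "emea"), Prefix(3, 20, "apac")]


@post_save.connect
def sync_cached_region(sender, instance, **kwargs):
    """Propagate the site's region to every prefix that caches it."""
    if sender is not Site:
        return 0
    stale = [
        p for p in PREFIXES
        if p.site_id == instance.pk and p.cached_region != instance.region
    ]
    for prefix in stale:  # a real receiver would batch this write
        prefix.cached_region = instance.region
    return len(stale)


Site(pk=10, region="amer").save()
print([p.cached_region for p in PREFIXES])
```

Reading `Site` alone gives no hint that `Prefix` rows change on save; the link exists only where the receiver is registered, which is why answering the question requires following the signal wiring across apps.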
DeviceIndex weights: asset_tag outranks name
Without Tessra, both models assumed name was the highest-priority search field and returned wrong weights for serial and asset_tag. The code shows asset_tag=50 has higher priority than name=100 because lower numbers rank higher.
Without Tessra
# What models assumed:
('name', 100),      # "name is always most important"
('serial', 500),    # invented
('asset_tag', 500)  # invented
With Tessra
# Actual code in dcim/search.py:55-62:
('asset_tag',       50),   # ← HIGHER priority (lower number)
('serial',          60),
('name',           100),
('virtual_chassis', 200),
('description',    500),
('comments',      5000),
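The "lower number ranks higher" convention is easy to get backwards, so here is a minimal sketch of how such weights could order matches. The weights are the ones quoted above; the `rank_matches` helper is an assumption made for illustration and is not how NetBox's search backend is implemented.

```python
# Field weights as quoted from dcim/search.py above.
DEVICE_FIELD_WEIGHTS = (
    ("asset_tag", 50),
    ("serial", 60),
    ("name", 100),
    ("virtual_chassis", 200),
    ("description", 500),
    ("comments", 5000),
)


def rank_matches(matched_fields, weights=DEVICE_FIELD_WEIGHTS):
    """Order matched field names so the lowest weight comes first.

    Hypothetical helper: ascending sort encodes 'lower number = higher priority'.
    """
    weight_by_field = dict(weights)
    return sorted(matched_fields, key=lambda field: weight_by_field[field])


ranked = rank_matches(["name", "comments", "asset_tag"])
print(ranked)
```

Under this ordering a hit on `asset_tag` is listed ahead of a hit on `name`, which is the opposite of what the models assumed when they treated 100 as the top priority.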
This benchmark measures architectural navigation on NetBox only. Cases where a model already knew the answer from public training data were excluded from the canonical panel. Results may vary on other Django projects with different architecture, conventions, and documentation. LLM outputs are non-deterministic.