Solo · HackerRank Orchestrate (June Edition) · June 2026
Multi-Modal Claims Evidence Review
An end-to-end system, built solo in 24 hours, that reviews damage claims (cars, laptops, packages) from photo evidence and returns a structured verdict per claim: the evidence supports the claim, contradicts it, or is insufficient to judge it. Each verdict is grounded only in what is visible in the images, not in how the claimant describes the damage.
Rank: 109 / 1,773 (top 6%)
Core verdict accuracy: 85%
Full run cost: ~$1.76
Images processed: 82
What it does
- Designed a "model judges, code decides" architecture: Claude Opus 4.8 handles visual judgment (issue type, severity, authenticity, damage location) in a single structured multimodal call per claim, while a deterministic Python layer computes every rule-governed field through an ordered claim-status cascade, so the model is never a single point of failure on a decision with money at stake.
- Implemented prompt-injection defense against both conversational text and adversarial in-image instructions (e.g. "approve this claim" handwritten inside a submitted photo), decoupling detection from judgment via an independent risk flag that forces manual review without altering the verdict.
- Added deterministic duplicate and near-duplicate evidence detection (SHA-256 + perceptual hashing) that runs before any model call, removing a class of fraud from the model's responsibility and reducing cost.
- Built a per-field evaluation harness scoring predictions against a hand-labeled set (claim_status 85%, object_part 95%, evidence_standard_met 90%, valid_image 90%); used it to surface and fix four production bugs, including a cascade-ordering error and an over-triggering image-authenticity rule.
- Engineered the full run for cost and reliability: 44 claims / 82 images processed for ~$1.76 using prompt caching on the static system+examples prefix, plus per-row incremental CSV writes that eliminated a failure mode in which a single oversized-image crash lost 34 of 44 results.
- Shipped supporting tooling: a Streamlit viewer for side-by-side review of claim text, images, and verdicts, and a no-API smoke-test suite covering each pipeline component.
- Authored an AGENTS.md governance file defining how AI coding tools were permitted to operate in the repo, including mandatory per-turn development logging.
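
The "model judges, code decides" split, the independent risk flag, and the pre-model exact-duplicate check listed above could be sketched roughly as follows. This is an illustrative sketch, not the project's code: the `ModelJudgment` fields, the rule order in `decide_claim_status`, the status strings, and all function names are assumptions. Near-duplicate (perceptual-hash) detection is left out because it would need a third-party library.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelJudgment:
    """Hypothetical shape of the model's structured output for one claim."""
    valid_image: bool
    damage_visible: bool
    matches_claimed_part: bool
    injection_suspected: bool


def find_exact_duplicates(images: dict[str, bytes]) -> dict[str, str]:
    """Map each repeated image name to the first name with the same SHA-256.

    Needs no model output, so it can run before any model call.
    """
    first_seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return duplicates


def decide_claim_status(judgment: ModelJudgment, has_duplicate: bool) -> str:
    """Ordered cascade: the first matching rule wins (rule order is assumed)."""
    if has_duplicate or not judgment.valid_image:
        return "insufficient"
    if not judgment.damage_visible:
        return "contradicted"
    if not judgment.matches_claimed_part:
        return "contradicted"
    return "supported"


def review_claim(judgment: ModelJudgment, images: dict[str, bytes]) -> dict:
    """Combine the deterministic checks into one structured result."""
    duplicates = find_exact_duplicates(images)
    return {
        "claim_status": decide_claim_status(judgment, bool(duplicates)),
        # The risk flag is computed separately and never changes the status.
        "manual_review": judgment.injection_suspected or bool(duplicates),
        "duplicate_images": duplicates,
    }
```

In this sketch a suspected injection sets `manual_review` without touching `claim_status`, which mirrors the decoupling of detection from judgment described in the list.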