“Enabling tools on SimpleQA Verified results in near perfect performance.”
OpenAI reports GPT-5’s hallucination rate as 9.6%[1] with browsing enabled. But this is a blended average. GPT-5 only searches 31%[3] of the time. With browsing disabled, the rate is 47%[1].
For the 69% of queries where GPT-5 doesn’t search[3], users get the high-error regime — silently, without warning, disguised as the same confident output.
Sources: OpenAI GPT-5 System Card · Nectiv/SearchEngineLand · OpenAI GPT-5.2 Update
Capability ≠ Propensity.
Having tools is not the same as using them. The models can search. They choose not to. And when they choose wrong, they fabricate with confidence.
Tool Availability vs. Tool Usage
| Metric | Google / OpenAI | VERITAS |
|---|---|---|
| Philosophy | “Let the model decide” | “Force the model” |
| Architecture | Optional Retrieval (RAG) | Mandatory Pipeline (C1–C6) |
| Search Trigger Rate | 31% (GPT-5)[3] / <50% (Gemini)[12] | 100% (hardcoded) |
| Failure Mode | Hallucination (fabrication) | Refusal (silence) |
| Cost Incentive | Minimize search (save $) | Maximize truth (scrape everything) |
| When It Doesn’t Know | Invents an answer (88–93%)[4] | Says “I don’t know” (9% refusals) |
The economic incentive is hallucination. Google charges $14–$35 per 1,000 search grounding queries[35]. Every search adds latency and compute cost. RLHF training rewards confident answers over honest refusals[4]. The model’s parametric confidence — even when misplaced — is the cheaper path. Hallucination is not a bug. It’s a cost optimization.
Xu et al. (2024) proved hallucination is mathematically inevitable in autoregressive LLMs[26]. You can’t solve it inside the model. So we solve it around the model.
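To make the contrast concrete, here is a minimal sketch of the two control flows: optional retrieval versus a hardcoded pipeline. It is an illustration only — the function names are hypothetical, and the mapping of comments onto C1–C6 is a rough guess from the descriptions in this document, not the actual Veritas implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch contrasting optional (model-decided) retrieval with a
# mandatory pipeline. Stage comments loosely mirror the C1-C6 labels above.

@dataclass
class Verdict:
    supported: bool
    answer: str

def answer_optional(query, model, search):
    """RAG-style flow: the model decides whether to search (empirically ~31% of queries)."""
    if model.wants_to_search(query):           # left to the model's judgment
        return model.answer(query, evidence=search(query))
    return model.answer(query, evidence=None)  # parametric guess, same confident tone

def answer_mandatory(query, model, scrape):
    """Veritas-style flow: retrieval and verification are hardcoded, never skipped."""
    sources = scrape(query, n=20)                            # always gather evidence first
    draft = model.answer(query, evidence=sources)            # answer only from evidence
    verdict: Verdict = model.verify(query, draft, sources)   # cross-check every claim
    return verdict.answer if verdict.supported else "I don't know."  # refuse, don't fabricate
```

The point of the sketch is the trigger rate: in the first flow the search rate is whatever the model's propensity happens to be; in the second it is 100% by construction.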
Google DeepMind · N=1,000 · 47 models evaluated · Kaggle 2025/26
| # | Model | F-Score | Fabrication | Cost / 1k Queries |
|---|---|---|---|---|
No made-up figures. Here is exactly what Veritas costs, how it’s calculated, and how it compares.
Basis: Gemini 2.5 Flash Lite[35] — $0.075/1M input, $0.30/1M output (Google AI pricing, Feb 2026). 6 LLM calls per query (C1–C6) + 20 web scrapes via Camoufox (free, no API fee).
| Model | Input $/1M | Output $/1M | + Search Fee |
|---|---|---|---|
| Gemini 2.5 Flash Lite[35] | $0.075 | $0.30 | — |
| GPT-5[34] | $2.00 | $8.00 | incl. |
| Gemini 3 Pro[35] | $1.25 | $5.00 | +$14/1k grounding[16] |
| o3[34] | $10.00 | $40.00 | — |
| Claude Opus 4.5[36] | $15.00 | $75.00 | — |
Sources: Google AI Studio[35], OpenAI API pricing[34], Anthropic API pricing[36]. Prices as of Feb 2026; subject to change. Veritas uses 6 Flash Lite calls per Ask query (~2k tokens each, ~12k tokens total). Competitors use 1 call per query (~500 tokens in, ~200 out) but skip verification.
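The per-query numbers can be reproduced directly from the quoted token prices. The sketch below uses only figures stated in this section (Flash Lite rates, 6 calls of ~2k tokens each, a ~500-in/~200-out one-shot baseline at GPT-5 list prices); the 70/30 input/output split within each Veritas call is an assumption, since the text only gives the ~2k total per call.

```python
# Back-of-the-envelope cost check using the prices quoted above (Feb 2026).
# The 70/30 input/output split per Veritas call is an assumption; the source
# only states ~2k tokens per call and ~12k tokens total.

FLASH_LITE_IN  = 0.075 / 1_000_000   # $ per input token (Gemini 2.5 Flash Lite)
FLASH_LITE_OUT = 0.30  / 1_000_000   # $ per output token

def veritas_cost_per_query(calls=6, tokens_per_call=2_000, input_share=0.7):
    """Six Flash Lite calls (C1-C6); Camoufox scrapes add no API fee."""
    total_in  = calls * tokens_per_call * input_share
    total_out = calls * tokens_per_call * (1 - input_share)
    return total_in * FLASH_LITE_IN + total_out * FLASH_LITE_OUT

def one_shot_cost_per_query(price_in, price_out, tokens_in=500, tokens_out=200):
    """Single unverified call, e.g. GPT-5 at $2.00 / $8.00 per 1M tokens."""
    return tokens_in * price_in / 1e6 + tokens_out * price_out / 1e6

if __name__ == "__main__":
    print(f"Veritas (Flash Lite, 6 calls): ${veritas_cost_per_query():.6f} per query")
    print(f"One-shot GPT-5 baseline:       ${one_shot_cost_per_query(2.00, 8.00):.6f} per query")
```

Under these assumptions, six verified Flash Lite calls still come in below a single unverified GPT-5 call; the dominant cost of the pipeline is latency, not tokens.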
The speed difference IS the accuracy difference. Models that respond in 3 seconds don’t search. Models that search take time. Veritas always searches — that’s why it takes 115 seconds per Ask query and why it never fabricates. 89.1% F-Score with the cheapest model on the market. Architecture beats budget.
Not all wrong answers are equal. The cause defines the consequence.
Every prompt. Every answer. Every source count. We don’t hide errors — we analyze them.
| # | Query | Status | Scrapes |
|---|---|---|---|
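The table above is rendered from per-query logs. As a hedged sketch of what one such record might contain, the structure below simply mirrors the visible columns; the field names and status categories are hypothetical, not the actual log schema.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical shape of one row in the error-analysis table above;
# field names mirror the columns (# / Query / Status / Scrapes).

class Status(Enum):
    CORRECT = "correct"
    REFUSED = "refused"   # "I don't know" - the intended failure mode
    WRONG = "wrong"       # grounded but incorrect; analyzed, never hidden

@dataclass
class QueryRecord:
    index: int       # "#"
    query: str       # the exact prompt
    status: Status   # outcome classification
    scrapes: int     # number of sources scraped for this query
```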