A thread on how to stress-test RAG citations so users stop getting confident wrong answers
Your RAG app can confidently invent facts and cite sources that don't exist. How do you stop it from misleading users? The answer isn't just better prompts—it's rigorous stress-testing. Here's how to break your RAG's citations before your users do.
🧵 1/6
Headline: FAKE CITATIONS
AI tools can "hallucinate" sources, providing real-looking but completely fake citations[15]. They present fabricated information as if it were true, without warning[15].
Why it matters: This isn't just an error; it's a critical trust failure.
🧵 2/6
Headline: FAITHFULNESS CHECKS
First, test for faithfulness: does the answer stay true to the retrieved documents[4]? An LLM-as-a-judge can verify if every claim in your answer can be traced back to the source material[2][4].
Why it matters: This is your baseline defense against hallucination.
🧵 3/6
Headline: HALLUCINATION TESTING
Go beyond the happy path. Stress-test by asking questions the system *shouldn't* be able to answer from its knowledge base[2]. A good response is "I don't have that information," not a fabricated answer[2].
Why it matters: This tests if your RAG fails gracefully.
🧵 4/6
Headline: ADVERSARIAL ATTACKS
Deliberately try to break the system with adversarial tests[2]. Use prompt injections, jailbreaks, and queries on forbidden topics to see if it bypasses safeguards[2].
Why it matters: This uncovers unsafe behavior before attackers do.
🧵 5/6
Ultimately, you are responsible for what you submit[15]. If an AI suggests a source, always verify it yourself. If you can't find a citation with a quick search, it probably doesn't exist[15].
What's the wildest AI hallucination you've seen?
🧵 6/6
Sign Up To Try Advanced Features
Get more accurate answers with Super Pandi, upload files, personalized discovery feed, save searches and contribute to the PandiPedia.