Authors: Asai, Wu, Wang, et al.
Self-RAG enhances retrieval-augmented generation by training models to reflect on their own process. Special reflection tokens let the model decide when to retrieve, evaluate relevance, assess support, and judge utility, creating a self-correcting RAG system.
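As a concrete illustration, a reflection token's verdict can be read off the model's raw output. A minimal sketch; `parseReflection` and the inline `"[Token] Verdict"` format are assumptions for illustration, not part of Self-RAG's actual decoding:

```typescript
// Hypothetical sketch: pulling a Self-RAG reflection token's verdict out of
// raw model output. Token names follow the paper; the inline format
// ("[Token] Verdict") is an assumption for illustration.
function parseReflection(output: string, token: string): string | null {
  // Escape regex metacharacters in the token, e.g. "[IsRel]" -> "\[IsRel\]"
  const escaped = token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const match = output.match(new RegExp(`${escaped}\\s*(\\w+)`));
  return match ? match[1] : null;
}

// e.g. parseReflection("[Retrieve] Yes. Searching...", "[Retrieve]") -> "Yes"
```

In practice the reflection tokens are special vocabulary entries the model is trained to emit, but string-level parsing like this is enough to prototype the control flow.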
Awareness uses Self-RAG principles for intelligent retrieval:
- Before generating, the LLM evaluates: "Do I need more information?"
  - Vector search is triggered only if [Retrieve] = Yes
  - This saves compute and reduces irrelevant context
- After retrieval, each result is scored
  - The [IsRel] token indicates whether a passage is on-topic
  - Low-relevance retrievals are discarded
- During generation, claims are checked for support
  - [IsSup] verifies grounding in the retrieved docs
  - This reduces hallucination risk
- Post-generation reflection
  - [IsUse] evaluates whether retrieval helped
  - Over time, the system learns when retrieval is most valuable
```typescript
// Self-RAG adaptive retrieval in Awareness
async function generateWithSelfRAG(query: string) {
  // Step 1: Decide if retrieval is needed
  const shouldRetrieve = await llm.reflect({
    prompt: "Should I retrieve info to answer this?",
    query,
    token: "[Retrieve]"
  });

  let context = "";
  if (shouldRetrieve) {
    const results = await vectorStore.search(query);

    // Step 2: Filter by relevance. llm.reflect is async, so resolve
    // all [IsRel] judgments first; Array.filter cannot await promises.
    const isRelevant = await Promise.all(
      results.map(r =>
        llm.reflect({
          prompt: "Is this passage relevant?",
          passage: r.content,
          token: "[IsRel]"
        })
      )
    );
    const relevant = results.filter((_, i) => isRelevant[i]);
    context = relevant.map(r => r.content).join("\n");
  }

  // Step 3: Generate with support verification
  const response = await llm.generate({
    query,
    context,
    verifySupport: true // [IsSup] checking
  });
  return response;
}
```

Reflection-based retrieval control: when and what to retrieve.
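The fourth step, post-generation [IsUse] reflection, is not part of `generateWithSelfRAG`. A minimal sketch of what it could look like, assuming a hypothetical `Reflector` interface standing in for the `llm` client used above:

```typescript
// Step 4 sketch: post-generation utility reflection ([IsUse]).
// Reflector is a hypothetical stand-in for the llm client used above.
interface Reflector {
  reflect(args: {
    prompt: string;
    query: string;
    response: string;
    token: string;
  }): Promise<boolean>;
}

// Ask the model whether retrieval actually improved the final answer,
// so the system can learn when retrieval is most valuable.
async function reflectOnUtility(
  llm: Reflector,
  query: string,
  response: string
): Promise<boolean> {
  return llm.reflect({
    prompt: "Did the retrieved context improve this answer?",
    query,
    response,
    token: "[IsUse]"
  });
}
```

The boolean verdict can be logged alongside the query so the [Retrieve] decision in step 1 can be tuned over time.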