
Self-RAG: Learning to Retrieve, Generate, and Critique

ICLR 2024 · Retrieval & RAG

Authors: Asai, Wu, Wang, et al.

Abstract

Self-RAG enhances retrieval-augmented generation by training models to reflect on their own process. Special reflection tokens let the model decide when to retrieve, evaluate relevance, assess support, and judge utility, creating a self-correcting RAG system.

Key Contributions

- Reflection tokens for metacognitive reasoning:
  - [Retrieve]: Should I retrieve additional information?
  - [IsRel]: Is the retrieved passage relevant?
  - [IsSup]: Is my generation supported by the passage?
  - [IsUse]: Is the retrieval useful for the task?
- Adaptive retrieval: only fetch when needed
- Outperforms ChatGPT and Llama2-chat on multiple benchmarks
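
The four reflection token families above can be modeled as a small set of TypeScript types. This is an illustrative sketch: the token names and value sets follow the paper, but the type definitions themselves are hypothetical, not part of any Self-RAG library.

```typescript
// Hypothetical typing of Self-RAG's four reflection token families.
// Value sets follow the paper: [Retrieve] ∈ {yes, no, continue},
// [IsRel] ∈ {relevant, irrelevant}, [IsSup] has three support levels,
// and [IsUse] is a 1-5 utility score.
type RetrieveToken = "yes" | "no" | "continue";
type IsRelToken = "relevant" | "irrelevant";
type IsSupToken = "fully" | "partially" | "no";
type IsUseToken = 1 | 2 | 3 | 4 | 5;

interface Reflection {
  retrieve: RetrieveToken;
  isRel?: IsRelToken; // only present when a passage was retrieved
  isSup?: IsSupToken; // only present when a passage was retrieved
  isUse: IsUseToken;
}

// Example: a generation that retrieved a relevant, fully supported passage.
const example: Reflection = {
  retrieve: "yes",
  isRel: "relevant",
  isSup: "fully",
  isUse: 5,
};
```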

Implementation in Awareness

Awareness uses Self-RAG principles for intelligent retrieval:

Adaptive Retrieval Decision:

- Before generating, LLM evaluates: "Do I need more information?"

- Only trigger vector search if [Retrieve] = Yes

- Saves compute and reduces irrelevant context
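
A minimal sketch of that gate, with a stub in place of the real [Retrieve] reflection call. The keyword heuristic is purely illustrative; a production system would ask the LLM, as in the code example further down.

```typescript
// Stub for the [Retrieve] reflection: a toy heuristic that flags
// fact-seeking queries. Illustrative only — a real system asks the LLM.
function shouldRetrieve(query: string): boolean {
  const factual = ["who", "when", "where", "how many", "what year"];
  return factual.some((w) => query.toLowerCase().includes(w));
}

// The gate: only pay for vector search when [Retrieve] = Yes.
function adaptiveContext(
  query: string,
  search: (q: string) => string[]
): string[] {
  return shouldRetrieve(query) ? search(query) : [];
}

// A chit-chat query skips retrieval; a factual one triggers it.
const search = (q: string) => [`passage about: ${q}`];
adaptiveContext("Tell me a joke", search);           // no search performed
adaptiveContext("When was ICLR 2024 held?", search); // one passage fetched
```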

Relevance Filtering:

- After retrieval, score each result

- [IsRel] token indicates if passage is on-topic

- Discard low-relevance retrievals
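
One implementation detail worth calling out: because the relevance scorer is async, it cannot be passed to `Array.prototype.filter` directly (a pending Promise is always truthy). The sketch below resolves all scores first, then filters; the word-overlap scorer is a stand-in for the real [IsRel] reflection call.

```typescript
interface Passage {
  content: string;
}

// Stub for the [IsRel] reflection: a passage counts as "relevant" if it
// shares a word with the query. Toy logic — a real system asks the LLM.
async function isRelevant(query: string, p: Passage): Promise<boolean> {
  const words = new Set(query.toLowerCase().split(/\W+/));
  return p.content.toLowerCase().split(/\W+/).some((w) => words.has(w));
}

// Score every passage up front with Promise.all, then filter on the
// resolved booleans — an async predicate inside .filter() would keep
// every passage, because a Promise is always truthy.
async function filterRelevant(
  query: string,
  passages: Passage[]
): Promise<Passage[]> {
  const flags = await Promise.all(passages.map((p) => isRelevant(query, p)));
  return passages.filter((_, i) => flags[i]);
}
```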

Support Verification:

- During generation, check if claims are supported

- [IsSup] verifies grounding in retrieved docs

- Reduces hallucination risk
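
A claim-level grounding check can be sketched as follows. The term-overlap test here is only a placeholder for the real [IsSup] reflection, where the LLM judges whether the retrieved context entails each generated claim.

```typescript
type Support = "supported" | "unsupported";

// Stub for the [IsSup] reflection: counts how many of the claim's key
// terms appear in the retrieved context. Illustrative only — a real
// system would ask the LLM for an entailment judgment.
function checkSupport(claim: string, context: string): Support {
  const ctx = context.toLowerCase();
  const terms = claim
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 3); // skip stopword-length tokens
  const hits = terms.filter((t) => ctx.includes(t)).length;
  return hits >= terms.length / 2 ? "supported" : "unsupported";
}
```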

Utility Assessment:

- Post-generation reflection

- [IsUse] evaluates if retrieval helped

- Learn when retrieval is most valuable
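
The logging side of that reflection might look like the sketch below: record each episode's [IsUse] score alongside whether retrieval fired, then compare average utility with and without retrieval. The `Episode` shape and helper names are hypothetical.

```typescript
// Hypothetical episode log for post-generation [IsUse] reflection.
// The paper scores utility on a 1-5 scale; here the score is assumed
// to come from an LLM reflection call elsewhere.
interface Episode {
  query: string;
  retrieved: boolean;
  utility: number; // [IsUse] score, 1-5
}

const episodeLog: Episode[] = [];

function recordUtility(query: string, retrieved: boolean, utility: number): void {
  episodeLog.push({ query, retrieved, utility });
}

// Average [IsUse] with vs. without retrieval shows when retrieval
// is actually paying off for this workload.
function meanUtility(withRetrieval: boolean): number {
  const eps = episodeLog.filter((e) => e.retrieved === withRetrieval);
  if (eps.length === 0) return 0;
  return eps.reduce((sum, e) => sum + e.utility, 0) / eps.length;
}
```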

Code Example

// Self-RAG adaptive retrieval in Awareness
async function generateWithSelfRAG(query: string) {
  // Step 1: Decide if retrieval is needed
  const shouldRetrieve = await llm.reflect({
    prompt: "Should I retrieve info to answer this?",
    query,
    token: "[Retrieve]"
  });

  let context = "";
  if (shouldRetrieve) {
    const results = await vectorStore.search(query);

    // Step 2: Filter by relevance ([IsRel]).
    // reflect() is async, so resolve all scores before filtering —
    // an async predicate passed to .filter() returns a Promise,
    // which is always truthy and would keep every passage.
    const flags = await Promise.all(
      results.map(r =>
        llm.reflect({
          prompt: "Is this passage relevant?",
          passage: r.content,
          token: "[IsRel]"
        })
      )
    );
    const relevant = results.filter((_, i) => flags[i]);

    context = relevant.map(r => r.content).join("\n");
  }

  // Step 3: Generate with support verification
  const response = await llm.generate({
    query,
    context,
    verifySupport: true // [IsSup] checking
  });

  return response;
}

Usage in Awareness

Reflection-based retrieval control: deciding when to retrieve and which passages to keep.
