Q: 2
A Generative AI Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI.
The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, entertainment news, or content about other companies.
Which approach is advisable when building a RAG application to achieve this goal of filtering
irrelevant information?
Options
Discussion
Option C. D is a trap since just chunking docs won't actually stop off-topic answers.
C, that's the go-to for filtering out unrelated stuff at the model level. It directly controls the answer scope even with messy docs.
C or D?
Pretty sure C is what they're looking for here since a system prompt directly tells the model to ignore unrelated questions, acting as a hard filter. Consolidating docs (D) makes retrieval cleaner but doesn't really block the irrelevant info from getting into answers if the retriever messes up. Let me know if you see it differently!
C is the right pick. Setting clear boundaries in the system prompt tells the model what not to answer, so even if noisy or irrelevant docs slip through retrieval, you still get focused output. D can help some by grouping docs, but it doesn't guarantee filtering out unrelated content. Unless I'm missing something, C is key for relevance. Anyone see it differently?
My vote is C. Including the rule in the system prompt actually tells the model not to answer unrelated questions, so it directly filters out off-topic stuff. D helps with retrieval but doesn't fully block irrelevant responses. Pretty sure this is what they want here, agree?
Probably C. Adding guardrails to the system prompt keeps the RAG app on-topic, even if retrieval brings in messy or irrelevant content. D sounds good but isn't enough by itself: the context window can still end up with noisy data mixed in.
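To make the system-prompt approach concrete, here's a minimal sketch of what the comments above describe. The prompt wording, function names, and document strings are all illustrative (not from the question), and the actual LLM client call is provider-specific, so it's omitted; the sketch just shows how the guardrail and retrieved context would be assembled into a standard chat-message list.

```python
# Sketch of option C: a system prompt that scopes the model's answers.
# The message format follows the common chat-completion convention
# (system message first, then the user turn with retrieved context).

SYSTEM_PROMPT = (
    "You are an assistant for SnoPen AI. Answer only questions about "
    "SnoPen AI's internal documents. If the question or the retrieved "
    "context concerns advertisements, sports, entertainment, or other "
    "companies, reply: 'I can only answer questions about SnoPen AI.'"
)

def build_messages(question: str, retrieved_chunks: list) -> list:
    """Assemble chat messages: guardrail first, then retrieved context
    and the user question. The guardrail applies even if the retriever
    pulled in noisy or off-topic chunks."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# Example: even with an irrelevant chunk retrieved, the system prompt
# still instructs the model to stay on-topic.
messages = build_messages(
    "What is SnoPen AI's leave policy?",
    ["HR handbook excerpt ...", "Sports news snippet ..."],
)
```

The point of this structure, as the thread argues, is that the filter lives at the model level: retrieval quality (option D) can improve, but the system message is what actually bounds the answers.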
Not C, D. Consolidating SnoPen AI docs into a single chunk in the vector DB might help retrieval focus, so I think that's the way here.