A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels, asked on the author's web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user's query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations, but now wants to choose the best values more methodically. Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)
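One methodical approach worth sketching here is to treat chunk size and overlap as tunable hyperparameters and score each configuration against a small labeled eval set instead of relying on intuition. Below is a minimal sketch of that idea; `build_index`, `retrieve`, and `answer_in_chunk` are hypothetical helpers standing in for whatever embedding model and vector store the application actually uses, and the configuration values are illustrative.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Candidate chunking configurations to compare (values are illustrative).
CONFIGS = [
    {"chunk_size": 256, "chunk_overlap": 0},
    {"chunk_size": 512, "chunk_overlap": 64},
    {"chunk_size": 1024, "chunk_overlap": 128},
]

def score_config(cfg, documents, eval_set, build_index, retrieve, answer_in_chunk):
    """Chunk the corpus with one configuration, rebuild the index, and measure
    how often the retriever surfaces a chunk containing the ground-truth
    answer (a simple recall@k proxy)."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=cfg["chunk_size"], chunk_overlap=cfg["chunk_overlap"]
    )
    chunks = [c for doc in documents for c in splitter.split_text(doc)]
    index = build_index(chunks)  # hypothetical: embed chunks into the vector store
    hits = 0
    for item in eval_set:  # eval_set: [{"question": ..., "answer": ...}, ...]
        retrieved = retrieve(index, item["question"], k=5)  # hypothetical retriever
        hits += any(answer_in_chunk(c, item["answer"]) for c in retrieved)
    return hits / len(eval_set)

# best = max(CONFIGS, key=lambda cfg: score_config(cfg, docs, eval_set,
#                                                  build_index, retrieve, answer_in_chunk))
```

Logging each run's configuration and score (for example to MLflow) would make the comparison reproducible rather than intuition-based.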
I'd call C the right pick. Setting clear boundaries in the system prompt tells the model what not to answer, so even if noisy or irrelevant docs slip through retrieval, you still get focused output. D can help somewhat by grouping docs, but it doesn't guarantee unrelated content gets filtered out. Unless I'm missing something, C is the key to relevance; anyone see it differently?
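For what it's worth, the boundary-setting being described is just a few extra lines in the system prompt. A minimal sketch, with the domain wording and refusal message as placeholders I made up rather than anything from the question:

```python
# Hypothetical system prompt that scopes the assistant and tells it what NOT to answer.
SYSTEM_PROMPT = (
    "You are an assistant for questions about <the application's domain>. "
    "Answer ONLY from the retrieved context below. "
    "If the question is off-topic, or the context does not contain the answer, "
    "reply exactly: 'Sorry, I can't help with that topic.'"
)

def build_messages(retrieved_docs: str, user_query: str) -> list[dict]:
    """Assemble the chat payload so even irrelevant retrieved docs produce a scoped answer."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{retrieved_docs}\n\nQuestion: {user_query}"},
    ]
```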
Option B makes sense, since having the LLM read the chat logs and then present booking options as buttons helps automate actual bookings without human involvement. The other options don't really drive the booking flow. I think B is best for this use case, but I'm open to other takes.
I think B might be right here, because setting clear user expectations about RAG behavior sounds like a decent mitigation step. Letting users know what to expect could help manage how the outputs are perceived, especially if some risk of offensive content remains. Not totally sure though, since D does involve more direct control. Agree?
A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and they're willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential, so, due to regulatory requirements, none of the information is allowed to be transmitted to third parties. Which model meets all the Generative AI Engineer's needs in this situation?
This one nails all the requirements, since you can deploy it entirely in your own Databricks environment, so no confidential info leaves your infrastructure. It's also top tier among open-weight LLMs quality-wise, which matters here since latency and throughput aren't a priority. I think it's the most compliant choice for high-sensitivity cases like this, unless someone knows a better private LLM option?
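If it helps anyone, here's roughly what "nothing leaves your infrastructure" looks like in practice: the app calls a Model Serving endpoint inside the workspace instead of an external API. A minimal sketch using the MLflow deployments client; the endpoint name, prompt, and response handling are my assumptions, not something from the question.

```python
import mlflow.deployments

# Query a model served inside the Databricks workspace, so prompts and retrieved
# context stay within the company's own infrastructure.
client = mlflow.deployments.get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-dbrx-instruct",  # assumed endpoint name; use whatever your workspace serves
    inputs={
        "messages": [
            {"role": "system", "content": "Answer only from the provided internal context."},
            {"role": "user", "content": "Context: ...\n\nQuestion: ..."},
        ],
        "max_tokens": 512,
    },
)

# Assuming the endpoint returns an OpenAI-style chat completion payload.
print(response["choices"][0]["message"]["content"])
```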
If we leave out latency and cost, and the model must stay fully in-house for compliance, does Llama2-70B really match up to DBRX Instruct for answer quality in a Databricks-centric environment?
Is DBRX Instruct actually the best choice if the org cares only about max quality and fully on-prem deployment for compliance? I had something like this in a mock exam, and they wanted an open-weights model that could be air-gapped. Wouldn't Llama2-70B also qualify if set up correctly, or does DBRX currently beat it in answer quality?
Looks like B is right since evaluation should only happen after the LLM generates a response, otherwise you don't have anything to test. The order in A mixes that up, and D gets the workflow out of sequence. Pretty sure about this but let me know if I'm missing something.
My vote is B here. You want to evaluate the model after it generates a response but before deploying, not earlier and not out of order. Option A swaps those steps, and D starts with user queries before the data is even loaded, which doesn't fit a typical RAG workflow. If anyone's seen it differently in the Databricks docs, let me know!
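To make the ordering concrete, here's a bare-bones sketch of the sequence being argued for: load and chunk the data, embed it, retrieve and generate, evaluate the generated answers, and only then deploy. Every helper below is a placeholder for whatever tooling you actually use, and the quality threshold is made up.

```python
def rag_workflow(raw_documents, eval_queries,
                 chunk, embed_and_store, retrieve, generate, evaluate, deploy):
    """Placeholder pipeline showing the order: ingest -> index -> generate -> evaluate -> deploy."""
    chunks = chunk(raw_documents)              # 1. load and chunk the knowledge base
    index = embed_and_store(chunks)            # 2. embed chunks into the vector store
    answers = []
    for query in eval_queries:                 # 3. retrieve context and generate answers
        context = retrieve(index, query)
        answers.append(generate(query, context))
    metrics = evaluate(eval_queries, answers)  # 4. evaluate only after responses exist
    if metrics["quality"] >= 0.8:              # illustrative threshold
        deploy(index)                          # 5. deploy last, once evaluation passes
    return metrics
```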