A Generative AI Engineer is tasked with developing a RAG application to help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible answer quality, and neither latency nor throughput is a
major concern, given that the user group is small and willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential, so, due to regulatory requirements, none of the information may be transmitted to third parties. Which model meets all of the Generative AI Engineer's needs in this situation?
DBRX Instruct nails all the requirements: you can deploy it entirely on your own Databricks setup, so no confidential info leaves your infrastructure. It's also top-tier for open-weight LLMs quality-wise, which matters here since latency and throughput aren't a priority. I think it's the most compliant choice for high-sensitivity cases like this, unless someone knows a better private LLM option?
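The elimination logic behind this answer can be sketched as a simple filter: first drop anything that would send data to a third party, then pick the best remaining model on quality alone. Note the model list, the `self_hostable` flags, and the quality ranks below are illustrative assumptions for this scenario, not benchmark results:

```python
# Hypothetical sketch of the selection criteria in this scenario.
# All names, flags, and quality ranks are illustrative assumptions.

CANDIDATES = [
    # (model name, self-hostable on own infra?, assumed quality rank; lower is better)
    ("DBRX Instruct", True, 1),
    ("Llama2-70B", True, 2),
    ("Hosted proprietary API model", False, 1),  # fails the no-third-party rule
]

def pick_model(candidates):
    """Keep only models that never transmit data to a third party,
    then take the highest-quality survivor (latency/throughput ignored)."""
    private = [c for c in candidates if c[1]]
    return min(private, key=lambda c: c[2])[0]

print(pick_model(CANDIDATES))  # -> DBRX Instruct under these assumed rankings
```

The key point the filter encodes: the compliance constraint is a hard filter, while quality is only an ordering applied afterward, so a higher-quality hosted model can never win.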
If we leave out latency and cost, and the model must stay fully in-house for compliance, does Llama2-70B really match up to DBRX Instruct for answer quality in a Databricks-centric environment?
Is DBRX Instruct actually the best choice if the org cares only about max quality and full on-prem for compliance? Had something like this in a mock and they wanted an open-weight model that could be air-gapped. Wouldn't Llama2-70B also qualify if set up correctly, or does DBRX currently beat it in answer quality?