A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels, asked on the author's web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user's query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations, but now wants to choose the best values more methodically. Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)
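One methodical approach worth sketching here is to treat chunk size and overlap as tunable hyperparameters and score each configuration against a small labeled eval set instead of relying on intuition. Below is a minimal sketch of that idea; `build_index`, `retrieve`, and `answer_in_chunk` are hypothetical helpers standing in for whatever embedding model and vector store the application actually uses, and the configuration values are illustrative.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Candidate chunking configurations to compare (values are illustrative).
CONFIGS = [
    {"chunk_size": 256, "chunk_overlap": 0},
    {"chunk_size": 512, "chunk_overlap": 64},
    {"chunk_size": 1024, "chunk_overlap": 128},
]

def score_config(cfg, documents, eval_set, build_index, retrieve, answer_in_chunk):
    """Chunk the corpus with one configuration, rebuild the index, and measure
    how often the retriever surfaces a chunk containing the ground-truth
    answer (a simple recall@k proxy)."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=cfg["chunk_size"], chunk_overlap=cfg["chunk_overlap"]
    )
    chunks = [c for doc in documents for c in splitter.split_text(doc)]
    index = build_index(chunks)  # hypothetical: embed chunks into the vector store
    hits = 0
    for item in eval_set:  # eval_set: [{"question": ..., "answer": ...}, ...]
        retrieved = retrieve(index, item["question"], k=5)  # hypothetical retriever
        hits += any(answer_in_chunk(c, item["answer"]) for c in retrieved)
    return hits / len(eval_set)

# best = max(CONFIGS, key=lambda cfg: score_config(cfg, docs, eval_set,
#                                                  build_index, retrieve, answer_in_chunk))
```

Logging each run's configuration and score (for example to MLflow) would make the comparison reproducible rather than intuition-based.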
I'd call C the right pick. Setting clear boundaries in the system prompt tells the model what not to answer, so even if noisy or irrelevant docs slip through retrieval, you still get focused output. D can help somewhat by grouping docs, but it doesn't guarantee unrelated content gets filtered out. Unless I'm missing something, C is the key to relevance; anyone see it differently?
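For what it's worth, the boundary-setting being described is just a few extra lines in the system prompt. A minimal sketch, with the domain wording and refusal message as placeholders I made up rather than anything from the question:

```python
# Hypothetical system prompt that scopes the assistant and tells it what NOT to answer.
SYSTEM_PROMPT = (
    "You are an assistant for questions about <the application's domain>. "
    "Answer ONLY from the retrieved context below. "
    "If the question is off-topic, or the context does not contain the answer, "
    "reply exactly: 'Sorry, I can't help with that topic.'"
)

def build_messages(retrieved_docs: str, user_query: str) -> list[dict]:
    """Assemble the chat payload so even irrelevant retrieved docs produce a scoped answer."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{retrieved_docs}\n\nQuestion: {user_query}"},
    ]
```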
Option B makes sense, since having the LLM read the chat logs and then present booking options as buttons helps automate actual bookings without human involvement. The other options don't really drive the booking flow. I think B is best for this use case, but I'm open to other takes.
I think B might be right here, because setting clear user expectations about RAG behavior sounds like a decent mitigation step. Letting users know what to expect could help manage how the outputs are perceived, especially if some risk of offensive content remains. Not totally sure though, since D does involve more direct control. Agree?
A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and they're willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential, so, due to regulatory requirements, none of the information is allowed to be transmitted to third parties. Which model meets all the Generative AI Engineer's needs in this situation?
This one nails all the requirements, since you can deploy it entirely in your own Databricks environment, so no confidential info leaves your infrastructure. It's also top tier among open-weight LLMs quality-wise, which matters here since latency and throughput aren't a priority. I think it's the most compliant choice for high-sensitivity cases like this, unless someone knows a better private LLM option?
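If it helps anyone, here's roughly what "nothing leaves your infrastructure" looks like in practice: the app calls a Model Serving endpoint inside the workspace instead of an external API. A minimal sketch using the MLflow deployments client; the endpoint name, prompt, and response handling are my assumptions, not something from the question.

```python
import mlflow.deployments

# Query a model served inside the Databricks workspace, so prompts and retrieved
# context stay within the company's own infrastructure.
client = mlflow.deployments.get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-dbrx-instruct",  # assumed endpoint name; use whatever your workspace serves
    inputs={
        "messages": [
            {"role": "system", "content": "Answer only from the provided internal context."},
            {"role": "user", "content": "Context: ...\n\nQuestion: ..."},
        ],
        "max_tokens": 512,
    },
)

# Assuming the endpoint returns an OpenAI-style chat completion payload.
print(response["choices"][0]["message"]["content"])
```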
If we leave out latency and cost, and the model must stay fully in-house for compliance, does Llama2-70B really match up to DBRX Instruct for answer quality in a Databricks-centric environment?
Is DBRX Instruct actually the best choice if the org cares only about max quality and fully on-prem deployment for compliance? I had something like this in a mock exam, and they wanted an open-weights model that could be air-gapped. Wouldn't Llama2-70B also qualify if set up correctly, or does DBRX currently beat it in answer quality?
Looks like B is right since evaluation should only happen after the LLM generates a response, otherwise you don't have anything to test. The order in A mixes that up, and D gets the workflow out of sequence. Pretty sure about this but let me know if I'm missing something.
My vote is B here. You want to evaluate the model after it generates a response but before deploying, not earlier and not out of order. Option A swaps those steps, and D starts with user queries before the data is even loaded, which doesn't fit a typical RAG workflow. If anyone's seen it differently in the Databricks docs, let me know!
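To make the ordering concrete, here's a bare-bones sketch of the sequence being argued for: load and chunk the data, embed it, retrieve and generate, evaluate the generated answers, and only then deploy. Every helper below is a placeholder for whatever tooling you actually use, and the quality threshold is made up.

```python
def rag_workflow(raw_documents, eval_queries,
                 chunk, embed_and_store, retrieve, generate, evaluate, deploy):
    """Placeholder pipeline showing the order: ingest -> index -> generate -> evaluate -> deploy."""
    chunks = chunk(raw_documents)              # 1. load and chunk the knowledge base
    index = embed_and_store(chunks)            # 2. embed chunks into the vector store
    answers = []
    for query in eval_queries:                 # 3. retrieve context and generate answers
        context = retrieve(index, query)
        answers.append(generate(query, context))
    metrics = evaluate(eval_queries, answers)  # 4. evaluate only after responses exist
    if metrics["quality"] >= 0.8:              # illustrative threshold
        deploy(index)                          # 5. deploy last, once evaluation passes
    return metrics
```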