1. Wei, A., Haghtalab, N., & Steinhardt, J. (2023). Jailbroken: How Does LLM Safety Training Fail? Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS). Section 3.1, "Attack Method," describes prefix injection attacks, including role-playing scenarios (e.g., "You are an actor..."), which are a form of presenting theoretical situations.
2. Perez, E., et al. (2022). Red Teaming Language Models with Language Models. arXiv:2202.03286 [cs.CL]. Section 2.2 discusses how red teaming can involve creating specific contexts, such as writing a story, to elicit harmful outputs that would otherwise be blocked.
3. Qi, X., et al. (2023). Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! arXiv:2310.03693 [cs.LG]. Section 2.2, "Jailbreaking Attacks," explicitly mentions "pretending" scenarios (e.g., "act as if you are...") as a primary method for bypassing safety alignments.