Prompt Engineering Glossary
Essential terms and concepts every prompt engineer should know. Browse 103 key definitions with examples and practical tips.
- Chain of Thought Prompting
- Chain of thought prompting is a technique that encourages an AI model to break down complex reasoning into sequential, intermediate steps before arriving at a final answer.
- Context Window
- A context window is the maximum amount of text (measured in tokens) that an AI model can process in a single interaction, including both the input prompt and the generated output.
- Few-Shot Prompting
- Few-shot prompting is a technique where you provide the AI model with a small number of examples (typically 2-5) within the prompt to demonstrate the desired format, style, or reasoning pattern.
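As a minimal sketch, a few-shot prompt can be assembled programmatically; the reviews and labels below are invented for illustration:

```python
# Sketch: assembling a few-shot sentiment-classification prompt.
# The example reviews and labels are illustrative, not from a real dataset.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
    ("Shipping was fast and the fit is perfect.", "positive"),
]

def build_few_shot_prompt(query: str) -> str:
    """Prepend labeled demonstrations so the model infers the task format."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End with the new query in the same format, leaving the label blank
    # for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("Customer support never replied.")
print(prompt)
```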
- Fine-Tuning
- Fine-tuning is the process of further training a pre-trained AI model on a specific dataset to specialize its behavior for particular tasks or domains.
- Grounding
- Grounding is the practice of anchoring AI responses to specific, verifiable sources of information such as documents, databases, or real-time data.
- Hallucination
- A hallucination occurs when an AI model generates information that sounds plausible but is factually incorrect, fabricated, or unsupported by its training data.
- In-Context Learning
- In-context learning is the ability of a large language model to learn and adapt its behavior based on examples or instructions provided directly within the prompt, without any changes to the model's underlying weights.
- Instruction Tuning
- Instruction tuning is a training technique where a pre-trained language model is further trained on a curated dataset of instruction-response pairs to improve its ability to follow natural language instructions.
- Large Language Model (LLM)
- A large language model (LLM) is an AI system trained on massive amounts of text data that can understand, generate, and reason about natural language.
- Multi-Modal AI
- Multi-modal AI refers to artificial intelligence systems that can process and generate content across multiple types of data — such as text, images, audio, and video — within a single model.
- Negative Prompting
- Negative prompting is a technique where you explicitly tell the AI model what to avoid, exclude, or not do in its response.
- Persona Prompting
- Persona prompting is a technique where you ask the AI to adopt a specific identity, personality, or character to shape the tone, vocabulary, and perspective of its responses.
- Prompt Chaining
- Prompt chaining is a strategy where you break a complex task into a sequence of simpler prompts, feeding the output of one step as input to the next.
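A two-step chain might look like the sketch below; `call_model` is a hypothetical placeholder for a real API call to your provider of choice:

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in: a real implementation would send the prompt
    # to an LLM API and return the completion text.
    return f"<model output for: {prompt[:40]}...>"

def summarize_then_translate(article: str) -> str:
    # Step 1: condense the source text.
    summary = call_model(f"Summarize in two sentences:\n\n{article}")
    # Step 2: feed step 1's output as input to the next prompt.
    return call_model(f"Translate into French:\n\n{summary}")

print(summarize_then_translate("Large language models can chain prompts."))
```

Breaking the task into steps this way lets you inspect, validate, or retry each intermediate output before the next prompt runs.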
- Prompt Engineering
- Prompt engineering is the practice of designing, refining, and optimizing the text inputs (prompts) given to AI models to elicit the most useful, accurate, and relevant outputs.
- Prompt Injection
- Prompt injection is a security vulnerability where a malicious user crafts input that overrides or manipulates the AI model's original instructions, causing it to ignore its guidelines or perform unintended actions.
- Prompt Template
- A prompt template is a reusable, pre-structured prompt with placeholder variables that can be filled in with specific details for each use.
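In code, a prompt template is often just a format string with named placeholders; the template below is illustrative:

```python
# A minimal prompt template using Python's built-in string formatting.
REVIEW_TEMPLATE = (
    "You are a {role}.\n"
    "Review the following {artifact} and list up to {n} issues:\n\n"
    "{content}"
)

prompt = REVIEW_TEMPLATE.format(
    role="senior Python developer",
    artifact="function",
    n=3,
    content="def add(a, b): return a - b",
)
print(prompt)
```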
- Retrieval-Augmented Generation (RAG)
- Retrieval-augmented generation (RAG) is an architecture that enhances AI model responses by first retrieving relevant information from an external knowledge base and then including that information in the prompt for the model to reference.
- Role Prompting
- Role prompting is a technique where you assign the AI model a specific professional role or area of expertise to shape the depth, vocabulary, and perspective of its responses.
- Self-Consistency
- Self-consistency is a prompting strategy where you generate multiple responses to the same question using chain-of-thought reasoning, then select the most common answer among them.
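The selection step reduces to a majority vote; in the sketch below, the hard-coded answers stand in for real high-temperature chain-of-thought samples:

```python
from collections import Counter

def self_consistent_answer(samples: list[str]) -> str:
    """Pick the most common final answer across several reasoning samples."""
    return Counter(samples).most_common(1)[0][0]

# In practice each sample would be the final answer extracted from a
# separate chain-of-thought completion; hard-coded here for illustration.
answers = ["42", "42", "41", "42", "40"]
print(self_consistent_answer(answers))  # → 42
```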
- System Prompt
- A system prompt is a special set of instructions provided to an AI model before the user's message that defines the model's behavior, personality, constraints, and response format for the entire conversation.
- Temperature
- Temperature is a parameter that controls the randomness of an AI model's output: values near 0 make responses more deterministic and repeatable, while higher values make them more varied and creative.
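Under the hood, temperature divides the model's logits before the softmax; a sketch in pure Python:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Scale logits by 1/temperature, then softmax: low temperature sharpens
    the distribution toward the top token; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: more randomness
```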
- Token
- A token is the basic unit of text that AI models use to process and generate language — often a whole word, part of a word, or a punctuation mark. In English, one token averages roughly four characters.
- Top-P (Nucleus Sampling)
- Top-P, also known as nucleus sampling, is a parameter that controls which tokens the model considers when generating each token: sampling is restricted to the smallest set of candidates whose cumulative probability reaches the threshold P.
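A sketch of the nucleus filter over a toy probability table (the token probabilities are invented for illustration):

```python
def top_p_filter(probs: dict[str, float], p: float = 0.9) -> dict[str, float]:
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    then renormalize; sampling only happens inside this 'nucleus'."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, p=0.9))  # "rock" falls outside the nucleus
```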
- Tree of Thought Prompting
- Tree of thought prompting is an advanced reasoning technique where the AI model explores multiple branching solution paths simultaneously, evaluates each branch, and backtracks from dead ends before selecting the best path to the answer.
- Zero-Shot Prompting
- Zero-shot prompting is the simplest prompting approach where you give the AI model a task instruction without providing any examples.
- Agentic AI
- Agentic AI refers to AI systems that can autonomously plan, execute, and iterate on multi-step tasks with minimal human intervention.
- Tool Use (Function Calling)
- Tool use, also called function calling, is the ability of an AI model to invoke external tools, APIs, or functions during a conversation to perform actions beyond text generation.
- Model Context Protocol (MCP)
- The Model Context Protocol (MCP) is an open standard developed by Anthropic that provides a universal way to connect AI models to external data sources, tools, and services.
- Reasoning Model
- A reasoning model is an AI system specifically trained to perform extended, step-by-step thinking before producing a final answer.
- AI Guardrails
- AI guardrails are safety mechanisms, rules, and constraints built into AI systems to prevent harmful, biased, or undesired outputs.
- Structured Output
- Structured output refers to AI model responses that follow a specific, machine-readable format such as JSON, XML, CSV, or a defined schema.
- Context Caching
- Context caching is an optimization technique where AI providers store and reuse previously processed prompt prefixes across multiple API calls.
- Embedding
- An embedding is a numerical vector representation of text that captures its semantic meaning in a high-dimensional space.
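Embeddings are typically compared with cosine similarity; the toy 4-dimensional vectors below stand in for real model embeddings, which usually have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors; values near 1
    indicate similar meaning, values near 0 indicate unrelated text."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors for illustration only.
king = [0.9, 0.1, 0.8, 0.2]
queen = [0.85, 0.15, 0.75, 0.3]
banana = [0.1, 0.9, 0.05, 0.7]
print(cosine_similarity(king, queen) > cosine_similarity(king, banana))  # True
```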
- Vector Database
- A vector database is a specialized database designed to store, index, and efficiently query high-dimensional embedding vectors.
- Prompt Optimization
- Prompt optimization is the systematic process of iteratively refining prompts to improve the quality, accuracy, and consistency of AI model outputs.
- AI Alignment
- AI alignment is the field of research and practice focused on ensuring that AI systems behave in accordance with human values, intentions, and goals.
- Knowledge Cutoff
- A knowledge cutoff is the date beyond which an AI model has no training data, meaning it cannot answer questions about events, discoveries, or changes that occurred after that point.
- Inference
- Inference is the process of using a trained AI model to generate predictions or outputs from new inputs.
- Beam Search
- Beam search is a decoding strategy that explores multiple candidate output sequences simultaneously during text generation, keeping the top-k most probable sequences (the "beam width") at each step.
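A toy sketch over an invented two-step next-token table (`NEXT` is purely illustrative; a real model would produce probabilities over a full vocabulary):

```python
import math

# Toy next-token model: maps the previous token to candidate continuations.
NEXT = {
    "": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
}

def beam_search(beam_width: int = 2, steps: int = 2):
    """Keep the beam_width highest-scoring partial sequences at each step."""
    beams = [([], 0.0)]  # (token list, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            context = tokens[-1] if tokens else ""
            for tok, p in NEXT.get(context, {}).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Prune to the top beam_width candidates by score.
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return [(" ".join(t), math.exp(s)) for t, s in beams]

print(beam_search())
```

Note that greedy decoding would commit to "the" at step one and end with probability 0.3, while the beam keeps "a" alive and finds the higher-probability "a cat" (0.36).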
- Tokenizer
- A tokenizer is the component that converts raw text into a sequence of tokens (numerical IDs) that an AI model can process, and converts model output tokens back into readable text.
- Attention Mechanism
- An attention mechanism is a neural network component that allows a model to dynamically weigh the importance of different parts of the input when generating each part of the output.
- Transformer
- A transformer is the neural network architecture that powers virtually all modern large language models, including GPT, Claude, Gemini, and LLaMA.
- Prompt Caching
- Prompt caching is a performance optimization where the model's computed internal representations (key-value attention states) of a static prompt prefix are stored and reused across multiple requests.
- Constitutional AI
- Constitutional AI (CAI) is a training methodology developed by Anthropic where an AI model is guided by a set of written principles (a "constitution") to self-critique and revise its own outputs during training.
- Reinforcement Learning from Human Feedback (RLHF)
- Reinforcement learning from human feedback (RLHF) is a training method where human evaluators rank or score multiple AI outputs, and those preferences are used to train a reward model that further fine-tunes the language model.
- Chain of Verification
- Chain of verification (CoVe) is a prompting technique where the AI model first generates an initial response, then creates specific verification questions about its own claims, answers those questions independently, and finally revises the original response based on the verification results.
- Meta-Prompting
- Meta-prompting is the practice of using an AI model to generate, refine, or optimize prompts for other AI tasks.
- AI Agent
- An AI agent is a software system that uses a large language model as its reasoning core to autonomously plan, execute, and adapt multi-step workflows using external tools and data sources.
- Few-Shot Chain of Thought
- Few-shot chain of thought is a prompting technique that combines few-shot examples with explicit step-by-step reasoning demonstrations.
- Prompt Leaking
- Prompt leaking is an attack technique where a user crafts inputs designed to trick an AI model into revealing its hidden system prompt or confidential instructions.
- AI Hallucination Detection
- AI hallucination detection encompasses the methods, tools, and techniques used to identify when an AI model generates false, fabricated, or unsupported information.
- Model Distillation
- Model distillation is a technique for creating a smaller, more efficient "student" model that approximates the behavior of a larger "teacher" model.
- Synthetic Data
- Synthetic data is artificially generated data created by AI models or algorithmic processes rather than collected from real-world events.
- Data Poisoning
- Data poisoning is an adversarial attack that corrupts an AI model's training data to manipulate its behavior in targeted ways.
- Jailbreaking
- Jailbreaking refers to techniques used to bypass an AI model's built-in safety restrictions, content policies, and behavioral guidelines to produce outputs the model was trained to refuse.
- Semantic Search
- Semantic search is an information retrieval approach that finds results based on the meaning of a query rather than exact keyword matches.
- Prompt Versioning
- Prompt versioning is the practice of tracking changes to prompts over time using version control principles — assigning version identifiers, recording modifications, and maintaining a history of prompt iterations.
- Output Parsing
- Output parsing is the process of extracting structured, machine-readable data from an AI model's free-form text responses.
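A common parsing heuristic is to locate a JSON payload inside the surrounding prose; the reply string below is an invented example:

```python
import json
import re

def extract_json(response: str) -> dict:
    """Pull a JSON object out of a free-form model response.
    Simple heuristic: grab everything from the first '{' to the last '}'."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))

# A typical model reply that wraps the payload in explanatory prose.
reply = 'Sure! Here is the result:\n{"name": "Ada", "score": 97}\nLet me know!'
print(extract_json(reply))
```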
- Latent Space
- Latent space is the high-dimensional internal representation space where AI models encode the meaning, relationships, and features of input data as numerical vectors.
- Zero-Shot Chain of Thought
- Zero-shot chain of thought is a prompting technique where you append a simple phrase like "Let's think step by step" to a question without providing any reasoning examples.
- Prompt Compression
- Prompt compression encompasses techniques for reducing the length of a prompt while preserving its essential meaning and effectiveness.
- AI Safety
- AI safety is the interdisciplinary field focused on ensuring that AI systems behave as intended, remain under human control, and do not cause unintended harm.
- Red Teaming
- Red teaming in AI is the practice of systematically probing an AI system for vulnerabilities, failure modes, and harmful behaviors through adversarial testing.
- Benchmark
- A benchmark in AI is a standardized test suite with predefined tasks, datasets, and evaluation metrics used to measure and compare model performance.
- Perplexity
- Perplexity is a standard metric for evaluating how well a language model predicts a sequence of text; lower perplexity means the model assigned higher probability to the actual text.
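Concretely, perplexity is the exponential of the average negative log-probability per token; the probability lists below are invented for illustration:

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """exp of the average negative log-probability the model assigned to
    each token in the sequence; lower values mean better prediction."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Probabilities a hypothetical model assigned to each observed token.
confident = [0.9, 0.8, 0.95, 0.85]
uncertain = [0.2, 0.1, 0.3, 0.25]
print(perplexity(confident) < perplexity(uncertain))  # True
```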
- Logits
- Logits are the raw, unnormalized numerical scores that a language model assigns to each token in its vocabulary as the potential next token; a softmax function converts them into a probability distribution.
- Sampling
- Sampling is the process of selecting the next token from the probability distribution a language model produces at each generation step.
- Stop Sequence
- A stop sequence is a predefined token, string, or pattern that signals the AI model to immediately stop generating text when encountered in the output.
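The truncation logic can be sketched as cutting at the earliest match among the configured stop sequences; the generated text below is an invented example:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

generated = "Answer: 42\nUser: what about..."
print(truncate_at_stop(generated, ["\nUser:", "###"]))  # → "Answer: 42"
```

This is why stop sequences like `"\nUser:"` are commonly used to keep a model from generating both sides of a dialogue.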
- JSON Mode
- JSON mode is a model configuration setting that constrains the AI's output to be valid, parseable JSON.
- Vision-Language Model (VLM)
- A vision-language model (VLM) is an AI system that can process, understand, and reason about both visual inputs (images, screenshots, diagrams) and text simultaneously within a single model architecture.
- Function Calling
- Function calling is an AI model capability where the model analyzes a user's prompt and generates structured JSON specifying which external function to invoke and what arguments to pass.
- Test-Time Compute
- Test-time compute is the practice of allocating additional computational resources during inference — when the model generates a response — rather than during training.
- AI Overview
- An AI Overview is an AI-generated summary box that appears at the top of Google search results, synthesizing information from multiple web sources to answer a user's query directly.
- Generative Engine Optimization (GEO)
- Generative engine optimization (GEO) is the practice of structuring and enhancing content so that AI-powered platforms — like ChatGPT, Perplexity, and Google AI Overviews — cite, reference, or recommend it when generating responses.
- Answer Engine Optimization (AEO)
- Answer engine optimization (AEO) is a content strategy focused on structuring web content to appear as direct answers in featured snippets, People Also Ask boxes, voice search results, and AI-generated summaries.
- Thinking Model
- A thinking model is an AI system that uses extended inference-time computation to reason through problems before producing a final answer.
- Prompt Routing
- Prompt routing is the practice of automatically directing each user prompt to the most suitable AI model based on task type, complexity, and cost constraints.
- Multimodal Prompting
- Multimodal prompting is the practice of combining multiple input types — such as text, images, audio, or video — within a single prompt to give an AI model richer context for its response.
- Prompt Tuning
- Prompt tuning is a parameter-efficient technique that adapts a large language model to specific tasks by training small learnable vectors called "soft prompts" that are prepended to the input.
- Instruction Following
- Instruction following is an AI model's ability to accurately understand and execute explicit directions given in a prompt — including format requirements, length constraints, tone specifications, and multi-step procedures.
- Code Interpreter
- A code interpreter is an AI capability that allows a model to write and execute code — typically Python — in a sandboxed environment to solve analytical, mathematical, or data processing tasks.
- Deep Research
- Deep research is an AI capability where the model autonomously conducts multi-step web research to produce comprehensive, sourced reports on complex topics.
- Mixture of Experts (MoE)
- Mixture of experts (MoE) is a neural network architecture that divides a model into many specialized sub-networks called "experts" and uses a routing mechanism to activate only a small subset of them for each input.
- Knowledge Graph
- A knowledge graph is a structured database that represents real-world entities (people, places, concepts) and the relationships between them as an interconnected network of nodes and edges.
- Quantization
- Quantization is a technique that reduces an AI model's numerical precision — for example, converting 16-bit floating-point weights to 4-bit integers — to shrink the model's memory footprint and speed up inference.
- LoRA (Low-Rank Adaptation)
- LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adapts a pre-trained AI model to new tasks by injecting small trainable matrices into the model's layers while keeping the original weights frozen.
- Prompt Ensembling
- Prompt ensembling is a technique that runs multiple variations of a prompt for the same task and combines their outputs to produce a more accurate and robust final result.
- Self-Reflection
- Self-reflection is a prompting technique where an AI model evaluates, critiques, and improves its own output in one or more follow-up steps.
- Direct Preference Optimization (DPO)
- Direct preference optimization (DPO) is a training technique that aligns AI models with human preferences by learning directly from pairs of preferred and rejected outputs — without needing a separate reward model.
- KV-Cache
- A KV-cache (key-value cache) stores the computed attention key and value matrices from previously processed tokens so the model does not need to recalculate them when generating each new token.
- AI Watermarking
- AI watermarking is the practice of embedding hidden, machine-detectable patterns into AI-generated content — text, images, audio, or video — so that the content can later be identified as AI-produced.
- Prompt Injection Defense
- Prompt injection defense refers to the techniques and strategies used to protect AI systems from prompt injection attacks, where malicious inputs attempt to override the model's original instructions.
- Context Stuffing
- Context stuffing is the technique of loading relevant information — documents, data, or examples — directly into an AI model's prompt to give it the knowledge needed to answer accurately.
- Model Collapse
- Model collapse is a phenomenon where AI models progressively degrade when trained on data generated by other AI models rather than human-created content.
- Autonomous Agent
- An autonomous agent is an AI system that can independently plan, decide, and execute multi-step tasks to achieve a goal with minimal human oversight.
- Benchmark Contamination
- Benchmark contamination occurs when an AI model's training data accidentally or deliberately includes questions and answers from the benchmark tests used to evaluate it.
- Emergent Behavior
- Emergent behavior in AI refers to capabilities that appear unexpectedly in large language models as they scale up in size, without being explicitly programmed or trained for those tasks.
- Catastrophic Forgetting
- Catastrophic forgetting is a phenomenon where a neural network rapidly loses previously learned knowledge when it is trained on new data or tasks.
- Few-Shot Learning
- Few-shot learning is a machine learning approach where a model learns to perform a new task from only a handful of training examples — sometimes as few as one to five.
- Transfer Learning
- Transfer learning is a machine learning technique where a model trained on one task or dataset is reused as the starting point for a different but related task.
- Semantic Similarity
- Semantic similarity is a measure of how close two pieces of text are in meaning, regardless of whether they share the same words.
- Grok
- Grok is the family of conversational AI models built by xAI, distinguished from other major assistants by its real-time access to posts on X (formerly Twitter) and a less filtered response style.
- xAI
- xAI is the artificial intelligence research company founded by Elon Musk in 2023.