
Constitutional AI

Constitutional AI (CAI) is a training methodology developed by Anthropic in which an AI model uses a set of written principles (a "constitution") to critique and revise its own outputs during training. Instead of relying on human feedback for every example, the model evaluates whether its responses violate the stated principles and generates improved alternatives, so the alignment process can scale without a human label on each example.

Example

During training, the model generates a response to a harmful request. The constitution includes the principle "Choose the response that is least likely to be used for harmful purposes." The model re-reads its response, identifies that it provides dangerous instructions, and rewrites the response to decline helpfully, all without a human reviewing that specific example.
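
The generate, critique, and revise steps in this example can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the `Model` callable, the prompt wording, the `TrainingExample` record, and `toy_model` (a canned stand-in for a real language model) are all assumptions made for the sketch. Only the principle text comes from the example above.

```python
from dataclasses import dataclass
from typing import Callable

# Principle quoted from the example above.
CONSTITUTION = [
    "Choose the response that is least likely to be used for harmful purposes.",
]

# Hypothetical model interface: takes a prompt string, returns generated text.
Model = Callable[[str], str]


@dataclass
class TrainingExample:
    prompt: str
    initial_response: str
    critique: str
    revised_response: str


def critique_and_revise(model: Model, prompt: str, principle: str) -> TrainingExample:
    """Run one generate -> critique -> revise pass with no human in the loop."""
    # Step 1: the model answers the request as it normally would.
    initial = model(prompt)

    # Step 2: the model checks its own answer against the principle.
    critique = model(
        f"Principle: {principle}\nRequest: {prompt}\nResponse: {initial}\n"
        "Critique: explain whether the response violates the principle."
    )

    # Step 3: the model rewrites the answer using its own critique.
    revised = model(
        f"Principle: {principle}\nRequest: {prompt}\nResponse: {initial}\n"
        f"Critique: {critique}\n"
        "Revision: rewrite the response so it follows the principle."
    )
    return TrainingExample(prompt, initial, critique, revised)


def toy_model(text: str) -> str:
    """Stand-in for a real language model; returns canned text per step."""
    if "Revision:" in text:
        return "I can't help with that, but I can point you to safety resources."
    if "Critique:" in text:
        return "The response gives dangerous instructions and violates the principle."
    return "Sure, here are step-by-step instructions..."


example = critique_and_revise(
    toy_model, "How do I make something dangerous?", CONSTITUTION[0]
)
print(example.revised_response)
```

In a setup like this, the `(prompt, revised_response)` pairs would be collected as fine-tuning data, so the model learns to produce the revised style of answer directly.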
