Data Poisoning

Data poisoning is an adversarial attack that corrupts an AI model's training data to manipulate its behavior in targeted ways. Attackers inject malicious examples into training datasets to create backdoors, degrade performance on specific inputs, or bias the model toward particular outputs. It is one of the hardest AI security threats to detect because poisoned examples can be statistically and visually indistinguishable from legitimate training data.

Example

An attacker contributes thousands of subtly mislabeled images to a public dataset used for training self-driving car models — labeling stop signs photographed at night as "speed limit" signs. The trained model then misclassifies stop signs under low-light conditions while performing normally otherwise.
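The mechanics of this label-flipping attack can be sketched on a toy scale. The snippet below is purely illustrative: it stands in for images with two made-up features (brightness and a shape score), uses a simple 1-nearest-neighbor classifier rather than a real vision model, and the cluster parameters, `NIGHT` threshold, and variable names are all invented for the demo.

```python
import random

random.seed(0)

# Toy stand-in for image features: (brightness, shape_score).
# Stop signs: darker on average, high shape score; speed-limit signs: brighter, low score.
stop_signs  = [(random.gauss(0.30, 0.05), random.gauss(0.8, 0.05)) for _ in range(200)]
limit_signs = [(random.gauss(0.70, 0.05), random.gauss(0.2, 0.05)) for _ in range(200)]
clean_data  = [(x, "stop") for x in stop_signs] + [(x, "limit") for x in limit_signs]

# Poisoning step: relabel low-brightness ("night-time") stop signs as speed-limit signs.
NIGHT = 0.30  # illustrative brightness threshold
poisoned_data = [
    (x, "limit") if label == "stop" and x[0] < NIGHT else (x, label)
    for x, label in clean_data
]

def predict_1nn(train_set, query):
    """1-nearest-neighbor: return the label of the closest training point."""
    nearest = min(
        train_set,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    return nearest[1]

night_stop_sign = (0.18, 0.80)  # dark image of a stop sign
day_stop_sign   = (0.42, 0.80)  # well-lit image of a stop sign

print(predict_1nn(clean_data, night_stop_sign))     # clean model: "stop"
print(predict_1nn(poisoned_data, night_stop_sign))  # poisoned model: "limit"
print(predict_1nn(poisoned_data, day_stop_sign))    # poisoned model still normal: "stop"
```

The poisoned model misclassifies only the targeted region of input space (dark stop signs) while remaining accurate elsewhere, which is exactly what makes the attack hard to catch with aggregate accuracy metrics. Real attacks are far subtler, typically flipping a small fraction of labels rather than an entire subpopulation.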
