Our Approach to User Safety

User safety is core to Anthropic’s mission of creating reliable, interpretable, and steerable AI systems. As we launch new ways for people to interact with Claude, we also expect to see new types of potential harm materialize, whether through the generation of misinformation, objectionable content, hate speech or other misuses. We are actively investing in and experimenting with additional safety features to supplement our existing model safety efforts and are working to provide helpful tools to a wide audience while also doing our best to mitigate harm. Launching new products in open beta allows us to experiment, iterate and hear your feedback. Here are some of the safety features we’ve introduced:

These features are not fail-safe, and we may make mistakes in the form of false positives or false negatives. Your feedback on these measures, and on how we explain them to users, will play a key role in helping us improve these safety systems. We encourage you to reach out to us at [email protected] with any feedback you may have. To learn more, read about our core views on AI safety.

