OpenAI launches safety bug bounty program for AI abuse risks
OpenAI launched a public Safety Bug Bounty program to identify AI abuse and safety risks across its products, according to a company statement. The program complements OpenAI's existing Security Bug Bounty by accepting issues that pose safety risks even when they don't qualify as security vulnerabilities.
The program targets three main categories of AI-specific safety scenarios. The first, agentic risks, includes third-party prompt injection and data exfiltration, in which attacker-supplied text hijacks a victim's agent to perform harmful actions or leak sensitive information. To qualify, the behavior must be reproducible at least 50% of the time.
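As a rough illustration of the 50% reproducibility bar, a researcher might tally repeated attempts before filing a report. The sketch below is only an assumption about how such a tally could be computed. The function names, the trial data, and the choice to treat each attempt as a simple success or failure are hypothetical and do not come from OpenAI's program rules.

```python
from typing import Sequence

# "At least 50% of the time", as reported in the article.
REPRO_THRESHOLD = 0.5


def reproduction_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of attempts in which the behavior reproduced."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def meets_threshold(outcomes: Sequence[bool],
                    threshold: float = REPRO_THRESHOLD) -> bool:
    """Check whether the observed rate reaches the threshold (inclusive)."""
    return reproduction_rate(outcomes) >= threshold


# Hypothetical results from ten attempts: True means the hijack reproduced.
trials = [True, False, True, True, False, False, True, False, True, True]
print(f"{reproduction_rate(trials):.0%}", meets_threshold(trials))  # 60% True
```

A small number of attempts gives a noisy estimate, so a rate close to 50% would likely need more trials before it is convincing.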
The second category, OpenAI proprietary information, covers model generations that expose proprietary reasoning information, along with other exposures of confidential company data. The third, account and platform integrity, covers weaknesses in anti-automation controls and account trust signals, as well as ways to evade restrictions.
The company stated that jailbreaks remain outside the program's scope, though it runs private campaigns for specific harm types, including biorisk content issues. OpenAI's Safety and Security Bug Bounty teams review submissions and may route them between the two programs depending on scope.
Researchers can submit flaws that create a direct path to user harm and come with actionable remediation steps; rewards for these are considered case by case. General content-policy bypasses without demonstrable safety impact are excluded from the program.
The program requires compliance with third-party terms of service for any testing involving Model Context Protocol risks. OpenAI described the initiative as part of ongoing partnerships with safety and security researchers to address risks beyond conventional security vulnerabilities.
