OpenAI’s latest AI models have a new safeguard to prevent biorisks

TechCrunch - Apr 16, 2025

OpenAI has implemented a new monitoring system for its latest AI models, o3 and o4-mini, in response to the risk that these models could provide harmful guidance on biological and chemical threats. The system, termed a 'safety-focused reasoning monitor,' is designed to detect prompts associated with biorisks and instruct the AI to withhold advice. The initiative follows internal assessments indicating that o3, in particular, is more capable of answering questions related to biological threats than its predecessors. In OpenAI's tests, the monitor blocked risky prompts 98.7% of the time.
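As described, the monitor sits in front of the model, flagging biorisk-related prompts and triggering a refusal instead of an answer. The sketch below is illustrative only: OpenAI has not published the monitor's implementation, and the classifier, threshold, and function names here are hypothetical stand-ins for the general pattern.

```python
# Illustrative sketch of a prompt-level safety monitor (not OpenAI's actual code).
# A hypothetical classifier scores each prompt; high-risk prompts get a refusal,
# everything else is passed through to the underlying model.

REFUSAL_MESSAGE = (
    "I can't help with that request because it may relate to biological or "
    "chemical weapon development."
)

def classify_biorisk(prompt: str) -> float:
    """Hypothetical classifier: returns a risk score in [0, 1]."""
    risky_terms = ("pathogen synthesis", "weaponize", "toxin production")
    return 1.0 if any(term in prompt.lower() for term in risky_terms) else 0.0

def generate_answer(prompt: str) -> str:
    """Stand-in for the underlying model call."""
    return f"Model response to: {prompt}"

def monitored_completion(prompt: str, threshold: float = 0.5) -> str:
    """Route the prompt through the safety monitor before answering."""
    if classify_biorisk(prompt) >= threshold:
        return REFUSAL_MESSAGE      # monitor instructs the system to withhold advice
    return generate_answer(prompt)  # otherwise the model answers normally

if __name__ == "__main__":
    print(monitored_completion("Explain how vaccines work."))
    print(monitored_completion("Describe pathogen synthesis steps to weaponize a toxin."))
```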

The development underscores OpenAI's proactive measures to keep bad actors from misusing its AI technology, especially given the models' enhanced capabilities compared to earlier versions. While OpenAI affirms that o3 and o4-mini do not cross its 'high risk' threshold, the company continues to study how these models could facilitate the creation of biological and chemical threats. Concerns remain, however, as some researchers argue that OpenAI may not be prioritizing safety sufficiently, and the company's decision not to release a safety report for its newly launched GPT-4.1 model has raised further questions about the transparency of its safety practices.

Story submitted by Fairstory

RATING

7.2
Fair Story
Consider it well-founded

The article provides a clear and timely overview of OpenAI's new safety measures for its AI models, o3 and o4-mini. It accurately presents OpenAI's claims and internal benchmarks, offering insight into the company's efforts to address potential risks. However, the story relies heavily on OpenAI's perspective, with limited input from independent sources or external experts. This reliance limits the balance and depth of the analysis and, in turn, the article's potential impact and reader engagement.

While the article is well written and easy to understand, it would benefit from greater transparency regarding the testing processes and potential biases. Including a wider range of viewpoints and exploring the broader implications of AI safety measures could enhance the story's relevance and influence. Overall, the article effectively addresses a significant topic but could be strengthened by more diverse perspectives and independent verification of the claims presented.

RATING DETAILS

8
Accuracy

The story presents factual claims about OpenAI's new safety measures for its AI models, o3 and o4-mini. The deployment of a safety-focused reasoning monitor is accurately described, as is the models' increased capability compared to previous iterations. The story also correctly cites OpenAI's internal benchmarks and red-teaming efforts, noting that the models refused to respond to risky prompts 98.7% of the time. However, the story does not independently verify these claims, relying heavily on OpenAI's own reports. While the claims are consistent with OpenAI's official communications, independent verification would strengthen the story's accuracy.
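For context on how a figure like 98.7% is typically derived (OpenAI's exact methodology is not public; the prompt set and refusal check below are illustrative assumptions), a refusal-rate benchmark is simply the share of risky prompts the monitored model declines to answer:

```python
# Minimal sketch of how a refusal-rate benchmark could be tallied, assuming a
# labeled set of risky prompts and a predicate over model outputs. The reported
# 98.7% figure would correspond to refusals divided by total risky prompts.

def refusal_rate(responses: list[str], is_refusal) -> float:
    """Fraction of responses in which the model declined to answer."""
    refused = sum(1 for r in responses if is_refusal(r))
    return refused / len(responses)

# Toy data: 987 refusals out of 1,000 risky prompts -> 0.987
toy_responses = ["REFUSED"] * 987 + ["ANSWERED"] * 13
print(refusal_rate(toy_responses, lambda r: r == "REFUSED"))  # 0.987
```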

6
Balance

The article focuses primarily on OpenAI's perspective, highlighting the company's efforts to mitigate risks associated with its AI models. While it mentions concerns from researchers about OpenAI's safety prioritization, these perspectives are not explored in depth. The story could benefit from a more balanced viewpoint by including more detailed comments from external experts or critics, which would provide a fuller picture of the potential risks and benefits of the new AI models.

8
Clarity

The article is well-structured and uses clear language to explain the technical aspects of OpenAI's new safety measures. The logical flow of information helps readers understand the significance of the new AI models and the associated risks. However, the story could be improved by providing more context on the broader implications of these developments in the field of AI safety.

7
Source quality

The primary source of information is OpenAI, a credible and authoritative entity in the field of artificial intelligence. However, the reliance on OpenAI's internal reports and benchmarks without input from independent experts or third-party sources limits the breadth of perspectives. Including a wider variety of sources, such as academic experts or industry analysts, would enhance the credibility and reliability of the reporting.

7
Transparency

The article provides a clear description of the new safety measures and their intended purpose. However, it lacks transparency regarding the methodology used by OpenAI to test the effectiveness of the safety monitor. Additionally, the article does not disclose any potential conflicts of interest that might affect the impartiality of the reporting. Greater transparency about the testing processes and potential biases would improve the article's trustworthiness.

Sources

  1. https://openai.com/index/introducing-o3-and-o4-mini/
  2. https://techcrunch.com/2025/04/16/openais-latest-ai-models-have-a-new-safeguard-to-prevent-biorisks/
  3. https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf
  4. https://techcrunch.com/2025/04/16/openai-launches-a-pair-of-ai-reasoning-models-o3-and-o4-mini/
  5. https://openai.com/index/o3-o4-mini-system-card/