OpenAI Believes DeepSeek ‘Distilled’ Its Data For Training—Here's What To Know About The Technique

Forbes - Jan 29th, 2025

OpenAI is investigating whether Chinese startup DeepSeek used outputs from OpenAI's models to train its new open-source model, which has drawn significant attention and rattled U.S. financial markets. The technique in question is known as 'distillation': outputs from a more advanced AI model (the teacher) are used to train a less resource-intensive model (the student). OpenAI's terms explicitly prohibit using its outputs to develop competing models, raising concerns about potential violations and intellectual property breaches.
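To make the teacher/student idea concrete, here is a minimal toy sketch in Python. It is not how OpenAI or DeepSeek train their models; the teacher is a hypothetical fixed function standing in for a large model, the "prompts" are random numbers, and all names (teacher_predict, train_student, student_predict) are illustrative assumptions. The point is only that the student never sees ground-truth labels: it learns solely from the teacher's outputs.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher_predict(x):
    # Hypothetical stand-in for a large, expensive model:
    # returns the probability that input x belongs to the positive class.
    return sigmoid(3.0 * x - 1.5)

def student_predict(x, w, b):
    # The student is a tiny model with two trainable parameters.
    return sigmoid(w * x + b)

def train_student(inputs, soft_labels, epochs=2000, lr=0.5):
    # Gradient descent on cross-entropy against the teacher's soft outputs.
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, t in zip(inputs, soft_labels):
            p = student_predict(x, w, b)
            grad_w += (p - t) * x
            grad_b += (p - t)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

random.seed(0)
# Step 1: collect unlabeled inputs (the "prompts").
prompts = [random.uniform(-2.0, 2.0) for _ in range(200)]
# Step 2: query the teacher and record its outputs.
teacher_outputs = [teacher_predict(x) for x in prompts]
# Step 3: train the student to imitate those outputs.
w, b = train_student(prompts, teacher_outputs)

worst_gap = max(abs(student_predict(x, w, b) - teacher_predict(x)) for x in prompts)
print(f"largest teacher/student disagreement: {worst_gap:.4f}")
```

In this toy setup the student can match the teacher almost exactly because both have the same simple form; in practice a student is typically smaller than its teacher and only approximates it.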

This situation underscores a growing challenge in the AI industry: how to keep third parties from using proprietary technology without consent. OpenAI is reportedly taking measures to safeguard its intellectual property and is working with the U.S. government to address these concerns. The incident sits at the intersection of AI development, intellectual property rights, and international competition, and it points to the need for robust regulatory frameworks and collaborative efforts to prevent unauthorized use of advanced AI technologies.

Story submitted by Fairstory

RATING

6.4
Moderately Fair
Read with skepticism

The article provides a timely and relevant discussion on the use of AI distillation and the potential misuse of AI outputs, which are significant issues in the tech industry. It effectively explains complex concepts in an accessible manner, contributing to its clarity and readability. However, the story would benefit from a more balanced presentation of perspectives, including direct input from the parties involved, to enhance its accuracy and source quality. While it raises important public interest topics and has the potential to influence ongoing debates, the lack of detailed evidence and transparency in sourcing limits its overall impact and engagement potential.

RATING DETAILS

Accuracy: 7

The article presents several factual claims that are generally consistent with known practices in the AI industry, such as the use of distillation techniques. However, the claim that OpenAI believes DeepSeek used its AI outputs for training needs further verification: it rests on a Financial Times report and lacks direct confirmation from the parties involved. The description of distillation aligns with established knowledge, but the claimed impact on U.S. financial markets and the specifics of OpenAI's countermeasures require additional evidence.

Balance: 6

The article primarily presents OpenAI's perspective and includes a critical viewpoint from Mike Masnick of TechDirt. However, it lacks direct input from DeepSeek or other independent experts who could provide a more balanced view of the allegations and the implications of AI distillation. The absence of diverse perspectives, especially from the accused party, limits the story's balance.

Clarity: 8

The article is generally clear and well-structured, explaining complex concepts like AI distillation in accessible language. The logical flow from the introduction of the allegations to the discussion of potential countermeasures is coherent, and the tone remains neutral. However, the inclusion of unrelated promotional content about text alerts slightly disrupts the narrative.

Source quality: 6

The article cites the Financial Times and includes comments from Ben Thompson and Mike Masnick, both of whom are credible sources in the tech industry. However, it does not provide direct quotes or statements from OpenAI or DeepSeek, which would enhance the reliability and authority of the information presented. The reliance on secondary sources without direct attribution weakens the overall source quality.

Transparency: 5

The article lacks transparency regarding the methodology used to gather information, such as the specific sources of the claims and the context in which statements were made. While it references the Financial Times and includes some expert opinions, it does not clearly explain the basis for the key allegations or the potential conflicts of interest that might affect the reporting.
