AI Model Collapse and Why It Matters

SUMMARY: Something is quietly going wrong with artificial intelligence. Not the dramatic failures that make headlines. Not rogue chatbots or biased algorithms. Something more fundamental is happening beneath the surface of the AI industry, and it carries profound implications for investors, companies and regulators alike.


Researchers call it model collapse. The phenomenon occurs when AI systems train on content generated by other AI systems rather than original human-created material. Over successive generations of training, the models progressively degrade. Outputs become repetitive, homogeneous and eventually nonsensical. If you saw the 1996 comedy Multiplicity starring Michael Keaton, you already understand the concept. Keaton’s character clones himself, then the clone makes a clone, and by the third or fourth generation the copies become increasingly degraded and erratic. The same principle applies to AI training on AI-generated content.

A landmark study published in Nature in 2024 by researchers at British and Canadian universities documented how this degradation proceeds through two distinct phases. In early model collapse, the AI begins losing information from the edges of its training distribution. The rare cases, unusual perspectives and minority viewpoints quietly disappear first. In late model collapse, the system loses substantial performance altogether, confusing concepts and losing most variation in its outputs. The researchers demonstrated that even under optimal learning conditions, this collapse is mathematically inevitable when models train recursively on synthetic data.
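The mechanism can be illustrated with a toy simulation (a hypothetical sketch, not code from the study): fit a Gaussian to a small sample, generate the next "training set" by sampling from the fit, and repeat. The spread of the data, a stand-in for the rare cases at the edges of the distribution, shrinks toward zero over successive generations.

```python
import random
import statistics

def train_generation(data, n_samples):
    """Fit a Gaussian to the data, then 'generate' the next
    generation's training set by sampling from that fit."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [random.gauss(mu, sigma) for _ in range(n_samples)]

random.seed(0)
# The original "human-created" data: 10 samples from a standard normal.
generation = [random.gauss(0.0, 1.0) for _ in range(10)]
initial_spread = statistics.stdev(generation)

# Each model trains only on the previous model's output.
for _ in range(500):
    generation = train_generation(generation, 10)

final_spread = statistics.stdev(generation)
print(f"std after 500 generations: {final_spread:.6f} "
      f"(started near {initial_spread:.3f})")
```

Run repeatedly with different seeds, the spread collapses essentially every time: small-sample estimation error compounds multiplicatively, so the tails disappear first and the distribution narrows toward a point.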

Why Model Collapse Matters Now More Than Ever

Here is the uncomfortable reality facing the AI industry: the internet is rapidly filling with AI-generated content. Every blog post drafted by ChatGPT, every image created by Midjourney and every line of code written by GitHub Copilot joins the vast pool of data that companies scrape to train the next generation of models. Researchers at Epoch AI have predicted that the world may run out of new human-generated text suitable for training sometime between 2026 and 2032. We are approaching that window now.

This creates what computer scientists describe as a feedback loop. AI outputs pollute the training data for future AI systems, which then produce lower-quality outputs that pollute the next round of training data. A February 2026 article in Communications of the ACM observed that model collapse is not a theoretical future risk but something happening in production systems today. The author noted degradation patterns appearing in commercial tools, including a background remover that started failing on specific hair textures and image generators producing increasingly homogeneous outputs.

The core problem: Companies training large language models mostly scrape data from the internet. As more AI-generated content proliferates online, future models will inevitably train on synthetic data rather than authentic human expression. This recursive contamination compounds errors across generations of models.

Securities Law Implications for AI Companies

For securities lawyers and investors, model collapse raises pressing questions about disclosure obligations. When an AI company’s core product faces a fundamental technical limitation, at what point does that limitation become material to investors?

The SEC’s Investor Advisory Committee addressed AI disclosure concerns at its December 2025 meeting. The committee noted significant inconsistency in how public companies discuss AI risks. Only about 40% of S&P 500 companies provide AI-related disclosures at all, and just 15% disclose information about board oversight of AI systems. Yet 60% of these same companies identify AI as a material risk factor. This gap between acknowledged risk and actual disclosure creates potential liability exposure.

The committee recommended that the SEC issue guidance requiring companies to define what they mean by artificial intelligence in their specific context, disclose board oversight mechanisms for AI deployment and report separately on how AI affects both internal operations and consumer-facing products when material. These recommendations would integrate AI disclosure into existing Regulation S-K requirements rather than creating standalone reporting obligations.

Model collapse fits squarely within these concerns. An AI company that fails to disclose degradation in model quality over successive training generations could face claims of material omission. If investors reasonably expect continued improvement in AI capabilities but the underlying technology faces inherent limitations from data contamination, that expectation gap creates securities fraud risk under traditional Rule 10b-5 analysis.

The Rise of AI Washing Enforcement

The SEC has demonstrated willingness to pursue companies that misrepresent their AI capabilities. In 2024, the agency charged two investment advisory firms with falsely claiming to use AI in their investment processes when they did not. One firm marketed itself as the “first regulated AI financial advisor” despite lacking genuine AI functionality. Both settled, paying combined penalties of $400,000.

Enforcement has intensified since then. In April 2025, the SEC and Department of Justice brought parallel civil and criminal actions against the founder of a technology startup for raising over $42 million by claiming his mobile shopping app was “AI-powered.” In reality, overseas contractors manually processed transactions. The executive allegedly fabricated automation rates above 90% when the actual rate was essentially zero.

Acting U.S. Attorney Matthew Podolsky articulated the regulatory concern directly: this type of deception victimizes innocent investors, diverts capital from legitimate startups, makes investors skeptical of real breakthroughs and ultimately impedes genuine AI development. The SEC has restructured its enforcement approach accordingly, creating a dedicated Cybersecurity and Emerging Technologies Unit with AI washing as an immediate priority.

Through the first half of 2025, Stanford’s Securities Class Action Clearinghouse identified 53 securities class actions related to AI. Plaintiffs in these cases have alleged that companies overstated AI efficiencies, rebranded legacy technology as artificial intelligence without meaningful capability improvements, concealed licensing or performance problems and exaggerated the pace and feasibility of AI integration into their products.

Product Liability and the AI Training Data Problem

Model collapse also raises novel product liability questions. In September 2025, Senators Dick Durbin and Josh Hawley introduced the AI LEAD Act, which would for the first time explicitly classify AI systems as products under federal law and create a dedicated cause of action for AI-related product liability.

The proposed legislation identifies several grounds for developer liability, including defective design where a reasonable alternative was feasible, inadequate testing before release, failure to provide adequate warnings and failure to maintain reasonably safe products post-sale. Notably, the bill addresses training data directly by including “design, training data selection, testing protocols, and adequacy of warnings” in its compliance framework.

An AI model suffering from collapse due to synthetic data contamination could face defective design claims. If a company knew or should have known that its training data contained substantial AI-generated content, and if that contamination predictably degraded model performance, the traditional risk-benefit analysis of product liability law would apply. Did the manufacturer consider alternative designs? Were adequate warnings provided about performance limitations? Did the company maintain reasonable post-sale monitoring?

Several high-profile product liability cases filed against AI companies in late 2025 demonstrate the expanding theory of claims. Parents have sued Character Technologies alleging that its AI chatbot caused severe mental health harm to their children. While these cases involve different fact patterns than model collapse, they establish that courts will treat AI systems as products subject to strict liability for design defects. Once that threshold is crossed, training data quality becomes a legitimate design consideration subject to judicial scrutiny.

What Responsible Companies Should Do Now

Organizations developing or deploying AI systems should treat model collapse as both a technical and legal governance priority. Several practices can reduce exposure.

First, establish rigorous data provenance tracking. Companies need to know what proportion of their training data originated from AI systems rather than human sources. The challenge is that synthetic content often appears indistinguishable from authentic material. The World Economic Forum has emphasized that robust provenance systems implemented proactively cost substantially less than retroactive tracing after problems emerge.
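As a sketch of what provenance tracking might look like in practice (the record fields and source labels here are hypothetical illustrations, not an industry standard), each training document carries a source tag, and the measurable synthetic share of a corpus is simply the fraction explicitly flagged; unlabeled web-crawl data remains the hard, unmeasurable residual the text describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvenanceRecord:
    doc_id: str
    source: str           # e.g. "licensed-archive", "web-crawl", "synthetic"
    year_collected: int   # pre-2023 crawls carry far less synthetic content

def synthetic_fraction(records):
    """Fraction of the corpus whose provenance is *known* to be synthetic.
    Unlabeled web-crawl data can only be bounded, not measured directly."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if r.source == "synthetic")
    return flagged / len(records)

corpus = [
    ProvenanceRecord("a1", "licensed-archive", 2019),
    ProvenanceRecord("b2", "web-crawl", 2025),
    ProvenanceRecord("c3", "synthetic", 2025),
    ProvenanceRecord("d4", "synthetic", 2026),
]
print(synthetic_fraction(corpus))  # 0.5
```

Even a minimal schema like this, applied at ingestion time, gives a company the records it would need to answer the design-decision questions raised in the product liability discussion above.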

Second, document design decisions related to training data selection. If litigation arises, companies will need to demonstrate that they considered synthetic data contamination risks and made reasonable choices given available information. This includes retaining records of alternative approaches considered and the rationale for choices made.

Third, validate AI capability claims against technical reality. Before making public statements about AI performance, companies should verify that representations match actual system behavior. Vague claims about products being “AI-powered” without substantiation increasingly attract regulatory attention.

Fourth, disclose known limitations appropriately. If model performance may degrade over time due to training data issues, investors and customers deserve to understand that risk. The question is not whether to disclose but how to frame technical limitations in terms that non-specialists can understand while remaining legally defensible.

The Broader Investment Landscape

For investors evaluating AI companies, model collapse adds another dimension to due diligence. Eye-popping valuations of AI firms rest on assumptions about continued capability improvement. If training data scarcity and synthetic data contamination impose fundamental limits on that improvement trajectory, current valuations may not hold.

This does not mean the AI industry faces imminent crisis. Researchers are actively developing mitigation strategies including methods to detect and filter synthetic data from training sets, techniques for maintaining real human data alongside synthetic content and approaches to watermarking AI-generated material for later identification. Some studies suggest that if sufficient authentic data remains in the training mix, collapse can be avoided or substantially delayed.
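One of these mitigations, keeping authentic data in the mix, can be sketched in the same toy Gaussian setting used to demonstrate collapse (hypothetical code; the 30% anchor is an arbitrary illustration, not a published threshold): each generation blends freshly drawn human data with model-generated samples, and the distribution's spread stabilizes rather than collapsing.

```python
import random
import statistics

def next_generation(data, n, human_fraction, human_pool):
    """Fit a Gaussian to the current data, then build the next training
    set as a blend of model samples and fresh human-created data."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    n_human = int(n * human_fraction)
    synthetic = [random.gauss(mu, sigma) for _ in range(n - n_human)]
    fresh = random.sample(human_pool, n_human)
    return synthetic + fresh

random.seed(1)
# A fixed pool of "authentic" human data with unit spread.
human_pool = [random.gauss(0.0, 1.0) for _ in range(10_000)]
generation = random.sample(human_pool, 100)

# 30% fresh human data anchors every generation.
for _ in range(500):
    generation = next_generation(generation, 100, 0.3, human_pool)

final_spread = statistics.stdev(generation)
print(f"std after 500 anchored generations: {final_spread:.3f}")
```

Unlike the pure-synthetic loop, the spread here hovers near its original value even after hundreds of generations, which is the intuition behind the studies cited above: enough authentic data acts as an anchor that keeps the feedback loop from running away.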

But the investment thesis for many AI companies depends on extrapolating recent progress indefinitely forward. Model collapse suggests those extrapolations deserve skepticism. Companies that cannot demonstrate sustainable training data strategies may find their competitive positions more fragile than current share prices reflect.

The combination of securities enforcement targeting AI misrepresentation, emerging product liability frameworks and fundamental technical constraints from data quality creates a complex risk environment. Companies navigating this space need legal counsel who understands both the regulatory landscape and the underlying technology well enough to anticipate where problems may emerge before they become expensive litigation.

