AI’s Fatal Flaw: Why Self-Generated Data is Undermining Machine Learning

November 24, 2024

Artificial intelligence is increasingly consuming its own outputs, creating significant risks for the quality and reliability of future models. The rapid growth of AI has outpaced the supply of high-quality, human-generated data, pushing developers to rely on AI-generated content to fill the gap. While this may seem a practical answer to the data shortage, it introduces serious problems, including degraded model accuracy and the potential for what experts are calling “model collapse.”

The Feedback Loop Problem

The core issue lies in the feedback loop created when AI systems ingest data that was originally produced by other AI systems. This process undermines the integrity of the AI’s learning capabilities. When AI models are trained on outputs that are not rooted in reality, their own outputs become progressively more distorted and less reliable, with errors compounding across each training cycle. This is not merely a technical glitch but a systemic flaw that could erode the entire AI ecosystem if not addressed.

The New York Times has highlighted how this self-referential training can cause AI outputs to drift away from reality. This drift occurs because the AI is no longer learning from a diverse and accurate dataset but from a pool of content that may already be biased, incomplete, or outright erroneous. As AI continues to generate more of its own training data, the risk of amplifying these inaccuracies grows, leading to models that are less effective and more prone to producing flawed outputs.
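To make the mechanism concrete, here is a deliberately tiny numerical sketch, not drawn from the Times article and simplified far beyond any real training pipeline: a “model” that only estimates a mean and a spread is repeatedly refit on samples drawn from its own previous fit. The sample size, generation count, and Gaussian setup are assumptions chosen purely to make the effect visible.

```python
# Toy sketch (illustrative only): each "generation" is fit to samples drawn
# from the previous generation's model rather than from real data. With small
# samples, estimation error compounds across generations.

import random
import statistics

random.seed(0)

SAMPLES_PER_GENERATION = 20   # a small sample size exaggerates the effect
GENERATIONS = 200

# Generation 0: "real" human data, centred at 0.0 with spread 1.0.
mu, sigma = 0.0, 1.0

for generation in range(1, GENERATIONS + 1):
    # The next model sees only the previous model's own outputs.
    synthetic = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GENERATION)]
    mu, sigma = statistics.mean(synthetic), statistics.stdev(synthetic)
    if generation % 40 == 0:
        print(f"generation {generation:3d}: mean={mu:+.3f}, spread={sigma:.3f}")
```

Because each generation sees only the previous generation’s outputs, small estimation errors are never corrected: on a typical run the spread decays toward zero and the mean wanders away from its original value, which is the drift-and-amplification pattern described above in miniature.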

Model Collapse: A Real and Present Danger

This problem is compounded by the sheer scale at which AI is being deployed. AI-generated content is now flooding the internet, filling websites, social media platforms, and even news outlets. Scientific American reports that large language models are generating vast amounts of text, which is increasingly indistinguishable from content created by humans. This saturation of synthetic content not only makes it harder to find reliable human-generated data but also increases the likelihood that future AI models will be trained on inferior data, further exacerbating the problem.

One of the most alarming consequences of this trend is the potential for “model collapse,” a scenario where AI models become so detached from reality that their outputs are no longer usable. This collapse is not just a theoretical risk; it is already being observed in some AI systems that produce outputs filled with biases, inaccuracies, and absurdities. These flawed outputs are then fed back into the training process, creating a vicious cycle that degrades the model’s performance over time.

Synthetic Data: Necessity or Risk?

AI companies are aware of these risks but are often left with few alternatives. As The Atlantic notes, the demand for more advanced AI models is pushing developers to use whatever data is available, including AI-generated content. The difficulty of distinguishing between human-generated and synthetic data means that even the most well-intentioned efforts to maintain data quality are likely to fall short. This situation is driving a reliance on potentially flawed training material, which could have far-reaching implications for businesses and consumers.

Despite these challenges, some experts argue that synthetic data is not inherently bad. There are specific scenarios where AI-generated content can be useful, such as in training smaller models or in situations where the accuracy of the output can be easily verified. However, these instances are the exception rather than the rule. The broader use of AI-generated content in training large models poses significant risks that cannot be overlooked.
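One way to read the “easily verified” case is as a filtering step: synthetic examples are admitted to a training set only when an independent check confirms them. The sketch below is a hypothetical illustration with a deliberately trivial generator and checker (arithmetic questions); the function names and acceptance rule are assumptions, not a description of any particular company’s pipeline.

```python
# Hedged sketch: keep a synthetic example only if an independent check,
# which does not trust the generator, confirms the answer.

import random

def generate_candidate(rng: random.Random) -> tuple[str, str]:
    """Stand-in for a model's output: a question plus a (possibly wrong) answer."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    answer = a + b + rng.choice([0, 0, 0, 1])  # occasionally off by one
    return f"What is {a} + {b}?", str(answer)

def is_verified(question: str, answer: str) -> bool:
    """Independent check of the candidate answer."""
    a, b = (int(t) for t in question.removeprefix("What is ").removesuffix("?").split(" + "))
    return int(answer) == a + b

rng = random.Random(1)
candidates = [generate_candidate(rng) for _ in range(1000)]
verified = [(q, ans) for q, ans in candidates if is_verified(q, ans)]
print(f"kept {len(verified)} of {len(candidates)} synthetic examples for training")
```

The point of the toy example is that verification happens outside the generator; the moment the check itself depends on model output, the feedback loop described earlier reappears.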

Impact on Internet Integrity and Business Decisions

The proliferation of AI-generated content also raises broader concerns about the integrity of the internet as a whole. The dead internet theory, which suggests that much of the internet’s content is now generated by bots and AI rather than humans, is gaining traction. While this theory is still speculative, it reflects a growing unease about the direction in which AI is taking the digital landscape. If AI continues to dominate content creation, the internet could become less a reflection of human knowledge and creativity and more a repository of synthetic, machine-generated data.

This shift has significant implications for businesses that rely on the internet for information, customer engagement, and brand management. As AI-generated content becomes more prevalent, companies may find it increasingly difficult to trust the data they are using to make decisions. The risk of basing business strategies on flawed or biased information grows as the quality of online content declines. This could lead to poor decision-making, reduced competitiveness, and ultimately, a loss of consumer trust.

The Dead Internet Theory: More than a Conspiracy?

The dead internet theory also touches on deeper fears about the role of AI in shaping public discourse and influencing political outcomes. Some experts, like Jake Renzella and Vlada Rozova, have warned that AI-generated content could be used to support autocratic regimes, spread propaganda, and manipulate public opinion. While these concerns may sound alarmist, they are rooted in the growing influence of AI on the flow of information and the potential for AI to be used as a tool for social and political control.

Fortunately, there is evidence that the dead internet theory has not yet fully materialized. Forbes reports that the vast majority of viral content, such as provocative opinions, clever observations, and creative reinterpretations, is still generated by humans. However, the growing presence of AI-generated content on the internet cannot be ignored, and businesses must remain vigilant to the risks posed by AI’s self-consumption problem.

Addressing the Risks: Business Strategies for Safer AI

The issue of AI’s self-cannibalisation is not just a technical challenge but a strategic one that businesses must address head-on. Companies that are deploying AI systems need to be aware of the risks associated with using AI-generated content for training. They must invest in rigorous data governance practices to ensure that their models are trained on accurate, high-quality data. Failure to do so could result in AI systems that are not only ineffective but also potentially damaging to the business.
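What “rigorous data governance” looks like in code is necessarily speculative, but one common-sense control is provenance tagging: recording where each training record came from and excluding anything not verified as human-authored before it reaches the training pipeline. The record fields and acceptance rule below are illustrative assumptions, not a reference to any specific standard or product.

```python
# Minimal, hypothetical provenance filter applied before training.

from dataclasses import dataclass

@dataclass
class TrainingRecord:
    text: str
    source: str           # e.g. "licensed-archive", "web-crawl", "model-output"
    human_verified: bool  # set by an upstream review or licensing process

def filter_for_training(records: list[TrainingRecord]) -> list[TrainingRecord]:
    """Keep only records with known, human-verified provenance."""
    return [r for r in records if r.human_verified and r.source != "model-output"]

corpus = [
    TrainingRecord("An original news report...", "licensed-archive", True),
    TrainingRecord("Auto-generated product blurb...", "model-output", False),
    TrainingRecord("Unreviewed scraped page...", "web-crawl", False),
]

clean = filter_for_training(corpus)
print(f"kept {len(clean)} of {len(corpus)} records for training")
```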

In conclusion, the practice of using AI-generated content to train new AI models is creating a significant risk of model collapse and degraded performance. As AI continues to grow in importance across industries, businesses must take proactive steps to address these risks. By prioritising data quality and investing in robust training practices, companies can mitigate the dangers posed by AI’s self-cannibalisation and ensure that their AI systems remain reliable, effective, and aligned with reality.