Reddit Demands Cash From AI Tools Trained On Its Content

Joseph Iyanu
Reddit to charge companies for API access, capitalizing on valuable data used in training AI systems like Google’s Bard & OpenAI’s ChatGPT.
Reddit app icon. Photo: Brett Jordan | Pexels

Reddit has long been an online hub for discussions on a wide range of subjects. As a result, companies like Google and OpenAI have turned to it for training their AI systems. Recently, Reddit decided it’s time to get paid for its valuable data. It announced plans to charge companies for access to its API, which allows external parties to download and analyze the site’s many conversations.

A New Revenue Stream For Reddit

In the past, Reddit’s discussions served as a free resource for tech giants like Google, OpenAI, and Microsoft. Now, with AI becoming increasingly important, Reddit is seeking compensation. Reddit’s co-founder and CEO, Steve Huffman, explained that because the site’s data is extremely valuable, the company doesn’t need to give it away for free. Moreover, these AI systems could potentially become competitors, offering automated alternatives to Reddit’s conversations.

Live trading. Photo: Pexels

Currently, Reddit is preparing for a possible IPO later this year. The company, established in 2005, earns most of its revenue from advertising and e-commerce transactions on its platform. Reddit has yet to finalize the pricing details for API access but plans to announce them in the coming weeks.

Valuable Conversations For AI Development

Reddit’s forums have gained importance as large language models (LLMs) become essential for new AI technology. LLMs are advanced algorithms developed by companies like Google and OpenAI, with Reddit conversations serving as crucial data inputs. For instance, Google’s Bard and OpenAI’s ChatGPT both use Reddit data for training.

In addition to Reddit, other companies are recognizing the value of their content. Shutterstock, for example, sold image data to OpenAI to develop DALL-E, an AI program that generates images from text prompts. Twitter’s owner, Elon Musk, also mentioned plans to restrict API usage and potentially charge significant fees.

User interface of OpenAI’s ChatGPT. Photo: Airam Dato-on | Pexels

To improve their AI models, developers require vast amounts of computing power and data. While some of the largest AI developers possess ample computing resources, they still rely on external data sources, such as Wikipedia, digitized books, academic articles, and Reddit, to enhance their algorithms.

Symbiotic Relationship With Search Engines

Historically, Reddit has had a symbiotic relationship with search engines like Google and Microsoft’s Bing, which index Reddit’s web pages for search results. While this “scraping” can be unwelcome for some websites, Reddit benefits by appearing higher in search rankings.

Huffman believes the site’s data is particularly valuable because it is constantly updated. These fresh, relevant conversations help large language models produce the best results.

Reddit’s API will still be free for developers building applications to help people use the platform and for researchers studying the data for academic or non-commercial purposes.

Google homepage. Photo: Pexels

Reddit also aims to incorporate more machine learning into its own operations, potentially using AI to identify bot-generated text and label it accordingly. The company has also promised to improve software tools for moderators and support third-party bots that help manage forums.

Time For AI Makers To Pay Up

As AI technology advances, Reddit sees an opportunity to capitalize on its valuable data. By charging companies for API access, it hopes to generate additional revenue and ensure fair compensation for its contributions to AI development.

