OpenAI Faces Copyright Allegations: Was GPT-4o Trained on Unauthorized Books?

OpenAI, one of the leading organizations in artificial intelligence (AI), is facing allegations of copyright infringement. Several reports have claimed that OpenAI used copyrighted material, especially non-public books, without permission to train its latest AI models. A recent report by the AI Disclosures Project adds weight to those claims, alleging that OpenAI’s GPT-4o model may have been trained on paywalled books from O’Reilly Media.

How do AI models work?

AI models are essentially prediction engines trained on large amounts of data. These data sources include books, movies, TV shows, and other content. When an AI model writes an essay on a Greek tragedy or produces a Ghibli-style image, it is predicting likely output from the patterns it learned during training. In that sense, what the model produces is derived from existing data rather than created from scratch.
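To make the "prediction engine" idea concrete, here is a deliberately tiny sketch in Python. It is a toy word-pair counter, not how GPT-4o or any production model is built; the function names and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text for illustration only.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

The toy model can only suggest words it saw during training, which is the sense in which a model's output depends on what it was trained on.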

Copyright problems in AI models

A major challenge for AI companies is that they need high-quality data to train their models. As public web data starts to run out, companies look for more specialized, higher-quality sources. The risk is that much of this data is protected by copyright.

According to a report by the AI Disclosures Project, OpenAI may have used O’Reilly Media’s paywalled books to train GPT-4o. The claim is based on a technique called DE-COP, which helps identify whether a language model’s training data contained copyrighted material.
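As a rough illustration of how a test like DE-COP can work: as we understand the published method, the model is shown a verbatim passage mixed in with paraphrased versions and asked to pick the original, and picking correctly more often than chance is treated as a sign that it has seen the text before. The Python sketch below is a simplification under that assumption, not the researchers’ code. The names `build_quiz`, `guess_rate`, and `memorizer` are hypothetical, the passages are made up, and no real model is queried.

```python
import random

def build_quiz(original, paraphrases, rng):
    """Mix the verbatim passage in with its paraphrases.

    Returns the shuffled options and the position of the original.
    Assumes no paraphrase is identical to the original.
    """
    options = [original] + list(paraphrases)
    rng.shuffle(options)
    return options, options.index(original)

def guess_rate(items, choose, seed=0):
    """Fraction of quizzes where `choose` picks the verbatim passage."""
    rng = random.Random(seed)
    correct = 0
    for original, paraphrases in items:
        options, answer = build_quiz(original, paraphrases, rng)
        if choose(options) == answer:
            correct += 1
    return correct / len(items)

# Made-up passages for illustration only (not taken from any real book).
items = [
    ("Indexes speed up reads but slow down writes.",
     ["Reads get faster with indexes, while writes get slower.",
      "An index makes reading quicker and writing slower.",
      "Using indexes trades write speed for read speed."]),
    ("A closure captures variables from its enclosing scope.",
     ["Closures hold on to variables from the scope around them.",
      "Variables in the outer scope are captured by a closure.",
      "The enclosing scope's variables are kept by a closure."]),
]

# Hypothetical stand-in for a model that memorized the originals.
seen = {original for original, _ in items}

def memorizer(options):
    for position, text in enumerate(options):
        if text in seen:
            return position
    return 0

chance = 1 / 4  # four options per quiz
rate = guess_rate(items, memorizer)
print(f"guess rate {rate:.2f} vs chance {chance:.2f}")
```

A stand-in that has "seen" every passage scores well above the 25% chance level here; a real test would swap `memorizer` for calls to the model being examined and would need many more passages before drawing any conclusion.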

Suspicious use of O’Reilly Media books

Researchers from the AI Disclosures Project, which also includes O’Reilly Media CEO Tim O’Reilly, analyzed GPT-4o, GPT-3.5 Turbo, and other OpenAI models. They tested 13,962 paragraphs from 34 O’Reilly books to estimate how likely each was to have been part of a model’s training data.

According to the report’s findings, GPT-4o recognized O’Reilly Media’s non-public book content at a higher rate than earlier OpenAI models. This could mean that GPT-4o was trained on these books. However, it is also possible that users pasted excerpts from these books into ChatGPT and that the data was collected from there.

Data collection policy of AI companies

OpenAI is already facing several lawsuits over its data collection practices. The company has licensing agreements with some news publishers, social networks, and media libraries that allow it to use their content legally. OpenAI also gives copyright owners the option to exclude their content from AI training. However, this opt-out process has been criticized as flawed and remains controversial.

Conclusion

It has become clear to OpenAI and other AI companies that the demand for high-quality training data is constantly growing. But meeting that demand must not violate copyright law. The AI industry needs transparency and sound data use policies so that AI technology can be developed legally and ethically. These allegations against OpenAI could prove to be a turning point for the future of AI, as they have sparked a global debate over the transparency of the sources used for AI training.

FAQs

Q. What are the copyright allegations against OpenAI?

A. OpenAI is accused of using copyrighted, non-public books without permission to train its AI models, particularly GPT-4o.

Q. Which report raised concerns about OpenAI’s data usage?

A. A report by the AI Disclosures Project claims that GPT-4o may have been trained on paywalled books from O’Reilly Media.

Q. How do AI models like GPT-4o learn?

A. AI models are trained on vast amounts of data, including books, movies, and online content, to recognize patterns and generate text.

Q. Does OpenAI have legal agreements for training data?

A. OpenAI has licensing deals with some publishers but faces criticism for allegedly using unauthorized content.

Q. Why is this issue important for AI development?

A. It raises ethical and legal concerns about data sourcing, copyright protection, and transparency in AI training.

Copyright © 2024 AAZKANEWS.COM.