Not long after AI-driven tools like ChatGPT and DALL-E made headlines, stories of copyright violations followed. What are the specific concerns during this “wild west” period of AI content?
Large language models scraping copyrighted content
ChatGPT and other generative AI tools require enormous amounts of data to produce their content. Unfortunately for content creators, the most abundant source of that data for large language models is the internet. And these tools aren’t just learning from the websites they scrape; they’re also regurgitating that content. A language model’s output doesn’t necessarily reproduce scraped text verbatim. Still, writers are discovering that AI often repackages their text in a form so close to the original that they can easily recognize their own work in it.
Copyrighted material in books, on websites, and in other text-based media is legally protected, so writers are demanding compensation when tools like ChatGPT reuse their work without citation or credit, which amounts to plagiarism.
AI imaging tools adopting artists’ work
A pending lawsuit from Getty Images accuses Stability AI of using more than 12 million of its images without permission to train its text-to-image tool, Stable Diffusion. Artists have raised similar complaints. “These images were all taken without consent, without compensation. There’s no attribution or credit given to the artists,” said attorney Matthew Butterick, who represents a group of artists in a separate lawsuit against Stability AI.
There is little debate over whether material from Getty’s images ends up in Stable Diffusion’s output: in some instances, the AI-generated images even included the Getty watermark.
Lawsuits surge as the law catches up with AI technology
AI is the latest example of rapidly developing technology racing far ahead of regulatory oversight. Biometric data, intellectual property and copyright infringement are just a few of the legal concerns as tech companies rush out new products without regard for the human-created work that made machine learning possible.
The Hollywood writers’ strike recently ended, but the actors’ strike is ongoing as of this writing. Current and future AI technology was among the most contentious subjects in both disputes. Studios scanned actors’ faces, bodies and voices, then asked them to sign away their rights to future use of those scans. Though we’re not quite there yet, the technology needed to create new feature films from existing scans is quickly becoming a reality. And it’s not just movie studios toying with this possibility. TikTok recently settled a lawsuit with a voice actor after she discovered the app was using her voice for its text-to-speech feature.
Two possible outcomes of these lawsuits are opt-out code that websites can add to prevent large language models from scraping their pages, and some form of compensation for artists whose work AI samples, similar to the program the stock photography website Shutterstock recently launched.
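Websites can already signal an opt-out of this kind through robots.txt, a plain-text file at a site’s root that well-behaved crawlers read before fetching pages. Compliance is voluntary, so it states the site owner’s wishes rather than enforcing them. As a minimal sketch, assuming the opt-out is expressed as robots.txt rules, the Python example below blocks two crawlers associated with AI training data, OpenAI’s GPTBot and Common Crawl’s CCBot, and uses the standard library’s urllib.robotparser to check which crawlers may fetch a page. The domain, page path and the exact set of blocked crawlers are hypothetical choices for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: it blocks two AI-related
# crawlers (OpenAI's GPTBot and Common Crawl's CCBot) from the whole
# site while leaving it open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    page = "https://example.com/articles/my-essay"  # hypothetical page
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        verdict = "allowed" if rules.can_fetch(agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Run as a script, it reports GPTBot and CCBot as blocked and Googlebot as allowed, because the catch-all `User-agent: *` group applies only to crawlers that no named group matches.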