On Monday, court documents revealed that AI company Anthropic spent millions of dollars physically scanning print books to build Claude, an AI assistant similar to ChatGPT. In the process, the company cut millions of print books from their bindings, scanned them into digital files, and threw away the originals solely for the purpose of training AI. Those details were buried in a copyright ruling whose broader fair use implications we reported yesterday.

The 32-page legal decision tells the story of how, in February 2024, the company hired Tom Turvey, the former head of partnerships for the Google Books book-scanning project, and tasked him with obtaining “all the books in the world.” The strategic hire appears to have been designed to replicate Google’s legally successful book digitization approach—the same scanning operation that survived copyright challenges and established key fair use precedents.

While destructive scanning is a common practice among smaller-scale operations, Anthropic’s approach was somewhat unusual due to its massive scale. For Anthropic, the faster speed and lower cost of the destructive process appear to have trumped any need for preserving the physical books themselves.

From Ars Technica - All content via this RSS feed

  • PlzGivHugs@sh.itjust.works
    22 hours ago

    As much as this sucks, it highlights the shitty state of the ebook landscape as much as, or more than, the evil of AI in this case.

    There are numerous potential reasons someone would want a digital copy of a book, yet somehow it's cheaper to buy physical books and destroy them than to just buy a digital copy. The companies responsible for publishing and distribution put so many roadblocks in the way, from prices that are regularly higher than physical books, to intrusive DRM, to requiring the use of an overcomplicated and ad-filled website just to buy it. It's text. It's been a solved technical problem for nearly as long as computers have existed. The fact that I can't just buy a pdf or an epub of a book (let alone at a price that makes sense given the distribution method) is absurd.

  • mindbleach@sh.itjust.works
    21 hours ago

    Company A gets existing plaintext files from torrents, people are big mad.

    Company B buys physical media for mass processing, people are big mad.

    The nature of bad faith is that there is no right answer.