“Finding my books on the Books3 data set was disappointing and disorienting: writing is how I’ve made my life, artistically, and—this is important—practically too. . . . Books and writing are how I pay my mortgage, my children’s tuition, my grocery bill. To see my work so cavalierly stolen and used, without my consent, by corporations eager only to increase their own profits, is frankly terrifying.”—Elisabeth de Mariaffi, in The Walrus
Books3, if you’re not familiar with it, is a dataset of around 183,000 books that were downloaded from pirated sources, so the authors received nothing for their work. It was then used to train the AI language models of several companies, including Meta and Bloomberg.
Odds are, you’ve not heard of de Mariaffi.
Odds are, you have heard of Mark Zuckerberg and Mike Bloomberg.
Bloomberg is estimated to be worth $96 billion. Zuckerberg? About $115 billion.
Neither probably thinks about making their mortgage payments or the size of the grocery bill.
There are lawsuits over the use of Books3, brought by authors and other interested parties.
There are lawsuits against OpenAI for illegally using authors’ works. Some more famous writers (Jodi Picoult, George R.R. Martin, George Saunders, John Grisham, Jonathan Franzen) are involved in suits, as are some, well, outliers, like Mike Huckabee and Sarah Silverman.
While the name brands probably aren’t too concerned about the price of a gallon of milk, what is notable about these suits is that the plaintiffs are trying to protect their work from being unfairly reused, in manipulated form, to increase corporate profits while they receive nothing in return.
Think about it: the humongous corporations that used Books3 didn’t even plunk down $20 for a copy of The Firm.