this post was submitted on 26 Jul 2023
859 points (96.5% liked)

Thousands of authors demand payment from AI companies for use of copyrighted works: Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

[–] [email protected] 14 points 1 year ago* (last edited 1 year ago) (2 children)

Isn’t learning the basic act of reading text?

not even close. that's not how AI training models work, either.

if your position is that only humans can learn and adapt text

nope-- their demands are right at the top of the article and in the summary for this post:

Thousands of authors demand payment from AI companies for use of copyrighted works: Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

that broadly rules out any AI ever

only if the companies training AI refuse to pay

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (1 children)

Isn’t learning the basic act of reading text?

not even close. that’s not how AI training models work, either.

Of course it is. It's not a 1:1 comparison, but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human's learning process, would that matter to you? I doubt that very much.

Thousands of authors demand payment from AI companies for use of copyrighted works: Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

Having to individually license each unit of work for an LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

What we're broadly talking about is generative work. That is, by absorbing a body of work, the model incorporates it into an overall corpus of learned patterns. That's not materially different from how anyone learns to write. Even my use of the word "materially" in the last sentence is, surely, based on seeing it used in similar patterns of text.
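To illustrate the "corpus of learned patterns" idea with a deliberately oversimplified toy (a bigram counter is nothing like a real transformer LLM, but it shows the same principle of extracting statistics rather than storing copies):

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count word-to-next-word transitions: the 'learned patterns'
    are frequency statistics, not stored copies of the source text."""
    counts = defaultdict(Counter)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        counts[cur][nxt] += 1
    return counts

corpus = "the road was long and the road was dark and the night was long"
model = train_bigram_model(corpus)

# The model retains only transition counts, e.g. what tends to follow "the":
print(model["the"].most_common())  # [('road', 2), ('night', 1)]
```

A real LLM learns vastly richer statistics across billions of weights, but, roughly speaking, the stored artifact is likewise parameters rather than a verbatim database of the training text.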

The difference is that a human's ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. An LLM can do it almost infinitely and then have that ability reused by any number of other consumers.

There's a case here that the remuneration process we have for original work doesn't fit well into the AI training models, and maybe Congress should remedy that, but on its face I don't think it's feasible to just shut it all down. Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago) (1 children)

Of course it is. It’s not a 1:1 comparison

no, it really isn't--it's not a 1000:1 comparison. AI generative models are advanced relational algorithms and databases. they don't work at all the way the human mind does.

but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human's learning process, would that matter to you? I doubt that very much.

no, the results are just designed to be familiar because they're designed by humans, for humans to be that way, and none of this has anything to do with this discussion.

Having to individually license each unit of work for a LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

nobody is saying it should be individually-licensed. these companies can get bulk license access to entire libraries from publishers.

That’s not materially different from how anyone learns to write.

yes it is. you're just framing it in those terms because you don't understand the cognitive processes behind human learning. but if you want to make a meta comparison between the cognitive processes behind human learning and the training processes behind AI generative models, please start by citing your sources.

The difference is that a human’s ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. A LLM can do it almost infinitely and then have that ability reused by any number of other consumers.

this is not the difference between humans and AI learning, this is the difference between human and computer lifespans.

There’s a case here that the renumeration process we have for original work doesn’t fit well into the AI training models

no, it's a case of your lack of imagination and understanding of the subject matter

and maybe Congress should remedy that

yes

but on its face I don’t think it’s feasible to just shut it all down.

nobody is suggesting that

Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.

lmao

[–] [email protected] 4 points 1 year ago (1 children)

You're getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.

First of all, copyright law does not care about the algorithms used and how well they map what a human mind does. That's irrelevant. There's nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn't. Either it's derivative or it isn't.

What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If an LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That's true regardless of algorithm, and there's certainly nothing in copyright or common sense that separates one from another. If I read enough Hunter S Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.

Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy's estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy's writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.

So it's really, really hard to make the case that there's any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.

The problem is that as a consumer, if I buy a book for $12, I'm fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn't be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure).

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (1 children)

You’re getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.

you're accusing me of what you are clearly doing after I've explained twice how you're doing that. I'm not going to waste my time doing it again. except:

Where copyright comes into play is in whether the new work produced is derivative or transformative.

except that the contention isn't necessarily over what work is being produced (although whether it's derivative work is still a matter for a court to decide anyway); it's that the source material is used for training without compensation.

The problem is that as a consumer, if I buy a book for $12, I’m fairly limited in how much use I can get out of it.

and, likewise, so are these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.

[–] [email protected] -1 points 1 year ago (2 children)

these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.

That wouldn't be copyright infringement.

It isn't infringement to use a copyrighted work for whatever purpose you please. What's infringement is reproducing it.

[–] [email protected] 7 points 1 year ago (1 children)

It isn’t infringement to use a copyrighted work for whatever purpose you please.

and you accused me of "completely misunderstanding copyright law" lmao wow

[–] [email protected] -3 points 1 year ago (1 children)
[–] [email protected] 4 points 1 year ago (2 children)

It's infringement to use copyrighted material for commercial purposes.

[–] [email protected] 1 points 1 year ago

If I buy my support staff "IT for Dummies", and they then, sometimes, reproduce the same/similar advice (turn it off and on again), I owe the textbook writers money? That's news to me.

[–] [email protected] 0 points 1 year ago

No, it isn't. There are enumerated rights a copyright grants the holder a monopoly over. They are reproduction, derivative works, public performances, public displays, distribution, and digital transmission.

Commercial vs non-commercial has nothing to do with it, nor does field of endeavor. And aside from the granted monopoly, no other rights are granted. A copyright does not let you decide how your work is used once sold.

I don't know where you guys get these ideas.

[–] [email protected] -1 points 1 year ago* (last edited 1 year ago) (5 children)

Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?

Even if we say we are going to pay out a measly dollar for every work it looks over, you’re immediately talking millions of dollars in operating costs. Doesn’t this just box out anyone who can’t afford to spend tens or even hundreds of millions of dollars on AI development? Maybe good if you’ve always wanted big companies like Google and Microsoft to be the only ones able to develop these world-altering tools.
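The back-of-envelope arithmetic here is easy to sketch; the corpus size and per-work rates below are purely hypothetical round numbers:

```python
# Back-of-envelope licensing cost at different flat per-work rates.
# corpus_size and the rates are illustrative assumptions, not real figures.
corpus_size = 1_000_000  # hypothetical number of licensed works

totals = {}
for rate_cents in (1, 100, 10_000):  # $0.01, $1.00, $100.00 per work
    totals[rate_cents] = corpus_size * rate_cents  # total cost in cents
    print(f"${rate_cents / 100:>7.2f} per work -> ${totals[rate_cents] / 100:>14,.2f} total")
```

Even at a cent per work, a million-work corpus costs $10,000 in licensing alone; at $100 per work it's $100 million, which is the boxing-out effect at issue.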

Another issue: who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains fewer words? If I self-publish a book, is it worth as much as Mark Twain's? Sure, his is more popular, but maybe mine is longer and contains more content; what's my payout in this scenario?

[–] [email protected] 11 points 1 year ago* (last edited 1 year ago) (1 children)

i admit it's a huge issue, but the licensing costs are something that can be negotiated by the license holders in a structured settlement.

moving forward, AI companies can negotiate licensing deals for access to licensed works for AI training, and authors of published works can decide whether they want to make their works available to AI training (and their compensation rates) in future publishing contracts.

the solutions are simple-- the AI companies like OpenAI, Google, et al are just complaining because they don't want to fork over money to the copyright holders they ripped off and set a precedent that what they're doing is wrong (legally or otherwise).

[–] [email protected] -3 points 1 year ago (1 children)

Sure, but what I’m asking is: what do you think is a reasonable rate?

We are talking data sets that have millions of written works in them. If it costs hundreds or thousands per work, this venture almost doesn't make sense anymore. If it's $1 per work, or cents per work, then is it even worth it for each individual contributor to get $1 when it adds millions in operating costs?

In my opinion, this needs to be handled a lot more carefully than what is being proposed. We are potentially going to make AI datasets wayyyy too expensive for anyone to use aside from the largest companies in the market, and even then this will cause huge delays to that progress.

If AI is just blatantly copy and pasting what it read, then yes, I see that as a huge issue. But reading and learning from what it reads, no matter how rudimentary that “learning” may be, is much different than just copying works.

[–] [email protected] 8 points 1 year ago (1 children)

that's not for me to decide. as I said, it is for either the courts to decide or for the content owners and the AI companies to negotiate a settlement (for prior infringements) and a contracted amount moving forward.

also, I agree that it's a massive clusterfuck that these companies just purloined a fuckton of copyrighted material for profit without paying for it, but I'm glad that they're finally being called out.

[–] [email protected] -4 points 1 year ago (1 children)

Dude, they said

If AI is just blatantly copy and pasting what it read, then yes, I see that as a huge issue.

That’s in no way agreeing “that’s it’s a massive clusterfuck that these companies just purloined a fuckton of copyrighted material for profit without paying for it”. Do you not understand that AI is not just copy and pasting content?

[–] [email protected] 6 points 1 year ago (2 children)

AI isn't doing anything creative. These tools are merely ways to deliver the information you put into it in a way that's more natural and dynamic. There is no creation happening. The consequence is that you either pay for use of content, or you've basically diminished the value of creating content and potentiated plagiarism at a gargantuan level.

Being that this "AI" doesn't actually have the capacity for creativity, if actual creativity becomes worthless, there will be a whole lot less incentive to create.

The "utility" of it right now is being created by effectively stealing other people's work. Hence, the court cases.

[–] [email protected] -2 points 1 year ago (1 children)

Please first define “creativity” without artificially restricting it to humans. Then, please explain how AI isn’t doing anything creative.

[–] [email protected] -5 points 1 year ago* (last edited 1 year ago) (1 children)

Sure, AI is not doing anything creative, but neither is my pen; it's the tool I'm using to be creative. Let's think about this more with some scenarios:

Let's say software developer "A" comes along, and they're pretty fucking smart. They sit down, read through all of Mark Twain's novels, and over the course of the next 5 years, create a piece of software that generates works in Twain's style. It's so good that people begin using it to write real books. It doesn't copy anything specifically from Twain, it just mimics his writing style.

We also have developer "B". While Dev A is working on his project, Dev B is working on a very similar project, but with one difference: Dev B writes an LLM to read the books for him and develop a writing style similar to Twain's based on that. The final product is more or less the same as Dev A's product, but he saves himself the time of needing to read through every work on his own; he just reads a couple to get an idea of what the output might look like.

Is the work from Dev A’s software legitimate? Why or why not?

Is the work from Dev B’s software legitimate? Why or why not?

Assume both of these developers own copies of the works they used as training data, what is honestly the difference here? This is what I am struggling with so much.

[–] [email protected] 3 points 1 year ago (1 children)

Both developers have created a parrot tool. A utility to plagiarise a style.

[–] [email protected] -2 points 1 year ago

So now the output of both programs is "illegitimate" in your eyes, despite one of them never even getting direct access to the original text.

Now let's say one of them just writes a story in the style of Twain. Still plagiarism? Because I don't know if you can copyright a style.

The first painter painted on cave walls with his fingers. Was the brush a parrot tool? A utility to plagiarize? You could use it for plagiarism, yes, and by your logic, it shouldn’t be used. And any work created using it is not “legitimate”.

[–] [email protected] 5 points 1 year ago

Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?

Congress has been here before. In the early days of radio, DJs were infringing on recording copyrights by playing music on the air. Congress knew it wasn't feasible to require every song be explicitly licensed for radio reproduction, so they created a compulsory license system where creators are required to license their songs for radio distribution. They do get paid for each play, but at a rate set by the government, not negotiated directly.

Another issue, who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains less words? If I self publish a book, is it worth as much as Mark Twains? Sure his is more popular but maybe mine is longer and contains more content, whats my payout in this scenario?

I'd say no one. Just like Taylor Swift gets the same payment as your garage band per play, a compulsory licensing model doesn't care who you are.
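A flat-rate compulsory payout like the radio model can be sketched in a few lines (the statutory rate and the use counts are made-up figures):

```python
# Compulsory-license payout: every rights holder gets the same
# government-set rate per use; only the volume of use differs.
RATE_CENTS_PER_USE = 1  # hypothetical statutory rate

uses = {"bestselling_author": 250_000, "self_published_author": 1_200}

payouts_cents = {name: n * RATE_CENTS_PER_USE for name, n in uses.items()}
print(payouts_cents)  # {'bestselling_author': 250000, 'self_published_author': 1200}
```

The point of the design is that nobody has to negotiate or appraise individual works; the rate is fixed and the accounting reduces to counting uses.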

[–] [email protected] 5 points 1 year ago

Doesn't this just box out anyone who can't afford to spend tens or even hundreds of millions of dollars on AI development?

The government could allow the donation of original art for the purpose of tech research to be a tax write-off. Then there could be non-profits that work between artists and tech developers to collect all the legally obtained art and grant access to those who need it for projects.

That's just one option off the top of my head, which I'm sure would have some procedural obstacles, and chances for problems to be baked in, but I'm sure there are other options as well.

[–] [email protected] 1 points 1 year ago

Why is any of that the author's problem?