this post was submitted on 21 Feb 2024

165 points (95.1% liked)

Why The New York Times might win its copyright lawsuit against OpenAI (arstechnica.com)

submitted 8 months ago by [email protected] to c/[email protected]

Why The New York Times might win its copyright lawsuit against OpenAI::The AI community needs to take copyright lawsuits seriously.

[–] [email protected] 61 points 8 months ago* (last edited 8 months ago) (3 children)

Some of the prior cases described in this article, as precedents that could spell trouble for OpenAI, frankly sound like miscarriages of justice. Using copyright to prevent organizations from photocopying articles for internal use? What the heck?

If anything, my take home message is that the reach of copyright law is too long and needs to be taken down a peg.

[–] [email protected] 28 points 8 months ago (2 children)

You are not wrong that monopolies granted by copyright are regularly and unfairly abused.

That being said, AI trainers are getting away with plagiarism right now. More importantly, it's not just violation of a single copy, it's potentially the creation of tools that enable mass derivative copies. Authors that create training data need to be compensated.

[–] [email protected] -4 points 8 months ago (1 children)

Authors that create training data need to be compensated.

There should not be a problem with that. The people who work on training datasets are already being paid.

The reason you are getting downvoted is that these lawsuits are not about that. These are about giving money to corporations like the NYT - or Reddit, or Facebook, etc - for the "intellectual property" that they already have lying around. It's pure grift.

Because the creation of all that is already paid for, that leaves all the more money for lawyers and PR campaigns to extract money for nothing from society.

[–] [email protected] 15 points 8 months ago (1 children)

There should not be a problem with that. The people who work on training datasets are already being paid.

How are the people whose articles and comments are being scraped compensated?

Because the creation of all that is already paid for

"This perfectly good movie has already been made and paid for, that means I can watch it without compensating the studio."

I do not agree with Reddit selling the comments of their users. Even so that's a ridiculous statement to make.

[+] [email protected] -8 points 8 months ago (1 children)

How are the people whose articles and comments are being scraped compensated?

By people who work on training datasets I mean, e.g., the people on Amazon Mechanical Turk. I am not working on a dataset by writing this comment. I'm putting some things straight and getting exactly the payment I was promised - i.e., none.

“This perfectly good movie has already been made and paid for, that means I can watch it without compensating the studio.”

Let's take the NYT as an example. To publish their newspapers, they need to pay reporters, but also editors and assistants. They also need offices, and for those they need to pay maintenance and janitorial staff. To get it out there, they need printers, server admins and such.

In order for this to work, the NYT needs to make back the money that they have paid these people, plus some profit for the owners. This has already been achieved for any issue that's older than a few days. Before the internet, either an issue sold enough or it didn't. No one cares about yesterday's news. I doubt the internet changes that very much. That's what I mean by "it's already paid for".

For a movie, the time horizon is probably a year or so. IDK to be honest. AFAIK, it used to be that if a blockbuster did not make a profit in cinemas, it was over. Maybe the time horizon was longer for direct-to-DVD productions. I guarantee you that no corporation plans ahead more than a few years. Patents last only 20 years and that's more than enough to finance all the expense for R&D that has created modern tech.

I think it is absolutely ridiculous that corporations can still extract money for something that was made in the 1940s and even earlier. That does not pay for movies, because it's not money that was ever calculated with. It only pays for the creation of paywalls. As long as enforcing a copyright pays for that enforcement, it will be done.

[–] [email protected] 4 points 8 months ago (1 children)

In order for this to work, the NYT needs to make back the money that they have paid these people, plus some profit for the owners. This has already been achieved for any issue that's older than a few days. Before the internet, either an issue sold enough or it didn't. No one cares about yesterday's news. I doubt the internet changes that very much. That's what I mean by "it's already paid for".

So the moment a property breaks even, + makes "some profit", you should no longer need to pay for it? Only when people still "care", in that case they should pay?

Just because it's a news article or a comment doesn't mean it's fair game all of a sudden.

And movies can make back their budget in the opening week(end) when they're popular. The timeframe is irrelevant for your argument. At least if we're talking about anything less than a decade or two old, because...

I think it is absolutely ridiculous that corporations can still extract money for something that was made in the 1940s and even earlier.

... with this I do agree.

[–] [email protected] 0 points 8 months ago (1 children)

So the moment a property breaks even, + makes “some profit”, you should no longer need to pay for it? Only when people still “care”, in that case they should pay?

That's not what I wrote, is it?

The problem with your idea here is that some movies/games/etc never make back the investment. That would mean that they would never run out of copyright if we did it that way. That some movies are duds also means that, on average, the rate of return on such investments is dragged down.

In a functioning market, the average ROI should be the same across the board. If something has a lower return, then people simply don't invest in it. That's clear, I hope. This means, that putting a cap on the profit that may be expected will reduce investments.

Obviously, the only returns that matter for this reasoning, are expected returns. Only the expected returns fund movies, etc., and that's why the timeframe matters.

At this point, ideology (or philosophy) becomes important. One has to ask: What is property about?

There are different philosophical views around this subject, but I am really only concerned with the practical outcome. The political right tends to hold an expansive, absolute view of property rights, to the point of rejecting taxation as illegitimate. The original definition of the political right was as supporters of the monarchy. It makes sense that the right would morph into something that supports all kinds of heritable privilege or right. The anti-capitalistic right seems to have largely disappeared. They often don't agree that intellectual property is property.

The left tends to hold more nuanced and pragmatic views. Property rights are balanced against other rights; the interests of other people and society at large. The US Constitution takes this view of copyrights and patents. [The United States Congress shall have power] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

This latter view is, more or less, one to which I subscribe. Without copyright, there would be only public domain. Copyright integrates creative works into a capitalistic system by turning them into capital. Intellectual property enables people to make a profit with intellectual products. However, there is a clear limit to how much profit one may extract. One may only expect that profit, which actually incentivizes intellectual production. I actually hold the general view that all (commercial) property is only legitimate as long as it works beneficially for society.

So, I do not believe that anyone is entitled to windfall profits. I have published stuff on the web for my own reasons. Other people have found a new use for that by creating AI training datasets. I do not believe I have any moral justification to demand a share of their work.

I hope that clears it up.

Clearly you do not agree with this view. Obviously, you have some more absolutist view of intellectual property. I would appreciate it if you laid out your view of things. You don't need to answer these questions, I'm just putting them here to say what is unclear: How does one create or obtain intellectual property? What can be intellectual property and what are the limits? To what does this property entitle one?

Finally: How come these threads bring out so much support for right wing views? Looking at other threads, I would have expected left wing views to dominate. Looking at the piracy community around the corner, I would have thought that even among the right, copyright abolitionist views would dominate here. What gives?

[–] [email protected] 2 points 8 months ago (1 children)

That's exactly what you wrote?

In order for this to work, the NYT needs to make back the money that they have paid these people, plus some profit for the owners. This has already been achieved for any issue that's older than a few days. Before the internet, either an issue sold enough or it didn't. No one cares about yesterday's news. I doubt the internet changes that very much. That's what I mean by "it's already paid for".

Your argument was that the sources that get scraped have already been paid for. I don't see how it's any different for newspapers than it is for movies and such. It's not like news agencies are eternally profitable and never go bankrupt. Nor do I want corporations to profit for free off the comments I wrote, even if I may or may not have signed my soul away in some EULA nobody reads.

[–] [email protected] 0 points 8 months ago (1 children)

I take it that my post was too long to read. The only thing I can do is write more, which obviously will not help. So there's nothing I can do.

I don't believe you actually want that right-wing hellhole you are clamoring for. But in the end, what counts is what you vote for, what you ask for, and not what you want inside.

[–] [email protected] 1 points 8 months ago (1 children)

You seem to have misinterpreted my "alignment", if you will. I do agree my arguments here leaned pretty heavily on the corporate side.

But many of these AI are either run or backed by these same massive corporations. Corporations who staunchly defend their own copyright, yet don't mind taking from the little guy and breaking their own unfair rules even further.

I am, generally, anti-AI. As may have been apparent. I wish not for my words to be vacuumed up into a black box to be spat back out at me.
Whilst I think some amount of copyright is fair, 80 years is far too long. Putting a cap on how profitable any property can be is an interesting take.

But that's not part of the conversation. It's wrong for AI companies to take whatever data they can get their hands on just because it's out there for human eyes to read. Whether that content has outlived its newsworthy usefulness or not.

[–] [email protected] 0 points 8 months ago (1 children)

I understand your "alignment" correctly.

You're obviously not reading anything I write.

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago) (1 children)

You are making baseless assumptions about me, though it is true I initially didn't particularly care to read the entirety of your comment.

Ultimately I don't care for the NYT. What I do care about is the starving artist whose work is being ripped off. I care about web crawlers not respecting any wishes of the creator and consent being forcefully taken.
If they wish not to partake, that wish ought to be respected. Better yet, it should be opt-in before your works are allowed to be used.

But the current society isn't about being fair. They can store your data for advertisement because you surely have nothing to hide and cannot be affected by targeted propaganda. They can use your work for their own means and charge a profit. You get to be happy you're allowed to exist at all to lick their boots. You will own nothing and be happy.

Cool, you're fine with your work being used by massive corporations to make their own profits off of your work. Not everyone may agree to that, and an artist should be able to control how their work is appropriated for some time.

I suppose it's my fault for not being able to voice these awful gut feelings properly. You equate my view of personal liberty with some sort of fascist mindset. You are wrong. And you, who care not for your own work, do not get to impose that view on others.

Next you'll call me wrong, saying you do care about your work. Which I'm sure you do, my statement was hyperbolic to some extent. But surely you must understand that your view of some sort of ROI cap does not match that of the corporations taking as they please. OpenAI suddenly stopped being so open when their model became popular.

[–] [email protected] 0 points 8 months ago (1 children)

What assumption am I making about you? I think I got it quite right.

I suppose it’s my fault for not being able to voice these awful gut feelings properly.

That's not the problem. The problem is that you are acting on gut feeling. Your policy preferences are based on gut feeling.

I am guessing that you want your future to be a certain way. You want future society to be a certain way. To get there, we need to take the right steps. But you're not thinking about that at all. You're just thinking about what steps you feel like taking.

That won't get you to where you want to be. You haven't even thought about where it will actually take you.

[–] [email protected] -1 points 8 months ago* (last edited 8 months ago) (1 children)

What assumption am I making about you?

Me being some radical right-winger, Mr. or Ms. AI-techbro.

The problem is that you are acting on gut feeling.

Is your "I don't mind my work being used in someone else's venture" any less of a gut feeling? I believe not.

You haven't even thought about where it will actually take you.

More of these baseless assumptions of yours, but going into future ramifications I may or may not have considered isn't part of this conversation.
You didn't even respond to my main points and instead latched onto what seems to you to be the weakest part of my argument. Are you reading my replies properly?

Companies taking whatever they please, be it data or otherwise, without oversight is problematic. Regardless of how much you personally enjoy being trampled on for the sake of "progress" or not.

[–] [email protected] 1 points 8 months ago (1 children)

Me being some radical right-winger,

I have never claimed that. I explicitly wrote that I don't believe you want what you are clamoring for.

You didn’t even respond to my main points

You do not read long posts, remember?

The defining feature of rich people is that they own a lot of property. When you make it so that more money must be paid to property owners, you disproportionately benefit the rich.

[–] [email protected] 1 points 8 months ago (1 children)

You do not read long posts, remember?

"though it is true I initially didn't [...]"

That said, I read it again, I suppose I have been uncharitable. You make some good points, and perpetual ironclad intellectual property hoarded by massive corporations isn't something my current views adequately address.
But just because I don't have an answer to that doesn't mean I have to agree with AI companies scraping every last corner of the internet for their datasets.

You say you disagree with property owners always receiving compensation for their work being used.
To some extent I agree with your disagreement.

Even so I cannot view AI companies taking the work of whomever they please without compensation as morally justifiable. Especially if those artists are small and have no way to defend themselves.
IP hoarders are a separate issue.

[–] [email protected] 1 points 8 months ago (1 children)

But just because I don’t have an answer to that doesn’t mean I have to agree with AI companies scraping every last corner of the internet for their datasets.

You don't have to agree. It's a value judgement. What is important to you? There is no correct answer.

My conviction is that property is mainly a means to an end. That end is human well-being, but if you pressed me on what exactly that means, I'd start flailing.

You can believe that intellectual property is fundamentally important. Mind that what you think of as intellectual property is probably broader/different from copyright in law. You can say that enforcing this kind of property right is an end in itself, that justifies the terrible consequences. Small artists would get shafted one way or the other.

[–] [email protected] 1 points 8 months ago (1 children)

Small artists would get shafted one way or the other.

And ideally they wouldn't. Letting AI companies take as they please is one part of that, therefore it should be stopped.

Injustice is such a frustrating thing. But when the opposing party has trillions of dollars and you draw in your free time there's literally nothing you can do. So much for equality.

[–] [email protected] 1 points 8 months ago (1 children)

That's a result of your values. Your views on property are incompatible with equality.

You made the assumption that I do not care if my writings are used for AI training but I actually do. I like it. I like knowing that I helped other people. I feel the same way about taxes, but this is better since it does not cost me anything.

This may be too long but here's a quick overview of what your views on property mean for small artists.

Per Google, Getty Images' archive is the largest privately-owned photographic archive in the world, containing over 130 million images dating back to the beginning of photography and beyond. Unsurprisingly, Getty is suing over AI.

How many images does your small artist own? A few dozen? A few hundred?

So when your small artist gets a few dollars, Getty gets many millions. Of course, they won't be getting the same per image. Getty can pay lawyers millions to negotiate and there will still be many millions left in profit. Your small artist can't do that. Even the negotiation would cost more than their images are worth. They can only upload their images to Adobe or Shutterstock and accept whatever they are given.

Even the most selfless non-profit would have to take a big chunk just to handle the cost of running the website, dealing with copyright infringement, bad quality images, "naughty" images, track payment information, handle the money,... But why should they be selfless? After all, the website is basically their property.

Now we reach the point where it gets bad.

Remember that the rent for these images does not create anything of value. No one is paid to make anything new. Money is transferred to property owners, because they own property. It ends up mainly with rich people, because they own so much property. Much of the money for "small artists" is wasted on bureaucracy. A good chunk also ends up with rich people, because middle men are unavoidable.

Since we are mainly transferring and not creating wealth, it must come from somewhere. It comes from subscription fees for AI services. It can't come from anywhere else, right?

For example, a subscription for Photoshop has to include these fees. What Photoshop calls generative fill is genAI.

Now riddle me this: Who pays subscriptions for Photoshop?

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago) (1 children)

Now riddle me this: Who pays subscriptions for Photoshop?

Ah, my favourite argument.

"Clothing brands are using slavery to produce clothes."

- "They ought to be produced sustainably!"

"Who's going to pay for that?"

Perhaps generative AI models, as they currently exist, are too good to be true.

If they can only afford to pay individual artists pennies, perhaps it is something that shouldn't be taken. If those artists are happy for their work to contribute to someone else's MRR scheme then they should be free to submit their work wherever, or tag it with something.

I don't care what Getty Images gets paid. I don't know how the licensing works for that platform, but Getty doesn't make all of its own images and those photographers and artists who contribute ought to be compensated in turn.
If that makes the work infeasible, so be it.

But if that's a system that gets implemented now then the existing companies would have an insurmountable headstart. Should they be forced to completely wipe their work?

And Getty shouldn't be making money off pictures from the 1800s. I agree with that too.

And I understand no "value is generated" by paying property owners. I guess you should give your books away to people on the street since the value has already been created by writing them and there's no more point in selling them.
Not literally, of course. But if individual artists don't make any money and/or reputation off their already made work they won't be able to continue "generating new value", I guess they'll have to find some office job, or head out to the not-yet-automated mines.

It's a shit situation which is profiting off a free and open internet. An internet which is slowly closing itself off further and further.

And I think, with all due respect, you're misguided for being happy to contribute to your own replacement. If you sell your books on Amazon: Amazon is getting flooded with AI-generated books, making it even more difficult for yours to stand out in the sea of regurgitated garbage. Maybe you personally have a system for getting around that, not everybody does. Alternatively you don't rely on the income of your books, but at that point why bother publishing at all? Might as well send them directly to OpenAI or whomever.

Edit: Oh, better yet, because I just ran into it: Getty/Adobe Stock licensing you generated images! I'd laugh if it didn't make me want to cry.

[–] [email protected] 0 points 8 months ago (1 children)

Edit: Oh, better yet, because I just ran into it: Getty/Adobe Stock licensing you generated images! I’d laugh if it didn’t make me want to cry.

If that's not what you want, you really should think about what you support.

Your ideas mean that wealth must be transferred to property owners. This wealth has to come from somewhere. It must be created through work.

Wealth is taken from workers and given to owners. That's what you are demanding.

Where images are concerned, wealth must go from artists to owners.

[–] [email protected] 1 points 8 months ago

Thank you for reading my post, only responding to a single point, and making a strawman of the rest of my argument.

Why do I even bother.

Enjoy replacing yourself.

[+] [email protected] -6 points 8 months ago* (last edited 8 months ago) (1 children)

[removed by mod]

[–] [email protected] 4 points 8 months ago (2 children)

It looks like someone hasn't seen the video of Copilot spitting out the Quake inverse sqrt algorithm verbatim.

[–] [email protected] 0 points 8 months ago (1 children)

While it got popularised as "Carmack's reverse" the algorithm is actually significantly older.

Also you'd have to show that it was literally copy+pasted, including comments and all, to even have a chance at a copyright claim: Algorithms are not subject to copyright, similar to how story structures aren't. This is like saying "I asked an author to write a book and they plagiarised the hero's arc!". And even if it was copied straight-out you'd have an uphill battle to fight, to wit, Wikipedia is quoting the thing verbatim.

That said, Copilot seems to be severely over-fitted in places, and I don't like the thing one single bit, and the only thing it's generally good at is writing code faster that shouldn't have been written in the first place, but inverse sqrt isn't a good example.

[–] [email protected] 3 points 8 months ago* (last edited 8 months ago) (2 children)

It didn't just get the gist of the algorithm though, it literally had the same magic number (which isn't even the most optimal iirc), the same COMMENTS (//what the fuck?), same variable names, etc.

It didn't produce the algorithm logically, it copied it.

Wikipedia is also adhering to the GPL license of the code. Copilot is not, especially if it's working on proprietary code or adding an MIT license header to copied GPL code (lol)

[–] [email protected] 1 points 8 months ago (1 children)

It didn’t produce the algorithm logically, it copied it.

The magic number is part of the logic of the thing.

But yes, as said, Copilot is overfitted. Inverse sqrt still isn't a good example, it's nearly as bad as Oracle trying to claim to have found copyright infringement in Android's standard Java library by saying that Math.average or whatnot is identical. There are way better examples of why Copilot is fucked up.

[–] [email protected] 1 points 8 months ago

The magic number is part of the logic, yes, but that's not even the best magic number for the job iirc, and nobody remembers how they got it.

I just used this as an example because it's incredibly clear that it was copied verbatim (again, comments like "what the fuck?" showing up, you can't tell me it came up with that itself)
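For anyone who hasn't seen the algorithm being argued about, here is a minimal sketch of the technique in Python. It is an illustration written for this thread, not the Quake source: the function name `fast_inv_sqrt` is made up, and `struct` stands in for the pointer cast the C version uses. The constant 0x5F3759DF is the well-known "magic number".

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the bits of x (as a 32-bit float) as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Subtracting half the bit pattern from the magic constant gives a rough first guess.
    bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inv_sqrt(4.0))  # close to 0.5, but not exact
```

The logic itself is only a handful of operations, which is why the argument here turns on the incidental details (comments, variable names, the specific constant) rather than on the algorithm.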

[–] [email protected] 1 points 8 months ago

I had Bing Chat spit back at me the question I posted on Stack Overflow the day before. You know, the example code I provided which didn't exactly work as I wanted.

[–] [email protected] 6 points 8 months ago

If anything, my take home message is that the reach of copyright law is too long and needs to be taken down a peg.

Exactly! Copyright law is terrible. We need to hold AI companies to the same standard that everyone else is held. Then we might actually get big corporations lobbying to improve copyright law for once. Giving them a free pass right now would be a terrible waste of an opportunity in addition to being an injustice.

[–] [email protected] 1 points 8 months ago

I think the photocopying thing models fairly well with user licenses for software. Without commenting on whether that's right in the grand scheme of things, I can see that as analogous. Most folks accept that they need individual user licenses for software, right? I get that photocopying can't be controlled the same way software can but the case was in the 90s? I mean these things aren't about whether the provider of the article/software faces increased marginal cost for additional copies/users but that the user/company is getting more use than they paid for. License agreements. Seems like a problem with the terms of licenses and laws rather than how they were judged as following them or not. Their use didn't seem to be transformative and the for-profit nature of their use sort of overruled the "research" fair use.

I also think the mp3.com thing sucks, but again, the way the law is, that's a reasonable/logical outcome. Same thing that will kill someone offering ebooks to people who show a proof of purchase.

I don't know the solution to the situation with NYT/OpenAI. It's a pretty bad look to be able to spit out an article nearly verbatim. We do need copyright reform, but I think that's at the feet of the legislators, not judges. I only need to see the recent Alabama IVF court ruling to be reminded of the danger of more... interpretative rulings.