this post was submitted on 28 Aug 2024
62 points (100.0% liked)

Open Source

cross-posted from: https://lemmy.ndlug.org/post/1040526

A judge has dismissed the majority of claims in a copyright lawsuit filed by developers against GitHub, Microsoft, and OpenAI.

The lawsuit was initiated by a group of developers in 2022 and originally made 22 claims against the companies, alleging copyright violations related to the AI-powered GitHub Copilot coding assistant.

Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.

...

Despite this significant ruling, the legal battle is not over. The remaining claims regarding breach of contract and open-source license violations are likely to continue through litigation.

top 6 comments
[–] [email protected] 27 points 3 months ago

I hope "normal people" start exploiting that decision too: training AI to consume stuff from big corporations and using the result to create open-source/copyright-free stuff from copyrighted works.

[–] mindbleach 18 points 3 months ago (2 children)

I have to admit - my initial outrage over Copilot training on open-source code has vanished.

Now that these networks are trained on literally anything they can grab, including extremely copyrighted movies... we've seen that they're either thoroughly transformative soup, or else the worst compression and search tools you've ever seen. There's not really a middle ground. The image models where people have teased out lookalike frames for Dune or whatever aren't good at much else. The language models that try to answer questions as more than dream-sequence autocomplete poetry will confidently regurgitate dangerous nonsense because they're immune to sarcasm.

The comparisons to a human learning from code by reading it are half-right. There are systems that discern relevant information without copying specific examples. They're just utterly terrible at applying that information. Frankly, so are the ones copying specific examples. Once again, we've advanced the state of "AI," and the A went a lot further than the I.

And I cannot get offended on Warner Brothers' behalf if a bunch of their DVDs were sluiced into a model that can draw Superman. I don't even care when people copy their movies wholesale. Extracting the essence of an iconic character from those movies is obviously a transformative use. If some program will emit "slow motion zoom on Superman slapping Elon Musk," just from typing that, that's cool as hell and I refuse to pretend otherwise. It's far more interesting than whatever legal fictions both criminalized 1700s bootlegging and encouraged Walt Disney's corpse to keep drawing.

So consider the inverse:

Someone trains a Copilot clone on a dataset including the leaked Windows source code.

Do you expect these corporations to suddenly claim their thing is being infringed upon, in front of any judge with two working eyes?

More importantly - do you think that stupid robot would be any help whatsoever to Wine developers? I don't. These networks are good at patterns, not specifics. Good is being generous. If I wanted that illicit network to shamelessly clone Windows code, I expect the brace style would definitely match, the syntax might parse, and the actual program would do approximately dick.

Neural networks feel like magic when hideously complex inputs have sparse approximate outputs. A zillion images could satisfy the request, "draw a cube." Deep networks given a thousand human examples will discern some abstract concept of cube-ness... and also the fact you handed those thousand humans a blue pen. It's simply not a good match for coding. Software development is largely about hideously complex outputs that satisfy sparse inputs in a very specific way. One line, one character, can screw things up in ways that feel incomprehensible. People have sneered about automation taking over coding since the punched-tape era, and there are damn good reasons it keeps taking their jobs instead of ours. We're not doing it on purpose. We're always trying to make our work take less work. We simply do not know how to tell the machine to do what we do with machines. And apparently - neither do the machines.

[–] [email protected] 3 points 3 months ago

Excellent post. Thanks for sharing. I pretty much completely agree.

[–] [email protected] 2 points 3 months ago* (last edited 3 months ago)
[–] [email protected] 10 points 3 months ago

Copyright exists to bully ordinary people. It does not apply to megacorps.

[–] heavy 8 points 3 months ago

When you're a star, they let you do it.