overview for 31337

‘It’s torture’: brutal heat broils Texas prisons, killing dozens of inmates in c/[email protected]

[–] 31337 4 points 6 months ago (1 children)

I don't know anything about them, but I found these organizations that you can donate to and volunteer for: https://www.tpcadvocates.org/ https://tifa.org/ https://www.texasjailproject.org/

Donating to and canvassing for politicians that would change this could also help.

Supporting good journalism can also help. I really like the Texas Observer.

Get Ready Now: Republicans Will Refuse to Certify a Harris Win | Trumpist county election officials are preparing to throw the process into chaos. in c/[email protected]

[–] 31337 1 points 6 months ago* (last edited 6 months ago)

There are prediction markets and bookies making odds as well. People putting money on the line are probably a little more accurate than polls by themselves. It looks like people think the odds currently favor Harris, but not by a large margin: a 50.9% chance for Harris and 47.1% for Trump (https://www.forbes.com/sites/siladityaray/2024/08/09/harris-has-vaulted-past-trump-as-the-bookies-favorite-to-win-presidential-election/). IIRC, prediction markets significantly favored Clinton in 2016 right before the results came back, though.

Training "AI" On Public Data Is Totally Fine And Not Stealing. in c/[email protected]

[–] 31337 -3 points 6 months ago

Yeah, I think that's the current precedent in the US.

Training "AI" On Public Data Is Totally Fine And Not Stealing. in c/[email protected]

[–] 31337 -1 points 6 months ago (1 children)

Incidentally, I read this a while ago, because I was training a classifier on mostly Creative Commons licensed works: https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

... we believe there are strong arguments that, in most cases, using copyrighted works to train generative AI models would be fair use in the United States, and such training can be protected by the text and data mining exception in the EU. However, whether these limitations apply may depend on the particular use case.

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI in c/[email protected]

[–] 31337 2 points 6 months ago

The EFF link I posted above provides evidence. Again, here's a quote from part of it:

The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.

As I mentioned before, Copilot, at least, helps people avoid copyright infringement by notifying them if their code is similar to public code. The solution I'm proposing is no new laws, just enforcing the ones we have. Most of the laws being proposed look like attempts at regulatory capture to me.

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI in c/[email protected]

[–] 31337 2 points 6 months ago (2 children)

That we already have laws that protect against copyright infringement (which seem like they would still apply whether or not the output was spit out by an LLM), and no more should be made. That training on public data is fine.

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI in c/[email protected]

[–] 31337 2 points 6 months ago (4 children)

I'm saying using code for training is a different issue than copyright infringement. I edited my post above to better lay out my position.

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI in c/[email protected]

[–] 31337 1 points 6 months ago* (last edited 6 months ago) (6 children)

I stated that they can do this, and asked if they could be sued if they used near-verbatim code generated from an LLM, just like they could be sued if they copy-pasted AGPL code.

Edit: Tools like Copilot tell you if your code is similar to publicly available code so you can avoid these issues.

Edit: Just looked up EFF's position and I tend to agree with it:

Artificial Intelligence and Copyright Law

Artists are understandably concerned about the possibility that automatic image generators will undercut the market for their work. However, much of what is criticized is already considered fair use under copyright law, even if done at scale. Efforts to change copyright law to transform certain fair uses into infringement carry serious implications, are likely to interfere with the innovative potential of AI tools, and ultimately do not benefit artists. In fact, the use of these tools could expand the capacity of artists to create expressive works. Policymakers should emphasize the importance of human labor and investment in what receives copyright protection to maintain wages and dignity. Artists should be protected from efforts by large corporations to both substitute their labor with AI tools and create a new, unnecessary copyright regime around AI-generated art.

Machine Learning is a Fair Use

The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.

The analysis work underlying the creation and use of training sets is like the process to create search engines. Where the search engine process is fair use, it is very likely that processes for machine learning are too. While the act of analysis may potentially implicate copyright, when that act is a necessary step to enabling a non-infringing use, it regularly qualifies as fair use. If the intermediate step were not permitted, fair use would be ineffective. As such, when factual elements of copyrighted works are studied and processed to create training sets—which, once again, is how we humans learn and are inspired by themes and styles in art and other works—that is likely to be found a fair use.

https://www.eff.org/document/eff-two-pager-ai

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI in c/[email protected]

[–] 31337 2 points 6 months ago (8 children)

After all, if an “AI” model, open source or not, is allowed to just “train” on my AGPL code and spit it back (with minor modifications at best) to an engineer in AWS that’s it for my project. Amazon will do the Amazon thing and steal the project. So say goodbye to any software freedom we have.

An engineer at AWS can already just copy your code, make minor modifications, and use it. I would think the same legal recourse would apply whether it was output by an LLM or just copy-pasted? This seems like a tangential issue to whether the LLM was trained on your code or not (not training on your code obviously reduces the probability of the LLM spitting it back out near-verbatim, though). Personally, I don't see anything wrong with anyone using public code to build statistical models. And I think the pay-to-scrape models that Reddit, Xitter, and others are employing will help big tech build the "moat" they're looking for. Big tech is asking for AI regulation for similar reasons.

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI in c/[email protected]

[–] 31337 3 points 6 months ago (1 children)

Information wants to be free.

Rioting for the Right to Rape Palestinians in c/[email protected]

[–] 31337 40 points 7 months ago (3 children)

Maybe being biased against rape and torture is a good thing ¯\_(ツ)_/¯. Many newspapers used neutral and "objective" language in the 1800s when covering the lynching of black people, and it's hypothesized that this helped normalize the practice. There are many valid criticisms against "journalistic objectivity."

Also, be mindful that ChatGPT is intentionally biased through training data selection, RLHF, and many guardrails.

Simple, really in c/[email protected]

[–] 31337 2 points 7 months ago

Is what humans have always done. Capitalism has so many contradictions, we have entire legal and regulatory systems and social programs in place to make it viable, and governments still have to bail it out with our tax dollars every 10 years or so.