this post was submitted on 28 Jul 2023
456 points (93.5% liked)

Technology

57453 readers
4586 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below; to ask if your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots


founded 1 year ago
MODERATORS
 

OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models.::In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

top 50 comments
[–] [email protected] 133 points 1 year ago (2 children)

Text written before 2023 is going to be exceptionally valuable because that way we can be reasonably sure it wasn’t contaminated by an LLM.

This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any radiation anywhere. However, after America and the Soviet Union started nuking stuff like there’s no tomorrow, pretty much all steel on Earth has been a little bit contaminated. Not a big issue for normal people, but scientists building super sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel.

[–] [email protected] 45 points 1 year ago (1 children)

The background radiation did go up, but saying "there was hardly any radiation anywhere" is wrong. Today's steel (and background radiation) is pretty much back to pre-nuke levels. See: low-background steel, background radiation.

[–] [email protected] 25 points 1 year ago (2 children)

It is also worth noting that we can still make steel with little or no radiation contamination; it's just really expensive and hard, and it happens in very low quantities.

[–] [email protected] 6 points 1 year ago (1 children)

Not really. If it's truly impossible to tell the text apart, then it doesn't really pose a problem for training AI. Otherwise, next-gen AI will be able to tell apart text generated by current-gen AI, and it will get filtered out. So only the most recent data will have unfiltered shitty AI-generated stuff, but they don't train AI on super-recent text anyway.

[–] [email protected] 26 points 1 year ago (10 children)

This is not the case. Model collapse is a studied phenomenon for LLMs and leads to deteriorating quality when models are trained on the data that comes from themselves. It might not be an issue if there were thousands of models out there but there are only 3-5 base models that all the others are derivatives of IIRC.
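The feedback loop behind model collapse can be sketched with a toy experiment. This is a hedged cartoon, not an LLM benchmark: the "model" here is just a fitted Gaussian, and each generation trains only on samples drawn from its predecessor, so estimation noise compounds with no fresh real data to correct it:

```python
import numpy as np

# Toy model collapse: generation 0 is the "human data" distribution.
# Every later generation fits a Gaussian to samples drawn from the
# previous generation's fit, i.e., it trains purely on synthetic data.
rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # generation 0: real data, N(0, 1)
n = 50                 # training samples per generation
history = [sigma]

for generation in range(2000):
    samples = rng.normal(mu, sigma, size=n)    # synthetic training set
    mu, sigma = samples.mean(), samples.std()  # refit on synthetic data
    history.append(sigma)

print(f"sigma: generation 0 = {history[0]:.3f}, generation 2000 = {history[-1]:.2e}")
```

The fitted spread collapses toward zero over generations: each refit introduces a small error, and with nothing but its own output to learn from, the model drifts into a degenerate distribution.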

[–] [email protected] 51 points 1 year ago (9 children)

The wording of every single article has such an anti-AI slant, and I've felt the propaganda really working this past half year. Still nobody cares about advertising companies, but LLMs are the devil.

Existing datasets still exist. The bigger focus is on crossing modalities and refining content.

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don't even know half of how the machines work or what they are doing.

[–] [email protected] 65 points 1 year ago (4 children)

Probably because LLMs threaten to (and have already started to) shittify a truly incredible number of things, like journalism, customer service, books, and scriptwriting, all in the name of increased profits for a tiny few.

[–] [email protected] 53 points 1 year ago (5 children)

again, the issue isn't the technology, but the system that forces every technological development into functioning "in the name of increased profits for a tiny few."

that has been an issue for the fifty years prior to LLMs, and will continue to be the main issue after.

removing LLMs or other AI will not fix the issue. why is it constantly framed as if it would?

we should be demanding the system adjust for the productivity increases we've already seen, as well to what we expect in the near future. the system should make every advancement a boon for the general populace, not the obscenely wealthy few.

even the fears of propaganda. the wealthy can already afford to manipulate public discourse beyond the general public's ability to keep up. the bigger issue is in plain sight, but is still being largely ignored for the slant that "AI is the problem."

[–] [email protected] 20 points 1 year ago

Yep, the problem was never LLMs, but billionaires and the rich. The problem has always been the rich for thousands of years, and yet they have been immensely successful at deflecting attacks onto other groups for those thousands of years. They will claim it's Chinese immigrants, or blacks, or Mexicans, or gays, or trans people. Now LLMs and AI are the new boogeyman.

We should be talking about UBI, not LLMs.

[–] [email protected] 18 points 1 year ago (17 children)

It’s a capitalism problem not an AI or copyright problem.

[–] [email protected] 6 points 1 year ago

This isn’t a technological issue, it’s a human one

I totally agree with everything you said, and I know that it will never ever happen. Power is used to get more power. Those in power will never give it up, only seek more. They intentionally frame the narrative to make the more ignorant among us believe that the tech is the issue rather than the people that own the tech.

The only way out of this loop is for the working class to rise up and murder these cunts en masse

Viva la revolucion!

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (1 children)

Exactly. I work in AI (although not the LLM kind, just applying smaller computer vision models), and my belief is that AI can be a great liberator for humanity if we have the right political and economic apparatus. The question is what that apparatus is. Some will say it's an inherent feature of capitalism, but that's not terribly specific, nor does it explain the relatively high wealth equality that existed briefly during the middle of the 20th century in America. I think some historical context is important here.

Historical Precedent

During the Industrial Revolution, we had unprecedented growth in average labor productivity due to automation. From a naïve perspective, we might expect increasing labor productivity to result in improved quality of life and fewer working hours, i.e., the spoils of that productivity being felt by all.

But what we saw instead was the workers lived in squalor and abject poverty, while the mega-rich captured those productivity gains and became stupidly wealthy.

Many people at the time took note of this and sought to answer this question: why, in an era of greater-than-ever labor productivity, is there still so much poverty? Clearly all that extra wealth is going somewhere, and if it's not going to the working class, then it's evidently going to the top.

One economist and philosopher, Henry George, wrote a book exploring this very question, Progress and Poverty. His answer, in short, was rent-seeking:

Rent-seeking is the act of growing one's existing wealth by manipulating the social or political environment without creating new wealth.[1] Rent-seeking activities have negative effects on the rest of society. They result in reduced economic efficiency through misallocation of resources, reduced wealth creation, lost government revenue, heightened income inequality,[2] risk of growing political bribery, and potential national decline.

Rent-seeking takes many forms. To list a few examples:

  • Land speculation
  • Monopolization of finite natural resources (e.g., oil, minerals)
  • Offloading negative externalities (e.g., pollution)
  • Monopolization of intellectual property
  • Regulatory capture
  • Monopolistic or oligopolistic control of entire markets

George's argument, essentially, was that the privatization of the economic rents borne of god-given things — be it land, minerals, or ideas — allowed the rich and powerful to extract all that new wealth and funnel it into their own portfolios. George was not the only one to blame these factors as the primary drivers of sky-high inequality; Nobel-prize winning economist Joseph Stiglitz has stated:

Specifically, I suggest that much of the increase in inequality is associated with the growth in rents — including land and exploitation rents (e.g., arising from monopoly power and political influence).

George's proposed remedies were a series of taxes and reforms to return the economic rents of those god-given things to society at large. These include:

  • Land value taxes:

Land value taxes are generally favored by economists as they do not cause economic inefficiency, and reduce inequality.[2] A land value tax is a progressive tax, in that the tax burden falls on land owners, because land ownership is correlated with wealth and income.[3][4] The land value tax has been referred to as "the perfect tax" and the economic efficiency of a land value tax has been accepted since the eighteenth century.

  • Pigouvian taxes:

A Pigouvian tax (also spelled Pigovian tax) is a tax on any market activity that generates negative externalities (i.e., external costs incurred by the producer that are not included in the market price). The tax is normally set by the government to correct an undesirable or inefficient market outcome (a market failure) and does so by being set equal to the external marginal cost of the negative externalities. In the presence of negative externalities, social cost includes private cost and external cost caused by negative externalities. This means the social cost of a market activity is not covered by the private cost of the activity. In such a case, the market outcome is not efficient and may lead to over-consumption of the product.[1] Often-cited examples of negative externalities are environmental pollution and increased public healthcare costs associated with tobacco and sugary drink consumption.[2]

  • Severance taxes:

Severance taxes are taxes imposed on the removal of natural resources within a taxing jurisdiction. Severance taxes are most commonly imposed in oil producing states within the United States. Resources that typically incur severance taxes when extracted include oil, natural gas, coal, uranium, and timber. Some jurisdictions use other terms like gross production tax.

such as in the Norwegian model:

The key to Norway’s success in oil exploitation has been the special regime of ownership rights which apply to extraction: the severance tax takes most of those rents, meaning that the people of Norway are the primary beneficiaries of the country’s petroleum wealth. Instead of privatizing the resource rents provided by access to oil, companies make their returns off of the extraction and transportation of the oil, incentivizing them to develop the most efficient technologies and processes rather than simply collecting the resource rents. Exploration and development is subsidized by the Norwegian government in order to maximize the amount of resource rents that can be taxed by the state, while also promoting a highly competitive environment free of the corruption and stagnation that afflicts state-controlled oil companies.

  • Intellectual property reform, e.g., abolishing patents and instead subsidizing open R&D, similar to a Pigouvian anti-tax (research has positive externalities) or Norway's subsidization of oil exploration
  • Implementation of a citizen's dividend or universal basic income, e.g., the Alaska permanent fund or carbon tax-and-dividend:

Citizen's dividend is a proposed policy based upon the Georgist principle that the natural world is the common property of all people. It is proposed that all citizens receive regular payments (dividends) from revenue raised by leasing or taxing the monopoly of valuable land and other natural resources.

...

This concept is a form of universal basic income (UBI), where the citizen's dividend depends upon the value of natural resources or what could be titled as common goods like location values, seignorage, the electromagnetic spectrum, the industrial use of air (CO2 production), etc.[4]

In 1977, Joseph Stiglitz showed that under certain conditions, beneficial investments in public goods will increase aggregate land rents by at least as much as the investments' cost.[1] This proposition was dubbed the "Henry George theorem", as it characterizes a situation where Henry George's 'single tax' on land values, is not only efficient, it is also the only tax necessary to finance public expenditures.[2] Henry George had famously advocated for the replacement of all other taxes with a land value tax, arguing that as the location value of land was improved by public works, its economic rent was the most logical source of public revenue.[3]

...

Subsequent studies generalized the principle and found that the theorem holds even after relaxing assumptions.[4] Studies indicate that even existing land prices, which are depressed due to the existing burden of taxation on labor and investment, are great enough to replace taxes at all levels of government.[5][6][7]

(continued)

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (4 children)

Present Day

Okay, so that's enough about the past. What about now?

Well, monopolization of land and housing via the housing crisis has done tremendous harm:

In 2015, two talented professors, Enrico Moretti at Berkeley and Chang-Tai Hsieh at Chicago Booth, decided to estimate the effect of shortage of housing on US productivity. They concluded that lack of housing had impaired US GDP by between 9.5 per cent and 13.5 per cent.

In a follow-up paper, based on surveying 220 metropolitan areas, they revised the figure upwards – claiming that housing constraints lowered aggregate US growth by more than 50 per cent between 1964 and 2009. In other words, they estimate that the US economy would have been 74 per cent larger in 2009, if enough housing had been built in the right places.

How does that damage happen? It’s simple. The parts of the country with the highest productivity, like New York and San Francisco, also had stringent restrictions on building more homes. That limited the number of homes and workers who could move to the best job opportunities; it limited their output and the growth of the companies who would have employed them. Plus, the same restrictions meant that it was more expensive to run an office or open a factory, because the land and buildings cost more.

And that is just one form of rent-seeking. Imagine the collective toll of externalities (e.g., the climate crisis), monopolistic/oligopolistic markets such as energy and communications, monopolization of valuable intellectual property, etc.

So I would tend to say that — unless we change our policies to eliminate the housing crisis, properly price in externalities, eliminate monopolies, encourage the growth of free and open IP (e.g., free and open-source software, open research, etc.), and provide critical public goods/services such as healthcare and education and public transit — we are on a trajectory for AI to be Gilded Age 2: Electric Boogaloo. AI merely represents yet another source of productivity growth, and its economic spoils will continue to be captured by the already-wealthy.

I say this as someone who works as an AI and machine learning research engineer: AI alone will not fix our problems; it must be paired with major policy reform so that the economic spoils of progress are felt by all, not just the rich.

Joseph Stiglitz, in the same essay I referred to earlier, has this to say:

My analysis of market models suggests that there is no inherent reason that there should be the high level of inequality that is observed in the United States and many other advanced countries. It is not a necessary feature of the market economy. It is politics in the 21st century, not capitalism, which is at fault. Market and political forces have, of course, always been intertwined. Especially in America, where our politics is so money-driven, economic inequalities translate into political inequality.

There is nevertheless considerable hope. For if the growth of inequality was largely the result of inexorable economic laws, public policy could do little more than lean against the wind. But if the growth of inequality is the result of public policy, a change in those policies could lead to an economy with less inequality, and even stronger growth.

[–] [email protected] 4 points 1 year ago

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don’t even know half of how the machines work or what they are doing.

Yah I think it's fairly obvious that people are both fascinated and scared by the tech and also acknowledge that under a different economic structure, it would be extremely beneficial for everyone and not just for the very few. I think it's more annoying that people like you assume that everyone is some sort of diet Luddite when they're just trying to see how the tool has the potential to disrupt many, many jobs and probably not in a good way. And don't give me this tired comparison about the industrial revolution because it's a complete false equivalence.

[–] [email protected] 29 points 1 year ago (1 children)

We built a machine to mimic human writing. There's going to be a point where there is no difference. We might already be there.

[–] [email protected] 12 points 1 year ago (1 children)

The machine used to mimic human text is trained on human text. If it can't tell the difference between its own text and human text, it will begin using AI text to mimic human text. This will eventually lead to errors, repetitions, and/or less human-like text.

[–] [email protected] 25 points 1 year ago* (last edited 1 year ago) (7 children)

Predictable issue if you know the fundamental technology that goes into these models. Hell, it should have been obvious to the layperson that it was headed this way once they saw the videos and heard the audio.

We're less sensitive to patterns in massive data; the point at which we can't tell fact from AI fiction comes before the point at which these machines can't tell. Good luck to the FB aunts.

A GAN's final goal is to develop content that is indistinguishable... Are we surprised?

Edit since the person below me made a great point: GANs may be limited, but there's nothing that says you can't set up a generator and detector LLM pair for the sole purpose of improving the generator.

[–] [email protected] 22 points 1 year ago (2 children)

For laymen who might not know how GANs work:

Two AIs are developed at the same time: one that generates and one that discriminates. The generator creates a dataset, it gets mixed in with some real data, then all of that gets fed into the discriminator, whose job is to say "fake or not".

Both AI get better at what they do over time. This arms race creates more convincing generated data over time. You know your generator has reached peak performance when its twin discriminator has a 50/50 success rate. It's just guessing at that point.

There literally cannot be a better AI at detecting that generator's work than its twin discriminator. So anyone trying to make tools to detect ChatGPT's writing is going to have a very hard time of it.
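The 50/50 equilibrium can be checked numerically. In this hedged sketch (not a real GAN), real data is a standard Gaussian and "generated" data is a shifted Gaussian standing in for generator output; we score the Bayes-optimal discriminator, the best any detector could possibly do, as the generator's distribution converges on the real one:

```python
import numpy as np

# Real data ~ N(0, 1); generated data ~ N(shift, 1). For equal-variance
# Gaussians with equal priors, the Bayes-optimal rule labels a sample
# "fake" when it lies closer to the fake mean, i.e., when x > shift / 2.
rng = np.random.default_rng(42)
n = 200_000
accuracies = []

for shift in (2.0, 1.0, 0.5, 0.0):
    real = rng.normal(0.0, 1.0, n)
    fake = rng.normal(shift, 1.0, n)
    threshold = shift / 2
    correct = np.sum(real <= threshold) + np.sum(fake > threshold)
    accuracies.append(correct / (2 * n))
    print(f"generator shift {shift:.1f}: optimal discriminator accuracy {accuracies[-1]:.3f}")
```

As the shift goes to zero, even this perfect discriminator's accuracy falls to roughly 0.5, a coin flip, which is exactly the equilibrium described above.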

[–] [email protected] 6 points 1 year ago

Fantastically put!

[–] [email protected] 22 points 1 year ago

On the one hand, our AI is designed to mimic human text; on the other hand, we want to detect AI-generated text that was designed to mimic human text. These two goals don't align at a fundamental level.

[–] [email protected] 12 points 1 year ago (3 children)

So every accusation of cheating/plagiarism etc. and the resulting bad grades need to be revised because the AI checker incorrectly labelled submissions as "created by AI"? OK.

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago)

I laughed pretty hard when South Park did their ChatGPT episode. They captured the school response accurately, with the shaman doing whatever he wanted in order to find content "created by AI."

[–] [email protected] 10 points 1 year ago

I mean, the entire goal of the technology was to create human-like text.

[–] [email protected] 8 points 1 year ago

Relax, everybody. I have figured out the solution. We pass a law that all AI generated text has to be in Pig Latin or Ubbi Dubbi.

[–] [email protected] 8 points 1 year ago

This just illustrates the major limitation of ML: access to reliable training data. A machine that has no capacity for internal reasoning can never be truly trusted to solve novel problems, and novel problems, from minor issues to very complex ones, are solved in a bunch of professions every day. That's what drives our world forward. If we rely too heavily on AI to solve problems for us, the issue of obtaining reliable training data to train future AIs will only grow. That's why I currently don't think AIs will replace large swaths of the workforce, but will to a larger degree be used as a tool by the humans in it.

[–] [email protected] 6 points 1 year ago (2 children)

I wonder why Google is still not considering buying Reddit and other forums where personal discussion takes place and the user base sorts quality content free of charge. It has been established already that Google queries are way more useful when coupled with Reddit.

[–] [email protected] 13 points 1 year ago (1 children)

Making Google better is not Google's goal. Growth is their goal.

[–] [email protected] 7 points 1 year ago

I'm honestly under the impression Google Search is one of their less valuable products, even if it's the one everyone associates the company's name with.

[–] [email protected] 4 points 1 year ago (1 children)

Why buy it when you can get the same data for free?

[–] [email protected] 4 points 1 year ago

Why buy data for accuracy when you don't care and can support your company with SEO spam?

[–] [email protected] 5 points 1 year ago

I wonder if AI generated texts (or speech) will impact our language. Kinda interesting thing to think about.

[–] [email protected] 5 points 1 year ago (1 children)

OpenAI also financially benefits from keeping the hype train rolling. Talking about how disruptive their own tech is gets them attention and investments. Just take it with a grain of salt.

[–] [email protected] 6 points 1 year ago (8 children)

It's not possible to tell AI-generated text from human writing at any level of real-world accuracy. Just accept that.

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago)

FWIW, it's not clear-cut whether AI-generated data feeding back into further training reduces accuracy or is generally harmful.

Multiple papers have shown that images generated by high-quality diffusion models, when mixed with a proportion of real images (30-50%), improve the adversarial robustness of the models. Similar things might apply to language modeling.
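A minimal sketch of that mixing recipe, using stand-in integer arrays in place of images and a hypothetical `blend_training_set` helper; only the 30-50% real proportion comes from the comment above, everything else is illustrative:

```python
import numpy as np

def blend_training_set(real, generated, real_fraction=0.4, seed=0):
    """Return a shuffled training set in which ~real_fraction of samples are real."""
    rng = np.random.default_rng(seed)
    # Number of real samples needed so they make up real_fraction of the total.
    n_real = int(round(len(generated) * real_fraction / (1.0 - real_fraction)))
    picked = rng.choice(len(real), size=n_real, replace=False)
    mixed = np.concatenate([real[picked], generated])
    rng.shuffle(mixed)
    return mixed

real = np.arange(1000)          # stand-ins for real samples (non-negative)
generated = -np.arange(1, 601)  # stand-ins for generated samples (negative)
train = blend_training_set(real, generated, real_fraction=0.4)
print(len(train), float(np.mean(train >= 0)))  # size and real-sample fraction
```

The helper only handles the proportioning; in the papers' setting `real` and `generated` would be image tensors and the blended set would feed an ordinary training loop.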
