this post was submitted on 14 Feb 2024
65 points (94.5% liked)


Artificial-intelligence aide handles email, meetings and other things, but its price and limited use have some skeptical

Microsoft’s new artificial-intelligence assistant for its bestselling software has been in the hands of testers for more than six months and their reviews are in: useful, but often doesn’t live up to its price.

The company is hoping for one of its biggest hits in decades with Copilot for Microsoft 365, an AI upgrade that plugs into Word, Outlook and Teams. It uses the same technology as OpenAI’s ChatGPT and can summarize emails, generate text and create documents based on natural language prompts.

Companies involved in testing say their employees have been clamoring to test the tool, at least initially. So far, the shortcomings with software including Excel and PowerPoint and its tendency to make mistakes have given some testers pause about whether, at $30 a head per month, it is worth the price.

[–] [email protected] 6 points 8 months ago* (last edited 8 months ago) (1 children)

Don't use LLMs in production for accuracy-critical implementations without human oversight.

Don't use LLMs in production for accuracy-critical implementations without human oversight.

I almost want to repeat that a third time even.

They weirdly ended up being good at information recall in many cases, and as a result they are being used that way in cases where it really doesn't matter much if they are wrong some of the time. But the infrastructure fundamentally cannot self-verify.

This is part of why I roll my eyes when I see employment of LLMs vs humans presented as an exclusionary binary. These are tools to extend and support human labor. Not replace humans in most cases.

So LLMs can be amazing at a wide array of tasks. Like I literally just saved myself a half hour of copying and pasting minor changes in a codebase by having Copilot automate generating methods using a parallel object as a template and the new object's fields. But I also have unit tests to verify behavior and my own review of what was generated with over a decade of experience under my belt.
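A rough sketch of what that kind of safety net can look like, in Python purely for illustration. Every name here (`Customer`, `Supplier`, `customer_to_dict`, `supplier_to_dict`, `missing_fields`) is hypothetical and not taken from the codebase described above; `supplier_to_dict` stands in for a generated method.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Hypothetical existing object whose method serves as the template."""
    name: str
    email: str


@dataclass
class Supplier:
    """Hypothetical new object with one extra field."""
    name: str
    email: str
    tax_id: str


def customer_to_dict(c) -> dict:
    # Hand-written template method.
    return {"name": c.name, "email": c.email}


def supplier_to_dict(s) -> dict:
    # Stand-in for the generated method, modelled on the template above.
    return {"name": s.name, "email": s.email, "tax_id": s.tax_id}


def missing_fields(obj, out: dict) -> set:
    """Return the dataclass field names that the output dict failed to cover."""
    return {f.name for f in fields(obj)} - set(out)


supplier = Supplier("Acme", "sales@example.com", "X-1")

# The generated method covers every field of the new object.
assert missing_fields(supplier, supplier_to_dict(supplier)) == set()

# A lazily copied template would silently drop the new field; the check catches it.
assert missing_fields(supplier, customer_to_dict(supplier)) == {"tax_id"}
```

The check does not know or care who wrote the method. It only fails when a field is dropped, which is the kind of mistake a copy-and-adapt generation step tends to make.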

Someone who has never programmed and uses Copilot to spit out code for an idea is going to have a bad time. But they'd have a similarly bad time if they outsourced a spec sheet to a code farm without having anyone to supervise deliverables.

Oh, and technically, my example doesn't actually require you to know the correct answer before asking. It only requires you to recognize the correct answer when you see it. And the difference between those two use cases is massive.
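A toy illustration of that gap, not the example discussed in this thread: checking a proposed factorization is trivial even when you would not have produced it yourself. The `is_factorization` helper and the three suggestions are made up for the sketch.

```python
from math import prod


def is_factorization(n: int, factors: list[int]) -> bool:
    """Accept a proposed answer only if every factor is > 1 and they multiply to n."""
    return all(f > 1 for f in factors) and prod(factors) == n


# Three hypothetical suggestions, standing in for model output.
suggestions = [[3, 5, 7], [9, 11], [7, 13]]

# Recognizing the right one takes a single multiplication per candidate.
accepted = [s for s in suggestions if is_factorization(91, s)]
```

Here only `[7, 13]` survives the check, and the reviewer never had to factor 91 on their own.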

Edit: In fact, the suggestion to replace the nouns with emojis came from GPT-4. Even though it doesn't have any self-introspection capabilities, I described what I thought was happening and why, and it came up with three suggestions for ways to improve the result. Two I immediately saw were dumb as shit, but the idea to use emojis as representative placeholders while breaking the token pattern was simply brilliant. I'm not sure I would have thought of that on my own, but as soon as I saw it I knew it would work.

[–] [email protected] 5 points 8 months ago

But that's what the marketers are selling: "this will replace a lot of workers!" And it just cannot.