LocalLLaMA
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.
Rules:
Rule 1 - No harassment or personal character attacks on community members. I.e. no name-calling, no generalizing about entire groups of people that make up our community, no baseless personal insults.
Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.
Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."
Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.
Nice. I'll try it.
Yeah, I also think it's more an expression that somehow emerged, and not really a Eureka moment. They also seem to put it that way in the paper. They say it's an "aha" for the scientists, and the whole reasoning process is an "aha," but they don't actually write that the model is having an aha moment due to some insight it had.
I don't know if you watch YouTube videos, but Computerphile made a video about DeepSeek and an interesting video about forbidden AI techniques a few days ago. That one is also about the reasoning process and how LLMs can be lazy, take unwanted shortcuts, and hide information in the thinking step.
Well, they really can't write it that way because it would imply the model is capable of insight, which is a function of higher cognition. That path leads to questioning whether machine learning neural networks are capable of any real sparks of sapience or sentience. That's a 'UGI' conversation most people absolutely don't want to have at this point, for various practical, philosophical, and religious/spiritual reasons.
So you can't just outright say it, especially not in an academic STEM paper. Science academia has a hard bias against the implication of anything metaphysical or overly abstract; at best they will say it 'simulates some cognitive aspects of intelligence'.
In my own experience, the model at least says 'ah!', 'aha!', or 'Right, right, right, so...' when it thinks it has had an insight of some kind. Whether models are truly capable of such a thing, or it's merely a statistical text-prediction artifact, is a subjective philosophical discussion, kind of like a computer science nerd's version of the deterministic philosophical zombie arguments.
Thanks for sharing the video! I haven't seen Computerphile in a while, will take a look, especially with that title. Gotta learn about dat forbidden computation :)