submitted 7 months ago by [email protected] to c/[email protected]

To better understand how neural networks function, researchers trained a toy 512-node neural network on a text dataset and then tried to identify features within the network that are semantically meaningful. The key observation is that while it is difficult to attribute specific functionality to individual neurons, you can find groups of neurons that collectively do seem to fire in response to human-legible features and concepts. By one metric reported in the paper, the 4096-feature decomposition of the 512-node toy model explains 79% of the information within it. The researchers used an AI named Claude to automatically annotate all the features by guessing how a human would describe them, for example feature #3647 "Abstract adjectives/verbs in credit/debt legal text", or the "sus" feature #3545. Browse through the visualization and see for yourself!

The researchers call the ability of neural networks to encode more information than they have neurons for "superposition", and they describe single neurons that are responsible for multiple, sometimes seemingly unrelated, concepts as "polysemantic".
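
For anyone who wants to poke at the idea, here is a tiny hand-built sketch of superposition. It is not the paper's method (the paper analyzes a trained 512-neuron model). The three feature directions, the 2-neuron layer, and the `encode`/`decode` helpers are all made up for illustration. It packs 3 features into 2 neurons by spacing their directions 120 degrees apart, then reads each feature back with a projection followed by a ReLU.

```python
import math

# Hypothetical toy setup: 3 features stored in a 2-neuron layer,
# with feature directions spaced 120 degrees apart on the unit circle.
N_FEATURES = 3
DIRECTIONS = [
    (math.cos(2 * math.pi * i / N_FEATURES), math.sin(2 * math.pi * i / N_FEATURES))
    for i in range(N_FEATURES)
]


def encode(feature_values):
    """Neuron activations: sum of each feature value times its direction."""
    x = sum(v * d[0] for v, d in zip(feature_values, DIRECTIONS))
    y = sum(v * d[1] for v, d in zip(feature_values, DIRECTIONS))
    return (x, y)


def decode(neurons):
    """Read each feature back: project onto its direction, then apply ReLU."""
    return [max(0.0, neurons[0] * d[0] + neurons[1] * d[1]) for d in DIRECTIONS]


if __name__ == "__main__":
    for active in range(N_FEATURES):
        values = [1.0 if i == active else 0.0 for i in range(N_FEATURES)]
        neurons = encode(values)
        recovered = decode(neurons)
        print(
            "feature", active,
            "neurons", [round(n, 3) for n in neurons],
            "recovered", [round(r, 3) for r in recovered],
        )
```

When only one feature is active at a time, all three are recovered cleanly even though there are only two neurons. The first neuron responds to all three features, so in this toy it is polysemantic. The trick depends on sparsity: with two features active at once (for example `encode([1.0, 1.0, 0.0])`), the readouts interfere and each drops to about 0.5.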

Full paper: https://transformer-circuits.pub/2023/monosemantic-features/index.html
Also discussed at: https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand
and on Hacker News: https://news.ycombinator.com/item?id=38438261

[-] Varyk 2 points 7 months ago

I like the "god help us" article. Although I wish the first example had the representative colors the article described, the article as a whole helps make sense of monosemanticity. It also intuitively sounds similar to how human intelligence works, and to what scientists used to talk about when they questioned human consciousness.

this post was submitted on 28 Nov 2023
17 points (100.0% liked)
