this post was submitted on 29 Nov 2023

Hardware


A place for quality hardware news, reviews, and intelligent discussion.

Assuming the training software could be run on the hardware and that we could distribute the load as if it was 2023, would it be possible to train a modern LLM on hardware from 1985?

top 14 comments
[–] [email protected] 1 points 11 months ago (1 children)

It’d be easier to build a suit in a cave, with a box of scraps.

[–] [email protected] 1 points 11 months ago

They didn't ask if it would be easy.

[–] DannyBoy 1 points 11 months ago

The fastest computer in 1985 was the CRAY-2 supercomputer at 1.9 gigaflops. GPT-3 can be trained on 1024 A100 GPUs in 34 days*. An A100 outputs 312 teraflops. So no, I don’t think it could be done in 1985, even given the entire year. There’s also the storage needed for incoming digital text for training - the input data didn’t exist back then, certainly not at that scale. I don’t think it could be done in a reasonable time.
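As a sanity check on those numbers, a rough sketch using only the figures quoted in the comment (peak FLOPS on both sides, no efficiency losses - so this is a floor, not an estimate):

```python
# Back-of-the-envelope: total FLOPs implied by the quoted A100 figures,
# then how long a CRAY-2 would need to supply the same raw compute.
A100_FLOPS = 312e12                      # quoted peak per-A100 throughput
CLUSTER_FLOPS = 1024 * A100_FLOPS        # the quoted 1024-GPU cluster
TRAIN_SECONDS = 34 * 24 * 3600           # the quoted 34 days

total_flop = CLUSTER_FLOPS * TRAIN_SECONDS   # ~9.4e23 FLOP for the whole run

CRAY2_FLOPS = 1.9e9                      # CRAY-2 peak, 1985
cray_years = total_flop / CRAY2_FLOPS / (365 * 24 * 3600)
print(f"{cray_years:.1e} years")         # on the order of ten million years
```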

[–] [email protected] 1 points 11 months ago (2 children)

It wouldn’t. Training the neural nets behind LLMs is all about brute force, and it’s only been possible in the last few years to train these models without spending billions. Even going back to 2010, I think it’d be largely infeasible.

The good news is if we fast forward even just a few years, training will become relatively cheap compared to today.

[–] [email protected] 1 points 11 months ago (2 children)

So my collection of about 20 Commodore 64s isn't enough? Do I need another 20? /s

[–] [email protected] 1 points 11 months ago (1 children)

I think you'd need roughly a billion of them. Minus the 20 you have, of course.

[–] [email protected] 1 points 11 months ago

Estimates say up to 30 million C64s were produced, so I guess not enough even if I managed to get every single one in the world

[–] [email protected] 1 points 11 months ago

You would need at least 44 more
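For scale, a heavily hedged sketch of why even "a billion" undershoots. The ~500 FLOPS figure for a C64 is an assumption: the 1 MHz 6510 has no FPU, so floating point runs in software at a few hundred operations per second at best.

```python
# How many C64s would it take to match the quoted 1024-A100 cluster's
# raw throughput? C64_FLOPS is an assumption (software floating point
# on a 1 MHz 6510, no hardware FPU).
C64_FLOPS = 500
CLUSTER_FLOPS = 1024 * 312e12
c64s_needed = CLUSTER_FLOPS / C64_FLOPS
print(f"{c64s_needed:.1e}")   # ~6e14 -- far more than the ~30 million ever made
```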

[–] [email protected] 1 points 11 months ago

They didn't ask if it could be done without spending billions, or whether it would be feasible, i.e., practical, just whether it would be possible.

[–] [email protected] 1 points 11 months ago

Can't even address that much memory on 16-bit systems :)

[–] [email protected] 1 points 11 months ago

The growth rate in computing has been exponential. Run the fastest computer from 1985 continuously for 38 years straight, and a single quad-GPU server today would pass its total output in hours.
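Arithmetically that holds up - a sketch using the peak figures quoted earlier in the thread (if anything, "hours" is generous):

```python
# Total FLOPs a CRAY-2 could deliver running flat out for 38 years,
# vs. how long four A100-class GPUs need to match it.
CRAY2_FLOPS = 1.9e9                                   # CRAY-2 peak, 1985
cray_38yr_flop = CRAY2_FLOPS * 38 * 365 * 24 * 3600   # ~2.3e18 FLOP total
QUAD_GPU_FLOPS = 4 * 312e12                           # four A100-class GPUs, peak
hours_to_match = cray_38yr_flop / QUAD_GPU_FLOPS / 3600
print(f"{hours_to_match:.2f} hours")                  # under one hour
```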

[–] [email protected] 1 points 11 months ago (1 children)

Do you mean one computer from 1985? No. There is no computer from that year that had enough RAM. If you mean all the computers from 1985, working together, then yes. You only need sufficient RAM, a Turing-complete machine, and probably some centuries to do it.

[–] [email protected] 1 points 11 months ago

Yeah - you got it right. I’m looking to see if it was hypothetically feasible, not whether it was practical. I know it wouldn’t be practical.

[–] [email protected] 1 points 11 months ago

No, you are limited by:

Compute performance: you'd need 10,000%+ more compute than was available per chip, and add-in accelerators of the era couldn't do the kind of computation PCIe accelerators do now. You'd have to rely on CPUs, which is worse.

Lack of scalability when interconnecting chips to behave as one, which increases I/O requirements dramatically.

Lack of memory pooling (yes, you qualified it), memory bandwidth, and memory size (we're talking megabytes). Imagine waiting for the calculations of a 1-billion-parameter model to load and store at each layer of the neural network using floppy disks.
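To put the memory point in numbers, a minimal sketch (FP32 weights only; gradients, activations, and optimizer state would multiply the requirement severalfold):

```python
# RAM needed just to hold 1 billion FP32 parameters, vs. 1985-era memory sizes.
params = 1_000_000_000
weight_bytes = params * 4                # 4 bytes per FP32 parameter
weight_gb = weight_bytes / 1e9           # 4 GB for the weights alone
c64_ram_bytes = 64 * 1024                # Commodore 64: 64 KB
c64s_for_weights = weight_bytes / c64_ram_bytes
print(weight_gb, c64s_for_weights)       # ~61,000 C64s' worth of RAM, just for weights
```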