overview for theQuandary

'GPUs still rule' asserts graphics guru Raja Koduri in response to custom AI silicon advocate in c/[email protected]

[–] [email protected] 1 points 11 months ago

First up, here's a Veritasium breakdown of why a lot of next-gen AI is leaning into analog computing to save space and power while increasing the total number of computations per second.

https://www.youtube.com/watch?v=GVsUOuSjvcg

The unreliability of analog makes it unsuited for the deterministic algorithms we normally run on computers, but it doesn't have large negative effects on AI algorithms because of their low-fidelity nature (and for some algorithms, getting some free entropy is actually a feature rather than a bug).

Here's an Asianometry breakdown of silicon photonics.

https://www.youtube.com/watch?v=t0yj4hBDUsc

Silicon photonics is the use of light to carry signals between transistors. It's been in research for decades and is already seeing limited use in some networking applications. IBM in particular has been researching this for a very long time in hopes of solving some chip communication issues, but there are a lot of technical problems to solve before you can put billions of these things in a CPU.

AI changed the equation because it allows analog compute. A multiply generally takes 4-5 cycles with each cycle doing a bunch of shift then add operations in series. With silicon photonics, this is as simple as turning on two emitters, merging the light, then recording the output. If you want to multiply 10 numbers together, you can do it in ONE cycle instead of 40-50 on a normal chip (not including all the setup instructions likely needed by that normal multiplier circuit).

Here's a quick IBM explainer on in-memory compute.

https://www.youtube.com/watch?v=BTnr8z-ePR4

Basically, it takes several times more energy to move two numbers into a CPU than it does to add them together. Ohm's law allows us to do analog multiplication by connecting various resistors and measuring the output.

You can use this to do calculations and the beauty is that your data hardly has to travel at all and you were already having to use energy to refresh it frequently anyway. The total clockspeed is far lower due to physics limitations of capacitors, but if you can be calculating every single cell of a multi-terabyte matrix at the same time, that really doesn't matter as your total compute power will be massively faster in aggregate AND use several times less power.
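The Ohm's law trick above can be sketched in a few lines. This is an idealized model, not any vendor's actual design: weights are stored as conductances, inputs are applied as voltages, each cell passes a current I = G * V, and the currents on a shared output line simply add up (Kirchhoff's current law). The function name and all the numbers are made up for illustration.

```python
# Idealized resistive crossbar (an illustrative model, not real hardware).
# Each cell stores a weight as a conductance G in siemens.
# Each input is applied as a voltage V in volts.
# Ohm's law gives the cell current I = G * V, and the currents that share
# an output line sum for free on the wire.

def crossbar_currents(conductances, voltages):
    """Return the summed current (amps) on each output line."""
    return [
        sum(g * v for g, v in zip(line, voltages))
        for line in conductances
    ]

# Hypothetical values: 2 output lines, 3 inputs.
G = [
    [1e-6, 2e-6, 3e-6],
    [4e-6, 5e-6, 6e-6],
]
V = [0.5, 1.0, 0.25]

currents = crossbar_currents(G, V)
print(currents)  # one dot product per output line, all "computed" at once
```

In software this is just a matrix-vector multiply in a loop. The point of the analog version is that every multiply and every add in that loop happens at the same time, right where the data is stored.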

Of course, all these analog alternatives have absolutely nothing in common with modern GPUs, but simple operations are massively more power efficient with in-memory compute and complex operations are massively more power efficient with silicon photonics.

'GPUs still rule' asserts graphics guru Raja Koduri in response to custom AI silicon advocate in c/[email protected]

[–] [email protected] 1 points 11 months ago (1 children)

I heard this same stuff in the 90s about GPUs. "GPUs are too specialized and don't have the flexibility of CPUs".

Startups failing doesn't prove anything. There are dozens of startups and there will only be 2-4 winners. Of course MOST are going to fail. Moving in too early before things have settled down or too late after your competitors are too established are both guaranteed ways to fail.

In any case, algorithms and languages have a symbiotic relationship with hardware.

C is considered fast, but did you know that it SUCKS on old CISC ISAs? Those ISAs are too irregular and make a lot of assumptions that don't mesh well with the compute model of C. C plus x86 is where things changed. x86 could be adapted to run C code well. C compilers then adapted to be fast on x86, then x86 adapted to run that compiled C code better, and the loop goes round and round.

This is true for GPUs too. Apple's M1/M2 GPU design isn't fundamentally bad, but it is different from AMD/Nvidia, so programmers' hardware assumptions and normal optimizations aren't effective. The same applies to some extent to Intel Xe, where they've been spending huge amounts to "optimize" various games (most likely literally writing new code to replace the original game code with versions optimized for their ISA).

The same will happen to AI.

Imagine that one of those startups gets compute-in-SSD working. Now you can do compute on models that would require terabytes of RAM on a GPU. You could do massive amounts of TOPS on massive working sets using just a few watts of power on a device costing just a few hundred dollars. This is in stark contrast to a GPU that costs tens of thousands of dollars, costs a fortune in power to run, and can't even work on a model that big because the memory hierarchy is too slow.

Such a technology would warp the algorithms around it. You'll simply be told to "make it work" and creative people will find a way to harness that compute power -- especially as it is already naturally tuned to AI needs. Once that loop gets started in earnest, the cost of switching algorithms and running them on a GPU will be far too high. Over time it will be not just cost, but also ecosystem lock-in.

I'm not saying that compute-in-memory will be the winner, but I'm quite certain that the GPU will not be, because literally ALL of the prominent algorithms get faster and lower power with their specific ASIC accelerators.

Even if we accept the worst-case scenario and 2-4 approaches rise to the top and each requires a separate ASIC, the situation STILL favors the ASIC approach. We can support dozens of ISAs for dozens of purposes. We can certainly support 2-4 ISAs with 1-3 competitors for each.

'GPUs still rule' asserts graphics guru Raja Koduri in response to custom AI silicon advocate in c/[email protected]

[–] [email protected] 1 points 11 months ago

It's also a technology issue. We have companies working on doing compute in memory which should offer such massive power and cost savings that companies might warp their algorithms around it to save all that money. Same goes for silicon photonics.

It's way too early to be certain about anything, so companies are going with the least risky option and idiots are pouring billions into adding "AI" in places where it doesn't belong just like they did with "big data" a few years ago (we're currently doing that at my company).

'GPUs still rule' asserts graphics guru Raja Koduri in response to custom AI silicon advocate in c/[email protected]

[–] [email protected] 1 points 11 months ago (5 children)

He's only right in the short term when the technology isn't stable and the AI software architectures are constantly changing.

Once things stabilize, we're most likely switching to either analog compute in memory or silicon photonics, both of which will be far less generic than a GPU, but with such a massive power, performance, and cost advantage that GPUs simply cannot compete.

What can a cpu core do that a gpu core can't do? in c/[email protected]

[–] [email protected] 1 points 11 months ago

Security is an interesting reason that most people don't think about.

When you run a program on your computer, you're constantly swapping between user and privileged modes. For example, you don't want a website reading stuff from your hard drive. Any such attempt must go through the OS, which will then say the website doesn't have permission and refuse.

GPUs don't have a privileged mode. This isn't just because it wouldn't be useful. To the contrary, WebGL and WebGPU have massive security issues because you are executing third-party code using drivers which themselves generally have root access. GPUs don't add such a mode because their hardware takes forever (in CPU terms) to get all the data loaded up and ready to execute. Moving to a different context every few microseconds would result in the GPU spending most of its time just waiting around for stuff to load up.

The solution to this problem would be decreasing latency, but all the stuff that does that for you is the same stuff that makes CPU cores hundreds of times larger than GPU cores. At that point, your GPU would just turn into an inferior CPU and an inferior GPU too.

Single Threaded Performance vs Multi Threaded Performance; Which is more important in 2023? in c/[email protected]

[–] [email protected] 1 points 11 months ago (1 children)

Single-core is more important for 99% of normal consumers. Most office productivity apps or web browsers are only lightly threaded.

Also, scaling cores efficiently is hard both in software and in hardware. 10 cores at 1x performance are going to be a lot more efficiently used than 20 cores at 0.5x performance.
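One rough way to put numbers on that is Amdahl's law. To be clear, this is my own back-of-the-envelope model and the 90% parallel fraction is an assumption, not a measurement of any real app. The function name is made up.

```python
# Amdahl's law sketch: throughput relative to ONE core running at 1x speed.
# parallel_fraction is the share of the work that can be spread over cores.
# The rest is serial and runs on a single core.

def effective_speed(cores, per_core_speed, parallel_fraction):
    serial_fraction = 1.0 - parallel_fraction
    return per_core_speed / (serial_fraction + parallel_fraction / cores)

# Assumed workload: 90% of the work parallelizes perfectly.
few_fast = effective_speed(cores=10, per_core_speed=1.0, parallel_fraction=0.9)
many_slow = effective_speed(cores=20, per_core_speed=0.5, parallel_fraction=0.9)

print(f"10 cores @ 1.0x: {few_fast:.2f}x")
print(f"20 cores @ 0.5x: {many_slow:.2f}x")
```

Under those assumptions the 10 fast cores come out around 5.3x while the 20 slow cores come out around 3.4x. The two configurations only tie when the work is 100% parallel, and that's before counting any of the extra synchronization and interconnect overhead that comes with more cores.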

Russia pivots to Chinese CPUs that aren't subject to US sanctions — Russia's homegrown Linux-based Alt OS now supports Chinese LoongArch chips in c/[email protected]

[–] [email protected] 1 points 11 months ago

China already has a bunch of RISC-V chips in the works, but that takes time. Meanwhile, their MIPS chips have been around for 15+ years at this point and have high availability.

Chinese Company Developing 64-core RISC-V Chip with Tech from U.S. in c/[email protected]

[–] [email protected] 1 points 11 months ago (2 children)

When Xi publicly told his military to be ready to invade Taiwan by 2027 earlier this year (meaning there was probably a non-public memo demanding readiness by 2025), he made things political.

Taiwan doesn't have many natural resources and their government since the 70s has deliberately poured money and legislation into their chip industry to make it so valuable that other countries would be forced to come to their aid if they are attacked.

The idea that hardware and politics are strictly separate has always been a fantasy.

Why can't Microsoft make a Rosetta2-like emulator for Windows on ARM? in c/[email protected]

[–] [email protected] 1 points 1 year ago (4 children)

Hardware.

M-series chips bake in hardware support for the x86 memory model, expensive flag calculations, and probably a few other things. Doing these without hardware is a lot harder and won't ever be as performant.
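To give a feel for the flag problem, here's a minimal sketch of the extra work a pure software emulator has to do for a single 32-bit x86 ADD if it needs every arithmetic flag. This is not how Rosetta 2 or Microsoft's emulator is actually written (real translators skip flags that nothing reads), and the function name is made up. It only shows what "flag calculation" means.

```python
MASK32 = 0xFFFFFFFF

def add32_with_flags(a, b):
    """Emulate a 32-bit x86 ADD in software and return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "CF": full > MASK32,                  # carry out of bit 31
        "ZF": result == 0,                    # result is zero
        "SF": bool(result >> 31),             # sign bit of the result
        # Overflow: both inputs have the same sign, result has the other sign.
        "OF": bool((~(a ^ b) & (a ^ result)) >> 31 & 1),
        # Parity: set when the low byte has an even number of 1 bits.
        "PF": bin(result & 0xFF).count("1") % 2 == 0,
        "AF": bool((a ^ b ^ result) & 0x10),  # carry out of bit 3
    }
    return result, flags

result, flags = add32_with_flags(0x7FFFFFFF, 1)
print(hex(result), flags)
```

One add turns into a pile of extra bit twiddling. A real x86 core gets all of those flags for free as a side effect of the add, which is why having help in hardware matters so much.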

Qualcomm brings receipts: Snapdragon X Elite gets benchmarked, completely dunks on Apple’s M2 processor in c/[email protected]

[–] [email protected] 1 points 1 year ago (2 children)

Apple scheduled a very unusual night meeting at 8pm EST (5pm PST) this evening. They didn't give tons of notice and I believe they didn't announce until after the Qualcomm event either.

I think this unveiling has lit a fire under them and they are worried.

Good times for consumers.

Intel Publishes "X86-S" Specification For 64-bit Only Architecture in c/[email protected]

[–] [email protected] 0 points 1 year ago (1 children)

The only selling point of x86 is backward compatibility. Remove that and you might as well move to a newer, better ISA.

Intel Publishes "X86-S" Specification For 64-bit Only Architecture in c/[email protected]

[–] [email protected] 0 points 1 year ago (3 children)

I wonder why they'd bother if it doesn't actually make a performance difference...