'GPUs still rule' asserts graphics guru Raja Koduri in response to custom AI silicon advocate
(www.tomshardware.com)
Specialized hardware can make sense for inference of known networks, or a bit more broadly, known network structures. But for training and research, model architectures still seem to be in too much flux for specialization much beyond the level of a modern GPU to make sense. For now, at least.
The (research) ecosystem would have to settle down a bit before you can get a really substantial improvement out of specialization, and be confident that your architecture can still run the latest and greatest ~2 years after you designed it.
This has been my take too; it's an obvious case of the 80/20 rule. During times of breakthrough/flux, NVIDIA benefits from having both the research community on board and a full set of functionality, great tooling, etc. When things slow back down, you'll see Google come out with a new TPU, Amazon will have a new Graviton, etc.
It's not that hard in principle to staple an accelerator to an ARM core; in fact, that's a major marketing point for ARM. And nowadays you'd want an interconnect too. A decently large number of companies can sustain such a thing at reasonably market-competitive prices, so once the market settles, the margins will decline.
On the other hand, if you are building large, training-focused accelerators, it is also going to be a case of convergent evolution. In the abstract, we are talking about massively parallel compute units with a large memory subsystem to keep them fed, and some type of local command processor to handle the low-level scheduling and latency hiding. Which, gosh, sounds like a GPGPU.
If you give it any degree of general programmability, then it starts to look very much like a GPU. If you don't, you risk falling off the innovation curve the next time someone has a clever idea, just like previous generations of "ASICs". And you'd be doing your tooling, infrastructure, and debugging from scratch too, with far less support and fewer resources. GPGPU is turnkey at this stage: do you want your engineers rebuilding CUDA, or do you want them building your product?
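To make the "turnkey" point concrete, here is a minimal CUDA sketch of the kind of general-purpose kernel that runs as-is on any modern NVIDIA GPU, with the runtime handling the scheduling and latency hiding described above. The kernel and variable names are illustrative, not from any particular codebase.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element. The GPU's hardware scheduler hides
// memory latency by swapping between warps -- the "local command
// processor" role described above, provided for free by the platform.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified memory: no explicit host/device copies needed.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch ~4096 blocks of 256 threads; the runtime maps them to SMs.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 2*1 + 2 = 4
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

A bespoke ASIC gets none of this for free: the compiler, the memory model, the profiler, and the kernel-launch machinery above would all have to be built and maintained in-house.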
It's also a technology issue. Companies are working on compute-in-memory, which could offer such massive power and cost savings that companies might warp their algorithms around it to capture them. The same goes for silicon photonics.
It's way too early to be certain about anything, so companies are going with the least risky option, and idiots are pouring billions into adding "AI" in places where it doesn't belong, just like they did with "big data" a few years ago (we're currently doing that at my company).