this post was submitted on 22 Nov 2023
0 points (50.0% liked)

Hardware


A place for quality hardware news, reviews, and intelligent discussion.

founded 11 months ago

This is just a nitpicking question: do Intel chips still have some die space/transistors dedicated to SSE3? If they do, why can't they implement SSE3 in terms of other, more powerful instructions (like AVX) to save die space?

top 21 comments
[–] [email protected] 1 points 10 months ago

It’s not the die space that’s the issue; it’s the time to validate the correct operation of those instructions with a pipeline that’s designed for something very different.

[–] [email protected] 1 points 10 months ago

Not really, in most cases. The decoder may need a few more transistors to accommodate the instructions, but that shouldn't be much. The very oldest, never-used ones can be relegated to some very slow microcode ROM or similar. On the execution side, SSE uses the same registers as the latest AVX does, and the low-level compute operations actually performed by the execution units are the same. Keep in mind that each instruction is translated by the decoder into one or more micro-operations; instructions are not direct execution control data.
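
To make that last point concrete, here is a toy sketch in Python (the instruction strings and micro-op names are invented for illustration; a real decoder works on binary encodings, not text):

```python
# Toy illustration (not a real x86 decoder): one architectural
# instruction can expand into one or more micro-operations.
UOP_TABLE = {
    # register-register add: a single ALU micro-op
    "add eax, ebx": ["alu_add eax, ebx"],
    # read-modify-write on memory: load + ALU + store micro-ops
    "add [mem], eax": ["load tmp, [mem]",
                       "alu_add tmp, eax",
                       "store [mem], tmp"],
    # SSE and AVX adds feed the same vector ALU, just at
    # different register widths
    "addps xmm0, xmm1": ["vec_add128 xmm0, xmm1"],
    "vaddps ymm0, ymm1, ymm2": ["vec_add256 ymm0, ymm1, ymm2"],
}

def decode(instruction):
    """Return the micro-op sequence for an instruction (toy model)."""
    return UOP_TABLE[instruction]

print(decode("add [mem], eax"))
```

The point of the sketch is only that the mapping from architectural instructions to micro-ops is a lookup/translation step, so supporting an old instruction mostly costs decoder table space, not a dedicated execution unit.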

However, there are some old, no-longer-used features in x86 CPUs that do complicate the design somewhat, and there are instructions connected to those features. But that's really not the instructions themselves using the die area. Intel's X86S proposal, for example, would remove the middle privilege rings and call gates from the CPUs, as well as some no-longer-relevant memory access modes.

[–] [email protected] 1 points 10 months ago

why can't they implement SSE3 by other, more powerful instructions (like AVX)

In short, the instruction semantics are slightly different, so they don't do exactly the same thing. But it's likely that the execution unit hardware is re-used for those.
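
One concrete example of such a semantic difference, simulated in Python (a toy model, not real intrinsics; the lane behavior follows the documented semantics of SSE3's haddps and AVX's vhaddps): the 256-bit AVX horizontal add is not a simple widening of the SSE3 one, because it operates independently within each 128-bit lane.

```python
# Toy simulation of one semantic wrinkle: SSE3 haddps vs the
# 256-bit AVX vhaddps. The AVX version is NOT a plain widening:
# it performs the horizontal add separately within each 128-bit lane.

def haddps_128(a, b):
    # SSE3 haddps on two 4-float vectors:
    # result = [a0+a1, a2+a3, b0+b1, b2+b3]
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def vhaddps_256(a, b):
    # AVX vhaddps on two 8-float vectors works lane by lane:
    # low lane from a[0:4]/b[0:4], high lane from a[4:8]/b[4:8]
    return haddps_128(a[:4], b[:4]) + haddps_128(a[4:], b[4:])

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
print(vhaddps_256(a, b))
# note the b-derived sums land in the middle of each lane,
# not at the end of the whole 256-bit result
```

So the hardware can share the adders, but the instruction definitions themselves aren't interchangeable.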

[–] [email protected] 0 points 10 months ago (1 children)

Modern x86 chips are so large that the space the decoder takes is relatively small.

It would be a different story if you wanted a tiny cheap low power chip. Then you might be better off with ARM or RISC-V.

[–] [email protected] 1 points 10 months ago (2 children)

Because x86 instructions are variable-length and not self-synchronizing, up to 15% of a core's power budget can go to decode when you aren't hitting the small cache of already-decoded instructions, at least as of a few generations ago when I last heard. That isn't huge, but it does mean x86 architects have to put real thought into how wide to make the decoder; they can't simply size it so it's never a bottleneck, the way ARM designers can.

[–] [email protected] 1 points 10 months ago

Mate, the 90s were a few decades back. ;-)

x86 decoding hasn't been a limiter since then.

[–] [email protected] 0 points 10 months ago (3 children)

The x86 instructions go through a translation layer that turns them into CPU specific instructions (microcode). So the CPU doesn't need any specific hardware to be compatible with these old instructions, it just needs to know how to get the same result with microcode.

[–] [email protected] 1 points 10 months ago (2 children)

This is incorrect. Very few x86 instructions use microcode, as the microcode engine is quite slow. It's mainly used for things like cpuid and such.

[–] [email protected] 1 points 10 months ago

Microcode is used very heavily in modern CPUs. It has been since the 90s.

[–] [email protected] 1 points 10 months ago

A lot of the x86 ISA is in the micro- and PAL-code. Only the most frequent and performance-limiting instructions are implemented on-core in modern x86.

x86 is a huge set, so "very few" is a relative term ;-)

[–] [email protected] 1 points 10 months ago

Are there performance losses or gains through this translation?

[–] [email protected] 1 points 10 months ago (1 children)

You are confusing microcode and micro-ops.

[–] [email protected] 1 points 10 months ago (1 children)

What is microcode, then?

[–] [email protected] 1 points 10 months ago

It's a way of creating a sequential control circuit based on a piece of memory holding the outputs and next state for each state.
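
A minimal sketch of that idea in Python (the micro-states and control-signal names are invented for illustration): the "circuit" reduces to a ROM lookup that yields the control outputs to assert and the next micro-state.

```python
# Microcoded controller as a state machine whose logic is a ROM:
# each entry holds the control outputs plus the next state.
MICROCODE_ROM = {
    # state: (control_outputs, next_state)
    "fetch":   (["drive_pc_to_bus", "mem_read"], "decode"),
    "decode":  (["latch_opcode"],                "execute"),
    "execute": (["alu_enable", "write_reg"],     "fetch"),
}

def run(start_state, steps):
    """Step the controller, collecting the asserted control signals."""
    state, trace = start_state, []
    for _ in range(steps):
        outputs, state = MICROCODE_ROM[state]
        trace.append(outputs)
    return trace

print(run("fetch", 3))
```

Changing the behavior means rewriting the table contents, not rewiring gates, which is exactly why microcode is attractive for rare or patchable instructions.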

[–] [email protected] 0 points 10 months ago

No, no CPU has separate FPUs for SSE & AVX - it's compiled to the same set of uOps by microcode.

Recent x86 CPUs go as far as implementing x87 in the 128b FPU too.

[–] [email protected] 1 points 10 months ago (1 children)

uOps by microcode

That's not how it works; only a few overly complex instructions are implemented in microcode, and they are slow. Most instructions use a random-logic decoder.

[–] [email protected] 1 points 10 months ago

In x86 that's not the case: only the critical-path x86 instructions are implemented directly in logic lookup tables in the decoder. Some of the less-used ones live in the uCode ROM on-chip, a bunch more in PAL code in off-chip ROM, and a few of the rarest in the exception-handler libraries of the OS.

A big chunk of the x86 ISA is rarely used, so this tiered implementation has been in use at least since Nehalem, if not before.