this post was submitted on 27 Jul 2024
193 points (99.5% liked)

[–] [email protected] 27 points 3 months ago* (last edited 3 months ago) (2 children)

The "problem" is that the more you understand the engineering, the less you believe Intel when they say they can fix it in microcode. Without writing an entire essay, the TL;DR is that the instability gets worse over time, and the only way that happens is if applied voltages are breaking down dielectric barriers within the chip. That damage is permanent, and 100% of chips in the wild are irreparably damaging themselves over time.

Even if Intel can slow the bleeding with microcode, they can't repair the damage, and every chip that has ever run under the bad code will have a measurably shorter lifespan. For the average gamer, that lifespan has sometimes been shorter than even the typical warranty period.

[–] csm10495 7 points 3 months ago

+1. Lots of people likely have no idea about the situation and just think their PC crashes or acts up more than it used to. More of these issues can pop up over time.

A recall forces them to notify customers of the issue so the customer can act on it.

[–] [email protected] 1 points 3 months ago (1 children)

They can most likely prevent further breakdown through software. If the meters and controls are functioning correctly, they can undervolt the CPU. But it's not really a fix if that comes with a performance penalty. If it's a bug where the CPU maxes out the voltage when idle so it can do nothing faster, that could be fixed with no performance penalty, but that seems unlikely.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

I'm sorry, but this is just a fundamentally incorrect take on the physics at play here.

You unfortunately can't ever prevent further breakdown. Every time you run any voltage through any CPU, you are slowly breaking down the gate oxides. This is a normal, non-thermal failure mode of consumer CPUs. The issue is that this breakdown is non-linear. As the breakdown progresses, it increases resistance inside the die, and as a consequence the chip requires higher minimum voltages to remain stable. That higher voltage accelerates the rate of damage even at idle, so each hour of operation does disproportionately more damage the more degraded a chip already is.
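
To make the feedback loop concrete, here is a rough toy model in Python. It is only a sketch: the linear rise in minimum voltage with damage, the exponential dependence of wear rate on voltage, and every constant (`v0`, `beta`, `k`, `gamma`) are assumptions I picked for illustration, not measured values for any real CPU.

```python
import math

def simulate_wear(steps, v0=1.20, beta=0.05, k=1e-3, gamma=8.0, feedback=True):
    """Toy model: return accumulated damage (arbitrary units) after each time step.

    All parameters are hypothetical and chosen only to show the shape of the curve.
    """
    damage = 0.0
    history = []
    for _ in range(steps):
        # Assumption: the minimum stable voltage rises linearly with damage.
        v = v0 + (beta * damage if feedback else 0.0)
        # Assumption: wear per step grows exponentially with applied voltage.
        damage += k * math.exp(gamma * (v - v0))
        history.append(damage)
    return history

with_feedback = simulate_wear(1000)
without_feedback = simulate_wear(1000, feedback=False)

print(f"damage at fixed voltage:        {without_feedback[-1]:.3f}")
print(f"damage with voltage feedback:   {with_feedback[-1]:.3f}")
print(f"first-step wear with feedback:  {with_feedback[1] - with_feedback[0]:.6f}")
print(f"last-step wear with feedback:   {with_feedback[-1] - with_feedback[-2]:.6f}")
```

With a fixed voltage the damage grows in a straight line. With the feedback enabled, each step wears the chip slightly more than the one before, which is the "time gets more damaging the more damaged the chip is" behavior described above.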

If you want to read more on these failure modes, I'd recommend the following papers:

L. Shi et al., "Effects of Oxide Electric Field Stress on the Gate Oxide Reliability of Commercial SiC Power MOSFETs," 2022 IEEE 9th Workshop on Wide Bandgap Power Devices & Applications

Y. Qian et al., "Modeling of Hot Carrier Injection on Gate-Induced Drain Leakage in PDSOI nMOSFET," 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications