this post was submitted on 27 Jul 2024
193 points (99.5% liked)

Technology

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago) (3 children)

HANG ON BEFORE YOU HIT THE DOWNVOTE BUTTON!

They don't need a recall. If your processor ain't broke yet then the patch will (supposedly) prevent it from breaking and if it's ALREADY broke then Intel will (supposedly) replace it via RMA.

So what's the big fuggin' problem here? That Intel won't use the term "recall"?

[–] [email protected] 27 points 4 months ago* (last edited 4 months ago) (2 children)

The "problem" is that the more you understand the engineering, the less you believe Intel when they say they can fix it in microcode. Without writing an entire essay, the TL;DR is that the instability gets worse over time, and the only way that happens is if applied voltages are breaking down dielectric barriers within the chip. This damage is irreparable, and 100% of chips in the wild are irreparably damaging themselves over time.

Even if Intel can slow the bleeding with microcode, they can't repair the damage, and every chip that has ever run under the bad code will have a measurably shorter lifespan. For the average gamer, that lifespan sometimes hasn't even lasted the average warranty period.

[–] csm10495 7 points 4 months ago

+1. Lots of people are also likely to not have any idea about the situation and just think their PC crashes or acts up more. More of these issues can pop up over time.

A recall forces them to notify customers of the issue so the customer can act on it.

[–] [email protected] 1 points 4 months ago (1 children)

They can most likely prevent further breakdown through software. If the meters and controls are functioning correctly, they can undervolt the CPU. But it's not really a fix if that comes with a performance penalty. If it's a bug where the CPU maxes out the voltage when idle so it can do nothing faster, that could be fixed with no performance penalty, but that seems unlikely.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago)

I'm sorry but this is just a fundamentally incorrect take on the physics at play here.

You unfortunately can't ever prevent further breakdown. Every time you run any voltage through any CPU, you are always slowly breaking down gate oxides. This is a normal, non-thermal failure mode of consumer CPUs. The issue is that this breakdown is non-linear. As the breakdown progresses, it increases resistance inside the die, and as a consequence the chip requires higher minimum voltages to remain stable. That higher voltage accelerates the rate of idle damage, so time becomes disproportionately more damaging the more damaged a chip already is.
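
To make that feedback loop concrete, here is a rough toy model in Python. It is only a sketch: the function name `simulate_wear`, the constants `k`, `gamma`, `alpha`, `v_ref`, and the assumed exponential dependence of wear rate on voltage are all made up for illustration, use arbitrary units, and are not fitted to any Intel data or real device measurements.

```python
import math


def simulate_wear(v_start, hours, k=1e-4, gamma=10.0, alpha=0.5, v_ref=1.2):
    """Toy positive-feedback model of oxide wear (arbitrary units).

    All parameters are hypothetical:
      k      baseline wear per hour at the reference voltage
      gamma  how strongly wear rate grows with voltage (assumed exponential)
      alpha  how much accumulated damage raises the minimum stable voltage
      v_ref  reference voltage at which the wear rate equals k
    Returns the accumulated damage after each simulated hour.
    """
    damage = 0.0
    history = []
    for _ in range(hours):
        # More accumulated damage -> higher voltage needed to stay stable.
        v_min = v_start + alpha * damage
        # Higher voltage -> faster wear during this hour.
        rate = k * math.exp(gamma * (v_min - v_ref))
        damage += rate
        history.append(damage)
    return history


# Compare a chip starting at the reference voltage with one starting higher.
baseline = simulate_wear(1.2, 500)
overvolted = simulate_wear(1.3, 500)
print("wear added in first hour:", baseline[0])
print("wear added in last hour: ", baseline[-1] - baseline[-2])
print("total wear, higher starting voltage:", overvolted[-1])
```

In this toy model the wear added per hour keeps growing because each hour's damage raises the voltage used for the next hour, which is the "disproportionately more damaging" behavior described above. Lowering the starting voltage slows the curve down but never makes the accumulated damage go back to zero.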

If you want to read more on these failure modes, I'd recommend the following papers:

L. Shi et al., "Effects of Oxide Electric Field Stress on the Gate Oxide Reliability of Commercial SiC Power MOSFETs," 2022 IEEE 9th Workshop on Wide Bandgap Power Devices & Applications

Y. Qian et al., "Modeling of Hot Carrier Injection on Gate-Induced Drain Leakage in PDSOI nMOSFET," 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications

[–] [email protected] 19 points 4 months ago

I’ve only recently become aware of the issue and that’s the way it feels.

But in the absence of a definitive test I think folks are concerned that they will be stuck with a CPU that continues to degrade prematurely. That seems like a valid concern.

[–] [email protected] 9 points 4 months ago

> So what's the big fuggin' problem here? That Intel won't use the term "recall"?

Would you say the same thing about a car?

"We know the door might fall off but it has not fallen off yet so we are good."

The chances of that door hurting someone are low and yet we still replace all of them because it's the right thing to do.

These processors might fail any minute and you have no way of knowing. There are people who depend on these for work, and systems that are running essential services. Even worse, they might fail silently and corrupt something in the process, or cause unnecessary debugging effort.

If I were running those processors in a company I would expect Intel to replace every single one of them at their cost, before they fail or show signs of failing.

Those things are supposed to be reliable, not a liability.