this post was submitted on 13 Mar 2024

TL;DR: I have come to the conclusion that my computer, estimated at 700W, is drawing more than the 1000W my PSU supports. I need confirmation before wasting more $$$.

Once in a while my GPU crashes: small freezes, or windows that use the GPU turn into noise. The system keeps running, so I can connect over ssh from another machine and kill everything to restart the UI.

I am running prometheus/node_exporter, which is scraped by a Raspberry Pi, so there are a bunch of standard metrics.

At first I suspected temperature. Yes, it gets hot, but the link doesn't appear to be clear. Sometimes it works well for long periods while hot.

Looking into the metrics I found "node_hwmon_in_volts", a gauge described as "Hardware monitor for voltage (input)". That is the only electrical metric I have; the motherboard doesn't appear to have a good driver for Linux. I didn't find

I have 2 other Intel computers and neither reports it, but both my AMD machine and the Raspberry Pi do. It is the "Power" series in the chart. The Raspberry Pi reports 12, which I read as 12V, but my AMD computer is normally below 1.0.

When idling, it stays at 0.7. Under load it fluctuates a bit. Under heavy load it goes over 1.0 many times (red line). Sometimes it runs without issue, but I started to see a pattern: when it is above 1.0, the system tends to misbehave and crash, for example when doing AI work or gaming heavily. When I lower the graphics settings ("playing low"), there is no issue.
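For anyone who wants to check the same thing, here is a minimal sketch of pulling the node_hwmon_in_volts samples out of node_exporter's text output and flagging the ones above 1.0. Note that the metric name says volts, not watts, so a reading around 0.7 to 1.0 may well be a low-voltage rail such as a CPU core voltage rather than total power draw. The chip and sensor label values in the sample are made up for illustration, and the function names are hypothetical.

```python
import re

# Matches one sample line of the metric, e.g.
# node_hwmon_in_volts{chip="example_chip",sensor="in0"} 0.7
SAMPLE_RE = re.compile(
    r'^node_hwmon_in_volts\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Illustrative scrape output; the label values are invented for this example.
SAMPLE_TEXT = """\
# HELP node_hwmon_in_volts Hardware monitor for voltage (input)
# TYPE node_hwmon_in_volts gauge
node_hwmon_in_volts{chip="example_chip",sensor="in0"} 0.7
node_hwmon_in_volts{chip="example_chip",sensor="in1"} 1.05
node_hwmon_temp_celsius{chip="example_chip",sensor="temp1"} 55
"""


def parse_in_volts(exposition_text):
    """Return a list of (labels, volts) for every node_hwmon_in_volts sample."""
    samples = []
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # skip comments and unrelated metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((labels, float(match.group("value"))))
    return samples


def over_threshold(samples, threshold=1.0):
    """Keep only the samples whose voltage is strictly above the threshold."""
    return [(labels, volts) for labels, volts in samples if volts > threshold]


flagged = over_threshold(parse_in_volts(SAMPLE_TEXT))
```

Looking at the sensor label of each flagged sample shows which input is crossing the line, which helps when deciding what the number actually measures.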

According to PCPartPicker, my computer should use around 700W. Multiplied by 1.5 (as normally recommended), that gives 1050W, so I bought a Cougar GEX x2 1000W. According to the Cultists PSU tier list, it is a recommended B-tier unit, so it should be good.
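The sizing arithmetic above, written out as a small sketch. The 700W estimate, the 1.5 factor and the 1000W rating come from the post; the function name is just illustrative.

```python
def recommended_psu_watts(estimated_load_watts, headroom_factor=1.5):
    """Scale an estimated system load by a headroom factor to size a PSU."""
    return estimated_load_watts * headroom_factor


estimated_load = 700   # W, estimate quoted in the post
psu_rating = 1000      # W, rating of the PSU that was bought

target = recommended_psu_watts(estimated_load)   # 700 * 1.5
load_fraction = estimated_load / psu_rating      # share of the rating in use

print(f"target PSU size: {target:.0f} W")
print(f"estimated load is {load_fraction:.0%} of the PSU rating")
```

If the 700W estimate is right, the system would sit at about 70% of the PSU rating, so drawing more than 1000W would need the estimate itself to be far off.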

Does my logic make any sense? Does anyone have a better suggestion? Could it be a different problem?

top 7 comments
[–] [email protected] 4 points 5 months ago (1 children)

If you exceed the capacity of the PSU and trip one of the protection circuits, it should completely cut power. When that happened to me, it needed a power cycle before it would boot again. So I'd say that something goes wrong after the PSU. It could still be a voltage drop at the GPU (see other comment regarding cables). Maybe even just a driver/software issue.

[–] [email protected] 1 points 5 months ago

Ok, I didn't know that. Yeah, my whole theory was based on a wrong assumption. Thanks

[–] [email protected] 4 points 5 months ago (1 children)

No idea what your problem is. But I do know that GPUs are sensitive to how they are powered. Specifically, is the GPU powered with three individual, separate PCIe power cables? They cannot be daisy chained or shared. It should be three separate cables from the PSU.

[–] [email protected] 1 points 5 months ago

Yes, all 3 separate cables are connected directly from the PSU :/

[–] [email protected] 4 points 5 months ago

Found the issue while checking whether any video cable was bad. The GPU got very hot. As the drivers are not collecting the temperature sensor from the GPU, it was not visible to me. Running all fans at full power all the time was enough to keep the GPU cool: no "power" fluctuation or computer crashes. As almost always, the problem is in the basics.

Thank you all.

[–] [email protected] 2 points 5 months ago (1 children)

I wouldn't expect that hardware to be able to use more than 1kW. Is the CPU or GPU overclocked?

Use a watt meter to check the load. Don't forget to take the power supply efficiency into account since it will be measuring input power. If the power supply is not overloaded, check the voltage rails with a multimeter while the PC is under heavy load. If the rails are all within tolerance, check the 12V rail with an oscilloscope and look for any dips or excessive ripple while the PC is under heavy load.
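A rough sketch of the two checks above: converting a wall reading to DC load, and testing a multimeter reading against rail tolerance. The efficiency has to come from the PSU's own rating (the 0.90 used here is only a placeholder), and the +/-5% figure is the usual ATX tolerance for the positive rails. The function names are made up for this example.

```python
POSITIVE_RAIL_TOLERANCE = 0.05  # +/-5%, usual ATX figure for +12V, +5V, +3.3V


def dc_load_watts(wall_watts, efficiency):
    """Estimate DC output power from the input power shown on a watt meter."""
    return wall_watts * efficiency


def rail_in_tolerance(nominal_volts, measured_volts,
                      tolerance=POSITIVE_RAIL_TOLERANCE):
    """True if the measured voltage is within tolerance of the nominal rail."""
    return abs(measured_volts - nominal_volts) <= nominal_volts * tolerance


# Placeholder numbers: 800 W at the wall with an assumed 90% efficiency.
load = dc_load_watts(800, 0.90)

# A 12V rail may read between 11.4 V and 12.6 V under this tolerance.
ok_reading = rail_in_tolerance(12.0, 11.5)
bad_reading = rail_in_tolerance(12.0, 11.3)
```

A multimeter only shows the average, which is why the oscilloscope step still matters for short dips and ripple.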

[–] [email protected] 2 points 5 months ago

No overclocking. I assembled it myself and left the BIOS at defaults. The only tuning was a more aggressive fan setting to see if it reduced the problem... Thanks for the suggestions. I am not able to do that myself, but I can bring it to a shop. Still, confirming that it should not use more than 1kW was good, so my theory is broken :D