There is actually less to 'xkill' than the name suggests. It never signals the process at all: it asks the X server to sever the owning client's connection, nuking the window from orbit. The owning process(-tree) will usually still instantly curl up and die, since Xlib's default I/O error handler exits the moment the connection drops.
The main benefit is exactly that it doesn't touch the process, only the window. As such, you can get rid of windows belonging to otherwise unkillable processes (hung in uninterruptible sleep, ignoring signals, etc.).
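Under the hood it's essentially one Xlib call. A minimal sketch (you supply a window ID by hand, e.g. from xwininfo; the real xkill adds the click-to-select crosshair on top of this):

    /* What xkill boils down to: ask the X server to drop the client
       that owns the given window. No signal is ever sent. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <X11/Xlib.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <window-id>\n", argv[0]);
            return 1;
        }
        Display *dpy = XOpenDisplay(NULL);  /* connect to $DISPLAY */
        if (!dpy) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }
        Window w = (Window)strtoul(argv[1], NULL, 0); /* accepts 0x... IDs */
        XKillClient(dpy, w);   /* sever the owning client's connection */
        XSync(dpy, False);     /* flush the request before we exit */
        XCloseDisplay(dpy);
        return 0;
    }

Build with cc -lX11; the window evaporates even if its owner is wedged solid.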
Also, it's fun. Just don't miss the window and accidentally kill your WM. (Beat that, Wayland.)
Please note that the nominal FLOP/s figures from both Nvidia and Huawei are kinda bullshit. The precision you run at greatly affects that number: Nvidia's marketing nowadays refers to fp4 tensor operations, whereas traditionally FLOP/s were measured with fp64 matrix-matrix multiplication. That's a lot more bits per FLOP.
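Back-of-the-envelope, just to show how incomparable the units are (the 10 PFLOP/s input below is a made-up placeholder, not anyone's datasheet):

    #include <stdio.h>

    int main(void) {
        const double fp64_bits = 64.0, fp4_bits = 4.0; /* bits per operand */
        const double fp4_pflops = 10.0;  /* hypothetical marketing figure */

        /* Naively scaling by data width is the *optimistic* floor;
           fp64 units are far scarcer than fp4 tensor cores, so the
           real gap is usually much bigger than 16x. */
        printf("bit-width ratio: %.0fx\n", fp64_bits / fp4_bits);
        printf("naive fp64-equivalent of %.0f PFLOP/s fp4: %.3f PFLOP/s\n",
               fp4_pflops, fp4_pflops * fp4_bits / fp64_bits);
        return 0;
    }

So the same chip can honestly quote wildly different "FLOP/s" depending on which precision the marketing department picked.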
Also, that GPU-GPU bandwidth is kinda shit compared to Nvidia's marketing numbers, if I'm parsing correctly (NVLink is 18x 10GB/s links per GPU, big 'B' in GB). I might be reading the numbers wrong, but anyway. How (and whether) they manage multi-GPU cache coherency will be interesting to see; both Nvidia and AMD have it, to varying degrees, in those settings. Developer experience matters…
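On the developer-experience point: with CUDA, cross-GPU memory access is an explicit, queryable opt-in. A minimal sketch of the handshake (two devices assumed, error handling elided):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int can01 = 0, can10 = 0;
        /* Is there a peer path (NVLink or PCIe P2P) between 0 and 1? */
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);

        if (can01 && can10) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0); /* flags must be 0 */
            cudaSetDevice(1);
            cudaDeviceEnablePeerAccess(0, 0);
            puts("P2P on: kernels on one GPU can dereference the other's pointers");
        } else {
            puts("no peer path: traffic gets staged through host memory instead");
        }
        return 0;
    }

Whether Huawei's stack exposes something comparable (and how coherent it is) is exactly the kind of thing that decides developer experience.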
Now, the really interesting things are power draw, density and price. Power draw and price obviously influence TCO; on 7nm, I'd guess the power bill won't be very fun to read, but that's just a guess. Density influences the network options: passive DAC cables only reach a few metres, so are they viable at all, or is it (more expensive) optics all the way?
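To put a toy number on the TCO point (every input below is a placeholder, not a Huawei spec):

    #include <stdio.h>

    int main(void) {
        /* All figures are illustrative placeholders, not measured data. */
        const double watts_per_gpu  = 500.0;  /* hypothetical board power */
        const double gpus           = 8.0;    /* one hypothetical node */
        const double usd_per_kwh    = 0.15;   /* hypothetical tariff */
        const double hours_per_year = 24.0 * 365.0;

        double kwh = watts_per_gpu * gpus * hours_per_year / 1000.0;
        printf("%.0f kWh/year -> $%.0f/year, before cooling overhead (PUE)\n",
               kwh, kwh * usd_per_kwh);
        return 0;
    }

Multiply by a few thousand nodes and the 7nm power bill starts to matter.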