this post was submitted on 28 Jan 2024
Programmer Humor
If you compare the performance of async rust vs. rust with blocking syscalls, there's not even a comparison. 'epoll' and the like are vastly more performant than blocking system io; async is then simply a way to make that kind of system interface nice to program with, as you can ignore all that yielding and waking up and write straight-line code.
Now, if all you do is read a config file, yes, all that is absolutely overkill. If you're actually doing serious io though there's no way around this kind of stuff.
I assume by performance you mean CPU usage per io request. Each io call should require a switch to the kernel and back. When you do blocking io the switch back is delayed (the kernel switches to other threads while waiting), but it is not more taxing. How could it be possible for there to be a difference?
Because the kernel doesn't like you spawning 100k threads. Your RAM doesn't, either. Even all the stacks aside, the kernel needs to record everything in data structures which now are bigger and need longer to traverse. Each thread is a process which could e.g. be sent a signal, requiring keeping stuff around that rust definitely doesn't keep around (async functions get compiled to tight state machines).
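For a rough sense of scale, here is a minimal sketch (standard library only) that measures how many bytes the state machine for one in-flight async call occupies. `handle_request` and `future_size` are made-up names for illustration, and the exact number depends on the compiler version and on what is held across the await point, so treat it as an order-of-magnitude check rather than a benchmark.

```rust
use std::mem::size_of_val;

// Hypothetical request handler: keeps a small buffer alive across an
// await point, so the buffer may have to be stored inside the future.
pub async fn handle_request(id: u32) -> u32 {
    let buf = [id as u8; 64];
    // Stand-in for real io; `ready` completes immediately.
    std::future::ready(()).await;
    id + buf[0] as u32
}

// Size in bytes of the compiler-generated state machine for one call.
// The future is created but never polled; we only look at its layout.
pub fn future_size() -> usize {
    let fut = handle_request(1);
    size_of_val(&fut)
}
```

Printing `future_size()` from `main` should give a figure far below one 4k page, which is the smallest stack a thread can realistically get away with.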
Specifically with io_uring: You can fire off quite a number of requests without incurring a context switch (kernel and process share a ring buffer), and later on check the completion status of quite a number of them, also without having to context switch. If you're (exceedingly) lucky no io_uring call ever causes a context switch, as the kernel will work on that queue on another cpu. The whole thing is memory, not CPU, bound.
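To make the shared ring buffer idea concrete, here is a toy single-producer, single-consumer ring in plain Rust. This is not the io_uring API and not how the kernel lays out its queues; `Ring`, `push`, `pop` and `CAP` are invented for this sketch. It only shows the principle: one side publishes entries by bumping a tail index, the other side picks them up by reading that index, and neither has to call into the other.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

// Capacity of the toy ring; a power of two so index wraparound stays consistent.
const CAP: usize = 8;

// Toy model of a shared submission queue (illustrative only).
pub struct Ring {
    slots: [AtomicU64; CAP],
    head: AtomicUsize, // next entry the consumer will read
    tail: AtomicUsize, // next entry the producer will write
}

impl Ring {
    pub fn new() -> Self {
        Ring {
            slots: std::array::from_fn(|_| AtomicU64::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    // Producer side: returns false if the ring is full.
    pub fn push(&self, value: u64) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == CAP {
            return false;
        }
        self.slots[tail % CAP].store(value, Ordering::Relaxed);
        // Release publishes the slot write to whoever acquires `tail`.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    // Consumer side: returns None if nothing has been submitted.
    pub fn pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = self.slots[head % CAP].load(Ordering::Relaxed);
        // Release hands the slot back to the producer.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

With the producer on one core and the consumer on another, both sides make progress by reading and writing memory only, which is the property the comment above is pointing at.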
Anyhow, your mode of inquiry is fundamentally wrong in the first place: It doesn't matter whether you can explain why exactly async is faster (I probably did a horrible job and got stuff wrong), what matters is that benchmarks blow blocking io out of the water. That's the ground truth. As programmers, as a first approximation, our ideas and models of how things work are generally completely wrong.
Why do you say this?
Not if your stacks per thread are small.
These data structures must exist either in userland or the kernel. Moving them to the kernel won't help anything. Also, many of these data structures scale at log(n). Splitting half the elements to userland and keeping the other half gives you two structures with log(n/2) each, so 2log(n/2) = log(n^2/4). Clearly that's worse.
If signals were the reason async worked better, then the correct solution is to enable threads that opt out of signals. Anything that slows down threads that isn't present in an async design should be opt-out-able. The state machines that async compiles to do not appear inherently superior to multiple less stateful threads managed by a fast scheduler.
As described here you would still need to do a switch to kernel mode and back for the syscalls. The extra work required from assuming processes are hostile to each other should be easy to avoid among threads known to have a common process as they are obviously not hostile to each other and share memory space anyway. The synchronization required to handle multiple tasks should be the same regardless if they are being run on the same thread by a user land scheduler or if they are running on multiple threads with an os scheduler.
I'm not interested in saying that async is the best because it appears to work well currently. That's not the right way to decide the future of how to do things. That's just a statement of how things are. I agree, if your only goal is to get the fastest thing now with no critical thought, then it does appear that async is faster. I am unconvinced it must fundamentally be the case.
Have you tried?
Page size is 4k; it doesn't get smaller. The kernel can't give out memory in more fine-grained amounts, at least not without requiring a syscall on every access, which would be prohibitively expensive.
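The page-size floor is easy to put into numbers. The sketch below assumes a 4k page and that every thread touches at least one page of its own stack; `min_stack_bytes` is a made-up helper, and the result ignores whatever the kernel keeps per thread on its side.

```rust
// Lower bound on resident stack memory, assuming each thread touches
// at least one page of its own stack (an assumption of this sketch).
pub fn min_stack_bytes(threads: u64, page_size: u64) -> u64 {
    threads * page_size
}

// The thread count from the discussion above, with a 4k page.
pub fn hundred_k_threads() -> u64 {
    min_stack_bytes(100_000, 4096)
}
```

For 100k threads that comes to 409,600,000 bytes, roughly 390 MiB, for stacks alone and before any of them does real work.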
That's what async does. It opts out of all the things, including having to do context switches when doing IO.
No, you don't: You can poll the data structure and the kernel can poll the data structure. No syscalls required. Kernel can do it on one core, the application on another, so in the extreme you don't even need to invoke the scheduler.
You can e.g. have a look at whether you can change the hardware to allow for arbitrarily small page sizes. The reaction of hardware designers will be first "are you crazy", then, upon explaining your issue, they'll tell you "well then just use async what's the problem".