this post was submitted on 27 Jun 2023
19 points (100.0% liked)

Python

6467 readers
19 users here now

Welcome to the Python community on the programming.dev Lemmy instance!

๐Ÿ“… Events

PastNovember 2023

October 2023

July 2023

August 2023

September 2023

๐Ÿ Python project:
๐Ÿ’“ Python Community:
โœจ Python Ecosystem:
๐ŸŒŒ Fediverse
Communities
Projects
Feeds

founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[โ€“] [email protected] 2 points 1 year ago

Yes, there are a lot of assumptions, incorrect information, or at least miss-leading stuff out there. So I am always interested in learning more about easy and hard ways to make things better. In fact for most things I do, Python is fast enough, but sometimes it is not.

The things I find miss-leading about what people often say about Python are that it is not that slow, and that you can always just use a library like numpy or something similar to solve speed issues. I found both to be more or less untrue in the sense of getting C like speeds. On my code, Python was indeed slow, like 1% of C speed. The surprising thing for me was using numpy helps a lot but not as much as you think. I only got to 5 to 10% of of C speed with numpy. This is because libraries are often generically compiled and to get good speed you really have to have C code that is compiled for your specific hardware with vectorization, autoparallel, and fast math at least. So generic libraries just are not going to be that fast. Another one people push is using GPUs. That also is not really very effective unless you have a very expensive card and most notably a dedicated GPU card design just for that or an array of them. The GPU performance of my workstation is significantly less then throughput of my CPU. There are hardware limitations too that are interesting. My AMD Rizen 7 based workstation would have twice the speed if I had 4 port memory rather then two port memory which is a lot more common since fully optimized code is memory IO bound at about 1/2 the CPU throughput. This must be why AMD Rizen Threadrippers seem to use 4 port memory.

There are ways around a lot of this. For example using numba can be incredible. Similarly writing your owe C code and carefully compiling it too. The careful compile is critical. Maybe one could do the same with some stock libraries, carefully compile them. Lot of the other stuff people talk about just does not work very well in terms of speed or effort such as pypy, cython, nuitka, etc.