this post was submitted on 09 Jan 2025
21 points (75.6% liked)


yes, not a unix os but rather unix-like, and i want to program all of it in python. is that possible?? even the kernel, i want it all in python. i know most kernels use c++ or c* but maybe python has a library to turn c* into python?? i'm still sort of a beginner, but thanks, i would appreciate the answers

top 20 comments
[–] litchralee 35 points 1 week ago* (last edited 1 week ago)

As it happens, this is strikingly similar to an interview question I sometimes ask: what parts of a multitasking OS cannot be written wholly in C. As one might expect, the question is intentionally open-ended so as to query a candidate's understanding of the capabilities and limitations of the C language. Your question asks about Python, but I posit that some OS requirement which a low-level language like C cannot accomplish would be equally intractable for Python.

Cutting straight to the chase, C is insufficient for initializing the stack pointer. Sure, C itself might not technically require a working stack, but a multitasking operating system written in C must have a stack by the time it starts running user code. So most will do that initialization much earlier, so that the OS's startup functions can utilize the stack.

This is normally done by the bootloader code, which is typically written in assembly and runs when the CPU is taken out of reset, and then will jump into the OS's C code. The C functions will allocate local variables on the stack, and everything will work just fine, even rewriting the stack pointer using intrinsics to cause a context switch (although this code is often -- but not always -- written in assembly too).

The crux of the issue is that the initial value of the stack pointer cannot be set using C code. Some hardware like the Cortex M0 family will initialize the stack pointer register by copying the value from 0x00 in program memory, but that doesn't change the fact that C cannot set the stack pointer on its own, because invoking a C function may require a working stack in the first place.
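To make the Cortex-M mechanism a bit more concrete, here is a minimal sketch that reads the first two words of a firmware image the way the hardware does at reset: word 0 is the initial stack pointer and word 1 is the reset handler address. The image bytes and the helper name `read_vector_table` are made up for illustration; only the two-word layout comes from the architecture.

```python
import struct

def read_vector_table(image: bytes):
    """Return (initial_sp, reset_handler) from the start of a Cortex-M image.

    Hypothetical helper: it only decodes the first two little-endian
    32-bit words, which is what the core loads on reset.
    """
    initial_sp, reset_handler = struct.unpack_from("<II", image, 0)
    return initial_sp, reset_handler

# Fabricated image: stack top in SRAM, reset handler with the Thumb bit set.
fake_image = struct.pack("<II", 0x20002000, 0x000000C1) + bytes(56)

sp, reset = read_vector_table(fake_image)
entry = reset & ~1  # clear the Thumb bit to get the handler's address

print(hex(sp), hex(reset), hex(entry))
```

The point is that the stack pointer value is plain data in the image; no instruction, in C or anything else, has to run to set it.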

In Python, I think it would be much the same: how could Python itself initialize the stack pointer necessary to start running Python code? You would need a hardware mechanism like with the Cortex M0 to overcome this same problem.

The reason the Cortex M0 added that feature is precisely to enable developers to never be forced to write assembly for that architecture. They can if they want to, but the architecture was designed to be developed with C exclusively, including interrupt handlers.

If you have hardware that natively executes Python bytecode, then your OS could work. But for x86 platforms or most other targets, I don't think an all-Python, no-assembly OS is possible.

[–] [email protected] 18 points 1 week ago (1 children)

What OS will run the Python interpreter?

[–] [email protected] 4 points 1 week ago* (last edited 1 week ago) (2 children)

Can't Python be translated into machine code and packaged into a binary? I swear I have no experience in OS development, just curious.

[–] [email protected] 6 points 1 week ago

Like Java, you can distribute a binary which bundles an interpreter/VM, but your code is still running inside a host OS.

[–] litchralee -2 points 1 week ago* (last edited 1 week ago) (1 children)

Can't Python be translated into machine code

Yes, and that's basically what the CPython interpreter does when you call a Python script. It sometimes even leaves the result laying in your filesystem, with the extension .pyc . This is the byte code (aka machine code) for CPython's implementation of the Python Virtual Machine (PVM).

and packaged into a binary?

Almost. The .pyc file is meant to run with the appropriate PVM, not for x86 or ARM64, for example. But if you did copy that .pyc to another computer that has a CPython PVM, then you can run that byte code and the Python code should work.

To create an actual x86 or ARM64 binary, you might use a Python compiler like Cython, which compiles to x86 or ARM64 by first translating to C, and then compiling that. The result is a very inefficient and slow binary, but it is functional. You probably shouldn't do this though.

[–] [email protected] 1 points 1 week ago

Yes, and that’s basically what the CPython interpreter does when you call a Python script. It sometimes even leaves the machine code laying in your filesystem, with the extension .pyc . This is the byte code (aka machine code) for CPython’s implementation of the Python Virtual Machine (PVM).

This is incorrect; the term "machine code" refers to code that can be run on a real machine, not to code that requires a virtual machine.
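To see the distinction concretely, here is a small sketch assuming CPython (the function `add` is just an illustration). It pulls out the bytecode that CPython compiled for a function and lists the instruction names. These are instructions for CPython's interpreter loop, not for an x86 or ARM core, and the exact names vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

code = add.__code__
raw = code.co_code  # bytes object holding the compiled CPython bytecode

# Decode the bytecode into instruction names understood by the interpreter.
opnames = [ins.opname for ins in dis.get_instructions(add)]

print(type(raw).__name__, len(raw))
print(opnames)
```

None of those bytes can be handed to a CPU directly; something has to interpret them, which is why a .pyc file still needs a host interpreter.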

[–] [email protected] 15 points 1 week ago

You might want to browse through this: https://wiki.osdev.org/Creating_an_Operating_System

Which should also help explain why doing the whole thing in python isn't feasible

[–] [email protected] 9 points 1 week ago* (last edited 1 week ago) (1 children)

The essence of your answers is "yes, but...". And the "but" is mostly about how slow Python is in contexts that need to be astonishingly fast.

It depends how complex the hardware is and how much time we're willing to waste.

Technically, when I deploy a Python program to a BBC Microbit, that's (more or less) what is happening. Pure Python code is making every decision, and is interacting directly with all available hardware.

We could still argue semantics - virtually no (modern) computer exists that isn't running at least one tiny binary compatibility driver written in C.

I believe the compiled C binary on a BBC Microbit to bootstrap a pure Python OS is incredibly small, but my best guess is that it's still present. The C library for Microbit needed to exist for other languages to use, and Python likes calling C binaries. So I don't imagine anyone has recreated it in pure Python for fun (and slower results).

(Edit: As others have pointed out, I'm talking about MicroPython, which is itself written in C. The Microbit is so simple it might not use MicroPython, but I can't imagine the BBC Microbit team bothered to reinvent the wheel for this.)

Of course, if you don't mind that the lowest level code has got to be binary, and very few people are crazy enough to create that code with Python, then...

It raises another interesting question: just how much of an OS can we get away with writing in Python?

And that question is answered both by RedHat Linux and Debian Linux - and the answer is that both are built with an awful lot of Python.

In contrast, Android is mostly Java with ~~lots of C~~ a C Linux kernel. Windows is mostly C# and lots of C. iOS is mostly Objective C and lots of C.

You can have an OS built with almost any language you want, as long as you also want parts of it built in C. (Edit: This is meant to amuse you, not be guidance for what is possible. Today, we love our C code. C didn't always exist, and might someday no longer be our favorite hardware driving language.)

An interesting current development is discussion around rebuilding parts of the Linux Kernel with Rust, which can run just as fast as C. This would effectively cause RedHat, Debian and Android to replace some of their C code with Rust. To date, there's been a lot of interest and discussion and not a lot of (any?) actual funding or work completed.

[–] litchralee 3 points 1 week ago* (last edited 1 week ago) (1 children)

While I get your point that Python is often not the most appropriate language to write certain parts of an OS, I have to object to the supposed necessity of C. In particular, the bolded claim that an OS not written in C is still going to have C involved.

Such an OS could instead have written its non-native parts using assembly. And while C was intentionally designed to be similar to assembly, it is not synonymous with assembly. OS authors can and do write assembly when even the C language cannot do what they need, and I gave an example of this in my comment.

The primacy of C is not universal, and has a strong dependency on the CPU architecture. Indeed, there's a history of building machines which are intended for a specific high-level language, with Lisp Machines being one of the most complex -- since Lisp still has to be compiled down to some sort of hardware instructions. A modern example would be Java, which defines the programming language as well as the ISA and byte code: embedded Java processors were built, and thus there would have been zero need for C apart from legacy convenience.

[–] [email protected] 3 points 1 week ago

I have to object to the supposed necessity of C. In particular, the bolded claim that an OS not written in C is still going to have C involved.

Such an OS could instead have written its non-native parts using assembly.

Agreed! That's a great point!

I appreciate your clarification. Not everything has to run C. It's just a trend in today's products.

I was attempting to humorously reference Monty Python's Spam sketch, where it seems like everything on the menu has at least a little Spam in it. Every device I could think of, that I've toyed with enough to guess what it has running, is running at least a bit of C.

For an attempt at a counterpoint, I thought of a few devices, like my PineWatch, that run an OS written entirely in one language. But... that one language is, of course, C.

legacy convenience.

Yeah. I think legacy convenience is, indeed, why there's C in so many places, even places it doesn't have to be.

There's so many folks with so much hardware driver expertise in C, and they teach our next generation, so I figure that will continue until something really compelling changes their preference.

I appreciate your point. There are lots of non-C ways to create bytecode. My (amused) point is that we don't seem very fond of any of those methods, today.

[–] atzanteol 7 points 1 week ago (1 children)

No, not if you want it to run on the hardware directly at least. If you want it to run as an emulator of sorts under Linux then you could.

Python needs an interpreter to run on and lacks direct memory/hardware access.
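As a rough illustration, assuming CPython running under a host OS: the closest Python gets to raw memory is the `ctypes` module, which is itself a C extension built on libffi. The sketch below peeks and pokes at an address, but that address is in the process's own virtual memory handed out by the OS, not a hardware register.

```python
import ctypes

# Allocate a 32-bit value inside this process and find its address.
value = ctypes.c_uint32(0xDEADBEEF)
address = ctypes.addressof(value)

# Reinterpret the raw address as a 32-bit integer and overwrite it.
peeked = ctypes.c_uint32.from_address(address)
peeked.value = 0x12345678

print(hex(value.value))  # the original object sees the write
```

So even this "direct" access leans on C code and a running OS underneath.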

[–] [email protected] 2 points 1 week ago

I'm thinking one would be best off writing a virtual machine hypervisor to run Python code, like Facebook did to run their PHP as close to the bare metal as they could get it.

It would still be a beast of a project to start on with the VM already built, of course.

[–] [email protected] 6 points 1 week ago (1 children)

I would not recommend this as an exercise for a beginner, but RPython is a subset of Python with a C backend; it is used as the basis of PyPy (an implementation of Python), so it may be possible to use it to implement the low-level parts which then can be used to bootstrap a full Python virtual machine.

[–] [email protected] 3 points 1 week ago

In short: If you'd like to learn more, come visit #pypy on Libera IRC. It's an interesting discussion topic, particularly if we want standard-library imports like math, sys, or json to work.

RPython is not capable of translating to bare metal today; it depends on libc and libffi for many features even when not producing JIT compilers. It's also intended to operate on a layer of syscalls: rather than directly instructing hardware, it wants to make fairly plain calls, perhaps via FFI, passing ordinary low-level values. So, any OS developer would first have to figure out how to get RPython to emit code that doesn't require runtime support, and also write out the low-level architecture-specific hardware-access routines.

That said, RPython is designed to translate interpreters, and fundamentally it thinks an interpreter is any function with a while-loop, so a typical OS would be a fairly good fit in terms of architecture. RPython knows the difference between high-level garbage-collected objects and low-level machine-compatible values; GC would be available and most code would be written in a statically-typable dialect of Python 2.7 that tastes like Java or OCaml.
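For readers who haven't seen the "interpreter is a function with a while-loop" shape, here is a toy sketch in ordinary Python. The opcodes and the `run` function are invented for illustration; this is not RPython-specific code and it uses nothing from the RPython toolchain, it only shows the style of simple, statically typable loop being described.

```python
# Hypothetical opcodes for a tiny stack machine.
PUSH, ADD, MUL, HALT = range(4)

def run(program):
    """Interpret a flat list of opcodes and integer operands."""
    stack = []
    pc = 0
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])  # operand follows the opcode
            pc += 1
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %d" % op)

# (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
print(result)
```

An OS main loop that dispatches on events or syscall numbers has much the same structure, which is the architectural fit mentioned above.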

The OS would be the hard part. RPython admits the same compositional flexibility as standard Python, so it should be possible to hack PyPy into something that can be composed with other RPython codebases. This wouldn't be trivial, though; PyPy in particular is tightly glued to RPython since they are developed together in a single repository, and it wasn't intended for reuse from the RPython side.

If all of that sounds daunting, and what you would like to do instead is take an existing kernel or shell with C linkage and ELF support, and extend it arbitrarily using Python code, then PyPy can help you in that direction too. Compile a libpypy and embed PyPy against your kernel, and you can then run arbitrary Python code in a fairly nice environment which supports Python-first development. Warning: while the high-level parts of this might be nice, like Python's built-in REPL tools, the low-level parts could be very nasty since this embedding interface is old and rotting, to say nothing of actually getting bare-metal code that doesn't make syscalls.

[–] [email protected] 5 points 1 week ago

I think it is best to have some understanding of how an OS works, and how Python works, before asking whether you can write an OS in Python.

Python is basically a scripting wrapper around a bunch of C functions ("builtins") and there are means of installing additional C functions if you need them. Without any of the builtins, you really can't do much of anything. For example, "print(2+2)" computes the string "4" (by adding 2+2 and converting the result to decimal), then calls a builtin to actually print the string on the console.
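A minimal sketch of that breakdown, assuming CPython: the arithmetic and the string conversion are separate steps, and the final step hands the text to an output routine. `inspect.isbuiltin` reports whether a callable is implemented in C rather than in Python.

```python
import inspect
import sys

total = 2 + 2                  # integer addition
text = str(total)              # conversion to a decimal string
sys.stdout.write(text + "\n")  # the actual output step

# On CPython, print is a C-implemented builtin, not Python code.
print_is_c = inspect.isbuiltin(print)
print(type(print).__name__, print_is_c)
```

On bare metal there is no console or `sys.stdout` to write to until something lower-level provides one.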

For an OS, you will need quite a few more C functions, mostly to control timers and task switching, the main functions of an OS. Given enough C functions though, in principle you can write an OS.
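To give a feel for what task switching looks like when Python does the high-level part, here is a sketch of a cooperative round-robin scheduler built on generators. The names `task` and `run_round_robin` are made up for this example. Note what is missing: there is no timer interrupt and no register save or restore, because the interpreter keeps each generator's state for us. Preemptive switching would need exactly the low-level support mentioned above.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one unit of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Run generator-based tasks in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task until its next yield
        except StopIteration:
            continue               # task finished, drop it
        ready.append(current)      # still alive, queue it again

log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```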

[–] [email protected] 5 points 1 week ago (1 children)

you could in micropython at least. it's not unixy but for example see https://github.com/Rybec/pyRTOS

[–] [email protected] 6 points 1 week ago* (last edited 1 week ago)

Micropython is an interpreter, implemented in C. Anything running in it wouldn't be an operating system in the sense that we usually mean. Anything incorporating it wouldn't satisfy OP's goal of being "only Python".

If a CPU were developed that used Python bytecode as its official instruction set, perhaps using micropython implemented as microcode, then it might work.

[–] csm10495 5 points 1 week ago* (last edited 6 days ago)

Warning: talking out of my butt a bit so take with a grain of salt.

I wonder if you could look at MicroPython. You could implement a Unix-like world on top of MicroPython, then use MicroPython as the layer where a normal OS would be.

It would be miserable and likely impossible to be fully unix compliant but could be a fun thing to play with. I would be amazed if it ever somehow could run native unix binaries.

[–] [email protected] 3 points 1 week ago

What you are looking for is some way to shortcut the process of learning to write an operating system by re-using your existing knowledge of Python.

(I'm not judging that; I understand why you want to do it)

The simple truth is that there is no way to do that. Any solution that involves using Python in a kernel would cost you more in terms of complexity and time than learning C would.

It is rarely worth it to use a language outside of the domains that it is normally used for.

[–] [email protected] 2 points 1 week ago

You could do some of it in Python, but some stuff needs low level access to registers, e.g. trap handlers and context switching.

Should you do that? Absolutely fucking not. It would be hilariously slow and inefficient. Hundreds of times, maybe thousands of times slower than C/C++ kernels.