What?
I want to share an experience where I started writing code in a way I thought would help me later, only to figure out that I was shooting myself in the foot and creating complexity for no value.
Again… What?
I’ve muddled through attempts to develop a minimalistic toy OS before. Various attempts got only so far, but I never really knew the path to where I wanted to go. Or even knew where I wanted to go. The furthest I got was a system that could parse and load ELF binaries, but it fell apart the moment I tried to load multiple binaries, even though I thought I had tested my multi-threading code. The problem was that the system was so heavily based on example code that I’d never had to debug at a low enough level to understand where I was. This time, the only example code I’ve taken is the bare minimum to make a multiboot-compliant image. Everything else I’m writing from first principles.
One of the design decisions I’ve made is that my kernel will be higher-half on a 32-bit x86 platform. I’m not particularly focused on making it portable to other platforms; my aspiration is to make a functional system, which should help me design a new, portable kernel further down the line. I’m trying to write code that, while perhaps simple and slow, won’t fundamentally break later. While writing the code to bootstrap page tables and map the kernel into the higher-half address space, a thought came to mind. I was statically allocating space in the .bss section for my initial page directory and page tables. My kernel will keep growing, and I’ll probably reach a point where it grows beyond the initial allocation of page tables. That will cause problems that might be difficult to troubleshoot. It’ll probably either look like garbage being written to memory when the page-table iteration inevitably overflows or, at best, page faults when parts of my kernel execute. So, I decided that I wouldn’t progress until I’d solved this problem in a graceful manner.
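For reference, the static allocation looks something like this minimal C sketch (the names and the table count are placeholders, not necessarily what’s in my kernel): zero-initialised arrays land in .bss, and the aligned(4096) attribute satisfies the x86 requirement that page directories and page tables be 4KB-aligned.

```c
#include <stdint.h>

#define PAGE_SIZE         4096
#define ENTRIES_PER_TABLE 1024
#define BOOT_PAGE_TABLES  4    /* placeholder count - guessing this number
                                  is exactly the problem described above */

/* Zero-initialised, so these land in .bss rather than bloating the
 * binary; aligned(4096) keeps them page-aligned as the MMU requires. */
uint32_t boot_page_directory[ENTRIES_PER_TABLE]
    __attribute__((aligned(PAGE_SIZE)));
uint32_t boot_page_tables[BOOT_PAGE_TABLES][ENTRIES_PER_TABLE]
    __attribute__((aligned(PAGE_SIZE)));
```

Each table is exactly one 4KB page (1024 entries of 4 bytes), which is what makes the later arithmetic tidy.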
My initial thought was to calculate the number of page tables I would require at run-time, parse the memory mapping table, and find space for my page tables that doesn’t conflict with the kernel, the multiboot structures, any modules I’ve loaded, and I’m sure this list would grow. Since I haven’t initialised paging yet, I can’t jump to C code. A fun quirk of building with GCC is that certain functions must always be available - memset, for example. I can’t emit a symbol for memset in two places to account for pre- and post-paging initialisation. So every time I think of something in memory that I need to avoid when writing my initial page tables, I’d need to write the check in assembly, and then write it again in C when I initialise my physical page allocator. It’s a problem that would continue to grow.
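For context on that quirk: GCC is free to emit calls to memset (and memcpy, memmove, memcmp) even in freestanding code - zeroing a struct is a classic trigger - so the kernel has to supply one. A byte-at-a-time version is all it takes. I’ve named it kmemset here purely so the sketch can link against a hosted libc; in the kernel the symbol would just be memset.

```c
#include <stddef.h>

/* Named kmemset only so this sketch links in a hosted environment;
 * a kernel would export it as memset, because GCC may emit memset
 * calls even under -ffreestanding (e.g. when zeroing a struct). */
void *kmemset(void *dest, int value, size_t count)
{
    unsigned char *p = dest;
    while (count--)
        *p++ = (unsigned char)value;
    return dest;
}
```

Slow, but correct - and speed is explicitly not the goal here.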
I worked on this code for a while and had a functioning prototype that I didn’t end up liking, because it would use the lowest available addresses - which I may want to keep free for the various buffers used to interface with fun ISA cards, like Sound Blasters and NICs. (Yes, in 2025, I still want to write code that targets 486-class machines.) I hadn’t worked on it in a couple of months, but I sat down today to finally re-write this code I wasn’t happy with. I thought: maybe I should try taking memory from the top instead? Maybe I shouldn’t search for individual frames, but rather claim the entire block of memory all at once? I started searching for answers to some stray questions to validate this idea - where are multiboot structures allowed to be loaded, where can the memory map be, is it safe to assume it’ll be in order - and I got sidetracked, got ahead of myself, and started reading about multi-tasking again.
While doing so, another question popped into my brain. When I eventually have multi-tasking implemented, I’m going to have multiple page directories. It would be painful if, each time the kernel’s portion of the address space changed, I had to hunt down and modify every page directory that’s been allocated. What can I do to solve that? The logical answer I came up with was pre-allocating all the page tables my kernel could ever want for itself, so each page directory is inherently always up-to-date, since I’d only ever need to modify the contents of the shared page tables. I’d need to allocate 1MB of memory for this - the kernel’s 1GB of higher-half address space needs 256 page tables, at 4KB each - but, despite knowing I want to target lower-spec machines, I don’t own any with <8MB and the ones I’d target all have at least 16MB. So this is an acceptable compromise for a system that’s only meant to be a toy, never a real production system. Great, I’ve solved a future problem! And then it hit me. Having reached this conclusion, I no longer need to dynamically allocate my page tables. My kernel fundamentally cannot require more than 256 page tables to contain itself, or it’d have no room left to allocate anything for itself at runtime. So I can allocate 1MB of space in my .bss section for page tables, and be done with it until I have a physical memory allocator.
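The trick can be sketched in hosted C so it actually runs (names are placeholders, and where a real page directory entry holds a physical address OR’d with flag bits, this stand-in just stores a pointer): every page directory’s upper 256 entries point at the same 256 pre-allocated page tables, so editing a shared table updates every address space at once. The arithmetic: the kernel’s 1GB above 0xC0000000 spans PDE indices 768–1023, each page table maps 4MB, and 256 tables × 4KB = 1MB.

```c
#include <stdint.h>

#define ENTRIES         1024
#define KERNEL_PDE_BASE 768   /* 0xC0000000 >> 22: first higher-half PDE */
#define KERNEL_TABLES   256   /* 1GB / 4MB per table; 256 * 4KB = 1MB of .bss */

/* The fixed pool of kernel page tables, shared by every address space. */
uint32_t kernel_tables[KERNEL_TABLES][ENTRIES]
    __attribute__((aligned(4096)));

/* Hosted stand-in for a page directory: entries are plain pointers here,
 * where the real structure stores (physical address | flags). */
typedef uintptr_t page_directory_t[ENTRIES];

/* Wire a fresh page directory's kernel half to the shared tables.
 * Because every directory references the same tables, mapping a new
 * kernel page later means writing one page-table entry - no existing
 * directory ever needs to be touched again. */
void map_kernel_half(page_directory_t dir)
{
    for (int i = 0; i < KERNEL_TABLES; i++)
        dir[KERNEL_PDE_BASE + i] = (uintptr_t)kernel_tables[i];
}
```

After two directories are initialised this way, a write into kernel_tables is immediately visible through both of them, which is the whole point.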
Truthfully, I’m a little disappointed that I’ve not ended up writing the code that I thought I would. But it’s nice to have solved the problem in such a simple way.
What, what, what??!?
Yeah, I know, this was a big ramble. The point is to demonstrate that, sometimes, requirements aren’t what you think. That it’s very easy to over-engineer. And that designing good code the first time is hard.