HelenOS Raspberry Pi port

Preface

A year ago, I decided to put my Raspberry Pi to good use and actually work on what I had bought it for: learning ARM assembly and getting familiar with a pure µkernel operating system by porting HelenOS to the Pi. Once I got the system to run at all, I started writing this document. That was, in hindsight, a mistake. Getting the code out first is generally a better idea. In the middle of writing this, I glanced at the HelenOS bug tracker and realized someone else (ben_g) had been working on the same thing and already had some code out. His port, of course, got finished a few days after I had gotten mine working, but that didn't matter any more: the other code got out there first.

I contacted him (I believe on IRC), congratulated him on his code (cleaner than mine, with proper headers and whatnot), commented on the situation, suggested a few changes to his code based on what I had learned from mine (I believe they had to do with memory map exclusions and using high memory) and moved on to other projects.

About a year later, I decided to tie things up by resurrecting and finishing this document, as it would have been wasted otherwise. It ended up shorter than the original plan, because at this point I do not remember a good portion of the debugging I did. I do think, however, that's for the best, as looking at it, it does feel longer than it should really be.

Skeleton

To get an idea of the basic structure needed to add a new platform, I started by looking at the existing arm32 platforms. Under kernel/arch/arm32/src/mach there's a directory for each of the arm32-based boards that HelenOS supported at the time. Looking inside, the purpose of each seemed to be implementing the functions declared in a structure type called arm_machine_ops, which deal with board-specific details. I then went up to the parent directory to locate the related Makefile, hoping to see how each one was selected.

ifeq ($(MACHINE),integratorcp)
    ARCH_SOURCES += arch/$(KARCH)/src/mach/integratorcp/integratorcp.c
endif

And so, I added a similar entry for the "raspberrypi" machine, created the appropriate directories and realised I was doing too much work without having even tested building HelenOS for one of the already supported platforms. Doing a "make config" in the root shows a pretty menu where a target platform and some other options are easily selected. I selected and built "integratorcp", which went well. Good. I immediately knew I wanted to add "raspberrypi" to the config menus as a way to start. This implied adding a machine type:

% Machine type
@ "gta02" GTA02 / Neo FreeRunner
@ "integratorcp" Integratorcp
@ "beagleboardxm" BeagleBoard-xM
@ "beaglebone" BeagleBone
@ "raspberrypi" RaspberryPi
! [PLATFORM=arm32] MACHINE (choice)

Then we know Raspberry Pi's CPU:

% CPU type
@ "arm1176jzf_s" ARM1176JZF-S
! [PLATFORM=arm32&MACHINE=raspberrypi] PROCESSOR (choice)

And since we know that's using the ARMv6 ISA:

# Add more ARMv6 CPUs
% CPU type
@ "armv6" ARMv6
! [PLATFORM=arm32&(PROCESSOR=arm1176jzf_s)] PROCESSOR_ARCH (choice)

Apparently, this causes the build to use gcc -march=armv6, and there's a collection of "ifdefs" for PROCESSOR_ARCH_armv6 in the code that I wanted to benefit from.

Running "grep -Hri integratorcp *" at the root revealed a few other files with machine-specific code, but the skeleton looked pretty much complete. I tried a build and it completed, but that's it. I didn't even bother to try and run the resulting image, as I knew it would be a pointless effort at this point.

I did, however, look into the image. It expects to be loaded whole into memory and has the loader's entry point at the very start. The loader unpacks the rest of the image into RAM, does a little setting up and then jumps into the kernel's entry point. I knew that if I wanted to get anywhere, I needed to make this loader work.

Loader

The loader, unsurprisingly, is pretty arch-specific. The relevant code lives under boot/arch/arm32/. Reading some of it, it became clear that asm.S holds the entry point, which sets up the stack and calls bootstrap() from main.c. I wanted to see if the rpi was getting this far. To find out, I added the following magic right at the start of bootstrap():

*((int *)(0x20200028)) = 1<<16; //LED ON.

This pretty much turns on the green LED on the rpi board by hitting the relevant MMIO register directly. I then built the image and uploaded it to the Pi, which took some 15 minutes, as HelenOS debug images build to some 10MB, which then has to be transferred through a 115200 bps serial port. Of course, the LED did not turn on. Thankfully, at the root of boot/ there was an image.map, which the build system generates automatically. There I quickly confirmed the code expected to run elsewhere in memory. This had to be set somewhere, and that place is include/arch.h.

#ifdef MACHINE_gta02
#define BOOT_BASE       0x30008000
#elif defined MACHINE_beagleboardxm
#define BOOT_BASE       0x80000000
#elif defined MACHINE_beaglebone
#define BOOT_BASE       0x80000000
#elif defined MACHINE_raspberrypi
#define BOOT_BASE       0x00008000
#else
#define BOOT_BASE       0x00000000
#endif

The rpi firmware loads a kernel, generally Linux, from "kernel.img" at 0x8000 and jumps to that same address. This hasn't always been the case, however, as it used to load at 0x0 and jump to 0x8000. By default there are exception vectors at 0x0, and with the old image format these vectors could be preloaded as part of the image. The serial bootloader I'm using relocates itself and runs from higher in memory, downloading the kernel to the normal address, meaning it should be transparent for these purposes. So, to the best of my knowledge, the boot base had to be 0x8000. The LED turning on confirmed as much.
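As a side note, the LED address used above is not arbitrary. A minimal sketch of how it decomposes, assuming the original Pi's BCM2835 memory map as seen from the ARM core and a status LED wired active-low to GPIO 16 (the helper names are made up for illustration and are not HelenOS code):

```c
#include <assert.h>
#include <stdint.h>

/* BCM2835 physical addresses as seen by the ARM core (original Raspberry Pi). */
#define PERIPHERAL_BASE  UINT32_C(0x20000000)  /* start of the peripheral window */
#define GPIO_OFFSET      UINT32_C(0x00200000)  /* GPIO block inside that window */
#define GPCLR0_OFFSET    UINT32_C(0x28)        /* "clear output" register, pins 0-31 */
#define ACT_LED_PIN      16u                   /* assumption: LED on GPIO 16, active low */

/* Physical address of a GPIO register given its offset in the GPIO block. */
static uint32_t gpio_reg_addr(uint32_t reg_offset)
{
    assert(reg_offset < 0x1000u);  /* the registers fit in one 4 KiB page */
    return PERIPHERAL_BASE + GPIO_OFFSET + reg_offset;
}

/* Bit mask selecting one pin in a 32-pin set/clear register. */
static uint32_t gpio_pin_mask(uint32_t pin)
{
    assert(pin < 32u);
    return UINT32_C(1) << pin;
}
```

With these definitions, gpio_reg_addr(GPCLR0_OFFSET) gives 0x20200028 and gpio_pin_mask(ACT_LED_PIN) gives 1<<16, the two constants in the one-line snippet. Because the LED is assumed active-low, writing to the clear register is what lights it.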

Although turning an LED on is very nice, I noticed there were calls to printf all over main.c. The next step had to be getting all that text through the UART. At src/putchar.c:

#ifdef MACHINE_raspberrypi
static void scons_sendb_rpi(uint8_t byte)
{
    /* Wait for the uart to be ready */
    while (*((volatile uint8_t *) 0x20201018) & (1 << 5));
    /* Send the byte */
    *((volatile uint8_t *) 0x20201000) = byte;
}
#endif

This does pretty much what the code comments describe. Normally, it would be necessary to set up the serial port before it can be used. However, the not-so-transparent-now UART-based bootloader that runs on my rpi before all this code had already taken care of that. Code to properly set the UART up would have to be implemented at some point in the future.
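Part of that missing setup is programming the baud rate. The following is only a sketch of the divisor arithmetic for a PL011-style UART, not the eventual HelenOS driver. The 3 MHz UART clock used in the usage note is an assumption about the firmware default, and the type and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* PL011 baud rate divisor, split across its two registers. */
typedef struct {
    uint32_t ibrd;  /* integer part (register offset 0x24) */
    uint32_t fbrd;  /* fractional part in 1/64ths (register offset 0x28) */
} pl011_divisor_t;

/* divisor = uart_clk / (16 * baud), kept with 6 fractional bits. */
static pl011_divisor_t pl011_divisor(uint32_t uart_clk_hz, uint32_t baud)
{
    assert(baud > 0);
    /* Multiply by 64/16 = 4 first, then round to the nearest 1/64th. */
    uint64_t scaled = (4u * (uint64_t) uart_clk_hz + baud / 2) / baud;
    pl011_divisor_t d = {
        .ibrd = (uint32_t) (scaled >> 6),
        .fbrd = (uint32_t) (scaled & 0x3f),
    };
    return d;
}
```

For an assumed 3 MHz clock and 115200 baud this yields an integer part of 1 and a fractional part of 40. A real init routine would also have to disable the UART, set the line control register and re-enable it, which is left out here.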

At this point, the debug text was properly being piped through the serial port. Turning on the LED stopped being necessary, so I removed that code. Judging from the debug output, the loader seemed to be doing a fine job, so I went to look at the kernel's entry point. The code looked like this:

<...>
ldr sp, =temp_stack

bl arch_pre_main

Where arch_pre_main was the first C function. So I added my turn-the-LED-on snippet to that function, in kernel/arch/arm32/src/arm32.c. But the LED wouldn't turn on. I noticed the bootloader was setting up the MMU with an initial page table. I thought that any problem that had anything to do with the MMU would be a pain to debug and fix, and decided to just make the bootloader jump to the address where the serial bootloader should still be, to confirm things were reasonably sane. They still were, and I had a loop on my hands, which was terminated by the server half of the UART bootloading solution failing. It apparently ran out of file descriptors for the kernel file, which it was failing to release. I fixed that bug and sent a pull request to its developer.

It occurred to me that, like the bootloader, the kernel was probably expecting to be loaded into a different memory address than it was. So I added some extra debug with the targets in the decompression stage of the bootloader:

printf("%s@%p  ", components[i - 1].name, dest[i - 1]);

int err = inflate(components[i - 1].start, components[i - 1].size, dest[i - 1], components[i - 1].inflated);
<...>
printf("Booting the kernel at %p...",entry);

And by cross checking with the kernel's map, it became clear there was an address mismatch. This was easily addressed:

#ifdef MACHINE_raspberrypi
#define BOOT_OFFSET    (0xa00000)
#else
#define BOOT_OFFSET    (BOOT_BASE + 0xa00000)
#endif

And then the LED turned on. Good. Since I knew the processor was reaching the kernel at this point, I decided it was a good time to make the image smaller, as it had been taking 15 minutes per test, which is a bit too much. In boot/Makefile.common, I stopped embedding the initrd, as I wouldn't need it until I got init to run, which would still take a while.

$(INIT_TASKS) #\
#$(INITRD).img

That took care of it. The image shrank to 1MB, which at 115200 bps means a noticeable but acceptable delay. The LED was still turning on.
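The upload times quoted above follow from simple arithmetic. A rough sketch, assuming 8N1 framing (10 bits on the wire per byte), treating the sizes as exactly 10 MiB and 1 MiB, and ignoring any protocol overhead from the serial bootloader:

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire per payload byte: 1 start + 8 data + 1 stop (assumed 8N1). */
#define BITS_PER_BYTE_8N1 10u

/* Whole seconds needed to push image_bytes through a serial line at baud bps. */
static uint64_t transfer_seconds(uint64_t image_bytes, uint32_t baud)
{
    assert(baud > 0);
    return image_bytes * BITS_PER_BYTE_8N1 / baud;
}
```

Under those assumptions the 10 MiB debug image needs about 910 seconds, a little over 15 minutes, while the 1 MiB image without the initrd needs about 91 seconds.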

Find crash point

At this point, the microkernel startup code was running. I removed the early LED turn-on and quickly located the function that takes care of most of the startup, which is main_bsp_separated_stack(). There I placed the code to turn on the LED at the start (works), then at the end (no), then halfway and so on, repeatedly halving the range to find where the microkernel startup was hanging. as_init() seemed to be the culprit. Further investigation determined that's the address space initialization code, which contains a switch to a new page table. I was afraid of trouble with the MMU, as I knew pretty much nothing about ARM's MMU. After a while it hit me that, obviously, after the switch I would need to map the MMIO region that the LED code used. It looked like that wouldn't be possible until after km_non_identity_init(), which was several lines after as_init() in the startup function. This seemed like it'd take some work to debug if km_map didn't work.

ioport32_t *led = (ioport32_t*)km_map(0x20020028, 4, PAGE_WRITE | PAGE_NOT_CACHEABLE);
pio_write_32(led, 1<<16);

And it didn't work. It looked like it would be a pain to debug, so I went on to work on something that would make it easier: getting the UART to work for some very early debug output.

UART

As a start, I just copied the driver for another uart (arm926_uart.c) and added the same minimal trickery that was used for the loader.

while ((pio_read_8((ioport8_t*) 0x20201018)) & (1 << 5));
pio_write_8((uint8_t *) 0x20201000, ch);

Making the kernel use it was a matter of:

stdout_wire(&rpi.uart.outdev);

From here on, I had serial debug. By adding debug statements all over main_bsp_separated_stack(), I confirmed what I already knew: It'd stop working right after setting the memory map.

By this point, the time it took to push the HelenOS image over the UART was getting really annoying, so I altered the image generation not to include the RAM disk, as that would only get used once init was running.

Page tables

As an experiment, I made the pagetable switching code set the old map rather than the new, just to make sure the new map was to blame, and not the switching code. Execution went past that point, so the new map was to blame.

After some inspection of the kernel's support for ARM pagetables, I saw that there were ARMv4 and ARMv6 versions of pagetable, so it would be expected to just work with that ARMv6 support. It just didn't.

Armed with the serial debugging, I proceeded to dump the old/new pagetables to inspect them with okteta, KDE's hexadecimal editor. The new pagetable would look like a mess, but I learned something about the one prepared by the bootloader: I was wrong about "identity" mapping. I got this stupid idea that identity would mean 0x0 and 0x80000000 onwards to be the same thing: The first half of virtual memory points to the first half of physical memory, and the second half of virtual memory points to the first half of physical memory again. It's much simpler. Virtual address = Physical address. That's all. That's obviously what identity is. Why would I have thought otherwise? Sometimes, it's the simplest things.
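To make "virtual address = physical address" concrete, here is a small sketch of an identity mapping built from 1 MiB section descriptors. It is an illustration only: the access-permission bits are an arbitrary choice, the function names are hypothetical, and the flags the HelenOS loader really uses are not shown in this post.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT  20u                    /* one section covers 1 MiB */
#define DESC_SECTION   UINT32_C(0x2)          /* descriptor bits [1:0] = 0b10 */
#define DESC_AP_RW     (UINT32_C(0x3) << 10)  /* AP = 0b11, illustrative choice */

/* First-level descriptor that maps section number `index` onto itself. */
static uint32_t identity_section_entry(uint32_t index)
{
    assert(index < 4096u);  /* 4096 sections of 1 MiB span the 4 GiB space */
    return (index << SECTION_SHIFT) | DESC_AP_RW | DESC_SECTION;
}

/* Physical address that a section descriptor yields for virtual address va. */
static uint32_t section_translate(uint32_t desc, uint32_t va)
{
    assert((desc & 0x3u) == DESC_SECTION);
    /* Top 12 bits come from the descriptor, low 20 bits from the address. */
    return (desc & 0xfff00000u) | (va & 0x000fffffu);
}
```

Filling all 4096 first-level entries with identity_section_entry(i) gives a table where every address translates to itself, so 0x20200028 stays 0x20200028 and 0x80008000 stays 0x80008000. Nothing is aliased between the lower and upper halves of the address space.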

Then, by inspecting how other machines based on ARMv7 (which use the ARMv6 pagetable format) were doing things, I concluded that my kernel should really be somewhere above 0x80000000, not at 0x0. So I went back to the loader and the ld script to make things work with the kernel above, rather than below. It still didn't work. I was sure I was missing all sorts of memory setup work, and so things wouldn't work. The truth is I didn't really want to study the pagetable dumps in more detail...

So I looked more into the code for other machines. One important thing that I should have been doing and wasn't was excluding some regions (MMIO regions) from being usable as memory. As a start, I went with a pretty conservative setup:

void rpi_get_memory_extents(uintptr_t *start, size_t *size)
{
    *start = PHYSMEM_START_ADDR;
    *size = 0x08000000; //128MB FIXME: should likely depend on RAM reserved for GPU use.
}

void rpi_frame_init(void)
{
    frame_mark_unavailable(0x80000, 8);
}

And of course, it just wouldn't work. Damn. That'd mean really looking into page tables.

Dumping and parsing

To get anywhere, I would need a better way to parse pagetables than staring very intently into okteta. So I threw together some Python to do it, using ARM's documentation as a reference.

That code doesn't do much, but was terribly useful. It can be found in: https://github.com/rvalles/pyarmptparse

It turns out the pagetable generated by SPARTAN relies purely on extended pagetables. That's why it looked cryptic in Okteta at a glance, while the loader's one looked sane.
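The distinction that matters when reading such a dump is the type field in the low two bits of each first-level descriptor. The sketch below is not the Python tool linked above, just a C illustration of the same kind of decoding, with hypothetical names; it assumes the dump is an array of 32-bit first-level descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    L1_FAULT   = 0,  /* 0b00: unmapped, any access faults */
    L1_COARSE  = 1,  /* 0b01: points to a second-level (coarse) table */
    L1_SECTION = 2,  /* 0b10: maps 1 MiB directly */
    L1_OTHER   = 3   /* 0b11: fine table or reserved, depending on the format */
} l1_type_t;

static l1_type_t l1_type(uint32_t desc)
{
    return (l1_type_t) (desc & 0x3u);
}

/* Physical base a descriptor points at: 1 KiB-aligned table or 1 MiB section. */
static uint32_t l1_target(uint32_t desc)
{
    switch (l1_type(desc)) {
    case L1_COARSE:
        return desc & 0xfffffc00u;
    case L1_SECTION:
        return desc & 0xfff00000u;
    default:
        return 0;
    }
}

/* Tally descriptor types across a dumped table; counts[] is indexed by l1_type_t. */
static void l1_histogram(const uint32_t *table, size_t entries, size_t counts[4])
{
    assert(table != NULL && counts != NULL);
    for (size_t t = 0; t < 4; t++)
        counts[t] = 0;
    for (size_t i = 0; i < entries; i++)
        counts[l1_type(table[i])]++;
}
```

A table made only of sections, like the loader's, reads almost directly as addresses in a hex editor. One whose mapped entries are all coarse pointers just looks like a list of unrelated table addresses, and each of those second-level tables has to be followed before the mapping makes sense.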

Around here, I was poking MMIO registers directly, because I didn't trust km_map to work. This was a silly idea once virtual memory is set up, as someone needs to map these I/O registers where they should be. I thought that, as a quick hack, editing the pagetable directly to map the MMIO registers for the LED would make things work, but it didn't. Annoyed, I decided to leave the code alone for a while and just read some ARM documentation. That was a really good idea.

The actual problem: Compat mode flag on reset

As I was calmly reading the ARM documentation on page tables, I read again that the ARMv6 architecture supports two pagetable formats: the old one (used in ARMv4/ARMv5) and the new one, used from ARMv6 onwards. Then I looked into the actual differences: most notably they added a no-execute flag, and the change was really incompatible only with coarse pages, which are used in the new pagetable but not in the one set up by the loader. I knew I was onto something, so I checked the control registers and yes, the CPU does boot in "compatibility mode", using the old format. On ARMv7, of course, the old format isn't supported anymore, so the CPU uses the new format by default.

So, I wrote some code to enable it:

#ifdef PROCESSOR_ARCH_armv6
uint32_t control_reg = SCTLR_read();
control_reg |= SCTLR_SUBPAGE_AP_FLAG; //ARMv6+ subpage table format
//control_reg |= SCTLR_UNALIGNED_EN_FLAG;
SCTLR_write(control_reg);
#endif

But it still didn't work. Thankfully, I noticed the problem quickly: I had somehow made the LED address incorrect at some point, so all the way through it couldn't possibly have worked! Here's the km_map version with the wrong address:

ioport32_t *led = (ioport32_t*)km_map(0x20020028, 4, PAGE_WRITE | PAGE_NOT_CACHEABLE);
pio_write_32(led, 1<<16);

With the correct address:

ioport32_t *led = (ioport32_t*)km_map(0x20200028, 4, PAGE_WRITE | PAGE_NOT_CACHEABLE);
pio_write_32(led, 1<<16);

Then tested. It worked. Yay. Just to make sure, I disabled the code to enable new pagetable format and tried again. It really was necessary. It's just silly that this mistake on the LED lighting code had been sitting there undetected for such a long time.
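The format switch above comes down to a single bit. As a sketch, assuming SCTLR_SUBPAGE_AP_FLAG corresponds to the XP bit (bit 23) of the ARMv6 CP15 c1 control register, which is the bit ARM documents for selecting the ARMv6 descriptor format; the macro and function names below are hypothetical and not taken from HelenOS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv6 CP15 c1 control register: the XP bit selects the page table format. */
#define SCTLR_XP_BIT  (UINT32_C(1) << 23)

/* True when the ARMv6 format is active (subpage AP bits disabled). */
static bool uses_armv6_format(uint32_t sctlr)
{
    return (sctlr & SCTLR_XP_BIT) != 0;
}

/* Value to write back to switch formats, leaving all other bits alone. */
static uint32_t select_armv6_format(uint32_t sctlr)
{
    assert(sizeof(sctlr) == 4);  /* the control register is 32 bits wide */
    return sctlr | SCTLR_XP_BIT;
}
```

On the target, the value would be read with "mrc p15, 0, <Rd>, c1, c0, 0" and written back with the matching mcr, which is presumably what SCTLR_read() and SCTLR_write() wrap. The functions here only do the bit manipulation, so they can be checked on any host.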

Then I adapted the serial port code to work with km_map:

static volatile ioport8_t *myuart;

static void bcm2835_uart_putchar(outdev_t *dev, wchar_t ch)
{
        bcm2835_uart_t *uart = dev->data;
        /* Wait for the uart to be ready */
        while (pio_read_8(myuart) & (1 << 5));
        /* Send the byte */
        pio_write_8((uint8_t *) uart->regs, ch);
}

static outdev_operations_t bcm2835_uart_ops = {
        .write = bcm2835_uart_putchar,
        .redraw = NULL,
};

bool bcm2835_uart_init(
        bcm2835_uart_t *uart, inr_t interrupt, uintptr_t addr, size_t size)
{
        outdev_initialize("bcm2835_uart_dev", &uart->outdev, &bcm2835_uart_ops);
        uart->outdev.data = uart;
        uart->regs = (void*)km_map(0xA0201000, PAGE_SIZE, PAGE_NOT_CACHEABLE);
        myuart = (volatile ioport8_t *) km_map(0xA0201018, PAGE_SIZE, PAGE_NOT_CACHEABLE);
        return true;
}

By this point, as the flow was getting as far as running init, I added the ramdisk back. Userspace was booting. Success.

Conclusions

SPARTAN is pretty clean in the way it supports different architectures and machines, and is a pleasure to work with. ARMv6 uses the old pagetable format by default. OS development on the Raspberry Pi is fun. It's a good idea to follow your gut when it's telling you to give the related documentation one good reading, rather than just using it as a reference for specific things. Release early, release often.

Roc's Blog