22 April 2026 · 12 min read · #os #osdev #limine #grub #kernel

Bootloader-Agnostic Kernels : Crafting a Universal x86-64 entry trampoline

Multiboot2 drops you in 32-bit protected mode; Limine hands you a 64-bit environment. Here is a single entry trampoline that handles both without forking the kernel.

When writing my operating system, one issue I encountered was making it boot from anything, be it Limine, GRUB2, U-Boot, or the QEMU generic loader.

The Issue #

GRUB2 (following the Multiboot2 specification) is oriented towards 32-bit x86, while Limine (with its Limine Boot Protocol) is oriented towards x86-64.

In fact, Multiboot2 was designed with 32 bits in mind. Limine was designed with 64 bits in mind, and does not support 32-bit executables.

That’s quite a problem: how do we get the system up and running when we could be entered in either 32-bit or 64-bit mode ?

x86 Modes #

x86 CPUs operate in several distinct execution modes that determine address space width, available instructions, privilege levels and memory model.

Long Mode, aka the 64-bit mode of the processor, is our goal. All x86-64 OSes (Windows, Linux, etc.) run in long mode.

On x86, the processor starts in Real Mode. It’s a relatively simple 16-bit mode of execution. Memory access is done via segmentation.

While having our instructions run is nice to see, Real Mode can only address about 1 MiB of memory, and I personally like to use every gigabyte of RAM I’ve installed, especially with the RAMmageddon.
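To make the 1 MiB ceiling concrete, here is a small sketch of how Real Mode segmentation turns a segment:offset pair into a physical address. The function name is mine, purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Real Mode address = segment * 16 + offset (both are 16-bit values). */
static inline uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* The highest reachable address, FFFF:FFFF, is only just above 1 MiB. */
static_assert((((uint32_t)0xFFFF << 4) + 0xFFFF) == 0x10FFEF,
              "Real Mode tops out slightly above 1 MiB");
```

For example, the classic boot sector location 07C0:0000 resolves to physical address 0x7C00.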

From Real Mode, we can switch to Protected Mode, a 32-bit mode available since the 80386. It’s much more feature-rich than Real Mode.

Finally, once we are in Protected Mode, we want to switch to Long Mode (Or IA-32e mode, for Intel).

Long Mode introduces 64-bit support, widens the existing registers and adds new ones (general-purpose and multimedia).

For reference, see 12.9 “Mode Switching” of the Intel SDM, vol 3. There are other modes available.

Figure: Transitions Among the Processor’s Operating Modes (from the Intel Software Developer’s Manual, March 2026 edition)

Identifying which mode we are currently in #

While knowing how to switch modes is the most useful part, we first need to know the CPU’s current state.

There seems to be no “god flag” to know exactly which state we are in.

However, we can check a few values in configuration registers.

The PE flag in CR0 tells you whether you are in Protected Mode
IA32_EFER.LME tells you whether Long Mode has been enabled, and IA32_EFER.LMA whether it is actually active (IA-32e mode)
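As a rough sketch, the check boils down to two bit tests. The function below takes the raw register values as arguments so it can be tried anywhere; actually reading CR0 and the EFER MSR needs ring 0. The enum and function names are made up for this example, and LMA only says that IA-32e mode is active, not whether the current code segment is 64-bit or compatibility mode.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE   (1u << 0)    /* Protection Enable */
#define EFER_LME (1u << 8)    /* Long Mode Enable, set by software */
#define EFER_LMA (1u << 10)   /* Long Mode Active, set by the CPU */

static_assert(EFER_LME != EFER_LMA, "enabled and active are different bits");

enum cpu_mode { MODE_REAL, MODE_PROTECTED, MODE_LONG };

/* cr0 and efer_lo are the low 32 bits of CR0 and of the IA32_EFER MSR. */
static inline enum cpu_mode classify_mode(uint32_t cr0, uint32_t efer_lo)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL;
    if (efer_lo & EFER_LMA)
        return MODE_LONG;
    return MODE_PROTECTED;    /* LME alone is not enough: paging is still off */
}
```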

Entrypoint #

My OS is formatted (for now) as a single ELF binary.

The bootloader is supposed to map the sections to memory, and jump to the entrypoint.

However, as GRUB2 and Limine expect 32-bit and 64-bit code respectively, we need separate entrypoints for each.

Note: both MB2 & Limine are configurable, meaning that you can provide a header that configures the bootloader’s behaviour

Multiboot2 #

In Multiboot2, image loading is handled with physical addresses. More specifically, as we are making an ELF binary, the p_paddr field of each program header indicates where, physically, we want that segment to be loaded.

Multiboot2 will blindly follow your ELF fields, and happily overwrite whatever reserved memory block you throw at it.

We’ll have at the very least two sections:

- a multiboot_header section that includes:
  - a magic value
  - architecture information
  - the header length
  - a few tags that customize the bootloader’s behaviour
- the code

The Multiboot2 header is quite a simple structure.

text

Offset  Type  Field Name     Note
0       u32   magic          required
4       u32   architecture   required
8       u32   header_length  required
12      u32   checksum       required
16-XX         tags           required

The magic is always: MULTIBOOT2_HEADER_MAGIC or 0xe85250d6

architecture is of course dependent on your use-case, I’ll go with MULTIBOOT_ARCHITECTURE_I386 (aka 0)

header_length is the length of the total multiboot header, including all tags. I advise you to calculate it directly in the linker script

checksum is a 32-bit unsigned value. From the doc: when added to the other magic fields (i.e. magic, architecture and header_length), it must have a 32-bit unsigned sum of zero.

The doc’s own example computes it like this:

        .long   -(MULTIBOOT2_HEADER_MAGIC + GRUB_MULTIBOOT_ARCHITECTURE_I386 + (multiboot_header_end - multiboot_header))
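The same arithmetic in C, as a quick way to sanity-check a header by hand. The helper name mb2_checksum is mine; the two constants are the values quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT2_HEADER_MAGIC     0xe85250d6u
#define MULTIBOOT_ARCHITECTURE_I386 0u

static_assert(MULTIBOOT_ARCHITECTURE_I386 == 0, "i386 is architecture 0");

/* Two's complement of the other three fields, wrapping at 32 bits. */
static inline uint32_t mb2_checksum(uint32_t architecture, uint32_t header_length)
{
    return (uint32_t)0 - (MULTIBOOT2_HEADER_MAGIC + architecture + header_length);
}
```

Whatever the header length, adding magic, architecture, header_length and the returned checksum as 32-bit unsigned values wraps around to zero, which is exactly what the bootloader verifies.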

The address tag #

TL;DR: It’s useless for ELF files

MB2 provides an address tag :

text

        +-------------------+
u16     | type = 2          |
u16     | flags             |
u32     | size              |
u32     | header_addr       |
u32     | load_addr         |
u32     | load_end_addr     |
u32     | bss_end_addr      |
        +-------------------+

From the specification:

header_addr: Contains the address corresponding to the beginning of the Multiboot2 header — the physical memory location at which the magic value is supposed to be loaded. This field serves to synchronize the mapping between OS image offsets and physical memory addresses.
load_addr: Contains the physical address of the beginning of the text segment. The offset in the OS image file at which to start loading is defined by the offset at which the header was found, minus (header_addr - load_addr). load_addr must be less than or equal to header_addr.
Special value -1 means that the file must be loaded from its beginning.
load_end_addr: Contains the physical address of the end of the data segment. (load_end_addr - load_addr) specifies how much data to load. This implies that the text and data segments must be consecutive in the OS image; this is true for existing a.out executable formats. If this field is zero, the boot loader assumes that the text and data segments occupy the whole OS image file.
bss_end_addr: Contains the physical address of the end of the bss segment. The boot loader initializes this area to zero, and reserves the memory it occupies to avoid placing boot modules and other data relevant to the operating system in that area. If this field is zero, the boot loader assumes that no bss segment is present.

Simply put: you choose where you want each part of the image to be loaded in physical memory.

After reading through GRUB2’s source, it turns out that for ELF images the placement comes from each segment’s p_paddr instead. So the address tag looks a bit useless there. And after testing, it is.

The entry address tag #

This one, however, is not useless at all.

text

        +-------------------+
u16     | type = 3          |
u16     | flags             |
u32     | size              |
u32     | entry_addr        |
        +-------------------+

It allows you to specify the physical address of your entry point.

While for ELF, the e_entry field of the ELF header usually defines it, you may want to override it for GRUB2/MB2-specific behaviour.
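For illustration, here is the tag written as a C struct. The struct and helper names are hypothetical; the field order and the type value come straight from the layout above. The stride helper reflects the spec’s rule that every tag starts on an 8-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

#define MB2_HEADER_TAG_ENTRY_ADDRESS 3u   /* "type = 3" in the layout above */

/* Hypothetical name; fields follow the layout shown above. */
struct mb2_entry_address_tag {
    uint16_t type;
    uint16_t flags;
    uint32_t size;
    uint32_t entry_addr;   /* physical address of the 32-bit entry point */
};

static_assert(sizeof(struct mb2_entry_address_tag) == 12, "tag is 12 bytes");

static inline struct mb2_entry_address_tag mb2_make_entry_tag(uint32_t entry_phys)
{
    struct mb2_entry_address_tag tag = {
        .type       = MB2_HEADER_TAG_ENTRY_ADDRESS,
        .flags      = 0,   /* required; bit 0 would mark the tag as optional */
        .size       = sizeof(struct mb2_entry_address_tag),
        .entry_addr = entry_phys,
    };
    return tag;
}

/* Distance to the next tag: the size rounded up to a multiple of 8. */
static inline uint32_t mb2_tag_stride(uint32_t size)
{
    return (size + 7u) & ~7u;
}
```

In a real header the tag lives in the assembly file and entry_addr is filled in by the linker. The point here is only the layout: 12 bytes of payload, then 4 bytes of padding before the next tag.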

Other tags #

There are a few other tags of interest, namely the framebuffer tag which will make the bootloader give you access to a framebuffer (via a pointer).

Limine #

Reading the spec, a few things are different when it comes to Limine.

The protocol mandates executables to load themselves at or above 0xffffffff80000000. Lower half executables are not supported. For relocatable executables asking to be loaded at address 0, a minimum slide of 0xffffffff80000000 is applied.

Furthermore, Limine will load the executable at the requested VMA, as long as it is at or above 0xffffffff80000000.

The physical memory placement is not guaranteed: the only guarantee is that the executable is physically contiguous.

See the Limine Boot Protocol, “Memory Layout At Entry”.

Two paths, one goal #

Now that we know the lay of the land, let’s think about it differently.

Limine is actually the easy one. It walks in already in Long Mode, paging on, kernel already mapped at the higher half. An entry point request tells it which 64-bit function to jump to instead of the ELF e_entry :

__attribute__((used, section(".limine_requests")))
static volatile struct limine_entry_point_request entry_point_request = {
    .id       = LIMINE_ENTRY_POINT_REQUEST_ID,
    .revision = 0,
    .response = NULL,
    .entry    = limine_entry,        // a plain 64-bit C function
};

GRUB2 and the generic ELF/QEMU path, on the other hand, drop us in 32-bit protected mode. They come in through a different door.

So here’s the idea : two 32-bit doors, one shared path.

One door for Multiboot2 (_mb2_entry32)
One door for “someone loaded my ELF and jumped to e_entry” (_elf_entry32)

Both doors lead to the same hallway: a small piece of assembly that walks the CPU up from 32-bit to Long Mode, then hands off to the kernel. One ELF binary, three entry symbols, but only one of them does any real work.

There’s one detail to decide first. Once we’re in 64-bit, how do we know who sent us ? We’ll leave a breadcrumb : each 32-bit door stuffs a value in EDI (which becomes RDI in 64-bit). Multiboot2’s magic for a MB2 boot, 0 for “no idea, generic boot”. We read it on the other side to dispatch.

The higher-half trick #

Before any assembly, we need to talk about the linker script, because it’s what makes the whole thing possible.

Our kernel lives in the higher half : every symbol is linked at a virtual address >= 0xffffffff80000000. That’s where it’ll run once paging is on. But the 32-bit trampoline runs before paging is enabled : there is no higher half yet, the CPU only sees physical addresses.

The fix is to link high but load low. The VMA (where the code thinks it lives) is in the higher half, the LMA (where the bootloader actually drops the bytes) is down at 0x200000.

KERNEL_PHYS_BASE = 0x200000;
KERNEL_VIRT_BASE = 0xffffffff80000000;

. = KERNEL_VIRT_BASE + KERNEL_PHYS_BASE;

.multiboot2_header : AT(KERNEL_PHYS_BASE)
{
    KEEP(*(.multiboot2_header))
} :boot

AT(...) is the key : it sets the load address independently from the virtual one.

Now, the 32-bit code needs to reference things by their physical address (the stack, the page tables, the GDT). Hardcoding them would be ugly and fragile, so we let the linker compute them for us :

phys_boot_pml4      = boot_pml4      - KERNEL_VIRT_BASE;
phys_gdt64_pointer  = gdt64_pointer  - KERNEL_VIRT_BASE;
phys_trampoline64   = trampoline64   - KERNEL_VIRT_BASE;

Every phys_X is just X - KERNEL_VIRT_BASE. This is the same idea as the __pa() / phys_startup_32 trick in Linux. The 32-bit code uses phys_* everywhere, the 64-bit code uses the normal high symbols.
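The arithmetic is simple enough to sketch in C. The two constants are the ones from the linker script; the helper names are mine and only illustrate what the linker computes for each phys_* symbol.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_PHYS_BASE 0x200000ull
#define KERNEL_VIRT_BASE 0xffffffff80000000ull

/* The first linked address sits at VIRT_BASE + PHYS_BASE and loads at PHYS_BASE. */
static_assert((0xffffffff80200000ull - KERNEL_VIRT_BASE) == KERNEL_PHYS_BASE,
              "link high, load low");

/* What "phys_X = X - KERNEL_VIRT_BASE" does for a single symbol. */
static inline uint64_t kernel_virt_to_phys(uint64_t virt)
{
    return virt - KERNEL_VIRT_BASE;
}

/* The inverse, handy once the kernel runs in the higher half. */
static inline uint64_t kernel_phys_to_virt(uint64_t phys)
{
    return phys + KERNEL_VIRT_BASE;
}
```

Note that this only holds for addresses inside the kernel image, since the offset is a property of how the image was linked and loaded.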

Two 32-bit entrypoints #

The paths themselves are tiny.

asm

bits 32

; GRUB jumps here via the MB2 entry address tag.
global _mb2_entry32
_mb2_entry32:
    cli
    mov  edi, eax            ; EDI = magic 
    mov  esi, ebx            ; ESI = info ptr 
    jmp  trampoline32

; Anything that loads the ELF and jumps to e_entry lands here.
global _elf_entry32
_elf_entry32:
    cli
    xor  edi, edi            ; EDI = 0  => "no protocol"
    xor  esi, esi
    jmp  trampoline32

_elf_entry32 is also the target of a PVH ELF note, which is what lets the same binary boot as a Xen PVH guest (and QEMU, if you ever strip the MB2 header).

Remember the entry address tag from earlier ? It forces GRUB to enter at _mb2_entry32 instead of the ELF e_entry, so we keep MB2 and generic boot on separate doors.

The trampoline : 32 to 64 #

It gets us to Long Mode.

First, a temporary stack and some zeroed page-table memory to work with :

asm

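trampoline32:
    mov  esp, phys_boot_stack_top
    push edi                         ; keep the breadcrumb : rep stosd clobbers EDI

    cld                              ; DF is undefined on entry, make stosd count up
    mov  edi, phys_boot_pml4
    xor  eax, eax
    mov  ecx, (4096 * 4) / 4         ; 4 tables x 4 KiB, counted in dwords
    rep  stosd

    pop  edi                         ; EDI = breadcrumb again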

Then we build a minimal set of page tables. We don’t need anything fancy, just enough to keep executing after we flip paging on. So : identity-map the low 1 GiB (so the trampoline keeps running at its physical address) and map the higher half (so we can jump up there afterwards).

asm

    mov  dword [phys_boot_pml4],           phys_boot_pdpt_low  + 0x03
    mov  dword [phys_boot_pml4 + 511 * 8], phys_boot_pdpt_high + 0x03
    mov  dword [phys_boot_pdpt_low],            phys_boot_pd + 0x03
    mov  dword [phys_boot_pdpt_high + 510 * 8], phys_boot_pd + 0x03

    ; 512 × 2 MiB pages (0x83 = PS | WRITABLE | PRESENT)
    ; EBX is the cursor so that EDI (the boot breadcrumb) is left alone.
    mov  ebx, phys_boot_pd
    mov  eax, 0x00000083
    mov  ecx, 512
.fill_pd:
    mov  [ebx], eax
    mov  dword [ebx + 4], 0
    add  eax, 0x200000
    add  ebx, 8
    dec  ecx
    jnz  .fill_pd

PML4[511] points at a second PDPT, and entry 510 of that PDPT points at the same PD as the low mapping. That 510 index is the -2 GiB slot, which is exactly where 0xffffffff80000000 lands. Same physical pages, two addresses.
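If you want to convince yourself of the 511 / 510 indices, the split of a virtual address into table indices is just shifts and masks. The macro names below are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Each paging level consumes 9 bits of the virtual address. */
#define PML4_INDEX(va) (((uint64_t)(va) >> 39) & 0x1ff)
#define PDPT_INDEX(va) (((uint64_t)(va) >> 30) & 0x1ff)
#define PD_INDEX(va)   (((uint64_t)(va) >> 21) & 0x1ff)   /* 2 MiB pages */

/* The higher-half base lands in PML4[511], PDPT[510], PD[0]. */
static_assert(PML4_INDEX(0xffffffff80000000ull) == 511, "last PML4 slot");
static_assert(PDPT_INDEX(0xffffffff80000000ull) == 510, "the -2 GiB slot");
static_assert(PD_INDEX(0xffffffff80000000ull)   == 0,   "first 2 MiB page");

/* The physical load address 0x200000 uses slot 0 everywhere except the PD. */
static_assert(PML4_INDEX(0x200000ull) == 0, "identity map, PML4[0]");
static_assert(PDPT_INDEX(0x200000ull) == 0, "identity map, PDPT[0]");
static_assert(PD_INDEX(0x200000ull)   == 1, "second 2 MiB page");
```

Both 0x200000 and 0xffffffff80200000 end up at PD index 1, which is why sharing a single PD between the two mappings works.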

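The order matters : PAE, CR3 and EFER.LME must all be in place before paging is switched on, otherwise the CPU either faults or ends up in ordinary protected-mode paging rather than long mode. I follow the sequence the Intel SDM lists :

1. Enable PAE in CR4
2. Load the page tables into CR3
3. Set EFER.LME (the bit from the “identifying our mode” section, remember)
4. Enable paging (CR0.PG)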

asm

    mov  eax, cr4
    or   eax, CR4_PAE                ; 1. PAE
    mov  cr4, eax

    mov  eax, phys_boot_pml4         ; 2. CR3
    mov  cr3, eax

    mov  ecx, MSR_EFER               ; 3. EFER.LME (+ NXE)
    rdmsr
    or   eax, EFER_LME | EFER_NXE
    wrmsr

    mov  eax, cr0                    ; 4. paging on
    or   eax, CR0_PG | CR0_WP | CR0_PE
    mov  cr0, eax
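The snippet relies on a few constants I haven’t shown. Here is a sketch of their architectural values, written in C so the bit positions can be checked at compile time. The predicate at the end is a made-up helper that states the precondition for step 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE   (1u << 0)     /* Protection Enable */
#define CR0_WP   (1u << 16)    /* Write Protect */
#define CR0_PG   (1u << 31)    /* Paging */
#define CR4_PAE  (1u << 5)     /* Physical Address Extension */
#define MSR_EFER 0xC0000080u   /* IA32_EFER MSR index */
#define EFER_LME (1u << 8)     /* Long Mode Enable */
#define EFER_NXE (1u << 11)    /* No-Execute Enable */

static_assert((CR0_PG | CR0_WP | CR0_PE) == 0x80010001u, "mask OR-ed into CR0");
static_assert((EFER_LME | EFER_NXE) == 0x900u, "mask OR-ed into EFER");

/* Setting CR0.PG activates long mode only if PAE and LME are already set. */
static inline bool paging_would_enter_long_mode(uint32_t cr4, uint32_t efer_lo)
{
    return (cr4 & CR4_PAE) && (efer_lo & EFER_LME);
}
```

With LME set but PAE still clear, setting CR0.PG raises a general-protection fault, which is why PAE comes first.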

We’re technically in long mode now (its 32-bit compatibility sub-mode, to be precise), but we’re still running 32-bit code with a 32-bit GDT. We need a 64-bit code segment and a jump that reloads CS. I use a retf rather than a far jmp :

asm

    lgdt [phys_gdt64_pointer]

    push dword 0x08                  ; 64-bit code selector
    push dword phys_trampoline64     ; physical addr of the 64-bit half
    retf

Why push/retf instead of a plain jmp 0x08:trampoline64 ? Because the far jump wants an immediate, and I want the target to be a linker-resolved extern (phys_trampoline64). Pushing the selector + offset and doing a far return gets me the same effect with a symbol the linker can fill in.

The 64-bit GDT itself is nothing exotic, just a null, a code and a data descriptor :

asm

gdt64:
    dq 0                                                ; null
    dq (1 << 43) | (1 << 44) | (1 << 47) | (1 << 53)    ; 64-bit code
    dq (1 << 44) | (1 << 47) | (1 << 41)                ; data
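Those shifts are easier to read with names attached. A small sketch in C, with macro names of my own choosing, that spells out each bit and checks the resulting descriptor values and the selectors used by the trampoline :

```c
#include <assert.h>
#include <stdint.h>

#define GDT_WRITABLE   (1ull << 41)   /* data segment: writable */
#define GDT_EXECUTABLE (1ull << 43)   /* code segment */
#define GDT_CODE_DATA  (1ull << 44)   /* S bit: code/data, not a system segment */
#define GDT_PRESENT    (1ull << 47)
#define GDT_LONG       (1ull << 53)   /* L bit: 64-bit code segment */

#define GDT64_CODE (GDT_EXECUTABLE | GDT_CODE_DATA | GDT_PRESENT | GDT_LONG)
#define GDT64_DATA (GDT_CODE_DATA | GDT_PRESENT | GDT_WRITABLE)

static_assert(GDT64_CODE == 0x0020980000000000ull, "64-bit code descriptor");
static_assert(GDT64_DATA == 0x0000920000000000ull, "data descriptor");

/* A selector is the descriptor index times 8 (TI = 0, RPL = 0). */
static inline uint16_t gdt_selector(unsigned index)
{
    return (uint16_t)(index << 3);
}
```

That gives 0x08 for the code descriptor and 0x10 for the data descriptor, the two selectors the trampoline loads.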

Landing in 64-bit #

We made it. We’re in 64-bit, but still down in the identity-mapped physical region. Time to move into the higher half. We reload the data segments, then do an absolute jump to a high symbol :

asm

bits 64

trampoline64:
    mov  ax, 0x10                  
    mov  ds, ax
    mov  es, ax
    mov  fs, ax
    mov  gs, ax
    mov  ss, ax

    mov  rax, entry64_high           
    jmp  rax

From here on we’re a proper higher-half kernel. We grab the real stack and read our breadcrumb to figure out who booted us :

asm

entry64_high:
    mov  rsp, kernel_stack_top

    cmp  edi, MB2_BOOTLOADER_MAGIC
    je   .dispatch_mb2

    call kernel_entry_elf            ; EDI was 0 : generic boot
    jmp  .hang

.dispatch_mb2:
    call kernel_entry_mb2            ; EDI was the MB2 magic

.hang:                               ; neither entry should return
    cli
    hlt
    jmp  .hang

And that’s the whole thing. The three paths converge :

kernel_entry_mb2 (came through the GRUB path)
kernel_entry_elf (came through the generic path)
limine_entry (let itself in, 64-bit from the start)

All three end up calling the same kernel_main, which from its point of view always wakes up in the same world : 64-bit long mode, paging on, a valid stack, kernel at the higher half.
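On the C side, the breadcrumb check is a one-liner. This is only a sketch : the enum and function names are hypothetical, and 0x36d76289 is the magic a Multiboot2 loader leaves in EAX.

```c
#include <assert.h>
#include <stdint.h>

#define MB2_BOOTLOADER_MAGIC 0x36d76289u

/* 0 is the breadcrumb of the generic path, so the magic must never be 0. */
static_assert(MB2_BOOTLOADER_MAGIC != 0, "0 is reserved for generic boot");

enum boot_source { BOOT_GENERIC_ELF, BOOT_MULTIBOOT2, BOOT_LIMINE };

/* Mirrors the cmp/je in entry64_high; Limine never goes through here. */
static inline enum boot_source source_from_breadcrumb(uint32_t edi)
{
    return edi == MB2_BOOTLOADER_MAGIC ? BOOT_MULTIBOOT2 : BOOT_GENERIC_ELF;
}
```

Each entry function can then hand a boot_source (plus whatever info pointer it received) to kernel_main, so the rest of the kernel only branches on an enum instead of caring about registers.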

Conclusion #

One ELF binary that boots under Limine, GRUB2, QEMU’s generic loader and Xen PVH. The entire protocol-specific surface is three little entry stubs, a handful of Limine requests and a linker script. Everything past kernel_main is bootloader-agnostic.

There’s plenty I’m not doing here, to be clear. The generic ELF path has no memory map, no framebuffer, no RSDP. But that’s a problem for kernel_main, not for the trampoline.

Sources #

Intel SDM
- 2-8 Vol. 3A: Figure 2-3 Transitions Among the Processor’s Operating Modes
- 12-13 Vol. 3A: 12.9.1 Switching to Protected Mode
- 9-21 Vol. 3A: 9.8.5 Initializing IA-32e Mode
AMD64 Architecture Programmer’s Manual, Vol. 2: System Programming - 14.6 “Long-Mode Initialization”
Multiboot2 Specification
Limine Boot Protocol
PVH Boot Protocol
Linux Standard Base Core Specification for X86-64
