
C threads in ARM Cortex M3

April 26, 2015

This was combined with a custom memory manager explained here.


One of the requirements for the product was to not block the user interface. That means the UI had to give different kinds of feedback to the user while the software was processing something. The SDK didn’t include any help for a multi-threaded system and we evaluated the following options:

  1. Building the system so that every function runs step by step without blocking other processes
  2. Implementing a custom thread system

The first option seemed to be the fastest one, but it would have complicated the development a lot. Implementing every function step by step would be too difficult if a single step took longer than expected or became blocked by any driver or device call, for instance. So I started investigating and implementing our own thread management.

Understanding hardware implications

To make a proper context switch some things have to be understood first:

  • Each thread has its own stack
  • When making a context switch, the CPU register values of the previously executing thread have to be saved onto its stack, and the CPU register values of the new active thread have to be loaded from its own stack.

We were working with a Cortex M3 microprocessor, and it has the following registers:

  • r0
  • r1
  • r2
  • r3
  • r4
  • r5
  • r6
  • r7
  • r8
  • r9
  • r10
  • r11
  • r12
  • r13 (SP)
  • r14 (LR)
  • r15 (PC)
  • xPSR


  • SP stands for Stack Pointer
  • LR stands for Link Register
  • PC stands for Program Counter
  • xPSR stands for special-purpose program status registers

All 17 registers have to be saved somewhere in order to be able to restore them later when making a context switch. As each thread has its own stack, the proper way is to store the values of the registers onto the stack of the corresponding thread.

Hardware support

The Cortex M3 “has dedicated multi-tasking hardware including task-switching interrupts (SysTick and PendSV)  and two stack pointers. The SysTick hardware consists of a 24-bit timer that triggers an interrupt each time it counts to zero. The PendSV interrupt is a software request, which can manually force a context switch.” (reference).
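Both mechanisms in the quote can be driven from C. The following is a minimal sketch rather than the code used in the project: the helper names SysTickReloadFor and RequestContextSwitch are hypothetical, and the register address and bit position are the ones defined by the ARMv7-M architecture (ICSR at 0xE000ED04, PENDSVSET at bit 28). The register is passed in as a pointer so the helper can also be exercised off-target.

```c
#include <stdint.h>

/* Interrupt Control and State Register address and its PENDSVSET bit,
   as defined by the ARMv7-M architecture. */
#define ICSR_ADDRESS       0xE000ED04u
#define ICSR_PENDSVSET     (1u << 28)

/* Largest value the 24-bit SysTick reload register can hold. */
#define SYSTICK_MAX_RELOAD 0x00FFFFFFu

/* Hypothetical helper: computes the SysTick reload value for a given core
   clock and tick rate. Returns 0 when the period is too short or does not
   fit in the 24-bit counter. */
uint32_t SysTickReloadFor(uint32_t coreClockHz, uint32_t ticksPerSecond){
    uint32_t cycles;

    if(ticksPerSecond == 0){
        return 0;
    }
    cycles = coreClockHz / ticksPerSecond;
    if(cycles < 2 || cycles - 1 > SYSTICK_MAX_RELOAD){
        return 0;
    }
    return cycles - 1;  /* the counter runs from the reload value down to 0 */
}

/* Hypothetical helper: requests a PendSV exception by writing the PENDSVSET
   bit. On the target, pass (volatile uint32_t *)ICSR_ADDRESS. */
void RequestContextSwitch(volatile uint32_t *icsr){
    *icsr = ICSR_PENDSVSET;
}
```

With a 72 MHz core clock (an assumed figure) and a 1 ms tick, SysTickReloadFor(72000000, 1000) gives 71999. A 1 s tick would need 72 million cycles, which does not fit in 24 bits, so the helper returns 0.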

When a SysTick or PendSV interrupt is triggered, the hardware automatically saves some of the registers onto the current stack. Those registers are called the hardware stack frame and they are: r0, r1, r2, r3, r12, r14 (LR), r15 (PC) and xPSR. The remaining registers, called the software stack frame, have to be saved by software. These registers are: r4, r5, r6, r7, r8, r9, r10 and r11.

Before the interrupt handler ends, the software stack frame of the thread being switched in has to be loaded from that thread’s stack. The hardware stack frame will be automatically loaded from the stack by the hardware when returning from the interrupt. That implies that just before the handler returns, the stack pointer has to point at the correct address, that is, at the beginning of the hardware stack frame.

If you have been paying attention to which registers belong to the hardware stack frame and which belong to the software stack frame, you may have noticed that neither of them stores the r13 (SP) register. As that is the stack pointer, it makes no sense to store it on the stack: we would have no way to recover the stack pointer without knowing where the stack is. Because of that, the stack pointer register (r13) has to be stored in the thread control block (TCB), which will be explained later.
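To make the layout concrete, here is a sketch of how the two frames sit in memory for a suspended thread, viewed from the saved stack pointer (lowest address) upward. The SavedContext type and the HardwareFrameOf helper are hypothetical names used only for illustration. The stack pointer itself is deliberately absent from the structure, because it lives in the TCB.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a suspended thread's stack. The saved stack pointer
   kept in the TCB points at the first field (r4). */
typedef struct {
    /* Software stack frame: pushed by the context switch code */
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;
    /* Hardware stack frame: pushed by the CPU on exception entry */
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} SavedContext;

/* Returns the address the stack pointer must hold just before the exception
   return, i.e. the beginning of the hardware stack frame. */
uint32_t *HardwareFrameOf(SavedContext *context){
    return &context->r0;
}
```

The whole saved context takes 16 words (64 bytes) of the thread's stack, and the hardware stack frame starts 8 words above the saved stack pointer.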

Context switch

The project has to be configured properly in order to catch the SysTick and PendSV interrupts when they are triggered. To do that, the startup.S file should be modified so that both the SysTick and PendSV interrupt handlers call the custom context switch function. Once this is done, the context switch function has to execute the following steps:

  1. Save current thread software stack frame
  2. Change thread and get its stack pointer (this is where the actual context switch is done)
  3. Restore the new thread software stack frame
  4. Finish the interruption

These steps are done in assembly because we need to work directly on the CPU registers, except for step 2, which calls a C function. This C function is where the actual context switch is done, and it has to return the stack pointer of the new thread to be executed.
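A minimal sketch of what such a C function could look like, assuming a simple round-robin policy and assuming the handler passes the outgoing thread's stack pointer as the first argument (r0 under the ARM calling convention). The reduced TCB, the next field and the circular list are assumptions made for this sketch, not the project's actual scheduler.

```c
#include <stddef.h>

enum threadState { THREAD_STATE_IDLE, THREAD_STATE_WAITING, THREAD_STATE_SLEEP };

/* Reduced TCB for this sketch; "next" links the threads in a circular list
   (an assumption, the real list structure is not shown in this article). */
typedef struct ThreadControlBlock {
    int *stackPointer;
    enum threadState state;
    struct ThreadControlBlock *next;
} ThreadControlBlock_st;

ThreadControlBlock_st *gCurrentThread = NULL;

/* Receives the stack pointer of the outgoing thread (after its software
   stack frame was pushed) and returns the stack pointer of the thread
   that should run next. */
int *schedulerSwitchContext(int *currentStackPointer){
    ThreadControlBlock_st *candidate;

    /* Remember where the outgoing thread's frames were saved */
    gCurrentThread->stackPointer = currentStackPointer;

    /* Round robin: take the next thread in the list that is ready to run.
       If none is found we come back to the current thread. */
    candidate = gCurrentThread->next;
    while(candidate != gCurrentThread && candidate->state != THREAD_STATE_IDLE){
        candidate = candidate->next;
    }

    gCurrentThread = candidate;
    return gCurrentThread->stackPointer;
}
```

Falling back to the current thread keeps the sketch short. A real scheduler would normally keep an always-ready idle thread, so that there is something to run when every other thread is waiting or sleeping.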

1- Saving the software stack frame

"  mrs r0, MSP                 \n"  // save the stack pointer in r0
"  stmdb r0!, {r4-r11}         \n"  // Copy the values of r4-r11 in the stack and
                                    // decrement the stack pointer (r0)

2- Context switch

When the schedulerSwitchContext() function ends, its return value is the stack pointer of the new executing thread, and it is stored in the register r0 as the step 3 expects.

"  mov r4, lr                  \n"
"  bl schedulerSwitchContext   \n"    // switch context. Calls C function
"  mov lr, r4                  \n"

3- Restoring the software stack frame

"ldmia r0!, {r4-r11} \n" // Recover the registers r4-r11 from the stack and
                         // increment the stack pointer (r0)

4- Finish the interruption

"  mov r7, r0 \n" // Copies the stack pointer in r7 so the function
		  // returns to the switched thread

The whole interruption handler function looks like this:

void CW_SwitchContext(){
    asm volatile(
        "mrs r0, MSP                \n" // save the stack pointer in r0
        "stmdb r0!, {r4-r11}        \n" // Copy the values of r4-r11 in the stack and
                                        // decrement the stack pointer (r0)
        "mov r4, lr                 \n" // keep lr safe across the C call
        "bl schedulerSwitchContext  \n" // switch context. Calls C function
        "mov lr, r4                 \n"
        "ldmia r0!, {r4-r11}        \n" // Recover the registers r4-r11 from the stack and
                                        // increment the stack pointer (r0)
        "msr MSP, r0                \n" // Loads the stack pointer in MSP
        "                           \n"
        "mov r7, r0                 \n" // Copies the stack pointer in r7 so the function
                                        // returns to the switched thread
    );
}

Thread Control Block (TCB)

In order to manage all threads and their corresponding stacks, we need to maintain a table or a list of them. For this, the ThreadControlBlock structure is used:

struct ThreadControlBlock{
    unsigned long long wakeup_timestamp;
    int *stack;
    int *stackPointer;
    unsigned short int stacksize;
    enum threadState state;
    unsigned char threadID;
};

Each time the user wants to create a thread, a TCB has to be initialized. This implies the following actions:

  1. Allocating the struct itself
  2. Allocating memory for the stack
  3. Initializing the stack with proper values
  4. Adding the thread to the threads list

The most important thing here is point 3, initializing the stack with proper values. In our case we had a descending stack, which means that as the stack grows deeper, the memory addresses get lower. Once the TCB and the thread stack were allocated, this function was used to initialize the stack:

void InitializeThreadStack(ThreadControlBlock_st* thread, void (*threadFunc)(void *), void *threadArgs){
    // Task (thread) initialization function. Based on:
    // http://www.embedded.com/design/prototyping-and-development/4231326/Taking-advantage-of-the-Cortex-M3-s-pre-emptive-context-switches
    // with inverted order of the registers because we have a DESCENDING stack!!

    int i = 0;
    int *stackIterator = 0;

    stackIterator = &(thread->stack[(thread->stacksize>>2) - 1]);

    // This is the xPSR register. Should be set to 0x21000000
    *stackIterator = 0x21000000; stackIterator--;

    // This is the PC (Program Counter). This should point to the thread start function
    *stackIterator = (int)threadFunc; stackIterator--;

    // This stack position is the LR (Link Register). This should point to the thread Stop function
    *stackIterator = (int)ThreadStopFunc; stackIterator--;

    // r12 should be set to 0.
    *stackIterator = 0; stackIterator--;

    // Registers r1, r2 and r3 are function arguments,
    // but we set them to 0. Argument is passed in r0
    *stackIterator = 0; stackIterator--; // r3
    *stackIterator = 0; stackIterator--; // r2
    *stackIterator = 0; stackIterator--; // r1

    // Sets the register r0 position of the stack - This is the thread function argument
    *stackIterator = (int)threadArgs;

    // Sets to 0 the software stack frame of the registers r4-r11. This is needed because
    // when the switch context recovers this thread it will pop the software stack frame.
    for(i = 4; i <= 11; i++){
        stackIterator--;
        *stackIterator = 0;
    }

    // The iterator now points at the lowest stacked register (r4), which is the
    // stack pointer the context switch has to find in the TCB for this thread
    thread->stackPointer = stackIterator;
}

The ThreadStopFunc is a private function in which the thread is marked as destroyed. This prevents the thread from being executed again (it will be destroyed and freed in the scheduler). The thread execution goes to this function automatically when the thread function returns, because its address is assigned to the initial Link Register value, so the user doesn’t need to call a thread finish function as is done in some libraries (FreeRTOS for instance).
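As an illustration of the "marked as destroyed, freed in the scheduler" idea, here is a sketch under several assumptions: the THREAD_STATE_DESTROYED value, the next field, the NULL-terminated list and the ReapDestroyedThreads helper are hypothetical, and the stacks are assumed to come from malloc rather than from the custom memory manager.

```c
#include <stdlib.h>

enum threadState {
    THREAD_STATE_IDLE,
    THREAD_STATE_WAITING,
    THREAD_STATE_SLEEP,
    THREAD_STATE_DESTROYED   /* hypothetical name for the "destroyed" mark */
};

/* Reduced TCB for this sketch, linked in a NULL-terminated list */
typedef struct ThreadControlBlock {
    int *stack;
    enum threadState state;
    struct ThreadControlBlock *next;
} ThreadControlBlock_st;

ThreadControlBlock_st *gCurrentThread = NULL;

/* Reached when a thread function returns, through the initial LR value */
void ThreadStopFunc(void){
    gCurrentThread->state = THREAD_STATE_DESTROYED;
    /* A real implementation would now force a context switch
       and never come back to this thread. */
}

/* Called from the scheduler: unlinks and frees every destroyed thread
   except the one still running on its own stack. Returns how many
   threads were freed. */
int ReapDestroyedThreads(ThreadControlBlock_st **head){
    int freed = 0;
    ThreadControlBlock_st **link = head;

    while(*link != NULL){
        ThreadControlBlock_st *thread = *link;
        if(thread->state == THREAD_STATE_DESTROYED && thread != gCurrentThread){
            *link = thread->next;   /* unlink before freeing */
            free(thread->stack);
            free(thread);
            freed++;
        }else{
            link = &thread->next;
        }
    }
    return freed;
}
```

Skipping the current thread matters: right after ThreadStopFunc runs, the CPU is still executing on that thread's stack, so its memory can only be released once another thread has been switched in.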

Thread concurrency and sleep

The TCB stores the state of the thread, which makes it possible to implement different features. For instance, a thread can be locked so that it won’t execute until it’s unlocked. This allows the implementation of semaphores and mutexes (which I won’t cover here). Here are those functions:

void LockThread(){
    gCurrentThread->state = THREAD_STATE_WAITING;
    // Forces a context switch because current thread is locked

void UnlockThread(ThreadControlBlock_st *threadToUnlock){
    if(threadToUnlock != NULL){
        if(threadToUnlock->state == THREAD_STATE_WAITING){
            threadToUnlock->state = THREAD_STATE_IDLE;
        }
    }
}

A thread can also sleep. To do that, the system needs to have a real time clock and a function to get the timestamp in milliseconds (I won’t cover this here either). The thread enters the sleep state and it won’t be woken up until the time has passed. This doesn’t guarantee that it will resume execution exactly after the given number of milliseconds (it may resume slightly later, depending on the scheduler), but it will never resume earlier.

void Sleep(int ms){
    // Sets the thread wake up time
    gCurrentThread->wakeup_timestamp = GetTimestamp() + (u64)ms;

    gCurrentThread->state = THREAD_STATE_SLEEP;

    // Forces a context switch because current thread is sleeping
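On the scheduler side, the matching wake-up check could be sketched as follows. The IsRunnable helper is a hypothetical name, the TCB is reduced to the two fields involved, and the current time is passed in as a parameter instead of being read from the real time clock.

```c
enum threadState { THREAD_STATE_IDLE, THREAD_STATE_WAITING, THREAD_STATE_SLEEP };

/* Reduced TCB for this sketch: only the fields used by the wake-up check */
typedef struct {
    unsigned long long wakeup_timestamp;
    enum threadState state;
} ThreadControlBlock_st;

/* Returns 1 if the thread may be scheduled at time "now" (in milliseconds).
   A sleeping thread whose wake-up time has been reached is put back in the
   idle state first, so it is never resumed early. */
int IsRunnable(ThreadControlBlock_st *thread, unsigned long long now){
    if(thread->state == THREAD_STATE_SLEEP && now >= thread->wakeup_timestamp){
        thread->state = THREAD_STATE_IDLE;
    }
    return thread->state == THREAD_STATE_IDLE;
}
```

Because the check only runs when the scheduler does, a thread that asked for 10 ms may resume a little later than that, which matches the behaviour described above.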