This was combined with our custom thread management explained here.

Introduction

Once, I worked on a project in which we needed to implement our own and custom dynamic memory manager.

  • We are in the 21st Century, who needs to implement a memory manager?

We did. This was an embedded product in which we were really limited by RAM size, but the product requirements forced us to have dynamic memory in the software. So we decided to implement our own memory manager, optimizing memory usage. At that time we had just worked with a SIM800H chip which firmware also contains a custom memory manager and it allowed me to analyze it and gave me ideas on how to improve it in the way we needed. I came to the following conclusions:

  1. It was using 144 bytes as a memory header and 16 bytes as a memory footer.
  2. For each allocation, 8 extra bytes were used as a control block in which 4 of those bytes are the address of the next block
  3. 8 bytes memory alignment were used.

That system was using 16 bytes in order to allocate from 1 to 8 bytes, which was a lot for our software.

How can that be improved?

The extra information stored in the header, footer and in every allocation control block may not be completely necessary. Also, the control information for every allocation used too much memory. So I tried to reduce those things:

  1. Using only 16 bytes as memory header and no footer needed.
  2. For each allocation, 4 bytes are used as a control block which are the address pointer of the next block.
  3. 4 bytes memory alignment is used.

This way, only 8 bytes are used to allocate from 1 to 4 bytes, and 12 bytes are used to allocate from 5 to 8 bytes. The 4 bytes alignment would slightly slow down the system depending on the hardware but the requirement was to optimize the use of memory, not speed.

How does it work?

In order to handle the memory allocation, the software needs to have a huge array declared somewhere in the program (usually in the memory manager module). This huge array is basically all the memory available as dynamic memory. That means every succesful memory allocation should return a pointer into this array. So the size of this array needs to be adjusted depending on the hardware, requirements and the amount of available dynamic memory you want in the project. Setting its size as the total memory size you have in your chip could work, but it isn’t the best option. There are some things that should be taken into account that should be substracted from the total amount of dynamic memory:

  1. Each non const global or static variable needs space in the memory
  2. the use of some standard functions (printf, sprintf, memcpy, etc) internally use malloc and, as far as I know, there’s no way, or easy way, to make them work with your custom dynamic memory manager. So you need enough space for them to work if you’re going to use them.

My recommendation during development would be to start with a relatively small dynamic memory array and increase it as the development requires it. Let’s look at how the memory is organised in this array:

Memory structure

The main idea is that the memory is organised by memory blocks. Each block can be a free memory block or an allocated memory block. Two adjacent blocks can’t be of the same type: if they are, they join and make a single and bigger block. In order to keep track of all the blocks, each of them needs a pointer to the next one of the same type. This is what will be used to makes searches of the different memory blocks.

In the header we need two pointers: one to the first free memory block and the other to the first allocated memory block. That is 8 bytes, 4 for each pointer. Both pointers can point to NULL which means there are no blocks of that type, i.e. if the first free memory block in the header is NULL, it means all the memory has already been allocated. I also added two counters which can be very useful to know some statistics about the memory usage. Here it is an example of the header and the 16 bytes used in it (note it’s little endian):

Start addressMemory contentUse
0x2000099CE0090020Addres of the first free memory block (saved as unsigned int, this example shows the address 0x200009E0)
0x200009A000000000Address of the first allocated block
0x200009A4EC3B0000Number of free bytes. It does NOT count the 4 control bytes of the next allocation. Saved as unsigned int.
0x200009A800000000Number of used bytes including ALL the control bytes of each allocation. Saved as unsigned int.

Each of both free and allocated blocks use 4 control bytes to save the address of the next block of the same type. In the last blocks of each type, the pointer is NULL (0x00000000). Let’s see an example when making a single allocation of 5 bytes:

Start addressMemory contentUse
0x2000099CB8090020Memory Header. Address saved in 0x200009A0 points to the first allocated block (0x200009AC in this example)
0x200009A0AC090020
0x200009A4EC3B0000
0x200009A800000000
0x200009AC00000000Allocated Block. Pointer to the next allocated block. In this example it points to NULL because it's the last allocated block in memory
0x200009B000000000Usable bytes for the user.
0x200009B400000000Allocation of 5 bytes implies reserving 8 bytes due to 4 bytes alignment.
0x200009B800000000Free Block. Control bytes of the next free block. In this example it points to NULL because it's the last free block in memory
0x200009BC00000000

In this example, the memory manager would return the address 0x200009B0 which is the address of the first usable bytes by the user. But internally, the allocated block in the memory manager starts at 0x200009AC. Also, the user could use not only the 5 bytes they want, but 3 more, so 8 bytes in total. If we wanted to still optimize more, we could start the next block after the 5 bytes for the user.

Becoming thread safe

In the multi-threaded environment we had, we needed to make it thread safe. In order to accomplish that, some functions have to be provided to the memory manager. There are two different approaches:

  1. Suspend all tasks/threads (this is what FreeRTOS does)
  2. Disable CPU interruptions

Since we also wanted to use this module in a CPU interruption occasionally (UART, GPIO, etc) the first approach was not good enough for us because the execution would be blocked if an interruption was called during the execution of the memory manager. So we provided a function in which CPU interruptions were disabled. The provided function is called at the beginning and at the end of the allocation and free processes.