It seems like every C programmer builds their own memory allocator at some point. It’s almost like a rite of passage. I figured it was my turn. I had the opportunity of implementing one while I was tutoring for CSE29. One of the PA’s we had to guide students (PA3 was the malloc simulator). Since I had never taken the course before, one of my duties was to get familiar with the PA and if I had time, implement it myself. I probably should’ve but felt strange just filling out #TODO’s. I felt like I was cheating. If I was going to build my own memory allocator, it should be 100% my own. I think I’m just stubborn. Anyways I finally built my own, it’s not as elaborate but it’s mine. You can check it out on my Github :D

Why do we need Malloc?

Memory layout of C programs.

C has a pretty rigid system for handling memory. To start of, lets focus on the different segments that make up a program’s virtual address space.

Text Segment (also known as the code segment)
- a. Contains the instructions for executing the program.
- b. Read only, so nothing can change the instructions of the program while it’s running.
- c. Fixed size and read directly from the executable file on disk.
- d. Stores any literals from the program.
Static memory
- a. Uninitialized (Also known as the BSS segment)
  - i. Stores uninitialized global and static variables (i.e., holds values that are not known at compile time).
  - ii. Fixed size.
  - iii. Read and Write accessible.
- b. Initialized (also known as the data segment)
  - i. Holds initialized global and static variables (i.e., holds values that are known at compile time).
  - ii. Read directly from the executable.
  - iii. Fixed size.
  - iv. Read and Write accessible.
Dynamic memory
- a. Heap
  - i. Used for dynamic memory allocation (i.e., malloc and free).
  - ii. Grows and shrinks as needed during runtime, for sizes that can’t be determined during compile time.
  - iii. Grows upwards (towards higher memory addresses).
- b. Stack
  - i. Used for function call management, local variables, and arguments.
  - ii. Grows downwards (towards lower memory addresses).
  - iii. Automatically managed by the system (push and pop operations).
  - iv. Has a limited size, which can lead to stack overflow if exceeded.

We need malloc for variables with unknown compile-time sizes or those requiring a lifecycle that extends beyond the scope of a single function. It’s for the situations where we don’t know “how much” and “how long” at compile time.

mmap()

To get memory for my heap, I decided to use mmap(). mmap() is a system call that creates a memory mapping in the virtual address space of the process. I saw different implementations of malloc using sbrk() and mmap(). I chose mmap() because it allows for more flexible memory management and can be used to allocate larger blocks of memory (which I needed). It also provides better performance for large allocations (which is nice).

I also saw other implementations that called mmap() every malloc() call, but I wanted to minimize the number of system calls for performance reasons, so I decided to call mmap() once to get a large block of memory and then manage that block for subsequent malloc() and free() calls.

My implementation!

Functions:

my_malloc: Asks for a block of memory and returns a pointer to it.
my_calloc: Allocates memory for an array of elements and strictly initializes all of its bits to zero before handing it over.
my_free: Takes a pointer back when you’re done and marks that specific segment of memory as available for reuse.
my_realloc: Lets you resize a block of memory you already allocated, keeping the original data intact up to the new size.

How I organized memory:

Heap:A single, massive 10 MB region of memory I request from the OS exactly once using mmap.
Block: A smaller piece carved out of that giant heap that I hand out whenever malloc or calloc is called.

Both of these levels need metadata to track what is currently happening in memory. I put a small header at the very beginning of the giant heap, and another header at the beginning of each individual block. After a single malloc call, the 10 MB memory map is essentially split into two pieces: a “Used” block containing the requested bytes, and a massive “Free” block containing the rest.

Here are the C structures I used to manage the heap and blocks plus the functions I built:

typedef struct s_block{
  size_t size;
  t_bool is_free;
  struct s_block *next;
  struct s_block *prev;
}t_block;

typedef struct s_heap{
  size_t total_size;
  size_t free_size;
  size_t block_count;
}t_heap;

void *my_malloc(size_t size);
void my_free(void *ptr);
void *my_calloc(size_t num, size_t size);
void *my_realloc(void *ptr, size_t new_size);

By giving each block next and prev pointers, I effectively created a doubly linked list directly inside the memory arena. This lets the allocator walk through the giant heap to find free spots, or to easily peek at the neighbors of a block I want to free so they can be merged back together to prevent fragmentation.