INCIDENT REPORT #882-B: Why our heap looks like a disaster zone.
DATE: October 14, 2023
TO: Engineering Lead, CTO, and anyone else who thinks “it’s just a script”
FROM: Senior Systems Engineer (Level 4, Infrastructure)
SUBJECT: Post-Mortem of the Node-04 Memory Exhaustion Event (Python 3.11.4)
SYSTEM INCIDENT LOG: 2023-10-12T03:14:22Z
03:14:22 - CRITICAL - kernel: [99283.12] oom-kill: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
03:14:22 - CRITICAL - kernel: [99283.12] Out of memory: Killed process 14202 (python3) total-vm:32.4GB, anon-rss:31.1GB
03:14:23 - ALERT - Monitoring: Service 'DataIngestor' down on Node-04.
03:14:45 - INFO - SysEng: Initial investigation shows heap fragmentation and 98% RAM utilization.
03:15:10 - INFO - SysEng: Tracing memory leak to a single variable: 'pending_records'.
03:15:12 - INFO - SysEng: It's a python list. Again.
SECTION 1: THE LIE OF THE ABSTRACTION
I’ve been awake for 48 hours. My coffee is cold, my eyes feel like they’ve been rubbed with sandpaper, and I’ve spent the last six hours staring at a heap dump that looks like a Jackson Pollock painting if he only used “unallocated” and “pointer” as colors. The culprit? A python list.
The problem with modern developers—the ones who think a “bootcamp” qualifies them to touch production—is that they treat the python list as some sort of magical, infinite, weightless container. They think it’s a linked list. They think it’s a “collection.” They don’t realize that under the hood, it’s a brutal, unforgiving C array of pointers that will happily eat your RAM and ask for seconds.
When you call my_list = [], you aren’t just creating a “box.” You are invoking PyList_New in Objects/listobject.c. You are asking the CPython runtime to manage a PyListObject struct. This struct contains a pointer to a block of memory (ob_item) and a count of how many items are currently in it (ob_size), plus the total capacity (allocated).
On a 64-bit system running Python 3.11.4, an empty python list isn’t “empty.” It’s 56 bytes of overhead just to exist.
$ python3.11
>>> import sys
>>> l = []
>>> sys.getsizeof(l)
56
Fifty-six bytes for nothing. And that’s just the start of the tragedy.
SECTION 2: THE OVER-ALLOCATION TAX
Let’s talk about how a python list grows. It doesn’t grow one element at a time. Growing one slot per append would waste zero memory, but it would force a realloc() on every single append. Instead, Python uses a “growth pattern” that over-allocates to amortize the cost of calling realloc().
If you look at listobject.c in the CPython source, specifically the list_resize function, you’ll see the logic. The growth factor isn’t a simple doubling. It’s a bit more “creative.” In 3.11, the new size is computed as:
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
This means the python list over-allocates by roughly 12.5%, rounded to a multiple of four slots. It grabs more memory than it needs “just in case.” Here is the sequence of allocated slots as you append: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76…
When our “DataIngestor” service was pulling 10 million records into a python list, it wasn’t just using memory for 10 million records. It was using memory for the next jump in the allocation sequence. If append number 8,000,001 triggers a resize, the formula hands back 9,000,004 slots: about a million unused pointers, roughly 8MB of dead RAM in that one list. That “slack” space is a ghost in the machine, sitting there, doing nothing, while the OOM killer sharpens its axe.
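Don’t take my word for it. Here is a minimal sketch of that 3.11 arithmetic in plain Python (the main over-allocation branch only; list_resize also has a shortcut that skips over-allocation when the jump is large), so you can price the slack yourself:

def predicted_allocation(newsize: int) -> int:
    # Sketch of the main over-allocation branch of list_resize in
    # CPython 3.11's Objects/listobject.c -- not the C code itself.
    return (newsize + (newsize >> 3) + 6) & ~3

for n in (1, 5, 9, 17, 25, 8_000_001):
    print(f"{n:>9} items -> {predicted_allocation(n):>9} slots")

And here is the interpreter confirming it live: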
# Monitoring the growth of a python list
>>> import sys
>>> l = []
>>> for i in range(20):
...     l.append(i)
...     print(f"Len: {len(l)}, Size: {sys.getsizeof(l)}, Allocated: {(sys.getsizeof(l)-56)//8}")
...
Len: 1, Size: 88, Allocated: 4
Len: 2, Size: 88, Allocated: 4
Len: 3, Size: 88, Allocated: 4
Len: 4, Size: 88, Allocated: 4
Len: 5, Size: 120, Allocated: 8
...
Len: 9, Size: 184, Allocated: 16
Look at that. At length 5, the python list jumped from 88 bytes to 120 bytes. It’s already wasting space for 3 extra pointers. On a 64-bit machine, each pointer is 8 bytes. Scale that ~12.5% slack to 10 million records and you’re reserving on the order of a million extra pointer slots: megabytes of RAM whose only job is to hold nothing.
SECTION 3: THE POINTER INDIRECTION NIGHTMARE
Here is the part where the junior devs usually start crying. A python list does not store your data. It stores pointers to your data.
If you have a python list of integers, you don’t have a contiguous block of integers in memory. You have a contiguous block of 8-byte memory addresses. Each of those addresses points to a PyObject somewhere else on the heap.
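Don’t believe me? Ask the interpreter. sys.getsizeof on a list counts only the header and the pointer block; every object it points at is billed separately, somewhere else on the heap:

import sys

words = ["alpha", "beta", "gamma"]
print(sys.getsizeof(words))               # header + pointer slots only
print([sys.getsizeof(w) for w in words])  # the string objects are billed separately
print([hex(id(w)) for w in words])        # and they live at scattered heap addresses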
Let’s do the math for 1,000,000 integers.
In a sane language like C, an array of 1,000,000 32-bit integers takes 4,000,000 bytes. 4MB. Simple.
In Python, for a python list of 1,000,000 integers:
1. The list itself: ~8,000,000 bytes for the pointers (8 bytes each).
2. The integer objects: Each int in Python 3.11 is at least 28 bytes.
- 24 bytes for the PyObject header (refcount, type pointer, size).
- 4 bytes for the actual value.
Total: (8 bytes * 1,000,000) + (28 bytes * 1,000,000) = 36,000,000 bytes.
36MB.
We are using 36MB to store 4MB of data. That is a 9x overhead. And that’s assuming no fragmentation. When the “DataIngestor” was running, it wasn’t just storing integers; it was storing complex dictionaries and strings. The indirection kills cache locality. The CPU spends all its time waiting for the RAM to fetch the pointer, then waiting again to fetch the object the pointer points to. It’s a game of “where’s Waldo” played at 3 gigahertz, and the RAM is losing.
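Here is a quick sanity check of that arithmetic on a 64-bit 3.11 build (exact byte counts are platform-dependent; the ratio isn’t):

import sys

n = 1_000_000
data = list(range(n))

pointer_block = sys.getsizeof(data)                # list header + 1M pointers
int_objects = sum(sys.getsizeof(x) for x in data)  # 28 bytes per small int
print(f"pointers: {pointer_block:,}")              # ~8,000,056
print(f"objects:  {int_objects:,}")                # 28,000,000
print(f"total:    {pointer_block + int_objects:,}")  # ~36MB for 4MB of payload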
SECTION 4: THE O(N) INSERTION SUICIDE
I found a line of code in the ingestor.py file that made me want to throw my mechanical keyboard through the window.
pending_records.insert(0, new_record)
The developer who wrote this clearly thinks a python list is a linked list. It is not. As I’ve already established, it’s a dynamic array. When you insert at index 0, the python list has to move every single other pointer one slot to the right in memory.
If the list has 1,000,000 items, insert(0, x) calls memmove() for 8,000,000 bytes. Every. Single. Time.
If you do this in a loop to ingest 1,000,000 items, you aren’t doing $O(n)$ work. You are doing $O(n^2)$ work. You are moving trillions of bytes.
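A throwaway benchmark makes the quadratic blowup visible (absolute numbers will vary by machine; the gap won’t):

import timeit
from collections import deque

def front_insert_list(n):
    l = []
    for i in range(n):
        l.insert(0, i)   # shifts every existing pointer, every time

def front_insert_deque(n):
    d = deque()
    for i in range(n):
        d.appendleft(i)  # O(1): no shifting

n = 100_000
print("list.insert(0, x):  ", timeit.timeit(lambda: front_insert_list(n), number=1))
print("deque.appendleft(x):", timeit.timeit(lambda: front_insert_deque(n), number=1))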
I watched the CPU metrics for Node-04. It wasn’t doing “data processing.” It was spending 99% of its cycles in libc.so.6 moving memory around because some kid didn’t understand that a python list is a contiguous block. This is why the service was lagging. This is why the “heartbeat” failed. This is why I was paged at 3 AM.
SECTION 5: THE 64-BIT TAX AND OBJECT OVERHEAD
We are running on 64-bit hardware. Everything is bigger. Every pointer in that python list is 8 bytes. In Python 3.11, the object headers are still massive.
Let’s look at the PyObject struct:
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
Every single item you put in a python list carries this baggage. If you have a list of small strings, the overhead is even more insulting. A 1-character string in Python is about 50 bytes.
When we hit 31GB of RSS (Resident Set Size), it wasn’t because we had 31GB of data. We had maybe 3GB of actual data and 28GB of Python’s “convenience.” The python list is the primary vehicle for this bloat. It encourages you to keep references to objects alive longer than they need to be. Because the list holds a reference, the garbage collector can’t touch the objects. Even if you “delete” the data elsewhere, if it’s still in that python list, it’s staying in RAM.
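Here is the failure mode in miniature (a sketch; any big buffer will do):

import sys

blob = bytearray(100 * 1024 * 1024)  # ~100MB buffer (size is arbitrary)
pending_records = [blob]             # the list now owns a reference

print(sys.getrefcount(blob))         # 3: 'blob', the list slot, the call argument
del blob                             # RSS barely moves; the list still pins the buffer

pending_records.clear()              # only now can the allocator reclaim the 100MB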
And don’t get me started on the cyclic garbage collector. The python list is a container object, which means it’s tracked by the GC’s linked lists for cycle detection. This adds even more overhead. Every time the GC runs, it has to traverse these massive lists, checking ob_refcnt and looking for cycles. With 10 million items, the GC pause times were hitting 500ms. The whole application just… stops.
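You can watch the bookkeeping yourself. Containers are tracked; atoms aren’t. And if a giant structure genuinely has to live forever, gc.freeze() (Python 3.7+) at least gets it off the collector’s patrol route; fair warning, it is a blunt instrument that freezes everything alive at the moment of the call:

import gc

print(gc.is_tracked([1, 2, 3]))  # True: a list can participate in a reference cycle
print(gc.is_tracked(12345))      # False: an int can't reference anything

# For huge, effectively-immortal structures, take them off the GC's beat:
gc.freeze()                      # moves every live object into a permanent generation
print(gc.get_freeze_count())     # how many objects the collector will now skip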
SECTION 6: REMEDIATION FOR THE INCOMPETENT
If I see another python list used for high-throughput data ingestion, I am going to start revoking git permissions. We have tools for this. Tools that don’t treat RAM like an infinite resource.
1. Use collections.deque for Queues
If you need to add things to the beginning and end, use a double-ended queue. It’s implemented as a doubly linked list of fixed-size blocks (64 pointer slots per block in CPython), so nothing ever gets memmove()’d wholesale. It’s $O(1)$ for appendleft(), unlike the python list’s insert(0, x), which is $O(n)$.
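A minimal sketch, including the maxlen bound that would have capped Node-04’s growth (at the price of silently evicting the oldest entry once full, so know your data):

from collections import deque

q = deque(maxlen=1_000_000)  # bounded: once full, the far end is evicted automatically
for record in ("rec-1", "rec-2", "rec-3"):
    q.append(record)         # O(1), no memmove

oldest = q.popleft()         # O(1), unlike list.pop(0)
print(oldest)                # rec-1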
2. Use array.array for Homogeneous Data
If you are storing 10 million integers, for the love of God, use the array module.
import array

# 'i' is a C int: 4 bytes per element here. No PyObject overhead for each element.
# (Irony noted: the [0] * 1000000 initializer briefly builds a full python list.
#  array.array('i', [0]) * 1000000 sidesteps even that.)
data = array.array('i', [0] * 1000000)
The array.array stores the actual raw C values in a contiguous block. It’s compact. It’s fast. It’s what a python list pretends to be but isn’t.
3. Use NumPy for Numerical Work
If you’re doing math, a python list is a joke. NumPy arrays are contiguous, they allow for SIMD instructions, and they don’t involve pointer indirection for every single addition.
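For the record, the same million integers done properly (assuming NumPy is installed):

import numpy as np

data = np.arange(1_000_000, dtype=np.int32)  # one contiguous 4MB buffer
print(data.nbytes)   # 4000000: the payload, nothing else
print(data.sum())    # vectorized: no pointer chase per element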
4. Pre-allocate if you must use a list
If you know you need 1,000,000 slots, don’t append() in a loop. That pushes the list through the resize logic roughly a hundred times on the way up. Pre-allocate: l = [None] * 1000000. This sets the allocated size once. It’s still a list of pointers, but at least you aren’t calling realloc() over and over while the list grows.
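The pattern, for the avoidance of doubt:

n = 1_000_000

l = [None] * n      # one allocation: ob_item is sized exactly once
for i in range(n):
    l[i] = i * i    # index assignment: no resize path, no realloc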
SECTION 7: THE “JUNIOR-LEVEL” CODE VS. REALITY
Let’s look at the “before” and “after” of the refactor I had to perform while hallucinating from sleep deprivation.
THE CRIME (Original Code):
# Found in ingestor.py
class DataIngestor:
    def __init__(self):
        self.pending_records = []  # The ticking time bomb

    def add_record(self, record):
        # O(n) insertion. Absolute madness.
        self.pending_records.insert(0, record)
        # Memory grows indefinitely until OOM
        if len(self.pending_records) > 1000000:
            self.process_batch()

    def process_batch(self):
        # Slicing creates a NEW list. Another memory allocation!
        batch = self.pending_records[:500000]
        self.pending_records = self.pending_records[500000:]
        return batch
THE REFACTOR (What I wrote at 4 AM):
from collections import deque

class DataIngestor:
    def __init__(self):
        # Deque handles O(1) appends and pops from both ends
        self.pending_records = deque()

    def add_record(self, record):
        # O(1) operation. No memmove.
        self.pending_records.append(record)
        if len(self.pending_records) > 1000000:
            self.process_batch()

    def process_batch(self):
        # No slicing. No new list creation.
        batch = []
        for _ in range(500000):
            if self.pending_records:
                batch.append(self.pending_records.popleft())
        return batch
The refactored version dropped the memory usage of the process by 40% immediately. Why? Because we stopped slicing. In Python, self.pending_records[:500000] creates a shallow copy. That means it creates a new python list and copies 500,000 pointers into it. For a brief moment, you have the original list and the slice in memory at the same time. If you’re already at 80% RAM usage, that slice is the bullet that kills the process.
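The slice tax is measurable in a few lines (byte counts from a 64-bit 3.11 build):

import sys

big = list(range(1_000_000))
print(sys.getsizeof(big))    # ~8,000,056 bytes of pointers

half = big[:500_000]         # shallow copy: 500,000 pointers duplicated
print(sys.getsizeof(half))   # ~4,000,056 bytes living ALONGSIDE the original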
SECTION 8: FINAL GRIEVANCES
I am tired of explaining that “Python is slow” is a self-fulfilling prophecy written by people who don’t understand how a python list works. Python isn’t inherently slow; your use of its abstractions is incompetent.
The python list is a tool. It is a general-purpose, flexible, dynamic array of pointers. It is not a database. It is not a message queue. It is not a high-performance numerical buffer. When you treat it as such, you are offloading the mental burden of memory management onto the hardware. But the hardware has limits.
We lost two hours of production data because someone thought list.insert(0, x) was “cleaner” than using a deque. We lost $4,000 in cloud compute over-provisioning because the heap was so fragmented by millions of int objects that the OS couldn’t reclaim memory even after the list was cleared.
Next time someone suggests we “just add more RAM” to fix a memory leak caused by a python list, I am resigning. We don’t need more RAM. We need developers who have read listobject.c. We need developers who understand that every time they type [], they are making a contract with the CPython runtime—a contract that says, “I am too lazy to manage memory, so please waste it for me.”
The audit is complete. The server is back up. I am going home to sleep. If the “DataIngestor” crashes again because someone replaced the deque with a python list because “it’s more Pythonic,” don’t call me. Call a priest.
END OF REPORT.
APPENDIX A: MEMORY BREAKDOWN (64-BIT)
| Item | Size (Bytes) |
|---|---|
| PyListObject Header | 56 |
| Single Pointer in ob_item | 8 |
| PyObject Header (Integer) | 24 |
| Integer Value (Small) | 4 |
| Total for 1 Integer in List | 88 for the list (header + 4 over-allocated pointer slots) + 28 for the int object |
| Total for 1 Integer in array.array | 4 |
Note: The 88 bytes for the first element includes the initial over-allocation of the list. Subsequent elements cost 8 bytes (pointer) + 28 bytes (object) until the next resize event.
APPENDIX B: THE GROWTH CONSTANTS
For those who still don’t get it, here is the growth table for a python list in 3.11:
- 0 elements: 56 bytes (Allocated: 0)
- 1-4 elements: 88 bytes (Allocated: 4)
- 5-8 elements: 120 bytes (Allocated: 8)
- 9-16 elements: 184 bytes (Allocated: 16)
- 17-24 elements: 248 bytes (Allocated: 24)
- 25-32 elements: 312 bytes (Allocated: 32)
If you have 26 items, you are paying for 32. You are paying for 6 pointers to nowhere. Multiply that by the scale of our production environment, and you’ll see why the CFO is asking why our AWS bill looks like a phone number.
SIGN-OFF:
Senior Systems Engineer, Infrastructure Recovery Team
“I miss malloc.”