Why is the first C++ (m)allocation always 72 KB?
Mewayz Editorial Team
The Mystery Behind Your First C++ Allocation
You write a simple C++ program. A single new int. Four bytes. You fire up strace or your favorite memory profiler, and there it is — your process just requested roughly 72 KB from the operating system. Not 4 bytes. Not 64 bytes. A full 72 KB. If you have ever stared at that number and wondered whether your tooling was lying to you, you are not alone. This seemingly bizarre behavior is one of the most frequently asked questions among C++ developers digging into memory internals for the first time, and the answer takes us on a fascinating journey through the layers that sit between your code and the actual hardware.
What Happens When You Call new
To understand the 72 KB figure, you need to trace the full allocation chain. When your C++ code executes new int, the compiler translates that into a call to operator new, which on most Linux systems delegates to malloc from glibc. But malloc does not directly ask the kernel for 4 bytes of memory. The kernel operates in pages — typically 4 KB on x86_64 — and the cost of a system call is enormous relative to a simple memory access. Calling brk() or mmap() for every individual allocation would make any non-trivial program grind to a halt.
Instead, glibc's memory allocator — an implementation called ptmalloc2, itself descended from Doug Lea's classic dlmalloc — acts as a middleman. It requests large blocks of memory from the kernel upfront, then carves them into smaller pieces as your program needs them. This is the fundamental reason your first 4-byte allocation triggers a much larger request to the operating system. The allocator is not being wasteful. It is being strategic.
Dissecting the 72 KB: Where the Bytes Go
The initial allocation overhead comes from several distinct components that the runtime must initialize before it can hand you even a single byte of usable memory. Understanding each component explains why the number lands where it does.
First, glibc's malloc initializes the main arena — the primary bookkeeping structure that tracks all allocations on the main thread. This arena includes metadata for the heap, free-list pointers, and bin structures for different allocation sizes. The allocator extends the program break via sbrk(), and the initial extension is governed by an internal parameter called M_TOP_PAD, which defaults to 128 KB of padding. However, the actual initial request is adjusted for page alignment and existing break position, which often results in a smaller first request — commonly landing near that 72 KB figure on a freshly started process.
Second, since glibc 2.26, the allocator initializes a thread-local cache (tcache) on first use. The tcache contains 64 bins (one per small-allocation size class), each capable of holding up to 7 cached chunks. The tcache_perthread_struct itself consumes only a few hundred bytes, but allocating it is itself a heap allocation, so it triggers the broader arena setup. Third, the C++ runtime has already performed allocations before your main() even runs — static constructors, iostream buffer initialization for std::cout and friends, and locale setup all contribute to that initial heap footprint.
The Arena System and Why Pre-Allocation Is Smart
The decision to pre-allocate a substantial chunk of memory rather than requesting it piecemeal is not an accident of implementation. It is a deliberate engineering tradeoff rooted in decades of systems programming experience. Every call to brk() or mmap() involves a context switch from user space to kernel space, modification of the process's virtual memory mappings, and potential page table updates. On modern hardware, a single system call costs roughly 100-200 nanoseconds — trivial in isolation, catastrophic at scale.
Consider a program that makes 10,000 small allocations during initialization. Without pre-allocation, that would mean 10,000 system calls, costing approximately 1-2 milliseconds of pure overhead. With an arena-based allocator, the first allocation triggers a single system call, and the subsequent 9,999 allocations are serviced entirely in user space through pointer arithmetic and linked-list operations — each taking roughly 10-50 nanoseconds. The math is unambiguous: pre-allocation wins by orders of magnitude.
The 72 KB you see on your first allocation is not wasted memory — it is a performance investment. The allocator is betting that your program will make more allocations soon, and in virtually every real-world scenario, that bet pays off handsomely. The cost of unused virtual address space is essentially zero on modern 64-bit systems.
Virtual Memory vs. Physical Memory: Why It Does Not Matter
A common concern among developers encountering this behavior for the first time is resource waste. If I only need 4 bytes, why is my program consuming 72 KB? The critical insight is that virtual memory is not physical memory. When glibc extends the program break by 72 KB, the kernel updates the process's virtual memory mappings, but it does not immediately back those pages with physical RAM. The actual physical pages are allocated on demand through page faults — only when your program writes to a specific address does the kernel assign a real page of memory to it.
This means that even though your process's virtual size increases by 72 KB, its resident set size (RSS) — the amount of physical RAM actually consumed — increases by only the pages you actually touch. For a single new int, that is typically one 4 KB page, plus whatever pages the arena metadata occupies. The remaining virtual space sits there, ready for use, costing nothing but address space — of which you have 128 TB on a 64-bit Linux system.
This distinction is critical when profiling and monitoring production applications. If you are building software that needs to track real resource consumption — whether it is a SaaS backend, a microservice, or an analytics pipeline — always monitor RSS rather than virtual size. Tools like /proc/[pid]/smaps, valgrind --tool=massif, and pmap report accurate physical memory footprints rather than misleading virtual memory figures.
How Different Allocators Handle the First Allocation
The 72 KB figure is specific to glibc's ptmalloc2. Other allocators make different tradeoffs, and the initial allocation overhead varies accordingly. Understanding these differences is valuable when choosing an allocator for performance-sensitive applications.
- jemalloc (used by Facebook, FreeBSD) — Uses a more granular arena structure with thread-local caches. The initial overhead tends to be higher (often 200+ KB) but delivers better multi-threaded performance due to reduced lock contention.
- tcmalloc (Google's Thread-Caching Malloc) — Allocates a per-thread cache of approximately 2 MB by default, with aggressive pre-allocation. Initial overhead is higher, but subsequent small allocations are extremely fast.
- musl libc's malloc — A much simpler, lower-footprint design (the mallocng allocator since musl 1.2.0). Initial overhead is minimal, often just a single page, but per-allocation cost is higher because far less memory is cached in user space and freed memory is returned to the kernel more eagerly.
- mimalloc (Microsoft) — Uses segment-based allocation, reserving multi-megabyte segments of virtual address space up front (with minimal physical commitment), trading address space for exceptional locality and throughput.
The choice between these allocators depends entirely on your workload. For long-running server applications with heavy multi-threaded allocation, jemalloc or tcmalloc typically outperforms glibc's default. For memory-constrained embedded systems, musl's simpler approach may be preferable despite lower throughput. For most general-purpose desktop and server applications, ptmalloc2's 72 KB initial overhead represents a reasonable default that works well without tuning.
Tuning the Initial Allocation Behavior
If the default 72 KB initial overhead is genuinely problematic for your use case — perhaps you are spawning thousands of short-lived processes, each making only a handful of allocations — glibc provides several tunables via mallopt() and the MALLOC_ family of environment variables.
The M_TOP_PAD parameter controls how much extra memory the allocator requests beyond what is immediately needed. Setting it to 0 with mallopt(M_TOP_PAD, 0) tells the allocator to request only what is needed, reducing the initial overhead significantly. The M_MMAP_THRESHOLD parameter controls the size above which allocations use mmap instead of the arena. The M_TRIM_THRESHOLD controls when freed memory is returned to the OS. And since glibc 2.26, the glibc.malloc.tcache_count and glibc.malloc.tcache_max tunables let you control the thread cache behavior.
However, a word of caution: tuning these parameters without careful benchmarking almost always makes things worse. The defaults were chosen based on extensive real-world profiling, and they represent a sweet spot for the vast majority of workloads. Unless you have strong evidence from production profiling that malloc overhead is a bottleneck — and you have measured the impact of your changes — leave the defaults alone. Premature optimization of the allocator is a particularly insidious form of yak shaving that has consumed countless engineering hours for negligible benefit.
What This Teaches Us About Systems Programming
The 72 KB first-allocation mystery is, at its core, a lesson about abstraction layers. C++ gives you the illusion that new int allocates 4 bytes. The language standard says so. Your mental model says so. But between your code and the hardware sits a stack of sophisticated systems — the C++ runtime, the C library allocator, the kernel's virtual memory subsystem, and the hardware's MMU and TLB — each adding its own behaviors, optimizations, and overhead.
This is not a flaw. It is the entire point of systems software. Each layer exists to solve a real problem: the allocator exists so you do not have to make system calls for every allocation. The virtual memory system exists so you do not have to manage physical memory directly. The page fault handler exists so memory is committed lazily and efficiently. Every layer trades a small amount of transparency for a large amount of performance and convenience.
The developers who build the most reliable, highest-performing systems are those who understand these layers — not because they need to think about them constantly, but because when something unexpected happens (like a mysterious 72 KB allocation), they have the mental model to understand why. Whether you are building a real-time trading system, a game engine, or a business platform serving thousands of users, the ability to reason about what your code actually does at the system level is what separates competent developers from exceptional ones. The 72 KB is not a bug. It is your allocator doing its job brilliantly.