Sporadic crash in xzm_main_malloc_zone_init_range_groups when spawning large binaries (macOS 26.3.1)

We're seeing a sporadic crash (~2-3% of spawns) when launching a large Mach-O binary via posix_spawn(). The crash happens inside libsystem_malloc.dylib during __malloc_init, before any application code runs. The process never reaches main().

Environment: macOS 26.3.1 (25D2128), Apple Silicon (ARM64)

Crash signature

BUG IN LIBMALLOC: pointer range initial reservation failed, Abort Cause 3

#0 libsystem_malloc.dylib: xzm_main_malloc_zone_init_range_groups.cold.1
#1 libsystem_malloc.dylib: xzm_main_malloc_zone_init_range_groups
#2 libsystem_malloc.dylib: xzm_main_malloc_zone_create
#3 libsystem_malloc.dylib: __malloc_init
#4 libSystem.B.dylib: libSystem_initializer
#5 dyld: dyld4::Loader::findAndRunAllInitializers

The binary

It's a Chromium component-build test binary (browser_tests):

  • ~1.5 GiB on disk, 5.54 GiB total VA footprint (__TEXT 517 MiB, __LINKEDIT 1.04 GiB, __PAGEZERO 4 GiB)
  • Links 527 dylibs via @rpath
  • All images span ~16.4 GiB of VA when loaded

A simple loop that spawns this binary 200 times via posix_spawn() reliably shows 2-5 crashes. Spawning /bin/cat 1000 times produces zero failures.

Investigation

We did extensive analysis to understand the root cause:

ASLR is irrelevant. We disabled ASLR using _POSIX_SPAWN_DISABLE_ASLR (flag 0x0100) and the failure rate is unchanged (~2% with or without). With ASLR disabled, the library addresses are identical across all crashes, confirming the VA layout itself isn't the problem.

Plenty of free VA space is available. We compared the memory layout of crashing processes (from crash reports) with successful ones (via vmmap):

  • In successful spawns, XZone places its MALLOC zones (SMALL, LARGE, metadata) in the large free regions after the loaded dylibs — for example at 0x784400000 and 0xD32000000, with 13-22 GiB contiguous free gaps available.
  • In crashing processes, the same free regions exist (the image layout is identical), but xzm_main_malloc_zone_init_range_groups fails to reserve into them.

Based on libmalloc/tests/memory_pressure.c, XZone needs 8 GiB for pointer ranges and 10 GiB for data ranges. The free gaps after the dylibs are far larger than this, yet the reservation sporadically fails.

We found no workaround. MallocNanoZone=0 has no effect (the crash occurs before zone configuration is consulted), and the crash is entirely within system code.

Questions

  1. Is this a known issue in XZone malloc on macOS 26.x?
  2. Is there any environment variable or entitlement that could work around this?
  3. Any guidance on what makes xzm_main_malloc_zone_init_range_groups fail non-deterministically when contiguous VA space is clearly available?
Answered by henguetta in 882435022

Thanks Quinn, that was exactly the right pointer. We traced through the disassembly of xzm_main_malloc_zone_init_range_groups on macOS 26.3.1 and correlated it with the open source in rel/libmalloc-792 (xzone_segment.c:1210-1250). Here's what we found:

The CONFIG_MACOS_RANGES path computes: ptr_reservation_size = XZM_RANGE_SEPARATION + XZM_POINTER_RANGE_SIZE + XZM_RANGE_SEPARATION = 4G + 16G + 4G = 24 GiB

ptr_start = 16 GiB + (entropy % 736) * 32 MiB

It then calls mach_vm_map with VM_FLAGS_FIXED for 24 GiB at ptr_start. The 736 granules cover [16 GiB, 39 GiB), ensuring the reservation fits under the 63 GiB commpage limit.

Why dylibs land above 16 GiB: The binary is a Chromium component-build test binary (browser_tests) that loads 516 Chromium dylibs plus system libraries (1543 images total). The main executable alone consumes 5.54 GiB of VA (4 GiB __PAGEZERO, 517 MiB __TEXT, 1.04 GiB __LINKEDIT). The 516 Chromium dylibs add another 10.19 GiB of aggregate VM (2.26 GiB __TEXT, 3.86 GiB __LINKEDIT). Combined with system libraries, the total loaded image span stretches from 4 GiB to 16.46 GiB, a 12.4 GiB spread. This pushes 7 dylibs past the 16 GiB boundary that XZone assumes is free.

The collision: When entropy % 736 is 0-14 (ptr_start = 16.0-16.438 GiB), the fixed 24 GiB reservation overlaps one of these dylibs, and mach_vm_map returns KERN_NO_SPACE. Slots 15 and above (ptr_start >= 16.469 GiB) clear the highest mapped byte and succeed. We confirmed this in lldb: in one crashing run, entropy % 736 was 1 (ptr_start = 0x402000000), which collided with one of our dylibs loaded at 0x401d04000.

The root issue is that CONFIG_MACOS_RANGES assumes the space starting at 16 GiB is available, but dyld can place dylibs there for sufficiently large binaries.

We filed https://feedbackassistant.apple.com/feedback/22372485 yesterday, but we can't upload the 1.5 GiB binary.

The 3 in that error is KERN_NO_SPACE, which is a pretty clear indication that the system memory allocator asked for address space and the kernel refused to give it. However, given all the constraints you’ve described, it’s not at all clear why that would be.

Weirdly, the code that traps this way is not on the main branch of the Darwin open source, but I found a copy in the rel/libmalloc-792 branch [1]. Now, I don’t have time today to study that code in depth, but I figured you’d find it interesting (-: AFAICT it tries to allocate address space by using mach_vm_map to map MEMORY_OBJECT_NULL, specifying a start that’s after 16 GiB, such that it fits within 64 GiB, with a randomly applied offset. That random bit probably explains why this shows up so infrequently.

It’s hard to say why this is failing. You might be able to learn more by running vmmap against the dying process. It’d also be interesting to know what address was passed to mach_vm_map (with reference to the source, that’s the ptr_addr value), something you could perhaps figure out from a core dump or DTrace.

Regardless, it’s clear from the message that this is considered a bug in the memory allocator, and I encourage you to file it as such. At a minimum, you should include a sysdiagnose log taken shortly after reproducing it, but if you want to invest more time then adding a cut down version of your test harness that reproduces the issue would be great.

Please post your bug number, just for the record.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] As always with Darwin, there’s no guarantee that the code exactly matches any given release of macOS. However, in most cases it’s a pretty good guide.

