Symbolicating kernel backtraces on Apple Silicon

I am able to symbolicate kernel backtraces for addresses that belong to my kext.

Is it possible to symbolicate kernel backtraces for addresses that lie beyond my kext and reference kernel code?

Answered by DTS Engineer in 836958022

Let me step back to the beginning and explain this in a bit more detail:

  1. When using "-l", what atos is actually "getting" there is the load address starting from the first __TEXT segment in the executable.

  2. In arm64, (very generally) things shifted about code was loading out of __TEXT_EXEC, NOT _TEXT.

SO, to correctly symbolicate, you need to properly shift the load address to account for that. Starting with loadable KEXTs, this is fairly easy. They're __TEXT segment is at 0, so the math here is:

(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->

KEXTs also tend to have a fixed opening __TEXT size, so the value of __TEXT_EXEC.vmaddr ends up being "0x4000"

0x4000 - 0 -> 0x4000

That is where my original command came from:

atos -arch arm64e -o <symbol file path)  -l <KEXT load address - 0x4000>  <address to symbol>

NOTE: My earlier post was wrong, you need to subtract not add.

For kext, that load address comes directly from the panic log. Here is what that looks like in the panic from the other thread on this issue:

          lr: 0xfffffe0013723894  fp: 0x0000000000000000
      Kernel Extensions in backtrace:
         com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]@0xfffffe001203bfd0->0xfffffe00120533ab
            dependency: com.apple.iokit.IOPCIFamily(2.9)[6D6666E6-340F-3A5E-9464-DE05164C0658]@0xfffffe0015e65e90->0xfffffe0015e93b3f

Now, the math is actually exactly the same for the kernel, but the numbers are totally different. Running otool -l on kernel.release.t8112 gets you:

__TEXT.vmaddr -> 
...
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 552
  segname __TEXT
   vmaddr 0xfffffe0007004000
   vmsize 0x0000000000100000
...

AND:

__TEXT_EXEC.vmaddr:
...
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 312
  segname __TEXT_EXEC
   vmaddr 0xfffffe00072b8000
   vmsize 0x00000000008f0000
...

(You may notice there are multiple __TEXT_EXEC section, in this case all we care about is the first)

SO, plugging that that into our math above:

(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->
0xfffffe00072b8000 - 0xfffffe0007004000 -> 0x2B4000

That number is then subtracted from:

Kernel text exec base:  0xfffffe001cd8c000

0xfffffe001cd8c000 - 0x2B4000 -> 0xFFFFFE001CAD8000

Plugging that value into your atos command:

atos -o kernel.release.t8112 -arch arm64e -l 0xFFFFFE001CAD8000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04

Gives us:

handle_debugger_trap (in kernel.release.t8112) (debug.c:1839)
handle_uncategorized (in kernel.release.t8112) (sleh.c:1591)
sleh_synchronous (in kernel.release.t8112) (sleh.c:1477)
fleh_synchronous (in kernel.release.t8112) + 44
panic_trap_to_debugger (in kernel.release.t8112) (debug.c:1400)
panic_with_options_and_initiator (in kernel.release.t8112) (debug.c:1213)
0xfffffe001d86a708
0xfffffe001d86941c
0xfffffe001d866504
0xfffffe001e234bc8
sleh_irq (in kernel.release.t8112) (sleh.c:3840)
fleh_irq (in kernel.release.t8112) + 64
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
ubc_create_upl_kernel (in kernel.release.t8112) (ubc_subr.c:2643)
vnode_pageout (in kernel.release.t8112) (vnode_pager.c:407)
vnode_pager_data_return (in kernel.release.t8112) (bsd_vm.c:451)
vm_pageout_iothread_external_continue (in kernel.release.t8112) (vm_pageout.c:4237)
Call_continuation (in kernel.release.t8112) + 196

Note the references to "panic..." which are an easy way to validate that the symbolication is valid. By it's nature, the kernel cannot "crash" in the same sense any other process can, as the kernel IS the core component that controls all of the mechanisms that cause OTHER processes to crash. Because of that, "all" kernel panics end in a call to "panic" because that's the function that ends up generating the log data your looking at and then stopping normal execution.

Anticipating your question, no, I don't know why the other 4 frame aren't symbolicating, but I don't think it's an issue with atos. Apple obviously has our own internal symbolication tools and they're getting exactly the same result. Are they from your KEXT and you just didn't include the KEXT address info?

Having said that, they also don't really matter for this particular panic. That's because of this message at the top of the log:

panic(cpu 4 caller 0xfffffe001d86a708): watchdog timeout: no checkins from watchdogd in 90 seconds (25 total checkins since monitoring last enabled)

This message basically means exactly what it sounds like. There's system in place (managed by "watchdogd") to verify that user space that requires specific, critical system daemons to "checkin" with the kernel. If they fail to do so, then the system times out and the kernel panics. That dynamic also means that the specific stack traces is largely irrelevant, as it simply shows what happened to be executing at the time the panic occurred, NOT what actually caused the panic.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for clarifying that point. Hopefully, Xcode will soon come invested with the symbolication capabilities on Apple Silicon which will certainly be much appreciated.

I have closed FB17497530. Thanks.

I am unsure if there has been any progress in this area. But I needed to resolve a recent Tahoe panic, so chatgpt cobbled together this:

https://gist.github.com/lundman/54e633a850e7623aae5adab38a39f464

If we are allowed to share? Either way, output was:

./symbolicate_panic.py -p ~/ZFS.2.3.1rc1.kernel.panic.-.Tahoe.M4.Pro.Mac.Mini.txt -k /Library/Extensions/zfs.kext/Contents/MacOS/zfs --kernel --kdk-nearest --accept-mismatch
=== KEXT mapping ===
 bundle: org.openzfsonosx.zfs
 arch:   arm64e
 base@:  0xfffffe004400d840
 file __TEXT vmaddr:      0x0
 file __TEXT_EXEC vmaddr: 0x70000   delta=0x70000
 chosen TEXT_LOAD:        0xfffffe0043f9d840

=== Kernel mapping (via KDK) ===
 checking for KDK build 25A354 ... nearest: /Library/Developer/KDKs/KDK_26.0_25A353.kdk
 build: 25A354
 SoC:   t6041
 kernel: /Library/Developer/KDKs/KDK_26.0_25A353.kdk/System/Library/Kernels/kernel.release.t6041
 panic Kernel UUID:          E67CAF31-8F84-389C-BB27-7FAEC762FA14
 KDK Kernel UUID:            E67CAF31-8F84-389C-BB27-7FAEC762FA14
 file __TEXT vmaddr:      0xfffffe0007004000
 file __TEXT_EXEC vmaddr: 0xfffffe00072cc000   delta=0x2c8000
 panic Kernel text exec base: 0xfffffe003e2c4000
 chosen TEXT_LOAD:           0xfffffe003dffc000
 ESR: 0x96000005  -> Data Abort, same EL; Translation fault, level 1
 FAR: 0x0000000000000084


=== Panicked thread (merged, ordered) ===
0xfffffe004401d934  [zfs]    taskq_dispatch (in zfs) (spl-taskq.c:1457)
0xfffffe003e3164c0  [kernel] handle_debugger_trap (in kernel.release.t6041) (debug.c:1863)
0xfffffe003e48bc54  [kernel] handle_uncategorized (in kernel.release.t6041) (sleh.c:1818)
0xfffffe003e489e8c  [kernel] sleh_synchronous (in kernel.release.t6041) (sleh.c:0)
0xfffffe003e2c7d48  [kernel] fleh_synchronous (in kernel.release.t6041) + 72
0xfffffe003e3167d0  [kernel] DebuggerTrapWithState (in kernel.release.t6041) (debug.c:830)
0xfffffe003ec199c8  [kernel] Assert (in kernel.release.t6041) (debug.c:841)
0xfffffe003ec24c74  [kernel] sleh_synchronous_sp1 (in kernel.release.t6041) (sleh.c:1191)
0xfffffe003e48bac4  [kernel] handle_kernel_abort (in kernel.release.t6041) (sleh.c:3960)
0xfffffe003e489ed0  [kernel] sleh_synchronous (in kernel.release.t6041) (sleh.c:1544)
0xfffffe0044155ab4  [zfs]    vdev_disk_io_start (in zfs) (vdev_disk.c:750)
0xfffffe0044140c54  [zfs]    zio_vdev_io_start (in zfs) (zio.c:0)
0xfffffe004413bdd8  [zfs]    zio_nowait (in zfs) (zio.c:2580)
0xfffffe00440dbb98  [zfs]    vdev_probe (in zfs) (vdev.c:1840)
0xfffffe00440dc92c  [zfs]    vdev_open (in zfs) (vdev.c:2273)
0xfffffe00440e4d90  [zfs]    vdev_open_child (in zfs) (vdev.c:1868)
0xfffffe004401ffe4  [zfs]    taskq_thread (in zfs) (spl-taskq.c:2144)
0xfffffe0044020574  [zfs]    spl_thread_setup (in zfs) (spl-thread.c:128)
0xfffffe003e2c87cc  [kernel] Call_continuation (in kernel.release.t6041) + 204

Maybe it will help someone, or, if there is an official way now, please let me know.

I am unsure if there has been any progress in this area. But I needed to resolve a recent Tahoe panic, so ChatGPT cobbled together this:

Cool, thanks for sharing!

If we are allowed to share? Either way, output was:

I don't have any specific guidance on the panic itself, but this did catch my eye:

0xfffffe0044140c54  [zfs]    zio_vdev_io_start (in zfs) (zio.c:0)

In particular, the "0" here "zio.c:0".

The thing to understand here is that the symbolization process is an entirely "mechanical" process without any real "intelligence". The math is simple enough that it normally works "well enough”; however, from experience*, that "0" is often a sign that something was "off" and you may not have the right symbol. If the stack trace isn't matching up with your code, try symbolicating the addresses a few bytes above and below the original address, and you'll often get the correct result.

*The situation has improved considerably since then, but many years ago, I actually compared how 4 different crash reporters (including ours) symbolicated the stack of the system’s default exception handler chain **. They all returned slightly different results, none of which actually matched exactly what was going on.

**This is basically a long series of tiny nested functions, which is what typically creates these problems.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hello Kevin,

I put together a Python script to do a full symbolication based on your instructions.

I'm having trouble computing addresses of functions given in the actual stack frames.

First off, the symbolication of the frame of the thread that panicked does succeed as the function addresses come precomputed by the system:

% ./symbolicate.py
TEXT_EXEC 0xfffffe00072d4000
TEXT 0xfffffe0007004000
ktext_exec_base 0xfffffe002c900000
load_address: 0xfffffe002c630000

panicked_thread_faddrs: ['0xfffffe002c95d93c', '0xfffffe002cacd124', '0xfffffe002cacb31c', '0xfffffe002c903b88', '0xfffffe002c95dc08', '0xfffffe002d264898', '0xfffffe002d2700f8', '0xfffffe002caccf80', '0xfffffe002cacb490', '0xfffffe002c903b88', '0xfffffe002cb7f820', '0xfffffe002cba8538', '0xfffffe002cb8fd00', '0xfffffe002cb905c0', '0xfffffe002d05eb5c', '0xfffffe002cacb3a4', '0xfffffe002c903b88', '0x19d82a2b0']

['atos', '-o', 'Kernel_Debug_Kit_26_build_25A353.dmg_extracted/KDK.pkg/Payload/System/Library/Kernels/kernel.release.t8103.dSYM/Contents/Resources/DWARF/kernel.release.t8103', '-arch', 'arm64e', '-l', '0xfffffe002c630000', '0xfffffe002c95d93c', '0xfffffe002cacd124', '0xfffffe002cacb31c', '0xfffffe002c903b88', '0xfffffe002c95dc08', '0xfffffe002d264898', '0xfffffe002d2700f8', '0xfffffe002caccf80', '0xfffffe002cacb490', '0xfffffe002c903b88', '0xfffffe002cb7f820', '0xfffffe002cba8538', '0xfffffe002cb8fd00', '0xfffffe002cb905c0', '0xfffffe002d05eb5c', '0xfffffe002cacb3a4', '0xfffffe002c903b88', '0x19d82a2b0']

handle_debugger_trap (in kernel.release.t8103) (debug.c:1863)
handle_uncategorized (in kernel.release.t8103) (sleh.c:1818)
sleh_synchronous (in kernel.release.t8103) (sleh.c:1698)
fleh_synchronous (in kernel.release.t8103) + 24
DebuggerTrapWithState (in kernel.release.t8103) (debug.c:830)
Assert (in kernel.release.t8103) (debug.c:841)
sleh_synchronous_sp1 (in kernel.release.t8103) (sleh.c:1191)
handle_kernel_abort (in kernel.release.t8103) (sleh.c:3960)
sleh_synchronous (in kernel.release.t8103) (sleh.c:1698)
fleh_synchronous (in kernel.release.t8103) + 24
vn_create (in kernel.release.t8103) (vfs_subr.c:8079)
vn_open_auth (in kernel.release.t8103) (vfs_vnops.c:483)
open1 (in kernel.release.t8103) (vfs_syscalls.c:0)
open_extended (in kernel.release.t8103) (vfs_syscalls.c:5273)
unix_syscall (in kernel.release.t8103) (systemcalls.c:181)
sleh_synchronous (in kernel.release.t8103) (sleh.c:1484)
fleh_synchronous (in kernel.release.t8103) + 24
0x19d82a2b0

Here's the output generated for the very first kernel frame that fails to symbolicate:

kernelFrames: tid:110 IOServiceTerminateThread
UUID: 8502a040-9cf9-35f5-b8a2-84b0e48d379e
[1, 622608]
funcaddr: 0xfffffe0008774000+0x98010 -> 0xfffffe000880c010
[1, 617372]
funcaddr: 0xfffffe0008774000+0x96b9c -> 0xfffffe000880ab9c
[1, 509108]
funcaddr: 0xfffffe0008774000+0x7c4b4 -> 0xfffffe00087f04b4
[1, 8496472]
funcaddr: 0xfffffe0008774000+0x81a558 -> 0xfffffe0008f8e558
[1, 53004]
funcaddr: 0xfffffe0008774000+0xcf0c -> 0xfffffe0008780f0c

['atos', '-o', 'Kernel_Debug_Kit_26_build_25A353.dmg_extracted/KDK.pkg/Payload/System/Library/Kernels/kernel.release.t8103.dSYM/Contents/Resources/DWARF/kernel.release.t8103', '-arch', 'arm64e', '-l', '0xfffffe002c630000', '0xfffffe000880c010', '0xfffffe000880ab9c', '0xfffffe00087f04b4', '0xfffffe0008f8e558', '0xfffffe0008780f0c']

0xfffffe000880c010
0xfffffe000880ab9c
0xfffffe00087f04b4
0xfffffe0008f8e558
0xfffffe0008780f0c

UUID: 8502a040-9cf9-35f5-b8a2-84b0e48d379e is that of the kernel as referenced in the panic log Kernel UUID: 8502A040-9CF9-35F5-B8A2-84B0E48D379E and given in the binaryImages lookup table located at index 1:

"binaryImages": [
        [
            "fbe15ad4-ea36-3c07-81be-460a8240c1d4",
            18446741874887413328,
            "T"
        ],
        [
            "8502a040-9cf9-35f5-b8a2-84b0e48d379e",
            18446741874828328960,
            "T"
        ],

Function addresses are computed as follows: faddr=binaryImages[loadaddr]+kernelFrames[offset].

Given the diagnostics above [1, 622608] funcaddr: 0xfffffe0008774000+0x98010 -> 0xfffffe000880c010, where 0xfffffe0008774000 is the load address of the kernel 18446741874828328960, and 0x98010 is the offset from the load address 622608, atos(1) fails to perform symbolication.

Your second suggestion to use the value of the offset+ the load address of 0 doesn't succeed either.

Your clarifying how function addresses given in kernel stack frames are to be computed so they result in a successful symbolication would be greatly appreciated.

First, as a note on progress, you should be able to replace this:

load_address: 0xfffffe002c630000->
'-l', '0xfffffe002c630000'

With this:

ktext_exec_base 0xfffffe002c900000 ->
'-textExecAddress', '0xfffffe002c900000'

...and then directly use kernel frame addresses without further modification.

That should also work for symbolicating your KEXT, but you may need to find a different exec_base for KEXTs.

First off, the symbolication of the frame of the thread that panicked does succeed as the function addresses come precomputed by the system:

So, I think the issue here is that you've actually crossed the kernel boundary. For future reference, this:

...
unix_syscall (in kernel.release.t8103) (systemcalls.c:181)
sleh_synchronous (in kernel.release.t8103) (sleh.c:1484)
fleh_synchronous (in kernel.release.t8103) + 24
...

…is one of the "boilerplate" sequences used for syscalls into the kernel. That leads to here:

UUID: 8502a040-9cf9-35f5-b8a2-84b0e48d379e
[1, 622608]
funcaddr: 0xfffffe0008774000+0x98010 -> 0xfffffe000880c010

The changing library index ([0,...] -> [1,...]) means you've changed to an entirely different library/executable. That is, you're now in the executable with the UUID of "8502a040-9cf9-35f5-b8a2-84b0e48d379e". Which executable is that... no idea. There are directions in this document for finding and matching dSYMs, but trying to do it fully by hand would be a very laborious process. A lot of the time I'd probably just "cheat" and focus on symbolicating the libraries I cared about (like my KEXT/app).

One final note on the image list:

"binaryImages": [
        [
            "fbe15ad4-ea36-3c07-81be-460a8240c1d4",
            18446741874887413328,
            "T"
        ],

The letter code is significant, as it actually tells you how to symbolicate that particular entry. I believe the format here is basically:

[<library UUID>, <library address>, <library address type>]

...and there is an ldb python script here which shows how those type codes are interpreted.

Hopefully that clarifies things, but if you're still having issues please post the full panic log and I'll take a closer look.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Based on the symbolicated backtraces, there are no references to where the memory is being used after being freed, or where it's being freed.

There's just this reference to where it's being allocated:

['atos', '-o', '/Users/user/panic/Kernel_Debug_Kit_15.6_build_24G84.dmg_extracted/KDK.pkg/Payload/System/Library/Kernels/kernel.release.t8103.dSYM/Contents/Resources/DWARF/kernel.release.t8103', '-arch', 'arm64e', '-textExecAddress', '0xfffffe0025c50000', '0xfffffe0025cae7c0', '0xfffffe0026558328', '0xfffffe002655e730', '0xfffffe0025d30f08', '0xfffffe0025cc21c0', '0xfffffe002620afe4', '0xfffffe00243f717c', '0xfffffe00243f7110', '0xfffffe00243df12c', '0xfffffe00243df610', '0xfffffe0024408bc0', '0xfffffe0025ee60cc', '0xfffffe0025ecdf08', '0xfffffe0025eceee4', '0xfffffe002635e9b8', '0xfffffe0025e0fff0', '0xfffffe0025c57b88']
panic_trap_to_debugger (in kernel.release.t8103) (debug.c:1400)
kalloc_type_distribute_budget (in kernel.release.t8103) (kalloc.c:1343)
kmem_init (in kernel.release.t8103) (vm_kern.c:4584)
zone_early_scramble_rr (in kernel.release.t8103) (zalloc.c:3220)
kalloc_ext (in kernel.release.t8103) (kalloc.c:2520)
mprotect (in kernel.release.t8103) (kern_mman.c:1311)
0xfffffe00243f717c
0xfffffe00243f7110
0xfffffe00243df12c
0xfffffe00243df610
0xfffffe0024408bc0
vn_pathconf (in kernel.release.t8103) (vfs_vnops.c:1875)
openat_dprotected_internal (in kernel.release.t8103) (vfs_syscalls.c:5127)
mknod (in kernel.release.t8103) (vfs_syscalls.c:5469)
sendsig (in kernel.release.t8103) (unix_signal.c:626)
sleh_synchronous (in kernel.release.t8103) (sleh.c:1267)
fleh_synchronous (in kernel.release.t8103) + 24

The unsymbolicated memory addresses come from my KEXT where memory is allocated via OSMalloc(). The object being allocated ends up being stored in an RB tree.

I'd like to take you up on your offer to take a look at the full panic log for me if I may.

Please find the full panic log attached to this ticket https://feedbackassistant.apple.com/feedback/21231833.

I'd like to take you up on your offer to take a look at the full panic log for me if I may.

Alright. So, here is a rundown of what I found:

(1) There are five different threads blocked in some variation of this:

"kernelFrames": [
	["thread_invoke", "kernel.release.t8103", "(sched_prim.c:3530)", "0xfffffe0008a6b0e8", 1180], 
	["thread_block_reason", "kernel.release.t8103", "(sched_prim.c:4147)", "0xfffffe0008a69b5c", 244], 
	["lck_mtx_sleep_deadline", "kernel.release.t8103", "(locks.c:1224)", "0xfffffe0008a526a0", 236], 
	["_sleep", "kernel.release.t8103", "(kern_synch.c:206)", "0xfffffe0008fb145c", 176], 
...
	[11,342632], 
	["Call_continuation", "kernel.release.t8103", null, "0xfffffe00089e4f0c", 268]
], 

The intermediate frames going through your code varies, but they all start at the entry point above and block in sleep. If you search for "[11,342632]" in the raw panic, you should be able to find all 5 and reconstruct them from there.

(2) All other activity came from Preview(2407). There were three threads tied to your KEXT:

  1. One thread is doing handling I/O from the VM system, caused by memmove, probably tied to memory-mapped file I/O:
"userFrames": [
	["_platform_memmove", "libsystem_platform.dylib", null, "0x1804ad320", 96], 
	["CGPDFSourceRefill", "CoreGraphics", "(CGPDFSource.c:464)", "0x1869dfff8", 164], 

"kernelFrames": [
	["thread_invoke", "kernel.release.t8103", "(sched_prim.c:3530)", "0xfffffe0008a6b0e8", 1180], 
	["thread_block_reason", "kernel.release.t8103", "(sched_prim.c:4147)", "0xfffffe0008a69b5c", 244], 
	["lck_rw_lock_shared_gen", "kernel.release.t8103", "(lock_rw.c:1375)", "0xfffffe0008a4f404", 892], 
	["IORWLockRead", "kernel.release.t8103", "(lock_rw.c:1524)", "0xfffffe0008a4f030", 144], 
	[11, 315180], 
	[11, 316328], 
	[11, 70956], 
	[11, 70608], 
	[11, 236500], 
	["cluster_io", "kernel.release.t8103", "(vfs_cluster.c:2336)", "0xfffffe0008c1fb74", 3512], 
	["cluster_pagein_ext", "kernel.release.t8103", "(vfs_cluster.c:2736)", "0xfffffe0008c219b8", 244], 
	["cluster_pagein", "kernel.release.t8103", "(vfs_cluster.c:2668)", "0xfffffe0008c218b8", 28], 
	[11, 237688], 
	["vnode_pagein", "kernel.release.t8103", "(vnode_pager.c:797)", "0xfffffe000904c67c", 1068], 
	["vnode_pager_cluster_read", "kernel.release.t8103", "(bsd_vm.c:860)", "0xfffffe0008ada1b4", 76], 
	["vm_fault_page", "kernel.release.t8103", "(vm_fault.c:2019)", "0xfffffe0008af3c88", 2348], 
	["vm_fault_internal", "kernel.release.t8103", "(vm_fault.c:6299)", "0xfffffe0008aef600", 6688], 
	["handle_user_abort", "kernel.release.t8103", "(sleh.c:2981)", "0xfffffe0008b94bdc", 420], 
	["sleh_synchronous", "kernel.release.t8103", "(sleh.c:1271)", "0xfffffe0008b94030", 944], 
	["fleh_synchronous", "kernel.release.t8103", null, "0xfffffe00089dbb88", 24], 
	[2, 0]
], 
  1. Here is the panic itself, which was triggered by an open() call. One detail I will share is that the open itself wasn't actually opening the file contents (it was going to read and xattr), so the flags might be considered unusual:
open(..., O_NOFOLLOW|O_RDONLY|O_EVTONLY);

"userFrames": [
	["__open", "libsystem_kernel.dylib", null, "0x180432678", 8], 


"kernelFrames": [
	["panic_trap_to_debugger", "kernel.release.t8103", "(debug.c:1400)", "0xfffffe0008a327c0", 780], 
	["panic", "kernel.release.t8103", "(debug.c:1161)", "0xfffffe00092dc328", 60], 
	["zalloc_uaf_panic", "kernel.release.t8103", "(zalloc.c:3173)", "0xfffffe00092e2730", 236], 
	["zalloc_item", "kernel.release.t8103", "(zalloc.c:6760)", "0xfffffe0008ab4f08", 1440], 
	["kalloc_ext", "kernel.release.t8103", "(kalloc.c:2520)", "0xfffffe0008a461c0", 296], 
	["OSMalloc", "kernel.release.t8103", "(kern_malloc.c:270)", "0xfffffe0008f8efe4", 168], 
	[11, 160124], 
	[11, 160016], 
	[11, 61740], 
	[11, 62992], 
	[11, 232384], 
	["vn_open_auth", "kernel.release.t8103", "(vfs_vnops.c:705)", "0xfffffe0008c6a0cc", 2404], 
	["open1", "kernel.release.t8103", "(vfs_syscalls.c:4812)", "0xfffffe0008c51f08", 488], 
	["open", "kernel.release.t8103", "(vfs_syscalls.c:5313)", "0xfffffe0008c52ee4", 288], 
	["unix_syscall", "kernel.release.t8103", "(systemcalls.c:181)", "0xfffffe00090e29b8", 772], 
	["sleh_synchronous", "kernel.release.t8103", "(sleh.c:1267)", "0xfffffe0008b93ff0", 880], 
	["fleh_synchronous", "kernel.release.t8103", null, "0xfffffe00089dbb88", 24], 
	[2, 0]
], 
  1. Another thread is also blocked in open. That open appears to be an O_RDONLY call tied to bookmark creation:
"kernelFrames": [
	["thread_invoke", "kernel.release.t8103", "(sched_prim.c:3530)", "0xfffffe0008a6b0e8", 1180], 
	["thread_block_reason", "kernel.release.t8103", "(sched_prim.c:4147)", "0xfffffe0008a69b5c", 244], 
	["lck_rw_sleep", "kernel.release.t8103", "(lock_rw.c:2683)", "0xfffffe0008a509c4", 132], 
	["vm_object_update", "kernel.release.t8103", "(memory_object.c:1024)", "0xfffffe0008aecc64", 2296], 
	["memory_object_lock_request", "kernel.release.t8103", "(memory_object.c:352)", "0xfffffe0008aec228", 188], 
	["ubc_msync", "kernel.release.t8103", "(ubc_subr.c:1818)", "0xfffffe0008fef174", 188], 
	[11, 232440], 
	["vn_open_auth", "kernel.release.t8103", "(vfs_vnops.c:705)", "0xfffffe0008c6a0cc", 2404], 
	["open1", "kernel.release.t8103", "(vfs_syscalls.c:4812)", "0xfffffe0008c51f08", 488], 
	["open", "kernel.release.t8103", "(vfs_syscalls.c:5313)", "0xfffffe0008c52ee4", 288], 
	["unix_syscall", "kernel.release.t8103", "(systemcalls.c:181)", "0xfffffe00090e29b8", 772], 
	["sleh_synchronous", "kernel.release.t8103", "(sleh.c:1267)", "0xfffffe0008b93ff0", 880], 
	["fleh_synchronous", "kernel.release.t8103", null, "0xfffffe00089dbb88", 24], 
	[2, 0]
], 

4. I also noticed that Preview has another thread blocked in vn_open_auth, this time waiting on a response from EndpointSecurity:

"kernelFrames": [
	["thread_invoke", "kernel.release.t8103", "(sched_prim.c:3530)", "0xfffffe0008a6b0e8", 1180], 
	["thread_block_reason", "kernel.release.t8103", "(sched_prim.c:4147)", "0xfffffe0008a69b5c", 244], 
	["lck_mtx_sleep", "kernel.release.t8103", "(locks.c:1169)", "0xfffffe0008a524a0", 232], 
	...
	["EndpointSecurityEventManager::es_vnode_check_open(ucred*, vnode*, label*, int)", "EndpointSecurity", "(esk_event_manager.cpp:1709)", "0xfffffe000a679348", 68], 
	["mac_vnode_check_open", "kernel.release.t8103", "(mac_vfs.c:1637)", "0xfffffe00092b9368", 204], 
	["vn_authorize_open_existing", "kernel.release.t8103", "(vfs_subr.c:8349)", "0xfffffe0008c447d0", 140], 
	["vn_open_auth", "kernel.release.t8103", "(vfs_vnops.c:669)", "0xfffffe0008c69f3c", 2004], 
	["open1", "kernel.release.t8103", "(vfs_syscalls.c:4812)", "0xfffffe0008c51f08", 488], 
	["open", "kernel.release.t8103", "(vfs_syscalls.c:5313)", "0xfffffe0008c52ee4", 288], 
	["unix_syscall", "kernel.release.t8103", "(systemcalls.c:181)", "0xfffffe00090e29b8", 772], 
	["sleh_synchronous", "kernel.release.t8103", "(sleh.c:1267)", "0xfffffe0008b93ff0", 880], 
	["fleh_synchronous", "kernel.release.t8103", null, "0xfffffe00089dbb88", 24], 
	[2, 0]
], 

(3)

Covering a few odds and ends:

  • I get concerned anytime I see EndpointSecurity. The API basically allows a user space client to "veto" a broad variety of critical syscalls which, as you can imagine, opens the door to all sorts of mischief. Critically, alongside the obvious risk of things failing unexpectedly, it also introduces timing changes which could have other, unforeseen consequences. It could be totally unrelated to your issue, but I'd at least make sure I knew what "else" was happening on the machine.

  • I don't really see very much "else" happening on the system. The OSMalloc call which panicked appears to be the only memory-related activity on the system. I was so surprised by that that I searched for "alloc" (looking for malloc/vm_allocate/etc.) and basically found "nothing". Similarly, the threads above are the only open calls happening on the entire system.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks very much for looking into this for me.

I'll take a closer look at what I'm doing with the values stored in the RB tree in my filesystem.

I might need to resort to interprocess synchronisation to guard the critical sections in that code.

Symbolicating kernel backtraces on Apple Silicon
 
 
Q