Metal

Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Post

Replies

Boosts

Views

Activity

Unable to profile Metal app on M2 Ultra (profiling works on M3 Pro)

On MacBook Pro M3 14" I can profile the Metal App performance by running it, then clicking on the M icon and choosing profile after replay. On Mac Studio M2 Ultra I cannot: the profiler starts and crashes. I have tried everything including reinstalling the OS, Xcode, the Metal SDK, you name it. The app uses the Metal 4 API. The content of the replayer errorinfo report is shown at the end. Any ideas what is going on here and/or what else I can do do root cause this and fix it? FWIW, it was worse on 26.1 (Xcode just reported Metal 4 profiling not available). In 26.2 Xcode attempts to profile and invariably crashes. === Error summary: === 1x DYErrorDomain (512) - guest app crashed (512) 1x com.apple.gputools.MTLReplayer (100) - Abort trap: 6 === First Error === Domain: DYErrorDomain Error code: 512 Description: guest app crashed (512) GTErrorKeyPID: 26913 GTErrorKeyProcessName: GPUToolsReplayService GTErrorKeyCrashDate: 2026-01-09 19:22:52 +0000 === Underlying Error #1 === Domain: com.apple.gputools.MTLReplayer Error code: 100 Description: Abort trap: 6 Call stack: 0 GPUToolsReplay 0x0000000249c25850 MakeNSError + 284 1 GPUToolsReplay 0x0000000249c26428 HandleCrashSignal + 252 2 libsystem_platform.dylib 0x00000001856c7744 _sigtramp + 56 3 libsystem_pthread.dylib 0x00000001856bd888 pthread_kill + 296 4 libsystem_c.dylib 0x00000001855c2850 abort + 124 5 libsystem_c.dylib 0x00000001855c1a84 err + 0 6 IOGPU 0x00000001a9ea60a8 -[IOGPUMetal4CommandQueue _commit:count:commitFeedback:].cold.1 + 0 7 IOGPU 0x00000001a9ea0df8 __77-[IOGPUMetal4CommandQueue commitFillArgs:count:args:argsSize:commitFeedback:]_block_invoke + 0 8 IOGPU 0x00000001a9ea1004 -[IOGPUMetal4CommandQueue _commit:count:commitFeedback:] + 148 9 AGXMetalG14X 0x00000001158a2c98 -[AGXG14XFamilyCommandQueue_mtlnext noMergeCommit:count:options:commitFeedback:error:] + 116 10 AGXMetalG14X 0x0000000115a45c14 +[AGXG14XFamilyRenderContext_mtlnext mergeRenderEncoders:count:options:commitFeedback:queue:error:] + 4740 11 AGXMetalG14X 0x00000001158a2b34 -[AGXG14XFamilyCommandQueue_mtlnext commit:count:options:] + 96 12 GPUToolsReplay 0x0000000249bf0644 GTMTLReplayController_defaultDispatchFunction_noPinning + 2744 13 GPUToolsReplay 0x0000000249befb10 GTMTLReplayController_defaultDispatchFunction + 1368 14 GPUToolsReplay 0x0000000249b7a61c _ZL16DispatchFunctionP21GTMTLReplayControllerPK11GTTraceFuncRb + 476 15 GPUToolsReplay 0x0000000249b8603c ___ZN35GTUSCSamplingStreamingManagerHelper19StreamFrameTimeDataEv_block_invoke + 456 16 Foundation 0x0000000186f6c878 __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 24 17 Foundation 0x0000000186f6c740 -[NSBlockOperation main] + 96 18 Foundation 0x0000000186f6c6d8 __NSOPERATION_IS_INVOKING_MAIN__ + 16 19 Foundation 0x0000000186f6c308 -[NSOperation start] + 640 20 Foundation 0x0000000186f6c080 __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 16 21 Foundation 0x0000000186f6bf70 __NSOQSchedule_f + 164 22 libdispatch.dylib 0x00000001855104d0 _dispatch_block_async_invoke2 + 148 23 libdispatch.dylib 0x000000018551aad4 _dispatch_client_callout + 16 24 libdispatch.dylib 0x00000001855056e4 _dispatch_continuation_pop + 596 25 libdispatch.dylib 0x0000000185504d58 _dispatch_async_redirect_invoke + 580 26 libdispatch.dylib 0x0000000185512fc8 _dispatch_root_queue_drain + 364 27 libdispatch.dylib 0x0000000185513784 _dispatch_worker_thread2 + 180 28 libsystem_pthread.dylib 0x00000001856b9e10 _pthread_wqthread + 232 29 libsystem_pthread.dylib 0x00000001856b8b9c start_wqthread + 8 Replayer breadcrumbs: [ ] GTErrorKeyProcessSignal: SIGABRT === Setup === Capture device: star.localdomain (Mac14,14) - macOS 26.2 (25C56) - 0BA10D1D-D340-5F2E-934B-536675AF9BA1 Metal version: 370.64.2 Supported graphics APIs: Metal device: Apple M2 Ultra Supported GPU families: Apple1 Apple2 Apple3 Apple4 Apple5 Apple6 Apple7 Apple8 Mac1 Mac2 Common1 Common2 Common3 Metal3 Metal4 Replay device: star (Mac14,14) - macOS 26.2 (25C56) - 0BA10D1D-D340-5F2E-934B-536675AF9BA1 Metal version: 370.64.2 Supported graphics APIs: Metal device: Apple M2 Ultra Supported GPU families: Apple1 Apple2 Apple3 Apple4 Apple5 Apple6 Apple7 Apple8 Mac1 Mac2 Common1 Common2 Common3 Metal3 Metal4 Host: Mac14,14 - macOS 26.2 (25C56) Tool: Xcode (17C52) Known SDKs:

Graphics & Games Metal

407

Jan ’26

broken animations and memory leak issues

hello apple through this message i want to draw you attention to some problems with gptk and rosetta some games like marvel spiderman 2 have broken animations and t pose issues and other like uncharted and the last of us have severe memory leak issues so its my request please fix it asap

Graphics & Games Metal Graphics and Games

168

Jan ’26

MetalFX FrameInterpolator assertion: `Color texture width mismatch from descriptor` even when all texture sizes match

I am integrating MetalFX FrameInterpolator into a custom Unity RenderGraph–based render pipeline (C++ native plugin + C# render passes), and I am hitting the following assertion at runtime: /MetalFXDebugError.h:29: failed assertion `Color texture width mismatch from descriptor' What makes this confusing is that all input/output textures have the correct width and height, and they exactly match the values specified in the MTLFXFrameInterpolatorDescriptor. Setup Input resolution: 1024 x 512 Output resolution: 2048 x 1024 MTLFXTemporalScaler is created first and then passed into MTLFXFrameInterpolator The TemporalScaler and FrameInterpolator descriptors use the same input/output sizes and formats All Metal textures: Have no parentTexture Are 2D textures Match the descriptor sizes exactly (verified via logging) Texture bindings at encode time frameInterpolator.colorTexture = mtlTexColor; // 1024 x 512 frameInterpolator.prevColorTexture = mtlTexPrevColor; // 1024 x 512 frameInterpolator.motionTexture = mtlTexMotion; // 1024 x 512 frameInterpolator.depthTexture = mtlTexDepth; // 1024 x 512 frameInterpolator.uiTexture = mtlTexUI; // 2048 x 1024 frameInterpolator.outputTexture = mtlTexOutput; // 2048 x 1024 All widths/heights are logged and match: Color : 1024 x 512 (input) PrevColor : 1024 x 512 (input) Motion : 1024 x 512 (input) Depth : 1024 x 512 (input) UI : 2048 x 1024 (output) Output : 2048 x 1024 (output) The TemporalScaler works correctly on its own. The assertion only occurs when using FrameInterpolator. Important detail about colorTexture Originally, colorTexture was copied from BuiltinRenderTextureType.CurrentActive. After reading that this might violate MetalFX semantics, I changed the pipeline so that: colorTexture now comes from a dedicated private RenderGraph texture It is not the backbuffer It is not a drawable It is not used as a final output It is created before UI rendering Despite this, the assertion still occurs. Question Can uiTexture for MTLFXFrameInterpolator legally come from a texture copied from BuiltinRenderTextureType.CurrentActive? More generally: Are there additional hidden constraints on colorTexture / prevColorTexture (such as Metal usage, storageMode, aliasing, or hazard tracking) that could cause this assertion, even when sizes match? Does FrameInterpolator require colorTexture and prevColorTexture to be created in a very specific way (e.g. non-aliased, ShaderRead usage, identical Metal resource properties)? Any clarification on the exact semantic requirements for colorTexture, prevColorTexture, or uiTexture in MetalFX FrameInterpolator would be greatly appreciated.

Graphics & Games Metal MetalFX

640

Jan ’26

Has anyone been able to create a window/portal using metal.

I am trying to create a simple portal like that in RealityKit, but using metal instead of RealityKit. Has anyone been able to create a window or portal like thing to show a skybox outside in mixed Reality?

Graphics & Games Metal

227

Jan ’26

Optimizing HZB Mip-Chain Generation and Bindless Argument Tables in a Custom Metal Engine

Hi everyone, I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture. While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges. Current Core Implementations: GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders. Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering. Temporal Stability: Custom TAA with history rejection and Motion Blur resolve. Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy. The Architectural Challenge: I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers. && m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty(); if (canBuildHzb) { MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder(); hzbInit->setComputePipelineState(m_hzbInitPipeline); hzbInit->setTexture(m_depthTexture, 0); hzbInit->setTexture(m_hzbMipViews[0], 1); if (m_pointClampSampler) { hzbInit->setSamplerState(m_pointClampSampler, 0); } else if (m_linearClampSampler) { hzbInit->setSamplerState(m_linearClampSampler, 0); } const uint32_t hzbWidth = m_hzbMipViews[0]->width(); const uint32_t hzbHeight = m_hzbMipViews[0]->height(); const uint32_t threads = 8; MTL::Size tgSize = MTL::Size(threads, threads, 1); MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads, (hzbHeight + threads - 1) / threads * threads, 1); hzbInit->dispatchThreads(gridSize, tgSize); hzbInit->endEncoding(); for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) { MTL::Texture* src = m_hzbMipViews[mip - 1]; MTL::Texture* dst = m_hzbMipViews[mip]; if (!src || !dst) { continue; } MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder(); downEncoder->setComputePipelineState(m_hzbDownsamplePipeline); downEncoder->setTexture(src, 0); downEncoder->setTexture(dst, 1); const uint32_t mipWidth = dst->width(); const uint32_t mipHeight = dst->height(); MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads, (mipHeight + threads - 1) / threads * threads, 1); downEncoder->dispatchThreads(downGrid, tgSize); downEncoder->endEncoding(); } if (m_instanceCullHzbPipeline) { dispatchInstanceCulling(m_instanceCullHzbPipeline, true); } } My Questions: Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR? visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency. Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process? I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).

444

Feb ’26

Open Shading Language (OSL) in Metal

Hi. I'm a 3D designer, using Blender for most of my work. The most recent Blender conference discussed utilizing the Open Shading Language (OSL) in their latest versions, which allows designers to write custom shaders for their workflows. At the moment, only Nvidia Optix GPU's can utilize this language for rendering (from what I understand), but Blender developers stated they are waiting on other GPU manufacturers to implement this feature as well. I'm not sure if there are any licensing issues here, but would this be something Apple could implement in Metal to make their hardware more attractive to the 3D design community? Any help or knowledge on this topic would be greatly appreciated.

Graphics & Games Metal Metal Metal Performance Shaders

272

Feb ’26

Unable to find intelgpu_kbl_gt2r0 slice or a compatible one in binary archive

Unable to find intelgpu_kbl_gt2r0 slice or a compatible one in binary archive 'file:///System/Library/PrivateFrameworks/IconRendering.framework/Resources/binary.metallib' available slices: applegpu_g13g, applegpu_g13s, applegpu_g13d, applegpu_g14g, applegpu_g14s, applegpu_g14d, applegpu_g15g, applegpu_g15s, applegpu_g15d, applegpu_g16g, applegpu_g16s, applegpu_g17g, applegpu_g15g, applegpu_g15s, applegpu_g15d, applegpu_g16s Is it related to performance of applications in macOS 26.2 on Intel Macs?

Graphics & Games Metal Metal

306

Feb ’26

Terminal Codes

Hello Apple Developers and users I am writing this message reguarding some help on some performance codes/settings I can use for my Macbook since I recently downloaded the MacOs Tahoe 26.2 and its been very glitchy and laggy with gaming and just using my mac normally I have tried using a FPS unlocker and downloading Metal 4 the FPS unlocker hasent worked at all I am still stuck on the normal 60 FPS and need some advice/help. Thank you. Kind regards Zachary

Graphics & Games Metal Developer Tools

177

Feb ’26

The description of set_indices in the MSL reference seems incorrect.

I'm currently learning Metal. While reading the reference, I came across a strange description. Page 78 in Version 4 Reference (2025-10-25) says: It is legal to call the following set_indices functions to set the indices if the position in the index buffer is valid and if the position in the index buffer is a multiple of 2 (uchar2 overload) or 2 (uchar4 overload). The index I needs to be in the range [0, max_indices). void set_indices(uint I, uchar2 v); void set_indices(uint I, uchar4 v); However, it seems that the uchar4 overload should be multiple of 4. Furthermore, there is no explanation of what these methods actually do. I believe it involves setting two to four consecutive indices at once, but there is no mention of that here. I would like to know if the above understanding is correct.

Graphics & Games Metal Metal

117

Feb ’26

Xcode Metal Capture crash when using MTLSamplerState

The sample code just draw a triangle and sample texture. both sample code can draw a correct triangle and sample texture as expected. there are no error message from terminal. Sample code using constexpr Sampler can capture and replay well. Sample code using a argumentTable to bind a MTLSamplerState was crashed when using Metal capture and replay on Xcode. Here are sample codes. Sample Code Test Environment: M1 Pro MacOS 26.3 (25D125) Xcode Version 26.2 (17C52) Feedback ID: FB22031701

Graphics & Games Metal Metal MetalKit Xcode Graphical Debugger

116

Feb ’26

Metal 4 (validation / debug layer): residency set requirement mismatch for memoryless attachments

Setup: MSAA rendering using a memoryless texture as the color attachment (render_image) and a "normal" texture as the resolve attachment (resolve_image). MTL_DEBUG_LAYER / API validation is enabled for this. When trying to add the memoryless texture to a residency set, I get the following error: -[MTLDebugResidencySet validateResource:], line 114: error 'residency sets do not support memoryless resources. Which is as expected and identical to Metal 3. However, if I don't add it to the residency set, I then get the following error when committing to the command queue: -[MTL4DebugCommandQueue commit:count:options:], line 67: error 'Commit With Options Validation Attachment texture (Label: render_image) used in command buffer (at index 0) is not added to any residency set on the command buffer or command queue. So which way around is actually correct in Metal 4? Either way, this makes the use of memoryless textures/attachments impossible right now when validation is enabled. FWIW: when disabling all validation, either way seems to work just fine. Tested on: M1 Max, macOS 26.3, Xcode 26.2 & 26.4b2

Graphics & Games Metal

Feb ’26

Using Metal compute for scientific simulation (lattice QCD gauge theory)

I've been using Metal compute shaders for lattice quantum chromodynamics simulations and wanted to share the experience in case others are doing scientific computing on Metal. The workload involves SU(2) matrix operations on 4D lattice grids — lots of 2x2 and 3x3 complex matrix multiplies, reductions over lattice sites, and nearest-neighbor stencil operations. The implementation bridges a C++ scientific framework (Grid) to Metal via Objective-C++ .mm files, with MSL kernels compiled into .metallib archives during the build. Things that work well: Shared memory on M-series eliminates the CPU↔GPU copy overhead that dominates in CUDA workflows The .metallib compilation integrates cleanly with autotools builds using xcrun Float4 packing for SU(2) matrices maps naturally to MSL vector types Things I'm still figuring out: Optimal threadgroup sizes for stencil operations on 4D grids Whether to use MTLHeap for gauge field storage or stick with individual buffers Best practices for double precision — some measurements need float64 but Metal's double support varies by hardware The application is measuring chromofield flux distributions between static quarks, ultimately targeting multi-quark systems. Production runs are on MacBook Pro M-series and Mac Studio. Code: https://github.com/ThinkOffApp/multiquark-lattice-qcd

Graphics & Games Metal Metal Metal Shader Converter

122

Feb ’26

Metal Shader inside Swift Package not found?

Hello everyone! I am trying to wrap a ViewModifier inside a Swift Package that bundles a metal shader file to be used in the modifier. Everything works as expected in the Preview, in the Simulator and on a real device for iOS. It also works in Preview and in the Simulator for tvOS but not on a real AppleTV. I have tried this on a 4th generation Apple TV running tvOS 26.3 using Xcode 26.2.0. Xcode logs the following: The metallib is processed and exists in the bundle. Compiler failed to build request precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn Reason: visible function not loaded Compiler failed to build request precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn Reason: visible function not loaded Compiler failed to build request precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn Reason: visible function not loaded Compiler failed to build request precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn Reason: visible function not loaded Compiler failed to build request precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn Reason: visible function not loaded Compiler failed to build request precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn Reason: visible function not loaded Contents of Package.swift: import PackageDescription let package = Package( name: "Test", platforms: [ .iOS(.v17), .tvOS(.v17) ], products: [ .library( name: "Test", targets: [ "Test" ] ) ], targets: [ .target( name: "Test", resources: [ .process("Shaders") ] ), .testTarget( name: "TestTests", dependencies: [ "Test" ] ) ] ) Content of my metal file: #include <metal_stdlib> using namespace metal; [[ stitchable ]] float2 complexWave(float2 position, float time, float2 size, float speed, float strength, float frequency) { float2 normalizedPosition = position / size; float moveAmount = time * speed; position.x += sin((normalizedPosition.x + moveAmount) * frequency) * strength; position.y += cos((normalizedPosition.y + moveAmount) * frequency) * strength; return position; } And my ViewModifier: import MetalKit import SwiftUI extension ShaderFunction { static let complexWave: ShaderFunction = { ShaderFunction( library: .bundle(.module), name: "complexWave" ) }() } extension Shader { static func complexWave(arguments: [Shader.Argument]) -> Shader { Shader(function: .complexWave, arguments: arguments) } } struct WaveModifier: ViewModifier { let start: Date = .now func body(content: Content) -> some View { TimelineView(.animation) { context in let delta = context.date.timeIntervalSince(start) content .visualEffect { view, proxy in view.distortionEffect( .complexWave( arguments: [ .float(delta), .float2(proxy.size), .float(0.5), .float(8), .float(10) ] ), maxSampleOffset: .zero ) } } .onAppear { let paths = Bundle.module.paths(forResourcesOfType: "metallib", inDirectory: nil) print(paths) } } } extension View { public func wave() -> some View { modifier(WaveModifier()) } } #Preview { Image(systemName: "cart") .wave() } Any help is appreciated.

Graphics & Games Metal MetalKit SwiftUI Swift Packages

348

Mar ’26

Question on setVertexBytes

I think if your buffer is less than 4k its recommended to use setVertexBytes, the question I have is can I keep hammering on setVertexBytes as the primary method to issue multiple draw calls within a render buffer and rely on Metal to figure out how to orphan and replace the target buffer? A lot of the primitives I am drawing are less than 4k and the process of wiring down larger segments of memory for individual buffers for each draw primitive call seems to be a negative. And it's just simpler to copy, submit and forget about buffer synchronization.

Graphics & Games Metal

465

Can a compute pipeline be as efficient as a render pipeline for rasterization?

I'm new to graphics and game design and I just wanted to know if a compute pipeline could be as efficient as a render pipeline for rasterization and an explanation on how and why. Also is it possible to manually perform rasterization with a render pipeline as in manipulate individual pixel data in a metal texture yourself but do it with a render pipeline?

Graphics & Games Metal Graphics and Games Metal 2D Graphics

176

Xcode26 Replay frame broken

Got a broken frame when using Xcode to capture a frame and replay it from a Unity game. It seems like the vertex buffer is broken; I see a bunch of "nan"s in the vertex buffer. However, the game displays correct when running, and it only happend when I upgrade my Xcode and iphone to Xcode26 and IOS26 ios26

Graphics & Games Metal Metal Xcode

168

Best Way to Use MetalFX in Unreal Engine 5.7 for macOS Port?

Hi everyone, We’re currently porting a high-fidelity AA+ PC title built on Unreal Engine 5.7 to macOS (Apple Silicon), and we’re looking for guidance from anyone with experience in this area. At the moment, the game is already runnable on Mac, but not yet at a playable level — we’re seeing performance around 10–15 FPS on an M4 device. We’re actively analyzing and defining the work needed to reach production-quality performance on macOS. One of the key areas we’re exploring is leveraging MetalFX to improve frame rate. However, it seems there’s no official MetalFX plugin or direct integration available for Unreal Engine. Has anyone here successfully integrated MetalFX into a UE5 rendering pipeline, or found a recommended approach to do so? Any insights on best practices, workflows, or references (docs, samples, etc.) would be greatly appreciated. Thanks in advance!

Graphics & Games Metal

265

Unable to profile Metal app on M2 Ultra (profiling works on M3 Pro)

Graphics & Games Metal