Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

Compute kernel fails to compile when calling texture.read()
If I compile a compute kernel with a call to texture.read(), it fails with the following error: "Error Domain=AGXMetalG13X Code=3 "Encountered unlowered function call to air.get_read_sampler" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.get_read_sampler}." This error occurs on both macOS and iOS 26 Beta 5, but not when running on a simulator or in a playground. It does not occur on a macOS Sequoia VM. It occurs whether I use the old metal 3 or new metal 4 compilation method. A workaround would be to use a sampler, but according to the feature tables, all platforms support reading from textures of all formats. Below is a minimal example which produces the error: let device = MTLCreateSystemDefaultDevice()! let library = device.makeDefaultLibrary()! let computeFunction = library.makeFunction(name: "compute_test")! do { let pipeline = try device.makeComputePipelineState(function: computeFunction) debugPrint(pipeline) } catch { debugPrint("Metal 3 failed with error:\n\(error)") } #import <metal_stdlib> using namespace metal; kernel void compute_test(uint2 gid [[thread_position_in_grid]], texture2d<float, access::read> in [[texture(0)]], texture2d<float, access::write> out [[texture(1)]]) { out.write(in.read(gid), gid); } I filed feedback FB19530049.
1
0
224
Aug ’25
Metal HUD Display Value Range
Can't seem to get the Metal HUD to display value range's (pre 26 Tahoe). The documented environment variable MTL_HUD_SHOW_VALUE_RANGE doesn't seem to work. https://developer.apple.com/documentation/xcode/monitoring-your-metal-apps-graphics-performance#Display-the-value-range-of-metrics Anyone having any luck?
2
0
348
Sep ’25
Metal fails to create PSO on AMD based GPUs
Hello, Shaders in our application is written using HLSL and we rely on Metal Shader Converter to convert DXIL to Metal IR. We ran into an issue that causes metal pipeline state creation to fail when vertex stage-in function is used on AMD GPUs. Here's the error reported by Metal in Xcode output: Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED XPC_ERROR_CONNECTION_INTERRUPTED MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 4 try. This error suggests an unexpected interruption in the connection. Possible reasons: a crash in the compiler service, termination by the OS due to resource constraints (e.g., jetsam), a timeout in the service, or an issue with IPC. Verify system stability and check the logs for more details. Compiler failed with XPC_ERROR_CONNECTION_INVALID XPC_ERROR_CONNECTION_INVALID MTLCompiler: Compiler encountered XPC_ERROR_CONNECTION_INVALID: failed to check-in, peer may have been unloaded: mach_error=10000003 (is the OS shutting down or process jetsammed?) Compilation failed due to an interrupted connection: XPC_ERROR_CONNECTION_INTERRUPTED. This error occurred after multiple retries. which seems to indicate a internal compiler error. I have a minimal repro here: https://github.com/kcloudy0717/metal_pso_fail/tree/main, simply follow the instructions in README.
1
0
204
Sep ’25
Pink screen on MTLCommandBuffer.presentDrawable.
I rewrote my graphics pipeline to use Load/Store better for clearing and don't care cases. All my tests pass, and in the Metal debugger, all the draw calls succeed. But when I present drawables (before [commandBuffer commit]) I only get a pink screen. I've tried everything I can think of: making sure the pixel formats are the same for the back buffer as my render targets, etc. But it's still pink. Could you point me in the right direction so I can fix this, or help describe why it's pink. That would be really helpful. Thank you, Brian Hapgood
2
0
413
Sep ’25
10-bit support in iPad Pro
Hi, I’m using the latest iPad Pro (13-inch) and I can see that Metal offers an rgb10a2unorm texture for rendering, but when I render a grey ramp and measure the actual luminance, I get a pattern that I would expect from an 8-bit texture (see below). Before I start ripping apart all my code, is there anything else I need to do to convince iOS to render my texture in 10-bit? I already tried setting the PixelFormat in my CMetalLayer to rgb10a2unorm, but that didn’t change anything.
1
0
456
Sep ’25
Can you delete a MTLLibrary once shaders are placed into pipeline?
Hello, I am quite new to using the metal API and was wondering if it was common (or even possible) if you knew that, when a pipeline was created, you never needed to make another one with the same shaders again, if it is safe to release the library the was used to reference the shaders? Only asking because this is possible in other apis, but apple never mentions (as far as I have found) if this is safe or not safe to do.
1
0
398
Oct ’25
NSScreen's maximumExtendedDynamicRangeColorComponentValue does not seem to provide the proper value after sleep/wake on third party HDR displays even when there is EDR content on screen in macOS Tahoe
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR. This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible. Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
1
0
291
Dec ’25
Metal 4: Proper usage of requestResidency() with unique per-frame textures at 120fps
Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it? I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame: computeResidencySet.removeAllAllocations() сomputeResidencySet.addAllocation(TextureA) computeResidencySet.addAllocation(TextureB) computeResidencySet.addAllocation(TextureC) computeResidencySet.commit() computeResidencySet.requestResidency() // Should we call it every frame? Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame)."
1
0
618
Jan ’26
Metal and Swift Concurrency
Hi, Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool. GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks. My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance? To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work? Thanks!
5
0
1.1k
Apr ’25
How to use MTKTextureLoader to load png data
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area. Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong. UIImage *tempImg = [UIImage imageWithData:data]; CGImageRef cgRef = tempImg.CGImage; MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device]; id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; NSData *tempData = UIImagePNGRepresentation(tempImg); id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; }] resume];
5
0
691
May ’25
Threadgroup memory for fragment shader
Hello I am trying to get thread group memory access in fragment shader. In essence, I would like to have all the fragments in a tile to bitwiseOR some value. My idea was to use simd_or across the SIMD group, then make each SIMD group thread 0 to atomic or the value into thread group memory. Finally very first thread of the tile would be tasked with writing the value down to texture with write access. Now, I can allocate the thread group memory argument to the fragment function all right. MTLRenderEncoder has setThreadgroupMemoryLength call, which I am using the following way [renderEncoder setThreagroupMemoryLength: 16 offset: 0 atIndex:0] Unfortunately, all I am getting is the following error (runtime assertion) -[MTLDebugRenderCommandEncoder setThreadgroupMemoryLength:offset:atIndex:]:3487: failed assertion Set Threadgroup Memory Length Validation offset + length(16) must be <= threadgroupMemoryLength(0).` What I am doing wrong? How I can get thread group memory in the fragment shader? I know I could use tile shading and compute function but the problem is that here I really like to use fragment stuff. Will be grateful for help.
1
0
137
Apr ’25
iOS Metal system delayed one Vsync period to really display the frame on the screen
View Layout Add the following views in a view controller: Label View A, with a subview of the same size: MTKView A View B, with a subview of the same size: MTKView B Refresh Rates of Each View The label view refreshes at 60fps (driven by CADisplayLink). MTKView A and B refresh at 15fps. MTKView Implementation Details The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering. The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame. self.metalView.enableSetNeedsDisplay = NO; self.metalView.paused = YES; A new high-priority queue is created for drawing, instead of handling it on the main queue. MTKView Latency Tracking The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView. Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism. Observation from Instruments From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details. Questions According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer? The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens? Explanation of the Reasoning Behind Some MTKView Code Details Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
3
0
603
Dec ’25
Diagnose data access latency
The code is pretty simple kernel void naive( constant RunParams *param [[ buffer(0) ]], const device float *A [[ buffer(1) ]], // [N, K] device float *output [[ buffer(2) ]], uint2 gid [[ thread_position_in_grid ]]) { uint a_ptr = gid.x * param->K; for (uint i = 0; i < param->K; i++, a_ptr++) { val += A[b_ptr]; } output[ptr] = val; } when uint a_ptr = gid.x * param->K, the code got 150 GFLops when uint a_ptr = gid.y * param->K, the code got 860 GFLops param->K = 256; thread per group: [16, 16] I'd like to understand why the performance is so different, and how can I profile/diagnose this to help with further optimization.
0
0
95
Apr ’25
Physics bug in WWE 2K25 with GPTK2.1
The game physics work as expected using GTPK 2.0 using Crossover 24 or Whisky. However, using GPTK 2.1 with Crossover 25, the player and camera physics misbehave. See https://www.reddit.com/r/WWEGames/comments/1jx9mph/the_siamese_elbow/ and https://www.reddit.com/r/WWEGames/comments/1jx9ow4/camera_glitch/ Full video also linked in the Reddit post. I have also submitted this bug via the feedback assistant.
2
0
231
Apr ’25
Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions
Hi, seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example.. useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU.. also useful for MoltenVK to support that extensions.. thanks..
1
0
174
Apr ’25
Metal and Swift PM
I have run into an issue where I am trying to use atomic_float in a swift package but I cannot get things to compile because it appears that the Swift Package Manager doesn't support Metal 3 (atomic_float is Metal 3 functionality). Is there any way around this? I am using // swift-tools-version: 6.1 and my Metal code includes: #include <metal_stdlib> #include <metal_geometric> #include <metal_math> #include <metal_atomic> using namespace metal; kernel void test(device atomic_float* imageBuffer [[buffer(1)]], uint id [[ thread_position_in_grid ]]) { } But I get an error on the definition of atomic_float . Any help, one more importantly, where I could have found this information about this limitation, would be helpful. -RadBobby
0
0
113
Apr ’25
CMake unable to generate the Xcode file described in this tutorial
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command: cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ I keep getting an error: CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
1
0
185
May ’25
Query GPU metrics
Hello! I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow) there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what would be the recommendation from Apple's engineering team to get this data via a public API. Requirements: Access GPU utilization, temperature, memory usage, power usage C/C++ based API for querying the metrics so I can expose the data to JavaScript via Node Addon No need to compatibile with Intel-based Macs, as Apple silicon will be fine for now Plugin GitHub Thank you! Noah
0
0
143
May ’25
vsync, drawable present, instrument gui
hi When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
0
0
169
May ’25
Customize the Metal Performance HUD on Apple TV
Hi there, Is it possible to customize the Metal Performance HUD on Apple TV, similar to how it can be done on iPhone & iPad? Would like to see things like Compiled Shaders for my Apps on tvOS .
Replies
3
Boosts
0
Views
373
Activity
Aug ’25
Compute kernel fails to compile when calling texture.read()
If I compile a compute kernel with a call to texture.read(), it fails with the following error: "Error Domain=AGXMetalG13X Code=3 "Encountered unlowered function call to air.get_read_sampler" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.get_read_sampler}." This error occurs on both macOS and iOS 26 Beta 5, but not when running on a simulator or in a playground. It does not occur on a macOS Sequoia VM. It occurs whether I use the old metal 3 or new metal 4 compilation method. A workaround would be to use a sampler, but according to the feature tables, all platforms support reading from textures of all formats. Below is a minimal example which produces the error: let device = MTLCreateSystemDefaultDevice()! let library = device.makeDefaultLibrary()! let computeFunction = library.makeFunction(name: "compute_test")! do { let pipeline = try device.makeComputePipelineState(function: computeFunction) debugPrint(pipeline) } catch { debugPrint("Metal 3 failed with error:\n\(error)") } #import <metal_stdlib> using namespace metal; kernel void compute_test(uint2 gid [[thread_position_in_grid]], texture2d<float, access::read> in [[texture(0)]], texture2d<float, access::write> out [[texture(1)]]) { out.write(in.read(gid), gid); } I filed feedback FB19530049.
Replies
1
Boosts
0
Views
224
Activity
Aug ’25
Metal HUD Display Value Range
Can't seem to get the Metal HUD to display value range's (pre 26 Tahoe). The documented environment variable MTL_HUD_SHOW_VALUE_RANGE doesn't seem to work. https://developer.apple.com/documentation/xcode/monitoring-your-metal-apps-graphics-performance#Display-the-value-range-of-metrics Anyone having any luck?
Replies
2
Boosts
0
Views
348
Activity
Sep ’25
Metal fails to create PSO on AMD based GPUs
Hello, Shaders in our application is written using HLSL and we rely on Metal Shader Converter to convert DXIL to Metal IR. We ran into an issue that causes metal pipeline state creation to fail when vertex stage-in function is used on AMD GPUs. Here's the error reported by Metal in Xcode output: Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED XPC_ERROR_CONNECTION_INTERRUPTED MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 4 try. This error suggests an unexpected interruption in the connection. Possible reasons: a crash in the compiler service, termination by the OS due to resource constraints (e.g., jetsam), a timeout in the service, or an issue with IPC. Verify system stability and check the logs for more details. Compiler failed with XPC_ERROR_CONNECTION_INVALID XPC_ERROR_CONNECTION_INVALID MTLCompiler: Compiler encountered XPC_ERROR_CONNECTION_INVALID: failed to check-in, peer may have been unloaded: mach_error=10000003 (is the OS shutting down or process jetsammed?) Compilation failed due to an interrupted connection: XPC_ERROR_CONNECTION_INTERRUPTED. This error occurred after multiple retries. which seems to indicate a internal compiler error. I have a minimal repro here: https://github.com/kcloudy0717/metal_pso_fail/tree/main, simply follow the instructions in README.
Replies
1
Boosts
0
Views
204
Activity
Sep ’25
Pink screen on MTLCommandBuffer.presentDrawable.
I rewrote my graphics pipeline to use Load/Store better for clearing and don't care cases. All my tests pass, and in the Metal debugger, all the draw calls succeed. But when I present drawables (before [commandBuffer commit]) I only get a pink screen. I've tried everything I can think of: making sure the pixel formats are the same for the back buffer as my render targets, etc. But it's still pink. Could you point me in the right direction so I can fix this, or help describe why it's pink. That would be really helpful. Thank you, Brian Hapgood
Replies
2
Boosts
0
Views
413
Activity
Sep ’25
10-bit support in iPad Pro
Hi, I’m using the latest iPad Pro (13-inch) and I can see that Metal offers an rgb10a2unorm texture for rendering, but when I render a grey ramp and measure the actual luminance, I get a pattern that I would expect from an 8-bit texture (see below). Before I start ripping apart all my code, is there anything else I need to do to convince iOS to render my texture in 10-bit? I already tried setting the PixelFormat in my CMetalLayer to rgb10a2unorm, but that didn’t change anything.
Replies
1
Boosts
0
Views
456
Activity
Sep ’25
Can you delete a MTLLibrary once shaders are placed into pipeline?
Hello, I am quite new to using the metal API and was wondering if it was common (or even possible) if you knew that, when a pipeline was created, you never needed to make another one with the same shaders again, if it is safe to release the library the was used to reference the shaders? Only asking because this is possible in other apis, but apple never mentions (as far as I have found) if this is safe or not safe to do.
Replies
1
Boosts
0
Views
398
Activity
Oct ’25
NSScreen's maximumExtendedDynamicRangeColorComponentValue does not seem to provide the proper value after sleep/wake on third party HDR displays even when there is EDR content on screen in macOS Tahoe
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR. This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible. Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
Replies
1
Boosts
0
Views
291
Activity
Dec ’25
Metal 4: Proper usage of requestResidency() with unique per-frame textures at 120fps
Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it? I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame: computeResidencySet.removeAllAllocations() сomputeResidencySet.addAllocation(TextureA) computeResidencySet.addAllocation(TextureB) computeResidencySet.addAllocation(TextureC) computeResidencySet.commit() computeResidencySet.requestResidency() // Should we call it every frame? Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame)."
Replies
1
Boosts
0
Views
618
Activity
Jan ’26
Metal and Swift Concurrency
Hi, Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool. GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks. My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance? To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work? Thanks!
Replies
5
Boosts
0
Views
1.1k
Activity
Apr ’25
How to use MTKTextureLoader to load png data
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area. Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong. UIImage *tempImg = [UIImage imageWithData:data]; CGImageRef cgRef = tempImg.CGImage; MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device]; id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; NSData *tempData = UIImagePNGRepresentation(tempImg); id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; }] resume];
Replies
5
Boosts
0
Views
691
Activity
May ’25
Threadgroup memory for fragment shader
Hello I am trying to get thread group memory access in fragment shader. In essence, I would like to have all the fragments in a tile to bitwiseOR some value. My idea was to use simd_or across the SIMD group, then make each SIMD group thread 0 to atomic or the value into thread group memory. Finally very first thread of the tile would be tasked with writing the value down to texture with write access. Now, I can allocate the thread group memory argument to the fragment function all right. MTLRenderEncoder has setThreadgroupMemoryLength call, which I am using the following way [renderEncoder setThreagroupMemoryLength: 16 offset: 0 atIndex:0] Unfortunately, all I am getting is the following error (runtime assertion) -[MTLDebugRenderCommandEncoder setThreadgroupMemoryLength:offset:atIndex:]:3487: failed assertion Set Threadgroup Memory Length Validation offset + length(16) must be <= threadgroupMemoryLength(0).` What I am doing wrong? How I can get thread group memory in the fragment shader? I know I could use tile shading and compute function but the problem is that here I really like to use fragment stuff. Will be grateful for help.
Replies
1
Boosts
0
Views
137
Activity
Apr ’25
iOS Metal system delayed one Vsync period to really display the frame on the screen
View Layout Add the following views in a view controller: Label View A, with a subview of the same size: MTKView A View B, with a subview of the same size: MTKView B Refresh Rates of Each View The label view refreshes at 60fps (driven by CADisplayLink). MTKView A and B refresh at 15fps. MTKView Implementation Details The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering. The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame. self.metalView.enableSetNeedsDisplay = NO; self.metalView.paused = YES; A new high-priority queue is created for drawing, instead of handling it on the main queue. MTKView Latency Tracking The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView. Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism. Observation from Instruments From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details. Questions According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer? The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens? Explanation of the Reasoning Behind Some MTKView Code Details Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
Replies
3
Boosts
0
Views
603
Activity
Dec ’25
Diagnose data access latency
The code is pretty simple kernel void naive( constant RunParams *param [[ buffer(0) ]], const device float *A [[ buffer(1) ]], // [N, K] device float *output [[ buffer(2) ]], uint2 gid [[ thread_position_in_grid ]]) { uint a_ptr = gid.x * param->K; for (uint i = 0; i < param->K; i++, a_ptr++) { val += A[b_ptr]; } output[ptr] = val; } when uint a_ptr = gid.x * param->K, the code got 150 GFLops when uint a_ptr = gid.y * param->K, the code got 860 GFLops param->K = 256; thread per group: [16, 16] I'd like to understand why the performance is so different, and how can I profile/diagnose this to help with further optimization.
Replies
0
Boosts
0
Views
95
Activity
Apr ’25
Physics bug in WWE 2K25 with GPTK2.1
The game physics work as expected using GTPK 2.0 using Crossover 24 or Whisky. However, using GPTK 2.1 with Crossover 25, the player and camera physics misbehave. See https://www.reddit.com/r/WWEGames/comments/1jx9mph/the_siamese_elbow/ and https://www.reddit.com/r/WWEGames/comments/1jx9ow4/camera_glitch/ Full video also linked in the Reddit post. I have also submitted this bug via the feedback assistant.
Replies
2
Boosts
0
Views
231
Activity
Apr ’25
Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions
Hi, seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example.. useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU.. also useful for MoltenVK to support that extensions.. thanks..
Replies
1
Boosts
0
Views
174
Activity
Apr ’25
Metal and Swift PM
I have run into an issue where I am trying to use atomic_float in a swift package but I cannot get things to compile because it appears that the Swift Package Manager doesn't support Metal 3 (atomic_float is Metal 3 functionality). Is there any way around this? I am using // swift-tools-version: 6.1 and my Metal code includes: #include <metal_stdlib> #include <metal_geometric> #include <metal_math> #include <metal_atomic> using namespace metal; kernel void test(device atomic_float* imageBuffer [[buffer(1)]], uint id [[ thread_position_in_grid ]]) { } But I get an error on the definition of atomic_float . Any help, one more importantly, where I could have found this information about this limitation, would be helpful. -RadBobby
Replies
0
Boosts
0
Views
113
Activity
Apr ’25
CMake unable to generate the Xcode file described in this tutorial
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command: cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ I keep getting an error: CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
Replies
1
Boosts
0
Views
185
Activity
May ’25
Query GPU metrics
Hello! I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow) there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what would be the recommendation from Apple's engineering team to get this data via a public API. Requirements: Access GPU utilization, temperature, memory usage, power usage C/C++ based API for querying the metrics so I can expose the data to JavaScript via Node Addon No need to compatibile with Intel-based Macs, as Apple silicon will be fine for now Plugin GitHub Thank you! Noah
Replies
0
Boosts
0
Views
143
Activity
May ’25
vsync, drawable present, instrument gui
hi When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
Replies
0
Boosts
0
Views
169
Activity
May ’25