Hi,
I'm not sure whether this is the appropriate forum for this topic. I just followed a link from the JAX Metal plugin page https://developer.apple.com/metal/jax/
I'm writing a Python app with JAX, and recent JAX versions fail on Metal. E.g. v0.8.2
I have to downgrade JAX pretty hard to make it work:
pip install jax==0.4.35 jaxlib==0.4.35 jax-metal==0.1.1
Can we get an updated release of jax-metal that would fix this issue?
Here is the error I get with JAX v0.8.2:
WARNING:2025-12-26 09:55:28,117:jax._src.xla_bridge:881: Platform 'METAL' is experimental and not all JAX functionality may be correctly supported!
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1766771728.118004 207582 mps_client.cc:510] WARNING: JAX Apple GPU support is experimental and not all JAX functionality is correctly supported!
Metal device set to: Apple M3 Max
systemMemory: 36.00 GB
maxCacheSize: 13.50 GB
I0000 00:00:1766771728.129886 207582 service.cc:145] XLA service 0x600001fad300 initialized for platform METAL (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1766771728.129893 207582 service.cc:153] StreamExecutor device (0): Metal, <undefined>
I0000 00:00:1766771728.130856 207582 mps_client.cc:406] Using Simple allocator.
I0000 00:00:1766771728.130864 207582 mps_client.cc:384] XLA backend will use up to 28990554112 bytes on device 0 for SimpleAllocator.
Traceback (most recent call last):
File "<string>", line 1, in <module>
import jax; print(jax.numpy.arange(10))
~~~~~~~~~~~~~~~~^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 5951, in arange
return _arange(start, stop=stop, step=step, dtype=dtype,
out_sharding=sharding)
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/numpy/lax_numpy.py", line 6012, in _arange
return lax.broadcasted_iota(dtype, (size,), 0, out_sharding=out_sharding)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/lax/lax.py", line 3415, in broadcasted_iota
return iota_p.bind(dtype=dtype, shape=shape,
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
dimension=dimension, sharding=out_sharding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 633, in bind
return self._true_bind(*args, **params)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 649, in _true_bind
return self.bind_with_trace(prev_trace, args, params)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 661, in bind_with_trace
return trace.process_primitive(self, args, params)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/core.py", line 1210, in process_primitive
return primitive.impl(*args, **params)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/florin/git/FlorinAndrei/star-cluster-simulator/.venv/lib/python3.13/site-packages/jax/_src/dispatch.py", line 91, in apply_primitive
outs = fun(*args)
jax.errors.JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22
-:0:0: note: in bytecode version 6 produced by: StableHLO_v1.13.0
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
I0000 00:00:1766771728.149951 207582 mps_client.h:209] MetalClient destroyed.
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Posts under Metal tag
144 Posts
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hey I'm using the CIDepthBlurEffect Core Image Filter in my app. It seems to work ok but I get these errors in the console when calling the class.
CoreImage Metal library does not contain function for name: sparserendering_xhlrb_scan
CoreImage Metal library does not contain function for name: sparserendering_xhlrb_diffuse
CoreImage Metal library does not contain function for name: sparserendering_xhlrb_copy_back
CoreImage Metal library does not contain function for name: plain_or_sRGB_copy
Am I missing some sort of import to gain these Metal functions? I am using my own custom shaders but I assume you'd be able to use them along side the built in ones.
Hello All,
I've been finally able to finish my so called Quantumsort implementation, which is basically only slightly modified Radix Sort at the basis, but utilizes the somewhat newish Apple Metal SIMD commands heavily. Or rather rather renders the threadgroup shared memory void altogether with those SIMD sharing options available.
The sorting itself works "well enough" for my needs here on these Github shared main.swift and sort.metal files already, and the question relates more on it that - can someone spot would it be possible to get rid of the "sorted_temp" array memory too and use one "sorted" list only at all times..?
Here's the related repository and copilots I prefer you and you and you more for haha tho : https://guthib.com/roger-more-fi/quantumsort
Cheers, make it to the perfection those good last ones of '25 'aights Apple soldiers!
I've been playing around with iPad PRO M5 13" as part of my goal to implement some music relating SPH particle simulation effects on it - and this involves utilizing tap events also from the incredible looking fresh screen the device has.
See more information from here, all should be overreactively implemented but the ideas remain (with almost zero cost copy fragment shader) :
`https://youtu.be/ci-GSgQ0wlM`
This attached image shows the tap effects implementation brought just bit a little further than in the video.
Hello guys, recently I integrated a third-party library into my code to handle blur effects (Glur). This library leverages Metal's compute capabilities and appears to automatically depend on the Metal toolchain during the build process. When using Xcode Cloud, the archive step consistently fails with a "CompileMetalFile Failed" error. However, when I manually archive the project in Xcode locally, everything works fine without any issues. I’ve confirmed that the macOS and Xcode versions specified in my Xcode Cloud workflow are correct. Reply if you know how to fix it or it will be fixed future, thanks.
I'm using Xcode 26.2(17C52) and macOS(15.7.1 (24G231))
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size.
Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right?
Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)?
func onWindowResized(newSize: CGSize) {
// Called on the Main thread
metalLayer.drawableSize = newSize
}
func onVsync(drawable: MTLDrawable) {
// Called on a background rendering thread
renderer.renderInto(drawable: drawable)
}
Hi Apple Developer Forums,
I’m developing an iOS camera app that processes RAW captures using Core Image. I’m seeing a large “first use” performance penalty specifically when creating the CIImage from CIRAWFilter.outputImage.
What’s slow (important detail)
I’m measuring the time for:
let rawFilter = CIRAWFilter(imageData: rawData, identifierHint: hint)
let ciImage = rawFilter.outputImage
This is not CIContext.render(...) / createCGImage(...). It’s just the time to access outputImage (i.e., building the Core Image graph / RAW pipeline setup).
Observed behavior
First time accessing CIRAWFilter.outputImage: ~3 seconds
Second time (same app session, similar RAW): ~3 milliseconds
So something heavy is happening only on first use (decoder initialization, pipeline setup, shader/library compilation, caching, etc.).
Using Metal System Trace, I also noticed that during the slow first call there are many “Create MTLLibrary” events, while the second call doesn’t show this pattern.
Warm-up attempts using bundled DNG
I tried to “warm up” early (e.g., on camera screen entry) by loading a bundled DNG and then accessing CIRAWFilter.outputImage by taking a photo:
Warm-up with a ~247 KB DNG → first real RAW outputImage cost drops to ~1.42s
Warm-up with a ~25 MB DNG → first real RAW outputImage cost drops to ~843ms
This helps, but it’s still far from the steady-state ~3ms.
Warm-up by capturing a real RAW (works, but concerns)
The only method that fully eliminates the delay is to trigger a real RAW capture programmatically before the user’s first photo, then use that captured rawData to warm up the CIRAWFilter.outputImage path. This brings the first user-facing capture close to the steady-state timing.
However:
In some regions, the camera shutter sound cannot be suppressed, so “hidden warm-up capture” is unacceptable UX.
I’m also unsure whether triggering a real capture without an explicit user action could raise compliance/privacy concerns, even if the image is immediately discarded and never saved/uploaded.
Questions
Is the large first-time cost of CIRAWFilter.outputImage expected (RAW pipeline initialization / shader compilation)?
Is there an Apple-recommended way to pre-initialize the Core Image RAW pipeline / Metal resources so the first outputImage is fast, without taking a real photo?
Are there any best practices (e.g. CIContext creation timing, prepareRender(...), specific options) that reliably reduce this first-use overhead for CIRAWFilter?
Attachments
Figure 1: First RAW capture with no warm-up (~3s outputImage time)
Figure 2: First RAW capture after warm-up with bundled DNG (improved but still hundreds of ms)
Thanks for any guidance or experience sharing!
We set the CVDisplayLink on macOS to 0 or 120, and get the following. This then clamps maximum refresh to 60Hz on the 120Hz ProMotion display on a MBP M2 Max laptop. How is this not fixed in 4 macOS releases?
CoreVideo: currentVBLDelta returned 200000 for display 1 -- ignoring unreasonable value
CoreVideo: [0x7fe2fb816020] Bad CurrentVBLDelta for display 1 is zero. defaulting to 60Hz.
Hi everyone,
I'm currently working on a Swift Playgrounds project where I need to incorporate a Metal shader file. However, when I tried to include my shader file (PincushionShader.metal), I encountered the following error:
Is it possible to use Metal shader files within Swift Playgrounds, it is really important for my swift student challenge scene? If not, are there any workarounds or recommended approaches for testing Metal shaders in a similar environment?
Any guidance or suggestions would be greatly appreciated!
Topic:
Community
SubTopic:
Swift Student Challenge
Tags:
Swift Student Challenge
Metal
Swift Playground
Playground Support
Hello.
I migrated yesterday two machines from OS15 to OS26 (same Ram; M1Max chip). On the first one, a MBP, everything is allright, and my old Metal projects can compile and run, even with MacOS12 as a destination.
On the second computer (MacStudio) I got a compile "error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain".
I spent hours on many forums and tried all proposed solutions and still get the same error. Any idea? Thanks
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR.
This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible.
Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
I have a model that uses a video material as the surface shader and I need to also use a geometry modifier on the material.
This seemed like it would be promising (adapted from https://developer.apple.com/wwdc21/10075 ~5m 50s).
// Did the setup for the video and AVPlayer eventually leading me to
let videoMaterial = VideoMaterial(avPlayer: avPlayer)
// Assign the material to the entity
entity.model!.materials = [videoMaterial]
// The part shown in WWDC: Set up the library and geometry modifier before, so now try to map the new custom material to the video material
entity.model!.materials = entity.model!.materials.map { baseMaterial in
try! CustomMaterial(from: baseMaterial, geometryModifier: geometryModifier)
}
But, I get the following error
Thread 1: Fatal error: 'try!' expression unexpectedly raised an error: RealityFoundation.CustomMaterialError.defaultSurfaceShaderForMaterialNotFound
How can I apply a geometry modifier to a VideoMaterial? Or, if I can't do that, is there an easy way to route the AVPlayer video data into the baseColor of CustomMaterial?
Hello,
As far as I know and in all of my testing there is no way for a user or a developer to change the frame rate of the video output on iPadOS. If you connect an iPad via a USB Hub or a USB to HDMI Adaptor and then connect it to an external monitor it will output at 59.94fps.
I have a video app where a user monitors live video at 25fps and 30fps, they often output to an external display and there are times when the external display will stutter due to the mismatch in frame rate, ie. using 25fps and outputting at 59.94fps.
I thought it was impossible to change the video output frame rate, then in V3.1 of the Blackmagic Camera App I saw an interesting change in their release notes:
‘Support for HDMI Monitoring at Sensor Rate and Resolution’
This means there is some way to modify it, not sure if this is done via a Private API that Apple has allowed Blackmagic to use. If so, how can we access this or is there a way to enable this that is undocumented?
Thanks!
Hi all,
I'm encountering an issue with Metal raytracing on my M5 MacBook Pro regarding Instance Acceleration Structure (IAS).
Intersection tests suddenly stop working after a certain point in the sampling loop.
Situation
I implemented an offline GPU path tracer that runs the same kernel multiple times per pixel (sampleCount) using metal::raytracing.
Intersection tests are performed using an IAS.
Since this is an offline path tracer, geometries inside the IAS never changes across samples (no transforms or updates).
As sampleCount increases, there comes a point where the number of intersections drops to zero, and remains zero for all subsequent samples.
Here's a code sketch:
let sampleCount: UInt16 = 1024
for sampleIndex: UInt16 in 0..<sampleCount {
// ...
do {
let commandBuffer = commandQueue.makeCommandBuffer()
// Dispatch the intersection kernel.
await commandBuffer.completed()
}
do {
let commandBuffer = commandQueue.makeCommandBuffer()
// Use the intersection test results from the previous command buffer.
await commandBuffer.completed()
}
// ...
}
kernel void intersectAlongRay(
const metal::uint32_t threadIndex [[thread_position_in_grid]],
// ...
const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]],
// ...
)
{
// ...
const auto result = intersector.intersect(ray, accelerationStructure);
switch (result.type) {
case metal::raytracing::intersection_type::triangle: {
// Write intersection result to device buffers.
break;
}
default:
break;
}
Observations
Encoding both the intersection kernel and the subsequent result usage in the same command buffer does not resolve the problem.
Switching from IAS to Primitive Acceleration Structure (PAS) fixes the problem.
Rebuilding the IAS for each sample also resolves the issue.
Intersections produce inconsistent results even though the IAS and rays are identical — Image 1 shows a hit, while Image 2 shows a miss.
Questions
Am I misusing IAS in some way ?
Could this be a Metal bug ?
Any guidance or confirmation would be greatly appreciated.
I work on a Qt/QML app that uses Esri Maps SDK for Qt and that is deployed to both Windows and iPads. With a recent iPad OS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users.
I was able to reproduce this and grabbed the following error messages when the freeze happens:
IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)
Environment:
Qt 6.5.4 (Qt for iOS)
Esri Maps SDK for Qt 200.3
iPadOS 26.1
Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set hte target graphics api):
QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL)
Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple.
I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update?
Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now.
Appreciate any thoughts/insights....
View Layout
Add the following views in a view controller:
Label
View A, with a subview of the same size: MTKView A
View B, with a subview of the same size: MTKView B
Refresh Rates of Each View
The label view refreshes at 60fps (driven by CADisplayLink).
MTKView A and B refresh at 15fps.
MTKView Implementation Details
The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering.
The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame.
self.metalView.enableSetNeedsDisplay = NO;
self.metalView.paused = YES;
A new high-priority queue is created for drawing, instead of handling it on the main queue.
MTKView Latency Tracking
The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer.
The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView.
Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that.
I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism.
Observation from Instruments
From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details.
Questions
According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer?
The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens?
Explanation of the Reasoning Behind Some MTKView Code Details
Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering.
Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
I was encountering visual artifacts in my Unity game only on iOS devices and wanted to use the Metal Debugger to diagnose it. However, it seems to only capture static noise which is not helpful as shown below
For reference, this is a frame of the visual artifact and also the location where I asked Metal to capture the frame
Am I missing any settings in Unity/Xcode?
Hi, I am using xCode26.x. But my Metal4 classes are not compiling. I downloaded the sample code from Apple's website - https://developer.apple.com/documentation/Metal/processing-a-texture-in-a-compute-function. For example, I am getting errors like "Cannot find protocol declaration for 'MTL4CommandQueue';
I have hit a deadline. Any recommendations are very welcome.
I have downloaded the Metal Tool chain. When I run the following commands on the terminal - xcodebuild -showComponent metalToolchain ; xcrun -f metal ; xcrun metal --version
I get the following response -
Asset Path: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/86fbaf7b114a899754307896c0bfd52ffbf4fded.asset/AssetData
Build Version: 17A321
Status: installed
Toolchain Identifier: com.apple.dt.toolchain.Metal.32023
Toolchain Search Path: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded
/Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/bin/metal
Apple metal version 32023.830 (metalfe-32023.830.2)
Target: air64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/metal/current/bin
I am running some experiments with WebGPU using the wgpu crate in rust. I have some Buffers already allocated in the GPU.
Is it possible to use those already existing buffers directly as inputs to a predict call in CoreML? I want to prevent gpu to cpu download time as much as possible.
Or are there any other ways to do something like this. Is this only possible using the latest Tensor object which came out with Metal 4 ?
I am puzzled by the setAddress(_:attributeStride:index:) of MTL4ArgumentTable. Can anyone please explain what the attributeStride parameter is for? The doc says that it is "The stride between attributes in the buffer." but why?
Who uses this for what? On the C++ side in the shaders the stride is determined by the C++ type, as far as I know. What am I missing here?
Thanks!