CoreML regression between macOS 26.0.1 and macOS 26.1 Beta causing scrambled tensor outputs

We’ve encountered what appears to be a CoreML regression between macOS 26.0.1 and macOS 26.1 Beta.

In macOS 26.0.1, CoreML models run and produce correct results. However, in macOS 26.1 Beta, the same models produce scrambled or corrupted outputs, suggesting that tensor memory is being read or written incorrectly. The behavior is consistent with a low-level stride or pointer arithmetic issue — for example, using 16-bit strides on 32-bit data or other mismatches in tensor layout handling.
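To illustrate why a stride mismatch produces exactly this kind of scrambling (a hypothetical sketch, not Apple's actual code): reading a Float32 buffer while advancing the pointer by 2 bytes instead of 4 pulls overlapping half-words out of adjacent elements, so the values come back as garbage rather than a clean shift or scale.

```swift
import Foundation

// Hypothetical illustration of a stride bug: a buffer of Float32 values
// read back with a 2-byte (16-bit) stride instead of the correct 4 bytes.
let values: [Float32] = [1.0, 2.0, 3.0, 4.0]
values.withUnsafeBytes { raw in
    // Correct: advance 4 bytes per element.
    let correct = (0..<4).map { raw.loadUnaligned(fromByteOffset: $0 * 4, as: Float32.self) }
    // Wrong: advance only 2 bytes per element, as a 16-bit stride would.
    let scrambled = (0..<4).map { raw.loadUnaligned(fromByteOffset: $0 * 2, as: Float32.self) }
    print(correct)   // [1.0, 2.0, 3.0, 4.0]
    print(scrambled) // overlapping reads yield meaningless values
}
```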

Reproduction

  1. Install ON1 Photo RAW 2026 or ON1 Resize 2026 on macOS 26.0.1.

  2. Use the newest Highest Quality resize model, which is Stable Diffusion–based and runs through CoreML.

  3. Observe correct, high-quality results.

  4. Upgrade to macOS 26.1 Beta and run the same operation again.

  5. The output becomes visually scrambled or corrupted.

We are also seeing similar issues with another Stable Diffusion UNet model that previously worked correctly on macOS 26.0.1. This suggests the regression may affect multiple diffusion-style architectures, likely due to a change in CoreML’s tensor stride, layout computation, or memory alignment between these versions.

Notes

The affected models are exported using standard CoreML conversion pipelines.

No custom operators or third-party CoreML runtime layers are used.

The issue reproduces consistently across multiple machines.

It would be helpful to know if there were changes to CoreML’s tensor layout, precision handling, or MLCompute backend between macOS 26.0.1 and 26.1 Beta, or if this is a known regression in the current beta.

Seeing the same issue on a couple of audio processing models we own.

Output diverges between 26.0.1 and 26.1. In our tests, a couple of models work as expected on CPU but produce corrupt/degraded data on the NPU/GPU.
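For anyone trying to confirm the same divergence on their own models, a rough check is to run one input through a CPU-only configuration and an all-compute-units configuration and compare the outputs element-wise. A sketch (`MyModel` and the output feature name are placeholders for your own generated model class):

```swift
import CoreML

// Sketch: run the same input under .cpuOnly and .all, then report the
// maximum element-wise divergence between the two outputs.
func maxDivergence(input: MLFeatureProvider, outputName: String) throws -> Float {
    func predict(_ units: MLComputeUnits) throws -> MLMultiArray {
        let config = MLModelConfiguration()
        config.computeUnits = units
        let model = try MyModel(configuration: config).model
        return try model.prediction(from: input)
            .featureValue(for: outputName)!.multiArrayValue!
    }
    let cpu = try predict(.cpuOnly)
    let all = try predict(.all)
    var worst: Float = 0
    for i in 0..<cpu.count {
        worst = max(worst, abs(cpu[i].floatValue - all[i].floatValue))
    }
    return worst
}
```

On an unaffected OS the divergence should stay within normal float16/float32 tolerance; on the affected betas it blows up immediately for the broken compute units.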

Would also like to know if Apple is aware of this issue, as it is affecting not only our development but also our customer experience: our models are shared with clients and used in production.

We are also seeing something similar running audio stem separation models on the GPU and have filed issues in feedback assistant, e.g. FB20777953. CPU and NPU inference appear unaffected.
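Since in our case only the GPU path is affected, a sketch of keeping the ANE while excluding the GPU (assuming macOS 13 or later, where `MLComputeUnits.cpuAndNeuralEngine` is available):

```swift
import CoreML

let config = MLModelConfiguration()
// Skip the GPU path, which is the corrupted one for our stem-separation models.
config.computeUnits = .cpuAndNeuralEngine
```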

Have you tested on 26.2 to see if that fixes it?

We are still observing this issue in the macOS 26.2 beta releases. DxO has also reported encountering similar problems, as noted in their PhotoLab 9 release notes: https://download-center.dxo.com/Support/docs/PhotoLab_v9/release-notes/PL9_release-note_mac_EN.pdf . Fortunately, they can continue using the GPU path.

The only guidance we have received from Apple so far is to switch to the CPU path, which is not viable for most performance-critical workflows. We also tried the diagnostic tool Apple provided to help identify these issues (https://github.com/apple/coremltools/releases/tag/8.3), but it fails inconsistently, producing system-level semaphore errors that crash the Python interpreter.

We are still seeing this issue with macOS 26.2 beta 3.

Also running into issues (completely wrong results) when executing audio processing and ML Program–type CoreML models on macOS 26.3 Beta.

I've been working with CoreML extensively across macOS 26.x betas and can confirm this regression affects audio processing models as well, not just diffusion architectures. After investigating with Metal GPU capture, the pattern strongly suggests a stride alignment issue in the MLMultiArray backing store when the compute unit dispatches to GPU/ANE.

Here are the workarounds I've found while waiting for an official fix:

  1. Force CPU-only execution as a temporary fix:
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly
let model = try MyModel(configuration: config)

This avoids the corrupted GPU/ANE path entirely. Performance takes a hit, but results are correct.

  2. If you need GPU performance, pin to CPU+GPU and avoid the ANE:
config.computeUnits = .cpuAndGPU  // excludes Neural Engine

In my testing, the corruption is most severe on the ANE path. CPU+GPU gives roughly 70% of the full .all performance without the scrambled outputs.

  3. Runtime validation to degrade gracefully across OS versions:
func isOutputCorrupted(_ output: MLMultiArray) -> Bool {
    // Reinterpret the backing store as Float32 and scan a prefix for NaN/Inf,
    // which show up when strides are misapplied on the GPU/ANE path.
    let ptr = output.dataPointer.bindMemory(to: Float32.self, capacity: output.count)
    for i in 0..<min(output.count, 1000) {
        if ptr[i].isNaN || ptr[i].isInfinite { return true }
    }
    return false
}

This lets you detect corruption and automatically retry on CPU when it occurs, so your app doesn't ship broken results to users on newer OS versions.
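Putting that together, the detect-and-retry flow might look like this (a sketch; `MyModel` and the `"output"` feature name are placeholders, and it assumes the `isOutputCorrupted` check above):

```swift
import CoreML

// Sketch: attempt the fast path first, then fall back to CPU-only
// if the output trips the NaN/Inf corruption check.
func predict(_ input: MLFeatureProvider, units: MLComputeUnits) throws -> MLMultiArray {
    let config = MLModelConfiguration()
    config.computeUnits = units
    let model = try MyModel(configuration: config).model
    return try model.prediction(from: input)
        .featureValue(for: "output")!.multiArrayValue!
}

func predictWithFallback(input: MLFeatureProvider) throws -> MLMultiArray {
    let fast = try predict(input, units: .all)
    return isOutputCorrupted(fast) ? try predict(input, units: .cpuOnly) : fast
}
```

Note that the NaN/Inf scan is a cheap heuristic: stride corruption that produces finite-but-wrong values will slip through, so a stronger check would compare against a cached CPU reference for a known input at startup.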

The issue persists through macOS 26.2 and 26.3 betas in my testing. I'd encourage everyone affected to file duplicate Feedbacks — the more reports Apple gets referencing the stride/alignment hypothesis, the faster this gets prioritized.
