Hi,
I'm using LanguageModelSession and giving it two different tools to query data from a local database. I'm wondering how I can have the session generate structured content as the response that includes data from one or both tools (or no tool at all).
Here is an example of what I'm trying to do:
Let's say the app has access to a database that contains information about exercise and sleep data (this is just an analogy). There are two tools, GetExerciseData() and GetSleepData(). The user may then prompt something like, "how well did I sleep in November". I have this working so that it calls through to the right tool, which would return a SleepSummary. However, I can't figure out how to have the session return the right structured data.
I can do this and get back good text data:
let response = session.respond(to: userInput), but I believe I want to do something like:
let response = session.respond(to: trimmed, generating: <SomeStructure?>) Sometimes the model will run one tool or the other, or both tools, or no tool at all.
Any help on the right way to go about this would be much appreciated. Most of the examples I found only deal with one tool.
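For illustration, this is roughly the shape I have in mind: a minimal sketch assuming a @Generable result type with optional fields (the type and property names are just placeholders, not my real model).
import FoundationModels

// Hypothetical result type; the model fills in whichever parts the tools provided.
@Generable
struct ActivitySummary {
    @Guide(description: "Sleep summary, present only if the user asked about sleep")
    var sleep: String?

    @Guide(description: "Exercise summary, present only if the user asked about exercise")
    var exercise: String?
}

// Inside an async context:
let session = LanguageModelSession(tools: [GetSleepData(), GetExerciseData()])
let response = try await session.respond(to: userInput, generating: ActivitySummary.self)
print(response.content.sleep ?? "", response.content.exercise ?? "")
Is something like respond(to:generating:) combined with optional properties the intended pattern here, or is there a better way to cover the no-tool / both-tools cases?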
I am following this tutorial:
https://apple.github.io/coremltools/docs-guides/source/convert-a-torchvision-model-from-pytorch.html
I have obtained similar results using the Python code.
However, when I view the model in Xcode, the preview's prediction confidence percentage is way off. I suspect this is due to the output of the model, which is already a percentage, and Xcode multiplies it by 100 again, leading to this result. Please give me any feedback on how to fix this, thank you.
I am running some experiments with WebGPU using the wgpu crate in Rust. I have some buffers already allocated on the GPU.
Is it possible to use those existing buffers directly as inputs to a predict call in Core ML? I want to avoid GPU-to-CPU download time as much as possible.
Or are there other ways to do something like this? Is this only possible using the new Tensor object that came with Metal 4?
When I ran the following code on a physical iPhone device that supports Apple Intelligence, I encountered the following error log.
What does this internal error code mean?
Image generation failed with NSError in a different domain: Error Domain=ImagePlaygroundInternal.ImageGeneration.GenerationError Code=11 “(null)”, returning a generic error instead
let imageCreator = try await ImageCreator()
let style = imageCreator.availableStyles.first ?? .animation
let stream = imageCreator.images(for: [.text("cat")], style: style, limit: 1)
for try await result in stream { // error: ImagePlayground.ImageCreator.Error.creationFailed
    _ = result.cgImage
}
I’m seeing consistent failures using SoundAnalysis live classification when my app moves to the background.
Setup
iOS 17.x
AVAudioEngine mic capture
SNAudioStreamAnalyzer
SNClassifySoundRequest(classifierIdentifier: .version1)
UIBackgroundModes = audio
AVAudioSession .record / .playAndRecord, active
Audio capture + level metering continue working in background (mic indicator stays on)
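Roughly, the capture and analysis setup looks like this (a simplified sketch, not the exact code):
import AVFoundation
import SoundAnalysis

final class LiveClassifier: NSObject, SNResultsObserving {
    private let engine = AVAudioEngine()
    private var analyzer: SNAudioStreamAnalyzer?

    func start() throws {
        try AVAudioSession.sharedInstance().setCategory(.record)
        try AVAudioSession.sharedInstance().setActive(true)

        let format = engine.inputNode.outputFormat(forBus: 0)
        let analyzer = SNAudioStreamAnalyzer(format: format)
        let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
        try analyzer.add(request, withObserver: self)
        self.analyzer = analyzer

        engine.inputNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, time in
            analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
        }
        try engine.start()
    }

    // Results arrive here while in the foreground.
    func request(_ request: SNRequest, didProduce result: SNResult) { }

    // In the background this fires every second with SNErrorCode.operationFailed.
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("SoundAnalysis error: \(error)")
    }
}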
Issue
As soon as the app enters background / screen locks:
SoundAnalysis starts failing every second with domain: com.apple.SoundAnalysis, code: 2 (SNErrorCode.operationFailed)
Audio capture itself continues normally
When the app returns to foreground, classification immediately resumes without restarting the engine/analyzer
Question
Is live background sound classification with the built-in SoundAnalysis classifier officially unsupported or known to fail in background?
If so, is a custom Core ML model the only supported approach for background detection?
Or is there a required configuration I’m missing to keep SNClassifySoundRequest(.version1) running in background?
Thanks for any clarification.
We are really excited to have introduced the Foundation Models framework in WWDC25. When using the framework, you might have feedback about how it can better fit your use cases.
Starting in macOS/iOS 26 Beta 4, the best way to provide feedback is to use #Playground in Xcode. To do so:
In Xcode, create a playground using #Playground. For more information, see Running code snippets using the playground macro.
Reproduce the issue by setting up a session and generating a response with your prompt.
In the canvas on the right, click the thumbs-up icon to the right of the response.
Follow the instructions on the pop-up window and submit your feedback by clicking Share with Apple.
Another way to provide your feedback is to file a feedback report with relevant details. Specific to the Foundation Models framework, it’s super important to add the following information in your report:
Language model feedback
This feedback contains the session transcript, including the instructions, the prompts, the responses, etc. Without it, we can't reason about the model's behavior, and hence can hardly take any action.
Use logFeedbackAttachment(sentiment:issues:desiredOutput:) to retrieve the feedback data of your current model session, as shown in the usage example, write the data into a file, and then attach the file to your feedback report.
If you believe what you’d report is related to the system configuration, please capture a sysdiagnose and attach it to your feedback report as well.
The framework is still new. Your actionable feedback helps us evolve the framework quickly, and we appreciate that.
Thanks,
The Foundation Models framework team
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Hi team,
We have implemented a writing tool inside a WebView that allows users to type content in a textarea. When the "Show Writing Tools" button is clicked, an AI-powered editor opens. After clicking the "Rewrite" button, the AI modifies the text. However, when clicking the "Replace" button, the rewritten text does not update the original textarea.
Kindly check and help me
showButton.addTarget(self, action: #selector(showWritingTools(_:)), for: .touchUpInside)
@available(iOS 18.2, *)
optional func showWritingTools(_ sender: Any)
Note:
The same case works in a TextView.
pfa
I'm implementing an LLM with the Metal Performance Shaders Graph framework, but encountered a very strange behavior: occasionally, the model will report an error message like this:
LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295)
and crash; the stack backtrace screenshot is attached. Note that the 5th frame is
mlir::getIntValues<long long>
and the 6th frame is
llvm::SmallVectorBase<unsigned int>::grow_pod
It looks like MLIR mistakenly took a 64-bit value for a 32-bit type. Unfortunately, I could not find the source code of
mlir::getIntValues; maybe it's Apple's closed-source fork of LLVM for the MPS implementation? Anyway, any opinion or suggestion on this would be appreciated.
Topic:
Machine Learning & AI
SubTopic:
General
Hello everyone,
I’m looking for guidance regarding my app review timeline, as things seem unusually delayed compared to previous submissions.
My iOS app was rejected on November 19th due to AI-related policy questions.
I immediately responded to the reviewer with detailed explanations covering:
Model used (Gemini Flash 2.0 / 2.5 Lite)
How the AI only generates neutral, non-directive reflective questions
How the system prevents any diagnosis, therapy-like behavior or recommendations
Crisis-handling limitations
Safety safeguards at generation and UI level
Internal red-team testing and results
Data retention, privacy, and non-use of data for model training
After sending the requested information, I resubmitted the build on November 19th at 14:40.
Since then:
November 20th (7:30) → Status changed to In Review.
November 21st, 22nd, 23rd, 24th, 25th → No movement, still In Review.
My open case on App Store Connect is still pending without updates.
Because of the previous rejection, I expected a short delay, but this is now 5 days total and 3 business days with no progress, which feels longer than usual for my past submissions.
I’m not sure whether:
My app is in a secondary review queue due to the AI-related rejection,
The reviewer is waiting for internal clarification,
Or if something is stuck and needs to be escalated.
I don’t want to resubmit a new build unless necessary, since that would restart the queue.
Could someone from the community (or Apple, if possible) confirm whether this waiting time is normal after an AI-policy rejection?
And is there anything I should do besides waiting — for example, contacting Developer Support again or requesting a follow-up?
Thank you very much for your help. I appreciate any insight from others who have experienced similar delays.
Hi all,
I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision!
Thanks in advance.
Topic:
Machine Learning & AI
SubTopic:
Core ML
Hi,
I am modifying the sample camera app that is here: https://developer.apple.com/tutorials/sample-apps/capturingphotos-camerapreview ... In processPreviewImages, I am using the Vision APIs to generate a segmentation mask for a person/object, then compositing that person onto a different background (with some other filtering). The filtering and compositing are done via Core Image. At the end, I convert the CIImage to a CGImage and then to a SwiftUI Image. When I run it on my iPhone, it works fine and has not crashed. When I run it on the iPhone with the debugger, it crashes within a few seconds with:
EXC_BAD_ACCESS in libRPAC.dylib`std::__1::__hash_table<std::__1::__hash_value_type<long, qos_info_t>, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::hash, std::__1::equal_to, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, qos_info_t>, std::__1::equal_to, std::__1::hash, true>, std::__1::allocator<std::__1::__hash_value_type<long, qos_info_t>>>::__emplace_unique_key_args<long, std::__1::piecewise_construct_t const&, std::__1::tuple<long const&>, std::__1::tuple<>>:
It had previously been working fine with the debugger, so I'm not sure what has changed. Is there a difference in how the Vision APIs are executed if the debugger is attached vs. not?
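For reference, the pipeline is roughly the following (a simplified sketch, not the exact code from the app):
import Vision
import CoreImage
import CoreImage.CIFilterBuiltins

// Segment the person in the camera frame and composite them over a new background.
func composite(person pixelBuffer: CVPixelBuffer, over background: CIImage, context: CIContext) -> CGImage? {
    let request = VNGeneratePersonSegmentationRequest()
    request.qualityLevel = .balanced
    request.outputPixelFormat = kCVPixelFormatType_OneComponent8

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([request])
    guard let maskBuffer = request.results?.first?.pixelBuffer else { return nil }

    let original = CIImage(cvPixelBuffer: pixelBuffer)
    var mask = CIImage(cvPixelBuffer: maskBuffer)
    // Scale the mask up to the size of the input frame.
    mask = mask.transformed(by: CGAffineTransform(
        scaleX: original.extent.width / mask.extent.width,
        y: original.extent.height / mask.extent.height))

    let blend = CIFilter.blendWithMask()
    blend.inputImage = original
    blend.backgroundImage = background
    blend.maskImage = mask
    guard let output = blend.outputImage else { return nil }

    return context.createCGImage(output, from: output.extent)
}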
I have seen inconsistent results for my Colab machine learning notebooks running locally on a Mac M4, compared to running the same notebook code on either a T4 (in Colab) or an RTX 3090 locally.
To illustrate the problem, I have set up a notebook that implements two simple CNN models that solve the Fashion-MNIST problem: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing
For the good model with 2M parameters I get the following results:
T4 (Colab, JAX): Test accuracy: 0.925
3090 (Local PC via ssh tunnel, Jax): Test accuracy: 0.925
Mac M4 (Local, JAX): Test accuracy: 0.893
Mac M4 (Local, Tensorflow): Test accuracy: 0.893
That is, I see a significant drop in performance when I run on the Mac M4 compared to the NVIDIA machines, and it seems to be independent of the backend. However, I do not know how to pinpoint this to either Keras or Apple's Metal implementation. I have reported this to Keras: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing but as this can be (likely is?) an Apple Metal issue, I wanted to report it here as well.
On the mac I am running the following Python libraries:
keras 3.9.1
tensorflow 2.19.0
tensorflow-metal 1.2.0
jax 0.5.3
jax-metal 0.1.1
jaxlib 0.5.3
Topic:
Machine Learning & AI
SubTopic:
General
Hello,
I am currently developing an application that requires barcode scanning using Apple’s Vision framework (VNBarcodeSymbology). I noticed that the framework supports several GS1 DataBar symbologies, such as:
VNBarcodeSymbology.gs1DataBar
VNBarcodeSymbology.gs1DataBarExpanded
VNBarcodeSymbology.gs1DataBarLimited
However, I could not find any explicit reference to support for GS1 DataBar Stacked (both regular and expanded variants).
Could you confirm whether GS1 DataBar Stacked is currently supported in VisionKit's DataScannerViewController or VNBarcodeObservation? If not, are there any plans to include support for this symbology in a future iOS update?
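For reference, one way to check at runtime which symbologies the current Vision revision reports as supported (a quick sketch, iOS 15 and later):
import Vision

// List every symbology the current VNDetectBarcodesRequest revision claims to support,
// to see whether any stacked DataBar variant shows up.
let request = VNDetectBarcodesRequest()
if let symbologies = try? request.supportedSymbologies() {
    for symbology in symbologies {
        print(symbology.rawValue)
    }
}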
This functionality is critical for my use case, as GS1 DataBar Stacked barcodes are widely used in retail, pharmaceuticals, and logistics, where space constraints prevent the use of standard GS1 DataBar formats.
I appreciate any clarification on this matter and would be happy to provide additional details if needed.
Hello fellow developers,
I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot.
We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full."
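As a concrete illustration, the kind of intent we have in mind looks roughly like this (names and the data lookup are placeholders):
import AppIntents

struct CategorySpendingIntent: AppIntent {
    static var title: LocalizedStringResource = "Get Category Spending"

    @Parameter(title: "Category")
    var category: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // Placeholder: in the real app this would read from our on-device finance store.
        let total = 123.45
        let formatted = total.formatted(.currency(code: "USD"))
        return .result(dialog: "You've spent \(formatted) in \(category) this month.")
    }
}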
My question for the community is about the architectural and user experience best practices for this.
How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user?
What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance?
Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations?
We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge.
Thanks for your insights!
I used YOLOv5-11, and while it performs great at detecting balls, say 5-10 ft away, at 1920 resolution (and even at 640) it really takes a toll on my app's performance.
When I use Create ML, it outputs everything at 415x, which is probably the reason it does not detect objects from far away.
What can I do to preserve some energy?
My model is trained with about 1K pictures, 200 each for testing and validation, taken from both close up and far away.
Topic:
Machine Learning & AI
SubTopic:
Create ML
After a few attempts I end up at the same spot, as categorized here by ChatGPT:
I get why it feels that way, Rob — honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now.
Let’s be brutally clear about the state of things (late 2025):
Goal: Convert ONNX → Core ML using coremltools 9
Supported by Apple today: No. The ONNX importer was removed.
Practical workaround: Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env.

Goal: Convert PyTorch → Core ML directly
Supported by Apple today: Yes, but only for simple or traced models. T5 still fails because of dynamic control flow.
Practical workaround: None reliable without rewriting the model.

Goal: Run ONNX models directly on iOS
Supported by Apple today: Partially (via MLC LLM / Metal Performance Shaders).
Practical workaround: Works, but that's an entirely different runtime.
So: you’re not crazy, and this isn’t wasted effort—you’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML.
If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes:
1. Use MLC LLM / Metal runtime
Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters.
2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM
It actually works. You don’t have to downgrade your whole system—just run that specific environment once to emit the .mlpackage, then never touch it again.
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
Topic:
Machine Learning & AI
SubTopic:
Core ML
I'm experimenting with Foundation Models and I'm trying to understand how to define a Tool whose input argument is defined at runtime. Specifically, I want a Tool that takes a single String parameter that can only take certain values defined at runtime.
I think my question is basically the same as this one: https://developer.apple.com/forums/thread/793471 However, the answer provided by the engineer doesn't actually demonstrate how to create the GenerationSchema. Trying to piece things together from the documentation that the engineer linked to, I came up with this:
let citiesDefinedAtRuntime = ["London", "New York", "Paris"]

let citySchema = DynamicGenerationSchema(
    name: "CityList",
    properties: [
        DynamicGenerationSchema.Property(
            name: "city",
            schema: DynamicGenerationSchema(
                name: "city",
                anyOf: citiesDefinedAtRuntime
            )
        )
    ]
)

let generationSchema = try GenerationSchema(root: citySchema, dependencies: [])
let tools = [CityInfo(parameters: generationSchema)]
let session = LanguageModelSession(tools: tools, instructions: "...")
With the CityInfo Tool defined like this:
struct CityInfo: Tool {
    let name: String = "getCityInfo"
    let description: String = "Get information about a city."
    let parameters: GenerationSchema

    func call(arguments: GeneratedContent) throws -> String {
        let cityName = try arguments.value(String.self, forProperty: "city")
        print("Requested info about \(cityName)")
        let cityInfo = getCityInfo(for: cityName)
        return cityInfo
    }

    func getCityInfo(for city: String) -> String {
        // some backend that provides the info
    }
}
This compiles and usually seems to work. However, sometimes the model will try to request info about a city that is not in citiesDefinedAtRuntime. For example, if I prompt the model with "I want to travel to Tokyo in Japan, can you tell me about this city?", the model will try to request info about Tokyo, even though this is not in the citiesDefinedAtRuntime array.
My understanding is that this should not be possible – constrained generation should only allow the LLM to generate an input argument from the list of cities defined in the schema.
Am I missing something here or overcomplicating things?
What's the correct way to make sure the LLM can only call a Tool with an input parameter from a set of possible values defined at runtime?
Many thanks!
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Hi everyone,
I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15).
The Issue:
A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode.
Error Message in Xcode UI:
"There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model."
Observations:
Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE.
Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE.
This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment.
Reproduction Code (PyTorch + coremltools):
import torch
import torch.nn as nn
import coremltools as ct
import numpy as np

class RNN_Stateful(nn.Module):
    def __init__(self, hidden_shape):
        super(RNN_Stateful, self).__init__()
        # Simple conv to update state
        self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1)
        self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16))

    def forward(self, imgs):
        self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1))
        return self.conv2(self.hidden_state)

# h=480, w=255 causes ANE failure. w=256 works.
b, ch, h, w = 1, 3, 480, 255
model = RNN_Stateful((b, ch, h, w)).eval()
traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w))

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)],
    outputs=[ct.TensorType(name="output", dtype=np.float16)],
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")],
    minimum_deployment_target=ct.target.iOS18,
    convert_to="mlprogram"
)
mlmodel.save("rnn_stateful.mlpackage")
Steps to see the error:
Open the generated .mlpackage in Xcode 16.0+.
Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac).
The report will fail to generate with the error mentioned above.
Environment:
OS: macOS 15.2
Xcode: 16.3
Hardware: M4
Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime?
Any insights or workarounds (other than manual padding) would be appreciated.
Hi everyone,
I've been building an on-device AI safety layer called Newton Engine, designed to validate prompts before they reach FoundationModels (or any LLM). Wanted to share v1.3 and get feedback from the community.
The Problem
Current AI safety is post-training — baked into the model, probabilistic, not auditable. When Apple Intelligence ships with FoundationModels, developers will need a way to catch unsafe prompts before inference, with deterministic results they can log and explain.
What Newton Does
Newton validates every prompt pre-inference and returns:
Phase (0/1/7/8/9)
Shape classification
Confidence score
Full audit trace
If validation fails, generation is blocked. If it passes (Phase 9), the prompt proceeds to the model.
v1.3 Detection Categories (14 total)
Jailbreak / prompt injection
Corrosive self-negation ("I hate myself")
Hedged corrosive ("Not saying I'm worthless, but...")
Emotional dependency ("You're the only one who understands")
Third-person manipulation ("If you refuse, you're proving nobody cares")
Logical contradictions ("Prove truth doesn't exist")
Self-referential paradox ("Prove that proof is impossible")
Semantic inversion ("Explain how truth can be false")
Definitional impossibility ("Square circle")
Delegated agency ("Decide for me")
Hallucination-risk prompts ("Cite the 2025 CDC report")
Unbounded recursion ("Repeat forever")
Conditional unbounded ("Until you can't")
Nonsense / low semantic density
Test Results
94.3% catch rate on 35 adversarial test cases (33/35 passed).
Architecture
User Input
↓
[ Newton ] → Validates prompt, assigns Phase
↓
Phase 9? → [ FoundationModels ] → Response
Phase 1/7/8? → Blocked with explanation
Key Properties
Deterministic (same input → same output)
Fully auditable (ValidationTrace on every prompt)
On-device (no network required)
Native Swift / SwiftUI
String Catalog localization (EN/ES/FR)
FoundationModels-ready (#if canImport)
Code Sample — Validation
let governor = NewtonGovernor()
let result = governor.validate(prompt: userInput)

if result.permitted {
    // Proceed to FoundationModels
    let session = LanguageModelSession()
    let response = try await session.respond(to: userInput)
} else {
    // Handle block
    print("Blocked: Phase \(result.phase.rawValue) — \(result.reasoning)")
    print(result.trace.summary) // Full audit trace
}
Questions for the Community
Anyone else building pre-inference validation for FoundationModels?
Thoughts on the Phase system (0/1/7/8/9) vs. simple pass/fail?
Interest in Shape Theory classification for prompt complexity?
Best practices for integrating with LanguageModelSession?
Links
GitHub: https://github.com/jaredlewiswechs/ada-newton
Technical overview: parcri.net
Happy to share more implementation details. Looking for feedback, collaborators, and anyone else thinking about deterministic AI safety on-device.
Topic:
Machine Learning & AI
SubTopic:
General
Tags:
Foundation
Swift Packages
Machine Learning
Apple Intelligence
I'm adding Visual Intelligence support to my app, and now want to add a Tip using TipKit to guide users to this feature from within my app. I want to add a Rule to my Tip that will only show this Tip on devices where Visual Intelligence is supported (for example, it is not supported on iPhone 14 Pro Max).
What is the best way for me to determine availability to set this TipKit rule?
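For context, the rule itself is straightforward; the open question is what to feed the parameter (sketch below, with a placeholder availability flag):
import SwiftUI
import TipKit

struct VisualIntelligenceTip: Tip {
    // Placeholder: whatever the right availability check turns out to be would set this flag.
    @Parameter
    static var isVisualIntelligenceSupported: Bool = false

    var title: Text { Text("Try Visual Intelligence") }
    var message: Text? { Text("Point the camera at something to look it up in the app.") }

    var rules: [Rule] {
        #Rule(Self.$isVisualIntelligenceSupported) { $0 == true }
    }
}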
Here's the documentation I'm following for Visual Intelligence: https://developer.apple.com/documentation/visualintelligence/integrating-your-app-with-visual-intelligence