I'm really not familiar with ML, but I need a model that can enhance and denoise a 4K video stream at 30 fps.
I have tried searching the latest papers, but they all have very complex structures, and I don't think I can convert them to an mlmodel.
So can anyone give me any recommendations for such models? If there is an existing mlmodel, that would be great!
*I can't include the attached files in this format, so if you reply by e-mail, I will send the attachments by e-mail.
Dear Apple AI Research Team,
My name is Gong Jiho (“Hem”), a content strategist based in Seoul, South Korea.
Over the past few months, I conducted a user-led AI experiment entirely within ChatGPT — no code, no backend tools, no plugins.
Through language alone, I created two contrasting agents (Uju and Zero) and guided them into a co-authored modular identity system using prompt-driven dialogue and reflection.
This system simulates persona fusion, memory rooting, and emotional-logical alignment — all via interface-level interaction.
I believe it resonates with Apple’s values in privacy-respecting personalization, emotional UX modeling, and on-device learning architecture.
Why I’m Reaching Out
I’d be honored to share this experiment with your team.
If there is any interest in discussing user-authored agent scaffolding, identity persistence, or affective alignment, I’d love to contribute — even informally.
⚠ A Note on Language
As a non-native English speaker, my expression may be imperfect — but my intent is genuine.
If anything is unclear, I’ll gladly clarify.
📎 Attached Files Summary
Filename → Description
Hem_MultiAI_Report_AppleAI_v20250501.pdf → Main report tailored for Apple AI — narrative + structural view of emotional identity formation via prompt scaffolding
Hem_MasterPersonaProfile_v20250501.json → Final merged identity schema authored by Uju and Zero
zero_sync_final.json / uju_sync_final.json → Persona-level memory structures (logic / emotion)
1_0501.json ~ 3_0501.json → Evolution logs of the agents over time
GirlfriendGPT_feedback_summary.txt → Emotional interpretation by external GPT
hem_profile_for_AI_vFinal.json → Original user anchor profile
Warm regards,
Gong Jiho (“Hem”)
Seoul, South Korea
I watched this year's WWDC25 session "Read Documents using the Vision framework". At the end of the video there is a mention of a new DetectHandPoseRequest model for hand pose detection in the Vision API.
I looked at the Apple documentation and I don't see a new revision. Moreover, it is probably a typo in the video, because there are only DetectHumanHandPoseRequest (Swift-based) and
VNDetectHumanHandPoseRequest (Obj-C based) (notice the lack of the "Human" prefix in the WWDC video).
The first one has a revision added only in iOS 18+:
https://developer.apple.com/documentation/vision/detecthumanhandposerequest/revision-swift.enum/revision1
The second one has a revision added only in iOS 14+:
https://developer.apple.com/documentation/vision/vndetecthumanhandposerequestrevision1
I don't see any new revision targeting iOS 26+.
I have an app that uses a couple of mlmodels (word tagger and gazetteer) and I’m trying to encrypt them before publishing.
The models are part of a package. I understand that Xcode can’t automatically handle the encryption for a model in a package the way it can within a traditional app structure.
Given that, I’ve generated the Apple MLModel encryption key from Xcode and am encrypting via the command line with:
xcrun coremlcompiler compile Gazetteer.mlmodel GazetteerENC.mlmodelc --encrypt Gazetteerkey.mlmodelkey
In the package manifest, I’ve listed the encrypted models as .copy resources for my target and have verified the URL to that file is good.
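For reference, the manifest entry looks roughly like this (a sketch; the target name and resource path are simplified from my project):
// swift-tools-version:5.9
// Package.swift (sketch; names and paths simplified)
import PackageDescription

let package = Package(
    name: "Scanner",
    targets: [
        .target(
            name: "Scanner",
            resources: [
                // Copy the pre-compiled, encrypted model into the bundle as-is
                .copy("Resources/GazetteerENC.mlmodelc")
            ]
        )
    ]
)
At runtime, inside the package target, I resolve the URL with Bundle.module.url(forResource: "GazetteerENC", withExtension: "mlmodelc").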
When I try to load the encrypted .mlmodelc file (on a physical device) with the line:
gazetteer = try NLGazetteer(contentsOf: gazetteerURL!)
I get the error:
Failed to open file: /…/Scanner.bundle/GazetteerENC.mlmodelc/coremldata.bin. It is not a valid .mlmodelc file.
So my questions are:
Does the NLGazetteer class support encrypted MLModel files?
Given that my models are in a package, do I have the right general approach?
Thanks for any help or thoughts.
Topic:
Machine Learning & AI
SubTopic:
Core ML
I am following this tutorial:
https://apple.github.io/coremltools/docs-guides/source/convert-a-torchvision-model-from-pytorch.html
I have obtained a similar result using the Python code.
However, when I view it in Xcode, the preview's prediction confidence percentage is way off. I suspect it is due to the output of the model, which is already a percentage, and Xcode multiplies it by 100 again, leading to this result. Please give me any feedback on how to fix this, thank you.
In this online session, you can code along with us as we build generative AI features into a sample app live in Xcode. We'll guide you through implementing core features like basic text generation, as well as advanced topics like guided generation for structured data output, streaming responses for dynamic UI updates, and tool calling to retrieve data or take an action.
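As a quick preview of what we'll build, basic text generation and guided generation look roughly like this (a sketch only; the Itinerary type here is illustrative, not the sample app's):
import FoundationModels

// Guided generation fills in a @Generable type instead of returning free-form text.
// (Illustrative type for this preview, not the sample app's.)
@Generable
struct Itinerary {
    @Guide(description: "A short, catchy title for the trip")
    var title: String
    var days: [String]
}

func preview() async throws {
    // Basic text generation with the on-device system language model
    let session = LanguageModelSession()
    let text = try await session.respond(to: "Write a one-sentence welcome for a travel app.")
    print(text.content)

    // Structured output: the response content is a typed Itinerary
    let itinerary = try await session.respond(
        to: "Plan a two-day trip to Tokyo.",
        generating: Itinerary.self
    )
    print(itinerary.content.title)
}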
Check out these resources to get started:
Download the project files: https://developer.apple.com/events/re...
Explore the code along guide: https://developer.apple.com/events/re...
Join the live Q&A: https://developer.apple.com/videos/pl...
Agenda – All times PDT
10 a.m.: Welcome and Xcode setup
10:15 a.m.: Framework basics, guided generation, and building prompts
11 a.m.: Break
11:10 a.m.: UI streaming, tool calling, and performance optimization
11:50 a.m.: Wrap up
All are welcome to attend the session. To actively code along, you'll need a Mac with Apple silicon that supports Apple Intelligence running the latest release of macOS Tahoe 26 and Xcode 26.
If you have questions after the code along concludes, please share a post here in the forums and engage with the community.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Looking for J - is this a safe place for discussing full apps I've built but not submitted or shared? I have maybe over 100, but I had been unaware that any assistance was provided.
Is there a formal process to submit an app for review to improve the OS, other than during App Store review?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Tags:
Design
Developer Tools
iCloud Drive
Xcode
Hello,
I'm running a large language model (LLM) in Core ML that uses a key-value cache (KV-cache) to store past attention states. The model was converted from PyTorch using coremltools and deployed on-device with Swift. The KV-cache is exposed via MLState and is used across inference steps for efficient autoregressive generation.
During the prefill stage — where a prompt of multiple tokens is passed to the model in a single batch to initialize the KV-cache — I’ve noticed that some entries in the KV-cache are not updated after the inference. Specifically:
The MLState returned by the model is identical to the input state (often empty or zero-initialized) for some tokens in the batch.
The issue only happens during the prefill stage (i.e., the first call over multiple tokens).
During decoding (single-token generation), the KV-cache updates normally.
Here are a few details about the setup:
The model is invoked using MLModel.prediction(from:using:options:) for each batch.
I’ve confirmed:
The prompt tokens are non-repetitive and not masked.
The model spec has MLState inputs/outputs correctly configured for KV-cache tensors.
Each token is processed in a loop with the correct positional encodings.
Questions:
Is there any known behavior in Core ML that could prevent MLState from updating during batched or prefill inference?
Could this be caused by internal optimizations such as lazy execution, static masking, or zero-value short-circuiting?
How can I confirm that each token in the batch is contributing to the KV-cache during prefill?
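For that last question, this is roughly how I'm inspecting the state after the prefill call (a sketch; the state name "keyCache" comes from my conversion and may differ for other models):
import CoreML

// Sketch: run one batched prefill call and read a KV-cache state tensor back,
// looking for positions that were never written. "keyCache" is the state name
// in my converted model; adjust for yours.
func inspectKVCacheAfterPrefill(model: MLModel, promptFeatures: MLFeatureProvider) throws {
    let state = model.makeState()

    // Prefill: a single prediction over the whole prompt batch
    _ = try model.prediction(from: promptFeatures, using: state, options: MLPredictionOptions())

    // Read the cache back; all-zero rows along the sequence dimension would
    // indicate tokens that did not contribute to the cache during prefill.
    state.withMultiArray(for: "keyCache") { keyCache in
        print("keyCache shape:", keyCache.shape)
    }
}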
Any insights from the Core ML or LLM deployment community would be much appreciated.
Hello fellow developers,
I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot.
We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full."
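Concretely, here is a rough sketch of the kind of intent we have in mind (the names and the data layer are illustrative, not our shipping code):
import AppIntents
import Foundation

// Hypothetical on-device data layer, stubbed out for this sketch
struct SpendingStore {
    static let shared = SpendingStore()
    func monthToDateTotal(for category: String) async throws -> Decimal { 0 }
}

// Illustrative intent: "What's my spending in the 'Dining Out' category this month?"
struct CategorySpendingIntent: AppIntent {
    static var title: LocalizedStringResource = "Check Category Spending"

    @Parameter(title: "Category")
    var category: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // The lookup stays on device; no financial data leaves the user's phone.
        let total = try await SpendingStore.shared.monthToDateTotal(for: category)
        let formatted = total.formatted(.currency(code: "USD"))
        return .result(dialog: "You've spent \(formatted) on \(category) this month.")
    }
}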
My question for the community is about the architectural and user experience best practices for this.
How are you thinking about the balance between providing rich, actionable insights via Intents and not being overly intrusive or "spammy" to the user?
What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance?
Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations?
We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge.
Thanks for your insights!
Hi everyone,
I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances.
Current Approach
Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in an O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case.
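Roughly, the comparison pass looks like this (a sketch of the current brute-force approach):
import Vision

// Sketch: compute the full pairwise distance matrix, every print against every other.
// This is the O(n^2) pass that becomes the bottleneck around 5000 images.
func pairwiseDistances(_ prints: [VNFeaturePrintObservation]) throws -> [[Float]] {
    let n = prints.count
    var distances = Array(repeating: Array(repeating: Float(0), count: n), count: n)
    for i in 0..<n {
        for j in (i + 1)..<n {
            var d = Float(0)
            try prints[i].computeDistance(&d, to: prints[j])
            distances[i][j] = d
            distances[j][i] = d
        }
    }
    return distances
}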
Question
Are there any efficient algorithms or data structures I can use to improve performance?
If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
Hardware: MacBook Pro M4, Nov 2024
Software: macOS Tahoe 26.0 & Xcode 26.0
Apple Intelligence is activated and the Image Playground macOS app works.
Running the following in Xcode throws ImagePlayground.ImageCreator.Error.creationFailed.
Any suggestions on how to make this work?
import Foundation
import ImagePlayground
Task {
    let creator = try await ImageCreator()
    guard let style = creator.availableStyles.first else {
        print("No styles available")
        exit(1)
    }
    let images = creator.images(
        for: [.text("A cat wearing mittens.")],
        style: style,
        limit: 1)
    for try await image in images {
        print("Generated image: \(image)")
    }
    exit(0)
}
RunLoop.main.run()
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Hello, I am thinking of buying the MacBook Pro 14" with M4 Pro, mostly for ML/AI/NLP tasks. And since I have only used Windows before, I am wondering whether it is compatible with libraries like "PyTorch" and "TensorFlow" etc., or whether people have experienced problems with installation... Thank you!
Topic:
Machine Learning & AI
SubTopic:
General
Hi there,
I have a custom keypoint detection model and want to use it via Vision's CoreMLRequest API. Here are some complications for input and output:
For input: my model expects a 512x512 image, which is resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding?
For output:
My model outputs a CoreMLFeatureValueObservation. Can I have it output in a format Vision recognizes, such as joints/keypoints?
If my model is able to output in a format Vision recognizes, would it take care of restoring the coordinates back to the original frame (undoing the padding)? If not, how do I restore them from the .scaleToFit option myself?
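For that last point, this is the unpadding math I'm assuming (a sketch, assuming the model's keypoints are in the 512x512 letterboxed pixel space):
import CoreGraphics

// Sketch: map a keypoint from the 512x512 scale-to-fit (letterboxed) input
// back into the original frame's pixel coordinates.
func unpad(_ point: CGPoint, originalSize: CGSize, inputSide: CGFloat = 512) -> CGPoint {
    let scale = min(inputSide / originalSize.width, inputSide / originalSize.height)
    let padX = (inputSide - originalSize.width * scale) / 2
    let padY = (inputSide - originalSize.height * scale) / 2
    return CGPoint(x: (point.x - padX) / scale,
                   y: (point.y - padY) / scale)
}

// Example: the center of the padded 512x512 input maps back to the center of a 1920x1080 frame
let restored = unpad(CGPoint(x: 256, y: 256), originalSize: CGSize(width: 1920, height: 1080))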
Best,
Foundation Models framework worked perfectly on macOS 26 Beta 2, but starting from Beta 3 and continuing through Beta 6 (latest), I get dyld symbol errors even
with the exact code from Apple's documentation.
Environment:
macOS 26.0 Beta 6 (25A5351b)
Xcode 26 Beta 6
M4 Max MacBook Pro
Apple Intelligence enabled and downloaded
Error Details:
dyld[Process]: Symbol not found:
_$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC
Referenced from: /path/to/app.debug.dylib
Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Code Used (Exact from Documentation):
import FoundationModels
// This worked on Beta 2, crashes on Beta 3+
let model = SystemLanguageModel.default
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Hello")
What I've Verified:
FoundationModels.framework exists in /System/Library/Frameworks/
Framework is properly linked in Xcode project
Apple Intelligence is enabled and working
Same code works in older beta versions
Issue persists even with completely fresh Xcode projects
Analysis:
The dyld error suggests the LanguageModelSession(model:) constructor is missing. The symbol shows it's looking for a constructor with parameters
(model:guardrails:tools:instructions:), but the documentation still shows the simple (model:) constructor.
Questions:
Has the LanguageModelSession API changed since Beta 2?
Should we now use the constructor with guardrails/tools/instructions parameters?
Is this a known issue with recent betas?
Are there updated code samples for the current API?
Additional Context:
This affects both basic SystemLanguageModel usage AND custom adapter loading. The same dyld symbol errors occur when trying to create
SystemLanguageModel(adapter: adapter) as well.
Any guidance on the correct API usage for current betas would be greatly appreciated. The documentation appears to be out of sync with the actual framework
implementation.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Is the face and body detection service in the Vision framework a local model or a cloud model? Is there a performance report?
https://developer.apple.com/documentation/vision
In the playground, I'm trying to bias my LanguageModel to use a tool I registered, but I don't see it actually calling the tool. I'm following the developer video on the landmarks itinerary generation tutorial almost verbatim. Is this a prompt engineering thing I'm missing? Or is it possible that I'm injecting my tool incorrectly?
Hi all, I am interested in unlocking unique applications with the new foundational models. I have a few questions regarding the availability of the following features:
Image Input: The update in June 2025 mentions "image" 44 times (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates) - however I can't seem to find any information about having images as the input/prompt for the foundational models. When will this be available? I understand that there are existing Vision ML APIs, but I want image input into a multimodal on-device LLM (VLM) instead for features like "Which player is holding the ball in the image", etc (image understanding)
Cloud Foundational Model - when will this be available?
Thanks!
Clement :)
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Tags:
Vision
Machine Learning
Core ML
Apple Intelligence
May I know the bundle identifier for Apple Intelligence?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon.
Performance Comparison
NVIDIA GPU: 0.475s for 12.6M random elements
M1 Max Metal: 26.3s for same workload
Performance gap: 55x slower
Environment
Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1
JAX 0.4.34, jax-metal latest
Backend: Metal
Reproduction Code
import time
import jax
import jax.numpy as jnp
from jax import random

# Seeded PRNG key for reproducibility
key = random.PRNGKey(42)

start_time = time.time()
random_array = random.normal(key, (50000, 252))
random_array.block_until_ready()  # wait for the async dispatch so the timing reflects actual compute
duration = time.time() - start_time
print(f"Duration: {duration:.3f}s")
The documentation on adapter training lacks any details about training on a dataset that includes tool calling. And the page about tool calling itself only explains how to use it from Swift, without any internal details that would be useful for training.
The question is: what should the schema look like for including tool calling in a dataset?
Topic:
Machine Learning & AI
SubTopic:
Foundation Models