Subject: Combining ARKit Face Tracking with High-Resolution AVCapture and Perspective Rendering on Front Camera
Message:
Hello Apple Developer Community,
We’re developing an application using the front camera that requires both real-time ARKit face tracking/guidance and the capture of high-resolution still images via AVCaptureSession. Our goal is to leverage ARKit’s depth and face data to render a captured image from another perspective post-capture, maintaining high image quality.
Our Approach:
Real-Time ARKit Guidance:
Utilize ARKit (e.g., ARFaceTrackingConfiguration) for continuous face tracking, depth, and scene understanding to guide the user in real time.
High-Resolution Capture Transition:
At the moment of capture, we plan to pause the ARKit session and switch to an AVCaptureSession to take a high-resolution image.
We assume that for a front-facing image, the subject’s face is directly front-on, and the relative pose between the face and camera remains the same during the transition. The only variation we expect is a change in distance.
Our intention is to minimize the delay between the last ARKit frame and the high-res capture to maintain temporal consistency, assuming that aside from distance, the face-camera relative pose remains unchanged.
Post-Processing Perspective Rendering:
Using the last ARKit face data (depth, pose, and landmarks) along with the high-resolution 2D image, we aim to render the scene from another perspective.
We want to correct the perspective of the 2D image using SceneKit or RealityKit, leveraging the collected ARKit scene information to achieve a natural, high-quality rendering from a different viewpoint.
The rendering should match the quality of a normally captured high-resolution image, adjusting for the difference in distance while using the stored ARKit data to correct perspective.
Our Questions:
Session Transition Best Practices:
What are the recommended best practices to seamlessly pause ARKit and switch to a high-resolution AVCapture session on the front camera
How can we minimize user movement or other issues during this brief transition, given our assumption that the face-camera pose remains largely consistent except for distance changes?
Data Integration for Perspective Rendering:
How can we effectively integrate stored ARKit face, depth, and pose data with the high-res image to perform accurate perspective correction or rendering from another viewpoint?
Given that we assume the relative pose is constant except for distance, are there strategies or APIs to leverage this assumption for simplifying the perspective transformation?
Perspective Correction with SceneKit/RealityKit:
What techniques or workflows using SceneKit or RealityKit are recommended for correcting the perspective of a captured 2D image based on ARKit scene data?
How can we use these frameworks to render the high-resolution image from an alternative perspective, while maintaining image quality and fidelity?
4. Pitfalls and Guidelines:
What common pitfalls should we be aware of when combining ARKit tracking data with high-res capture and post-processing for perspective rendering?
Are there performance considerations, recommended thresholds for acceptable temporal consistency, or validation techniques to ensure the ARKit data remains applicable at the moment of high-res capture?
We appreciate any advice, sample code references, or documentation pointers that could assist us in implementing this workflow effectively.
Thank you!
Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hello all, I saw this interesting VisionOS app: https://apps.apple.com/us/app/splitscreen-multi-display/id6478007837
I was wondering if there was any documentation on the Swift APIs that were used to create this app.
Wondering if this is even possible without using CVImageBuffer and passing each frame as an image which I imagine will be very expensive.
Have a PoC of a shader graph that applies a radial zoom effect to an image. In RealityKit I'm passing the image as a resource:
if let textureResource = try? await TextureResource(named: "fuji") {
let value = MaterialParameters.Value.textureResource(textureResource)
try? material.setParameter(name: "MyImage", value: value)
model.model?.materials = [material]
}
Thanks in advance
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
Tags:
Reality Composer Pro
Shader Graph Editor
visionOS
I want a sentence custom hover effect, not a button.
I want a hover effect when you look at one sentence out of many sentences.
So I searched for reference videos https://youtu.be/DftRTx1oX6E , https://developer.apple.com/videos/play/wwdc2023/10110/ on apple youtube and visionOS documentation.
But I haven't gotten anywhere near my wish feature yet.
I respectfully request someone to help me. :)
PLATFORM AND VERSION
Vision OS
Development environment: Xcode 16.2, macOS 15.2
Run-time configuration: visionOS 2.3 (On Real Device, Not simulator)
Please someone confirm I'm not crazy and this issue is actually out of my control.
Spent hours trying to fix my app and running profiles because thought it was an issue related to my apps performance. Finally considered chance it was issue with API itself and made sample app to isolate problem, and it still existed in it. The issue is when a model entity moves around in a full space that was launched when the system environment immersion was turned up before opening it, the entities looks very choppy as they move around. If you take off the headset while still in the space, and put it back on, this fixes it and then they move smoothly as they should. In addition, you can also leave the space, and then turn the system environment immersion all the way down before launching the full space again, this will also make the entity moves smoothly as it should. If you launch a mixed immersion style instead of a full immersion style, this issue never arrises. The issue only arrises if you launch the space with either a full style, or progressive style, while the system immersion level is turned on.
STEPS TO REPRODUCE
https://github.com/nathan-707/ChoppyEntitySample
Open my test project, its a small, modified vision os project template that shows it clearly.
otherwise:
create immersive space with either full or progressive immersion style.
setup a entity in kinematic mode, apply a velocity to it to make it pass over your head when the space appears.
if you opened the space while the Apple Vision Pros system environment was turned up, the entity will look choppy.
if you take the headset off while in the same space, and put it back on, it will fix the issue and it will look smooth.
alternatively if you open the space with the system immersion environment all the way down, you will also not run into the issue. Again, issue also does not happen if space launched is in mixed style.
I have two RealityView: ParentView and When click the button in ParentView, ChildView will be shown as full screen cover, but the camera feed in ChildView will not be shown, only black screen.
If I show ChildView directly, it works with camera feed.
Please help me on this issue? Thanks.
import RealityKit
import SwiftUI
struct ParentView: View{
@State private var showIt = false
var body: some View{
ZStack{
RealityView{content in
content.camera = .virtual
let box = ModelEntity(mesh: MeshResource.generateSphere(radius: 0.2),materials: [createSimpleMaterial(color: .red)])
content.add(box)
}
Button("Click here"){
showIt = true
}
}
.fullScreenCover(isPresented: $showIt){
ChildView()
.overlay(
Button("Close"){
showIt = false
}.padding(20),
alignment: .bottomLeading
)
}
.ignoresSafeArea(.all)
}
}
import ARKit
import RealityKit
import SwiftUI
struct ChildView: View{
var body: some View{
RealityView{content in
content.camera = .spatialTracking
}
}
}
Hi Apple Team and Developers,
First of all, I’d like to express my appreciation for the incredible results achieved using PhotogrammetrySession. I’ve been developing a portrait scanning app using Object Capture, and in many tests—especially with human models—I’ve found the reconstructed body surfaces are remarkably smooth and clean, often outperforming tools like Metashape and RealityCapture in terms of aesthetic results.
However, I’ve encountered some challenges when working with complex areas like long hair overlapping the face. For instance, with female models where strands of hair partially occlude the face, the resulting mesh tends to merge the hair and facial geometry. This leads to distorted or “melted” facial features, likely due to ambiguity in the geometry estimation phase.
Feature Suggestion:
Would it be possible to allow developers to supply two versions of the input images:
• One version (original) for texture generation
• A pre-processed version (e.g., contrast-enhanced or CLAHE filtered) to guide mesh reconstruction only
This would give us the flexibility to enhance edge features or shadow detail without affecting the final texture appearance. In other photogrammetry pipelines, applying image enhancement selectively before dense reconstruction improves geometry quality in low-contrast areas.
Question:
Is there any plan to support this kind of two-path workflow in future versions of PhotogrammetrySession? Or perhaps expose more intermediate stages or tunable parameters to developers?
Also, any hints on what we can expect from WWDC 2025 regarding improvements to Object Capture or related vision/3D technologies?
Thanks again for this powerful API. Looking forward to hearing insights from the team and other developers.
Warm regards,
KitCheng
Trying to add a Reality Composer Pro project into my swift playground application. Can't figure out what name to call for the package.
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
Tags:
Swift Student Challenge
Swift Playground
RealityKit
Hello, I am currently developing a Vision Pro VR application with Unreal Engine 5.5. Is it possible to interact with objects (grabbing, clicking on buttons)? I cannot find any information on this. Thank you.
Topic:
Spatial Computing
SubTopic:
General
In several visionOS apps, we readjust our scenes to the user's eye level (their heads). But, we have encountered issues whereby the WorldTrackingProvider returns bad/incorrect positions for the first x number of frames.
See below code which you can copy paste in any Immersive Space. Relaunch the space and observe the numberOfBadWorldInfos value is inconsistent.
a. what is the most reliable way to get the devices's position?
b. is this indeed a bug?
c. are we using worldInfo improperly?
d. as a workaround, in our apps we set to 10 the number of frames to let pass before using worldInfo, should we set our threshold differently?
import ARKit
import Combine
import OSLog
import SwiftUI
import RealityKit
import RealityKitContent
let SUBSYSTEM = Bundle.main.bundleIdentifier!
struct ImmersiveView: View {
let logger = Logger(subsystem: SUBSYSTEM, category: "ImmersiveView")
let session = ARKitSession()
let worldInfo = WorldTrackingProvider()
@State var sceneUpdateSubscription: EventSubscription? = nil
@State var deviceTransform: simd_float4x4? = nil
@State var numberOfBadWorldInfos = 0
@State var isBadWorldInfoLoged = false
var body: some View {
RealityView { content in
try? await session.run([worldInfo])
sceneUpdateSubscription = content.subscribe(to: SceneEvents.Update.self) { event in
guard let pose = worldInfo.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) else {
return
}
// `worldInfo` does not return correct values for the first few frames (exact number of frames is unknown)
// - known SO: https://stackoverflow.com/questions/78396187/how-to-determine-the-first-reliable-position-of-the-apple-vision-pro-device
deviceTransform = pose.originFromAnchorTransform
if deviceTransform!.columns.3.y < 1.6 {
numberOfBadWorldInfos += 1
logger.warning("\(#function) \(#line) deviceTransform.columns.3.y \(deviceTransform!.columns.3.y), numberOfBadWorldInfos \(numberOfBadWorldInfos)")
} else {
if !isBadWorldInfoLoged {
logger.info("\(#function) \(#line) deviceTransform.columns.3.y \(deviceTransform!.columns.3.y), numberOfBadWorldInfos \(numberOfBadWorldInfos)")
}
isBadWorldInfoLoged = true // stop logging.
}
}
}
}
}
Hi guys,
In visionOS, when using a ZStack decorated with .glassBackgroundEffect(), you can see the 3D glass background from the front, but when viewed from the side, the view appears to have no thickness.
However, I noticed that in an app built by Apple, when viewing a glass background view from the side, it appears to have thickness.
I tried adding .frame(depth:) to a glass background view, but it appears as two separate layers spaced by the depth value.
My question is:
Is there a view modifier that adds visual thickness to a glass background view, as shown in the picture?
Or, if not, how should I write a custom view modifier to achieve this effect? Thanks!
help me please i cant remove little messed up pixels and they are getting more and more please help me!!
2025 Macbook pro no protection screen actual screen
I'm working on an iOS app using ARKit and RealityKit where I scan QR codes and want to place 3D models at the exact position of the QR code in the real world.
Is it possible to accurately place a 3D model at the exact position of a QR code in AR using ARKit and RealityKit? Specifically, I want the model to appear at the precise location where the QR code is detected, rather than just somewhere in the AR space.
If this is possible, could you point me in the right direction or recommend the best approach to achieve this?
Thank you for your help!
Here is the sample project from apple of Object Tracking.
https://developer.apple.com/documentation/visionOS/exploring_object_tracking_with_arkit
can we improve it tracking accuracy and tracking when object is moving little faster, so the 3d object that draw still follow it and make it more accurate.
I believe I have created a videoMaterial and assigned it to a mesh with code I found in the Developer's Documentation but Im getting this error.
"Trailing closure passed to parameter of type 'String' that does not accept a closure"
I have attached a photo of the code and where the error happens.
Any help will greatly be appreciated.
Hi, I'm playing now with hand tracking. I want to get position of hand inside a system update function. I was not sure if transform I'm getting from hand attached AnchorEntity (with trackingMode: .predicted) would give same results as handAnchors(at:) from hand tracking provider, so I started to read them both and compare. For handAnchors i tried using context.scene.timebase.sourceTimebase!.sourceClock!.time.seconds and CACurrentMediaTime() as timestamp source. They seem to use exactly same clock, so that doesn't matter, but:
for some reason update handler is always called twice with same context.deltaTime, but first time the query finds 0 entities, second time it finds them all. The query is the standard EntityQuery(where: .has(MyComponent.self)) and in update (matching: Self.query, updatingSystemWhen: .rendering). Here's part of logs:
System update called, entity count: 0, dt: 0.01000458374619484, absTime: 4654.222593541
System update called, entity count: 11, dt: 0.01000458374619484, absTime: 4654.22262525
System update called, entity count: 0, dt: 0.009999999776482582, absTime: 4654.249390875
System update called, entity count: 11, dt: 0.009999999776482582, absTime: 4654.249425
accounting for the double update calling I started to calculate time delta of absolute time between calls and they're most of the time much bigger, or much smaller than advertised by system's context.deltaTime, only sometimes they kind of match, for example:
system: (dt: 0.01000458374619484)
scene : (dt: 0.021419291667371) (absTime: 4654.222628125001)
and the very next call
system: (dt: 0.010009 166784584522)
scene : (dt: 0.0013097083328830195) (absTime: 4654.223937833334)
but sometimes
system: (dt: 0.009999999776482582)
scene : (dt: 0.009 112249999816413) (absTime: 4654.351299 166668)
Shouldn't those be more or less equal, or am I missing something?
In the end it seems that getting hand position from AnchorEntity and with handAnchors(at:) gives kind of same results, but at different time points, so I'd love to understand what's the correct way to use them and why time flows differently :).
--Edit--
P.S. Had to put spaces everywhere in logs between "9" and "1", otherwise post was blocked due to "sensitive content" :D
Hello,
We discovered that a bunch of our old animated models were no longer animated on iOS15 and onwards.
After a few days of playing spot the difference between usda files I noticed that all the broken models had an xform called "Scene". Lo and behold, changing the name of that xform fixed the issue on all the models. Even lowercase "scene" makes the animations work again. Is "Scene" a reserved keyword or something? What other keywords do we need to avoid so we can create more robust USDZ files?
I'm surprised this issue isn't more widespread considering Blender wraps models in a "Scene" node.
At the drive link below you can find two animated cube USDZs. The only difference is the name of one of the xforms. The one with a "Scene" xform is not animated in quicklook (replicated on iPhone 13 iOS v15.2, iPhone 13 iOS v 18.3, and various devices on Browserstack including iPhone 16 iOS v18.3).
https://drive.google.com/drive/folders/1dch1WaM9O6mbHy29S6NGWgnSHkZkPiBf?usp=sharing
VisionPro 开发,XCode,我想在窗口中找到一个显示模型的图片。这个模型可以改变它的材料,它不是唯一的形象,它正在改变。如何将此模型转换为图像
Topic:
Spatial Computing
SubTopic:
ARKit
Hello,
In my project, I have attached a ManipulationComponent to Entity A and as expected, I'm able interact with it using the built-in gestures. I have another Entity B which is a child of A that I would like to interact with as well, so I attempted to add a ManipulationComponent to B. However, no gestures seem to be registered on B; I can still interact with A but B cannot be interacted with despite having ManipulationComponents on both entities.
So I'm wondering if I'm just doing something wrong, if this is an issue with the ManipulationComponent, or if this is a limitation of the API.
Attached is the code used to add the ManipulationComponent to an Entity and it was done on both A and B:
let mc = ManipulationComponent()
model.components.set(mc)
var boxShape = ShapeResource.generateBox(width: 0.25, height: 0.05, depth: 0.25)
boxShape = boxShape.offsetBy(translation: simd_float3(0, -0.05, -0.25))
ManipulationComponent.configureEntity(model, collisionShapes: [boxShape])
if var mc = model.components[ManipulationComponent.self] {
mc.releaseBehavior = .stay
mc.dynamics.inertia = .low
model.components.set(mc)
}
I am using visionOS 26.0; let me know if there's any additional information needed.
Hello,
There are three issues I am running into with a default template project + additional minimal code changes:
the Sphere_Left entity always overlaps the Sphere_Right entity.
when I release the Sphere_Left entity, it does not remain sticking to the Sphere_Right entity
when I release the Sphere_Left entity, it distances itself from the Sphere_Right entity
When I manipulate the Sphere_Right entity, these above 3 issues do not occur: I get a correct and expected behavior.
These issues are simple to replicate:
Create a new project in XCode
Choose visionOS -> App, then click Next
Name your project, and leave all other options as defaults: Initial Scene: Window, Immersive Space Renderer: RealityKit, Immersive Space: Mixed, then click Next
Save you project anywhere...
Replace the entire ImmersiveView.swift file with the below code.
Run.
Try to manipulate the left sphere, you should get the same issues I mentioned above
If you restart the project, and manipulate only the right sphere, you should get the correct expected behaviors, and no issues.
I am running this in macOS 26, XCode 26, on visionOS 26, all released lately.
ImmersiveView Code:
//
// ImmersiveView.swift
//
import OSLog
import SwiftUI
import RealityKit
import RealityKitContent
struct ImmersiveView: View {
private let logger = Logger(subsystem: "com.testentitiessticktogether", category: "ImmersiveView")
@State var collisionBeganUnfiltered: EventSubscription?
var body: some View {
RealityView { content in
// Add the initial RealityKit content
if let immersiveContentEntity = try? await Entity(named: "Immersive", in: realityKitContentBundle) {
content.add(immersiveContentEntity)
// Add manipulation components
setupManipulationComponents(in: immersiveContentEntity)
collisionBeganUnfiltered = content.subscribe(to: CollisionEvents.Began.self) { collisionEvent in
Task { @MainActor in
handleCollision(entityA: collisionEvent.entityA, entityB: collisionEvent.entityB)
}
}
}
}
}
private func setupManipulationComponents(in rootEntity: Entity) {
logger.info("\(#function) \(#line) ")
let sphereNames = ["Sphere_Left", "Sphere_Right"]
for name in sphereNames {
guard let sphere = rootEntity.findEntity(named: name) else {
logger.error("\(#function) \(#line) Failed to find \(name) entity")
assertionFailure("Failed to find \(name) entity")
continue
}
ManipulationComponent.configureEntity(sphere)
var manipulationComponent = ManipulationComponent()
manipulationComponent.releaseBehavior = .stay
sphere.components.set(manipulationComponent)
}
logger.info("\(#function) \(#line) Successfully set up manipulation components")
}
private func handleCollision(entityA: Entity, entityB: Entity) {
logger.info("\(#function) \(#line) Collision between \(entityA.name) and \(entityB.name)")
guard entityA !== entityB else { return }
if entityB.isAncestor(of: entityA) {
logger.debug("\(#function) \(#line) \(entityA.name) already under \(entityB.name); skipping reparent")
return
}
if entityA.isAncestor(of: entityB) {
logger.info("\(#function) \(#line) Skip reparent: \(entityA.name) is an ancestor of \(entityB.name)")
return
}
reparentEntities(child: entityA, parent: entityB)
entityA.components[ParticleEmitterComponent.self]?.burst()
}
private func reparentEntities(child: Entity, parent: Entity) {
let childBounds = child.visualBounds(relativeTo: nil)
let parentBounds = parent.visualBounds(relativeTo: nil)
let maxEntityWidth = max(childBounds.extents.x, parentBounds.extents.x)
let childPosition = child.position(relativeTo: nil)
let parentPosition = parent.position(relativeTo: nil)
let currentDistance = distance(childPosition, parentPosition)
child.setParent(parent, preservingWorldTransform: true)
logger.info("\(#function) \(#line) Set \(child.name) parent to \(parent.name)")
child.components.remove(ManipulationComponent.self)
logger.info("\(#function) \(#line) Removed ManipulationComponent from child \(child.name)")
if currentDistance > maxEntityWidth {
let direction = normalize(childPosition - parentPosition)
let newPosition = parentPosition + direction * maxEntityWidth
child.setPosition(newPosition - parentPosition, relativeTo: parent)
logger.info("\(#function) \(#line) Adjusted position: distance was \(currentDistance), now \(maxEntityWidth)")
}
}
}
fileprivate extension Entity {
func isAncestor(of other: Entity) -> Bool {
var current: Entity? = other.parent
while let node = current {
if node === self { return true }
current = node.parent
}
return false
}
}
#Preview(immersionStyle: .mixed) {
ImmersiveView()
.environment(AppModel())
}