I’m exploring a visionOS interaction pattern where nearby Apple Vision Pro users have opted in ahead of time, such as when entering a shared venue, and a physical gesture like a handshake triggers a lightweight exchange of user-approved information between their devices.
I’m not asking about identifying strangers, accessing raw camera data, or tracking another person without consent. I’m trying to understand the most automatic Apple-supported architecture for this kind of privacy-preserving nearby interaction.
What is the recommended approach on visionOS?
Specifically:
- Can a visionOS app use ARKit hand tracking to detect the current user’s own gesture, then combine that with nearby peer discovery or ranging through Nearby Interaction, Network framework, Multipeer Connectivity, Bluetooth, Wi-Fi, or another supported API?
- Is there a supported way for two nearby Vision Pro devices to exchange a small user-approved payload after prior opt-in, without requiring users to manually start SharePlay or confirm every individual exchange?
- If SharePlay or Group Activities is the recommended path for shared spatial context, is there a supported alternative for venue-scale or multi-user interactions that should not be limited to a small active SharePlay group?
- What are the privacy and App Review boundaries for this pattern? Should developers assume the app cannot identify nearby people, observe another person’s body or hands, or trigger an exchange unless both users have explicit opt-in and clear awareness?
- If a mostly passive gesture-triggered exchange is not supported today, what is the closest Apple-recommended design pattern?
There's a lot to consider here, and there are multiple ways to approach this. A few things come to mind:
- An object anchor or image anchor, so each device can work out its user's position relative to the same physical reference.
- The Network framework to connect quickly over the local network.
- CloudKit or your own server to share any info needed to establish a local connection.
- The two sides of an exchange could be paired by matching the time of the handshake gesture and its location relative to the anchor, or by using colliders that are synced across all the devices.
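To make the time-and-location pairing idea a bit more concrete, here is a minimal sketch in plain Swift. It is not taken from any Apple sample. The `HandshakeEvent` type, the `isLikelySameHandshake` function, and the 0.5 second and 0.3 metre thresholds are all hypothetical, and it assumes each device reports only its own user's gesture, with a position already expressed in the shared anchor's coordinate space and a timestamp on a reasonably synchronized clock.

```swift
import Foundation

/// Hypothetical record a device publishes when it detects its OWN user's
/// handshake gesture. Not an Apple API type.
struct HandshakeEvent {
    let deviceID: String
    /// Seconds on a clock both devices roughly agree on (assumption).
    let timestamp: TimeInterval
    /// Hand position in metres, relative to the shared anchor (assumption).
    let position: SIMD3<Float>
}

/// Returns true when two events from different devices are close enough in
/// time and space to plausibly be the same handshake.
/// The default thresholds are illustrative guesses, not recommended values.
func isLikelySameHandshake(
    _ a: HandshakeEvent,
    _ b: HandshakeEvent,
    maxTimeGap: TimeInterval = 0.5,
    maxDistance: Float = 0.3
) -> Bool {
    // A device never pairs with itself.
    guard a.deviceID != b.deviceID else { return false }

    // Euclidean distance between the two reported hand positions.
    let d = a.position - b.position
    let distance = (d.x * d.x + d.y * d.y + d.z * d.z).squareRoot()

    let timeGap = abs(a.timestamp - b.timestamp)
    return timeGap <= maxTimeGap && distance <= maxDistance
}

// Example: two devices report a gesture about 0.12 s and 6 cm apart.
let mine = HandshakeEvent(
    deviceID: "A", timestamp: 100.00,
    position: SIMD3<Float>(0.50, 1.10, -0.20))
let theirs = HandshakeEvent(
    deviceID: "B", timestamp: 100.12,
    position: SIMD3<Float>(0.55, 1.08, -0.18))

print(isLikelySameHandshake(mine, theirs))
```

In a real app the thresholds would need tuning against anchor tracking accuracy and network latency, and a match should only lead to an exchange between users who have already opted in.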
The following samples also come to mind:
- Building a custom peer-to-peer protocol
- Connecting iPadOS and visionOS apps over the local network
- Tracking preregistered images in 3D space
- Exploring object tracking with ARKit
Hope this helps get you started,
Michael