Recommended visionOS architecture for opt-in nearby exchange triggered by a physical gesture

I’m exploring a visionOS interaction pattern where nearby Apple Vision Pro users have opted in ahead of time, such as when entering a shared venue, and a physical gesture like a handshake triggers a lightweight exchange of user-approved information between their devices.

I’m not asking about identifying strangers, accessing raw camera data, or tracking another person without consent. I’m trying to understand the most automatic Apple-supported architecture for this kind of privacy-preserving nearby interaction.

What is the recommended approach on visionOS?

Specifically:

  1. Can a visionOS app use ARKit hand tracking to detect the current user’s own gesture, then combine that with nearby peer discovery or ranging through Nearby Interaction, Network framework, Multipeer Connectivity, Bluetooth, Wi-Fi, or another supported API?
  2. Is there a supported way for two nearby Vision Pro devices to exchange a small user-approved payload after prior opt-in, without requiring users to manually start SharePlay or confirm every individual exchange?
  3. If SharePlay or Group Activities is the recommended path for shared spatial context, is there a supported alternative for venue-scale or multi-user interactions that should not be limited to a small active SharePlay group?
  4. What are the privacy and App Review boundaries for this pattern? Should developers assume the app cannot identify nearby people, observe another person’s body or hands, or trigger an exchange unless both users have explicit opt-in and clear awareness?
  5. If a mostly passive gesture-triggered exchange is not supported today, what is the closest Apple-recommended design pattern?
Answered by Vision Pro Engineer in 891177022

Hey @dotsonxone@gmail.clom

There's a lot to consider here and multiple ways to approach this. A few things come to mind:

  • Object Anchor or Image Anchor to understand the relative position of each user to that anchor.
  • Network to connect quickly over the local network.
  • CloudKit or your own server to share any info needed to establish a local connection.
  • The exchange could be connected by using the time of the handshake gesture and location from the anchor, or by using colliders that are synced across all the devices.
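
To make the last bullet concrete, here is a minimal sketch of pairing two handshake reports by time and anchor-relative position. Everything in it is an assumption for illustration: the `HandshakeEvent` type, the `isLikelySameHandshake` function, and the 0.5 s / 0.3 m thresholds are hypothetical, and it presumes each device has already expressed its position relative to the same shared anchor and that the devices agree on a clock.

```swift
import Foundation

// Hypothetical event each device reports after detecting its own wearer's gesture.
struct HandshakeEvent {
    let userID: String
    let timestamp: TimeInterval   // seconds, on a clock the devices agree on (assumed)
    let position: SIMD3<Float>    // meters, relative to the shared anchor (assumed)
}

// Returns true when two events from different users are close enough
// in both time and space to plausibly be the same handshake.
func isLikelySameHandshake(_ a: HandshakeEvent,
                           _ b: HandshakeEvent,
                           maxTimeGap: TimeInterval = 0.5,
                           maxDistance: Float = 0.3) -> Bool {
    // A user cannot shake hands with themselves.
    guard a.userID != b.userID else { return false }

    // Euclidean distance between the two reported hand positions.
    let d = a.position - b.position
    let distance = (d.x * d.x + d.y * d.y + d.z * d.z).squareRoot()

    return abs(a.timestamp - b.timestamp) <= maxTimeGap && distance <= maxDistance
}
```

The thresholds would need tuning against real anchor accuracy and network clock skew; a match here should probably only select a candidate peer, with the actual exchange still gated on each user's prior opt-in.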

Also the following samples come to mind:

Hope this helps get you started,
Michael

I can't answer all of these, but starting with the first two:

Can a visionOS app use ARKit hand tracking to detect the current user’s own gesture

You can implement your own gesture detection by observing changes to the HandAnchor's .handSkeleton. This only works for the user who is wearing the device, though: one Vision Pro cannot detect that another user has performed the same gesture.
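
As a rough illustration of what custom detection could look like, the sketch below counts up-down reversals in a stream of wrist heights. It is deliberately free of ARKit so the logic can be tested anywhere; on device you would presumably feed it the wrist joint's world-space height taken from each HandAnchor update. The `ShakeDetector` type, its thresholds, and the reversal heuristic are assumptions of this example, not an Apple-provided recognizer, and a real detector would likely also add a time window and check hand pose.

```swift
import Foundation

// Hypothetical detector: flags a "shake" once the wrist has reversed
// vertical direction enough times with sufficient amplitude.
struct ShakeDetector {
    var minAmplitude: Float = 0.03    // ignore movements smaller than 3 cm
    var requiredReversals = 3         // direction changes needed to count as a shake
    private var lastExtreme: Float?   // most recent turning point (or first sample)
    private var direction: Float = 0  // +1 rising, -1 falling, 0 not yet known
    private(set) var reversals = 0

    // Feed one wrist-height sample (meters); returns true once enough
    // reversals have accumulated.
    mutating func addSample(wristHeight y: Float) -> Bool {
        guard let extreme = lastExtreme else {
            lastExtreme = y           // first sample only establishes a baseline
            return false
        }
        let delta = y - extreme
        if direction == 0 {
            // Wait for the first movement large enough to establish a direction.
            if abs(delta) >= minAmplitude {
                direction = delta > 0 ? 1 : -1
                lastExtreme = y
            }
        } else if delta * direction > 0 {
            lastExtreme = y           // still moving the same way; extend the extreme
        } else if abs(delta) >= minAmplitude {
            direction = -direction    // moved far enough the other way: one reversal
            reversals += 1
            lastExtreme = y
        }
        return reversals >= requiredReversals
    }
}
```

Small jitter below `minAmplitude` is ignored, so an idle hand should not trigger it, but that threshold is a guess and would need tuning on real tracking data.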

then combine that with nearby peer discovery or ranging through Nearby Interaction, Network framework, Multipeer Connectivity, Bluetooth, Wi-Fi, or another supported API?

Yes, users should be able to discover each other via Multipeer Connectivity and exchange information (e.g. which gestures, if any, have been performed).
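
For the "which gestures have been performed" part, one possible shape for the message is a small Codable struct serialized to JSON. The `GestureMessage` type and its field names are hypothetical; the resulting `Data` is the kind of value you could hand to MCSession's `send(_:toPeers:with:)`, which is not shown here.

```swift
import Foundation

// Hypothetical wire format telling peers which gesture the local user performed.
struct GestureMessage: Codable, Equatable {
    let senderID: String          // app-level identifier the user opted in to share
    let gesture: String           // e.g. "handshake"
    let timestamp: TimeInterval   // when the local device detected the gesture
}

// Serializes a message to JSON for sending to peers.
func encode(_ message: GestureMessage) throws -> Data {
    try JSONEncoder().encode(message)
}

// Parses a message received from a peer; throws if the data is malformed.
func decode(_ data: Data) throws -> GestureMessage {
    try JSONDecoder().decode(GestureMessage.self, from: data)
}
```

Keeping the payload this small means the message only announces the gesture; any user-approved information would be sent separately once both sides have matched the event.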
