VisionKit

Khmer Script Misidentified as Thai in Vision Framework

It is vital for Apple to refine its OCR models to correctly distinguish between Khmer and Thai scripts. Incorrectly labeling Khmer text as Thai is more than a technical bug; it is a culturally insensitive error that impacts national identity, especially given the current geopolitical climate between Cambodia and Thailand. Implementing a more robust language-detection threshold would prevent these harmful misidentifications.

There is a significant logic flaw in the VNRecognizeTextRequest language detection when processing Khmer script. When the property automaticallyDetectsLanguage is set to true, the Vision framework frequently misidentifies Khmer characters as Thai. While both scripts share historical roots, Khmer and Thai are distinct languages with different alphabets. Currently, the model’s confidence threshold for distinguishing between these two scripts is too low, leading to incorrect OCR output in both developer-facing APIs and Apple’s native ecosystem (Preview, Live Text, and Photos).

    import SwiftUI
    import Vision

    class TextExtractor {
        func extractText(from data: Data, completion: @escaping (String) -> Void) {
            let request = VNRecognizeTextRequest { (request, error) in
                guard let observations = request.results as? [VNRecognizedTextObservation] else {
                    completion("No text found.")
                    return
                }
                let recognizedStrings = observations.compactMap { observation -> String? in
                    // Skip observations without a candidate instead of force-unwrapping.
                    guard let str = observation.topCandidates(1).first?.string else { return nil }
                    return "{text: \(str), confidence: \(observation.confidence)}"
                }
                completion(recognizedStrings.joined(separator: "\n"))
            }
            request.automaticallyDetectsLanguage = true // <-- This is the issue.
            request.recognitionLevel = .accurate
            let handler = VNImageRequestHandler(data: data, options: [:])
            DispatchQueue.global(qos: .background).async {
                do {
                    try handler.perform([request])
                } catch {
                    completion("Failed to perform OCR: \(error.localizedDescription)")
                }
            }
        }
    }

Recognizing Khmer: the confidence score is low for Khmer text. (The output is in Thai, with a low confidence score.)

Recognizing English: the confidence score is high, as expected.

Recognizing Thai: the confidence score is high, as expected.

Issues on Preview, Photos

Khmer text
Copied text

Kouk Pring Chroum Temple [19121 รอาสายสุกตีนานยารรีสใหิสรราภูชิตีนนสุฐตีย์ [รุก เผือชิษาธอยกัตธ์ตายตราพาษชาณา ถวเชยาใบสราเบรถทีมูสินตราพาษชาณา ทีมูโษา เช็ก อาษเชิษฐอารายสุกบดตพรธุรฯ ตากร"สุก"ผาตากรธกรธุกเยากสเผาพศฐตาสาย รัอรณาษ"ตีพย" สเผาพกรกฐาภูชิสาเครๆผู:สุกรตีพาสเผาพสรอสายใผิตรรารตีพสๆ เดียอลายสุกตีน ธาราชรติ ธิพรหณาะพูชุบละเาหLunet De Lajonquiere ผารูกรสาราพารผรผาสิตภพ ตารสิทูก ธิพิ คุณที่นสายเระพบพเคเผาหนารเกะทรนภาษเราภุพเสารเราษทีเลิกสญาเราหรุฬารชสเกาก เรากุม สงสอบานตรเราะากกต่ายภากายระตารุกเตียน

Recommended Solutions

1. Set a Threshold

Filter out any detected result whose confidence is less than or equal to 0.5, so that low-quality text, which can lead to this issue, is not output. For example:

    let recognizedStrings = observations.compactMap { observation -> String? in
        if observation.confidence <= 0.5 {
            return nil
        }
        guard let str = observation.topCandidates(1).first?.string else { return nil }
        return "{text: \(str), confidence: \(observation.confidence)}"
    }

2. Add Khmer Language Support

This issue would never happen if the model were able to detect and recognize images containing Khmer text.

Doc2Text GitHub: https://github.com/seanghay/Doc2Text-Swift
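The second recommendation can be probed from code by asking Vision which languages it reports for a given request configuration. The following is a minimal sketch, not part of the original post: `supportsLanguage(_:in:)` and `isRecognitionSupported(forLanguageCode:)` are hypothetical helper names, and the assumption is that a language whose code is missing from `supportedRecognitionLanguages()` will not be recognized reliably. It does not fix the misidentification; it only lets an app notice the gap before showing OCR output.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Returns true when `supported` contains an identifier for the given
/// language code, e.g. "th" matches both "th" and "th-TH".
/// (Hypothetical helper, for illustration only.)
func supportsLanguage(_ code: String, in supported: [String]) -> Bool {
    supported.contains { identifier in
        identifier == code || identifier.hasPrefix(code + "-")
    }
}

#if canImport(Vision)
/// Asks Vision for the languages it reports at the `.accurate` level and
/// checks whether the expected language is among them.
/// (Hypothetical helper; the list is whatever the running OS returns.)
@available(iOS 15.0, macOS 12.0, tvOS 15.0, *)
func isRecognitionSupported(forLanguageCode code: String) throws -> Bool {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    let supported = try request.supportedRecognitionLanguages()
    return supportsLanguage(code, in: supported)
}
#endif
```

Calling `try isRecognitionSupported(forLanguageCode: "km")` before running OCR would let an app warn the user, or skip text recognition, when Khmer is not reported as supported.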

Machine Learning & AI General Vision VisionKit

0

6

3m

VisionKit - crash after photo taken in VNDocumentCameraViewController in iOS 26 when Liquid Glass enabled

I'm adopting Liquid Glass in iOS 26. When I test document scanning with VNDocumentCameraViewController after enabling Liquid Glass, the app crashes just after a photo is taken. I attached a screenshot taken when it crashed.

The exception output in the Xcode console is:

*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Layout requested for visible navigation bar, <UINavigationBar: 0x1240bde00; frame = (0 117; 390 54); opaque = NO; tintColor = UIExtendedSRGBColorSpace 1 1 0 1; layer = <CALayer: 0x120c21e60>> standardAppearance=0x12407b900 scrollEdgeAppearance=0x12407bb80 compactAppearance=0x12407b880 no-scroll-edge-support, when the top item belongs to a different navigation bar. topItem = <UINavigationItem: 0x1240bd800> style=navigator leftBarButtonItems=0x123d4e5f0 rightBarButtonItems=0x123d4d5a0, navigation bar = <UINavigationBar: 0x107b9ad00; frame = (0 47; 390 54); opaque = NO; autoresize = W; tintColor = UIExtendedSRGBColorSpace 1 1 0 1; layer = <CALayer: 0x120c20150>> delegate=0x10a805200 standardAppearance=0x107b2c300 scrollEdgeAppearance=0x107b2c280 compactAppearance=0x107b2c100, possibly from a client attempt to nest wrapped navigation controllers.'
*** First throw call stack: (0x18e1db994 0x18b0f5814 0x18c092aa0 0x193b18660 0x193a7d540 0x193a7e020 0x1953ec4a0 0x1943b7d78 0x18ed83420 0x18ed82f74 0x18eb83134 0x18eb44c10 0x18eb70bc4 0x18eb7e74c 0x193ac8cd0 0x193ac8c04 0x193ad6afc 0x193ad5f8c 0x27b456560 0x18e12c4cc 0x18e15c0b0 0x18e15bfd8 0x18e133c1c 0x18e132a6c 0x22ed54498 0x193af6ba4 0x193a9fa78 0x193bcb68c 0x102cc2718 0x102cc2688 0x102cc2794 0x18b14ae28)
libc++abi: terminating due to uncaught exception of type NSException

Media Technologies Photos & Camera VisionKit

0

227

6d

VisionKit adds unexpected extra space above “Retake / Keep” controls after scanning

We are facing a UI layout issue while using VisionKit (VNDocumentCameraViewController) for document scanning. After capturing a document, unexpected extra blank space appears above the “Retake” and “Keep” action bar, and also at the bottom of the screen. This extra space is not part of our custom UI and seems to be introduced by VisionKit itself. The issue affects the final scan preview layout and reduces usable screen space, and it is noticeable on all devices, iPhone and iPad alike.

UI Frameworks UIKit VisionKit

0

22

1w

RecognizeDocumentsRequest for receipts

Hi, I'm trying to use the new RecognizeDocumentsRequest from the Vision framework to read a receipt. It looks very promising, since it can read paragraphs and lines and detect data. Unfortunately, so far it seems to read every line on the receipt as a paragraph, and when a line contains a wider gap it creates two paragraphs. Is there perhaps an Apple engineer who knows whether this is expected behaviour, or whether I should file a Feedback for this?

Code setup:

    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: image)
    guard let document = observations.first?.document else {
        return
    }
    for paragraph in document.paragraphs {
        print(paragraph.transcript)
        for data in paragraph.detectedData {
            switch data.match.details {
            case .phoneNumber(let data):
                print("Phone: \(data)")
            case .postalAddress(let data):
                print("Postal: \(data)")
            case .calendarEvent(let data):
                print("Calendar: \(data)")
            case .moneyAmount(let data):
                print("Money: \(data)")
            case .measurement(let data):
                print("Measurement: \(data)")
            default:
                continue
            }
        }
    }

See the attached image for an example of a receipt I'd like to parse. The top 3 lines are the name, street, and postal code + city. These are all separate paragraphs. Checking detectedData does identify the street (2nd line) as a postal address, but not the complete address. Might that be a location thing, since it's a Dutch address?

Lower on the receipt, it also treats the block with "Pomp 1 95 Ongelood" and the lines below it as separate paragraphs, first picking up the left side and then the right side. So it's something like this:

* Pomp 1 Volume Prijs € TOTAAL
* BTW Netto 21.00 % 95 Ongelood 41,90 l 1.949/ 1 81.66 € 14.17 67.49

Machine Learning & AI General Vision VisionKit

3

1

475

Nov ’25

VNDocumentCameraViewController UI issues in iOS 26

We're observing several UI issues with VNDocumentCameraViewController on devices running iOS 26. These screens were functioning correctly in earlier iOS versions.

Issue 1: On the edge correction screen, the top bar now appears as a gray strip beneath the status bar, whereas in previous iOS versions it was positioned at the bottom of the screen. Are there any workarounds for this issue?

Issue 2: The edit buttons and their labels are not clearly visible, affecting usability.

I'm using Xcode 16.4 to build for iOS 26, and the usage is as follows:

    let scanner = VNDocumentCameraViewController()
    scanner.delegate = self
    self.present(scanner, animated: true)

UI Frameworks UIKit iOS VisionKit

2

1

185

Nov ’25

VNDocumentCameraViewController localization issues in iOS 26

We're observing several localization issues with VNDocumentCameraViewController on devices running iOS 26. These localizations were correct in earlier iOS versions. The images show that some labels still appear in English when the device's language is changed to German. The issue can be reproduced using the Notes app.

UI Frameworks UIKit VisionKit

0

40

Nov ’25

iOS 26.1: isSourceTypeAvailable: with UIImagePickerControllerSourceTypeCamera keeps returning true when the camera is unavailable

Prerequisite: after the MDM app issues the command, the camera on the phone is no longer visible (unusable). After upgrading to iOS 26.1, isSourceTypeAvailable: called with UIImagePickerControllerSourceTypeCamera keeps returning true when the camera is unavailable. On iOS 26.0.1 the method behaves normally, returning false when the camera is unavailable and true when it is available.

App & System Services Hardware Camera VisionKit

10

0

642

Nov ’25

Cancel button for VNDocumentCameraViewController is missing on iPadOS 26

The "Cancel" button for VNDocumentCameraViewController is not displayed on iPadOS 26. This issue appears to be specific to iPad, as the button appears correctly on iPhone.

UI Frameworks UIKit VisionKit

3

2

299

Oct ’25

Change VNDocumentCameraViewController 'done' button color on iOS 26

The design of VNDocumentCameraViewController has been updated in iOS 26. The 'done' button that appears once at least one page has been scanned is now blue by default (Liquid Glass). Changing the tintColor of the navigation bar or of the bar button items does not work. So, how does one change the color of this button to match the app's color? Thank you very much!

UI Frameworks General VisionKit

1

0

159

Sep ’25

VisionKit – Tab Bar Button Titles Not Visible and Extra Back Button in iPad Landscape Mode

Description

We observed multiple UI issues in VisionKit (VNDocumentCameraViewController) on iPad devices running iPadOS 26 (from Public Beta 4 onwards):

1. Tab bar button titles are not properly visible due to color/contrast issues.
2. An extra back button appears in the navigation bar when editing a captured image in landscape mode.

These issues seem to be iPadOS 26 bugs, as Apple does not provide public APIs to customize or override VNDocumentCameraViewController. VisionKit relies on private ICDocCam* classes, which are not accessible for modification.

Steps to Reproduce

1. Open the app on an iPad running iPadOS 26 (Public Beta 4 or later).
2. Switch the device to landscape mode.
3. Launch document scanning using VNDocumentCameraViewController.
4. Capture a document and tap Keep Scan.
5. Go to the edit captured image screen.

Observed Behavior

* Tab bar button titles are not clearly visible (color/contrast issue).
* An extra back button is displayed in the navigation bar.

UI Frameworks General VisionKit

0

1

101

Sep ’25

How to obtain the physical memory size of Vision Pro and how much memory is currently available
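A possible starting point, offered as a sketch rather than a confirmed answer: `ProcessInfo.processInfo.physicalMemory` reports the installed RAM in bytes, and `os_proc_available_memory()` reports how much more the current process may allocate. The helper names below are hypothetical, and the assumption that both calls behave on Vision Pro as they do on iOS has not been verified here.

```swift
import Foundation
#if os(iOS) || os(visionOS)
import os
#endif

/// Total physical memory of the device, in bytes.
func physicalMemoryBytes() -> UInt64 {
    ProcessInfo.processInfo.physicalMemory
}

/// Memory the current process can still allocate before reaching its limit,
/// in bytes. Returns nil where os_proc_available_memory() is not offered.
func availableMemoryBytes() -> UInt64? {
    #if os(iOS) || os(visionOS)
    if #available(iOS 13.0, *) {
        return UInt64(os_proc_available_memory())
    }
    return nil
    #else
    return nil
    #endif
}

/// Formats a byte count as whole mebibytes for logging.
func formatMegabytes(_ bytes: UInt64) -> String {
    String(format: "%.0f MB", Double(bytes) / 1_048_576)
}
```

Note that the second value is per process: it reflects the app's remaining headroom, not the free memory of the whole system.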

UI Frameworks SwiftUI Vision VisionKit

0

69

Aug ’25

Making DataScannerViewController work in the Simulator

Before you post "Camera doesn't work on the Simulator": that's no longer true. I've built a solution that makes the Simulator believe an actual hardware device is connected, allowing users to stream the macOS camera to the iOS Simulator (for more info, see RocketSim's documentation: https://docs.rocketsim.app/features/hzQMSrSga7BGWvxdNVdwYs/simulator-camera-support/58tQ5jvevLNSnyUEA7VgAv).

It works for VNDocumentCameraViewController, but when I try opening DataScannerViewController, I immediately run into:

Failed to start scanning: The operation couldn’t be completed. (VisionKit.DataScannerViewController.ScanningUnavailable error 0.)

My question: how does this view controller determine whether scanning is available? Is there a certain capability that the available AVCaptureDevices need to support? Any direction would help me make this work for developers, so they can build apps faster!
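The public signals VisionKit exposes for this are `DataScannerViewController.isSupported` and `DataScannerViewController.isAvailable`. How they are computed internally is not documented, so the sketch below only separates the two cases to show which one the Simulator fails. `ScannerReadiness`, `readiness(isSupported:isAvailable:)`, and `currentScannerReadiness()` are hypothetical names introduced for illustration.

```swift
import Foundation
#if canImport(VisionKit) && os(iOS)
import VisionKit
#endif

/// Why scanning can or cannot start (hypothetical type, for illustration).
enum ScannerReadiness: Equatable {
    case ready
    case unsupportedDevice   // isSupported == false
    case unavailable         // supported, but isAvailable == false
}

/// Combines the two flags into a single diagnosis.
func readiness(isSupported: Bool, isAvailable: Bool) -> ScannerReadiness {
    guard isSupported else { return .unsupportedDevice }
    return isAvailable ? .ready : .unavailable
}

#if canImport(VisionKit) && os(iOS)
/// Reads the current flags from VisionKit on the main actor.
@available(iOS 16.0, *)
@MainActor
func currentScannerReadiness() -> ScannerReadiness {
    readiness(isSupported: DataScannerViewController.isSupported,
              isAvailable: DataScannerViewController.isAvailable)
}
#endif
```

If the result in the Simulator is `.unsupportedDevice`, the check is probably about the hardware rather than about the capture device being streamed, which would narrow down where to look.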

Media Technologies Photos & Camera Vision Camera VisionKit AVFoundation

0

1

321

Jul ’25

About VisionOS HUD

On Apple Vision Pro, I want to implement a HUD page similar to the one in Medivis' SurgicalAR product (i.e. the UI is fixed to the user's field of view rather than placed in space). How should I do it?

Design General VisionKit visionOS

1

0

90

Jun ’25

Various On-Device Frameworks API & ChatGPT

Posting a follow-up question after the WWDC 2025 Machine Learning & AI Frameworks Group Lab on June 12. Regarding the on-device APIs of any of the AI frameworks (Foundation Models, the Vision framework, etc.), is there a response condition or path where the API outsources its input to ChatGPT if the user has allowed this, as Siri does? If so (ignore this if the answer is no): is this handled behind the scenes or by the developer?

Machine Learning & AI Apple Intelligence Machine Learning VisionKit Apple Intelligence

0

259

Jun ’25

videoCaptureQueue crashes the app on iOS 18.4.1

Hi all, I have a problem on iOS 18.4.1. I have an iPhone 16 Pro and an iPad Air, both updated to iOS 18.4.1. I tried the following sample code. However, after the app runs for around 30 seconds to 1 minute, it crashes. On another iPad running iOS 17, the problem does not occur.

https://developer.apple.com/documentation/createml/creating-an-action-classifier-model
https://developer.apple.com/documentation/createml/detecting_human_actions_in_a_live_video_feed#overview

Media Technologies Video ARKit VisionKit AVFoundation

6

0

142

May ’25

Score range of ImageAestheticsScoresObservation in Vision framework

Hi everyone, I'm using the Vision framework’s ImageAestheticsScoresObservation class (https://developer.apple.com/documentation/vision/imageaestheticsscoresobservation). I noticed that the overallScore returned sometimes gives negative values. Could someone confirm whether the expected range of the score is from -1.0 to 1.0? The documentation doesn’t explicitly state the possible score range, so I’d appreciate any clarification or insights. Thanks in advance!

Graphics & Games General Vision VisionKit

0

89

Apr ’25

DataScannerViewController doesn't recognize currency amounts less than 1.00

Hi, DataScannerViewController doesn't recognize currency amounts less than 1.00 (e.g. 0.59 USD, 0.99 EUR, etc.). Why is that, and how can I solve it? This behavior is not described in Apple's documentation. Is there a solution? This is my code:

    func makeUIViewController(context: Context) -> DataScannerViewController {
        let dataScanner = DataScannerViewController(recognizedDataTypes: [
            .text(textContentType: .currency)
        ])
        return dataScanner
    }
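One possible workaround, sketched under an unverified assumption: if the `.currency` content type filter is what drops amounts below 1.00, an app could scan plain text with `.text()` instead and pick out price-like substrings itself. `priceCandidates(in:)` is a hypothetical helper, and its pattern (up to six digits, a dot or comma, two decimals) is an illustrative choice that will not cover every currency format.

```swift
import Foundation

/// Returns substrings of `text` that look like decimal amounts with two
/// fractional digits, such as "0.59" or "0,99".
/// (Hypothetical helper; the pattern is an assumption, not a VisionKit rule.)
func priceCandidates(in text: String) -> [String] {
    let pattern = #"\b\d{1,6}[.,]\d{2}\b"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the NSRange back to a Swift range before slicing.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}
```

In a scanner delegate, the transcript of each recognized text item could be passed to this function, keeping only the items for which it returns at least one candidate.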

Machine Learning & AI General Concurrency Live Text VisionKit

4

0

156

Apr ’25

Unable to Create a Fully Immersive Experience That Hides Other Windows in visionOS App

Description:

I'm developing a travel/panorama viewing app for visionOS that allows users to view 360° panoramic images in an immersive space. When users enter panorama viewing mode, I want to provide a fully immersive experience where the main interface window and Earth 3D globe window are hidden. I've implemented the app following Apple's documentation on Creating Fully Immersive Experiences, but when users enter the immersive space, both the main window and the Earth 3D window remain visible, diminishing the immersive experience.

Implementation Details:

My app has three main components:

* A main content window showing panorama thumbnails
* A 3D globe window (volumetric) showing locations
* An immersive space for viewing 360° panoramas

I'm using .immersionStyle(selection: $panoImageView, in: .full) to create a fully immersive experience, but other windows remain visible.

Relevant Code:

    @main
    struct Travel_ImmersiveApp: App {
        @StateObject private var appModel = AppModel()
        @State private var panoImageView: ImmersionStyle = .full

        var body: some Scene {
            WindowGroup {
                ContentView()
                    .environmentObject(appModel)
            }
            .windowStyle(.automatic)
            .defaultSize(width: 1280, height: 825)

            WindowGroup(id: "Earth") {
                Globe3DView()
                    .environmentObject(appModel)
                    .onAppear {
                        appModel.isGlobeWindowOpen = true
                        appModel.globeWindowOpen = true
                    }
                    .onDisappear {
                        if !appModel.shouldCloseApp {
                            appModel.handleGlobeWindowClose()
                        }
                    }
            }
            .windowStyle(.volumetric)
            .defaultSize(width: 0.8, height: 0.8, depth: 0.8, in: .meters)
            .windowResizability(.contentSize)

            ImmersiveSpace(id: "ImmersiveView") {
                ImmersiveView()
                    .environmentObject(appModel)
            }
            .immersionStyle(selection: $panoImageView, in: .full)
        }
    }

Opening the Immersive Space:

    func getPanoImageAndOpenImmersiveSpace() async {
        appModel.clearMemoryCache()
        do {
            let canView = appModel.canViewImage(image)
            if canView {
                let downloadedImage = try await appModel.getPanoramaImage(for: image) { progress in
                    Task { @MainActor in
                        cardState = .loading(progress: progress)
                    }
                }
                await MainActor.run {
                    appModel.updateCurrentImage(image, panoramaImage: downloadedImage)
                }
                if !appModel.immersiveSpaceOpened {
                    try await openImmersiveSpace(id: "ImmersiveView")
                    await MainActor.run {
                        appModel.immersiveSpaceOpened = true
                        cardState = .normal
                    }
                } else {
                    await MainActor.run {
                        appModel.updateImmersiveView = true
                        cardState = .normal
                    }
                }
            } else {
                await MainActor.run {
                    appModel.errorMessage = "You do not have permission to view this image."
                    cardState = .normal
                }
            }
        } catch {
            // Error handling
        }
    }

Immersive View Implementation:

    struct ImmersiveView: View {
        @EnvironmentObject var appModel: AppModel

        var body: some View {
            RealityView { content in
                let rootEntity = Entity()
                content.add(rootEntity)
                Task {
                    if let selectedImage = appModel.selectedImage, appModel.canViewImage(selectedImage) {
                        await loadPanorama(for: rootEntity)
                    }
                }
            } update: { content in
                if appModel.updateImmersiveView,
                   let selectedImage = appModel.selectedImage,
                   appModel.canViewImage(selectedImage),
                   let rootEntity = content.entities.first {
                    Task {
                        await loadPanorama(for: rootEntity)
                        appModel.updateImmersiveView = false
                    }
                }
            }
            .onAppear {
                print("ImmersiveView appeared")
            }
            .onDisappear {
                appModel.resetImmersiveState()
            }
        }

        // loadPanorama implementation...
    }

What I've Tried

* Set immersionStyle to .full as recommended in the documentation
* Confirmed that the immersive space is properly opened and displaying panoramas
* Verified that the state management for the immersive space is working correctly

Questions

1. How can I ensure that when the user enters the immersive panorama viewing experience, all other windows (main interface and Earth 3D globe) are automatically hidden?
2. Is there a specific API or approach I'm missing to properly implement a fully immersive experience that hides all other windows?
3. Do I need to manually dismiss the windows when opening the immersive space, and if so, what's the best approach for doing this?

Any guidance or sample code would be greatly appreciated. Thank you!

Spatial Computing General VisionKit visionOS

3

0

146

Apr ’25

VNDetectFaceLandmarksRequest & VNFaceLandmarkRegion2D changed in iOS 15

Did something change in face detection / the Vision framework on iOS 15? Using VNDetectFaceLandmarksRequest and reading the VNFaceLandmarkRegion2D to detect eyes is not working on iOS 15 as it did before. I am running the exact same code on an iOS 14 device and an iOS 15 device, and the coordinates are different, as seen in the screenshot. Any ideas?

Machine Learning & AI General VisionKit Vision

6

1

3.3k

Feb ’25

Feature Request – Support for GS1 DataBar Stacked in Vision Framework

Dear Apple Developer Team,

I am writing to request the addition of GS1 DataBar Stacked (both regular and expanded variants) to the barcode symbologies supported by the Vision framework (VNBarcodeSymbology) and VisionKit's DataScannerViewController.

Currently, Vision supports several GS1 DataBar formats, such as:

* VNBarcodeSymbology.gs1DataBar
* VNBarcodeSymbology.gs1DataBarExpanded
* VNBarcodeSymbology.gs1DataBarLimited

However, GS1 DataBar Stacked is widely used in industries such as retail, pharmaceuticals, and logistics, where space constraints prevent the use of the standard GS1 DataBar format. Many businesses rely on this symbology to encode GTINs and other product data, but Apple's barcode scanning API does not explicitly support it.

Why This Feature Matters:

* Essential for Small Packaging: GS1 DataBar Stacked is commonly used on small product labels where a standard linear barcode does not fit.
* Widespread Industry Adoption: Many point-of-sale (POS) systems and inventory management tools require this symbology.
* Improves iOS Adoption for Enterprise Use: Adding support would make Apple’s Vision framework a more viable solution for businesses that currently rely on third-party barcode scanning SDKs.

Feature Request:

Please add GS1 DataBar Stacked and GS1 DataBar Expanded Stacked to the recognized symbologies in:

* VNBarcodeSymbology (for Vision framework)
* DataScannerViewController (for VisionKit)

This addition would enhance the versatility of Apple’s barcode scanning tools and reduce the need for third-party libraries. I appreciate your consideration of this request and would be happy to provide more details or test implementations if needed.

Thank you for your time and support!

Best regards

Machine Learning & AI Core ML Vision VisionKit

2

5

594

Feb ’25

Post

Replies

Boosts

Views

Activity

Posts under VisionKit tag
