Dive into the technical aspects of audio on your device, including codecs, format support, and customization options.

Audio Documentation

Posts under Audio subtopic

Post

Replies

Boosts

Views

Activity

AVAudioSessionCategoryOptionAllowBluetooth incorrectly marked as deprecated in iOS 8 in iOS 26 beta 5
AVAudioSessionCategoryOptionAllowBluetooth is marked as deprecated in iOS 8 in iOS 26 beta 5 when this option was not deprecated in iOS 18.6. I think this is a mistake and the deprecation is in iOS 26. Am I right? It seems that the substitute for this option is "AVAudioSessionCategoryOptionAllowBluetoothHFP". The documentation does not make clear if the behaviour is exactly the same or if any difference should be expected... Has anyone used this option in iOS 26? Should I expect any difference with the current behaviour of "AVAudioSessionCategoryOptionAllowBluetooth"? Thank you.
2
0
328
Aug ’25
iOS 26 Beta Personal Voice bug affecting AVSpeechSynthesizer
I have sent in a feedback report (FB18222398) but I have no idea if anyone has looked at it. I know from past experiences that Apple devs do look at these forums. This applies to each of the betas, 1, 2 and 3. I have created a new Personal Voice with each beta. I create a personal voice in English. When it's done processing, I tap Preview and it says in English what is expected. But after some time, an hour or a day, the language of the voice file changes languages and no longer works properly. If I press Preview it is no longer intelligible. I have a text to speech app and initially the created voice works but then when the language of the file changes, it no longer works. I have run an app on my iphone through Xcode that prints to the console the voices installed on the device with the language. Currently this is the voice file: Voice Identifier: com.apple.speech.personalvoice.AAA9C6F2-9125-475F-BA2F-22C63274991D Language: es-MX and on a second device the same personal voice is in a different language: Voice Identifier: com.apple.speech.personalvoice.AAA9C6F2-9125-475F-BA2F-22C63274991D Language: zh-CN Although, a previous personal voice file that listed as Spanish-Mexican played in English with a Spanish accent or when playing Spanish text, it sounded almost perfect. This current personal voice doesn't do that, and is unintelligible. Previous attempts have converted to Chinese. I hope someone can look into this.
2
0
559
Dec ’25
AVAssetWriterInput Crash on appendSampleBuffer Converting PCM
Overview We are producing audio in real time from an editing application and are trying to put that on an HLS stream. We attempt to submit PCM samples through an audio writer but are getting a crash after a select number of samples have been appended. Depending on the number of audio frames in the PCM buffer, we might get more iterations before the crash but it always has the same traceback (see below). Code The setup is rather simple. We took inspiration from a few sources around the web. NSMutableDictionary *audio = [[NSMutableDictionary alloc] init]; [audio setObject:@(kAudioFormatMPEG4AAC) forKey:AVFormatIDKey]; [audio setObject:[NSNumber numberWithInt:config.audioSampleRate] // 48000 forKey:AVSampleRateKey]; [audio setObject:[NSNumber numberWithInt:config.audioChannels] // 2 forKey:AVNumberOfChannelsKey]; [audio setObject:@160000 forKey:AVEncoderBitRateKey]; m_audioConfig = [[NSDictionary alloc] initWithDictionary:audio]; m_audio = [[AVAssetWriterInput alloc] initWithMediaType:AVMediaTypeAudio outputSettings:m_audioConfig]; AVAudioFrameCount audioFrames = BUFFER_SAMPLES * bCount; AVAudioPCMBuffer *pcmBuffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:m_full.pcmFormat frameCapacity:audioFrames]; pcmBuffer.frameLength = pcmBuffer.frameCapacity; AudioChannelLayout layout; memset(&layout, 0, sizeof(layout)); layout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo; CMFormatDescriptionRef format; OSStatus stats = CMAudioFormatDescriptionCreate( kCFAllocatorDefault, pcmBuffer.format.streamDescription, sizeof(layout), &layout, 0, nil, nil, &format ); for (int i = 0; i < bCount; i++) { AudioPCM pcm; audioCallback->callback(pcm); memcpy(*(pcmBuffer.int16ChannelData) + (bufferSize * i), pcm.data, bufferSize); } size_t samplesConsumed = BUFFER_SAMPLES * bCount; CMSampleBufferRef sampleBuffer; CMSampleTimingInfo timing; timing.duration = CMTimeMake(1, config.audioSampleRate); timing.presentationTimeStamp = presentationTime; timing.decodeTimeStamp = kCMTimeInvalid; OSStatus ostatus = CMSampleBufferCreate( kCFAllocatorDefault, nil, false, nil, nil, format, (CMItemCount)pcmBuffer.frameLength, 1, &timing, 0, nil, &sampleBuffer ); //// ostatus = CMSampleBufferSetDataBufferFromAudioBufferList( sampleBuffer, kCFAllocatorDefault, kCFAllocatorDefault, kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment, pcmBuffer.audioBufferList ); if (ostatus != noErr) { NSLog(@"fill audio sample from buffer list failed: %s", logAudioError(ostatus)); return; } ostatus = CMSampleBufferSetDataReady(sampleBuffer); if (ostatus != noErr) { NSLog(@"set sample buffer ready failed: %s", logAudioError(ostatus)); return; } // Finally we can attach it, then shove the presentation time forward [m_audio appendSampleBuffer:sampleBuffer]; The Crash The crash points towards some level of deallocation when the conversion tooling is done or has enough samples to process an output packet? It's had to say. 0 caulk 0x1a1e9532c caulk::alloc::tiered_allocator<caulk::alloc::size_range_tier<0ul, 1008ul, caulk::alloc::tree_allocator<caulk::alloc::chunk_allocator<caulk::alloc::page_allocator, caulk::alloc::bitmap_allocator, caulk::alloc::embed_block_memory, 16384ul, 16ul, 6ul>>>, caulk::alloc::size_range_tier<1009ul, 256000ul, caulk::alloc::guarded_edges_allocator<caulk::alloc::consolidating_free_map<caulk::alloc::page_allocator, 10485760ul>, 4ul>>, caulk::alloc::tracking_allocator<caulk::alloc::page_allocator>>::deallocate(caulk::alloc::block, unsigned long) + 636 1 AudioToolboxCore 0x1993fbfe4 ExtendedAudioBufferList_Destroy + 112 2 AudioToolboxCore 0x1993d5fe0 std::__1::__optional_destruct_base<ACCodecOutputBuffer, false>::~__optional_destruct_base[abi:ne180100]() + 68 3 AudioToolboxCore 0x1993d5f48 acv2::CodecConverter::~CodecConverter() + 196 4 AudioToolboxCore 0x1993d5e5c acv2::CodecConverter::~CodecConverter() + 16 5 AudioToolboxCore 0x1992574d8 std::__1::vector<std::__1::unique_ptr<acv2::AudioConverterBase, std::__1::default_delete<acv2::AudioConverterBase>>, std::__1::allocator<std::__1::unique_ptr<acv2::AudioConverterBase, std::__1::default_delete<acv2::AudioConverterBase>>>>::__clear[abi:ne180100]() + 84 6 AudioToolboxCore 0x199259acc acv2::AudioConverterChain::RebuildConverterChain(acv2::ChainBuildSettings const&) + 116 7 AudioToolboxCore 0x1992596ec acv2::AudioConverterChain::SetProperty(unsigned int, unsigned int, void const*) + 1808 8 AudioToolboxCore 0x199324acc acv2::AudioConverterV2::setProperty(unsigned int, unsigned int, void const*) + 84 9 AudioToolboxCore 0x199327f08 with_resolved(OpaqueAudioConverter*, caulk::function_ref<int (AudioConverterAPI*)>) + 60 10 AudioToolboxCore 0x1993281e4 AudioConverterSetProperty + 72 11 MediaToolbox 0x1a7566c2c FigSampleBufferProcessorCreateWithAudioCompression + 2296 12 MediaToolbox 0x1a754db08 0x1a70b5000 + 4819720 13 MediaToolbox 0x1a754dab4 FigMediaProcessorCreateForAudioCompressionWithFormatWriter + 100 14 MediaToolbox 0x1a77ebb98 0x1a70b5000 + 7564184 15 MediaToolbox 0x1a7804158 0x1a70b5000 + 7663960 16 MediaToolbox 0x1a7801da0 0x1a70b5000 + 7654816 17 AVFCore 0x1ada530c4 -[AVFigAssetWriterTrack addSampleBuffer:error:] + 192 18 AVFCore 0x1ada55164 -[AVFigAssetWriterAudioTrack _flushPendingSampleBuffersReturningError:] + 500 19 AVFCore 0x1ada55354 -[AVFigAssetWriterAudioTrack addSampleBuffer:error:] + 472 20 AVFCore 0x1ada4ebf0 -[AVAssetWriterInputWritingHelper appendSampleBuffer:error:] + 128 21 AVFCore 0x1ada4c354 -[AVAssetWriterInput appendSampleBuffer:] + 168 22 lib_devapple_hls.dylib 0x115d2c7cc detail::AppleHLSImplementation::audioRuntime() + 1052 23 lib_devapple_hls.dylib 0x115d2d094 void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (detail::AppleHLSImplementation::*)(), detail::AppleHLSImplementation*>>(void*) + 72 24 libsystem_pthread.dylib 0x196e5b2e4 _pthread_start + 136 Any insight would be welcome!
2
0
319
Jun ’25
ApplicationMusicPlayer fails play in macCatalyst 26.3 due to RemotePlayerService crash
I've filed this as FB21446798 but figured I'd post here too. In the first build of macOS 26.3, playback via ApplicationMusicPlayer is completely broken. When starting playback of anything at all, the console shows the following error: applicationController: xpc service connection interrupted Failed to obtain remoteObject: Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service created from an endpoint was invalidated from this process." UserInfo={NSDebugDescription=The connection to service created from an endpoint was invalidated from this process.} Failed to prepareToPlay with error: Error Domain=MPMusicPlayerControllerErrorDomain Code=10 "(null)" UserInfo={NSUnderlyingError=0xc92910ff0 {Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service created from an endpoint was invalidated from this process." UserInfo={NSDebugDescription=The connection to service created from an endpoint was invalidated from this process.}}} In addition, several crash logs for RemotePlayerService are generated, showing my app as the parent process. This issue is 100% repeatable. No matter how I load the queue, whether it’s catalog or library content, any variation I can think of all fails like this. I really hope this can be fixed before 26.3 comes out, otherwise my app will be totally unusable. 😅
2
0
693
Jan ’26
UIDocumentPickerViewController in Audiounit Extension unable to receive touches
Hello, I have an existing AUv3 instrument plugin. In the plug in, users can access files (audio files, song projects) via a UIDocumentPickerViewController In Logic Pro, (and some other hosts, but not all), the document picker is unable to receive touches, while a keyboard case is attached to the iPad. Removing the case (this is an Apple brand iPad case) allows the interactions to resume and allows me to pick files in the usual way. One of my users reports this non-responsive behavior occurs even after disconnecting their keyboard. I have fiddled with entitlements all day, and have determined that is not the issue, since the keyboard disconnection appears to fix it every time for me. Here is my, very boilerplate, presentation code : guard let type = UTType("com.my.type") else { return } let fileBrowser = UIDocumentPickerViewController(forOpeningContentTypes: [type]) fileBrowser.overrideUserInterfaceStyle = .dark fileBrowser.delegate = self fileBrowser.directoryURL = myFileFolderURL() self.present(fileBrowser, animated: true) {
2
0
569
Jul ’25
Video Audio + Speech To Text
Hello, I am wondering if it is possible to have audio from my AirPods be sent to my speech to text service and at the same time have the built in mic audio input be sent to recording a video? I ask because I want my users to be able to say "CAPTURE" and I start recording a video (with audio from the built in mic) and then when the user says "STOP" I stop the recording.
2
0
834
1w
Sound not working on testflight / Appstore
I have a flutter iOS app that has some simple sound FX for button clicks, swipes, etc. In simulator and on real device the sound works fine, but when i upload the app to testflight (and App store) the sound FX don't play. When I upload the app to my phone via xcode I am using the release profile so I don't see what the difference could be. I have also gone through the archive that i uploaded and verified that the sound files are indeed there. I have other flutter apps that use sound but non since the iOS 26 update. I've tried 3 different flutter sound libraries and all face the same issue. Wondering if anyone else is seeing this issue or if I'm missing a simple permission or something that has changed recently? Thanks in advanced
2
0
238
Dec ’25
How to disable/hide Audio Controls on lock screen from WkWebView
Hi, I am trying to remove the audio controls for my app on the lock screen. Since I use WKWebView, there are 3 audio tags in my html and I play and pause em via JS. However, if I do not play any sound since app launch, there are no audio controls on the lock screen. But if I play one of those 3 files (they are even less then 3 Sec sound effects e.g. for buttons) the audio controls appears on lock screen. Note even when the sounds on pause() or not playing they were listed on the lock screen. What I have tried so far without success MPNowPlayingInfoCenter.default().nowPlayingInfo = [:] and ``try audioSession.setCategory(.playback, mode: .default, options: []) try audioSession.setActive(false, options: .notifyOthersOnDeactivation)`` and UIApplication.shared.endReceivingRemoteControlEvents() Another problem is that the app scales with iOS system settings "display zoom". Is there a way to deny it? It is latest Xcode verion 16.3 and iOS 18. I have no background mode in my Capabilities. Nothing worked so far. Has anyone an idea? Greetings
2
0
136
May ’25
iOS 26 HLS Audio Track Display Behavior: EXT-X-MEDIA NAME vs LANGUAGE Attributes
Hello Apple Developer Community, I am seeking clarification on the intended display behavior of HLS audio tracks within the iOS 26 (or current beta) native player, specifically concerning the NAME and LANGUAGE attributes of the EXT-X-MEDIA tag. In our HLS manifests, we define alternative audio tracks using EXT-X-MEDIA tags, like so: #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="ja",NAME="AUDIO-1",DEFAULT=YES,AUTOSELECT=YES,URI="audio_ja.m3u8" #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="ja",NAME="AUDIO-2",URI="audio_en.m3u8" Our observation is that when an audio track is selected and its name is displayed in the native iOS media controls (e.g., Control Center or within a full-screen video player's UI), the value specified in the NAME attribute ("AUDIO-1", "AUDIO-2") does not seem to be used. Instead, the display appears to derive from the LANGUAGE attribute ("ja", "en"), often showing the system's localized string for that language (e.g., "Japanese", "English"). We would like to understand the official or intended behavior regarding this. Is it the expected behavior for the iOS native player to prioritize the LANGUAGE attribute (or its localized equivalent) over the NAME attribute for displaying the selected audio track's label? If this is the intended design, what is the recommended best practice for developers who wish to present a custom, human-readable name for audio tracks (beyond the standard language name) in the native iOS UI? Are there any specific AVPlayer properties or AVMediaSelectionOption considerations that would allow more granular control over this display, or is this entirely managed by the system based on the LANGUAGE attribute? Any insights or official guidance on this behavior in iOS 26 (and potentially previous versions) would be greatly appreciated. Thank you for your time and assistance.
2
0
429
Aug ’25
Audio / Video sync issue on iOS using AVSampleBufferRenderSynchronizer
My current app implements a custom video player, based on a AVSampleBufferRenderSynchronizer synchronising two renderers: an AVSampleBufferDisplayLayer receiving decoded CVPixelBuffer-based video CMSampleBuffers, and an AVSampleBufferAudioRenderer receiving decoded lpcm-based audio CMSampleBuffers. The AVSampleBufferRenderSynchronizer is started when the first image (in presentation order) is decoded and enqueued, using avSynchronizer.setRate(_ rate: Float, time: CMTime), with rate = 1 and time the presentation timestamp of the first decoded image. Presentation timestamps of video and audio sample buffers are consistent, and on most streams, the audio and video are correctly synchronized. However on some network streams, on iOS, the audio and video aren't synchronized, with a time difference that seems to increase with time. On the other hand, with the same player code and network streams on macOS, the synchronization always works fine. This reminds me of something I've read, about cases where an AVSampleBufferRenderSynchronizer could not synchronize audio and video, causing them to run with independent and potentially drifting clocks, but I cannot find it again. So, any help / hints on this sync problem will be greatly appreciated! :)
2
0
1.3k
Apr ’25
AVAudioEngine : Split 1x4 channel bus into 4x1 channel busses?
I'm using a 4 channel USB Audio interface, with 4 microphones, and want to process them through 4 independent effect chains. However the output from AVAudioInputNode is a single 4 channel bus. How can I split this into 4 mono busses? The following code splits the input into 4 copies, and routes them through the effects, but each bus contains all four channels. How can I remap the channels to remove the unwanted channels from the bus? I tried using channelMap on the mixer node but that had no effect. I'm currently using this code primarily on iOS but it should be portable between iOS and MacOS. It would be possible to do this through a Matrix Mixer Node, but that seems completely overkill, for such a basic operation. I'm already using a Matrix Mixer to combine the inputs, and it's not well supported in AVAudioEngine. AVAudioInputNode *inputNode=[engine inputNode]; [inputNode setVoiceProcessingEnabled:NO error:nil]; NSMutableArray *micDestinations=[NSMutableArray arrayWithCapacity:trackCount]; for(i=0;i<trackCount;i++) { fixMicFormat[i]=[AVAudioMixerNode new]; [engine attachNode:fixMicFormat[i]]; // And create reverb/compressor and eq the same way... [engine connect:reverb[i] to:matrixMixerNode fromBus:0 toBus:i format:nil]; [engine connect:eq[i] to:reverb[i] fromBus:0 toBus:0 format:nil]; [engine connect:compressor[i] to:eq[i] fromBus:0 toBus:0 format:nil]; [engine connect:fixMicFormat[i] to:compressor[i] fromBus:0 toBus:0 format:nil]; [micDestinations addObject:[[AVAudioConnectionPoint alloc] initWithNode:fixMicFormat[i] bus:0] ]; } AVAudioFormat *inputFormat = [inputNode outputFormatForBus: 1]; [engine connect:inputNode toConnectionPoints:micDestinations fromBus:1 format:inputFormat];
2
0
314
Oct ’25
Live Translations on VOIP on iOS26
Hi team, With regards to Call (Live) Translations on VOIP: Is it possible to invoke live translations within the app? (without going into the Call System UI) Is it possible to navigate users from app to Call System UI via an API? (and also invoking the new live translations directly) Will Apple support more languages apart from the current ones? (Currently I see 4 supported languages)
1
0
168
Aug ’25
SpeechTranscriber extremely slow (14+ seconds) despite proper locale allocation and optimization
Using the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes 14+ seconds from audio input to first result, making it unusable for real-time applications. Environment iOS: 26.0 Beta Xcode: Beta 5 Device: iPhone 16 pro Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025 Configuration Tested Locale: en-US (properly allocated with AssetInventory.allocate(locale:)) and es-ES Setup: All optimizations applied (preheating, high priority, model retention) I started testing in my own app to replace SFSpeech API and include speech detection but after long fights with documentation (this part is quite terrible TBH) I tested the example (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw same results. I added some logs to check the specific time: 🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio! 🎙️ [20:30:41.532] Listening for transcription results... 🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false) Questions Is this expected performance for iOS 26 Beta, because old SFSpeech is far faster? Are there additional optimization steps for SpeechTranscriber? Should we expect significant performance improvements in later betas?
1
0
230
Aug ’25
Audio clipping - macOS Tahoe 26 - Beta 5
I was testing audio playback from YouTube in Safari, and the sound was clipping heavily. At first, I thought it might be due to the poor quality of my small sound system. However, when I took a screenshot and the screenshot sound effect itself produced a loud clipping noise, it became clear that this is not a mechanical problem with my speakers, nor an issue specific to YouTube or Safari. This appears to be a system-wide audio issue in macOS Tahoe 26 - Beta 5.
1
2
314
Aug ’25
Can't set AVAudio sampleRate and installTap needs bufferSize 4800 at minimum
Two issues: No matter what I set in try audioSession.setPreferredSampleRate(x) the sample rate on both iOS and macOS is always 48000 when the output goes through the speaker, and 24000 when my Airpods connect to an iPhone/iPad. Now, I'm checking the current output loudness to animate a 3D character, using mixerNode.installTap(onBus: 0, bufferSize: y, format: nil) { [weak self] buffer, time in Task { @MainActor in // calculate rms and animate character accordingly but any buffer size under 4800 is just ignored and the buffers I get are 4800 sized. This is ok, when the sampleRate is currently 48000, as 10 samples per second lead to decent visual results. But when AirPods connect, the samplerate is 24000, which means only 5 samples per second, so the character animation looks lame. My AVAudioEngine setup is the following: audioEngine.connect(playerNode, to: pitchShiftEffect, format: format) audioEngine.connect(pitchShiftEffect, to: mixerNode, format: format) audioEngine.connect(mixerNode, to: audioEngine.outputNode, format: nil) Now, I'd be fine if the outputNode runs at whatever if it needs, as long as my tap would get at least 10 samples per second. PS: Specifying my favorite format in the let format = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)! mixerNode.installTap(onBus: 0, bufferSize: y, format: format) doesn't change anything either
1
0
454
Aug ’25
MusicKit: Multichannel Dolby Atmos Limited to Stereo Output - Is This Intended Behavior?
I'm experiencing a significant limitation with MusicKit's Dolby Atmos implementation on macOS and would appreciate clarification on whether this is intended behavior or if there are solutions available. When streaming Dolby Atmos content through MusicKit's ApplicationMusicPlayer, the output is limited to 2-channel stereo, even when: Audio MIDI Setup is configured for 7.1.4 (12-channel) output The same tracks play in full multichannel through the native Apple Music app Dolby Atmos is set to "Automatic" in Apple Music preferences Please let me know if there is anyway to enable this. If not, is this documented anywhere? Thanks!
1
2
305
Aug ’25
In Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running XCode 16.3 on macOS 15.4 and I've seen this when running in the iOS simulator and in a macOS app run from XCode. I've also seen this behaviour with 3 different audio files. Nothing in the documentation says that the speechRecognitionMetadata property on an SFSpeechRecognitionResult will be nil until isFinal, but that's the behaviour I'm seeing. I've stripped my class down to the following: private var isAuthed = false // I call this in a .task {} in my SwiftUI View public func requestSpeechRecognizerPermission() { SFSpeechRecognizer.requestAuthorization { authStatus in Task { self.isAuthed = authStatus == .authorized } } } public func transcribe(from url: URL) { guard isAuthed else { return } let locale = Locale(identifier: "en-US") let recognizer = SFSpeechRecognizer(locale: locale) let recognitionRequest = SFSpeechURLRecognitionRequest(url: url) // the behaviour occurs whether I set this to true or not, I recently set // it to true to see if it made a difference recognizer?.supportsOnDeviceRecognition = true recognitionRequest.shouldReportPartialResults = true recognitionRequest.addsPunctuation = true recognizer?.recognitionTask(with: recognitionRequest) { (result, error) in guard result != nil else { return } if result!.isFinal { //speechRecognitionMetadata is not nil } else { //speechRecognitionMetadata is nil } } } } Further, and this isn't documented either, the SFTranscriptionSegment values don't have correct timestamp and duration values until isFinal. The values aren't all zero, but they don't align with the timing in the audio and they change to accurate values when isFinal is true. The transcription otherwise "works", in that I get transcription text before isFinal and if I wait for isFinal the segments are correct and speechRecognitionMetadata is filled with values. The context here is I'm trying to generate a transcription that I can then highlight the spoken sections of as audio plays and I'm thinking I must be just trying to use the Speech framework in a way it does not work. I got my concept working if I pre-process the audio (i.e. run it through until isFinal and save the results I need to json), but being able to do even a rougher version of it 'on the fly' - which requires segments to have the right timestamp/duration before isFinal - is perhaps impossible?
1
0
171
Jul ’25
AVAudioSessionCategoryOptionAllowBluetooth incorrectly marked as deprecated in iOS 8 in iOS 26 beta 5
AVAudioSessionCategoryOptionAllowBluetooth is marked as deprecated in iOS 8 in iOS 26 beta 5 when this option was not deprecated in iOS 18.6. I think this is a mistake and the deprecation is in iOS 26. Am I right? It seems that the substitute for this option is "AVAudioSessionCategoryOptionAllowBluetoothHFP". The documentation does not make clear if the behaviour is exactly the same or if any difference should be expected... Has anyone used this option in iOS 26? Should I expect any difference with the current behaviour of "AVAudioSessionCategoryOptionAllowBluetooth"? Thank you.
Replies
2
Boosts
0
Views
328
Activity
Aug ’25
iOS 26 Beta Personal Voice bug affecting AVSpeechSynthesizer
I have sent in a feedback report (FB18222398) but I have no idea if anyone has looked at it. I know from past experiences that Apple devs do look at these forums. This applies to each of the betas, 1, 2 and 3. I have created a new Personal Voice with each beta. I create a personal voice in English. When it's done processing, I tap Preview and it says in English what is expected. But after some time, an hour or a day, the language of the voice file changes languages and no longer works properly. If I press Preview it is no longer intelligible. I have a text to speech app and initially the created voice works but then when the language of the file changes, it no longer works. I have run an app on my iphone through Xcode that prints to the console the voices installed on the device with the language. Currently this is the voice file: Voice Identifier: com.apple.speech.personalvoice.AAA9C6F2-9125-475F-BA2F-22C63274991D Language: es-MX and on a second device the same personal voice is in a different language: Voice Identifier: com.apple.speech.personalvoice.AAA9C6F2-9125-475F-BA2F-22C63274991D Language: zh-CN Although, a previous personal voice file that listed as Spanish-Mexican played in English with a Spanish accent or when playing Spanish text, it sounded almost perfect. This current personal voice doesn't do that, and is unintelligible. Previous attempts have converted to Chinese. I hope someone can look into this.
Replies
2
Boosts
0
Views
559
Activity
Dec ’25
AVAssetWriterInput Crash on appendSampleBuffer Converting PCM
Overview We are producing audio in real time from an editing application and are trying to put that on an HLS stream. We attempt to submit PCM samples through an audio writer but are getting a crash after a select number of samples have been appended. Depending on the number of audio frames in the PCM buffer, we might get more iterations before the crash but it always has the same traceback (see below). Code The setup is rather simple. We took inspiration from a few sources around the web. NSMutableDictionary *audio = [[NSMutableDictionary alloc] init]; [audio setObject:@(kAudioFormatMPEG4AAC) forKey:AVFormatIDKey]; [audio setObject:[NSNumber numberWithInt:config.audioSampleRate] // 48000 forKey:AVSampleRateKey]; [audio setObject:[NSNumber numberWithInt:config.audioChannels] // 2 forKey:AVNumberOfChannelsKey]; [audio setObject:@160000 forKey:AVEncoderBitRateKey]; m_audioConfig = [[NSDictionary alloc] initWithDictionary:audio]; m_audio = [[AVAssetWriterInput alloc] initWithMediaType:AVMediaTypeAudio outputSettings:m_audioConfig]; AVAudioFrameCount audioFrames = BUFFER_SAMPLES * bCount; AVAudioPCMBuffer *pcmBuffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:m_full.pcmFormat frameCapacity:audioFrames]; pcmBuffer.frameLength = pcmBuffer.frameCapacity; AudioChannelLayout layout; memset(&layout, 0, sizeof(layout)); layout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo; CMFormatDescriptionRef format; OSStatus stats = CMAudioFormatDescriptionCreate( kCFAllocatorDefault, pcmBuffer.format.streamDescription, sizeof(layout), &layout, 0, nil, nil, &format ); for (int i = 0; i < bCount; i++) { AudioPCM pcm; audioCallback->callback(pcm); memcpy(*(pcmBuffer.int16ChannelData) + (bufferSize * i), pcm.data, bufferSize); } size_t samplesConsumed = BUFFER_SAMPLES * bCount; CMSampleBufferRef sampleBuffer; CMSampleTimingInfo timing; timing.duration = CMTimeMake(1, config.audioSampleRate); timing.presentationTimeStamp = presentationTime; timing.decodeTimeStamp = kCMTimeInvalid; OSStatus ostatus = CMSampleBufferCreate( kCFAllocatorDefault, nil, false, nil, nil, format, (CMItemCount)pcmBuffer.frameLength, 1, &timing, 0, nil, &sampleBuffer ); //// ostatus = CMSampleBufferSetDataBufferFromAudioBufferList( sampleBuffer, kCFAllocatorDefault, kCFAllocatorDefault, kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment, pcmBuffer.audioBufferList ); if (ostatus != noErr) { NSLog(@"fill audio sample from buffer list failed: %s", logAudioError(ostatus)); return; } ostatus = CMSampleBufferSetDataReady(sampleBuffer); if (ostatus != noErr) { NSLog(@"set sample buffer ready failed: %s", logAudioError(ostatus)); return; } // Finally we can attach it, then shove the presentation time forward [m_audio appendSampleBuffer:sampleBuffer]; The Crash The crash points towards some level of deallocation when the conversion tooling is done or has enough samples to process an output packet? It's had to say. 0 caulk 0x1a1e9532c caulk::alloc::tiered_allocator<caulk::alloc::size_range_tier<0ul, 1008ul, caulk::alloc::tree_allocator<caulk::alloc::chunk_allocator<caulk::alloc::page_allocator, caulk::alloc::bitmap_allocator, caulk::alloc::embed_block_memory, 16384ul, 16ul, 6ul>>>, caulk::alloc::size_range_tier<1009ul, 256000ul, caulk::alloc::guarded_edges_allocator<caulk::alloc::consolidating_free_map<caulk::alloc::page_allocator, 10485760ul>, 4ul>>, caulk::alloc::tracking_allocator<caulk::alloc::page_allocator>>::deallocate(caulk::alloc::block, unsigned long) + 636 1 AudioToolboxCore 0x1993fbfe4 ExtendedAudioBufferList_Destroy + 112 2 AudioToolboxCore 0x1993d5fe0 std::__1::__optional_destruct_base<ACCodecOutputBuffer, false>::~__optional_destruct_base[abi:ne180100]() + 68 3 AudioToolboxCore 0x1993d5f48 acv2::CodecConverter::~CodecConverter() + 196 4 AudioToolboxCore 0x1993d5e5c acv2::CodecConverter::~CodecConverter() + 16 5 AudioToolboxCore 0x1992574d8 std::__1::vector<std::__1::unique_ptr<acv2::AudioConverterBase, std::__1::default_delete<acv2::AudioConverterBase>>, std::__1::allocator<std::__1::unique_ptr<acv2::AudioConverterBase, std::__1::default_delete<acv2::AudioConverterBase>>>>::__clear[abi:ne180100]() + 84 6 AudioToolboxCore 0x199259acc acv2::AudioConverterChain::RebuildConverterChain(acv2::ChainBuildSettings const&) + 116 7 AudioToolboxCore 0x1992596ec acv2::AudioConverterChain::SetProperty(unsigned int, unsigned int, void const*) + 1808 8 AudioToolboxCore 0x199324acc acv2::AudioConverterV2::setProperty(unsigned int, unsigned int, void const*) + 84 9 AudioToolboxCore 0x199327f08 with_resolved(OpaqueAudioConverter*, caulk::function_ref<int (AudioConverterAPI*)>) + 60 10 AudioToolboxCore 0x1993281e4 AudioConverterSetProperty + 72 11 MediaToolbox 0x1a7566c2c FigSampleBufferProcessorCreateWithAudioCompression + 2296 12 MediaToolbox 0x1a754db08 0x1a70b5000 + 4819720 13 MediaToolbox 0x1a754dab4 FigMediaProcessorCreateForAudioCompressionWithFormatWriter + 100 14 MediaToolbox 0x1a77ebb98 0x1a70b5000 + 7564184 15 MediaToolbox 0x1a7804158 0x1a70b5000 + 7663960 16 MediaToolbox 0x1a7801da0 0x1a70b5000 + 7654816 17 AVFCore 0x1ada530c4 -[AVFigAssetWriterTrack addSampleBuffer:error:] + 192 18 AVFCore 0x1ada55164 -[AVFigAssetWriterAudioTrack _flushPendingSampleBuffersReturningError:] + 500 19 AVFCore 0x1ada55354 -[AVFigAssetWriterAudioTrack addSampleBuffer:error:] + 472 20 AVFCore 0x1ada4ebf0 -[AVAssetWriterInputWritingHelper appendSampleBuffer:error:] + 128 21 AVFCore 0x1ada4c354 -[AVAssetWriterInput appendSampleBuffer:] + 168 22 lib_devapple_hls.dylib 0x115d2c7cc detail::AppleHLSImplementation::audioRuntime() + 1052 23 lib_devapple_hls.dylib 0x115d2d094 void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (detail::AppleHLSImplementation::*)(), detail::AppleHLSImplementation*>>(void*) + 72 24 libsystem_pthread.dylib 0x196e5b2e4 _pthread_start + 136 Any insight would be welcome!
Replies
2
Boosts
0
Views
319
Activity
Jun ’25
ApplicationMusicPlayer fails play in macCatalyst 26.3 due to RemotePlayerService crash
I've filed this as FB21446798 but figured I'd post here too. In the first build of macOS 26.3, playback via ApplicationMusicPlayer is completely broken. When starting playback of anything at all, the console shows the following error: applicationController: xpc service connection interrupted Failed to obtain remoteObject: Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service created from an endpoint was invalidated from this process." UserInfo={NSDebugDescription=The connection to service created from an endpoint was invalidated from this process.} Failed to prepareToPlay with error: Error Domain=MPMusicPlayerControllerErrorDomain Code=10 "(null)" UserInfo={NSUnderlyingError=0xc92910ff0 {Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service created from an endpoint was invalidated from this process." UserInfo={NSDebugDescription=The connection to service created from an endpoint was invalidated from this process.}}} In addition, several crash logs for RemotePlayerService are generated, showing my app as the parent process. This issue is 100% repeatable. No matter how I load the queue, whether it’s catalog or library content, any variation I can think of all fails like this. I really hope this can be fixed before 26.3 comes out, otherwise my app will be totally unusable. 😅
Replies
2
Boosts
0
Views
693
Activity
Jan ’26
Usage of Apple Music Feed leads to error 500
Hello, I'm trying to receive parquet files using the example that provided in documentation. I've done all required steps but receive constantly error 500 with "Upstream Service Error". By looking into the issues list, seems this error exists for months. Is it possible to get it working?
Replies
2
Boosts
0
Views
159
Activity
May ’25
UIDocumentPickerViewController in Audiounit Extension unable to receive touches
Hello, I have an existing AUv3 instrument plugin. In the plug in, users can access files (audio files, song projects) via a UIDocumentPickerViewController In Logic Pro, (and some other hosts, but not all), the document picker is unable to receive touches, while a keyboard case is attached to the iPad. Removing the case (this is an Apple brand iPad case) allows the interactions to resume and allows me to pick files in the usual way. One of my users reports this non-responsive behavior occurs even after disconnecting their keyboard. I have fiddled with entitlements all day, and have determined that is not the issue, since the keyboard disconnection appears to fix it every time for me. Here is my, very boilerplate, presentation code : guard let type = UTType("com.my.type") else { return } let fileBrowser = UIDocumentPickerViewController(forOpeningContentTypes: [type]) fileBrowser.overrideUserInterfaceStyle = .dark fileBrowser.delegate = self fileBrowser.directoryURL = myFileFolderURL() self.present(fileBrowser, animated: true) {
Replies
2
Boosts
0
Views
569
Activity
Jul ’25
Video Audio + Speech To Text
Hello, I am wondering if it is possible to have audio from my AirPods be sent to my speech to text service and at the same time have the built in mic audio input be sent to recording a video? I ask because I want my users to be able to say "CAPTURE" and I start recording a video (with audio from the built in mic) and then when the user says "STOP" I stop the recording.
Replies
2
Boosts
0
Views
834
Activity
1w
Sound not working on testflight / Appstore
I have a flutter iOS app that has some simple sound FX for button clicks, swipes, etc. In simulator and on real device the sound works fine, but when i upload the app to testflight (and App store) the sound FX don't play. When I upload the app to my phone via xcode I am using the release profile so I don't see what the difference could be. I have also gone through the archive that i uploaded and verified that the sound files are indeed there. I have other flutter apps that use sound but non since the iOS 26 update. I've tried 3 different flutter sound libraries and all face the same issue. Wondering if anyone else is seeing this issue or if I'm missing a simple permission or something that has changed recently? Thanks in advanced
Replies
2
Boosts
0
Views
238
Activity
Dec ’25
How to disable/hide Audio Controls on lock screen from WkWebView
Hi, I am trying to remove the audio controls for my app on the lock screen. Since I use WKWebView, there are 3 audio tags in my html and I play and pause em via JS. However, if I do not play any sound since app launch, there are no audio controls on the lock screen. But if I play one of those 3 files (they are even less then 3 Sec sound effects e.g. for buttons) the audio controls appears on lock screen. Note even when the sounds on pause() or not playing they were listed on the lock screen. What I have tried so far without success MPNowPlayingInfoCenter.default().nowPlayingInfo = [:] and ``try audioSession.setCategory(.playback, mode: .default, options: []) try audioSession.setActive(false, options: .notifyOthersOnDeactivation)`` and UIApplication.shared.endReceivingRemoteControlEvents() Another problem is that the app scales with iOS system settings "display zoom". Is there a way to deny it? It is latest Xcode verion 16.3 and iOS 18. I have no background mode in my Capabilities. Nothing worked so far. Has anyone an idea? Greetings
Replies
2
Boosts
0
Views
136
Activity
May ’25
SIGABORT with ExtAudioFileWrite and .m4a file
Hi, I am getting into a trap. Please check stack-trace, howto fix this? regards, Joël stack-trace with ExtAudioFileWrite
Replies
2
Boosts
0
Views
928
Activity
Jun ’25
iOS 26 HLS Audio Track Display Behavior: EXT-X-MEDIA NAME vs LANGUAGE Attributes
Hello Apple Developer Community, I am seeking clarification on the intended display behavior of HLS audio tracks within the iOS 26 (or current beta) native player, specifically concerning the NAME and LANGUAGE attributes of the EXT-X-MEDIA tag. In our HLS manifests, we define alternative audio tracks using EXT-X-MEDIA tags, like so: #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="ja",NAME="AUDIO-1",DEFAULT=YES,AUTOSELECT=YES,URI="audio_ja.m3u8" #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="ja",NAME="AUDIO-2",URI="audio_en.m3u8" Our observation is that when an audio track is selected and its name is displayed in the native iOS media controls (e.g., Control Center or within a full-screen video player's UI), the value specified in the NAME attribute ("AUDIO-1", "AUDIO-2") does not seem to be used. Instead, the display appears to derive from the LANGUAGE attribute ("ja", "en"), often showing the system's localized string for that language (e.g., "Japanese", "English"). We would like to understand the official or intended behavior regarding this. Is it the expected behavior for the iOS native player to prioritize the LANGUAGE attribute (or its localized equivalent) over the NAME attribute for displaying the selected audio track's label? If this is the intended design, what is the recommended best practice for developers who wish to present a custom, human-readable name for audio tracks (beyond the standard language name) in the native iOS UI? Are there any specific AVPlayer properties or AVMediaSelectionOption considerations that would allow more granular control over this display, or is this entirely managed by the system based on the LANGUAGE attribute? Any insights or official guidance on this behavior in iOS 26 (and potentially previous versions) would be greatly appreciated. Thank you for your time and assistance.
Replies
2
Boosts
0
Views
429
Activity
Aug ’25
Audio / Video sync issue on iOS using AVSampleBufferRenderSynchronizer
My current app implements a custom video player, based on a AVSampleBufferRenderSynchronizer synchronising two renderers: an AVSampleBufferDisplayLayer receiving decoded CVPixelBuffer-based video CMSampleBuffers, and an AVSampleBufferAudioRenderer receiving decoded lpcm-based audio CMSampleBuffers. The AVSampleBufferRenderSynchronizer is started when the first image (in presentation order) is decoded and enqueued, using avSynchronizer.setRate(_ rate: Float, time: CMTime), with rate = 1 and time the presentation timestamp of the first decoded image. Presentation timestamps of video and audio sample buffers are consistent, and on most streams, the audio and video are correctly synchronized. However on some network streams, on iOS, the audio and video aren't synchronized, with a time difference that seems to increase with time. On the other hand, with the same player code and network streams on macOS, the synchronization always works fine. This reminds me of something I've read, about cases where an AVSampleBufferRenderSynchronizer could not synchronize audio and video, causing them to run with independent and potentially drifting clocks, but I cannot find it again. So, any help / hints on this sync problem will be greatly appreciated! :)
Replies
2
Boosts
0
Views
1.3k
Activity
Apr ’25
AVAudioEngine : Split 1x4 channel bus into 4x1 channel busses?
I'm using a 4 channel USB Audio interface, with 4 microphones, and want to process them through 4 independent effect chains. However the output from AVAudioInputNode is a single 4 channel bus. How can I split this into 4 mono busses? The following code splits the input into 4 copies, and routes them through the effects, but each bus contains all four channels. How can I remap the channels to remove the unwanted channels from the bus? I tried using channelMap on the mixer node but that had no effect. I'm currently using this code primarily on iOS but it should be portable between iOS and MacOS. It would be possible to do this through a Matrix Mixer Node, but that seems completely overkill, for such a basic operation. I'm already using a Matrix Mixer to combine the inputs, and it's not well supported in AVAudioEngine. AVAudioInputNode *inputNode=[engine inputNode]; [inputNode setVoiceProcessingEnabled:NO error:nil]; NSMutableArray *micDestinations=[NSMutableArray arrayWithCapacity:trackCount]; for(i=0;i<trackCount;i++) { fixMicFormat[i]=[AVAudioMixerNode new]; [engine attachNode:fixMicFormat[i]]; // And create reverb/compressor and eq the same way... [engine connect:reverb[i] to:matrixMixerNode fromBus:0 toBus:i format:nil]; [engine connect:eq[i] to:reverb[i] fromBus:0 toBus:0 format:nil]; [engine connect:compressor[i] to:eq[i] fromBus:0 toBus:0 format:nil]; [engine connect:fixMicFormat[i] to:compressor[i] fromBus:0 toBus:0 format:nil]; [micDestinations addObject:[[AVAudioConnectionPoint alloc] initWithNode:fixMicFormat[i] bus:0] ]; } AVAudioFormat *inputFormat = [inputNode outputFormatForBus: 1]; [engine connect:inputNode toConnectionPoints:micDestinations fromBus:1 format:inputFormat];
Replies
2
Boosts
0
Views
314
Activity
Oct ’25
Live Translations on VOIP on iOS26
Hi team, With regards to Call (Live) Translations on VOIP: Is it possible to invoke live translations within the app? (without going into the Call System UI) Is it possible to navigate users from app to Call System UI via an API? (and also invoking the new live translations directly) Will Apple support more languages apart from the current ones? (Currently I see 4 supported languages)
Replies
1
Boosts
0
Views
168
Activity
Aug ’25
SpeechTranscriber extremely slow (14+ seconds) despite proper locale allocation and optimization
Using the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes 14+ seconds from audio input to first result, making it unusable for real-time applications. Environment iOS: 26.0 Beta Xcode: Beta 5 Device: iPhone 16 pro Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025 Configuration Tested Locale: en-US (properly allocated with AssetInventory.allocate(locale:)) and es-ES Setup: All optimizations applied (preheating, high priority, model retention) I started testing in my own app to replace SFSpeech API and include speech detection but after long fights with documentation (this part is quite terrible TBH) I tested the example (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw same results. I added some logs to check the specific time: 🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio! 🎙️ [20:30:41.532] Listening for transcription results... 🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false) Questions Is this expected performance for iOS 26 Beta, because old SFSpeech is far faster? Are there additional optimization steps for SpeechTranscriber? Should we expect significant performance improvements in later betas?
Replies
1
Boosts
0
Views
230
Activity
Aug ’25
Audio clipping - macOS Tahoe 26 - Beta 5
I was testing audio playback from YouTube in Safari, and the sound was clipping heavily. At first, I thought it might be due to the poor quality of my small sound system. However, when I took a screenshot and the screenshot sound effect itself produced a loud clipping noise, it became clear that this is not a mechanical problem with my speakers, nor an issue specific to YouTube or Safari. This appears to be a system-wide audio issue in macOS Tahoe 26 - Beta 5.
Replies
1
Boosts
2
Views
314
Activity
Aug ’25
Should AVAudioFormat be Sendable?
AVAudioFormat has no Swift concurrency annotations but the documentation states "Instances of this class are immutable." This made me always assume it was safe to pass AVAudioFormat instances around. Is this the case? If so can it be marked as Sendable? Am I missing something?
Replies
1
Boosts
1
Views
710
Activity
Aug ’25
Can't set AVAudio sampleRate and installTap needs bufferSize 4800 at minimum
Two issues: No matter what I set in try audioSession.setPreferredSampleRate(x) the sample rate on both iOS and macOS is always 48000 when the output goes through the speaker, and 24000 when my Airpods connect to an iPhone/iPad. Now, I'm checking the current output loudness to animate a 3D character, using mixerNode.installTap(onBus: 0, bufferSize: y, format: nil) { [weak self] buffer, time in Task { @MainActor in // calculate rms and animate character accordingly but any buffer size under 4800 is just ignored and the buffers I get are 4800 sized. This is ok, when the sampleRate is currently 48000, as 10 samples per second lead to decent visual results. But when AirPods connect, the samplerate is 24000, which means only 5 samples per second, so the character animation looks lame. My AVAudioEngine setup is the following: audioEngine.connect(playerNode, to: pitchShiftEffect, format: format) audioEngine.connect(pitchShiftEffect, to: mixerNode, format: format) audioEngine.connect(mixerNode, to: audioEngine.outputNode, format: nil) Now, I'd be fine if the outputNode runs at whatever if it needs, as long as my tap would get at least 10 samples per second. PS: Specifying my favorite format in the let format = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)! mixerNode.installTap(onBus: 0, bufferSize: y, format: format) doesn't change anything either
Replies
1
Boosts
0
Views
454
Activity
Aug ’25
MusicKit: Multichannel Dolby Atmos Limited to Stereo Output - Is This Intended Behavior?
I'm experiencing a significant limitation with MusicKit's Dolby Atmos implementation on macOS and would appreciate clarification on whether this is intended behavior or if there are solutions available. When streaming Dolby Atmos content through MusicKit's ApplicationMusicPlayer, the output is limited to 2-channel stereo, even when: Audio MIDI Setup is configured for 7.1.4 (12-channel) output The same tracks play in full multichannel through the native Apple Music app Dolby Atmos is set to "Automatic" in Apple Music preferences Please let me know if there is anyway to enable this. If not, is this documented anywhere? Thanks!
Replies
1
Boosts
2
Views
305
Activity
Aug ’25
In Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running XCode 16.3 on macOS 15.4 and I've seen this when running in the iOS simulator and in a macOS app run from XCode. I've also seen this behaviour with 3 different audio files. Nothing in the documentation says that the speechRecognitionMetadata property on an SFSpeechRecognitionResult will be nil until isFinal, but that's the behaviour I'm seeing. I've stripped my class down to the following: private var isAuthed = false // I call this in a .task {} in my SwiftUI View public func requestSpeechRecognizerPermission() { SFSpeechRecognizer.requestAuthorization { authStatus in Task { self.isAuthed = authStatus == .authorized } } } public func transcribe(from url: URL) { guard isAuthed else { return } let locale = Locale(identifier: "en-US") let recognizer = SFSpeechRecognizer(locale: locale) let recognitionRequest = SFSpeechURLRecognitionRequest(url: url) // the behaviour occurs whether I set this to true or not, I recently set // it to true to see if it made a difference recognizer?.supportsOnDeviceRecognition = true recognitionRequest.shouldReportPartialResults = true recognitionRequest.addsPunctuation = true recognizer?.recognitionTask(with: recognitionRequest) { (result, error) in guard result != nil else { return } if result!.isFinal { //speechRecognitionMetadata is not nil } else { //speechRecognitionMetadata is nil } } } } Further, and this isn't documented either, the SFTranscriptionSegment values don't have correct timestamp and duration values until isFinal. The values aren't all zero, but they don't align with the timing in the audio and they change to accurate values when isFinal is true. The transcription otherwise "works", in that I get transcription text before isFinal and if I wait for isFinal the segments are correct and speechRecognitionMetadata is filled with values. The context here is I'm trying to generate a transcription that I can then highlight the spoken sections of as audio plays and I'm thinking I must be just trying to use the Speech framework in a way it does not work. I got my concept working if I pre-process the audio (i.e. run it through until isFinal and save the results I need to json), but being able to do even a rougher version of it 'on the fly' - which requires segments to have the right timestamp/duration before isFinal - is perhaps impossible?
Replies
1
Boosts
0
Views
171
Activity
Jul ’25