Audio

Dive into the technical aspects of audio on your device, including codecs, format support, and customization options.

Electron app + Apple Music playback: queue works, playback does not start. Looking for guidance.

Hi everyone. I'm building a macOS-first desktop app where music drives the app's behavior loop. The app is currently an Electron prototype.

The blocker: we're testing Apple Music inside an Electron app. MusicKit JS authorization works, catalog search works, and setting the queue works, but playback does not actually start in Electron.

What we tried:
1. Created Apple Developer / MusicKit credentials.
2. Generated Apple Music developer tokens successfully.
3. Retrieved a Music User Token through MusicKit JS.
4. Confirmed Apple Music API calls work.
5. Confirmed /v1/test and /me/storefront return 200 OK.
6. Built a local HTTP auth/playback window inside Electron instead of using file://.
7. Tested music.setQueue() with both { song: songId } and { url: catalogUrl }.

In Electron, the queue loads correctly: queueEmpty=false, queueLength=1, volume=1, playbackRate=1. But after music.play(), playbackTime stays at 0 and no audio plays.

Then we ran the same MusicKit playback test in normal Chrome using the same token, same local origin, same catalog track, and same queue descriptor. Chrome played successfully and playbackTime advanced.

We also checked Electron directly and found navigator.requestMediaKeySystemAccess is missing, so our current theory is that stock Electron lacks the protected media / EME support Apple Music web playback needs.

Important: we are not trying to bypass DRM or extract audio. We just want a legitimate way for a user-authorized macOS app to control Apple Music playback or observe playback state.

What we're considering next:
1. Use the native macOS Music app as the playback engine and control it from our app.
2. Test AppleScript / Automation permissions for play, pause, next, current track, player state, etc.
3. Later, possibly build a native Swift helper using Apple Music / MediaPlayer APIs and communicate with Electron over IPC.
4. Avoid relying on Electron MusicKit JS playback if this is a known dead end.

Questions:
1. Has anyone successfully made Apple Music / MusicKit JS playback work inside Electron?
2. Is the missing EME/protected-media layer the expected blocker here?
3. Is controlling the native macOS Music app the more realistic path?
4. Any gotchas with AppleScript, MusicKit native APIs, or Electron + native helper architecture for this use case?

Any pointers from people who have dealt with Electron + Apple Music / protected media would be appreciated.
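As a rough illustration of the "control the native Music app" option, here is a minimal Swift sketch of a helper that builds one-line AppleScript commands and runs them through /usr/bin/osascript. The names MusicCommand, appleScriptSource, and runMusicCommand are hypothetical and not from the post. The sketch assumes the helper is allowed to send Apple events to Music (the user has granted Automation permission); it does not cover the Electron IPC side.

```swift
import Foundation

/// A few commands from the Music app's scripting dictionary.
/// The enum itself is a hypothetical helper for this sketch.
enum MusicCommand: String {
    case play
    case pause
    case nextTrack = "next track"
    case playerState = "get player state"
    case currentTrackName = "get name of current track"
}

/// Builds the one-line AppleScript source for a command.
func appleScriptSource(for command: MusicCommand) -> String {
    return "tell application \"Music\" to \(command.rawValue)"
}

/// Runs the script through osascript and returns its trimmed standard output.
/// macOS only; throws if the process cannot be launched.
func runMusicCommand(_ command: MusicCommand) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/osascript")
    process.arguments = ["-e", appleScriptSource(for: command)]
    let stdout = Pipe()
    process.standardOutput = stdout
    try process.run()
    // Read to end of file first, then wait, so a full pipe cannot block the child.
    let data = stdout.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
}
```

On a Mac, something like `try runMusicCommand(.playerState)` would be the polling primitive an Electron front end could call over IPC; whether polling is acceptable for the app's "behavior loop" is a design question the sketch leaves open.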

Media Technologies Audio Apple Music API MusicKit MusicKit JS Audio

328

May ’26

AudioHardwareCreateProcessTap delivers all-zero buffers while system audio is audible

Summary

Using AudioHardwareCreateProcessTap + AudioHardwareCreateAggregateDevice for system audio capture. During long sessions, the AudioDeviceIOProc callback continues firing normally but every PCM sample is exactly 0.0f, while the system is producing audible output.

Environment

Field | Value
------------------|------------------
macOS | 26.5 Beta
Hardware | MacBook Air (M2)
API | AudioHardwareCreateProcessTap + AudioHardwareCreateAggregateDevice
Tap | CATapDescription, processes = [], .unmuted, private
Format | 48,000 Hz, Float32, interleaved stereo
Aggregate anchor | kAudioAggregateDeviceMainSubDeviceKey = current default output UID

Observed behavior

After running normally for several minutes, the stream transitions into an all-zero state:
- AudioDeviceIOProc continues to fire at expected cadence
- Frame count, timestamps (mHostTime, mSampleTime), and mDataByteSize all look normal
- AudioBufferList pointers are valid
- Every sample in every buffer is exactly 0.0f
- Other apps are still producing audible output through the same output device
- The condition may self-recover or persist until the session is stopped

Confirmed via RMS logging both inside the IOProc and after the ring buffer consumer: data is zero on delivery, not introduced downstream.

Example: 51-minute session on MacBook Air M2
- Segment 1 (~7 min): Three all-zero periods: 60 s, 53 s, 141 s. Real PCM briefly returned between them.
- Segment 2 (~44 min): Two all-zero periods: 16 min 3 s, 3 min 8 s.

IOProc cadence, timestamp deltas, default output UID, and kAudioDevicePropertyDeviceIsRunningSomewhere all remained normal throughout.

What I have ruled out
- Actual silence: User was in an active video call and could hear participants through the output device.
- Default output device change: Monitored kAudioHardwarePropertyDefaultOutputDevice; no change during affected periods.
- IOProc stall: Heartbeat and kAudioDevicePropertyDeviceIsRunningSomewhere remained normal.
- Aggregate device destroyed: AudioObjectGetPropertyData on the aggregate UID continued returning the expected device.
- Tap descriptor misconfiguration: The same tap produces valid PCM earlier in the same session, so this is not a startup-time issue.

Why detection is hard

All-zero buffers from a broken tap are indistinguishable from legitimate silence (muted participant, waiting room, paused media). kAudioProcessPropertyIsRunningOutput reports whether a process has active output IO, not whether it is contributing non-zero samples; a muted Zoom call still reports true.

Possible correlations
- Sample-rate renegotiation on the output device (44.1 kHz ↔ 48 kHz) when another app changes output
- Bluetooth device state changes (AirPods sleep/wake) where UID stays the same
- MacBook Air more frequently affected than MacBook Pro
- Always occurs after extended uptime; the first few minutes are consistently clean

Current workaround

Full teardown and rebuild restores real PCM. Restarting the IOProc alone or recreating only the aggregate device is not reliable; both the Process Tap and Aggregate Device must be destroyed and recreated.
1. AudioDeviceStop
2. AudioDeviceDestroyIOProcID
3. AudioHardwareDestroyAggregateDevice
4. AudioHardwareDestroyProcessTap
5. AudioHardwareCreateProcessTap
6. AudioHardwareCreateAggregateDevice
7. Create + start new IOProc

Applying this automatically is risky because the broken state cannot be reliably distinguished from legitimate silence.

Questions
1. Expected failure mode? Can a Process Tap continue delivering zero-filled buffers while the system output is audible? Is this expected under certain device or routing conditions?
2. Detection signal? Is there any HAL property, notification, or diagnostic counter that distinguishes "sources are genuinely silent" from "the tap data path has stopped receiving the real mix"?
3. Targeted recovery? Is there a supported way to re-anchor or reset the tap data path without destroying and recreating both objects?
4. Full rebuild as intended workaround? If so, it would help to confirm this so developers can converge on a consistent approach.
5. Mixer activity signal? kAudioProcessPropertyIsRunningOutput reflects IO registration, not sample contribution. Is there any AudioProcess property that indicates a process is currently delivering non-zero audio to the system mixer?
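The RMS logging and all-zero-period measurements described above could be sketched roughly as follows. This is a minimal illustration in plain Swift, meant for the consumer side of the ring buffer rather than the real-time IOProc. ZeroRunDetector, its threshold parameter, and the rms helper are hypothetical names. As the post points out, a long all-zero run only marks a candidate for a broken tap; it cannot by itself rule out legitimate silence.

```swift
import Foundation

/// Tracks how long a stream of interleaved Float32 buffers has been exactly zero.
/// Hypothetical helper; intended for use off the real-time thread.
struct ZeroRunDetector {
    let sampleRate: Double
    let alertAfterSeconds: Double
    private(set) var zeroFrames = 0

    init(sampleRate: Double, alertAfterSeconds: Double) {
        self.sampleRate = sampleRate
        self.alertAfterSeconds = alertAfterSeconds
    }

    /// Feeds one buffer. Returns true once the current all-zero run
    /// has lasted at least `alertAfterSeconds`.
    mutating func ingest(_ samples: [Float], channels: Int) -> Bool {
        if samples.allSatisfy({ $0 == 0 }) {
            zeroFrames += samples.count / channels
        } else {
            zeroFrames = 0 // any non-zero sample ends the run
        }
        return Double(zeroFrames) / sampleRate >= alertAfterSeconds
    }
}

/// Root-mean-square level of a buffer, for logging alongside the detector.
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}
```

A consumer could combine the detector's flag with other evidence (for example, whether any process currently reports kAudioProcessPropertyIsRunningOutput) before deciding to run the full teardown; that combination is a design assumption, not a documented detection signal.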

Media Technologies Audio AudioToolbox Core Audio AVFoundation

363

May ’26

CarPlay HID transport buttons remap to call-control during continuous mic capture (no opt-out API)

Hello,

I am developing Uniq Intercom, a voice-only group communication app for motorcyclists (always-on intercom over WebRTC, used continuously for multi-hour rides). I am seeking guidance on an iOS audio session and CarPlay HID interaction I have not been able to resolve through documented APIs.

Problem: As soon as my app activates the microphone (yellow recording indicator visible), iOS appears to classify the app as a real-time communication participant and CarPlay re-routes the steering-wheel / handlebar HID transport buttons (left / right / ok) from the media-control role to the call-control role (answer/decline). Because I do not register a CallKit / LiveCommunicationKit call (the session is a continuous group voice channel, not a discrete telephony call), there is no call object for those buttons to act upon, so they effectively become inert.

Why this matters: Motorcyclists rely on the intercom for 4–6 hour rides. CarPlay is now built into a growing number of modern motorcycles and, with aftermarket display units, is available on virtually any bike. Any rider who uses a voice communication platform alongside it (Uniq Intercom, WhatsApp Call) currently runs into this same handlebar button remap. With the buttons inert, the rider's only remaining option is to reach for the motorcycle's touchscreen to skip a track or change navigation, which is unsafe.

The exact same remap behavior occurs during a real WhatsApp or Phone call, but for those the remap is correct (answer/decline maps to a real call). For continuous voice apps without a CallKit-style discrete call, no equivalent path exists today. As both an iOS developer and a motorcyclist, I would very much like to see this resolved; solving it would meaningfully improve safety for every rider using an iPhone with CarPlay.
Configurations I have tested (all reproduce the symptom on iOS 18+ / 26 with wireless CarPlay): AVAudioSession.Category.playAndRecord + .voiceChat mode + various option combinations (duckOthers, mixWithOthers, allowBluetoothHFP, allowBluetoothA2DP, defaultToSpeaker) Same category with .videoChat mode (which @livekit/react-native defaults to) Same category with .default mode (re-applied after setAudioModeAsync to defeat any framework override) — confirmed Mode = Default for ~2 s window in audiomxd log before WebRTC's setActive cycle returned mode to .voiceChat. Buttons remained remapped during the .default window. Disabling MPRemoteCommandCenter and clearing MPNowPlayingInfoCenter.default().nowPlayingInfo JS-side override of WebRTC's global RTCAudioSessionConfiguration via @livekit/react-native's AudioSession.setAppleAudioConfiguration({audioMode: 'default'}) bridge, applied both before connect and after setAudioModeAsync to defeat library overrides In every case the audiomxd system log confirms our session goes active (Mode = VoiceChat or Default, Recording = YES), and CarPlay HID buttons are immediately remapped to call-control. The middle "OK" button remains functional because it is not part of the call-control mapping — confirming the buttons are not blocked, only re-purposed. The remap occurs the instant the iOS recording indicator appears, regardless of audio session mode. This led me to conclude the trigger is not audio session mode but the combination of microphone permission + active session + (likely) the AUVoiceIO unit instantiated by WebRTC. I cannot find a public API path to suppress this classification while maintaining the always-on continuous voice channel. My questions: Is there an entitlement or API that allows an app with active microphone capture to declare itself as a non-call media participant, keeping CarPlay HID transport buttons in the media role? 
Is AVAudioSession.setPrefersEchoCancelledInput(_:) (iOS 18+) the intended path for retaining platform AEC under .default mode without the focus-engine "communication priority" marking? Documentation is sparse on its CarPlay arbitration implications. Does the PushToTalk framework affect HID arbitration differently from playAndRecord + voiceChat? Our continuous-channel UX does not fit the PTT transmit-on-press model, but understanding the contrast would help. If no current API exists, is this something the iOS Audio team would consider for future SDKs? Solving this would meaningfully improve safety for motorcycle and adventure-sport users on iOS. Thank you for your time and any guidance you can offer. — Emre Erkaya / Uniq Intercom

Media Technologies Audio CarPlay AVAudioSession AVFoundation

403

May ’26

AVAudioEngineConfigurationChangeNotification received while engine is running

The documentation for AVAudioEngineConfigurationChangeNotification states:

"When the audio engine's I/O unit observes a change to the audio input or output hardware's channel count or sample rate, the audio engine stops, uninitializes itself, and issues this notification."

A user of my framework has reported a crash during notification processing on iOS 26.4 when the main mixer node is disconnected from the output node in order to reestablish the connection with a different format. The failing precondition is:

com.apple.coreaudio.avfaudio: required condition is false: !IsRunning()

The report was observed on iPhone 16 / iOS 26.4.2, ARM64, TestFlight build. The backtrace contains:

[Last Exception Backtrace]
3 AVFAudio AVAudioEngineGraph::_DisconnectInput AVAudioEngineGraph.mm:2728
4 AVFAudio -[AVAudioEngine disconnectNodeInput:bus:] AVAudioEngine.mm:155
5 SFB sfb::AudioPlayer::handleAudioEngineConfigurationChange AudioPlayer.mm:2247

[Thread 18 Crashed]
9 SFB sfb::AudioPlayer::handleAudioEngineConfigurationChange AudioPlayer.mm:2212
…
14 AVFAudio IOUnitConfigurationChanged

Has the behavior for AVAudioEngineConfigurationChangeNotification changed in iOS 26.4? It's simple enough to call [engine_ stop] in the notification handler, but the documentation states this shouldn't be necessary. I've not observed a similar crash on previous iOS versions.

Media Technologies Audio AVAudioEngine

360

May ’26

macOS system audio capture low volume with multichannel soundcards

I am building an app that uses system audio capture. This works well for 2-channel sound cards, but as soon as the interface has more than 2 outputs, the capture volume is very low. Does anyone have tips on where to look? Capturing before the mix or after the mix makes no difference.

Media Technologies Audio

188

May ’26

AVAudioEngine startAndReturnError is now failing

I have a keyboard in my iOS Morse Code app that has always been able to play audio via AVAudioEngine. Recently it has been failing to produce audio. I see that startAndReturnError: is now failing with this error:

Error Domain=com.apple.coreaudio.avfaudio Code=268435459 "(null)" UserInfo={failed call=err = PerformCommand(*outputNode, kAUInitialize, NULL, 0)}

What's going on? Have keyboards lost the ability to play audio?

Here's how I set things up:

    _engine = [AVAudioEngine new];
    _prefs = [[NSUserDefaults alloc] initWithSuiteName:kSharedAppGroupID];
    AVAudioMixerNode* mainMixerNode = _engine.mainMixerNode;
    AVAudioOutputNode* outputNode = _engine.outputNode;
    AVAudioFormat* format = [outputNode inputFormatForBus:0];
    AVAudioFormat* inputFormat = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32 sampleRate:44100 channels:1 interleaved:NO];
    self.srcNode = [[AVAudioSourceNode alloc] initWithRenderBlock:^OSStatus(BOOL* _Nonnull isSilence, const AudioTimeStamp* _Nonnull timestamp, AVAudioFrameCount frameCount, AudioBufferList* _Nonnull outputData) {
        // This block builds the data, but is never called, so it is not the culprit.
    }];
    [_engine attachNode:self.srcNode];
    [_engine connect:self.srcNode to:mainMixerNode format:inputFormat];
    [_engine connect:mainMixerNode to:_engine.outputNode format:nil];
    [_engine prepare];
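When chasing an error like Code=268435459, it can help to print the code in hexadecimal and, where the bytes happen to be printable, as a four-character code, since Core Audio headers define many errors that way. The following Swift sketch is a generic debugging aid. describeStatus is a hypothetical helper name, and the sketch does not claim to identify the cause of this particular failure.

```swift
import Foundation

/// Renders a status code as hex, plus a four-character code when all four
/// bytes are printable ASCII. Hypothetical debugging helper.
func describeStatus(_ status: Int32) -> String {
    let bits = UInt32(bitPattern: status)
    let hex = "0x" + String(bits, radix: 16, uppercase: true)
    let shifts: [UInt32] = [24, 16, 8, 0]
    let bytes = shifts.map { UInt8((bits >> $0) & 0xFF) }
    if bytes.allSatisfy({ $0 >= 0x20 && $0 < 0x7F }) {
        let fourCC = String(decoding: bytes, as: UTF8.self)
        return "\(hex) ('\(fourCC)')"
    }
    return hex
}
```

For the code in this post, `describeStatus(268435459)` yields the hex form 0x10000003, which is not a printable four-character code; searching system headers for the hex value is one way to continue from there.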

Media Technologies Audio AVAudioSession AVAudioEngine

290

Apr ’26

Radio stations unable to play on Android with MusicKit SDK

Radio stations are currently not supported by the MusicKit SDK for Android. The SDK has not been updated for years, and it lacks some fairly major Apple Music features.

Media Technologies Audio MusicKit

482

Apr ’26

MusicKit developer token returns 401 on all catalog endpoints

My MusicKit developer token returns 401 (empty body) on every Apple Music API catalog endpoint. I've tried two different keys; both fail identically.

Setup:
- Team ID: K79RSBVM9G
- Key ID: URNQV5UDGB (MusicKit enabled, associated with Media ID media.audio.explore.musickit)
- Apple Developer Program License Agreement accepted April 14, 2026

Token format (matches docs exactly):
Header: {"alg":"ES256","kid":"URNQV5UDGB"}
Payload: {"iss":"K79RSBVM9G","iat":<now>,"exp":<now+15777000>}

What works: /v1/storefronts/us returns 200

What fails: Every catalog endpoint returns 401 with empty body:
- /v1/catalog/us/search?types=artists&term=test
- /v1/catalog/us/artists/5920832
- /v1/catalog/us/genres
- /v1/test

The token self-verifies (signature is valid). I've tried with and without typ:"JWT", with the origin claim, and with a manually signed JWT bypassing the jsonwebtoken library. Same 401 every time. What am I missing?
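One way to double-check the header and claims described above is to decode the token's first two segments locally and compare them against the expected shape. The Swift sketch below does only that; it does not verify the ES256 signature and cannot tell whether the key itself is accepted by the service. The function names are hypothetical, and the 15777000-second limit is taken from the payload quoted in the post.

```swift
import Foundation

/// Decodes one base64url JWT segment into a JSON dictionary.
func decodeJWTSegment(_ segment: String) -> [String: Any]? {
    var base64 = segment
        .replacingOccurrences(of: "-", with: "+")
        .replacingOccurrences(of: "_", with: "/")
    while base64.count % 4 != 0 { base64.append("=") }
    guard let data = Data(base64Encoded: base64),
          let object = try? JSONSerialization.jsonObject(with: data) else { return nil }
    return object as? [String: Any]
}

/// Reads a JSON number regardless of how Foundation bridged it.
func numericValue(_ value: Any?) -> Double? {
    if let double = value as? Double { return double }
    if let int = value as? Int { return Double(int) }
    return (value as? NSNumber)?.doubleValue
}

/// Lists structural problems in a developer token (hypothetical checker;
/// signature verification is out of scope for this sketch).
func developerTokenProblems(_ token: String, now: Date = Date()) -> [String] {
    let parts = token.split(separator: ".", omittingEmptySubsequences: false).map { String($0) }
    guard parts.count == 3,
          let header = decodeJWTSegment(parts[0]),
          let claims = decodeJWTSegment(parts[1]) else {
        return ["token is not a decodable three-part JWT"]
    }
    var problems: [String] = []
    if (header["alg"] as? String) != "ES256" { problems.append("alg is not ES256") }
    if (header["kid"] as? String) == nil { problems.append("kid is missing") }
    if (claims["iss"] as? String) == nil { problems.append("iss is missing") }
    guard let iat = numericValue(claims["iat"]), let exp = numericValue(claims["exp"]) else {
        problems.append("iat or exp is missing or not a number")
        return problems
    }
    let nowSeconds = now.timeIntervalSince1970
    if iat > nowSeconds { problems.append("iat is in the future") }
    if exp <= nowSeconds { problems.append("token has expired") }
    if exp - nowSeconds > 15_777_000 { problems.append("exp is more than 15777000 s ahead") }
    return problems
}
```

An empty result only means the token is well-formed; a 401 with a well-formed token would point toward something outside the token's structure, which this check cannot diagnose.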

Media Technologies Audio Apple Music API MusicKit MusicKit JS

364

Apr ’26

Is Push to Talk appropriate for a voice-based interactive assistant (not a walkie-talkie app)?

Hello,

Looking for guidance from Apple engineers or developers who have used Push to Talk in production.

I am developing an iOS application called Companion AI / Theo Voice, designed for elderly users. The goal of the app is to provide a simple, voice-first interactive assistant that enables:
- natural voice interaction (no typing required)
- daily assistance (reminders, well-being, conversation)
- bidirectional voice communication (the user can immediately respond by voice)

How it works

The app operates in two main modes.

Conversation mode:
- the user opens the app
- the assistant speaks
- the user replies naturally by voice

Proactive mode:
- in specific useful situations (e.g. medication reminders, check-ins) the app initiates a voice interaction
- the user can respond immediately

Important constraints
- there is no continuous listening
- the microphone is only active during interactions
- users can disable proactive interactions
- frequency is limited and user-controlled

Question

We are considering using the Push to Talk framework in order to:
- allow the app to be awakened in the background
- initiate a voice interaction
- enable immediate voice response from the user

Would this usage be considered aligned with the intended use of Push to Talk? Are there any specific recommendations to ensure compliance with App Store Review Guidelines?

Thank you very much for your guidance.

Media Technologies Audio

229

Apr ’26

Bug: Channels erroneously populated when sending audio from an iPhone to a Linux gadget audio device.

I have a device which uses Linux gadget audio to receive audio input via USB, exposing 24 capture channels. This device works well with Mac, Windows, and Android phones. However, when sending audio from an iPhone (both USB-C iPhones and Lightning iPhones using an official Apple Lightning to USB adapter) I am seeing strange behaviour. Audio which is sent from the iPhone to any one of inputs 12, 19, 20, 21, or 22 appears in all of those channels, rather than only the channel to which audio is routed.

I have confirmed on my Linux device that these channels are not being erroneously populated by the software running on that device; the issue is visible in audio recorded directly from the gadget using arecord, meaning it is present in the audio being sent from the iPhone. I have confirmed that the gadget channel mask is correct for 24 channel audio (0xFFFFFF). As said above, audio routed to this device from any non-iPhone device (Mac, Windows, Android) works fine.

The only sensible conclusion seems to be that the iPhone is populating the additional channels erroneously due to some bug in CoreAudio's handling of gadget audio devices. I would appreciate any insight on this from Apple developers, or from anyone else who has come across this issue and found a workaround.

Media Technologies Audio Core Audio

362

Apr ’26

How to use the SpeechDetector Module

I am trying to use the SpeechDetector module in the Speech framework along with SpeechTranscriber, and it is giving me this error:

Cannot convert value of type 'SpeechDetector' to expected element type 'Array.ArrayLiteralElement' (aka 'any SpeechModule')

Below is how I am using it:

    let speechDetector = Speech.SpeechDetector()
    let transcriber = SpeechTranscriber(locale: Locale.current, transcriptionOptions: [], reportingOptions: [.volatileResults], attributeOptions: [.audioTimeRange])
    speechAnalyzer = try SpeechAnalyzer(modules: [transcriber, speechDetector])

Media Technologies Audio Speech

743

Apr ’26

SpeechAnalyzer speech to text wwdc sample app

I am using the sample app from: https://developer.apple.com/videos/play/wwdc2025/277/?time=763

I installed this on an iPhone 15 Pro with iOS 26 beta 1. I was able to get good transcription with it. The app did crash sometimes when transcribing and I was going to post here with the details. I then installed iOS beta 2 and uninstalled the sample app. Now every time I try to run the sample app on the 15 Pro I get this message:

SpeechAnalyzer: Input loop ending with error: Error Domain=SFSpeechErrorDomain Code=10 "Cannot use modules with unallocated locales [en_US (fixed en_US)]" UserInfo={NSLocalizedDescription=Cannot use modules with unallocated locales [en_US (fixed en_US)]}

I can't continue our work towards using SpeechAnalyzer with this error. I have set breakpoints on all the catch handlers and none of them catches this error. My phone region is "United States".

Media Technologies Audio Speech

3.1k

Apr ’26

tvOS: Background audio + local caching works on Simulator but stops on real Apple TV device

Description: I’m developing a tvOS app using SwiftUI where we play background audio (music) in the Welcome screen, with support for offline playback via local caching. 🔹 Feature Overview App fetches audio metadata from API Starts streaming audio (HLS .m3u8) immediately In parallel, downloads the raw audio file (.mp3) Once download completes: Switches playback from streaming → local file On next launch (offline mode), app plays audio from local storage 🔹 Issue This flow works perfectly on the Simulator, but on a real Apple TV device: Audio plays for a few seconds (2–5 sec) and then stops Especially after switching from streaming → local file No explicit AVPlayer error is logged Playback sometimes stops after UI updates or periodic API refresh 🔹 Implementation Details Using AVPlayer with AVPlayerItem Background audio controlled via a shared manager (singleton) Files stored locally using FileManager (currently using .cachesDirectory) Switching playback using: player.replaceCurrentItem(with: AVPlayerItem(url: localURL)) player.play() 🔹 Observations Works reliably on Simulator On device: Playback stops silently Seems related to lifecycle, buffering, or file access No issues when continuously streaming (without switching to local) 🔹 Questions Is there any limitation or known issue with AVPlayer when switching from streaming (HLS) to local file playback on tvOS? Are there specific requirements for playing locally cached media files on tvOS (e.g., file location, permissions, or sandbox behavior)? What is the recommended storage location and size limit for cached media files on tvOS? We understand tvOS has limited persistent storage Is .cachesDirectory the correct approach for this use case? Are there known differences in AVPlayer behavior between Simulator and real Apple TV devices (especially regarding buffering or lifecycle)? What is the recommended approach for implementing offline background audio on tvOS apps? 
🔹 Goal We want to implement a reliable system where: Audio streams initially Seamlessly switches to local file after download Continues playing without interruption Supports offline playback on subsequent launches Any guidance or best practices would be greatly appreciated. Thank you!
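One thing that could be checked before calling replaceCurrentItem is whether the local file is actually complete at the moment of the switch. The Swift sketch below shows such a check plus a move from a temporary download path to the final path. It rests on assumptions not confirmed by the post: that the expected byte count is known (for example from the download response), and that an incomplete or missing file is a plausible cause. The function names are hypothetical.

```swift
import Foundation

/// Size of the file at `url` in bytes, or nil if it does not exist.
func fileSize(at url: URL) -> Int64? {
    guard let attributes = try? FileManager.default.attributesOfItem(atPath: url.path) else {
        return nil
    }
    return (attributes[.size] as? NSNumber)?.int64Value
}

/// True only when the file exists and has exactly the expected byte count.
/// `expectedBytes` is an assumed input, e.g. taken from the download response.
func isCompleteDownload(at url: URL, expectedBytes: Int64) -> Bool {
    guard expectedBytes > 0, let size = fileSize(at: url) else { return false }
    return size == expectedBytes
}

/// Moves a finished download into its final location, replacing any stale copy,
/// so the player never opens a file that is still being written.
func installDownload(from tempURL: URL, to finalURL: URL) throws {
    let fileManager = FileManager.default
    if fileManager.fileExists(atPath: finalURL.path) {
        try fileManager.removeItem(at: finalURL)
    }
    try fileManager.moveItem(at: tempURL, to: finalURL)
}
```

With this in place, the switch to the local AVPlayerItem would happen only after installDownload succeeds and isCompleteDownload returns true; on a later offline launch the same check can decide whether to fall back to streaming.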

Media Technologies Audio tvOS SwiftUI

308

Apr ’26

AVAudioSession : Audio issues when recording the screen in an app that changes IOBufferDuration on iOS 26.

Among Japanese end users, audio issues during screen recording, primarily in game applications, have become a topic of discussion. We have confirmed that the trigger for this issue is highly likely to be related to changes to IOBufferDuration.

When setPreferredIOBufferDuration is used and the IOBufferDuration is set to a value smaller than the default, audio problems occur in the recorded screen capture video. Audio playback is performed using AudioUnit (RemoteIO).

https://developer.apple.com/documentation/avfaudio/avaudiosession/setpreferrediobufferduration(_:)?language=objc

This issue was not observed on iOS 18, and it appears to have started occurring after upgrading to iOS 26. We provide an audio middleware solution, and we had incorporated changes to IOBufferDuration into our product to achieve low-latency audio playback. As a result, developers using our product as well as their end users are being affected by this issue. We kindly request that this issue be investigated and addressed in a future update.

Media Technologies Audio

868

Apr ’26

SpeechAnalyzer error "asset not found after attempted download" for certain languages

I am trying to use the new SpeechAnalyzer framework in my Mac app, and am running into an issue for some languages. When I call AssetInstallationRequest.downloadAndInstall() for some languages, it throws an error: Error Domain=SFSpeechErrorDomain Code=1 "transcription.ar asset not found after attempted download." The ".ar" appears to be the language code, which in this case was Arabic. When I call AssetInventory.status(forModules:) before attempting the download, it is giving me a status of "downloading" (perhaps from an earlier attempt?). If this language was completely unsupported, I would expect it to return a status of "unsupported", so I'm not sure what's going on here. For other languages (Polish, for example) SpeechTranscriber.supportedLocale(equivalentTo:) is returning nil, so that seems like a clearly unsupported language. But I can't tell if the languages I'm trying, like Arabic, are supported and something is going wrong, or if this error represents something I can work around. Here's the relevant section of code. The error is thrown from downloadAndInstall(), so I never even get as far as setting up the SpeechAnalyzer itself. 
    private func setUpAnalyzer() async throws {
        guard let sourceLanguage else {
            throw Error.languageNotSpecified
        }
        guard let locale = await SpeechTranscriber.supportedLocale(equivalentTo: Locale(identifier: sourceLanguage.rawValue)) else {
            throw Error.unsupportedLanguage
        }
        let transcriber = SpeechTranscriber(locale: locale, preset: .progressiveTranscription)
        self.transcriber = transcriber
        let reservedLocales = await AssetInventory.reservedLocales
        if !reservedLocales.contains(locale) && reservedLocales.count == AssetInventory.maximumReservedLocales {
            if let oldest = reservedLocales.last {
                await AssetInventory.release(reservedLocale: oldest)
            }
        }
        do {
            let status = await AssetInventory.status(forModules: [transcriber])
            print("status: \(status)")
            if let installationRequest = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
                try await installationRequest.downloadAndInstall()
            }
        }
        ...

Media Technologies Audio Speech

1.3k

Apr ’26

SpeechAnalyzer > AnalysisContext lack of documentation

I'm using the new SpeechAnalyzer framework to detect certain commands and want to improve accuracy by giving context. AnalysisContext seems to be the solution for this, but I couldn't find any usage example, so I want to check whether I'm using it correctly.

    let context = AnalysisContext()
    context.contextualStrings = [
        AnalysisContext.ContextualStringsTag("commands"): [
            "set speed level",
            "set jump level",
            "increase speed",
            "decrease speed",
            ...
        ],
        AnalysisContext.ContextualStringsTag("vocabulary"): [
            "speed",
            "jump",
            ...
        ]
    ]
    try await analyzer.setContext(context)

With this implementation, it still gives outputs like "Set some speed level", "It's speed level", etc. Also, is it possible to make it expect a number after those commands, in order to eliminate results like "set some speed level to" (where "to" should have been "two")?
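Independently of what AnalysisContext can do, one app-level workaround is to post-process the transcript: tolerate filler words between command words and map number homophones such as "to" onto digits. The Swift sketch below illustrates that idea. parseCommand and the numberWords table are hypothetical, the homophone list is deliberately tiny, and none of this is a SpeechAnalyzer feature.

```swift
import Foundation

/// Spoken forms that should be read as numbers after a command.
/// Hypothetical and incomplete; extend for the levels the app supports.
let numberWords: [String: Int] = [
    "one": 1, "two": 2, "to": 2, "too": 2,
    "three": 3, "four": 4, "for": 4, "five": 5,
]

/// Finds the first known command whose words appear in order in the transcript
/// (extra words in between are tolerated) and reads a trailing level, if any.
func parseCommand(_ transcript: String, commands: [String]) -> (command: String, level: Int?)? {
    let words = transcript.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
    for command in commands {
        let commandWords = command.split(separator: " ").map { String($0) }
        var matched = 0
        var lastMatch = -1
        for (position, word) in words.enumerated() where matched < commandWords.count {
            if word == commandWords[matched] {
                matched += 1
                lastMatch = position
            }
        }
        guard matched == commandWords.count else { continue }
        // Only words after the command are considered for the level.
        let tail = words[(lastMatch + 1)...]
        let level = tail.compactMap { Int($0) ?? numberWords[$0] }.last
        return (command, level)
    }
    return nil
}
```

For example, `parseCommand("Set some speed level to", commands: ["set speed level", "increase speed"])` resolves to the "set speed level" command with level 2, while a transcript such as "It's speed level" still fails to match because the word "set" never appears.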

Media Technologies Audio Speech

766

Apr ’26

CarPlay outputs no audio

I have an application that includes custom artwork for the album cover and text details, set up with the MPRemoteCommandCenter.shared() reference. I need the user to have a full-featured "now playing" display to see all of this. So far I cannot find a set of parameters for AVAudioSession.setCategory() that both routes audio successfully and displays the full-featured now-playing deck.

If I use .playAndRecord, the audio I send out plays on the radio, but the now-playing deck is empty and nothing I do with the command center seems to change that.

If I instead use .playback, I cannot use the .defaultToSpeaker option, which is the only way I've found to make the "now-playing" navigation button appear so that the full-featured deck will display. In that case setCategory() fails with an error saying .defaultToSpeaker is only available with .playAndRecord, so some default or intermediate state is entered: I see the full-featured deck, but no audio goes out to the radio.

What combination is supposed to be used here? Or is this more likely a problem with thread use (@MainActor) and/or some ordering of operations that I've overlooked?

Media Technologies Audio

262

Apr ’26

AVAudioFile.read extremely slow after seeking in FLAC and MP3 files

I'm developing an audio player app that uses AVAudioFile to read PCM data from various formats. I'm experiencing severe performance issues when seeking in FLAC, while other compressed formats (M4A/AAC) work correctly. I don't intend to use them in my app, but I also tested mp3 files just by curiosity and they also have this issue. Environment: macOS 26 (Tahoe) Xcode 26.3 Apple Silicon (M1) The issue: After setting AVAudioFile.framePosition to a position mid-file, the subsequent call to AVAudioFile.read(into:frameCount:) blocks for an unreasonable amount of time for FLAC and MP3 files. The delay scales linearly with the seek target, seeking near the beginning is fast, seeking toward the end is proportionally slower, which suggests the decoder is decoding linearly from the beginning of the file rather than using any seek index. (My app deals with “images” of Audio CDs ripped as a single long audio file.) The issue is particularly severe when reading files from an SMB network share (server on Ethernet, client on Wi-Fi with the access point ~2 meters away in line of sight). Quick Benchmark results: I tested with the same 75-minute audio content (16-bit/44.1 kHz stereo, 200,502,708 frames) encoded in five formats, seeking to the midpoint. Over SMB (Local Network, Server on Ethernet, Client on WiFi): Format | Seek + Read Time ----------|------------------ WAV | 0.007 s AIFF | 0.009 s Apple | 0.015 s Lossless | MP3 | 9.2 s FLAC | 30.2 s Locally (MacBook Air M1 SSD) : Format | Seek + Read Time ----------|------------------ WAV | 0.0005 s AIFF | 0.0004 s Apple | 0.0011 s Lossless | MP3 | 0.1958 s FLAC | 0.7528 s WAV, AIFF, and M4A all seek virtually instantly (< 15 ms). MP3 and FLAC exhibit linear-time behavior, with FLAC being the worst affected. Note that M4A (AAC) is also a compressed format that requires decoding after seeking, yet it completes in 15 ms. 
This rules out any inherent limitation of compressed formats; the MP4 container's packet index (stts/stco) is clearly being used for fast random access. Both MP3 (Xing/LAME TOC) and FLAC (SEEKTABLE metadata block) have their own seek mechanisms that should provide similar performance.

Minimal CLI tool to reproduce:

    import AVFoundation  // AVAudioFile and AVAudioPCMBuffer
    import Foundation

    // Expect the audio file path as the first argument.
    guard CommandLine.arguments.count > 1 else {
        print("Usage: FLACSpeed <audio-file-path>")
        exit(1)
    }

    let path = CommandLine.arguments[1]
    let fileURL = URL(fileURLWithPath: path)

    do {
        let file = try AVAudioFile(forReading: fileURL)
        let format = file.processingFormat
        let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 8192)!
        let totalFrames = file.length
        let seekTarget = totalFrames / 2

        print("File: \(fileURL.lastPathComponent)")
        print("Format: \(format)")
        print("Total frames: \(totalFrames)")
        print("Seeking to frame: \(seekTarget)")

        // Seek to the midpoint, then time only the first read after the seek.
        file.framePosition = seekTarget
        let start = CFAbsoluteTimeGetCurrent()
        try file.read(into: buffer, frameCount: 8192)
        let elapsed = CFAbsoluteTimeGetCurrent() - start
        print("Read after seek took \(elapsed) seconds")
    } catch {
        print("Error: \(error.localizedDescription)")
        exit(1)
    }

Expected behavior: AVAudioFile.read(into:frameCount:) after setting framePosition should use the available seek mechanisms in FLAC and MP3 files for fast random access, as it already does for M4A (AAC). Even accounting for the fact that seek tables provide approximate (not sample-precise) positioning, the "jump to nearest index point + decode forward" approach should complete in milliseconds, not seconds.

Workaround: For FLAC, I've worked around this by using libFLAC directly, which provides instant seeking via FLAC__stream_decoder_seek_absolute().
libFLAC performance: For comparison, libFLAC's FLAC__stream_decoder_seek_absolute() performs the same seek + read on the same FLAC file in around 0.015 s, using the FLAC seek table to jump to the nearest preceding seek point, then decoding forward a small number of frames to the exact target sample.
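To make the "jump to nearest index point + decode forward" idea concrete, here is a minimal sketch in plain Swift. It is not how AVAudioFile or libFLAC is implemented. SeekPoint, SeekPlan, nearestPrecedingSeekPoint, and planSeek are hypothetical names, and the example table (one point every 10 seconds, made-up byte offsets) is invented for illustration. It assumes the table is sorted by sample number, as a FLAC SEEKTABLE is.

```swift
/// One entry of a hypothetical seek table: the first sample of a frame and
/// that frame's byte offset, measured from the first audio frame.
struct SeekPoint {
    let sampleNumber: Int64
    let byteOffset: Int64
}

/// Where to start reading and how many decoded samples to throw away.
struct SeekPlan {
    let byteOffset: Int64
    let samplesToDiscard: Int64
}

/// Binary search for the last seek point at or before `target`.
/// Assumes `table` is sorted ascending by `sampleNumber`.
func nearestPrecedingSeekPoint(in table: [SeekPoint], for target: Int64) -> SeekPoint? {
    var low = 0
    var high = table.count  // search the half-open range [low, high)
    while low < high {
        let mid = low + (high - low) / 2
        if table[mid].sampleNumber <= target {
            low = mid + 1
        } else {
            high = mid
        }
    }
    // `low` is now the index of the first point strictly after `target`.
    return low > 0 ? table[low - 1] : nil
}

/// Jump to the nearest preceding seek point, then decode forward to the target.
func planSeek(in table: [SeekPoint], to target: Int64) -> SeekPlan {
    guard let point = nearestPrecedingSeekPoint(in: table, for: target) else {
        // No usable seek point: fall back to decoding from the first frame.
        return SeekPlan(byteOffset: 0, samplesToDiscard: target)
    }
    return SeekPlan(byteOffset: point.byteOffset,
                    samplesToDiscard: target - point.sampleNumber)
}

// Example: a seek point every 10 s at 44.1 kHz (441,000 samples), with
// invented byte offsets, seeking to the midpoint of the 200,502,708-frame file.
let exampleTable = (0..<455).map { index in
    SeekPoint(sampleNumber: Int64(index) * 441_000, byteOffset: Int64(index) * 1_000_000)
}
let examplePlan = planSeek(in: exampleTable, to: 100_251_354)
print(examplePlan.byteOffset, examplePlan.samplesToDiscard)  // prints "227000000 144354"
```

With a table like this, the decoder only has to discard about 3.3 seconds of audio (144,354 samples) instead of decoding roughly 100 million frames from the start, which is the difference between millisecond and multi-second seeks described above.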

Media Technologies Audio AVFoundation

483

Apr ’26

CarPlay: Voice Conversational Entitlement Details

With the Voice Conversational Entitlement, can a CarPlay app establish a turn-based audio interface that operates in two modes?

Speaking mode:
- Audio session configured for playback
- Buffered audio

Listening mode:
- Switch the audio session to .record or .playAndRecord
- Activate SFSpeechRecognizer

The app would continue toggling back and forth between these modes. It should listen for responses to questions or other audio cues and, assuming those answers are correct (based on analysis of results from SFSpeechRecognizer), continue this pattern of alternating between modes 1 and 2. This appears to be a valid use of this entitlement. Does this also require the Audio App Entitlement, or is the Voice Conversational Entitlement sufficient? Are there other obstacles to this type of app that I'm not seeing? Or perhaps this is technically possible, but unlikely to pass App Store review?
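The alternating pattern in the question can be sketched as a small state machine. This is only an illustration of the control flow, not a statement about what the entitlement permits. ConversationMode, AudioModeConfiguring, TurnBasedConversation, and LoggingConfigurator are hypothetical names; in a real app the configure(for:) implementation would presumably switch the AVAudioSession category and start or stop the SFSpeechRecognizer task. Staying in listening mode after an incorrect answer is an assumption, since the post does not say what should happen in that case.

```swift
/// The two modes described in the post.
enum ConversationMode: Equatable {
    case speaking   // audio session configured for playback
    case listening  // audio session switched to .record or .playAndRecord
}

/// Hypothetical seam for the platform work (audio session changes,
/// starting or stopping speech recognition).
protocol AudioModeConfiguring {
    func configure(for mode: ConversationMode)
}

final class TurnBasedConversation {
    private(set) var mode: ConversationMode = .speaking
    private let configurator: any AudioModeConfiguring

    init(configurator: any AudioModeConfiguring) {
        self.configurator = configurator
        configurator.configure(for: .speaking)  // start in speaking mode
    }

    /// Call when the buffered prompt has finished playing.
    func promptFinished() {
        transition(to: .listening)
    }

    /// Call with the result of analysing the recognized speech.
    /// Assumption: an incorrect answer keeps the app listening.
    func responseRecognized(isCorrect: Bool) {
        guard mode == .listening, isCorrect else { return }
        transition(to: .speaking)
    }

    private func transition(to newMode: ConversationMode) {
        guard newMode != mode else { return }  // reconfigure only on a real change
        mode = newMode
        configurator.configure(for: newMode)
    }
}

// Usage with a stand-in configurator that only logs the requested mode.
struct LoggingConfigurator: AudioModeConfiguring {
    func configure(for mode: ConversationMode) {
        print("configure audio session for \(mode)")
    }
}

let conversation = TurnBasedConversation(configurator: LoggingConfigurator())
conversation.promptFinished()                      // speaking -> listening
conversation.responseRecognized(isCorrect: true)   // listening -> speaking
```

Keeping the session changes behind a protocol means the turn-taking logic can be exercised without CarPlay hardware, and each mode switch happens in exactly one place.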

Media Technologies Audio CarPlay Speech

363

Apr ’26

Electron app + Apple Music playback: queue works, playback does not start. Looking for guidance.

Media Technologies Audio Apple Music API MusicKit MusicKit JS Audio

Replies: 0
Boosts: 0
Views: 328
Activity: May ’26

AudioHardwareCreateProcessTap delivers all-zero buffers while system audio is audible

Media Technologies Audio AudioToolbox Core Audio AVFoundation

Replies: 0
Boosts: 0
Views: 363
Activity: May ’26

CarPlay HID transport buttons remap to call-control during continuous mic capture (no opt-out API)

Media Technologies Audio CarPlay AVAudioSession AVFoundation

Replies: 1
Boosts: 0
Views: 403
Activity: May ’26

AVAudioEngineConfigurationChangeNotification received while engine is running

Media Technologies Audio AVAudioEngine

Replies: 0
Boosts: 1
Views: 360
Activity: May ’26

macOS system audio capture low volume with multichannel soundcards

Media Technologies Audio

Replies: 0
Boosts: 0
Views: 188
Activity: May ’26

AVAudioEngine startAndReturnError is now failing

Media Technologies Audio AVAudioSession AVAudioEngine

Replies: 0
Boosts: 0
Views: 290
Activity: Apr ’26

Radio stations unable to play on Android with MusicKit SDK

Radio stations are currently not supported by the MusicKit SDK for Android. The SDK has not been updated for years now. It lacks pretty big features of Apple Music.

Media Technologies Audio MusicKit