SpeechTranscriber Faster Results

Question

Created 5d

This post is from the WWDC26 Audio Q&A.

I am experimenting with SpeechTranscriber and am curious whether I can get quicker results when using buffered audio rather than a file. The use case is a voice ordering experience for a restaurant. In my testing, it takes about 3 seconds for the fast results and 7-8 seconds for the accurate results. Is there any way to bring this down a bit?

In this WWDC demo, the results appear nearly instantaneously. I'm curious how to replicate this in my app. I presume DictationTranscriber is faster, but how is Siri detecting when the user stops speaking? Is it custom code, or is it using SpeechDetector? I tried using SpeechDetector with SpeechTranscriber, but the detector didn't emit any results and seemed to slow down the results of SpeechTranscriber. I also assumed SpeechTranscriber makes more sense than DictationTranscriber in this use case, but I want to confirm.

Answer 1

u_kudo OP

5d

You can use SpeechTranscriber.Preset to configure settings tailored to specific use cases.

If you prioritize real-time recognition, specifying .progressiveTranscription should make it somewhat faster and improve responsiveness.

https://developer.apple.com/documentation/speech/speechtranscriber/preset

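import Speech

// Use the preset tuned for live, progressively updated results.
let preset = SpeechTranscriber.Preset.progressiveTranscription

// Build the transcriber from the preset's option sets.
let transcriber = SpeechTranscriber(
    locale: Locale.current,
    transcriptionOptions: preset.transcriptionOptions,
    reportingOptions: preset.reportingOptions,
    attributeOptions: preset.attributeOptions
)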
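
For context, here is a rough sketch of how a transcriber configured this way might be attached to a SpeechAnalyzer and its results consumed, with quick provisional text shown separately from finalized text. It follows the pattern in Apple's SpeechAnalyzer sample code as I understand it. The function name `transcribeLive` and the `input` stream are hypothetical, audio capture and model asset installation are not shown, and I'm assuming the preset enables volatile results. Treat it as a starting point, not a verified implementation.

```swift
import Foundation
import Speech

// Hypothetical helper. `input` is an AsyncStream of AnalyzerInput values that
// the caller feeds from its own audio capture code (not shown here).
// Requires an OS version that ships SpeechAnalyzer.
func transcribeLive(input: AsyncStream<AnalyzerInput>) async throws {
    let preset = SpeechTranscriber.Preset.progressiveTranscription
    let transcriber = SpeechTranscriber(
        locale: Locale.current,
        transcriptionOptions: preset.transcriptionOptions,
        reportingOptions: preset.reportingOptions,
        attributeOptions: preset.attributeOptions
    )
    let analyzer = SpeechAnalyzer(modules: [transcriber])

    // Begin analysis of the live input; results arrive on transcriber.results.
    try await analyzer.start(inputSequence: input)

    var finalized = AttributedString()
    for try await result in transcriber.results {
        if result.isFinal {
            // Finalized text will not change; append it to the transcript.
            finalized += result.text
            print("final:", String(finalized.characters))
        } else {
            // Volatile text is a quick guess that later results may replace.
            print("volatile:", String(result.text.characters))
        }
    }
}
```

For a voice ordering UI, one option is to display the volatile text immediately and commit the order only from finalized text, so the perceived latency tracks the quick results rather than the accurate ones.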