Mixing ScreenCaptureKit audio with microphone audio

Hi,

I'm new to AVAudioEngine (and macOS programming in general).

I'm trying to mix microphone audio with ScreenCaptureKit audio using AVAudioEngine, without playing the result back. I've created an AVAudioPlayerNode and am scheduling buffers on it in my SCStream handler:

    playerNode.scheduleBuffer(samples)

and have connected the playerNode to the mainMixerNode:

    audioEngine.connect(audioEngine.inputNode, to: audioEngine.mainMixerNode, format: micFormat)
    audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: format)

The problem is that mainMixerNode plays the audio through the speakers, creating a feedback loop. How can I prevent the mixer output from being played back?

Also, is this the best way to mix microphone input with another input? I came across AVAudioEngine's manual rendering mode, which seems like the way to go for mixing audio without playing it back. However, I couldn't figure out how to connect microphone input to an AVAudioEngine in manual rendering mode.

I ran into exactly this problem when building an audio pipeline that mixes system audio (via ScreenCaptureKit) with microphone input for real-time speech processing. The core issue is that mainMixerNode is connected to outputNode by default, which routes everything to the speakers. There are two approaches:

Option 1: manual rendering mode. Here AVAudioEngine is not connected to the audio hardware, so nothing is played back; you pull rendered buffers on your own schedule. Call enableManualRenderingMode(_:format:maximumFrameCount:) while the engine is stopped, attach a player node for your SCK audio, connect it to the main mixer, then call renderOffline(_:to:) to pull mixed audio on demand. The catch: inputNode does not capture from the microphone in this mode. The workaround is to capture mic samples separately (via AVCaptureSession, or a tap on the inputNode of a second, realtime engine) and schedule those buffers on a second AVAudioPlayerNode.
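A minimal sketch of the manual rendering setup described above, not a definitive implementation. The class name OfflineMixer, the two player names, and the 4096-frame limit are made up for illustration, and it assumes the SCK and mic samples have already been converted to AVAudioPCMBuffer in the formats passed to init:

```swift
import AVFoundation

/// Illustrative offline mixer: two player nodes feeding the main mixer,
/// rendered on demand instead of being played to the hardware.
final class OfflineMixer {
    let engine = AVAudioEngine()
    let screenPlayer = AVAudioPlayerNode()  // fed with converted ScreenCaptureKit audio
    let micPlayer = AVAudioPlayerNode()     // fed with mic buffers captured elsewhere
    private let maxFrames: AVAudioFrameCount = 4096  // assumed upper bound per render call

    /// Each player is connected in the format of the buffers it will be given;
    /// the mixer converts both inputs to `outputFormat`.
    init(screenFormat: AVAudioFormat,
         micFormat: AVAudioFormat,
         outputFormat: AVAudioFormat) throws {
        engine.attach(screenPlayer)
        engine.attach(micPlayer)
        engine.connect(screenPlayer, to: engine.mainMixerNode, format: screenFormat)
        engine.connect(micPlayer, to: engine.mainMixerNode, format: micFormat)

        // Manual rendering must be enabled while the engine is stopped.
        try engine.enableManualRenderingMode(.offline,
                                             format: outputFormat,
                                             maximumFrameCount: maxFrames)
        try engine.start()
        screenPlayer.play()
        micPlayer.play()
    }

    /// Pulls up to `frames` frames of mixed audio.
    /// Returns nil if the engine did not report a successful render.
    func render(frames: AVAudioFrameCount) throws -> AVAudioPCMBuffer? {
        guard let out = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                         frameCapacity: maxFrames) else { return nil }
        let status = try engine.renderOffline(min(frames, maxFrames), to: out)
        return status == .success ? out : nil
    }
}
```

You would call screenPlayer.scheduleBuffer(_:completionHandler:) from the SCStream handler and micPlayer.scheduleBuffer(_:completionHandler:) from the mic capture callback, then call render(frames:) at whatever cadence the consumer needs. Because the sources are live, pacing is up to you: rendering faster than buffers arrive will most likely just produce silence for the missing stretch.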

Option 2: realtime mode with the output muted. Keep the engine in realtime mode but prevent playback by setting mainMixerNode.outputVolume = 0. Then install a tap on mainMixerNode to capture the mixed audio without speaker feedback. I tried disconnecting mainMixerNode from outputNode entirely, but on some macOS versions (13.x specifically) this causes the engine to stop pulling audio from its inputs. Setting the volume to 0 is more reliable across macOS 13–15.

For the sample rate mismatch between SCK output (typically 48 kHz) and mic input (sometimes 44.1 kHz), let the mixer handle the conversion: connect each source in its native format and set the mixer output format to your target rate.
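A rough sketch of the realtime variant, under a few assumptions. LiveMixer, mixBus, and the 4096-frame tap size are illustrative names and values, and the app is assumed to already have microphone permission. One deliberate difference: the tap sits on a separate upstream AVAudioMixerNode rather than on mainMixerNode itself, on the assumption that a tap on mainMixerNode sees the signal after outputVolume is applied and could therefore record silence. If tapping mainMixerNode directly works in your setup, the extra node is unnecessary.

```swift
import AVFoundation

/// Illustrative realtime mixer: mic + SCK player -> mixBus (tapped) -> muted mainMixerNode.
final class LiveMixer {
    let engine = AVAudioEngine()
    let screenPlayer = AVAudioPlayerNode()   // fed with converted ScreenCaptureKit audio
    private let mixBus = AVAudioMixerNode()  // hypothetical intermediate mixer that gets tapped

    /// `screenFormat` must match the buffers scheduled on `screenPlayer`;
    /// `targetFormat` is the format the mixed audio should be delivered in.
    func start(screenFormat: AVAudioFormat,
               targetFormat: AVAudioFormat,
               onMixedBuffer: @escaping AVAudioNodeTapBlock) throws {
        let micFormat = engine.inputNode.outputFormat(forBus: 0)

        engine.attach(screenPlayer)
        engine.attach(mixBus)

        // Each source is connected in its native format; the mixer converts.
        engine.connect(engine.inputNode, to: mixBus, format: micFormat)
        engine.connect(screenPlayer, to: mixBus, format: screenFormat)
        engine.connect(mixBus, to: engine.mainMixerNode, format: targetFormat)

        // Keep the graph connected to the output so it keeps pulling, but mute it.
        engine.mainMixerNode.outputVolume = 0

        // Capture the mix before the muted main mixer.
        mixBus.installTap(onBus: 0,
                          bufferSize: 4096,
                          format: targetFormat,
                          block: onMixedBuffer)

        try engine.start()
        screenPlayer.play()
    }
}
```

The tap block is called off the main thread with (AVAudioPCMBuffer, AVAudioTime) pairs, so hand the buffers to your speech pipeline from there.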
