ScreenCaptureKit recording output is corrupted when captureMicrophone is true

Hello everyone,

I'm working on a screen recording app using ScreenCaptureKit and I've hit a strange issue. My app records the screen to an .mp4 file, and everything works perfectly as long as captureMicrophone is false: in that case, I get a valid, playable .mp4 file.

However, as soon as I try to enable the microphone by setting streamConfig.captureMicrophone = true, the recording seems to work, but the final .mp4 file is corrupted and cannot be played by QuickTime or any other player. This happens whether capturesAudio (app audio) is on or off.

I've already added the "Privacy - Microphone Usage Description" (NSMicrophoneUsageDescription) to my Info.plist, so I don't think it's a permissions problem.

I have my logic split into a ScreenRecorder class that manages state and a CaptureEngine that handles the SCStream. Here is how I'm configuring my SCStream:

ScreenRecorder.swift

// This is my main SCStreamConfiguration
    private var streamConfiguration: SCStreamConfiguration {
        
        let streamConfig = SCStreamConfiguration()
        
        // ... other HDR/preset config ...
        
        // These are the problem properties
        streamConfig.capturesAudio = isAudioCaptureEnabled  
        streamConfig.captureMicrophone = isMicCaptureEnabled // breaks it if true
        
        streamConfig.excludesCurrentProcessAudio = false
        streamConfig.showsCursor = false
        
        if let region = selectedRegion, let display = currentDisplay {
            // My region/frame logic (works fine)
            let regionWidth = Int(region.frame.width)
            let regionHeight = Int(region.frame.height)
            
            streamConfig.width = regionWidth * scaleFactor
            streamConfig.height = regionHeight * scaleFactor
            
            // ... (sourceRect logic) ...
        }
        
        streamConfig.pixelFormat = kCVPixelFormatType_32BGRA
        streamConfig.colorSpaceName = CGColorSpace.sRGB
        streamConfig.minimumFrameInterval = CMTime(value: 1, timescale: 60)
        
        return streamConfig
    }

And here is how I'm setting up the SCRecordingOutput that writes the file:

ScreenRecorder.swift

private func initRecordingOutput(for region: ScreenPickerManager.SelectedRegion) throws {

        let screenRecordingOutputURL = try RecordingWorkspace.createScreenRecordingVideoFile(
            in: workspaceURL,
            sessionIndex: sessionIndex
        )

        let recordingConfiguration = SCRecordingOutputConfiguration()
        recordingConfiguration.outputURL = screenRecordingOutputURL
        recordingConfiguration.outputFileType = .mp4
        recordingConfiguration.videoCodecType = .hevc

        let recordingOutput = SCRecordingOutput(configuration: recordingConfiguration, delegate: self)
        self.recordingOutput = recordingOutput
    }

Finally, my CaptureEngine adds these to the SCStream:

CaptureEngine.swift

class CaptureEngine: NSObject, @unchecked Sendable {
    
    private(set) var stream: SCStream?
    private var streamOutput: CaptureEngineStreamOutput?
    
    // ... (dispatch queues) ...

    func startCapture(configuration: SCStreamConfiguration, filter: SCContentFilter, recordingOutput: SCRecordingOutput) async throws {
        
        let streamOutput = CaptureEngineStreamOutput()
        self.streamOutput = streamOutput

        do {
            stream = SCStream(filter: filter, configuration: configuration, delegate: streamOutput)
            
            // Add outputs for raw buffers (not used for file recording)
            try stream?.addStreamOutput(streamOutput, type: .screen, sampleHandlerQueue: videoSampleBufferQueue)
            try stream?.addStreamOutput(streamOutput, type: .audio, sampleHandlerQueue: audioSampleBufferQueue)
            try stream?.addStreamOutput(streamOutput, type: .microphone, sampleHandlerQueue: micSampleBufferQueue)
            
            // Add the file recording output
            try stream?.addRecordingOutput(recordingOutput)
            
            try await stream?.startCapture()
            
        } catch {
            logger.error("Failed to start capture: \(error.localizedDescription)")
            throw error
        }
    }
    
    // ... (stopCapture, etc.) ...
}

To summarize: when captureMicrophone is false, I get a perfect .mp4 that plays everywhere; as soon as it's true, the resulting file is corrupted and doesn't play at all. Any ideas what could be going wrong?

Hello @ZainBren,

Are you able to reproduce this issue using our sample code? https://developer.apple.com/documentation/screencapturekit/capturing-screen-content-in-macos

(Please remember to add the Privacy - Microphone Usage Description key to the Info.plist of that sample project before you try to reproduce!)

-- Greg

When captureMicrophone is true, ScreenCaptureKit delivers separate audio sample buffers for app audio and microphone audio through the same stream output delegate. The key detail is that these arrive with different CMFormatDescriptions.
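As a first diagnostic step, you can log the format of each incoming audio buffer to confirm that the .audio and .microphone buffers really do differ. This is a small helper sketch, not part of your code; the `label` parameter is just for telling the two types apart in the log:

```swift
import CoreMedia

// Diagnostic helper: print the sample rate, channel count, and format ID
// of an audio CMSampleBuffer. Call it from your stream output callback
// with a label like "app" or "mic" to compare the two audio types.
func logAudioFormat(of sampleBuffer: CMSampleBuffer, label: String) {
    guard let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(formatDesc)?.pointee
    else { return }
    print("\(label): \(asbd.mSampleRate) Hz, \(asbd.mChannelsPerFrame) ch, formatID: \(asbd.mFormatID)")
}
```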

A few things to check in your CaptureEngine:

  1. Make sure you are distinguishing between the two audio stream types in your stream(_:didOutputSampleBuffer:of:) callback. The type parameter will be .audio for app audio and .microphone for mic audio — these need separate AVAssetWriterInput instances with matching format descriptions.

  2. If you are writing both to a single AVAssetWriterInput, the interleaved samples with different sample rates or channel counts will corrupt the container. App audio typically comes at the system sample rate (e.g. 48kHz stereo) while microphone audio may arrive at a different rate depending on the input device.

  3. Verify the timing: microphone and app audio timestamps are on independent clocks. Both need to be offset relative to your recording start time. A common pattern is to capture the presentationTimeStamp of the very first sample buffer (whichever arrives first) and subtract that from all subsequent timestamps.
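The three points above can be sketched together. This assumes you are driving your own AVAssetWriter rather than SCRecordingOutput; the class and property names (`appAudioInput`, `micAudioInput`) are illustrative, and each AVAssetWriterInput is presumed to have been configured with settings matching its source format:

```swift
import AVFoundation
import ScreenCaptureKit

// Sketch: route .audio and .microphone buffers to separate writer inputs,
// and start the writer session at the first buffer's timestamp so all
// tracks share one timeline origin.
final class AudioWriterCoordinator: NSObject, SCStreamOutput {
    private let writer: AVAssetWriter
    private let appAudioInput: AVAssetWriterInput   // fed by .audio buffers
    private let micAudioInput: AVAssetWriterInput   // fed by .microphone buffers
    private var sessionStarted = false

    init(writer: AVAssetWriter,
         appAudioInput: AVAssetWriterInput,
         micAudioInput: AVAssetWriterInput) {
        self.writer = writer
        self.appAudioInput = appAudioInput
        self.micAudioInput = micAudioInput
    }

    func stream(_ stream: SCStream,
                didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
                of type: SCStreamOutputType) {
        guard sampleBuffer.isValid else { return }

        // Anchor the session to whichever buffer arrives first, so every
        // subsequent timestamp is relative to the same start time.
        if !sessionStarted {
            writer.startSession(atSourceTime: sampleBuffer.presentationTimeStamp)
            sessionStarted = true
        }

        // Each audio type gets its own input; their CMFormatDescriptions
        // (sample rate, channel count) can differ, so they must not share one.
        switch type {
        case .audio:
            if appAudioInput.isReadyForMoreMediaData {
                appAudioInput.append(sampleBuffer)
            }
        case .microphone:
            if micAudioInput.isReadyForMoreMediaData {
                micAudioInput.append(sampleBuffer)
            }
        default:
            break // .screen buffers handled elsewhere
        }
    }
}
```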

If you just need a combined recording, consider using AVCaptureSession with separate audio inputs instead, which gives you more control over the mixing.
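For the microphone leg of that alternative, a minimal AVCaptureSession setup might look like the following. This is only a sketch of recording the default microphone to a movie file; error handling is minimal, and combining it with screen content is a separate step:

```swift
import AVFoundation

// Sketch: capture the default microphone into a movie file with
// AVCaptureSession + AVCaptureMovieFileOutput.
final class MicRecorder: NSObject, AVCaptureFileOutputRecordingDelegate {
    private let session = AVCaptureSession()
    private let movieOutput = AVCaptureMovieFileOutput()

    func start(to url: URL) throws {
        guard let mic = AVCaptureDevice.default(for: .audio) else {
            throw NSError(domain: "MicRecorder", code: 1) // no mic available
        }
        let micInput = try AVCaptureDeviceInput(device: mic)

        session.beginConfiguration()
        if session.canAddInput(micInput) { session.addInput(micInput) }
        if session.canAddOutput(movieOutput) { session.addOutput(movieOutput) }
        session.commitConfiguration()

        session.startRunning()
        movieOutput.startRecording(to: url, recordingDelegate: self)
    }

    func stop() {
        movieOutput.stopRecording()
        session.stopRunning()
    }

    // AVCaptureFileOutputRecordingDelegate
    func fileOutput(_ output: AVCaptureFileOutput,
                    didFinishRecordingTo outputFileURL: URL,
                    from connections: [AVCaptureConnection],
                    error: Error?) {
        // Inspect `error` and hand the finished file off here.
    }
}
```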
