Hello there,
We are looking at resolving audio co-channel interference. Google's Gemini is saying:
Data Volume: A production-grade model capable of cleanly untangling mixed broadcast channels requires between 5,000 and 10,000 hours of verified audio data to map out diverse vocal characteristics and varying signal strengths.
Synthetic Data Generation: Rather than recording manual interference loops, the dataset can be entirely synthetic. Clean speech profiles are digitally mixed using Python pipelines that emulate a GroupTalk channel environment. This includes applying standard codecs, simulating varying packet loss concealment (PLC) artifacts, and injecting real-world environmental noise (like bridge wind or engine room hum).
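For readers wondering what such a mixing pipeline might look like, here is a minimal sketch, assuming NumPy and mono float arrays at 8 kHz. The helper names (`scale_to_ratio`, `drop_frames`, `make_example`) are hypothetical and not part of any GroupTalk or Apple API. Lost frames are simply zero-filled as a crude stand-in for real PLC artifacts, codec simulation is left out, and sine waves and white noise stand in for actual speech and environmental recordings.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def scale_to_ratio(reference, other, ratio_db):
    """Scale `other` so that RMS(reference) / RMS(other) equals ratio_db."""
    gain = rms(reference) / (rms(other) * 10 ** (ratio_db / 20))
    return other * gain

def drop_frames(signal, frame_len, loss_rate, rng):
    """Zero out randomly chosen frames (a crude stand-in for PLC artifacts)."""
    out = signal.copy()
    n_frames = len(out) // frame_len
    lost = rng.random(n_frames) < loss_rate
    for i in np.flatnonzero(lost):
        out[i * frame_len:(i + 1) * frame_len] = 0.0
    return out, lost

def make_example(target, interferer, noise, sir_db, snr_db,
                 loss_rate, rng, frame_len=160):
    """Build one (mixture, clean target) training pair."""
    interferer = scale_to_ratio(target, interferer, sir_db)  # co-channel talker
    noise = scale_to_ratio(target, noise, snr_db)            # wind, engine hum
    mixture = target + interferer + noise
    mixture, _ = drop_frames(mixture, frame_len, loss_rate, rng)
    return mixture, target

# Stand-in signals: one second at 8 kHz (160 samples = 20 ms frames).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
target = np.sin(2 * np.pi * 220 * t)        # placeholder for clean speech
interferer = np.sin(2 * np.pi * 330 * t)    # placeholder for a second talker
noise = rng.standard_normal(8000)           # placeholder for recorded noise

mixture, clean = make_example(target, interferer, noise,
                              sir_db=6.0, snr_db=15.0,
                              loss_rate=0.05, rng=rng)
```

Sweeping `sir_db`, `snr_db`, and `loss_rate` over many clean recordings is how a pipeline like this would cover "varying signal strengths". Whether the result transfers to real channel audio is a separate question that a sketch like this cannot answer.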
Do you believe that?
Warmest regards,
Ken
Can we implement on-device (edge) language translation, for example Dutch or Frisian to English, on top of these frameworks?
Your app has full control over what's recorded and played back, so how you process that audio is entirely up to you. For example, I believe there's at least one PTT app that actually sends text (not audio) and uses Text to Speech for playback.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware