feat(live): Support native (model-side) audio transcription for agent transfer in live mode.

Problems with the old implementation:
1. We only started transcription at the beginning of agent transfer.
2. The transcription service we used was not as accurate or as fast as the model's native transcription.

In the current implementation, when the LLM supports audio transcription of its input, the live agent relies on the LLM's transcription instead of ours, and in that case it does not use our own audio transcriber at all. This reduces the latency during agent transfer from 5 seconds to 2 seconds and also improves transcription quality.

When the LLM doesn't support audio transcription, we still use our own audio transcriber to transcribe the audio input.
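
Below is a minimal sketch of this selection logic, not the actual implementation. It assumes a RunConfig with an input_audio_transcription field (as in the diff below); supports_transcription and local_transcriber are hypothetical stand-ins for the real capability check and fallback transcriber.

from google.genai import types


def select_input_transcription(run_config, supports_transcription, local_transcriber):
  """Prefer native (model-side) transcription; otherwise fall back.

  `supports_transcription` and `local_transcriber` are placeholders for the
  real capability check and transcriber used by the live agent.
  """
  if supports_transcription:
    # Let the model transcribe the input audio itself and skip our own
    # transcriber, avoiding the extra transcription hop during agent transfer.
    if not run_config.input_audio_transcription:
      run_config.input_audio_transcription = types.AudioTranscriptionConfig()
    return None
  # The model cannot transcribe input audio, so keep using our transcriber.
  return local_transcriber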

PiperOrigin-RevId: 758296647
Author: Hangfei Lin
Date: 2025-05-13 11:12:35 -07:00
Committed by: Copybara-Service
parent c4d5e3b298
commit 39f78dc28f
3 changed files with 33 additions and 11 deletions


@@ -456,6 +456,9 @@ class Runner:
         run_config.output_audio_transcription = (
             types.AudioTranscriptionConfig()
         )
+      if not run_config.input_audio_transcription:
+        # need this input transcription for agent transferring in live mode.
+        run_config.input_audio_transcription = types.AudioTranscriptionConfig()
     return self._new_invocation_context(
         session,
         live_request_queue=live_request_queue,
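
For callers, the practical effect is that a plain RunConfig is now enough: the Runner fills in input_audio_transcription when it is missing. A hedged usage sketch follows; the import paths, the run_live signature, and the runner/session construction are assumptions based on the surrounding API, not part of this change.

from google.adk.agents.live_request_queue import LiveRequestQueue
from google.adk.agents.run_config import RunConfig


async def stream_live_events(runner, session):
  """Yield live events; `runner` and `session` are constructed elsewhere."""
  live_request_queue = LiveRequestQueue()
  # No input_audio_transcription is set here; with this change the Runner
  # populates it with types.AudioTranscriptionConfig() so agent transfer
  # keeps working in live mode.
  run_config = RunConfig(response_modalities=["AUDIO"])
  async for event in runner.run_live(
      session=session,
      live_request_queue=live_request_queue,
      run_config=run_config,
  ):
    yield event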