Automotive Backend
This demo showcases a speech-to-speech conversation system using NVIDIA's ACE Controller framework. The system supports real-time audio processing, speech recognition, and text-to-speech capabilities.
Installation
Using uv
Using conda
Components
Server (server.py)
- FastAPI-based WebSocket server
- Integrates with NVIDIA's Riva services for ASR and TTS
- Uses NVIDIA LLM service for conversation
- Supports real-time audio streaming and processing
- Implements VAD (Voice Activity Detection) using Silero
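To illustrate what the VAD step decides for each audio frame, here is a minimal sketch that uses a plain energy threshold. This is not Silero, which the server uses and which is a neural model; the threshold technique is swapped in only to keep the example dependency-free. The function names, the 16-bit little-endian mono PCM format, and the threshold value of 500 are all assumptions for illustration.

```python
import math
import struct


def frame_rms(pcm16: bytes) -> float:
    """Return the root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm16: bytes, threshold: float = 500.0) -> bool:
    """Hypothetical stand-in for a VAD decision: loud frames count as speech."""
    return frame_rms(pcm16) >= threshold
```

A real VAD such as Silero replaces the threshold test with a model score, but the surrounding control flow (classify each frame, then start or stop forwarding audio to ASR) has the same shape.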
Client (client.py)
- Python-based client for sending WAV files
- Supports WebSocket communication with the server
- Handles audio playback and streaming
- Implements retry mechanisms for reliable communication
- Provides detailed logging and progress tracking
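The client behaviour listed above (reading a WAV file, streaming it in pieces, and retrying on failure) can be sketched with the standard library alone. This is a sketch under stated assumptions, not the code in client.py: the helper names, the chunk size of 1600 frames, and the choice of exponential backoff on `ConnectionError` are all hypothetical.

```python
import time
import wave
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def iter_wav_chunks(wav_file, frames_per_chunk: int = 1600) -> Iterator[bytes]:
    """Yield raw PCM chunks from a WAV file (a path or a binary file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            yield frames


def with_retries(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call operation, retrying with exponential backoff when it raises ConnectionError."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In a real client, each chunk from `iter_wav_chunks` would be sent over the WebSocket, and the connect or send call would be wrapped in `with_retries`.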
Web UI (static/index.html)
- Browser-based interface for real-time interaction
- Supports microphone input and audio playback
- Uses WebSocket for communication
- Implements Protobuf for data serialization
Prerequisites
- Python 3.12.9
- NVIDIA API key (required only if you use NVIDIA cloud services)
Setup
- (Optional) Set up your environment variables in the .env file:
- Install required packages:
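For the optional .env step above, the following sketch shows one way such a file could be read into the process environment using only the standard library. It is an illustration, not the project's own loading code: the helper names are hypothetical, and the variable name `NVIDIA_API_KEY` in the usage note is an assumption, since the source does not list the expected keys.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Copy entries from a .env file into os.environ without overriding existing ones."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, a value such as `os.environ.get("NVIDIA_API_KEY")` would be available to the server if that key (an assumed name) is present in the file.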
Usage
Running the Server
The server will start on http://localhost:8100
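A quick way to confirm that something is listening on the port mentioned above is a plain TCP connection attempt. This is a small sketch; the function name is hypothetical, and the host and port defaults simply mirror the URL given in this README.

```python
import socket


def server_is_listening(host="localhost", port=8100, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This checks only that the port accepts connections, not that the WebSocket endpoint or the NVIDIA services behind it are healthy.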
Using the Web UI
- Open http://localhost:8100/static/index.html in your browser
- Click "Start Audio" to begin the conversation
- Speak into your microphone
- Click "Stop Audio" to end the session