- Updated .env.example to include transcription configuration options. - Enhanced main.go to support audio transcription, including new endpoints and logic for handling transcription requests. - Added functionality to transcribe audio using OpenAI and Groq APIs. - Updated README.md with detailed instructions on enabling and using the transcription feature.
Evolution Audio Converter
This project is a microservice in Go that processes audio files, converts them to opus or mp3 format, and returns both the duration of the audio and the converted file in base64. The service accepts audio files sent as form-data, base64, or URL.
Requirements
Before starting, you'll need to have the following installed:
- Go (version 1.21 or higher)
- Docker (to run the project in a container)
- FFmpeg (for audio processing)
Installation
Clone the Repository
Clone this repository to your local machine:
git clone https://github.com/EvolutionAPI/evolution-audio-converter.git
cd evolution-audio-converter
Install Dependencies
Install the project dependencies:
go mod tidy
Install FFmpeg
The service depends on FFmpeg to convert the audio. Make sure FFmpeg is installed on your system.
-
On Ubuntu:
sudo apt update sudo apt install ffmpeg -
On macOS (via Homebrew):
brew install ffmpeg -
On Windows, download FFmpeg here and add it to your system
PATH.
Configuration
Create a .env file in the project's root directory with the following configuration:
PORT=4040
API_KEY=your_secret_api_key_here
Transcription Configuration
To enable audio transcription, configure the following variables in the .env file:
ENABLE_TRANSCRIPTION=true
TRANSCRIPTION_PROVIDER=openai # or groq
OPENAI_API_KEY=your_openai_key_here
GROQ_API_KEY=your_groq_key_here
TRANSCRIPTION_LANGUAGE=en # Default transcription language (optional)
ENABLE_TRANSCRIPTION: Enables or disables the transcription featureTRANSCRIPTION_PROVIDER: Chooses the AI provider for transcription (openai or groq)OPENAI_API_KEY: Your OpenAI API key (required if using openai)GROQ_API_KEY: Your Groq API key (required if using groq)TRANSCRIPTION_LANGUAGE: Sets the default transcription language (optional)
Running the Project
Locally
To run the service locally, use the following command:
go run main.go -dev
The server will be available at http://localhost:4040.
Using Docker
If you prefer to run the service in a Docker container, follow the steps below:
-
Build the Docker image:
docker build -t audio-service . -
Run the container:
docker run -p 4040:4040 --env-file=.env audio-serviceThis will start the container on the port specified in the
.envfile.
How to Use
You can send POST requests to the /process-audio endpoint with an audio file in the following formats:
- Form-data (to upload files)
- Base64 (to send the audio encoded in base64)
- URL (to send the link to the audio file)
Authentication
All requests must include the apikey header with the value of the API_KEY configured in the .env file.
Optional Parameters
format: You can specify the format for conversion by passing theformatparameter in the request. Supported values:mp3ogg(default)
Audio Transcription
You can get the audio transcription in two ways:
- Along with audio processing by adding the
transcribe=trueparameter:
curl -X POST -F "file=@audio.mp3" \
-F "transcribe=true" \
-F "language=en" \
http://localhost:4040/process-audio \
-H "apikey: your_secret_api_key_here"
- Using the specific transcription endpoint:
curl -X POST -F "file=@audio.mp3" \
-F "language=en" \
http://localhost:4040/transcribe \
-H "apikey: your_secret_api_key_here"
Optional parameters:
language: Audio language code (e.g., "en", "es", "pt"). If not specified, it will use the value defined inTRANSCRIPTION_LANGUAGEin.env. If neither is defined, the system will try to automatically detect the language.
The response will include the transcription field with the transcribed text:
{
"transcription": "Transcribed text here..."
}
When used with audio processing (/process-audio), the response will include both audio data and transcription:
{
"duration": 120,
"audio": "UklGR... (base64 of the file)",
"format": "ogg",
"transcription": "Transcribed text here..."
}
Example Requests Using cURL
Sending as Form-data
curl -X POST -F "file=@path/to/audio.mp3" http://localhost:4040/process-audio \
-F "format=ogg" \
-H "apikey: your_secret_api_key_here"
Sending as Base64
curl -X POST -d "base64=$(base64 path/to/audio.mp3)" http://localhost:4040/process-audio \
-d "format=ogg" \
-H "apikey: your_secret_api_key_here"
Sending as URL
curl -X POST -d "url=https://example.com/path/to/audio.mp3" http://localhost:4040/process-audio \
-d "format=ogg" \
-H "apikey: your_secret_api_key_here"
Response
The response will be a JSON object containing the audio duration and the converted audio file in base64:
{
"duration": 120,
"audio": "UklGR... (base64 of the file)",
"format": "ogg"
}
duration: The audio duration in seconds.audio: The converted audio file encoded in base64.format: The format of the converted file (mp3orogg).
License
This project is licensed under the MIT license.