Evolution Audio Converter
This project is a microservice in Go that processes audio files, converts them to opus or mp3 format, and returns both the duration of the audio and the converted file in base64. The service accepts audio files sent as form-data, base64, or URL.
Requirements
Before starting, you'll need to have the following installed:
- Go (version 1.21 or higher)
- Docker (to run the project in a container)
- FFmpeg (for audio processing)
Installation
Clone the Repository
Clone this repository to your local machine:
git clone https://github.com/EvolutionAPI/evolution-audio-converter.git
cd evolution-audio-converter
Install Dependencies
Install the project dependencies:
go mod tidy
Install FFmpeg
The service depends on FFmpeg to convert the audio. Make sure FFmpeg is installed on your system.
- On Ubuntu:
  sudo apt update
  sudo apt install ffmpeg
- On macOS (via Homebrew):
  brew install ffmpeg
- On Windows, download FFmpeg and add it to your system PATH.
Configuration
Create a .env file in the project's root directory with the following configuration:
PORT=4040
API_KEY=your_secret_api_key_here
Transcription Configuration
To enable audio transcription, configure the following variables in the .env file:
ENABLE_TRANSCRIPTION=true
TRANSCRIPTION_PROVIDER=openai # or groq
OPENAI_API_KEY=your_openai_key_here
GROQ_API_KEY=your_groq_key_here
TRANSCRIPTION_LANGUAGE=en # Default transcription language (optional)
- ENABLE_TRANSCRIPTION: Enables or disables the transcription feature
- TRANSCRIPTION_PROVIDER: Chooses the AI provider for transcription (openai or groq)
- OPENAI_API_KEY: Your OpenAI API key (required if using openai)
- GROQ_API_KEY: Your Groq API key (required if using groq)
- TRANSCRIPTION_LANGUAGE: Sets the default transcription language (optional)
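The rules above (a provider is chosen, and the matching key is required) can be sketched in Go as follows. This is only an illustration: the `TranscriptionConfig` type and `loadTranscriptionConfig` function are hypothetical names, not taken from main.go, and the sketch assumes the variables are already present in the process environment and that `ENABLE_TRANSCRIPTION` is compared case-insensitively against `true`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// TranscriptionConfig mirrors the .env variables listed above.
// The type and field names are illustrative assumptions.
type TranscriptionConfig struct {
	Enabled  bool
	Provider string
	APIKey   string
	Language string // empty means no default language is configured
}

// loadTranscriptionConfig reads the settings through getenv (normally
// os.Getenv) and checks that the key for the selected provider is set.
func loadTranscriptionConfig(getenv func(string) string) (TranscriptionConfig, error) {
	cfg := TranscriptionConfig{
		Enabled:  strings.EqualFold(getenv("ENABLE_TRANSCRIPTION"), "true"),
		Provider: strings.ToLower(getenv("TRANSCRIPTION_PROVIDER")),
		Language: getenv("TRANSCRIPTION_LANGUAGE"),
	}
	if !cfg.Enabled {
		// Nothing else to validate when the feature is switched off.
		return cfg, nil
	}
	switch cfg.Provider {
	case "openai":
		cfg.APIKey = getenv("OPENAI_API_KEY")
	case "groq":
		cfg.APIKey = getenv("GROQ_API_KEY")
	default:
		return cfg, fmt.Errorf("unsupported TRANSCRIPTION_PROVIDER %q", cfg.Provider)
	}
	if cfg.APIKey == "" {
		return cfg, errors.New("missing API key for provider " + cfg.Provider)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadTranscriptionConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("transcription enabled=%v provider=%q\n", cfg.Enabled, cfg.Provider)
}
```

Passing the lookup function as a parameter keeps the validation testable without touching the real environment.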
Running the Project
Locally
To run the service locally, use the following command:
go run main.go -dev
The server will be available at http://localhost:4040.
Using Docker
If you prefer to run the service in a Docker container, follow the steps below:
- Build the Docker image:
  docker build -t audio-service .
- Run the container:
  docker run -p 4040:4040 --env-file=.env audio-service
  This will start the container on the port specified in the .env file.
How to Use
You can send POST requests to the /process-audio endpoint with an audio file in the following formats:
- Form-data (to upload files)
- Base64 (to send the audio encoded in base64)
- URL (to send the link to the audio file)
Authentication
All requests must include the apikey header with the value of the API_KEY configured in the .env file.
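For clients written in Go, attaching the header could look like the sketch below. The helper name `newAuthenticatedRequest` is an assumption made for illustration, and the key value is the placeholder from the configuration example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// newAuthenticatedRequest builds a request that carries the apikey header
// expected by the service. The helper name is hypothetical.
func newAuthenticatedRequest(method, url, apiKey string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return nil, err
	}
	// HTTP header names are case-insensitive, so "apikey" matches on the server side.
	req.Header.Set("apikey", apiKey)
	return req, nil
}

func main() {
	req, err := newAuthenticatedRequest(
		http.MethodPost,
		"http://localhost:4040/process-audio",
		"your_secret_api_key_here",
		strings.NewReader(""),
	)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("apikey"))
}
```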
Optional Parameters
- format: You can specify the format for conversion by passing the format parameter in the request. Supported values:
  - mp3
  - ogg (default)
Audio Transcription
You can get the audio transcription in two ways:
- Along with audio processing, by adding the transcribe=true parameter:
curl -X POST -F "file=@audio.mp3" \
-F "transcribe=true" \
-F "language=en" \
http://localhost:4040/process-audio \
-H "apikey: your_secret_api_key_here"
- Using the specific transcription endpoint:
curl -X POST -F "file=@audio.mp3" \
-F "language=en" \
http://localhost:4040/transcribe \
-H "apikey: your_secret_api_key_here"
Optional parameters:
- language: Audio language code (e.g., "en", "es", "pt"). If not specified, the value defined in TRANSCRIPTION_LANGUAGE in .env is used. If neither is defined, the system will try to detect the language automatically.
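The fallback order described above can be written as a small Go function. This is a sketch of the documented behavior, not the service's actual code: `resolveLanguage` is a hypothetical name, and an empty result stands for "let the provider detect the language".

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLanguage applies the documented precedence: the request parameter
// first, then the TRANSCRIPTION_LANGUAGE default, then an empty string,
// which here means automatic detection.
func resolveLanguage(requestLang, defaultLang string) string {
	if lang := strings.TrimSpace(requestLang); lang != "" {
		return lang
	}
	return strings.TrimSpace(defaultLang)
}

func main() {
	fmt.Printf("%q\n", resolveLanguage("es", "en")) // request value wins
	fmt.Printf("%q\n", resolveLanguage("", "en"))   // falls back to the default
	fmt.Printf("%q\n", resolveLanguage("", ""))     // empty: automatic detection
}
```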
The response will include the transcription field with the transcribed text:
{
  "transcription": "Transcribed text here..."
}
When used with audio processing (/process-audio), the response will include both audio data and transcription:
{
  "duration": 120,
  "audio": "UklGR... (base64 of the file)",
  "format": "ogg",
  "transcription": "Transcribed text here..."
}
Example Requests Using cURL
Sending as Form-data
curl -X POST -F "file=@path/to/audio.mp3" http://localhost:4040/process-audio \
-F "format=ogg" \
-H "apikey: your_secret_api_key_here"
Sending as Base64
curl -X POST --data-urlencode "base64=$(base64 < path/to/audio.mp3 | tr -d '\n')" http://localhost:4040/process-audio \
-d "format=ogg" \
-H "apikey: your_secret_api_key_here"
Sending as URL
curl -X POST -d "url=https://example.com/path/to/audio.mp3" http://localhost:4040/process-audio \
-d "format=ogg" \
-H "apikey: your_secret_api_key_here"
Response
The response will be a JSON object containing the audio duration and the converted audio file in base64:
{
  "duration": 120,
  "audio": "UklGR... (base64 of the file)",
  "format": "ogg"
}
- duration: The audio duration in seconds.
- audio: The converted audio file encoded in base64.
- format: The format of the converted file (mp3 or ogg).
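A Go client might decode this response as sketched below. The struct mirrors the documented fields; the type name `ProcessAudioResponse`, the use of `float64` for the duration, and the assumption that the audio uses standard (padded) base64 are illustrative choices, not confirmed by the service's source.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// ProcessAudioResponse mirrors the documented JSON fields.
// Transcription is only present when transcription was requested.
type ProcessAudioResponse struct {
	Duration      float64 `json:"duration"`
	Audio         string  `json:"audio"`
	Format        string  `json:"format"`
	Transcription string  `json:"transcription,omitempty"`
}

// decodeProcessAudioResponse parses the JSON body and returns the metadata
// together with the raw audio bytes decoded from base64.
func decodeProcessAudioResponse(raw []byte) (ProcessAudioResponse, []byte, error) {
	var resp ProcessAudioResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return resp, nil, fmt.Errorf("parse response: %w", err)
	}
	audio, err := base64.StdEncoding.DecodeString(resp.Audio)
	if err != nil {
		return resp, nil, fmt.Errorf("decode audio: %w", err)
	}
	return resp, audio, nil
}

func main() {
	// A tiny stand-in payload; "T2dnUw==" is the base64 form of the bytes "OggS".
	raw := []byte(`{"duration": 120, "audio": "T2dnUw==", "format": "ogg"}`)
	resp, audio, err := decodeProcessAudioResponse(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("format=%s duration=%.0fs bytes=%d\n", resp.Format, resp.Duration, len(audio))
}
```

The decoded bytes can then be written to a file with the extension given by the format field.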
License
This project is licensed under the MIT license.