Evolution Audio Converter
This project is a microservice in Go that processes audio files, converts them to opus or mp3 format, and returns both the duration of the audio and the converted file (as base64 or S3 URL). The service accepts audio files sent as form-data, base64, or URL.
Requirements
Before starting, you'll need to have the following installed:
- Go (version 1.21 or higher)
- Docker (to run the project in a container)
- FFmpeg (for audio processing)
Installation
Clone the Repository
Clone this repository to your local machine:
git clone https://github.com/EvolutionAPI/evolution-audio-converter.git
cd evolution-audio-converter
Install Dependencies
Install the project dependencies:
go mod tidy
Install FFmpeg
The service depends on FFmpeg to convert the audio. Make sure FFmpeg is installed on your system.
- On Ubuntu:
sudo apt update
sudo apt install ffmpeg
- On macOS (via Homebrew):
brew install ffmpeg
- On Windows, download FFmpeg from the official FFmpeg website and add it to your system PATH.
Configuration
Create a .env file in the project's root directory. Here are the available configuration options:
Basic Configuration
PORT=4040
API_KEY=your_secret_api_key_here
Transcription Configuration
ENABLE_TRANSCRIPTION=true
TRANSCRIPTION_PROVIDER=openai # or groq
OPENAI_API_KEY=your_openai_key_here
GROQ_API_KEY=your_groq_key_here
TRANSCRIPTION_LANGUAGE=en # Default transcription language (optional)
Storage Configuration
ENABLE_S3_STORAGE=true
S3_ENDPOINT=play.min.io
S3_ACCESS_KEY=your_access_key_here
S3_SECRET_KEY=your_secret_key_here
S3_BUCKET_NAME=audio-files
S3_REGION=us-east-1
S3_USE_SSL=true
S3_URL_EXPIRATION=24h
Storage Options
The service supports two storage modes for the converted audio:
- Base64 (default): Returns the audio file encoded in base64 format
- S3 Compatible Storage: Uploads to S3-compatible storage (AWS S3, MinIO, etc.) and returns a presigned URL
When S3 storage is enabled, the response will include a url field instead of the audio field:
{
"duration": 120,
"format": "ogg",
"url": "https://your-s3-endpoint/bucket/file.ogg?signature...",
"transcription": "Transcribed text here..." // if transcription was requested
}
If S3 upload fails, the service automatically falls back to base64 encoding.
Running the Project
Locally
To run the service locally:
go run main.go -dev
The server will be available at http://localhost:4040.
Using Docker
- Build the Docker image:
docker build -t audio-service .
- Run the container:
docker run -p 4040:4040 --env-file=.env audio-service
API Usage
Authentication
All requests must include the apikey header with your API key.
Endpoints
Process Audio
POST /process-audio
Accepts audio submitted in any of these ways:
- Form-data
- Base64
- URL
Optional parameters:
- format: Output format (mp3 or ogg, default: ogg)
- transcribe: Enable transcription (true or false)
- language: Transcription language code (e.g., "en", "es", "pt")
Transcribe Only
POST /transcribe
Transcribes audio without format conversion.
Optional parameters:
- language: Transcription language code
Example Requests
Form-data Upload
curl -X POST -F "file=@audio.mp3" \
-F "format=ogg" \
-F "transcribe=true" \
-F "language=en" \
http://localhost:4040/process-audio \
-H "apikey: your_secret_api_key_here"
Base64 Upload
curl -X POST \
--data-urlencode "base64=$(base64 < audio.mp3 | tr -d '\n')" \
-d "format=ogg" \
http://localhost:4040/process-audio \
-H "apikey: your_secret_api_key_here"
URL Upload
curl -X POST \
-d "url=https://example.com/audio.mp3" \
-d "format=ogg" \
http://localhost:4040/process-audio \
-H "apikey: your_secret_api_key_here"
Response Format
With S3 storage disabled (default):
{
"duration": 120,
"audio": "UklGR... (base64 of the file)",
"format": "ogg",
"transcription": "Transcribed text here..." // if requested
}
With S3 storage enabled:
{
"duration": 120,
"url": "https://your-s3-endpoint/bucket/file.ogg?signature...",
"format": "ogg",
"transcription": "Transcribed text here..." // if requested
}
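A client can handle both response shapes with a single struct. The following is a minimal sketch under stated assumptions: the ProcessResponse type and the helper functions are hypothetical, and the field types (for example, duration as a number decoded into float64) are inferred from the examples above rather than confirmed against main.go.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// ProcessResponse covers both response shapes: "audio" is set when the
// result is returned inline, "url" when it was uploaded to S3.
// Field types are inferred from the documented examples.
type ProcessResponse struct {
	Duration      float64 `json:"duration"`
	Format        string  `json:"format"`
	Audio         string  `json:"audio,omitempty"`
	URL           string  `json:"url,omitempty"`
	Transcription string  `json:"transcription,omitempty"`
}

// parseResponse decodes the JSON body and checks that a payload is present.
func parseResponse(body []byte) (ProcessResponse, error) {
	var resp ProcessResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return resp, err
	}
	if resp.Audio == "" && resp.URL == "" {
		return resp, errors.New("response contains neither audio nor url")
	}
	return resp, nil
}

// audioBytes decodes the inline payload. It returns nil when the audio
// must be downloaded from resp.URL instead.
func audioBytes(resp ProcessResponse) ([]byte, error) {
	if resp.Audio == "" {
		return nil, nil
	}
	return base64.StdEncoding.DecodeString(resp.Audio)
}

func main() {
	body := []byte(`{"duration":120,"audio":"T2dnUw==","format":"ogg"}`)
	resp, err := parseResponse(body)
	if err != nil {
		panic(err)
	}
	data, err := audioBytes(resp)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s audio, %.0fs, %d bytes inline\n", resp.Format, resp.Duration, len(data))
}
```

Checking both fields, rather than relying on the server's storage setting, also covers the case where an S3 upload fails and the service falls back to base64.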
License
This project is licensed under the MIT license.