
Getting Started

Requirements

  • OS: Linux or macOS (Windows is not supported)
  • Python: 3.13+
  • System tools: uv, ffmpeg, git-lfs
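
If you want to confirm these prerequisites before cloning, a short Python check along the following lines may help. It is only a sketch: the function and constant names are made up for this example, and it is not a substitute for the check_sys_dependencies.sh script that you run after cloning.

```python
import shutil
import sys

# Tools and minimum Python version taken from the requirements list above.
REQUIRED_TOOLS = ("uv", "ffmpeg", "git-lfs")
MIN_PYTHON = (3, 13)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_recent_enough(version=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


# Report on the machine this snippet runs on.
print("Missing tools:", missing_tools() or "none")
print("Python version OK:", python_is_recent_enough())
```

Note that uv can manage its own Python installations, so a failing version check on your system interpreter does not necessarily mean uv sync will fail.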

Installation

Install system dependencies, then clone and set up VTC:

Linux (Ubuntu/Debian)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ffmpeg and git-lfs (Ubuntu/Debian)
sudo apt install ffmpeg git-lfs

# Clone the repo (--recurse-submodules is required for model weights)
git lfs install
git clone --recurse-submodules https://github.com/LAAC-LSCP/VTC.git
cd VTC

# Install Python dependencies
uv sync

# Verify everything is set up
./check_sys_dependencies.sh

macOS

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ffmpeg and git-lfs (macOS)
brew install ffmpeg git-lfs

# Clone the repo (--recurse-submodules is required for model weights)
git lfs install
git clone --recurse-submodules https://github.com/LAAC-LSCP/VTC.git
cd VTC

# Install Python dependencies
uv sync

# Verify everything is set up
./check_sys_dependencies.sh

Don't skip --recurse-submodules

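Without this flag, the submodules that hold the model weights are not fetched, and VTC will fail when it tries to load them. If you have already cloned without it, run git lfs install && git submodule update --init --recursive inside the VTC folder.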
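
To see whether an existing clone is affected, you can inspect the output of git submodule status, which prefixes each uninitialized submodule with a minus sign. The sketch below wraps that check in Python. The helper names are hypothetical, and the check does not detect Git LFS files that were left as unfetched pointers.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format is "-<sha1> <path>"; drop the marker, keep the path.
            parts = line[1:].split()
            if len(parts) >= 2:
                paths.append(parts[1])
    return paths


def check_repo(repo_dir="."):
    """Run `git submodule status` in repo_dir and list uninitialized submodules."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

Calling check_repo("VTC") from the folder that contains your clone returns an empty list when every submodule has been initialized.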

Prepare your audio

Before running VTC, it helps to organize your files in a clear structure. Here is a recommended layout:

my_project/
├── audio/               # Your WAV files go here
│   ├── child01_day1.wav
│   ├── child01_day2.wav
│   └── child02_day1.wav
└── output/              # VTC will write results here
    ├── rttm/
    │   ├── child01_day1.rttm
    │   └── ...
    └── rttm.csv

Place all of your .wav files inside a single folder (e.g., audio/). When you run VTC, you will point it to this folder, and VTC will process every .wav file it finds inside. The results will be written to a separate output folder that VTC creates for you.

Subfolders

If your audio files are organized in subfolders (e.g., one subfolder per child), you can use the --recursive_search flag to tell VTC to look inside subfolders. See the Command Line Interface Arguments for details.
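
The difference between the default search and a recursive one can be previewed with a few lines of Python. This is a sketch under assumptions, not VTC's own file discovery: find_wavs and expected_rttm are hypothetical helpers, the matching is case-sensitive on Linux (so .WAV is not picked up), and the output naming simply mirrors the flat layout shown above. How VTC names results for files found in subfolders is not covered here.

```python
from pathlib import Path


def find_wavs(audio_dir, recursive=False):
    """Return sorted .wav paths in audio_dir, optionally including subfolders."""
    root = Path(audio_dir)
    pattern = "**/*.wav" if recursive else "*.wav"
    return sorted(p for p in root.glob(pattern) if p.is_file())


def expected_rttm(wav_path, output_dir):
    """Map audio/child01_day1.wav to output/rttm/child01_day1.rttm."""
    return Path(output_dir) / "rttm" / (Path(wav_path).stem + ".rttm")
```

Comparing find_wavs("audio") with find_wavs("audio", recursive=True) shows which recordings would be missed without the recursive flag.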

Audio format requirements

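VTC expects WAV files sampled at 16 kHz with a single channel (mono). You can check a file with ffprobe; the Audio stream line of its output reports the sample rate and channel layout, which should read 16000 Hz and mono (or 1 channels):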

ffprobe your_recording.wav
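
If you have many recordings, checking them one at a time gets tedious. A minimal sketch using Python's standard wave module is shown below. wav_problems is a hypothetical helper, and because the wave module only reads PCM-encoded WAV files, other encodings (such as floating-point WAV) are reported as unreadable rather than inspected.

```python
import wave

EXPECTED_RATE = 16_000   # Hz, from the requirement above
EXPECTED_CHANNELS = 1    # mono


def wav_problems(path):
    """Return a list of format problems; an empty list means the file looks fine."""
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS} (mono)")
    return problems
```

Running wav_problems over every file in your audio folder and printing the non-empty results gives a quick list of recordings that still need conversion.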

If your audio needs conversion, use the included script or ffmpeg directly. Both will resample to 16 kHz and average across channels to produce a single mono file.

# Using the provided script
uv run scripts/convert.py --input /path/to/raw_audio --output /path/to/converted

# Or manually with ffmpeg (works with MP3, FLAC, M4A, etc.)
ffmpeg -i input.mp3 -acodec pcm_s16le -ar 16000 -ac 1 output.wav
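
To apply the manual ffmpeg command to a whole folder, something like the following sketch could be used. It is not the implementation of scripts/convert.py, whose internals are not shown here. ffmpeg_command and convert_folder are hypothetical names, the extension list is only an example, and existing output files are skipped so that ffmpeg never stops to ask about overwriting.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst):
    """Build the ffmpeg call shown above: 16-bit PCM, 16 kHz, mono."""
    return [
        "ffmpeg", "-i", str(src),
        "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
        str(dst),
    ]


def convert_folder(input_dir, output_dir, extensions=(".mp3", ".flac", ".m4a", ".wav")):
    """Convert each matching file in input_dir to a WAV file in output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        dst = out / (src.stem + ".wav")
        if dst.exists():
            continue  # already converted, or another source shares this name
        subprocess.run(ffmpeg_command(src, dst), check=True)
```

Use a separate output folder, for example convert_folder("raw_audio", "audio"), so that original recordings are never read and written in the same place.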

Common errors

If you run into problems during installation or when running VTC, check the table below for quick fixes. A more detailed list is available on the Troubleshooting page.

Problem | Likely cause | Fix
uv: command not found | uv is not installed | For detailed installation instructions, see the uv installation guide
ffmpeg: command not found | ffmpeg is not installed | sudo apt install ffmpeg (Linux) or brew install ffmpeg (macOS)
Model weights missing | Cloned without --recurse-submodules | Run git lfs install && git submodule update --init --recursive
CUDA out of memory | Batch size is too large for your GPU | Add --batch_size 64 (or lower) to your command, or use --device cpu
No .wav files found | Wrong folder or wrong audio format | Make sure the --wavs path points to a folder containing .wav files