Version History

VTC versions

	VTC 1.0 (2020)	VTC 1.5 (2025)	VTC 2.0 (2025)
Architecture	PyanNet	Whisper-based	BabyHuBERT
Average F1	50.9%	53.6%	64.6%
Labels	CHI(KCHI), OCH, MAL, FEM, SPEECH	KCHI, OCH, MAL, FEM	KCHI, OCH, MAL, FEM
Python	3.7+ (conda)	3.13+ (uv)	3.13+ (uv)
Repository	MarvinLvn/voice-type-classifier	LAAC-LSCP/VTC-IS-25	LAAC-LSCP/VTC

VTC 2.0 uses BabyHuBERT, a self-supervised model trained specifically on child-centered audio. Compared with VTC 1.0, the largest accuracy gains are on OCH (+21 points) and MAL (+19 points).

Migrating from VTC 1.0

The CHI label (combining KCHI + OCH) and the SPEECH label no longer exist. If your scripts depend on CHI, merge the KCHI and OCH segments yourself.
The output format is still RTTM; a CSV version is provided as well.
VTC 2.0 requires a fresh install: it cannot run in a VTC 1.0 conda environment.
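
For scripts that still expect a combined CHI label, the merge described above can be sketched as follows. This is a minimal illustration, not part of VTC: it assumes the standard RTTM field layout (file ID in the second field, onset and duration in the fourth and fifth, speaker label in the eighth), and the names `parse_rttm` and `merge_child_labels` are hypothetical helpers introduced here. Overlapping KCHI and OCH segments are fused into a single CHI interval.

```python
from typing import Iterable, List, Tuple

# (file_id, onset_seconds, duration_seconds, label)
Segment = Tuple[str, float, float, str]


def parse_rttm(lines: Iterable[str]) -> List[Segment]:
    """Read SPEAKER lines, assuming the standard RTTM field order."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank, malformed, or non-SPEAKER lines
        segments.append((fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return segments


def merge_child_labels(
    segments: List[Segment],
    sources: Tuple[str, ...] = ("KCHI", "OCH"),
    target: str = "CHI",
) -> List[Segment]:
    """Replace KCHI/OCH segments with the union of their intervals, labeled CHI."""
    kept = [s for s in segments if s[3] not in sources]
    # Work with (file_id, start, end) so overlaps are easy to detect.
    child = sorted((s[0], s[1], s[1] + s[2]) for s in segments if s[3] in sources)
    merged: List[list] = []
    for file_id, start, end in child:
        if merged and merged[-1][0] == file_id and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([file_id, start, end])
    kept.extend((f, s, e - s, target) for f, s, e in merged)
    return sorted(kept, key=lambda s: (s[0], s[1]))
```

Calling `merge_child_labels(parse_rttm(open(path)))` on a VTC 2.0 RTTM file (where `path` is your own output file) returns the segments with MAL and FEM untouched and the child segments collapsed into CHI. There is no equivalent for the old SPEECH label in this sketch.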

VTC vs. LENA

	VTC 2.0	LENA
Cost	Free, open-source	Commercial
Hardware	Any Unix machine	Requires LENA recorder
Speaker classes	KCHI, OCH, MAL, FEM	CHN, CXN, MAN, FAN, + others
Transparency	Code and weights available	Proprietary
Input	Any WAV audio	LENA .its files
Customizable	Yes (fine-tuning)	No

VTC is not a drop-in replacement for LENA: the speaker categories are similar but not identical. See the ExELang book for detailed accuracy comparisons.