VoxCPM2 is the next-generation open-source audio model of the @MiniCPM family, and it fully continues the family's signature trait of incredible “capacity density”: all of these features are packed into a model with only 2B parameters!
Despite its extremely compact size, the feature set it brings to the table is quite rare for an open-source release:
- Voice design: Instead of hunting for the right reference audio to clone, you can describe the voice you want directly in the prompt, for example: “(a young female, gentle and sweet voice) Hello World.” This instantly produces a completely unique voice.
- Native 48kHz output: It has a built-in super-resolution VAE, which means no external upsampler is needed to get studio-quality audio.
- Controllable voice cloning: You can clone a voice from a short clip, but still control the emotion, pace, and style using text prompts.
- Production-ready: It hits a real-time factor (RTF) of ~0.13, fast enough for real-time streaming, and is completely open-source under the Apache-2.0 license.
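To put the ~0.13 RTF figure in perspective: real-time factor is usually defined as synthesis time divided by the duration of the audio produced, so anything below 1.0 is faster than real time. Here is a minimal sketch of that arithmetic. The function names are made up for illustration and are not part of any VoxCPM2 API, and actual speed will depend on your hardware.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.13) -> float:
    """Rough generation time for a clip, assuming a constant RTF.

    The default of 0.13 is the approximate figure quoted in the post,
    not something measured here.
    """
    return audio_seconds * rtf


if __name__ == "__main__":
    clip_seconds = 10.0
    needed = estimated_synthesis_seconds(clip_seconds)
    print(f"{clip_seconds:.0f} s of audio needs about {needed:.1f} s to generate")
    print(f"That is roughly {1 / 0.13:.1f}x faster than real time")
```

So at that RTF, a 10-second clip would take on the order of 1.3 seconds to generate, which is what leaves headroom for streaming.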
It’s incredibly refreshing to see this level of controllable, high-fidelity audio in the open-source ecosystem, and in such a lightweight package.
Try it here!
<a href