Scenema Audio

Zero-shot expressive voice cloning and speech generation. Describe how a voice sounds and feels, write what it should say, and the model generates a full vocal performance.

Built by Scenema AI, the AI filmmaking platform. GitHub | Demos & Samples

Voice Description

Speech Text

Language

The model has only been tested with a limited set of languages. The language tag here is used for Whisper validation.

Scene (optional)

Shot Mode

Seed

Pace

Range: 0.5 to 3

Background SFX

Whisper Validation

Skip Voice Conversion
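The inputs above can be pictured as a single request object. The sketch below is illustrative only and is not Scenema Audio's actual API: the class name, field names, types, and defaults are all assumptions. The 0.5 and 3 values listed under Pace are read here as the slider's minimum and maximum.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed bounds, taken from the two values shown under the Pace field.
PACE_MIN = 0.5
PACE_MAX = 3.0


@dataclass
class GenerationRequest:
    """Hypothetical container mirroring the form fields; not an official schema."""

    voice_description: str               # how the voice sounds and feels
    speech_text: str                     # what the voice should say
    language: str                        # tag used for Whisper validation
    scene: Optional[str] = None          # optional scene context
    shot_mode: Optional[str] = None      # value type is an assumption
    seed: int = 0                        # default is an assumption
    pace: float = 1.0                    # default is an assumption
    background_sfx: bool = False         # assumed to be a toggle
    whisper_validation: bool = True      # assumed to be a toggle
    skip_voice_conversion: bool = False  # assumed to be a toggle

    def __post_init__(self) -> None:
        # The two descriptive fields drive generation, so reject empty values.
        if not self.voice_description.strip():
            raise ValueError("voice_description must not be empty")
        if not self.speech_text.strip():
            raise ValueError("speech_text must not be empty")
        # Keep pace inside the assumed slider range.
        if not PACE_MIN <= self.pace <= PACE_MAX:
            raise ValueError(f"pace must be between {PACE_MIN} and {PACE_MAX}")
```

Checking the values up front like this means an out-of-range pace or an empty description fails before any generation is attempted.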

Output

Metadata

Generated XML

Preset Prompts