
voice-cloning

Clone a voice from 5-15 seconds of audio using Inworld instant voice cloning. Use the cloned voice ID with any Inworld TTS model.


run with your agent

# install belt

$ curl -fsSL https://cli.inference.sh | sh

# view schema & details

$ belt app get inworld/voice-cloning

# run

$ belt app run inworld/voice-cloning

api reference


1. calling the api

install the client

the client provides a convenient way to interact with the api.

bash

pip install inferencesh

set up your api key

set INFERENCE_API_KEY as an environment variable. get your key from settings → api keys.

bash

export INFERENCE_API_KEY="inf_your_key"
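a script can fail early with a clear message when the key is missing, rather than surfacing an authentication error later. a minimal sketch, assuming only that the key is read from INFERENCE_API_KEY as described above; `load_api_key` is a hypothetical helper and not part of the sdk.

```python
import os


def load_api_key(env=None):
    """Return the key stored in INFERENCE_API_KEY, or raise a clear error.

    Hypothetical helper for illustration; not part of the inferencesh SDK.
    `env` defaults to os.environ and can be replaced by a dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get("INFERENCE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("INFERENCE_API_KEY is not set; export it before running")
    return key
```

calling `load_api_key()` at the top of a script gives an immediate error if the variable was never exported.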

run and get result

submit a request and wait for the final result. best for batch processing or when you don't need progress updates.

python

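from inferencesh import inference

client = inference()

# required inputs: audio and display_name (see the schema in section 5)
result = client.run({
    "app": "inworld/voice-cloning",
    "input": {
        "audio": "https://example.com/audio.mp3",
        "display_name": "my custom voice"
    }
})

print(result["output"])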

stream live updates

get real-time progress updates as the task runs. ideal for showing progress bars, partial results, or long-running tasks.

python

from inferencesh import inference

client = inference()

# stream=True yields updates as they arrive
for update in client.run({
    "app": "inworld/voice-cloning",
    "input": {
        "audio": "https://example.com/audio.mp3",
        "display_name": "my custom voice"
    }
}, stream=True):
    if update.get("progress"):
        print(f"progress: {update['progress']}%")
    if update.get("output"):
        print(f"output: {update['output']}")
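when only the final result matters but progress should still be reported, the stream can be folded into a single return value. a minimal sketch, assuming each update is a dict that may carry "progress" and "output" keys as in the loop above; `last_output` is a hypothetical helper, not an sdk function.

```python
def last_output(updates):
    """Consume an iterable of update dicts and return the last output seen.

    Assumes updates look like the dicts in the streaming example:
    optional "progress" and optional "output" keys.
    """
    output = None
    for update in updates:
        progress = update.get("progress")
        if progress is not None:
            print(f"progress: {progress}%")
        if update.get("output"):
            output = update["output"]
    return output
```

it would be used by passing the streaming call directly, for example `last_output(client.run({...}, stream=True))`.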

2. authentication

the api uses api keys for authentication. see the authentication docs for detailed setup instructions.

3. files

file inputs are automatically handled by the sdk. you can pass local paths, urls, or base64 data.

automatic upload

the python sdk automatically detects local file paths and uploads them. urls are passed through as-is.

python

# local file paths are automatically uploaded
result = client.run({
    "app": "inworld/voice-cloning",
    "input": {
        "audio": "/path/to/local/sample.wav",  # local path: detected & uploaded
        # a url such as "https://example.com/audio.mp3" would be passed through as-is
        "display_name": "my custom voice",
    }
})

manual upload

you can also upload files manually and use the returned url.

python

# upload and get a hosted URL
file = client.files.upload("/path/to/file.png")
print(file.uri)  # https://cloud.inference.sh/...
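because this app expects a 5-15 second sample and cuts longer clips at 15s, a local check before upload may save a wasted request. a minimal sketch using only the python standard library; it handles pcm wav files only (mp3 would need a third-party decoder), and `classify_sample` and its return labels are hypothetical names chosen for illustration.

```python
import wave

# Sample length limits stated for this app.
MIN_SECONDS = 5.0
MAX_SECONDS = 15.0


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample(path):
    """Return "too_short", "ok", or "will_be_cut" for a local WAV sample."""
    seconds = wav_duration_seconds(path)
    if seconds < MIN_SECONDS:
        return "too_short"
    if seconds > MAX_SECONDS:
        return "will_be_cut"
    return "ok"
```

a sample labelled "will_be_cut" is still accepted, but only its first 15 seconds would be used, so trimming it to the best passage beforehand gives more control.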

4. webhooks

get notified when a task completes by providing a webhook url. when the task reaches a terminal state (completed, failed, or cancelled), a POST request is sent to your url with the task result.

python

result = client.run({
    "app": "inworld/voice-cloning",
    "input": {
        "audio": "https://example.com/audio.mp3",
        "display_name": "my custom voice"
    },
    "webhook": "https://your-server.com/webhook"
}, wait=False)

webhook payload

your endpoint receives a JSON POST with the task result:

json

{
  "id": "task_abc123",
  "status": 9,
  "output": { ... },
  "error": "",
  "session_id": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:05Z"
}

id (string): task id

status (number): terminal status (9=completed, 10=failed, 11=cancelled)

output (object): task output (when completed)

error (string): error message (when failed)

session_id (string): session id (if using sessions)

created_at (string): iso timestamp

updated_at (string): iso timestamp
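on the receiving side, the payload fields above can be reduced to a small summary before any further handling. a minimal sketch, assuming the body is the json document shown above and using only the three status codes listed; `summarize_webhook` is a hypothetical helper and any other status code is reported as "unknown".

```python
import json

# Terminal status codes as listed in the payload description.
TERMINAL_STATUSES = {9: "completed", 10: "failed", 11: "cancelled"}


def summarize_webhook(body):
    """Parse a webhook POST body and return (task_id, status_name, detail).

    detail is the output for completed tasks, the error message for
    failed tasks, and None otherwise.
    """
    payload = json.loads(body)
    status = TERMINAL_STATUSES.get(payload.get("status"), "unknown")
    if status == "completed":
        detail = payload.get("output")
    elif status == "failed":
        detail = payload.get("error")
    else:
        detail = None
    return payload.get("id"), status, detail
```

a web framework handler would pass the raw request body to this function and branch on the returned status name.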

5. schema

input

audio (string, file, required)

audio sample of the voice to clone (wav or mp3, 5-15 seconds). longer clips are cut off at 15s.

display_name (string, required)

name for the cloned voice (e.g. 'my custom voice').

language (string)

language of the voice sample.

default: "EN_US"

options: "EN_US", "ZH_CN", "KO_KR", "JA_JP", "RU_RU", "AUTO", "IT_IT", "ES_ES", "PT_BR", "DE_DE", "FR_FR", "AR_SA", "PL_PL", "NL_NL", "HI_IN", "HE_IL"

description (string)

optional description of the voice (e.g. 'warm female narrator with british accent').

remove_background_noise (boolean)

remove background noise from the audio sample before cloning.

default: true

preview_text (string)

optional text to generate a preview with the cloned voice. if provided, returns a preview audio file.

maxLength: 2000
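the input schema above can be checked client-side before a request is submitted. a minimal sketch that mirrors the listed fields, defaults, language options, and the 2000-character limit; `build_input` is a hypothetical helper, and the api remains the authority on what is actually accepted.

```python
# Language options and limit copied from the input schema.
LANGUAGES = {
    "EN_US", "ZH_CN", "KO_KR", "JA_JP", "RU_RU", "AUTO", "IT_IT", "ES_ES",
    "PT_BR", "DE_DE", "FR_FR", "AR_SA", "PL_PL", "NL_NL", "HI_IN", "HE_IL",
}
MAX_PREVIEW_CHARS = 2000


def build_input(audio, display_name, language="EN_US", description=None,
                remove_background_noise=True, preview_text=None):
    """Validate arguments against the schema and return the "input" dict."""
    if not audio:
        raise ValueError("audio is required")
    if not display_name:
        raise ValueError("display_name is required")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if preview_text is not None and len(preview_text) > MAX_PREVIEW_CHARS:
        raise ValueError(f"preview_text exceeds {MAX_PREVIEW_CHARS} characters")

    payload = {
        "audio": audio,
        "display_name": display_name,
        "language": language,
        "remove_background_noise": remove_background_noise,
    }
    # Optional fields are only sent when provided.
    if description:
        payload["description"] = description
    if preview_text:
        payload["preview_text"] = preview_text
    return payload
```

the returned dict would be passed as the "input" value of `client.run`.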

output

display_name (string, required)

display name of the cloned voice

language (string, required)

language code of the cloned voice

preview (string, file)

preview audio generated with the cloned voice (if preview_text was provided)

voice_id (string, required)

the cloned voice id — use this with any inworld tts model
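the output fields can be wrapped in a small typed record so that the voice id is easy to store and reuse. a minimal sketch, assuming the output arrives as a dict with the keys listed above; `ClonedVoice` is a hypothetical class for illustration and not something the sdk provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClonedVoice:
    """Typed view of the output schema (hypothetical, for illustration)."""
    voice_id: str
    display_name: str
    language: str
    preview: Optional[str] = None  # only present when preview_text was sent

    @classmethod
    def from_output(cls, output):
        """Build a ClonedVoice from the task output dict."""
        required = ("voice_id", "display_name", "language")
        missing = [key for key in required if key not in output]
        if missing:
            raise KeyError(f"output is missing fields: {missing}")
        return cls(
            voice_id=output["voice_id"],
            display_name=output["display_name"],
            language=output["language"],
            preview=output.get("preview"),
        )
```

for example, `ClonedVoice.from_output(result["output"]).voice_id` would give the id to pass on to an inworld tts model.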
