Audio Transcription

Convert speech to text with Whisper.

Using the Grid

Go to Apps
Find whisper
Upload an audio file
Click Run

Supported formats: MP3, WAV, M4A, FLAC, OGG
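Before uploading, you may want to skip files the app won't accept. The sketch below does a minimal client-side check. It assumes the file extension reflects the actual format. `is_supported_audio` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path

# Formats listed as supported for the whisper app.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a supported format.

    Only the file name is inspected; the file is not opened, so this
    does not prove the contents are valid audio.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("interview.MP3"))  # True
print(is_supported_audio("notes.txt"))      # False
```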

With Python SDK

```python
from inferencesh import inference

client = inference(api_key="inf_your_key")

result = client.run({
    "app": "infsh/whisper",
    "input": {
        "audio": "/path/to/audio.mp3"  # Uploaded automatically
    }
})

print(result["output"]["text"])
```
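To transcribe several files, you can repeat the same `client.run` call in a loop. The sketch below assumes the calls are made one after another and are independent. `transcribe_all` is a hypothetical helper. Passing `client.run` in as a parameter makes it easy to test with a stand-in function.

```python
from typing import Callable, Dict, Iterable


def transcribe_all(run: Callable[[dict], dict], paths: Iterable[str]) -> Dict[str, str]:
    """Transcribe several files sequentially.

    `run` should behave like `client.run`: it takes the request dict and
    returns a result with the text under result["output"]["text"].
    """
    transcripts = {}
    for path in paths:
        result = run({
            "app": "infsh/whisper",
            "input": {"audio": path},
        })
        transcripts[path] = result["output"]["text"]
    return transcripts


# Usage with the real client:
#   transcripts = transcribe_all(client.run, ["ep1.mp3", "ep2.mp3"])
```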

With Timestamps

Get word-level or segment-level timestamps:

```python
result = client.run({
    "app": "infsh/whisper",
    "input": {
        "audio": "/path/to/audio.mp3",
        "timestamps": "word"  # or "segment"
    }
})

for segment in result["output"]["segments"]:
    print(f"[{segment['start']:.1f}s] {segment['text']}")
```

Output:

```
[0.0s] Welcome to today's episode.
[2.1s] We're going to discuss AI agents.
[4.5s] Let's get started.
```
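For long recordings, raw seconds become hard to read. The sketch below formats segments as an HH:MM:SS transcript. It assumes each segment is a dict with the `start` (seconds) and `text` keys used in the loop above. Both helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS, dropping fractional seconds."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_transcript(segments: list) -> str:
    """One line per segment, using only the 'start' and 'text' keys."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


# Sample segments taken from the example output above.
sample = [
    {"start": 0.0, "text": "Welcome to today's episode."},
    {"start": 2.1, "text": "We're going to discuss AI agents."},
    {"start": 4.5, "text": "Let's get started."},
]
print(segments_to_transcript(sample))
```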

With Translation

Translate audio to English:

```python
result = client.run({
    "app": "infsh/whisper",
    "input": {
        "audio": "/path/to/french_audio.mp3",
        "task": "translate"  # Translates to English
    }
})
```
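If you need both the original-language text and the English translation, you can make two requests that differ only in `task`. The sketch below assumes the translated text is returned under `result["output"]["text"]`, as in the transcription example. The page does not show the translation output, so this is an assumption. `transcribe_and_translate` is a hypothetical helper.

```python
from typing import Callable, Dict


def transcribe_and_translate(run: Callable[[dict], dict], path: str) -> Dict[str, str]:
    """Return the source-language transcript and the English translation.

    Makes two separate requests to the whisper app, one per task.
    Assumes both results carry their text under result["output"]["text"].
    """
    texts = {}
    for task in ("transcribe", "translate"):
        result = run({
            "app": "infsh/whisper",
            "input": {"audio": path, "task": task},
        })
        texts[task] = result["output"]["text"]
    return texts


# Usage with the real client:
#   texts = transcribe_and_translate(client.run, "/path/to/french_audio.mp3")
#   texts["transcribe"] -> original French, texts["translate"] -> English
```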

With Agent

Give an agent whisper as a tool, and it can transcribe audio when asked:

```
You: Transcribe this podcast episode

Agent: I'll transcribe that audio for you.
       [Calling whisper...]

       Here's the transcription:

       "Welcome to the AI Weekly podcast. Today we're
       discussing the latest developments in generative AI..."

       The full transcription is 2,450 words. Would you like
       me to summarize it?
```

Model Selection

Use **setup parameters** to choose a specific Whisper model size. Larger models are generally more accurate but slower.

```python
client.run({
    "app": "infsh/whisper",
    "setup": {
        "model_size": "large-v3"  # tiny, base, small, medium, large, large-v3
    },
    "input": { ... }
})
```
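The sketch below assembles a complete request with both `setup` and `input` and rejects a model size that is not in the list above. It treats that list as complete, which is an assumption. `build_request` is a hypothetical helper. Its default of `"large-v3"` simply mirrors the example; the app's own default is not documented here.

```python
# Model sizes listed in the example above.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large", "large-v3")


def build_request(audio: str, model_size: str = "large-v3", **input_options) -> dict:
    """Assemble a whisper request with setup parameters and input.

    Raises ValueError for an unknown model size, so typos are caught
    before the request is sent.
    """
    if model_size not in MODEL_SIZES:
        raise ValueError(
            f"unknown model_size {model_size!r}; expected one of {MODEL_SIZES}"
        )
    return {
        "app": "infsh/whisper",
        "setup": {"model_size": model_size},
        "input": {"audio": audio, **input_options},
    }


# Usage with the real client:
#   result = client.run(build_request("/path/to/audio.mp3", model_size="small"))
```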

Options

| Option | Values | Description |
|---|---|---|
| `task` | `transcribe`, `translate` | Transcribe in the spoken language, or translate to English |
| `language` | `en`, `es`, `fr`, etc. | Set the spoken language instead of relying on auto-detection |
| `timestamps` | `none`, `segment`, `word` | Timestamp granularity |
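The sketch below builds the `input` dict and checks option values against the table before sending. It validates only the values listed above. Option names not in the table pass through unchecked, in case the app accepts others. `whisper_input` is a hypothetical helper.

```python
# Allowed values from the options table. `language` takes language
# codes such as "en", "es", "fr", so its value is not checked here.
ALLOWED_VALUES = {
    "task": {"transcribe", "translate"},
    "timestamps": {"none", "segment", "word"},
}


def whisper_input(audio: str, **options) -> dict:
    """Build the `input` dict, rejecting values the table does not list."""
    for name, allowed in ALLOWED_VALUES.items():
        if name in options and options[name] not in allowed:
            raise ValueError(
                f"{name}={options[name]!r}; expected one of {sorted(allowed)}"
            )
    return {"audio": audio, **options}


# French audio, translated to English, with segment-level timestamps.
inp = whisper_input(
    "/path/to/french_audio.mp3",
    task="translate",
    language="fr",
    timestamps="segment",
)
# Pass it as the input: client.run({"app": "infsh/whisper", "input": inp})
```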

Tips

For better accuracy:

Use high-quality audio
Remove background noise if possible
Set `language` if you know it, instead of relying on auto-detection
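Whisper models work on 16 kHz audio internally, so recordings sampled below that rate carry less detail than the model can use. The sketch below checks a WAV file header for a low sample rate or 8-bit samples. The thresholds are assumptions: 16 kHz matches Whisper's input rate, and the 16-bit preference is a rule of thumb. It uses the standard-library `wave` module, so it handles only uncompressed PCM WAV, not MP3, M4A, FLAC or OGG. `check_wav_quality` is a hypothetical helper.

```python
import os
import tempfile
import wave


def check_wav_quality(path: str, min_rate: int = 16_000) -> list:
    """Return warnings about a WAV file's sample rate and bit depth.

    Reads only the header. The thresholds are heuristics, not app limits.
    """
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
    if rate < min_rate:
        warnings.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if width < 2:
        warnings.append(f"{8 * width}-bit samples; 16-bit or higher is preferable")
    return warnings


# Demo: write one second of 8 kHz, 8-bit mono silence and check it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(1)              # 8-bit samples
    wav.setframerate(8_000)          # telephone-quality sample rate
    wav.writeframes(b"\x80" * 8_000)  # 128 is silence for unsigned 8-bit
print(check_wav_quality(demo_path))
```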

For long files:

Files up to roughly 2 hours long are supported
Very long files are split into chunks automatically
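The sketch below checks a recording against the ~2 hour guideline before uploading. It reads the duration from a WAV header with the standard-library `wave` module, so it assumes uncompressed PCM WAV input. Other formats would need a different reader. The helper names and the `MAX_SECONDS` constant are hypothetical.

```python
import os
import tempfile
import wave

MAX_SECONDS = 2 * 60 * 60  # the ~2 hour guideline from the tips


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_length_guideline(path: str, limit: float = MAX_SECONDS) -> bool:
    """True if the recording is no longer than `limit` seconds."""
    return wav_duration_seconds(path) <= limit


# Demo: a 3-second silent 16-bit mono WAV at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "short.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16_000)
    wav.writeframes(b"\x00\x00" * 16_000 * 3)
print(wav_duration_seconds(demo_path))     # 3.0
print(within_length_guideline(demo_path))  # True
```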

Next

→ Content Pipeline
