Convert speech to text with Whisper.
## Using the Grid

- Go to Apps
- Find `whisper`
- Upload an audio file
- Click Run

Supported formats: MP3, WAV, M4A, FLAC, OGG
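Checking the extension locally before uploading can catch an unsupported file early. The sketch below is a minimal illustration; the helper name `is_supported_audio` is hypothetical and not part of the `inferencesh` SDK, and it inspects only the file extension, not the file contents.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_supported_audio(path):
    """Return True if the file extension matches a supported format.

    Hypothetical helper: it checks only the extension (case-insensitive),
    so a mislabeled file would still pass.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["episode.mp3", "INTERVIEW.WAV", "notes.txt"]:
        print(name, is_supported_audio(name))
```

A check like this is only a convenience; the service remains the authority on what it accepts.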
## With Python SDK

```python
from inferencesh import inference

client = inference(api_key="inf_your_key")

result = client.run({
    "app": "infsh/whisper",
    "input": {
        "audio": "/path/to/audio.mp3"  # Uploaded automatically
    }
})

print(result["output"]["text"])
```

## With Timestamps
Get word-level or segment-level timestamps:
```python
result = client.run({
    "app": "infsh/whisper",
    "input": {
        "audio": "/path/to/audio.mp3",
        "timestamps": "word"  # or "segment"
    }
})

for segment in result["output"]["segments"]:
    print(f"[{segment['start']:.1f}s] {segment['text']}")
```

Output:
```
[0.0s] Welcome to today's episode.
[2.1s] We're going to discuss AI agents.
[4.5s] Let's get started.
```

## With Translation
Translate audio to English:
```python
result = client.run({
    "app": "infsh/whisper",
    "input": {
        "audio": "/path/to/french_audio.mp3",
        "task": "translate"  # Translates to English
    }
})
```

## With Agent
Create an agent with whisper as a tool:
```
You: Transcribe this podcast episode

Agent: I'll transcribe that audio for you.
       [Calling whisper...]

       Here's the transcription:

       "Welcome to the AI Weekly podcast. Today we're
       discussing the latest developments in generative AI..."

       The full transcription is 2,450 words. Would you like
       me to summarize it?
```

---

## Model Selection

Use **setup parameters** to choose a specific Whisper model size:

```python
client.run({
    "app": "infsh/whisper",
    "setup": {
        "model_size": "large-v3"  # tiny, base, small, medium, large, large-v3
    },
    "input": { ... }
})
```

## Options
| Option | Values | Description |
|---|---|---|
| `task` | `transcribe`, `translate` | Transcribe, or translate to English |
| `language` | `en`, `es`, `fr`, etc. | Force a specific language instead of relying on detection |
| `timestamps` | `none`, `segment`, `word` | Timestamp granularity |
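To show how these options fit together, here is a minimal sketch that assembles the `"input"` dictionary and rejects values not listed in the table. The helper `build_whisper_input` is hypothetical, not part of the SDK. It leaves out any option you do not set, on the assumption that the app then applies its own defaults, which are not documented here.

```python
# Allowed values copied from the options table above.
TASKS = {"transcribe", "translate"}
TIMESTAMPS = {"none", "segment", "word"}


def build_whisper_input(audio, task=None, language=None, timestamps=None):
    """Assemble the "input" dict, validating options against the table.

    Hypothetical helper. Options left as None are omitted from the result.
    """
    if task is not None and task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}, got {task!r}")
    if timestamps is not None and timestamps not in TIMESTAMPS:
        raise ValueError(
            f"timestamps must be one of {sorted(TIMESTAMPS)}, got {timestamps!r}"
        )

    payload = {"audio": audio}
    if task is not None:
        payload["task"] = task
    if language is not None:
        payload["language"] = language  # e.g. "en", "es", "fr"
    if timestamps is not None:
        payload["timestamps"] = timestamps
    return payload


if __name__ == "__main__":
    print(build_whisper_input("/path/to/audio.mp3", language="fr", timestamps="segment"))
```

The returned dictionary can be passed as the `"input"` value in a `client.run({...})` call. Whether every combination of options is accepted by the app is not covered here, so treat the validation as a first-pass check only.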
## Tips

For better accuracy:

- Use high-quality audio
- Remove background noise if possible
- Specify language if known

For long files:

- Whisper handles files up to ~2 hours
- Very long files are automatically chunked
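If you want to know in advance whether a recording is near the length limit, you can measure it locally. The sketch below is one possible approach using Python's standard `wave` module, which reads uncompressed WAV files only; MP3, M4A, FLAC, and OGG would need a different tool. The helper names are hypothetical, and the two-hour threshold is the approximate figure from the tips above, not an exact limit.

```python
import wave

# Approximate limit taken from the "~2 hours" tip above.
MAX_SECONDS = 2 * 60 * 60


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_length_limit(path, max_seconds=MAX_SECONDS):
    """Return True if the WAV file is no longer than max_seconds."""
    return wav_duration_seconds(path) <= max_seconds
```

For example, `within_length_limit("/path/to/audio.wav")` returns `False` for a three-hour recording, which tells you the file exceeds the approximate limit before you upload it.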