Your app's logic lives in an entrypoint file — inference.py (Python) or src/inference.js (Node.js).
Structure
```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    # Define inputs here
    pass

class AppOutput(BaseAppOutput):
    # Define outputs here
    pass

class App(BaseApp):
    async def setup(self, config: AppSetup):
        # Runs once when worker starts or config changes
        pass

    async def run(self, input_data: AppInput, metadata) -> AppOutput:
        # Runs for each request
        pass

    async def unload(self):
        # Runs on shutdown
        pass
```

Defining inputs
```python
class AppInput(BaseAppInput):
    prompt: str = Field(description="What to generate")
    style: str = Field(default="modern", description="Style to use")
    count: int = Field(default=1, description="How many to generate")
```

Field types
| Type | Python | Node.js (Zod) |
|---|---|---|
| Text | str | z.string() |
| Number | int / float | z.number() |
| Boolean | bool | z.boolean() |
| File | File | z.object({ path: z.string() }) |
| Optional | Optional[T] | z.string().optional() |
| Array | List[T] | z.array(z.string()) |
| Enum | Literal["a", "b"] | z.enum(["a", "b"]) |
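
As a rough illustration of the Python column, the sketch below combines several of these types in one input model. It uses plain pydantic BaseModel as a stand-in for BaseAppInput so that it runs without the SDK, and the class name ExampleInput and its field names are hypothetical. File is left out because it comes from the SDK.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ExampleInput(BaseModel):  # stand-in for BaseAppInput (assumption)
    prompt: str = Field(description="Text input")
    count: int = Field(default=1, description="Number input")
    upscale: bool = Field(default=False, description="Boolean input")
    negative_prompt: Optional[str] = Field(default=None, description="Optional input")
    tags: List[str] = Field(default_factory=list, description="Array input")
    mode: Literal["fast", "quality"] = Field(default="fast", description="Enum input")


example = ExampleInput(prompt="a red bicycle", tags=["photo"], mode="quality")

# Values outside the Literal choices are rejected at validation time.
try:
    ExampleInput(prompt="a red bicycle", mode="turbo")
    rejected = False
except ValidationError:
    rejected = True
```

Fields without a default (here only prompt) are required; everything else can be omitted by the caller.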
LLM chat inputs (Python)
Model apps that accept chat-style prompts can use SDK types from inferencesh.models instead of defining every field by hand.
New apps (v0.7.9+): subclass ChatInput and put sampling parameters in model_settings:
```python
from inferencesh.models import ChatInput, ModelSettings, LLMOutput

class AppInput(ChatInput):
    pass

# Callers pass JSON like:
# {"text": "...", "model_settings": {"temperature": 0.7, "max_tokens": 1024}}
```

ModelSettings fields are all optional. Omitted fields are not sent to the provider, so each model app can define its own defaults.
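
One way an app might combine its own defaults with caller-supplied settings is sketched below. SettingsStandIn is a hypothetical dataclass standing in for ModelSettings (its field list is an assumption), and provider_kwargs is an illustrative helper, not part of the SDK.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SettingsStandIn:
    """Hypothetical stand-in for ModelSettings; every field is optional."""
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    max_tokens: Optional[int] = None


def provider_kwargs(
    settings: Optional[SettingsStandIn], app_defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """Start from the app's defaults, then overlay only the fields the caller set."""
    kwargs = dict(app_defaults)
    if settings is not None:
        kwargs.update({k: v for k, v in asdict(settings).items() if v is not None})
    return kwargs


# The caller sets only temperature; max_tokens comes from the app default
# and top_p is never included in the provider call.
kwargs = provider_kwargs(SettingsStandIn(temperature=0.7), {"max_tokens": 512})
```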
Existing apps: LLMInput still exposes flat temperature, top_p, and max_tokens on the input object. Migrate to ChatInput when you want a single nested object in the generated schema.
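
For callers moving from the flat shape to the nested one, the payload change can be sketched as a small dictionary transform. The helper name to_nested_payload is hypothetical, and the sketch assumes the nested object accepts the same three field names that the flat input uses.

```python
from typing import Any, Dict

# Flat sampling fields named in the text above.
FLAT_SAMPLING_FIELDS = ("temperature", "top_p", "max_tokens")


def to_nested_payload(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Move flat sampling fields into a nested model_settings object."""
    nested = {k: v for k, v in flat.items() if k not in FLAT_SAMPLING_FIELDS}
    settings = {k: flat[k] for k in FLAT_SAMPLING_FIELDS if k in flat}
    if settings:  # leave model_settings out entirely when nothing was set
        nested["model_settings"] = settings
    return nested


payload = to_nested_payload({"text": "Hello", "temperature": 0.7, "max_tokens": 1024})
```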
To add only model_settings to a custom input class, mix in ModelSettingsCapabilityMixin from inferencesh.models. Image, file, reasoning, and tool-calling fields use the other capability mixins in inferencesh.models.llm.
→ Python SDK: App authoring (LLM inputs)
Defining outputs
```python
class AppOutput(BaseAppOutput):
    result: str = Field(description="Generated text")
    image: File = Field(description="Generated image")
```

The run method
This is where your logic goes:
```python
async def run(self, input_data: AppInput, metadata) -> AppOutput:
    # Log progress
    metadata.log("Processing...")

    # Do work
    result = process(input_data.prompt)

    # Return output
    return AppOutput(result=result)
```

Setup for models
Load heavy resources in setup. Use setup schemas to define configurable parameters:
```python
class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")
    precision: str = Field(default="fp16", description="Model precision")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        from transformers import AutoModel
        self.model = AutoModel.from_pretrained(config.model_id)
```

This runs once per configuration. If setup values change between requests, the app re-initializes.
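
The once-per-configuration behavior can be pictured with a small stand-in for the worker loop. This is only a sketch of the idea, not the platform's actual scheduling code: SetupTracker, ensure_setup, and the model IDs used are hypothetical.

```python
import asyncio
from typing import Any, Dict, Optional


class SetupTracker:
    """Hypothetical stand-in for the worker logic that decides when to call setup."""

    def __init__(self) -> None:
        self.active_config: Optional[Dict[str, Any]] = None
        self.setup_calls = 0

    async def ensure_setup(self, config: Dict[str, Any]) -> None:
        # Re-run setup only when the requested configuration differs.
        if config != self.active_config:
            self.setup_calls += 1  # a real app would load its model here
            self.active_config = dict(config)


async def demo() -> int:
    tracker = SetupTracker()
    await tracker.ensure_setup({"model_id": "gpt2"})
    await tracker.ensure_setup({"model_id": "gpt2"})        # same config: no reload
    await tracker.ensure_setup({"model_id": "distilgpt2"})  # changed: setup runs again
    return tracker.setup_calls


setup_calls = asyncio.run(demo())
```

Three requests arrive, but setup runs only for the first request and for the one that changes model_id.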
Multi-function apps
Apps can expose multiple functions, each with its own input and output types.
```python
from pydantic import BaseModel

class GreetInput(BaseModel):
    name: str = "World"

class GreetOutput(BaseModel):
    message: str

class ReverseInput(BaseModel):
    text: str

class ReverseOutput(BaseModel):
    reversed_text: str

class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])
```

Python: Functions are discovered automatically if they have type hints using Pydantic models.
Node.js: Functions are discovered automatically if matching {PascalName}Input and {PascalName}Output Zod schemas are exported.
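
To make the Python rule concrete, here is a minimal sketch of how type-hint-based discovery could work. It is an assumption about the general technique, not the SDK's actual discovery code: Model is a stand-in for pydantic's BaseModel so the example needs only the standard library, and discover_functions is a hypothetical helper.

```python
import inspect
from typing import Dict, Tuple, get_type_hints


class Model:
    """Stand-in for pydantic.BaseModel, used only to mark schema classes."""


class GreetInput(Model): ...
class GreetOutput(Model): ...
class ReverseInput(Model): ...
class ReverseOutput(Model): ...


class App:
    async def run(self, input_data: GreetInput) -> GreetOutput: ...
    async def reverse(self, input_data: ReverseInput) -> ReverseOutput: ...
    async def unload(self): ...  # no model type hints, so it is skipped


def discover_functions(app_cls: type) -> Dict[str, Tuple[type, type]]:
    """Map method name -> (input type, output type) for fully annotated methods."""
    found = {}
    for name, fn in inspect.getmembers(app_cls, inspect.isfunction):
        hints = get_type_hints(fn)
        in_type, out_type = hints.get("input_data"), hints.get("return")
        if (inspect.isclass(in_type) and issubclass(in_type, Model)
                and inspect.isclass(out_type) and issubclass(out_type, Model)):
            found[name] = (in_type, out_type)
    return found


functions = discover_functions(App)
```

Only run and reverse are picked up here; unload has no model-typed input or return annotation, so it is not exposed as a callable function.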
Calling functions
```bash
curl -X POST https://api.inference.sh/apps/run \
  -H "Authorization: Bearer inf_your_key" \
  -d '{"app": "namespace/name", "function": "reverse", "input": {"text": "hello"}}'
```

Working with files
Input files are downloaded for you:
```python
image_path = input_data.image.path
```

Output files are uploaded for you:
```python
return AppOutput(image=File(path="/tmp/output.png"))
```

LLM chat inputs (Python)
For text and chat model apps, use ChatInput and ModelSettings from inferencesh.models (SDK v0.7.9+). Sampling parameters live in a nested model_settings object instead of flat fields on the input schema.
```python
from inferencesh import BaseApp
from inferencesh.models import ChatInput, LLMOutput

class MyInput(ChatInput):
    pass

class App(BaseApp):
    async def run(self, input_data: MyInput, metadata) -> LLMOutput:
        settings = input_data.model_settings
        temperature = settings.temperature if settings else None
        ...
```

Callers can override generation per request:
```json
{
  "text": "Summarize this release",
  "model_settings": {
    "temperature": 0.3,
    "max_tokens": 1024
  }
}
```

| Type | When to use |
|---|---|
| ChatInput | New chat/LLM apps: nested model_settings, context, system_prompt |
| LLMInput / BaseLLMInput | Legacy apps with flat temperature, top_p, max_tokens on the input |
| ModelSettingsCapabilityMixin | Custom inputs that need model_settings without full ChatInput |
→ Python SDK: App authoring (LLM inputs)