Your app's logic lives in an entrypoint file: `inference.py` (Python) or `src/inference.js` (Node.js).

Structure

```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    # Define inputs here
    pass

class AppOutput(BaseAppOutput):
    # Define outputs here
    pass

class App(BaseApp):
    async def setup(self, config: AppSetup):
        # Runs once when worker starts or config changes
        pass

    async def run(self, input_data: AppInput, metadata) -> AppOutput:
        # Runs for each request
        pass

    async def unload(self):
        # Runs on shutdown
        pass
```

Defining inputs

```python
class AppInput(BaseAppInput):
    prompt: str = Field(description="What to generate")
    style: str = Field(default="modern", description="Style to use")
    count: int = Field(default=1, description="How many to generate")
```

Field types

| Type | Python | Node.js (Zod) |
| --- | --- | --- |
| Text | `str` | `z.string()` |
| Number | `int` / `float` | `z.number()` |
| Boolean | `bool` | `z.boolean()` |
| File | `File` | `z.object({ path: z.string() })` |
| Optional | `Optional[T]` | `z.string().optional()` |
| Array | `List[T]` | `z.array(z.string())` |
| Enum | `Literal["a", "b"]` | `z.enum(["a", "b"])` |
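As a rough illustration of how the Python column combines in one schema, the sketch below uses a plain Pydantic `BaseModel` as a stand-in for `BaseAppInput` so that it runs without the SDK. It assumes `BaseAppInput` behaves like a Pydantic model; the `ExampleInput` class, its field names, and the `validate_example` helper are hypothetical.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


# Hypothetical schema; BaseModel stands in for BaseAppInput here.
class ExampleInput(BaseModel):
    prompt: str = Field(description="Text field")
    steps: int = Field(default=20, description="Number field")
    upscale: bool = Field(default=False, description="Boolean field")
    negative_prompt: Optional[str] = Field(default=None, description="Optional field")
    tags: List[str] = Field(default_factory=list, description="Array field")
    style: Literal["modern", "classic"] = Field(default="modern", description="Enum field")


def validate_example(payload: dict) -> Optional[ExampleInput]:
    """Return a parsed model, or None if the payload fails validation."""
    try:
        return ExampleInput(**payload)
    except ValidationError:
        return None
```

Fields without a default (here only `prompt`) are required, and a value outside the `Literal` choices is rejected during validation.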

Defining outputs

```python
class AppOutput(BaseAppOutput):
    result: str = Field(description="Generated text")
    image: File = Field(description="Generated image")
```

The run method

This is where your logic goes:

```python
async def run(self, input_data: AppInput, metadata) -> AppOutput:
    # Log progress
    metadata.log("Processing...")

    # Do work
    result = process(input_data.prompt)

    # Return output
    return AppOutput(result=result)
```
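For a quick local check, `run` can be driven with a stand-in for `metadata`. The sketch below assumes `metadata` only needs a `log` method. `FakeMetadata`, `DemoApp`, `run_once`, and the dataclasses replacing the SDK base classes are all hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DemoInput:  # stand-in for an AppInput subclass
    prompt: str


@dataclass
class DemoOutput:  # stand-in for an AppOutput subclass
    result: str


@dataclass
class FakeMetadata:
    """Hypothetical stand-in that records log messages instead of sending them."""
    messages: List[str] = field(default_factory=list)

    def log(self, message: str) -> None:
        self.messages.append(message)


class DemoApp:
    async def run(self, input_data: DemoInput, metadata) -> DemoOutput:
        metadata.log("Processing...")
        result = input_data.prompt.upper()  # placeholder for real work
        return DemoOutput(result=result)


def run_once(prompt: str) -> Tuple[DemoOutput, List[str]]:
    """Run the app a single time and return (output, logged messages)."""
    metadata = FakeMetadata()
    output = asyncio.run(DemoApp().run(DemoInput(prompt=prompt), metadata))
    return output, metadata.messages
```

Calling `run_once("hello")` returns the output object together with the list of messages the method logged, which makes the request path easy to assert on in a unit test.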

Setup for models

Load heavy resources in setup. Use setup schemas to define configurable parameters:

```python
class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")
    precision: str = Field(default="fp16", description="Model precision")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        from transformers import AutoModel
        self.model = AutoModel.from_pretrained(config.model_id)
```

This runs once per configuration. If setup values change between requests, the app re-initializes.
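The once-per-configuration behavior can be pictured as a worker that remembers the last setup values and calls `setup` again only when they differ. This is a sketch of the idea, not the platform's actual worker. `Worker`, `CountingApp`, and the dict-based config are hypothetical.

```python
import asyncio
from typing import Optional


class CountingApp:
    """Hypothetical app that counts how often setup runs."""

    def __init__(self) -> None:
        self.setup_calls = 0
        self.model_id: Optional[str] = None

    async def setup(self, config: dict) -> None:
        self.setup_calls += 1
        self.model_id = config["model_id"]  # a real app would load the model here

    async def run(self, prompt: str) -> str:
        return f"{self.model_id}: {prompt}"


class Worker:
    """Re-runs setup only when the setup values change between requests."""

    def __init__(self, app: CountingApp) -> None:
        self.app = app
        self._active_config: Optional[dict] = None

    async def handle(self, config: dict, prompt: str) -> str:
        if config != self._active_config:
            await self.app.setup(config)
            self._active_config = dict(config)  # copy so later mutation is detected
        return await self.app.run(prompt)


async def demo() -> int:
    worker = Worker(CountingApp())
    await worker.handle({"model_id": "gpt2"}, "a")
    await worker.handle({"model_id": "gpt2"}, "b")        # same config: setup skipped
    await worker.handle({"model_id": "distilgpt2"}, "c")  # changed: setup runs again
    return worker.app.setup_calls
```

Across the three requests in `demo`, `setup` runs twice: once for the first configuration and once when `model_id` changes.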

Multi-function apps

Apps can expose multiple functions, each with its own input and output types.

```python
from pydantic import BaseModel

class GreetInput(BaseModel):
    name: str = "World"

class GreetOutput(BaseModel):
    message: str

class ReverseInput(BaseModel):
    text: str

class ReverseOutput(BaseModel):
    reversed_text: str

class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])
```

Python: Functions are discovered automatically if they have type hints using Pydantic models.

Node.js: Functions are discovered automatically if matching {PascalName}Input and {PascalName}Output Zod schemas are exported.
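One way to picture the Python discovery rule is a scan over an app's public async methods that keeps those whose `input_data` parameter and return value are both annotated with Pydantic models. This is a guess at the mechanism, not the SDK's implementation. `discover_functions` and `_is_model` are hypothetical helpers, and the `helper` method is added only to show a method that would be skipped.

```python
import inspect
from typing import Dict, Tuple, get_type_hints

from pydantic import BaseModel


class GreetInput(BaseModel):
    name: str = "World"


class GreetOutput(BaseModel):
    message: str


class ReverseInput(BaseModel):
    text: str


class ReverseOutput(BaseModel):
    reversed_text: str


class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])

    async def helper(self, value: int) -> int:  # skipped: no Pydantic model hints
        return value


def _is_model(tp) -> bool:
    """True if tp is a class derived from pydantic.BaseModel."""
    return inspect.isclass(tp) and issubclass(tp, BaseModel)


def discover_functions(app_cls) -> Dict[str, Tuple[type, type]]:
    """Map method name -> (input model, output model) for qualifying methods."""
    found = {}
    for name, method in inspect.getmembers(app_cls, inspect.iscoroutinefunction):
        if name.startswith("_"):
            continue
        hints = get_type_hints(method)
        in_type, out_type = hints.get("input_data"), hints.get("return")
        if _is_model(in_type) and _is_model(out_type):
            found[name] = (in_type, out_type)
    return found
```

Under these assumptions, `discover_functions(App)` picks up `run` and `reverse` and ignores `helper`.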

Calling functions

```bash
curl -X POST https://api.inference.sh/v1/apps/{app_id}/run \
  -d '{"function": "reverse", "input": {"text": "hello"}}'
```
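The same call can be assembled from Python with the standard library. The sketch below only builds the request and does not send it. The `Content-Type` header and the lack of authentication are assumptions, since the curl command shows neither. `build_call` is a hypothetical helper and `my-app-id` is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://api.inference.sh/v1"  # taken from the curl example


def build_call(app_id: str, function: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request selecting one app function."""
    body = json.dumps({"function": function, "input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/apps/{app_id}/run",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not in the original
        method="POST",
    )


request = build_call("my-app-id", "reverse", {"text": "hello"})  # placeholder app id
# To send it: urllib.request.urlopen(request), once any required credentials are added.
```

The `function` key selects which method handles the request, and `input` carries the fields of that method's input model.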

Working with files

Input files are downloaded for you:

```python
image_path = input_data.image.path
```

Output files are uploaded for you:

```python
return AppOutput(image=File(path="/tmp/output.png"))
```
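Putting the two halves together, a function would typically read from the downloaded input path and write its result to a new local path for upload. The sketch below uses a hypothetical `LocalFile` dataclass in place of the SDK's `File` and simply copies bytes. It assumes only that a file object exposes `path`.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass
class LocalFile:
    """Hypothetical stand-in for the SDK's File; only `path` is modeled."""
    path: str


def process_file(image: LocalFile) -> LocalFile:
    """Read the (already downloaded) input and write a new local output file."""
    suffix = os.path.splitext(image.path)[1]
    fd, out_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # mkstemp returns an open descriptor; close it before copying
    shutil.copyfile(image.path, out_path)  # placeholder for real image processing
    return LocalFile(path=out_path)
```

Writing to a fresh temporary path rather than a fixed name such as `/tmp/output.png` avoids collisions if several requests are processed on the same machine.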
