
App Code

Your app's logic lives in an entrypoint file — inference.py (Python) or src/inference.js (Node.js).


Structure

```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    # Define inputs here
    pass

class AppOutput(BaseAppOutput):
    # Define outputs here
    pass

class App(BaseApp):
    async def setup(self, config: AppSetup):
        # Runs once when worker starts or config changes
        pass

    async def run(self, input_data: AppInput, metadata) -> AppOutput:
        # Runs for each request
        pass

    async def unload(self):
        # Runs on shutdown
        pass
```
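The order in which the three methods fire can be illustrated with a small standalone sketch. It does not use the SDK: the dataclasses stand in for the BaseAppInput and BaseAppOutput subclasses, and `drive` is a hypothetical helper that only imitates what the skeleton's comments describe (setup once, run per request, unload on shutdown).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AppSetup:  # stand-in for a BaseAppInput subclass
    model_id: str = "gpt2"


@dataclass
class AppInput:  # stand-in for a BaseAppInput subclass
    prompt: str = ""


@dataclass
class AppOutput:  # stand-in for a BaseAppOutput subclass
    result: str = ""


class App:  # stand-in for a BaseApp subclass
    def __init__(self):
        self.events = []

    async def setup(self, config: AppSetup):
        self.events.append(f"setup:{config.model_id}")

    async def run(self, input_data: AppInput, metadata=None) -> AppOutput:
        self.events.append(f"run:{input_data.prompt}")
        return AppOutput(result=input_data.prompt.upper())

    async def unload(self):
        self.events.append("unload")


async def drive(app: App, prompts):
    """Hypothetical driver: setup once, run per request, unload at the end."""
    await app.setup(AppSetup())
    outputs = [await app.run(AppInput(prompt=p)) for p in prompts]
    await app.unload()
    return outputs


app = App()
outputs = asyncio.run(drive(app, ["a", "b"]))
print(app.events)  # ['setup:gpt2', 'run:a', 'run:b', 'unload']
```

In a deployed app the platform's worker plays the role of `drive`; the sketch is only meant to make the call order concrete.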

Defining inputs

```python
class AppInput(BaseAppInput):
    prompt: str = Field(description="What to generate")
    style: str = Field(default="modern", description="Style to use")
    count: int = Field(default=1, description="How many to generate")
```

Field types

| Type | Python | Node.js (Zod) |
| --- | --- | --- |
| Text | str | z.string() |
| Number | int / float | z.number() |
| Boolean | bool | z.boolean() |
| File | File | z.object({ path: z.string() }) |
| Optional | Optional[T] | z.string().optional() |
| Array | List[T] | z.array(z.string()) |
| Enum | Literal["a", "b"] | z.enum(["a", "b"]) |
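As a rough illustration of how the Python column combines on one class, the sketch below annotates a single input with most of the types from the table. It uses a plain dataclass as a stand-in so it runs without the SDK; a real app would subclass BaseAppInput and use Field as in the earlier example. The class name, the field names, and the `allowed_values` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional, get_args, get_type_hints


@dataclass
class ExampleInput:  # hypothetical; a real app would subclass BaseAppInput
    prompt: str                                     # Text
    count: int = 1                                  # Number
    upscale: bool = False                           # Boolean
    negative_prompt: Optional[str] = None           # Optional
    tags: List[str] = field(default_factory=list)   # Array
    style: Literal["modern", "retro"] = "modern"    # Enum


def allowed_values(cls, name):
    """Return the permitted choices of a Literal-typed field."""
    return get_args(get_type_hints(cls)[name])


print(allowed_values(ExampleInput, "style"))  # ('modern', 'retro')
```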

LLM chat inputs (Python)

Model apps that accept chat-style prompts can use SDK types from inferencesh.models instead of defining every field by hand.

New apps (v0.7.9+): subclass ChatInput and put sampling parameters in model_settings:

```python
from inferencesh.models import ChatInput, ModelSettings, LLMOutput

class AppInput(ChatInput):
    pass

# Callers pass JSON like:
# {"text": "...", "model_settings": {"temperature": 0.7, "max_tokens": 1024}}
```

ModelSettings fields are all optional. Omitted fields are not sent to the provider, so each model app can define its own defaults.
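One way an app might apply its own defaults is to overlay only the fields the caller actually set. The sketch below is standalone and makes several assumptions: `ModelSettingsSketch` is a stand-in for ModelSettings (the presence of `top_p` on it is assumed), and `APP_DEFAULTS` and `provider_kwargs` are hypothetical names, not part of the SDK.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ModelSettingsSketch:  # stand-in for inferencesh.models.ModelSettings
    temperature: Optional[float] = None
    top_p: Optional[float] = None  # assumed field
    max_tokens: Optional[int] = None


# Hypothetical per-app defaults
APP_DEFAULTS = {"temperature": 1.0, "max_tokens": 512}


def provider_kwargs(settings: Optional[ModelSettingsSketch]) -> dict:
    """Overlay caller-supplied values on the app defaults, skipping unset fields."""
    supplied = asdict(settings) if settings else {}
    overrides = {k: v for k, v in supplied.items() if v is not None}
    return {**APP_DEFAULTS, **overrides}


print(provider_kwargs(ModelSettingsSketch(temperature=0.7)))
# {'temperature': 0.7, 'max_tokens': 512}
```

Because unset fields never reach the merged dictionary, a caller who omits model_settings entirely gets the app defaults unchanged.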

Existing apps: LLMInput still exposes flat temperature, top_p, and max_tokens on the input object. Migrate to ChatInput when you want a single nested object in the generated schema.

To add only model_settings to a custom input class, mix in ModelSettingsCapabilityMixin from inferencesh.models. Image, file, reasoning, and tool-calling fields use the other capability mixins in inferencesh.models.llm.

Python SDK: App authoring (LLM inputs)


Defining outputs

```python
class AppOutput(BaseAppOutput):
    result: str = Field(description="Generated text")
    image: File = Field(description="Generated image")
```

The run method

This is where your logic goes:

```python
async def run(self, input_data: AppInput, metadata) -> AppOutput:
    # Log progress
    metadata.log("Processing...")

    # Do work
    result = process(input_data.prompt)

    # Return output
    return AppOutput(result=result)
```

Setup for models

Load heavy resources in setup. Use setup schemas to define configurable parameters:

```python
class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")
    precision: str = Field(default="fp16", description="Model precision")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        from transformers import AutoModel
        self.model = AutoModel.from_pretrained(config.model_id)
```

This runs once per configuration. If setup values change between requests, the app re-initializes.
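The "once per configuration" behavior can be pictured with a standalone sketch. `WorkerSketch` is a hypothetical stand-in for whatever the platform does internally, and the string assigned to `self.model` is a placeholder for a real model load; neither is taken from the SDK.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSetup:  # stand-in for the setup schema
    model_id: str = "gpt2"
    precision: str = "fp16"


class App:
    def __init__(self):
        self.setup_calls = 0
        self.model = None

    async def setup(self, config: AppSetup):
        self.setup_calls += 1
        # Placeholder for an expensive model load
        self.model = f"{config.model_id}@{config.precision}"


class WorkerSketch:
    """Hypothetical worker that re-runs setup only when the config changes."""

    def __init__(self, app: App):
        self.app = app
        self.active_config = None

    async def ensure_setup(self, config: AppSetup):
        if config != self.active_config:
            await self.app.setup(config)
            self.active_config = config


async def demo():
    worker = WorkerSketch(App())
    await worker.ensure_setup(AppSetup())                  # first load
    await worker.ensure_setup(AppSetup())                  # same config: no reload
    await worker.ensure_setup(AppSetup(precision="fp32"))  # changed: reload
    return worker.app


app = asyncio.run(demo())
print(app.setup_calls, app.model)  # 2 gpt2@fp32
```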


Multi-function apps

Apps can expose multiple functions, each with its own input and output types.

```python
from pydantic import BaseModel

class GreetInput(BaseModel):
    name: str = "World"

class GreetOutput(BaseModel):
    message: str

class ReverseInput(BaseModel):
    text: str

class ReverseOutput(BaseModel):
    reversed_text: str

class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])
```

Python: Functions are discovered automatically if they have type hints using Pydantic models.

Node.js: Functions are discovered automatically if matching {PascalName}Input and {PascalName}Output Zod schemas are exported.
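To give a feel for type-hint-based discovery on the Python side, here is a standalone sketch built on the standard library's `inspect` and `typing.get_type_hints`. It is not the SDK's actual discovery code: `Model` stands in for pydantic's BaseModel, and `discover_functions` and the `helper` method are hypothetical.

```python
import inspect
from dataclasses import dataclass
from typing import get_type_hints


class Model:  # stand-in for pydantic.BaseModel
    pass


@dataclass
class GreetInput(Model):
    name: str = "World"


@dataclass
class GreetOutput(Model):
    message: str = ""


@dataclass
class ReverseInput(Model):
    text: str = ""


@dataclass
class ReverseOutput(Model):
    reversed_text: str = ""


class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])

    async def helper(self, value: int) -> int:  # not model-typed, so skipped
        return value


def discover_functions(app_cls):
    """Map method name -> (input type, output type) for model-typed methods."""
    found = {}
    for name, fn in inspect.getmembers(app_cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        hints = get_type_hints(fn)
        in_type, out_type = hints.get("input_data"), hints.get("return")
        if all(isinstance(t, type) and issubclass(t, Model) for t in (in_type, out_type)):
            found[name] = (in_type, out_type)
    return found


print(sorted(discover_functions(App)))  # ['reverse', 'run']
```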

Calling functions

```bash
curl -X POST https://api.inference.sh/apps/run \
  -H "Authorization: Bearer inf_your_key" \
  -d '{"app": "namespace/name", "function": "reverse", "input": {"text": "hello"}}'
```
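The same request can be assembled from Python with the standard library. This sketch only builds the request object and does not send it. The endpoint, header, and payload shape are copied from the curl example; `build_request` is a hypothetical helper, and the Content-Type header is an assumption, since the curl example does not set one.

```python
import json
import urllib.request

API_URL = "https://api.inference.sh/apps/run"  # endpoint from the curl example


def build_request(api_key: str, app: str, function: str, input_data: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for one function call."""
    body = json.dumps({"app": app, "function": function, "input": input_data}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: not set in the curl example
        },
    )


req = build_request("inf_your_key", "namespace/name", "reverse", {"text": "hello"})
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```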

Working with files

Input files are downloaded for you:

```python
image_path = input_data.image.path
```

Output files are uploaded for you:

```python
return AppOutput(image=File(path="/tmp/output.png"))
```
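Put together, a file-handling step usually reads from the input's local path, writes its result to a new local file, and returns that path wrapped in a file object. The standalone sketch below shows the pattern with a stand-in: `FileSketch` models only the `path` attribute of the SDK's File type, and `process_file` with its byte-reversing transformation is a hypothetical placeholder.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileSketch:  # stand-in for the SDK's File type; only `path` is modeled
    path: str


def process_file(input_file: FileSketch) -> FileSketch:
    """Read the local input file, write a result, and wrap the result path."""
    data = Path(input_file.path).read_bytes()
    fd, out_path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as dst:
        dst.write(data[::-1])  # placeholder transformation: reverse the bytes
    return FileSketch(path=out_path)


# Simulate an input file that has already been downloaded to local disk
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(b"abc")

result = process_file(FileSketch(path=f.name))
print(Path(result.path).read_bytes())  # b'cba'
```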

LLM chat inputs (Python)

For text and chat model apps, use ChatInput and ModelSettings from inferencesh.models (SDK v0.7.9+). Sampling parameters live in a nested model_settings object instead of flat fields on the input schema.

```python
from inferencesh import BaseApp
from inferencesh.models import ChatInput, LLMOutput

class MyInput(ChatInput):
    pass

class App(BaseApp):
    async def run(self, input_data: MyInput, metadata) -> LLMOutput:
        settings = input_data.model_settings
        temperature = settings.temperature if settings else None
        ...
```

Callers can override generation per request:

```json
{
  "text": "Summarize this release",
  "model_settings": {
    "temperature": 0.3,
    "max_tokens": 1024
  }
}
```

| Type | When to use |
| --- | --- |
| ChatInput | New chat/LLM apps: nested model_settings, context, system_prompt |
| LLMInput / BaseLLMInput | Legacy apps with flat temperature, top_p, max_tokens on the input |
| ModelSettingsCapabilityMixin | Custom inputs that need model_settings without full ChatInput |



