
App Code

Your app's logic lives in an entrypoint file — inference.py (Python) or src/inference.js (Node.js).


Structure

```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    # Define inputs here
    pass

class AppOutput(BaseAppOutput):
    # Define outputs here
    pass

class App(BaseApp):
    async def setup(self, config: AppSetup):
        # Runs once when worker starts or config changes
        pass

    async def run(self, input_data: AppInput, metadata) -> AppOutput:
        # Runs for each request
        pass

    async def unload(self):
        # Runs on shutdown
        pass
```

Defining inputs

```python
class AppInput(BaseAppInput):
    prompt: str = Field(description="What to generate")
    style: str = Field(default="modern", description="Style to use")
    count: int = Field(default=1, description="How many to generate")
```

Field types

| Type | Python | Node.js (Zod) |
| --- | --- | --- |
| Text | `str` | `z.string()` |
| Number | `int` / `float` | `z.number()` |
| Boolean | `bool` | `z.boolean()` |
| File | `File` | `z.object({ path: z.string() })` |
| Optional | `Optional[T]` | `z.string().optional()` |
| Array | `List[T]` | `z.array(z.string())` |
| Enum | `Literal["a", "b"]` | `z.enum(["a", "b"])` |
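As a quick sketch of how these Python types combine in one input schema (the field names here are illustrative, not part of the platform's API; a plain Pydantic model stands in for `BaseAppInput`):

```python
from typing import List, Literal, Optional
from pydantic import BaseModel, Field

class ExampleInput(BaseModel):
    # Hypothetical fields demonstrating each type mapping from the table above
    prompt: str = Field(description="Free-form text")
    count: int = Field(default=1, description="Number")
    verbose: bool = Field(default=False, description="Boolean flag")
    seed: Optional[int] = Field(default=None, description="Optional value")
    tags: List[str] = Field(default_factory=list, description="Array of strings")
    style: Literal["modern", "classic"] = Field(default="modern", description="Enum")

inp = ExampleInput(prompt="hello", tags=["a", "b"])
print(inp.style)  # falls back to the declared default: "modern"
```

Pydantic validates each field on construction, so an out-of-range value (e.g. `style="retro"`) raises a validation error instead of reaching your `run` method.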

Defining outputs

```python
class AppOutput(BaseAppOutput):
    result: str = Field(description="Generated text")
    image: File = Field(description="Generated image")
```

The run method

This is where your logic goes:

```python
async def run(self, input_data: AppInput, metadata) -> AppOutput:
    # Log progress
    metadata.log("Processing...")

    # Do work
    result = process(input_data.prompt)

    # Return output
    return AppOutput(result=result)
```

Setup for models

Load heavy resources in setup. Use setup schemas to define configurable parameters:

```python
class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")
    precision: str = Field(default="fp16", description="Model precision")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        from transformers import AutoModel
        self.model = AutoModel.from_pretrained(config.model_id)
```

This runs once per configuration. If setup values change between requests, the app re-initializes.
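The once-per-configuration behavior can be pictured as caching keyed on the setup values. This is a simplified sketch for intuition, not the platform's actual implementation:

```python
import asyncio

class Worker:
    """Illustrative sketch: re-run setup only when the config changes."""
    def __init__(self, app):
        self.app = app
        self._current_config = None

    async def ensure_setup(self, config: dict):
        # setup() runs only when the config differs from the last one seen
        if config != self._current_config:
            await self.app.setup(config)
            self._current_config = config

class CountingApp:
    """Stand-in app that counts how often setup actually runs."""
    def __init__(self):
        self.setup_calls = 0

    async def setup(self, config):
        self.setup_calls += 1

async def main():
    app = CountingApp()
    worker = Worker(app)
    await worker.ensure_setup({"model_id": "gpt2"})
    await worker.ensure_setup({"model_id": "gpt2"})     # unchanged: no re-init
    await worker.ensure_setup({"model_id": "gpt2-xl"})  # changed: setup runs again
    return app.setup_calls

print(asyncio.run(main()))  # 2
```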


Multi-function apps

Apps can expose multiple functions, each with its own input and output types.

```python
from pydantic import BaseModel

class GreetInput(BaseModel):
    name: str = "World"

class GreetOutput(BaseModel):
    message: str

class ReverseInput(BaseModel):
    text: str

class ReverseOutput(BaseModel):
    reversed_text: str

class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])
```

Python: Functions are discovered automatically when their input parameter and return type are annotated with Pydantic models.

Node.js: Functions are discovered automatically if matching {PascalName}Input and {PascalName}Output Zod schemas are exported.
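One way to picture the Python-side discovery rule (a simplified sketch for intuition, not the framework's actual implementation): inspect each coroutine method and keep those whose `input_data` and return annotations are Pydantic models.

```python
import inspect
from typing import get_type_hints
from pydantic import BaseModel

class GreetInput(BaseModel):
    name: str = "World"

class GreetOutput(BaseModel):
    message: str

class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

def discover_functions(app_cls):
    """Sketch: collect coroutine methods whose input/return hints are Pydantic models."""
    functions = {}
    for name, fn in inspect.getmembers(app_cls, inspect.iscoroutinefunction):
        hints = get_type_hints(fn)
        input_type = hints.get("input_data")
        output_type = hints.get("return")
        if (isinstance(input_type, type) and issubclass(input_type, BaseModel)
                and isinstance(output_type, type) and issubclass(output_type, BaseModel)):
            functions[name] = (input_type, output_type)
    return functions

print(discover_functions(App))  # maps "run" to (GreetInput, GreetOutput)
```

A method without Pydantic annotations simply never enters the map, which is why plain helper methods on the class are not exposed as callable functions.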

Calling functions

```bash
curl -X POST https://api.inference.sh/v1/apps/{app_id}/run \
  -d '{"function": "reverse", "input": {"text": "hello"}}'
```

Working with files

Input files are downloaded for you:

```python
image_path = input_data.image.path
```

Output files are uploaded for you:

```python
return AppOutput(image=File(path="/tmp/output.png"))
```
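Putting both halves together, a file-to-file step might read from the input path and write a new local path for upload. The sketch below uses a minimal stand-in for the platform's `File` type (assumed to wrap a local path, as shown above) so it runs outside the framework:

```python
import tempfile
from dataclasses import dataclass

@dataclass
class File:
    """Minimal stand-in for the platform's File type: wraps a local path."""
    path: str

def run(input_file: File) -> File:
    # Input files arrive already downloaded; just read from .path
    with open(input_file.path) as f:
        text = f.read()

    # Write the result to a local path; the platform uploads it after run returns
    out = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    out.write(text.upper())
    out.close()
    return File(path=out.path if hasattr(out, "path") else out.name)
```

Usage: `run(File(path="input.txt"))` returns a `File` whose path points at the uppercased copy; in a real app the download before `run` and the upload after it are handled for you.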

Next

Configuration
