Most agent tools run on servers. The agent requests an action, the server executes it, results return to the agent. But some operations need to happen where the user is - accessing local files, using the camera, interacting with browser storage, or processing data that should not leave the user's device. Client-side tools flip the execution model: the agent requests an operation, but the user's browser or application fulfills it. This pattern enables capabilities that server-side tools cannot provide while keeping sensitive data under user control.

Why Client-Side Matters

Consider an agent helping a user analyze documents. With server-side tools only, the user must upload every document to the server. The files travel across the network, sit in server storage, and get processed remotely. For some documents, this is fine. For sensitive documents - financial records, medical files, proprietary business data - uploading to external servers might be unacceptable.

Client-side document processing changes this. The user selects files locally. Processing happens in the browser. Only results the user approves leave their device. The full documents never travel to the server, so many of the sensitivity concerns that would block server-side processing no longer apply.

Or consider an agent that helps with photo editing. Camera access, photo library access, real-time preview - these are naturally client-side operations. A server cannot reach the user's camera, because the camera is attached to the user's device. Client-side tools let the agent request a camera capture and receive the resulting image for subsequent processing.

Local resources, privacy requirements, and device capabilities all create needs that server-side tools cannot address. Client-side tools fill this gap.

How Client-Side Tools Work

The flow inverts the typical tool execution pattern.

With server-side tools: agent requests action, server executes action, server returns result to agent.

With client-side tools: agent requests action, request goes to client, client executes action locally, client returns result to agent.

The agent does not distinguish between these at the request level. It calls a tool with parameters, expecting a result. The difference is where execution happens. For server-side tools, the platform handles execution. For client-side tools, the client application handles execution.

The client application - whether a web interface, mobile app, or desktop application - must implement the client-side tools it supports. When a request arrives for a client-side tool, the application performs the operation locally and sends back results. The agent receives results the same way it would from server-side tools.

This means client-side capability depends on what the client implements. A sophisticated web application might support many client-side tools. A simple chat widget might support none. The platform defines the protocol; clients implement according to their capabilities.
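As a rough illustration of "clients implement according to their capabilities", the sketch below shows a client that keeps a registry of the tools it implements and answers every incoming request, either with a local result or with an "unsupported" response. The message shapes (`ToolRequest`, `ToolResponse`), the tool names, and `handleRequest` are hypothetical; a real platform defines its own protocol.

```typescript
// Hypothetical message shapes; a real platform defines its own protocol.
type ToolRequest = { id: string; tool: string; params: Record<string, unknown> };
type ToolResponse =
  | { id: string; status: "ok"; result: unknown }
  | { id: string; status: "unsupported"; reason: string };

type Handler = (params: Record<string, unknown>) => unknown;

// The client registers only the tools it actually implements.
const handlers = new Map<string, Handler>([
  // Stand-in for a real local operation such as reading stored preferences.
  ["get_preferences", () => ({ theme: "dark" })],
  // Local processing: count words without sending the text anywhere.
  ["word_count", (params) => String(params["text"] ?? "").split(/\s+/).filter(Boolean).length],
]);

// Run the request locally if a handler exists; otherwise report that
// this client does not support the tool.
function handleRequest(req: ToolRequest): ToolResponse {
  const handler = handlers.get(req.tool);
  if (handler === undefined) {
    return { id: req.id, status: "unsupported", reason: `No client handler for "${req.tool}"` };
  }
  return { id: req.id, status: "ok", result: handler(req.params) };
}
```

A simple chat widget would ship with an empty registry and answer "unsupported" to everything, while a richer application would register more handlers.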

Common Client-Side Capabilities

Several categories of operations naturally belong on the client side.

File access lets agents work with local files. Rather than requiring uploads, the agent requests file access, the user selects files through a standard file picker, and the client provides file contents or metadata. The user stays in control of which files to share. Nothing uploads until explicitly selected.

Camera and media capture accesses device cameras and microphones. An agent helping with visual tasks can request a photo. The user's device captures it locally. The resulting image becomes available to the agent. Video recording, audio capture, and screen capture follow similar patterns.

Local storage access reads data the application has stored locally. Session data, cached information, user preferences - if the client application stores it, client-side tools can access it. This enables personalization and context that persists across sessions.

Clipboard operations read from or write to the system clipboard. An agent might request clipboard contents to understand what the user is working with. Or it might write results to the clipboard for easy pasting elsewhere.

Device sensors access accelerometer, GPS, ambient light, and other sensors when relevant. A mobile agent application might use location for local recommendations or motion data for activity-aware features.

Local processing performs computation on the client rather than the server. Image manipulation, text extraction, format conversion - operations that can happen in the browser might be faster and more private than round-tripping to a server.

The specific capabilities depend on what the client platform supports and what makes sense for the use case.

The Security Model

Client-side tools give users control that server-side tools do not.

Explicit selection means users choose what to share. When an agent requests file access, the user picks which files through a standard picker. The agent cannot access arbitrary local files - only what the user explicitly provides. This differs from server-side file access, where every uploaded file is immediately available to the server.

Local processing means sensitive data can stay local. When processing happens client-side, raw data does not travel to servers. Only results that the user approves leave the device. This enables working with sensitive materials that could not otherwise be processed.
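One way to picture "only results that the user approves leave the device" is a two-step pipeline: reduce the raw data to a summary locally, then build the outgoing payload from only the fields the user approved. This is a minimal sketch under assumed names; `LocalSummary`, `summarizeLocally`, `buildPayload`, and the "description,amount" input format are all invented for illustration.

```typescript
// Summary computed on the device; the raw rows never enter the payload.
type LocalSummary = { rowCount: number; total: number; largest: number };

// Parse "description,amount" lines locally and reduce them to aggregates.
// Lines without a numeric amount (headers, blanks) are skipped.
function summarizeLocally(csv: string): LocalSummary {
  const amounts = csv
    .split("\n")
    .map((line) => (line.split(",")[1] ?? "").trim())
    .filter((cell) => cell !== "")
    .map((cell) => Number(cell))
    .filter((n) => Number.isFinite(n));
  return {
    rowCount: amounts.length,
    total: amounts.reduce((sum, n) => sum + n, 0),
    largest: amounts.length > 0 ? Math.max(...amounts) : 0,
  };
}

// Keep only the fields the user approved (for example, in an approval dialog).
function buildPayload(
  summary: LocalSummary,
  approved: ReadonlyArray<keyof LocalSummary>,
): Partial<LocalSummary> {
  const payload: Partial<LocalSummary> = {};
  for (const key of approved) {
    payload[key] = summary[key];
  }
  return payload;
}
```

If the user approves only `total`, the payload sent to the agent contains that single number and nothing about individual rows.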

Permission prompts gate sensitive capabilities. Camera access, microphone access, location access - these trigger permission prompts that the user must approve. The agent cannot silently access device capabilities. Users remain in control.

Visibility shows users what is happening. When an agent requests client-side operations, the interface should show what is being requested and what data is involved. Users can see and control the information flow.

This security model makes client-side tools suitable for scenarios where trust requirements would block server-side processing. The user is not trusting the server with their data; they are processing their data locally and choosing what to share.

Implementation Considerations

Building client-side tool support requires attention to several factors.

Capability detection determines what the client supports. Not all clients support all client-side tools. The platform needs to know what the current client can do so agents can make appropriate tool choices. Capability negotiation at connection time enables this.
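Capability negotiation can be sketched as a filter: the client declares what it supports when it connects, and the platform offers the agent only the tools that can actually run. The `ToolSpec` shape, the capability strings, and the catalog entries below are assumptions made for the example, not part of any specific platform.

```typescript
type ToolSpec = { name: string; execution: "server" | "client"; requires?: string };

// Hypothetical tool catalog; `requires` names a client capability.
const catalog: ToolSpec[] = [
  { name: "web_search", execution: "server" },
  { name: "pick_file", execution: "client", requires: "file_access" },
  { name: "capture_photo", execution: "client", requires: "camera" },
];

// Keep server-side tools plus the client-side tools whose capability
// the client declared when it connected.
function availableTools(all: ToolSpec[], clientCapabilities: string[]): ToolSpec[] {
  const declared = new Set(clientCapabilities);
  return all.filter(
    (tool) =>
      tool.execution === "server" ||
      (tool.requires !== undefined && declared.has(tool.requires)),
  );
}
```

A client that declares only `file_access` would see `web_search` and `pick_file`, so the agent never attempts `capture_photo` in that session.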

Timeout handling accounts for user interaction time. Server-side tools usually have fairly predictable execution times. Client-side tools often require user interaction - selecting files, approving permissions, capturing photos. Timeouts must be generous enough for human response while still preventing indefinite waits.
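A common way to bound the wait is to race the client's work against a timer and report a timeout as an ordinary result. The sketch below assumes illustrative budgets of two minutes for interactive tools and ten seconds for automatic ones; those numbers, and the names `Timed`, `timeoutMsFor`, and `withTimeout`, are placeholders rather than recommendations.

```typescript
type Timed<T> = { status: "ok"; value: T } | { status: "timeout"; afterMs: number };

// Assumed budgets: tools that wait on a person get minutes, others seconds.
function timeoutMsFor(needsUserInteraction: boolean): number {
  return needsUserInteraction ? 120000 : 10000;
}

// Race the client's work against a timer so the agent never waits forever.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<Timed<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Timed<T>>((resolve) => {
    timer = setTimeout(() => resolve({ status: "timeout", afterMs: ms }), ms);
  });
  const done = work.then((value): Timed<T> => ({ status: "ok", value }));
  const race = Promise.race([done, timeout]);
  // Clear the timer whether the race settles with a result or an error.
  const stop = () => clearTimeout(timer);
  race.then(stop, stop);
  return race;
}
```

A caller might write `withTimeout(requestFileFromUser(), timeoutMsFor(true))`, where `requestFileFromUser` stands in for whatever function asks the client to open a file picker.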

Error handling covers client-side failure modes. The user might deny a permission request or cancel a file selection, or the device might lack the capability altogether. These are not errors in the traditional sense but normal outcomes that the agent should handle gracefully.
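Treating these outcomes as data rather than exceptions can be expressed with a tagged union. The following is a sketch only: the outcome kinds and the wording returned by `describeOutcome` are assumptions chosen to mirror the cases named above.

```typescript
// Declines and cancellations are modeled as ordinary outcomes, not exceptions.
type ClientToolOutcome =
  | { kind: "success"; data: string }
  | { kind: "permission_denied"; capability: string }
  | { kind: "cancelled" }
  | { kind: "unavailable"; capability: string };

// Turn each outcome into a note the agent can reason about.
function describeOutcome(outcome: ClientToolOutcome): string {
  switch (outcome.kind) {
    case "success":
      return `Tool returned ${outcome.data.length} characters of data.`;
    case "permission_denied":
      return `The user declined ${outcome.capability} access. Offer an alternative instead of asking again.`;
    case "cancelled":
      return "The user cancelled the request. Continue with the information already available.";
    case "unavailable":
      return `This client has no ${outcome.capability} support. Use a fallback.`;
  }
}
```

Because the union is closed, the compiler flags any outcome kind the switch forgets to handle.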

Result size matters when client-side operations produce large outputs. A high-resolution photo or a large document might not be suitable for sending to the agent in full. Consider whether the client should resize, compress, or otherwise process before transmission.
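For images, the resize decision comes down to a small calculation: scale the longest side to a limit and keep the aspect ratio. The helper below is a sketch; `fitWithin` and the 1600-pixel limit in the usage note are illustrative, and the actual pixel resampling (typically done with a canvas in a browser) is left out.

```typescript
type Size = { width: number; height: number };

// Scale a size down so its longest side fits within maxSide,
// preserving aspect ratio. Sizes that already fit are returned unchanged.
function fitWithin(original: Size, maxSide: number): Size {
  const longest = Math.max(original.width, original.height);
  if (longest <= maxSide) return original;
  const scale = maxSide / longest;
  return {
    width: Math.round(original.width * scale),
    height: Math.round(original.height * scale),
  };
}
```

With a limit of 1600, a 4000 by 3000 photo would be sent at 1600 by 1200, while an 800 by 600 thumbnail would go through untouched.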

Offline capability might be relevant for some applications. If the client can operate offline, client-side tools might be the only tools available during disconnection. Design accordingly if offline use matters.

Agent Design for Client-Side Tools

Agents using client-side tools need prompts that account for the different execution model.

User interaction expectations differ from server-side tools. A server-side tool call runs without waiting on the user. A client-side tool call might not start until the user takes action, such as choosing a file or approving a permission. Prompts should prepare agents to wait and to handle cases where users decline.

Graceful degradation handles capability mismatches. If an agent relies on a client-side tool that the current client does not support, it should fall back to alternatives - perhaps asking the user to upload via a different method rather than failing entirely.
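Graceful degradation can be modeled as an ordered list of strategies for the same goal, where the first one the client supports wins. The strategy names and capability strings here are hypothetical, and the last entry is assumed to work in any chat interface.

```typescript
// One way to reach a goal; `requires` is null when no client capability is needed.
type Strategy = { tool: string; requires: string | null };

// Hypothetical preference order for "get a document from the user".
const documentStrategies: Strategy[] = [
  { tool: "pick_file", requires: "file_access" },
  { tool: "read_clipboard", requires: "clipboard" },
  { tool: "ask_user_to_paste_text", requires: null }, // assumed to work in any chat UI
];

// Return the first strategy the current client can support, if any.
function chooseStrategy(strategies: Strategy[], capabilities: string[]): Strategy | undefined {
  return strategies.find(
    (s) => s.requires === null || capabilities.indexOf(s.requires) !== -1,
  );
}
```

Keeping a capability-free option at the end of the list means the agent always has something to offer instead of failing outright.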

Privacy-aware reasoning should prefer client-side processing when appropriate. If a task can be accomplished either way, choosing client-side processing demonstrates respect for user privacy. Prompts can guide this preference.

Clear requests help users understand what is being asked. When an agent needs a file or camera access, it should explain why. Users are more likely to approve requests they understand.

System prompts should explain which tools are client-side, what that means for execution flow, and how to handle the various outcomes (success, user cancellation, capability unavailable).
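The guidance above can be generated from the session's tool list so the prompt always matches what the client supports. This is a sketch with invented names (`PromptTool`, `buildToolGuidance`) and example wording; the exact phrasing that works best will depend on the agent and the model.

```typescript
type PromptTool = { name: string; clientSide: boolean; purpose: string };

// Build a system-prompt fragment that names the client-side tools and
// describes the execution flow and the outcomes the agent should expect.
function buildToolGuidance(tools: PromptTool[]): string {
  const clientTools = tools.filter((t) => t.clientSide);
  if (clientTools.length === 0) {
    return "No client-side tools are available in this session.";
  }
  const lines = clientTools.map((t) => `- ${t.name}: ${t.purpose}`);
  return [
    "The following tools run on the user's device and may require the user to act:",
    ...lines,
    "Before calling one, tell the user what you need and why.",
    "Possible outcomes: success, user cancellation, capability unavailable.",
    "If the user declines, do not repeat the request immediately; offer an alternative.",
  ].join("\n");
}
```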

When to Use Client-Side vs Server-Side

The choice between client-side and server-side execution depends on several factors.

Sensitivity of the data often determines the right choice. Highly sensitive data that should not leave the user's device needs client-side processing. Data that can be processed remotely can use either approach.

Capability requirements constrain options. Operations requiring specialized server resources - GPU inference, large-scale data processing, access to server-side data stores - must be server-side. Operations requiring local device access - camera, local files, sensors - must be client-side.

Performance considerations vary by scenario. Client-side processing avoids network round trips but is limited by device capabilities. Server-side processing can use powerful resources but adds latency. The right choice depends on the operation and the devices involved.

Availability affects reliability. Server-side tools work when the server is reachable. Client-side tools work when the client supports the capability and the user approves. These are different failure modes, so weigh which one the use case can tolerate.
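The hard constraints among these factors can be written down as a small decision function. This is one possible reading of the factors, not a rule from any platform: the `Operation` fields, the `Placement` labels, and the treatment of combined requirements are assumptions, and performance and availability are left to human judgment.

```typescript
type Operation = {
  dataMustStayOnDevice: boolean;
  needsDeviceAccess: boolean; // camera, local files, sensors
  needsServerResources: boolean; // GPU inference, server-side data stores
};

type Placement = "client" | "server" | "either" | "split" | "conflict";

function placeOperation(op: Operation): Placement {
  // Data that cannot leave the device rules out server-side processing.
  if (op.dataMustStayOnDevice && op.needsServerResources) return "conflict";
  // Capture on the device, then hand the result to the server.
  if (op.needsDeviceAccess && op.needsServerResources) return "split";
  if (op.dataMustStayOnDevice || op.needsDeviceAccess) return "client";
  if (op.needsServerResources) return "server";
  // No hard constraint: decide on performance and availability.
  return "either";
}
```

A "conflict" result signals that the task needs rethinking, for example by reducing the data locally and sending only a derived result the user approves.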

The cleanest designs use each approach where it fits best rather than forcing everything into one model.

For teams building agent interfaces that need local capabilities, inference.sh supports client-side tools that let agents request operations executed by the user's device. File access, media capture, and local processing become available to agents while keeping users in control of their data. The platform handles the protocol; your client implements the capabilities that make sense for your use case.

Client-side tools extend what agents can do into territory that server-side tools cannot reach. Local data, device capabilities, and privacy-sensitive operations become accessible. The combination of server-side and client-side tools lets agents work with the full range of resources and information that tasks require.

FAQ

How do I handle users who decline client-side tool requests?

Treat declines as normal outcomes, not errors. The agent should acknowledge the decline and proceed with available alternatives. If the task requires the declined capability, explain what cannot be completed and why. If alternatives exist - perhaps the user could upload a file through a different mechanism - offer them. Never repeat the same request immediately after a decline; that creates a frustrating loop. System prompts should prepare agents to handle declines gracefully, perhaps with language like "If the user declines to share a file, offer to help with what information is available or suggest alternative approaches."

Can client-side tools work with mobile apps as well as web browsers?

Yes, the pattern applies to any client that can implement the protocol. Mobile apps can support client-side tools for camera access, photo library access, local file systems, device sensors, and mobile-specific capabilities. The implementation differs - native code instead of browser APIs - but the agent interaction model is the same. Mobile environments often have richer client-side capabilities than web browsers, making client-side tools particularly valuable for native mobile agent applications.

What happens if the client application does not support a client-side tool the agent wants to use?

The tool call should return an appropriate error indicating the capability is unavailable. The agent then needs to adapt - perhaps using an alternative approach, explaining the limitation to the user, or proceeding without that information. Capability detection at session start can prevent this by informing the platform what the client supports, allowing tool availability to reflect actual capabilities. Agents should be prompted to check for alternative approaches when preferred tools are unavailable.