extract
Extract and analyze web page content using Exa's advanced content retrieval
An AI agent that can't access the web is working from memory alone. It can reason about what it already knows, but it can't check whether that knowledge is still accurate, can't discover new information, and can't verify claims against primary sources. For research agents, RAG pipelines, competitive analysis, and fact-checking workflows, web access isn't a nice-to-have. It's the difference between an agent that guesses and one that knows.
The problem is that "web access" sounds like a single capability, but it actually decomposes into several distinct operations: searching for relevant pages, extracting clean content from URLs, and synthesizing answers from retrieved sources. Each of these has different performance characteristics, different cost profiles, and different strengths depending on what you're trying to accomplish. Two providers on inference.sh cover this space from meaningfully different angles - Tavily and Exa - and choosing between them (or combining them) depends on what your agent actually needs to do.
I've spent time with all five apps - two from Tavily, three from Exa. Here's what I've found.
two philosophies of web search
Tavily and Exa both return web results, but they approach the problem from opposite directions.
Tavily's search assistant (tavily/search-assistant) works the way you'd expect a browsing-oriented tool to work. You give it a query, it crawls the web, retrieves pages, and returns structured results with titles, URLs, snippets, and relevance scores. It also offers an AI-generated answer summary that distills the results into a direct response. The mental model is a research assistant who opens a browser, reads through the top results, and reports back. It understands two search depths - basic for fast lookups, advanced for thorough investigations - and lets you scope results by domain inclusion or exclusion. The topic parameter separates general searches from news-specific queries, which matters more than you'd think when timeliness is critical.
Exa's search (exa/search) takes a fundamentally different approach. It uses neural and semantic search, meaning it doesn't just match keywords - it understands what your query means and finds pages that are conceptually relevant. You can switch between neural search, keyword search, or let the system choose automatically. This distinction matters in practice. A keyword search for "transformer architecture attention mechanism" returns pages that contain those exact words. A neural search for the same query also surfaces pages that discuss the concept using different terminology - papers about self-attention, blog posts explaining the architecture without using the word "transformer," educational content that approaches the topic from unexpected angles.
Exa also offers content category filtering and date range controls that go beyond what most search tools provide. You can restrict results to research papers, company pages, news articles, or other specific content types. The date filtering uses ISO 8601 timestamps, which means you can search for content published within an exact window - useful for tracking how a topic evolved over a specific period.
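The filtering controls described above can be collected into a small request builder. This is a sketch: the app name exa/search is real, but the input field names (type, category, start_published_date, end_published_date) are assumptions modeled on Exa's public search parameters and should be verified against the schema before use.

```python
# hypothetical field names based on Exa's documented parameters -
# verify against the exa/search schema before relying on them
def build_exa_search_input(query, search_type="auto",
                           category=None, start=None, end=None):
    """Assemble the input dict for an exa/search call."""
    payload = {"query": query, "type": search_type}  # neural | keyword | auto
    if category:
        payload["category"] = category  # e.g. "research paper", "news"
    if start:
        payload["start_published_date"] = start  # ISO 8601 timestamp
    if end:
        payload["end_published_date"] = end
    return payload

request = build_exa_search_input(
    "self-attention explained without transformer jargon",
    search_type="neural",
    category="research paper",
    start="2023-01-01T00:00:00.000Z",
)
```

Omitting optional fields keeps the payload minimal, which makes it easier to spot which filters a given query actually relies on.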
extraction: getting clean content from messy pages
Search finds pages. Extraction reads them. Both providers offer extraction tools, and again, the implementations reflect different priorities.
Tavily's extract (tavily/extract) is built around reliability and batch processing. You hand it one URL or a list of URLs, it fetches each page, strips away navigation, ads, and boilerplate, and returns clean readable content. The output tells you exactly what succeeded and what failed - counts, error messages, request IDs for debugging. Two extraction depths are available: basic for standard scraping, advanced for pages that resist simple parsing. You can get output as markdown or plain text, and optionally pull images and favicons from each page.
The batch processing angle is where Tavily's extractor becomes particularly useful for agent workflows. If your agent identified 15 relevant URLs from a search step, it can extract all of them in a single call rather than making 15 sequential requests. The failed results reporting means your agent can gracefully handle pages that block scraping or time out without the entire operation breaking.
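One way an agent might triage that batch response is to separate clean content from retryable failures. The response shape below (results, failed_results, url, error, raw_content) is an illustrative assumption, not Tavily's confirmed schema:

```python
# partition a hypothetical tavily/extract batch response; field names
# (results, failed_results, raw_content, error) are illustrative
def partition_extraction(response):
    succeeded = {r["url"]: r["raw_content"] for r in response.get("results", [])}
    retryable, skipped = [], []
    for failure in response.get("failed_results", []):
        # timeouts are worth retrying; hard blocks are not
        if "timeout" in failure.get("error", "").lower():
            retryable.append(failure["url"])
        else:
            skipped.append(failure["url"])
    return succeeded, retryable, skipped
```

The point of the split is that a timeout and a 403 deserve different handling: one goes back into the queue, the other gets dropped or routed to a different tool.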
Exa's extract (exa/extract) leans harder into intelligence. Beyond basic content retrieval, it offers LLM-powered summaries of extracted content, optional subpage crawling that follows links to related pages within a site, and a "create context" mode that combines multiple pages into a single synthesized summary. The livecrawl options give you control over caching behavior - you can force fresh crawls, accept cached versions, or fall back to live crawling only when cached content is unavailable.
The subpage crawling is genuinely distinctive. When you extract a documentation page and set subpages to 3, Exa doesn't just return that page - it follows internal links and returns related pages as well. For documentation sites, wikis, and multi-page articles, this saves your agent multiple round trips. The summary_query parameter lets you guide what the summary focuses on, so extracting a long technical document can produce a summary tailored to your specific question rather than a generic overview.
Both providers offer affordable per-page extraction pricing, with costs scaling differently depending on your usage pattern and whether you need features like summaries and subpage crawling.
the answer engine
Exa offers something neither Tavily app directly replicates: exa/answer. This is a search-to-answer pipeline compressed into a single call. You ask a question, Exa searches the web, retrieves relevant sources, and generates an LLM-powered answer with citations pointing back to the source material.
I think of this as the difference between a librarian who finds books for you and a librarian who reads the books and writes you a briefing. The formatted response comes with inline citations, so your agent (or your user) can trace any claim back to its source. This is particularly valuable for fact-checking workflows where the provenance of information matters as much as the information itself.
The tradeoff with answer generation is latency and cost. The response takes longer than a raw search because it's doing more work. For applications where speed matters more than synthesis - say, quickly checking whether a company exists or finding a specific data point - raw search is faster and cheaper. For applications where the agent needs to understand and synthesize multiple sources before acting, the answer endpoint saves a round of LLM reasoning that you'd otherwise have to build yourself.
Tavily's search assistant partially overlaps here with its include_answer option, which adds an AI-generated answer summary to search results. But this is an add-on to search results rather than a primary output mode. The answer is shorter and less detailed than what Exa's dedicated answer endpoint produces, and it doesn't come with the same citation structure.
when to reach for which tool
The choice between Tavily and Exa isn't a matter of one being better than the other. They're optimized for different retrieval patterns, and most serious agent architectures will use them for different purposes.
Tavily's search assistant excels at broad information gathering where you want to see the landscape. Market research, competitive analysis, general fact-finding - scenarios where browsing multiple results and letting your agent reason across them is the right approach. The advanced search depth combined with raw content retrieval gives your agent a comprehensive view of what's out there. For news monitoring with the topic parameter set to "news," Tavily returns fresh results with appropriate recency weighting.
Exa's neural search shines when the query is conceptual rather than keyword-shaped. "Companies doing what Stripe does but for healthcare" is the kind of query where semantic search dramatically outperforms keyword matching. Research tasks that require finding related work, discovering competitors, or mapping a conceptual space benefit from Exa's ability to understand meaning rather than just matching terms. The date range filtering makes it particularly effective for tracking how a field has evolved over time.
For extraction, the choice depends on volume and depth. Tavily's batch extraction is the better fit when you have a list of known URLs and need clean content from all of them efficiently. Exa's extraction is the better fit when you need deeper analysis of fewer pages - summaries, subpage exploration, or targeted context creation.
For direct question answering with source attribution, Exa's answer endpoint is the clear choice. Nothing else in this toolkit produces cited, synthesized answers from web sources in a single call.
In practice, a well-designed research agent might use Exa's neural search for discovery, Tavily's search for verification and broader coverage, Exa's extraction for deep reading of key sources, and Tavily's extraction for batch processing of reference lists. The tools compose well because they serve different stages of the information retrieval pipeline.
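That division of labor can be sketched as a pipeline with injected callables, so the shape is visible without wiring up real API calls. Which client backs each stage is left open:

```python
# pipeline skeleton: each callable stands in for one tool from the
# text above (exa search, tavily search, exa extract, tavily extract)
def research_pipeline(topic, discover, verify, deep_read, batch_read):
    candidates = discover(topic)                         # neural discovery
    confirmed = [u for u in candidates if verify(u)]     # keyword verification
    key_sources = {u: deep_read(u) for u in confirmed[:3]}   # deep extraction
    references = batch_read(confirmed[3:])               # batch extraction
    return key_sources, references
```

Keeping the tools behind plain functions also makes the pipeline trivially testable with stubs.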
building web intelligence into agent workflows
The real value of these tools emerges when they're wired into larger agent systems rather than called in isolation.
Consider a competitive intelligence agent. It might start with Exa's neural search to discover companies in a specific space, use Tavily's search to find recent news about each company, extract detailed content from their websites and press releases using either extraction tool, and then synthesize findings using Exa's answer endpoint or its own LLM reasoning. Each tool handles the part of the pipeline it's best suited for.
A fact-checking workflow looks different. Claim comes in. The agent searches for supporting and contradicting evidence using both search tools - keyword search for specific claims, neural search for conceptual verification. It extracts primary sources. It compares what it found against the original claim. The diversity of search approaches actually strengthens the verification because semantic search and keyword search surface different evidence.
RAG pipelines benefit from the extraction tools specifically. When your retrieval step identifies relevant URLs but your vector store needs clean text, Tavily's batch extraction is a cost-effective way to keep your knowledge base current. Exa's summarization can produce pre-digested content that's more useful for downstream retrieval than raw page text.
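Once the clean text is back, it still needs to be chunked before embedding. Fixed-size chunks with overlap are a simple baseline for that step, not a recommendation from either provider:

```python
# baseline chunker for freshly extracted text: fixed window with
# overlap so sentences spanning a boundary appear in both chunks
def chunk_text(text, size=800, overlap=100):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```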
honest limitations
These tools have boundaries worth understanding before you build on them.
Both search tools depend on what's publicly accessible on the web. Paywalled content, login-gated pages, and dynamically rendered single-page applications can produce thin or empty results. Tavily's advanced extraction handles some JavaScript-rendered content better than basic mode, and Exa's livecrawl option can force fresh retrieval, but neither is a substitute for authenticated access to gated sources.
Neural search, for all its power, can occasionally surface semantically related but topically irrelevant results. A query about "apple" in a technology context might still return results about fruit if the semantic signals are ambiguous. Keyword search doesn't have this problem, which is why the ability to switch between modes in Exa matters.
Rate limits and availability are factors in production systems. Both providers have usage-based pricing, which means a runaway agent loop can generate unexpected costs. Building cost guards into your agent logic - hard limits on search calls per task, extraction budgets per workflow - is basic hygiene that's easy to forget during development.
Freshness varies. Neither tool guarantees real-time results. There's always some lag between content being published and appearing in search results. For time-critical applications like breaking news monitoring, you'll want to understand the indexing latency for your specific use case.
frequently asked questions
how do tavily and exa pricing compare for typical agent workloads?
For a research agent running daily searches and extractions, costs differ by usage pattern. For high-volume extraction without summaries, Exa tends to be slightly cheaper; for search-heavy workloads, Tavily's per-search pricing makes costs more predictable. Both providers are affordable enough for production agent workloads, and the choice usually comes down to which features you need rather than raw cost.
can I use both providers together, or should I pick one?
Using both is often the strongest approach, and there's no technical or platform reason to limit yourself to one. The providers have complementary strengths that cover different retrieval scenarios. A practical pattern is using Exa's neural search for exploratory and conceptual queries, Tavily's search for keyword-specific and news-oriented lookups, and picking the extraction tool based on whether you need batch processing (Tavily) or deep single-page analysis with summaries (Exa). The unified interface on inference.sh means switching between them requires no additional integration work.
what happens when extraction fails on certain pages?
Both tools handle failures gracefully, but they report them differently. Tavily's extract returns explicit success and failure counts along with error details for each failed URL, making it straightforward for your agent to retry or skip problematic pages. Exa's extract offers the livecrawl parameter to force fresh retrieval when cached versions are unavailable, which can recover some failures caused by stale content. Pages that block automated access, require authentication, or rely heavily on client-side rendering will fail on both providers. Building fallback logic - trying one extractor when the other fails, or falling back to search snippets when full extraction isn't possible - makes your agent more resilient.
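That fallback chain is easy to express with the extractors as injected callables, so the same logic works in front of either provider's client:

```python
# try the primary extractor, fall back to the secondary, then to a
# search snippet; any extractor may raise or return empty content
def extract_with_fallback(url, primary, secondary, snippet=None):
    for extractor in (primary, secondary):
        try:
            content = extractor(url)
            if content:
                return content
        except Exception:
            continue  # blocked, timed out, or empty - try the next one
    return snippet  # may be None when nothing is recoverable
```

Returning the snippet rather than raising keeps the agent moving: a partial source is usually better than aborting the whole task.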
api reference
about
extract and analyze web page content using exa's advanced content retrieval
1. calling the api
install the client
the client provides a convenient way to interact with the api.
```bash
pip install inferencesh
```

setup your api key
set INFERENCE_API_KEY as an environment variable. get your key from settings → api keys.
```bash
export INFERENCE_API_KEY="inf_your_key"
```

run and get result
submit a request and wait for the final result. best for batch processing or when you don't need progress updates.
```python
from inferencesh import inference

client = inference()

result = client.run({
    "app": "exa/extract",
    "input": {}
})

print(result["output"])
```

stream live updates
get real-time progress updates as the task runs. ideal for showing progress bars, partial results, or long-running tasks.
```python
from inferencesh import inference

client = inference()

# stream=True yields updates as they arrive
for update in client.run({
    "app": "exa/extract",
    "input": {}
}, stream=True):
    if update.get("progress"):
        print(f"progress: {update['progress']}%")
    if update.get("output"):
        print(f"output: {update['output']}")
```

2. authentication
the api uses api keys for authentication. see the authentication docs for detailed setup instructions.
3. files
file inputs are automatically handled by the sdk. you can pass local paths, urls, or base64 data.
automatic upload
the python sdk automatically detects local file paths and uploads them. urls are passed through as-is.
```python
# local file paths are automatically uploaded
result = client.run({
    "app": "exa/extract",
    "input": {
        "image": "/path/to/local/image.png",      # detected & uploaded
        "audio": "https://example.com/audio.mp3",  # url passed through
    }
})
```

4. webhooks
get notified when a task completes by providing a webhook url. when the task reaches a terminal state (completed, failed, or cancelled), a POST request is sent to your url with the task result.
```python
result = client.run({
    "app": "exa/extract",
    "input": {},
    "webhook": "https://your-server.com/webhook"
}, wait=False)
```

webhook payload
your endpoint receives a JSON POST with the task result:
```json
{
  "id": "task_abc123",
  "status": 9,
  "output": { ... },
  "error": "",
  "session_id": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:05Z"
}
```

5. schema
input
- url to extract content from (single url for cost predictability)
- extract full page text content
- maximum characters to extract from the page (default: no limit)
- preserve html structure in extracted text
- generate llm-powered summary (costs extra)
- custom query to guide summary generation
- page crawling strategy: never, fallback (default), always, or preferred
- crawl timeout in milliseconds (default: 10000ms)
- number of subpages to crawl per url
- keywords to target specific subpages
- number of links to extract per page
- number of image urls to extract per page
- combine all contents into single llm-ready context string