The Firecrawl gem implements a lightweight interface to the Firecrawl.dev API, which takes a URL, crawls it, and returns HTML, Markdown, or structured data. It is of particular value when used with LLMs for grounding.

Firecrawl

The Firecrawl gem provides a Ruby interface to the Firecrawl API, enabling you to scrape web pages, capture screenshots, and crawl entire websites. The API returns clean, structured content in formats like Markdown and HTML, making it particularly useful for applications that need to process web content, including those using Large Language Models for grounding or real-time information retrieval.

require 'firecrawl'

Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]

response = Firecrawl.scrape( 'https://example.com' )
if response.success?
  result = response.result
  puts result.metadata[ 'title' ]
  puts result.markdown
end

Installation

Add this line to your application's Gemfile:

gem 'firecrawl'

Then execute:

bundle install

Or install it directly:

gem install firecrawl

Quick Start

The simplest way to use the gem is through the module-level convenience methods. Set your API key once, then call any endpoint:

require 'firecrawl'

Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]

response = Firecrawl.scrape( 'https://example.com' )
if response.success?
  puts response.result.markdown
end

For more control, instantiate request objects directly. This allows you to configure options using a block-based DSL and reuse request instances:

request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

options = Firecrawl::ScrapeOptions.build do
  formats [ :markdown, :html, :screenshot ]
  only_main_content true
end

response = request.submit( 'https://example.com', options )

Endpoints

Scrape

The scrape endpoint fetches a single URL and returns the page content in one or more formats. You can optionally run browser actions before content is captured.

options = Firecrawl::ScrapeOptions.build do
  formats [ :markdown, :screenshot ]
  only_main_content true
  screenshot do
    full_page true
  end
end

request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
response = request.submit( 'https://example.com', options )

if response.success?
  result = response.result
  puts result.markdown
  puts result.screenshot_url
end

For complete documentation of all scrape options and response fields, see Scrape Documentation.

Batch Scrape

The batch scrape endpoint processes multiple URLs efficiently. It returns results asynchronously, so you poll for completion:

request = Firecrawl::BatchScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

urls = [ 'https://example.com', 'https://example.org' ]
options = Firecrawl::BatchScrapeOptions.build do
  formats [ :markdown ]
  only_main_content true
end

response = request.submit( urls, options )

while response.success?
  result = response.result
  result.each do | scrape_result |
    puts scrape_result.markdown
  end
  break unless result.scraping?
  sleep 1
  response = request.retrieve( result )
end
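Both asynchronous endpoints follow the same submit-then-poll shape, so the loop can be factored into a reusable helper. This is a sketch in plain Ruby, not part of the gem; it assumes only that responses answer `success?` and `result`, that results answer the given in-progress predicate (`scraping?`, `crawling?`, or `processing?`), and that the request answers `retrieve`.

```ruby
# Hypothetical polling helper (not part of the gem): repeatedly calls
# request.retrieve until the result reports it is no longer in progress,
# then returns the final response.
def poll_until_complete( request, response, in_progress:, interval: 1 )
  while response.success?
    result = response.result
    return response unless result.public_send( in_progress )
    sleep interval
    response = request.retrieve( result )
  end
  response
end
```

With a helper like this, the loop above reduces to `response = poll_until_complete( request, response, in_progress: :scraping? )` followed by a single pass over the results.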

For complete documentation of all batch scrape options and response fields, see Batch Scrape Documentation.

Map

The map endpoint retrieves a site's URL structure without scraping content. This is useful for discovering pages before scraping:

request = Firecrawl::MapRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

options = Firecrawl::MapOptions.build do
  limit 100
  include_subdomains false
end

response = request.submit( 'https://example.com', options )

if response.success?
  response.result.each do | link |
    puts link.url
  end
end
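Since each mapped link is an object responding to `url`, ordinary Ruby enumerable methods work for narrowing the list before scraping. A small sketch (`select_links` is an illustrative helper, not a gem method):

```ruby
# Illustrative helper: keep only the URLs under a given prefix, e.g. to
# batch-scrape just a site's blog section after mapping it.
def select_links( links, prefix )
  links.map( &:url ).select { | url | url.start_with?( prefix ) }
end
```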

For complete documentation of all map options and response fields, see Map Documentation.

Crawl

The crawl endpoint recursively scrapes an entire website. Like batch scrape, it returns results asynchronously:

request = Firecrawl::CrawlRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

options = Firecrawl::CrawlOptions.build do
  maximum_depth 2
  limit 50
  scrape_options do
    formats [ :markdown ]
    only_main_content true
  end
end

response = request.submit( 'https://example.com', options )

while response.success?
  result = response.result
  result.each do | scrape_result |
    puts scrape_result.metadata[ 'title' ]
  end
  break unless result.crawling?
  sleep 1
  response = request.retrieve( result )
end

For complete documentation of all crawl options and response fields, see Crawl Documentation.

Extract

The extract endpoint uses an LLM to pull structured data from URLs. Provide a prompt and/or a JSON schema to define what data you want:

request = Firecrawl::ExtractRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

options = Firecrawl::ExtractOptions.build do
  prompt 'Extract the company name and description'
  schema( {
    type: 'object',
    properties: {
      name: { type: 'string' },
      description: { type: 'string' }
    }
  } )
end

response = request.submit( 'https://example.com', options )

while response.success?
  result = response.result
  break unless result.processing?
  sleep 2
  response = request.retrieve( result )
end

if response.success? && response.result.completed?
  puts response.result.data
end

For complete documentation of all extract options and response fields, see Extract Documentation.


Responses and Errors

All request methods return a Faraday::Response object. Check response.success? to determine if the HTTP request succeeded. When successful, response.result contains the parsed result object specific to the endpoint.

response = request.submit( url, options )

if response.success?
  result = response.result
  if result.success?
    # process result
  end
else
  error = response.result
  puts error.error_type         # :authentication_error, :rate_limit_error, etc.
  puts error.error_description  # human-readable message
end
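Because failed responses expose a symbolic `error_type`, a retry policy for rate limiting is easy to layer on top. A sketch (`submit_with_retry` is illustrative, not part of the gem; a production version would likely use exponential backoff):

```ruby
# Hypothetical wrapper: resubmits when the API reports a rate limit,
# up to `retries` additional attempts.
def submit_with_retry( request, url, options, retries: 3, wait: 2 )
  response = request.submit( url, options )
  until response.success? || retries.zero?
    break unless response.result.error_type == :rate_limit_error
    retries -= 1
    sleep wait
    response = request.submit( url, options )
  end
  response
end
```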

The gem maps HTTP status codes to error types:

Status    Error Type               Description
400       :invalid_request_error   The request format or content was invalid
401       :authentication_error    The API key is missing or invalid
402       :payment_required        The account requires payment
404       :not_found_error         The requested resource was not found
429       :rate_limit_error        The account has exceeded rate limits
500-505   :api_error               A server error occurred
529       :overloaded_error        The service is temporarily overloaded
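The mapping above can be expressed as a plain case statement. This is an illustrative reimplementation of the table for reference, not the gem's internal code:

```ruby
# Illustrative mapping of HTTP status codes to error types, mirroring the
# table above; the gem's own implementation may differ.
def error_type_for( status )
  case status
  when 400      then :invalid_request_error
  when 401      then :authentication_error
  when 402      then :payment_required
  when 404      then :not_found_error
  when 429      then :rate_limit_error
  when 500..505 then :api_error
  when 529      then :overloaded_error
  else               :unknown_error
  end
end
```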

Connections

The gem uses Faraday for HTTP requests, which means you can customize the connection configuration. To use a custom connection:

connection = Faraday.new do | faraday |
  faraday.request :json
  faraday.response :logger
  faraday.adapter :net_http
end

Firecrawl.connection connection

Or pass it directly to a request:

request = Firecrawl::ScrapeRequest.new(
  api_key: ENV[ 'FIRECRAWL_API_KEY' ],
  connection: connection
)

License

The gem is available as open source under the terms of the MIT License.
