GitHub - rubycdp/ferrum: Headless Chrome Ruby API

Ferrum - high-level API to control Chrome in Ruby

As simple as Puppeteer, though even simpler.

It is a clean, high-level Ruby API to Chrome. It runs headless by default, but you can configure it to run in headful mode. All you need is Ruby and Chrome or Chromium. Ferrum connects to the browser via the CDP protocol, and there is no Selenium/WebDriver/ChromeDriver dependency. The emphasis is on raw CDP because Chrome lets you do many things that WebDriver barely supports, since WebDriver has to keep a consistent design across browsers.

Install

There is no official Chrome or Chromium package for Linux. Don't install the browser from a distribution package, because it is either outdated or unofficial, and both are bad. Download it from the official source for Chrome or Chromium. The Chrome binary should be in the PATH or in BROWSER_PATH, or you can pass its location as an option to the browser instance; see :browser_path in Customization.

Add the ferrum gem to your Gemfile (gem "ferrum") and run bundle install.

Examples

Navigate to a website and save a screenshot:

browser = Ferrum::Browser.new
browser.go_to("https://google.com")
browser.screenshot(path: "google.png")
browser.quit

When you work with a browser instance, Ferrum creates and maintains a default page for you. In fact, all the methods above are sent to the page instance that is created in the default_context of the browser instance. You can also interact with a page that you create manually, and this is preferred:

browser = Ferrum::Browser.new
page = browser.create_page
page.go_to("https://google.com")
input = page.at_xpath("//input[@name='q']")
input.focus.type("Ruby headless driver for Chrome", :Enter)
page.at_css("a > h3").text # => "rubycdp/ferrum: Ruby Chrome/Chromium driver - GitHub"
browser.quit

Evaluate some JavaScript and get full width/height:

browser = Ferrum::Browser.new
page = browser.create_page
page.go_to("https://www.google.com/search?q=Ruby+headless+driver+for+Capybara")
width, height = page.evaluate <<~JS
  [document.documentElement.offsetWidth,
   document.documentElement.offsetHeight]
JS
# => [1024, 1931]

browser.quit

Do any mouse movements you like:

# Trace a 100x100 square
browser = Ferrum::Browser.new
page = browser.create_page
page.go_to("https://google.com")
page.mouse
  .move(x: 0, y: 0)
  .down
  .move(x: 0, y: 100)
  .move(x: 100, y: 100)
  .move(x: 100, y: 0)
  .move(x: 0, y: 0)
  .up

browser.quit
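The corner points of the square traced above can also be generated rather than written out by hand. The following is a minimal sketch; `square_path` is a hypothetical helper introduced here for illustration and is not part of Ferrum.

```ruby
# Hypothetical helper (not part of Ferrum): corner points of a square,
# starting and ending at the origin, in the order the example above visits them.
def square_path(size)
  [[0, 0], [0, size], [size, size], [size, 0], [0, 0]]
    .map { |x, y| { x: x, y: y } }
end

path = square_path(100)
puts path.inspect
```

With a page at hand, the first point would be passed to page.mouse.move followed by down, the remaining points to move, and the gesture would end with up, exactly as in the chained calls above.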

Docker

When running in Docker as root, you must pass the no-sandbox browser option:

Ferrum::Browser.new(browser_options: { "no-sandbox": nil })

It has also been reported that the Chrome process repeatedly crashes when running inside a Docker container on an M1 Mac preventing Ferrum from working. Ferrum should work as expected when deployed to a Docker container on a non-M1 Mac.
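If the same code runs both inside and outside a container, the option can be added only when the process is actually root. This is a sketch under that assumption; `sandbox_options` is a hypothetical helper, not a Ferrum method.

```ruby
# Hypothetical helper (not part of Ferrum): build the :browser_options hash,
# adding "no-sandbox" only when the process runs as root.
def sandbox_options(root: Process.uid.zero?)
  root ? { "no-sandbox": nil } : {}
end

puts sandbox_options(root: true).inspect
puts sandbox_options(root: false).inspect
```

The result would then be passed as Ferrum::Browser.new(browser_options: sandbox_options).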

Customization

You can customize options with the following code in your test setup:

Ferrum::Browser.new(options)

go_to(url) : String

Navigate the page to the given URL.

page.go_to("https://github.com/")

back

Navigate to the previous page in history.

page.go_to("https://github.com/")
page.at_xpath("//a").click
page.back

forward

Navigate to the next page in history.

page.go_to("https://github.com/")
page.at_xpath("//a").click
page.back
page.forward

refresh

Reload current page.

page.go_to("https://github.com/")
page.refresh

stop

Stop all navigations and the loading of pending resources on the page.

page.go_to("https://github.com/")
page.stop

position = **options

Set the position for the browser window

browser.position = { left: 10, top: 20 }

position : Array<Integer>

Get the position for the browser window

browser.position # => [10, 20]

window_bounds = **options

Set window bounds

browser.window_bounds = { left: 10, top: 20, width: 1024, height: 768, window_state: "normal" }

window_bounds : Hash<String, Integer | String>

Get window bounds

browser.window_bounds # => { "left": 0, "top": 1286, "width": 10, "height": 10, "windowState": "normal" }

window_id : Integer

Current window id

Finders

at_css(selector, **options) : Node | nil

Find node by selector. Runs document.querySelector within the document or provided node.

page.go_to("https://github.com/")
page.at_css("a[aria-label='Issues you created']") # => Node

css(selector, **options) : Array<Node> | []

Find nodes by selector. The method runs document.querySelectorAll within the document or provided node.

page.go_to("https://github.com/")
page.css("a[aria-label='Issues you created']") # => [Node]

at_xpath(selector, **options) : Node | nil

Find node by xpath.

page.go_to("https://github.com/")
page.at_xpath("//a[@aria-label='Issues you created']") # => Node

xpath(selector, **options) : Array<Node> | []

Find nodes by xpath.

page.go_to("https://github.com/")
page.xpath("//a[@aria-label='Issues you created']") # => [Node]

current_url : String

Returns current top window location href.

page.go_to("https://google.com/")
page.current_url # => "https://www.google.com/"

current_title : String

Returns current top window title

page.go_to("https://google.com/")
page.current_title # => "Google"

body : String

Returns current page's html.

page.go_to("https://google.com/")
page.body # => '...

Screenshots

screenshot(**options) : String | Integer

Saves screenshot on a disk or returns it as base64.

page.go_to("https://google.com/")
# Save on the disk in PNG
page.screenshot(path: "google.png") # => 134660
# Save on the disk in JPG
page.screenshot(path: "google.jpg") # => 30902
# Save to Base64 the whole page, not only the viewport, and reduce quality
page.screenshot(full: true, quality: 60, encoding: :base64) # => "iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAYAAAC6uhUNAAAAAXNSR0IArs4c6Q...
# Save on the disk with the selected element in PNG
page.screenshot(path: "google.png", selector: "textarea") # => 11340
# Save on the disk with an area of the page in PNG
page.screenshot(path: "google.png", area: { x: 0, y: 0, width: 400, height: 300 }) # => 54239
# Save with specific background color
page.screenshot(background_color: Ferrum::RGBA.new(0, 0, 0, 0.0))

pdf(**options) : String | Boolean

Saves PDF on a disk or returns it as base64.

page.go_to("https://google.com/")
# Save to disk as a PDF
page.pdf(path: "google.pdf", paper_width: 1.0, paper_height: 1.0) # => true

mhtml(**options) : String | Integer

Saves MHTML on a disk or returns it as a string.

page.go_to("https://google.com/")
page.mhtml(path: "google.mhtml") # => 87742

Screencast

start_screencast(**options) { |data, metadata, session_id| ... }

Starts a screencast and sends each recorded frame to the given block.

require "base64"

page.go_to("https://apple.com/ipad")

page.start_screencast(format: :jpeg, quality: 75) do |data, metadata|
  timestamp = (metadata["timestamp"] * 1000).to_i
  File.binwrite("image_#{timestamp}.jpg", Base64.decode64(data))
end

sleep 10

page.stop_screencast

📝 NOTE

Chrome only sends new frames while page content is changing. For example, if there is an animation or a video on the page, Chrome sends frames at the rate requested. On the other hand, if the page is nothing but a wall of static text, Chrome sends frames while the page renders. Once Chrome has finished rendering the page, it sends no more frames until something changes (e.g., navigating to another location).
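The work done inside the screencast block above can be pulled into two small functions, which makes it testable without a browser. This is a sketch only: `frame_filename` and `decode_frame` are hypothetical names, and the sample payload is a stand-in rather than real frame data.

```ruby
# Hypothetical helpers (not part of Ferrum) mirroring the screencast block above.

# Build a file name from the frame metadata, scaling the timestamp by 1000
# the same way the example does.
def frame_filename(metadata)
  timestamp = (metadata["timestamp"] * 1000).to_i
  "image_#{timestamp}.jpg"
end

# Decode a Base64 frame payload into raw bytes.
# String#unpack1("m") is the core-Ruby call that Base64.decode64 wraps.
def decode_frame(data)
  data.unpack1("m")
end

sample = ["fake-jpeg-bytes"].pack("m") # stand-in for a real frame payload
puts frame_filename({ "timestamp" => 1.5 })
puts decode_frame(sample)
```

Inside the block passed to start_screencast, the two calls would be combined as File.binwrite(frame_filename(metadata), decode_frame(data)).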

stop_screencast

Stops sending frames.

Network

page.network

traffic Array<Network::Exchange>

Returns all information about network traffic as Network::Exchange instances. An exchange is, in general, a wrapper around a request, its response and an error.

page.go_to("https://github.com/")
page.network.traffic # => [#<Ferrum::Network::Exchange, ...]

request : Network::Request

Page request of the main frame.

page.go_to("https://github.com/")
page.network.request # => #<Ferrum::Network::Request...

response : Network::Response

Page response of the main frame.

page.go_to("https://github.com/")
page.network.response # => #<Ferrum::Network::Response...

status : Integer

Contains the status code of the main page response (e.g., 200 for a success). This is just a shortcut for response.status.

page.go_to("https://github.com/")
page.network.status # => 200

wait_for_idle(**options) : Boolean

Waits for the network to become idle. Returns true on success and false if there are still open connections.

page.go_to("https://example.com/")
page.at_xpath("//a[text() = 'No UI changes button']").click
page.network.wait_for_idle # => true

wait_for_idle!(**options)

Waits for the network to become idle or raises Ferrum::TimeoutError. Accepts the same arguments as wait_for_idle.

page.go_to("https://example.com/")
page.at_xpath("//a[text() = 'No UI changes button']").click
page.network.wait_for_idle! # might raise an error
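The true/false contract of wait_for_idle is the usual shape of a poll-with-timeout loop. The sketch below is a generic illustration of that pattern, not Ferrum's implementation; `wait_until` and the `pending` counter are made up for the example.

```ruby
# A generic polling loop (not Ferrum's implementation): calls the block until
# it returns true or the timeout expires. Returns true on success, false otherwise.
def wait_until(timeout: 1.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

pending = 3
idle = wait_until(timeout: 1.0, interval: 0.01) do
  pending -= 1 if pending > 0 # pretend one connection finishes per poll
  pending.zero?
end
puts idle
```

A bang variant in the style of wait_for_idle! would raise an error where this version returns false.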

clear(type)

Clear page's cache or collected traffic.

traffic = page.network.traffic # => []
page.go_to("https://github.com/")
traffic.size # => 51
page.network.clear(:traffic)
traffic.size # => 0

intercept(**options)

Sets request interception for the given options. This method only enables interception; use the on callback to catch requests and abort or continue them.

browser = Ferrum::Browser.new
page = browser.create_page
page.network.intercept
page.on(:request) do |request|
  if request.match?(/bla-bla/)
    request.abort
  elsif request.match?(/lorem/)
    request.respond(body: "Lorem ipsum")
  else
    request.continue
  end
end
page.go_to("https://google.com")
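The routing decision in the callback above can be kept separate from the request object, so the rules can be checked without a browser. A minimal sketch, assuming a hypothetical `interception_action` helper that is not part of Ferrum:

```ruby
# Hypothetical helper (not part of Ferrum): decide what to do with a URL.
# Returns :abort, :respond or :continue, mirroring the branches above.
def interception_action(url)
  case url
  when /bla-bla/ then :abort
  when /lorem/   then :respond
  else                :continue
  end
end

urls = %w[https://example.com/bla-bla https://example.com/lorem https://example.com/]
urls.each do |url|
  puts "#{url} -> #{interception_action(url)}"
end
```

Whatever the decision, every intercepted request still has to be aborted, answered or continued in the callback, otherwise it stays pending.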

authorize(**options, &block)

If a site or proxy requires authorization, you can provide credentials using this method.

page.network.authorize(user: "login", password: "pass") { |req| req.continue }
page.go_to("http://example.com/authenticated")
puts page.network.status # => 200
puts page.body # => Welcome, authenticated client

Since Chrome implements authorize using request interception, you must continue or abort authorized requests. If you already have code that uses interception, you can call authorize without a block; otherwise you must pass one. The following version doesn't pass a block and works fine:

browser = Ferrum::Browser.new
page = browser.create_page
page.network.intercept
page.on(:request) do |request|
  if request.resource_type == "Image"
    request.abort
  else
    request.continue
  end
end

page.network.authorize(user: "login", password: "pass", type: :proxy)

page.go_to("https://google.com")

Previously, authorize could be called without a block. Because it is implemented using request interception, it could collide with another part of your code that also uses interception: authorize would allow a request that your code meant to deny, and by then it was too late. The block is now mandatory.

emulate_network_conditions(**options)

Activates emulation of network conditions.

page.network.emulate_network_conditions(connection_type: "cellular2g")
page.go_to("https://github.com/")

offline_mode

Activates offline mode for a page.

page.network.offline_mode
page.go_to("https://github.com/") # => Ferrum::StatusError (Request to https://github.com/ failed(net::ERR_INTERNET_DISCONNECTED))

cache(disable: Boolean)

Toggles whether the cache is ignored for each request. If disable is true, the cache will not be used.

page.network.cache(disable: true)

Downloads

page.downloads

files Array<Hash>

Returns information about all downloaded files as an array of hashes.

page.go_to("http://localhost/attachment.pdf")
page.downloads.files
# => [{"frameId"=>"E3316DF1B5383D38F8ADF7485005FDE3",
#      "guid"=>"11a68745-98ac-4d54-9b57-9f9016c268b3",
#      "url"=>"http://localhost/attachment.pdf",
#      "suggestedFilename"=>"attachment.pdf",
#      "totalBytes"=>4911,
#      "receivedBytes"=>4911,
#      "state"=>"completed"}]
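Because files returns plain hashes, ordinary Enumerable methods are enough to pick out finished downloads. The sketch below assumes the key names shown in the output above; `completed_downloads` is a hypothetical helper and the second entry is made-up data added for contrast.

```ruby
# Hypothetical helper (not part of Ferrum): suggested file names of downloads
# whose "state" is "completed" and whose byte counts match.
def completed_downloads(files)
  files
    .select { |f| f["state"] == "completed" && f["receivedBytes"] == f["totalBytes"] }
    .map { |f| f["suggestedFilename"] }
end

files = [
  { "suggestedFilename" => "attachment.pdf", "totalBytes" => 4911,
    "receivedBytes" => 4911, "state" => "completed" },
  { "suggestedFilename" => "partial.zip", "totalBytes" => 100,
    "receivedBytes" => 40, "state" => "inProgress" } # made-up entry for contrast
]
puts completed_downloads(files).inspect
```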

wait(timeout)

Waits until the download is finished.

page.go_to("http://localhost/attachment.pdf")
page.downloads.wait

or

page.go_to("http://localhost/page")
page.downloads.wait { page.at_css("#download").click }

set_behavior(**options)

Sets the behavior for when a file is about to be downloaded.

page.go_to("https://example.com/")
page.downloads.set_behavior(save_path: "/tmp", behavior: :allow)

Proxy

You can set a proxy with a :proxy option:

Ferrum::Browser.new(proxy: { host: "x.x.x.x", port: "8800", user: "user", password: "pa$$" })

:bypass can specify a semicolon-separated list of hosts for which the proxy shouldn't be used:

Ferrum::Browser.new(proxy: { host: "x.x.x.x", port: "8800", bypass: "*.google.com;*foo.com" })
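When the bypass hosts live in a list, the semicolon-separated string can be built with join. This is a small sketch; `proxy_options` is a hypothetical helper, and only the :host, :port and :bypass keys shown above are used.

```ruby
# Hypothetical helper (not part of Ferrum): build a :proxy hash whose :bypass
# value is the semicolon-separated form described above.
def proxy_options(host:, port:, bypass: [])
  options = { host: host, port: port }
  options[:bypass] = bypass.join(";") unless bypass.empty?
  options
end

puts proxy_options(host: "x.x.x.x", port: "8800",
                   bypass: ["*.google.com", "*foo.com"]).inspect
```

The returned hash would be passed as Ferrum::Browser.new(proxy: proxy_options(...)).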

In general, passing a proxy option when instantiating a browser starts the browser with proxy command-line flags, so the proxy affects all pages and contexts. You can also create a page in a new context that uses its own proxy settings:

browser = Ferrum::Browser.new

browser.create_page(proxy: { host: "x.x.x.x", port: 31337, user: "user", password: "password" }) do |page|
  page.go_to("https://api.ipify.org?format=json")
  page.body # => "x.x.x.x"
end

browser.create_page(proxy: { host: "y.y.y.y", port: 31337, user: "user", password: "password" }) do |page|
  page.go_to("https://api.ipify.org?format=json")
  page.body # => "y.y.y.y"
end

Mouse

page.mouse

scroll_to(x, y)

Scroll the page to the given x, y coordinates.

page.go_to("https://www.google.com/search?q=Ruby+headless+driver+for+Capybara")
page.mouse.scroll_to(0, 400)

click(**options) : Mouse

Clicks at the given coordinates; fires mouse move, down and up events.

down(**options) : Mouse

Mouse down for given coordinates.

up(**options) : Mouse

Mouse up for given coordinates.

move(x:, y:, steps: 1) : Mouse

Mouse move to given x and y.

Keyboard

page.keyboard

down(key) : Keyboard

Dispatches a keydown event.

up(key) : Keyboard

Dispatches a keyup event.

type(*keys) : Keyboard

Sends a keydown, keypress/input, and keyup event for each character in the text.

modifiers(keys) : Integer

Returns a bitfield for the given keys.
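The Chrome DevTools Protocol defines the modifier bits for key events as Alt=1, Ctrl=2, Meta/Command=4 and Shift=8, and a bitfield is the bitwise OR of the pressed modifiers. The sketch below illustrates that arithmetic; `modifier_bitfield` is a hypothetical helper and not Ferrum's own implementation, which may accept different key names.

```ruby
# Hypothetical helper (not Ferrum's implementation): OR together the CDP
# modifier bits (Alt=1, Ctrl=2, Meta/Command=4, Shift=8) for the given keys.
def modifier_bitfield(keys)
  bits = { alt: 1, ctrl: 2, meta: 4, shift: 8 }
  keys.map { |key| bits.fetch(key) }.reduce(0, :|)
end

puts modifier_bitfield(%i[ctrl shift]) # 2 | 8
puts modifier_bitfield([])
```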

Cookies

page.cookies

Returns cookies hash

page.cookies.all # => {"NID"=>#<Ferrum::Cookies::Cookie:0x0000558624b37a40 @attributes={"name"=>"NID", "value"=>"...", "domain"=>".google.com", "path"=>"/", "expires"=>1583211046.575681, "size"=>178, "httpOnly"=>true, "secure"=>false, "session"=>false}>}

Returns cookie

page.cookies["NID"] # => <Ferrum::Cookies::Cookie:0x0000558624b67a88 @attributes={"name"=>"NID", "value"=>"...", "domain"=>".google.com", "path"=>"/", "expires"=>1583211046.575681, "size"=>178, "httpOnly"=>true, "secure"=>false, "session"=>false}>

set(value) : Boolean

Sets a cookie

page.cookies.set(name: "stealth", value: "omg", domain: "google.com") # => true

nid_cookie = page.cookies["NID"] # => <Ferrum::Cookies::Cookie:0x0000558624b67a88>
page.cookies.set(nid_cookie) # => true

remove(**options) : Boolean

Removes given cookie

page.cookies.remove(name: "stealth", domain: "google.com") # => true

clear : Boolean

Removes all cookies for current page

page.cookies.clear # => true

store(path) : Boolean

Stores all cookies of current page in a file.

# Cookies are saved into cookies.yml
page.cookies.store # => 15657

load(path) : Boolean

Loads all cookies from the file and sets them for current page.

# Cookies are loaded from cookies.yml
page.cookies.load # => true

Headers

page.headers

get : Hash

Get all headers

set(headers) : Boolean

Sets the given headers: all existing headers are cleared first, then the given ones are set.

add(headers) : Boolean

Adds given headers to already set ones.

clear : Boolean

Clear all headers.
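The difference between set and add described above is replace versus merge. The sketch below models that with plain hashes; it does not call Ferrum, and `set_headers` and `add_headers` are hypothetical stand-ins for illustration.

```ruby
# Plain-hash model of the semantics described above (not Ferrum's code):
# "set" replaces everything, "add" merges into what is already there.
def set_headers(_current, headers)
  headers.dup
end

def add_headers(current, headers)
  current.merge(headers)
end

headers = set_headers({}, { "User-Agent" => "Ferrum" })
headers = add_headers(headers, { "Referer" => "https://example.com/" })
puts headers.inspect
headers = set_headers(headers, { "Accept" => "text/html" })
puts headers.inspect
```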

JavaScript

evaluate(expression, *args)

Evaluates the given JS expression and returns the result.

page.evaluate("[window.scrollX, window.scrollY]")

evaluate_async(expression, wait_time, *args)

Evaluates an asynchronous expression and returns the result.

page.evaluate_async(%(arguments[0]({foo: "bar"})), 5) # => { "foo" => "bar" }

execute(expression, *args)

Executes the expression without returning its result.

page.execute(%(1 + 1)) # => true

evaluate_on_new_document(expression)

Evaluates JavaScript to modify things before a page loads.

browser.evaluate_on_new_document <<~JS
  Object.defineProperty(navigator, "languages", {
    get: function() { return ["tlh"]; }
  });
JS

add_script_tag(**options) : Boolean

page.add_script_tag(url: "http://example.com/script.js") # => true

add_style_tag(**options) : Boolean

page.add_style_tag(content: "h1 { font-size: 40px; }") # => true

bypass_csp(**options) : Boolean

page.bypass_csp # => true
page.go_to("https://github.com/ruby-concurrency/concurrent-ruby/blob/master/docs-source/promises.in.md")
page.refresh
page.add_script_tag(content: "window.__injected = 42")
page.evaluate("window.__injected") # => 42

Emulation

disable_javascript

Disables JavaScript in the loaded HTML source. You can still evaluate JavaScript with evaluate or execute. Returns nothing.

set_viewport

Overrides device screen dimensions and emulates viewport.

page.set_viewport(width: 1000, height: 600, scale_factor: 3)

Frames

frames : Array[Frame] | []

Returns all the frames the current page has.

page.go_to("https://www.w3schools.com/tags/tag_frame.asp")
page.frames
# => [
#   #<Ferrum::Frame @id="C6D104CE454A025FBCF22B98DE612B12" @parent_id=nil @name=nil @state=:stopped_loading @execution_id=1>,
#   #<Ferrum::Frame @id="C09C4E4404314AAEAE85928EAC109A93" @parent_id="C6D104CE454A025FBCF22B98DE612B12" @state=:stopped_loading @execution_id=2>,
#   #<Ferrum::Frame @id="2E9C7F476ED09D87A42F2FEE3C6FBC3C" @parent_id="C6D104CE454A025FBCF22B98DE612B12" @state=:stopped_loading @execution_id=3>,
#   ...
# ]

main_frame : Frame

Returns page's main frame, the top of the tree and the parent of all frames.

frame_by(**options) : Frame | nil

Find frame by given options.

page.frame_by(id: "C6D104CE454A025FBCF22B98DE612B12")
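The id and parent_id attributes are enough to reconstruct the frame tree: the main frame is the one without a parent, and a frame's children are those whose parent_id points at it. The sketch below works on plain hashes shaped like the inspect output shown for frames; `main_frame_of` and `children_of` are hypothetical helpers, not Ferrum methods.

```ruby
# Hypothetical helpers (not part of Ferrum) operating on plain hashes that
# carry the same id / parent_id values as the frames listed earlier.
def main_frame_of(frames)
  frames.find { |f| f[:parent_id].nil? }
end

def children_of(frames, id)
  frames.select { |f| f[:parent_id] == id }
end

frames = [
  { id: "C6D104CE454A025FBCF22B98DE612B12", parent_id: nil },
  { id: "C09C4E4404314AAEAE85928EAC109A93", parent_id: "C6D104CE454A025FBCF22B98DE612B12" },
  { id: "2E9C7F476ED09D87A42F2FEE3C6FBC3C", parent_id: "C6D104CE454A025FBCF22B98DE612B12" }
]
main = main_frame_of(frames)
puts main[:id]
puts children_of(frames, main[:id]).length
```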

Frame

id : String

Frame's unique id.

parent_id : String | nil

The parent frame's id, if this frame is nested in another one.

parent : Frame | nil

The parent frame, if this frame is nested in another one.

frame_element : Node | nil

Returns the element in which the window is embedded.

execution_id : Integer

The execution context id used by JS. Each frame has its own context in which JS is evaluated.

name : String | nil

The frame's name, if it was given one.

state : Symbol | nil

The state the frame is currently in, for example :stopped_loading as in the frames output shown earlier.

url : String

Returns current frame's location href.

page.go_to("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
frame = page.frames[1]
frame.url # => https://interactive-examples.mdn.mozilla.net/pages/tabbed/iframe.html

title

Returns current frame's title.

page.go_to("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
frame = page.frames[1]
frame.title # => HTML Demo:

main? : Boolean

Returns whether the current frame is the main frame of the page (the top of the tree).

page.go_to("https://www.w3schools.com/tags/tag_frame.asp")
frame = page.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.main? # => false

current_url : String

Returns current frame's top window location href.

page.go_to("https://www.w3schools.com/tags/tag_frame.asp")
frame = page.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.current_url # => "https://www.w3schools.com/tags/tag_frame.asp"

current_title : String

Returns current frame's top window title.

page.go_to("https://www.w3schools.com/tags/tag_frame.asp")
frame = page.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.current_title # => "HTML frame tag"

body : String

Returns current frame's html.

page.go_to("https://www.w3schools.com/tags/tag_frame.asp")
frame = page.frame_by(id: "C09C4E4404314AAEAE85928EAC109A93")
frame.body # => ""

doctype

Returns current frame's doctype.

page.go_to("https://www.w3schools.com/tags/tag_frame.asp")
page.main_frame.doctype # => ""

content = html

Sets the content of the given frame.

page.go_to("https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe") frame = page.frames[1] frame.body #