HTML documents using XPath or CSS selectors

Parsel

Parsel is a BSD-licensed Python library to extract data from HTML, JSON, and XML documents.

It supports:

CSS and XPath expressions for HTML and XML documents
JMESPath expressions for JSON documents
Regular expressions

Find the Parsel online documentation at https://parsel.readthedocs.org.

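Example:

>>> from parsel import Selector
>>> text = """
...     <html>
...         <body>
...             <h1>Hello, Parsel!</h1>
...             <ul>
...                 <li><a href="http://example.com">Link 1</a></li>
...                 <li><a href="http://scrapy.org">Link 2</a></li>
...             </ul>
...             <script type="application/json">{"a": ["b", "c"]}</script>
...         </body>
...     </html>"""
>>> selector = Selector(text=text)
>>> selector.css('h1::text').get()
'Hello, Parsel!'
>>> selector.xpath('//h1/text()').re(r'\w+')
['Hello', 'Parsel']
>>> for li in selector.css('ul > li'):
...     print(li.xpath('.//@href').get())
http://example.com
http://scrapy.org
>>> selector.css('script::text').jmespath("a").get()
'b'
>>> selector.css('script::text').jmespath("a").getall()
['b', 'c']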