

XPath Selector Cheat Sheet: Practical Examples Included

Welcome to our quick guide on XPath selectors! XPath can help you extract specific information from websites. This post explains the fundamentals of XPath in easy-to-understand terms, giving you the tools to start scraping effectively.

What is XPath?

XPath stands for XML Path Language, a tool used to navigate through elements and attributes in an XML document.

It allows you to query and select parts of an XML document, or an XML-like document such as an HTML page, based on specific criteria, like an element's name, position, or value. This makes XPath very useful for tasks like web scraping, where you must extract specific information from web pages.

Want to learn how to scrape a website?

Read the Beginner's Guide to Web Scraping

XPath is available in several scraping tools.

The other option is to parse an HTML document using CSS selectors. Use XPath over CSS selectors when you need to do complex queries that involve navigating the DOM non-linearly.

Here is a guide on how to generate CSS selectors for web scraping.

XPath Cheat Sheet

Here is the XPath Cheat Sheet for you:

Basic syntax for selecting nodes

  • / - Selects from the root node
    • Example: /html selects the root element of an HTML document.
  • // - Selects nodes in the document from the current node that match the selection, no matter where they are
    • Example: //div selects all div elements throughout the entire document, regardless of their location.
  • . - Selects the current node
    • Example: Suppose you are inside a loop filtering elements; using . would select the current element being processed.
  • .. - Selects the parent of the current node
    • Example: .. selects the parent of the current node.

      If you are currently on an li inside a ul, .. would select the ul.

  • @ - Selects attributes
    • Example: //a/@href selects the href attributes of all anchor tags throughout the document.
  • nodename - Selects all nodes with the name "nodename" / tagname
    • Example: //p selects all p (paragraph) elements in the document.

Each of these examples shows how to use XPath selectors to target specific parts of an XML or HTML document, each serving different needs in data extraction or document navigation.
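
The basic selectors above can be tried outside the browser as well. Here is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element (so ".//p" stands in for "//p"). The sample document and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<html>
  <body>
    <div id="content">
      <p>First</p>
      <p>Second</p>
    </div>
    <p>Outside</p>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# ".//p" plays the role of "//p": every p below the starting element.
all_paragraphs = [p.text for p in root.findall(".//p")]

# "." is the current node, here the root element itself.
current = root.find(".").tag

# ".." steps up to the parent: each element that directly contains a p.
parents_of_p = [e.tag for e in root.findall(".//p/..")]

print(all_paragraphs)
print(current)
print(parents_of_p)
```

ElementTree cannot return attribute nodes the way //a/@href does; with this library you would select the elements and read the attribute with .get("href") instead.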

Predicates - to refine your selection

  • [n] - Selects the nth element (1-based index)
    • Example: //ul/li[3] selects the third li element in any ul in the document.
  • [position()=n] - Same as above
    • Example: //ul/li[position()=2] selects the second li element within each ul element.
  • [last()] - Selects the last element
    • Example: //ul/li[last()] selects the last li element within each ul list.
  • [@attribute='value'] - Selects all elements with a given attribute value
    • Example: //*[@id='uniqueElement'] selects all elements with an id attribute equal to "uniqueElement".
  • [contains(@attribute, 'text')] - Selects elements with an attribute containing 'text'
    • Example: //*[contains(@class, 'note')] selects all elements whose class attribute contains the substring "note".
  • [not(predicate)] - Selects elements that do not match the predicate
    • Example: //input[not(@type='hidden')] selects all input elements that do not have a type attribute of "hidden".
  • [starts-with(@attribute, 'text')] - Selects elements where the attribute starts with 'text'
    • Example: //a[starts-with(@href, 'http')] selects all a elements where the href attribute starts with "http".
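
To see what these predicates return, here is a small sketch using Python's standard-library xml.etree.ElementTree. It supports positional and attribute-equality predicates, but not contains(), starts-with() or not(), so those three are emulated with ordinary Python filtering to show the same result. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<body>
  <ul>
    <li>one</li>
    <li>two</li>
    <li>three</li>
  </ul>
  <a id="uniqueElement" href="https://example.com/notes">secure</a>
  <a href="/about">relative</a>
  <input type="hidden" name="token"/>
  <input type="text" name="q"/>
</body>
"""

root = ET.fromstring(DOC)

# Positional predicates are 1-based, as in XPath.
second = [li.text for li in root.findall(".//ul/li[2]")]
last = [li.text for li in root.findall(".//ul/li[last()]")]

# Attribute equality: [@attribute='value'].
by_id = [e.text for e in root.findall(".//*[@id='uniqueElement']")]

# Python stand-ins for predicates outside ElementTree's XPath subset.
links = root.findall(".//a")
starts_with_http = [a.text for a in links if a.get("href", "").startswith("http")]
contains_note = [a.text for a in links if "note" in a.get("href", "")]
not_hidden = [i.get("name") for i in root.findall(".//input")
              if i.get("type") != "hidden"]

print(second, last, by_id)
print(starts_with_http, contains_note, not_hidden)
```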

Axes

  • ancestor:: - Selects all ancestors (parent, grandparent, etc.)
    • Example: //*[@class='highlight']/ancestor::* selects all ancestors of elements with the class "highlight".
  • descendant:: - Selects all descendants (children, grandchildren, etc.)
    • Example: //*[@id='content']/descendant::p selects all p elements that are descendants of the element with the ID "content".
  • following:: - Selects everything in the document after the closing tag of the current node
    • Example: //*[@id='section1']/following::p selects all p elements in the document that come after an element with the ID "section1".
  • preceding:: - Selects all nodes that appear before the current node in the document, excluding its ancestors
    • Example: //*[@id='section2']/preceding::p selects all p elements that appear before an element with the ID "section2".
  • following-sibling:: - Selects all siblings after the current node
    • Example: //*[@id='header']/following-sibling::* selects all siblings that follow an element with the ID "header".
  • preceding-sibling:: - Selects all siblings before the current node
    • Example: //*[@id='header']/preceding-sibling::* selects all siblings that precede an element with the ID "header".
  • child:: - Selects all direct children of the current node (additional useful axis)
    • Example: //*[@class='container']/child::p selects all p elements that are direct children of elements with the class "container".
  • parent:: - Selects the parent of the current node (to complete the navigation possibilities)
    • Example: //p[@class='highlight']/parent::* selects the parent of each p with the class "highlight".
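
Python's standard-library xml.etree.ElementTree does not implement named axes, so the sketch below emulates two of them by hand, purely to show which nodes ancestor:: and following-sibling:: pick out. The helper functions ancestors and following_siblings and the sample document are hypothetical; a full XPath engine does this for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<body>
  <div id="content">
    <h1 id="header">Title</h1>
    <p class="highlight">Intro</p>
    <p>Body</p>
  </div>
</body>
"""

root = ET.fromstring(DOC)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Rough equivalent of ancestor::*, nearest ancestor first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def following_siblings(node):
    """Rough equivalent of following-sibling::* (node must not be the root)."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


highlight = root.find(".//p[@class='highlight']")
header = root.find(".//*[@id='header']")

ancestor_tags = [e.tag for e in ancestors(highlight)]
sibling_texts = [e.text for e in following_siblings(header)]

print(ancestor_tags)
print(sibling_texts)
```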

Wildcards

  • * - Matches any element node
    • Example: //* selects all elements in the document.
    • Example: //div/* selects all child elements of every div element, regardless of their tag name.
  • @* - Matches any attribute node
    • Example: //@* selects all attributes of all elements in the document.
    • Example: //*[@*] selects all elements that have any attribute.
  • node() - Matches any node of any kind
    • Example: //div/node() selects all child nodes of every div element, including elements, text nodes, and possibly others like comments.
    • Example: //div/*/node() selects all child nodes of every element that is a child of a div, including text nodes, element nodes, and other types.
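
A short sketch of the element wildcard, assuming Python's standard-library xml.etree.ElementTree. It supports * but not @* or node(), so the "has any attribute" case is emulated with a Python filter. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<body>
  <div class="box">
    <h2>Heading</h2>
    <p>Text</p>
  </div>
  <span id="s">x</span>
</body>
"""

root = ET.fromstring(DOC)

# ".//*" matches every element below the starting element.
# (A document-level "//*" would also include the root element itself.)
all_descendants = [e.tag for e in root.findall(".//*")]

# "./div/*" matches every child element of div, whatever its tag name.
children_of_div = [e.tag for e in root.findall("./div/*")]

# Stand-in for //*[@*]: keep elements that carry at least one attribute.
with_attributes = [e.tag for e in root.iter() if e.attrib]

print(all_descendants)
print(children_of_div)
print(with_attributes)
```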

Functions

  • text() - Selects the text content of nodes.

    Useful in cases where you want to extract only the text inside an element.
    Example: //p[text()='Hello World']

  • contains() - Returns true if the first argument string contains the second argument string.
    Example: //div[contains(@class, 'important')]
  • starts-with() - Returns true if the first argument string starts with the second argument string.
    Example: //div[starts-with(@id, 'prefix-')]
  • not() - Returns true if the argument is false.

    This is useful for negating a condition.
    Example: //input[not(@type='hidden')]

  • normalize-space() - Strips leading and trailing whitespace from a string and replaces sequences of whitespace characters with a single space. This is useful for cleaning up text.
    Example: //td[normalize-space(text())='Some text']
  • translate() - Replaces characters in a string.

    This is useful for case-insensitive searching or removing specific characters.
    Example: //text()[translate(., 'ABC', 'abc')='abc']

  • last() - Returns the position of the last node in the context node list. Useful for selecting the last item in a list or a series of elements.
    Example: (//ul/li)[last()]
  • position() - Returns the position of the current node in the context node list.
    Example: (//ul/li)[position() <= 3]
  • count() - Counts the number of nodes in the argument node-set.
    Example: //ul[count(li) > 3]
  • sum() - Returns the sum of the values of the nodes in the argument node-set.
    Example: sum(//input[@type='number']/@value)
  • floor(), ceiling(), round() - Numeric functions to round numbers down, up, or to the nearest integer, respectively.
    Example: //div[floor(@data-number) = 10]
  • boolean() - Converts the argument to a boolean value: strings and numbers are true unless the string is empty or the number is zero, and node-sets are true unless they are empty.
    Example: //div[boolean(@attribute)]
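
To make the string and counting functions concrete, here is a plain-Python sketch of what normalize-space(), translate() and count() compute. These are hypothetical helper functions written for illustration, not an XPath engine, and translate here assumes both character lists have the same length (XPath itself also allows a shorter replacement list, which deletes the extra characters).

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(text):
    """Mimic normalize-space(): trim, then collapse whitespace runs."""
    return re.sub(r"[ \t\r\n]+", " ", text.strip(" \t\r\n"))


def translate(text, source, target):
    """Mimic translate() for equal-length character lists."""
    return text.translate(str.maketrans(source, target))


# Hypothetical sample document, invented for this illustration.
DOC = """
<body>
  <ul id="long"><li>a</li><li>b</li><li>c</li><li>d</li></ul>
  <ul id="short"><li>a</li><li>b</li></ul>
</body>
"""

root = ET.fromstring(DOC)

# Stand-in for //ul[count(li) > 3]: lists with more than three li children.
long_lists = [ul.get("id") for ul in root.iter("ul")
              if len(ul.findall("li")) > 3]

cleaned = normalize_space("  Some \n   text ")
lowered = translate("ABC cab", "ABC", "abc")

print(cleaned)
print(lowered)
print(long_lists)
```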

Selecting Specific Nodes

  • By Tag Name:
    • //nodename - Selects all nodes with the given name
  • By Attribute:
    • //*[@attribute='value'] - Selects all elements that have the specified attribute with a certain value
  • By Partial Attribute:
    • //*[contains(@attribute, 'value')] - Selects elements that contain the specified value in the specified attribute

Time to practice

We can start practicing by using the browser console function $x.

It's available in Chrome and Firefox.

Right click > Inspect, and switch to the Console tab

You can use any website you like; I'll be using the serpapi.com website.

Find all images using XPath

//img, wrapped in double quotes and the $x function -> $x("//img")

Find emails with XPath

Let's find emails on a page.

We're using an a (link) tag whose href attribute contains mailto as the sign of an email, for example $x("//a[contains(@href, 'mailto')]"). You can look at each of the selected elements using its array index; in this case, we only got 1 result, so we're using [0] to retrieve the first one.

Find paragraphs that contain specific text

We can use the contains function to search for a keyword, for example $x("//p[contains(., 'keyword')]").

In the first parameter, we use a dot so that the search covers all the text of the current node (each paragraph, in this case). The second parameter is the keyword we're looking for.

FAQs around XPath

  1. How can I use XPath to select elements based on text content?
    To select elements based on their text content, you can use the text() function in combination with the contains() function.

    For example, the XPath expression //*[contains(text(), 'important')] selects all elements that contain the word "important" in their text.

  2. How can I use XPath to select siblings of a specific element?
    XPath provides axes to fetch siblings of an element. To select all following siblings of an element, you can use the following-sibling axis.

    For example, //*[@id='intro']/following-sibling::p would select all paragraph elements that follow an element with the id 'intro'. To select preceding siblings, use the preceding-sibling axis, such as //div[@id='footer']/preceding-sibling::* to select all elements that precede a div with the id 'footer'.

  3. How do you select attributes with XPath?
    Attributes can be selected by using the @ symbol followed by the attribute name.

    For instance, to select the href attribute of all anchor tags in a document, you would use the XPath //a/@href. This is useful for extracting specific attribute values from elements.

  4. Can XPath be used to select elements that do not contain specific text?
    Yes, XPath allows you to select elements that do not contain specific text using the not() function along with contains().

    For example, //*[not(contains(text(), 'exclude'))] would select all elements that do not contain the text "exclude".

  5. How can I use XPath to select a specific element when there are multiple similar elements?
    You can refine your selection using predicates, including position or specific attribute values. For example, if you want to select the second element from a list, you could use (//ul/li)[2].

    Alternatively, if you need to select an element based on a unique attribute, you could use something like //input[@type='submit' and @name='Search'] for an input element specifically with the type 'submit' and name 'Search'.

Reference:
w3 - XML Path language