Css selector & Xpath expla

The document provides an overview of CSS selectors and XPath expressions for selecting HTML elements. It explains how to target elements using class and ID attributes, as well as foreign attributes, and includes examples of various selector syntaxes. Additionally, it discusses value lookup techniques for selecting elements based on attribute values and the use of functions like 'contains' and 'starts-with'.

HTML web page

<!DOCTYPE html>
<html lang="en">
<head>
<title>XPath and CSS Selectors</title>
</head>
<body>
<h1>CSS Selectors simplified</h1>
<div class="intro">
<p>
I'm paragraph within a div with a class set to
intro
<span id="location">I'm a span with ID set to
location and i'm within a paragraph</span>
</p>
<p id="outside">I'm a paragraph with ID set to
outside and i'm within a div with a class set to intro</p>
</div>
<p>Hi i'm placed immediately after a div with a class
set to intro
</p>
<span class='intro'>Div with a class attribute set to
intro
</span>
<ul id="items">
<li data-identifier="7">Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>

<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi i'm bold</p>
</body>
</html>
Ahmed Rafik – Modern Web Scraping With Python Using Scrapy, Splash & Selenium (Udemy)
2nd edition
BASICS
An element is a tag in the HTML markup.
Example:
The “p” tag, also known as a paragraph, is an element.
To select an element from an HTML page, we simply refer to it by its
tag name.
Example:
To select all “p” elements, we can use the following CSS selector:
p
This works, but it matches every “p” element on the page, so it is
rarely precise enough. For example, if we only want the “p” elements
inside the div whose class attribute equals “intro”, the bare tag name
selects too much. This is why we prefer to target elements by their
class attribute, id or position, which limits the scope of the CSS
selector.

CLASS & ID
To select an element by its class attribute value, we use the
following syntax:
.className
To select an element by its id attribute value, we use the following
syntax:
#id

Example:
To select the “p” elements that are inside the “div” with a class
attribute equal to “intro”, we use the following CSS selector:
.intro p
This matches “p” elements inside any element with the class “intro”.
To require that the container is a div, write div.intro p instead.
To select the “p” element with an “id” equal to “outside”, we can use
the following CSS selector:
#outside
REMEMBER:

The same class attribute value can be assigned to more than one
element, whereas an id should be assigned to one and only one
element on the page.

Sometimes we also want to select elements based on a custom
attribute that is not one of the usual HTML attributes such as
class or id. For example, to select the “li” element whose
“data-identifier” attribute equals 7, we use the following CSS
selector:
li[data-identifier="7"]
Sometimes the element we want to select has two classes. For
example, to select the “p” element whose class attribute contains
both “bold” and “italic”, we can use the following CSS selector:
.bold.italic
OR:
p[class='bold italic']
OR:
p.bold.italic
All three select the same element on this page, so you can pick the
syntax you find easiest to remember. Keep in mind that the attribute
form compares the whole class string, so it only matches when the
classes are written in exactly that order.
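The difference between the class syntax and the attribute syntax can be sketched in plain Python. This is only an emulation of the matching rules, not a real CSS engine, and the helper names matches_classes and matches_exact_attr are made up for illustration. The value “italic bold” is a hypothetical reordered class list that does not appear on the sample page.

```python
def matches_classes(class_attr: str, *wanted: str) -> bool:
    """Emulate `.bold.italic`: every wanted class must be one of the
    whitespace-separated tokens of the class attribute."""
    tokens = class_attr.split()
    return all(name in tokens for name in wanted)


def matches_exact_attr(class_attr: str, value: str) -> bool:
    """Emulate `[class='bold italic']`: the whole attribute string must match."""
    return class_attr == value


# The two class values from the sample page, plus a hypothetical reordered one.
for class_attr in ("bold italic", "bold", "italic bold"):
    print(
        repr(class_attr),
        matches_classes(class_attr, "bold", "italic"),
        matches_exact_attr(class_attr, "bold italic"),
    )
```

In this sketch the reordered value still passes the class check but fails the exact comparison, which is why the dotted form is usually the more robust choice.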

Value lookup
Let’s say you want to select all the “a” elements whose “href”
attribute value starts with “https” and not “http”. In this case we
can use the following CSS selector:
a[href^='https']
OR, to match any element with such an “href”:
[href^='https']
So to search for text at the beginning of a value we use the caret
sign “^”.
To search for text at the end of a value we use the “$” sign. For
example, to select the “a” elements whose “href” attribute value ends
with “fr” and not “com”, we use the following CSS selector:
a[href$='fr']
OR:
[href$='fr']
Finally, to search for text anywhere inside a value we use the
asterisk “*”:

elementName[attribute*='fr']

The tilde “~” is different: [attribute~='fr'] only matches when “fr”
is one of the whitespace-separated words in the value, as in a class
list.
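As a rough guide to what each attribute operator tests, here is a plain-Python emulation. It is a sketch and not a CSS engine: the function name attr_matches is hypothetical, and it only covers the four operators “^=”, “$=”, “*=” and “~=”.

```python
def attr_matches(value: str, op: str, needle: str) -> bool:
    """Emulate a CSS attribute operator against one attribute value."""
    if needle == "":
        # In CSS, an empty search string never matches for these operators.
        return False
    if op == "^=":  # value starts with needle
        return value.startswith(needle)
    if op == "$=":  # value ends with needle
        return value.endswith(needle)
    if op == "*=":  # needle appears anywhere in value
        return needle in value
    if op == "~=":  # needle is one of the whitespace-separated words
        return needle in value.split()
    raise ValueError(f"unsupported operator: {op}")


# The two href values from the sample page.
hrefs = ["https://www.google.com", "http://www.google.fr"]

print([h for h in hrefs if attr_matches(h, "^=", "https")])
print([h for h in hrefs if attr_matches(h, "$=", "fr")])
print([h for h in hrefs if attr_matches(h, "*=", "google")])
print([h for h in hrefs if attr_matches(h, "~=", "google")])
```

The last line prints an empty list, because a URL contains no whitespace and is therefore a single word that is not equal to “google”.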

HTML web page
<!DOCTYPE html>
<html lang="en">
<head>
<title>XPath and CSS Selectors</title>
</head>
<body>
<h1>XPath expressions simplified</h1>
<div class="intro">
<p>
I'm paragraph within a div with a class set to
intro
<span id="location">I'm a span with ID set to
location and i'm within a paragraph</span>
</p>
<p id="outside">I'm a paragraph with ID set to
outside and i'm within a div with a class set to intro</p>
</div>
<p>Hi i'm placed immediately after a div with a class
set to intro
</p>
<span class='intro'>Div with a class attribute set to
intro
</span>
<ul id="items">
<li data-identifier="7">Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>

<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi i'm bold</p>
</body>
</html>
BASICS
An element is a tag in the HTML markup.
Example:
The “p” tag, also known as a paragraph, is an element.
To select an element from an HTML page, we use the following
syntax:
//elementName
Example:
To select all “p” elements, we can use the following XPath expression:

//p

This works, but it matches every “p” element on the page, so it is
rarely precise enough. For example, if we only want the “p” elements
inside the div whose class attribute equals “intro”, the bare tag name
selects too much. This is why we prefer to target elements by their
class attribute, id or position, which limits the scope of the XPath
expression.
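To see how much a bare tag name selects, here is a minimal sketch using Python's standard-library xml.etree.ElementTree instead of Scrapy's own selectors. It rests on two assumptions: ElementTree only supports a subset of XPath and writes the document-wide search as “.//p” rather than “//p”, and it parses XML, so PAGE is a trimmed, well-formed copy of the sample page.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample page (assumption: the real page
# would need an HTML parser, ElementTree only reads XML).
PAGE = """
<html>
  <body>
    <div class="intro">
      <p>First paragraph <span id="location">span</span></p>
      <p id="outside">Second paragraph</p>
    </div>
    <p>Paragraph after the div</p>
    <p class="bold italic">Hi, I have two classes</p>
    <p class="bold">Hi i'm bold</p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Every "p" element in the document, wherever it sits.
all_paragraphs = root.findall(".//p")

# Only the "p" elements that are direct children of the div with class "intro".
scoped = root.findall(".//div[@class='intro']/p")

print(len(all_paragraphs))  # 5
print(len(scoped))          # 2
```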

CLASS & ID
To select an element by the value of one of its attributes, such as
class or id, we use the following syntax:
//elementName[@attributeName='value']
Example:
To select the “p” elements that are directly inside the “div” with a
class attribute equal to “intro”, we use the following XPath
expression:
//div[@class='intro']/p
To select the “p” element with an “id” equal to “outside”, we can use
the following XPath expression:
//p[@id='outside']
REMEMBER:

The same class attribute value can be assigned to more than one
element, whereas an id should be assigned to one and only one
element on the page.

Sometimes we also want to select elements based on a custom
attribute that is not one of the usual HTML attributes such as
class or id. For example, to select the “li” element whose
“data-identifier” attribute equals 7, we use the following XPath
expression:

//li[@data-identifier="7"]
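The attribute predicates above can be tried out with Python's standard-library xml.etree.ElementTree. This is a sketch under stated assumptions, not the course's Scrapy setup: ElementTree supports only a subset of XPath, needs “.//” where the handout writes “//”, and PAGE is a shortened, well-formed excerpt of the sample page.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt of the sample page (assumption).
PAGE = """
<body>
  <div class="intro">
    <p>Inside the div</p>
    <p id="outside">Also inside the div</p>
  </div>
  <ul id="items">
    <li data-identifier="7">Item 1</li>
    <li>Item 2</li>
  </ul>
</body>
"""

root = ET.fromstring(PAGE)

# Predicate on the class attribute, then a step down to the child "p" elements.
by_class = root.findall(".//div[@class='intro']/p")

# Predicate on the id attribute.
by_id = root.findall(".//p[@id='outside']")

# Predicate on a custom data-* attribute.
by_custom = root.findall(".//li[@data-identifier='7']")

print([p.text for p in by_class])
print(by_id[0].text)
print(by_custom[0].text)
```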

Sometimes the element we want to select has two classes. For
example, to select the “p” element whose class attribute equals
“bold italic”, we use the following XPath expression:
//p[@class='bold italic']
OR:
Although the element has two classes, we can search for a substring
within the class attribute value by using the contains function:
//p[contains(@class, 'italic')]
REMEMBER:

The contains function takes two arguments:

• The first one is where to search: the class attribute value, the
  id or anything else.
• The second one is the value you’re looking for.
• The value you search for is case sensitive, so be careful!
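Because contains is a plain substring test, it can match more than intended. The sketch below emulates it in Python, next to the stricter XPath 1.0 idiom contains(concat(' ', normalize-space(@class), ' '), ' italic '), which only matches whole class names. The helper names and the class values “Italic” and “italics” are hypothetical and do not appear on the sample page.

```python
def xpath_contains(haystack: str, needle: str) -> bool:
    """Emulate XPath contains(): a case-sensitive substring test."""
    return needle in haystack


def has_class_token(class_attr: str, name: str) -> bool:
    """Emulate contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    # Roughly what normalize-space() does: trim and collapse whitespace.
    normalized = " ".join(class_attr.split())
    return f" {name} " in f" {normalized} "


for class_attr in ("bold italic", "bold", "Italic", "italics"):
    print(
        repr(class_attr),
        xpath_contains(class_attr, "italic"),
        has_class_token(class_attr, "italic"),
    )
```

In this emulation “Italic” fails both checks because the comparison is case sensitive, while “italics” passes the substring test but not the whole-name test.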

Value lookup
Let’s say you want to select all the “a” elements whose “href”
attribute value starts with “https” and not “http”. In this case we
can use the following XPath expression:
//a[starts-with(@href, 'https')]
So to search for text at the beginning of a value we use the
starts-with function, which takes the same arguments as the contains
function.
To search for text at the end of a value there is the ends-with
function. However, it is not supported in XPath version 1.0, which is
the version used by the majority of browsers and by lxml.
Finally, to search for text anywhere inside a value we use the
contains function as explained before.
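In XPath 1.0, an “ends with” test is commonly emulated with the substring and string-length functions, for example //a[substring(@href, string-length(@href) - string-length('fr') + 1) = 'fr']. This workaround is not part of the handout. The sketch below only reproduces its arithmetic in Python so the 1-based positions are easier to follow, and the function names are made up for illustration.

```python
def xpath_substring(s: str, start: int) -> str:
    """Emulate the two-argument XPath substring(): positions are 1-based,
    and a start before the first character returns the whole string."""
    return s[max(start, 1) - 1:]


def ends_with_xpath1(value: str, suffix: str) -> bool:
    """Emulate substring(v, string-length(v) - string-length(s) + 1) = s."""
    start = len(value) - len(suffix) + 1
    return xpath_substring(value, start) == suffix


# The two href values from the sample page.
for href in ("https://www.google.com", "http://www.google.fr"):
    print(href, ends_with_xpath1(href, "fr"))
```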
To get the text of a particular element you can use the text
function. For example, to get the text of the “p” element with an id
equal to “outside”, we use the following XPath expression:
//p[@id="outside"]/text()
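Python's standard-library xml.etree.ElementTree does not support the text() step, so the sketch below reads the .text attribute of the matched element instead, which plays a similar role. PAGE is a shortened, well-formed excerpt of the sample page, used here as an assumption because ElementTree parses XML and not arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt of the sample page (assumption).
PAGE = """
<div class="intro">
  <p>I'm paragraph within a div <span id="location">I'm a span</span></p>
  <p id="outside">I'm a paragraph with ID set to outside</p>
</div>
"""

root = ET.fromstring(PAGE)

# Text of the "p" element with id "outside".
outside = root.find(".//p[@id='outside']")
print(outside.text)

# For an element with children, .text only holds the text before the first
# child, while itertext() also walks into the nested span.
first = root.find("./p")
print(repr(first.text))
print("".join(first.itertext()))
```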
