CSS Selectors & XPath Explained
<!DOCTYPE html>
<html lang="en">
<head>
<title>XPath and CSS Selectors</title>
</head>
<body>
<h1>CSS Selectors simplified</h1>
<div class="intro">
<p>
I'm a paragraph within a div with a class set to
intro
<span id="location">I'm a span with ID set to
location and I'm within a paragraph</span>
</p>
<p id="outside">I'm a paragraph with ID set to
outside and I'm within a div with a class set to intro</p>
</div>
<p>Hi, I'm placed immediately after a div with a class
set to intro
</p>
<span class='intro'>Span with a class attribute set to
intro
</span>
<ul id="items">
<li data-identifier="7">Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi, I'm bold</p>
</body>
</html>
Ahmed Rafik – Modern Web Scraping With Python Using Scrapy, Splash & Selenium (Udemy)
2nd edition
BASICS
An element is a tag in the HTML markup.
Example:
The ‘p’ tag, aka paragraph, is called an element.
To select any element from an HTML web page we simply refer to it by its
tag name.
Example:
To select all p elements we can use the following CSS selector:
p
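Python's standard library has no CSS selector engine, so the minimal sketch below uses xml.etree.ElementTree on a trimmed copy of the sample page (assumed to be well-formed XML) just to show which elements the type selector p matches. In Scrapy you would pass the selector string directly, e.g. response.css('p').

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample page; ElementTree needs well-formed XML.
PAGE = """<html><body>
<div class="intro">
  <p>I'm a paragraph within a div <span id="location">I'm a span</span></p>
  <p id="outside">I'm a paragraph with ID set to outside</p>
</div>
<p>Hi, I'm placed immediately after a div with a class set to intro</p>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi, I'm bold</p>
</body></html>"""

root = ET.fromstring(PAGE)

# The CSS type selector "p" matches every <p> element at any depth,
# which is what iter("p") walks over, in document order.
paragraphs = list(root.iter("p"))
print(len(paragraphs))  # 5
```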
CLASS & ID
So to select any element by its class attribute value we use the
following syntax:
.className
If we want to target an element by its id attribute value we use the
following syntax:
#id
Example:
Let’s say we want to select the “p” elements that are inside the “div”
with a class attribute equal to “intro”; in this case we use the
following CSS selector:
.intro p
If we want to select the “p” element with “id” equal to “outside” we
can use the following CSS selector:
#outside
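Under the same assumption (a trimmed, well-formed copy of the sample page and no CSS engine in the standard library), this sketch spells out in plain Python what .intro p and #outside match. The helper has_class is hypothetical; a real CSS engine, such as the one behind Scrapy's response.css(), does this matching for you.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div class="intro">
  <p>I'm a paragraph within a div</p>
  <p id="outside">I'm a paragraph with ID set to outside</p>
</div>
<p>Hi, I'm placed immediately after a div</p>
<span class='intro'>Span with a class attribute set to intro</span>
</body></html>"""

root = ET.fromstring(PAGE)

def has_class(el, name):
    # ".name" matches when "name" is one of the space-separated class tokens.
    return name in el.get("class", "").split()

# ".intro p": <p> elements that are descendants of any element with class "intro".
intro_ps = []
for el in root.iter():
    if has_class(el, "intro"):
        for p in el.iter("p"):
            if p is not el and p not in intro_ps:
                intro_ps.append(p)

# "#outside": the element whose id attribute equals "outside".
outside = [el for el in root.iter() if el.get("id") == "outside"]

print(len(intro_ps), outside[0].tag)  # 2 p
```

Notice that the paragraph placed after the div is not matched, and the span with class intro contributes nothing because it has no p inside it.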
REMEMBER:
Value lookup
Let’s say you want to select all the “a” elements whose “href”
attribute value starts with “https” and not “http”; in this case we can
use the following CSS selector:
a[href^='https']
OR:
[href^='https']
To search for text at the beginning of the value we use the caret sign “^”.
Now if you want to search for a value at the end we use the “$” sign.
For example, to select the “a” elements whose “href” attribute
value ends with “fr” and not “com” we use the following CSS
selector:
a[href$='fr']
OR:
[href$='fr']
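Finally, if we want to search for a particular value anywhere in between,
we use the asterisk “*”:
elementName[attribute*='fr']
The tilde “~=” does something different: it matches only when the value is
one whole word of a space-separated list, as in p[class~='italic'].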
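To make the difference between the operators concrete, here is a plain-Python sketch of the four attribute operators applied to the two links and the two bold paragraphs of the sample page. The functions attr_matches and select are hypothetical helpers written for this illustration, not a library API; they follow the matching rules of CSS attribute selectors.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi, I'm bold</p>
</body></html>"""

root = ET.fromstring(PAGE)

def attr_matches(actual, op, value):
    """Plain-Python version of the CSS attribute operators ^=, $=, *= and ~=."""
    if actual is None:  # element has no such attribute
        return False
    if op == "^=":  # value at the beginning
        return value != "" and actual.startswith(value)
    if op == "$=":  # value at the end
        return value != "" and actual.endswith(value)
    if op == "*=":  # value anywhere (substring)
        return value != "" and value in actual
    if op == "~=":  # value is one whole word of a space-separated list
        return value in actual.split()
    raise ValueError(f"unsupported operator: {op}")

def select(tag, attr, op, value):
    """Texts of the <tag> elements matched by tag[attr<op>'value']."""
    return [el.text for el in root.iter(tag) if attr_matches(el.get(attr), op, value)]

print(select("a", "href", "^=", "https"))     # ['Google']
print(select("a", "href", "$=", "fr"))        # ['Google France']
print(select("a", "href", "*=", "google"))    # ['Google', 'Google France']
print(select("a", "href", "~=", "fr"))        # []
print(select("p", "class", "~=", "italic"))   # ['Hi, I have two classes']
```

The ~= query on the href returns nothing, which is why the asterisk, not the tilde, is the operator for a value in between.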
HTML web page
<!DOCTYPE html>
<html lang="en">
<head>
<title>XPath and CSS Selectors</title>
</head>
<body>
<h1>XPath expressions simplified</h1>
<div class="intro">
<p>
I'm a paragraph within a div with a class set to
intro
<span id="location">I'm a span with ID set to
location and I'm within a paragraph</span>
</p>
<p id="outside">I'm a paragraph with ID set to
outside and I'm within a div with a class set to intro</p>
</div>
<p>Hi, I'm placed immediately after a div with a class
set to intro
</p>
<span class='intro'>Span with a class attribute set to
intro
</span>
<ul id="items">
<li data-identifier="7">Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>
<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi, I'm bold</p>
</body>
</html>
BASICS
An element is a tag in the HTML markup.
Example:
The ‘p’ tag, aka paragraph, is called an element.
To select any element from an HTML web page we simply use the
following syntax:
//elementName
Example:
To select all p elements we can use the following XPath expression:
//p
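A minimal sketch with Python's xml.etree.ElementTree, which implements only a subset of XPath: paths evaluated on an element must be relative, so //p is written .//p. The trimmed page is assumed to be well-formed XML; in Scrapy the expression can be passed as-is to response.xpath('//p').

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div class="intro">
  <p>I'm a paragraph within a div</p>
  <p id="outside">I'm a paragraph with ID set to outside</p>
</div>
<p>Hi, I'm placed immediately after a div</p>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi, I'm bold</p>
</body></html>"""

root = ET.fromstring(PAGE)

# ".//p" is ElementTree's spelling of "//p": every <p> below the root.
paragraphs = root.findall(".//p")

# ElementTree refuses absolute paths such as "//p" on an element.
try:
    root.findall("//p")
    absolute_ok = True
except SyntaxError:
    absolute_ok = False

print(len(paragraphs), absolute_ok)  # 5 False
```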
CLASS & ID
So to select any element by its class attribute value we use the
following syntax:
//elementName[@attributeName='value']
Example:
Let’s say we want to select the “p” elements that are inside the “div”
with a class attribute equal to “intro”; in this case we use the
following XPath expression:
//div[@class='intro']/p
If we want to select the “p” element with “id” equal to “outside” we
can use the following XPath expression:
//p[@id='outside']
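ElementTree's XPath subset handles simple attribute predicates, so the two expressions above can be checked directly on a trimmed copy of the sample page (with the leading dot ElementTree requires). The last query shows why no /p step should follow the id predicate: it would ask for p children of that paragraph, and there are none.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div class="intro">
  <p>I'm a paragraph within a div</p>
  <p id="outside">I'm a paragraph with ID set to outside</p>
</div>
<p>Hi, I'm placed immediately after a div</p>
<span class='intro'>Span with a class attribute set to intro</span>
</body></html>"""

root = ET.fromstring(PAGE)

# //div[@class='intro']/p : <p> children of the div whose class is exactly "intro".
intro_ps = root.findall(".//div[@class='intro']/p")

# //p[@id='outside'] : the <p> whose id is "outside".
outside = root.findall(".//p[@id='outside']")

# //p[@id='outside']/p : <p> children of that <p>, which do not exist.
nested = root.findall(".//p[@id='outside']/p")

print(len(intro_ps), len(outside), len(nested))  # 2 1 0
```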
REMEMBER: the same syntax works with any attribute, not only class and id:
//li[@data-identifier="7"]
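A quick check with ElementTree (again with the leading dot, on a trimmed copy of the list from the sample page) that attribute predicates are not limited to class and id:

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<ul id="items">
  <li data-identifier="7">Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
  <li>Item 4</li>
</ul>
</body></html>"""

root = ET.fromstring(PAGE)

# //li[@data-identifier="7"] : custom data-* attributes work like any other attribute.
items = root.findall('.//li[@data-identifier="7"]')
print([li.text for li in items])  # ['Item 1']
```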
Sometimes the element we want to select has two classes. For
example, to select the “p” element whose class attribute equals
“bold italic”, we use the following XPath expression:
//p[@class='bold italic']
OR:
Since the element has two classes, we can instead search for a
substring within the class attribute value by using the contains
function:
//p[contains(@class, 'italic')]
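ElementTree can evaluate the exact @class comparison but does not implement XPath functions such as contains(), so this sketch emulates contains() in plain Python with a hypothetical helper, xpath_contains. It shows that the exact comparison depends on the order of the classes, while contains() is a plain substring test; that also means searching for “bold” would match a class such as “bolder”.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi, I'm bold</p>
</body></html>"""

root = ET.fromstring(PAGE)

# //p[@class='bold italic'] : exact comparison of the whole attribute value.
exact = [p.text for p in root.findall(".//p[@class='bold italic']")]
# Same classes in another order: no match, because the strings differ.
reordered = [p.text for p in root.findall(".//p[@class='italic bold']")]

def xpath_contains(value, needle):
    # Plain-Python stand-in for XPath 1.0 contains(); a missing attribute acts like "".
    return needle in (value or "")

# //p[contains(@class, 'italic')]
italic = [p.text for p in root.iter("p") if xpath_contains(p.get("class"), "italic")]
# //p[contains(@class, 'bold')] : a substring test, so both paragraphs match.
bold = [p.text for p in root.iter("p") if xpath_contains(p.get("class"), "bold")]

print(exact, reordered, italic, bold)
```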
REMEMBER:
Value lookup
Let’s say you want to select all the “a” elements whose “href”
attribute value starts with “https” and not “http”; in this case we can
use the following XPath expression:
//a[starts-with(@href, 'https')]
To search for text at the beginning we use the “starts-with”
function, which takes the same arguments as the contains function.
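The same plain-Python emulation works for starts-with(), since ElementTree does not support XPath functions. xpath_starts_with is a hypothetical helper mirroring the XPath 1.0 function; like XPath, it treats a missing attribute as an empty string.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
</body></html>"""

root = ET.fromstring(PAGE)

def xpath_starts_with(value, prefix):
    # Stand-in for XPath 1.0 starts-with(); a missing attribute acts like "".
    return (value or "").startswith(prefix)

# //a[starts-with(@href, 'https')]
secure = [a.text for a in root.iter("a") if xpath_starts_with(a.get("href"), "https")]

# Testing @class instead of @href finds nothing: these <a> elements have no class.
by_class = [a.text for a in root.iter("a") if xpath_starts_with(a.get("class"), "https")]

print(secure, by_class)  # ['Google'] []
```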
Now, to search for a value at the end, XPath 2.0 offers the “ends-with”
function; however, this function is not supported in XPath version 1.0,
which is the version implemented by the majority of browsers and by
lxml.
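A common XPath 1.0 workaround for ends-with() compares the tail of the value, cut out with substring() and string-length(), against the suffix. The sketch below builds that expression and checks its arithmetic in plain Python by mimicking the 1-based substring() of XPath 1.0; ends_with_xpath, xpath_substring and emulated_ends_with are hypothetical helpers, and the suffix is assumed to contain no single quote. With lxml installed, the generated string could be passed to an element's xpath() method.

```python
def ends_with_xpath(tag, attr, suffix):
    """Build an XPath 1.0 expression emulating //tag[ends-with(@attr, 'suffix')].

    Assumes suffix contains no single quote (no escaping is done).
    """
    return (f"//{tag}[substring(@{attr}, string-length(@{attr}) - "
            f"string-length('{suffix}') + 1) = '{suffix}']")

def xpath_substring(s, start):
    """XPath 1.0 substring(s, start) for an integer start: chars at 1-based positions >= start."""
    return "".join(ch for pos, ch in enumerate(s, start=1) if pos >= start)

def emulated_ends_with(value, suffix):
    # Same arithmetic as the generated expression.
    return xpath_substring(value, len(value) - len(suffix) + 1) == suffix

expr = ends_with_xpath("a", "href", "fr")
print(expr)
for href in ["https://www.google.com", "http://www.google.fr"]:
    print(href, emulated_ends_with(href, "fr"))
```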
Finally, if we want to search for a particular value in between, we use
the contains function as explained before.
If you want to get the text of a particular element you can use
text(). For example, to get the text of the “p” element with id equal
to “outside” we use the following XPath expression:
//p[@id="outside"]/text()
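text() selects only the element's own text nodes, not the text of nested elements. In ElementTree those nodes are the element's .text plus the .tail of each child, so the hypothetical helper text_nodes below approximates what /text() returns (ElementTree itself does not accept text() in paths). In Scrapy you would write response.xpath('//p[@id="outside"]/text()').get().

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body><div class="intro">
<p>I'm a paragraph within a div <span id="location">I'm a span</span> and more text</p>
<p id="outside">I'm a paragraph with ID set to outside</p>
</div></body></html>"""

root = ET.fromstring(PAGE)

def text_nodes(el):
    """Approximates el/text(): the element's own text nodes, skipping descendants' text."""
    nodes = [el.text] + [child.tail for child in el]
    return [t for t in nodes if t]  # XPath text nodes are never empty

# //p[@id="outside"]/text()
outside = root.find(".//p[@id='outside']")
print(text_nodes(outside))

# The first <p> holds a <span>: text() skips the span's own text.
first_p = root.find(".//div/p")
print(text_nodes(first_p))
```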