Chapter3-CSS Locators, Chaining, and Responses

The document discusses CSS locators and how they can be used for web scraping in Python. Some key points include:
- CSS locators can be used as an alternative to XPath locators to select elements on a webpage. Certain characters in XPath are replaced with different syntax in CSS.
- CSS locators allow selecting elements by attributes like class and id. The document provides examples of finding elements by class with a period and by id with a pound sign.
- In addition to selecting elements, CSS can be used to extract attribute values and text from elements. The document demonstrates extracting an href attribute value and text content.
- When scraping multiple pages, the Response object in Python keeps track of

Uploaded by

Komi David ABOTSITSE


CSS Locators

WEB SCRAPING IN PYTHON

Thomas Laetsch
Data Scientist, NYU
Rosetta CSStone
/ replaced by > (except the first character)
XPath: /html/body/div

CSS Locator: html > body > div

// replaced by a blank space (except the first character)

XPath: //div/span//p

CSS Locator: div > span p

[N] replaced by :nth-of-type(N)

XPath: //div/p[2]

CSS Locator: div > p:nth-of-type(2)

Rosetta CSStone
XPATH

xpath = '/html/body//div/p[2]'

CSS

css = 'html > body div > p:nth-of-type(2)'

Attributes in CSS
To find an element by class, use a period .
Example: p.class-1 selects all paragraph elements belonging to class-1

To find an element by id, use a pound sign #

Example: div#uid selects the div element with id equal to uid

Attributes in CSS
Select paragraph elements of class class1 that are children of the div with id uid:

css_locator = 'div#uid > p.class1'

Select all elements whose class attribute includes class1:

css_locator = '.class1'

Class Status
css = '.class1'

Class Status
xpath = '//*[@class="class1"]'

Class Status
xpath = '//*[contains(@class,"class1")]'

Selectors with CSS
from scrapy import Selector

html = '''
<html>
<body>
<div class="hello datacamp">
<p>Hello World!</p>
</div>
<p>Enjoy DataCamp!</p>
</body>
</html>
'''
sel = Selector( text = html )

>>> sel.css("div > p")

out: [<Selector xpath='...' data='<p>Hello World!</p>'>]

>>> sel.css("div > p").extract()

out: [ '<p>Hello World!</p>' ]

C(SS) You Soon!
Attribute and Text Selection

You Must have Guts to use your Colon
Using XPath: <xpath-to-element>/@attr-name

xpath = '//div[@id="uid"]/a/@href'

Using CSS Locator: <css-to-element>::attr(attr-name)

css_locator = 'div#uid > a::attr(href)'

Text Extraction
<p id="p-example">
Hello world!
Try <a href="http://www.datacamp.com">DataCamp</a> today!
</p>

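In XPath, use text(). After a single slash it returns only the element's own text; after a double slash it also returns the text of its descendants.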

sel.xpath('//p[@id="p-example"]/text()').extract()
# result: ['\n Hello world!\n Try ', ' today!\n']

sel.xpath('//p[@id="p-example"]//text()').extract()
# result: ['\n Hello world!\n Try ', 'DataCamp', ' today!\n']

Text Extraction
<p id="p-example">
Hello world!
Try <a href="http://www.datacamp.com">DataCamp</a> today!
</p>

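For a CSS Locator, use ::text. Attached directly to the element it returns only that element's own text; with a space before it, it also returns the text of its descendants.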

sel.css('p#p-example::text').extract()
# result: ['\n Hello world!\n Try ', ' today!\n']

sel.css('p#p-example ::text').extract()
# result: ['\n Hello world!\n Try ', 'DataCamp', ' today!\n']

Scoping the Colon
Getting Ready to Crawl

Let's Respond
Selector vs Response:

The Response has all the tools we learned with Selectors:

xpath and css methods followed by extract and extract_first methods.

The Response also keeps track of the url where the HTML code was loaded from.

The Response helps us move from one site to another, so that we can "crawl" the web while scraping.

What We Know!
xpath method works like a Selector

response.xpath( '//div/span[@class="bio"]' )

css method works like a Selector

response.css( 'div > span.bio' )

Chaining works like a Selector

response.xpath('//div').css('span.bio')

Data extraction works like a Selector

response.xpath('//div').css('span.bio').extract()
response.xpath('//div').css('span.bio').extract_first()

What We Don't Know
The response keeps track of the URL in its url attribute.

response.url
>>> 'http://www.DataCamp.com/courses/all'

The response lets us "follow" a new link with the follow() method

# next_url is the string path of the next url we want to scrape

response.follow( next_url )

We'll learn more about follow later.

In Response
Scraping For Reals

DataCamp Site
https://www.datacamp.com/courses/all

What's the Div, Yo?
# response loaded with HTML from https://www.datacamp.com/courses/all

course_divs = response.css('div.course-block')

print( len(course_divs) )
>>> 185

Inspecting course-block
first_div = course_divs[0]
children = first_div.xpath('./*')
print( len(children) )
>>> 3

The first child
first_div = course_divs[0]
children = first_div.xpath('./*')

first_child = children[0]
print( first_child.extract() )
>>> <a class=... />

The second child
first_div = course_divs[0]
children = first_div.xpath('./*')

second_child = children[1]
print( second_child.extract() )
>>> <div class=... />

The forgotten child
first_div = course_divs[0]
children = first_div.xpath('./*')

third_child = children[2]
print( third_child.extract() )
>>> <span class=... />

Listful
In one CSS Locator

links = response.css('div.course-block > a::attr(href)').extract()

Stepwise

# step 1: course blocks
course_divs = response.css('div.course-block')
# step 2: hyperlink elements
hrefs = course_divs.xpath('./a/@href')
# step 3: extract the links
links = hrefs.extract()

Get Schooled
for l in links:
    print( l )

>>> /courses/free-introduction-to-r
>>> /courses/data-table-data-manipulation-r-tutorial
>>> /courses/dplyr-data-manipulation-r-tutorial
>>> /courses/ggvis-data-visualization-r-tutorial
>>> /courses/reporting-with-r-markdown
>>> /courses/intermediate-r
...

Links Achieved