Chapter 3: CSS Locators, Chaining, and Responses
Thomas Laetsch
Data Scientist, NYU
Rosetta CSStone
Replace / by > (except the first character)
XPath: /html/body/div
CSS Locator: html > body > div
xpath = '/html/body//div/p[2]'
CSS: a period . selects by class; this selects all elements whose class attribute includes class1
css_locator = '.class1'
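from scrapy import Selector

html = '''
<html>
<body>
<div class="hello datacamp">
<p>Hello World!</p>
</div>
<p>Enjoy DataCamp!</p>
</body>
</html>
'''
# Wrap the HTML string in a Selector so it can be queried with .css() and .xpath()
sel = Selector(text=html)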
You Must have Guts to use your Colon
Using XPath: <xpath-to-element>/@attr-name
xpath = '//div[@id="uid"]/a/@href'
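# Here sel is assumed to wrap HTML with a <p id="p-example"> reading
# "Hello world! Try DataCamp today!", where "DataCamp" sits inside a nested element
# /text(): only text nodes that are direct children of <p>, so "DataCamp" is skipped
sel.xpath('//p[@id="p-example"]/text()').extract()
# result: ['\n Hello world!\n Try ', ' today!\n']
# //text(): all text nodes under <p>, including the nested element's text
sel.xpath('//p[@id="p-example"]//text()').extract()
# result: ['\n Hello world!\n Try ', 'DataCamp', ' today!\n']
# CSS: ::text attached directly to the locator behaves like /text()
sel.css('p#p-example::text').extract()
# result: ['\n Hello world!\n Try ', ' today!\n']
# CSS: a blank space before ::text behaves like //text()
sel.css('p#p-example ::text').extract()
# result: ['\n Hello world!\n Try ', 'DataCamp', ' today!\n']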
Let's Respond
Selector vs Response:
The Response has the same xpath and css methods as a Selector, followed by extract and extract_first.
The Response also keeps track of the URL the HTML code was loaded from.
The Response helps us move from one site to another, so that we can "crawl" the web while scraping.
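# XPath alone
response.xpath('//div/span[@class="bio"]')
# Chaining: XPath finds the divs, then CSS searches inside each one
response.xpath('//div').css('span.bio')
# extract(): all matches as a list of strings
response.xpath('//div').css('span.bio').extract()
# extract_first(): only the first match as a string
response.xpath('//div').css('span.bio').extract_first()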
response.url
>>> 'http://www.DataCamp.com/courses/all'
The response lets us "follow" a new link with the follow() method
DataCamp Site
https://www.datacamp.com/courses/all
course_divs = response.css('div.course-block')
print( len(course_divs) )
>>> 185
first_div = course_divs[0]
children = first_div.xpath('./*')  # element children of the first course block
first_child = children[0]
print( first_child.extract() )
>>> <a class=... />
second_child = children[1]
print( second_child.extract() )
>>> <div class=... />
third_child = children[2]
print( third_child.extract() )
>>> <span class=... />
Stepwise
>>> /courses/free-introduction-to-r
>>> /courses/data-table-data-manipulation-r-tutorial
>>> /courses/dplyr-data-manipulation-r-tutorial
>>> /courses/ggvis-data-visualization-r-tutorial
>>> /courses/reporting-with-r-markdown
>>> /courses/intermediate-r
...