The website scraped for these tests:
![Screenshot](https://img-blog.csdnimg.cn/20201202123215615.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTk5ODM3MQ==,size_16,color_FFFFFF,t_70)
![Screenshot](https://img-blog.csdnimg.cn/20201202123241110.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTk5ODM3MQ==,size_16,color_FFFFFF,t_70)
In [11]: print(response.xpath('//h3/a/@title'))
# Type: scrapy.selector.unified.SelectorList, i.e. a list of Selector objects
Out[11]:
# Line breaks below were added by hand for readability
[<Selector xpath='//h3/a/@title' data='A Light in the Attic'>,
<Selector xpath='//h3/a/@title' data='Tipping the Velvet'>,
<Selector xpath='//h3/a/@title' data='Soumission'>,
<Selector xpath='//h3/a/@title' data='Sharp Objects'>,
<Selector xpath='//h3/a/@title' data='Sapiens: A Brief History of Humankind'>,
<Selector xpath='//h3/a/@title' data='The Requiem Red'>,
<Selector xpath='//h3/a/@title' data='The Dirty Little Secrets of Getting Y...'>,
<Selector xpath='//h3/a/@title' data='The Coming Woman: A Novel Based on th...'>,
<Selector xpath='//h3/a/@title' data='The Boys in the Boat: Nine Americans ...'>,]
In [9]: print(response.xpath('//h3/a/@title').extract())
# Type: list
Out[9]:
# Line breaks below were added by hand for readability
['A Light in the Attic',
'Tipping the Velvet',
'Soumission',
'Sharp Objects',
'Sapiens: A Brief History of Humankind',
'The Requiem Red',
'The Dirty Little Secrets of Getting Your Dream Job',
'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull',
'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics', ]
In [7]: print(response.xpath('//h3/a/@title').extract_first())
# Type: str
Out[7]:
A Light in the Attic
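| When the XPath matches many nodes | type() | Description |
| --- | --- | --- |
| `xpath(...)` used directly | SelectorList | A list of Selector objects |
| `xpath(...).extract()` | list | Collects all matched data into a list of strings |
| `xpath(...).extract_first()` | str | Returns only the first matched item as a string (`None` if nothing matches) |
| `xpath(...).get()` | str | Same as `extract_first()`: returns only the first matched item as a string |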
Further notes from another blog post:
https://www.ucloud.cn/yun/43396.html