python-scrapy深度爬取

2021-01-15 03:14

阅读：1016

标签：project elf pipe dia home request ret app into

爬取电影网站

movie.py

import scrapy
from MyProjectDianying.items import MyprojectdianyingItemclass MovieSpider(scrapy.Spider):
    name = ‘movie‘
    # allowed_domains = [‘www.xxx.com‘]
    start_urls = [‘https://www.1905.com/vod/list/n_1_t_1/o3.html?fr=vodhome_js_lx‘]
    url = ‘https://www.1905.com/vod/list/n_1_t_1/o3p%d.html‘
    page = 2
    def parse(self, response):
        divs = response.xpath(‘//*[@id="content"]/section[4]/div‘)
        for div in divs:
            href = div.xpath(‘./a/@href‘)[0].extract()
            title = div.xpath(‘./a/@title‘)[0].extract()
            item = MyprojectdianyingItem()
            item["href"] = href
            item["title"] = title
            print(title)
            yield scrapy.Request(href, callback=self.parse_href, meta={‘item‘: item})
            if self.page                 url = format(self.url % self.page)
                yield scrapy.Request(url,callback=self.parse)
                self.page += 1
    def parse_href(self,response):
        detail = response.xpath(‘//*[@id="playerBoxIntroCon"]/text()‘)[0].extract()
        item = response.meta[‘item‘]
        item["detail"] = detail
        yield item

items.py

import scrapyclass MyprojectdianyingItem(scrapy.Item):
    # define the fields for your item here like:
    href = scrapy.Field()
    title = scrapy.Field()
    detail = scrapy.Field()

settings.py

USER_AGENT = ‘Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36‘
ROBOTSTXT_OBEY = False
LOG_LEVEL = ‘ERROR‘

ITEM_PIPELINES = {
   ‘MyProjectDianying.pipelines.MyprojectdianyingPipeline‘: 300,
}

pipelines.py

class MyprojectdianyingPipeline:
    fp = None
    def open_spider(self,spider):
        self.fp = open(‘dianying.txt‘, mode=‘w‘, encoding=‘utf-8‘)    def process_item(self, item, spider):
        href = item["href"]
        title = item["title"]
        detail = item["detail"]
        self.fp.write(title+href+detail+‘\n‘)
        return item
    def close_spider(self,spider):
        self.fp.close()

python-scrapy深度爬取

标签：project elf pipe dia home request ret app into

原文地址：https://www.cnblogs.com/shiyi525/p/14274049.html

上一篇：C#将一个字符串数组的元素的顺序进行反转

下一篇：python-scrapy-中间件的学习

文章来自：搜素材网的编程语言模块，转载请注明文章出处。
文章标题：python-scrapy深度爬取
文章链接：http://soscw.com/index.php/essay/42073.html

亲，登录后才可以留言！

python-scrapy深度爬取

评论

热门文章

推荐文章

最新文章

置顶文章