Environment
- Python 3.7
- Scrapy 1.6
- macOS Mojave 10.14.4
Using Request.meta
```python
request = scrapy.Request("http://hogehoge.com/", self.parse_items)
request.meta['item'] = item
```
Store the item in the meta field of the Request object like this,
```python
item = response.meta['item']
```
and then, in the next callback, read out and use the item that was passed along via the Response object's meta.
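By the way, the meta dict can also be passed straight to the Request constructor instead of being assigned afterwards; the effect is the same (the URL is the same placeholder as above):

```python
request = scrapy.Request("http://hogehoge.com/", self.parse_items, meta={'item': item})
```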
The following is quoted from the official documentation:
meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied.
A dict that contains arbitrary metadata for this request. This dict is empty for new Requests, and is usually populated by different Scrapy components (extensions, middlewares, etc). So the data contained in this dict depends on the extensions you have enabled.
See Request.meta special keys for a list of special meta keys recognized by Scrapy.
This dict is shallow copied when the request is cloned using the copy() or replace() methods, and can also be accessed, in your spider, from the response.meta attribute.
Requests and Responses — Scrapy 1.6.0 documentation
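To see what the "shallow copied" part means in practice, here is a minimal sketch (the URLs are just placeholders): cloning a Request with replace() gives the clone its own meta dict, but the objects stored inside it are still shared.

```python
import scrapy

original = scrapy.Request("http://hogehoge.com/", meta={'item': {'name': 'John Doe'}})
clone = original.replace(url="http://hogehoge.com/page2")

print(clone.meta is original.meta)                  # False: the dict itself was copied
print(clone.meta['item'] is original.meta['item'])  # True: the nested object is shared
```

So if you mutate a mutable object stored in meta after cloning, both requests will see the change.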
Note that the keys listed below are special meta keys: they are recognized and acted on by Scrapy's own components (downloader middlewares, extensions, and so on). As the quote above says, meta itself holds arbitrary metadata, so you can still store keys of your own, such as item above, alongside them.
The special keys recognized by Scrapy 1.6 are:
- dont_redirect
- dont_retry
- handle_httpstatus_list
- handle_httpstatus_all
- dont_merge_cookies
- cookiejar
- dont_cache
- redirect_urls
- bindaddress
- dont_obey_robotstxt
- download_timeout
- download_maxsize
- download_latency
- download_fail_on_dataloss
- proxy
- ftp_user (See FTP_USER for more info)
- ftp_password (See FTP_PASSWORD for more info)
- referrer_policy
- max_retry_times
You can more or less guess what they do from the names, but for details see the official documentation below!
(If this post gets a lot of traffic, I'll write them up properly later.)
Request.meta special keys — Scrapy 1.6.0 documentation
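As a quick illustration (not taken from the docs page itself; the URL and proxy address are placeholders), these special keys are simply set in a Request's meta and the corresponding middlewares pick them up:

```python
import scrapy

request = scrapy.Request(
    "http://hogehoge.com/",
    meta={
        'dont_redirect': True,             # RedirectMiddleware: do not follow redirects
        'download_timeout': 10,            # per-request download timeout in seconds
        'max_retry_times': 1,              # RetryMiddleware: override the retry count for this request
        'proxy': 'http://127.0.0.1:8080',  # HttpProxyMiddleware: send this request through a proxy
    },
)
```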
Usage example
```python
import scrapy
from example_spider.items import ExampleSpiderItem


class ExampleSpider(scrapy.Spider):
    name = 'example_spider'
    allowed_domains = ['<your_domains>']
    start_urls = ['<your_urls>']

    def parse(self, response):
        item = ExampleSpiderItem()
        item['name'] = 'John Doe'

        request = scrapy.Request("http://hogehoge.com/", self.parse_items)
        request.meta['item'] = item  # Store the item in the Request's meta. meta is just a dict, so assign to any key you like.
        yield request

    def parse_items(self, response):
        item = response.meta['item']  # Retrieve the item from the Response object's meta.
        item['url'] = response.url
        item['title'] = response.css('title').xpath('string()').extract_first()
        return item
```
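Assuming a standard project layout generated by scrapy startproject (the example_spider.items module above is a placeholder), running `scrapy crawl example_spider` yields one item per followed page, carrying name, url, and title.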