I am new to Python and Scrapy, and I have not used callback functions before, but the code below relies on one. The first request is executed, and its response is sent to the callback function given as the second argument:
def parse_page1(self, response):
    item = MyItem()
    item['main_url'] = response.url
    request = Request("http://www.example.com/some_page.html",
                      callback=self.parse_page2)
    request.meta['item'] = item
    return request

def parse_page2(self, response):
    item = response.meta['item']
    item['other_url'] = response.url
    return item
I am unable to understand the following things:
- How is the `item` populated?
- Does the `request.meta` line execute before the `response.meta` line in `parse_page2`?
- Where does the `item` returned from `parse_page2` go?
- Why is the `return request` statement needed in `parse_page1`? I thought the extracted items need to be returned from here.
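
To make the questions concrete, here is a toy sketch of how I picture the flow, written in plain Python with no Scrapy involved. Everything in it is an assumption for illustration: `FakeRequest`, `FakeResponse`, `run` and the start URL `http://www.example.com/main.html` are hypothetical stand-ins, not Scrapy's real classes or API, and the real engine is asynchronous and certainly more involved. The callbacks are plain functions here rather than spider methods, and a dict stands in for `MyItem`.

```python
# Toy stand-ins for illustration only; these are NOT Scrapy's real classes.
class FakeRequest:
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback
        self.meta = {}


class FakeResponse:
    def __init__(self, request):
        self.url = request.url
        self.meta = request.meta  # the same dict the request carried


def parse_page1(response):
    item = {"main_url": response.url}  # a plain dict stands in for MyItem
    request = FakeRequest("http://www.example.com/some_page.html",
                          callback=parse_page2)
    request.meta["item"] = item  # runs now, before page 2 is ever fetched
    return request  # handed back to the loop in run(), which follows it


def parse_page2(response):
    item = response.meta["item"]  # the same dict that parse_page1 stored
    item["other_url"] = response.url
    return item  # handed back to the loop in run(), which collects it


def run(start_url):
    """Toy engine loop: follow returned requests, collect returned items."""
    collected = []
    pending = [FakeRequest(start_url, callback=parse_page1)]
    while pending:
        request = pending.pop(0)
        response = FakeResponse(request)  # pretend the page was downloaded
        result = request.callback(response)
        if isinstance(result, FakeRequest):
            pending.append(result)  # a new request to follow later
        elif result is not None:
            collected.append(result)  # a finished item
    return collected


# Hypothetical start URL, chosen only for this sketch.
items = run("http://www.example.com/main.html")
print(items)
```

If this picture is roughly right, `parse_page1` returns the request only so that something outside the spider can fetch the second page and call `parse_page2`, and the item is finished only after both callbacks have run. I would like to know whether that matches what Scrapy actually does.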