5

I was just wondering if there's any reasonable way to pass authentication cookies from a webdriver.Firefox() instance to the spider itself? It would be helpful to perform some webdriver work and then go about scraping "business as usual". Something to the effect of:

def __init__(self):
    BaseSpider.__init__(self)
    self.selenium = webdriver.Firefox()

def __del__(self):
    self.selenium.quit()
    print self.verificationErrors

def parse(self, response):

    # Initialize the webdriver, get login page
    sel = self.selenium
    sel.get(response.url)
    sleep(3)

    ##### Transfer (sel) cookies to (self) and crawl normally??? #####
    ...
    ...
dru
  • Should be possible, I have the same issue but working with PHP curl and Selenium. The bigger hassle is converting the cookie(s) returned by Selenium into a format usable by the other tool (Scrapy). In the case of curl, it doesn't use the same format as Selenium, so you can't simply pass the cookie over and use it directly. – David Jun 14 '12 at 23:28
  • To get the cookies from the webdriver, I believe it would be driver.get_cookies(); store that in a variable, convert the format if needed, then pass it as input to the other tool (see the sketch after these comments). – David Jun 14 '12 at 23:32
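As a rough illustration of David's comment (the login URL and flow below are hypothetical, not from the question): Selenium's get_cookies() returns a list of cookie dicts, and a plain {name: value} mapping is usually the easiest form to hand to another tool.

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://example.com/login')  # hypothetical login page
# ... perform the login through the browser here ...

# Selenium returns a list of dicts, e.g.
# [{'name': 'sessionid', 'value': 'abc123', 'domain': 'example.com', ...}, ...]
selenium_cookies = driver.get_cookies()

# Reduce it to a plain {name: value} mapping for the consuming tool
cookie_dict = dict((c['name'], c['value']) for c in selenium_cookies)
driver.quit()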

4 Answers

2

Transfer Cookies from Selenium to Scrapy Spider

Selenium script

import json

from selenium import webdriver

driver = webdriver.Firefox()
data = driver.get_cookies()

# Write the cookies to a temporary file
with open('cookie.json', 'w') as outputfile:
    json.dump(data, outputfile)
driver.close()

....

Spider

import json
import os

# Inside the spider; st_size > 2 skips a file holding only an empty JSON list
if os.stat("cookie.json").st_size > 2:
    with open('cookie.json', 'r') as inputfile:
        self.cookie = json.load(inputfile)
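The answer stops at loading the cookies; a minimal sketch of actually sending them with the spider's requests (the spider name and URL here are hypothetical, not part of the original answer) would be to attach them in start_requests via the cookies argument of Request:

import json
import os

from scrapy.http import Request
from scrapy.spider import BaseSpider  # scrapy.Spider in newer versions


class CookieSpider(BaseSpider):  # hypothetical name
    name = 'cookie_spider'
    start_urls = ['https://example.com/']  # hypothetical target

    def start_requests(self):
        selenium_cookies = []
        # st_size > 2 skips a file that only holds an empty JSON list
        if os.stat("cookie.json").st_size > 2:
            with open('cookie.json', 'r') as inputfile:
                selenium_cookies = json.load(inputfile)
        # Reduce Selenium's cookie dicts to a plain {name: value} mapping
        cookies = dict((c['name'], c['value']) for c in selenium_cookies)
        for url in self.start_urls:
            yield Request(url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        pass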
George
0

This works with the Chrome driver (tested OK), but not with Firefox.
Refer to https://christopher.su/2015/selenium-chromedriver-ubuntu/ for installation.

import os
import pickle

import scrapy
from scrapy.spiders.init import InitSpider
from scrapy.http import Request
from selenium import webdriver
from selenium.webdriver.common.keys import Keys


class HybridSpider(InitSpider):
    name = 'hybrid'

    def init_request(self):
        # Log in through the browser first
        driver = webdriver.Chrome()
        driver.get('https://example.com')
        driver.find_element_by_id('js-login').click()
        driver.find_element_by_id('email').send_keys('mymail@example.net')
        driver.find_element_by_id('password').send_keys('mypassword', Keys.ENTER)

        # Dump the browser cookies to disk, then read them back
        pickle.dump(driver.get_cookies(), open(os.getenv("HOME") + "/my_cookies", "wb"))
        cookies = pickle.load(open(os.getenv("HOME") + "/my_cookies", "rb"))

        # Request each URL with the Selenium cookies attached
        with open(os.getenv("HOME") + "/my_urls", 'r') as url_file:
            for url in url_file.readlines():
                yield Request(url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        pass

Haven't tried directly passing the cookies, as in:

yield Request(url, cookies=driver.get_cookies(), callback=self.parse)

but it might work too.

Hemanth Gowda
0
driver = webdriver.Chrome()

Then perform the login or interact with the page through the browser. Now, when making the request in Scrapy, set the cookies parameter:

request = Request(URL, cookies=driver.get_cookies(), callback=self.mycallback)
Deskom88
0

You can try overriding the BaseSpider.start_requests method to attach the needed cookies to the starting requests, using scrapy.http.cookies.CookieJar.

See also: Scrapy - how to manage cookies/sessions
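A rough sketch of that suggestion (spider name and URLs are hypothetical): do the browser login up front, then yield the starting requests with the Selenium cookies attached; Scrapy's cookies middleware keeps them in its own CookieJar for the rest of the crawl.

from scrapy.http import Request
from scrapy.spider import BaseSpider  # scrapy.Spider in newer versions
from selenium import webdriver


class SeleniumLoginSpider(BaseSpider):  # hypothetical name
    name = 'selenium_login'
    start_urls = ['https://example.com/private/']  # hypothetical target

    def start_requests(self):
        driver = webdriver.Firefox()
        driver.get('https://example.com/login')  # hypothetical login page
        # ... perform the login through the browser here ...
        cookies = driver.get_cookies()
        driver.quit()
        for url in self.start_urls:
            # Selenium's list of cookie dicts can usually be passed as-is;
            # reduce it to {name: value} if extra keys cause trouble
            yield Request(url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        pass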

warvariuc