
I have been trying everything I can find online to log in and set cookies and certificates, but I can't seem to get past the redirect to a login screen.

Here is what I am trying to do:

##################################################
library("RCurl")
library("XML")

loginURL <- "http://games.espn.go.com/ffl/signin"
dataURL <- "http://games.espn.go.com/ffl/clubhouse?leagueId=123456&teamId=8&seasonId=2014"


# ESPN Fantasy Football Login Screen
userID <- dQuote("myUsername")
pword <- dQuote("myPassword")
pushbutton <- dQuote("OK")

# concatenate the url and log in options
FFLsigninURL <- paste(loginURL,
    "&username=", userID,
    "&password=", pword,
    "&submit=", pushbutton)

page <- getURL(FFLsigninURL, verbose = TRUE)

This just leads me to a redirect to the login screen - so Problem 1: the login isn't working.

Part 2 - once logged in, how can I proceed to the dataURL to scrape the tables? I tried the login parameters on the data page as well, but I still get redirected to a login screen.

I'm sure I am missing something simple - just not seeing it...

chris
  • Note - leagueId=123456 is bogus - you need a real league ID and must be a member of that league. If you are in one, you can see the ID in the browser's address line once logged in via a browser; replace 123456 with that real ID number. – chris Sep 07 '14 at 18:23
  • This is very hard to help with given that username/passwords are required, so we can't test different solutions ourselves. You'll need to be very diligent about inspecting the HTTP requests to see how logging in and redirecting is done if you want to re-implement via a scraper. (You might also want to make sure this use is within the Terms of Service for the particular website.) I bet you'd be OK if you just make sure to keep track of cookies for log in. Try the template from this answer: http://stackoverflow.com/a/15124055/2372064 – MrFlick Sep 07 '14 at 20:29
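For reference, MrFlick's cookie-tracking suggestion might be sketched with RCurl roughly like this. Note the form-field names (`username`, `password`, `submit`) and POSTing to the sign-in URL are assumptions; inspect the site's actual login form to confirm them:

```r
library(RCurl)
library(XML)

# A shared curl handle so cookies set by the login POST carry over to
# later requests. cookiefile = "" enables in-memory cookie tracking.
curl <- getCurlHandle(cookiefile = "", followlocation = TRUE)

loginURL <- "http://games.espn.go.com/ffl/signin"
dataURL  <- "http://games.espn.go.com/ffl/clubhouse?leagueId=123456&teamId=8&seasonId=2014"

# POST the credentials. The field names here are assumed from the form
# and may differ - check the page source of the login screen.
postForm(loginURL,
         username = "myUsername",
         password = "myPassword",
         submit   = "OK",
         curl = curl,
         style = "POST")

# Reuse the same handle: the session cookies from the login are sent
# automatically with this request.
page <- getURL(dataURL, curl = curl)
tables <- readHTMLTable(page)
```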

1 Answer


It should be possible to follow the location headers etc. using RCurl; alternatively, you can use Selenium and drive a browser:

library(RSelenium)

loginURL <- "http://games.espn.go.com/ffl/signin"
user <- 'myUser'
pass <- 'myPass'

# Download a selenium server if needed, start it, and open a browser
RSelenium::checkForServer()
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()

# Fill in the login form and submit it
remDr$navigate(loginURL)
webElem <- remDr$findElement('name', 'username')
webElem$sendKeysToElement(list(user))
webElem <- remDr$findElement('name', 'password')
webElem$sendKeysToElement(list(pass))
remDr$findElement('name', 'submit')$clickElement()

# The browser session is now logged in, so navigate to the data page
dataURL <- "http://games.espn.go.com/ffl/clubhouse?leagueId=123456&teamId=8&seasonId=2014"
remDr$navigate(dataURL)

# You can get the page source, for example
pageSrc <- remDr$getPageSource()[[1]]
# now operate on pageSrc using, for example, library(XML):
# readHTMLTable(pageSrc)

remDr$close()
remDr$closeServer()
jdharrison
  • Thanks for the fast response! I tried this and am getting a warning when running RSelenium::startServer() - here is the warning message: Warning message: running command 'java -jar ".../R/win-library/3.1/RSelenium/bin/selenium-server-standalone.jar" -log ".../R/win-library/3.1/RSelenium/bin/sellog.txt"' had status 127 > remDr$open() [1] "Connecting to remote server" Error in function (type, msg, asError = TRUE) : couldn't connect to host > – chris Sep 07 '14 at 19:14
  • Have you run `RSelenium::checkForServer()` and waited for it to download a selenium server? Alternatively, download the latest selenium server standalone from `http://selenium-release.storage.googleapis.com/index.html?path=2.42/` and place it in your path. – jdharrison Sep 07 '14 at 19:20
  • Yes - it installed. I also re-installed java just now and that made a difference. I still get the warning 127, but it seems to be connecting. Still not able to grab the data table yet - but I think I am on the right path. Thanks! – chris Sep 07 '14 at 19:30
  • Ok - seem to be getting somewhere. On the dataURL there is a table with id "playertable_0" - so I do this: webElem <- remDr$findElement('id', 'playertable_0'), which seems to work. I then try readHTMLTable to scrape the data but it doesn't work - because it is outside the remote browser... What is the generic equivalent for this in RSelenium? Sorry but I am new to this... – chris Sep 07 '14 at 19:46
  • Get the page source of the page and operate on it as you normally would with `XML`. I have edited to give example. – jdharrison Sep 07 '14 at 19:51
  • Thanks again. And to make sure I had it I saved all and rebooted. I still get the warning with "status 127". But worse - now when I run remDr$open() I get... [1] "Connecting to remote server" Error in function (type, msg, asError = TRUE) : couldn't connect to host > and I'm stuck again. It worked one time, after which I cleaned up with remDr$close() and remDr$closeServer(). Since that I have been unable to run it successfully again. – chris Sep 07 '14 at 20:07
  • A selenium server needs to be running. `RSelenium::startServer` will start one. `remDr$closeServer()` will close the server. If the server has been closed you will need to start it again. The code I supplied opens the server carries out the steps then closes the server once finished. – jdharrison Sep 07 '14 at 20:11
  • @jdharrison So is your hypothesis that more than just cookie tracking is needed to log in a user? Is that why you recommend `selenium`? I understand how selenium can be required if there are a lot of AJAX calls to build the data on a page, but if it's just to get around logging in, that seems to be a bit overkill, no? – MrFlick Sep 07 '14 at 20:31
  • @MrFlick my hypothesis is that it takes about 2 minutes to write code to do it using selenium; either way, it depends on what the overall aim is. – jdharrison Sep 07 '14 at 20:47