
I've found a few similar questions, but I am new to R and can't figure out how they apply to my specific problem. Here is my code:

library(rvest)
library(plyr)
library(stringr)

# function takes a letter and extracts the bold text from that letter's page
fetch_current_players<-function(letter){
  url<-paste0("http://www.baseball-reference.com/players/", letter, "/")
  urlHTML<-read_html(url)
  playerData<-html_nodes(urlHTML, "b a")
  player<-html_text(playerData)
  player
}

#list of letters to pass into function
atoz<-c("a","b","c","d","e","f","g","h",
        "i","j","k","l","m","n","o","p","q","r",
        "s","t","u","v","w","x","y","z")
player_list<-ldply(atoz, fetch_current_players, .progress="text")

This code uses the website's URL structure: it passes each letter from A through Z into my function, which should return the names that appear in bold on that letter's page. The function appears to work when I call it manually with a single letter, so I think the problem is that each call returns a list of players of a different length, and that is what produces the error.
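The suspicion about differing lengths can be checked without touching the website. This is only a minimal sketch: `fake_fetch()` is a hypothetical stand-in for `fetch_current_players()` that returns a vector whose length depends on the letter. As far as I can tell, `ldply()` tries to bind atomic results into a data frame with one row per result, so ragged vectors make it stop, while `unlist(lapply(...))` simply concatenates them:

```r
library(plyr)

# Hypothetical stand-in for fetch_current_players(): no network access,
# returns 1 name for "a", 2 for "b", 3 for "c".
fake_fetch <- function(letter) {
  n <- match(letter, letters)
  paste0(letter, seq_len(n))
}

# ldply() expects equal-length results; capture the error message
# instead of halting the script.
ldply_result <- tryCatch(
  ldply(c("a", "b", "c"), fake_fetch),
  error = function(e) conditionMessage(e)
)
print(ldply_result)

# Base R alternative: collect a list, then flatten it into one character vector.
flat <- unlist(lapply(c("a", "b", "c"), fake_fetch))
print(flat)
```

If `ldply_result` comes back as an error message rather than a data frame, the unequal lengths are the cause, and any approach that flattens a list instead of building a data frame should avoid it.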

Any help is appreciated, thanks!

Spacedman
pjlaffey

1 Answer


Here's a slightly modified version using some newer "tidyverse" packages:

library(rvest) 
library(purrr) # flatten/map/safely
library(dplyr) # progress_estimated() progress bar (deprecated since dplyr 1.0.0)

# just in case there isn't a valid page
safe_read <- safely(read_html)

fetch_current_players <- function(letter){

  URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter)
  pg <- safe_read(URL)

  if (is.null(pg$result)) return(NULL)

  player_data <- html_nodes(pg$result, "b a")

  html_text(player_data)

}

pb <- progress_estimated(length(letters))
player_list <- flatten_chr(map(letters, function(x) {
  pb$tick()$print()
  fetch_current_players(x)
}))
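
The same pattern can be tried offline first. This is only a sketch under stated assumptions: `fake_read()` is a hypothetical stand-in for `read_html()` that fails for one letter, and `safely()` is given `otherwise = character(0)` so a failed page contributes an empty vector and no `NULL` check is needed. Note that `flatten_chr()` is deprecated in newer purrr releases, so it may emit a warning there:

```r
library(purrr)

# Hypothetical reader standing in for read_html(): errors for "q",
# otherwise returns two made-up names for the letter.
fake_read <- function(letter) {
  if (letter == "q") stop("no such page")
  paste0(letter, c("_one", "_two"))
}

# On error, safely() stores `otherwise` in $result instead of stopping.
safe_fake_read <- safely(fake_read, otherwise = character(0))

fetch_names <- function(letter) {
  safe_fake_read(letter)$result
}

# map() returns a list of character vectors of varying length;
# flatten_chr() joins them into a single character vector.
names_out <- flatten_chr(map(c("p", "q", "r"), fetch_names))
print(names_out)
```

The failing letter is skipped silently here; inspecting `safe_fake_read("q")$error` shows what went wrong for that page.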
hrbrmstr