I am extracting data for specific geographical areas and indicators from the UK's public health agency, using fingertipsR (a package they developed for pulling data from their API), and inserting the results into a nested list: each top-level element is a geography, and each of those contains one element per indicator.
geog <- c("E38000220", "E38000046", "E38000144", "E38000191", "E38000210",
"E38000038", "E38000164", "E38000195", "E38000078", "E38000139",
"E38000166", "E38000211", "E38000147", "E38000183", "E38000028",
"E38000053", "E38000126", "E38000153", "E38000173", "E38000175"
)
indicators <- c(241, 92588, 90672, 90692, 90697, 90698, 90701, 90702, 91238,
90690, 90694, 93245, 93246, 93244, 93247, 93248, 93049, 93047,
90700)
## install.packages("fingertipsR"); library(fingertipsR)
library(dplyr)
results <- list()  # renamed from `list` to avoid shadowing base::list()
start <- Sys.time()
for (geog_group in geog) {
  for (indicator_number in indicators) {
    results[[geog_group]][[as.character(indicator_number)]] <-
      fingertips_data(IndicatorID = indicator_number, AreaTypeID = c(152, 153, 154)) %>%
      filter(AreaCode == geog_group) %>%
      filter(TimeperiodSortable == max(TimeperiodSortable)) %>%
      select(Timeperiod, Value) %>%
      distinct()
  }
}
end <- Sys.time()
end - start
On my work laptop this takes around 15 minutes to execute. Are there any easy ways to optimise this code, possibly with lapply or purrr?
Edit: ideally I want the indicators for each geographical area to end up in one data frame, since they all share the same columns (Timeperiod and Value). I was going to deal with that after the loop with unlist() or something similar, but if anyone has a way to solve that inside the loop I'm open to suggestions.
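One direction I've been considering (a sketch only, not tested against the live API; the IndicatorID and AreaTypeID arguments are the same ones used above): since fingertips_data() returns rows for every area of the requested type, each indicator only needs to be downloaded once rather than once per geography, which would cut the number of API calls from 20 × 19 to 19. The per-area latest period can then be taken with a grouped filter, and bind_rows() gives the single data frame mentioned in the edit:

library(dplyr)
library(purrr)
# library(fingertipsR)

# One API call per indicator; filter to the areas of interest afterwards.
results <- map(indicators, function(ind) {
  fingertips_data(IndicatorID = ind, AreaTypeID = c(152, 153, 154)) %>%
    filter(AreaCode %in% geog) %>%
    group_by(AreaCode) %>%
    # latest period computed per area, not across the whole download
    filter(TimeperiodSortable == max(TimeperiodSortable)) %>%
    ungroup() %>%
    select(AreaCode, Timeperiod, Value) %>%
    distinct() %>%
    mutate(IndicatorID = ind)  # keep track of which indicator each row came from
})

combined <- bind_rows(results)

Note this is not exactly equivalent to the loop above: the original filter() computed max(TimeperiodSortable) over the entire downloaded data frame (both conditions in a single filter() are evaluated against the full, ungrouped data), so an area whose latest period lagged the overall maximum would silently return zero rows. The grouped version keeps each area's own latest period.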