I have a data.frame, df, that consists of 7 years of daily values.
# Daily dates from 2001-05-01 to 2008-04-30 (2557 days), repeated for 3 sites
date <- rep(seq(as.Date("2001-05-01"), as.Date("2008-04-30"), by = "day"), 3)
site <- rep(c("Site_1", "Site_2", "Site_3"), each = 2557)
# Random values drawn from a different range for each site
value <- c(as.numeric(sample(90:271, 2557, replace = TRUE)),
           as.numeric(sample(125:340, 2557, replace = TRUE)),
           as.numeric(sample(70:173, 2557, replace = TRUE)))
df <- data.frame(date, site, value)
In this case, each year starts in May and ends in April.
I want to get the mean and sd of value for each year at each of the 3 sites.
I did the following:
library(dplyr)

df1 <- df %>%
  # Label each row with the May-to-April year it falls in
  dplyr::mutate(year = ifelse(date < "2002-05-01", "2001-2002",
                       ifelse(date < "2003-05-01", "2002-2003",
                       ifelse(date < "2004-05-01", "2003-2004",
                       ifelse(date < "2005-05-01", "2004-2005",
                       ifelse(date < "2006-05-01", "2005-2006",
                       ifelse(date < "2007-05-01", "2006-2007",
                       ifelse(date < "2008-05-01", "2007-2008", NA)))))))) %>%
  dplyr::select(site, year, value) %>%
  dplyr::group_by(site, year) %>%
  # Mean and sd of value per site and year
  dplyr::summarise(mean = mean(value), sd = sd(value))
It gave me what I wanted. However, this is time-consuming if I have 30-50 years of data. Also, if each new data.frame has a different start month, I need to modify the ifelse() calls each time to assign the year ID before I can group by year and do other calculations.
Is there a straightforward way to assign a year ID when the start month is any month other than January?
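To make the goal concrete, here is a rough sketch of the kind of helper I have in mind. The function name year_id and its start_month argument are made up for illustration (they are not from any package), and I am assuming the label should always look like "2001-2002" unless the year starts in January:

```r
# Hypothetical helper: label each date with the "start-end" year it falls in,
# for a year that begins on the 1st of `start_month` (5 = May).
year_id <- function(date, start_month = 5) {
  lt  <- as.POSIXlt(date)
  yr  <- lt$year + 1900L   # calendar year
  mon <- lt$mon + 1L       # calendar month, 1-12
  # Dates before the start month belong to the year that began the previous
  # calendar year
  start <- ifelse(mon >= start_month, yr, yr - 1L)
  if (start_month == 1) {
    as.character(start)    # an ordinary calendar year needs no range label
  } else {
    paste(start, start + 1L, sep = "-")
  }
}

dates <- as.Date(c("2001-05-01", "2002-04-30", "2002-05-01", "2008-04-30"))
year_id(dates)
```

If something like this is reasonable, the nested ifelse() calls could presumably be replaced with dplyr::mutate(year = year_id(date, 5)), and only the start_month value would change from one data.frame to the next.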