Some sample R code to consider:

df = data.frame(x=letters[1:4], y=letters[5:8])

find.key <- function(x, li, default=NA) {
  ret <- rep.int(default, length(x))
  for (key in names(li)) {
    ret[x %in% li[[key]]] <- key
  }
  return(ret)
}

x2 = list("Alpha" = "a", 
          "Beta"  = "b", 
          "Other" = c("c","d"))

y2 = list("Epi"    = "e", 
          "OtherY" = c("f", "g", "h"))

# This is the code in question, imagine many variables and calls to find.key()
df$NewX2 = find.key(df$x, x2)
df$Newy2 = find.key(df$y, y2)

# df
#   x y NewX2  Newy2
# 1 a e Alpha    Epi
# 2 b f  Beta OtherY
# 3 c g Other OtherY
# 4 d h Other OtherY

So the gist of this is that I would like to add new variables (NewX2, Newy2) based on lookup tables (associative arrays/lists) via the find.key function.

Is there some way to keep my code DRY? Specifically here:

df$NewX2 = find.key(df$x, x2)
df$Newy2 = find.key(df$y, y2)

I'm not sure whether sapply or lapply could help. Or perhaps something like %=% as seen here?

I'd like to do something like this (hopefully this makes sense):

c(df$NewX2, df$Newy2) = find.key(c(df$x, df$y), c(x2, y2))
JasonAizkalns
  • How about `df$New <- mapply(find.key, list(df$x, df$y), list(x2, y2))`? – shadow Sep 16 '14 at 13:51
  • Also, when recoding, it's easier to have the levels of your `x2/y2` tables flipped. For example: `x2 <- c("a"="Alpha", "b"="Beta", "c"="Other", "d"="Other"); df$newx <-x2[as.character(df$x)]` Then you don't need the find.keys function. – MrFlick Sep 16 '14 at 14:41
  • @MrFlick I understand what you're saying, but imagine a larger lookup table where I might be repeating "other" many times. This is why the list format may be preferable. – JasonAizkalns Sep 16 '14 at 16:00
  • If memory usage is your concern, that makes sense. However doing the actual transformation via indexing as I've written it should be much faster. But that would require testing to confirm. But whatever is easier for you to maintain is better. – MrFlick Sep 16 '14 at 16:15
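
The two positions in the comments above can be reconciled: keep the compact list as the table you maintain, and derive the flipped named vector from it just before indexing. The following is only a sketch under that assumption; `flip` and `x2.flipped` are hypothetical names, not something from the thread, and no timing has been done here.

```r
# Lookup table kept in the compact list form from the question
x2 <- list("Alpha" = "a", "Beta" = "b", "Other" = c("c", "d"))

# Hypothetical helper: values become names, each key is repeated once per value
flip <- function(li) {
  setNames(rep(names(li), lengths(li)), unlist(li, use.names = FALSE))
}

x2.flipped <- flip(x2)

df <- data.frame(x = letters[1:4], stringsAsFactors = FALSE)

# Recode by indexing; values missing from the table come back as NA
df$NewX2 <- unname(x2.flipped[as.character(df$x)])
```

This way "Other" is still written only once in the source table, while the recoding itself is done by plain indexing as MrFlick suggests.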

1 Answer

Use [ extraction on the left-hand side of the assignment rather than $ extraction, since [ can assign several columns at once from a list:

df[,c('NewX2','NewY2')] <- mapply(find.key, 
                                  list(df$x, df$y), 
                                  list(x2, y2), 
                                  SIMPLIFY=FALSE)
# df
#   x y NewX2  NewY2
# 1 a e Alpha    Epi
# 2 b f  Beta OtherY
# 3 c g Other OtherY
# 4 d h Other OtherY

Or, if you don't like writing mapply, you can use Vectorize, which creates an mapply-based wrapper for you and gives the same result:

find.keys <- Vectorize(find.key, c("x","li"), SIMPLIFY=FALSE)
df[,c('NewX2','NewY2')] <- find.keys(list(df$x, df$y), list(x2, y2))
df
#   x y NewX2  NewY2
# 1 a e Alpha    Epi
# 2 b f  Beta OtherY
# 3 c g Other OtherY
# 4 d h Other OtherY
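
With many variables, the parallel lists passed to mapply can drift out of sync. One possible variation, offered only as a sketch, is to describe each new column once in a named list and loop over that with lapply. The `spec` structure and its `col`/`table` fields are hypothetical names introduced for this illustration; `find.key`, `x2`, and `y2` are repeated from the question so the block runs on its own.

```r
df <- data.frame(x = letters[1:4], y = letters[5:8], stringsAsFactors = FALSE)

# find.key as defined in the question
find.key <- function(x, li, default = NA) {
  ret <- rep.int(default, length(x))
  for (key in names(li)) {
    ret[x %in% li[[key]]] <- key
  }
  ret
}

x2 <- list("Alpha" = "a", "Beta" = "b", "Other" = c("c", "d"))
y2 <- list("Epi" = "e", "OtherY" = c("f", "g", "h"))

# Hypothetical spec: new column name -> source column and lookup table
spec <- list(
  NewX2 = list(col = "x", table = x2),
  NewY2 = list(col = "y", table = y2)
)

# One assignment creates every new column; the list names become column names
df[names(spec)] <- lapply(spec, function(s) find.key(df[[s$col]], s$table))
```

Adding another recoded variable then means adding one entry to `spec` rather than editing three separate argument lists.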
Thomas
  • People don't use `Vectorize` enough, I don't think. I love it. I've created vectorized versions of virtually all non-primitives. +1 – Rich Scriven Sep 16 '14 at 17:28