Computationally Inexpensive Imputation Techniques in R

Question

I have a large data frame (155257 x 21, to be specific) with only a few missing values: about 2.16% of the values need to be imputed. The values are floating-point numbers.

I'd like a method that favors speed over accuracy, because the data set is large and I have little to lose in a speed-accuracy tradeoff.

Running missForest() takes several hours, while Hmisc's impute() function gives unsatisfactory results.

What functions in R might be useful in this (or a similar) case?

score 1 · Answer 1 · answered Jun 26 '16 at 20:06

Take a look at the h2o package: https://cran.r-project.org/web/packages/h2o/h2o.pdf

Everything is designed with parallelization in mind. I've had great success with many of their implementations, in R and Scala.

If you have to do it in R and are going for pure speed, I doubt you'll find anything faster.
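The answer doesn't name a specific function. One candidate is h2o.impute(), which fills missing values in place with a column aggregate (mean, median, or mode). Below is a minimal sketch, assuming the h2o package and a Java runtime are installed. The toy data frame and the choice of method = "mean" are illustrative assumptions, not something the question specifies.

```r
library(h2o)

# Start a local H2O cluster using all available cores.
h2o.init(nthreads = -1)

# Toy data standing in for the real 155257 x 21 frame (illustrative only).
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, 2, 2, 6)
)

# Upload the data to the H2O cluster.
hf <- as.h2o(df)

# Impute every column in place; column = 0 means "the whole frame".
# method may be "mean", "median", or "mode".
h2o.impute(hf, column = 0, method = "mean")

# Pull the imputed frame back into a regular R data frame.
imputed <- as.data.frame(hf)
print(imputed)

# Stop the local cluster when finished.
h2o.shutdown(prompt = FALSE)
```

This is aggregate-based imputation, so it trades accuracy for speed in much the same way a simple mean or median fill does. Whether the results are acceptable depends on the data.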
