I have a 4 GB CSV file to load on my 16 GB machine; fread and read.csv can't load it in one go, they both return memory errors.
So I decided to read the file in chunks, and it worked (after an hour or so): I now have a list of data.frames that takes 2.5 GB if I trust the Environment tab in RStudio, and 1.2 GB when saved as an RDS.
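For context, the chunked import does roughly this (the file name is a placeholder and the chunk size is picked by hand):

```r
library(data.table)

path <- "data.csv"      # placeholder file name
chunk_size <- 1e6       # rows per chunk, picked by hand

header <- names(fread(path, nrows = 0))   # read only the column names
chunks <- list()
skip <- 1                                 # skip the header line
repeat {
  chunk <- tryCatch(
    fread(path, skip = skip, nrows = chunk_size,
          header = FALSE, col.names = header),
    error = function(e) NULL              # fread errors once skip runs past the end of the file
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
  skip <- skip + chunk_size
}
```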
The issue I have now is concatenating everything back into one big data.frame. From what I understand, rbindlist is the most efficient solution (or is it bind_rows?), but in my case it still uses too much memory.
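What I'm running is essentially:

```r
library(data.table)

big_dt <- rbindlist(chunks)          # this is where memory blows up on my machine
# or the dplyr equivalent:
# big_df <- dplyr::bind_rows(chunks)
```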
I think I could work around this by applying rbindlist to the list items n at a time, then recursively, until I get my final table. This n would have to be calibrated manually though, and the whole process is really ugly (on top of the already annoying CSV import).
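In code, I picture something like this (the helper name and n are just mine):

```r
library(data.table)

bind_in_batches <- function(lst, n) {
  # bind n list elements at a time, then recurse on the shorter list of results
  if (length(lst) <= n) return(rbindlist(lst))
  groups <- split(lst, ceiling(seq_along(lst) / n))
  bind_in_batches(lapply(groups, rbindlist), n)
}

big_dt <- bind_in_batches(chunks, n = 10)   # n would need manual calibration
```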
Another idea that crossed my mind is to feed an SQLite database from the loaded data and then query it from R (I'll only do subset, min and max operations on the data).
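If I go the database route, I imagine something like this with DBI and RSQLite (file, table and column names are just placeholders):

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "mydata.sqlite")   # on-disk database file

# append each chunk to a single table, then drop the list from memory
for (chunk in chunks) {
  dbWriteTable(con, "mydata", chunk, append = TRUE)
}
rm(chunks); gc()

# the kind of query I need (subsets, min, max), here on a hypothetical column x:
res <- dbGetQuery(con, "SELECT MIN(x), MAX(x) FROM mydata WHERE x > 0")

dbDisconnect(con)
```

Ideally I suppose each chunk could be written to the database right after it is read, so the whole list never has to sit in memory, but I'm not sure this is the best approach overall.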
Can I do better than this?
My data consists only of integers and doubles, if that makes a difference.