
I have two CSV files (each several GB in size) that I am trying to merge, but every time I do, my computer hangs. Is there no way to merge them in chunks in pandas itself?


2 Answers


No, there is not. You will have to use an alternative tool like Dask, Drill, Spark, or a good old-fashioned relational database.
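For illustration, here is a minimal sketch of such a merge with Dask; the file names and the join column 'id' are assumptions, since the question doesn't show the schema:

import dask.dataframe as dd

# Lazily read both CSVs; Dask partitions them instead of loading
# everything into memory at once.
left = dd.read_csv('DataSet1.csv')
right = dd.read_csv('DataSet2.csv')

# The merge is only planned here; it executes partition by partition
# when the result is written out, one output file per partition.
merged = left.merge(right, on='id', how='inner')
merged.to_csv('merged-*.csv', index=False)

Because the work happens partition by partition, the joined result never has to fit in memory all at once.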


When faced with situations like this (loading and appending multi-GB CSV files), I found @user666's option quite feasible: load one data set (e.g. DataSet1) as a pandas DataFrame, then append the other (e.g. DataSet2) to it in chunks.

Here is the code I used:

import pandas as pd

# Load DataSet1 in chunks, collecting each chunk in a list and
# concatenating once at the end; concatenating inside the loop
# re-copies the accumulated frame on every iteration.
chunks = []
for chunk in pd.read_csv(path1 + 'DataSet1.csv', chunksize=100000, low_memory=False):
    chunks.append(chunk)
amgPd = pd.concat(chunks, ignore_index=True)
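The snippet above covers only the first file. A minimal sketch of the second half of the described approach, appending DataSet2 to the frame in the same chunked fashion (the DataSet2.csv name under the same path1 prefix is an assumption):

# Stream DataSet2 and append its chunks to the frame built above;
# again, collect first and concatenate once to avoid repeated copies.
more_chunks = [amgPd]
for chunk in pd.read_csv(path1 + 'DataSet2.csv', chunksize=100000, low_memory=False):
    more_chunks.append(chunk)
amgPd = pd.concat(more_chunks, ignore_index=True)

Note that chunking only bounds the cost of reading; the final combined frame still has to fit in memory.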