I have two CSV files (each several GB in size) that I am trying to merge, but every time I do so my computer hangs. Is there a way to merge them in chunks in pandas itself?
When faced with such situations (loading and appending multi-GB CSV files), I found @user666's option of loading one data set (e.g. DataSet1) as a pandas DataFrame and appending the other (e.g. DataSet2) to it in chunks to be quite feasible.
Here is the code I implemented:
import pandas as pd

# Read the file in chunks of 100,000 rows to keep memory use bounded
chunks = []
for chunk in pd.read_csv(path1 + 'DataSet1.csv', chunksize=100000, low_memory=False):
    chunks.append(chunk)
# Concatenate once at the end; calling pd.concat inside the loop
# copies the accumulated frame on every iteration and is much slower
amgPd = pd.concat(chunks, ignore_index=True)
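If the goal is an actual merge (join) of the two files rather than an append, a variation on the same idea is to keep only the smaller file in memory and merge the larger one chunk by chunk, writing each merged piece straight to disk so the full result never has to fit in RAM. A minimal sketch (the file names, chunk size, and `key` column are placeholders for your own data; the two `to_csv` calls at the top just create small sample inputs so the sketch runs end to end):

```python
import pandas as pd

# Create two small sample CSVs so the sketch is self-contained;
# in practice these would be your existing multi-GB files.
pd.DataFrame({"key": range(6), "a": range(6)}).to_csv("DataSet1.csv", index=False)
pd.DataFrame({"key": range(6), "b": range(10, 16)}).to_csv("DataSet2.csv", index=False)

# Load the smaller file fully, then merge the larger one chunk by chunk,
# appending each merged piece to the output file on disk.
small = pd.read_csv("DataSet1.csv")
first = True
for chunk in pd.read_csv("DataSet2.csv", chunksize=2):
    merged = chunk.merge(small, on="key", how="inner")
    merged.to_csv("merged.csv", mode="w" if first else "a",
                  header=first, index=False)
    first = False
```

This only works cleanly when one of the two files fits in memory; if neither does, you would need an out-of-core tool such as Dask instead of plain pandas.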
vsdaking