Can anyone recommend a command-line tool for converting a large CSV file into HDF5 format?
1 Answer
- 1st approach: Use append=True in the call to to_hdf:
import numpy as np
import pandas as pd

# filename = '/tmp/test.hdf5'
filename = r'D:\test.hdf5'  # raw string so '\t' is not treated as a tab escape

df = pd.DataFrame(np.arange(10).reshape((5, 2)), columns=['C1', 'C2'])
print(df)
#    C1  C2
# 0   0   1
# 1   2   3
# 2   4   5
# 3   6   7
# 4   8   9

# Save to HDF5 (format='table' is required so more rows can be appended later)
df.to_hdf(filename, 'data', mode='w', format='table')
del df  # allow df to be garbage collected

# Append more data
df2 = pd.DataFrame(np.arange(10).reshape((5, 2)) * 10, columns=['C1', 'C2'])
df2.to_hdf(filename, 'data', append=True)

print(pd.read_hdf(filename, 'data'))
- 2nd approach: You can append to an HDFStore instead of calling df.to_hdf:
import numpy as np
import pandas as pd

# filename = '/tmp/test.hdf5'
filename = r'D:\test.hdf5'  # raw string so '\t' is not treated as a tab escape

# Append two DataFrames to the same key, then close the store
store = pd.HDFStore(filename)
for i in range(2):
    df = pd.DataFrame(np.arange(10).reshape((5, 2)) * 10**i, columns=['C1', 'C2'])
    store.append('data', df)
store.close()

# Reopen the store and read everything back
store = pd.HDFStore(filename)
data = store['data']
print(data)
store.close()
- 3rd approach: Use the chunksize parameter of pd.read_csv and append each chunk to the HDF file, which was answered here; a minimal sketch follows after this list.
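For completeness, here is a minimal sketch of the chunked approach, assuming the input CSV is called large.csv (a placeholder name) and reusing the same output path as above; the chunk size is arbitrary and should be tuned to your memory budget:

import pandas as pd

filename = r'D:\test.hdf5'   # output HDF5 file (same placeholder path as above)
csv_file = 'large.csv'       # hypothetical input CSV

# Read the CSV in chunks and append each chunk to the same HDF5 key,
# so the whole file never has to fit in memory at once.
for chunk in pd.read_csv(csv_file, chunksize=100_000):
    chunk.to_hdf(filename, 'data', mode='a', format='table', append=True)

print(pd.read_hdf(filename, 'data'))

This combines the 1st approach with chunked reading, so it scales to CSV files larger than RAM.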
Personally, I like the 1st and 2nd approaches.