Framework to process a 1 GB file

Asked Jan 31 '20 at 00:19

Active Feb 03 '20 at 10:35

Viewed 20 times

I'm new to data science and have the following problem:

1) Read a 1 GB file in which each line is a JSON object.
2) Join its data with two other, much smaller files.

In this case, which kind of tool is best?

I saw some examples in PySpark, but I have no idea what happens under the hood.
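To make the task concrete, here is a minimal sketch of the two steps in plain Python, using only the standard library. It assumes the smaller files fit in memory, that they also hold one JSON object per line, and that the records share a join key. The function names `load_lookup` and `stream_join` and the key name passed in are hypothetical, chosen for illustration only.

```python
import json
from typing import Dict, Iterable, Iterator


def load_lookup(lines: Iterable[str], key: str) -> Dict[object, dict]:
    """Load a small JSON Lines source fully into memory, indexed by `key`."""
    lookup = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            record = json.loads(line)
            lookup[record[key]] = record
    return lookup


def stream_join(
    big_lines: Iterable[str], lookup: Dict[object, dict], key: str
) -> Iterator[dict]:
    """Read the large source one line at a time and left-join each record."""
    for line in big_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Records without a match are kept unchanged (left join).
        match = lookup.get(record.get(key), {})
        # On a field-name clash, the value from the large file wins.
        yield {**match, **record}
```

Both functions accept any iterable of lines, so an open file object can be passed directly; because the large file is consumed line by line, only the small lookup table is held in memory at once. The second small file could be joined the same way by chaining another `stream_join` call over the first one's output.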

edited Feb 03 '20 at 10:35

Ramon Medeiros

0 Answers