
I'm new to data science, and I have the following problem:

1) Read a 1 GB file in which each line is a JSON object.
2) Join it with data from two other, much smaller files.
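
To make the task concrete, here is a minimal sketch of the two steps in plain Python, using only the standard library. It assumes the small files are also line-delimited JSON and that the records share a join key; the function names and the `key` parameter are hypothetical, chosen only for illustration. The idea is to hold the small data in a dictionary and stream the large file one line at a time, so the 1 GB file never has to fit in memory at once.

```python
import json


def load_lookup(lines, key):
    """Build an in-memory dict from a small JSON-lines source, keyed by `key`."""
    table = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            record = json.loads(line)
            table[record[key]] = record
    return table


def join_stream(big_lines, lookup, key):
    """Stream the large source line by line and merge in the matching lookup record.

    Records with no match in `lookup` are dropped (inner-join behaviour).
    """
    for line in big_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        match = lookup.get(record.get(key))
        if match is not None:
            # On a field-name clash, the value from the large file wins.
            yield {**match, **record}
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `load_lookup(open("small.jsonl"), "user_id")` (the file name and key are placeholders). The second small file could be handled by building another lookup and applying the same merge to the joined records.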

In this case, which kind of tool is best?

I saw some examples in PySpark, but I have no idea what happens under the hood.
