I have written a MapReduce job that takes Protobuf files as input. Because the files are unsplittable, each file is processed by a single mapper (I implemented a custom FileInputFormat with isSplitable returning false). The job works well for input files smaller than ~680 MB and produces the expected output files; however, once an input file crosses that size, the job still completes successfully but produces an empty output file.
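For reference, the custom input format looks roughly like this (class name and record reader are illustrative; the key point is the isSplitable override):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Illustrative sketch: whole-file input format so each Protobuf file
// goes to exactly one mapper. "ProtobufRecordReader" is a placeholder
// for whatever reader actually decodes the Protobuf records.
public class WholeProtobufFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: one mapper per file
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new ProtobufRecordReader(); // hypothetical reader
    }
}
```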
I'm wondering if I'm hitting some per-mapper limit on input file size? If it matters, the files are stored on Google Cloud Storage (GCS), not HDFS.
Thanks!