
We are planning to use REST API calls to ingest data from an endpoint and store it in HDFS. The REST calls are made periodically (daily, or maybe hourly).

I've already done Twitter ingestion using Flume, but I don't think Flume suits my current use case: I'm not consuming a continuous firehose like the Twitter one, but rather making discrete, regularly scheduled invocations.

I would like to hear suggestions/alternatives (ideally something simpler than what I have in mind right now) about the design and which Hadoop-based component(s) to use for this use case. If you feel I can stick with Flume, please also give me an idea of how to do it.
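To make my current idea concrete, here is a rough sketch of what a single scheduled pull could look like, using plain `HttpURLConnection` and the Hadoop `FileSystem` API. The endpoint URL, target directory, and the scheduler that invokes it (cron, Oozie, etc.) are just placeholders:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RestToHdfsJob {

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and target directory -- replace with your own.
        String endpoint = "https://api.example.com/data";
        String targetDir = "/data/rest-ingest";

        // Call the REST endpoint once; an external scheduler (cron, Oozie, ...)
        // would invoke this job hourly or daily.
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");

        // Write the raw response into a timestamped file in HDFS.
        Configuration hadoopConf = new Configuration();
        try (InputStream body = conn.getInputStream();
             FileSystem fs = FileSystem.get(hadoopConf)) {
            String fileName = "pull-" + LocalDateTime.now()
                    .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmm")) + ".json";
            try (FSDataOutputStream out = fs.create(new Path(targetDir, fileName))) {
                IOUtils.copyBytes(body, out, hadoopConf);
            }
        }
    }
}
```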

geek-tech

1 Answer


You can use Kafka to ingest data into HDFS, or into other storage such as S3 or Google Cloud Storage, and you can use Gobblin to schedule the Kafka consumer that writes into HDFS.

Kafka producers => Kafka consumer => Gobblin (monthly/weekly/daily/hourly/minutes) => HDFS
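A minimal sketch of the producer side, assuming a hypothetical endpoint, broker address, and topic name (the consumer side would be Gobblin's Kafka-to-HDFS job, so no consumer code is needed here):

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RestToKafkaProducer {

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint, broker list, and topic -- replace with your own.
        String endpoint = "https://api.example.com/data";
        String topic = "rest-ingest";

        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Fetch the payload from the REST endpoint.
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");
        String payload;
        try (InputStream body = conn.getInputStream()) {
            payload = new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }

        // Publish the payload to the Kafka topic; Gobblin drains the topic
        // into HDFS on its own schedule.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>(topic, payload));
            producer.flush();
        }
    }
}
```

The producer itself can run on whatever schedule suits you; Kafka buffers the payloads, and Gobblin writes them to HDFS in batches independently of when the REST calls happen.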

Hope this works for you.

Abhis