Use Flume and Sqoop to import data into HDFS, HBase, and Hive from a variety of sources, including Twitter and MySQL.
Import data: Flume and Sqoop play a special role in the Hadoop ecosystem. They transport data from sources that hold or produce it, such as local file systems, HTTP, MySQL, and Twitter, to data stores such as HDFS, HBase, and Hive. Both tools come with built-in functionality and shield users from the complexity of transporting data between these systems.
Flume: Flume Agents can transport data produced by a streaming application to data stores like HDFS and HBase.
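As a rough illustration of what a Flume Agent looks like, the sketch below uses Python to render a minimal agent configuration that wires a spooling-directory source through a memory channel to an HDFS sink. The agent name `agent1`, the component names `src1`, `ch1`, and `sink1`, the local directory, and the NameNode URL are all placeholders chosen for this example, not values from the course.

```python
def flume_agent_config(agent, spool_dir, hdfs_path):
    """Render Flume properties: spooling-directory source -> memory channel -> HDFS sink."""
    props = {
        # Declare the components that make up this agent (names are illustrative).
        f"{agent}.sources": "src1",
        f"{agent}.channels": "ch1",
        f"{agent}.sinks": "sink1",
        # Source: picks up files dropped into a local directory.
        f"{agent}.sources.src1.type": "spooldir",
        f"{agent}.sources.src1.spoolDir": spool_dir,
        f"{agent}.sources.src1.channels": "ch1",
        # Channel: buffers events in memory between source and sink.
        f"{agent}.channels.ch1.type": "memory",
        # Sink: writes events to HDFS as plain data files.
        f"{agent}.sinks.sink1.type": "hdfs",
        f"{agent}.sinks.sink1.hdfs.path": hdfs_path,
        f"{agent}.sinks.sink1.hdfs.fileType": "DataStream",
        f"{agent}.sinks.sink1.channel": "ch1",
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items()) + "\n"


# Placeholder paths; adjust to your own environment.
config_text = flume_agent_config(
    "agent1", "/var/log/incoming", "hdfs://namenode:8020/flume/events"
)
print(config_text)
```

Saved to a file such as `agent1.properties`, a configuration like this would typically be started with `flume-ng agent --conf conf --conf-file agent1.properties --name agent1`.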
Sqoop: Use Sqoop to bulk import data from a traditional RDBMS into Hadoop storage such as HDFS or Hive.
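To make the bulk import concrete, here is a small Python sketch that assembles a `sqoop import` command line for a MySQL table. The helper name `sqoop_import_command`, the database `shop`, the user `etl_user`, the table `orders`, and the target directory are hypothetical; only the Sqoop flags themselves are standard.

```python
import shlex


def sqoop_import_command(jdbc_url, username, table, target_dir=None, hive_import=False):
    """Build the argument list for a basic `sqoop import` run."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
    ]
    if hive_import:
        # Load the table into Hive rather than a plain HDFS directory.
        args.append("--hive-import")
    elif target_dir:
        args += ["--target-dir", target_dir]
    return args


# Placeholder connection details for illustration only.
import_cmd = sqoop_import_command(
    "jdbc:mysql://localhost/shop", "etl_user", "orders", target_dir="/data/orders"
)
print(shlex.join(import_cmd))
```

Passing `hive_import=True` instead of a target directory switches the same import to a Hive table.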
What is Covered:
Practical implementations for a variety of sources and data stores:
- Sources : Twitter, MySQL, Spooling Directory, HTTP
- Sinks : HDFS, HBase, Hive
Flume features :
Flume Agents, Flume Events, Event bucketing, Channel selectors, Interceptors
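As one possible illustration of event bucketing with an interceptor, the Python sketch below produces the extra agent properties that add a timestamp interceptor to a source and write HDFS files into per-day, per-hour directories. The component names `src1` and `sink1` and the base path are assumptions for this example.

```python
def bucketing_properties(agent, source, sink, base_path):
    """Properties that bucket HDFS output by event time (names are illustrative)."""
    return {
        # The timestamp interceptor stamps each event with a `timestamp` header.
        f"{agent}.sources.{source}.interceptors": "ts",
        f"{agent}.sources.{source}.interceptors.ts.type": "timestamp",
        # The HDFS sink expands %Y-%m-%d and %H from that header,
        # so events land in one directory per day and hour.
        f"{agent}.sinks.{sink}.hdfs.path": f"{base_path}/%Y-%m-%d/%H",
    }


bucket_props = bucketing_properties("agent1", "src1", "sink1", "/flume/events")
for key, value in bucket_props.items():
    print(f"{key} = {value}")
```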
Sqoop features :
Sqoop import from MySQL, Incremental imports using Sqoop Jobs
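An incremental import saved as a Sqoop Job might be set up roughly as in the Python sketch below, which builds the `sqoop job --create` command. The job name `orders_incremental`, the check column `id`, and the connection details are hypothetical; the starting `--last-value` of 0 assumes an empty target.

```python
import shlex


def sqoop_incremental_job(job_name, jdbc_url, username, table, check_column, target_dir):
    """Build a `sqoop job --create` command for an append-mode incremental import."""
    return [
        "sqoop", "job", "--create", job_name,
        "--", "import",  # everything after `--` is the saved import definition
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",       # import only rows newer than the last run
        "--check-column", check_column,  # column used to detect new rows
        "--last-value", "0",             # assumed starting point for the first run
    ]


# Placeholder values for illustration only.
job_cmd = sqoop_incremental_job(
    "orders_incremental", "jdbc:mysql://localhost/shop",
    "etl_user", "orders", "id", "/data/orders",
)
print(shlex.join(job_cmd))
```

Once created, the job can be run with `sqoop job --exec orders_incremental`. Sqoop stores the updated last value in the saved job after each run, so repeated executions pick up only new rows.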