How to Run Zephyr (MapReduce)

Zephyr is primarily a library and an abstraction, so it is not possible to give in-depth instructions for your own flows. However, a quick exercise that loads some sample Wikipedia data through Zephyr and writes it out in a tab-delimited format using MapReduce shows how you would use it.


Run Zephyr

        $ hadoop fs -mkdir -p /tmp/zephyr-twitter-example/
        $ wget
        $ hadoop fs -copyFromLocal data.tsv /tmp/zephyr-twitter-example/
        $ cd /opt/zephyr
        $ ./ -job twitter-job.xml

Configuration Options

See: Options

When the example job has finished, its output should be in an HDFS directory called /tmp/zephyr-twitter-output.
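
To confirm that the job produced output, you can list that directory and preview a few records. The following is a minimal sketch, not part of Zephyr: the function name `preview_output` is hypothetical, and it assumes the job writes files using the conventional MapReduce `part-*` naming, which may not match your configuration.

```shell
# Output directory named in this walkthrough.
OUTPUT_DIR=/tmp/zephyr-twitter-output

# List the job output and preview the first few records.
# Assumption: output files follow the usual MapReduce part-* naming.
preview_output() {
    dir="${1:-$OUTPUT_DIR}"
    # "hadoop fs -test -d" exits 0 only if the path exists and is a directory.
    if ! hadoop fs -test -d "$dir"; then
        echo "output directory not found: $dir" >&2
        return 1
    fi
    hadoop fs -ls "$dir"
    # The glob is quoted so that HDFS, not the local shell, expands it.
    hadoop fs -cat "$dir/part-*" | head -n 5
}
```

Calling `preview_output` with no arguments checks the default directory; pass a different path as the first argument if your job configuration writes elsewhere.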

This is only the simplest use of Zephyr, but it should give you a working example to build on as you develop your own ETL process.