Zephyr

How to Run Zephyr (MapReduce)

Zephyr is primarily a library and an abstraction, so it is impossible to give in-depth instructions for your own flows. However, a quick exercise that loads some sample Twitter data through Zephyr and into a tab-delimited format in MapReduce shows how you would use it.

Requirements

Twitter Subset - loaded into HDFS at /tmp/zephyr-twitter-example
Built and deployed Zephyr Sample Project (this guide assumes it is deployed to /opt/zephyr/)
Test Data

Run Zephyr

    
        $ hadoop fs -mkdir -p /tmp/zephyr-twitter-example/
        $ wget http://sotera.github.io/zephyr/data/data.tsv
        $ hadoop fs -copyFromLocal data.tsv /tmp/zephyr-twitter-example/
        $ cd /opt/zephyr
        $ ./run.sh -job twitter-job.xml

Configuration Options

See: Options

When this example has finished, HDFS should contain a directory called /tmp/zephyr-twitter-output.
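
One way to confirm that the run produced output is sketched below. The functions `output_exists` and `preview_output` are hypothetical helpers, not part of Zephyr. The `part-*` file pattern is an assumption based on how MapReduce jobs usually name their output files; adjust it if the sample job writes something different.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the example's output; not part of Zephyr.
# The output path is the one named in the text above.
OUTPUT_DIR=/tmp/zephyr-twitter-output

# Return 0 if the given path exists in HDFS and is a directory.
output_exists() {
    hadoop fs -test -d "$1"
}

# Print the first lines of the job output (default: 5 lines).
# Assumption: the job writes standard MapReduce part-* files.
# The glob is quoted so that HDFS, not the local shell, expands it.
preview_output() {
    hadoop fs -cat "$1/part-*" | head -n "${2:-5}"
}
```

After sourcing these definitions, running `output_exists "$OUTPUT_DIR" && preview_output "$OUTPUT_DIR"` would print a few records only if the directory is present.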

This is only the simplest use of Zephyr, but it should give you an example to work with as you move forward with your own ETL process.