
Graphene

Search and Graph your data


GRAPHENE

Search and Discover connections in your data

Screenshots from an Instagram example:

Searching through a scrape of Instagram data

A graph of Instagram events

Graphene is a high-performance, Java-based web framework for building a search and graphing application on top of your data. By graphing, we mean nodes and edges that describe either shared attributes or events/relations between actors: for example, two people sharing the same address (a shared attribute), or an email between two people (an event). It is datastore agnostic, but has some built-in support for Elasticsearch, SQL databases, and Titan. Graphene requires some coding to adapt it to your specific needs and data. You decide what constitutes a node and an edge in your data.
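
To make the node/edge idea concrete, here is a minimal sketch of the shared-attribute case: people are nodes, and any two people with the same address get an edge. All class and method names (`Resident`, `Link`, `SharedAddressGraph`) are hypothetical and exist only for illustration; they are not part of Graphene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo types; none of these are part of Graphene's API.
class SharedAddressDemo {
    public static void main(String[] args) {
        List<Resident> people = Arrays.asList(
                new Resident("alice", "12 Elm St"),
                new Resident("bob", "12 Elm St"),
                new Resident("carol", "9 Oak Ave"));
        for (Link link : SharedAddressGraph.edgesFor(people)) {
            System.out.println(link.from + " --[" + link.label + "]-- " + link.to);
        }
    }
}

// A node: one actor in the data.
final class Resident {
    final String id;
    final String address;

    Resident(String id, String address) {
        this.id = id;
        this.address = address;
    }
}

// An edge: a relation between two nodes, here a shared attribute.
final class Link {
    final String from;
    final String to;
    final String label;

    Link(String from, String to, String label) {
        this.from = from;
        this.to = to;
        this.label = label;
    }
}

final class SharedAddressGraph {
    static List<Link> edgesFor(List<Resident> people) {
        // Group resident ids by address, preserving first-seen order.
        Map<String, List<String>> idsByAddress = new LinkedHashMap<>();
        for (Resident p : people) {
            List<String> ids = idsByAddress.get(p.address);
            if (ids == null) {
                ids = new ArrayList<>();
                idsByAddress.put(p.address, ids);
            }
            ids.add(p.id);
        }
        // Every pair of residents at the same address gets one edge.
        List<Link> edges = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : idsByAddress.entrySet()) {
            List<String> ids = entry.getValue();
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    edges.add(new Link(ids.get(i), ids.get(j),
                            "shared address: " + entry.getKey()));
                }
            }
        }
        return edges;
    }
}
```

In this sketch only alice and bob end up connected. An event-style edge (such as an email) would be modeled the same way, with the edge carrying the event's details instead of the shared value.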

A popular use of Graphene is discovering shared attributes in forensic analysis, for instance in a corpus of financial or case data. Graphene has been applied to domains such as Anti-Money Laundering, Counter Threat Finance, social media, and communication events. Leveraging Apache Tika, it has even been applied to collections of files. Graphene lets you visualize connections between entities that you wouldn't otherwise see. It also lets you modify the network graph in your browser and save those modifications back to the server. Graph export is built in.

A plugin architecture based on Apache Tapestry allows you to create additional capabilities that are autodiscovered at deployment. An example is our integration of a UI for communicating with MIT Lincoln Labs' MITIE Entity Extractor service, which lets you paste in free text and get back a list of clickable entities to drill down on. This is part of our ongoing effort to support Complex Queries using DARPA XDATA innovations.

For configurations, building, and deployment instructions, view the Graphene Wiki. Please see the road maps in the wiki for new features and their slated order.

Using Graphene

The core of the Graphene project contains multiple modules, some of which are optional. The main module you'll need is the graphene-web module, since it acts as a WAR overlay for your Java-based web application. The current example is Graphene-Instagram. Previous demos included Enron, Scott Walker email data, and Kiva Microloan data.

Our goal is that you use the available modules where they make sense for your needs, but we allow you to wire in your own code in most places. In addition, Graphene leverages Apache Tapestry's autodiscovery of modules, so we will be expanding the number of 'plugin' modules available. Currently we offer graphene-augment-mitie as an example of such a module. Its abilities are made available in your app simply by including the jar as a dependency in your POM; no code changes are necessary.

Building Graphene

Graphene is built using Apache Maven version 3.0.4 or later and a recent version of Java (currently Java 7).

Quickstart (4.2.0-SNAPSHOT and later)

Once you've built the core modules (from the graphene parent), you should be able to create a new project via a Maven archetype. If you've never used an archetype, it sets up the scaffolding for your new project based on a small amount of user input. From there you can import the project into your favorite IDE and modify it to your requirements. In our case, we provide an archetype based on Graphene-Instagram. This may mean you'll have to delete some of the classes we give you, but you'll have a better idea of how the project is intended to be structured.

Graphene overview

Graphene expects that you are familiar with some modern Java concepts:

Graphene is structured as a multi-module Maven project. The modules are:

Developing with Graphene

We recommend that your application use the Maven module structure, as shown in the Walker and Kiva demos. For example, if you have a company name or dataset name you are developing for, like IMDB, the structure would look as follows:

The pom.xml at the IMDB level lists the Ingest and Web modules as children, so you can build both parts together.

We recommend that the ingest module depend on parts from the web module (and not the other way around), so that code relating to ingest doesn't get deployed with your war.

The ingest module

The ingest module handles the ETL (Extract, Transform, Load) of your data into a more generic format.
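
As a rough illustration of the transform step, the sketch below turns raw pipe-delimited lines into a generic event record and skips lines that don't parse. The input format and the names `CommEvent` and `EventTransformer` are assumptions made for this example, not something Graphene prescribes; the load step (writing to your datastore) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical ingest sketch; the names and the input format are illustrative only.
class IngestSketchDemo {
    public static void main(String[] args) {
        List<String> rawLines = Arrays.asList(
                "alice|bob|2014-03-01",
                " carol |alice|2014-03-02",
                "malformed line");
        for (CommEvent e : EventTransformer.transform(rawLines)) {
            System.out.println(e.sender + " -> " + e.receiver + " on " + e.date);
        }
    }
}

// Generic target format: one communication event between two actors.
final class CommEvent {
    final String sender;
    final String receiver;
    final String date;

    CommEvent(String sender, String receiver, String date) {
        this.sender = sender;
        this.receiver = receiver;
        this.date = date;
    }
}

final class EventTransformer {
    // Transform: parse "sender|receiver|date" lines, trimming whitespace.
    static List<CommEvent> transform(List<String> rawLines) {
        List<CommEvent> events = new ArrayList<>();
        for (String line : rawLines) {
            String[] fields = line.split("\\|");
            if (fields.length != 3) {
                continue; // skip lines that do not have exactly three fields
            }
            events.add(new CommEvent(fields[0].trim(), fields[1].trim(), fields[2].trim()));
        }
        return events;
    }
}
```

A real ingest module would more likely log or quarantine rejected lines than drop them silently.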

The web module

If you ETL'd into an RDBMS

The first thing you might want to do with the web module is to set up and run the main() method of DTOGeneration.java. Once it connects to your database, it generates Java model objects that reflect your database tables, along with query helper objects that ensure you write valid SQL (invalid SQL or bad type conversions are caught at compile time). This step uses QueryDSL for code generation, sparing you from writing these classes by hand, which is tedious and error prone.

If you didn't ETL into an RDBMS

You can skip the DTOGeneration step and go straight to implementing your DAOs.

DAO Implementation

Graphene expects you to create implementations for most of the DAOs following the interfaces provided in the graphene-dao module. This allows the storage mechanism you choose to be independent of the main services and UI. In the DAO implementations, you will mostly be querying your datastore and then converting the results into one of the model or view objects in the graphene-model module.
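
The pattern described above can be sketched as follows: a storage-independent interface, one implementation that queries a datastore, and a conversion of raw results into view objects. Every name here (`CustomerDao`, `CustomerView`, `InMemoryCustomerDao`) is a hypothetical stand-in; the real interfaces and model classes live in the graphene-dao and graphene-model modules and will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; these names are not taken from graphene-dao or graphene-model.
class DaoSketchDemo {
    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice Smith"},
                new String[] {"2", "Bob Jones"},
                new String[] {"3", "Alicia Keys"});
        CustomerDao dao = new InMemoryCustomerDao(rows);
        for (CustomerView v : dao.findByNamePrefix("ali")) {
            System.out.println(v.id + ": " + v.displayName);
        }
    }
}

// View object handed to the services and UI (stand-in for a graphene-model type).
final class CustomerView {
    final long id;
    final String displayName;

    CustomerView(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

// Storage-independent contract (stand-in for an interface from graphene-dao).
interface CustomerDao {
    List<CustomerView> findByNamePrefix(String prefix);
}

// One possible implementation, backed by an in-memory list of [id, name] rows.
final class InMemoryCustomerDao implements CustomerDao {
    private final List<String[]> rows;

    InMemoryCustomerDao(List<String[]> rows) {
        this.rows = rows;
    }

    @Override
    public List<CustomerView> findByNamePrefix(String prefix) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        List<CustomerView> results = new ArrayList<>();
        for (String[] row : rows) {
            // Query the "datastore", then convert each hit into a view object.
            if (row[1].toLowerCase(Locale.ROOT).startsWith(needle)) {
                results.add(new CustomerView(Long.parseLong(row[0]), row[1]));
            }
        }
        return results;
    }
}
```

Because callers depend only on the interface, an SQL-backed or Elasticsearch-backed implementation could replace the in-memory one without changes to the services or UI.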

Running a Graphene application

By default, Graphene expects some other software to be available, although it is easy to change this or override the defaults.

Licensing

Graphene is licensed under the Apache License, Version 2.0 (APLv2). This project was funded by DARPA as part of the XDATA program.