Search and Discover connections in your data
Screenshots from an Instagram example:
Graphene is a high-performance, Java-based web framework for building a search and graphing application on top of your data. By graphing, we mean nodes and edges that describe either shared attributes or events/relations between actors. For example, two people sharing the same address is a shared attribute, and an email between two people is an event. Graphene is datastore agnostic, but has built-in support for Elasticsearch, SQL databases, and Titan. Adapting it to your specific needs and data requires some coding: you decide what constitutes a node and an edge in your data.
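To make the shared-attribute idea concrete, here is a minimal plain-Java sketch. It assumes a made-up input shape (an actor id paired with one attribute value); the class and method names are hypothetical and are not part of Graphene's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types come from Graphene itself. */
class SharedAttributeSketch {

    /** One record from your data: an actor and a single attribute value (e.g. an address). */
    static final class Fact {
        final String actorId;
        final String attributeValue;

        Fact(String actorId, String attributeValue) {
            this.actorId = actorId;
            this.attributeValue = attributeValue;
        }
    }

    /** An undirected edge between two actors, labeled with the value they share. */
    static final class Edge {
        final String from;
        final String to;
        final String sharedValue;

        Edge(String from, String to, String sharedValue) {
            this.from = from;
            this.to = to;
            this.sharedValue = sharedValue;
        }
    }

    /** Groups actors by attribute value, then links every pair of actors in the same group. */
    static List<Edge> edgesFromSharedAttributes(List<Fact> facts) {
        Map<String, List<String>> actorsByValue = new LinkedHashMap<>();
        for (Fact fact : facts) {
            List<String> actors = actorsByValue.get(fact.attributeValue);
            if (actors == null) {
                actors = new ArrayList<>();
                actorsByValue.put(fact.attributeValue, actors);
            }
            actors.add(fact.actorId);
        }
        List<Edge> edges = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : actorsByValue.entrySet()) {
            List<String> actors = entry.getValue();
            for (int i = 0; i < actors.size(); i++) {
                for (int j = i + 1; j < actors.size(); j++) {
                    edges.add(new Edge(actors.get(i), actors.get(j), entry.getKey()));
                }
            }
        }
        return edges;
    }
}
```

In this sketch, two actors with the same address yield one edge, while an actor whose address is unique yields none. An event such as an email would instead be modeled as an edge created directly from a single record.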
A popular use of Graphene is discovering shared attributes in forensic analysis, for instance in a corpus of financial or case data. Graphene has been applied to such domains as Anti-Money Laundering, Counter Threat Finance, social media, communication events and more. Leveraging Apache Tika, it has even been applied to collections of files. Graphene lets you visualize connections between entities that you wouldn't otherwise see. It also allows you to modify the network graph in your browser and save those modifications back to the server. Graph export is built in.
A plugin architecture based on Apache Tapestry allows you to create additional capabilities that are autodiscovered at deployment. An example is our integration of a UI for communicating with MIT Lincoln Labs' MITIE Entity Extractor service, which allows you to paste in free text and get a list of clickable entities to drill down on. This is part of our ongoing effort to support Complex Queries using DARPA XDATA innovations.
For configurations, building, and deployment instructions, view the Graphene Wiki. Please see the road maps in the wiki for new features and their slated order.
The core of the Graphene project contains multiple modules, some of which are optional. The main module that you'll need is the graphene-web module, since it acts as a WAR overlay for your Java-based web application. The current example is Graphene-Instagram. Previous demos included Enron, Scott Walker email data, and Kiva Microloan data.
Our goal is that you use the available modules where they make sense for your needs, but we allow you to wire in your own code in most places. In addition, Graphene leverages Apache Tapestry's autodiscovery of modules, so we will be expanding the number of 'plugin' modules available. Currently we offer graphene-augment-mitie as an example of such a module. Its capabilities are made available in your app simply by adding the jar file as a dependency in your POM, with no code changes necessary.
- A plain 'mvn clean install' on graphene-parent will build all the jar files and a single war file (to be overlaid on your project)
- Test execution is part of the build, but you can add -DskipTests=true to cut down on the build time.
- A BuildAll.bat is supplied for Windows users. This will perform a few cleans to overcome some Windows issues, and then compile and install to your local Maven repo.
- BuildAllOffline.bat will also grab the source code and javadocs for the dependencies, so you can take your .m2 directory into an offline enclave (or airgapped network) to develop.
Quickstart (4.2.0-SNAPSHOT and later)
Once you've built the core modules (from graphene-parent), you should be able to create a new project via a Maven archetype. If you've never used an archetype, it sets up the scaffolding for your new project based on a small amount of user input. From there you can import the project into your favorite IDE and modify it to your requirements. We provide an archetype that is based on Graphene-Instagram. This may mean you'll have to delete some of the classes we give you, but you'll have a better idea of how the project is intended to be structured.
- Step 1: Navigate to the directory you want to create the project in.
- Step 2: Assuming your project name is "Omni", enter a command like this:
mvn archetype:generate -DarchetypeGroupId=graphene -DarchetypeArtifactId=graphene-archetype -DarchetypeVersion=4.2.0-SNAPSHOT -DgroupId=graphene-omni -DartifactId=graphene-omni -Dpackage=graphene.omni -DprojectName=Omni
- Step 3: If you answer No to the confirmation prompt that follows, you'll have a chance to enter extra information, such as the Elasticsearch settings.
Graphene expects that you are familiar with some modern Java concepts:
- Interfaces and Implementations
- Knowledge of Maven, or of how to search for answers to your Maven questions
- Dependency Injection (aka Inversion of Control or IOC or DI)
- Graphene uses Apache Tapestry to provide the IOC framework. It is very similar to Guice, but also allows distributed configuration and can act as a lightweight OSGi alternative. We use the terms 'wiring' and 'binding' interchangeably. Essentially, a registry is created that lives for the life of your program and defines which implementation a service receives when it asks for an interface. The IOC wiring is mostly done at the customer implementation level, although basic shared services are wired in modules within graphene-parent. For consistency, any class that performs IOC wiring is suffixed with the word "Module", e.g. AppModule.java or DAOModule.java.
- Graphene currently requires you to implement an ExtJS UI. The Walker and Kiva demos should be helpful in setting up the application for your own dataset.
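The registry idea behind IOC wiring can be illustrated with a small plain-Java sketch. This is deliberately not Tapestry's API: SimpleRegistry, GreetingDAO, InMemoryGreetingDAO, and DemoModule are hypothetical names, and the sketch only shows the concept of binding an interface to an implementation in one place.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an IOC registry; this is NOT Tapestry's API. */
class SimpleRegistry {
    private final Map<Class<?>, Object> bindings = new HashMap<>();

    /** Wires (binds) an interface to the implementation that callers should receive. */
    <T> void bind(Class<T> serviceInterface, T implementation) {
        bindings.put(serviceInterface, implementation);
    }

    /** Returns whichever implementation was wired for the requested interface. */
    <T> T getService(Class<T> serviceInterface) {
        Object implementation = bindings.get(serviceInterface);
        if (implementation == null) {
            throw new IllegalStateException("No binding for " + serviceInterface.getName());
        }
        return serviceInterface.cast(implementation);
    }
}

/** Hypothetical service interface. */
interface GreetingDAO {
    String greet(String name);
}

/** Hypothetical implementation chosen at wiring time. */
class InMemoryGreetingDAO implements GreetingDAO {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

/** Following the naming convention above, the class that does the wiring ends in "Module". */
class DemoModule {
    static SimpleRegistry buildRegistry() {
        SimpleRegistry registry = new SimpleRegistry();
        registry.bind(GreetingDAO.class, new InMemoryGreetingDAO());
        return registry;
    }
}
```

Code that needs a GreetingDAO asks the registry for the interface and never names the implementation, so swapping implementations means changing only the module class. A real IOC container additionally handles construction, injection, and lifecycle.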
Graphene is structured as a multi-module Maven project. The modules are:
- Still under development. This is for precomputed or post-ingest analytics
- Defines all DAO interfaces used under graphene-parent, as well as any business logic interfaces and implementations
- Defines some DAO implementations for a standardized Neo4J property graph, and graph querying abilities using Cypher and the Neo4J Java API
- Defines some helper classes for working with the Titan graph database for property and event graphs, and graph querying abilities using the Tinkerpop/Gremlin APIs
- Defines some DAO implementations for standardized SQL tables. These are not used unless you wire them in IOC
- Defines some DAO implementations for working with Elasticsearch through JEST. Using JEST allows us to stay compatible at the REST layer, instead of binding to a particular version of Elasticsearch. These are not used unless you wire them in IOC
- Defines utilities for converting internal lists into CSV and native Excel XLSX files
- Under development. Defines utilities for entity extraction and resolution based on the nature and context of the data.
- Defines some basic ingest utilities which are used only by other ingest modules.
- Used during the ingest phase. Currently its main function is to run a series of queries against every table and every column, so you can get a feel for the bounds of your data and which columns are interesting.
- This module contains IDL classes generated by Apache Avro. These classes are the lingua franca between modules in Graphene.
- This module defines the REST interfaces which will be exposed to the UI.
- Your REST implementations will adhere to the interfaces defined here.
- The REST interfaces control the paths of the resource, so the UI will not break because of bad paths.
- This module uses Tynamo RESTEasy integration with the Apache Tapestry web framework.
- General search utilities
- Utilities which have cross cutting concerns for all modules. For example, query timing, logging, memory and file utilities.
- Almost all modules require this module as a dependency.
- This module defines some basic wiring and imports many of the *Module.java classes from other Graphene Maven modules.
- The web module also contains shared HTML, CSS, and JS resources used by ExtJS, Cytoscape.js, and many other JS libraries.
- It also contains Apache Tapestry based UI components and pages (currently limited).
Developing with Graphene
We recommend that your application use the Maven module structure, as shown in the Walker and Kiva demos. For example, if you have a company name or dataset name you are developing for, like IMDB, the structure would look as follows:
- graphene-imdb
  - graphene-imdb-ingest (aka the ingest module)
  - graphene-imdb-web (aka the web module)
The POM.xml at the IMDB level lists the Ingest and Web modules as children, so you can build both parts together.
We recommend that the ingest module depend on parts from the web module (and not the other way around), so that code relating to ingest doesn't get deployed with your war.
The ingest module
The ingest module handles the ETL (Extract, Transform, Load) of your data into a more generic format.
The web module
If you ETL'd into an RDBMS
The first thing you might want to do with the web module is to set up and run the DTOGeneration.java main(). Once it connects to your database, it will generate Java model objects that reflect your database tables, along with query helper objects that ensure you write valid SQL (invalid SQL or bad type conversions will be caught during compilation). This step uses QueryDSL to generate the code, sparing you from writing these classes by hand, which is tedious and error prone.
If you didn't ETL into an RDBMS
You can skip the DTOGeneration step and go straight to implementing your DAOs.
Graphene expects you to create implementations for most of the DAOs following the interfaces provided in the graphene-dao module. This allows the storage mechanism you choose to be independent of the main services and UI. In the DAO implementations, you will mostly be querying your datastore and then converting the results into one of the model or view objects in the graphene-model module.
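As a rough illustration of that pattern, the sketch below implements a DAO interface against an in-memory list that stands in for a datastore, and converts raw rows into a view object. PersonDAO, PersonView, and InMemoryPersonDAO are hypothetical names used only for this example; the actual interfaces and model classes are the ones in graphene-dao and graphene-model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical view object; the real ones live in graphene-model. */
final class PersonView {
    final String id;
    final String displayName;

    PersonView(String id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

/** Hypothetical DAO interface; the real ones live in graphene-dao. */
interface PersonDAO {
    List<PersonView> findByName(String nameFragment);
}

/** Implementation backed by an in-memory list of raw rows standing in for a datastore. */
final class InMemoryPersonDAO implements PersonDAO {
    // Each raw row is {id, first name, last name}, as a datastore query might return it.
    private final List<String[]> rows;

    InMemoryPersonDAO(List<String[]> rows) {
        this.rows = rows;
    }

    @Override
    public List<PersonView> findByName(String nameFragment) {
        String needle = nameFragment.toLowerCase(Locale.ROOT);
        List<PersonView> results = new ArrayList<>();
        for (String[] row : rows) {
            // "Query" the store, then convert the matching raw row into a view object.
            String fullName = row[1] + " " + row[2];
            if (fullName.toLowerCase(Locale.ROOT).contains(needle)) {
                results.add(new PersonView(row[0], fullName));
            }
        }
        return results;
    }
}
```

Because callers depend only on the interface, an SQL-backed or Elasticsearch-backed implementation could replace the in-memory one without touching the services or the UI.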
Running a Graphene application
By default, Graphene expects some other software to be available (although it is easy to change this or override the defaults):
- JDK 1.7 (latest will be fine). Graphene and its dependencies are not built with JDK 8 compatibility.
- Tomcat 7 or 8, with any JDBC drivers you need on the classpath (e.g. in tomcat/lib or packaged with your app)
- You should set the CATALINA_HOME environment variable
- Graphene supports Chrome and Firefox, and does not require any browser plugins.
Graphene is licensed under the Apache License, Version 2.0 (APLv2). This project was funded by DARPA as part of the XDATA program.