Betweenness Centrality is a useful property when searching for critical nodes in a network. The exact calculation of betweenness is however computationally expensive, which leaves us with the need for approximate methods.
The picture above shows the approximate betweenness of the given graph. You can see some nodes are a lot more critical than others and their betweenness value reflects that.
This method has 4 primary parameters that can greatly effect the run time and accuracy of the results.
To characterize the behavior of these parameters we've run a series of tests of 5 data sets.
All tests below were run on an 8-node cluster using 8 giraph workers using 8 threads per worker.
You can use any kind of data with HBSE. It can be directional or undirected. Keep in mind, that an undirected graph will make it even more computational heavy, thus your job will take much longer.
For background information on the approximation method see: "W. Chong, Efficient Extraction of High-Betweenness Vertices"
For background information on the method of accumulation of pair dependencies and shortest path data see: "U. Brandes, A Faster Algorithm for Betweenness Centrality"
Our implementation is distributed using the BSP method on Apache giraph and built with Gradle. Giraph requires a Hadoop cluster (map reduce and hdfs)
The graph is read from an edge list file in HDFS, each line in the file represents a single edge and is coded as a "key,target,weight" (edge weight is optional and defaults to 1)
Output is a directory in HDFS with both the extracted high betweenness set and the approximate betweenness value for all nodes.
Property | Required | Description |
---|---|---|
betweenness.output.dir | N | Sets the betweenness set output directory. |
betweenness.shortest.path.phases | N | Sets the number of shortest path phases that the algorithm should run through. The default is 1. |
betweenness.set.stability | N | Sets the stability cutoff point. Defaults to 0. |
betweenness.set.stability.counter | N | Counter for the stability cutoff point. |
betweenness.set.maxSize | N | Sets the maximum number of nodes in a betweenness set. |
pivot.batch.size | N | The percentage of pivots to select out of the nodes. Must be a decimal between 0.0 and 1.0. |
pivot.batch.size.initial | N | The percentage of pivots to select initially. Must be a decimal between 0.0 and 1.0. |
pivot.batch.random.seed | N | Seed the random number generator for pivot selection. |
vertex.count | N | Sets the total number of vertices to perform the algorithm on. |
Note: First see How To Build
$ ./bin/dga-giraph hbse /path/to/input /path/to/output
$ ./dga-graphx hbse -i hdfs://url.for.namenode:port/path/to/input -o hdfs://url.for.namenode:port/path/to/output