Configurations

Note: None of these configurations is required. Defaults have been selected for all of them, and the analytics will run without any being set explicitly. To get the best results for your data set, however, you may need to adjust some of the values. Be aware that removing some of the configuration values outright will prevent the analytics from running, so replace a value rather than delete it.

IO Configuration

Input Configuration

Property Required Description
simple.edge.delimiter N The string delimiter to use when tokenizing each row of data (default is \t). Note: many characters are ignored by the Apache Commons CLI library.
simple.edge.value.default N The default InputFormat expects 2 or 3 columns per row; if a row has no 3rd column, this configuration value is used as the edge weight for that row. For Text, the default is an empty String; for Long, it is the value 1.
io.edge.reverse.duplicator N Many datasets are undirected, but many analytics require directed graphs. The default is false; specifying true explicitly converts the provided graph into a directed graph.
simple.edge.column.ignore N Some edge lists have a third column that is not an edge weight. Set this value to true so that the input formats ignore that column.
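
To make the interaction between these four properties concrete, here is a minimal Python sketch of how a single edge-list row might be interpreted. It is an illustration only, not the project's InputFormat: the function name parse_edge_line and its parameters are hypothetical, and the reverse-duplication step (emitting each edge a second time in the opposite direction) is an assumption based on the property name.

```python
from typing import List, Tuple


def parse_edge_line(
    line: str,
    delimiter: str = "\t",               # stands in for simple.edge.delimiter
    default_value: str = "",             # stands in for simple.edge.value.default (Text case)
    ignore_third_column: bool = False,   # stands in for simple.edge.column.ignore
    reverse_duplicate: bool = False,     # stands in for io.edge.reverse.duplicator
) -> List[Tuple[str, str, str]]:
    """Return the (source, target, value) edges produced by one input row."""
    columns = line.rstrip("\n").split(delimiter)
    if len(columns) < 2:
        raise ValueError(f"expected at least 2 columns, got {len(columns)}")

    source, target = columns[0], columns[1]

    # Use the third column as the edge value only if it exists and is not ignored.
    if len(columns) >= 3 and not ignore_third_column:
        value = columns[2]
    else:
        value = default_value

    edges = [(source, target, value)]
    if reverse_duplicate:
        # Assumed behavior: also emit the same edge in the opposite direction.
        edges.append((target, source, value))
    return edges
```

For example, under these assumptions a two-column row with no weight picks up the default value, while a three-column row with simple.edge.column.ignore set to true discards the third column and falls back to the default as well.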

Output Configuration

Property Required Description
edge.delimiter N The string delimiter to use when writing each row of data (default is ,). Note: many characters are ignored by the Apache Commons CLI library.
write.vertex.value N The default value is false.
write.edge.value N The default value is false.

High Betweenness Set Extraction Configuration

Property Required Description
betweenness.output.dir N Sets the betweenness set output directory.
betweenness.shortest.path.phases N Sets the number of shortest path phases that the algorithm should run through. The default is 1.
betweenness.set.stability N Sets the stability cutoff point. Defaults to 0.
betweenness.set.stability.counter N Counter for the stability cutoff point.
betweenness.set.maxSize N Sets the maximum number of nodes in a betweenness set.
pivot.batch.size N The fraction of the nodes to select as pivots. Must be a decimal between 0.0 and 1.0.
pivot.batch.size.initial N The fraction of the nodes to select as pivots initially. Must be a decimal between 0.0 and 1.0.
pivot.batch.random.seed N Seed the random number generator for pivot selection.
vertex.count N Sets the total number of vertices to perform the algorithm on.
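
As a rough illustration of how pivot.batch.size, pivot.batch.random.seed, and vertex.count relate, the following Python sketch picks a seeded random batch of pivots from a list of vertex ids. The function name select_pivots is hypothetical, and rounding the batch size up is an assumption; the analytic itself may compute the batch differently.

```python
import math
import random
from typing import List, Optional


def select_pivots(
    vertex_ids: List[int],
    batch_fraction: float,          # stands in for pivot.batch.size
    seed: Optional[int] = None,     # stands in for pivot.batch.random.seed
) -> List[int]:
    """Pick a random batch of pivots covering batch_fraction of the vertices."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0.0 and 1.0")

    # A dedicated generator keeps the selection reproducible for a given seed.
    rng = random.Random(seed)

    # Assumption: round up, so any non-zero fraction selects at least one pivot.
    count = min(len(vertex_ids), math.ceil(batch_fraction * len(vertex_ids)))
    return rng.sample(vertex_ids, count)
```

Here len(vertex_ids) plays the role of vertex.count. Supplying the same seed yields the same pivots on every run, which is the usual reason to fix a seed when comparing results.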

Weakly Connected Components Configuration

There are no configuration options.

Leaf Compression Configuration

There are no configuration options.

Louvain Modularity Configuration

Property Required Description
minimum.progress N (Default: 0) The minimum delta X required to be considered progress, where X is the number of nodes that changed community on a particular pass. Delta X is the difference between the number of nodes that changed communities on the current pass and the number that changed on the previous pass. Using the default of 0 means that any delta is considered progress.
progress.tries N (Default: 1) The number of times the minimum.progress setting can fail to be met before the algorithm exits from the current level and compresses the graph. The default of 1 means that the algorithm exits the first time minimum.progress is not met.
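
The way these two settings interact can be sketched in a few lines of Python. This is one reading of the descriptions above, not the project's implementation: the function name passes_before_compression is hypothetical, delta X is taken as the previous pass's count minus the current pass's count, and a pass is assumed to count as progress only when that delta is strictly greater than minimum.progress.

```python
from typing import Iterable


def passes_before_compression(
    changed_per_pass: Iterable[int],
    minimum_progress: int = 0,   # stands in for minimum.progress
    progress_tries: int = 1,     # stands in for progress.tries
) -> int:
    """Return how many passes run before the current level would end."""
    misses = 0
    previous = None
    passes = 0
    for changed in changed_per_pass:
        passes += 1
        if previous is not None:
            # Delta X: drop in the number of nodes that changed community.
            delta = previous - changed
            # Assumption: the comparison is strict, so a delta equal to
            # minimum_progress does not count as progress.
            if delta <= minimum_progress:
                misses += 1
                if misses >= progress_tries:
                    return passes
        previous = changed
    return passes
```

With per-pass change counts of 100, 60, 40, 40, 10 and the defaults, this sketch stops after the fourth pass, because the count did not drop between the third and fourth passes. Raising progress.tries to 2 tolerates that one stall and lets all five passes run.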

PageRank Configuration

Property Required Description
damping.factor N (Default: 0.85f) The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.
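
To show where the damping factor d enters the calculation, here is a small Python sketch of the textbook PageRank iteration, in which each node's rank is (1 - d) / N plus d times the rank shared out by the nodes linking to it. It is not the analytic's own code: the function name pagerank, the fixed iteration count, and the choice to skip nodes with no outgoing links are all assumptions made to keep the example short.

```python
from typing import Dict, List


def pagerank(
    out_links: Dict[str, List[str]],
    damping: float = 0.85,   # stands in for damping.factor
    iterations: int = 50,
) -> Dict[str, float]:
    """Textbook PageRank; every link target must also be a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer stops following links
        # and restarts at a node chosen uniformly at random.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if not targets:
                continue  # Nodes without outgoing links are not handled in this sketch.
            # With probability damping the surfer follows one outgoing link.
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

A lower damping factor pulls every rank toward the uniform value 1 / N, while a value closer to 1.0 lets the link structure dominate.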