PageRank is an algorithm that determines the importance of a particular node in a graph.
The picture above shows the PageRank of each node in that graph. As you can see, the more often a node is referenced by other nodes, the higher its PageRank. The nodes that are referenced the least have the lowest PageRank.
PageRank can be run on any kind of graph data. Edges are normally treated as directed; if you make the graph undirected the results will be different, but the algorithm will still work if that is what you want.
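
To make the idea concrete, here is a minimal single-machine sketch of PageRank by power iteration. It is not the Giraph or GraphX code used by this project; the function name `pagerank`, its parameters, and the three-edge sample graph are all assumptions made up for illustration, and the normalization (ranks summing to 1) is one common formulation that may differ from what the distributed jobs produce.

```python
def pagerank(edges, damping=0.85, iterations=50, directed=True):
    """Illustrative power-iteration PageRank over (source, target) edges."""
    if not directed:
        # Treat every edge as a pair of directed edges, one in each direction.
        edges = edges + [(dst, src) for src, dst in edges]

    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes a damped, equal share of its rank along every out-edge.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical sample graph: "c" is referenced by both "a" and "b".
edges = [("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
undirected_ranks = pagerank(edges, directed=False)

for vertex, score in sorted(ranks.items()):
    print(vertex, round(score, 4))
```

In this sample, `c` is referenced by two nodes and ends up with the highest rank, while `b` is referenced by none and ends up with the lowest. Passing `directed=False` gives a different ranking, which is the effect described above.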
The output lists each vertex and its PageRank.
Property | Required | Description |
---|---|---|
damping.factor | N | (Default: 0.85f) PageRank theory models an imaginary surfer who randomly clicks on links and eventually stops clicking. The probability, at any step, that the surfer will continue is the damping factor d. Various studies have tested different damping factors, but it is generally set to around 0.85. |
Note: First see How To Build
$ ./bin/dga-giraph pr /path/to/input /path/to/output
$ ./dga-graphx pr -i hdfs://url.for.namenode:port/path/to/input -o hdfs://url.for.namenode:port/path/to/output