Weakly Connected Components

Weakly Connected Components is useful for finding the individual components that are in any given graph.

Algorithm

  1. Look at all edges and find the vertex id who holds the highest value.
  2. Send the greatest vertex id to all the edges.
  3. Process incoming messages and compare it against the value you hold.
  4. Repeat the process until there are no messages being sent.

As you can see from the image above, that graph has 3 Weakly Connected Components.

What kind of data can be used?

As the name suggests, Weakly Connected Components needs to be undirected to work properly. However, you can run Weakly Connected Components with a directional graph. Your results might varry from the undirected version.

Output Format

For an edge list:


    A,B
    B,C
    D,C
    E,H

The output will be as follows:


    A,B,D
    B,C,D
    D,C,D
    E,H,H

The third column represents the component each edge belongs to, which is the highest ascii value in the component.

Weakly Connected Components Configuration

There are no configuration options.

How To Run

Note: First see How To Build

How To Run DGA-Giraph
    
        $ ./bin/dga-giraph wcc /path/to/input /path/to/output
    
How To Run HBSE with DGA-GraphX
    
        $ ./dga-graphx wcc -i hdfs://url.for.namenode:port/path/to/input -o hdfs://url.for.namenode:port/path/to/output