Walktrap v0.2
WalkTrap is a C++ program that finds community structure of a network.
It is based on the fact that a random walker tends to be trapped in dense
part of a network corresponding to communities. It is published under the GNU General Public License.
Download Walktrap: Source code Win32 executable
USAGE
Compiling
To compile the program type the command "make". If you do not have the "make" utility try : "g++ -O3 walktrap.cpp communities.cpp graph.cpp heap.cpp -o walktrap".
Command line usage:walktrap [input_file] [-o output_file] [-i index_file] [options]
input_file:
If one of these parameters is omitted, the standard input is used.
INPUT FORMAT: the input file must be a list of undirected weighted edges. The vertices must be encoded as consecutive integers starting from 0. Each line contains two vertices and a weight (separated by spaces or tabulations) that define a weighted edge. The weight may be omitted, in this case a default weight equal to 1.0 is considered. The multi-edges will be considered as a single edge which weight is the sum of the corresponding weights. A comment line starts with "#"
The format is less flexible than the format used in version 0.1 but it allows considering weighted networks as well as unweighted networks. You can use the graph converter tool Gconvert. It handles different formats and can generate an index of the real name of the vertices that can be used in the output (see index_file section)
output_file:
If this parameter is omitted, stdout is used.
OUTPUT FORMAT: according to the options the output may contain:
- The list of the first communities containing a single vertex.
- The successive merging of communities with:
- the modularity Q of the partition obtained at this step.
- the value of delta_sigma that has been chosen for this merging.
- the vertices that belong to the new community.
- the description of the whole partition.
- The partition with the best modularity (-b option)
- More partitions asked by user with the options -p
index_file:
If you wish to keep the real name of the vertices in the output, you can specify an index file.
INDEX FORMAT: The index is a list of the real name of the vertices. Each line begins with a vertex number (as in the input file) followed by its real name that may be an arbitrary string. All the vertices must be defined once.
options:
- -s (Silent)
Do not display the progress.
- -tx
Set the length of random walks to x. Default value is t = 4.
- -dx (Detail level)
Set to x the detail level of the output (default is 2).
- d =1 : nothing is written.
- d>=2 : the successive mergings of the communities are written.
- d>=3 : the modularity Q and the value of delta_sigma are written.
- d>=4 : the new community is written at each step.
- d>=5 : the whole partition is written at each step.
- -b (Best modularity)
Print the partition that corresponds to the best value of the modularity at the end.
- -px (print specific Partition)
Print the whole partition that corresponds to x communities at the end. This option can be used several times.
- -mxxx (Memory manager)
Limit the memory usage of the program to x MB. This option is very useful for very large networks. But the algorithm runs slower with less memory.
- -h (Help)
Print the command line usage.
Usage examples:
- walktrap network.dat -o communities.txt -t5 -b
Read network from file "network.dat" and write the community structure found in file "communities.txt". The computation is done with random walks of length 5 and the best modularity partition is printed.
- walktrap example.net -o test -d1 -p3
Only the partition into 3 communities computed by the program is writen in file "test".
- walktrap example.net -o test2 -m400
The program will never use more that 400MB of memory.
- walktrap example.net -d1 -s -b
Only the best modularity partition is printed to the screen.
MORE INFORMATION
More information about the algorithm is available in the paper: Computing communities in large networks using random walks (with Matthieu Latapy), Submitted preprint.
COMMENTS & BUG REPORT
If you find a bug, please send a bug report to pons@liafa.jussieu.fr including the input file and the parameters that caused the bug.
You can also send me any comment or suggestion about the program.
HISTORY
version 0.2 (June 2005) new features:
- Support of weighted networks (input format has been modified).
- An efficient memory manager has been implemented.
- Many optimizations on the probability vectors storage and computation.
- Some heuristic optimizations in the merging process.
- An heap structure has been implemented to store the distances between communities.
version 0.1 (November 2004) first public version of Walktrap. Source code