networkx.algorithms.similarity.simrank_similarity¶
-
simrank_similarity
(G, source=None, target=None, importance_factor=0.9, max_iterations=100, tolerance=0.0001)[source]¶ Returns the SimRank similarity of nodes in the graph
G
.SimRank is a similarity metric that says “two objects are considered to be similar if they are referenced by similar objects.” 1.
The pseudo-code definition from the paper is:
def simrank(G, u, v): in_neighbors_u = G.predecessors(u) in_neighbors_v = G.predecessors(v) scale = C / (len(in_neighbors_u) * len(in_neighbors_v)) return scale * sum(simrank(G, w, x) for w, x in product(in_neighbors_u, in_neighbors_v))
where
G
is the graph,u
is the source,v
is the target, andC
is a float decay or importance factor between 0 and 1.The SimRank algorithm for determining node similarity is defined in 2.
- Parameters
G (NetworkX graph) – A NetworkX graph
source (node) – If this is specified, the returned dictionary maps each node
v
in the graph to the similarity betweensource
andv
.target (node) – If both
source
andtarget
are specified, the similarity value betweensource
andtarget
is returned. Iftarget
is specified butsource
is not, this argument is ignored.importance_factor (float) – The relative importance of indirect neighbors with respect to direct neighbors.
max_iterations (integer) – Maximum number of iterations.
tolerance (float) – Error tolerance used to check convergence. When an iteration of the algorithm finds that no similarity value changes more than this amount, the algorithm halts.
- Returns
similarity – If
source
andtarget
are bothNone
, this returns a dictionary of dictionaries, where keys are node pairs and value are similarity of the pair of nodes.If
source
is notNone
buttarget
is, this returns a dictionary mapping node to the similarity ofsource
and that node.If neither
source
nortarget
isNone
, this returns the similarity value for the given pair of nodes.- Return type
dictionary or float
Examples
If the nodes of the graph are numbered from zero to n - 1, where n is the number of nodes in the graph, you can create a SimRank matrix from the return value of this function where the node numbers are the row and column indices of the matrix:
>>> import networkx as nx >>> from numpy import array >>> G = nx.cycle_graph(4) >>> sim = nx.simrank_similarity(G) >>> lol = [[sim[u][v] for v in sorted(sim[u])] for u in sorted(sim)] >>> sim_array = array(lol)
References