networkx.algorithms.similarity.simrank_similarity¶
-
simrank_similarity(G, source=None, target=None, importance_factor=0.9, max_iterations=100, tolerance=0.0001)[source]¶ Returns the SimRank similarity of nodes in the graph
G.SimRank is a similarity metric that says “two objects are considered to be similar if they are referenced by similar objects.” 1.
The pseudo-code definition from the paper is:
def simrank(G, u, v): in_neighbors_u = G.predecessors(u) in_neighbors_v = G.predecessors(v) scale = C / (len(in_neighbors_u) * len(in_neighbors_v)) return scale * sum(simrank(G, w, x) for w, x in product(in_neighbors_u, in_neighbors_v))
where
Gis the graph,uis the source,vis the target, andCis a float decay or importance factor between 0 and 1.The SimRank algorithm for determining node similarity is defined in 2.
- Parameters
G (NetworkX graph) – A NetworkX graph
source (node) – If this is specified, the returned dictionary maps each node
vin the graph to the similarity betweensourceandv.target (node) – If both
sourceandtargetare specified, the similarity value betweensourceandtargetis returned. Iftargetis specified butsourceis not, this argument is ignored.importance_factor (float) – The relative importance of indirect neighbors with respect to direct neighbors.
max_iterations (integer) – Maximum number of iterations.
tolerance (float) – Error tolerance used to check convergence. When an iteration of the algorithm finds that no similarity value changes more than this amount, the algorithm halts.
- Returns
similarity – If
sourceandtargetare bothNone, this returns a dictionary of dictionaries, where keys are node pairs and value are similarity of the pair of nodes.If
sourceis notNonebuttargetis, this returns a dictionary mapping node to the similarity ofsourceand that node.If neither
sourcenortargetisNone, this returns the similarity value for the given pair of nodes.- Return type
dictionary or float
Examples
If the nodes of the graph are numbered from zero to n - 1, where n is the number of nodes in the graph, you can create a SimRank matrix from the return value of this function where the node numbers are the row and column indices of the matrix:
>>> import networkx as nx >>> from numpy import array >>> G = nx.cycle_graph(4) >>> sim = nx.simrank_similarity(G) >>> lol = [[sim[u][v] for v in sorted(sim[u])] for u in sorted(sim)] >>> sim_array = array(lol)
References