simrank_similarity#
- simrank_similarity(G, source=None, target=None, importance_factor=0.9, max_iterations=1000, tolerance=0.0001)[source]#
Returns the SimRank similarity of nodes in the graph
G
.SimRank is a similarity metric that says “two objects are considered to be similar if they are referenced by similar objects.” [1].
The pseudo-code definition from the paper is:
def simrank(G, u, v): in_neighbors_u = G.predecessors(u) in_neighbors_v = G.predecessors(v) scale = C / (len(in_neighbors_u) * len(in_neighbors_v)) return scale * sum( simrank(G, w, x) for w, x in product(in_neighbors_u, in_neighbors_v) )
where
G
is the graph,u
is the source,v
is the target, andC
is a float decay or importance factor between 0 and 1.The SimRank algorithm for determining node similarity is defined in [2].
- Parameters:
- GNetworkX graph
A NetworkX graph
- sourcenode
If this is specified, the returned dictionary maps each node
v
in the graph to the similarity betweensource
andv
.- targetnode
If both
source
andtarget
are specified, the similarity value betweensource
andtarget
is returned. Iftarget
is specified butsource
is not, this argument is ignored.- importance_factorfloat
The relative importance of indirect neighbors with respect to direct neighbors.
- max_iterationsinteger
Maximum number of iterations.
- tolerancefloat
Error tolerance used to check convergence. When an iteration of the algorithm finds that no similarity value changes more than this amount, the algorithm halts.
- Returns:
- similaritydictionary or float
If
source
andtarget
are bothNone
, this returns a dictionary of dictionaries, where keys are node pairs and value are similarity of the pair of nodes.If
source
is notNone
buttarget
is, this returns a dictionary mapping node to the similarity ofsource
and that node.If neither
source
nortarget
isNone
, this returns the similarity value for the given pair of nodes.
- Raises:
- ExceededMaxIterations
If the algorithm does not converge within
max_iterations
.- NodeNotFound
If either
source
ortarget
is not inG
.
References
[2]G. Jeh and J. Widom. “SimRank: a measure of structural-context similarity”, In KDD’02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543. ACM Press, 2002.
Examples
>>> G = nx.cycle_graph(2) >>> nx.simrank_similarity(G) {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}} >>> nx.simrank_similarity(G, source=0) {0: 1.0, 1: 0.0} >>> nx.simrank_similarity(G, source=0, target=0) 1.0
The result of this function can be converted to a numpy array representing the SimRank matrix by using the node order of the graph to determine which row and column represent each node. Other ordering of nodes is also possible.
>>> import numpy as np >>> sim = nx.simrank_similarity(G) >>> np.array([[sim[u][v] for v in G] for u in G]) array([[1., 0.], [0., 1.]]) >>> sim_1d = nx.simrank_similarity(G, source=0) >>> np.array([sim[0][v] for v in G]) array([1., 0.])