panther_similarity#

panther_similarity(G, source, k=5, path_length=5, c=0.5, delta=0.1, eps=None, weight='weight', seed=None)[source]#

Returns the Panther similarity of nodes in the graph G to node v.

Panther is a similarity metric that says “two objects are considered to be similar if they frequently appear on the same paths.” [1].

Parameters:
GNetworkX graph

A NetworkX graph

sourcenode

Source node for which to find the top k similar other nodes

kint (default = 5)

The number of most similar nodes to return.

path_lengthint (default = 5)

How long the randomly generated paths should be (T in [1])

cfloat (default = 0.5)

A universal constant that controls the number of random paths to generate. Higher values increase the number of sample paths and potentially improve accuracy at the cost of more computation. Defaults to 0.5 as recommended in [1].

deltafloat (default = 0.1)

The probability that the similarity \(S\) is not an epsilon-approximation to (R, phi), where \(R\) is the number of random paths and \(\phi\) is the probability that an element sampled from a set \(A \subseteq D\), where \(D\) is the domain.

epsfloat or None (default = None)

The error bound for similarity approximation. This controls the accuracy of the sampled paths in representing the true similarity. Smaller values yield more accurate results but require more sample paths. If None, a value of sqrt(1/|E|) is used, which the authors found empirically effective.

weightstring or None, optional (default=”weight”)

The name of an edge attribute that holds the numerical value used as a weight. If None then each edge has weight 1.

seedinteger, random_state, or None (default)

Indicator of random number generation state. See Randomness.

Returns:
similaritydictionary

Dictionary of nodes to similarity scores (as floats). Note: the self-similarity (i.e., v) will not be included in the returned dictionary. So, for k = 5, a dictionary of top 4 nodes and their similarity scores will be returned.

Raises:
NetworkXUnfeasible

If source is an isolated node.

NodeNotFound

If source is not in G.

Notes

The isolated nodes in G are ignored.

References

[1] (1,2,3)

Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. Panther: Fast top-k similarity search on large networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 2015-August, pp. 1445–1454). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783267.

Examples

>>> G = nx.star_graph(10)
>>> sim = nx.panther_similarity(G, 0)