panther_similarity

panther_similarity(G, source, k=5, path_length=5, c=0.5, delta=0.1, eps=None)

Returns the Panther similarity of nodes in the graph G to the node source.

Panther is a similarity metric that says “two objects are considered to be similar if they frequently appear on the same paths.” [1].
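The "same paths" idea can be illustrated with a minimal pure-Python sketch. It is not the NetworkX implementation: the helper names sample_path and path_cooccurrence, the adjacency-list input, and the fixed num_paths are assumptions made for illustration only. It samples random paths and scores each node by the fraction of sampled paths that contain both that node and the source.

```python
import random
from collections import Counter


def sample_path(adj, path_length, rng):
    """Random walk visiting up to `path_length` nodes from a random start node."""
    node = rng.choice(sorted(adj))
    path = [node]
    for _ in range(path_length - 1):
        neighbors = adj[node]
        if not neighbors:  # dead end: stop the walk early
            break
        node = rng.choice(neighbors)
        path.append(node)
    return path


def path_cooccurrence(adj, source, num_paths=2000, path_length=5, seed=0):
    """Fraction of sampled paths on which each node appears together with `source`."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_paths):
        nodes_on_path = set(sample_path(adj, path_length, rng))
        if source in nodes_on_path:
            # Count every other node that shares this path with the source.
            counts.update(nodes_on_path - {source})
    return {node: hits / num_paths for node, hits in counts.items()}


# Hypothetical toy graph: the path 0 - 1 - 2 - 3 as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = path_cooccurrence(adj, source=0)
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(node, round(score, 3))
```

On this toy graph, nodes closer to the source share at least as many sampled paths with it as nodes farther away, which is the intuition behind the metric.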

Parameters

G (NetworkX graph): A NetworkX graph
source (node): Source node for which to find the top k most similar other nodes
k (int, default = 5): The number of most similar nodes to return
path_length (int, default = 5): The length of each randomly generated path (T in [1])
c (float, default = 0.5): A universal positive constant used to scale the number of random paths to sample.
delta (float, default = 0.1): The probability that the similarity $S$ is not an epsilon-approximation to $(R, \phi)$, where $R$ is the number of random paths and $\phi$ is the probability that an element sampled from a set $A \subseteq D$, where $D$ is the domain.
eps (float or None, default = None): The error bound. Per [1], a good value is sqrt(1/|E|), where |E| is the number of edges in G. If no value is provided, this recommended value is computed and used.
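As a rough worked example of how these parameters interact, the sketch below computes the recommended eps for a graph with 10 edges (the edge count of nx.star_graph(10)) and then a sample size R. The sample-size expression is an assumption here: it is the bound as we understand it from [1], R = (c / eps^2) * (log2(C(T, 2)) + 1 + ln(1 / delta)), and may not match the library's internals exactly.

```python
import math

num_edges = 10  # nx.star_graph(10) has one hub and 10 leaves, hence 10 edges

# Recommended default from the docs: eps = sqrt(1 / |E|)
eps = math.sqrt(1.0 / num_edges)

# Documented defaults for the remaining parameters
path_length, c, delta = 5, 0.5, 0.1

# Assumed form of the sample-size bound from [1]; treat as illustrative.
t_choose_2 = math.comb(path_length, 2)
sample_size = int(
    (c / eps**2) * (math.log2(t_choose_2) + 1 + math.log(1 / delta))
)

print(f"eps = {eps:.4f}")
print(f"number of random paths R = {sample_size}")
```

Under this assumed formula, a smaller eps (a larger graph) or a smaller delta increases the number of paths to sample, while c scales it linearly.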

Returns

similarity (dictionary): Dictionary mapping nodes to similarity scores (as floats). Note: the self-similarity (i.e., the score of source with itself) is not included in the returned dictionary.

References

[1] Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. (2015). Panther: Fast top-k similarity search on large networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 2015-August, pp. 1445–1454). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783267.

Examples

>>> import networkx as nx
>>> G = nx.star_graph(10)
>>> sim = nx.panther_similarity(G, 0)
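A slightly longer usage sketch follows, showing one way to inspect the result. It assumes NetworkX (with NumPy) is installed and uses only the parameters documented above. Because the paths are sampled at random, the scores can differ from run to run, so no output is shown.

```python
import networkx as nx

G = nx.star_graph(10)  # hub node 0 connected to leaves 1..10
source = 0

# Ask for the 5 nodes most similar to the hub.
sim = nx.panther_similarity(G, source, k=5)

# The result is a plain dict of node -> score; rank it from most to least similar.
ranked = sorted(sim.items(), key=lambda item: item[1], reverse=True)
for node, score in ranked:
    print(f"node {node}: {score:.3f}")

# The source itself is not part of the result.
print(source in sim)
```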