Closeness is a widely used centrality measure in social network analysis. For a node, it indicates the inverse of the average shortest-path distance to the other nodes of the network. While identifying the k nodes with highest closeness has received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, a greedy algorithm with approximation ratio (1−1/e) was proposed only recently [Chen et al., ADC 2016]. Since this algorithm’s running time is still too high for large networks, the same paper also proposes a heuristic without approximation guarantee.
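To make the two notions concrete, the following is a minimal Python sketch of node closeness and of a straightforward greedy group selection. It rests on assumptions not stated in this abstract: the graph is connected, unweighted, and given as an adjacency dictionary, and the quality of a group is measured by its "farness", taken here as the sum of distances from each node to its nearest group member, so that lower farness means higher group closeness. The function names are hypothetical, and the sketch is neither the implementation of Chen et al. nor Greedy++; it only illustrates the unoptimized idea of adding one node at a time.

```python
from collections import deque


def bfs_distances(adj, sources):
    """Multi-source BFS: hop distance from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Inverse of the average distance from `node` to the other nodes.

    Assumes a connected graph with at least two nodes.
    """
    dist = bfs_distances(adj, [node])
    return (len(adj) - 1) / sum(dist.values())


def group_farness(adj, group):
    """Sum of distances from every node to its nearest group member (assumed measure)."""
    return sum(bfs_distances(adj, list(group)).values())


def greedy_group(adj, k):
    """Naive greedy: repeatedly add the node that lowers group farness the most.

    Each round runs one BFS per candidate, which is what makes a
    straightforward implementation expensive on large networks.
    """
    group = []
    for _ in range(k):
        candidates = [v for v in adj if v not in group]
        best = min(candidates, key=lambda v: group_farness(adj, group + [v]))
        group.append(best)
    return group


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 - 4 as a toy input.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(closeness(path, 2))
    print(greedy_group(path, 2))
```

On the toy path graph, the first greedy pick is the middle node, which is also the node with highest closeness. With k = 2 the greedy group has farness 4, whereas the pair {1, 3} has farness 3, so a greedy solution need not be optimal even on very small inputs.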
In the present paper, we develop new techniques to speed up the greedy algorithm without losing its theoretical guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster. Compared to the heuristic proposed by Chen et al., it always finds a solution of better quality in comparable running time in our experiments.
Our method Greedy++ allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. To obtain the same theoretical guarantee, the greedy approach by [Chen et al., ADC 2016] would already take several days on networks with hundreds of thousands of edges.
In a comparison with the optimum, our experiments show that the solution found by Greedy++ is actually much better than the theoretical guarantee. Over all tested networks, the empirical approximation ratio is never lower than 0.97.
Finally, we study for the first time the correlation between the top-k nodes with highest closeness and an approximation of the most central group in large complex networks and show that the overlap between the two is relatively small.