Communities reveal the structure and dynamic nature of real-world networks and help improve recommendation systems. Social media platforms were initially developed for effective communication, but they are now widely used by the business community to extend reach and generate profit. The large volumes of data generated through these platforms are utilized by many companies that profit from them. People in a large social media network are grouped together based on their similar properties to form a community. Community detection is an active topic in the research community due to the increased usage of online social networks. Community structure is a significant property of a network, which may contain many communities whose members are similar to one another. Community detection techniques play a vital role in discovering similarities among nodes and keeping similar nodes strongly connected. Similar nodes in a network are grouped together in a single community. Communities can be merged to avoid an excessive number of groups if many edges exist between them. Machine learning algorithms use community detection to identify groups with common properties, with applications in recommendation systems, health care assistance systems and many more. Considering the above, this paper presents an alternative method for community detection, SimEdge-CD (Similarity and Edge Betweenness based Community Detection). SimEdge-CD has two stages. The first stage finds the similarity among nodes and groups similar nodes into one community. The second stage identifies the exact affiliations of boundary nodes using edge betweenness to create well-defined communities. Evaluation of the proposed method on synthetic and real datasets shows a better accuracy-efficiency trade-off compared to existing methods. The proposed SimEdge-CD reaches the ideal NMI value of 1 on the dolphin and football networks, matching or exceeding existing techniques such as sim_closure, LPA, Attractor, Leiden and Walktrap.

With the emergence of social media, information and communication technology has changed and developed drastically over the past 20 years. Mobile technology plays a vital role in shaping the impact of social media, enabling people to form networks anywhere, at any time, with their handheld devices. This has led to very complex networks in the real world. With the advent of the internet and social media, networks are now one of the most active research topics. These networks form complex structures whose basic components are nodes and links. One of the significant properties of networks is community structure, which clearly pictures the interactions among the network components [

A community is defined as a group of nodes with similar characteristics and a distinguishable affiliation relative to other communities [

Many approaches have been developed to address the problem of community detection, drawing on fields such as statistical physics [

The rest of the article is organized as follows: Section 2 discusses related work. Section 3 describes the working principle and execution of the proposed work. Section 4 evaluates the outcome of Section 3 and compares the results with existing systems. Section 5 concludes the paper and outlines future directions.

Newman [

Sweeney et al. proposed a hedonic games approach in [

Lancichinetti et al., [

A game theory-based framework is developed by Chen et al. [

Tang et al., [

An algorithm for overlapping community detection based on label propagation was proposed by Gregory [

Radicchi et al. [

The literature approaches community detection and its outcomes from various perspectives. Most existing studies obtain good results; however, detection becomes complex when there are more than 1000 nodes. This motivates us to look for an effective system to handle complex networks.

The proposed method addresses three issues: initial community formation, community expansion, and reclassification of wrongly classified boundary nodes. A sample social network architecture is shown in

The network considered in this work is an undirected and unweighted graph. It is represented as G(V, E), where G represents the graph, V = {v1, v2, v3, …, vn} denotes the set of nodes and E = {e1, e2, e3, …, en} represents the set of edges.

The initial community is formed by calculating node similarities. One commonly used similarity measure for community detection is Jaccard similarity, a proximity measurement used to compute the similarity between two nodes. The main drawback of Jaccard similarity is that the similarity value can be greater for indirectly connected nodes than for directly connected nodes. To address this issue, a novel similarity measure was proposed by [p1.1], which proved to be more effective than existing similarity measures. This paper improves on the similarity measure proposed by [

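Jaccard Similarity:

$$J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$

where N(u) denotes the set of neighbours of node u.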
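The drawback noted above can be seen on a very small graph. The following is a minimal sketch, assuming the standard neighbour-set form of Jaccard similarity (intersection over union of open neighbourhoods); the adjacency-dictionary representation and the toy path graph are illustrative assumptions, not taken from the paper.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the open neighbourhoods of u and v.

    adj maps each node to the set of its neighbours (assumed representation).
    """
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical toy graph: a path a - b - c
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

direct = jaccard(adj, "a", "b")    # a and b share an edge
indirect = jaccard(adj, "a", "c")  # a and c are two hops apart
print(direct, indirect)
```

On this path graph the adjacent pair (a, b) has no common neighbour, while the non-adjacent pair (a, c) shares its only neighbour, so the indirectly connected pair scores higher than the directly connected one.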

The similarity measure proposed by [p1.1] is as follows, where a_{u,v} is the element of the adjacency matrix: if (u, v) ∈ E, then a_{u,v} = 1; otherwise a_{u,v} = 0.

The above algorithm is iterated until all the nodes in the graph are processed. Thus, initial communities are obtained. The second step in this work is to find the community affiliation of the boundary nodes.

The initial communities are created using Algorithm 1. Although this works well, there may be situations in which boundary nodes between two communities are wrongly bound to a community. This issue is depicted in

Consider the 9th node of the green community. The 9th node is placed in the green community, but as per the network topology it should be located in the blue community. This issue should be handled by moving the 9th node to the community where its neighbours are. To resolve this, the proposed method uses the betweenness measure to identify the edges that connect communities and then uses the similarity measure to determine each boundary node's affiliation. This concept is depicted in Algorithm 2. The problem is resolved in two steps. The first step involves the identification of boundary nodes using the betweenness of the edges [

The betweenness of an edge is the fraction of shortest paths between all pairs of vertices that pass through that edge. When there is more than one shortest path between a pair of vertices, each path is given equal weight. When communities are joined by very few intergroup edges, every shortest path between the communities must pass along one of those few edges. Such edges therefore have high betweenness.

Edge betweenness can be calculated as

$$B(v) = \sum_{s \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths from s to t and $\sigma_{st}(v)$ is the number of those paths that pass through v.
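To make the definition concrete, the following is a minimal sketch of edge betweenness for an undirected, unweighted graph, using breadth-first search and the accumulation scheme of Brandes. It is not the paper's implementation: the adjacency-dictionary input, the `frozenset` edge keys, and the toy graph of two triangles joined by a bridge are assumptions made for illustration. The scores are unnormalised counts of shortest paths per edge, with each path split equally when several shortest paths exist.

```python
from collections import deque


def edge_betweenness(adj):
    """Unnormalised edge betweenness; adj maps node -> set of neighbours.

    Returns a dict keyed by frozenset({u, v}). Each unordered pair (s, t)
    contributes once, split equally over its shortest paths.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {n: 0 for n in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {n: [] for n in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate path dependencies from the farthest nodes to s.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    # Every pair was visited from both of its endpoints, so halve the totals.
    return {e: c / 2.0 for e, c in score.items()}


# Hypothetical toy graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
eb = edge_betweenness(adj)
bridge = max(eb, key=eb.get)
print(sorted(bridge), eb[bridge])
```

In this toy graph the single intergroup edge carries every shortest path between the two triangles, so it receives the highest score, which is the property the method relies on to locate edges that connect communities.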

Algorithm 2 identifies well-defined communities by moving the boundary nodes to their appropriate communities. The flow diagram of the proposed SimEdge-CD is shown in

First, a node ‘p’ is chosen at random from the social network. Next, a node that is similar to ‘p’ is chosen as ‘q’ using Jaccard similarity. The relationship between p and q is identified and their community is assigned accordingly. Each subsequent node is then checked for membership of this community. For nodes that belong to a community, their edges are identified and boundary nodes are assigned to the appropriate community. This similarity-based edge detection helps to identify the relatedness between nodes and to form well-defined communities in the social media network.
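The boundary-node step can be illustrated with a deliberately simplified sketch. It is not Algorithm 2: instead of edge betweenness followed by the similarity measure, it treats any node with a neighbour in another community as a boundary node and moves it to the community that holds most of its neighbours, which is the intuition stated earlier for the 9th node. The function names, the label dictionary, and the toy graph are hypothetical.

```python
from collections import Counter


def boundary_nodes(adj, community):
    """Nodes with at least one neighbour in a different community."""
    return [u for u in adj if any(community[v] != community[u] for v in adj[u])]


def reassign_boundary(adj, community):
    """Move each boundary node to the community holding most of its neighbours.

    Votes use the original labels; a node moves only on a strict majority,
    so ties keep the current label. Returns a new node -> label dict.
    """
    updated = dict(community)
    for u in boundary_nodes(adj, community):
        votes = Counter(community[v] for v in adj[u])
        best, best_count = votes.most_common(1)[0]
        if best_count > votes.get(community[u], 0):
            updated[u] = best
    return updated


# Hypothetical toy network: node 9 is labelled green, but two of its three
# neighbours are blue.
adj = {
    3: {4, 5}, 4: {3, 5}, 5: {3, 4, 9},
    6: {7, 8}, 7: {6, 8, 9}, 8: {6, 7, 9},
    9: {5, 7, 8},
}
community = {3: "green", 4: "green", 5: "green", 9: "green",
             6: "blue", 7: "blue", 8: "blue"}
fixed = reassign_boundary(adj, community)
print(fixed[9])
```

A fuller version along the lines described in the paper would restrict the candidates to endpoints of high-betweenness edges and compare similarity scores rather than raw neighbour counts.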

This section evaluates the quality of the proposed community detection method using normalized mutual information (NMI) and modularity (R) for artificial networks. The experiments are carried out on an Intel(R) Core(TM) i5 processor at 2.42 GHz with 16 GB RAM. The below

Network | \|V\| | d | dmax | expd | expcom | \|C\|min | \|C\|max | μ |
---|---|---|---|---|---|---|---|---|
LFR500 | 500 | 20 | 50 | −2 | −1 | 10 | 50 | 0.1–0.8 |
LFR1000 | 1000 | 20 | 50 | −2 | −1 | 10 | 50 | 0.1–0.8 |
LFR5000 | 5000 | 20 | 50 | −2 | −1 | 10 | 50 | 0.1–0.8 |
LFR10000 | 10000 | 20 | 50 | −2 | −1 | 20 | 500 | 0.1–0.8 |

Communities are embedded in four series of networks, on which the proposed method and the other algorithms are run. The proposed method outperforms the other algorithms on the artificial networks for mixing parameter μ ≤ 0.6 in the LFR500 and LFR1000 series, μ ≤ 0.5 in LFR5000, and μ ≤ 0.6 in LFR10000. The detected NMI values are depicted in
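For reference, NMI between a detected partition and the ground-truth partition can be computed as in the sketch below. The source does not state which normalisation it uses; this sketch assumes the common form 2·I(A;B) / (H(A) + H(B)), and the label lists are hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """NMI = 2 * I(A;B) / (H(A) + H(B)) for two labelings of the same nodes."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mi = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    if h_a + h_b == 0:
        return 1.0  # both partitions place every node in a single community
    return 2 * mi / (h_a + h_b)


# Hypothetical labelings of six nodes
truth = [0, 0, 0, 1, 1, 1]
renamed = ["x", "x", "x", "y", "y", "y"]  # same grouping, different names
score_same = nmi(truth, renamed)
score_indep = nmi([0, 1, 0, 1], [0, 0, 1, 1])  # statistically independent
print(score_same, score_indep)
```

NMI depends only on how nodes are grouped, not on the community names, so a detected partition that matches the ground truth up to relabelling scores 1, and unrelated partitions score 0.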

A comparison of the results of the proposed system and the other algorithms, in terms of R and the ratio between the detected number of communities and the real number of communities, is presented in

This section overviews the results obtained in real world networks.

Network | \|V\| | \|E\| |
---|---|---|
Karate club | 34 | 78 |
Dolphin network | 62 | 159 |
Football game schedule | 115 | 613 |
Ecoli | 423 | 519 |

Network | Metric | Sim_closure | Walktrap | LPA | Attractor | Leiden | Proposed |
---|---|---|---|---|---|---|---|
Karate club | Q | 0.356 | 0.345 | 0.453 | 0.367 | 0.47 | 0.399 |
Karate club | NMI | 0.988 | 1.000 | 0.503 | 0.854 | 0.768 | 0.998 |
Dolphin network | Q | 0.503 | 0.500 | 0.500 | 0.478 | 0.535 | 0.512 |
Dolphin network | NMI | 1.000 | 0.987 | 0.828 | 0.755 | 0.867 | 1.000 |
Football game schedule | Q | 0.672 | 0.668 | 0.603 | 0.601 | 0.678 | 0.670 |
Football game schedule | NMI | 0.999 | 0.923 | 0.954 | 0.876 | 0.987 | 1.000 |
Ecoli | Q | 0.767 | 0.733 | 0.765 | 0.640 | 0.798 | 0.788 |
Ecoli | NMI | 0.879 | 1.000 | 0.898 | 0.876 | 0.899 | 0.980 |
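The Q rows in the table above are taken here to be Newman modularity, which is an assumption since the source does not define the symbol. Under that assumption, Q for a partition of an undirected, unweighted graph can be sketched as follows; the adjacency dictionary and the toy graph are hypothetical.

```python
def modularity(adj, community):
    """Newman modularity Q = sum over c of [ L_c / m - (d_c / (2m))^2 ].

    adj maps node -> set of neighbours; community maps node -> label.
    L_c is the number of edges inside community c, d_c the total degree of
    its nodes, and m the number of edges in the graph.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    internal = {}
    degree = {}
    for u, nbrs in adj.items():
        c = community[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Each internal edge is seen from both endpoints, hence the 0.5.
        inside = sum(1 for v in nbrs if community[v] == c)
        internal[c] = internal.get(c, 0.0) + 0.5 * inside
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = {n: "a" for n in adj}
q_split = modularity(adj, two_groups)
q_single = modularity(adj, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives Q = 5/14, while placing all nodes in one community gives Q = 0. Unlike NMI, modularity needs no ground truth, which is why a method can rank differently on the two metrics.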

From the

This work proposed SimEdge-CD for community detection, which also determines the communities of boundary nodes in order to produce well-defined communities. The method is evaluated on 4 series of synthetic networks and 4 real-world networks. The results of SimEdge-CD are compared with sim_closure, Walktrap, LPA, Attractor and Leiden. SimEdge-CD scored the largest NMI, 1, for the dolphin and football game schedule networks. The proposed method considers the similarity of the nodes and uses this property to cluster the nodes in the network to achieve high-quality community detection. The experimental results demonstrate that the proposed SimEdge-CD provides strong community detection performance relative to the existing methods in terms of NMI and modularity.