Discovering and identifying the influential nodes in a complex network has long been an important problem, because these nodes are a significant factor in controlling the network. With control over a network, information can be spread or stopped in a short span of time; both goals are achievable, since a network of information can be extended as well as destroyed. Information spread and community formation have therefore become two of the most crucial issues in Social Network Analysis (SNA). In this work, a complex network drawn from the Twitter social network is formalized and the results are analyzed using different network metrics. The network is visualized in its original form and again after different percentages of the less impactful nodes and edges have been filtered out for better analysis. The network is analyzed according to different centrality measures: edge betweenness, betweenness centrality, closeness centrality and eigenvector centrality. Influential nodes are detected and their impact on the network is observed. The communities are analyzed in terms of network coverage, considering the minimum spanning tree, the shortest path distribution and the network diameter. These are found to be effective ways of finding influential and central nodes in large social networks such as Facebook, Instagram, Twitter and LinkedIn.

Recent years have seen great development in network science. In particular, since the introduction of the scale-free and small-world models, complex systems have been analyzed as networks. Progress in network science led to expressing complex systems as networks by representing the elements or entities of the system as nodes linked by edges. Network science has changed the way networks are viewed, providing methods and tools for modelling, analyzing and solving conventional problems. The World Wide Web, for example, can be represented as a complex network in which the systems or devices are the nodes and the communication between those systems forms the edges, with the frequency of communication as the edge weights. The robustness of the network can then be studied through dynamic processes based on network science.

Social media networks are now very popular, and a large number of people use social networking services (SNS) such as Twitter, Instagram, Facebook and LinkedIn. When analyzing online social media systems, finding the influential nodes is of great practical significance. Those nodes can prove to be a critical factor in ensuring that information spreads while using limited resources efficiently.

Identifying central and influential nodes in a complex network system has become one of the main problems in data mining and information retrieval.

The growing number of users and online communities in social networks has attracted a large amount of research and the interest of organizations, which aim to discover the patterns of information dissemination in large-scale networks; much work in this field remains to be done. The proposal of the small-world and scale-free models led to the conceptual analysis of complex systems as networks with certain essential characteristics. Indeed, research has led to networks being formed from almost any problem, with the basic system elements as nodes and the relations between those elements represented by edges. Such a network can then be analyzed systematically to find an appropriate, feasible solution. The edges can reflect different types of relations, such as one-way or two-way communication, with the strength of a relation represented as a weight. The network approach makes use of different models and methods to understand the system and propose a solution.

As the complex network approach has continued to grow, numerous methods and centrality metrics have been proposed to calculate the spreading ability of users in a network.

In general practice, social media analytics is based on an inherent, implicit concept of associations, represented by metrics and network indicators such as sentiment analysis, communication keywords, network reach, information flow and brand associations.

In this study the following research patterns are utilized (3.1–3.3).

The correlational strategy is used to find the relational dependence among the participating instances; mostly, it defines the degree of linear dependency among variables. The User-Tweet network is studied by considering the influential nodes and their connections, and by examining how a tweet from a single user has an impact by dividing the network into two separate components: those who support the views expressed in the tweet and those who stand against it. This is found by using different algorithms and observing the correlation between them.

x–the values of the x-variable in a sample

y–the values of the y-variable in a sample
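The formula that these symbol definitions belong to is not reproduced in the text. Assuming that the standard Pearson correlation coefficient is what is meant by the "degree of linear dependency", a minimal sketch of the computation could look as follows. The function name `pearson_r` and the sample values are illustrative assumptions, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples.

    Assumption: this is the usual sample formula
    r = sum((x_i - mean_x) * (y_i - mean_y))
        / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (denominator terms).
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Illustrative values only: two scores for the same four nodes.
score_a = [1.0, 2.0, 3.0, 4.0]
score_b = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(score_a, score_b)
```

A value of `r` near 1 or -1 indicates a strong linear relationship between the two samples, and a value near 0 indicates little linear dependence.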

An observational strategy is followed to model the network according to different community formation strategies. The results are then observed to deduce the best method of detecting community formation, such that the network ensures maximum coverage and guaranteed information propagation throughout. An observational study draws conclusions about a population of data by analyzing a random sample, where the independent variables are controlled not by the user but by the system constraints. The graphical results from the different algorithms are observed and compared in terms of network coverage, the existence of a path from one random node to any other random node, the weight or strength of connections, and so on. For the User-Tweet network, we generated graphs by applying different algorithms to the data and used the observational research methodology to find the impact of the different community formation algorithms.

The meta-analytic strategy focuses on identifying these features in order to find those that involve the smallest trade-off and offer the maximum network coverage at the least possible cost, thereby providing feasible routes within the network for propagating desired facts and setting trends. This is done by using statistical techniques to find estimates based on the above factors. In this way, every method has a weighted degree average according to the results of the individual algorithms, and the methods are distinguished by the allocation of the weights.

A dataset of 5,000 tweets about a single issue is gathered. From this population of 5,000 tweets, 54 nodes are selected using simple random sampling, and their tweets are formed into a network.
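Simple random sampling without replacement can be sketched in a few lines. The identifiers below are placeholders standing in for the 5,000 collected items, and the helper name `sample_nodes` and the fixed seed are assumptions made only so that the example is reproducible; the text does not say how the sampling was implemented.

```python
import random


def sample_nodes(population, k, seed=None):
    """Draw k distinct items uniformly at random from the population."""
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    return rng.sample(population, k)


# Placeholder identifiers, not the real dataset.
population = [f"item_{i}" for i in range(5000)]
selected = sample_nodes(population, 54, seed=42)
```

Every item has the same chance of being chosen, and no item can be chosen twice, which is what distinguishes this from sampling with replacement.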

The network used for the analysis consists of 54 nodes and 669 connections between them, called 'edges'. Connections are based on the people who tweet, their followers and those who retweeted, showing the diameter of the network over which the tweet information propagated.

The weighted degree of a node is the sum of the weights of the connections linked to it; the distribution of this quantity over the nodes is summarized below.

Max | 3rd quartile | Mean | Median | 1st quartile | Min |
---|---|---|---|---|---|
41.25 | 35.00 | 24.75 | 26.00 | 12.12 | 0.50 |
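As an illustration of how a weighted degree distribution and a summary of this kind could be obtained, the sketch below sums edge weights per node and then reports the minimum, quartiles, mean and maximum. The edge list is invented for the example, the function names are hypothetical, and the "inclusive" quartile method is an assumption; the text does not state which quartile convention was used.

```python
import statistics
from collections import defaultdict


def weighted_degrees(edges):
    """Sum the weights of all edges touching each node (undirected graph)."""
    degree = defaultdict(float)
    for u, v, weight in edges:
        degree[u] += weight
        degree[v] += weight
    # Nodes without any edge do not appear in the edge list and are omitted.
    return dict(degree)


def summarize(values):
    """Min, quartiles, mean and max of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Invented toy edges: (node, node, weight).
toy_edges = [("a", "b", 0.5), ("a", "c", 1.0), ("b", "c", 2.0), ("c", "d", 1.5)]
degrees = weighted_degrees(toy_edges)
summary = summarize(list(degrees.values()))
```

On the real network the same two steps would be applied to the 669 weighted edges rather than to the toy list.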

A centrality analysis is performed to find the most influential nodes. It starts by filtering out the nodes that have less coverage, that is, those not connected to the maximum possible number of nodes in the network. Different percentages of nodes and edges are filtered out.


Betweenness centrality and edge betweenness centrality are computed for the original network, highlighting the top user names. The two measures give nearly equal estimates. In contrast, closeness centrality acts as a differentiator, since the paths are weighted and represent significance in the network. Similarly, eigenvector centrality, which includes local clustering, is used to compare the results of the different measures.
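To make two of these measures concrete, the following sketch computes closeness centrality and (unnormalized) node betweenness centrality on a small undirected graph. It is a simplified illustration under stated assumptions: edges are treated as unweighted, whereas the study uses weighted paths; the graph is connected; the toy adjacency list and function names are invented. Betweenness follows Brandes' accumulation scheme.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """(Number of reachable nodes) / (sum of distances to them), per node."""
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Invented toy graph: "b" is a hub, "d" bridges to "e".
toy_adj = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}
closeness = closeness_centrality(toy_adj)
betweenness = betweenness_centrality(toy_adj)
most_central = max(betweenness, key=betweenness.get)
```

In the toy graph the hub `b` ranks first on both measures, which mirrors the idea of using centrality scores to pick out the top users.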



In complex network theory, the diameter is described as the longest route from one node to another. Here it is taken as the highest weighted sum of edges along a path from one node to another: a starting node is chosen, and the edges with the highest weights are added successively, leading to the largest weighted path in the network.
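For comparison, the sketch below computes the diameter under the standard graph-theoretic definition, namely the largest of all shortest-path (geodesic) distances between pairs of nodes. This is a swapped-in technique and not the greedy heaviest-edge construction described above. It assumes a connected undirected graph with non-negative edge weights; the toy graph and function names are invented for illustration.

```python
import heapq


def dijkstra(adj, source):
    """Shortest weighted distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter route was already found
        for v, weight in adj[u].items():
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def weighted_diameter(adj):
    """Return (length, start, end) of the longest geodesic in the graph."""
    best = (0.0, None, None)
    for s in adj:
        for t, d in dijkstra(adj, s).items():
            if d > best[0]:
                best = (d, s, t)
    return best


# Invented toy graph as a dict of dicts: adj[u][v] = weight.
toy_graph = {
    "a": {"b": 1.0, "c": 0.5},
    "b": {"a": 1.0, "d": 1.5},
    "c": {"a": 0.5, "d": 0.5},
    "d": {"b": 1.5, "c": 0.5},
}
diameter, start, end = weighted_diameter(toy_graph)
```

The per-source distance dictionaries produced by `dijkstra` are also the raw material for a statistical summary of geodesic path lengths.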

A minimum spanning tree keeps the network connected with the minimum possible number of edges and the lowest total edge weight. In this way, the cost of keeping every node reachable from every other node is reduced to the lowest possible limit. It has been used to determine the minimum cost of spreading information throughout the network in the least possible time, which gives an idea of how long a piece of information takes to propagate within a network. In any highly connected graph, most of the edges act as essential edges and need to be present to keep the graph connected. The minimum spanning tree, however, can be used to identify the spine of the network on which the whole structure is based: the paths that connect the different clusters within the network.
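One common way to build a minimum spanning tree is Kruskal's algorithm, sketched below with a small union-find structure. The text does not say which algorithm or tool was used, so this is only an illustrative choice; the toy edge list and function name are invented, and the graph is assumed to be connected and undirected.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm: return the list of edges in a minimum spanning tree."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    # Consider the cheapest edges first.
    for u, v, weight in sorted(edges, key=lambda edge: edge[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:        # edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Invented toy graph: (node, node, weight).
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [
    ("a", "b", 1.0),
    ("a", "c", 0.5),
    ("b", "d", 1.5),
    ("c", "d", 0.5),
    ("b", "c", 2.0),
]
mst = minimum_spanning_tree(toy_nodes, toy_edges)
total_weight = sum(weight for _, _, weight in mst)
```

For a connected graph with n nodes the result always has n - 1 edges; here three of the five toy edges survive, and these form the "spine" that keeps all four nodes connected at the lowest total weight.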

To better understand the network patterns, a statistical analysis of the geodesic paths has been performed.

Max | 3rd quartile | Mean | Median | 1st quartile | Min |
---|---|---|---|---|---|
2.500 | 1.000 | 0.918 | 1.000 | 0.500 | 0.00 |

This research provides a comprehensive visualization of the formation and connectivity of the social network (Twitter). The concept of graph theory, including various centrality measures (

The graphical illustrations highlight the network structure formed when following the algorithms. The colors of the nodes represent different clusters within the network, and it can be observed that most of the influential nodes tend to lie in the big clusters, while the less significant nodes with a low impact lie in the small clusters. This indicates that influential users have a strong tendency to connect to other influential users rather than to less influential ones. It also suggests that the network follows the scale-free model. Different network metrics, suited to the study of such large-scale complex networks, are used to analyze the data, and the results are shown. An analysis of the minimum spanning tree algorithm has been provided. The outcomes of the study point out the community persistence, network coverage, link strength (edge weights) and shortest paths in the network. Nodes with a large number of links have a significant impact on the entire network in terms of both propagating and preventing information. Moreover, social networks have become a major means of marketing and advertisement. Big corporations and organizations devote resources to research in order to better understand large networks, so that they can find the influential nodes and spread their message efficiently and effectively in less time. By the same means, a large network can also be controlled or destroyed. In future, other social networks (Facebook, Instagram, etc.) can be analyzed in the same way.