Abstract |
: |
Event-based social network analysis is an important task for monitoring potential threats to national security and for identifying trends that are popular among people. In this paper, we propose a content-based tweet clustering and analysis method, which aims to cluster tweets according to the events they describe. The proposed method starts by modeling tweets as a similarity graph (i.e., a social network), in which each node represents a tweet and each edge connecting a pair of nodes is weighted by the degree of similarity between the corresponding tweets. To generate the social graph, each node is represented as a feature vector derived from the respective tweet using Latent Dirichlet Allocation (LDA), and each edge weight is computed as the similarity between the feature vectors of the nodes it connects. Finally, the generated social graph is partitioned into a number of clusters (sub-graphs) using the Markov Clustering (MCL) algorithm, where each sub-graph represents an event. To evaluate the proposed method, we have generated a data set of 5000 tweets related to four different events: the Uri attacks, the Delhi assembly election, the Union budget 2015, and the Israel-Gaza conflict. The experimental results are encouraging, showing high accuracy in grouping tweets based on their contents. We have also performed a comparative analysis of similarity graph generation based on Cosine similarity and on Euclidean distance, and found that Cosine similarity yields better results than the Euclidean distance measure. |
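The graph construction and partitioning steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy topic vectors stand in for LDA output, and the edge threshold of 0.5, the inflation parameter of 2.0, the added self-loops, and the conversion of Euclidean distance to a similarity via 1 / (1 + d) are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean_similarity(u, v):
    """Assumed conversion of Euclidean distance d to a similarity: 1 / (1 + d)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + d)

def similarity_graph(vectors, sim=cosine_similarity, threshold=0.5):
    """Weighted adjacency matrix; edges below the (assumed) threshold are dropped."""
    n = len(vectors)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop, a common MCL preprocessing step
        for j in range(i + 1, n):
            s = sim(vectors[i], vectors[j])
            if s >= threshold:
                adj[i][j] = adj[j][i] = s
    return adj

def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_clustering(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion (matrix squaring) and inflation until convergence."""
    n = len(adj)
    m = normalize_columns(adj)
    for _ in range(max_iter):
        expanded = matmul(m, m)
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain mass are attractors; their non-zero columns form a cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy topic distributions (hypothetical stand-ins for LDA output on four tweets).
topic_vectors = [[0.90, 0.10], [0.85, 0.15], [0.10, 0.90], [0.15, 0.85]]
graph = similarity_graph(topic_vectors)
clusters = markov_clustering(graph)
print(clusters)
```

Passing `sim=euclidean_similarity` to `similarity_graph` switches the edge weighting, which is one way to set up the kind of comparison between the two measures described in the abstract.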