Abstract
Twitter is one of the most popular micro-blogging services, with millions of users exchanging information. Twitter's popularity and low barrier to entry have led many commercial entities to start using the service. As a result, the Twitter stream contains a mix of personal and professional tweets. The professional tweets are largely marketing messages and do not provide insight into individual people's experiences. Filtering personal tweets from commercial or professional ones is therefore a crucial, though often overlooked, first step in mining micro-blogging data. Identifying personal messages is essential for opinion mining and product or service reviews in every domain, and it is especially important in the healthcare domain. In this study, we propose a method for classifying tweets as either personal or professional using a novel feature set. We collected and analyzed three data sets from the Twitter stream related to the healthcare domain. Using a large number of hand-labeled tweets as input, we trained several classifiers on our proposed feature set and compared their accuracy, precision, and recall using 10-fold cross-validation. On a combination of the three health-related data sets, a random forest classifier achieved the highest accuracy, 91.5%. This result shows that our approach can significantly increase the accuracy of data mining on the Twitter stream.
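The evaluation protocol mentioned in the abstract (10-fold cross-validation scored by accuracy, precision, and recall) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the contiguous fold split, the choice of "personal" as the positive class, and the majority-class placeholder classifier are all assumptions made for illustration, and the paper's feature set and random forest are not reproduced here.

```python
from collections import Counter


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred, positive="personal"):
    """Return (accuracy, precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def cross_validate(train, X, y, k=10):
    """Pool the held-out predictions from every fold, then score them once."""
    predictions = [None] * len(y)
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(y)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        model = train(train_X, train_y)  # fit on the other k-1 folds only
        for i in fold:
            predictions[i] = model(X[i])
    return binary_metrics(y, predictions)


def train_majority(train_X, train_y):
    """Hypothetical placeholder: always predicts the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda features: majority
```

Any classifier can be plugged in by passing a `train` function that takes training features and labels and returns a callable predictor. In practice the data would usually be shuffled or stratified before splitting, which this sketch omits.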
Original language | English |
---|---|
Title of host publication | 2017 International Conference on Computational Science and Computational Intelligence (CSCI) |
Editors | Hamid R. Arabnia et al. |
Place of publication | Las Vegas |
Pages | 876-882 |
Publication status | Published - 2017 |
Austrian Fields of Science (ÖFOS)
- 102