1.1 Detailed Research Question
The project is designed to help companies learn about the behaviour of their customers and determine their standing in the current market by analysing public posts from social media, in this case tweets from Twitter.
The project will be developed as an Android mobile application for ease of use. The main functions of the app are to collect tweets based on one or more specific keywords or hashtags and to determine the overall sentiment of the collected tweets. Each tweet will be classified as positive, neutral or negative, and the results will be displayed as percentages. A machine learning classifier will be used to perform the sentiment analysis. The app will also be able to compare the sentiment of tweets on two or more topics (e.g. Huawei vs Apple vs Samsung). The tweets in each sentiment category can also be viewed in the app. For ease of use and data transfer, the data behind the graphs and any other important information can be exported to a CSV file.
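As a rough illustration of how the percentage display could be computed, the sketch below turns a list of per-tweet labels into percentage shares. The function name `sentiment_percentages` and the sample labels are hypothetical; the real labels would come from whichever classifier the project adopts.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")

def sentiment_percentages(labels):
    """Return the share of each sentiment label as a percentage of all tweets."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # No tweets collected: report zero for every category.
        return {label: 0.0 for label in LABELS}
    return {label: round(100.0 * counts[label] / total, 1) for label in LABELS}

# Hypothetical classifier output for five collected tweets.
predicted = ["positive", "positive", "neutral", "negative", "positive"]
percentages = sentiment_percentages(predicted)
print(percentages)
```

Rounding to one decimal place is an assumption made for display purposes only.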
1.2 Keywords
natural language processing; Twitter; Android mobile application; customer satisfaction; opinion mining; big data; sentiment analysis; text classification; machine learning
1.3 Project Title
Twitter sentiment analyser using Natural Language Processing for companies
1.4 Client, Audience and Motivation
Sentiment analysis is mainly used by companies that need to monitor and satisfy their customers' needs and to make their employees' jobs easier. This kind of analysis, which comes from the Natural Language Processing field, can help companies improve sales performance because it extracts relevant and useful information from large amounts of unstructured, user-generated data. The data typically comes from social networking sites such as Twitter and Facebook (Palani, 2018).
This project will benefit Computing and Computer Science students who want to gain knowledge of machine learning classifiers, companies that need to monitor and satisfy their customers' needs, and people who want to discover new trends to incorporate into their business models. Users can filter comments, tweets and posts that discuss or give feedback on their companies, or on any topic that is trending at the time (Thébault, 2019).
Sentiment analysis has gradually become more popular and more in demand in recent years, especially in the business world. Starbucks, for instance, has used such analysis to gather real-time feedback from its customers. The analysis revealed that the majority of its customers wanted free Wi-Fi in its shops and the ability to pay directly from their smartphones. Starbucks made a few changes within weeks of the analysis (Thébault, 2019).
By the end of the project, users will be able to learn how sentiment analysis works and why it matters in the business world and in the ever-growing machine learning field.
1.5 Primary Research Plan
This project will be built for the Android mobile platform and developed by a single person; hence, suitable research and project planning is necessary to ensure that the project runs smoothly. A customised Systems Development Life Cycle (SDLC) is adopted to suit a one-person team, so that the project can be developed efficiently and delivered by the end date. The SDLC is inspired by the Kanban methodology, in which the project is broken down into smaller modules that are categorised as To Do, In Progress or Done (Gurendo, 2015). The methodology chart below will be used throughout the development of the project:
During the first few stages, literature on Twitter sentiment analysis and Natural Language Processing is reviewed briefly to gain an understanding of the subject. The available tools and libraries are also researched thoroughly to establish that the project is feasible. Once enough information on the topic has been gathered, a project idea is proposed and the project scope is defined.
The requirements of the application are then analysed and finalised. These include the system, hardware and software requirements as well as knowledge of the chosen programming language. Text classification libraries and APIs will be used in the project for better performance and accuracy.
The project will then proceed to the planning and design phase, in which the project is broken down into smaller modules. After the project has been organised in this way, a prototype of each module is developed and the important functions required in the module are implemented. Testing, including unit testing and usability testing, will be carried out before each prototype is submitted for review and feedback. This is one of the most important phases, because the selected classifiers must be trained here with an adequate amount of training data.
The processes of implementing, prototyping, testing and reviewing are repeated until all the important requirements and functions of the application are achieved. Trials will be conducted to ensure the ease of use, functionality and quality of the application before project deployment.
User experience will be tested if possible, including surveys of the ease of use and usefulness of the application. Information and data will be collected and documented to demonstrate the usability of the application. The details and findings of the project will also be documented in the final report.
Twitter is one of the places on the Internet where people share their opinions and feelings. This makes Twitter a valuable place for companies to monitor their target customers' feelings and opinions towards their brand, capture ongoing trends and gather insights. With the help of sentiment analysis, a technique from the Natural Language Processing field, this unstructured data can be categorised, organised and put to good use.
This report presents the project idea and a solution that allows companies and other interested individuals to gather relevant data from tweets by using sentiment analysis. The sentiment analysis will be implemented using an existing classification algorithm.
The purpose of this project is to help users notice trends, gain insights and get to know their target customers better through sentiment analysis, so that they can satisfy those customers' needs. Several online tools already perform such analyses, but few are available on the Android mobile platform. This leads to one of the main goals of the project, which is to build the application on the Android mobile platform for mobility. Tweets will be collected based on one or more specific keywords or hashtags. The application will then determine the overall sentiment of the collected tweets as percentages. The app will also be able to compare the sentiment of tweets on two or more topics (e.g. Huawei vs Apple vs Samsung). Graph data and other important information, such as the sentiment percentages of the collected tweets, can also be exported to a CSV file.
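The topic comparison and CSV export could be sketched as follows. This is only an illustration under stated assumptions: the column layout, the helper names `comparison_rows` and `to_csv`, and all percentage values are invented for the example and are not results of the project.

```python
import csv
import io

def comparison_rows(results):
    """Flatten {topic: {label: percent}} into rows suitable for a CSV file."""
    rows = [["topic", "positive", "neutral", "negative"]]
    for topic, shares in results.items():
        rows.append([topic, shares["positive"], shares["neutral"], shares["negative"]])
    return rows

def to_csv(rows):
    """Serialise rows to CSV text; an app would write this text to a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Made-up percentages, for illustration only.
results = {
    "Huawei": {"positive": 50.0, "neutral": 30.0, "negative": 20.0},
    "Apple": {"positive": 40.0, "neutral": 35.0, "negative": 25.0},
    "Samsung": {"positive": 45.0, "neutral": 25.0, "negative": 30.0},
}
csv_text = to_csv(comparison_rows(results))
print(csv_text)
```

Keeping one row per topic means the same file can feed a side-by-side bar chart in a spreadsheet without further reshaping.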
2.2 Mini Literature Review
Twitter is an online social networking site that provides a microblogging service in which people communicate and share opinions and thoughts in short messages called tweets. Twitter is considered scan-friendly because every tweet is limited to 280 characters or fewer (Gil, 2019). People use Twitter for all kinds of reasons: for attention, self-promotion, vanity, sharing thoughts and opinions, or simply out of boredom (Gil, 2019). As a result, Twitter has become a gold mine of data and a well-suited place to perform sentiment analysis.
According to techopedia (2019), sentiment analysis, also known as opinion mining, is a technique for extracting data and recognising the writer's feelings about a subject from a document or collection of documents, such as reviews, blog posts and social media feeds like tweets from Twitter. All of this data is known as unstructured data. Unstructured text can be a great source of information, but extracting insights from it manually is difficult and time-consuming. This is where text classification comes into the picture.
2.2.1 Text Classification
Text classification, or text categorisation, is used to structure, organise and categorise text taken from the internet, for example by sentiment (MonkeyLearn, 2019). Automatic classification is now widely used because, when combined with machine learning and natural language processing, it classifies text faster and more cost-effectively than manual classification (MonkeyLearn, 2019).
One approach to implementing text classification is a machine learning based system. Unlike the traditional approach of manually crafted rules, machine learning learns to make classifications based on past observations. Machine learning algorithms use training data containing pre-labelled examples to learn the associations between pieces of text and their labels.
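To make the idea of pre-labelled training data concrete, the sketch below turns a few labelled sentences into bag-of-words count vectors, one common way of presenting text to a classifier. The sample sentences, the tokenisation rule and the helper names are assumptions made for illustration, not the project's actual pipeline.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a tweet and split it into word tokens (hashtags keep their '#')."""
    return re.findall(r"#?\w+", text.lower())

# Hypothetical pre-labelled examples: (text, sentiment label).
training_data = [
    ("I love the new phone", "positive"),
    ("The battery is terrible", "negative"),
    ("The phone arrived today", "neutral"),
]

# The vocabulary is every distinct token seen in the training texts.
vocabulary = sorted({tok for text, _ in training_data for tok in tokenize(text)})

def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

print(vocabulary)
print(to_vector("Love love phone"))
```

Words outside the vocabulary are simply ignored by `to_vector`, which is one reason a larger and more varied training set tends to help.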
In a machine learning based system, there are two types of sentiment classification learning methods:
Supervised learning
- This type of learning is used to classify documents or sentences into a distinct set of classes, such as positive, negative and neutral.
- Training data sets are prepared for every class.
- The training data sets are used to learn a mapping from the input examples to the expected target.
- Examples of classifiers: Naïve Bayes, fuzzy rationale, Maximum Entropy Neural System, Support Vector Machine, Artificial Neural Networks (ANN) and so on (Sangita N. Patel, 2019)
Unsupervised learning
- This type of learning rarely uses a training data set for classification.
- It is typically used in productive opinion examination and in syntactic and term-based classification approaches.
- Examples of classifiers: Pointwise mutual information (PMI) (Sangita N. Patel, 2019)
Classifiers for supervised learning are discussed further in the sections below.
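Pointwise mutual information, mentioned above as an unsupervised example, measures how much more often two terms occur together than would be expected if they were independent: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below computes it from co-occurrence counts; the counts are hypothetical and the function name `pmi` is only illustrative.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts over `total` documents.

    P(x, y) / (P(x) * P(y)) simplifies to count_xy * total / (count_x * count_y).
    """
    return math.log2(count_xy * total / (count_x * count_y))

# Hypothetical counts over 1000 tweets: a word appears in 100 tweets, a known
# positive reference word in 200, and both appear together in 40.
score = pmi(count_xy=40, count_x=100, count_y=200, total=1000)
print(score)
```

A positive score suggests the word tends to appear alongside the positive reference word; a score near zero suggests no particular association.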
2.2.1.1 Naïve Bayes (NB)
Naïve Bayes is known for its simple yet effective approach to classifying text. It is based on Bayes' Theorem, which calculates the conditional probability of an event from the probabilities of the individual events involved (Soni, 2018). Because these probabilities are cheap to compute, Naïve Bayes remains practical when large data sets are involved in its learning process. In sentiment analysis, the Naïve Bayes algorithm computes the probability that a given input is positive, negative or neutral.
In text classification, Naïve Bayes acts as a classification method based on Bayes' rule of conditional probability. The formula is given below, where h is the hypothesis and x is the attribute (Saxena, 2017):
$$P(h \mid x) = \frac{P(x \mid h)\,P(h)}{P(x)}$$
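A minimal sketch of how Bayes' rule becomes a text classifier is shown below. Here the hypothesis h is a sentiment class and the attributes x are the words of a tweet; the "naïve" step is to treat the words as independent given the class, so P(x | h) becomes a product of per-word probabilities. The training sentences are invented, and add-one (Laplace) smoothing is an assumption added so that unseen words do not produce a zero probability.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def predict(text, model):
    """Return the class h that maximises log P(h) + sum of log P(word | h)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # prior P(h)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing (an assumption of this sketch).
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-labelled training examples.
examples = [
    ("great phone love it", "positive"),
    ("love the camera", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate the screen", "negative"),
    ("phone arrived today", "neutral"),
]
model = train(examples)
print(predict("love the phone", model))
```

The denominator P(x) is omitted because it is the same for every class and therefore does not change which class scores highest. Working with logarithms avoids numerical underflow when many small probabilities are multiplied.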
2.2.1.2 Support Vector Machine (SVM)
Similar to Naïve Bayes, a Support Vector Machine does not need a lot of training data to deliver accurate results. The difference is that SVM requires more computational resources than Naïve Bayes, but it can deliver more accurate results (MonkeyLearn, 2019).
In its basic form, SVM is a linear classification or regression algorithm. It searches for a hyperplane that separates the data into two classes: one containing the vectors that belong to a group and another containing the vectors that do not belong to that group (ataspinar, 2015).
The image above shows an example of points plotted in 2D space. The points are labelled with two categories, and the SVM chooses the hyperplane that maximises the distance between the two classes.
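The sketch below illustrates what a separating hyperplane and its margin mean in 2D. It does not train an SVM: the weights `w`, the offset `b` and the sample points are hypothetical values chosen by hand, and the code only shows how a given hyperplane w·x + b = 0 assigns classes and how far the closest point lies from it.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane x1 + x2 - 3 = 0 and hand-picked labelled points.
w, b = [1.0, 1.0], -3.0
points = [([1.0, 1.0], -1), ([0.5, 1.5], -1), ([3.0, 2.0], 1), ([2.5, 2.5], 1)]

predictions = [classify(w, b, x) for x, _ in points]
# The margin is the distance from the hyperplane to the closest point.
margin = min(distance_to_hyperplane(w, b, x) for x, _ in points)
print(predictions, margin)
```

An SVM training procedure would search over `w` and `b` for the hyperplane that makes this margin as large as possible.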
Although SVM is said to yield more accurate results, in its basic form it works only on data sets that are linearly separable and separates data into only two classes. Without extensions such as kernels or one-vs-rest schemes, this makes SVM less directly suitable for sentiment classification, where three classes are used (positive, neutral, negative), and for topic classification (ataspinar, 2015).
2.2.1.3 Deep Learning
Deep learning is inspired by the human brain. It comprises a set of algorithms and techniques modelled loosely on how the brain works. The recent resurgence of deep learning architectures in the Artificial Intelligence field has allowed text classification to develop further. Two main deep learning architectures are used in text classification: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) (MonkeyLearn, 2019).
Compared to traditional machine learning algorithms, deep learning requires far more training data, typically millions of examples. However, traditional classifiers such as NB and SVM reach a point beyond which more training data no longer improves their accuracy much. Deep learning classifiers, in contrast, continue to improve their accuracy as they are given more data (MonkeyLearn, 2019).
2.2.1.4 Advantages and Disadvantages of Naïve Bayes, Support Vector Machines and Deep Learning
The table below shows the advantages and disadvantages of each of the classifiers discussed:
Classifier: Naïve Bayes
Advantages (Genesis, 2018):
- Easy, simple and fast at predicting the classes of a data set
- Performs well when the assumption of independence holds, and needs less training data
- Performs well with categorical input compared to numerical input
Disadvantages (Genesis, 2018):
- Cannot make a prediction if a categorical variable has a category that was not found in the training data set; the model assigns it a probability of zero (the "Zero Frequency" problem)
- Assumes independent predictors, although no set of predictors is completely independent in real life
Classifier: Support Vector Machines
Advantages (HackingNote, 2019; Ray, 2017):
- Works very well with a clear margin of separation
- Useful in high-dimensional spaces (text classification)
- Effective in cases where the number of dimensions is greater than the number of samples
- Uses only the support vectors, a subset of the training points, in the decision function
Disadvantages (HackingNote, 2019; Ray, 2017):
- Does not perform well with large training data sets, as it is memory-intensive
- Does not perform well on noisy training data sets (overlapping target classes)
- Does not directly provide probability estimates
Classifier: Deep Learning (neural networks)
Advantages (HackingNote, 2019):
- Can handle non-linear data with a large number of input features
- Accuracy improves when given more training data
- Many open source implementations
- Useful for numerical inputs, vectors with a constant number of values and datasets with existing data
- Good at classifying text, image, video and audio
Disadvantages (HackingNote, 2019):
- Computationally expensive
- Trained model depends critically on the initial parameters
- Difficult to troubleshoot
- Hard to train; requires a lot of parameter tuning
- Hard to understand
Bibliography
somaproject, 2016. What is semantic analysis. [Online] Available at:
ataspinar, 2015. Text Classification and Sentiment Analysis. [Online] Available at:
Bose, B., 2018. Twitter Sentiment Analysis Introduction and Techniques. [Online] Available at:
Genesis, 2018. Naïve Bayes. [Online] Available at:
Gil, P., 2019. What Is Twitter & How Does It Work?. [Online] Available at:
Gurendo, D., 2015. Software Development Life Cycle (SDLC). All About Kanban Model. [Online] Available at:
HackingNote, 2019. Machine Learning Algorithms Pros and Cons. [Online] Available at:
MonkeyLearn, 2019. Text Classification A Comprehensive Guide to Classifying Text with Machine Learning. [Online] Available at:
MonkeyLearn, 2019. Sentiment Analysis Nearly Everything You Need to Know. [Online] Available at:
Palani, P., 2018. Understanding Semantic Analysis (And Why This Title is Totally Meta). [Online] Available at:
Ray, S., 2017. Understanding Support Vector Machine algorithm from examples. [Online] Available at:
Sangita N. Patel, J. B. C., 2019. A Survey of Sentiment Classification Techniques. 01(01), p. 20.
Saxena, R., 2017. HOW THE NAIVE BAYES CLASSIFIER WORKS IN MACHINE LEARNING. [Online] Available at:
Soni, D., 2018. Introduction to Naive Bayes Classification. [Online] Available at:
Symeonidis, S., 2019. 5 Things You Need to Know about Sentiment Analysis and Classification. [Online] Available at:
techopedia, 2019. Semantics. [Online] Available at:
Thébault, M., 2019. Semantic analysis exposing the value of your company's data. [Online] Available at: