Sentimental Analysis framework for Twitter Streaming Data

Maunika Nittala, Dept. of Information Technology

Prajkta R. Bhandarwar, Dept. of Information Technology

Abstract: Twitter is an online social networking service with more than 300 million users, who generate a huge amount of information every day. Its most important characteristic is that users can tweet about events, situations, feelings, opinions, or something entirely new. This study focuses on analysing the social activity that results from these tweets. Social set analysis consists of a generative framework for combining big social data sets with organizational and societal data sets.

Several workflows currently offer data analysis for Twitter, providing general-purpose processing over streaming data. This study attempts to develop an analytical framework that uses in-memory processing to extract and analyse structured and unstructured Twitter data. Spark makes it possible to run sophisticated data processing and machine learning algorithms. We conduct a case study on tweets about politics and analyse people’s reactions as expressed in those tweets.

The proposed framework includes data ingestion, stream processing, and data visualization components.


The huge volume of digital data now being generated, commonly called big data, is difficult both to store and to handle, which is why big data analysis is needed. Analytical tools make it possible to analyse this growing data, and big data tools and technologies provide the means to handle it at scale. Data management has come to involve many such technologies for different types of data: real-time, structured, and unstructured.

A product is any item or service provided to a user, whether hardware or software. Every product has a monetary value, and it may be a new innovation or a reinvention of something that already exists. The work here involves analysing data with the various tools and platforms currently available, together with machine learning, both of which are trending. This section gives an overview of analytical platforms that have been used or studied in the past. Twitter is one of the most commonly used microblogging applications today, which is why we decided to work on it.

Twitter Application Programming Interface

The Twitter API is the interface used to collect streaming Tweets from Twitter; tweet scores are stored along with their timestamps.

Only publicly posted Tweets can be captured through the API. To open a stream, the Create_Streaming_Connection() method sends a POST request to the Twitter API and receives the search results as a stream. An application may submit up to 5,000 Twitter user ids in one connection. The Streaming API can search for hashtags, keywords, and geographic bounding boxes simultaneously, and the filter API delivers a continuous stream of Tweets that match the filter terms. POST is preferred for creating the request because long URLs are truncated; GET is used to retrieve the results.
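As a rough illustration of how such a filter request could be assembled, the sketch below builds the form-encoded body of a POST request from keywords, user ids, and bounding boxes. The function name build_filter_request is hypothetical, and the parameter names track, follow, and locations are assumed from Twitter’s statuses/filter streaming endpoint; the source does not name them. No network call is made.

```python
from urllib.parse import urlencode

MAX_FOLLOW_IDS = 5000  # per-connection limit on user ids quoted in the text


def build_filter_request(keywords=(), user_ids=(), bounding_boxes=()):
    """Return (method, body) for a streaming filter request.

    Hypothetical helper: the parameter names "track", "follow" and
    "locations" are assumptions, not taken from the source text.
    """
    if len(user_ids) > MAX_FOLLOW_IDS:
        raise ValueError("at most 5,000 user ids are allowed per connection")

    params = {}
    if keywords:
        params["track"] = ",".join(keywords)
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if bounding_boxes:
        # Each box is assumed to be (west_lon, south_lat, east_lon, north_lat).
        params["locations"] = ",".join(
            str(coord) for box in bounding_boxes for coord in box
        )
    if not params:
        raise ValueError("at least one filter must be given")

    # The filters travel in the POST body, so they are not subject to
    # URL length truncation.
    return "POST", urlencode(params)


method, body = build_filter_request(keywords=["election", "vote"])
print(method, body)
```

Sending the filters in the request body rather than the query string is what makes POST the safer choice when the keyword or user id list grows long.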

Literature Review

Turney [1] used a bag-of-words method for sentiment analysis, in which a sentence is treated simply as a collection of words and the relationships between words are not considered at all. To determine the sentiment of the whole sentence, the sentiment of each individual word is determined separately and the values are combined using an aggregation function.
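A minimal sketch of this bag-of-words aggregation is given below. The word scores in WORD_SCORES are made up for illustration and are not taken from [1]; the aggregation function is left as a parameter since the text does not say which one is used.

```python
from statistics import mean

# Hypothetical per-word sentiment scores, for illustration only.
WORD_SCORES = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def sentence_sentiment(sentence, aggregate=sum):
    """Score a sentence as an unordered collection of words."""
    words = sentence.lower().split()
    # Each known word is scored on its own; unknown words are ignored.
    scores = [WORD_SCORES[w] for w in words if w in WORD_SCORES]
    return aggregate(scores) if scores else 0.0


print(sentence_sentiment("great phone but bad battery"))        # summed score
print(sentence_sentiment("great phone but bad battery", mean))  # averaged score
```

Because word order and word relationships are ignored, any reordering of the same words receives the same score, which is the main limitation of the approach.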

Pak and Paroubek [2] proposed a model to classify tweets as positive or negative. They built a Twitter corpus by collecting tweets through the Twitter API and annotating them automatically using emoticons. Using that corpus, they developed a multinomial Naive Bayes sentiment classifier that uses features such as POS tags and N-grams. The training set was less effective because only tweets containing emoticons were considered.
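The automatic annotation step can be sketched as follows. The emoticon lists and the function names are assumptions for illustration; [2] may have used a different set. The sketch also shows why the resulting training set is restricted: tweets without an emoticon receive no label and are dropped.

```python
# Hypothetical emoticon lists, for illustration only.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")


def label_by_emoticon(tweet):
    """Return "positive", "negative", or None when no clear label exists."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None  # no emoticon at all, or mixed signals
    return "positive" if has_pos else "negative"


def build_corpus(tweets):
    """Keep only tweets that can be labelled, with the emoticons removed."""
    corpus = []
    for tweet in tweets:
        label = label_by_emoticon(tweet)
        if label is None:
            continue
        text = tweet
        for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
            text = text.replace(emoticon, "")
        corpus.append((" ".join(text.split()), label))
    return corpus


print(build_corpus(["Love this :)", "Worst day :(", "Just landed"]))
```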

Po-Wei Liang [3] used the Twitter API to collect data from Twitter. Tweets containing opinions were filtered out. A unigram Naive Bayes model was developed for polarity identification. Unwanted features were eliminated using the Mutual Information and Chi-square feature extraction methods. In the end, this approach did not improve accuracy in predicting tweets as positive or negative.
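To make the Chi-square step concrete, the sketch below computes the standard chi-square statistic for a single term from a 2x2 table of tweet counts and keeps the highest-scoring terms. This is the textbook formulation and is only assumed to resemble what [3] used; the function names and the example counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one term and a two-class problem.

    a: positive tweets containing the term
    b: negative tweets containing the term
    c: positive tweets without the term
    d: negative tweets without the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # term or class never varies, so it carries no signal
    return n * (a * d - b * c) ** 2 / denominator


def select_features(term_tables, k):
    """Keep the k terms with the highest chi-square score."""
    ranked = sorted(term_tables, key=lambda t: chi_square(*term_tables[t]), reverse=True)
    return ranked[:k]


# Hypothetical counts: "awful" appears only in negative tweets,
# "the" appears equally often in both classes.
tables = {"awful": (0, 10, 10, 0), "the": (5, 5, 5, 5)}
print(select_features(tables, 1))
```

A term that is distributed evenly across both classes scores zero and is a natural candidate for elimination.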

Thet [4] proposes a linguistic approach to aspect-based opinion mining, which is a clause- or sentence-level sentiment analysis for opinionated texts. For every sentence in a message post, it generates a syntactic dependency tree and splits the sentence into clauses. It then determines the contextual sentiment score for each clause using the grammatical dependencies of the words, together with prior sentiment scores for the words from SentiWordNet and from domain-specific lexicons.

Hussein [5] reviews previous work, with the goal of identifying the most significant challenges in sentiment analysis and exploring how to improve the accuracy of the techniques used.


Twitter API credentials are generated from the developer account after an application is created. The proposed system extracts data using Twitter’s Streaming API. The extracted tweets are loaded into HDFS with the help of Flume and converted from unstructured to structured format, with pre-processing done using MapReduce. The numbers of positive tweets, positive words, and negative words are then counted. The probability of each word is checked and used for classification: the word is positive if its probability is greater than 0.6, neutral if the probability is between 0.4 and 0.6, and negative if it is less than 0.4.
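The threshold rule can be sketched as below. The text does not say how the word probability is estimated, so positive_probability is an assumption: it takes the share of a word’s occurrences that fall in positive tweets. The 0.6 cutoff and the 0.4 to 0.6 neutral band come from the text; the lower cutoff of 0.4 for the negative class is inferred from that band.

```python
def positive_probability(pos_count, neg_count):
    """Assumed estimate: fraction of a word's occurrences seen in positive tweets."""
    total = pos_count + neg_count
    if total == 0:
        return 0.5  # unseen word: treated as neutral (an assumption)
    return pos_count / total


def classify(probability, low=0.4, high=0.6):
    """Map a probability to a class using the thresholds described above."""
    if probability > high:
        return "positive"
    if probability < low:  # lower cutoff inferred from the neutral band
        return "negative"
    return "neutral"


# Hypothetical counts of (positive, negative) occurrences per word.
for word, (pos, neg) in {"win": (8, 2), "vote": (1, 1), "scam": (1, 9)}.items():
    print(word, classify(positive_probability(pos, neg)))
```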


The project proceeds in three phases:

1. Extraction and processing of Streaming Data

In the first phase we create a Twitter developer account. In the developer account, an application has to be created, which generates the API credentials. These are required to extract Twitter’s live streaming data. This data is processed using Flume.

2. Classification of the processed data

In the second phase the structured data obtained from the first phase is classified into three categories: 1) Positive, 2) Neutral, 3) Negative.

If the words are mapped as positive then the tweet turns out to be positive. If the words are mapped as negative then the tweet turns out to be negative. If the words are neither mapped as positive nor negative then the tweet turns out to be neutral.

3. Visualization of the classified data

In the last phase the categorized data is visualized with the help of Python. It is shown in the form of a pie chart or a graph.
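The second and third phases can be sketched together as follows. The word lists are hypothetical, and treating a tie between positive and negative matches as Neutral is an assumption, since the text only covers tweets with no mapped words. The plot uses matplotlib, a third-party library that is imported only when plot_pie is called; the output file name is also an assumption.

```python
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"good", "great", "win", "support"}
NEGATIVE_WORDS = {"bad", "corrupt", "lose", "fail"}
CATEGORIES = ("Positive", "Neutral", "Negative")


def classify_tweet(text):
    """Classify a tweet by the words that map to each word list."""
    words = set(text.lower().split())
    positive_hits = len(words & POSITIVE_WORDS)
    negative_hits = len(words & NEGATIVE_WORDS)
    if positive_hits > negative_hits:
        return "Positive"
    if negative_hits > positive_hits:
        return "Negative"
    return "Neutral"  # no mapped words, or a tie (tie handling is an assumption)


def category_counts(tweets):
    """Count tweets per category, always reporting all three categories."""
    counts = Counter(classify_tweet(t) for t in tweets)
    return {label: counts.get(label, 0) for label in CATEGORIES}


def plot_pie(counts, path="sentiment.png"):
    """Save a pie chart of the category counts (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Tweet sentiment")
    fig.savefig(path)
    plt.close(fig)


counts = category_counts(["great win today", "they fail again", "meeting at noon"])
print(counts)
# plot_pie(counts)  # uncomment to write the chart to sentiment.png
```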

Prior Art Search

Naive Bayes is a probabilistic technique for building classifiers; Support Vector Machines are another technique commonly used for the same task. A further approach uses natural language processing techniques to determine topics, extract attributes of the topics, detect opinions about the attributes, and measure the sentiment value.


This analysis can be useful for maximizing profits in any field. Today, major business decisions are made using insights derived from data about the organization or its industry. As competition increases and customers are flooded with choices, it has become important to move faster in the market without sacrificing accuracy, and this kind of analysis can help a business grow by bringing both speed and accuracy to its decisions. It can also help in politics, for analysing issues and people’s needs, and it can be applied in technical fields as well.


B. Yadranjiaghdam, N. Pool, N. Tabrizi, “A Survey on Real-time Big Data Analytics: Applications and Tools,” in Proceedings of the International Conference on Computational Science and Computational Intelligence, 2016.

B. Yadranjiaghdam, S. Yasrobi, N. Tabrizi, “Developing a Real Time Data Analytics Framework for Twitter Streaming Data,” Department of Computer Science, East Carolina University, Greenville, NC. 2017 IEEE 6th International Congress on Big Data.

M. Trupthi, S. Pabboju, G. Narasimha, “Sentiment Analysis on Twitter Using Streaming API,” 2017 IEEE 7th International Advance Computing Conference.

Sentimental Analysis framework for Twitter Streaming. (2019, Nov 15). Retrieved from