Sentiment Analysis of Play Store App Reviews

Sentiment analysis is the process of classifying user opinions or sentiments expressed as text. The Google Play Store offers many applications and gives users the option to rate a particular application and to write their opinion of it in the form of reviews. Sentiment analysis of these user reviews is very useful for understanding users' opinions in detail, so that the overall opinion of the users can be determined accurately. Sentiment analysis of app reviews is more difficult than ordinary sentiment analysis because the reviews contain typing mistakes and many words from native languages other than English. The knowledge-based approach and the machine learning approach, such as natural language processing, are the two strategies for analyzing user reviews and producing output in the form of sentiments. In this paper, we analyze Play Store reviews of several apps from different categories, such as games, travel, entertainment and healthcare, using a natural language processing approach. By performing sentiment analysis on the extracted reviews, we generate automatic bug reports, send personalized notifications to the application developer, provide a detailed rating of the application, and classify the reviews according to the opinions they express.

Index Terms — Play store reviews, Sentiment Analysis,

Machine Learning Techniques, App reviews


Organizations today collect huge volumes of data of various kinds, such as sales data, transactional data, customer data, user activity data and other data resources. In recent years, the reduction in storage cost has created a great opportunity for organizations to understand their customers by collecting data. This has driven the adoption of recent technologies such as big data, data mining and analytics. End-to-end user data is stored in detail and analyzed for a better understanding of the user. Big data platforms can process data at this scale easily. Many platforms are available to extract knowledge from data using machine learning techniques, both supervised and unsupervised, and other data mining techniques that obtain useful information from the input data. Applications such as RStudio and several SaaS providers are available to analyze the data, and reporting tools such as Tableau are used to generate specific reports from the analyzed data. Methodologies such as NLP and supervised and unsupervised learning approaches are available for analysis. Sentiment analysis is being applied in several fields, including Twitter data, product reviews, movie reviews and other kinds of data obtained from users.

Google Play Store is an online platform, available as an application, where Android developers publish their applications to make them available to users. Android users download apps from this platform. The Play Store also provides users with several facilities: they can rate the applications they have downloaded and post their opinions in the form of reviews. It also provides options to report an app or flag it as inappropriate. In this paper, we propose a system that extracts user reviews from the Play Store, preprocesses them to eliminate unusable data, performs sentiment analysis on the preprocessed data to obtain the users' opinions, segments the reviews based on the ratings obtained from that analysis, and generates bug reports and reports of suggestions, which help the developer obtain the desired results easily.

Fig. 1. A review awarded a five-star rating with negative keywords


In this part, we briefly describe previous work by several scholars that is related to the proposed concepts.

Mir Riyanul Islam [1] proposed an approach in which all reviews are represented in rating form using an optimization-based method. The process also produces output for longer sentences, within certain limits. The system aims to reduce ambiguity in extracting and understanding user opinion. A probabilistic approach is used to obtain the polarity of the user reviews, and precision and recall values are calculated. Candidate expressions are generated, the numeric polarity of each review is extracted, and this rating is normalized to obtain an averaged value. The ratings extracted from the candidate expressions are recorded. This work falls short on accuracy, relies on an older method to obtain results, and cannot generate bug reports. It did not use enough data to extract accurate results, and it is limited to explaining polarity. The results are restricted to the category studied, so they vary from one category to another.

Neethu and Rajasree [2] presented sentence-level sentiment analysis of Twitter posts. They used several machine learning methods, such as Naive Bayes, SVM and ensemble methods, and found that all of the algorithms produced similar accuracy, so they concluded that any of these algorithms can be used with the feature vectors. They handled emoticons and used sentiment scores to obtain the values, and parts of speech were also considered for feature extraction. They did not explain how bugs reported in the dataset are handled, and their system does not provide a suggestions window.

Shivaprasad T K and Jyothi Shetty [3] surveyed the algorithms available for sentiment analysis, such as machine learning techniques and lexicon-based methods. They also followed a polarity-based approach, tested the data with several algorithms such as SVM, Naive Bayes and maximum entropy, compared the results of each, and reported the best one to obtain higher accuracy. They also explained the usefulness of sentiment analysis in several domains. However, the work does not explain the performance measures and lacks an implementation of the techniques described. The techniques hold good only for product reviews, and the corpus required for other domains is not provided.

Bo Pang, Lillian Lee and Shivakumar Vaithyanathan [4] applied machine learning to the mining of reviews, using Naive Bayes, support vector machines and maximum entropy classification in the movie review domain. Lists of positive and negative words are proposed, and the accuracy obtained with these words is reported. They also considered parts of speech and grammar in the analysis, resulting in higher accuracy. Unigrams and bigrams are formed from the data to compare the accuracy levels. Their work does not identify bugs or handle reviews that contain suggestions. They also noted that identifying the topics of sentences with respect to their features is left for future work.

Phong Minh Vu, Tam The Nguyen, Hung Viet Pham and Tung Thanh Nguyen [5] implemented a semi-automated analysis of app reviews using a keyword-based approach called MARK, which involves ranking the keywords found in the reviews. It saves developers time in understanding user reviews by mapping the context of the reviews to keywords. This paper does not handle the errors or bugs mentioned in the reviews, which complicates bug handling.

Wei Zhao, Ziyu Guan, Long Chen, Xiaofei He, Deng Cai, Beidou Wang and Quan Wang [6] used weakly-supervised deep embedding to analyze the sentiment of product reviews. They used deep learning concepts such as neural networks in their analysis. A novel deep learning framework is proposed for sentiment classification: it first learns a high-level representation, then classifies the sentences by labeling them and models the reviews according to their classification. Several types of networks, such as recursive neural networks (RNN) and convolutional neural networks, are used in the learning process and in the classification phase. Weakly-supervised Deep Embedding (WDE) and its instantiations using convolutional neural networks (WDE-CNN) and long short-term memory (WDE-LSTM) are proposed for text embedding; restricted Boltzmann machines (RBM) are used for large quantities of data, and the resulting performance measures are reported. The proposed system does not explain how reviews mentioning bugs are handled.


A dataset containing a mix of positive and negative reviews is used to train the system. We then pass the test data to the system, extract an accurate rating of the application by analyzing the sentiment of the reviews, and segregate the reviews containing bug reports and the reviews containing suggestions, which helps the developer understand user opinions easily.

A. System Architecture

B. Methodology

For this classification, we use bigrams as part of tokenization, which takes two words at a time. With bigrams we can handle phrases such as ‘doesn’t work’, ‘very bad’, ‘very good’ and ‘not good’. Here ‘not good’ is a phrase with a negative sense, but if we tokenize into unigrams, only the word ‘good’ is scored, which gives a positive score. With bigrams, ‘not good’ is treated as a single unit with a negative sense, so it adds a negative value to the total score of the review. This procedure yields a more accurate score for each review. First, the data is preprocessed, which involves removing punctuation, converting text to lowercase, stemming, lemmatization, and handling emoticons, user mentions, URLs and unnecessary spaces. A corpus is then created consisting of positive and negative words in the terminology of application reviews, together with general English words such as good, better, best, ok, bad, worst and terrible. A specific sentiment score is allocated to each type of word in this repository, and the words are differentiated according to their scores. Some words and their sentiment scores from the corpus table are given below.

Fig. 2. Table with some words and their sentiment scores
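The preprocessing and bigram scoring steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the lexicon entries and their scores, and the function names `preprocess` and `score_review`, are hypothetical and chosen only for illustration. Stemming, lemmatization and emoticon handling are left out to keep the example short.

```python
import re
import string
from typing import List

# Hypothetical lexicon: words and scores are illustrative assumptions,
# not the values from the corpus table of the paper.
UNIGRAM_SCORES = {"good": 1.0, "better": 1.5, "best": 2.0, "ok": 0.5,
                  "bad": -1.0, "worst": -2.0, "terrible": -2.0}
BIGRAM_SCORES = {("not", "good"): -1.0, ("very", "good"): 2.0,
                 ("very", "bad"): -2.0, ("doesnt", "work"): -1.5}


def preprocess(review: str) -> List[str]:
    """Lowercase, strip URLs, user mentions and punctuation, then tokenize."""
    text = review.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                   # remove user mentions
    text = text.replace("’", "'")                       # normalize apostrophes
    # Removing punctuation turns "doesn't" into "doesnt".
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()                                 # also collapses spaces


def score_review(review: str) -> float:
    """Sum lexicon scores, preferring a bigram match over a unigram match."""
    tokens = preprocess(review)
    total = 0.0
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in BIGRAM_SCORES:
            total += BIGRAM_SCORES[pair]
            i += 2  # both words are consumed by the bigram
        else:
            total += UNIGRAM_SCORES.get(tokens[i], 0.0)
            i += 1
    return total
```

With this toy lexicon, `score_review("Not good")` is negative because the bigram is matched before the unigram ‘good’ can contribute a positive value. Consuming both words of a matched bigram is one possible design choice; a sliding window over overlapping bigrams would be another.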

Every review in the preprocessed data is analyzed by splitting it into bigrams, and the total score of each review is computed for the entire dataset. The reviews are then classified into different categories based on their scores by passing them through trained classifiers such as SVM, Random Forest and Naive Bayes; in this paper we used the Random Forest classifier for better results. Reviews that indicate the presence of bugs in the application are identified by checking them against another corpus containing words such as ‘doesn’t work’, ‘not working’, ‘error’ and ‘bug’. Reviews containing these words are displayed in a separate window so that the developer can easily find the bug reports. Reviews offering suggestions to the developer are found in the same way, by checking for keywords from a corresponding corpus of suggestion words, and are displayed in their own window.
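The keyword-based separation of bug reports and suggestions can be sketched as follows. This is a simplified illustration under stated assumptions, not the system itself: the bug phrases are the examples named above, while the suggestion phrases and the function names `contains_phrase` and `route_reviews` are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Bug phrases are the examples given in the text; the suggestion phrases
# are hypothetical, since the suggestion corpus is not listed in the paper.
BUG_PHRASES = ("doesn't work", "not working", "error", "bug")
SUGGESTION_PHRASES = ("please add", "should", "would be nice", "suggest")


def contains_phrase(text: str, phrases: Iterable[str]) -> bool:
    """True if any phrase starts at a word boundary somewhere in the text."""
    return any(re.search(r"\b" + re.escape(p), text) for p in phrases)


def route_reviews(reviews: Iterable[str]) -> Dict[str, List[str]]:
    """Place each review in the 'bugs' and/or 'suggestions' window, else 'other'."""
    windows: Dict[str, List[str]] = {"bugs": [], "suggestions": [], "other": []}
    for review in reviews:
        text = review.lower().replace("’", "'")  # normalize case and apostrophes
        is_bug = contains_phrase(text, BUG_PHRASES)
        is_suggestion = contains_phrase(text, SUGGESTION_PHRASES)
        if is_bug:
            windows["bugs"].append(review)
        if is_suggestion:
            windows["suggestions"].append(review)
        if not (is_bug or is_suggestion):
            windows["other"].append(review)
    return windows
```

In this sketch a review such as "Login error, you should fix it" appears in both windows, since it reports a bug and also makes a request. Whether a review may belong to more than one window is an assumption here and is not specified in the text.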


In this paper, we presented a procedure to extract users' sentiments toward Android applications in the Play Store and to classify the reviews according to their scores using a classifier. Reviews mentioning bugs in the application and reviews suggesting modifications to the developer are separated from the raw data and displayed in their respective windows, which makes it easy for the developer to identify reviews expressing different opinions. To the best of our knowledge, earlier studies have obtained sentiment scores from reviews and classified them by score, but the handling of reviews that report bugs or make suggestions has not been considered. Most earlier work proposed approaches for classifying reviews only. Since reviews vary from one application to another in the Play Store, the efficiency of our proposed system holds for the data we considered, and results may vary from one application to another.


[1] Mir Riyanul Islam, "Numeric Rating of Apps on Google Play Store by Sentiment Analysis on User Reviews," in Proceedings of the International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), 2014. Available:

[2] Neethu M S and Rajasree R, "Sentiment Analysis in Twitter using Machine Learning Techniques," in Proceedings of the 4th ICCCNT 2013, July 4-6, 2013, Tiruchengode, India. Available:

[3] Shivaprasad T K and Jyothi Shetty, "Sentiment analysis of product reviews: a review," in Proceedings of the International Conference on Inventive Communication and Computational Technologies (ICICCT 2017). Available:

[4] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, July 2002, pp. 79-86. Available:

[5] Phong Minh Vu, Tam The Nguyen, Hung Viet Pham and Tung Thanh Nguyen, "Mining User Opinions in Mobile App Reviews: A Keyword-based Approach." Available:

[6] Wei Zhao, Ziyu Guan, Long Chen, Xiaofei He, Deng Cai, Beidou Wang and Quan Wang, "Weakly-supervised Deep Embedding for Product Review Sentiment Analysis," IEEE Transactions on




Cite this page

Sentiment Analysis of Play Store App Reviews. (2019, Nov 17). Retrieved from
