Sentiment analysis is the process of classifying user opinions or sentiments expressed in the form of text. The Google Play Store hosts many applications and gives users the option to rate an application and to write their opinion of it in the form of a review. Sentiment analysis of these user reviews is useful for understanding users' opinions in detail, so that their overall opinion can be determined accurately. Sentiment analysis of app reviews is more difficult than ordinary sentiment analysis because the reviews often contain words from native languages other than English as well as typing mistakes.
Knowledge-based approaches and machine learning approaches such as Natural Language Processing are the two strategies used to analyze user reviews and produce output in the form of sentiments.
In this paper, we analyze Play Store reviews of several apps from different categories, such as games, travel, entertainment and healthcare, using a Natural Language Processing approach.
By performing sentiment analysis on the extracted reviews, we generate automatic bug reports, send personalized notifications to the application developer, and provide a detailed rating of the application by classifying the reviews according to the opinions they express.

Index Terms: Play store reviews, Sentiment Analysis, Machine Learning Techniques, App reviews.

Today, organizations collect huge amounts of data of various kinds, such as sales data, transactional data, customer data, user activity data and other types of data resources.
In recent years, the reduction in storage cost has created a great opportunity for organizations to understand their customers by collecting data. This has driven the use of technologies such as big data, data mining and analytics. End-to-end user data is stored in detail and analyzed for a better understanding of the user. Big data platforms can process this volume of data easily. Many platforms are available to extract knowledge from the given data using machine learning techniques, including both supervised and unsupervised learning, and other data mining techniques. Applications such as RStudio and several SaaS providers are available to analyze the data, and reporting tools such as Tableau are used to generate specified reports from the analyzed data. Methodologies such as NLP and supervised and unsupervised learning approaches are available for analysis.
Sentiment analysis is used on several kinds of data, such as Twitter data, product reviews, movie reviews and other types of data obtained from users. The Google Play Store is an online platform, available in the form of an application, where Android application developers publish their applications to make them available to users. Android users download apps from this platform. The Google Play Store also provides several facilities to users: they can rate the applications they have downloaded and post their opinions in the form of reviews. It also lets users report an app or flag it as inappropriate.
In this paper, we propose a system that extracts user reviews from the Play Store, preprocesses them to eliminate unusable data, performs sentiment analysis on the preprocessed data to obtain the users' opinions, segments the reviews based on the ratings obtained from this analysis, and generates bug reports and reports containing suggestions, which help the developer obtain the desired results easily.

Figure: A review awarded a five-star rating with negative keywords.

In this section, we briefly describe previous work by several scholars that is related to the proposed concepts. Mir Riyanul Islam proposed an idea in which all reviews are represented in rating form using an optimization-based approach. This process also provides output for longer sentences under certain limits. He proposed the system to reduce ambiguity in extracting and understanding the user.
A probabilistic approach is used to obtain the polarity of the user reviews, and precision and recall values are calculated. Candidate expressions are generated, the numeric polarity of each review is extracted, and this rating is normalized to obtain an averaged value. The ratings extracted from the candidate expressions are recorded, and values are extracted from the input. This paper falls short on accuracy, uses an old method to obtain its results, and cannot generate bug reports. It did not use enough data to extract accurate results from the input. The paper is limited to explaining polarity only. Its results are restricted to the category studied, so they may vary from one category to another.
Neethu and Rajasree presented a paper on sentence-level sentiment analysis of Twitter reviews. They used several machine learning methods, such as Naive Bayes, SVM and ensemble methods, and found that all of the algorithms produce similar accuracy, so they concluded that any of these algorithms can be used with their feature vectors. They handled emoticons and used sentiment scores to obtain the values. Parts of speech are also considered for feature extraction. They did not explain how bugs reported in the dataset are handled, and their system provides no suggestions window. Shivaprasad T K and Jothy Shetty surveyed several algorithms available for performing sentiment analysis, such as machine learning techniques and lexicon-based methods.
They also followed a polarity-based approach and tested the given data with several algorithms, such as SVM, Naive Bayes and Maximum Entropy; the results of each algorithm are calculated and the best among them is chosen to obtain higher accuracy. They also explained the usefulness of sentiment analysis in several domains. They did not explain the performance measures and did not implement the techniques described. These techniques hold good only for product reviews, and the corpus required for other domains is not provided. Bo Pang, Lillian Lee and Shivakumar Vaithyanathan mined reviews using several machine learning algorithms. Naive Bayes, support vector machines and maximum entropy classification are applied in the movie review domain, where lists of positive and negative words are proposed and the accuracy obtained with the given words is reported.
They also considered parts of speech and grammar in the analysis, resulting in higher accuracy. Unigrams and bigrams are formed from the given data to verify the accuracy level, and the results are computed. They do not identify bugs or handle reviews containing suggestions. They also noted that identifying the topics of sentences with respect to their features is left for future work. Phong Minh Vu, Tam The Nguyen, Hung Viet Pham and Tung Thanh Nguyen implemented a semi-automated sentiment analysis of app reviews using a keyword-based approach called MARK, which ranks the keywords in the given reviews and saves developers time in understanding user reviews by mapping their context to the approach.
This paper does not handle the errors or bugs mentioned in the reviews, which complicates the handling of the bugs reported. Another proposed system used weakly-supervised deep embedding to analyze product reviews for sentiment. Its authors used deep learning concepts such as neural networks. A novel deep learning framework is proposed for sentiment classification, which first learns a high-level representation, then classifies the sentences by labeling them, and models the reviews according to their classification. Several types of networks, such as recursive neural networks (RNN) and convolutional neural network models, are used in the learning process and in the classification phase. Weakly-supervised Deep Embedding (WDE) and its instantiations using convolutional neural networks (WDE-CNN) and long short-term memory (WDE-LSTM) are proposed for text embedding; restricted Boltzmann machines (RBM) are used for large quantities of data, and the resulting performance measures are reported for the given data.
The above system does not explain how reviews that mention bugs in the dataset are handled.

III. PROPOSED SOLUTION

A dataset of mixed reviews, containing both positive and negative reviews, is used to train the system. We then pass the test data to the system, extract an accurate rating of the application by performing sentiment analysis on the reviews, and segregate the reviews containing bug reports from the reviews containing suggestions, which helps the developer understand user opinions easily.

System Architecture

B. Methodology

For this classification we use a bigram method as part of tokenization, which considers two words at a time. By using bigrams we can handle phrases such as 'doesn't work', 'very bad', 'very good' and 'not good'.
Here 'not good' is a phrase with a negative sense. If we tokenize into unigrams, only the word 'good' is matched, which gives a positive score. If we tokenize into bigrams, 'not good' is treated as a single unit with a negative sense, so it adds a negative value to the total score of the review. This procedure yields a more accurate score for each review. First the data is preprocessed, which involves removing punctuation, converting text to lowercase, stemming, lemmatization, and handling emoticons, user mentions, URLs and unnecessary spaces.
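The preprocessing and bigram tokenization described above can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the function names, the emoticon map and the regular expressions are assumptions, and stemming and lemmatization are omitted because they would require an external NLP library.

```python
from __future__ import annotations

import re
import string

# Hypothetical emoticon-to-word map; the emoticons actually handled are not listed in the paper.
EMOTICONS = {":)": "smile", ":-)": "smile", ":(": "sad", ":-(": "sad"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(review: str) -> list[str]:
    """Clean one raw review and return its lowercase word tokens."""
    text = review.lower()                                # case conversion
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)                    # drop user mentions
    for emoticon, word in EMOTICONS.items():             # replace emoticons with words
        text = text.replace(emoticon, f" {word} ")
    text = text.replace("'", "").replace("\u2019", "")   # "doesn't" becomes "doesnt"
    text = text.translate(_PUNCT_TO_SPACE)               # remove remaining punctuation
    return text.split()                                  # split() also collapses extra spaces


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Return every pair of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


print(bigrams(preprocess("Not good, doesn't work!")))
# [('not', 'good'), ('good', 'doesnt'), ('doesnt', 'work')]
```

Removing the apostrophe before stripping punctuation keeps a contraction such as "doesn't" as one token, so that it can still form the bigram "doesnt work".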
A corpus is created consisting of positive and negative words from the terminology of application reviews, together with general English words such as good, better, best, ok, bad, worst and terrible. A specific sentiment score is allocated to each type of word in this repository, and the words are differentiated according to their scores. Some words and their sentiment scores from the corpus are given in Fig. 2.

Fig. 2. Table with some words and their sentiment scores

Every review in the preprocessed data is analyzed by splitting it into bigrams, and the total score of each review is computed for the entire dataset. The reviews are then classified into different categories based on their scores by passing them through trained classifiers such as SVM, Random Forest and Naive Bayes. In this paper we have used the Random Forest classifier for better results.
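The scoring step can be illustrated with a small sketch that compares unigram and bigram scoring. The words are taken from the examples in this paper, but the numeric scores, the dictionary names and the rule that a known bigram overrides its component words are assumptions made for illustration; the classifier stage is not shown.

```python
# Hypothetical lexicon: the words appear in the paper, the scores are invented.
UNIGRAM_SCORES = {
    "good": 1.0, "better": 1.5, "best": 2.0, "ok": 0.5,
    "bad": -1.0, "worst": -2.0, "terrible": -2.0,
}
BIGRAM_SCORES = {
    ("not", "good"): -1.0, ("very", "good"): 2.0,
    ("very", "bad"): -2.0, ("doesnt", "work"): -2.0,
}


def score_unigrams(tokens):
    """Sum the scores of individual words, ignoring word order."""
    return sum(UNIGRAM_SCORES.get(token, 0.0) for token in tokens)


def score_bigrams(tokens):
    """Sum scores, letting a known bigram replace the scores of its two words."""
    total, i = 0.0, 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in BIGRAM_SCORES:
            total += BIGRAM_SCORES[pair]   # score the two words as one unit
            i += 2
        else:
            total += UNIGRAM_SCORES.get(tokens[i], 0.0)
            i += 1
    return total


tokens = "the app is not good".split()
print(score_unigrams(tokens))  # 1.0  ('good' counted as positive)
print(score_bigrams(tokens))   # -1.0 ('not good' counted as negative)
```

In a full pipeline, a score of this kind would be one of the inputs passed to the trained classifier rather than the final label.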
Reviews that indicate the presence of bugs in the application are identified by checking them against another corpus containing terms such as 'doesn't work', 'not working', 'error' and 'bug'. Reviews containing these terms are displayed in a separate window so that the developer can easily find the bug reports. Reviews that offer suggestions to the developer are found in the same way, by checking for keywords from a corpus of words denoting suggestions, and are displayed in their own window.
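A minimal sketch of this keyword-based routing is given below. The bug terms are the ones named in this paper; the suggestion terms, the function names and the whole-word matching with an optional plural "s" are assumptions, since the suggestion corpus and matching rule are not specified.

```python
import re

BUG_TERMS = ("doesn't work", "not working", "error", "bug")                   # from the paper
SUGGESTION_TERMS = ("please add", "should", "would be nice", "wish")          # hypothetical


def _compile(terms):
    """Build a regex matching any term as a whole word or phrase, with an optional plural 's'."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")s?\b")


_BUG_RE = _compile(BUG_TERMS)
_SUGGESTION_RE = _compile(SUGGESTION_TERMS)


def route_review(review):
    """Return the set of windows ('bug', 'suggestion') a review belongs to."""
    # Lowercase and normalize typographic apostrophes so "doesn’t" matches "doesn't".
    text = review.lower().replace("\u2019", "'")
    labels = set()
    if _BUG_RE.search(text):
        labels.add("bug")
    if _SUGGESTION_RE.search(text):
        labels.add("suggestion")
    return labels


for review in ("App doesn't work after update",
               "Please add a dark mode",
               "Login error, you should fix it"):
    print(review, "->", sorted(route_review(review)))
```

A review can land in both windows, and whole-word matching avoids false hits such as "debugging" matching "bug".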
In this paper, we presented a procedure to extract users' sentiments towards Android applications in the Play Store and to classify the reviews according to their scores using a classifier. Reviews that mention bugs in the application and reviews that suggest modifications to the developer are separated from the raw data and displayed in their respective windows, which makes it easy for the developer to identify reviews expressing different opinions. To the best of our knowledge, earlier studies have obtained sentiment scores from reviews and classified the reviews according to those scores, but none has considered handling reviews that report bugs or offer suggestions. Most have proposed approaches for the classification of reviews only. Since reviews vary from one application to another in the Play Store, our proposed system is efficient for the data we considered, and results may vary from one application to another.