Sentiment Analysis on Large Scale Amazon Product Reviews 2018-19

1 Introduction

1.1 Goals and Objectives

The project is based on sentiment analysis, an application area of machine learning. On shopping websites like Amazon, a single product can have thousands of reviews, many of which contradict one another. Reading every review is impractical, which defeats the purpose of writing them. Our aim is to develop an application that acts as a polarizer for the large volume of customer reviews available on the Amazon website, across a wide range of individual products.
Several existing models perform sentiment analysis of customer reviews and comments, but they report nothing beyond the polarity of the text. We wanted to see how reviews vary by geographical region and by age, which gives the customer a better overview of the product. Our application provides the overall percentage of positive and negative customer sentiment for a particular product, along with analytics based on geographical region and age.
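The kind of summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual code: the record layout (polarity, region, age group), the sample values, and the helper names `polarity_percentages` and `breakdown` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical records: (polarity, region, age_group). Values are invented
# purely for illustration and do not come from the report's dataset.
reviews = [
    ("positive", "North", "18-25"),
    ("negative", "North", "26-40"),
    ("positive", "South", "18-25"),
    ("positive", "South", "26-40"),
    ("negative", "South", "18-25"),
]


def polarity_percentages(labels):
    """Return the share of positive and negative labels as percentages."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        polarity: round(100.0 * counts[polarity] / total, 1)
        for polarity in ("positive", "negative")
    }


def breakdown(records, field_index):
    """Group records by one field (region or age group) and summarise each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field_index]].append(record[0])
    return {key: polarity_percentages(labels) for key, labels in groups.items()}


overall = polarity_percentages([record[0] for record in reviews])
by_region = breakdown(reviews, 1)
by_age = breakdown(reviews, 2)

print("Overall:", overall)
print("By region:", by_region)
print("By age group:", by_age)
```

In this sketch the polarity labels are taken as given; in the application they would come from the trained classifier.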
To achieve this, we used a range of machine learning models to understand what works best in this specific case. We compared the results and arrived at a model that provides good accuracy: MultinomialNB. We also developed a user interface in the form of a web application that gives the user relevant analytics for the selected product.

1.2 Overview of the technical area

1.2.1 Machine Learning

Machine learning is a subset of artificial intelligence (AI) in which a machine learns and expands its knowledge on its own, without being explicitly programmed.
Humans can train machines to learn from past data so that the machines can perform tasks that humans do, and perform them much faster. The process begins with supplying good-quality data and then training the computer by constructing machine learning models from that data using various algorithms. The choice of algorithm depends on the kind of task we are trying to automate and the type of data we have. In effect, a program is generated from examples of its inputs and outputs. Machine learning focuses on developing computer applications that receive quality data and use statistical analysis to predict an output, revising the predictions as new data becomes available. So machine learning is a lot more than just learning; it is also about understanding and reasoning.

Dept. of CSE, DSCE, Bangalore 78

In machine learning, more data generally yields a better model with higher accuracy. The machine learning approach to sentiment analysis needs two sets of documents: a training set and a test set. A supervised learning classifier uses the labelled training data to learn the differentiating features of the text, and its performance is then measured on the test set. Machine learning algorithms such as Naive Bayes (NB), Logistic Regression (LR), Maximum Entropy (ME), and Support Vector Machines (SVM) are commonly used for text classification. Machine learning for sentiment analysis starts with collecting a dataset of labelled data. This dataset may be noisy and should therefore be pre-processed using various Natural Language Processing (NLP) techniques.
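As a rough illustration of cleaning a noisy review and turning it into features, the sketch below lowercases the text, strips HTML-like tags, tokenizes, removes stop words, and counts the remaining words (a bag-of-words representation). It uses only the Python standard library as a stand-in for a full NLP toolkit; the stop-word list, the sample review, and the function names `preprocess` and `bag_of_words` are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this example;
# a real toolkit would supply a much fuller list).
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "i", "to", "of", "was"}


def preprocess(text):
    """Lowercase, drop HTML-like tags, tokenize into words, remove stop words."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    # Words made of letters, optionally with one internal apostrophe ("don't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [token for token in tokens if token not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each token occurs; the counts serve as features."""
    return Counter(tokens)


# Invented sample review with typical noise: mixed case, punctuation, a stray tag.
raw = "This phone is GREAT!!! <br/> The battery is great, and the camera is okay."
tokens = preprocess(raw)
features = bag_of_words(tokens)

print(tokens)
print(features)
```

Word counts like these are the kind of input a multinomial Naive Bayes classifier works with, which is one reason simple counting is a common starting point for text classification.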
Features relevant to sentiment analysis are then extracted, and finally the classifier is trained and then tested on unseen data. These algorithms are explained in detail in section 5.

1.2.2 Natural Language Processing

Natural language processing refers to the artificial intelligence methods for communicating with an intelligent system using natural language, i.e. human language. NLP is divided into two major components: Natural Language Understanding and Natural Language Generation. Understanding refers to mapping input given in natural language into a useful representation and analyzing aspects of the language, whereas generation is the process of producing meaningful phrases and sentences in natural language from some internal representation. The steps in NLP include tokenization, stemming, lemmatization, POS tagging, named entity recognition, and chunking.

One of the most important applications of NLP is sentiment analysis. Other applications include chatbots, speech recognition, keyword search, information extraction, and advertisement matching. NLP relies mainly on machine learning to learn rules automatically by analyzing a set of examples and reasoning from them. Python's NLTK (Natural Language Toolkit) library is heavily used for natural language processing and text analysis.

1.2.3 Sentiment Analysis

Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral. It is a form of opinion mining: it gauges the opinions that customers, users, and audiences express on social media and other online platforms about particular products, services, brands, and companies.
Sentiment analysis measures positive and negative language, and it helps a business see what customers like and dislike about it, its brand, and its company. It also helps a business investigate the prevailing feelings about a given subject and review customer feedback regularly.

1.2.4 Relationship between machine learning, NLP and sentiment analysis

Machine Learning (ML) and Natural Language Processing (NLP) are two subsets of AI that can be combined to help solve many data problems.

Figure 1: Relationship between machine learning, NLP and sentiment analysis
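To make the relationship concrete, the sketch below combines the three pieces: an NLP step (tokenization) turns text into words, an ML step (a multinomial Naive Bayes classifier with Laplace smoothing) learns from labelled examples, and the output is a sentiment label. This is a from-scratch illustration written for this section, not the MultinomialNB implementation used in the project; the class name `TinyMultinomialNB`, the smoothing value, and the four training reviews are assumptions invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """NLP step: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyMultinomialNB:
    """ML step: minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training and test sets, far too small for real use.
train_texts = [
    "great phone love the battery",
    "excellent camera and great screen",
    "terrible battery stopped working",
    "poor quality and bad screen",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = ["great camera", "bad battery"]
test_labels = ["positive", "negative"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
predictions = [model.predict(text) for text in test_texts]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)

print(predictions, accuracy)
```

With only four training reviews the accuracy figure means little; the point is the division of labour, with NLP preparing the text and ML assigning the sentiment.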