Abstract


This research project aims to discuss the concepts of the Big Data (BD) approach and frameworks as an efficient platform used in many applications. One such application is Recommendation Systems (RS). Many algorithms have been proposed to implement recommendation systems. They differ in their complexity and in how they are implemented. The following sections show how Big Data can be applied to RS and what advantages it offers compared with other algorithms.

Finally, there will be an analysis of the algorithm complexity.

Keywords: Big Data, Recommendation System, Hadoop, Collaborative Filtering, Content-Based Filtering.


Abstract

1. Introduction

2. Recommendation Systems (RS)

2.1. Historical background

2.2. Definition

2.3. Variations of the technology

2.4. RS description

2.5. Techniques for implementing RS

2.5.1 Content-based technique

2.5.2 Collaborative filtering

2.5.3 Hybrid filtering

3. Big Data (BD)

3.1. Historical background

3.2. Definition

3.3. BD Description

3.4. BD Applications

4. BD Design for implementing RS

4.1. RS formulation

4.2. Solution Design

5. Complexity Analysis

6. References

1. Introduction

The rise of e-commerce has changed the lifestyle of people around the world.

This became clear after Amazon launched its e-commerce website. eBay and other websites followed, increasing competition in the market to the great benefit of customers. To stay competitive, these companies began looking for technologies that would increase their number of customers and, as a consequence, their profits.

One such technology is the Recommender System: a system that acts as an advisor to customers, suggesting items they might want to buy based on their own purchase history and the purchase histories of other customers.

This research project discusses the concept of Recommendation Systems and the algorithms used to implement them. One such technique is the use of Big Data frameworks and their embedded algorithms.

2. Recommendation Systems (RS)

2.1. Historical background

Tapestry, developed at Xerox PARC in the early 1990s, is generally regarded as the first recommender system. It was designed to recommend documents from newsgroups, and it introduced the term “Collaborative Filtering” (CF): the approach of recommending content based on how other users have reacted to it.

Tapestry relied on manual input from its users. Soon afterwards, systems based on automated CF appeared, such as GroupLens, presented at the 1994 Computer Supported Cooperative Work (CSCW) conference [1]. In these systems, users rated articles and other published content, and the ratings were used automatically to generate recommendations for others. Within a few years, recommender systems had become a popular research area.

Other systems followed, including Ringo for music (1995) [2], the Bellcore video recommender (1995) [3] and Jester for jokes (2001) [4]. A major shift came with the launch of the online shopping website Amazon.com, which gave recommender systems a huge boost: Amazon invested considerable effort in building a high-quality RS. Nowadays, major online platforms such as Netflix and eBay use this technology and profit considerably from it.

2.2. Definition

Recommendation Systems are defined as “An information filtering technology, commonly used on e-commerce Web sites that uses a collaborative filtering to present information on items and products that are likely to be of interest to the reader.” [5]. Another definition says “A recommendation engine, also known as a recommender system, is software that analyzes available data to make suggestions for something that a website user might be interested in, such as a book, a video or a job, among other possibilities.” [6]

Both definitions agree on the essential point: delivering suggestions that match the customer's interests.

2.3. Variations of the technology

There are six major types of recommender systems, used primarily in the media and entertainment industry: collaborative, content-based, demographic-based, utility-based, knowledge-based and hybrid recommender systems [7].

This research focuses on collaborative recommender systems, which use the Collaborative Filtering technique.

2.4. RS description

Recommendation systems analyze inputs collected from different sources, such as reviews, comments and ratings, and then present the user with a list of recommendations that match his or her interests.

2.5. Techniques for implementing RS

A recommendation system can be implemented using the following techniques:

2.5.1 Content-based technique

The content-based technique focuses on analyzing the attributes of the item itself and builds its predictions from those attributes. Many algorithms can perform this filtering. It can use the Vector Space Model, such as Term Frequency-Inverse Document Frequency (TF-IDF), or probabilistic models such as the Naïve Bayes classifier [8], decision trees [9] or neural networks [10] to model the relationship between different documents within a corpus.
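To make the vector-space idea concrete, here is a minimal sketch of content-based scoring with TF-IDF and cosine similarity in plain Python. The item descriptions, the function names and the unsmoothed IDF formula (idf = log(N / df)) are illustrative assumptions, not part of the original design.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector for each item description."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of descriptions that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for item, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        vectors[item] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(liked: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Rank the other items by similarity to an item the user liked."""
    vectors = tf_idf_vectors(docs)
    scores = {
        item: cosine(vectors[liked], vec)
        for item, vec in vectors.items()
        if item != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical item descriptions, used only for illustration.
catalog = {
    "space_saga": "space battle starship galaxy adventure",
    "star_voyage": "starship galaxy exploration adventure",
    "love_letters": "romance letters drama love",
    "city_romance": "romance city love comedy",
}
print(recommend_similar("space_saga", catalog, k=1))  # ['star_voyage']
```

Only the item attributes are used here; no other user's behaviour enters the score, which is what separates this approach from collaborative filtering.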

2.5.2 Collaborative filtering

In contrast to content-based filtering, collaborative filtering is better suited to items that cannot easily be described by their attributes, such as movies and music. Its predictions are built by comparing a customer's preferences for such items with those of other customers and finding similarities between them. The approach can be applied with two families of techniques: memory-based techniques, such as user-based and item-based filtering, and model-based techniques, such as clustering and neural networks.
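As an illustration of the memory-based, user-based variant, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The sample ratings, the function names and the choice of plain cosine similarity over co-rated items are assumptions made for the example.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_sim(r: Ratings, u: str, v: str) -> float:
    """Cosine similarity between two users, over the items both have rated."""
    common = r[u].keys() & r[v].keys()
    if not common:
        return 0.0
    dot = sum(r[u][i] * r[v][i] for i in common)
    norm_u = math.sqrt(sum(r[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r[v][i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_user_based(r: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, items in r.items():
        if other == user or item not in items:
            continue  # skip the target user and users who did not rate the item
        sim = cosine_sim(r, user, other)
        if sim <= 0.0:
            continue  # ignore neighbours with no positive similarity
        num += sim * items[item]
        den += sim
    return num / den if den else None  # None: no usable neighbours


# Hypothetical ratings on a 1-5 scale.
ratings: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
print(predict_user_based(ratings, "alice", "m4"))
```

Alice's predicted rating for m4 lands closer to Bob's 5 than to Carol's 2, because Bob's earlier ratings match hers exactly while Carol's diverge. Cosine on raw ratings is the simplest choice; mean-centred (Pearson) similarity is a common refinement.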

2.5.3 Hybrid filtering

Hybrid filtering techniques combine the content-based and collaborative approaches in order to achieve better system performance and to overcome the limitations of pure content-based or collaborative systems.
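The text does not say how the two approaches are combined. One common option is a weighted hybrid, sketched below under the assumption that both scores are already on the same 0..1 scale; the function name and the weight alpha = 0.7 are arbitrary illustrative choices. The fallback branch shows one limitation a hybrid can overcome: collaborative filtering has nothing to say about a new item nobody has rated yet.

```python
from typing import Optional


def hybrid_score(cf: Optional[float], content: float, alpha: float = 0.7) -> float:
    """Blend a collaborative-filtering score with a content-based score.

    Both inputs are assumed to lie on the same scale (0..1 here).
    When CF has no prediction (e.g. a brand-new item with no ratings),
    fall back to the content-based score alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if cf is None:
        return content
    return alpha * cf + (1 - alpha) * content


print(hybrid_score(0.8, 0.4))   # weighted blend, about 0.68
print(hybrid_score(None, 0.4))  # cold-start item: content score only, 0.4
```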

3. Big Data (BD)

3.1. Historical background

The term “Big Data” was first used by Roger Mougalas of O’Reilly Media in 2005, the same year in which Yahoo launched its Hadoop framework [11].

Big data became more prominent with the appearance of social media platforms and Web 2.0, as well as the growth in the number of internet users and smartphones. The Internet of Things, which entered the market more recently, is expected to push research further toward big data concerns.

3.2. Definition

There are many definitions of big data. One says it “is an evolving term that describes a large volume of structured, semi-structured and unstructured data that has the potential to be mined for information and used in machine learning projects and other advanced analytics applications” [12]. Another defines it as “a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques” [13]. In general, the definitions agree that the data may be structured, semi-structured or unstructured, and that big data is characterized by its volume, velocity, variety and veracity.

3.3. BD Description

There is considerable debate about the scale at which data should be described as big. One common view is that once the data volume exceeds 1 TB, it should be described as big (see Figure 1 below).

Figure 1: classification of data [14]

An interesting article titled “Moving data to compute or compute to data? That is the Big Data question” [15], by Denny Lee, discusses the choice between scaling up and scaling out, and suggests that this decision indicates whether or not one is dealing with big data.

3.4. BD Applications

Big Data frameworks are used in a wide range of applications. They help large companies deal with a variety of problems, especially data storage and processing. Facebook, for example, must serve millions of connected users every day; Twitter and Instagram are other examples. Figure 2 shows different applications of the big data approach.

Figure 2: Applications of Big Data [16]

4. BD Design for implementing RS

4.1. RS formulation

For this project, a movie recommendation system is used as the example recommendation system. The task is to take an existing movie-rating training set, in this project MovieLens [17], and analyze it using a Collaborative Filtering algorithm together with a big data framework.

The problem is stated as follows: given data on the activity of a set of users, the system should provide personalized recommendations to users X, Y, Z, and so on. The formulation is as follows:

Users: i in {1,2,3,…,m}

Movies: j in {1,2,3,…,n}

When user i watches movie j, he or she enters a rating R_{ij}. The system should predict the ratings for the missing (i, j) pairs.
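The formulation can be made concrete with a small Python sketch, assuming ratings arrive as (user i, movie j, rating R_{ij}) triples as in MovieLens. The sample triples, m = n = 3 and the helper names are hypothetical. Only the observed pairs are stored, which matters at MovieLens scale because most (i, j) cells of the rating matrix are empty.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical (user i, movie j, rating R_ij) triples in the MovieLens spirit.
observed = [
    (1, 1, 5.0), (1, 2, 3.0),
    (2, 2, 4.0), (2, 3, 1.0),
    (3, 1, 2.0),
]


def build_rating_map(triples) -> Dict[Tuple[int, int], float]:
    """Sparse storage: map (i, j) -> R_ij only for pairs that were rated."""
    return {(i, j): r for i, j, r in triples}


def missing_pairs(
    ratings: Dict[Tuple[int, int], float], m: int, n: int
) -> List[Tuple[int, int]]:
    """All (i, j), with i in 1..m and j in 1..n, whose rating must be predicted."""
    return [
        (i, j)
        for i, j in product(range(1, m + 1), range(1, n + 1))
        if (i, j) not in ratings
    ]


R = build_rating_map(observed)
print(missing_pairs(R, m=3, n=3))  # [(1, 3), (2, 1), (3, 2), (3, 3)]
```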

4.2. Solution Design

The movie recommender system will be implemented using item-based Collaborative Filtering and Hadoop MapReduce, with MovieLens as the data set. The recommendation process is shown in Figure 3 below:

Figure 3: Solution Process Design
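As an illustration of how item-based CF can be split into MapReduce jobs, here is a minimal sketch that runs the map, shuffle and reduce phases in memory in plain Python. It is not Hadoop code: on a cluster, each run_job call would be a separate Hadoop MapReduce job, with the shuffle handled by the framework. The two-job split, the function names, the simplified comma-separated user,movie,rating line format (the actual MovieLens files also carry a timestamp) and the use of plain cosine similarity are all assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle/sort by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)        # shuffle: collect values per key
    results = []
    for key in sorted(groups):               # keys reach reducers in sorted order
        results.extend(reducer(key, groups[key]))
    return results


# Job 1: turn "user,movie,rating" lines into one rating vector per user.
def map_by_user(line):
    user, movie, rating = line.split(",")
    yield int(user), (int(movie), float(rating))


def reduce_user_vector(user, movie_ratings):
    yield user, sorted(movie_ratings)        # sorted by movie id


# Job 2: emit both ratings for every pair of movies the same user rated,
# then compute the cosine similarity of each movie pair.
def map_movie_pairs(user_vector):
    _, movie_ratings = user_vector
    for (a, ra), (b, rb) in combinations(movie_ratings, 2):
        yield (a, b), (ra, rb)               # a < b: sorted, one rating per movie


def reduce_cosine(pair, rating_pairs):
    dot = sum(ra * rb for ra, rb in rating_pairs)
    norm_a = math.sqrt(sum(ra * ra for ra, _ in rating_pairs))
    norm_b = math.sqrt(sum(rb * rb for _, rb in rating_pairs))
    yield pair, dot / (norm_a * norm_b)


def predict(user_vector, similarities, target):
    """Item-based CF: similarity-weighted average of the user's own ratings."""
    num = den = 0.0
    for movie, rating in user_vector:
        sim = similarities.get((min(movie, target), max(movie, target)), 0.0)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


# Hypothetical MovieLens-style sample (user, movie, rating).
lines = ["1,10,5", "1,20,4", "1,30,1", "2,10,4", "2,20,5", "3,20,2", "3,30,5"]
user_vectors = run_job(lines, map_by_user, reduce_user_vector)
similarities = dict(run_job(user_vectors, map_movie_pairs, reduce_cosine))
print(predict(dict(user_vectors)[2], similarities, target=30))
```

Even this toy run shows a known weakness of plain cosine: the pair (10, 30) is co-rated by a single user and therefore gets a similarity of 1.0. Practical systems often require a minimum number of co-raters or use adjusted (mean-centred) cosine for this reason.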

6. References

[1] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “GroupLens: An open architecture for collaborative filtering of netnews,” in Proc. ACM Conf. Computer Supported Cooperative Work (CSCW), pp. 175–186, 1994.

[2] U. Shardanand and P. Maes, “Social information filtering: Algorithms for automating ‘word of mouth’,” in Proc. ACM CHI ’95, pp. 210–217, ACM Press/Addison-Wesley Publishing Co., 1995.

[3] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, “Recommending and evaluating choices in a virtual community of use,” in Proc. ACM CHI ’95, pp. 194–201, ACM Press/Addison-Wesley Publishing Co., 1995.

[4] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, “Eigentaste: A constant time collaborative filtering algorithm,” Information Retrieval, vol. 4, no. 2, pp. 133–151, July 2001.

[5] “Recommender systems,” Webopedia. [Online]. Available: [Accessed: 26-Mar-2019].

[6] “Recommendation engine,” TechTarget. [Online]. Available: [Accessed: 26-Mar-2019].

[7] “Classifying different types of recommender systems,” bluepit. [Online]. Available: [Accessed: 26-Mar-2019].

[8] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Machine Learning, vol. 29, no. 2–3, pp. 131–163, 1997.

[9] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. John Wiley & Sons, 2012.

[10] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.

[11] “A short history of big data,” Datafloq. [Online]. Available: [Accessed: 26-Mar-2019].

[12] “Big data,” SearchDataManagement. [Online]. Available: [Accessed: 27-Mar-2019].

[13] “Big data,” Webopedia. [Online]. Available: [Accessed: 27-Mar-2019].

[14] “How much data is ‘Big Data’?,” Quora. [Online]. Available: [Accessed: 27-Mar-2019].

[15] “Moving data to compute or compute to data? That is the Big Data question,” dennyglee.

[16] “Application of Big Data,” GreyCampus. [Online]. Available: [Accessed: 27-Mar-2019].

[17] “MovieLens training set,” GroupLens. [Online]. Available: [Accessed: 31-Mar-2019].


AbstractThis research project aims to discuss the concepts of. (2019, Dec 19). Retrieved from http://paperap.com/abstractthis-research-project-aims-to-discuss-the-concepts-of-best-essay/
