In the previous sections of this article we discussed the mathematical model in depth. In this section we demonstrate the implementation of the SVDFeature algorithm, followed later by its evaluation. The tools used are recommenderlab, an open-source R package for recommender systems, and the RStudio environment, as presented in the following sections.
4.1 The infrastructure of recommenderlab
recommenderlab is a framework package for developing and testing recommendation algorithms. The algorithms it provides include item-based collaborative filtering, user-based collaborative filtering, association rule-based algorithms and SVD (singular value decomposition), the last of which is the focus of this work.
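The algorithms listed above are managed through recommenderlab's recommender registry. The sketch below lists the method names registered for real-valued and for binary rating data. The exact set depends on the installed package version, so the printed names are illustrative rather than definitive.

```r
library(recommenderlab)

# Methods registered for real-valued ratings (e.g. UBCF, IBCF, SVD)
real_entries <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
real_methods <- unname(vapply(real_entries, function(e) e$method, character(1)))

# Methods registered for binary (0/1) data; association rules ("AR") are here
bin_entries <- recommenderRegistry$get_entries(dataType = "binaryRatingMatrix")
bin_methods <- unname(vapply(bin_entries, function(e) e$method, character(1)))

print(sort(real_methods))
print(sort(bin_methods))
```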
The recommenderlab package also provides the basic infrastructure for developing a recommender system. Figure 4.1 below shows the processing workflow of our algorithm in recommenderlab.
Figure 4.1 High-level overview of recommenderlab
The dataset, which is freely available for download, is imported into an MSQL database. The data are organized into several tables and stored in a directory whose path is set in dfs.data.dir.
In recommenderlab, the class ratingMatrix defines the interface for user-item rating matrices, but it is virtual and does not implement anything itself. Two concrete implementations are currently available: realRatingMatrix for real-valued ratings and binaryRatingMatrix for binary (0/1) data. Sparse matrices from the Matrix package normally do not store zeros explicitly; realRatingMatrix uses these sparse matrices so that missing ratings, rather than zeros, are the entries that are not stored. recommenderlab can easily be extended in the future to other kinds of rating matrices with different storage schemes. Recommendation models are stored in objects of class Recommender: the creator takes the training data as a ratingMatrix object, together with a method name and optional parameters, and constructs the required recommender model.
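To make the storage model concrete, the following sketch builds a realRatingMatrix from a tiny hypothetical rating table (NA marks an unrated joke) and derives a binaryRatingMatrix from it. The values and the threshold minRating = 1 are arbitrary choices for illustration.

```r
library(recommenderlab)

# Tiny hypothetical rating table: NA marks a joke the user has not rated
m <- matrix(c( 5, NA, -2,
              NA,  8, NA,
               3, NA, -1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"), c("j1", "j2", "j3")))

r <- as(m, "realRatingMatrix")   # real-valued ratings in sparse storage
nratings(r)                      # only the 5 observed ratings are stored
class(r@data)                    # the underlying sparse matrix class from Matrix

b <- binarize(r, minRating = 1)  # binaryRatingMatrix: rating >= 1 becomes 1
as(b, "matrix")
```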
The predict() function takes a Recommender object and newdata, the rating data of the active users, and returns either predicted ratings or top-N lists. For top-N lists, n is the maximum number of recommended items in each list, and predict() returns an object of class topNList containing one list per active user; for predicted ratings it returns a realRatingMatrix.
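Both output types of predict() can be sketched with the built-in user-based CF method on the Jester5k sample that ships with recommenderlab. The 1000-user training subset and the choice of UBCF are arbitrary and only keep the example fast; they are not the SVDFeature setup.

```r
library(recommenderlab)
data("Jester5k")

# Train on the first 1000 users (an arbitrary subset chosen for speed)
rec <- Recommender(Jester5k[1:1000], method = "UBCF")

# Active users: three users that were not part of the training data
active <- Jester5k[1001:1003]

top5 <- predict(rec, active, n = 5)              # default type: "topNList"
as(top5, "list")                                 # at most 5 joke IDs per user

pred <- predict(rec, active, type = "ratings")   # predicted ratings
as(pred, "matrix")[, 1:5]
```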
For evaluating recommender algorithms, the recommenderlab package provides the infrastructure to create and maintain evaluation schemes, stored as objects of class evaluationScheme, from rating data. An evaluation scheme is created from a data set using a method with item withholding: for each test user, some ratings are given to the recommender and the rest are withheld for scoring. The evaluate() function is then used to evaluate different recommender algorithms on the scheme. Using the trained model, the prediction function can create recommendations for unknown data, that is, for ratings that were withheld from the recommender.
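A minimal sketch of this workflow, assuming a 500-user subset of Jester5k, 10 given ratings per test user and goodRating = 5 (all three values are arbitrary choices for illustration). Several registered algorithms are passed to evaluate() as a named list.

```r
library(recommenderlab)
data("Jester5k")
set.seed(42)

# 90% of users train; for each test user 10 ratings are given, the rest withheld
scheme <- evaluationScheme(Jester5k[1:500], method = "split", train = 0.9,
                           given = 10, goodRating = 5)

algorithms <- list(
  "random"        = list(name = "RANDOM",  param = NULL),
  "popular"       = list(name = "POPULAR", param = NULL),
  "user-based CF" = list(name = "UBCF",    param = NULL)
)

# Evaluate top-N lists of length 1, 3 and 5 for every algorithm
results <- evaluate(scheme, algorithms, type = "topNList", n = c(1, 3, 5))
names(results)
avg(results[["popular"]])   # averaged confusion-matrix metrics per list length
```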
4.2 Data analysis and pre-processing
The dataset, which is available for free download from onlinejester.com, contains user ratings of a set of jokes on a scale from -10 to 10 and is loaded into R. These ratings are used as the input for predicting ratings of jokes that a user has not yet seen or rated.
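A short pre-processing check can confirm the rating scale before modelling. The sketch below uses the Jester5k sample bundled with recommenderlab as a stand-in for the downloaded file; user-mean centering is shown only as an optional step, not necessarily the one used in this work.

```r
library(recommenderlab)
data("Jester5k")

ratings <- getRatings(Jester5k)   # vector of all observed ratings
range(ratings)                    # should lie within the -10 to 10 scale
summary(rowCounts(Jester5k))      # number of rated jokes per user

# Optional pre-processing: center each user's ratings on that user's mean
centered <- normalize(Jester5k, method = "center")
```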
4.2.1 Architecture of the proposed system
The figure below shows the system architecture: how the system is implemented and how it achieves the proposed functionality. The data are read from the input files and tabulated as an item-user rating table. A feature similarity analysis is then performed on the data by applying a suitable model, here SVDF (singular value decomposition with item features). The similarity analysis helps identify neighbors and generate predictions. The effectiveness of each method is measured by its error rate on the test data: the actual data are split into 90% for learning and 10% for testing, the root mean square error is calculated for each user-item pair with a rating greater than zero, and the mean of these error rates is reported.
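The 90%/10% split and the RMSE computation can be sketched with recommenderlab's evaluation tools. As an assumption, the built-in "SVD" method stands in for SVDFeature, only the first 1000 users are used for speed, and given = 15 and goodRating = 5 are arbitrary (goodRating only affects top-N evaluation). The sketch does not reproduce the "rating greater than zero" filter.

```r
library(recommenderlab)
data("Jester5k")
set.seed(1)

# 90% of users for learning, 10% for testing; 15 ratings per test user are
# revealed ("known") and the remaining ones ("unknown") are used for scoring
scheme <- evaluationScheme(Jester5k[1:1000], method = "split", train = 0.9,
                           given = 15, goodRating = 5)

# Stand-in model: the built-in "SVD" method, not SVDFeature itself
rec  <- Recommender(getData(scheme, "train"), method = "SVD")
pred <- predict(rec, getData(scheme, "known"), type = "ratings")

# Error on the withheld ratings: a named vector with RMSE, MSE and MAE
acc <- calcPredictionAccuracy(pred, getData(scheme, "unknown"))
acc
```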
4.2.2 Tabulating the user-Item rating data
A sample of the data is available with the recommenderlab package. The task of a recommendation algorithm is to suggest to a user products or items that the user has not seen before, and to analyze the predicted ratings in order to improve accuracy for new users. Let us take a look at the data:
> library(recommenderlab, quietly = TRUE)
> data("Jester5k")
> str(Jester5k)
Formal class 'realRatingMatrix' [package "recommenderlab"] with 2 slots
  ..@ data     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:362106] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ p       : int [1:101] 0 3314 6962 10300 13442 18440 22513 27512 32512 35685 ...
  .. .. ..@ Dim     : int [1:2] 5000 100
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:5000] "u2841" "u15547" "u15221" "u15573" ...
  .. .. .. ..$ : chr [1:100] "j1" "j2" "j3" "j4" ...
  .. .. ..@ x       : num [1:362106] 7.91 -3.2 -1.7 -7.38 0.1 0.83 2.91 -2.77 -3.35 -1.99 ...
  .. .. ..@ factors : list()
  ..@ normalize: NULL
> head(Jester5k@data[1:5,1:5])
5 x 5 sparse Matrix of class "dgCMatrix"
The user-item rating database is used for making observations: each user has a set of item-rating pairs, which can be represented as a user-item table whose entry $R_{ij}$ is the rating given by the $i$th user to the $j$th item, as shown in Table 4.1.
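Rating data exported from a database table usually arrive as (user, item, rating) records. The sketch below, using hypothetical records, coerces such triples into a realRatingMatrix and then shows the dense user-item table in which entry [i, j] is $R_{ij}$ (NA where user i has not rated item j).

```r
library(recommenderlab)

# Hypothetical long-format records, e.g. as read from a database table
records <- data.frame(
  user   = factor(c("u1", "u1", "u2", "u3", "u3")),
  item   = factor(c("j1", "j3", "j2", "j1", "j2")),
  rating = c(7.5, -2.1, 4.0, -8.3, 1.2)
)

# Coerce (user, item, rating) triples into a sparse user-item rating matrix
R <- as(records, "realRatingMatrix")

# Dense view of the user-item table: entry [i, j] is R_ij, NA if unrated
as(R, "matrix")
```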
After the data are converted, recommenderlab offers several ways to split them. First, the data can be split into a training set and a test set (and optionally a validation set) according to a given ratio. Second, one sample can be left out as the validation set. Third, several (N) samples can be left out as the validation set. Fourth, k-fold cross-validation can be used. These methods can be applied to split the data on users or on items. In our model, the data are split into training and test sets with an 80/20 ratio; the recommender is trained on the training set and then predicts and evaluates the results on the test set.
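The four splitting options can be sketched with evaluationScheme(). We read "leaving one sample" and "leaving N samples" as recommenderlab's all-but-1 protocol (a negative given) and its given-N protocol; this mapping, the 500-user subset and goodRating = 5 are our assumptions.

```r
library(recommenderlab)
data("Jester5k")
set.seed(7)
x <- Jester5k[1:500]

# 1) Ratio split on users: 80% training, 20% test (the ratio used in the text)
split_80  <- evaluationScheme(x, method = "split", train = 0.8,
                              given = 10, goodRating = 5)

# 2) All-but-one: a negative 'given' withholds that many ratings per test user
all_but_1 <- evaluationScheme(x, method = "split", train = 0.8,
                              given = -1, goodRating = 5)

# 3) Given-N: only N ratings of each test user are revealed to the recommender
given_20  <- evaluationScheme(x, method = "split", train = 0.8,
                              given = 20, goodRating = 5)

# 4) k-fold cross-validation over users
cv_4      <- evaluationScheme(x, method = "cross-validation", k = 4,
                              given = 10, goodRating = 5)

nrow(getData(split_80, "train"))   # number of training users
```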
4.2.3 Steps for implementing a new recommender algorithm in recommenderlab
Implementing one's own algorithm in recommenderlab is straightforward because the package uses a registry mechanism to manage algorithms. The user needs to implement a creator function that takes a training data set, trains a model and provides a prediction function. To generate recommendations for new users, the prediction function and the model are both encapsulated in an object of class Recommender. An algorithm is implemented in the following steps:
1. Creator function: its main task is to build the model from the training data so that it can be applied to new data.
2. Prediction function: it takes the model, the new data and the number N of items in the desired top-N list, chosen from all items in the rating matrix.
3. The prediction function uses the model to compute recommendations for each user in the new data and encodes them as an object of class topNList.
4. Finally, the trained model and the prediction function are returned as an object of class Recommender.
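The four steps above can be sketched with a deliberately trivial, hypothetical recommender, ITEM_MEAN (not SVDFeature), whose model is just each item's mean training rating. The method name and helper logic are ours; the prediction-function signature follows the pattern of recommenderlab's built-in methods as we understand it and may need adjusting for other package versions.

```r
library(recommenderlab)

# Hypothetical creator function: the "model" is the mean rating of every item
ITEM_MEAN <- function(data, parameter = NULL) {
  # Step 1: learn the model from the training realRatingMatrix
  train <- as(data, "matrix")                     # NA = not rated
  model <- list(item_means = colMeans(train, na.rm = TRUE))

  # Step 2: prediction function (model, new data, list length n, output type)
  predict <- function(model, newdata, n = 10,
                      type = c("topNList", "ratings", "ratingMatrix"), ...) {
    type  <- match.arg(type)
    known <- as(newdata, "matrix")
    # Every active user receives the item means as predicted ratings
    pred <- matrix(model$item_means, nrow = nrow(known), ncol = ncol(known),
                   byrow = TRUE, dimnames = dimnames(known))
    # Drop items the user has already rated (kept only for "ratingMatrix")
    if (type != "ratingMatrix") pred[!is.na(known)] <- NA
    ratings <- as(pred, "realRatingMatrix")
    if (type != "topNList") return(ratings)
    # Step 3: encode the n best unrated items per user as a topNList
    getTopNLists(ratings, n = n)
  }

  # Step 4: return the model and prediction function as a Recommender object
  new("Recommender", method = "ITEM_MEAN", dataType = class(data),
      ntrain = nrow(data), model = model, predict = predict)
}

# Register the creator so that Recommender(..., method = "ITEM_MEAN") finds it
recommenderRegistry$set_entry(
  method = "ITEM_MEAN", dataType = "realRatingMatrix", fun = ITEM_MEAN,
  description = "Hypothetical example: recommend items with the highest mean rating."
)

data("Jester5k")
rec <- Recommender(Jester5k[1:1000], method = "ITEM_MEAN")
top <- predict(rec, Jester5k[1001:1002], n = 5)
as(top, "list")
```

Registering through the registry is what lets the new method be passed by name to Recommender() and evaluate() alongside the built-in algorithms.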
4.3 Implementation of the SVDFeature
When using recommenderlab to implement a new algorithm, we only need to focus on the logic of the new algorithm and can pay less attention to the preliminary calculations. For instance, during our implementation we did not have to worry about the mathematical calculations involved in mapping user and item features, which is one of the most challenging parts of this model. Figure 4.3 below shows the process structure of our implementation.
Given the sparse training dataset, which is itself a sparse matrix, we construct a dense matrix with users as rows and items as columns. Initialization is done by assigning small initial values to the model parameters, including ?? and ??, ??, ??, ?? and ??. The factorization matrices ???? and ???? are initialized with separate latent-factor vectors for each user and each item. The initial values of the latent factors are usually undefined, so to avoid the cold-start problem we initialize the first components ??0,0 and ??0,0 of the factorization vectors for the first user (? = 0) and the first item (? = 0) with two arbitrary values, 0.5 and 0.025, respectively. After initializing the baseline predictors and factorization vectors, we also calculate the average rating over the entire domain of jokes, using only those entries for which a rating already exists in the rating matrix.
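A minimal base-R sketch of this initialization, under stated assumptions: the factor matrices are named P (users) and Q (items) here because the original symbols are not legible, the number of latent factors k = 2 and the random-initialization scale are arbitrary, and R's 1-based index [1, 1] stands for the text's index 0.

```r
# Toy dense rating matrix: rows = users, columns = items, NA = no rating
R <- matrix(c( 2.5,  NA, -4.0,
                NA, 7.1,  0.5,
              -1.0, 3.3,   NA),
            nrow = 3, byrow = TRUE)

k <- 2            # number of latent factors (assumed)
set.seed(123)

# Dense factor matrices with small random values:
# one row per user (P) and one row per item (Q)
P <- matrix(rnorm(nrow(R) * k, sd = 0.01), nrow = nrow(R))
Q <- matrix(rnorm(ncol(R) * k, sd = 0.01), nrow = ncol(R))

# First component of the first user's and first item's vectors, set to the
# two fixed values quoted in the text (index 0 in the text is [1, 1] in R)
P[1, 1] <- 0.5
Q[1, 1] <- 0.025

# Baseline predictors (user and item biases) start at zero
b_user <- numeric(nrow(R))
b_item <- numeric(ncol(R))

# Global average computed only over entries where a rating exists
mu <- mean(R[!is.na(R)])
mu
```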