ER-SCoR: An Equal Ratings Impact-Based Recommender System Using Synthetic Coordinates

Costas Panagiotakis 1,3 ✉ Email: cpanag@hmu.gr

Harris Papadakis 2 ✉ Email: adanar@hmu.gr

Paraskevi Fragopoulou 2,3 Email: fragopou@ics.forth.gr

1 Department of Management Science and Technology, Hellenic Mediterranean University, Lakonia, 72100 Agios Nikolaos, Crete, Greece

2 Department of Electrical and Computer Engineering, Hellenic Mediterranean University, Estavromenos, 71410 Heraklion, Crete, Greece

3 Institute of Computer Science, Foundation for Research and Technology-Hellas, Vassilika Vouton, 70013 Heraklion, Crete, Greece

Abstract

In this article, we introduce ER-SCoR, an Equal Ratings impact-based Recommender System built upon Synthetic Coordinates, which is shown to outperform state-of-the-art techniques as well as the original Synthetic Coordinates based Recommender System (SCoR). SCoR assigns a set of synthetic coordinates to every node (both users and items) such that the distance between a user and an item yields an accurate prediction of the user’s preference for that item. ER-SCoR enhances this model by (i) enforcing equal contributions from all ratings during coordinate updates, and (ii) incorporating three additional terms into the recommendation process: a global system belief, a user-specific belief, and an item-specific belief. These modifications constitute fundamental changes in the core system architecture and improve convergence speed, accuracy, and stability. ER-SCoR preserves the advantages of SCoR, such as parameter-free configuration, robustness to cold-start problems, and linear computational complexity, while achieving faster convergence and improved predictive performance. Extensive experiments across five real-world datasets demonstrate that ER-SCoR consistently yields lower RMSE than existing approaches and provides meaningful dataset annotations, including the identification of outliers, users with similar preferences, and items that receive similar user ratings.

Keywords

Recommender System

Matrix Factorization

Synthetic Coordinates

Personalized Recommendations

MSC Classification

68T20

68W50

Introduction

Recommender Systems (RS) collect user preference data, either through explicit ratings or by monitoring user behavior across various sources, to generate personalized predictions and item suggestions \cite{bod13, score, he2017neural, dtec}. Increasing research interest in the field has led to a wide range of proposed approaches that leverage a diverse spectrum of techniques \cite{xie2016user}. The core task of an RS is to predict missing ratings for user-item pairs. Formally, given a set of users, a set of items (e.g., movies, products, songs), and a set of known user-item ratings

$R$

, the system aims to estimate ratings for pairs not included in

$R$

. In a recommender system, users typically create accounts and rate items, allowing the system to learn their preferences. As the number of users and ratings grows, prediction accuracy generally improves. However, when data are sparse or no information is available about new users (the sparsity and cold-start problems, respectively), prediction quality often degrades. Even for ratings that are known a priori, model-based approaches often exhibit prediction errors, as they cannot fully capture the underlying rating patterns. These inaccuracies stem from the system’s degrees of freedom and the dimensionality reduction required to generalize learned knowledge into predictions \cite{dtec}.

In our previous work, a Synthetic Coordinates based Recommender System (SCoR) was described \cite{score}. This approach assigns synthetic coordinates to users and items so that the proximity between a user and an item accurately forecasts the user's preference for that item. SCoR updates the synthetic coordinates of nodes by randomly traversing the list of nodes.

In this work, we propose ER-SCoR, a Synthetic Coordinates based recommender system built on the Equal Ratings Impact principle, which improves the results of SCoR and converges faster. ER-SCoR updates the synthetic coordinates of nodes more efficiently than SCoR: it ensures that all ratings contribute equally to the updating process by randomly selecting a (user, item) pair from the ratings' space instead of traversing the list of nodes. Additionally, it introduces three extra terms in the recommendation of a rating for a user-item pair, taking into account the estimated average belief (rating) of the whole recommender system, of the user, and of the item. ER-SCoR retains the benefits of SCoR: it is parameter-free, it is resistant to the cold-start problem, it annotates the dataset by identifying users and items with unique and/or common characteristics and by spotting outlier nodes, and it has the same linear computational cost. The main contribution of this work is the significant improvement of SCoR performance and the study of the Synthetic Coordinates mechanism on the RS problem.

This paper is structured as follows: Sect. 2 reviews the related work for RS. The problem formulation is given in Sect. 3. Sect. 4 presents the proposed ER-SCoR framework. Sect. 5 describes the annotations of a dataset based on the proposed system. Sections 6 and 13 present the experimental results and our conclusions, respectively.

Related Work

As mentioned before, several techniques and approaches have been presented in recent literature, regarding the problem of recommendation. In this Section, we provide a brief overview of the most popular ones. According to the literature consensus, one can divide the various approaches of Recommender Systems into two main categories, namely, {\em Collaborative Filtering} and {\em Content-based}. Collaborative Filtering based approaches generally rely solely on the preferences (e.g., ratings) of users for items, in order to provide the necessary recommendation predictions. In contrast, Content-based Recommender Systems employ additional metadata information, such as attributes/features of both users and items (e.g., music genre, content type, demographic information, etc.) \cite{dtec}.

Collaborative Filtering (CF) approaches \cite{pap22,UI2vec,elahi2016survey} analyze collective user behavior to infer each user's preferences and thereby make new predictions. Usually, a number on some preference scale is used to indicate the degree of preference. Although such approaches usually suffer from cold-start and data sparsity problems, they benefit from using pre-existing information, which can be provided either implicitly (as users access items) or explicitly (when users evaluate items).

CF Recommender System approaches are usually divided into two major categories, namely, memory-based and model-based \cite{pap22}. In the first type, the required predictions can be calculated by correlating information, usually by employing a similarity function. This can be done in one of the following ways:

User-to-user: Recommendations rely on similarities between users, often based on their preferences or demographic information.

User-to-item: Recommendations are produced by analyzing the preferences of a user for specific items.

One of the primary techniques used in such systems is collaborative filtering based on memory (or similarity-based) \cite{ado12}. These algorithms utilize similarity functions to measure the degree of similarity between pairs of users or items, based on historical preferences. Clustering techniques have also been applied in Recommender Systems, either directly \cite{tsai12} or as a pre-processing step \cite{nil18}. For instance, clusters of similar users or items can improve Collaborative Filtering methods by narrowing down the search space to the most relevant candidates.

Model-based Collaborative Filtering (CF) Recommender Systems (RSs) employ various techniques to construct a predictive model that is subsequently used to generate recommendations. These types of approaches usually employ Dimensionality Reduction \cite{pca}, where latent variables are introduced to capture hidden structures underlying user–item interactions. In \cite{UI2vec}, the authors introduce UI2vec, a collaborative filtering model that jointly embeds users and items into a shared latent space using word-embedding techniques, and its enhanced version VUI2vec, which models users and items as Gaussian distributions via variational inference.

In Dimensionality Reduction, each user or item is typically represented as a high-dimensional vector, containing all ratings corresponding to that user or item. However, due to the inherent sparsity of these vectors, since most users rate only a small subset of available items, it becomes challenging to directly identify meaningful correlations between users and items. To address this, Dimensionality Reduction techniques are applied to uncover latent patterns and reduce the complexity of the data. Popular methods include Singular Value Decomposition (SVD) \cite{svd}, Principal Component Analysis (PCA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA) \cite{plsa}. These approaches transform the original high-dimensional space into a lower-dimensional latent space, where the underlying relationships become more apparent.

The Matrix Factorization method \cite{kor09,liu2023recommendation} is also a Dimensionality Reduction technique. Both users and items are represented as vectors in a shared latent space, with latent factors inferred from observed rating patterns. Recommendations are generated by identifying items whose latent factors exhibit high similarity to those of a given user.

SCoR \cite{score} utilizes synthetic coordinates, which are assigned to all nodes (users and items), as proposed in \cite{kor09}, but uses the Euclidean distance between a user and an item as opposed to the dot product. Once the system has converged, this distance in the latent space serves as an accurate predictor of the user’s preference for that item. The SCoR framework offers several advantages. It achieves high performance without the need for parameter tuning, and it demonstrates greater robustness to data sparsity compared to alternative approaches. The Vivaldi synthetic network coordinates algorithm \cite{Vivaldi}, on which this approach to the RS problem is based, has proven useful in problems beyond movie recommendation \cite{score,dtec}, such as the identification of malicious profiles in Recommender Systems \cite{pan18,pan20a}, personalized video summarization \cite{pan20}, community detection \cite{pap14}, and interactive image segmentation \cite{pan13}, resulting in a significant increase in performance in comparison with other state-of-the-art methods on publicly available datasets.

In recent years, various model-based recommendation approaches that employ artificial neural network architectures have also been proposed \cite{he2017neural,gao2023survey}. Such approaches utilize Convolutional Neural Networks (CNNs) \cite{he18} in order to process the output of previous steps (such as the outer product of user to item ratings) to generate a 2D interaction map. This methodology enables the model to effectively capture user–item interaction patterns and to learn higher-order correlations.

Recent developments in graph neural networks (GNNs) have utilized embedding propagation to iteratively combine neighborhood embeddings. By stacking propagation layers, nodes can access the information of high-order neighbors, outperforming standard methods constrained to first-order neighbors \cite{gao2023survey}.

In \cite{Enriched}, the authors combine an auto-encoder with an enriched matrix concept that adds opposing evaluations of fictional users to those of real users. This led to an increase in the density of the rating matrix, which now incorporates users with more diverse interests and preferences. The work described in \cite{he2017neural} investigates several neural network architectures in the context of collaborative filtering. A general framework is introduced with three distinct implementations: GMF, MLP, and NeuMF, each offering a unique approach to modeling user-item interactions. This work represents a new direction of using deep learning for recommendation, by complementing mainstream shallow models for collaborative filtering.

As mentioned above, Content-Based Recommender Systems, on the other hand, employ additional metadata information to construct item representations and user profiles, on which recommendation predictions are based \cite{pap23}. The recommendation process essentially consists of locating items whose features match the user profile attributes \cite{Pasq11}. While content-based recommender systems mainly employ textual features to describe the required information of items and user profiles, several hybrid methods have been proposed that employ various information types and/or approaches with the goal of increasing recommendation performance \cite{log19}.

More recently, the development of Large Language Models (LLMs), such as ChatGPT, DeepSeek and LLaMA, has transformed the domains of Natural Language Processing (NLP) and Artificial Intelligence (AI), allowing for exceptional capabilities in language understanding and generation, along with impressive reasoning and generalization abilities. As a result, recent research has focused on leveraging the power of LLMs to improve RS \cite{zhao2024recommender}. One notable example is Chat-Rec \cite{gao2023chat}, which improves both the accuracy and the explainability of recommendations by integrating ChatGPT into conversational interactions with users. In this approach, ChatGPT refines the candidate item sets originally generated by traditional recommender algorithms, as demonstrated in the context of movie recommendations. Zhang et al. \cite{zhang2023recommendation} employ T5 as an LLM-based approach, allowing users to specify their explicit preferences in natural language, leading to better recommendation accuracy than approaches based merely on user–item interactions.

Problem Formulation

The recommendation prediction problem \cite{score} is formulated as follows. The input of the problem is a list $L$ of triplets of the form $(u, i, r(u,i))$, where:

$u$ belongs to the set $U$, which consists of distinct identifiers for the users who have rated items.

$i$ belongs to the set $I = \{1, 2, ..., M\}$, which consists of distinct identifiers for items that users have rated.

$r(u,i)\in \mathbb{R}$ denotes the rating of user $u$ for item $i$.

The main objective is to estimate the unknown ratings that match user $u$'s preference for item $i$ in situations where user $u$ has not provided a rating for $i$, that is, $(u, i, r(u,i)) \notin L$.

In RS research, the most common method to evaluate the accuracy of the predictions provided is to split the original list

$L$

into two sets. The first set, referred to as the Training Set (

$TS$

), contains the “known” user-item ratings that are used to train the recommendation algorithm. The remaining triplets form the Validation Set (

$VS$

), which is used to evaluate the accuracy of the recommendation algorithm.

To provide accurate predictions, the algorithm calculates a new

$\widehat{r}(u,i)$

value for each existing

$r(u,i)$

value in

$VS$

. Ideally, the system should converge to node positions such that the values of

$\widehat{r}(u,i)$

match as closely as possible the true values

$r(u,i)$

. This is measured, as is common, with the Root Mean Square Error (RMSE) metric:

$RMSE=\sqrt{\frac{1}{|VS|} \sum_{(u, i, r(u,i)) \in VS}(\widehat{r}(u,i) - r(u,i))^2 }$ (1)

where

$|VS|$

denotes the size (number of triplets) of the

$VS$

. According to the RMSE definition, a lower calculated RMSE value corresponds to an improved prediction of the RS.
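As a minimal illustration of the RMSE definition above, the following sketch computes the metric for a small validation set. The function name `rmse` and the toy ratings are illustrative assumptions, not part of the original evaluation code.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings.

    Both arguments are equal-length sequences, one entry per
    (u, i) pair of the validation set VS.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy validation set with three ratings (hypothetical values).
print(rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0]))
```

The errors in this toy example are -1, 0 and 1, so the result equals the square root of 2/3.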

ER-SCoR method

\begin{algorithm}
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}
\Input{$U$, $I$, $TS$, $VS$, $minR$, $maxR$, $MaxIt$}
\Output{$\widehat{r}(u,i), (u,i) \in VS$}
\BlankLine
$D_0 = it = 0$\;
$\rho = 0.01$, $\alpha = 20$\;
\ForEach{$u \in U$}{
  $p(u) =$ random position in $[0,1]^n$\;
  $D_U(u) = 0$\;
}
\ForEach{$i \in I$}{
  $p(i) =$ random position in $[0,1]^n$\;
  $D_I(i) = 0$\;
}
\Repeat{Node positions do not change or $it > MaxIt$}{
  $it = it + 1$\;
  $PM = randperm(TS)$\;
  \ForEach{$(u,i,r(u,i))\in PM$}{
    \If{($r(u,i) = minR$ {\bf and} $d(u,i) \ge dd(u,i)$) {\bf or} ($r(u,i) = maxR$ {\bf and} $d(u,i) \le dd(u,i)$)}{
      continue\;
    }
    $v = p(u)-p(i)$\;
    $g = \frac{(dd(u,i)-|v|_2) \cdot v}{d(u,i)}$\;
    $p(u) = p(u) + \rho \cdot g$\;
    $p(i) = p(i) - \rho \cdot g$\;
  }
  \If{$rem(it,50) = 0$}{
    $D_0 = \frac{1}{|TS|}\sum\limits_{(u,i,r(u,i))\in TS}{\left(dd(u,i) - d(u,i)\right)}$\;
    \Repeat{$D_U$ and $D_I$ do not change}{
      \ForEach{$u \in U$}{
        $D_U(u) = \frac{1}{2} \cdot D_U(u) + \frac{1}{2\cdot |TS|}\sum\limits_{(u,i,r(u,i))\in TS}{\left(dd(u,i) - (d(u,i) + D_0 + D_I(i))\right)}$\;
      }
      \ForEach{$i \in I$}{
        $D_I(i) = \frac{1}{2} \cdot D_I(i) + \frac{1}{2\cdot |TS|}\sum\limits_{(u,i,r(u,i))\in TS}{\left(dd(u,i) - (d(u,i) + D_0 + D_U(u))\right)}$\;
      }
    }
  }
}
\ForEach{$(u,i)\in VS$}{
  $\widehat{r}(u,i) = maxR - \frac{(maxR-minR)\cdot (d(u,i)-\alpha)}{100-\alpha}$\;
  $\widehat{r}(u,i) = min(max(\widehat{r}(u,i),minR),maxR)$\;
}
\caption{\label{algo:ER-SCoR} The proposed \textit{ER-SCoR} algorithm.}
\end{algorithm}

Fig. 1 The schema of the proposed ER-SCoR method.

Fig. 2 A synthetic example illustrating the position of nodes (users and items) in $\mathbb{R}^2$ after the convergence of the system. The smaller the distance between user $u$ and item $i$, the higher the predicted preference of user $u$ for item $i$. At each point of the graph, the preference level of the user positioned at the center is represented through color brightness: light gray denotes high preference (like), while dark gray indicates low preference (dislike).

Here, we present in detail the ER-SCoR recommendation approach, which solves the recommendation prediction problem in linear time O(

$N$

), where

$N = |L|$

denotes the number of given triples in the list

$L$

. According to the formulation of the recommendation prediction problem, the input of ER-SCoR is the sets of distinct user identifiers (U) and item identifiers (I), along with the list of triplets formatted as

$(u, i, r(u,i))$

for both the training set (TS) and the validation set (VS). In addition, the minimum and maximum value of rating (

$minR, maxR$

) are given to constrain the method to provide recommendation values

$\widehat{r}(u,i) \in [minR, maxR],(u,i) \in VS$

. The maximum number of iterations of ER-SCoR main loop (

$MaxIt$

) is also given. Algorithm \ref{algo:ER-SCoR} presents in detail the pseudo-code of the proposed ER-SCoR method. Figure 1 shows the schema of the proposed ER-SCoR method.

In our improved version of SCoR \cite{score}, a bipartite graph is created, which consists of user nodes on one side and item nodes on the other. Each $(u, i, r(u,i))$ triplet in the Training Set ($TS$) is represented in the graph by a weighted edge connecting the nodes $u$ and $i$. The basis of this approach is the spring metaphor (see Fig. 2), which was first introduced by the Vivaldi synthetic network coordinate algorithm \cite{Vivaldi}. In this approach, a position $p(u), p(i) \in \mathbb{R}^n$ (e.g. $n = 40$ \cite{score}) is assigned to each element $u$, $i$ of the user and item sets $U$ and $I$, respectively. In the original version of SCoR, the distance $d(u,i)$ between two nodes $u, i$ is given directly by their Euclidean distance $|p(u)-p(i)|_2$. In this work, we introduce three extra terms ($D_0$, $D_U(u)$ and $D_I(i)$) in the calculation of the distance $d(u,i)$, taking into account the estimated average belief (rating) of the whole recommender system ($D_0$), of the user ($D_U(u)$) and of the item ($D_I(i)$):

$d(u, i) = \max(0, |p(u)-p(i)|_2 + D_0 + D_U(u) + D_I(i))$ (2)

The term

$D_0$

shifts all distances according to the average belief (rating) of the whole recommender system. The term

$D_U(u)$

adjusts the distance between the user

$u$

and any item

$i$

, either increasing or decreasing it, to achieve a more accurate recommendation, particularly in cases where synthetic coordinates fail to adequately model the user behavior. For instance, the term

$D_U(u)$

may receive negative values for users who consistently give maximum ratings to most items and especially to those that are far apart in the embedding space. The negative value in term

$D_U(u)$

effectively reduces the distance between user

$u$

and those items. Similarly, the term

$D_I(i)$

adjusts the distance between the item

$i$

and any user

$u$

.

Each edge is assigned a weight equal to the desired distance

$dd(u, i)$

between the nodes

$u$

and

$i$

according to the rating

$r(u, i)$

. We assign a small desired distance value to a pair

$(u,i)$

with a high rating value

$r(u, i)$

(high preference of user

$u$

for item

$i$

) and vice versa. As in \cite{score}, we assign the maximum distance (set to 100) to

$minR$

, the smallest possible rating. In the initial SCoR version \cite{score}, the highest rating was assigned a distance of 0; however, we observed that a zero distance reduces the solution space, resulting in overfitting. Therefore, so that the highest rating is assigned a non-zero distance, we include an offset (e.g.

$\alpha = 20$

). Given these values, the desired distance

$dd(u, i)$

is defined as follows:

$dd(u, i) = \alpha+(100-\alpha) \cdot \frac{maxR-r(u,i)}{maxR-minR}$ (3)

where $minR$ and $maxR$ denote the minimum (low preference) and maximum (high preference) ratings, respectively. Taking into account Eq. 3 and the values $minR$ and $maxR$, the recommendation values $\widehat{r}(u,i) \in [minR, maxR]$ are given by Eq. 4.

$\begin{aligned} \widehat{r^*}(u,i) &= maxR - \frac{(maxR-minR)\cdot (d(u,i)-\alpha)}{100-\alpha} \\ \widehat{r}(u,i) &= \min(\max(\widehat{r^*}(u,i),minR),maxR) \end{aligned}$ (4)
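The mapping between ratings and distances in Eqs. 3 and 4 can be sketched as two small functions. This is a minimal sketch: the function names and the `max_dist` parameter (fixed to 100 in the text) are illustrative assumptions.

```python
def desired_distance(r, min_r, max_r, alpha=20.0, max_dist=100.0):
    """Eq. 3: map a rating r to a target distance in [alpha, max_dist]."""
    return alpha + (max_dist - alpha) * (max_r - r) / (max_r - min_r)

def predicted_rating(d, min_r, max_r, alpha=20.0, max_dist=100.0):
    """Eq. 4: map a distance d back to a rating, clamped to [min_r, max_r]."""
    raw = max_r - (max_r - min_r) * (d - alpha) / (max_dist - alpha)
    return min(max(raw, min_r), max_r)

# On a 1-5 scale: the top rating maps to the offset alpha,
# the bottom rating to the maximum distance.
print(desired_distance(5, 1, 5))    # 20.0
print(desired_distance(1, 1, 5))    # 100.0
print(predicted_rating(60.0, 1, 5)) # 3.0
```

Distances below $\alpha$ or above 100 are clamped to $maxR$ and $minR$, respectively, which is what makes the skip rule for extreme ratings possible.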

In the following, we analyze all the steps of the ER-SCoR iterative method.

First, ER-SCoR initializes the values $D_0$, $D_U(u)$ and $D_I(i)$ to zero and the Synthetic Euclidean Coordinates $p(u)$, $p(i)$ of all $u \in U$ and $i \in I$ to random positions in $[0,1]^n$ (close to zero) (see lines 1-10 of Algorithm \ref{algo:ER-SCoR}). We perform a random permutation of the training set (see line 13 of Algorithm \ref{algo:ER-SCoR}), so that the edges are traversed in random order. ER-SCoR iteratively and gradually re-positions all nodes (users and items) so that the desired distances of all edges are satisfied (see lines 14-22 of Algorithm \ref{algo:ER-SCoR}). Ideally, if an item $i$ has been rated by user $u$ with value $r(u,i)$, then after convergence the distance $d(u,i)$ between the nodes $u$ and $i$ should equal $dd(u,i)$, as determined by Eq. 3. The algorithm converges when the positions essentially stop changing or the number of iterations exceeds a maximum number. The positions of nodes $u$ and $i$ are updated as follows:

$\begin{aligned} p(u) &= p(u) + \rho \cdot (dd(u,i) - |p(u)-p(i)|_2) \cdot \frac{p(u)-p(i)}{d(u,i)} \\ p(i) &= p(i) - \rho \cdot (dd(u,i) - |p(u)-p(i)|_2) \cdot \frac{p(u)-p(i)}{d(u,i)} \end{aligned}$ (5)

where the expression

$\frac{p(u)-p(i)}{d(u,i)}$

represents the direction in which node

$u$

should be moved and

$\rho = 0.01$

controls the convergence of the method, by specifying the speed by which node

$u$

can move toward its ideal position. It holds that ideally after the system has converged, the distance between the nodes

$u$

and

$i$

should be

$dd(u,i)$

. Upon algorithm convergence, the predicted rating of an item

$i$

by a user

$u$

is obtained by a simple calculation from the distance between the corresponding nodes (see Eq. 4). In the special case where

$r(u,i)$

is equal to the minimum rating

$minR$

and the distance

$d(u,i)$

is greater than the desired distance

$dd(u,i)$

, we skip the synthetic coordinate update process, since the system recommendation

$\widehat{r}(u,i) = minR$

is satisfied (see Eq. 4). Similarly, we skip the synthetic coordinate update process when

$r(u,i)$

is equal to the maximum rating

$maxR$

and the distance

$d(u,i)$

is lower than the desired distance

$dd(u,i)$

(see lines 15-17 of Algorithm \ref{algo:ER-SCoR}).
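A single coordinate update for one training triplet, including the skip rule for extreme ratings, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the helper names are hypothetical, positions are plain Python lists, and the guard against a zero distance is an added assumption that does not appear in the pseudo-code.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance(p_u, p_i, d0=0.0, du=0.0, di=0.0):
    """Eq. 2: Euclidean distance plus the three belief terms, floored at 0."""
    return max(0.0, euclidean(p_u, p_i) + d0 + du + di)

def update_pair(p_u, p_i, r, dd, min_r, max_r,
                d0=0.0, du=0.0, di=0.0, rho=0.01):
    """One update for a triplet (u, i, r); returns the new (p_u, p_i)."""
    d = distance(p_u, p_i, d0, du, di)
    # Skip rule: the clamped prediction already equals the extreme rating.
    if (r == min_r and d >= dd) or (r == max_r and d <= dd):
        return list(p_u), list(p_i)
    if d == 0.0:  # assumption: avoid division by zero
        return list(p_u), list(p_i)
    # The error uses the Euclidean norm; the direction is scaled by d(u, i).
    scale = rho * (dd - euclidean(p_u, p_i)) / d
    new_u = [x + scale * (x - y) for x, y in zip(p_u, p_i)]
    new_i = [y - scale * (x - y) for x, y in zip(p_u, p_i)]
    return new_u, new_i

# A user and an item that are too close (distance 5, desired 60) move apart.
new_u, new_i = update_pair([3.0, 4.0], [0.0, 0.0], r=3, dd=60.0, min_r=1, max_r=5)
print(euclidean(new_u, new_i))
```

In this toy example the two nodes move symmetrically in opposite directions, and their distance grows from 5 to about 6.1 after one step with $\rho = 0.01$.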

The terms $D_0$, $D_U(u)$ and $D_I(i)$ of Eq. 2 are updated every 50 iterations (see lines 23-33 of Algorithm \ref{algo:ER-SCoR}), taking into account the current positions of the nodes. First,

$D_0$

, which corresponds to the average belief of the total recommender system, is calculated by the mean value of the difference

$dd(u,i) - d(u,i)$

for each edge of the training set. Then

$D_U(u)$

and

$D_I(i)$

are estimated in an iterative process by the corresponding mean values of differences

$dd(u,i) - (d(u,i) + D_0 + D_I(i))$

and

$dd(u,i) - (d(u,i) + D_0 + D_U(u))$

, respectively. The values of

$D_U(u)$

affect the estimation of the values of

$D_I(i)$

and vice versa. Therefore, the update process of

$D_U(u)$

and

$D_I(i)$

also uses the previous values of

$D_U(u)$

and

$D_I(i)$

with a weight of

$\frac{1}{2}$

for a smooth convergence (see lines 27 and 30 of Algorithm \ref{algo:ER-SCoR}). ER-SCoR terminates when the node positions do not change or the maximum number of iterations

$MaxIt$

(e.g.

$MaxIt = 500$

) is reached (see line 34 of Algorithm \ref{algo:ER-SCoR}). Finally, ER-SCoR provides the recommendation values

$\widehat{r}(u,i) \in [minR, maxR],(u,i) \in VS$

in the validation set (see lines 35-38 of Algorithm \ref{algo:ER-SCoR}).

In Figure 2, we illustrate a synthetic example that shows the position of the nodes (users and items) after the computation of the Synthetic Coordinates. It shows the preferences of the user who is located at the center of the graph. Preferences are visually represented using a gray scale, with light gray indicating like and dark gray indicating dislike.

Dataset Annotations using ER-SCoR

Similarly to SCoR, ER-SCoR not only provides recommendations, but also generates annotations for user and item datasets by analyzing node positions and terms

$D_U(u)$

and

$D_I(i)$

. In our experiments (see Sect. 9), we investigate the ability of ER-SCoR to generate annotations identifying both users of similar tastes and items rated similarly as well as to detect outliers.

Hereafter, we study how the Euclidean distance $|p(u_1)- p(u_2)|_2$ and the terms $D_U(u_1)$, $D_U(u_2)$ of two user nodes $u_1$, $u_2$ relate to the maximum absolute difference in their recommendations. Let $MRD(u_1, u_2)$ be their maximum absolute recommendation difference, according to the ER-SCoR system.

$MRD(u_1,u_2) = \max_{i\in I} |\widehat{r}(u_1, i) - \widehat{r}(u_2, i)|$ (6)

By Eq. 4 and the triangle inequality, the MRD is bounded from above as follows:

$\begin{aligned} MRD(u_1,u_2) &\le \frac{(maxR-minR) \cdot d_U(u_1,u_2)}{100-\alpha}, \text{ where} \\ d_U(u_1,u_2) &= |p(u_1)- p(u_2)|_2 +|D_U(u_1)-D_U(u_2)|. \end{aligned}$ (7)
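The steps behind this bound can be spelled out as follows. This is a sketch of the argument, assuming that both users are scored against the same item $i$, so that $D_0$ and $D_I(i)$ cancel in the difference of distances. Clamping to $[minR, maxR]$ in Eq. 4 and the $\max(0,\cdot)$ in Eq. 2 cannot increase the gap between two values, which gives the first two inequalities; the reverse triangle inequality gives the last one.

```latex
\begin{aligned}
|\widehat{r}(u_1,i) - \widehat{r}(u_2,i)|
 &\le \frac{maxR-minR}{100-\alpha}\, |d(u_1,i) - d(u_2,i)| \\
 &\le \frac{maxR-minR}{100-\alpha} \left( \big|\, |p(u_1)-p(i)|_2 - |p(u_2)-p(i)|_2 \big| + |D_U(u_1)-D_U(u_2)| \right) \\
 &\le \frac{maxR-minR}{100-\alpha} \left( |p(u_1)-p(u_2)|_2 + |D_U(u_1)-D_U(u_2)| \right).
\end{aligned}
```

Since the right-hand side does not depend on $i$, taking the maximum over all $i \in I$ yields the stated bound on $MRD(u_1,u_2)$.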

If the distance

$d_U(u_1,u_2)$

between two users

$u_1, u_2$

is low, we can assume that these users have similar tastes. Conversely, if a user is placed far from all others in the n-dimensional Euclidean space by ER-SCoR, it suggests that the user exhibits unique or atypical preferences, that is, an outlier. This degree of uniqueness can be quantified by the distance

$D_{min}(u)$

, which represents the minimum distance between the user

$u$

and their nearest neighbor in the entire dataset.

$D_{min}(u) = \min_{u_1 \in U,\, u_1 \neq u} d_U(u, u_1)$ (8)

The same approach can be applied to the items.
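The user-level annotations described above can be sketched with a brute-force computation of $d_U$ and $D_{min}$. The function names, the dictionary-based data layout and the toy coordinates are illustrative assumptions; the quadratic scan over all users is chosen for clarity, not efficiency.

```python
import math

def user_distance(p1, p2, du1, du2):
    """d_U: coordinate distance plus the gap between user belief terms."""
    return math.dist(p1, p2) + abs(du1 - du2)

def mrd_upper_bound(d_u, min_r, max_r, alpha=20.0, max_dist=100.0):
    """Upper bound on the maximum recommendation difference of two users."""
    return (max_r - min_r) * d_u / (max_dist - alpha)

def nearest_neighbor_distance(u, positions, d_user):
    """D_min(u): distance from user u to the closest other user."""
    return min(
        user_distance(positions[u], positions[v], d_user[u], d_user[v])
        for v in positions
        if v != u
    )

# Toy data: users "a" and "b" are close, "c" is an outlier candidate.
positions = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [30.0, 40.0]}
d_user = {"a": 0.0, "b": 1.0, "c": 0.0}
print(nearest_neighbor_distance("a", positions, d_user))
print(nearest_neighbor_distance("c", positions, d_user))
```

In this toy example, user "a" has a nearest neighbor at $d_U = 6$, which on a 1-5 scale bounds the difference of their recommendations by 0.3, whereas user "c" is far from everyone and would be flagged as atypical.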

Additionally, ER-SCoR is able to provide annotations for specific users and items by analyzing the values $D_U(u)$ and $D_I(i)$. According to Sect. 4, large positive or large negative values of the terms $D_U(u)$, $D_I(i)$ indicate users/items that consistently give/receive low or high ratings, respectively. In Sect. 9, we study the behavior of users and items of the five real datasets by analyzing the values $D_U(u)$ and $D_I(i)$.

Experimental Results

Experimental Setup

The following five well-known datasets were used during the experiments presented in the rest of this paper.

The MovieLens dataset (ML-100k) \cite{ml}: The MovieLens dataset, sourced from the GroupLens website \cite{ml}, comprises 100k user ratings from the MovieLens site, with ratings spanning from 1 to 5 across five distinct values.

The MovieLens dataset (ML-1M) \cite{ml}, sourced from the GroupLens website \cite{ml}, comprises one million user ratings from the MovieLens platform, with ratings ranging from 1 to 5 (five distinct values).

Jester and Jester2 \cite{jester,score}: The Jester datasets consist of continuous rating values from several thousand users for 100 jokes, with scores ranging from -10 to 10. To ensure that the RMSE values are comparable with those from other datasets, we rescaled the ratings in the Jester datasets to the range $[1,5]$, while maintaining the decimal precision.

The Netflix prize dataset (SmallNetflix) \cite{smallnetflix}: SmallNetflix is a reduced version of the original dataset, available from the GraphLab website \cite{smallnetflix}. It includes ratings from thousands of users on thousands of movies, using a discrete scale ranging from 1 to 5.
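The exact transformation used to bring the Jester ratings from $[-10, 10]$ into $[1, 5]$ is not specified here. One natural choice is a linear map, sketched below; both the linear form and the function name `rescale_rating` are assumptions made for illustration.

```python
def rescale_rating(r, src_min=-10.0, src_max=10.0, dst_min=1.0, dst_max=5.0):
    """Linearly map r from [src_min, src_max] to [dst_min, dst_max]."""
    return dst_min + (r - src_min) * (dst_max - dst_min) / (src_max - src_min)

# Endpoints and midpoint of the Jester scale.
print(rescale_rating(-10.0))  # 1.0
print(rescale_rating(0.0))    # 3.0
print(rescale_rating(10.0))   # 5.0
```

A linear map keeps the ratings continuous, so the decimal precision of the original scores is preserved.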

Each dataset was divided into a training and a validation set. The first comprises all the ratings used to train the RS, and the second is used to validate and evaluate the performance of the methods. Table \ref{table:data} summarizes the important characteristics of each dataset used, namely the number of users, the number of items (e.g. movies, jokes), the number of user ratings, the average number of ratings per user (ratings/user) and the density of the dataset. The density is the fraction of existing ratings relative to all possible ratings, expressed as $\frac{ratings}{users \times items}$.
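As a quick check of the density definition, the following sketch recomputes the training-set statistics of ML-100k from the counts reported in Table \ref{table:data} (80000 training ratings, 943 users, 1682 items). The function names are illustrative assumptions.

```python
def density(num_ratings, num_users, num_items):
    """Fraction of existing ratings among all possible user-item pairs."""
    return num_ratings / (num_users * num_items)

def ratings_per_user(num_ratings, num_users):
    """Average number of ratings per user."""
    return num_ratings / num_users

# ML-100k training set, counts taken from the dataset statistics table.
print(f"{density(80_000, 943, 1682):.2%}")      # 5.04%
print(f"{ratings_per_user(80_000, 943):.2f}")   # 84.84
```

For Jester and Jester2, which have exactly 100 items, the density in percent coincides numerically with the ratings-per-user value, as the table shows.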

In Figs. 3, 4 and 5, the Probability Mass Function (PMF) of discrete rating values is shown for: ML-100k, ML-1M, and SmallNetflix datasets, respectively. In Figs. 6 and 7, the Probability Density Function (PDF) of continuous rating values is shown for Jester and Jester2 datasets, respectively. In all datasets, higher ratings occur more frequently than lower ones.

Figure 17 depicts the PDFs of user and item degree values for ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets. In all datasets, the range of item degrees is greater than the corresponding range of user degrees. Probability density functions (PDFs) are estimated by employing a normal kernel function and are evaluated at evenly distributed points spanning the entire range of ratings or degrees.

\begin{table}[]
\begin{tabular}{|l|r|r|r|r|r|r|}
\hline
Dataset & \multicolumn{1}{l|}{Users} & \multicolumn{1}{l|}{Items} & \multicolumn{1}{l|}{Ratings} & \multicolumn{1}{l|}{TS Ratings} & \multicolumn{1}{l|}{$\frac{\textbf{TS Ratings}}{\textbf{Users}}$} & \multicolumn{1}{l|}{TS Density} \\ \hline
ML-100k & 943 & 1682 & 100000 & 80000 & 84.84 & 5.04\% \\ \hline
ML-1M & 6040 & 3952 & 1000209 & 800167 & 132.48 & 3.35\% \\ \hline
Jester & 23500 & 100 & 1708993 & 1367194 & 58.18 & 58.18\% \\ \hline
Jester2 & 24938 & 100 & 616912 & 493529 & 19.79 & 19.79\% \\ \hline
SmallNetflix & 93705 & 3561 & 3843340 & 3074672 & 32.81 & 0.92\% \\ \hline
\end{tabular}
\caption{\label{table:data} Statistics for the datasets used in the experiments.}
\end{table}

Fig. 8

\label{fig:hist} The Probability Mass Functions (PMFs) of discrete rating values for the (a) ML-100k, (b) ML-1M and (c) SmallNetflix datasets, and the PDFs of continuous rating values for the (d) Jester and (e) Jester2 datasets.

In \cite{score}, the SCoR system was tested against the following state-of-the-art RSs: ALS \cite{als}, ALS_COORD \cite{als_coord}, BIASSGD \cite{biassgd}, BIASSGD2 \cite{biassgd}, RBM \cite{rbm}, SGD \cite{sgd}, SVDPP \cite{biassgd}, P_MEAN \cite{Ekstrand:2011:CFR:2185827.2185828} and the USER-USER algorithm \cite{Sarwar}. SCoR achieved the highest performance on each of the following datasets: SmallNetflix \cite{smallnetflix}, MovieLens (ml) \cite{ml}, and Jester and Jester2 \cite{jester}. This study assesses the performance of ER-SCoR in comparison to SCoR and two more recent state-of-the-art recommendation approaches \cite{Enriched, UI2vec}, introduced after SCoR.

Our algorithm does not require parametrization, with the sole exception of the number of dimensions of the Euclidean space. This value was set to $n = 40$ in all experiments, as proposed in the initial version of SCoR \cite{score}. The constant $\alpha$ (see Sect. 4) is set to 20 for all datasets. A sensitivity analysis of ER-SCoR under different values of $\alpha$ is presented in Sect. 11.

Performance Evaluation

This section studies the performance of ER-SCoR on the datasets described above.

Figure 24 depicts the evolution of the RMSE during the training process of ER-SCoR for the different rating values on the training (left column) and validation (right column) sets of the ML-100k, ML-1M and SmallNetflix datasets. As expected, the RMSE values are lower on the training set than on the validation set, while maintaining the same relative order with respect to the rating values. The lowest RMSE values are observed for rating three, which is the middle rating, and rating four, which has the highest frequency in the datasets (see Fig. 8). The highest RMSE value is observed for rating one (an extreme value), since it has the lowest frequency in the datasets. Concerning the convergence of the RMSE sequence, the periodic update of $D_0$, $D_U$ and $D_I$ causes a slight discontinuity, particularly at the fiftieth iteration, when the first update occurs. Beyond this point, the system exhibits smooth and consistent convergence across all datasets and rating values.
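The per-rating breakdown used in this analysis can be sketched as below: the RMSE is computed separately over the ratings that share the same true value. The function name and the toy ratings are illustrative assumptions, not the evaluation code or data of the paper.

```python
import numpy as np


def rmse_per_rating(true_ratings, predicted_ratings):
    """Return {rating value: RMSE over the ratings with that true value}."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_ratings = np.asarray(predicted_ratings, dtype=float)
    result = {}
    for value in np.unique(true_ratings):
        mask = true_ratings == value
        err = predicted_ratings[mask] - true_ratings[mask]
        result[float(value)] = float(np.sqrt(np.mean(err**2)))
    return result


# Toy example (hypothetical values, for illustration only).
true_r = [1, 1, 3, 3, 4, 4, 5]
pred_r = [2.0, 3.0, 3.0, 3.5, 4.0, 3.5, 4.0]
per_rating = rmse_per_rating(true_r, pred_r)
```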

Figure 35 depicts the evolution of the RMSE during the training process of ER-SCoR for low-degree (bottom $20\%$) and high-degree (top $20\%$) users and items on the training (left column) and validation (right column) sets. Due to overfitting, on the training set the lowest RMSE values are observed for low-degree items and users. In contrast, on the validation set, the highest RMSE values are associated with low-degree items and users. Concerning the convergence of the RMSE sequence, the system almost always converges smoothly. The periodic update of $D_0$, $D_U$ and $D_I$ introduces a slight discontinuity, especially noticeable in the SmallNetflix dataset during the first update iteration.
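The degree-based grouping above can be sketched as follows, assuming that the degree of a node is its number of ratings in the training set and that the bottom and top $20\%$ are delimited by the corresponding quantiles of the degree distribution. Both the function and the toy data are hypothetical; the paper does not state how ties at the cut-off are handled.

```python
import numpy as np


def rmse_by_degree(train_ids, eval_ids, eval_errors, fraction=0.2):
    """RMSE of the evaluated ratings that belong to low- and high-degree nodes."""
    ids, counts = np.unique(train_ids, return_counts=True)
    degree_of = dict(zip(ids.tolist(), counts.tolist()))
    # Assumption: groups are delimited by the 20% and 80% degree quantiles.
    low_cut, high_cut = np.quantile(counts, [fraction, 1.0 - fraction])
    degrees = np.array([degree_of.get(i, 0) for i in eval_ids])
    errors = np.asarray(eval_errors, dtype=float)
    out = {}
    for name, mask in (("low", degrees <= low_cut), ("high", degrees >= high_cut)):
        out[name] = float(np.sqrt(np.mean(errors[mask] ** 2))) if mask.any() else float("nan")
    return out


# Toy example: five users with training degrees 1, 2, 3, 8 and 10.
train_users = [0] * 1 + [1] * 2 + [2] * 3 + [3] * 8 + [4] * 10
eval_users = [0, 0, 4, 4, 2]
eval_err = [2.0, 1.0, 0.5, -0.5, 3.0]
groups = rmse_by_degree(train_users, eval_users, eval_err)
```

The same function applies to items by passing item identifiers instead of user identifiers.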

Fig. 17

\label{fig:histDeg} PDFs of user and item degree values for the (a) ML-100k, (b) ML-1M, (c)-(d) Jester, (e)-(f) Jester2, and (g)-(h) SmallNetflix datasets.

Fig. 24

\label{fig:RMSE_Rats} The evolution of the RMSE during the training process of ER-SCoR for different rating values on the training (left column) and validation (right column) sets.

Fig. 35

\label{fig:RMSE_Deg} The evolution of the RMSE during the training process of ER-SCoR for low-degree and high-degree users and items on the training (left column) and validation (right column) sets.

ER-SCoR annotations

The ER-SCoR based annotations for the ML-100k, ML-1M, Jester, Jester2, and SmallNetflix datasets are given hereafter.

Figure 41 illustrates the cumulative distribution functions (CDFs) of $D_{min}$ for ER-SCoR, estimated on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets, with separate curves for items (shown in blue) and users. In ER-SCoR, the distance between users and items generally lies in $[0,100]$. A threshold of $20\%$ of the maximum distance (which corresponds to a maximum rating difference of one in a five-level rating system) is used to identify similar users and items. Furthermore, a threshold of $60\%$ of the maximum distance (which corresponds to a maximum rating difference of three in a five-level rating system) is used to detect outliers. Based on these rules and the CDFs of Fig. 41, Table \ref{table:Dmin} reports the detection rates of similar users/items and outliers by the ER-SCoR system. The results show that the SmallNetflix dataset has the highest percentage of similar users ($34.4\%$), which is expected given that it contains the largest number of users. In contrast, the ML-100k dataset, which has the fewest users, exhibits the lowest percentage of similar users ($2.8\%$). The ML-100k and ML-1M datasets exhibit the highest percentage of similar items (about $15\%$), which is expected given their large number of items. In contrast, the two Jester datasets, which contain only 100 items, yield zero similar items. Regarding outliers, the Jester dataset exhibits the highest percentage of user outliers, at $0.6\%$, while the SmallNetflix dataset shows the highest percentage of item outliers, at $2.2\%$.
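The two thresholding rules can be sketched as below. We assume that one $D_{min}$ value is available per user (or item), as defined earlier in the paper, that "similar" means a value at or below $20\%$ of the maximum distance, and that "outlier" means a value at or above $60\%$ of it. The direction of the inequalities, the function name and the toy values are our assumptions.

```python
import numpy as np


def annotate_by_dmin(d_min, max_distance=100.0, similar_frac=0.20, outlier_frac=0.60):
    """Flag nodes as similar or as outliers from their D_min values."""
    d_min = np.asarray(d_min, dtype=float)
    similar = d_min <= similar_frac * max_distance   # within one rating level
    outlier = d_min >= outlier_frac * max_distance   # at least three rating levels away
    return similar, outlier


# Hypothetical D_min values for eight users (illustration only).
d_min_users = np.array([5.0, 18.0, 25.0, 40.0, 61.0, 75.0, 12.0, 33.0])
similar, outlier = annotate_by_dmin(d_min_users)
print(f"similar: {100 * similar.mean():.1f}%  outliers: {100 * outlier.mean():.1f}%")
# prints "similar: 37.5%  outliers: 25.0%"
```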

Figure 47 depicts the values of $D_I$ and $D_U$ sorted in ascending order on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets. Regarding users, in all datasets there are a few users with highly negative values (less than -10) of $D_U$, indicating that they consistently give high ratings. In the SmallNetflix dataset, a few users exhibit highly positive values (greater than 10) of $D_U$, meaning that they consistently give low ratings. As for items, in the ML-100k and ML-1M datasets, a small number of items have highly positive values (greater than 10) of $D_I$, meaning that they consistently receive low ratings. Table \ref{table:DUI} shows the average values and the standard deviations of $D_U$ and $D_I$, together with the value of $D_0$, per dataset. $D_0$ is negative in all cases, indicating that the average belief across the datasets is that users tend to give, and items tend to receive, high ratings, an observation that is confirmed by Fig. 8. The relatively small mean values compared to the standard deviations suggest the existence of users and items with divergent beliefs in every dataset.

Fig. 41

\label{fig:Dmin} The Cumulative Distribution Function (CDF) of $D_{min}$ for ER-SCoR on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets.

\begin{table}[]
\begin{tabular}{|l|rr|rr|}
\hline
Dataset & \multicolumn{2}{c|}{Similar} & \multicolumn{2}{c|}{Outliers} \\ \hline
 & \multicolumn{1}{l|}{Users} & \multicolumn{1}{l|}{Items} & \multicolumn{1}{l|}{Users} & \multicolumn{1}{l|}{Items} \\ \hline
ML-100k & \multicolumn{1}{r|}{2.76\%} & 14.86\% & \multicolumn{1}{r|}{0.32\%} & 1.25\% \\ \hline
ML-1M & \multicolumn{1}{r|}{8.49\%} & 15.56\% & \multicolumn{1}{r|}{0.12\%} & 0.84\% \\ \hline
Jester & \multicolumn{1}{r|}{8.94\%} & 0.00\% & \multicolumn{1}{r|}{0.63\%} & 0.00\% \\ \hline
Jester2 & \multicolumn{1}{r|}{7.08\%} & 0.00\% & \multicolumn{1}{r|}{0.08\%} & 1.00\% \\ \hline
SmallNetflix & \multicolumn{1}{r|}{34.44\%} & 7.39\% & \multicolumn{1}{r|}{0.02\%} & 2.19\% \\ \hline
\end{tabular}
\caption{\label{table:Dmin} The detection rates of similar users/items and outliers by the \textit{ER-SCoR} system.}
\end{table}

Fig. 47

\label{fig:DUI} The values of $D_I(i)$ and $D_U(u)$ sorted in ascending order on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets.

\begin{table}[]
\begin{tabular}{|l|rr|rr|r|}
\hline
Dataset & \multicolumn{2}{c|}{Average} & \multicolumn{2}{c|}{Std Dev} & \multicolumn{1}{l|}{$D_0$} \\ \hline
 & \multicolumn{1}{r|}{$D_U$} & $D_I$ & \multicolumn{1}{r|}{$D_U$} & $D_I$ & \\ \hline
ML-100k & \multicolumn{1}{r|}{-0.14} & 0.37 & \multicolumn{1}{r|}{1.98} & 1.40 & -1.10 \\ \hline
ML-1M & \multicolumn{1}{r|}{0.02} & 0.33 & \multicolumn{1}{r|}{1.50} & 1.06 & -0.84 \\ \hline
Jester & \multicolumn{1}{r|}{0.04} & 0.09 & \multicolumn{1}{r|}{1.15} & 0.81 & -0.90 \\ \hline
Jester2 & \multicolumn{1}{r|}{-0.04} & 0.53 & \multicolumn{1}{r|}{1.44} & 1.46 & -1.02 \\ \hline
SmallNetflix & \multicolumn{1}{r|}{-0.15} & 0.53 & \multicolumn{1}{r|}{2.04} & 1.21 & -0.96 \\ \hline
\end{tabular}
\caption{\label{table:DUI} The average values and the standard deviation (Std Dev) of $D_U$ and $D_I$, and the value of $D_0$, on the \textit{ML-100k}, \textit{ML-1M}, \textit{Jester}, \textit{Jester2} and \textit{SmallNetflix} datasets.}
\end{table}

ER-SCoR convergence

Fig. 50

\label{fig:AbsError} The evolution of the absolute validation error of the RMSE during the training process of (a) SCoR and (b) ER-SCoR on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets.

In this section, we study the convergence of ER-SCoR during training in comparison with SCoR. Figure 50 shows the evolution of the absolute validation error $e(t)$ of the RMSE during the training process of SCoR and ER-SCoR on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets, defined as follows:

$e(t) = |\lim_{z\to\infty} RMSE(z) - RMSE(t)|$

where the first term denotes the converged RMSE on the validation set and $RMSE(t)$ denotes the RMSE on the validation set at iteration $t$. According to this experiment, ER-SCoR converges in at most 200 iterations across all datasets, whereas SCoR requires up to 5000 iterations to converge. This difference of approximately 25 times in convergence speed is mainly explained by the new position update mechanism of ER-SCoR, which ensures equality in rating contribution and updates the position of both nodes of the selected edge (bidirectional position update), while SCoR updates only the position of the selected node. Additionally, SCoR promotes balanced participation among all nodes, assigning greater influence to ratings from low-degree nodes than to those from high-degree ones.
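The error $e(t)$ can be computed from a recorded sequence of validation RMSE values, as in the sketch below. Since the limit is not available in practice, we approximate it by the mean of the last few recorded values. This proxy, the tolerance `tol` used to declare convergence and the synthetic RMSE curve are our assumptions, not part of the experimental protocol described above.

```python
import numpy as np


def absolute_validation_error(rmse_history, tail=10):
    """e(t) = |RMSE_limit - RMSE(t)|, with the limit approximated by the tail mean."""
    rmse_history = np.asarray(rmse_history, dtype=float)
    rmse_limit = rmse_history[-tail:].mean()  # proxy for lim_{z -> inf} RMSE(z)
    return np.abs(rmse_limit - rmse_history)


def iterations_to_converge(rmse_history, tol=1e-3, tail=10):
    """Index of the first iteration from which e(t) stays at or below tol."""
    e = absolute_validation_error(rmse_history, tail)
    above = np.nonzero(e > tol)[0]
    return 0 if above.size == 0 else int(above[-1]) + 1


# Synthetic, exponentially decaying RMSE curve (illustration only).
t = np.arange(300)
rmse = 0.9 + 0.5 * np.exp(-t / 20.0)
e = absolute_validation_error(rmse)
n_iter = iterations_to_converge(rmse)
```

Applying `iterations_to_converge` to the validation curves of two methods under the same tolerance gives a direct estimate of their relative convergence speed.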

ER-SCoR sensitivity test

Fig. 51

\label{fig:sens_test} The RMSE of ER-SCoR-PU, ER-SCoR-PU-D$_0$ and ER-SCoR on the ML-100k dataset for different values of the constant $\alpha$ (sensitivity test).

In this section, we study the sensitivity of ER-SCoR and of its two variants, ER-SCoR-PU and ER-SCoR-PU-D$_0$, to the constant $\alpha$. Figure 51 shows the sensitivity test for the three methods on the ML-100k dataset. According to the test, all methods yield similar RMSE for any $\alpha \in [10,30]$, showing that they are not sensitive to the choice of $\alpha$. In this work, $\alpha$ is set to 20 for all datasets. Similar results are obtained for the remaining datasets.

Comparisons with other Recommender Systems

\begin{table}[]
\begin{tabular}{|l|r|r|r|r|r|r|}
\hline
System\textbackslash{}Dataset & \multicolumn{1}{l|}{ML-100k} & \multicolumn{1}{l|}{ML-1M} & \multicolumn{1}{l|}{Jester} & \multicolumn{1}{l|}{Jester2} & \multicolumn{1}{l|}{SmallNetflix} & \multicolumn{1}{l|}{Average} \\ \hline
ER-SCoR-PU & 0.910 & 0.853 & 0.816 & 0.885 & 0.885 & 0.870 \\ \hline
ER-SCoR-PU-D$_{0}$ & & & & & & \\ \hline
ER-SCoR & \textbf{0.900} & \textbf{0.848} & \textbf{0.812} & \textbf{0.878} & \textbf{0.876} & \textbf{0.863} \\ \hline
SCoR \cite{score} & 0.933 & 0.894 & 0.843 & 0.893 & 0.921 & 0.897 \\ \hline
Enriched\_AE \cite{Enriched} & 0.964 & 0.921 & 0.852 & 0.918 & 0.943 & 0.920 \\ \hline
UI2vec \cite{UI2vec} & 0.977 & 0.924 & 0.854 & 0.940 & 0.950 & 0.929 \\ \hline
\end{tabular}
\caption{\label{table:res} The \textit{RMSE} values for \textit{ER-SCoR}, two variations of \textit{ER-SCoR} (\textit{ER-SCoR-PU}, \textit{ER-SCoR-PU-D$_{0}$}) and three recommender systems from the literature (\textit{SCoR} \cite{score}, \textit{Enriched\_AE} \cite{Enriched} and \textit{UI2vec} \cite{UI2vec}) on the \textit{ML-100k}, \textit{ML-1M}, \textit{Jester}, \textit{Jester2} and \textit{SmallNetflix} datasets. The last column depicts the average \textit{RMSE} value of each recommender system computed over the five datasets. Top scores per dataset are highlighted in bold.}
\end{table}

Table \ref{table:res} reports the RMSE values for ER-SCoR, two variations of ER-SCoR (ER-SCoR-PU and ER-SCoR-PU-D$_0$) and three recommender systems from the literature (SCoR \cite{score}, Enriched_AE \cite{Enriched} and UI2vec \cite{UI2vec}) on the ML-100k, ML-1M, Jester, Jester2 and SmallNetflix datasets. The last column shows the average RMSE value of each recommender system computed over the five datasets. On every dataset, ER-SCoR clearly outperforms all the methods from the literature and slightly outperforms its two variations (ER-SCoR-PU and ER-SCoR-PU-D$_0$). In turn, ER-SCoR-PU-D$_0$ improves on ER-SCoR-PU due to the extra term $D_0$. On average, ER-SCoR yields $0.4\%$ and $0.8\%$ lower RMSE than the two variations ER-SCoR-PU-D$_0$ and ER-SCoR-PU, respectively. Furthermore, ER-SCoR yields $3.8\%$, $6.2\%$ and $7.1\%$ lower average RMSE than the three recommender systems from the literature, SCoR, Enriched_AE and UI2vec, respectively. The best results of ER-SCoR are obtained on the ML-1M dataset, where ER-SCoR yields $5.1\%$, $7.9\%$ and $8.2\%$ lower RMSE than SCoR, Enriched_AE and UI2vec, respectively. According to Table \ref{table:res}, among the baseline methods, SCoR performs best, followed by Enriched_AE, which in turn outperforms UI2vec.
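The relative improvements quoted here follow from the tabulated RMSE values as (baseline RMSE minus ER-SCoR RMSE) divided by the baseline RMSE. The sketch below reproduces the ML-1M figures from the ML-1M column of the results table. The helper name is ours.

```python
def relative_reduction_pct(baseline_rmse: float, our_rmse: float) -> float:
    """Percentage by which our_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - our_rmse) / baseline_rmse


# ML-1M column of the RMSE results table.
er_scor_ml1m = 0.848
baselines_ml1m = {"SCoR": 0.894, "Enriched_AE": 0.921, "UI2vec": 0.924}

reductions = {
    name: relative_reduction_pct(rmse, er_scor_ml1m)
    for name, rmse in baselines_ml1m.items()
}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower RMSE")
# prints:
# SCoR: 5.1% lower RMSE
# Enriched_AE: 7.9% lower RMSE
# UI2vec: 8.2% lower RMSE
```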

Figure 55 shows the RMSE of SCoR, ER-SCoR-PU, ER-SCoR-PU-D$_0$ and ER-SCoR on the ML-100k, ML-1M and SmallNetflix datasets for the different rating values. The advantage of ER-SCoR and its variations (ER-SCoR-PU and ER-SCoR-PU-D$_0$) over SCoR is clearer for the extreme values, especially for $rating = 5$. The primary reasons for the difference in performance are the new position update mechanism of ER-SCoR, which employs the offset $\alpha = 20$ to mitigate the overfitting issues of SCoR (see Sect. 4), and the equal impact of all the ratings used. In particular, the equality in rating contribution improves the performance of ER-SCoR, as the system tends to produce lower errors on the most frequently observed ratings.

Fig. 55

\label{fig:RMSE_Rat} The RMSE of SCoR, ER-SCoR-PU, ER-SCoR-PU-D$_0$ and ER-SCoR on the ML-100k, ML-1M and SmallNetflix datasets for different rating values.

Conclusions

In this work, we introduced ER-SCoR, an Equal Ratings impact-based Recommender System utilizing Synthetic Coordinates. ER-SCoR enhances the original SCoR model by ensuring equal contribution of all ratings during coordinate updates and by integrating three additional belief-based terms (global, user-specific, and item-specific) into the recommendation process. These modifications significantly improve the performance, convergence speed and stability of the system.

Extensive experiments conducted on five widely used datasets (ML-100k, ML-1M, Jester, Jester2, and SmallNetflix) demonstrated that ER-SCoR consistently outperforms both the original SCoR and two state-of-the-art methods, including deep learning and matrix factorization approaches. In particular, ER-SCoR converges up to 25 times faster than SCoR, while consistently achieving lower RMSE values across all evaluated datasets. The method maintains all advantages of the SCoR framework, such as parameter-free operation, robustness to cold-start problems, and linear computational complexity. Beyond prediction accuracy, ER-SCoR offers interpretative value through the use of synthetic coordinates and belief terms. It enables intuitive dataset annotations and the detection of user/item outliers, thereby offering insights not only into what is recommended, but also why and to whom, a property of increasing importance in transparent and explainable AI systems. In conclusion, ER-SCoR represents a robust and efficient framework in the field of recommender systems, offering strong theoretical foundations, empirical superiority, and extensibility towards next-generation intelligent recommendation frameworks.

Future work will focus on incorporating side information and extending the model to support dynamic or sequential recommendation scenarios. Additionally, adapting the coordinate update mechanism for distributed environments could enhance scalability and computational efficiency.

Author Contribution

C.P. wrote the main manuscript text and prepared figures 1-11. All authors reviewed the manuscript.

References:

El Youbi El Idrissi, Lamyae and Akharraz, Ismail and El Ouaazizi, Aziza and Ahaitouf, Abdelaziz (2024) Enhanced Collaborative Filtering: Combining Autoencoder and Opposite User Inference to Solve Sparsity and Gray Sheep Issues. Computers 13(11): 275 MDPI

Alharbe, Nawaf and Rakrouki, Mohamed Ali and Aljohani, Abeer (2023) A collaborative filtering recommendation algorithm based on embedding representation. Expert Systems with Applications 215: 119380 Elsevier

Papadakis, Harris and Papagrigoriou, Antonis and Kosmas, Eleftherios and Panagiotakis, Costas and Markaki, Smaragda and Fragopoulou, Paraskevi (2023) Content-based recommender systems taxonomy. Foundations of Computing and Decision Sciences 48(2): 211--241

Koren, Yehuda and Bell, Robert and Volinsky, Chris (2009) Matrix Factorization Techniques for Recommender Systems. Computer 42(8): 30--37 IEEE Computer Society Press, https://doi.org/10.1109/MC.2009.263

Ekstrand, Michael D. and Riedl, John T. and Konstan, Joseph A. (2011) Collaborative Filtering Recommender Systems. Found. Trends Hum.-Comput. Interact. 4(2): 81--173 Now Publishers Inc., https://doi.org/10.1561/1100000009

Sarwar, Badrul and Karypis, George and Konstan, Joseph and Riedl, John (2001) Item-based Collaborative Filtering Recommendation Algorithms. ACM, 285--295, Proceedings of the 10th International Conference on World Wide Web (WWW '01), Hong Kong, https://doi.org/10.1145/371920.372071

Papadakis, Harris and Papagrigoriou, Antonis and Panagiotakis, Costas and Kosmas, Eleftherios and Fragopoulou, Paraskevi (2022) Collaborative filtering recommender systems taxonomy. Knowledge and Information Systems 64(1): 35--74 Springer

Liu, Ning and Zhao, Jianhua (2023) Recommendation system based on deep sentiment analysis and matrix factorization. IEEE Access 11: 16994--17001 IEEE

Panagiotakis, Costas and Papadakis, Harris and Papagrigoriou, Antonis and Fragopoulou, Paraskevi (2021) Improving recommender systems via a Dual Training Error based Correction approach. Expert Systems with Applications : 115386 Elsevier

Gao, Chen and Zheng, Yu and Li, Nian and Li, Yinfeng and Qin, Yingrong and Piao, Jinghua and Quan, Yuhan and Chang, Jianxin and Jin, Depeng and He, Xiangnan and others (2023) A survey of graph neural networks for recommender systems: Challenges, methods, and directions. ACM Transactions on Recommender Systems 1(1): 1--51 ACM New York, NY, USA

Bobadilla, Jes{\'u}s and Ortega, Fernando and Hernando, Antonio and Guti{\'e}rrez, Abraham (2013) Recommender systems survey. Knowledge-based systems 46: 109--132 Elsevier

Bhat, Aruna (2014) K-medoids clustering using partitioning around medoids for performing face recognition. International Journal of Soft Computing, Mathematics and Control 3(3): 1--12 Citeseer

Panagiotakis, Costas (2015) Point clustering via voting maximization. Journal of classification 32(2): 212--240 Springer

Huang, Shangrong and Zhang, Jian and Schonfeld, Dan and Wang, Lei and Hua, Xian-Sheng (2017) Two-stage friend recommendation based on network alignment and series expansion of probabilistic topic model. IEEE Transactions on Multimedia 19(6): 1314--1326 IEEE

Berg, Rianne van den and Kipf, Thomas N and Welling, Max (2017) Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263

Yunhong Zhou and Dennis Wilkinson and Robert Schreiber and Rong Pan (2008) Large-Scale Parallel Collaborative Filtering for the Netflix Prize. Springer, 337--348, International conference on algorithmic applications in management

Sarwar, Badrul and Karypis, George and Konstan, Joseph and Riedl, John (2001) Item-based collaborative filtering recommendation algorithms. 285--295, Proceedings of the 10th international conference on World Wide Web

Goldberg, Ken and Roeder, Theresa and Gupta, Dhruv and Perkins, Chris (2001) Eigentaste: A Constant Time Collaborative Filtering Algorithm. Inf. Retr. 4(2): 133--151

Nilashi, Mehrbakhsh and Ibrahim, Othman and Bagherifard, Karamollah (2018) A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Systems with Applications 92: 507--520 Elsevier

Elahi, Mehdi and Ricci, Francesco and Rubens, Neil (2016) A survey of active learning in collaborative filtering recommender systems. Computer Science Review 20: 29--50 Elsevier

Guo, Guibing and Zhang, Jie and Thalmann, Daniel (2014) Merging trust in collaborative filtering to alleviate data sparsity and cold start. Knowledge-Based Systems 57: 57--68 Elsevier

Ma, Hao and King, Irwin and Lyu, Michael R. (2012) Mining Web Graphs for Recommendations. IEEE Trans. on Knowl. and Data Eng. 24(6): 1051--1064 https://doi.org/10.1109/TKDE.2011.18

Jain, Sarika and Grover, Anjali and Thakur, Praveen Singh and Choudhary, Sourabh Kumar (2015) Trends, problems and solutions of recommender system. IEEE, 955--958, International Conference on Computing, Communication & Automation

Zhang, Qi and Wang, Jiawen and Huang, Haoran and Huang, Xuanjing and Gong, Yeyun (2017) Hashtag recommendation for multimodal microblog using co-attention network. 3420--3426, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia

Breiman, Leo (2001) Random forests. Machine learning 45(1): 5--32 Springer

Theocharous, Georgios and Thomas, Philip S and Ghavamzadeh, Mohammad (2015) Personalized ad recommendation systems for life-time value optimization with guarantees. Twenty-Fourth International Joint Conference on Artificial Intelligence

Koren, Yehuda (2008) Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. ACM, 426--434, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), Las Vegas, Nevada, USA, https://doi.org/10.1145/1401890.1401944

Thirumalai, Chandrasegar and Chandhini, Swapna Anupriya and Vaishnavi, M (2017) Analysing the concrete compressive strength using Pearson and Spearman. IEEE, 215--218, 2, 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA)

G. Hinton. A Practical Guide to Training Restricted Boltzmann Machines.. 2010, Tech report UTML TR 2010-003, University of Toronto

Yu, Hsiang-Fu and Hsieh, Cho-Jui and Si, Si and Dhillon, Inderjit (2012) Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems. IEEE Computer Society, 765--774, Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM '12)

Herlocker, Jonathan L and Konstan, Joseph A and Terveen, Loren G and Riedl, John T (2004) Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1): 5--53 ACM

Harper, F Maxwell and Konstan, Joseph A (2016) The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5(4): 19 ACM

Zhang, Heng-Ru and Min, Fan (2016) Three-way recommender systems based on random forests. Knowledge-Based Systems 91: 275--286 Elsevier

Tsai, Chih-Fong and Hung, Chihli (2012) Cluster ensembles in collaborative filtering recommendation. Applied Soft Computing 12(4): 1417--1425 Elsevier

Logesh, R and Subramaniyaswamy, V Exploring hybrid recommender systems for personalized travel applications. Cognitive informatics and soft computing, Springer, 2019, 535--544

Xie, Weizhu and Ouyang, Yuanxin and Ouyang, Jingshuai and Rong, Wenge and Xiong, Zhang (2016) User occupation aware conditional restricted boltzmann machine based recommendation. IEEE, 454--461, Internet of Things (iThings), 2016 IEEE International Conference on

Ebesu, Travis and Fang, Yi (2017) Neural citation network for context-aware citation recommendation. ACM, 1093--1096, Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval

Panagiotakis, Costas and Papadakis, Harris and Fragopoulou, Paraskevi (2020) Personalized Video Summarization Based Exclusively on User Preferences. Springer, 305--311, European Conference on Information Retrieval

Panagiotakis, Costas and Papadakis, Harris and Fragopoulou, Paraskevi (2020) Unsupervised and Supervised Methods for the Detection of Hurriedly Created Profiles in Recommender Systems. Machine Learning and Cybernetics

Panagiotakis, Costas and Papadakis, Harris and Fragopoulou, Paraskevi (2020) A User Training Error based Correction Approach combined with the Synthetic Coordinate Recommender System. 11--16, Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization

Panagiotakis, Costas and Papadakis, Harris and Fragopoulou, Paraskevi (2018) Detection of hurriedly created abnormal profiles in recommender systems. 499--506, 2018 International Conference on Intelligent Systems (IS)

Panagiotakis, Costas and Papadakis, Harris and Grinias, Elias and Komodakis, Nikos and Fragopoulou, Paraskevi and Tziritas, Georgios (2013) Interactive image segmentation based on synthetic graph coordinates. Pattern Recognition 46(11): 2940--2952 Elsevier

Dabek, Frank and Cox, Russ and Kaashoek, Frans and Morris, Robert (2004) Vivaldi: A decentralized network coordinate system. ACM, 15--26, 4, 34, ACM SIGCOMM Computer Communication Review

Papadakis, Harris and Panagiotakis, Costas and Fragopoulou, Paraskevi (2014) Distributed detection of communities in complex networks using synthetic coordinates. Journal of Statistical Mechanics: Theory and Experiment 2014(3): P03013 IOP Publishing

Haindl, Michal and Mike{\v{s}}, Stanislav (2016) A competition in unsupervised color image segmentation. Pattern Recognition 57: 136--151 Elsevier

Pasquale Lops, Marco de Gemmis and Giovanni Semeraro Content-based Recommender Systems: State of the Art and Trends. Recommender Systems Handbook, Berlin, Heidelberg, Springer-Verlag, 2010, 73--106

Truong, Ba Tu and Venkatesh, Svetha (2007) Video abstraction: A systematic review and classification. ACM transactions on multimedia computing, communications, and applications (TOMM) 3(1): 3 ACM

GraphLab. The smallnetflix recommender systems dataset. 2012, April, http://www.select.cs.cmu.edu/code/graphlab/datasets/

GroupLens group. The MovieLens recommender systems dataset. 2003, February, http://grouplens.org/datasets/movielens/

Ken Goldberg. The Jester recommender systems dataset. 2003, May, http://www.ieor.berkeley.edu/~goldberg/jester-data/

Koren, Yehuda and Bell, Robert and Volinsky, Chris (2009) Matrix factorization techniques for recommender systems. Computer 42(8): 30--37 IEEE

Salehi, Mojtaba and Kamalabadi, Isa Nakhai (2013) Hybrid recommendation approach for learning material based on sequential pattern of the accessed material and the learner's preference tree. Knowledge-Based Systems 48: 57--69 Elsevier

He, Xiangnan and Liao, Lizi and Zhang, Hanwang and Nie, Liqiang and Hu, Xia and Chua, Tat-Seng (2017) Neural collaborative filtering. 173--182, Proceedings of the 26th international conference on world wide web

Liu, Xiaomeng and Ouyang, Yuanxin and Rong, Wenge and Xiong, Zhang (2015) Item category aware conditional restricted boltzmann machine based recommendation. Springer, 609--616, International Conference on Neural Information Processing

Papadakis, Harris and Panagiotakis, Costas and Fragopoulou, Paraskevi (2017) {SCoR}: A Synthetic Coordinate based System for Recommendations. Expert Systems with Applications 79: 8--19 Elsevier

Zhao, Zihuai and Fan, Wenqi and Li, Jiatong and Liu, Yunqing and Mei, Xiaowei and Wang, Yiqi and Wen, Zhen and Wang, Fei and Zhao, Xiangyu and Tang, Jiliang and others (2024) Recommender systems in the era of large language models (llms). IEEE Transactions on Knowledge and Data Engineering IEEE

Zhang, Junjie and Xie, Ruobing and Hou, Yupeng and Zhao, Xin and Lin, Leyu and Wen, Ji-Rong (2023) Recommendation as instruction following: A large language model empowered recommendation approach. ACM Transactions on Information Systems ACM New York, NY

Gao, Yunfan and Sheng, Tao and Xiang, Youlin and Xiong, Yun and Wang, Haofen and Zhang, Jiawei (2023) Chat-rec: Towards interactive and explainable llms-augmented recommender system. arXiv preprint arXiv:2303.14524

Tran, Du and Bourdev, Lubomir and Fergus, Rob and Torresani, Lorenzo and Paluri, Manohar (2015) Learning spatiotemporal features with 3d convolutional networks. 4489--4497, Proceedings of the IEEE international conference on computer vision

He, Xiangnan and Du, Xiaoyu and Wang, Xiang and Tian, Feng and Tang, Jinhui and Chua, Tat-Seng (2018) Outer Product-based Neural Collaborative Filtering. arXiv preprint arXiv:1808.03912

Zhang, Shuai and Yao, Lina and Sun, Aixin (2017) Deep learning based recommender system: A survey and new perspectives. arXiv preprint arXiv:1707.07435

Bamshad Mobasher and Robin D. Burke and Jeff J. Sandvig (2006) Model-Based Collaborative Filtering as a Defense against Profile Injection Attacks. 1388--1393, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, July 16-20, 2006, Boston, Massachusetts, {USA}

Genevieve Gorrell (2006) Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing. 97--104, 11th Conference of the European Chapter of the Association for Computational Linguistics

Gunawardana, Asela and Shani, Guy (2009) A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research 10(Dec): 2935--2962

Adomavicius, Gediminas and Kwon, Young (2012) Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 24(5): 896--911 IEEE

Ricci, Francesco and Rokach, Lior and Shapira, Bracha Recommender systems: introduction and challenges. Recommender systems handbook, Springer, 2015, 1--34

Turk, Ahmet Murat and Bilge, Alper (2019) Robustness analysis of multi-criteria collaborative filtering algorithms against shilling attacks. Expert Systems with Applications 115: 386--402 Elsevier

Gygli, Michael and Song, Yale and Cao, Liangliang (2016) Video2gif: Automatic generation of animated gifs from video. 1001--1009, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Elhamifar, Ehsan and Clara De Paolis Kaluza, M (2017) Online summarization via submodular and convex optimization. 1783--1791, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Gygli, Michael and Grabner, Helmut and Riemenschneider, Hayko and Van Gool, Luc (2014) Creating summaries from user videos. Springer, 505--520, European conference on computer vision

Vasudevan, Arun Balajee and Gygli, Michael and Volokitin, Anna and Van Gool, Luc (2017) Query-adaptive Video Summarization via Quality-aware Relevance Estimation. ACM, 582--590, Proceedings of the 2017 ACM on Multimedia Conference

Sun, Min and Farhadi, Ali and Seitz, Steve (2014) Ranking domain-specific highlights by analyzing edited videos. Springer, 787--802, European conference on computer vision

Xu, Jia and Mukherjee, Lopamudra and Li, Yin and Warner, Jamieson and Rehg, James M and Singh, Vikas (2015) Gaze-enabled egocentric video summarization via constrained submodular maximization. 2235--2244, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Jaimes, Alejandro and Echigo, Tomio and Teraguchi, Masayoshi and Satoh, Fumiko (2002) Learning personalized video highlights from detailed MPEG-7 metadata. IEEE, I--I, 1, Proceedings of the 2002 International Conference on Image Processing

Covington, Paul and Adams, Jay and Sargin, Emre (2016) Deep neural networks for youtube recommendations. ACM, 191--198, Proceedings of the 10th ACM Conference on Recommender Systems

Ma, Yu-Fei and Hua, Xian-Sheng and Lu, Lie and Zhang, Hong-Jiang (2005) A generic framework of user attention model and its application in video summarization. IEEE transactions on multimedia 7(5): 907--919 IEEE

Pal, Gautam and Acharjee, Suvojit and Rudrapaul, Dwijen and Ashour, Amira S and Dey, Nilanjan (2015) Video segmentation using minimum ratio similarity measurement. International journal of image mining 1(1): 87--110 Inderscience Publishers (IEL)

Gygli, Michael (2018) Ridiculously fast shot boundary detection with fully convolutional neural networks. IEEE, 1--4, 2018 International Conference on Content-Based Multimedia Indexing (CBMI)

Zhang, Ke and Chao, Wei-Lun and Sha, Fei and Grauman, Kristen (2016) Video summarization with long short-term memory. Springer, 766--782, European conference on computer vision

Yao, Ting and Mei, Tao and Rui, Yong (2016) Highlight detection with pairwise deep ranking for first-person video summarization. 982--990, Proceedings of the IEEE conference on computer vision and pattern recognition

Mahasseni, Behrooz and Lam, Michael and Todorovic, Sinisa (2017) Unsupervised video summarization with adversarial lstm networks. 1, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

del Molino, Ana García and Gygli, Michael (2018) PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation. arXiv preprint arXiv:1804.06604

Gygli, Michael and Grabner, Helmut and Van Gool, Luc (2015) Video summarization by learning submodular mixtures of objectives. 3090--3098, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Sharghi, Aidean and Laurel, Jacob S and Gong, Boqing (2017) Query-focused video summarization: Dataset, evaluation, and a memory network based approach. 2127--2136, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Panagiotakis, Costas and Papoutsakis, Konstantinos and Argyros, Antonis (2018) A graph-based approach for detecting common actions in motion capture data and videos. Pattern Recognition 79: 1--11 Elsevier

Pritch, Yael and Rav-Acha, Alex and Peleg, Shmuel (2008) Nonchronological video synopsis and indexing. IEEE Transactions on Pattern Analysis & Machine Intelligence 30(11): 1971--1984 IEEE

Money, Arthur G and Agius, Harry (2008) Video summarisation: A conceptual framework and survey of the state of the art. Journal of Visual Communication and Image Representation 19(2): 121--143 Elsevier

Panagiotakis, Costas and Doulamis, Anastasios and Tziritas, Georgios (2009) Equivalent key frames selection based on iso-content principles. IEEE Transactions on circuits and systems for video technology 19(3): 447--451 IEEE

Panda, Rameswar and Roy-Chowdhury, Amit K (2017) Collaborative summarization of topic-related videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

McLaughlin, Matthew R. and Herlocker, Jonathan L. (2004) A Collaborative Filtering Algorithm and Evaluation Metric That Accurately Model the User Experience. Association for Computing Machinery, 329--336, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’04), Sheffield, United Kingdom. https://doi.org/10.1145/1008992.1009050

Bellogín, Alejandro and Castells, Pablo and Cantador, Iván (2014) Neighbor Selection and Weighting in User-Based Collaborative Filtering: A Performance Prediction Approach. ACM Trans. Web 8(2), Article 12, 30 pages. Association for Computing Machinery. https://doi.org/10.1145/2579993

Acknowledgements

This work was supported by the project “Enhancing the Greek Safer Internet Center SaferInternet4Kids: Awareness, Helpline, Hotline”, DIGITAL EUROPE, 2024–2026.

