Identifying Customer Priorities in Online Reviews through Sequence-to-Sequence Learning with Dual Contextual Attention

Sam Rahimzadeh Holagh¹,², Jinfeng Zhou², and Bugao Xu¹,²*

¹Department of Merchandising and Digital Retailing, University of North Texas, Denton, USA

²Department of Information Science, University of North Texas, Denton, USA

*Corresponding author: Bugao Xu, bugao.xu@unt.edu

1155 Union Circle #311100, Denton, TX 76203-5017, USA

Abstract

User reviews on e-commerce platforms offer real-time feedback on customer experiences. However, the large volume and unstructured nature of reviews make it challenging for businesses to identify the key aspects that matter most to consumers. This study introduces a sequence-to-sequence learning model that integrates global positional attention with local syntactic attention to extract aspect terms capturing the product and service attributes emphasized in reviews. A multi-stage aspect summarization process, combining dimensionality reduction, semantic clustering, and LLM-based refinement, is used to distill hundreds of extracted terms into a concise set of interpretable primal words highlighting customers' major concerns. Experiments on three annotated SemEval 2014 datasets demonstrate superior extraction accuracy compared to eight established baselines. Application to five large Amazon review datasets across different domains further shows strong generalizability in identifying coherent primal terms for each dataset, providing actionable themes aligned with consumer priorities.

Keywords:

sequence-to-sequence learning

dependency parsing

aspect extraction

aspect summarization

1. Introduction

E-commerce businesses increasingly rely on consumer-generated content, especially online reviews, to understand what matters most to customers. Reviews often provide rich qualitative details about specific product attributes such as durability, usability, packaging, customer service, or price fairness [1]. These fine-grained cues are essential for retailers to evaluate product performance, diagnose emerging issues, and design data-driven marketing strategies [2, 3]. Aspect-based sentiment analysis (ABSA) applies natural language processing techniques to identify customer priorities by extracting explicit product or service aspects evaluated within reviews [4, 5]. However, ABSA continues to face several challenges in practical applications. First, customers express aspects inconsistently, often with highly variable and informal language [6]. Second, aspects may appear within complex syntactic structures or multi-token phrases, leading ABSA models to miss or mislabel key product attributes [7]. Third, retailers frequently analyze reviews spanning diverse product categories with differing vocabularies, making cross-domain generalizability essential for real-world use [8].

Recently, sequence-to-sequence learning (Seq2Seq) has garnered significant interest for its application to aspect extraction. Seq2Seq models employ an encoder–decoder architecture in which the encoder transforms the input sequence into a hidden representation, and the decoder generates a corresponding label sequence that identifies aspect terms within the text [9–11]. These models are well-suited for capturing long-range dependencies and variable-length outputs, making them attractive for consumer review analysis. Nonetheless, traditional Seq2Seq faces three major challenges. First, long and syntactically complex review sentences may lead to information loss due to vanishing gradients or memory constraints [12]. Second, Seq2Seq models without attention mechanisms often struggle to align relevant input tokens with their predicted output labels, reducing extraction accuracy for nuanced or domain-specific expressions [13]. Third, a fixed-size representation can impose limitations when dealing with highly variable expressions used by consumers to describe similar product aspects [14].

In this study, we introduce an enhanced Seq2Seq model with dual contextual attention mechanisms to detect and label aspect terms in text sequences. The model integrates global positional attention with local syntactic attention, enabling it to capture both broad semantic context and fine-grained dependency patterns essential for identifying aspect terms associated with customer sentiment. A Bi-GRU encoder–decoder architecture, paired with a fusion gating mechanism, adaptively combines global and local context vectors to improve extraction accuracy. This design allows the model to recognize product and service attributes even when they appear as multi-word expressions or within complex syntactic structures. Once aspect terms are extracted from a dataset, we apply a multi-stage summarization process combining principal component analysis (PCA), clustering, and LLM refinement to condense hundreds of terms into a small set of abstract and interpretable terms, or primal words. These primal words capture the core attributes that reflect customer priorities in reviews, such as performance, usability, reliability, aesthetics, or price value, providing businesses with clear, data-driven insight into what matters most to consumers.

The remainder of the paper is structured as follows. Section 2 reviews relevant literature on aspect extraction and Seq2Seq methods. Section 3 presents the proposed dual-context Seq2Seq model and the aspect summarization pipeline. Section 4 reports experiments on the SemEval-2014 restaurant and laptop datasets and on Amazon review data across five product categories, comparing the proposed model with established baselines. Section 5 concludes with key findings and managerial implications.

2. Related work

Aspect extraction, the first step in the fine-grained sentiment analysis of user reviews, has been extensively studied in recent years and can be broadly categorized into four main approaches: rule-based, supervised learning, unsupervised learning, and hybrid models [15]. Each approach may leverage dependency relations, implicit expressions, and other linguistic features to accurately identify and extract specific attributes from textual data [16].

2.1. Rule-based approach

Rule-based methods use predefined linguistic rules or patterns to identify aspects in text. Linguistic features often include syntactic patterns, part-of-speech (POS) tags, and dependency relations within sentences [17]. Rules may be crafted to detect implicit expressions or contextual cues that indicate the presence of certain aspects [18]. By defining rules based on linguistic features, aspect extraction systems can effectively parse and extract relevant aspects from text data. Rule-based aspect extraction is often implemented in conjunction with other methods, such as machine learning techniques, to improve accuracy and coverage. Alqaryouti et al. [19] combined a lexicon with a rule-based approach to extract explicit and implicit aspects and to perform sentiment classification for these aspects. Venugopalan and Gupta [20] utilized a hierarchical rule-based method for aspect term extraction that prioritizes high recall while minimizing false positives. Although rule-based methods can be highly accurate for the datasets they are tuned on, their adaptability is limited because extensive manual effort is needed to create and update rules for different domains. Developing enough rules to categorize the extracted aspect terms is labor-intensive because of the variability and ambiguity of natural language.

The syntactic relationships of words in a sentence can be determined automatically with dependency parsing, a procedure that examines the grammatical structure of the sentence. Wang et al. [21] constructed an RNN-based dependency tree, called the Recursive Neural Conditional Random Fields (RNCRF) model, to learn sentence feature representations and then to mark aspect labels by introducing a Conditional Random Field (CRF). Luo et al. [22] presented a model (BiDTreeCRF) based on a bidirectional dependency tree network and a Bi-LSTM that extracts syntactic and contextual features and feeds them into a CRF to predict aspect labels. Dai et al. [23] developed the Rule Incorporated Neural Aspect and Opinion Term Extraction (RINANTE) model, which automatically mines extraction rules based on syntactic information and applies these rules to label aspects. An LSTM-CRF neural network is then trained to predict aspects based on the automatically labeled aspects and the ground truth labels. Liang et al. [24] introduced the Dependency Relation Embedded Graph Convolutional Network (DREGCN), which models the dependency relations of words in sentences using a graph convolutional network and applies an effective message-passing mechanism to enhance model learning for aspect extraction.

2.2. Unsupervised learning approach

Unsupervised techniques do not require labeled data. Instead, they might use clustering algorithms to group similar terms or phrases, assuming each cluster represents a distinct aspect. Topic modeling techniques like Latent Dirichlet Allocation (LDA) are also popular for identifying topics within text, which can correspond to aspects [25, 26]. Probabilistic Latent Semantic Analysis (PLSA) is another topic modeling method, in which two probability distributions, the document-topic and topic-word distributions, are estimated [27, 28]. The former describes the probability of topics occurring in a document, and the latter represents the probability of words appearing in a given topic [29]. Topic-based approaches may not be able to capture the contextual information of documents or the long-term dependencies of long sentences [30]. Another challenge with unsupervised methods is that the aspects they extract might not always align with the semantic categories of interest, requiring post-processing or human validation [31].

2.3. Supervised learning approach

Supervised methods use labeled data to train models that can recognize and extract aspects. Techniques range from traditional machine learning models such as Support Vector Machines (SVM) and Random Forests to neural network architectures, including Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) [32]. Recently, many deep learning models have been developed to capture complex language patterns for aspect extraction. For instance, Alslaity and Orji [33] integrated a set of rule-based techniques with a deep network to enhance the performance of both aspect extraction and sentiment scoring. Wang et al. [34] employed coupled multi-layer attentions (CMLA) to learn the relations between tokens in a sentence, exploiting indirect relations between terms for more precise information extraction. Li et al. [35] introduced the Truncated History-Attention (THA) model, which employs an LSTM to generate initial word representations and integrates aspect detection history into the current aspect representation through truncated history attention, creating a history-aware aspect representation for aspect extraction. Xu et al. [36] developed the Double Embeddings CNN (DE-CNN), which concatenates general-purpose embeddings with domain-specific embeddings in a CNN model for aspect extraction. Tran et al. [37] presented the Bi-GRU-CRF model, combining a bidirectional Gated Recurrent Unit (Bi-GRU) with word embeddings and a CRF in a unified framework for extracting aspects. Luo et al. [38] proposed a dual cross-shared RNN framework (DOER) to determine the polarity of aspect terms, incorporating a residual gated unit (ReGU) to improve feature extraction capability. Chen et al. [39] applied BERT layers and directional graph convolutional networks (D-GCN) to perform aspect extraction while encoding syntactic information.

2.4. Hybrid approach

Hybrid methods combine the strengths of rule-based, supervised, and unsupervised approaches to improve aspect extraction performance. A hybrid system might use rule-based filtering to pre-process data, apply supervised learning for initial aspect extraction, and then refine the results using unsupervised clustering. For example, Chauhan et al. [40] used a two-step mixed unsupervised model that combines linguistic patterns with deep learning techniques to improve aspect extraction from unstructured data on social networks. Pereg et al. [41] combined linguistic information with a self-attention mechanism and the BERT model to extract aspects. Xu et al. [42] developed a deep learning method called DomBERT, which combines an in-domain corpus with relevant domain corpora for ABSA. Kotagiri et al. [43] applied reptile search optimization, extreme gradient boosting, and deep learning methods to aspect extraction and sentiment analysis across multiple datasets. Venugopalan and Gupta [44] enhanced the LDA model using BERT-based semantic filtering, a particle swarm optimization strategy, and linguistic rules. Liu et al. [8] used a hybrid graph-based multi-grained convolution (CMGC) model with an attention mechanism to perform sentiment analysis of consumer reviews on automotive websites. Hybrid approaches aim to balance the precision of rule-based and supervised methods with the broad applicability of unsupervised techniques [32].

3. Methodology

In this study, we approach aspect extraction as a sequence labeling problem and propose a supervised sequence-to-sequence aspect extraction model (Seq2Seq) to identify important aspects explicitly mentioned in a review. A sentence of N words is denoted by a sequence of tokens, $X = (x_1, x_2, \dots, x_N)$. The model’s output is a sequence of token-level labels, $Y = (y_1, y_2, \dots, y_N)$, where each $y_i$ belongs to the set $\{B, I, O\}$, indicating the beginning (B), inside (I), and outside (O) of an aspect phrase. Figure 1 provides an example with all the tokens labelled as B, I or O. An aspect phrase may contain one token (e.g., ‘price’) or multiple tokens (e.g., ‘laptop appearance’). Only tokens labeled B or I are considered part of an aspect phrase; tokens labeled O are excluded from aspect extraction. The ultimate task of the proposed Seq2Seq model is to predict a label (B, I or O) for each token $x_t$ in the review text.

Fig. 1

Example of aspect labeling.
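The B/I/O scheme can be illustrated with a short sketch that turns a predicted label sequence back into aspect phrases. The function name `extract_aspects` and the example sentence are hypothetical and are not taken from Fig. 1; the sketch only assumes that a phrase starts at a B label and continues through the I labels that follow it.

```python
from typing import List

def extract_aspects(tokens: List[str], labels: List[str]) -> List[str]:
    """Collect aspect phrases from token-level B/I/O labels."""
    aspects, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":                  # a new aspect phrase starts here
            if current:
                aspects.append(" ".join(current))
            current = [token]
        elif label == "I" and current:    # continue the open phrase
            current.append(token)
        else:                             # "O" (or a stray "I") closes any open phrase
            if current:
                aspects.append(" ".join(current))
            current = []
    if current:                           # phrase that runs to the end of the sentence
        aspects.append(" ".join(current))
    return aspects

# Hypothetical example sentence and labels (illustration only).
tokens = ["The", "laptop", "appearance", "is", "nice", "but", "the", "price", "is", "high"]
labels = ["O", "B", "I", "O", "O", "O", "O", "B", "O", "O"]
print(extract_aspects(tokens, labels))
```

In this sketch a stray I label with no preceding B is treated like O, which is one possible convention and not necessarily the one used in the experiments.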

3.1. Framework

Figure 2 illustrates the architecture of the proposed Seq2Seq model, designed to extract aspect phrases or words from the input text sequence $X$ by generating the corresponding label sequence $Y$. Given that aspect phrases often span multiple tokens, the model must effectively capture both the contextual relationships and dependencies between these tokens. Properly labeling multi-token aspects also necessitates maintaining label consistency across dependent tokens, which requires an in-depth understanding of their syntactic structure. To achieve this, the model incorporates a global position-aware attention mechanism, which dynamically weighs the importance of each token based on its semantic and positional relevance [45]. The position-aware alignment score is enhanced by incorporating dependency parsing, which encodes the syntactic relationships between tokens as positional biases [46]. This approach ensures that the model attends to both the most relevant tokens and their syntactic dependencies, addressing inconsistencies that arise in multi-token aspect labeling [47, 48]. Furthermore, aspects often exhibit syntactic dependencies that should be captured by the model. The integration of position and dependency-aware attention enables the framework to generalize across domains, capturing cross-domain relationships while maintaining accuracy in domain-specific contexts [46, 47].

Fig. 2

The framework of the proposed Seq2Seq model for aspect extraction.

The proposed Seq2Seq model mainly comprises an embedding layer, multiple bidirectional GRU (Bi-GRU) layers, and attention mechanism layers. In the embedding layer, Google's pre-trained word2vec embedding transforms each word token $x_t$ of a review sentence $X = (x_1, x_2, \dots, x_N)$ into an embedding vector $e_t \in \mathbb{R}^d$, where d is the dimension of the embedding feature. The embedding output that corresponds to $X$ is denoted as $e = (e_1, \dots, e_t, \dots, e_N)$. The Bi-GRU layer connects multiple recurrent units in series to produce a hidden state representation, $h = (h_1, \dots, h_t, \dots, h_N)$, for all N tokens. The Bi-GRU architecture ensures that the model captures both forward and backward dependencies, enhancing the contextual understanding of the sequence [45].

Incorporating both positional and syntactic information, attention mechanisms are applied to generate the global and local context vectors by dynamically focusing on the most relevant parts of the input sequence. The global attention mechanism first computes alignment scores between the encoder and decoder states. These scores are then normalized using the SoftMax function to produce attention weights $\alpha_{ti}$, which are probabilities that reflect the importance of each encoder hidden state $h_i$ at the current time step t. Each $h_i$ is subsequently weighted by its corresponding $\alpha_{ti}$, generating the global context vector $g_t$. This vector captures the most relevant semantic information from the input, aligned with the decoder's focus at time step t.

In addition to the global context vector $g_t$, the model computes a local context vector $l_t$, which emphasizes tokens that are syntactically connected to the current token based on dependency parsing. These syntactic relationships, such as modifiers or dependents, are derived from a dependency tree of the input sequence. Unlike the global attention mechanism, which considers all encoder hidden states $h_i$, the local attention mechanism focuses only on a subset corresponding to the syntactic children of the current token. Local alignment scores and attention weights are then computed similarly to the global attention mechanism.

Next, $g_t$ and $l_t$ are integrated through a context fusion gate. The gate, regulated by a sigmoid function, controls the contribution of each context vector, ensuring that the fused context vector $c_t$ captures both global semantics and local syntax. $c_t$ is then concatenated with the embedding of the previous output token $y_{t-1}$ and fed into the decoder’s GRU, along with the previous decoder hidden state $s_{t-1}$. This concatenated input results in more contextually aware and coherent sequence labeling. If incorrect token labels are generated, the model can dynamically adjust its focus and predictions during subsequent decoding steps through this feedback. This mechanism enhances the model’s ability to capture long-range dependencies and maintain label consistency across related tokens by leveraging both the syntactically and semantically informed context in conjunction with the decoder’s evolving internal state.

The decoder’s hidden state is passed through a linear layer followed by a SoftMax layer [49] to determine the predicted label $\hat{y}_t$ for the token $x_t$, as follows:

$\hat{y}_t = \mathrm{Softmax}(V s_t + b)$ (1)

where $V \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^{n}$ are the trainable weight matrix and bias vector. The iteration continues until the decoder reaches the predefined maximum sequence length or generates an end-of-sequence (<EOS>) token. Consequently, the model predicts a complete label sequence $\hat{y} = (\hat{y}_1, \hat{y}_2, \dots, \hat{y}_N)$.

To train the model, a set of labeled review sentences, $\mathcal{D}_S = (x_S, y_S)$, was used to minimize the loss function $\mathcal{L}$, which is calculated with cross entropy and L2 regularization as follows [50]:

$\mathcal{L} = -\sum_{t=1}^{N} y_t \log(\hat{y}_t) + \frac{\lambda}{2} \|w\|_2^2$ (2)

where w is a vector of the trainable weight parameters in the proposed Seq2Seq model, and λ is the hyperparameter of L2 regularization. The Adam optimizer was used to minimize the loss function. Further details regarding this framework are provided in the following sections.
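As a numerical illustration of the loss, the sketch below evaluates the cross-entropy term and the L2 penalty for a two-token sequence. It assumes one-hot targets $y_t$ over the label order (B, I, O); the function name `sequence_loss`, the probabilities, the weight vector, and the value of λ are all hypothetical.

```python
import math
from typing import List

def sequence_loss(y_true: List[List[float]], y_pred: List[List[float]],
                  weights: List[float], lam: float) -> float:
    """Cross entropy summed over tokens plus (lam / 2) * ||w||_2^2."""
    cross_entropy = 0.0
    for target, probs in zip(y_true, y_pred):
        # With one-hot targets only the probability of the true label contributes.
        cross_entropy -= sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
    l2_penalty = 0.5 * lam * sum(w * w for w in weights)
    return cross_entropy + l2_penalty

# Hypothetical two-token example with label order (B, I, O).
y_true = [[1, 0, 0], [0, 0, 1]]                  # true labels: B, then O
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]      # SoftMax outputs of the decoder
loss = sequence_loss(y_true, y_pred, weights=[0.5, -0.5], lam=0.01)
print(round(loss, 4))
```

Here the cross-entropy term reduces to $-\log 0.7 - \log 0.8$, and the penalty adds $0.005 \times (0.25 + 0.25)$.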

3.2. Bi-GRU layer

In the proposed model, a serial Bi-GRU structure is employed to capture information from both preceding and subsequent time steps to generate the current state. The serial Bi-GRU layer consists of a forward and a backward GRU. The forward GRU, denoted as $\overrightarrow{f}$, processes the input sequence in its original order, producing a sequence of forward hidden states, $(\overrightarrow{h_1}, \overrightarrow{h_2}, \dots, \overrightarrow{h_N})$. Here, $f(\cdot)$ represents a non-linear transformation function such as tanh or ReLU. The backward GRU, denoted as $\overleftarrow{f}$, reads the sequence in the reverse order, yielding a sequence of backward hidden states, $(\overleftarrow{h_1}, \overleftarrow{h_2}, \dots, \overleftarrow{h_N})$. For token $x_t$, its embedding vector is $e_t$ and the hidden state at the preceding time step is $h_{t-1}$. The forward and backward hidden states, $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, are computed as follows:

$\overrightarrow{h_t} = \overrightarrow{f}(W e_t + U h_{t-1})$ (3)

$\overleftarrow{h_t} = \overleftarrow{f}(W e_t + U h_{t+1})$ (4)

where $W \in \mathbb{R}^{m \times d}$ is a weight matrix for the input embeddings, and $U \in \mathbb{R}^{m \times m}$ is a recurrent weight matrix for the hidden state, with m representing the dimension of the hidden state. The final hidden state $h_t$ of token $x_t$ is obtained by concatenating $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, as described in Eq. (5) [51]:

$h_t = \text{Bi-GRU}(e_t, h_{t-1}, h_{t+1}) = [\overrightarrow{h_t} : \overleftarrow{h_t}]$ (5)
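The bidirectional recurrence and the concatenation in Eq. (5) can be sketched in plain Python. This is a deliberately simplified illustration: it uses the single-step form $\tanh(W e_t + U h)$ written above and omits the update and reset gates of a full GRU. The zero initial states, the helper names, and all numeric values are assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def step(W: Matrix, U: Matrix, e: Vector, h: Vector) -> Vector:
    """One recurrence step tanh(W e + U h); GRU gating is omitted here."""
    return [math.tanh(a + b) for a, b in zip(matvec(W, e), matvec(U, h))]

def bidirectional_states(embeddings: List[Vector], W: Matrix, U: Matrix) -> List[Vector]:
    m, n = len(U), len(embeddings)
    forward, h = [], [0.0] * m            # assumed zero initial state
    for e in embeddings:                  # original order: uses h_{t-1}
        h = step(W, U, e, h)
        forward.append(h)
    backward, h = [None] * n, [0.0] * m
    for t in range(n - 1, -1, -1):        # reverse order: uses h_{t+1}
        h = step(W, U, embeddings[t], h)
        backward[t] = h
    # Concatenate the forward and backward states of each token.
    return [f + b for f, b in zip(forward, backward)]

# Hypothetical toy parameters: d = 2 embedding dimensions, m = 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.1, 0.0], [0.0, 0.1]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = bidirectional_states(embeddings, W, U)
print(len(states), len(states[0]))
```

Each resulting state has dimension 2m, since the forward and backward halves are concatenated token by token.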

3.3. Position aware global attention mechanism

It is assumed that context words at different relative positions have varying effects on the representation learning of aspect terms [52], and that context words located closer to the aspect term may have a greater influence. Inspired by the attention mechanism and position awareness, global position-aware attention models have been proposed by researchers such as Zhao et al. [47] and Xu et al. [42]. These models incorporate all hidden states of the encoder and integrate their positional information when deriving the context vector.

The designed global attention mechanism begins by calculating alignment scores that measure the relevance of each encoder hidden state $h_i$ to the decoder’s hidden state at the previous step, $s_{t-1}$. This relevance is determined by combining $s_{t-1}$ and $h_i$ through a learned function, typically involving a concatenation of the two vectors, resulting in a raw alignment score. At each decoding step, $s_{t-1}$ is concatenated with $h_i$, forming a joint representation that encapsulates both the input sequence's contextual features and the decoder's current focus. Without $s_{t-1}$, the attention mechanism would rely only on $h_i$, which could result in less accurate focus. The decoder state $s_{t-1}$ provides a summary of what has been generated so far and what the model might need to focus on next [47].

The framework of the global position-aware attention model is presented in Fig. 3.

Fig. 3

Weighted sum in dynamically position-aware global attention mechanism.

Based on their position indices, the position distance $d(x_i, x_t)$ is defined as the sub-linear function $\log_2(2 + l)$ in Eq. (6), where $l$ is the relative distance between encoder token $x_i$ and the decoder’s current focus token $x_t$. We adopt this logarithmic form because it provides a smooth attenuation of syntactic influence: nearby tokens receive stronger emphasis, while more distant tokens are gradually down-weighted without completely suppressing long-range dependencies that often carry meaningful aspect–opinion relations [53]. Then, the concatenation of the previous decoder hidden state with each encoder hidden state is used to calculate the alignment scores. The word position distance serves as the weight decay rate for each word to optimize the alignment scores, as seen in Eq. (7). Subsequently, the attention weights are computed using the SoftMax function, as shown in Eq. (8). The encoder’s hidden states are multiplied by their respective attention weights and summed to form the global context vector $g_t$, which is defined in Eq. (9).

$d(x_i, x_t) = \log_2(2 + l)$ (6)

$f_g(s_{t-1}, h_i) = \frac{1}{d(x_i, x_t)} v_a^T \tanh(W_a [h_i : s_{t-1}])$ (7)

$\alpha_{ti} = \frac{\exp(f_g(s_{t-1}, h_i))}{\sum_{j=1}^{N} \exp(f_g(s_{t-1}, h_j))}$ (8)

$g_t = \sum_{i=1}^{N} \alpha_{ti} h_i$ (9)

where $W_a$ is a trainable weight matrix and $v_a$ is a trainable weight vector.
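The chain from Eq. (6) to Eq. (9) can be traced with a small sketch. Two assumptions are made for illustration: the relative distance is taken as $l = |i - t|$, and the learned term $v_a^T \tanh(W_a [h_i : s_{t-1}])$ is replaced by given raw scores, since its parameters come from training. The function names and all numbers are hypothetical.

```python
import math
from typing import List, Tuple

def position_distance(i: int, t: int) -> float:
    """Eq. (6), assuming the relative distance is l = |i - t|."""
    return math.log2(2 + abs(i - t))

def global_attention(raw_scores: List[float], hidden_states: List[List[float]],
                     t: int) -> Tuple[List[float], List[float]]:
    # Eq. (7): scale each raw score by 1 / d(x_i, x_t).
    scaled = [s / position_distance(i, t) for i, s in enumerate(raw_scores)]
    # Eq. (8): SoftMax over all encoder positions (shifted by the max for stability).
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Eq. (9): global context vector as the weighted sum of encoder states.
    dim = len(hidden_states[0])
    g = [sum(a * h[k] for a, h in zip(alphas, hidden_states)) for k in range(dim)]
    return alphas, g

# Hypothetical inputs: four tokens with identical raw scores, decoder focus at t = 1.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
raw_scores = [1.0, 1.0, 1.0, 1.0]
alphas, g = global_attention(raw_scores, hidden_states, t=1)
print([round(a, 3) for a in alphas])
```

With equal positive raw scores, the weight peaks at the focus token and decays with distance. Because the decay enters as a division of the score, it favors nearby tokens only when the raw score is positive.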

Combining $s_{t-1}$ with the encoder hidden states $h_i$ enables the attention mechanism to dynamically assign weights to input tokens based on their relevance to the decoder’s hidden state at step t-1. As in neural machine translation, this mechanism dynamically focuses on important parts of the input sequence, effectively aligning encoder and decoder interactions. Computed as a weighted sum of the $h_i$, the context vector $g_t$ bridges this alignment through the attention weights $\alpha_{ti}$, which capture the interplay between past decoding outputs and the input sequence.

3.4. Syntactic-aware local attention mechanism

Dependency parsing plays a critical role in syntactic analysis by identifying the relationships between words in a sentence [48]. In this study, we integrate dependency parsing with the encoder’s output vectors (the contextual representations of tokens) to enhance the model’s ability to identify aspects across different domains. Reviews from various domains exhibit recurring syntactic patterns that facilitate information propagation. For example, syntactic relationships such as subject-predicate and subject-adjective connections between aspect terms and descriptive words are consistently observed across domains, as seen in Fig. 4. In the sentences “The screen is great” and “The pizza is nice,” the parser identifies “screen” and “pizza” as subjects linked to their descriptive words “great” and “nice.” Although the vocabularies are different, the syntactic pattern remains the same, helping the model learn generalizable aspect relationships across domains. These patterns form the basis for transferring syntactic knowledge, enabling models to extract aspects more effectively [21, 41, 54].

Fig. 4

Syntactic relationship of sentences from two different domains (det: determiner; nsubj: nominal subject; acomp: adjectival complement).

Unlike the global attention mechanism, which attends to all encoder hidden states, the dependency-based local attention mechanism selectively focuses on the syntactic children of each token. These relationships are determined using a dependency parse tree, which starts with a root token that is usually the main verb of a sentence. In this tree, every word is treated as a node, and its direct edges represent syntactic dependencies such as “nsubj,” “acomp,” “conj,” etc. As illustrated in Fig. 5, the token “staff” has three children, “the,” “and,” and “food”; both the token and the children are considered when calculating the local context vector. In this study, we used the spaCy library to generate a dependency parse tree for each sentence.

Fig. 5

Dependency parsing information (det: determiner; cc: coordinating conjunction; nsubj: nominal subject; acomp: adjectival complement; conj: conjunct; Root: usually the main verb).
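The parent-child structure that the local attention relies on can be recovered from a list of head indices. The sketch below hard-codes the heads of the example sentence, following the token rows of Table 1, so that it runs without a parser; in practice these indices would come from a dependency parser such as spaCy. The function name `children_of` is hypothetical, and the root is marked as its own head, which is the convention spaCy uses.

```python
from typing import Dict, List

# Tokens and head indices of the example sentence, hard-coded for illustration.
tokens = ["The", "staff", "and", "the", "food", "are", "exceptional"]
heads = [1, 5, 1, 4, 1, 5, 5]   # heads[i] is the index of token i's head

def children_of(heads: List[int]) -> Dict[int, List[int]]:
    """Map every token index to the indices of its syntactic children."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for i, head in enumerate(heads):
        if head != i:                 # skip the root, which points to itself
            children[head].append(i)
    return children

children = children_of(heads)
for i, token in enumerate(tokens):
    print(i, token, [tokens[c] for c in children[i]])
```

For the token “staff” (index 1) this yields the children at indices 0, 2, and 4, matching the example in Fig. 5.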

The parse tree allows the model to identify the linguistic roles of words and perform the attention mechanism more precisely [55]. The encoder’s hidden states are structured according to the dependency tree of the sentence. Each word is considered as a parent node (or a parent head) connecting with its child nodes, and a child node can be a parent node for another set of child nodes. Table 1 displays the dependency parsing information of the sentence “The staff and the food are exceptional,” where the token “are” was identified as Root. At time step t = 1, the token “staff” has three children in the dependency tree: “the” (i = 0), “and” (i = 2), and “food” (i = 4). At time step t, if a parent node (head) has m child nodes and their hidden states are $h_i$ (parent) and $h_{c_1}, h_{c_2}, \dots, h_{c_m}$ (children), a tanh transformation function can be established to compute the attention score as follows:

$f_l(s_{t-1}, h_i) = v_a^T \tanh\left(W_p [h_i : s_{t-1}] + \sum_{j=c_1}^{c_m} W_c [h_j : s_{t-1}]\right)$

where $W_p$ and $W_c$ are the weight matrices for the parent node and child nodes, and $v_a$ is a trainable weight vector. Similar to the global attention mechanism, the decoder’s hidden state $s_{t-1}$ from the previous time step is concatenated with the encoder’s hidden states before applying the tanh activation function. The function $f_l(s_{t-1}, h_i)$ allows the attention mechanism to focus on grammatically important words rather than merely adjacent ones. The attention weight $\alpha_{ti}$ for $h_i$ at time step t can be obtained by normalizing $f_l(s_{t-1}, h_i)$ with a SoftMax function in the same way as Eq. (8). The local context vector $l_t$ is computed as the weighted sum of the relevant hidden states:

$l_t = \sum_{i=1}^{N} \alpha_{ti} h_i$

Here, $\alpha_{ti}$ represents the relevance of the i-th token in contributing to the local context vector $l_t$ at time step t, effectively capturing the local syntactic context for the current token.
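One way to read the local context vector is as a SoftMax restricted to the current token and its syntactic children, with zero weight on every other position so that the sum over all N tokens still applies. The sketch below follows that reading, which is an assumption and not a statement of the exact implementation; the scores stand in for the learned function $f_l$, and the function name and numbers are hypothetical.

```python
import math
from typing import List, Tuple

def local_attention(scores: List[float], hidden_states: List[List[float]],
                    focus: List[int]) -> Tuple[List[float], List[float]]:
    """SoftMax over the focus positions only; all other positions get weight 0."""
    peak = max(scores[i] for i in focus)
    exps = {i: math.exp(scores[i] - peak) for i in focus}
    total = sum(exps.values())
    alphas = [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]
    dim = len(hidden_states[0])
    local_context = [sum(a * h[k] for a, h in zip(alphas, hidden_states))
                     for k in range(dim)]
    return alphas, local_context

# Hypothetical inputs for a seven-token sentence: the token at index 1 with
# children at indices 0, 2, and 4, and equal scores for every position.
hidden_states = [[float(i), 1.0] for i in range(7)]
scores = [0.0] * 7
alphas, local_context = local_attention(scores, hidden_states, focus=[0, 1, 2, 4])
print(alphas, local_context)
```

With equal scores, each of the four focus positions receives a weight of 0.25 and the remaining three positions contribute nothing to the local context vector.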

Table 1

Dependency parsing information.
Time step t	Token	Token head	Token’s children	Child index [i]	Attention weights
0	the	staff	-	[0]	α_t0
1	staff	are	the, and, food	[0, 2, 4]	[α_t0, α_t2, α_t4, α_t1]
2	and	staff	-	[2]	α_t2
3	the	food	-	[3]	α_t3
4	food	staff	the	[3, 4]	[α_t3, α_t4]
5	are	Root	staff, exceptional	[1, 6]	[α_t1, α_t6, α_t5]
6	exceptional	are	-	[6]	α_t6

3.5. Context fusion gate

To integrate the global and local context vectors, a context fusion gate is built as shown in Fig. 6. The gate applies the sigmoid activation function (σ) to a linear transformation of the concatenated global and local context vectors, as shown in Eq. (13), which automatically controls how much of the final context vector is taken from each of them. The final context vector is then obtained by weighting the global and local vectors with their corresponding importance weights, $1 - z_t$ and $z_t$ [56]:

$z_t = \sigma(W_c [g_t : l_t])$, (13)

$c_t = (1 - z_t) \times g_t + z_t \times l_t$, (14)

where σ is the sigmoid activation function, which outputs the gating coefficient $z_t$, and $W_c$ is a trainable weight matrix used in the fusion gate. The fusion gate dynamically balances information between the global context vector and the local context vector: a value of $z_t$ close to 1 favors the local context, while a value close to 0 favors the global context. This fusion keeps only the most relevant information for the decoding.
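A minimal sketch of the fusion gate follows, reading Eq. (14) as a convex combination of the global and local context vectors. The element-wise (vector-valued) gate, the function names, and the toy values are assumptions made for illustration; the text does not state whether $z_t$ is a scalar or a vector.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fusion_gate(W_c, g_t, l_t):
    """z_t = sigmoid(W_c [g_t : l_t]); c_t = (1 - z_t) * g_t + z_t * l_t (element-wise)."""
    concat = g_t + l_t                                   # list concatenation plays the role of [g_t : l_t]
    z_t = [sigmoid(sum(w * x for w, x in zip(row, concat))) for row in W_c]
    c_t = [(1.0 - z) * g + z * l for z, g, l in zip(z_t, g_t, l_t)]
    return z_t, c_t

# Toy values (assumed): all-zero weights give z_t = 0.5, i.e. an equal blend.
g_t = [1.0, 2.0]
l_t = [3.0, 4.0]
W_c = [[0.0] * 4, [0.0] * 4]
z_t, c_t = fusion_gate(W_c, g_t, l_t)
```

With a trained $W_c$, each component of $z_t$ would move toward 1 when the local syntactic context is more informative and toward 0 when the global context is.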

Fig. 6

Context fusion gate.

3.6. Input feeding mechanism

The input-feeding strategy has been implemented to enhance decoder performance in Seq2Seq models. Cha et al. [57] incorporated context vectors $c_t$ into subsequent hidden states, implicitly achieving a "coverage" effect that maintains alignment information across decoding steps. Similarly, Xu et al. [36] introduced a doubly attentional approach, which enforces training constraints to distribute attention evenly across all input parts, ensuring global coverage but limiting the model's flexibility. Kumar et al. [58] used attentional vectors as feedback to maintain past alignment information and guide the decoder's focus. Our approach builds on these techniques by explicitly concatenating the previous prediction $\hat{y}_{t-1}$ with the current context vector $c_t$ as the input to the decoding Bi-GRU. Let $E_{t-1}$ denote the embedding of the previously generated token label $\hat{y}_{t-1}$. The vector $q_t$ is formed by concatenating $E_{t-1}$ and $c_t$, i.e., $q_t = [E_{t-1} : c_t]$. The decoder’s Bi-GRU then updates its hidden state at time step t as follows:

$s_t = \text{Bi-GRU}(q_t, s_{t-1}, s_{t+1})$

This explicit concatenation enables the decoder to adapt its focus based on both past outputs and the current input context, which enhances task-specific contextual awareness by carrying alignment information from one decoding step to the next.
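The feedback loop behind input feeding can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: the B/I/O label set, the one-dimensional embeddings and contexts, and `toy_step`, which stands in for the decoder cell plus output layer. The sketch runs in a single forward direction and does not model the backward state $s_{t+1}$ of the Bi-GRU.

```python
def decode_with_input_feeding(contexts, label_emb, step, start_label, s0):
    """Greedy decoding in which each step consumes q_t = [E_{t-1} : c_t]."""
    prev_label, state, outputs = start_label, s0, []
    for c_t in contexts:
        q_t = label_emb[prev_label] + c_t          # concatenate E_{t-1} and c_t
        state, prev_label = step(q_t, state)       # `step` is a stand-in for the decoder
        outputs.append(prev_label)
    return outputs

# Toy stand-ins (assumptions, not the networks used in the paper).
label_emb = {"O": [0.0], "B": [1.0], "I": [2.0]}

def toy_step(q_t, state):
    prev_emb, context = q_t[0], q_t[-1]
    if context <= 0.5:
        label = "O"
    elif prev_emb == 0.0:      # previous label was "O": start a new aspect term
        label = "B"
    else:                      # previous label was "B" or "I": continue the term
        label = "I"
    return state + context, label

contexts = [[0.1], [0.9], [0.8], [0.2]]
labels = decode_with_input_feeding(contexts, label_emb, toy_step, "O", 0.0)
```

The second and third steps see similar context values but produce different labels, because the embedding of the previous prediction is part of the input.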

4. Experiment

In this study, software development and model performance evaluation were conducted on a PC equipped with an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz, an Nvidia GeForce GTX 1070 GPU, and 32 GB of RAM. The aspect extraction and summarization algorithms were implemented with Python 3.7 and PyTorch 1.7.1.

In Google's word2vec embedding layer, the embedding vectors were set to a dimension of 300. Each word token was mapped to a unique vector using a lookup function. For out-of-vocabulary words, vectors were initialized randomly with a uniform distribution $U(-\frac{1}{\sqrt{300}}, \frac{1}{\sqrt{300}})$. Embedding weights were initialized using a normal distribution. Dependency parsing for each sentence was performed using the spaCy library. The model was trained with the Adam optimizer at a learning rate of 0.001 and a batch size of 32. To mitigate overfitting, dropout and early stopping were employed, with a dropout rate of 0.2 and an early stopping patience of 20 epochs. Model performance was evaluated by comparing the predicted labels to annotated ground truth labels. The F1 score, which provides a balance of precision and recall, was used for both model evaluation and ablation studies.
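The out-of-vocabulary initialization described above can be sketched as follows. The function name `build_lookup`, the fixed seed, and the placeholder "pretrained" vector are illustrative assumptions; real word2vec vectors would be loaded from the pretrained model instead.

```python
import math
import random

DIM = 300
BOUND = 1.0 / math.sqrt(DIM)          # limit of U(-1/sqrt(300), 1/sqrt(300))

def build_lookup(vocab, pretrained, seed=0):
    """Map each word to its pretrained vector, or to a random uniform vector if it is missing."""
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if word in pretrained:
            table[word] = pretrained[word]
        else:
            table[word] = [rng.uniform(-BOUND, BOUND) for _ in range(DIM)]
    return table

pretrained = {"battery": [0.01] * DIM}     # placeholder values, not real word2vec weights
table = build_lookup(["battery", "touchpad"], pretrained)
```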

4.1. Baselines

In this study, we compared our model against eight baseline models for aspect extraction. These baselines consist of four deep learning-based models and four models augmented with dependency parsing. The four deep learning-based models are:

CMLA (Coupled Multi-Layer Attention): Utilizes coupled multi-layer attention mechanisms to capture indirect relationships between terms for more accurate information extraction [34].

DE-CNN (Double Embeddings Convolutional Neural Network): Combines general-purpose embeddings with domain-specific embeddings within a CNN framework to improve aspect extraction [36].

THA (Truncated History-Attention): Uses LSTM to generate initial word representations and incorporates aspect detection history into the current aspect representation [35].

Seq2Seq: Applies sequence-to-sequence learning specifically tailored for aspect term extraction [9].

The four dependency parsing-augmented models, which incorporate syntactic knowledge into deep neural networks, are:

RNCRF (Recursive Neural Conditional Random Fields): Employs a recursive neural network over the dependency tree of each sentence to learn feature representations [21].

BiDTreesCRF (Bidirectional Dependency Tree CRF): Uses a bidirectional dependency tree network to extract syntactic features and integrates these features into a Bi-LSTM model [22].

RINANTE (Rule Incorporated Neural Aspect and Opinion Term Extraction): Automatically derives extraction rules based on syntactic information [23].

DREGCN (Dependency Relation Embedded Graph Convolutional Network): Models the dependency relations between words in sentences using a graph convolutional network [24].

4.2. Datasets

The baseline models and our sequence-to-sequence with dual context (Seq2Seq-DC) model were trained using three annotated datasets: SE14-Laptop, SE14-Restaurant, and SE15-Restaurant [59, 60]. These datasets include aspect term annotations (e.g., battery, screen, food) with sentiment polarity labels (positive, negative, neutral). SE14-Laptop includes 3,841 review sentences focused on laptops, while SE14-Restaurant comprises 3,845 restaurant-related review sentences. SE15-Restaurant adds a further 2,000 sentences and extends sentiment classification beyond the sentence level [60]. Each dataset was partitioned into training, validation, and test sets using a split ratio of 63:16:21.

To evaluate the generalizability of our model, we applied it to unannotated target datasets derived from the Amazon Review Data collected by McAuley and Leskovec [61]. The data include metadata, review texts, and product ratings across 24 domains. For this study, we selected five domains: Musical Instruments (Music), Office Products (Office), Sports and Outdoors (Sport), Grocery and Gourmet Food (Food), and Video Games (Game). Table 2 summarizes key statistics for these five target datasets.
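A 63:16:21 partition can be sketched as below. The random shuffle, the fixed seed, the integer rounding, and the function name `split_dataset` are assumptions made for illustration; the text does not state how sentences were assigned to the three subsets.

```python
import random

def split_dataset(items, ratios=(63, 16, 21), seed=42):
    """Shuffle the items and cut them into train/validation/test parts by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]            # the remainder goes to the test set
    return train, val, test

# Sentence IDs stand in for the 3,841 SE14-Laptop review sentences.
train, val, test = split_dataset(range(3841))
print(len(train), len(val), len(test))
```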

Table 2

Summary statistics of the five Amazon review datasets.
Dataset	Game	Office	Food	Sport	Music
Number of Users	24303	4905	14	35598	1429
Number of Products	10672	2420	8731	18357	900
Number of Ratings	231577	53228	151254	296337	10261
Density	0.09%	0.45%	0.12%	0.05%	0.80%
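The density row is consistent with the share of filled cells in the user-product rating matrix, i.e., ratings divided by users times products. This definition is an assumption inferred from the table values rather than one stated in the text, and the sketch below checks it against four of the five columns.

```python
def density(n_ratings, n_users, n_products):
    """Percentage of user-product pairs that have a rating (assumed definition)."""
    return 100.0 * n_ratings / (n_users * n_products)

# (ratings, users, products) copied from Table 2
stats = {
    "Game": (231577, 24303, 10672),
    "Office": (53228, 4905, 2420),
    "Sport": (296337, 35598, 18357),
    "Music": (10261, 1429, 900),
}
for name, (r, u, p) in stats.items():
    print(f"{name}: {density(r, u, p):.2f}%")
```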

5. Results and Discussion

5.1. Model Performance

Our Seq2Seq-DC model was trained and evaluated on the three annotated datasets and compared with the eight baseline models, with performance measured by F1 scores as shown in Table 3. The F1 scores for the baselines were sourced from the respective cited references. Among the vanilla deep learning baselines, DE-CNN achieved the highest F1 score of 81.59% on the SE14-Laptop dataset, while Seq2Seq excelled on the SE14-Restaurant and SE15-Restaurant datasets with F1 scores of 87.42% and 72.34%, respectively. Overall, Seq2Seq appears to be the most effective model within this group. Among the dependency parsing-augmented baselines, RINANTE had the highest F1 score of 80.61% on SE14-Laptop, while DREGCN achieved the highest F1 scores of 87.00% and 73.30% on the two restaurant datasets. The average F1 scores for the two groups of baselines across the three datasets (bolded numbers in the table) indicate that the dependency parsing-augmented baselines slightly outperformed the vanilla deep learning baselines overall. This suggests that incorporating syntactic information can improve aspect extraction performance beyond what is achieved with contextual information alone. Our Seq2Seq-DC model, which integrates Seq2Seq with dependency parsing, outperformed every baseline in both groups on all three datasets. It achieved F1 scores of 82.66% for SE14-Laptop, 88.66% for SE14-Restaurant, and 74.04% for SE15-Restaurant. These scores represent increases of 3.55%, 3.26%, and 5.45%, respectively, relative to the average F1 scores of the eight baselines. The Δ values represent the relative differences in F1 scores between the baselines and Seq2Seq-DC, highlighting the performance gain achieved by Seq2Seq-DC.

Table 3

Performance comparisons between baselines and Seq2Seq-DC*
Model	SE14-Laptop		SE14-Restaurant		SE15-Restaurant
	F1 (%)	Δ (%)	F1 (%)	Δ (%)	F1 (%)	Δ (%)
Deep learning-based	79.35	4.17	85.28	3.96	70.16	5.53
CMLA [34]	77.80	6.25	85.29	3.95	70.73	4.68
DE-CNN [36]	81.59	1.31	85.20	4.06	68.28	8.44
THA [35]	79.52	3.95	85.61	3.56	71.46	3.61
Seq2Seq [9]	80.31	2.93	87.42	1.42	72.34	2.35
Dependency parsing-augmented	79.85	3.52	85.92	3.19	70.44	5.11
RNCRF [21]	78.42	5.41	84.93	4.39	67.74	9.30
BiDTreesCRF [22]	80.57	2.59	85.31	3.93	70.83	4.53
RINANTE [23]	80.61	2.54	86.45	2.56	69.90	5.92
DREGCN [24]	79.78	3.61	87.00	1.91	73.30	1.01
Seq2Seq-DC	82.66		88.66		74.04

* F1 scores of the baselines were sourced from the respective cited references. The bolded numbers are the average F1 scores of the deep learning-based and dependency parsing-augmented baselines across the three datasets. Δ values represent the relative differences in F1 scores of the baselines compared to those of Seq2Seq-DC.
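The Δ values for the individual baselines in Table 3 can be reproduced as the gain of Seq2Seq-DC relative to the baseline score. The sketch below assumes this formula, which matches the tabulated values for the rows checked; the function name is illustrative.

```python
def relative_gain(f1_model, f1_baseline):
    """Relative F1 difference (%) of the model over a baseline."""
    return 100.0 * (f1_model - f1_baseline) / f1_baseline

# Values copied from Table 3
print(round(relative_gain(82.66, 77.80), 2))   # CMLA, SE14-Laptop
print(round(relative_gain(88.66, 87.42), 2))   # Seq2Seq, SE14-Restaurant
print(round(relative_gain(74.04, 73.30), 2))   # DREGCN, SE15-Restaurant
```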

5.2. Module Ablation

As previously described, the Seq2Seq-DC model primarily consists of a series of Bi-GRUs and an attention mechanism layer. The attention mechanism layer includes two modules designed to capture both global and local contextual information from review sentences. To evaluate the contributions of the global contextual (GC) and local contextual (LC) modules to the model’s performance, we conducted an ablation study by selectively removing each module and evaluating the resulting models using the F1 score. Because the LC module relies on dependency parsing (DP), removing it also reflects the contribution of DP to performance. Table 4 presents a comparison of F1 scores between the ablated models and the complete Seq2Seq-DC model (Bi-GRU + GC + LC). Specifically, Bi-GRU + GC refers to the model without the LC module, while Bi-GRU + LC denotes the model without the GC module. As shown in Table 4, both Bi-GRU + GC and Bi-GRU + LC show negative Δ values, reflecting a decrease in F1 scores compared to Seq2Seq-DC. However, Bi-GRU + LC consistently achieves higher F1 scores than Bi-GRU + GC across all three datasets, that is, removing the LC module costs more than removing the GC module. This suggests that the LC module has a greater impact on the model’s performance in aspect extraction, which can be attributed to its ability to capture syntactically adjacent words that are particularly important for accurately identifying aspect terms.

Table 4

Performance comparison between ablated models and Seq2Seq-DC*
Model	SE14-Laptop		SE14-Restaurant		SE15-Restaurant
	F1 (%)	Δ (%)	F1 (%)	Δ (%)	F1 (%)	Δ (%)
Bi-GRU + GC	81.44	-1.50	87.20	-1.67	72.91	-1.55
Bi-GRU + LC	82.41	-0.30	88.57	-0.10	73.93	-0.15
Seq2Seq-DC	82.66		88.66		74.04

* Δ values represent the relative differences in F1 scores when the LC or GC module is removed from Seq2Seq-DC.

5.3. Aspect Summarization

Aspect summarization is a process to distill a wide range of aspect terms into a concise set of representative concepts, which are more focused and interpretable for downstream tasks like recommendation or visualization [62]. To validate the applicability of our model, we performed aspect summarization using the five Amazon review datasets. In the Music dataset, which contains 3105 unique tokens, the model extracted 232 aspect terms. For the other datasets, the numbers of extracted aspect terms were 506 for Office (8638 tokens), 341 for Food (4997 tokens), 583 for Game (7513 tokens), and 232 for Sport (3134 tokens). The number of aspect terms in each dataset reflects a high-dimensional semantic space. To reduce the dimensionality, PCA was applied to project the aspect-term space onto a smaller set of principal components (PCs). PCA acted as a semantic filter by emphasizing components that capture the most meaningful variance while discarding less informative ones. We implemented PCA with the scikit-learn library in Python and plotted the cumulative explained variance against the number of PCs for each Amazon review dataset in Fig. 7. The dashed line in the figure marks the number of PCs required to explain 95% of the total variance: 19 for Office, 23 for Game, 30 for Food and Music, and 34 for Sport.

Fig. 7

Cumulative explained variance across the Amazon review datasets.

Table 5 presents the contribution of the PCs for each dataset. The first PC captures the largest amount of variance, representing the direction of greatest variability in the data. Each subsequent PC accounts for progressively smaller portions of variance along orthogonal directions, ensuring no redundancy in variance explanation. The last PC for each dataset was determined by setting a cumulative variance threshold of 95%, meaning the remaining unexplained variance is less than 5%.

Table 5

Explained variances of the PCs*.
Dataset PC	Game	Office	Food	Sport	Music
PC1	14.97%	19.45%	17.08%	16.48%	21.12%
PC2	5.61%	5.87%	7.01%	6.34%	6.08%
PC3	3.88%	4.01%	4.95%	4.72%	3.59%
…	…	…	…	…	…
Last PC	PC23 (3.67%)	PC19 (2.15%)	PC30 (3.17%)	PC34 (3.51%)	PC30 (3.19%)

* The last PC and its explained variance (in parentheses) vary across the datasets.
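Selecting the last PC by a cumulative variance threshold can be sketched as follows. The explained-variance ratios are toy values chosen for illustration, not those of the Amazon datasets, and the function name is hypothetical. scikit-learn's `PCA` offers a comparable selection when `n_components` is given as a fraction between 0 and 1.

```python
from itertools import accumulate

def n_components_for(ratios, threshold=0.95):
    """Smallest number of leading PCs whose cumulative explained variance reaches the threshold."""
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold:
            return k
    return len(ratios)

ratios = [0.50, 0.25, 0.15, 0.06, 0.04]    # toy explained-variance ratios, sorted in descending order
k = n_components_for(ratios)
print(k)
```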

PCs are orthogonal linear combinations of the original aspect terms. To filter out less informative terms, those with variance primarily located in the bottom 5% of components can be excluded from further analysis [63]. As a result, we focus only on the PCs that collectively explain 95% of the variance. Table 6 shows the number of aspect terms before and after applying PCA, with reductions of 36.64% for Music, 62.26% for Office, 53.37% for Food, 34.05% for Sport, and 68.10% for Game.

Table 6

Aspect term reduction by PCA.
Dataset	Token	Extracted aspect term	Aspect terms after PCA	Reduction ratio
Game	7513	583	186	68.10%
Office	8638	506	191	62.26%
Food	4997	341	159	53.37%
Sport	3134	232	153	34.05%
Music	3105	232	147	36.64%

Having reduced the dimensionality with PCA, we applied K-means clustering to group aspect terms with similar semantic meanings in the principal component space. The optimal number of clusters for each dataset was determined using the elbow method [64], which indicated that five clusters would provide a balance between model complexity and variance explained across all datasets. K-means clustering was implemented using the scikit-learn library. Since the first three PCs capture the largest individual shares of the variance in the original aspect term space (Table 5), they were used to visualize the spatial distribution of clustered terms across the five datasets, as shown in Fig. 8. Each point represents an aspect term, with colors indicating the clusters assigned by K-means clustering. Notably, the Music and Food datasets exhibit relatively well-separated clusters, implying stronger semantic distinctions among their aspects, while the Office dataset shows more overlap between clusters, suggesting thematic interdependence or ambiguity among its terms. Each cluster in a dataset groups aspect terms with similar semantic meanings, allowing for the derivation of a more abstract label, referred to as the primal term. However, the significance of individual aspect terms can vary within a cluster, making it necessary to narrow down the terms used when identifying an appropriate primal term. The frequency with which a term is mentioned in the reviews of the dataset may be used as an indicator of the term’s significance.
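The idea behind the elbow method can be illustrated with a dependency-free stand-in for K-means. This is a sketch under stated assumptions, not the scikit-learn implementation used in the study: it uses plain Lloyd iterations with a deterministic initialization (the first k points) and toy two-dimensional data with three obvious groups.

```python
def kmeans_inertia(points, k, iters=20):
    """Run Lloyd's algorithm from the first k points and return the within-cluster sum of squares."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # Recompute each center as the mean of its group; keep the old center if the group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers) for p in points)

# Toy data (assumed): three well-separated groups of three points each.
points = [(0, 0), (10, 0), (0, 10), (1, 0), (0, 1), (11, 0), (10, 1), (1, 10), (0, 11)]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

The inertia falls sharply up to k = 3 and only marginally afterwards, so the elbow sits at three clusters for this toy data. With scikit-learn, the same curve would typically be read from the `inertia_` attribute of fitted `KMeans` models.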

Fig. 8

Spatial distribution of clustered aspect terms in the first three principal components.

As an example, Fig. 9 presents the sorted relative frequencies of the top 15 terms in each cluster from the Office dataset. In each cluster, the frequencies are concentrated among a few leading terms, indicating their dominance within the cluster. To automatically derive a concise primal term for each semantic cluster, we used the GPT-3.5 API with prompts for cluster-level abstraction, which generate primal terms that represent the core semantic content of the cluster [65]. For each cluster, we selected several top-frequency aspect terms as prompts since their frequencies dominate the distributions (Fig. 9). Through trial tests using different numbers of top terms, we found that the top five aspect terms achieve a good balance between semantic richness and interpretability, consistently producing meaningful and representative primal terms.
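The selection of top-frequency terms and the assembly of a prompt can be sketched as below. The prompt wording, the function names, and the definition of relative frequency as a share of all aspect-term mentions are assumptions for illustration; the actual prompts are not reproduced in the text, and no API call is made here.

```python
from collections import Counter

def top_terms(cluster_terms, mentions, n=5):
    """Return the n most frequent cluster terms with their relative frequency (% of all mentions)."""
    counts = Counter(mentions)
    total = sum(counts.values())
    ranked = sorted(cluster_terms, key=lambda t: (-counts[t], t))[:n]
    return [(t, 100.0 * counts[t] / total) for t in ranked]

def build_prompt(top):
    """Assemble a hypothetical cluster-labeling prompt from the top terms."""
    listed = ", ".join(f"{t} ({f:.1f}%)" for t, f in top)
    return ("The following aspect terms come from one cluster of product reviews: "
            f"{listed}. Reply with a short label of one to three words that summarizes them.")

# Toy mention list (assumed counts) and one toy cluster.
mentions = ["label"] * 5 + ["paper"] * 3 + ["tape"] * 2 + ["printer"] * 10
top = top_terms(["label", "paper", "tape"], mentions, n=2)
prompt = build_prompt(top)
print(prompt)
```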

Fig. 9

Relative frequencies of the Top 15 terms in each cluster of the Office dataset.

Table 7 lists the top five aspect terms and their relative frequencies from user reviews for each cluster across the five Amazon datasets. For instance, in the Game dataset, Cluster 2 includes the top five terms: gameplay (1.6%), fun (1.3%), adventure (1.3%), player (1.0%), and story (1.0%). The GPT-3.5 API generated “gameplay experience” as the primal term. In the Music dataset, the top five terms for one cluster are strings (8.4%), guitar (7.9%), pick (3.5%), pedals (2.8%), and price (2.0%), which led to the primal term “guitar gear.” The results suggest that the GPT-3.5-generated primal terms are both semantically appropriate and scalable for representing the core content of each cluster. These terms serve as meaningful labels that highlight the key issues that users care about most, and thus provide useful perspectives for the subsequent aspect-based sentiment analysis.

Table 7

Aspect summarization of the five Amazon datasets*.
Dataset	Cluster	# of aspect terms	Top five aspect terms and frequencies	Primal term
Game	1	44	game (22.0%), time (3.0%), graphic (2.5%), characters (2.2%), series (1.0%)	Game elements
	2	51	gameplay (1.6%), fun (1.3%), adventure (1.3%), player (1.0%), story (1.0%)	Gameplay experience
	3	20	wii (1.2%), xbox (1%), ps3 (0.9%), rpg (0.9%), sony (0.7%)	Gaming platforms
	4	32	controller (1.6%), price (1.1%), control (1%), interface (1.0%), system (0.8%)	Interaction mechanics
	5	39	level (2.6%), enemy (0.8%), puzzle (0.7%), battle (0.7%), weapon (0.6%)	In-game challenge
Office	1	37	printer (2%), office (1.1%), cartridges (1.1%), scanner (1.0%), software (0.6%)	Office technology
	2	34	label (9.1%), paper (4.2%), tape (3.3%), pencil (2.1%), binder (1%)	Office supplies
	3	32	quality (1.4%), size (1.1%), laser (1.0%), sturdy (0.9%), plastic (0.7%)	Material, quality & build
	4	26	ink (1.2%), colors (1.1%), template (1.0%), printing (0.8%), photo (0.6%)	Print & visual output
	5	62	price (1.5%), brand (1%), hp (0.8%), epson (0.7%), canon (0.7%)	Brands
Food	1	35	tea (4.0%), coffee (3.0%), water (1.2%), milk (1.0%), bottle (0.8%)	Beverages & drinks
	2	42	honey (1.6%), sugar (1.4%), butter (1.4%), chocolate (1.2%), sweet (1.1%)	Sweeteners & additives
	3	21	flavor (3.8%), taste (2.8%), bitter (1.2%), delicious (1.1%), quality (0.8%)	Flavor perception
	4	36	diet (2.2%), sauce (2.1%), soup (1.4%), brand (1.4%), ingredients (1.3%)	Nutrient composition
	5	25	price (1.5%), enjoy (1.4%), stuff (1.2%), little (1.1%), lot (1.0%)	Value & experience
Sport	1	27	product (2.0%), quality (1.7%), sturdy (1.3%), durable (1.2%), thick (0.9%)	Comfort & material
	2	37	price (1.6%), cheap (1.5%), recommend (1%), purchase (0.8%), wish (0.8%)	Purchase behavior
	3	21	work (6.2%), time (3%), bag (1.2%), size (1.2%), bottle (1.0%)	Portability & duration
	4	24	band (1.3%), lock (1.2%), tent (1%), tire (0.8%), bike (0.7%)	Outdoor gear & accessories
	5	25	knife (2.2%), rifle (1.6%), holster (1.1%), grip (1.1%), pistol (1.1%)	Hunting tools
Music	1	31	strings (8.4%), guitar (7.9%), pick (3.5%), pedals (2.8%), price (2.0%)	Guitar gear
	2	25	sound (2.7%), cables (2.5%), quality (1.7%), acoustic (1.1%), lot (1.0%)	Sound & playback
	3	36	tuner (1.8%), fender (1.4%), amp (1.2%), mic (0.9%), instrument (0.7%)	Value & perception
	4	26	time (2.2%), capo (1.8%), strap (1.5%), case (1.2%), noise (1.2%)	Supportive equipment
	5	29	bass (1.0%), tune (1.0%), stand (0.9%), bit (0.9%), distortion (0.6%)	Sound calibration

*The numbers in the parentheses are the relative frequencies of the top five aspect terms appearing in the user reviews.

6. Conclusions

Aspect extraction is a critical step in identifying key issues or topics from reviews in domain-specific datasets, forming the foundation for aspect-based sentiment analysis. In this study, we developed a sequence-to-sequence aspect extraction model that integrates a series of bidirectional gated recurrent units (Bi-GRUs) with both global and local attention mechanisms in the encoder. The Bi-GRUs sequentially capture hidden states from the input token sequence. The global attention mechanism incorporates encoder hidden states and positional information to produce a global context vector, while the local attention mechanism uses dependency parsing to identify the syntactic children of each token and aggregates their corresponding hidden states into a local context vector. These fine-grained syntactic cues are especially useful when the model processes data across diverse domains. A gated fusion unit then adaptively combines the global and local context vectors at each time step, enabling the model to balance broad semantic information with detailed syntactic structure and thereby improving both accuracy and robustness in aspect term extraction. The model was trained on three annotated SemEval datasets and achieved F1 scores of 82.66% on SE14-Laptop, 88.66% on SE14-Restaurant, and 74.04% on SE15-Restaurant. These scores represent improvements of 3.55%, 3.26%, and 5.45%, respectively, compared to the average F1 scores of eight baseline models.

When applied to an unannotated dataset from a different domain, the model may extract a broad range of aspect terms, which require condensation through a multi-stage post-processing procedure known as aspect summarization. This process involves PCA for dimensionality reduction, K-means for semantic clustering, and GPT-3.5 for semantic refinement, ultimately distilling hundreds of aspect terms into five primal terms that capture the key issues most relevant to users. To validate the effectiveness of the aspect summarization process, we applied it to Amazon review datasets across five domains (Music, Office, Sport, Food, and Game) and successfully generated primal terms for each dataset that are semantically appropriate and meaningful for subsequent aspect-based sentiment analysis.

Declarations

The authors declare no financial or other conflict of interest.

References

1. Karn AL, Karna RK, Kondamudi BR, Bagale G, Pustokhin DA, Pustokhina IV, Sengan S (2022) Customer centric hybrid recommendation system for e-commerce applications by integrating hybrid sentiment analysis. Electron Commer Res 23(1):279–314. https://doi.org/10.1007/s10660-022-09630-z

2. Mikhaylov A, Bhatti IM, Dinçer H, Yüksel S (2022) Integrated decision recommendation system using iteration-enhanced collaborative filtering, golden cut bipolar for analyzing the risk-based oil market spillovers. Comput Econ 63(1):305–338. https://doi.org/10.1007/s10614-022-10341-8

3. Wu Y, Xie R, Zhu Y, Zhuang F, Zhang X, Lin L, He Q (2024) Personalized prompt for sequential recommendation. IEEE Trans Knowl Data Eng 36(7):3376–3389. https://doi.org/10.1109/tkde.2024.3357498

4. Davoodi L, Mezei J, Heikkilä M (2025) Aspect-based sentiment classification of user reviews to understand customer satisfaction of e-commerce platforms. Electron Commer Res. https://doi.org/10.1007/s10660-025-09948-4

5. Cambria E, White B (2014) Jumping NLP curves: A review of natural language processing research. IEEE Comput Intell Mag 9(2):48–57. https://doi.org/10.1109/MCI.2014.2307227

6. Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642. https://aclanthology.org/D13-1170

7. Dang CN, Moreno-García MN, De la Prieta F (2021) Hybrid deep learning models for sentiment analysis. Complexity 2021:9986920. https://doi.org/10.1155/2021/9986920

8. Liu Y, Shi J, Huang F, Hou J, Zhang C (2024) Unveiling consumer preferences in automotive reviews through aspect-based opinion generation. J Retailing Consumer Serv 77:103605. https://doi.org/10.1016/j.jretconser.2023.103605

9. Ma D, Li S, Wu F, Xie X, Wang H (2019) Exploring sequence-to-sequence learning in aspect term extraction. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1344

10. Li K, Chen C, Quan X, Ling Q, Song Y (2020) Conditional augmentation for aspect term extraction via masked sequence-to-sequence generation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.631

11. Di Palma D (2023) Retrieval-augmented recommender system: Enhancing Recommender Systems with large language models. Proceedings of the 17th ACM Conference on Recommender Systems. https://doi.org/10.1145/3604915.3608889

12. Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 1243–1252. https://doi.org/10.48550/arXiv.1705.03122

13. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, … Polosukhin I (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. https://doi.org/10.48550/arXiv.1706.03762

14. Linzen T, Dupoux E, Goldberg Y (2016) Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Trans Association Comput Linguistics 4:521–535. https://doi.org/10.1162/tacl_a_00115

15. Truşcǎ MM, Frasincar F (2023) Survey on aspect detection for aspect-based sentiment analysis. Artif Intell Rev 56(5):3797–3846. https://doi.org/10.1007/s10462-022-10252-y

16. da Silva FL, Slodkowski BK, da Silva KK, Cazella SC (2022) A systematic literature review on Educational Recommender Systems for teaching and learning: Research trends, limitations and opportunities. Educ Inform Technol 28(3):3289–3328. https://doi.org/10.1007/s10639-022-11341-9

17. Yin D, Wu X, Chang B (2020) Interactive neural network: Leveraging part-of-speech window for aspect term extraction (student abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 34(10), 13977–13978. https://doi.org/10.1609/aaai.v34i10.7261

18. Rana TA, Cheah Y-N, Rana T (2020) Multi-level knowledge-based approach for implicit aspect identification. Appl Intell 50(12):4616–4630. https://doi.org/10.1007/s10489-020-01817-x

19. Alqaryouti O, Siyam N, Monem AA, Shaalan K (2020) Aspect-based sentiment analysis using Smart Government Review Data. Appl Comput Inf 20(1/2):142–161. https://doi.org/10.1016/j.aci.2019.11.003

20. Venugopalan M, Gupta D (2020) An unsupervised hierarchical rule-based model for aspect term extraction augmented with pruning strategies. Procedia Comput Sci 171:22–31. https://doi.org/10.1016/j.procs.2020.04.303

21. Wang W, Pan SJ, Dahlmeier D, Xiao X (2016) Recursive neural conditional random fields for aspect-based sentiment analysis. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.18653/v1/d16-1059

22. Luo H, Li T, Liu B, Wang B, Unger H (2019) Improving aspect term extraction with bidirectional dependency tree representation. IEEE/ACM Trans Audio Speech Lang Process 27(7):1201–1212. https://doi.org/10.1109/taslp.2019.2913094

23. Dai H, Song Y (2019) Neural aspect and opinion term extraction with mined rules as weak supervision. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1520

24. Liang Y, Meng F, Zhang J, Chen Y, Xu J, Zhou J (2021) A dependency syntactic knowledge augmented interactive architecture for end-to-end aspect-based sentiment analysis. Neurocomputing 454:291–302. https://doi.org/10.1016/j.neucom.2021.05.028

25. Gao C, Zheng Y, Li N, Li Y, Qin Y, Piao J, Quan Y, Chang J, Jin D, He X, Li Y (2023) A survey of graph neural networks for recommender systems: Challenges, methods, and directions. ACM Trans Recommender Syst 1(1):1–51. https://doi.org/10.1145/3568022

26. He R, Lee WS, Ng HT, Dahlmeier D (2017) An unsupervised neural attention model for aspect extraction. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/p17-1036

27. Chemudugunta C, Smyth P, Steyvers M (2007) Modeling general and specific aspects of documents with a probabilistic topic model. Adv Neural Inf Process Syst 19:241–248. https://doi.org/10.7551/mitpress/7503.003.0035

28. Kim D, Park C, Oh J, Lee S, Yu H (2016) Convolutional matrix factorization for Document Context-Aware recommendation. RecSys ’16: Proceedings of the 10th ACM Conference on Recommender Systems, 233–240. https://doi.org/10.1145/2959100.2959165

29. Liu X, Wu D, Peng H, Wang R (2018) Health topics mining in online medical community. 2018 IEEE Global Communications Conference (GLOBECOM). https://doi.org/10.1109/glocom.2018.8647970

30. Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, Zhao L (2018) Latent dirichlet allocation (LDA) and topic modeling: Models, applications, a survey. Multimedia Tools Appl 78(11):15169–15211. https://doi.org/10.1007/s11042-018-6894-4

31. Bastani K, Namavari H, Shaffer J (2019) Latent dirichlet allocation (LDA) for topic modeling of the CFPB Consumer Complaints. Expert Syst Appl 127:256–271. https://doi.org/10.1016/j.eswa.2019.03.001

32. Etemadi M, Bazzaz Abkenar S, Ahmadzadeh A, Haghi Kashani M, Asghari P, Akbari M, Mahdipour E (2023) A systematic review of healthcare recommender systems: Open issues, challenges, and Techniques. Expert Syst Appl 213:118823. https://doi.org/10.1016/j.eswa.2022.118823

33. Alslaity A, Orji R (2022) Machine learning techniques for emotion detection and sentiment analysis: Current State, Challenges, and Future Directions. Behav Inform Technol 43(1):139–164. https://doi.org/10.1080/0144929x.2022.2156387

34. Wang W, Pan SJ, Dahlmeier D, Xiao X (2017) Coupled multi-layer attentions for co-extraction of aspect and opinion terms. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10974

35. Li X, Bing L, Li P, Lam W, Yang Z (2018) Aspect term extraction with history attention and selective transformation. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/583

36. Xu H, Liu B, Shu L, Yu PS (2018) Double embeddings and CNN-based sequence labeling for aspect extraction. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). https://doi.org/10.18653/v1/p18-2094

37. Tran TU, Hoang HTT, Huynh HX (2019) Aspect extraction with bidirectional GRU and CRF. 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF). https://doi.org/10.1109/rivf.2019.8713663

38. Luo H, Li T, Liu B, Zhang J (2019) DOER: Dual cross-shared RNN for aspect term-polarity co-extraction. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1056

39. Chen G, Tian Y, Song Y (2020) Joint aspect extraction and sentiment analysis with directional graph convolutional networks. Proceedings of the 28th International Conference on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.24

40.

Chauhan GS, Meena YK, Gopalani D, Nahta R (2020) A two-step hybrid unsupervised model with attention mechanism for aspect extraction. Expert Syst Appl 161:113673. https://doi.org/10.1016/j.eswa.2020.113673

41.

Pereg O, Korat D, Wasserblat M (2020) Syntactically aware cross-domain aspect and opinion terms extraction. Proceedings of the 28th International Conference on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.158

42.

Xu L, Li H, Lu W, Bing L (2020) Position-aware tagging for aspect sentiment triplet extraction. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/2020.emnlp-main.183

43.

Kotagiri S, Sowjanya AM, Anilkumar B, Devi NL (2024) Aspect-oriented extraction and sentiment analysis using optimized hybrid deep learning approaches. Multimedia Tools Appl. https://doi.org/10.1007/s11042-024-18964-9

44.

Venugopalan M, Gupta D (2022) An enhanced guided LDA model augmented with BERT based semantic strength for aspect term extraction in sentiment analysis. Knowl Based Syst 246:108668. https://doi.org/10.1016/j.knosys.2022.108668

45.

Wei W, Wang Z, Mao X, Zhou G, Zhou P, Jiang S (2020) Position-aware self-attention based neural sequence labeling. Pattern Recogn 110:107636. https://doi.org/10.1016/j.patcog.2020.107636

46.

Zhang Y, Zhong V, Chen D, Angeli G, Manning CD (2017) Position-aware attention and supervised data improve slot filling. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.18653/v1/d17-1004

47.

Zhao F, Wu Z, Dai X (2020) Attention Transfer Network for aspect-level sentiment classification. Proceedings of the 28th International Conference on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.70

48.

Zhang M, Li Z, Fu G, Zhang M (2021) Dependency-based syntax-aware word representations. Artif Intell 292:103427. https://doi.org/10.1016/j.artint.2020.103427

49.

Chen P, Chen S, Liu J (2020) Hierarchical sequence labeling model for aspect sentiment triplet extraction. Nat Lang Process Chin Comput 654–666. https://doi.org/10.1007/978-3-030-60450-9_52

50.

Shi W, Demberg V (2019) Learning to Explicitate Connectives with Seq2Seq Network for Implicit Discourse Relation Classification. Proceedings of the 13th International Conference on Computational Semantics - Long Papers, pages 188–199, Gothenburg, Sweden. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-0416

51.

Liu Q, Zhang H, Zeng Y, Huang Z, Wu Z (2018) Content Attention Model for Aspect Based Sentiment Analysis. Proceedings of the Web Conference 2018, Lyon, France, April 2018, 1023–1032. https://doi.org/10.1145/3178876.3186001

52.

Yin Y, Wang C, Zhang M (2020) PoD: Positional dependency-based word embedding for aspect term extraction. Proceedings of the 28th International Conference on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.150

53.

Dai B, Li J, Xu R (2020) Multiple Positional Self-Attention Network for text classification. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7610–7617. https://doi.org/10.1609/aaai.v34i05.6261

54.

Anand D, Mampilli BS (2021) A novel evolutionary approach for learning syntactic features for cross domain opinion target extraction. Appl Soft Comput 102:107086. https://doi.org/10.1016/j.asoc.2021.107086

55.

Chen J, Dong H, Wang X, Feng F, Wang M, He X (2023) Bias and Debias in Recommender System: A survey and future directions. ACM Trans Inform Syst 41(3):1–39. https://doi.org/10.1145/3564284

56.

Xu C, Shen J, Du X, Zhang F (2018) An Intrusion Detection System Using a Deep Neural Network with Gated Recurrent Units. IEEE Access 6:48698. https://doi.org/10.1109/ACCESS.2018.2867564

57.

Cha J, Kim S, Park E (2022) A lexicon-based approach to examine depression detection in social media: The case of Twitter and university community. Humanit Social Sci Commun 9(1). https://doi.org/10.1057/s41599-022-01313-2

58.

Kumar Sharma A, Bajpai B, Adhvaryu R, Dhruvi Pankajkumar S, Gordhanbhai PP, Kumar A (2023) An efficient approach of product recommendation system using NLP technique. Materials Today: Proceedings 80:3730–3743. https://doi.org/10.1016/j.matpr.2021.07.371

59.

Pontiki M, Galanis D, Pavlopoulos J, Papageorgiou H, Androutsopoulos I, Manandhar S (2014) SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35, Dublin, Ireland. https://doi.org/10.3115/v1/s14-2004

60.

Pontiki M, Galanis D, Papageorgiou H, Manandhar S, Androutsopoulos I (2015) SemEval-2015 Task 12: Aspect Based Sentiment Analysis. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 486–495, Denver, Colorado, USA. https://doi.org/10.18653/v1/s15-2082

61.

McAuley J, Leskovec J (2013) Hidden factors and hidden topics. Proceedings of the 7th ACM Conference on Recommender Systems. https://doi.org/10.1145/2507157.2507163

62.

Zhang L, Wang S, Liu B (2018) Deep Learning for Sentiment Analysis: A Survey. WIREs Data Min Knowl Discov 8(4):e1253. https://doi.org/10.1002/widm.1253

63.

Viszlay P, Janecko J, Juhar J (2012) Eigenvalue Criterion-Based Feature Selection in Principal Component Analysis of Speech. Adv Electr Electron Eng 10(4):303

64.

Robertson J, Kaptein M (2016) Modern Statistical Methods for HCI. Springer

65.

Yang K, Zong L, Tang M, Hu J, Zheng Y, Chen Y, Zhao M (2025) MPGM: Multi-prompt generation model with self-supervised contrastive learning for aspect sentiment triplet extraction. Neural Netw 192:107894. https://doi.org/10.1016/j.neunet.2025.107894
