Practice Structure Predicts Skill Growth in Online Chess: A Behavioral Modeling Approach.

*Corresponding author: Luís C. Meireles (luis.c.meireles@inesctec.pt)

Luís C. Meireles^a*, Tiago Mendes-Neves^b, and João Mendes-Moreira^a,b

^aLIAAD, INESCTEC, Porto, Portugal; ^bFaculdade de Engenharia, Universidade do Porto, Portugal

Abstract

Skill acquisition is central to developing expertise, yet the behavioral mechanisms that separate more successful learners from less successful ones remain poorly understood. Using a large naturalistic dataset of about one million online chess games played by ~820 individuals over three years (2013–2015), we built an interpretable machine learning model to classify learners based only on behavioral features. Learners were labeled as “fast learners” or “not fast learners” based on normalized monthly Elo progression, adjusted for both starting rating and the increasing difficulty of improving at higher levels.

We engineered time-sensitive features across four behavioral dimensions: practice structure, challenge level, strategic exploration (measured via move-sequence entropy), and tactical efficiency (the number of rounds needed to reach a 70% win probability in games eventually won). A logistic regression model trained on the five strongest predictors - optimal challenge steady magnitude, optimal challenge late slope, entropy steady magnitude, optimal challenge mean, and tactical efficiency mean - achieved an F1 of 0.68 and an AUC of 0.78.

Coefficients showed that average tactical efficiency was a strong predictor of fast learning, whereas the role of challenge-level features was less clear. To explore this, we fitted a linear regression with average tactical efficiency (as a proxy for expertise) as the dependent variable. This model explained 53% of the variance (R² = 0.53, RMSE = 0.05) and revealed optimal challenge as the strongest predictor. This suggests that well-calibrated challenge levels are key to differences in chess performance.

Keywords

expertise development, skill acquisition, chess, learning modeling.

Introduction

From both theoretical and practical perspectives, chess offers an ideal natural laboratory for the study of expertise; it is often called the Drosophila of reasoning and/or artificial intelligence (Ensmenger, 2011; Kasparov, 2018; McCarthy, 1990; Simon & Chase, 1973). Mathematically, it is a zero-sum game, meaning that every game state (s) and the transition between states (s → s_{+1}) can be formally described using space-time vectors or probabilistic models within a closed system (Chassy & Gobet, 2011). Moreover, the probabilities associated with one player's actions are inversely related to those of the opponent, reinforcing its suitability for analytical modeling. Chess games can be analyzed at a highly granular level - move by move - or at higher levels of abstraction, such as through sequences associated with the opening, middle-game, and endgame (Gobet & Simon, 2000). Unlike other domains of expertise such as team sports or musical performance, chess allows for the precise generation, recording, and management of performance data. Furthermore, the widespread use of online chess platforms (e.g., lichess.org, chess.com) enables the collection of large-scale datasets from players engaging in practice and competition under ecologically valid, real-world conditions (Gee et al., 2025).

Indeed, historically chess has served as a paradigmatic domain in the study of expertise. Since the seminal work of Adrian de Groot (1946), who explored the perceptual and cognitive processes distinguishing novice from expert players, the field has grown substantially. This foundational research was later expanded by Chase and Simon (1973), who introduced the concept of chunking to explain how experts encode and retrieve structured patterns of information more efficiently than novices.

In recent years, a substantial body of primarily laboratory-based research has examined a wide range of cognitive and neurocognitive factors associated with expertise. These investigations have addressed the role of general intelligence in skill acquisition (Grabner, Stern, & Neubauer, 2006), the effects of structured training routines such as deliberate practice (Ericsson, Krampe, & Tesch-Römer, 1993), and additional contextual variables such as age of training onset and optimal challenge levels for learning (Gobet & Campitelli, 2007). Neuroimaging studies have also identified brain regions implicated in expert performance, including the fusiform face area, which has been associated with expert-level pattern recognition (Bilalić, Langner, Ulrich, & Grodd, 2011).

From the perspective of scientific parsimony, the various components of chess expertise can be coherently situated within a standard model of the mind that incorporates multiple levels of cognitive abstraction (Laird et al., 2017). Within this framework, structural and neurocognitive factors interact with environmental and computational processes to jointly shape perception and cognition. For instance, expert players may exhibit comparable levels of general intelligence, executive functioning, or fusiform face area activation, yet adopt distinct strategies for perceiving and understanding the game - often shaped by their training background (e.g., “classical school” versus “hyper-modernism”). Contemporary research increasingly supports the view that structural neural characteristics are not entirely innate but can be shaped by experience and training, as shown by studies such as Maguire et al. (2000). Although there is ultimately no maximal adaptation to a task (i.e., “real” expertise) without an often long and structured training regimen, this body of evidence suggests that expertise emerges through a dynamic interaction between neurological and psychological dispositions and experiential adaptations - occupying a middle ground as a dual-faceted phenomenon.

Building on this foundation, the present study leverages online chess data to investigate the behavioral structure of learning over time. Rather than focusing solely on outcome measures such as Elo rating or win/loss ratios, we examine how players engage in practice across five key behavioral dimensions, grouped into two conceptual categories: foundational contextual conditions and gameplay-derived indicators. The foundational conditions reflect the broader practice environment that shapes performance and include: (1) engagement (i.e., frequency and regularity of play), (2) practice structure (i.e., spacing between practice sessions), and (3) challenge level (i.e., the relative difficulty of opponents faced). In contrast, the gameplay indicators are derived from in-game performance metrics and include: (4) strategic exploration (measured via the entropy of move sequences), and (5) tactical efficiency (i.e., how quickly players convert advantageous positions into wins).

By identifying behavioral profiles of fast learners in ecologically valid settings, this study contributes to a growing body of work that seeks to understand expertise not merely through end results, but through the structural and developmental dynamics of learning behavior itself.

Literature review

The primary contributions of pioneering studies on chess expertise lie in demonstrating how task-specific knowledge shapes both selective attention and memory retrieval. For instance, de Groot (1965) and later Chase and Simon (1973) found that expert players could quickly and accurately recall meaningful chess positions, yet their performance declined dramatically when presented with random or non-game-like configurations. This effect, attributed to the formation of domain-specific “chunks” of information, suggests that expertise is rooted in long-term memory structures developed through extensive exposure to meaningful patterns (Chase & Simon, 1973; Gobet & Simon, 2000). Rather than relying on superior general memory, expert players leverage structured perceptual encoding to guide their attention and decision-making within familiar domains (Gobet & Chassy, 2008).

Building on this idea, Simon and Gilmartin (1973) developed the MAPP program, a computational model designed to simulate how expert chess players perceive and recall meaningful positions. Rather than relying on brute-force search strategies like those used in traditional chess engines - such as the Minimax algorithm originally formalized by von Neumann (1928) - MAPP aimed to emulate human cognitive processes, particularly pattern recognition and memory retrieval. Based on this model, they estimated that chess grandmasters possess a repertoire of approximately 50,000 chunks: familiar configurations stored in long-term memory that support rapid and accurate decision-making.

Historically, these studies helped confirm that human behavior involves mechanisms far more complex than the simple stimulus-response chains proposed by traditional behaviorism. They highlighted the crucial role of cognition and information processing in guiding expert performance - aligning with broader psychological constructs such as the priming effect (Meyer & Schvaneveldt, 1971) and chronic or category accessibility, as introduced by Bruner (1957). While these insights contributed significantly to the cognitive revolution in psychology, they also reignited a longstanding debate: to what extent is chess expertise driven by structured training versus general cognitive abilities - such as intelligence or working memory? This debate reflects the enduring epistemological divide between nature and nurture in the study of human skill acquisition.

The 2014 special issue of Intelligence offers a clear illustration of the ongoing disagreement among scholars regarding the foundations of expertise and skill acquisition. This collection of articles highlights the deep divide between proponents of nurture-oriented perspectives - such as Ericsson (2014), who defends the central role of deliberate practice - and nature-oriented scholars like Plomin et al. (2014) and Ruthsatz et al. (2014), who emphasize the influence of genetic and dispositional factors, including general intelligence. The exchange underscores that, despite decades of research, the relative contributions of training, innate ability, and individual differences remain highly contested in the study of expertise. While methodological limitations are evident on both sides - no method being entirely free from flaws, however well-designed - the intensity of this debate often seems to be fueled not only by empirical differences but also by deeper philosophical commitments. On one side is the belief in a fundamental discontinuity between the minds of geniuses and ordinary individuals; on the other is the view that excellence is largely a matter of structured practice, motivation, and sustained effort (Ericsson et al., 1993; Hambrick et al., 2014).

With regard specifically to the domain of chess expertise, this division is also evident. Hambrick et al. (2014) state that deliberate practice - i.e., “highly structured activities, the explicit goal of which is to improve performance” (Ericsson et al., 1993) - is insufficient to explain individual differences between chess players after reviewing a set of classical chess studies (e.g., Bilalić, 2006; Charness et al., 2005; Gobet & Campitelli, 2008) that met the following inclusion criteria: 1) the presence of continuous (quantitative) measures of both cumulative deliberate practice and performance; and 2) a reported correlation coefficient (or enough information to compute it) between deliberate practice and performance. Without denying that prolonged, deliberate exposure to structured, learning-tailored activities - coupled with immediate and accurate feedback - is crucial to attaining extraordinary levels of performance, the authors found that correlations (corrected for measurement error variance) between deliberate practice and performance explained only 39.7% of the variance in performance, leaving room for other variables to potentially account for the remaining 60.3%. Furthermore, when reliability for deliberate practice was controlled, deliberate practice failed to explain more than half of the variance (Hambrick et al., 2014). In a certain sense, Hambrick et al. (2014) found evidence that refutes the conception that, in order to become an expert in chess, every aspiring player must commit to a relatively stable number of hours of deliberate practice. Their conclusion is mainly supported by Gobet and Campitelli’s (2007) findings, who, after binning chess players based on their ratings, found large variability in deliberate practice hours within skill groups.

While compelling at first glance, the conclusions drawn by Hambrick et al. (2014) are largely refuted by Ericsson (2014), who argues that the authors failed to accurately represent the concept of deliberate practice. According to Ericsson, Hambrick and colleagues' emphasis on the quantity of practice overlooks the far more critical quality and structure of practice - such as the recurrence and spacing of sessions, the quality of feedback, and the deliberate design of training tasks. These structural factors, Ericsson contends, play a much greater role in developing expertise than a vaguely defined sum of hours. Furthermore, Ericsson (2014) strongly criticizes Hambrick et al.'s reliance on correlational analyses and their omission of extreme performers in the name of statistical control. If factors other than deliberate practice do explain chess expertise, Ericsson notes, it remains unclear what those are, given that research consistently shows deliberate practice - regardless of how it is defined - outperforms general intelligence or IQ as a predictor of expertise (Bilalić et al., 2007; Burgoyne et al., 2016; Grabner et al., 2006).

Unfortunately, despite the inherently diachronic nature of skill acquisition, purely longitudinal research on chess development remains relatively scarce. Moreover, existing studies often present somewhat contradictory findings.

De Bruin et al. (2008) investigated how adolescent elite chess players developed their skills over several years by associating retrospective practice estimates (collected via training diaries) with individual growth curves using linear mixed models. They found that deliberate practice strongly predicted performance, with monotonic effects - that is, the benefits of practice held consistently throughout the players’ development. Additionally, the authors reported that: (1) both players who persisted in chess training and those who eventually dropped out benefited similarly from accumulated practice, suggesting that differences in final performance were due to quantity of practice, not to its differential effectiveness; and (2) given that both groups improved through deliberate practice, the findings provided no support for innate-talent confounds.

Nevertheless, the study is not without limitations. The use of linear mixed models with fixed slopes likely altered the natural shape of participants’ learning trajectories by assuming uniform rates of improvement - thereby masking non-linear features typical of real-world learning (e.g., plateaus, bursts, regressions). While this model simplification may have favored interpretability over precision in capturing complex learning dynamics, it likely came at the cost of lost variation, potentially mischaracterizing learners with atypical or more individualized growth patterns.

In a more recent longitudinal study, Vaci et al. (2019) reported findings that diverge from earlier models focused solely on deliberate practice. Specifically: (1) both practice and general intelligence independently explained a significant portion of raw Elo rating variance over time, and when modeled together, they accounted for 47% of the variance - more than double the contribution of either factor alone (17% each); (2) more intelligent players derived larger benefits from the same amount of practice, with this interaction most pronounced around peak skill age (approximately 35 years); and (3) both practice and intelligence displayed nonlinear effects that varied across age and skill level. The use of Generalized Additive Models (GAMs) was particularly well-suited to this analysis, as it allowed for the flexible modeling of nonlinear developmental trajectories without assuming a specific functional form. Although individual growth curves were not explicitly modeled, GAMs enabled the authors to estimate smooth, data-driven effects across individuals - avoiding the oversimplification inherent in linear trend assumptions. Additionally, the sample included a broad and well-distributed range of Elo ratings, enhancing the generalizability of the findings across skill levels.

Despite its widespread use as a proxy for chess expertise, raw Elo rating - and particularly Elo gain over time - is often treated in the literature as a linear, unproblematic dependent variable. However, this approach overlooks a key feature of the Elo system: the difficulty of rating improvement increases nonlinearly with skill level (Elo, 1957). A 100-point gain at lower ratings (e.g., from 1200 to 1300) is not equivalent in effort or significance to a gain from 2100 to 2200. Notably, studies such as de Bruin et al. (2008) and Vaci et al. (2019) model Elo progression without explicitly correcting for this scaling, which may lead to misrepresentations of individual learning efficiency.

To address this limitation, the present study introduces a normalized Elo progression metric that adjusts for both starting level and the increasing difficulty of improvement at higher ratings. This allows for a more meaningful identification of fast learners - not by absolute gains, but by progress made relative to expected difficulty. Unlike prior research relying on lab-based or retrospective data, this study adopts a fully naturalistic approach, using online chess data collected over three years in ecologically valid conditions. While this comes with trade-offs (e.g., less experimental control, absence of metadata), it offers a rich, dynamic view of how learning unfolds in real-world settings. Methodologically, the study favors behavioral description as a first step - prioritizing what learners do over premature modeling of why they do it.

Building on this foundation, the present study seeks to identify behavioral predictors of accelerated skill acquisition in a large and ecologically valid sample of online chess players (N = 826). Rather than conceptualizing expertise as a fixed outcome or essential trait, we frame it as a developmental process shaped by the structure, context, and responsiveness of learners’ engagement over time. Drawing on theoretical frameworks including deliberate practice (Ericsson et al., 1993), chunking-based pattern recognition (Chase & Simon, 1973), and Vygotsky’s Zone of Proximal Development (1978), we focus on behavioral features that capture not only the frequency and spacing of practice (e.g., intervals between sessions), but also the degree to which learners calibrate their training to productive levels of challenge - operationalized as the Elo rating difference between player and opponent.

To reflect the cognitive dynamics of gameplay, we include a measure of move-sequence entropy, capturing the consistency and complexity of players’ decision patterns over time. Lower and more stable entropy may indicate the emergence of structured, automatized knowledge, consistent with chunking theory, whereas higher or more erratic entropy may reflect exploratory or unstructured play. In addition, we incorporate a measure of tactical efficiency, defined as the number of rounds required for a player's win probability to exceed a threshold of 0.70 - an indicator of how quickly players reach a clear advantageous game status.

Rather than attempting to fully explain individual learning trajectories, the primary goal of this study is to identify contextual and gameplay-derived variables that may serve as reliable behavioral signals of learning pace. These features could inform the design of future longitudinal or panel-based studies focused on modeling individual growth curves in greater detail.

Methods

This project involved mining and analyzing a large-scale dataset consisting of 45,043,493 rows, each representing an individual chess game, across 16 feature columns, compiled from the monthly lichess.org database from 01-2013 to 12-2015. Databases such as this are originally structured for efficient storage and retrieval rather than for scientific analysis, which entailed substantial effort in data extraction, cleaning, and transformation before the data could be made analytically viable. The first subsection of this Methods section, titled “Data Preparation”, details the procedures undertaken to curate the dataset into a research-ready format. The second subsection, “Label and Feature Engineering”, explains how the outcome variable (label) was constructed and outlines the derivation of key behavioral features that served as independent variables in our classification model. These features correspond to five theoretically grounded dimensions of learning behavior: engagement, practice time structure, level of challenge, strategic exploration, and tactical efficiency. The final subsection, “Statistical Modeling”, describes the machine learning techniques used to develop and optimize the predictive model, including decisions related to feature selection, model calibration, and validation.

Data Preparation

There is no standardized protocol for generating a workable sample from a dataset of this magnitude. A common approach might involve randomly sampling a fixed percentage of the data - for example, 10%. However, such a strategy was unsuitable for our purposes. Our primary objective was to examine how players acquire skill over time on online chess platforms, which necessitated focusing exclusively on users who demonstrated sustained engagement. To this end, we filtered the original dataset to include only players who had participated in at least 45 rated classical games within any given quarter of the study period - equivalent to roughly one game every two days. We focused specifically on “Rated Classical games”, the second-most common game type in the dataset (11,309,110 games), following the more prevalent “Rated Blitz” category. This choice was motivated by the need for comparability with prior research, as studies on chess expertise frequently focus on classical formats, where longer time controls allow for more deliberate and strategic play.

This filtering procedure yielded a subset of 58,686 users from the original pool of 499,966 unique players. To reduce computational demands for modeling and feature extraction, we then randomly selected a sample of 25,000 players from this subset. No additional stratification was applied, as the sample size was sufficiently large and the Elo rating distribution in large populations tends to approximate normality (Elo, 1957), allowing us to rely on the Law of Large Numbers to preserve key distributional characteristics. Nonetheless, we conducted visual inspections using histograms to compare the Elo rating distributions of the filtered and sampled datasets, confirming their similarity (see Fig. 1).
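The engagement filter described above can be illustrated with a minimal sketch. It assumes each game has already been flattened into one record per player of the form (player, year, month, event); this record layout, the event label string, and the function names are hypothetical and are not taken from the original pipeline.

```python
from collections import Counter

MIN_GAMES_PER_QUARTER = 45          # threshold stated in the text
CLASSICAL = "Rated Classical game"  # assumed event label for rated classical games


def quarter_of(year, month):
    """Map a calendar month to a (year, quarter) pair, with quarters numbered 1-4."""
    return year, (month - 1) // 3 + 1


def engaged_players(records):
    """Return the set of players with enough rated classical games in any one quarter.

    `records` is an iterable of (player, year, month, event) tuples; this
    layout is a hypothetical stand-in for the real game records.
    """
    counts = Counter()
    for player, year, month, event in records:
        if event == CLASSICAL:
            counts[(player, quarter_of(year, month))] += 1
    return {player for (player, _), n in counts.items() if n >= MIN_GAMES_PER_QUARTER}
```

A player qualifies as soon as a single quarter reaches the threshold, which mirrors the "within any given quarter" wording of the criterion.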

Fig. 1

Distribution of median monthly peak Elo ratings across three cohorts: the full filtered dataset (N = 56,770), the sampled subset (N = 24,213), and the final modeling cohort (N = 821), after excluding users with incomplete data. The distributions are approximately normal, with a slight rightward shift in the final cohort, suggesting that the selection criteria (e.g., sustained activity) may have mildly biased the sample toward higher Elo players. Nonetheless, the overall distributional shape is preserved, supporting the cohort’s representativeness for modeling learning trajectories.

Fig. 1

Feature correlation matrix. Pairwise Pearson correlation coefficients among all candidate features. The matrix was used to visually inspect and identify clusters of highly correlated variables prior to feature selection. One or two representative variables from each conceptual group (engagement, spacing, optimal challenge) were retained to reduce redundancy and mitigate potential multicollinearity.

After generating the working sample, we assessed which players demonstrated sufficient temporal engagement to support analysis of Elo growth. To classify players as “fast learners” or “not fast learners” based on adjusted Elo progression, we required a minimum of 18 months of monthly activity data. This was determined by scanning each player's game history in reverse chronological order - starting from the final month in the dataset (December 2015) - and including only those who maintained consistent activity over an 18-month window. This filtering step yielded a final cohort of 826 players. While this selection process introduced a slight bias toward higher Elo ratings - likely reflecting that more skilled or engaged players tend to remain active longer - the resulting distribution remained approximately normal, supporting the representativeness of the modeling cohort. Descriptive statistics for median monthly Elo ratings across key groups were as follows:

Filtered full set (N = 56,770): M = 1610.77, SD = 221.08

Sampled subset (N = 24,213): M = 1610.94, SD = 220.93

Final modeling cohort (N = 826): M = 1651.93, SD = 176.38

This threshold ensured that Elo growth could be computed over a meaningful time span and that learning classifications were based on sustained behavioral patterns rather than short-term fluctuations.
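One way to read the 18-month requirement is sketched below. The sketch assumes that "consistent activity" means at least one game in each of the 18 consecutive calendar months ending in December 2015; that interpretation, and the function names, are assumptions made for illustration rather than the exact rule used in the study.

```python
REQUIRED_MONTHS = 18     # window length stated in the text
LAST_MONTH = (2015, 12)  # final month of the dataset


def previous_month(year, month):
    """Step one calendar month back."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def has_full_window(active_months, last=LAST_MONTH, required=REQUIRED_MONTHS):
    """Check for `required` consecutive active months ending at `last`.

    `active_months` is a set of (year, month) pairs in which the player
    played at least one game (an assumed definition of "active").
    """
    current = last
    for _ in range(required):
        if current not in active_months:
            return False
        current = previous_month(*current)
    return True
```

Scanning backward from the final month keeps the window aligned across players, so every retained player contributes the same calendar span to the growth computation.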

Label and Feature Engineering

Label Engineering

As previously discussed, we aimed to test the hypothesis that practice structure (i.e., the consistency and spacing of engagement), average level of challenge (i.e., opponent strength), evolution in strategic exploration (as reflected by variability or entropy), and tactical efficiency (i.e., the ability to convert advantage in won games) would distinguish “fast learners” from “not fast learners”, as operationalized through adjusted Elo growth. To define the outcome variable for our classification model, we first computed each player's Elo growth rate, calculated as the absolute change in Elo rating divided by the number of active months. This is formally defined in Eq. 1 below:

$\text{Growth Rate}_{i} = \frac{\text{Elo}_{f,i} - \text{Elo}_{i,i}}{\text{Active Months}_{i}}$

(Eq. 1)

However, because the Elo rating system is nonlinear and convergent - such that higher-rated players experience slower rating changes for equivalent performance - raw growth rates tend to underestimate the learning progress of those starting at higher levels. In the Elo framework (Elo, 1957), updates occur according to Eq. 2:

$\text{Elo}_{\text{new}} = \text{Elo}_{\text{old}} + K \cdot (S - E)$

(Eq. 2)

where S is the observed score (win/draw/loss), E is the expected score based on the relative ratings of the player and opponent, and K is a constant governing rating volatility. As players improve, the expected value E increases and the magnitude of Elo updates decreases - even for identical performance outcomes. To correct for this systemic compression at higher skill levels, we introduced a percentile-based adjustment. Each player's initial Elo was converted to a percentile rank within the full distribution of final ratings. This percentile was then added to a weighted growth rate, producing a final adjusted growth rate defined by Eq. 3:

$\text{Adjusted Growth Rate}_{i} = \left(\frac{\text{Elo}_{f,i} - \text{Elo}_{i,i}}{\text{Active Months}_{i}}\right) \cdot k + \text{Percentile}\left(\text{Elo}_{f,i}\right)$

(Eq. 3)

where:

Elo_f,i and Elo_i,i are the final and initial Elo ratings, respectively.

Active Months_i is the number of months the player was active.

Percentile(Elo_f,i) is the player’s rank in the distribution of final Elo ratings.

k ∈ [0,1] is a scaling factor that calibrates the contribution of the growth term.

The inclusion of the scaling constant k helps to prevent over-penalization of lower-rated players, whose percentile scores would otherwise dominate the adjusted metric - even if they demonstrated substantial Elo improvement. By tuning k, we ensured that the percentile acts as a regularizing term, moderating the growth rate without overshadowing it. This approach preserves the interpretability of Elo growth while correcting for structural inequities in the rating system, thereby enabling fair comparisons across skill levels in the classification task.

Using this adjusted growth score (with the k scaling factor set to 0.5), we classified players as “fast learners” if they ranked in the top quartile (≥ 75th percentile) and as “not fast learners” otherwise, yielding a binary outcome suitable for supervised classification. This approach avoids arbitrary cutoffs while allowing fair comparisons across rating levels, aligning with our goal of identifying behavioral predictors of accelerated skill development. Because group membership is derived directly from the adjusted Elo growth distribution, statistical differences between groups are built into the design and do not require further inferential testing.
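The labeling procedure can be sketched as follows, applying Eq. 3 with k = 0.5 and a top-quartile cutoff. Two details are assumptions made for illustration: the percentile is expressed as a fraction between 0 and 1 (the text does not state its scale), and the 75th-percentile cutoff is computed with the inclusive method of Python's statistics.quantiles. The function names are hypothetical.

```python
from statistics import quantiles

K = 0.5  # scaling factor reported in the text


def percentile_rank(value, population):
    """Fraction of the population at or below `value` (assumed 0-1 scale)."""
    return sum(v <= value for v in population) / len(population)


def adjusted_growth_rate(elo_initial, elo_final, active_months, percentile, k=K):
    """Eq. 3: scaled monthly Elo growth plus the percentile term."""
    return (elo_final - elo_initial) / active_months * k + percentile


def label_fast_learners(players, k=K):
    """Return one boolean per player: True for the top quartile of adjusted growth.

    `players` is a list of (elo_initial, elo_final, active_months) tuples.
    """
    finals = [final for _, final, _ in players]
    scores = [
        adjusted_growth_rate(init, final, months, percentile_rank(final, finals), k)
        for init, final, months in players
    ]
    cutoff = quantiles(scores, n=4, method="inclusive")[2]  # 75th percentile
    return [score >= cutoff for score in scores]
```

Because the percentile term is bounded while the growth term is not, the relative weight of the two terms depends on the chosen percentile scale as well as on k; under a 0-100 scale the same k would give the percentile far more influence.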

Feature Engineering

To capture the previously described dimensions - practice timing structure, challenge level, strategic exploration in move sequences, and tactical efficiency - as predictors of skill growth, we derived a set of aggregated trend-based features such as weighted means and slopes. In addition, we also computed a metric termed steady magnitude to quantify the directional consistency of change in a behavioral time series (e.g., tactical efficiency, spacing, engagement). This measure captures the extent to which local (month-to-month) changes align with the overall trend across a player’s active period. To compute steady magnitude, we first calculate the first-order differences in the time series:

$\Delta_{i} = x_{i+1} - x_{i}$

(Eq. 4)

We then estimate the global trend s using ordinary least squares regression. For each local change Δ_i = x_i₊₁ − x_i, we evaluate whether its sign matches the sign of s. The steady magnitude is defined as the proportion of absolute change that is directionally consistent with the global trend:

$\text{Steady Magnitude} = \frac{\sum_{i} \left|\Delta_{i}\right| \cdot 1_{\text{sign}\left(\Delta_{i}\right) = \text{sign}\left(s\right)}}{\sum_{i} \left|\Delta_{i}\right|}$

(Eq. 5)

where 1_{sign(Δ_i) = sign(s)} is an indicator function that equals 1 when the local change is aligned with the overall trend and 0 otherwise; values near 1 indicate that most local changes were in the same direction as the overall trend, suggesting consistent behavioral evolution, while values near 0 reflect frequent directional reversals or erratic fluctuations, even if a global trend exists.
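A minimal sketch of Eqs. 4 and 5 is given below. It assumes the series is sampled at equally spaced time points (one value per month), so the OLS slope is taken against the index 0, 1, 2, ...; returning 0 for a series with no change at all is an assumed convention, since the ratio in Eq. 5 is undefined in that case.

```python
def ols_slope(series):
    """Least-squares slope of the series against its index 0, 1, 2, ..."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_x = sum(series) / n
    num = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(series))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def steady_magnitude(series):
    """Eq. 5: share of absolute change that moves in the direction of the trend."""
    deltas = [b - a for a, b in zip(series, series[1:])]  # Eq. 4
    total = sum(abs(d) for d in deltas)
    if total == 0:
        return 0.0  # assumed convention for flat or single-point series
    trend = sign(ols_slope(series))
    aligned = sum(abs(d) for d in deltas if sign(d) == trend)
    return aligned / total
```

For example, the series 1, 3, 2, 4 has an upward trend and local changes of +2, −1, and +2, so four of the five units of absolute change are aligned with the trend and the steady magnitude is 0.8.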

While the metrics computed under the engagement and practice structure dimensions were relatively straightforward to derive, those associated with optimal challenge level, evolving entropy (i.e., strategic exploration), and tactical efficiency are more nuanced and merit further methodological explanation.

To quantify optimal challenge - defined as situations where the player is neither overwhelmed nor underwhelmed by their opponent's proficiency level - we estimated win probabilities based on Elo rating differences between the player and their opponent. Using the standard Elo formula, the expected probability of winning is computed as:

$P_{\text{win}} = \frac{1}{1 + 10^{(E_{\text{opponent}} - E_{\text{player}})/400}}$

(Eq. 6)

where:

E_opponent and E_player are the Elo ratings of the opponent and the player, respectively.

Each game was then categorized into either the optimal or the suboptimal challenge zone based on the player’s estimated win probability:

Optimal challenge: 0.35 ≤ win probability ≤ 0.65

Suboptimal challenge: win probability < 0.35 OR win probability > 0.65
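The Elo formula and the zone rule above can be sketched in a few lines of Python. The function names and the (player, opponent) tuple layout are hypothetical; only the formula and the 0.35–0.65 band come from the text. Solving Eq. 6 for the band edges shows that the optimal zone corresponds to a rating gap of roughly ±108 Elo points.

```python
def expected_win_probability(player_elo, opponent_elo):
    """Standard Elo expectation for the player (Eq. 6)."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - player_elo) / 400))


def is_optimal_challenge(player_elo, opponent_elo, low=0.35, high=0.65):
    """True when the expected win probability lies inside the optimal band."""
    return low <= expected_win_probability(player_elo, opponent_elo) <= high


def optimal_challenge_share(games):
    """Proportion of games in the optimal zone.

    `games` is a list of (player_elo, opponent_elo) pairs, for example all
    games a player completed in one month.
    """
    if not games:
        raise ValueError("no games supplied")
    hits = sum(is_optimal_challenge(p, o) for p, o in games)
    return hits / len(games)


# A 100-point underdog is still inside the band; a 120-point underdog is not.
inside = is_optimal_challenge(1500, 1600)
outside = is_optimal_challenge(1500, 1620)
```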

This range selection was informed by Vygotsky’s concept of the Zone of Proximal Development, with the goal of capturing games that reflect an optimal level of challenge - neither too easy nor prohibitively difficult. While it would have been possible to further subdivide the 0.35–0.65 interval (e.g., by separating games in which the player had a win probability above or below 0.50), we opted to treat this range as a unified zone of productive engagement. Given that sustained motivation is essential for continued skill development, we assumed it is best supported by a mix of moderately favorable matchups - games in which players face meaningful difficulty but still have a realistic chance of success. Within this framework, games with win probabilities between 0.50 and 0.65 provide positive reinforcement, while those between 0.35 and 0.50 offer constructive challenge, together fostering the balance necessary for engagement and learning. Although Lichess typically pairs players based on Elo ratings, match difficulty can still vary due to rating volatility, manual pairings, and different game formats. Our optimal challenge metric was developed to account for this variability and provide a more individualized estimate of challenge level.

Importantly, we focused exclusively on the optimal challenge range - rather than analyzing under- or over-challenging games separately - because our theoretical interest lies in identifying the behavioral conditions that facilitate effective learning. By concentrating on games within the hypothesized “sweet spot” for learning, we aimed to isolate how sustained exposure to productive challenge relates to long-term skill acquisition.

The strategic exploration features are grounded in the concept of information entropy, as originally formalized by Shannon (1948). According to Shannon, the entropy of a signal - or a random variable - quantifies the average amount of information or uncertainty associated with that signal. Mathematically, it is computed as the sum, over all possible states of the variable, of the negative logarithm of each state's probability weighted by that probability. This formulation is captured in Eq. 7:

$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$

(Eq. 7)

where:

H(X) represents the entropy of a discrete random variable X.

p(x_i) is the probability of outcome x_i.

Based on chunking and skill acquisition theories (e.g., Anderson, 1982; Chase & Simon, 1973; Fitts & Posner, 1967), we assumed - drawing from information theory - that entropy in move sequences would generally decrease as learners became more proficient in a given task. This aligns with well-documented phenomena such as expertise-induced inattentional blindness favored by top-down attentional tuning in experts (e.g., Drew et al., 2014; Potchen, 2006), which suggest that performance becomes more structured and less variable with practice. Guided by this framework, we applied Shannon’s entropy formula to bigrams (i.e., two-move sequences) extracted from each game played by a given individual, while controlling for color. Specifically, entropy was calculated only for games in which the player was playing as White. This decision was based on the assumption that players controlling the White pieces are more likely to adopt an initiating or offensive stance, while Black players are more often forced into reactive or adaptive play. This asymmetry, particularly within a dataset that spans a wide Elo bandwidth (826–2327 pts.), was deemed critical for isolating sequences that most meaningfully reflect the player’s strategic initiative.
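To make the bigram entropy concrete, the sketch below applies Shannon's formula to the empirical distribution of two-move sequences. It rests on assumptions the text does not settle: moves are given as SAN strings, bigrams are formed from consecutive entries of each supplied move list, and bigrams are pooled across all games passed in (for instance, one player's White games in a month). All names are hypothetical.

```python
from collections import Counter
from math import log2


def move_bigrams(moves):
    """Consecutive two-move sequences from one game's move list."""
    return list(zip(moves, moves[1:]))


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def bigram_entropy(games):
    """Entropy of bigrams pooled over several games (lists of moves)."""
    pooled = [bigram for moves in games for bigram in move_bigrams(moves)]
    return shannon_entropy(pooled)


# Two games with identical openings produce a narrower bigram distribution
# than two games that diverge from the first move.
repetitive = bigram_entropy([["e4", "Nf3", "Bc4"], ["e4", "Nf3", "Bc4"]])
varied = bigram_entropy([["e4", "Nf3", "Bc4"], ["d4", "c4", "Nc3"]])
```

Under this reading, lower values indicate a more repetitive, structured repertoire and higher values indicate broader exploration.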

Regarding tactical efficiency, and as mentioned above, we defined our metric as the number of moves required for a player to reach a win probability exceeding the 0.70 threshold, considering only games that were ultimately won. A player's ability to steer the game toward a decisive advantage in fewer moves serves as a direct and interpretable measure of tactical sharpness and execution efficiency. The 0.70 threshold was selected as both a psychologically and statistically meaningful tipping point - representing the transition from equilibrium to a position of clear dominance, as interpreted by modern chess evaluation engines (Regan & Hawthorne, 2011; Guid & Bratko, 2006). It reflects not merely positional advantage, but the consolidation of that advantage into likely victory. To compute win probabilities, we used Stockfish (1.2.0 version) - an open-source, high-performance chess engine that consistently ranks among the strongest available. Stockfish evaluations are expressed in centipawns, which we converted into win probabilities using a logistic transformation based on prior modeling work (Regan et al., 2011). Because Stockfish is player-agnostic - evaluating positions under the assumption of optimal play - we normalized the number of moves to reach the 70% win threshold by the Elo rating gap between the player and their opponent. This adjustment accounts for the contextual difficulty of converting advantage against stronger or weaker opponents, allowing for a fairer assessment of tactical efficiency across varied matchups.
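The core of this metric can be sketched as follows. The sketch assumes a generic logistic mapping from centipawns to win probability with a placeholder steepness `k`; the value 0.004 is hypothetical and is not the parameterization used in the study. The Elo-gap normalization is omitted because its functional form is not specified in the text. Evaluations are assumed to be from the player's perspective, one per move.

```python
from math import exp


def centipawns_to_win_probability(cp, k=0.004):
    """Logistic mapping from an engine evaluation to a win probability.

    `k` is a hypothetical steepness parameter chosen for illustration only.
    """
    return 1.0 / (1.0 + exp(-k * cp))


def moves_to_threshold(evals_cp, threshold=0.70, k=0.004):
    """Move number at which win probability first exceeds `threshold`.

    `evals_cp` holds one centipawn evaluation per move of a game the player
    eventually won. Returns None if the threshold is never exceeded.
    """
    for move_number, cp in enumerate(evals_cp, start=1):
        if centipawns_to_win_probability(cp, k) > threshold:
            return move_number
    return None


# With the placeholder k, the 0.70 threshold is crossed at about +212 centipawns.
first_decisive_move = moves_to_threshold([10, 50, 120, 250, 400])
```

Fewer moves to the threshold indicate faster conversion; a per-player mean over won games would then give the average tactical efficiency feature before any normalization.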

Table 1 summarizes the full set of engineered features derived from players’ behavioral data. These features are organized by conceptual dimension - such as engagement, spacing, challenge, and gameplay complexity - and capture both average tendencies and temporal dynamics.
Feature Group	Feature Subtype / Metric	Description
(F) Engagement	Steady magnitude, Slope, Early slope, Late slope, Mean, Consistency	Frequency and regularity of play over time
(F) Spacing	Steady magnitude, Slope, Early slope, Late slope, Mean	Average spacing between sessions
(F) Challenge	Steady magnitude, Slope, Early slope, Late slope, Mean	Alignment of opponent difficulty with player's skill level
(G) Strategic Exploration	Steady magnitude, Slope, Early slope, Late slope, Mean	Measures the complexity and consistency of move sequences (lower = more structured)
(G) Tactical Efficiency	Steady magnitude, Slope, Early slope, Late slope, Mean	Efficiency in converting advantages in won games

Table 1. Summary of engineered behavioral features used in the classification models. Features are grouped according to their theoretical dimension and include trend-based metrics (e.g., slope, steady magnitude) that capture both the direction and consistency of change over time. Each group reflects a distinct aspect of learning behavior relevant to chess skill acquisition. Parenthetical labels (F) and (G) indicate whether a given feature belongs to the "Foundational Conditions" or "Gameplay Indicators" class, respectively.

To ensure that our behavioral features reflected not only trends over time but also their relative significance, all temporal metrics (e.g., slopes, means, and consistency measures) were weighted by the number of games played per month. This weighting strategy enhances signal fidelity by giving greater influence to months where players were more behaviorally active - thus reducing the impact of sparse or anomalous months. In practical terms, it allowed slope calculations, entropy trends, and consistency measures to better reflect meaningful engagement rather than noise from low-activity periods.
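One way to realize this weighting is a weighted mean and a weighted least squares slope, with each month's weight equal to its game count. The sketch below is an assumption about the form of the weighting, not a reproduction of the study's code; the function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of `values` with each entry weighted by `weights`."""
    total = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total


def weighted_slope(values, weights):
    """Weighted least squares slope of `values` against the month index."""
    months = range(len(values))
    t_bar = weighted_mean(months, weights)
    x_bar = weighted_mean(values, weights)
    num = sum(w * (t - t_bar) * (x - x_bar)
              for t, x, w in zip(months, values, weights))
    den = sum(w * (t - t_bar) ** 2 for t, w in zip(months, weights))
    return num / den


# Monthly share of optimal-challenge games and the number of games per month.
# The third month has a single game, so its outlying value barely moves the fit.
monthly_values = [0.20, 0.40, 0.90]
games_per_month = [50, 50, 1]
trend = weighted_slope(monthly_values, games_per_month)
level = weighted_mean(monthly_values, games_per_month)
```

With equal weights these functions reduce to the ordinary mean and OLS slope, so the weighting only matters when activity is uneven across months.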

Statistical Modeling

Given our goal of identifying the behavioral features that distinguish fast learners - defined as those whose adjusted Elo growth falls within the top quartile (≥ 75th percentile) - we framed the problem as a hierarchical classification task. The final modeling dataset comprised 826 players, with 207 (25.1%) classified as fast learners and 619 (74.9%) as not fast learners. In this framework, we first modeled learner status using features from the Foundational Conditions class. This class of variables reflects conditions largely external to the player’s internalized skill acquisition (e.g., volume and structure of play, opponent difficulty), and can therefore be treated as potential preconditions for learning. In contrast, the other two dimensions - strategic exploration and tactical efficiency - are more likely to reflect the consequences or outputs of learning across time (e.g., improvements in sequence structure and move quality).

This conceptual distinction motivated our hierarchical approach, where foundational conditions (practice structure, engagement, and optimal challenge) are assessed first, and gameplay-derived indicators (entropy and efficiency metrics) are introduced in later stages. To evaluate model performance fairly and avoid sampling bias, we split the dataset using stratified sampling, ensuring that both the training and testing sets preserved the original distribution of the binary target variable (“fast learners” vs. “not fast learners”). A random seed was set to ensure reproducibility, and the dataset was stratified into a training set (N_train = 660) and a test set (N_test = 166), preserving the class balance across splits. Cross-validation was not applied, as model performance was evaluated on a fully held-out test set (see Fig. 2).
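The stratified split can be illustrated with a standard-library sketch that reproduces the reported set sizes. It is a stand-in for whatever library routine was actually used; the 20% test fraction is inferred from 166 of 826, and the seed value and function name are hypothetical.

```python
import random
from collections import defaultdict
from math import ceil, floor


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) preserving class proportions."""
    rng = random.Random(seed)  # fixed seed for reproducibility (value is arbitrary)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    n_test = ceil(test_fraction * len(labels))
    # Largest-remainder allocation of the test quota across classes.
    quotas = {c: n_test * len(ix) / len(labels) for c, ix in by_class.items()}
    alloc = {c: floor(q) for c, q in quotas.items()}
    leftover = n_test - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1

    train, test = [], []
    for c, ix in by_class.items():
        rng.shuffle(ix)
        test.extend(ix[:alloc[c]])
        train.extend(ix[alloc[c]:])
    return sorted(train), sorted(test)


# 207 fast learners (label 1) and 619 not fast learners (label 0), as reported.
labels = [1] * 207 + [0] * 619
train_idx, test_idx = stratified_split(labels)
```

With these inputs the sketch yields 660 training and 166 test players, with fast learners making up about a quarter of each set.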

Fig. 2

Overview of the modeling framework. The full sample was stratified into a training set and a held-out test set, preserving class balance. Two models were evaluated: Model 1 included only foundational behavioral conditions (engagement structure and challenge level), while Model 2 added gameplay-derived indicators (strategic exploration and tactical efficiency) to assess their incremental predictive value.

We evaluated the performance of two distinct classifiers: logistic regression and a random forest classifier. Logistic regression is a linear model that classifies data by passing a weighted sum of the input features through a sigmoid function. The further a data point lies from the decision boundary (where the weighted sum equals 0), the greater the model’s confidence in assigning it to one of the two classes (e.g., -1 for "Not Fast Learner" and 1 for "Fast Learner"). In contrast, random forest is a non-linear ensemble method that aggregates the votes of many decision trees, each of which recursively splits the data according to feature values to maximize class separation at each node. As such, random forests are well-suited to capturing complex, non-linear relationships among predictors. Testing classifiers with different modeling assumptions allowed us to explore diverse nuances concerning the relationship between foundational conditions (i.e., practice structure, engagement, and optimal challenge), gameplay-derived indicators (i.e., entropy and tactical efficiency), and learning rate.

Given our focus on interpretability and theory-driven modeling, we did not perform hyperparameter tuning; both classifiers were fitted with the default settings of Python’s scikit-learn library. Our primary goal was to evaluate the relative predictive value of behavioral features, rather than to maximize predictive performance. To address class imbalance between "Fast Learners" and "Not Fast Learners," we applied class weighting in logistic regression and used Synthetic Minority Over-Sampling Technique (SMOTE) in the random forest model. All the data preparation, feature engineering, and modeling were performed in Python (3.10.11 version).
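The text does not state which weighting scheme was applied. A common choice, and the one scikit-learn uses for its "balanced" option, sets each class weight to n / (k · n_c), where n is the sample size, k the number of classes, and n_c the class count. The sketch below shows that heuristic on the full-sample class counts purely as an illustration; in practice the weights would be derived from the training set, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Illustration with the reported counts: 207 fast (1) and 619 not fast (0).
weights = balanced_class_weights([1] * 207 + [0] * 619)
```

Under this heuristic a misclassified fast learner costs roughly three times as much as a misclassified not fast learner (about 2.0 versus 0.67), which is consistent with the higher recall logistic regression shows for the minority class.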

Results

The first model aimed to predict class membership - that is, to distinguish “fast learners” from “not fast learners" - using only foundational conditions related to practice structure (i.e., features reflecting engagement, spacing, and optimal challenge). It served as a baseline to assess whether early behavioral patterns alone could meaningfully explain skill acquisition, without relying on gameplay-derived indicators.

The results were not satisfactory, indicating that this subset of features alone was insufficient to reliably separate the two groups. It is important to note that only the top five explanatory features, identified based on the magnitude of their logistic regression coefficients, were included in the model. This decision was driven by our emphasis on interpretability rather than predictive performance. A parsimonious model with limited multicollinearity provides clearer theoretical insights than one with a large number of interacting features that are difficult to interpret (see Babyak, 2004, for a discussion of overfitting and multicollinearity). Table 2 presents the achieved results:

Table 2
Performance metrics for Logistic Regression and Random Forest (with SMOTE) in Model 1. Precision, recall, and F1 scores are reported separately for each class, alongside overall accuracy and ROC AUC. The Random Forest model achieved higher overall accuracy and more balanced performance across both classes, while Logistic Regression showed higher recall for fast learners despite lower precision.
Model 1	Not Fast Learners			Fast Learners			Overall
Model 1	Precision	Recall	F1	Precision	Recall	F1	Accuracy	ROC AUC
Logistic Regression	0.80	0.53	0.64	0.30	0.61	0.40	0.55	0.58
Random Forest	0.79	0.80	0.79	0.37	0.35	0.36	0.69	0.61

Although the Random Forest model outperformed Logistic Regression in overall accuracy and balance across classes, its ROC AUC score remained only modestly above chance level (0.61), suggesting limited discriminative power. In both models, the separation between “fast learners” and “not fast learners” was weak, reinforcing the conclusion that foundational conditions alone do not sufficiently explain differences in learning trajectories. This reinforces the need to incorporate gameplay-derived indicators in subsequent modeling stages.

The second model added the Gameplay Indicators dimensions (i.e., strategic exploration and tactical efficiency) to the previous set of explanatory variables. Considering only the top five explanatory features, selected by the same method described above, the results clearly improved, as shown in Table 3:

Table 3
Performance metrics for Logistic Regression and Random Forest (with SMOTE) in Model 2. Precision, recall, and F1 scores are reported separately for fast and not fast learners, along with overall accuracy and ROC AUC. Logistic Regression yielded better recall for fast learners, while Random Forest achieved higher accuracy and precision for not fast learners but struggled to identify fast learners effectively.
Model 2	Not Fast Learners			Fast Learners			Overall
Model 2	Precision	Recall	F1	Precision	Recall	F1	Accuracy	ROC AUC
Logistic Regression	0.89	0.65	0.75	0.42	0.77	0.55	0.68	0.78
Random Forest	0.77	0.89	0.83	0.42	0.23	0.29	0.72	0.59

Compared to the first model, classification performance improved notably - not only in overall accuracy but, more importantly, in the ROC AUC score for the logistic regression model. Given the nature of the dataset - which is unfiltered, raw, and ecologically valid - these results are relatively robust. The model correctly classifies players as “fast learners” or “not-fast learners” approximately 70% of the time, a performance well above chance level, as illustrated by the ROC curve in Fig. 3. In psychological research, an AUC between 0.70 and 0.80 is generally considered acceptable for models that aim to predict complex, real-world behaviors (Hosmer, Lemeshow, & Sturdivant, 2013). However, it is also evident that the model performs substantially better in identifying “not-fast learners” than in detecting “fast learners".
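For readers checking the tables, each F1 score is the harmonic mean of the corresponding precision and recall. The short sketch below (function name hypothetical) recomputes it from the rounded Table 3 entries for logistic regression.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Logistic regression, Model 2 (values as rounded in Table 3).
f1_not_fast = f1_from_precision_recall(0.89, 0.65)
f1_fast = f1_from_precision_recall(0.42, 0.77)
```

The recomputed values are about 0.75 and 0.54. The second differs slightly from the reported 0.55, presumably because the table entries were rounded from unrounded precision and recall.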

Fig. 3

Receiver Operating Characteristic (ROC) curves for the logistic regression and random forest models. The logistic regression model (AUC = 0.78) demonstrates stronger discriminative performance in distinguishing fast learners from not fast learners, relative to the random forest model (AUC = 0.59), which performs only slightly above chance level. The dashed diagonal represents random classification performance (AUC = 0.50).

As shown in Fig. 4, the most predictive features in the logistic regression model are the mean tactical efficiency and the mean proportion of games played at an optimal challenge level. Mean tactical efficiency shows a strong positive association (coefficient > 1) with the “fast learner” class, suggesting that “faster learners” are more efficient at converting advantageous positions into wins. Interestingly, the mean of optimal challenge is negatively associated with fast learning (coefficient ≈ -0.75), implying that playing more games at optimal challenge levels does not necessarily predict faster skill acquisition - an unexpected finding.

However, the broader pattern of optimal challenge features presents a complex and somewhat inconsistent picture. For instance, a higher steady magnitude in optimal challenge is also associated with the “not fast” learner class, while the late slope of optimal challenge (reflecting change in the final quarter of the data) shows a positive relationship with fast learning. This variability suggests the presence of indirect or potentially non-linear effects, which may not be fully captured by linear modeling.

Figure 4. Logistic regression coefficients for the top five most predictive behavioral features. Positive coefficients indicate greater odds of being classified as a “fast learner.” The strongest predictor is the average tactical efficiency, suggesting that faster learners convert advantageous positions into wins more reliably. Conversely, a higher mean of games played at an optimal challenge level is negatively associated with fast learning. Interestingly, other optimal challenge indicators (e.g., late slope) display positive associations, suggesting potential non-linear or context-dependent effects. Steady magnitude in strategic exploration (entropy) shows a modest positive link to fast learning.

Finally, the steady magnitude of strategic exploration - operationalized through bigram entropy of move sequences - exhibits a modest positive association with the “fast learners” class. This indicates that “fast learners” may display greater variability in their move sequences over time, possibly reflecting a broader or more flexible strategic repertoire.

Given the inconsistent behavior of optimal challenge features - and the absence of other foundational practice variables (e.g., engagement and spacing) among the top predictors in the classification model - we sought to examine whether these factors might still exert an indirect influence on chess learning. Since tactical efficiency (i.e., a player’s ability to convert probabilistic advantages into actual wins) emerged as a strong predictor of fast learning, we treated it as a proxy for chess skill development.

To investigate whether foundational practice conditions could contribute to the development of tactical efficiency, we conducted a multiple linear regression analysis. Here, tactical efficiency served as the dependent variable, while engagement, spacing, and optimal challenge metrics were included as independent variables. To mitigate multicollinearity, we reduced the initial set of predictors prior to the regression analysis. We first computed a Pearson correlation matrix of all candidate features and observed several clusters of highly correlated variables within the engagement, spacing, and optimal challenge domains (see Appendix 1). To avoid redundancy, we retained a small subset of conceptually representative variables from each domain while ensuring low pairwise correlations. This resulted in the following reduced feature set: engagement (steady magnitude, slope, mean), spacing (steady magnitude, slope, mean), optimal challenge (steady magnitude, slope, mean).

This approach allowed us to test whether foundational practice behaviors may mediate or facilitate the emergence of tactical performance over time - even if they do not directly classify learners as “fast learners” or “not fast learners” in the initial model. The results of the multiple linear regression are shown in Table 4.

Predictor	B	SE	t	p	95% CI (LL, UL)
Intercept	0.294	0.002	127.27	< .001	[0.289, 0.298]
Engagement – Steady Magnitude	−0.001	0.003	−0.50	.619	[−0.007, 0.004]
Engagement – Slope	0.004	0.003	1.37	.170	[−0.002, 0.009]
Engagement – Mean	0.005	0.003	2.00	.046	[0.000, 0.011]
Spacing (Weighted Mean) – Steady Magnitude	−0.001	0.003	−0.41	.680	[−0.007, 0.004]
Spacing (Weighted Mean) – Slope	−0.002	0.003	−0.73	.465	[−0.008, 0.004]
Spacing (Weighted Mean) – Mean	0.004	0.003	1.34	.182	[−0.002, 0.009]
Optimal Challenge – Steady Magnitude	0.001	0.003	0.22	.826	[−0.005, 0.006]
Optimal Challenge – Slope	−0.003	0.003	−1.09	.276	[−0.008, 0.002]
Optimal Challenge – Mean	0.059	0.003	23.06	< .001	[0.054, 0.064]

Model fit: R² = 0.53, Adj. R² = 0.52, F(9, 566) = 69.76, p < .001

Table 4. Summary of multiple linear regression predicting tactical efficiency from foundational practice-related features. The model includes variables related to engagement, spacing, and optimal challenge. Mean optimal challenge emerged as the strongest significant predictor, while most other dynamics (slopes and magnitudes) were not statistically significant. Standard errors and confidence intervals are reported for each coefficient. Note that all continuous predictors were standardized (z-scored) prior to analysis, so reported regression coefficients represent standardized effects.

The model explained approximately 53% of the variance in tactical efficiency (R² = 0.53, Adj. R² = 0.52), which is notably strong for psychological or naturalistic behavioral data. The overall model was statistically significant (F(9, 566) = 69.76, p < .001), indicating that at least one predictor reliably contributed to explaining variation in the outcome. Among them, optimal challenge mean emerged as the strongest and only highly significant predictor (β = 0.396, p < .001), suggesting that players who, on average, spent more time playing at an optimal challenge level (win probability between 0.35 and 0.65) exhibited higher tactical efficiency. This aligns with Vygotsky’s Zone of Proximal Development (1978), which posits that learning is maximized under conditions of balanced difficulty. To assess potential multicollinearity among predictors, we calculated Variance Inflation Factors (VIFs) for each independent variable. All VIF values were below 2, indicating that multicollinearity was not a concern (O'Brien, 2007).

Engagement – Mean had a small but statistically significant positive effect (β ≈ 0.000057, p = .046), suggesting that greater overall activity is modestly associated with improved tactical efficiency. However, the selection bias introduced by filtering for players with at least 18 months of sustained activity may have masked the full effect of engagement. It is possible that engagement has a linear impact on Elo growth only up to a certain threshold. This finding also implies that how engagement changed over time (e.g., increasing or decreasing slopes) did not substantially influence tactical efficiency. A visual representation of these results is shown in Fig. 5.

Fig. 5

Standardized regression coefficients (B) and 95% confidence intervals for predictors of tactical efficiency. “Optimal Challenge – Mean” shows the only large, clearly significant positive effect; “Engagement – Mean” is marginally significant (p = .046), and all other predictors - including the remaining engagement and spacing metrics - exhibit small and non-significant associations. The plot illustrates both the direction and uncertainty of each coefficient.

Discussion

The overarching goal of this study was to identify contextual and gameplay-derived variables that may serve as reliable behavioral signals of chess learning pace. In line with this aim, several important findings emerged. First, foundational practice conditions - namely engagement, spacing, and optimal challenge - were not sufficient on their own to reliably distinguish fast from not-fast learners, with the classification model achieving only modest performance (AUC ≈ .61). However, when gameplay-derived indicators were incorporated, model performance improved substantially. The logistic regression model that included these features reached an AUC of .78, indicating acceptable-to-strong discriminative power for predicting complex, naturalistic learning behaviors (Hosmer, Lemeshow, & Sturdivant, 2013). Among all predictors, tactical efficiency emerged as the most robust indicator of fast learning, suggesting that players who more effectively reach advantageous positions and convert them into wins tend to show greater Elo progression. Given the ambiguous behavior of optimal challenge indicators in the classification model, we ran a follow-up linear regression, which revealed that the mean optimal challenge level and, to a much lesser extent, overall engagement accounted for over 50% of the variance in tactical efficiency (R² = .53). This reinforces the importance of practicing under balanced difficulty conditions, albeit in an indirect manner. By contrast, spacing features and the temporal dynamics of all three dimensions (e.g., slopes and steady magnitudes) exhibited non-significant effects on tactical efficiency.

Although not specifically designed to adjudicate the longstanding debate between structured practice and innate cognitive ability, this study offers valuable insights into that discussion. While we did not control for participants’ general cognitive abilities, the results clearly underscore the importance of practicing under optimal challenge conditions. Specifically, sustained exposure to games within a balanced difficulty range emerged as a key predictor of tactical efficiency - a behavioral marker that, in turn, significantly differentiated “fast learners" from “not fast learners" in the classification model. Unfortunately, Ericsson’s theory of deliberate practice has often been reduced in public discourse to the oversimplified notion of “10,000 hours,” a misrepresentation popularized by Gladwell (2008). This framing tends to obscure Ericsson and colleagues’ (1993) original emphasis on the quality of practice - its structure, difficulty, and feedback mechanisms. In that respect, our findings lend support to Ericsson’s core argument: not all practice is equal. Conditions of appropriately calibrated challenge appear to play a critical role in fostering tactical development and, by extension, skill acquisition in chess.

Another clear theoretical implication of this study is its alignment with Vygotsky’s (1978) framework on learning - particularly the concept of the Zone of Proximal Development and the role of scaffolding. Players who demonstrated higher tactical efficiency, defined as their ability to convert relative advantages into actual wins, were those who had spent more time playing within an optimal challenge range. This strong linear association supports the notion that learning is most effective when tasks are neither too simple nor overwhelmingly difficult. The oft-cited quote attributed to Isaac Newton - “If I have seen further, it is by standing on the shoulders of giants” - aptly applies to this context: fast learners appear to benefit most from engaging with opponents who fall within their zone of proximal development. While derived from chess data, this insight has broader implications, potentially informing pedagogical strategies that prioritize structured peer matching and challenge calibration across various educational and instructional domains.

Although strategic exploration contributed less than tactical efficiency to the classification model, it still demonstrated some discriminative power. However, its relationship with Elo growth is likely non-linear and may exhibit temporal dynamics - such as bursts, plateaus, or cyclical patterns - that are better captured in a time series framework. It is plausible that, along their developmental trajectory, chess players periodically modulate their level of strategic exploration in order to internalize new schemas and refine their game. From a cognitive perspective, this pattern may echo Piaget’s (1970) theory of assimilation and accommodation: players assimilate new positions into existing schemas when variability is low and shift toward accommodation - modifying those schemas - during phases of higher exploration and entropy. From a systems-theoretic lens, this oscillation between structured and exploratory behavior can also be viewed through the framework of entropy and negentropy. Periods of high entropy (i.e., strategic variability) may serve as necessary precursors to later reductions in entropy as more efficient strategies consolidate. This pattern resonates with Prigogine’s theory of self-organization in far-from-equilibrium systems, where local disorder (entropy) is not only tolerated but required for emergent order and adaptive growth (Prigogine & Stengers, 1984). Future research should explore how strategic exploration and tactical efficiency co-evolve, potentially following an exploration–exploitation dynamic - where shifts between variability and consolidation mark transitions in learning plateaus or breakthroughs in chess expertise.

It is important to note that operationalizing strategic exploration solely as the bigram entropy of move sequences - limited to games played with the white pieces - is likely a narrow and restrictive approach. While informative, this metric captures only one facet of the broader behavioral dimension it aims to represent. Alternative entropy measures exist in the literature, and perhaps more significantly, there is potential to apply entropy-based analyses to other types of distributions - such as procedural entropy, which could be defined as the variability in a player's responses when presented with identical or highly similar board conditions. This form of entropy would more directly tap into the consistency and automatization of decision-making, aligning closely with core theories of procedural expertise (e.g., Klein, 1997). These avenues warrant further exploration in future research.

This study offers several methodological strengths. First, it leverages a large, ecologically valid dataset spanning over one million games, allowing for rich behavioral modeling in a real-world context rather than relying on lab-based or retrospective self-reports. Second, the introduction of a normalized Elo progression metric, adjusted for rating difficulty and percentile rank, addresses a long-standing issue in chess expertise research - namely, the nonlinearity of improvement across skill levels. Third, the use of time-sensitive trend metrics (e.g., steady magnitude, slope) contributes a dynamic perspective to player behavior, distinguishing this work from prior studies that have relied solely on cumulative or static indicators.

Despite its strengths, the study is not without its limitations. Most notably, the use of aggregate data precludes fine-grained temporal analysis. While slopes and steady magnitudes offer proxies for behavioral change, they cannot substitute for true longitudinal modeling of within-person variation across time. Additionally, the filtering criterion requiring 18 months of sustained activity may have introduced selection bias, potentially over-representing highly engaged players. The study also does not account for individual cognitive or dispositional factors - such as intelligence, working memory, or motivation - which may interact with practice behaviors in determining learning outcomes. Lastly, and as discussed before, strategic exploration was measured only in White-piece games using bigram entropy, a simplification that omits potential strategic variance in other game contexts.

Future research could extend this work in several directions. First, a panel-based longitudinal design would enable the modeling of individual learning trajectories and the detection of non-linear growth patterns - such as plateaus or bursts. Second, entropy-based measures could be refined by incorporating longer n-gram sequences, conditional entropy (“if–then” decision structures/distributions), or domain-specific entropy tied to tactical motifs or positional features. Third, the dynamic relationship between strategic exploration and tactical efficiency over time could be modeled using state-space or dynamic time warping methods, potentially revealing exploration–exploitation cycles. Lastly, integrating cognitive assessments or metadata (e.g., age, training background, motivation) would allow for richer multivariate models that bridge behavioral and dispositional factors in chess expertise development.

Conclusion

This study set out to identify which behavioral conditions and gameplay-derived metrics most effectively signal learning pace in online chess. Rather than attempting to fully explain individual learning trajectories, our primary goal was to determine which features might warrant further longitudinal modeling as markers of accelerated skill acquisition. In this context, several key insights emerged.

First, foundational practice variables - such as engagement, spacing, and average challenge - proved insufficient on their own to classify fast versus not-fast learners with high accuracy. However, once gameplay-based indicators were added, particularly tactical efficiency, model performance significantly improved. This suggests that internalized, in-game behaviors provide richer insights into learning outcomes than context alone. Among these, tactical efficiency emerged as the strongest predictor, with a direct link to adjusted Elo growth and a high association with fast learning status. Crucially, a regression model in which optimal challenge was the strongest predictor explained over half the variance in tactical efficiency (R² = 0.53), offering empirical support for both Ericsson et al.’s (1993) emphasis on practice quality and Vygotsky’s (1978) theory of the Zone of Proximal Development.

While strategic exploration showed more modest predictive power, its relationship to learning may be better understood as dynamic, nonlinear, and temporally structured. The potential for an exploration-exploitation cycle - supported by Piagetian learning theory and entropy-based systems thinking - offers a promising theoretical direction for future studies. Likewise, methodological limitations, including the use of aggregate data and a single entropy operationalization, point toward the need for more granular, time-aware designs.

Altogether, this work contributes both methodological innovation and theoretical insight to the study of skill acquisition. By demonstrating that certain behavioral signals - especially tactical efficiency - can flag “fast learners” with moderate accuracy (F1 = 0.68, AUC = 0.78), the findings lay the groundwork for future panel-based research capable of modeling the full complexity of chess expertise development. These insights are not only valuable within the domain of chess but may also generalize to other high-skill environments where deliberate practice, feedback, and adaptive engagement are critical to expertise growth.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflicts of interest/Competing interests

The authors declare that they have no conflicts of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Data Availability

The datasets analyzed during the current study are publicly available from the Lichess database: https://database.lichess.org/.

Author Contributions

Luís C. Meireles conceived the study, implemented the modeling framework, conducted the analyses, and wrote the manuscript. Tiago Mendes-Neves provided technical guidance and contributed to the design and refinement of the modeling procedures. João Mendes-Moreira supervised the research, contributed to methodological validation and interpretation, and provided critical revisions to the manuscript. All authors approved the final version of the paper.

References

Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89(4), 369–406.

Anderson, J. R. (1990). The adaptive character of thought. Lawrence Erlbaum Associates.

Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411–421.

Bilalić, M., Langner, R., Ulrich, R., & Grodd, W. (2011). Many faces of expertise: Fusiform face area in chess experts and novices. Journal of Neuroscience, 31(28), 10206–10214.

Bilalić, M., McLeod, P., & Gobet, F. (2007). Does chess need intelligence?—A study with young chess players. Intelligence, 35(5), 457–470.

Bruner, J. S. (1957). On perceptual readiness. Psychological Review, 64(2), 123–152.

Burgoyne, A. P., et al. (2016). Examining the relationship between intelligence and chess skill: A meta-analysis. Intelligence, 59, 72–83.

Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81.

Chassy, P., & Gobet, F. (2011). Measuring chess experts’ single-use sequence knowledge: An archival study of departure from ‘theoretical’ openings. PLOS ONE, 6(10), e26692.

de Bruin, A. B. H., Rikers, R. M. J. P., & Schmidt, H. G. (2008). Deliberate practice in chess: Differences in strategy and resource investment. Learning and Individual Differences, 18(1), 18–27.

Drew, T., Võ, M. L.-H., & Wolfe, J. M. (2014). The invisible gorilla strikes again: Sustained inattentional blindness in expert observers. Psychological Science, 25(9), 1857–1867.

Elo, A. E. (1957). The rating of chess players, past and present. Arco Pub.

Ensmenger, N. (2011). The computer boys take over: Computers, programmers, and the politics of technical expertise. MIT Press.

Ericsson, K. A. (2014). Why expert performance is special and cannot be extrapolated from studies of performance in the general population: A response to criticisms. Intelligence, 45, 81–103.

Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.

Fitts, P. M., & Posner, M. I. (1967). Human performance. Brooks/Cole.

Gladwell, M. (2008). Outliers: The story of success. Little, Brown and Company.

Gobet, F., & Campitelli, G. (2007). The role of domain-specific practice, handedness, and starting age in chess. Developmental Psychology, 43(1), 159–172.

Gobet, F., & Chassy, P. (2008). Towards a taxonomy of encapsulated knowledge in the acquisition of expertise. Cognitive Processing, 9(1), 1–11.

Gobet, F., & Simon, H. A. (2000). Five seconds or sixty? Presentation time in expert memory. Cognitive Science, 24(4), 651–682.

Grabner, R. H., Stern, E., & Neubauer, A. C. (2006). Individual differences in chess expertise: A psychometric investigation. Acta Psychologica, 124(3), 398–420.

Guid, M., & Bratko, I. (2006). Computer analysis of world chess champions. ICGA Journal, 29(2), 65–73.

Hambrick, D. Z., et al. (2014). Deliberate practice: Is that all it takes to become an expert? Intelligence, 45, 34–45.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.

Kasparov, G. (2018). Deep thinking: Where machine intelligence ends and human creativity begins. John Murray.

Klein, G. (1997). Developing expertise in decision making. Thinking & Reasoning, 3(4), 337–352.

Laird, J. E., Lebiere, C., & Rosenbloom, P. S. (2017). A standard model of the mind: Toward a common computational framework across artificial intelligence, cognitive science, neuroscience, and robotics. AI Magazine, 38(4), 13–26.

Maguire, E. A., et al. (2000). Navigation-related structural change in the hippocampi of taxi drivers. PNAS, 97(8), 4398–4403.

McCarthy, J. (1990). Chess as the Drosophila of AI. In Computers, Chess, and Cognition (pp. 227–237). Springer.

Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words. Journal of Experimental Psychology, 90(2), 227–234.

O'Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41, 673–690.

Piaget, J. (1970). Structuralism. Basic Books.

Plomin, R., et al. (2014). Common DNA markers can account for more than half of the genetic influence on cognitive abilities. Psychological Science, 25(6), 1342–1349.

Potchen, E. J. (2006). Measuring observer performance in chest radiology: Some experiences. Journal of the American College of Radiology, 3(6), 423–432.

Prigogine, I., & Stengers, I. (1984). Order out of chaos: Man’s new dialogue with nature. Bantam Books.

Regan, K., & Haworth, G. (2011). Intrinsic chess ratings. ICGA Journal, 34(3), 150–163.

Ruthsatz, J., et al. (2014). The role of intelligence in predicting domain-specific achievement: A longitudinal study. Intelligence, 45, 1–10.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423.

Simon, H. A., & Chase, W. G. (1973). Skill in chess. American Scientist, 61(4), 394–403.

Simon, H. A., & Gilmartin, K. (1973). A simulation of memory for chess positions. Cognitive Psychology, 5(1), 29–46.

Vaci, N., et al. (2019). The joint influence of intelligence and practice on skill development throughout the life span. Nature Human Behaviour, 3, 484–491.

von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100, 295–320.

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.

Appendix 1

However, we acknowledge that this relationship is unlikely to be perfectly linear. The learning process likely includes local optima and plateaus, as learners periodically revise their internal models or explore new strategies in response to increasing task difficulty. Rather than progressing at a constant rate, skill development more closely resembles a nonlinear optimization journey - with alternating periods of refinement and exploration. Although Anderson (1990) does not formally use a hill metaphor, his adaptive view of cognition emphasizes that learning optimizes behavior within environmental and cognitive constraints - implying a landscape with both gradual slopes and sudden inflection points along the path to expertise.