Visual and tactile motion cues enhance the categorisation of novel object shapes

Martina A. Seveso¹,³, Rebecca J. Hirst¹, Alan O’Dowd¹, Ivan Camponogara² and Fiona N. Newell¹

¹School of Psychology and Institute of Neuroscience, Trinity College Dublin, Ireland.

²Department of Psychology, College of Natural and Health Sciences, Zayed University, Abu Dhabi, United Arab Emirates.

³Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland.

Email: sevesom@tcd.ie


Author Note

Correspondence concerning this article should be addressed to Martina A. Seveso, Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland. Email: sevesom@tcd.ie. ORCID: https://orcid.org/0000-0001-6566-4578.

Statements and Declarations

Funding

This research was funded by a Frontiers for the Future grant from Science Foundation Ireland, no. 19/FFP/6812, and by an Advanced Laureate Award from the Irish Research Council, no. IRCA/2023/1509, awarded to FNN. Ivan Camponogara was supported by the Research Incentive Fund grant from Zayed University, UAE, no. 23076.

Conflict of interest statement

One of the authors (Rebecca Hirst) is also an author of the software package used to deliver the vibration stimuli in this study (PsychoPy/PsychoJS); this software is free and open source for all users. This author is also part of Open Science Tools Ltd., a social enterprise that receives income from studies conducted on Pavlovia.org. Pavlovia.org is agnostic to the software package used to create studies (it supports studies from PsychoPy, jsPsych and Lab.js), and the resulting revenue funds further tool development.

Abstract

Categorisation is a fundamental cognitive process that involves integrating information across the senses. Using remote testing on smartphones, we investigated whether visual and tactile motion cues could enhance object category learning and generalisation to novel object shapes. Two categories of similar shapes were each associated with specific, correlated visual and tactile (vibration) motion cues. After learning the object categories, participants were tested on their categorisation of learned and novel objects across four cue conditions: shape-only, shape-visual motion, shape-tactile motion, and shape-visual and tactile motion. We also assessed whether accuracy was influenced by blocked versus interleaved cue conditions at test. In Experiment 1, we found more accurate categorisation and generalisation when all cues were available at test. In Experiment 2, we replicated this effect even when the reliability of the shape cue for predicting category membership was reduced. In Experiment 3, we found that the absence of motion cues during learning removed the benefit of motion cues at test. Overall, our findings suggest that multisensory motion cues benefit the formation of novel object categories and allow for better generalisation. The results have implications for our understanding of the dynamic and multisensory nature of object categories and the predictive role of multisensory features in category formation.

Keywords:

object categories

multisensory perception

tactile perception

object motion

online testing

Public Significance Statement

This study shows that combining visual and tactile (vibration) motion cues supports both learning and generalisation of object categories more efficiently than relying on shape alone. Additionally, the way we learn influences our ability to generalise categories to novel objects. These findings highlight the multisensory nature of how we form object categories and suggest applications for designing multisensory educational tools, haptic interfaces, and virtual environments.

Introduction

Imagine enjoying a peaceful wander through a forest when suddenly, to your horror, something lands on your back. After a while, you recognise that it is an insect and not a leaf falling from a tree. When categorising the “something” as an insect rather than a leaf, you might rely on multiple cues, such as the feel of the flutter of the insect’s wings or the scuttling of its legs as it moves into view. Such a task leverages multisensory integration for object categorisation learned through experience. Studies have shown that object category learning is a complex process involving the integration of multiple features to differentiate between object categories, as well as to generalise from individual exemplars (Goldstone & Hendrickson, 2010; Pérez-Gay et al., 2017). Shared object features, such as shape, sound, or movement, are used for category formation and for generalisation to novel objects. However, even though much is known about the role of unisensory visual (Rosch, 1978), auditory (Griffiths & Warren, 2004; Feng et al., 2021; Brunel et al., 2013) and tactile (Newell, 2004) features in object categorisation, less is known about how multisensory cues are used together for category learning and generalisation (Newell et al., 2023).

Previous studies have shown that both vision (Peissig & Tarr, 2007; Grill-Spector, 2003) and touch (Gibson, 1962; Lederman & Klatzky, 1987, 1990) are sufficient to enable the learning and subsequent recognition of individual objects (Newell et al., 2001; Lacey et al., 2009). Moreover, object information can be efficiently transferred between vision and touch (Yildirim & Jacobs, 2013). For familiar objects in particular, the resulting shared representations may lead to the formation of more robust object categories than unisensory information alone (Yildirim & Jacobs, 2013; Gaissert & Wallraven, 2012; Haag, 2011; O'Callaghan et al., 2018; Broadbent et al., 2020), highlighting the crucial role of multisensory integration in the representation of object categories in memory (Naci et al., 2012). Despite evidence that multisensory features contribute to categorisation, it is unclear how multiple sensory inputs affect the formation of novel object categories, and whether such multisensory categories allow for generalisation to novel exemplars. Indeed, previous studies investigating generalisation from learned multisensory representations have produced mixed results, with some reporting a benefit of multisensory learning on generalisation (Wu et al., 2021) and others reporting no specific benefit of multisensory information on categorisation performance (Edmunds et al., 2020; Roark et al., 2021; Sun et al., 2023; Atkin et al., 2023; Roark et al., 2023; Li & Deng, 2023; Roark, 2024; O’Dowd et al., 2025). Some of the discrepancies across these studies may be due to the amount of prior knowledge of the object stimuli, the extent to which sensory information is correlated across modalities during learning, or the relative distinctiveness or predictability of each sensory cue to category membership.

Objects in the real world are not only multisensory but often dynamic. Furthermore, when information from two sensory modalities shares temporal properties, such as synchronous movement, it is more likely to be combined (Parise & Ernst, 2016), which may enhance any benefit of multisensory information for category learning. Within such a multisensory process, information gathered about the movement of an object through vision (Robert et al., 2023; Shatek et al., 2022) or touch (Gaissert & Wallraven, 2012; Simões-Franklin et al., 2011; Sumser et al., 2024) can play a fundamental role in category formation. Visual object movement can facilitate the recognition of novel objects relative to static presentation (Stone, 1999; Newell et al., 2004; Setti & Newell, 2010), and the neural substrates underpinning tactile and visual object motion appear to be shared (Chan et al., 2010; Amemiya et al., 2017). Within the tactile modality, dynamic motion cues are often received in the form of vibrations, which can provide valuable information about specific object properties. In this regard, studies on vibrotactile discrimination suggest that touch can enhance discrimination (Mahns et al., 2006; Verrillo et al., 1969), particularly when visual temporal cues alone are unreliable or ambiguous (e.g., Pomper et al., 2014; Hirst et al., 2025). Tactile stimulation is quickly detected on the skin, and passive exposure to moving tactile information, such as vibrations or flutter, can help observers discriminate objects (Fleming et al., 2013; Fleming, 2017; Ryan et al., 2021; Shao et al., 2016; Ziat, 2023). Because visual and tactile motion cues contribute to object recognition and discrimination, it is plausible that they also contribute to object categorisation, but this has not previously been investigated. Building on this, the present work investigates whether motion-related cues can facilitate not only category learning but also generalisation to novel exemplars.

Across three experiments, we investigated whether tactile and visual object motion influenced the categorisation of novel object shapes, and whether the formation of these multisensory categories subsequently benefited generalisation to new exemplars. In the first two experiments, participants learned to categorise novel object shapes presented with visual and tactile motion cues (in Experiment 3, only the object shape was presented during learning). We manipulated how informative each cue was about category membership: in Experiment 1 all cues were fully informative, whereas in Experiment 2 one cue (shape similarity) was less informative while both the visual and tactile motion cues remained fully informative. Visual motion consisted of each object either rolling, jumping, swinging, or shaking, whereas tactile motion consisted of vibrations that were correlated with the temporal characteristics of each of these visual motion patterns. For example, if an object was seen swinging from left to right, the tactile vibrations shared the same onset and offset events present in the visual motion sequence. Following successful learning, participants were tested on their categorisation accuracy for learned exemplars and their generalisation to novel exemplars. The test phase assessed performance across four cue conditions in which we manipulated the number and type of cues available. Importantly, in our experiments, fewer motion than shape features were used as predictive cues to category membership. That is, each category was defined by nine shapes and two distinct motion features (e.g., rolling, swinging). As such, the within-category variability of the shape cue was relatively higher than that of the motion cues. This difference in variability across shape and motion cues may result in different weightings of shape and motion cues for categorisation and generalisation performance, which we investigate here.

We conducted two versions of our experiments: following learning, participants were tested on their categorisation of learned and novel objects in trials that were either blocked by cue condition or interleaved. A secondary aim of the current study was to understand the impact of blocked versus interleaved trial presentation on multisensory category learning. Previous studies have indicated that blocked versus interleaved presentation influences categorisation performance (Kornell & Bjork, 2008; Abel et al., 2021). Blocked presentation facilitates learning by reinforcing within-category featural similarities, whereas interleaved presentation promotes discriminative contrast across categories (Carvalho & Goldstone, 2015; Kost et al., 2015). Furthermore, it has been suggested that interleaving trials improves long-term retention and category induction across sensory modalities (Ge et al., 2021; Abel, 2023). Thus, in multisensory contexts, interleaving trials may further strengthen cross-modal cue combination, enhancing participants’ ability to differentiate between categories (Abel, 2023). On the other hand, interleaving trials may disrupt performance if participants cannot flexibly adapt to varying cue types across trials (Carvalho & Goldstone, 2019), generating uncertainty about which cue to prioritise. Moreover, switching between different sensory modalities may increase cognitive load.

Accordingly, the presentation of blocked or interleaved trials allowed us to tease apart these cognitive effects on cue combination.

Experiment 1

Our first experiment was designed to explore whether the concurrent availability of visual and tactile motion cues enhances object category formation and categorisation performance. Building on prior research on the perception of moving objects (e.g., Setti & Newell, 2004; Chan et al., 2010), we hypothesised that combining visual and tactile motion cues with object shape would improve object categorisation and generalisation to novel exemplars relative to static shape alone. We therefore predicted higher categorisation accuracy, including for novel exemplars, in conditions in which all cues were available relative to conditions in which one or more sensory cues were missing. To explore our secondary factor (blocked versus interleaved trials), we tested this prediction using trials that were either blocked by cue condition (Experiment 1A) or interleaved (Experiment 1B).

Method

Participants

We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. All participants were recruited using Prolific (https://www.prolific.co/) based on the following inclusion criteria: fluency in English, normal or corrected-to-normal vision and no hearing impairments. An a priori power analysis for each group was performed in PANGEA (Westfall, 2016; v0.2) for a within-subjects design. The recommended minimum sample size for 80% power to detect effects of interest with an effect size (Cohen's d) of 0.3 was 32 participants per experimental design (blocked or interleaved).

We initially recruited 155 participants, of whom 110 (71%) successfully reached the learning criterion. Of these participants who learned the categories, a further 33 (30% of learners) failed to perform above chance in the test, and their data were excluded from all subsequent analyses. While this attrition rate may appear high, it is in line with findings from other studies on successful categorisation performance in laboratory settings (e.g., Smith et al., 2014; Roark & Chandrasekaran, 2023). In total, 77 participants (mean age = 38.83 years, SD = 10.97; 52% female) completed the experiment online, with 45 participants allocated to the blocked design (Experiment 1A) and 32 to the interleaved design (Experiment 1B). In all experiments, participants were naïve to the purpose of the study and were compensated at a rate of £9.00 per hour.

The study was approved by the Trinity College Dublin, School of Psychology Research Ethics Committee (approval number SPREC102020-50) and complied with GDPR data protection legislation.

Stimuli

We created 28 novel 3D object shapes with uniform spacing between neighbouring objects in shape space (except for the object shapes at each end of the shape space; see Fig. 1). The design of our shapes was based on a morphing process described elsewhere (Li et al., 2020), and the shape space was adapted from a circular shape space described in previous studies (Li et al., 2020; O’Dowd et al., 2025). These object shapes were used as stimuli in all experiments.

Fig. 1

An illustration of the 'shape space' of object stimuli used in the Experiments.

Note

The shape space was adapted from a circular shape space described by Li et al. (2020). The objects are arranged within the shape space such that neighbouring objects are separated by equal angular distances, with the exception of the object stimuli at the two endpoints of the shape space. The dashed line depicts the (arbitrary) category boundary used in the experiments. Participants were trained to categorise the learned stimuli highlighted in green (9 per category) and were tested on both these learned stimuli and a set of novel stimuli highlighted in yellow (5 per category), which were used to test for generalisation. The green and red framed stimuli are relevant to the design of Experiment 2 only. See Methods for further details.

We used 3D modelling software (Blender Foundation, 3.5.0, 2023, www.blender.org) to render each 3D object stimulus used in the present experiments. The objects were created using the following pipeline: (1) each shape was converted from .png to Scalable Vector Graphics (.svg), keeping three colour levels constant (i.e., white, black, grey); (2) each vector was imported into the 3D space; (3) the outline was isolated and converted into a mesh; (4) each mesh was rotated about the central vertical axis using the Spin function in Edit mode (360°, 100 steps); (5) each 3D shape was exported from Blender. The 3D space, lighting (point light, radius 0.1 m, 1000 W; coordinates: 11 m, -14 m, 6.9 m; rotation: 40°, 34.8°) and viewpoint (coordinates: 10.9 m, -14 m, 4 m; rotation: 70°, 0°) settings were kept constant. Each 3D object was rendered using the Workbench engine (28 render samples; single-pass anti-aliasing (viewport); Studio lighting; dark grey colour material, RGBA: 107, 109, 109, 254, HEX: #6b6d6d; specular lighting). All object images were exported at a resolution of 1080 × 1080 px (100% scale) and presented in a canonical three-quarter view so that the 3D object and its relevant features (e.g., concavities) were visually accessible in the image. The visual presentation of each object was followed by a visual mask. These individual masks were created by scrambling the image of each object shape using a MATLAB script (MathWorks, 2023). All object stimuli and scripts for generating the visual masks are available on the Open Science Framework page for Experiment 1, https://osf.io/s369c/?view_only=9015e8935be24628b5b15596b8eb6271.
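The masks were generated with the authors' MATLAB script (see the OSF page), and the scrambling method is not specified here. As a hedged illustration only, the Python sketch below shows one common approach, tile (block) scrambling; the 60 px tile size and the function name `block_scramble` are hypothetical, and the original script may instead have scrambled individual pixels or Fourier phase.

```python
from typing import Optional

import numpy as np


def block_scramble(image: np.ndarray, block: int = 60, seed: Optional[int] = None) -> np.ndarray:
    """Return a mask made by randomly shuffling square tiles of `image`.

    Assumption: tile scrambling with a hypothetical tile size; image height and
    width must be divisible by `block` (1080 is divisible by 60).
    Works for greyscale (H, W) and colour (H, W, C) arrays.
    """
    h, w = image.shape[:2]
    if h % block or w % block:
        raise ValueError("image height and width must be divisible by block")
    rows, cols = h // block, w // block
    extra = image.shape[2:]  # empty for greyscale, (C,) for colour images
    # Cut the image into a stack of rows * cols tiles, each block x block pixels.
    tiles = (image.reshape(rows, block, cols, block, *extra)
                  .swapaxes(1, 2)
                  .reshape(rows * cols, block, block, *extra))
    # Shuffle the tile order: local luminance and texture are kept,
    # but the global object shape is destroyed.
    rng = np.random.default_rng(seed)
    tiles = tiles[rng.permutation(rows * cols)]
    # Reassemble the shuffled tiles into an image of the original size.
    return (tiles.reshape(rows, cols, block, block, *extra)
                 .swapaxes(1, 2)
                 .reshape(image.shape))
```

Shuffling whole tiles preserves the overall pixel content of each object image while removing its recognisable shape, which is the usual rationale for this kind of mask.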

We placed an arbitrary category boundary at the middle of the stimulus shape space (see Fig. 1), such that 14 objects were allocated to each category. Within each category, shapes were highly similar, and shape similarity was the main cue for categorisation. Of the 14 objects per category, the 9 objects furthest from the category boundary were used in the learning session (green highlighted object shapes in Fig. 1), and the 5 objects closest to the category boundary were used for generalisation testing (yellow highlighted object shapes in Fig. 1). We used the objects nearest the category boundary for generalisation to avoid clearly distinguishable shapes, which would have reduced the difficulty of the task and possibly obscured any generalisation effects.
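The allocation of the 28 objects described above can be expressed in a few lines. This sketch assumes a hypothetical indexing in which objects are numbered 0 to 27 in order along the shape space, with the category boundary falling between objects 13 and 14; the function name `allocate_stimuli` is illustrative.

```python
def allocate_stimuli(n_objects: int = 28, n_novel_per_category: int = 5) -> dict:
    """Split an ordered shape space into two categories with learned/novel sets.

    Hypothetical indexing: objects are numbered 0..n_objects-1 along the shape
    space, and the category boundary lies between the two middle objects.
    """
    half = n_objects // 2
    category_a = list(range(half))             # objects 0..13
    category_b = list(range(half, n_objects))  # objects 14..27
    # Objects nearest the boundary are held out to test generalisation.
    novel = category_a[-n_novel_per_category:] + category_b[:n_novel_per_category]
    # The remaining objects, furthest from the boundary, are used for learning.
    learned = category_a[:-n_novel_per_category] + category_b[n_novel_per_category:]
    return {"A": category_a, "B": category_b, "learned": learned, "novel": novel}
```

Under these assumptions, objects 9 to 18 form the novel (generalisation) set and the remaining 18 objects form the learning set, giving 9 learned and 5 novel exemplars per category.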

Each rendered 3D object was animated with one of four visual motion patterns (swing, jump, roll or shake) using the Workbench engine. The motion patterns were chosen based on type (e.g., smooth or abrupt movement) and reference axis (e.g., movement relative to the object’s horizontal or vertical axis). All movement sequences lasted 2 seconds. For the animation, each object stimulus was presented against a background consisting of two grey walls and a grey floor. The camera angle (coordinates: 10.9 m, -14 m, 4 m; rotation: 70°, 0°) and lighting (point light, radius 0.1 m, 1000 W; coordinates: 11 m, -14 m, 6.9 m; rotation: 40°, 34.8°) were both held constant. The light source illuminated the scene from above, casting a shadow of the object on the floor to aid depth and motion perception. The individual moving objects were then extracted and displayed as stimuli against a black background on a mobile phone screen. The entire screen was used to display the objects and response options (see Fig. 2 for an illustration).

Fig. 2

A schematic illustration of the sequence of events from stimulus presentation to the response display used in (A) the learning session and (B) the categorisation test in Experiments 1 and 2.

Note

Only Android mobile phones were used in the experiment. The timeline of events runs from left to right for both the learning and test examples. Wavy lines represent tactile vibrations, which were correlated with the visual motion patterns. The categorisation test (B) depicts a trial from the shape-only (S_v) condition. See text for details of the trial structure.

Tactile stimuli were delivered via the navigator.vibrate() function, a method of delivering tactile stimuli remotely via Android smartphone browsers. This method is described, tested and validated elsewhere (Hirst et al., 2025). In short, it allows a vibration pulse sequence to be presented by specifying binary on/off commands, but it does not allow vibration amplitude to be manipulated. Our tactile stimuli therefore consisted of a vibration pulse sequence derived from the visual motion pattern of each stimulus. This was achieved by extracting the temporal dynamics of the visual motion for each movement (i.e., frequency and amplitude of movement) and mapping these onto an audio waveform with corresponding frequency and amplitude. The audio waveform was smoothed with a 100 ms rolling average to reduce fluctuations in amplitude, and a threshold of 40% of the maximum amplitude was applied to detect the markers defining pulse onset (“on”, or vibration) and offset (“off”, or pause) events. The resulting vibration patterns therefore closely mirrored the structure and timing of the visual motion stimuli with high temporal precision. A custom Python script (see OSF project page) ensured that the output format retained the timing and structure of the original input and was compatible with PsychoPy’s tactile delivery system. The validity of our tactile motion stimuli was supported by pilot tests in which participants matched vibration pulse sequences with their visual counterparts, as well as by synchrony judgements based on tactile-only and bimodal (i.e., visual motion paired with tactile vibration) stimuli (see Supplementary Materials, Table S1 for results).
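The pulse-extraction step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' OSF script: the amplitude envelope is assumed to be sampled at a known rate `fs`, the 40% threshold is taken relative to the maximum of the smoothed envelope, and the function name `envelope_to_vibration_pattern` is hypothetical. The output is a list of alternating vibration and pause durations in milliseconds, the pattern format accepted by the browser's `navigator.vibrate()`.

```python
import numpy as np


def envelope_to_vibration_pattern(amplitude, fs, window_ms=100, threshold=0.4):
    """Convert a motion-derived amplitude envelope into a vibration pattern.

    amplitude: 1-D array sampled at `fs` Hz (assumed input format).
    Returns alternating [on, off, on, ...] durations in ms, the list format
    accepted by navigator.vibrate(); a leading 0 means "start with a pause".
    """
    amp = np.abs(np.asarray(amplitude, dtype=float))
    # 100 ms rolling average to smooth fluctuations in amplitude.
    win = max(1, int(round(window_ms * fs / 1000)))
    smoothed = np.convolve(amp, np.ones(win) / win, mode="same")
    peak = smoothed.max()
    if peak <= 0:
        raise ValueError("envelope is silent; no vibration events to detect")
    # Samples at or above 40% of the (smoothed, assumed) peak count as "on".
    on = smoothed >= threshold * peak
    # Indices where the on/off state flips mark pulse onsets and offsets.
    edges = np.flatnonzero(np.diff(on.astype(int))) + 1
    bounds = np.concatenate(([0], edges, [on.size]))
    durations = [int(round((b - a) * 1000 / fs)) for a, b in zip(bounds[:-1], bounds[1:])]
    # navigator.vibrate() patterns begin with a vibration duration.
    return durations if on[0] else [0] + durations
```

In a browser, such a list could be passed directly to `navigator.vibrate(pattern)`; the leading 0 makes the pattern begin with a pause rather than a vibration.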

Design

The experiment was structured around two main sessions: a learning block (with feedback) followed by a test block (without feedback). We conducted two versions of the experiment in which trials at test were either blocked (Experiment 1A) or interleaved (Experiment 1B), with different participants taking part in each version. Both versions were based on the same within-subjects, fully factorial design with cue condition (4 levels) and exemplar type (2 levels) as factors. The four levels of cue condition were: shape only (S_v); shape with visual motion (S_vM_v); shape with tactile motion (S_vM_t); and shape with both visual and tactile motion (S_vM_vt). The exemplar factor had two levels: learned or novel. Our primary outcome measure in the test phase was categorisation accuracy.

Procedure

The experiment was built using PsychoPy (Peirce et al., 2019, 2022; v2024.2.3) and delivered online through Pavlovia (https://pavlovia.org/) on Android mobile devices. To mask the sound of the vibration stimuli, continuous Brown noise was presented throughout the study at a volume set using an initial method-of-adjustment procedure.

In this procedure, participants were presented with a repeating 200 ms vibration with a 500 ms inter-stimulus interval (ISI), alongside continuous Brown noise. They were asked to adjust their phone volume until they could no longer hear the sound of the vibration. Participants were then asked not to adjust the volume on their phone for the remainder of the experiment.

Following the method-of-adjustment procedure, participants began the learning phase (Fig. 2A). Each learning trial started with a 250 ms fixation cross, followed by a 2 s presentation of the object stimulus in which all three cues were combined: a visually moving object presented in synchrony with tactile vibrations. The stimulus was followed by a 250 ms visual mask.

The participant was then presented with a screen showing the response options (i.e., “Category A” or “Category B”). A response triggered the offset of the response screen; if no response was made within 4 seconds, the task progressed automatically. Feedback on accuracy and response time was presented for 750 ms at the end of each learning trial (e.g., a green “Correct” or red “Incorrect”, and “Too slow! Please respond faster”, respectively). During the learning session, all objects were displayed with visual and (correlated) tactile motion (S_vM_vt); therefore, category membership was defined by all three cues equally. Each shape was associated with one of four visual motion patterns (swing, jump, roll, or shake), which were randomly assigned to the shapes across participants (but the shape-motion associations remained constant for each participant). The group of shapes assigned to each category was counterbalanced across participants. An accuracy threshold of 75% was required at the end of the learning session to continue to the categorisation test. If participants failed to reach this threshold within 3 repetitions of the trials, the study was terminated. There was a maximum of 54 learning trials (18 object shapes, with no more than 3 repetitions). Trial order was fully randomised across participants.
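The learning-phase stopping rule can be summarised with a short simulation. This sketch assumes that accuracy is evaluated at the end of each full pass through the 18 learning stimuli, which the text implies but does not state explicitly; `run_learning` and the `respond` callback are hypothetical names standing in for a participant's responses.

```python
import random
from typing import Callable, Hashable, Optional, Sequence, Tuple


def run_learning(respond: Callable[[Hashable], bool],
                 stimuli: Sequence[Hashable],
                 criterion: float = 0.75,
                 max_repetitions: int = 3,
                 seed: Optional[int] = None) -> Tuple[bool, int]:
    """Simulate the learning-phase stopping rule (hypothetical helper).

    respond(stimulus) returns True for a correct categorisation. Accuracy is
    assumed to be checked after each full pass through the stimuli.
    Returns (passed, number_of_trials_run).
    """
    rng = random.Random(seed)
    n_trials = 0
    for _ in range(max_repetitions):
        order = list(stimuli)
        rng.shuffle(order)  # new random trial order on every repetition
        n_correct = sum(bool(respond(s)) for s in order)
        n_trials += len(order)
        if n_correct / len(order) >= criterion:
            return True, n_trials  # proceed to the categorisation test
    return False, n_trials  # criterion not reached: study terminated
```

With 18 learning stimuli and three repetitions, the sketch runs at most 54 trials, matching the maximum stated above.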

In the test phase, 28 stimuli (14 per category) were presented, including the 9 previously learned and 5 novel exemplars per category. Each test trial began with a 250 ms fixation cross, followed by the stimulus (depending on cue condition) alongside the response options; participants were given a maximum of 4 seconds to respond before the task progressed automatically. A visual mask appeared for 250 ms after the stimulus to indicate the end of each trial. No feedback was provided at test.

Each stimulus was presented under one of the four cue conditions (S_v, S_vM_v, S_vM_t, S_vM_vt), and participants were instructed to categorise each stimulus as either 'A' or 'B' as accurately and quickly as possible. In Experiment 1A, cue condition was blocked at test, such that each block contained trials from a single cue condition (e.g., S_v only). The blocks were presented in a random order across participants. In Experiment 1B, all trials were interleaved and presented in a random order across participants. Each participant took approximately 18 minutes to complete the experiment.
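The difference between the blocked (Experiment 1A) and interleaved (Experiment 1B) test orders can be sketched as follows. The helper `order_trials` is hypothetical and accepts an arbitrary list of (stimulus, cue condition) trials, since the exact mapping of stimuli to cue conditions is not specified here; for the blocked design it randomises block order and the order of trials within each block, and for the interleaved design it shuffles all trials together.

```python
import random
from typing import List, Optional, Tuple

CUES = ["S_v", "S_vM_v", "S_vM_t", "S_vM_vt"]
Trial = Tuple[int, str]  # (stimulus_id, cue_condition)


def order_trials(trials: List[Trial], design: str, seed: Optional[int] = None) -> List[Trial]:
    """Order test trials as 'blocked' (Experiment 1A) or 'interleaved' (Experiment 1B)."""
    rng = random.Random(seed)
    if design == "interleaved":
        ordered = list(trials)
        rng.shuffle(ordered)  # cue condition can change on every trial
        return ordered
    if design == "blocked":
        cues = [c for c in CUES if any(cue == c for _, cue in trials)]
        rng.shuffle(cues)  # random block order for each participant
        ordered = []
        for cue in cues:
            block = [t for t in trials if t[1] == cue]
            rng.shuffle(block)  # random trial order within the block
            ordered.extend(block)
        return ordered
    raise ValueError("design must be 'blocked' or 'interleaved'")
```

In the blocked order the cue condition changes only at the three block transitions, whereas in the interleaved order it can change on any trial, which is the source of the added uncertainty discussed later.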

Analysis

The data were analysed in R via RStudio (R Core Team, 2021). To assess the effects of cue condition and exemplar type on categorisation performance, we fitted a generalised linear mixed-effects model (GLMM) to participants’ raw trial-level responses (0, 1) with a binomial distribution and logit link function. The model included cue condition (S_v, S_vM_v, S_vM_t, S_vM_vt), exemplar type (learned or novel), and their interaction as the predictors of interest. To establish statistical significance, likelihood ratio tests (type II) compared the fit of models with and without each predictor of interest; model fit was evaluated using AIC, BIC and log-likelihood values. Participant was included as a random intercept, and a random slope for exemplar type was also included to account for within-subject variability in the effect of exemplar on categorisation performance. Models were fitted using the ‘lme4’ package (Bates et al., 2014). Post-hoc comparisons were conducted using the ‘emmeans’ package (Lenth et al., 2022), with Bonferroni correction for multiple comparisons (corrected p-values are reported). Fixed effects from the final model were converted from log-odds to predicted probabilities using the inverse logit transformation to facilitate interpretation (Muller & MacLehose, 2014).
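Written out, the model described above can be sketched as below. The text does not state the contrast coding, so this assumes treatment (dummy) coding with S_v and learned exemplars as reference levels: y_ij is the accuracy (1 = correct) of trial i for participant j, C_kij indicates the k-th motion-cue condition (S_vM_v, S_vM_t or S_vM_vt), N_ij equals 1 for novel exemplars, and u_0j and u_1j are the participant random intercept and random slope for exemplar. In lme4 syntax this would correspond to a formula such as `correct ~ cue * exemplar + (1 + exemplar | participant)`, where the variable names are illustrative.

```latex
\begin{align*}
\operatorname{logit} P(y_{ij} = 1) &= \beta_0 + \sum_{k=1}^{3} \beta_k C_{kij} + \beta_N N_{ij}
  + \sum_{k=1}^{3} \gamma_k C_{kij} N_{ij} + u_{0j} + u_{1j} N_{ij}, \\
(u_{0j}, u_{1j})^\top &\sim \mathcal{N}(\mathbf{0}, \Sigma), \\
\hat{p} &= \operatorname{logit}^{-1}(\eta) = \frac{1}{1 + e^{-\eta}}.
\end{align*}
```

Under this coding, exp(β_k) is the odds ratio for cue condition k relative to S_v for learned exemplars, and the inverse logit converts a linear predictor η into the predicted probability used to interpret the fixed effects.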

Results

The mean categorisation performance for each cue condition and exemplar type is presented in Fig. 3 for the blocked (A) and interleaved (B) versions. In Experiment 1A, in which trials were blocked at test, likelihood ratio tests indicated statistically significant main effects of cue condition (χ²(3) = 26.50, p < .001) and exemplar type (χ²(1) = 11.80, p < .001) on categorisation accuracy. However, the cue condition × exemplar type interaction did not significantly contribute to the model (χ²(3) = 6.40, p = .094). Post-hoc comparisons for the main effect of cue condition confirmed that categorisation accuracy was significantly higher in the S_vM_v (odds ratio, OR = .395, 95% CI [0.224, 0.698], p < .001), S_vM_t (OR = .565, 95% CI [0.376, 0.848], p = .001), and S_vM_vt (OR = .209, 95% CI [0.101, 0.430], p < .001) conditions than in the S_v condition. Moreover, categorisation accuracy was significantly lower in both the S_vM_v (OR = .528, 95% CI [0.330, 0.845], p = .002) and S_vM_t (OR = .370, 95% CI [0.208, 0.657], p < .001) conditions than in the S_vM_vt condition (see Fig. 3A). No significant difference was observed between the S_vM_v and S_vM_t conditions (OR = 1.429, 95% CI [0.871, 2.344], p = .343). Post-hoc comparisons on the main effect of exemplar type indicated that categorisation was less accurate for novel (75%) than for learned (82%) object exemplars (OR = .67, 95% CI [0.538, 0.834], p < .001; see Fig. 3A).
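The odds ratios reported above relate to probabilities through the odds p/(1 − p); ratios below 1 in the comparisons with S_v appear to express the odds of a correct response in S_v relative to the comparison condition. As a rough illustration rather than a re-analysis, the Python sketch below converts the marginal accuracies for novel (75%) and learned (82%) exemplars in Experiment 1A into an odds ratio of about 0.66, close to but not identical with the model-based .67, because the GLMM estimates are adjusted for participant random effects. The Bonferroni helper follows the correction described in the Analysis section: each p-value is multiplied by the number of comparisons and capped at 1.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio(p1: float, p2: float) -> float:
    """Odds of a correct response in condition 1 relative to condition 2."""
    return odds(p1) / odds(p2)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Marginal accuracies in Experiment 1A: novel 75%, learned 82%.
print(round(odds_ratio(0.75, 0.82), 3))  # prints 0.659
```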

Likelihood ratio tests on performance in the interleaved version (Experiment 1B) revealed similar effects. We found statistically significant main effects of cue condition (χ²(3) = 12.20, p = .007) and exemplar type (χ²(1) = 8.08, p = .005) in the model predicting categorisation accuracy. Again, the cue condition × exemplar type interaction did not significantly contribute to this model (χ²(3) = 1.47, p = .69). Post-hoc comparisons for the cue condition effect revealed that accuracy was significantly lower in the S_v condition than in the S_vM_v (OR = .806, 95% CI [0.654, 0.993], p = .039) and S_vM_vt conditions (OR = .772, 95% CI [0.627, 0.950], p = .006), as shown in Fig. 3B. However, we found no significant difference between the S_v and S_vM_t conditions (OR = 0.845, 95% CI [0.686, 1.040], p = .194). Additionally, no significant differences were found between the two-cue conditions (S_vM_v vs S_vM_t: OR = 1.05, p = 1), and there was no benefit of the S_vM_vt condition relative to either the S_vM_v (OR = .958, p = 1) or S_vM_t condition (OR = .914, p = 1). Regarding exemplar type, categorisation accuracy was significantly lower for novel (65%) than for learned (76%) exemplars (OR = .597, 95% CI [0.422, 0.844], p = .004; see Fig. 3B). Finally, one-sample t-tests showed that accuracy in the S_v condition was significantly above chance in both Experiment 1A (M = .61, SD = .23, t(44) = 3.21, p = .002) and Experiment 1B (M = .63, SD = .18, t(30) = 4.09, p < .001).
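The one-sample tests against chance can be checked from the reported summary statistics. This sketch assumes that the sample size equals the degrees of freedom plus one (n = 45 for Experiment 1A) and a chance level of .5; because M and SD are rounded to two decimals, recomputed t values are only approximate.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, mu: float = 0.5):
    """Return (t, df) for a one-sample t-test of `mean` against `mu` (chance = .5)."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu) / standard_error, n - 1


# Experiment 1A, shape-only condition: M = .61, SD = .23, assuming n = 45.
t, df = one_sample_t(0.61, 0.23, 45)
print(f"t({df}) = {t:.2f}")  # prints t(44) = 3.21
```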

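Fig. 3

Mean categorisation accuracy for each cue condition and exemplar type in (A) Experiment 1A (blocked trials) and (B) Experiment 1B (interleaved trials).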

Note

Error bars represent 95% confidence intervals. Dashed line indicates chance level (50%).

Discussion

Experiments 1A and 1B indicated a benefit of having all learned cues (object shape, visual motion and tactile vibrations) available at test, relative to when only the shape of the object exemplars was available. Categorisation performance was least accurate, although above chance, in the shape-only cue condition, suggesting that shape similarity alone did not result in robust categorisation. Performance for novel exemplars was lower still, although our data suggest that generalisation occurred even in the shape-only cue condition. Interestingly, while categorisation performance was affected by the number of available cues, the data suggest no specific benefit for either sensory modality, at least when trials were blocked. In other words, performance did not differ significantly between the two two-cue conditions (S_vM_v and S_vM_t), suggesting that visual and tactile motion contributed to category formation in a similar way. However, this result depended on trial presentation: with interleaved trials, tactile motion (S_vM_t) did not improve performance relative to the shape-only cue (S_v), whereas with blocked trials both tactile (S_vM_t) and visual (S_vM_v) motion independently improved performance relative to the static condition.

In contrast, we found a stronger advantage for cue combination when trials were blocked (Experiment 1A). Under blocked presentation, there was a greater distinction in performance across the different cue conditions. This finding suggests that task consistency may support more effective use of the cues, whereas interleaving trials may add uncertainty that reduces the observed benefit of combined cues on categorisation.

As expected, accuracy was consistently better for learned than for novel exemplars, and there was no influence of trial presentation on the categorisation benefit for learned exemplars (7% and 11% differences between learned and novel exemplars in the blocked and interleaved versions, respectively), suggesting little effect of task demands. As noted earlier, exemplars that are positioned closer to the category boundary are typically more difficult to categorise than those positioned further from the boundary (Newell & Bülthoff, 2002). Therefore, the difference in performance between learned and novel exemplars may be influenced by stimulus position within the category shape space as well as by exemplar novelty. Future research could help tease apart the relative contributions of exemplar position in shape space and frequency of exposure to categorisation. Interestingly, the effect of cues did not differ between learned and novel exemplars in either version of the experiment, suggesting that the benefit afforded by multisensory motion cues is no greater for novel than for learned exemplars.

The number of participant data sets included in the analyses was lower than the number of participants initially recruited, for two reasons. First, participants were required to reach a learning criterion of 75%, which is a reasonable criterion given the number of factors involved in the experiment and is consistent with other studies on categorisation. Second, participants' performance was required to be above chance level across the task. A number of participants failed to meet this requirement although they had successfully learned to categorise the objects. Although our overall attrition rate from initial recruitment appears high, it is not inconsistent with the rates of successful categorisation performance reported in lab-based studies (Smith et al., 2014; Roark & Chandrasekaran, 2023) or with reported dropout rates in online behavioural testing (e.g., Peer et al., 2022, although, notably, data collected via Prolific are often considered to be of high quality).

Together, these findings suggest that a combination of cues from different sensory modalities can facilitate object categorisation and generalisation relative to object shape alone. Given that the visual and tactile motion cues to category membership were few in number and consistently reliable, our finding that tactile motion did not benefit categorisation relative to the shape-only cue in the interleaved trials might be considered surprising. Because of their consistency, both the visual and tactile motion cues might be expected to dominate categorisation performance relative to the shape cue alone. Since categorisation performance was above chance with the shape cue (S_v) alone, we can infer that shape was informative for categorisation. Indeed, the presence of each motion cue at test appears to have influenced categorisation performance in an additive manner, at least when trials were blocked. Given this, in Experiment 2, we aimed to further explore the contribution of the shape cue by manipulating its informativeness relative to the other cues.

Experiment 2

Perceptual similarity may support the formation of categories, such that similar shapes are grouped together under a single category label (e.g., Nosofsky, 1986; Goldstone, 1994), even if encoded through touch (Cooke et al., 2007). However, in the real world, information about an object's shape can be affected by environmental factors such as distance, occlusion, lighting or viewpoint (e.g., Newell, 1998), thus reducing the reliability of shape similarity for categorisation. Consequently, other cues to categorisation may become salient, such as the way an object moves (Newell et al., 2004; Blake & Shiffrar, 2007). To investigate this, we reduced the informativeness of object-shape similarity as a cue to category membership. Importantly, we kept both the visual and tactile motion cues fully predictive of category membership, as in Experiment 1. We therefore hypothesised that participants' categorisation performance, for both learned and novel exemplars, would be mainly influenced by motion cues. As in Experiment 1, we tested this in both blocked (Experiment-2A) and interleaved (Experiment-2B) test conditions.

Method

Participants

An a priori power analysis (PANGEA, v0.2, for a within-subjects design) indicated a minimum sample size of 32 participants for each version of the experiment to achieve 80% power to detect effects of interest of size 0.3 (Cohen's d). All participants were recruited via Prolific (https://www.prolific.co/); inclusion criteria were fluency in English, normal or corrected-to-normal vision and no hearing impairments. We initially recruited 85 participants, of whom 76 (i.e. 89%) successfully learned the categories. Of these learners, 10 failed to score above chance in the categorisation test and their data were excluded from subsequent analyses. Following these exclusions, a total of 66 participants (mean age = 39.72 years, SD = 10.1; 48.5% female) completed Experiment 2, with 33 assigned to the blocked version (Experiment-2A) and 33 to the interleaved version (Experiment-2B).

Stimuli

The same stimulus set and motion types described in Experiment 1 were used. However, we used a different arrangement of the object shapes across categories to reduce object-shape similarity as a cue to category membership. To that end, we reassigned two of the nine exemplars in each category to the opposite category. For example, referring to Fig. 1, object shapes highlighted in red, which in Experiment 1 were members of the same object category (e.g., Category A), were now assigned to the opposite category (e.g., Category B) during learning (and likewise for object shapes highlighted in green). Thus, only 78% (7 of 9) of the neighbouring shapes were assigned to one category in the current experiment (see Fig. 1). The subset of novel exemplars included in the test was the same as in Experiment 1 (Fig. 1).
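The 78% figure follows directly from swapping two of the nine exemplars in each category. A minimal Python sketch of the reassignment is given below; the exemplar labels (A1 to A9, B1 to B9) are hypothetical, and which two exemplars are swapped is an arbitrary choice here (in the experiment, they are the shapes highlighted in Fig. 1).

```python
from fractions import Fraction


def swap_exemplars(cat_a, cat_b, from_a, from_b):
    """Return new category lists after moving `from_a` into B and `from_b` into A."""
    new_a = [x for x in cat_a if x not in from_a] + list(from_b)
    new_b = [x for x in cat_b if x not in from_b] + list(from_a)
    return new_a, new_b


def neighbour_share(category, original_neighbourhood):
    """Fraction of a category's members that come from its original shape neighbourhood."""
    kept = sum(1 for x in category if x in original_neighbourhood)
    return Fraction(kept, len(category))


# Hypothetical labels standing in for the nine exemplars per category in Fig. 1.
cat_a = [f"A{i}" for i in range(1, 10)]
cat_b = [f"B{i}" for i in range(1, 10)]
new_a, new_b = swap_exemplars(cat_a, cat_b, from_a=["A1", "A2"], from_b=["B1", "B2"])
share_a = neighbour_share(new_a, cat_a)
print(share_a, f"{float(share_a):.0%}")  # 7/9 78%
```

Each category keeps nine members, but only seven of them come from its original region of shape space, which is why shape similarity becomes a less reliable cue than the fully predictive motion cues.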

Design and Procedure

Experiment 2 followed the same design and procedure as Experiment 1, with the same four cue conditions (S_v, S_vM_v, S_vM_t, S_vM_vt) and two exemplar levels (learned and novel). As in Experiment 1, trials were either blocked by cue (Experiment-2A) or interleaved (Experiment-2B), and participants were pseudo-randomly assigned to each version of the experiment.
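To make the blocked/interleaved distinction concrete, here is a minimal Python sketch of the two trial orders over the four cue conditions and two exemplar types. The number of trials per cell, the shuffled block order and the function names are illustrative assumptions, not the authors' PsychoPy/PsychoJS implementation.

```python
import random

CUE_CONDITIONS = ["S_v", "S_vM_v", "S_vM_t", "S_vM_vt"]
EXEMPLAR_TYPES = ["learned", "novel"]


def blocked_order(n_per_cell, rng):
    """All trials of one cue condition run together (assumed: block order shuffled per participant)."""
    block_order = CUE_CONDITIONS[:]
    rng.shuffle(block_order)
    trials = []
    for cue in block_order:
        block = [(cue, ex) for ex in EXEMPLAR_TYPES for _ in range(n_per_cell)]
        rng.shuffle(block)  # learned and novel exemplars are mixed within a block
        trials.extend(block)
    return trials


def interleaved_order(n_per_cell, rng):
    """All cue-by-exemplar trials are shuffled together, so the available cues change unpredictably."""
    trials = [(cue, ex) for cue in CUE_CONDITIONS for ex in EXEMPLAR_TYPES for _ in range(n_per_cell)]
    rng.shuffle(trials)
    return trials


def cue_switches(trials):
    """Count how often the cue condition changes from one trial to the next."""
    return sum(1 for prev, cur in zip(trials, trials[1:]) if prev[0] != cur[0])


rng = random.Random(2024)
blocked = blocked_order(3, rng)
interleaved = interleaved_order(3, rng)
print(len(blocked), cue_switches(blocked))  # 24 3
print(len(interleaved), cue_switches(interleaved))
```

Both orders contain exactly the same trials; they differ only in how often the cue condition switches, which is the source of the trial-to-trial uncertainty discussed for the interleaved versions.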

Results

The main analysis procedure was identical to that of Experiment 1.

Categorisation performance across the four cue conditions for learned and novel exemplars is presented in Fig. 4. First, categorisation performance on the blocked trials (Experiment-2A) was analysed using a likelihood ratio test comparing the full model, which included the interaction between cue condition and exemplar type, with a reduced model without the interaction term. The full model significantly improved the fit to the data (χ²(3) = 9.09, p = .028), suggesting that the interaction between cue condition and exemplar type contributed to categorisation accuracy (Fig. 4A). There were significant main effects of both cue condition and exemplar type on categorisation accuracy. Post-hoc comparisons revealed that categorisation accuracy was significantly lower in the object-shape-only (S_v) condition than in all of the combined-cue conditions. Specifically, accuracy in the S_v condition was significantly lower than in the S_vM_v condition (OR = .570, 95% CI [0.306, 1.064], p = .018), the S_vM_t condition (OR = 1.589, 95% CI [0.987, 2.557], p = .013), and the S_vM_vt condition (OR = 4.832, 95% CI [1.911, 12.222], p < .001). As in Experiment 1, no significant differences were found between the S_vM_v and S_vM_t conditions (p = 1), nor between either of these two-cue conditions and the three-cue condition (S_vM_v vs S_vM_vt, p = 1; S_vM_t vs S_vM_vt, p = 1). There was a main effect of exemplar type, with better performance for learned (76%) than for novel (65%) exemplars (OR = .597, 95% CI [0.422, 0.844], p = .004). The significant interaction between cue condition and exemplar type was driven mainly by differences between each of the two-cue conditions and the S_v condition (see Fig. 4A): performance was better in the S_vM_v than in the S_v condition for learned (OR = .570, p = .018) but not novel (OR = .770, p = .282) exemplars and, similarly, performance was better in the S_vM_t than in the S_v condition for learned (OR = 1.589, p = .013) but not novel (OR = 1.218, p = .282) exemplars. Performance was better in the S_vM_vt than in the S_v condition for novel exemplars only (OR = 4.658, p < .001).
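An odds ratio compares the odds of a correct response, p / (1 − p), between two conditions: values below 1 indicate lower accuracy in the target condition, and the reciprocal expresses the same difference with the reference condition swapped. The minimal Python sketch below applies this to the raw exemplar-type percentages. It is only an approximation, since the reported ORs are presumably derived from the fitted models and need not equal a ratio of raw proportions (here about 0.586, against the reported .597).

```python
import math


def odds(p):
    """Convert a proportion correct into the odds of a correct response."""
    return p / (1.0 - p)


def odds_ratio(p_target, p_reference):
    """Odds of a correct response in the target condition relative to the reference condition."""
    return odds(p_target) / odds(p_reference)


# Raw proportions correct for novel (65%) versus learned (76%) exemplars.
or_novel = odds_ratio(0.65, 0.76)
print(f"{or_novel:.3f}")                # 0.586
print(f"{odds_ratio(0.76, 0.65):.3f}")  # 1.705, the reciprocal
print(f"{math.log(or_novel):.3f}")      # log odds ratio, the scale of a logistic-model coefficient
```

Because the direction of an OR depends on which condition is the reference, the same accuracy difference can appear as a value below or above 1.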

We then conducted likelihood ratio tests on performance when trials were interleaved (Experiment-2B), which indicated a main effect of cue condition (χ²(3) = 276.72, p < .001) but not of exemplar type (χ²(1) = .953, p = .329) in the model predicting categorisation accuracy, as shown in Fig. 4B. Furthermore, the cue condition × exemplar type interaction did not significantly contribute to this model (χ²(3) = .73, p = .866). Post-hoc comparisons indicated that categorisation accuracy was significantly lower in the S_v condition than in the S_vM_v (OR = .617, p < .001), S_vM_t (OR = .455, p < .001) and S_vM_vt (OR = .299, p < .001) conditions. Moreover, performance was better in the S_vM_vt condition than in the S_vM_v (OR = .484, p < .001) and S_vM_t (OR = .657, p < .001) conditions. In contrast to the results from the blocked trials, performance in the S_vM_v condition was lower than in the S_vM_t condition (OR = .736, p < .001) when trials were interleaved.

Finally, one-sample t-tests showed that accuracy in the S_v condition was significantly above chance in Experiment-2A (M = .61, SD = .17, t(32) = 4.27, p < .001), but not in Experiment-2B (M = .50, SD = .06, t(31) = -0.29, p = .77).
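Each above-chance test compares mean per-participant accuracy in the S_v condition with 0.5, the chance level for a two-category decision (the dashed line in the figures). The following is a minimal Python sketch of the one-sample t statistic, assuming one accuracy score per participant; the function names are hypothetical. Because the reported means and SDs are rounded, recomputing t from them only approximates the published value, although it does reproduce t(44) = 3.21 for Experiment-1A.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores, chance=0.5):
    """t statistic and df for per-participant accuracies tested against chance."""
    n = len(scores)
    standard_error = stdev(scores) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return (mean(scores) - chance) / standard_error, n - 1


def t_from_summary(m, sd, n, chance=0.5):
    """The same statistic recomputed from a reported mean, SD and sample size."""
    return (m - chance) / (sd / math.sqrt(n)), n - 1


# Experiment-1A summary values from the text: M = .61, SD = .23, t(44), i.e. n = 45.
t, df = t_from_summary(0.61, 0.23, 45)
print(f"t({df}) = {t:.2f}")  # t(44) = 3.21
```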

Fig. 4

Note

Error bars represent 95% confidence intervals. Dashed line indicates chance level (50%).

Discussion

In Experiment 2, we reduced the informativeness of shape similarity as a cue to category membership to investigate whether participants would rely more on motion cues. The results suggest that when only the shape cue was available at test, performance was relatively poor. This was especially the case when trials were interleaved, i.e. when cue availability was less predictable from one trial to the next, relative to the version in which trials were blocked.

Categorisation performance was best when all cues were available, in both versions of the experiment (blocked and interleaved). That is, the combination of shape with both motion cues led to significantly better accuracy than object shape alone, for both learned and novel exemplars. However, in the blocked condition, this advantage did not extend to the two-cue combinations (i.e. S_vM_v or S_vM_t vs S_v) for novel exemplars, which did not significantly outperform the shape-only condition. The significant interaction between cue condition and exemplar type suggests that, while all cue combinations supported learning of the trained exemplars, successful generalisation to novel exemplars relied more on the availability of both motion cues together. This suggests that when the reliability of shape as a cue to category membership is reduced, generalisation requires the full cue combination at test.

Interestingly, when trials were interleaved, tactile motion provided a greater benefit than visual motion for both categorisation and generalisation performance. Again, this finding suggests a greater weighting of reliable cues, such as motion over shape. The relative benefit of tactile over visual motion may be due to the easier combination of visual motion with, and therefore segregation of tactile motion from, the object shape (see Chen & Spence, 2017).

In Experiments 1 and 2, the shape cue was always presented with motion cues during learning; therefore, it was unclear to what extent shape itself influenced learning and subsequent categorisation and generalisation performance at test. Indeed, performance was consistently worse in the shape-only cue condition at test, and we attribute this to the absence of the other learned cues. In the following control experiment, we investigated the specific contribution of visual object shape as a cue to category learning.

Experiment 3

In Experiment 3, participants learned categories defined solely by object shape (static images) and at test were assessed across the same four cue conditions used previously (i.e. S_v, S_vM_v, S_vM_t, S_vM_vt). The use of motion cues at test further allowed us to assess the added value of multisensory motion cues beyond shape information for categorisation and generalisation.

Method

Participants

We initially recruited 45 participants online via Prolific; 8 of them failed to reach the learning criterion and a further 7 failed to perform above chance at test, and their data were excluded from the analyses. Therefore, a total of 30 participants (mean age = 37.44 years, SD = 10.10; 48.5% female) successfully completed Experiment 3; the reimbursement rate and inclusion criteria were identical to those of the previous experiments.

Stimuli

All stimuli, and the arrangement of shapes into the two categories, were identical to those of Experiment 1 (see Fig. 1).

Design and Procedure

The experiment followed the same design and procedure as Experiment 1, with the exception that the category learning task was performed on static object-shape stimuli. In addition, trials were blocked by cue condition during the test session (as in Experiment-1A). Both learned and novel exemplars were presented at test in a random order across participants. Overall, the experiment took each participant approximately 16 minutes to complete.

Results and Discussion

Likelihood ratio tests indicated a statistically significant main effect of exemplar type (χ²(1) = 34.95, p < .001) but not of cue condition (χ²(3) = 5.25, p = .15) in the model predicting categorisation accuracy. The cue condition × exemplar type interaction did not significantly contribute to this model (χ²(3) = 2.748, p = .43). Overall, irrespective of cue condition, participants' categorisation performance was less accurate for novel (76%) than for learned (86%) exemplars (OR = -7.38, 95% CI [0.08, 0.24], p < .001).

The results suggest that object shape was sufficient for category formation. However, generalisation to novel exemplars was poor, with a drop of approximately 20% in accuracy compared with the categorisation of learned exemplars (Fig. 5). Overall, these findings underline the important role of multiple cues during category formation and suggest that the benefit of visual and tactile object motion cues observed at test in the previous experiments is contingent on their prior integration into the category structure.

Fig. 5

Note

Error bars represent 95% confidence intervals. Dashed line indicates chance level (50%).

General Discussion

Across three categorisation experiments delivered via mobile phones, we examined whether previously learned multisensory categories – defined by shape, visual motion, and tactile vibration – could support object categorisation and generalisation. The motion cues were correlated across modalities and predictive of category membership (Experiments 1 and 2). Although shape similarity underpinned category membership in Experiment 1, its reliability was reduced in Experiment 2. Overall, the results of both experiments suggested that visual and tactile motion facilitated both the categorisation and generalisation of exemplars relative to shape-only information. The results of Experiment 3 confirmed that visual shape alone could support category learning in our tasks, although generalisation was poor. In addition, task format, i.e. whether trials were blocked by cue or interleaved, significantly modulated the observed multisensory benefits: whereas blocked trials (Experiments 1A and 2A) consistently led to higher accuracy and better generalisation, interleaved presentation (Experiments 1B and 2B) resulted in smaller performance differences between cue conditions and a less robust contribution from tactile cues. Our findings elucidate the featural and context-sensitive nature of multisensory categorisation and demonstrate that the effectiveness of cross-sensory cues for categorisation depends on cue reliability as well as task structure.

Our results align with previous evidence on the role of cue reliability in perception (Oruç et al., 2003; Helbig et al., 2012; Bankieris et al., 2017) while extending this to the realm of category learning. For example, when the inter-object similarity of shapes was reduced as a cue to categorisation (Experiment 2), and motion cues remained fully informative, participants' categorisation performance was influenced more by the tactile and visual motion cues, particularly under blocked conditions. Our results further support the idea that tactile cues are not merely supplementary but can be integral to the formation of category representations (Yildirim & Jacobs, 2013; Gaissert & Wallraven, 2012), especially when they are temporally aligned and task-relevant (Nordmark, 1978; Heller, 1989). Indeed, tactile motion cues played an important role in enhancing categorisation performance in all experiments, suggesting that individuals can strategically adapt their category learning performance to prioritise diagnostic information. Even under interleaved trial presentation (Experiment-2B), performance in the combined-cue condition remained high, suggesting that redundant motion cues can compensate for uncertainty in the shape cue.

The different effects of motion cues under blocked and interleaved trials suggest that task demands can influence cue combination, including across modalities (Carvalho & Goldstone, 2019; Maddox et al., 2006). Although some evidence suggests that interleaved presentation enhances generalisation through discriminative contrast (Brunmair & Richter, 2019; Kornell & Bjork, 2008; Carvalho & Goldstone, 2014, 2015), including across modalities (Abel, 2023), we found that categorisation performance was weaker for novel than for learned exemplars. Although the reason for this inconsistency is unclear, we suggest that interleaved presentation may either trigger a greater reliance on individual cues or modalities or impair the retrieval of specific cue-category associations. Under blocked conditions, the motion cues integrated into the object representations may act as contextual supports in memory for categorising the object shape (Duarte et al., 2023, 2025).

In Experiment 3, the substantial (~20%) drop in performance to novel exemplars relative to performance in the previous experiments supports the idea that multisensory encoding during learning, rather than test-phase cue availability, is critical for supporting generalisation. This is consistent with evidence that learned multisensory representations subsequently support better generalisation across modalities (Yildirim & Jacobs, 2013). Interestingly, there was no benefit of motion cues at test, suggesting that unfamiliar, but potentially informative, cues are not automatically used without prior association.

Our ability to categorise objects in the real world is likely to depend on a complex interaction between the nature and type of sensory information available, prior knowledge, and other context-dependent factors related to the task. For example, only tactile motion seems to have had a differential influence on categorisation and generalisation performance across task structure (interleaved or blocked trials). Moreover, methodological factors may help explain some of the inconsistencies in the findings reported in the literature (Wang & Zeng, 2022; Roark & Holt, 2022; Roark et al., 2021; O'Dowd et al., 2025). Consistent with this complexity, our findings suggest that no single mechanism fully explains our data. Future studies are needed to investigate how cue combination benefits learning and subsequent performance. In particular, it would be interesting to know whether multisensory cues lead to enhanced learning performance (Shams & Seitz, 2008) or whether categorisation responses go beyond predictions based on probability summation alone (see Nardini et al., 2010). In general, our results support the use of flexible, context-dependent cue combination strategies for learning novel object categories, and therefore contribute to the ongoing debate about the role of multisensory information in object categorisation (Newell et al., 2023).

Our task was designed to enable remote data collection online via mobile devices. Although object perception experiments involving touch typically necessitate in-person testing (see Newell, 2004, 2010; Woods & Newell, 2004), technological improvements mean that remote psychophysical testing using different modalities is rapidly increasing (Marin-Campos et al., 2021; Zhao et al., 2022; Hirst et al., 2025; Inuggi et al., 2024). The benefits of remote assessment are well documented and include the recruitment of larger and more heterogeneous participant samples (Grootswagers, 2020) and easier longitudinal follow-ups. Our work has important practical implications for the remote testing of multisensory category formation. Recent research from our team has validated the use of vibrations in remote testing contexts (Hirst et al., 2025); however, to our knowledge, no study has yet used this method to deliver complex patterns of vibration stimuli. This is therefore the first study to use complex vibration stimuli, delivered via the browser, to explore multisensory processes. Of practical note, this method was successful: participants identified the correspondence between vibration patterns and visual stimuli with ease (Supplementary Material), which enabled us to implement the current design and to share this method for future research. We argue that remote delivery of tactile stimulation through mobile devices is an ecologically valid assessment method, given that tactile feedback is increasingly integrated into technology (Muender et al., 2022; Ziat et al., 2022; Zwoliński et al., 2022). Indeed, technological advances mean that many everyday objects now incorporate some form of vibration feedback: you may feel resistance in the steering wheel of your car when changing lanes, the driver's seat might pulsate to signal danger, or your mobile phone may vibrate to signal an incoming call. Tactile feedback has also been shown to be effective at improving task performance (Fang et al., 2023) and the sense of presence (Huard et al., 2022; Kim et al., 2022). Moreover, delivery through commodity devices has real-world implications for the design of gaming and training apps. Despite these potential benefits, few published studies have used vibration delivered through mobile phones to study perception, and those that have report effects similar to those of lab-based studies (Inuggi et al., 2024; Hirst et al., 2025). Methodologically, smartphone-delivered tactile cues offer an accessible and ecologically valid way of investigating multisensory interactions, particularly in applied domains such as education, human-computer interaction and rehabilitation, where redundant sensory input, including from touch, can support cognitive abilities (see Gori et al., 2022).

Conclusions

The present study investigated whether visual and tactile motion cues enhance the categorisation of novel object shapes, and whether these multisensory categories support generalisation to new exemplars. Across three experiments delivered via mobile phones, we tested whether previously learned multisensory categories – defined by shape, visual motion, and tactile vibration – could support object categorisation and generalisation, with blocked versus interleaved cue conditions at test. In Experiment 1 all cues were fully predictive of category membership, whereas in Experiment 2 the reliability of shape information for categorisation was reduced. In Experiment 3, categories were learned with static shape only. When cues were equally reliable (Experiment 1), performance improved when all cues were available at test, and this was replicated when shape reliability was reduced (Experiment 2). Task format modulated these benefits: blocked presentation (Experiments 1A and 2A) yielded higher accuracy and stronger generalisation, whereas interleaved presentation (Experiments 1B and 2B) reduced performance differences and weakened contributions from tactile cues. Finally, visual and tactile motion facilitated categorisation and generalisation relative to shape-only learning (Experiment 3). Collectively, these findings demonstrate that multisensory motion cues promote object category formation and generalisation, with their effectiveness influenced by cue reliability and task structure. The results have important implications for our understanding of the dynamic and multisensory nature of object categories and of the predictive role of multisensory features in category formation. To our knowledge, this is also the first study to use complex vibration stimuli, delivered via the browser, to investigate multisensory processes. These results and this methodology provide a valuable resource for future work on tactile and multisensory processing in applied contexts.

Acknowledgments

Funding

Competing interests

One of the authors (Rebecca J. Hirst) is also an author of the package used to demonstrate vibration in this manuscript (PsychoPy/PsychoJS). This software is provided free and open source to all users.

This author is also part of Open Science Tools Ltd., a social enterprise that receives income from studies conducted on Pavlovia.org. Pavlovia.org is agnostic to the software package creating the studies—it supports studies from PsychoPy, jsPsych, and Lab.js—and the resulting revenue stream funds further tool development.

Ethics approval

This study was performed in line with the principles of the 1964 Declaration of Helsinki. The study was approved by the Trinity College Dublin, School of Psychology Research Ethics Committee (approval number SPREC102020-50).

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent to publish

Participants provided consent for anonymized data to be published as part of this work.

Data Availability

Experiments were preregistered prior to data collection on the Open Science Framework: Experiment-1A, https://osf.io/2ms48; Experiment-2A, https://osf.io/dxwej; Experiment 3, https://osf.io/pem72. All data and both R and Python code supporting the findings of this study are openly available on the Open Science Framework (OSF) at https://osf.io/s369c/?view_only=9015e8935be24628b5b15596b8eb6271 for Experiment 1 (A, B), at https://osf.io/vkg7c/?view_only=08bc08d34e704ccbb9e90b1b4437f08b for Experiment 2 (A, B), and at https://osf.io/cnmhv/?view_only=eb8ecf26e5da471aa39b455a101b0dbb for Experiment 3.

Author Contribution

Conceptualization: MAS and FNN; Methodology: MAS, RJH, AOD and FNN; Formal analysis and investigation: MAS, RJH, AOD, IC and FNN; Writing - original draft preparation: MAS and FNN; Writing - review and editing: MAS, RJH, AOD, IC and FNN; Funding acquisition: FNN; Supervision: FNN. All authors read and approved the final manuscript.

Electronic Supplementary Material

Below is the link to the electronic supplementary material

Supplementary Material 1

References

Abel R (2023) Interleaving effects in blindfolded perceptual learning across various sensory modalities. Cogn Sci 47(4):e13270. https://doi.org/10.1111/cogs.13270

Abel R, Brunmair M, Weissgerber SC (2021) Change one category at a time: Sequence effects beyond interleaving and blocking. J Exp Psychol Learn Mem Cogn 47(7):1083–1105. https://doi.org/10.1037/xlm0001003

Amemiya T, Beck B, Walsh V, Gomi H, Haggard P (2017) Visual area V5/hMT+ contributes to perception of tactile motion direction: a TMS study. Sci Rep 7(1):40937. https://doi.org/10.1038/srep40937

Atkin C, Stacey JE, Roberts KL, Allen HA, Henshaw H, Badham SP (2023) The effect of unisensory and multisensory information on lexical decision and free recall in young and older adults. Sci Rep 13(1):16575. https://doi.org/10.1038/s41598-023-41791-1

Bankieris KR, Bejjanki VR, Aslin RN (2017) Sensory cue-combination in the context of newly learned categories. Sci Rep 7(1):10890. https://doi.org/10.1038/s41598-017-11341-7

Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67:1–48. https://doi.org/10.18637/jss.v067.i01

Blake R, Shiffrar M (2007) Perception of human motion. Annu Rev Psychol 58(1):47–73. https://doi.org/10.1146/annurev.psych.57.102904.190152

Blender Foundation (2023) Blender (Version 3.6) [Computer software]. https://www.blender.org

Broadbent H, Osborne T, Kirkham N, Mareschal D (2020) Touch and look: The role of visual-haptic cues for categorical learning in primary school children. Infant Child Dev 29(2):e2168. https://doi.org/10.1002/icd.2168

Brunel L, Goldstone RL, Vallet G, Riou B, Versace R (2013) When seeing a dog activates the bark. Exp Psychol 60(2):100–112. https://doi.org/10.1027/1618-3169/a000176

Brunmair M, Richter T (2019) Similarity matters: A meta-analysis of interleaved learning and its moderators. Psychol Bull 145(11):1029–1052. https://doi.org/10.1037/bul0000209

Carvalho PF, Goldstone RL (2015) The benefits of interleaved and blocked study: Different tasks benefit from different schedules of study. Psychon Bull Rev 22(1):281–288. https://doi.org/10.3758/s13423-014-0676-4

Carvalho PF, Goldstone RL (2019) When does interleaving practice improve learning? In: Dunlosky J, Rawson KA (eds) The Cambridge Handbook of Cognition and Education. Cambridge University Press, pp 411–436. https://doi.org/10.1017/9781108235631.017

Chan JS, Simões-Franklin C, Garavan H, Newell FN (2010) Static images of novel, moveable objects learned through touch activate visual area hMT+. NeuroImage 49(2):1708–1716. https://doi.org/10.1016/j.neuroimage.2009.09.068

Chen YC, Spence C (2017) Assessing the role of the ‘unity assumption’ on multisensory integration: A review. Front Psychol 8:445. https://doi.org/10.3389/fpsyg.2017.00445

Cooke T, Jäkel F, Wallraven C, Bülthoff HH (2007) Multimodal similarity and categorization of novel, three-dimensional objects. Neuropsychologia 45(3):484–495. https://doi.org/10.1016/j.neuropsychologia.2006.02.009

Duarte SE, Ghetti S, Geng JJ (2023) Object memory is multisensory: Task-irrelevant sounds improve recollection. Psychon Bull Rev 30(2):652–665. https://doi.org/10.3758/s13423-022-02182-1

Duarte SE, Yonelinas AP, Ghetti S, Geng JJ (2025) Multisensory processing impacts memory for objects and their sources. Mem Cognit 53(2):646–665. https://doi.org/10.3758/s13421-024-01592-x

Edmunds CER, Inkster AB, Jones PM, Milton F, Wills AJ (2020) Absence of cross-modality analogical transfer in perceptual categorization. Open J Exp Psychol Neurosci 1:3–13. https://doi.org/10.46221/ojepn.2020.8639

Fang L, Müller T, Pescara E, Fischer N, Huang Y, Beigl M (2023) Investigating passive haptic learning of piano songs using three tactile sensations of vibration, stroking and tapping. Proc ACM Interact Mob Wearable Ubiquitous Technol 7(3):1–19. https://doi.org/10.1145/3610899

Feng G, Gan Z, Yi HG, Ell SW, Roark CL, Wang S, Chandrasekaran B (2021) Neural dynamics underlying the acquisition of distinct auditory category structures. NeuroImage 244:118565. https://doi.org/10.1016/j.neuroimage.2021.118565

Fleming RW (2017) Material perception. Annu Rev Vis Sci 3(1):365–388. https://doi.org/10.1146/annurev-vision-102016-061429

Fleming RW, Wiebel C, Gegenfurtner K (2013) Perceptual qualities and material classes. J Vis 13(8):9–9. https://doi.org/10.1167/13.8.9

Gaissert N, Wallraven C (2012) Categorizing natural objects: a comparison of the visual and the haptic modalities. Exp Brain Res 216(1):123–134. https://doi.org/10.1007/s00221-011-2916-4

Ge Y, Li F, Li X, Li W (2021) What is the mechanism underlying the interleaving effect in category induction: An eye-tracking and behavioral study. Front Psychol 12:770885. https://doi.org/10.3389/fpsyg.2021.770885

Gibson JJ (1962) Observations on active touch. Psychol Rev 69(6):477–491. https://doi.org/10.1037/h0046962

Goldstone RL (1994) The role of similarity in categorization: Providing a groundwork. Cognition 52(2):125–157. https://doi.org/10.1016/0010-0277(94)90065-5

Goldstone RL, Hendrickson AT (2010) Categorical perception. Wiley Interdiscip Rev Cogn Sci 1(1):69–78. https://doi.org/10.1002/wcs.26

Gori M, Price S, Newell FN, Berthouze N, Volpe G (2022) Multisensory Perception and Learning: Linking Pedagogy, Psychophysics, and Human–Computer Interaction. Multisensory Res 35(4):335–366. https://doi.org/10.1163/22134808-bja10072

Griffiths TD, Warren JD (2004) What is an auditory object? Nat Rev Neurosci 5(11):887–892. https://doi.org/10.1038/nrn1538

Grill-Spector K (2003) The neural basis of object perception. Curr Opin Neurobiol 13(2):159–166. https://doi.org/10.1016/S0959-4388(03)00040-0

Grootswagers T (2020) A primer on running human behavioural experiments online. Behav Res Methods 52(6):2283–2286. https://doi.org/10.3758/s13428-020-01395-3

Haag S (2011) Effects of vision and haptics on categorizing common objects. Cogn Process 12:33–39. https://doi.org/10.1007/s10339-010-0369-5

Helbig HB, Ernst MO, Ricciardi E, Pietrini P, Thielscher A, Mayer KM, Noppeney U (2012) The neural mechanisms of reliability weighted integration of shape information from vision and touch. NeuroImage 60(2):1063–1072. https://doi.org/10.1016/j.neuroimage.2011.09.072

Heller MA (1989) Texture perception in sighted and blind observers. Percept Psychophys 45(1):49–54. https://doi.org/10.3758/BF03208032

Hirst R, Roberts K, Seveso MA, O'Dowd A, Peirce J, Newell FN (in review). Delivering Tactile Stimuli via Mobile Browsers: A Method for Remote Multisensory Research. https://doi.org/10.31219/osf.io/y84js_v1

Huard A, Chen M, Sra M (2022) CardsVR: a two-person VR experience with passive haptic feedback from a deck of playing cards. In: 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, pp 538–547. https://doi.org/10.1109/ISMAR55827.2022.00070

Inuggi A, Domenici N, Tonelli A, Gori M (2024) PsySuite: An android application designed to perform multimodal psychophysical testing. Behav Res Methods 56(8):8308–8329. https://doi.org/10.3758/s13428-024-02475-4

Kim D, Kim Y, Jo D (2022) Exploring the effect of virtual environments on passive haptic perception. Appl Sci 13(1):299–312. https://doi.org/10.3390/app13010299

Kornell N, Bjork RA (2008) Learning concepts and categories: Is spacing the enemy of induction? Psychol Sci 19(6):585–592. https://doi.org/10.1111/j.1467-9280.2008.02127.x

Kost AS, Carvalho PF, Goldstone RL (2015) Can you repeat that? The effect of item repetition on interleaved and blocked study. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol 37. https://escholarship.org/uc/item/864370qv

Lacey S, Pappas M, Kreps A, Lee K, Sathian K (2009) Perceptual learning of view-independence in visuo-haptic object representations. Exp Brain Res 198(2):329–337. https://doi.org/10.1007/s00221-009-1856-8

Lederman SJ, Klatzky RL (1987) Hand movements: A window into haptic object recognition. Cogn Psychol 19(3):342–368. https://doi.org/10.1016/0010-0285(87)90008-9

Lederman SJ, Klatzky RL (1990) Haptic classification of common objects: Knowledge-driven exploration. Cogn Psychol 22(4):421–459. https://doi.org/10.1016/0010-0285(90)90009-S

Lenth RV, Buerkner P, Herve M, Jung M, Love J, Miguez F, Singmann H (2022) emmeans: Estimated marginal means, aka least-squares means. R package version 1.7.5

Li AY, Liang JC, Lee AC, Barense MD (2020) The validated circular shape space: Quantifying the visual similarity of shape. J Exp Psychol Gen 149(5):949–966. https://doi.org/10.1037/xge0000693

Li J, Deng SW (2023) Facilitation and interference effects of the multisensory context on learning: a systematic review and meta-analysis. Psychol Res 87(5):1334–1352. https://doi.org/10.1007/s00426-022-01733-4

Maddox WT, Ing AD, Lauritzen JS (2006) Stimulus modality interacts with category structure in perceptual category learning. Percept Psychophys 68(7):1176–1190. https://doi.org/10.3758/BF03193719

Mahns DA, Perkins NM, Sahai V, Robinson L, Rowe MJ (2006) Vibrotactile frequency discrimination in human hairy skin. J Neurophysiol 95(3):1442–1450. https://doi.org/10.1152/jn.00483.2005

Marin-Campos R, Dalmau J, Compte A, Linares D (2021) StimuliApp: Psychophysical tests on mobile devices. Behav Res Methods 53(3):1301–1307. https://doi.org/10.3758/s13428-020-01491-4

MathWorks (2023) MATLAB (Version 23.2.0.2380103 R2023b) [Computer software]. The MathWorks Inc.

Muender T, Bonfert M, Reinschluessel AV, Malaka R, Döring T (2022) Haptic fidelity framework: Defining the factors of realistic haptic feedback for virtual reality. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. ACM, pp 1–17. https://doi.org/10.1145/3491102.3501953

Muller CJ, MacLehose RF (2014) Estimating predicted probabilities from logistic regression: different methods correspond to different target populations. Int J Epidemiol 43(3):962–970. https://doi.org/10.1093/ije/dyu029

Naci L, Taylor KI, Cusack R, Tyler LK (2012) Are the senses enough for sense? Early high-level feedback shapes our comprehension of multisensory objects. Front Integr Neurosci 6:82–93. https://doi.org/10.3389/fnint.2012.00082

Nardini M, Bedford R, Mareschal D (2010) Fusion of visual cues is not mandatory in children. Proc Natl Acad Sci USA 107(39):17041–17046. https://doi.org/10.1073/pnas.1001699107

Newell FN (1998) Stimulus context and view dependence in object recognition. Perception 27(1):47–68. https://doi.org/10.1068/p270047

Newell FN (2004) Cross-modal object recognition. In: Calvert GA, Spence C, Stein BE (eds) The Handbook of Multisensory Processes. MIT Press, pp 123–139. https://doi.org/10.7551/mitpress/3422.003.0011

Newell FN, Bülthoff HH (2002) Categorical perception of familiar objects. Cognition 85(2):113–143. https://doi.org/10.1016/S0010-0277(02)00104-X

Newell FN, Ernst MO, Tjan BS, Bülthoff HH (2001) Viewpoint dependence in visual and haptic object recognition. Psychol Sci 12(1):37–42. https://doi.org/10.1111/1467-9280.00307

Newell FN, McKenna E, Seveso MA, Devine I, Alahmad F, Hirst RJ, O'Dowd A (2023) Multisensory perception constrains the formation of object categories: A review of evidence from sensory-driven and predictive processes on categorical decisions. Philos Trans R Soc B 378(1886):20220342. https://doi.org/10.1098/rstb.2022.0342

Newell FN, Wallraven C, Huber S (2004) The role of characteristic motion in object categorization. J Vis 4(2):5–5. https://doi.org/10.1167/4.2.5

Nordmark JO (1978) Frequency and periodicity analysis. In: Carterette EC, Friedman MP (eds) Handbook of Perception, vol IV: Hearing. Academic Press, pp 243–282. https://doi.org/10.1016/B978-0-12-161904-6.50014-2

Nosofsky RM (1986) Attention, similarity, and the identification–categorization relationship. J Exp Psychol Gen 115(1):39–61. https://doi.org/10.1037//0096-3445.115.1.39

O'Callaghan G, O'Dowd A, Simões-Franklin C, Stapleton J, Newell FN (2018) Tactile-to-Visual Cross-Modal Transfer of Texture Categorisation Following Training: An fMRI Study. Front Integr Neurosci 12:24. https://doi.org/10.3389/fnint.2018.00024

O'Dowd A, Hirst RJ, Seveso MA, McKenna EM, Newell FN (2025) Generalisation to novel exemplars of learned shape categories based on visual and auditory spatial cues does not benefit from multisensory information. Psychon Bull Rev 32(1):417–429. https://doi.org/10.3758/s13423-024-02548-7

Oruç I, Maloney LT, Landy MS (2003) Weighted linear cue combination with possibly correlated error. Vision Res 43(23):2451–2468. https://doi.org/10.1016/s0042-6989(03)00435-8

Parise CV, Ernst MO (2016) Correlation detection as a general mechanism for multisensory integration. Nat Commun 7:11543. https://doi.org/10.1038/ncomms11543

Peer E, Rothschild D, Gordon A, Evernden Z, Damer E (2022) Data quality of platforms and panels for online behavioral research. Behav Res Methods 54(4):1643–1662. https://doi.org/10.3758/s13428-021-01694-3

Peirce J, Gray JR, Simpson S, MacAskill M, Höchenberger R, Sogo H, Kastman E, Lindeløv JK (2019) PsychoPy2: Experiments in behavior made easy. Behav Res Methods 51(1):195–203. https://doi.org/10.3758/s13428-018-01193-y

Peirce J, Hirst R, MacAskill M (2022) Building experiments in PsychoPy. Sage

Peissig JJ, Tarr MJ (2007) Visual object recognition: do we know more now than we did 20 years ago? Annu Rev Psychol 58:75–96. https://doi.org/10.1146/annurev.psych.58.102904.190114

Pérez-Gay Juárez F, Thériault C, Gregory M, Rivas D, Sabri H, Harnad S (2017) How and why does category learning cause categorical perception? Int J Comp Psychol 30. https://doi.org/10.46867/ijcp.2017.30.01.01

Pomper U, Brincker J, Harwood J, Prikhodko I, Senkowski D (2014) Taking a call is facilitated by the multisensory processing of smartphone vibrations, sounds, and flashes. PLoS ONE 9(8):e103238. https://doi.org/10.1371/journal.pone.0103238

R Core Team (2021) R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Roark CL (2024) Perceptual category learning results in modality-specific representations. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol 46. https://escholarship.org/uc/item/236389bq

Roark CL, Chandrasekaran B (2023) Stable, flexible, common, and distinct behaviors support rule-based and information-integration category learning. npj Sci Learn 8(1):14. https://doi.org/10.1038/s41539-023-00163-0

Roark CL, Holt LL (2022) Long-term priors constrain category learning in the context of short-term statistical regularities. Psychon Bull Rev 29(5):1925–1937. https://doi.org/10.3758/s13423-022-02114-z

Roark CL, Lescht E, Hampton Wray A, Chandrasekaran B (2023) Auditory and visual category learning in children and adults. Dev Psychol 59(5):963–975. https://doi.org/10.1037/dev0001525

Roark CL, Paulon G, Sarkar A, Chandrasekaran B (2021) Comparing perceptual category learning across modalities in the same individuals. Psychon Bull Rev 28(3):898–909. https://doi.org/10.3758/s13423-021-01878-0

Robert S, Ungerleider LG, Vaziri-Pashkam M (2023) Disentangling Object Category Representations Driven by Dynamic and Static Visual Input. J Neurosci 43(4):621–634. https://doi.org/10.1523/JNEUROSCI.0371-22.2022

Rosch E (2024) Principles of categorization. In: Rosch E, Lloyd BB (eds) Cognition and Categorization. Routledge, pp 27–48

Ryan CP, Bettelani GC, Ciotti S, Parise C, Moscatelli A, Bianchi M (2021) The interaction between motion and texture in the sense of touch. J Neurophysiol 126(4):1375–1390. https://doi.org/10.1152/jn.00583.2020

Setti A, Newell FN (2010) The effect of body and part-based motion on the recognition of unfamiliar objects. Vis Cogn 18(3):456–480. https://doi.org/10.1080/13506280902830561

Shams L, Seitz AR (2008) Benefits of multisensory learning. Trends Cogn Sci 12(11):411–417. https://doi.org/10.1016/j.tics.2008.07.006

Shao Y, Hayward V, Visell Y (2016) Spatial patterns of cutaneous vibration during whole-hand haptic interactions. Proc Natl Acad Sci USA 113(15):4188–4193. https://doi.org/10.1073/pnas.1520866113

Shatek SM, Robinson AK, Grootswagers T, Carlson TA (2022) Capacity for movement is an organisational principle in object representations. NeuroImage 261:119517. https://doi.org/10.1016/j.neuroimage.2022.119517

Simões-Franklin C, Whitaker TA, Newell FN (2011) Active and passive touch differentially activate somatosensory cortex in texture perception. Hum Brain Mapp 32(7):1067–1080. https://doi.org/10.1002/hbm.21091

Smith JD, Johnston JJ, Musgrave RD, Zakrzewski AC, Boomer J, Church BA, Ashby FG (2014) Cross-modal information integration in category learning. Atten Percept Psychophys 76:1473–1484. https://doi.org/10.3758/s13414-014-0659-6

Stone JV (1999) Object recognition: View-specificity and motion-specificity. Vision Res 39(24):4032–4044. https://doi.org/10.1016/S0042-6989(99)00123-6

Sumser A, Isaías-Camacho EU, Mease RA, Groh A (2024) Differential representation of active and passive touch in mouse somatosensory thalamus. bioRxiv 2024.07.16.603697. https://doi.org/10.1101/2024.07.16.603697

Sun X, Yao L, Fu Q, Fu X (2023) Multisensory transfer effects in implicit and explicit category learning. Psychol Res 87(5):1353–1369. https://doi.org/10.1007/s00426-022-01754-z

Verrillo RT, Fraioli AJ, Smith RL (1969) Sensation magnitude of vibrotactile stimuli. Percept Psychophys 6(6):366–372. https://doi.org/10.3758/BF03212793

Wang Y, Zeng Y (2022) Multisensory concept learning framework based on spiking neural networks. Front Syst Neurosci 16:845177. https://doi.org/10.3389/fnsys.2022.845177

Westfall J (2016) PANGEA: Power ANalysis for GEneral ANOVA designs [Computer software]. Retrieved from https://jakewestfall.org/pangea/

Woods AT, Newell FN (2004) Visual, haptic and cross-modal recognition of objects and scenes. J Physiol Paris 98(1–3):147–159. https://doi.org/10.1016/j.jphysparis.2004.03.006

Wu J, Li Q, Fu Q, Rose M, Jing L (2021) Multisensory Information Facilitates the Categorization of Untrained Stimuli. Multisensory Res 35(1):79–107. https://doi.org/10.1163/22134808-bja10061

Yildirim I, Jacobs RA (2013) Transfer of object category knowledge across visual and haptic modalities: Experimental and computational studies. Cognition 126(2):135–148. https://doi.org/10.1016/j.cognition.2012.08.005

Zhao S, Brown CA, Holt LL, Dick F (2022) Robust and efficient online auditory psychophysics. Trends Hear 26. https://doi.org/10.1177/23312165221118792

Ziat M (2023) Haptics for human-computer interaction: From the skin to the brain. Found Trends Hum-Comput Interact 17(1–2):1–194. https://doi.org/10.1561/1100000061

Ziat M, Jhunjhunwala R, Clepper G, Kivelson PD, Tan HZ (2022) Walking on paintings: Assessment of passive haptic feedback to enhance the immersive experience. Front Virtual Real 3:997426. https://doi.org/10.3389/frvir.2022.997426

Zwoliński G, Kamińska D, Laska-Leśniewicz A, Adamek Ł (2022) Vibrating tilt platform enhancing immersive experience in vr. Electronics 11(3):462. https://doi.org/10.3390/electronics11030462

Note: The random-effects structure differs across experiments because of model convergence issues. Experiment 1(A) includes ‘task’ and ‘exemplar’, 1(B) includes ‘exemplar’, 2(A) includes ‘task’, 2(B) includes none, and Experiment 3 includes ‘exemplar’ as a random intercept. For more detailed information, refer to Supplemental Materials Table S2.