Martina A. Seveso1,3, Rebecca J. Hirst1, Alan O’Dowd1, Ivan Camponagara2 and Fiona N. Newell1
1School of Psychology and Institute of Neuroscience, Trinity College Dublin, Ireland.
2Department of Psychology, College of Natural and Health Sciences, Zayed University, Abu Dhabi, United Arab Emirates.
3Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland.
Author Note
Correspondence concerning this article should be addressed to Martina A. Seveso, Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland. Email: sevesom@tcd.ie. ORCID: https://orcid.org/0000-0001-6566-4578.
Statements and Declarations
Abstract
Categorisation is a fundamental cognitive process, involving the integration of information across the senses. Using smartphones for remote testing, we investigated whether visual and tactile motion cues could enhance object category learning and generalisation to novel object shapes. Two categories of similar shapes were associated with specific correlated visual and tactile vibration motion cues. After learning object categories, participants were assessed on categorisation of learned and novel objects across four cue conditions: shape-only, shape-visual motion, shape-tactile motion, and shape-visual and tactile motion. We also assessed whether accuracy was influenced by blocked versus interleaved cue conditions at test. In Experiment 1, we found more accurate categorisation and generalisation when all cues were available at test. In Experiment 2, we replicated this effect even when the reliability of the shape-only cue for predicting category membership was reduced. In Experiment 3, we found that the absence of motion cues during learning removed the benefit of motion cues at test. Overall, our findings suggest that multisensory motion cues benefit the formation of novel object categories and allow for better generalisation. The results have implications for our understanding of the underlying dynamic and multisensory nature of object categories and the predictive role of multisensory features in category formation.
Keywords: object categories, multisensory perception, tactile perception, object motion, online testing
Public Significance Statement
This study shows that combining visual and tactile (vibration) motion cues supports both learning and generalisation of object categories more efficiently than relying on shape alone. Additionally, the way we learn influences our ability to generalise categories to novel objects. These findings highlight the multisensory nature of how we form object categories and suggest applications for designing multisensory educational tools, haptic interfaces, and virtual environments.
Introduction
Imagine enjoying a peaceful wander through a forest when suddenly, to your horror, something lands on your back. After a while, you recognise that it is an insect and not a leaf falling from a tree. When trying to categorise the “something” as an insect and not a leaf, you might rely on multiple cues, such as the feel of the flutter of the insect’s wings or the scuttling of its legs, while it moves into view. Such a task leverages multisensory integration for object categorisation learned through experience. Studies have shown that object category learning is a complex process involving the integration of multiple features to differentiate between object categories as well as to generalise from individual exemplars (Goldstone & Hendrickson, 2010; Pérez-Gay et al., 2017). Shared object features, such as shape, sound, or movement, are used for category formation and for generalisation to novel objects. However, even though much is known about the role of unisensory visual (Rosch, 1978), auditory (Griffiths & Warren, 2004; Feng et al., 2021; Brunel et al., 2013) and tactile (Newell, 2004) features in object categorisation, less is known regarding how multisensory cues are used together for category learning and generalisation (Newell et al., 2023).
Previous studies have shown that both vision (Peissig & Tarr, 2007; Grill-Spector, 2003) and touch (Gibson, 1962; Lederman & Klatzky, 1987, 1990) are sufficient to enable the learning and subsequent recognition of individual objects (Newell et al., 2001; Lacey et al., 2009). Moreover, object information can be efficiently transferred between vision and touch (Yildirim & Jacobs, 2013). For familiar objects in particular, the resulting shared representations may lead to the formation of more robust object categories than unisensory information alone (Yildirim & Jacobs, 2013; Gaissert & Wallraven, 2012; Haag, 2011; O'Callaghan et al., 2018; Broadbent et al., 2020), highlighting the crucial role of multisensory integration in the representation of object categories in memory (Naci et al., 2012). Despite our knowledge that multisensory features contribute to categorisation, it is unclear how multiple sensory inputs affect the formation of novel object categories, and whether such multisensory categories allow for generalisation to novel exemplars. Indeed, previous studies investigating generalisation from learned multisensory representations have produced mixed results, with some reporting evidence for a benefit of multisensory learning on generalisation (Wu et al., 2021), while others report no specific benefit of multisensory information on categorisation performance (Edmunds et al., 2020; Roark et al., 2021; Sun et al., 2023; Atkin et al., 2023; Roark et al., 2023; Li & Deng, 2023; Roark, 2024; O’Dowd et al., 2025). Some of the discrepancies across these studies may be due to the amount of prior knowledge of the object stimuli, the extent to which sensory information is correlated across modalities during learning, or the relative distinctiveness or predictability of each sensory cue to category membership.
Objects in the real world are not only multisensory, but often dynamic. Furthermore, when information from two sensory modalities shares temporal properties, such as synchronous movement, the two sources are more likely to be combined (Parise & Ernst, 2016), which may serve to enhance any benefit of multisensory information for category learning. Within such a multisensory process, information gathered about the movement of an object through vision (Robert et al., 2023; Shatek et al., 2022) or touch (Gaissert & Wallraven, 2012; Simões-Franklin et al., 2011; Sumser et al., 2024) can play a fundamental role in category formation. Visual object movement can facilitate the recognition of novel objects (Stone, 1999; Newell et al., 2004; Setti & Newell, 2010) compared to static conditions, and the neural substrates underpinning tactile and visual object motion appear to be shared (Chan et al., 2010; Amemiya et al., 2017). Within the tactile modality, dynamic motion cues are often received in the form of vibrations, which can provide valuable information about specific object properties. In this regard, studies on vibrotactile discrimination suggest that touch can enhance discrimination (Mahns et al., 2006; Verrillo et al., 1969), particularly when visual temporal cues alone are unreliable or ambiguous (e.g., Pomper et al., 2014; Hirst et al., 2025). Tactile stimulation is quickly detected on the skin, and passive exposure to moving tactile information, such as vibrations or flutterings, can help discriminate objects (Fleming et al., 2013; Fleming, 2017; Ryan et al., 2021; Shao et al., 2016; Ziat, 2023). Because visual and tactile motion cues contribute to object recognition and discrimination, we can assume they also contribute to object categorisation, but this has not hitherto been investigated. Building on this, the present work investigates whether motion-related cues can facilitate not only category learning but also generalisation across novel exemplars.
Across three experiments, we aimed to investigate whether tactile and visual object motion influenced the categorisation of novel object shapes, and whether the formation of these multisensory categories subsequently benefited generalisation to new exemplars. In the first two experiments, participants learned to categorise novel object shapes presented with visual and tactile motion cues (in Experiment 3 only the object shape was presented during learning). We manipulated the cue informativeness of category membership, such that in Experiment 1 all cues were fully informative, while in Experiment 2 one cue (i.e., shape similarity) was less informative while both visual and tactile motion cues remained fully informative. Visual motion consisted of each object either rolling, jumping, swinging, or shaking, whereas tactile motion consisted of vibrations that were correlated with the temporal characteristics of each of these visual motion patterns. For example, if an object was seen swinging from left to right, the tactile vibrations would share the same onset and offset events present during the visual motion sequence. Following successful learning, participants were then tested on their categorisation accuracy for learned exemplars and generalisation to novel exemplars. The test phase assessed performance across four different cue conditions in which we manipulated the number and type of cues available. Importantly, in our experiments, fewer motion than shape features were used as predictive cues to category membership. That is, each category was defined by nine shapes and two distinct motion features (e.g., rolling, swinging). As such, the within-category variability of the shape cue was relatively higher than that of the within-category motion cues. This relative variability across shape and motion cues may result in different weightings of shape or motion cues for categorisation and generalisation performance, which we investigate here.
We conducted two different versions of our experiments: following learning, participants were tested on their categorisation of learned and novel objects in trials that were either blocked by cue condition or interleaved. A secondary aim of the current study was to understand the impact of blocked versus interleaved trial presentation on multisensory category learning. Previous studies have indicated that blocked versus interleaved presentation influences categorisation performance (Kornell & Bjork, 2008; Abel et al., 2021). Blocked trial presentation facilitates learning by reinforcing within-category featural similarities, whereas interleaved trial presentation promotes discriminative contrast across categories (Carvalho & Goldstone, 2015; Kost et al., 2015). Furthermore, it has also been suggested that interleaving trials improves long-term retention and category induction across sensory modalities (Ge et al., 2021; Abel, 2023). Thus, in multisensory contexts, interleaving of trials may further strengthen cross-modal cue combination, enhancing participants’ ability to differentiate between categories (Abel, 2023). On the other hand, interleaving trials may disrupt performance if the ability to adapt to varying cue types across trials is not sufficiently flexible (Carvalho & Goldstone, 2019), and may generate uncertainty about which cue to prioritise. Moreover, switching between different sensory modalities may increase cognitive load.
Accordingly, the presentation of blocked or interleaved trials allowed us to tease apart these cognitive effects on cue combination.
Experiment 1
Our first experiment was designed to explore whether the concurrent availability of visual and tactile motion cues enhances object category formation and categorisation performance. Building on prior research on the perception of moving objects (e.g., Setti & Newell, 2004; Chan et al., 2010), we hypothesised that the combination of visual and tactile motion cues with object shape would improve object categorisation and generalisation to novel exemplars relative to static shape alone. We therefore predicted higher categorisation accuracy, including for novel exemplars, in conditions in which all cues were available relative to conditions in which one or more sensory cues were missing. To allow us to explore our secondary factor (blocked versus interleaved trials), we tested this prediction using trials that were either blocked by cue condition (Experiment 1A) or interleaved (Experiment 1B).
Method
Participants
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. All participants were recruited using Prolific (https://www.prolific.co/), based on the following inclusion criteria: fluency in English, normal or corrected-to-normal vision and no hearing impairments. An a priori power analysis for each of the groups was performed in PANGEA (Westfall, 2016, v0.2) for a within-subjects design. The advised minimum sample size for 80% power to detect the effects of interest with an effect size (Cohen's d) of 0.3 was 32 participants per experimental design (blocked or interleaved).
We initially recruited 155 participants, of whom 110 (71%) successfully reached the learning criteria. Of these participants who learned the categories, a further 33 failed to perform above chance in the test (29% of learners) and their data were not included in any subsequent analyses. While this attrition rate may appear high, it is in line with findings from other studies on successful categorisation performance in laboratory settings (e.g., Smith et al., 2014; Roark & Chandrasekaran, 2023). In total, 77 participants (mean age = 38.83 years, SD = 10.97; 52% female) completed the experiment online, with 45 participants allocated to the blocked design (Experiment 1A) and 32 to the interleaved design (Experiment 1B). For all experiments, all participants were naïve to the purpose of the study and were compensated at a rate of £9.00 per hour.
The study was approved by the Trinity College Dublin School of Psychology Research Ethics Committee (approval number: SPREC102020-50) and complied with GDPR data protection legislation.
Stimuli
We created 28 novel 3D object shapes with uniform spacing between neighbouring objects (except for object shapes at each end of the shape space, see Fig. 1). The design of our shapes was based on a morphing process described elsewhere (Li et al., 2020), and the design of the shape space was adapted from a circular shape space described in previous studies (Li et al., 2020; O’Dowd et al., 2025). These object shapes were used as stimuli throughout all the experiments.
We used 3D modelling software (Blender Foundation, 3.5.0, 2023, www.blender.org) to render each 3D object stimulus used in the present experiments. The objects were created using the following pipeline: (1) each shape was converted from .png to Scalable Vector Graphics (.svg), keeping three different colour levels constant (e.g., white, black, grey); (2) each vector was imported into the 3D space; (3) the outline was isolated and converted into a mesh; (4) each mesh was rotated around the central vertical axis through the Spin function in Edit mode (360°, 100 steps); (5) each 3D shape was extracted from Blender. The 3D space, lighting (point, radius 0.1 m, 1000 W; coordinates: 11 m, -14 m, 6.9 m; rotation: 40°, 34.8°) and viewpoint (coordinates: 10.9 m, -14 m, 4 m; rotation: 70°, 0°) settings were kept constant. Each 3D object was rendered using the Workbench Engine (28 render samples, Single pass Anti-Aliasing viewpoint; Studio Lighting, Colour Material [dark grey, RGB: 107, 109, 109, 254; HEX: #6b6d6d] and Specular Lighting). All the object images were extracted with a resolution of 1080 x 1080 px, scale 100%, and presented in a canonical, 3/4 view so that the 3D object and relevant features (e.g., concavities) were visually accessible in the image. The visual presentation of each object was followed by a visual mask. These individual masks were created by scrambling the image of each object shape using a MATLAB script (MathWorks, 2023). All object stimuli and scripts for generating the visual masks are available on the Open Science Framework page for Experiment 1, https://osf.io/s369c/?view_only=9015e8935be24628b5b15596b8eb6271 .
We allocated an arbitrary category boundary to the middle of the stimulus shape space (see Fig. 1), such that 14 objects were allocated per category. Within each category, shapes were highly similar, and shape similarity was the main cue for categorisation. Of the 14 objects per category, 9 objects were selected at the extreme point of the category for the learning session (green highlighted object shapes in Fig. 1) and 5 objects per category, closer to the category boundary, were used for generalization testing (yellow highlighted object shapes in Fig. 1). The use of object stimuli nearest to the category boundary for generalisation was to avoid using clearly distinguishable shapes, which would have reduced the difficulty of the task and possibly obscured any generalization effects.
Each rendered 3D object was animated based on one of four different visual motion patterns (swing, jump, roll or shake) using the Workbench Engine. The motion patterns were chosen based on type (e.g., smooth or abrupt movement) and reference (e.g., movement of the object’s horizontal or vertical axis). All movement sequences had a duration of 2 seconds. For the animation, each object stimulus was presented against a background consisting of two grey walls and a grey floor. The camera angle (coordinates: 10.9 m, -14 m, 4 m; rotation: 70°, 0°) and lighting (point, radius 0.1 m, 1000 W; coordinates: 11 m, -14 m, 6.9 m; rotation: 40°, 34.8°) were both held constant. The light source illuminated the scene from above, creating an object shadow on the floor to aid depth and motion perception. The individual moving objects were then extracted and used as stimuli in the experiment, displayed against a black background on a mobile phone screen. The entire screen was used to display the objects and response options (see Fig. 2 for an illustration).
Tactile stimuli were delivered via the navigator.vibrate() function, a method of delivering tactile stimuli remotely via Android smartphone browsers. This method is described, tested and validated elsewhere (Hirst et al., 2025). In short, it allows the presentation of a vibration pulse sequence by specifying binary on/off commands; it does not allow for manipulation of vibration amplitude. Our tactile stimuli therefore consisted of a vibration pulse sequence derived from the visual motion pattern of each stimulus. This was achieved by extracting the temporal dynamics of visual motion for each movement (i.e., frequency and amplitude of movement) and mapping this onto an audio waveform with corresponding frequency and amplitude. The audio waveform was analysed using a 100 ms rolling average to smooth fluctuations in amplitude, and a threshold was applied at 40% of the maximum amplitude to detect markers used to define pulse onset (“on” or vibration) and offset (“off” or pause) events. The resulting vibration patterns therefore closely mirrored the structure and timing of the visual motion stimuli with high temporal precision. Custom Python code (see OSF project page) ensured that the output format retained the timing and structure of the original input and was compatible with PsychoPy’s tactile delivery system. The validity of our tactile motion stimuli was supported by the results of pilot tests requiring participants to match vibration pulse sequences with their visual counterparts, as well as an assessment of synchrony judgements based on tactile-only and bimodal (i.e., visual motion paired with tactile vibration) stimuli (see Supplementary Materials, Table S1 for results).
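The smoothing and thresholding step described above can be illustrated with a minimal Python sketch. This is not the authors' code (which is available on the OSF project page); the function name, the use of NumPy, the centred moving average, and the choice to take the threshold relative to the maximum of the smoothed trace are all assumptions made for illustration.

```python
import numpy as np

def amplitude_to_pulses(samples, sample_rate_hz, window_ms=100, threshold_ratio=0.4):
    """Turn an amplitude trace into a list of ("on"/"off", duration_ms) segments.

    Hypothetical helper: smooths the rectified trace with a rolling average,
    thresholds it at a fraction of its maximum, and measures each run.
    """
    amplitude = np.abs(np.asarray(samples, dtype=float))
    total_ms = len(amplitude) * 1000.0 / sample_rate_hz

    # Rolling average over `window_ms` (centred window; an assumption).
    window = max(1, int(round(sample_rate_hz * window_ms / 1000.0)))
    smoothed = np.convolve(amplitude, np.ones(window) / window, mode="same")

    peak = smoothed.max()
    if peak == 0:
        return [("off", total_ms)]  # silent trace: no vibration at all

    # Samples at or above 40% of the peak count as vibration ("on").
    is_on = smoothed >= threshold_ratio * peak

    # Indices where the on/off state changes mark pulse onsets and offsets.
    change_points = np.flatnonzero(np.diff(is_on.astype(int))) + 1
    boundaries = np.concatenate(([0], change_points, [len(is_on)]))

    segments = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        state = "on" if is_on[start] else "off"
        segments.append((state, (end - start) * 1000.0 / sample_rate_hz))
    return segments
```

Because the rolling average ramps up and down at each edge, a threshold below 50% of the peak slightly widens each detected pulse relative to the underlying event, so the window length and threshold trade off smoothness against timing precision.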
Design
The experiment was structured around two main sessions: a learning block (with feedback) followed by a test block (without feedback). We conducted two versions of the experiment in which trials were either blocked (i.e., Experiment 1A) or interleaved (i.e., Experiment 1B) at test, with different participants taking part in each experiment. Both versions were based on the same within-subjects, fully factorial design with cue condition (4) and exemplar type (2) as factors. The four levels of the cue condition factor were: shape only (Sv); shape with visual motion (SvMv); shape with tactile motion (SvMt); and shape with both visual and tactile motion (SvMvt). The exemplar factor had two levels: learned or novel. Our primary outcome measure of interest in the test phase was categorisation accuracy.
Procedure
The experiment was built using PsychoPy (Peirce et al., 2019; 2022, v2024.2.3) and delivered online through Pavlovia (https://pavlovia.org/) on Android mobile devices. To mask the sound of the vibration stimuli, continuous Brown noise was presented throughout the study at a volume level determined by an initial method of adjustment procedure.
In this procedure, participants were presented with a repetitive vibration stimulus of 200 ms duration with a 500 ms ISI, alongside continuous Brown noise. They were asked to adjust their phone volume until they could no longer hear the sound of the vibration. Following this, the participants were asked not to adjust the volume on their phone for the duration of the entire experiment.
Following the method of adjustment procedure, participants began the learning phase (Fig. 2A). Each trial in the learning phase started with a 250 ms fixation cross, followed by a 2-second presentation of the object stimulus, in which three cues were combined as a visually moving object presented in synchrony with tactile vibrations. The stimulus was followed by a 250 ms visual mask.
The participant was then presented with a screen indicating the response options (i.e., “Category A” or “Category B”). A response triggered the offset of the response screen; if a response was not made within 4 seconds, the task progressed automatically. Feedback on both accuracy and response time was presented for 750 ms at the end of each learning trial (e.g., green “Correct” or red “Incorrect”, and “Too slow! Please respond faster”, respectively). During the learning session, all objects were displayed with visual and (correlated) tactile motion (SvMvt); therefore, category membership was defined by all three cues equally. Each shape was associated with one of 4 visual motion patterns (swing, jump, roll, or shake), which were randomly assigned to the shapes across participants (but the shape-motion associations remained constant for each participant). The groups of shapes assigned to each category were counterbalanced across participants. An accuracy threshold of 75% was required at the end of the learning session to continue to the categorisation test. If participants failed to reach this threshold within 3 repetitions of the trials, the study was terminated. There was a maximum of 54 trials (18 object shapes, with no more than 3 repetitions) during learning. Trial order was fully randomised across participants.
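The learning criterion described above can be sketched as a short loop. This is an illustrative outline only: the function names are hypothetical, and it assumes that accuracy is evaluated separately at the end of each repetition of the 18 learning shapes, which the text does not state explicitly.

```python
def run_learning_phase(run_repetition, n_shapes=18, max_repetitions=3, criterion=0.75):
    """Repeat the learning trials until the accuracy criterion is met.

    `run_repetition` is a hypothetical callable that presents `n_shapes`
    trials and returns the number answered correctly.
    Returns (reached_criterion, repetitions_used).
    """
    for repetition in range(1, max_repetitions + 1):
        n_correct = run_repetition(n_shapes)
        accuracy = n_correct / n_shapes  # assumption: scored per repetition
        if accuracy >= criterion:
            return True, repetition      # proceed to the categorisation test
    return False, max_repetitions        # criterion not met: study terminated
```

Under this reading, a participant needs at least 14 of 18 correct responses (about 78%) in a single repetition to proceed, and completes at most 54 learning trials.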
In the test phase, 28 stimuli (14 per category) were presented, including the 9 previously learned and 5 novel exemplars per category. Each test trial began with the presentation of a 250 ms fixation cross, followed by an image of a stimulus (depending on cue condition) alongside response options. Participants were given a maximum of 4 seconds to make a response before the task progressed automatically. A visual mask appeared for 250 ms after the cue to indicate the end of each trial. No feedback was provided at test.
Each stimulus was presented under one of the four cue conditions (Sv, SvMv, SvMt, SvMvt) and participants were instructed to categorise each stimulus as either 'A' or 'B' as accurately and quickly as possible. In Experiment 1A, cue condition was blocked at test, such that one block included trials in which stimuli were displayed from one cue condition only (e.g., Sv only). The blocks were presented in a random order across participants. In Experiment 1B, all trials were interleaved and presented in a random order across participants. For both Experiments 1A and 1B, no feedback was provided during the categorisation test. Each participant took approximately 18 minutes to complete the experiment.
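The difference between the blocked and interleaved test orders can be made concrete with a small Python sketch. It is illustrative rather than a description of the actual PsychoPy implementation: the function name is hypothetical, and it assumes that every stimulus appears once in each cue condition and that trial order within a block is random.

```python
import random

CUE_CONDITIONS = ["Sv", "SvMv", "SvMt", "SvMvt"]

def build_test_trials(stimulus_ids, design, rng):
    """Return a list of (cue_condition, stimulus_id) test trials.

    design="blocked": one block per cue condition, blocks in random order.
    design="interleaved": all cue conditions mixed in one random sequence.
    """
    # Assumption: each stimulus is shown once under each cue condition.
    trials = [(cue, stim) for cue in CUE_CONDITIONS for stim in stimulus_ids]

    if design == "interleaved":
        rng.shuffle(trials)
        return trials

    if design == "blocked":
        block_order = list(CUE_CONDITIONS)
        rng.shuffle(block_order)              # random block order per participant
        ordered = []
        for cue in block_order:
            block = [t for t in trials if t[0] == cue]
            rng.shuffle(block)                # random order within a block
            ordered.extend(block)
        return ordered

    raise ValueError(f"unknown design: {design!r}")
```

For example, `build_test_trials(range(28), "blocked", random.Random(1))` yields a sequence in which the cue condition changes only three times, whereas the interleaved version can change cue condition on any trial.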
Analysis
The data were analysed with R via RStudio (R Core Team, 2021). To assess the effect of cue condition and exemplar learning on categorisation performance, we fitted a generalised linear mixed-effects model (GLMM) to the raw participants’ responses (0, 1) with a binomial distribution and logit link function. The model included cue condition (Sv, SvMv, SvMt, SvMvt), category exemplar (learned or novel), and their interaction as the predictors of interest. To establish statistical significance, likelihood ratio tests (type II) were performed to compare the fit of the model with and without each predictor of interest; model fit was evaluated via AIC, BIC, and log-likelihood values. 'Participant' was included as a random intercept, and a random slope for 'exemplar' was also included to account for within-subject variability in the effect of exemplar on categorisation performance. Models were fitted using the ‘lme4’ package (Bates et al., 2014). Post-hoc comparisons were conducted using the ‘emmeans’ package (Lenth et al., 2022) and the Bonferroni correction was applied to correct for multiple comparisons (in which case the corrected p-values are reported). Fixed effects from the final model were converted from log-odds to predicted probabilities using the inverse logit transformation to facilitate interpretation (Muller & MacLehose, 2014).
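As a brief illustration of the inverse logit transformation and of how odds ratios relate to probabilities, the following Python sketch may help. The function names are hypothetical, and the worked values are for illustration only; the odds ratios reported in the Results come from the fitted model, not from this calculation.

```python
import math

def inverse_logit(log_odds):
    """Convert a log-odds value to a probability in a numerically stable way."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    exp_x = math.exp(log_odds)
    return exp_x / (1.0 + exp_x)

def odds_ratio(p_a, p_b):
    """Odds of success under condition A divided by the odds under condition B."""
    return (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))
```

A log-odds of 0 corresponds to a probability of .5 (chance in a two-category task). As a rough guide to scale, accuracies of 75% versus 82% correspond to an odds ratio of about 0.66, so an odds ratio below 1 indicates lower accuracy in the first condition of the comparison.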
Results
The mean categorisation performance across each of the cue conditions and exemplar types is presented in Fig. 3 for the blocked (A) and interleaved (B) versions, respectively. In Experiment 1A, in which trials were blocked at test, likelihood ratio tests indicated statistically significant main effects of cue condition (χ²(3) = 26.50, p < .001) and exemplar type (χ²(1) = 11.80, p < .001) on categorisation accuracy. However, the cue condition × exemplar type interaction did not significantly contribute to the model (χ²(3) = 6.40, p = .094). Post-hoc comparisons for the main effect of cue condition confirmed that categorisation accuracy was significantly higher for the SvMv (odds ratio, OR = .395, 95% CI [0.224, 0.698], p < .001), SvMt (OR = .565, 95% CI [0.376, 0.848], p = .001), and SvMvt (OR = .209, 95% CI [0.101, 0.430], p < .001) conditions compared to the Sv condition. Moreover, categorisation accuracy was significantly lower in both the SvMv (OR = .528, 95% CI [0.330, 0.845], p = .002) and SvMt (OR = .370, 95% CI [0.208, 0.657], p < .001) conditions compared to the SvMvt condition (see Fig. 3A). No significant differences were observed between the SvMv and SvMt conditions (OR = 1.429, 95% CI [0.871, 2.344], p = .343). Post-hoc comparisons on the main effect of exemplar type suggested that participants’ categorisation performance was less accurate for the novel (75%) compared to the learned (82%) object exemplars (OR = .67, 95% CI [0.538, 0.834], p < .001; see Fig. 3A).
Likelihood ratio tests on performance in the interleaved version (Experiment 1B) suggested similar effects. We found statistically significant main effects of cue condition (χ²(3) = 12.20, p = .007) and exemplar type (χ²(1) = 8.08, p = .005) in the model predicting categorisation accuracy. Again, the cue condition × exemplar type interaction did not significantly contribute to this model (χ²(3) = 1.47, p = .69). Post-hoc comparisons for the cue condition effect revealed that accuracy was significantly lower in the Sv condition compared to the SvMv (OR = .806, 95% CI [0.654, 0.993], p = .039) and SvMvt conditions (OR = .772, 95% CI [0.627, 0.950], p = .006), as shown in Fig. 3B. However, we failed to find a significant difference between the Sv and the SvMt condition (OR = 0.845, 95% CI [0.686, 1.040], p = .194). Additionally, no significant differences were found between the two-cue conditions (SvMv vs SvMt: OR = 1.05, p = 1), and there was no benefit found for the SvMvt condition relative to either the SvMv (OR = .958, p = 1) or SvMt conditions (OR = .914, p = 1). Regarding the exemplar type, categorisation accuracy was significantly lower for novel (65%) compared to learned (76%) exemplars (OR = .597, 95% CI [0.422, 0.844], p = .004; see Fig. 3B). Finally, one-sample t-tests showed that accuracy in the Sv condition was significantly above chance for both Experiment 1A (M = .61, SD = .23, t(44) = 3.21, p = .002) and Experiment 1B (M = .63, SD = .18, t(30) = 4.09, p < .001).
Discussion
Experiments 1A and 1B indicated a benefit of having all learned cues (object shape, visual motion, and tactile vibrations) available at test, relative to when only the shape of the object exemplars was available. Categorisation performance was least accurate, although above chance, in the shape-only cue condition, suggesting that shape similarity alone did not result in robust categorisation. Performance for novel exemplars was further affected, although our data suggested that generalisation occurred even in the shape-only cue condition. Interestingly, while categorisation performance was affected by the number of available cues, the data suggest no specific benefit for any one sensory modality, at least when trials were blocked. In other words, performance across both two-cue conditions (SvMv and SvMt) was equally accurate, suggesting that visual and tactile motion contributed to category formation in a similar way. However, this result was influenced by the randomisation procedure: in the interleaved trials, tactile motion alone (SvMt) did not improve performance relative to the object shape-only cue (Sv), unlike in blocked conditions, where both tactile (SvMt) and visual (SvMv) motion independently improved performance relative to static conditions.
In contrast, we found a stronger advantage for cue combination in the blocked (Experiment 1A) trial presentation. Under blocked presentation, there was a greater distinction in performance across the different cue conditions. This finding suggests that task consistency may support more effective use of the cues, whereas interleaving trials may add uncertainty, reducing the observed benefit of information from combined cues on categorisation.
As expected, accuracy was consistently better for learned relative to novel exemplars, and there was no influence of trial presentation on the categorisation benefit for learned exemplars (7% and 11% difference between learned and novel exemplars in the blocked and interleaved versions, respectively), suggesting little effect of task demands. As noted earlier, exemplars that are positioned closer to the category boundary are typically more difficult to categorise than those that are positioned further from the boundary (Newell & Bülthoff, 2002). Therefore, the difference in performance between learned and novel exemplars may be influenced by stimulus position within the category shape space as well as exemplar novelty. Future research could help tease apart the relative contributions of exemplar position in shape space and frequency of exposure to categorisation. Interestingly, the effect of cues did not differ between learned and novel exemplars in either version of the experiment, suggesting that the benefit afforded by multisensory motion cues is no higher for novel compared to learned exemplars.
The number of participant data sets included in the analyses was lower than the initial number of participants recruited. This was necessary for a number of reasons. First, we required that the participants first reach a learning rate of 75%, which is a reasonable rate given the number of factors involved in the experiment and is consistent with other studies on categorisation. Second, we also required that participants' performance was greater than chance level across the task. A number of participants failed to meet this requirement although they had successfully learned to categorise the objects. Although our overall attrition rate from initial recruitment appears high, it is not inconsistent with other studies of rates of successful categorisation performance in the lab (Smith et al., 2014; Roark & Chandrasekaran, 2023) or indeed with reports of dropouts from online testing in behavioural studies (e.g., Peer et al., 2022; although notably, data collected via Prolific are often considered to be of high quality).
Together, these findings suggest that a combination of cues from different sensory modalities can facilitate object categorisation and generalisation relative to object shape alone. Given that the visual and tactile motion cues to category membership were few in number and consistently reliable, our result that tactile motion did not benefit categorisation relative to shape-only cues in the interleaved trials might be considered surprising. Because of their consistency, both the visual motion and tactile motion cues might be expected to dominate categorisation performance relative to the shape cue alone. Since categorisation performance with the shape cue (Sv) alone was better than chance, we can infer that shape was informative for categorisation. Indeed, the presence of each motion cue during test appears to have influenced categorisation performance in an additive manner, at least when trials were blocked. Given this, in Experiment 2, we aimed to further explore the contribution of the shape cue by manipulating its informativeness relative to other cues on categorisation.
Experiment 2
Perceptual similarity may support the formation of categories, such that similar shapes are grouped together under a single category label (e.g., Nosofsky, 1986; Goldstone, 1994), even if encoded through touch (Cooke et al., 2007). However, in the real world, information about an object's shape can be affected by environmental factors such as distance, occlusion, lighting or viewpoint (e.g., Newell, 1998), thus reducing the reliability of shape similarity for determining categorisation. Consequently, other cues to categorisation may become salient, such as the way an object moves (Newell et al., 2004; Blake & Shiffrar, 2007). To investigate this, we reduced the informativeness of object-shape similarity as a cue to category membership. Importantly, we maintained both the visual and tactile motion cues as fully predictive of category membership, as in Experiment 1. Consequently, we hypothesised that participants' categorisation performance, for both learned and novel exemplars, should be mainly influenced by motion cues. As in Experiment 1, we tested this in both blocked (Experiment 2A) and interleaved (Experiment 2B) test conditions.
Method
Participants
An a priori power analysis was performed (PANGEA, v0.2, for a within-subjects design), and the advised minimum sample size for 80% power to detect the effects of interest with an effect size (Cohen's d) of 0.3 was 32 participants for each version of the experiment. All participants were recruited using Prolific (https://www.prolific.co/), and inclusion criteria were fluency in English, normal or corrected-to-normal vision and no hearing impairments. We initially recruited 85 participants, of whom 76 (i.e. 89%) successfully learned the categories. Of the learners, 10 participants failed to score above chance in the categorisation test and their data were not included in subsequent analyses. Following these exclusions, a total of 66 participants (mean age = 39.72 years, SD = 10.1; 48.5% female) completed Experiment 2, with 33 assigned to the blocked version (Experiment 2A) and 33 to the interleaved version (Experiment 2B).
Stimuli
The same stimulus set and motion types described in Experiment 1 were used. However, we used a different arrangement of the object shapes across categories to reduce object shape similarity as a cue to category membership. To that end, we reassigned two out of the nine exemplars in each category to the opposite category. For example, referring to Fig. 1, object shapes highlighted in red, which in Experiment 1 were members of the same object category (e.g., Category A), were now assigned to the opposite category (e.g., Category B) during learning (and likewise for object shapes highlighted in green). Thus, only 78% of the neighbouring shapes (seven of nine) were assigned to one category (see Fig. 1) in the current experiment. The subset of exemplars that were novel but included in the test was the same as in Experiment 1 (Fig. 1).
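The 78% figure follows from keeping seven of the nine shapes in each shape-based group within their original category. A minimal sketch of this reassignment is given below; the exemplar indices chosen for reassignment are hypothetical placeholders (the actual shapes are those highlighted in Fig. 1), and the variable and function names are introduced here for illustration only.

```python
# Shape-based grouping of the 18 exemplars (9 per category), as in Experiment 1.
shape_category = ["A"] * 9 + ["B"] * 9

# Hypothetical indices of the reassigned exemplars: two from each shape group.
# The real reassigned shapes are the ones highlighted in Fig. 1.
swapped = {3, 7, 12, 16}


def flip(label: str) -> str:
    """Return the opposite category label."""
    return "B" if label == "A" else "A"


# Category labels actually used during learning in Experiment 2.
assigned = [flip(c) if i in swapped else c for i, c in enumerate(shape_category)]


def consistency(category: str) -> float:
    """Proportion of a shape group's exemplars that kept their shape-based label."""
    idx = [i for i, c in enumerate(shape_category) if c == category]
    return sum(assigned[i] == category for i in idx) / len(idx)


print(round(100 * consistency("A")))  # 7 of 9 exemplars, i.e. 78 (%)
```

Each category still contains nine exemplars after the swap, so category size is unchanged and only the shape-based predictability of membership is reduced.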
Design and Procedure
Experiment 2 followed the same design and procedure as described in Experiment 1, with the same four cue conditions (Sv, SvMv, SvMt, SvMvt) and two exemplar levels (learned and novel). As in Experiment 1, trials were either blocked by cue (Experiment 2A) or interleaved (Experiment 2B), and participants were pseudo-randomly assigned to each version of the experiment.
Results
The main analysis procedure was identical to that of Experiment 1.
Categorisation performance across the four main cue conditions and each learned or novel exemplar condition is presented in Fig. 4. First, categorisation performance on the blocked trials (Experiment 2A) was analysed using a likelihood ratio test that compared the full model, which included the interaction between cue condition and exemplar type, to a reduced model without the interaction term. The comparison indicated that the full model significantly improved the fit to the data (χ²(3) = 9.09, p = .028), suggesting that the interaction between cue condition and exemplar type contributed to categorisation accuracy (Fig. 4A). There were significant main effects of both cue condition and exemplar type on categorisation accuracy. Post-hoc comparisons revealed that categorisation accuracy was significantly lower in the object-shape-only (Sv) condition compared to all the combined cue conditions. Specifically, accuracy in the Sv condition was significantly lower than in the SvMv condition (OR = .570, 95% CI [0.306, 1.064], p = .018), the SvMt condition (OR = 1.589, 95% CI [0.987, 2.557], p = .013), and the SvMvt condition (OR = 4.832, 95% CI [1.911, 12.222], p < .001). As in Experiment 1, no significant differences were found between the SvMv and SvMt conditions (p = 1), nor between each of these two-cue conditions and the three-cue condition (SvMv vs SvMvt, p = 1; SvMt vs SvMvt, p = 1). There was a main effect of exemplar type, with better performance for learned (76%) than novel (65%) exemplars (OR = .597, 95% CI [0.422, 0.844], p = .004). The significant interaction between cue condition and exemplar type was driven mainly by performance between each of the two-cue conditions and the Sv condition (see Fig. 4A): performance was better in the SvMv than the Sv condition for the learned (OR = .570, p = .018), but not novel (OR = .770, p = .282), exemplars and, similarly, performance was better in the SvMt than the Sv condition for learned (OR = 1.589, p = .013) but not novel (OR = 1.218, p = .282) exemplars. Performance was better to the SvMvt condition than the Sv condition for the novel version only (OR = 4.658, p < .001).
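As an aid to reading the odds ratios above, the following sketch shows how an odds ratio relates to two accuracy proportions. It uses the descriptive accuracies reported for learned (76%) and novel (65%) exemplars and assumes the ratio is expressed as novel relative to learned. It is an illustration of the metric only, not a reconstruction of the model-based estimate, and the function names are introduced here for convenience.

```python
def odds(p: float) -> float:
    """Convert a proportion correct into odds of a correct response."""
    return p / (1 - p)


def odds_ratio(p_numerator: float, p_denominator: float) -> float:
    """Odds of a correct response in one condition relative to another."""
    return odds(p_numerator) / odds(p_denominator)


learned, novel = 0.76, 0.65  # descriptive accuracies reported in the text
raw_or = odds_ratio(novel, learned)
print(round(raw_or, 3))  # 0.586
```

The raw ratio of approximately .59 is close to, but not identical to, the reported estimate (OR = .597), which is expected because the reported value comes from the fitted model rather than from the descriptive proportions.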
We then conducted likelihood ratio tests on performance when trials were interleaved (Experiment 2B), which indicated a main effect of cue condition (χ²(3) = 276.72, p < .001) but not of exemplar type (χ²(1) = .953, p = .329) in the model predicting categorisation accuracy, as shown in Fig. 4B. Furthermore, the cue condition by exemplar type interaction did not significantly contribute to this model (χ²(3) = .73, p = .866). Post-hoc comparisons indicated that categorisation accuracy was significantly lower in the Sv condition compared to the SvMv condition (OR = .617, p < .001), the SvMt condition (OR = .455, p < .001), as well as the SvMvt condition (OR = .299, p < .001). Moreover, performance was better in the SvMvt condition compared to the SvMv (OR = .484, p < .001) and SvMt (OR = .657, p < .001) conditions. In contrast to the results from the blocked trials, performance in the SvMv condition was lower than in the SvMt condition (OR = .736, p < .001) when trials were interleaved.
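The likelihood ratio tests reported for Experiment 2 refer a statistic of twice the log-likelihood difference between nested models to a χ² distribution, with degrees of freedom equal to the number of additional parameters. The sketch below shows how the reported p-values follow from the reported statistics. It uses closed-form upper-tail expressions that hold only for 1 and 3 degrees of freedom; the function names and the log-likelihood arguments are hypothetical, and the software used for the original analyses is not assumed.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate (closed forms for df = 1 or 3)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return erfc(sqrt(x / 2))
    if df == 3:
        return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    raise ValueError("only df = 1 and df = 3 are implemented in this sketch")


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return the LRT statistic and its p-value for two nested models."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Statistics reported in the text for Experiments 2A and 2B
print(round(chi2_sf(9.09, 3), 3))   # interaction, blocked trials
print(round(chi2_sf(0.953, 1), 3))  # exemplar type, interleaved trials
print(round(chi2_sf(0.73, 3), 3))   # interaction, interleaved trials
```

These calls return values that agree with the reported p = .028, p = .329 and p = .866, respectively, to within rounding.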
Finally, a one-sample t-test showed that accuracy in the Sv condition was significantly above chance for Experiment 2A (M = .61, SD = .17, t(32) = 4.27, p < .001), but not for Experiment 2B (M = .50, SD = .06, t(31) = -0.29, p = .77).
Discussion
In Experiment2, we reduced the informativeness of shape similarity as a cue to category membership to investigate whether participants would rely more on motion cues. The results suggest that when only the shape cue was available at test, performance was relatively poor. This was especially the case when trials were interleaved, i.e., when cue availability was less predictable from one trial to the next, relative to the version in which trials were blocked.
Categorisation performance was best when all cues were available, across both experiment versions (blocked or interleaved). That is, the combination of shape with both motion cues led to significantly better accuracy than object shape alone, for both learned and novel exemplars. However, in the blocked condition, this advantage did not extend to the two-cue combinations (i.e. SvMv or SvMt vs Sv independently) for novel exemplars, which did not significantly outperform the shape-only condition. The significant interaction between cue condition and exemplar type suggests that while all cue combinations supported categorisation of the learned exemplars, successful generalisation to novel exemplars relied more on the availability of both motion cues together. This suggests that when the reliability of shape as a cue to category membership is reduced, generalisation requires the full cue-combination context at test.
Interestingly, when trials were interleaved, tactile motion provided a greater benefit than visual motion on both categorisation and generalisation performance. Again, this finding suggests a greater weighting of reliable cues, such as motion over shape. The relative benefit of tactile over visual motion may be due to the easier combination of visual motion with, and therefore segregation of tactile motion from, the object shape (see Chen & Spence, 2017).
In Experiments 1 and 2, the shape cue was always presented with motion cues during learning; therefore, it was unclear to what extent shape itself influenced learning and consequent categorisation and generalisation performance at test. Indeed, performance was consistently worse in the shape-only cue condition at test, and we attribute this to the absence of the other learned cues. In the following control experiment, we investigated the specific contribution of visual object shape as a cue to category learning.
Experiment 3
In Experiment 3, participants learned categories defined solely by object shape (static images) and at test were assessed across the same four cue conditions used previously (i.e., Sv, SvMv, SvMt, SvMvt). The use of motion cues at test further allowed us to assess the added value of multisensory motion cues beyond shape information on categorisation and generalisation.
Methods
Participants
We initially recruited 45 participants online using Prolific, 8 of whom failed to reach the learning criterion and a further 7 of whom failed to perform above chance at test; their data were not included in the analyses. Therefore, a total of 30 participants (mean age = 37.44 years, SD = 10.10; 48.5% female) successfully completed Experiment 3; the reimbursement rate and inclusion criteria were identical to those of the previous experiments.
Stimuli
All the stimuli, and the arrangement of shapes into the two categories, were identical to those of Experiment 1 (see Fig. 1).
Design and Procedure
The experiment was based on the same design and procedure as described in Experiment 1, with the exception that the category learning task was performed on static object-shape stimuli. Also, we blocked trials by cue condition during the test session (as in Experiment 1A). Both learned and novel exemplars were presented at test in a random order across participants. Overall, the experiment took approximately 16 minutes for each participant to complete.
Results and Discussion
Likelihood ratio tests indicated a statistically significant main effect of exemplar type (χ²(1) = 34.95, p < .001) but not of cue condition (χ²(3) = 5.25, p = .15) in the model predicting categorisation accuracy. The cue condition by exemplar type interaction did not significantly contribute to this model (χ²(3) = 2.748, p = .43). Overall, irrespective of condition, participants' categorisation performance was less accurate for the novel (76%) compared to the learned (86%) exemplars (OR = -7.38, 95% CI [0.08, 0.24], p < .001).
The results suggest that object shape was sufficient for category formation. However, we found that generalisation to novel exemplars was poor, with a drop of approximately 20% accuracy compared to the categorisation of learned exemplars (Fig. 5). Overall, these findings underline the important role of multiple cues during category formation and suggest that the benefit of visual and tactile object motion cues observed in the test session in the previous experiments is contingent on their prior integration into the category structure.
General Discussion
Across three categorisation experiments, delivered via mobile phones, we examined whether previously learned multisensory categories – defined by shape, visual motion, and tactile vibration – could support object categorisation and generalisation. The motion cues were correlated across modalities and predictive of category membership (Experiments 1 and 2). Although shape similarity underpinned category membership in Experiment 1, its reliability was reduced in Experiment 2. Overall, the results of both experiments suggested that visual and tactile motion facilitated both the categorisation and generalisation of exemplars relative to shape-only information. The results of Experiment 3 confirmed that visual shape alone could support category learning in our tasks, although generalisation was poor. In addition, the task format, i.e. whether trials were blocked by cue or interleaved, significantly modulated the observed multisensory benefits: whereas blocked trials (Experiments 1A and 2A) consistently led to higher accuracy and better generalisation, an interleaved presentation (Experiments 1B and 2B) resulted in smaller performance differences between cue conditions and a less robust contribution from tactile cues. Our findings elucidate the featural and context-sensitive nature of multisensory categorisation and demonstrate that the effectiveness of cross-sensory cues for categorisation is dependent on cue reliability as well as task structure.
Our results align with previous evidence on the role of cue reliability in perception (Oruç et al., 2003; Helbig et al., 2012; Bankieris et al., 2017) whilst extending this to the realm of category learning. For example, when the inter-object similarity of shapes was reduced as a cue to categorisation (Experiment 2), and motion cues remained fully informative, participants' categorisation performance was influenced more by the tactile and visual motion cues, particularly under blocked conditions. Our results further support the idea that tactile cues are not merely supplementary but can be integral to the formation of category representations (Yildirim & Jacobs, 2013; Gaissert & Wallraven, 2012), especially when they are temporally aligned and task-relevant (Nordmark, 1978; Heller, 1989). Indeed, tactile motion cues played an important role in enhancing categorisation performance in all experiments, suggesting that individuals can strategically adapt their category learning performance to prioritise diagnostic information. Even under interleaved trial presentation (Experiment 2B), performance in the combined cue condition remained high, suggesting that redundant motion cues can compensate for uncertainty from the shape cue.
The different effect of motion cues under blocked or interleaved trials suggests that task demands can influence cue combination, including across modalities (Carvalho&Goldstone,2019;Maddox et al.,2006). Although some evidence suggests that interleaved presentation enhances generalisation due to discriminative contrast (Brunmair & Richter,2019;Kornell & Bjork,2008;Carvalho & Goldstone,2014;2015), including across modalities (Abel,2023), we found that categorisation performance was weaker for novel than for learned exemplars. Although the reason for this inconsistency is unclear, we suggest that interleaved presentation may either trigger a greater reliance on individual cues or modalities or may impair the retrieval of specific cue-category associations. Under blocked conditions, the motion cues integrated into the object representations may act as contextual supports in memory for categorising the object shape (Duarte et al.,2023;2025).
In Experiment 3, the substantial (~ 20%) drop in performance to novel exemplars relative to the performance in previous experiments supports the idea that multisensory encoding during learning, rather than test-phase cue availability, is critical for supporting generalisation. This is consistent with evidence that learned multisensory representations subsequently support better generalisation across modalities (Yildirim & Jacobs,2013). Interestingly, there was no benefit for motion cues at test suggesting that unfamiliar—but potentially informative—cues are not automatically used without prior association.
Our ability to categorise objects in the real world is likely to be dependent on a complex interaction between the nature and type of sensory information available, prior knowledge, and other context-dependent factors related to the task. For example, only tactile motion seems to have had a differential influence on categorisation and generalisation performance across task structure (interleaved or blocked trials). Moreover, methodological factors may help explain some of the inconsistencies in the reported findings in the literature (Wang & Zeng, 2022; Roark & Holt, 2022; Roark et al., 2021; O'Dowd et al., 2025). Consistent with this complexity, our findings suggest that no single mechanism fully explains our data. Future studies are needed to investigate how cue combination benefits learning and subsequent performance. In particular, it would be interesting to know if multisensory cues lead to enhanced learning performance (Shams & Seitz, 2008) or if categorisation responses go beyond predictions based on the summation of probability alone (see Nardini et al., 2010). In general, our results support the use of flexible, context-dependent cue combination strategies for learning novel object categories, and therefore contribute to the ongoing debate about the role of multisensory information in object categorisation (Newell et al., 2023).
Our task was designed to enable remote data collection online via mobile devices. Although object perception experiments involving touch typically necessitate in-person testing (see Newell, 2004; 2010; Woods & Newell, 2004), technological improvements have meant that remote psychophysical testing using different modalities is rapidly increasing (Marin-Campos et al., 2021; Zhao et al., 2022; Hirst et al., 2025; Inuggi et al., 2024). The benefits of remote assessment are well documented, including the recruitment of larger and more heterogeneous participant samples (Grootswagers, 2020) and easier longitudinal follow-ups. Our work has important practical implications for the remote testing of multisensory category formation. Recent research from our team has validated the use of vibrations for remote testing contexts (Hirst et al., 2025); however, to our knowledge, no study has yet used this method to deliver complex patterns of vibration stimuli. This is therefore the first study to use complex vibration stimuli, delivered via the browser, to explore multisensory processes. It is of practical note that this method was successful: participants identified the correspondence between vibration patterns and visual stimuli with ease (Supplementary Material), which enabled us to conduct the current design and share this method for future research. We argue that remote delivery of tactile stimulation through mobile devices is an ecologically valid assessment method given that tactile feedback is increasingly integrated into technology (Muender et al., 2022; Ziat et al., 2022; Zwoliński et al., 2022). Indeed, technological advances now mean that many everyday objects incorporate some form of vibration feedback: you may feel resistance in the steering wheel of your car when changing lanes, the driver's seat might pulsate to signal danger, or your mobile phone may vibrate to signal an incoming call.
Tactile feedback has also been shown to be effective at improving task performance (Fang et al., 2023) and the sense of presence (Huard et al., 2022; Kim et al., 2022). Moreover, delivery through commodity devices has real-world implications for gaming or training app design. Despite these potential benefits, few published studies have utilised vibration delivery through mobile phones to study perception, and those that have report effects similar to those of lab-based studies (Inuggi et al., 2024; Hirst et al., 2025). Methodologically, the use of smartphone-delivered tactile cues offers an accessible and ecologically valid way of investigating multisensory interactions, particularly in applied domains such as education, human-computer interaction, and rehabilitation, where redundant sensory input, including from touch, can support cognitive abilities (see Gori et al., 2022).
Conclusions
The present study investigated whether visual and tactile motion cues enhance the categorisation of novel object shapes, and whether these multisensory categories support generalisation to new exemplars. Across three experiments delivered via mobile phones, we tested whether previously learned multisensory categories – defined by shape, visual motion, and tactile vibration – could support object categorisation and generalisation, across blocked versus interleaved cue conditions at test. In Experiment 1 all cues were fully predictive of category membership, while in Experiment 2 the reliability of shape information for categorisation was reduced. In Experiment 3, categories were learned with static shape only. When cues were equally reliable (Experiment 1), performance improved when all cues were available at test, and this was replicated when shape reliability was reduced (Experiment 2). Task format modulated these benefits: blocked presentation (Experiments 1A and 2A) yielded higher accuracy and stronger generalisation, whereas interleaved presentation (Experiments 1B and 2B) reduced performance differences and weakened contributions from tactile cues. Finally, visual and tactile motion facilitated categorisation and generalisation relative to shape-only learning (Experiment 3). Collectively, these findings demonstrate that multisensory motion cues promote object category formation and generalisation, with effectiveness influenced by cue reliability and task structure. The results have important implications for our understanding of the underlying dynamic and multisensory nature of object categories and the predictive role of multisensory features on category formation. To our knowledge, this is also the first study to use complex vibration stimuli, delivered via the browser, to investigate multisensory processes. These results and methodology provide a valuable resource for future work on tactile and multisensory processing in applied contexts.