References
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014) Standards for Educational and Psychological Testing. American Educational Research Association, Washington, DC
Asparouhov T, Muthén B (2009) Exploratory structural equation modeling. Struct Equation Modeling: Multidisciplinary J 16(3):397–438. https://doi.org/10.1080/10705510903008204
Bachman LF (1990) Fundamental considerations in language testing. Cambridge University Press, Cambridge, UK
Barkaoui K, Brooks L, Swain M, Lapkin S (2013) Test-takers’ strategic behaviors in independent and integrated speaking tasks. Appl Linguist 34(3):304–324. https://doi.org/10.1093/applin/ams046
Bollen KA (1989) Structural equations with latent variables. Wiley, New York, NY
Brown A, Ducasse AM (2019) An equal challenge? Comparing TOEFL iBT™ speaking tasks with academic speaking tasks. Lang Assess Q 16(2):253–270. https://doi.org/10.1080/15434303.2019.1628240
Brown A, Iwashita N, McNamara T (2005) An examination of rater orientations and test-taker performance on English-for-academic-purposes speaking tasks. TOEFL Monograph Series MS-29. Educational Testing Service, Princeton, NJ
Bygate M (1987) Speaking. Oxford University Press, Oxford, UK
Byrne BM (2013) Structural equation modeling with Mplus: Basic concepts, applications, and programming. Routledge. https://doi.org/10.4324/9780203807644
Chapelle CA (1999) Validity in language assessment. Annu Rev Appl Linguist 19:254–272. https://doi.org/10.1017/S0267190599190135
Chapelle CA, Grabe W, Berns M (1997) Communicative language proficiency: Definition and implications for TOEFL 2000. TOEFL Monograph Series MS-10. Educational Testing Service
Cheng FX (2011) Justifying the interpretations about a Listening-to-retell task in CELST in NMET(GD). Guangdong University of Foreign Studies, Guangzhou, China
Cohen AD (2014) Strategies in learning and using a second language. Longman
Crossley SA, Kim YJ (2019) Text integration and speaking proficiency: Linguistic, individual differences, and strategy use considerations. Lang Assess Q 16(2):217–235. https://doi.org/10.1080/15434303.2019.1628239
de Jong NH (2023) Assessing second language speaking proficiency. Annu Rev Linguist 9:541–560. https://doi.org/10.1146/annurev-linguistics-030521-052114
Dörnyei Z (2007) Research methods in applied linguistics. Oxford University Press, Oxford, UK
Education Examinations Authority of Guangdong Province (2016) Test syllabus and sample paper disk for Computer-based English Listening and Speaking Test (CELST) of National Matriculation English Test (Guangdong Version). Guangdong Pacific Electronic, Guangzhou, China
Embretson S (1983) Construct validity: Construct representation versus nomothetic span. Psychol Bull 93(1):179–197. https://www.researchgate.net/publication/289963742
Fan J, Yan X (2020) Assessing speaking proficiency: A narrative review of speaking assessment research within the argument-based validation framework. Front Psychol 11:330. https://doi.org/10.3389/fpsyg.2020.00330
Farnsworth TL (2013) An investigation into the validity of the TOEFL iBT speaking test for international teaching assistant certification. Lang Assess Q 10(3):274–291. https://doi.org/10.1080/15434303.2013.769548
Frost K, Clothier J, Huisman A, Wigglesworth G (2019) Responding to a TOEFL iBT integrated speaking task: Mapping task demands and test takers’ use of stimulus content. Lang Test 37(1):133–155. https://doi.org/10.1177/0265532219860750
Frost K, Wigglesworth G, Clothier J (2021) Relationships between comprehension, strategic behaviours and content-related aspects of test performances in integrated speaking tasks. Lang Assess Q 18(2):133–153. https://doi.org/10.1080/15434303.2020.1835918
Fulcher G (2003) Testing second language speaking. Routledge
Gaciu N (2021) Understanding quantitative data in educational research. SAGE
Gist CD, Bristol TJ (eds) (2020) Fairness in Educational and Psychological Testing. American Educational Research Association, Washington DC
Hirai A, Koizumi R (2013) Validation of empirically derived rating scales for a story retelling speaking test. Lang Assess Q 10(4):398–422. https://doi.org/10.1080/15434303.2013.824973
Huang HD, Hung SA (2018) Investigating the strategic behaviors in integrated speaking assessment. System 78(1):201–212. https://doi.org/10.1016/j.system.2018.09.007
Huang HD, Hung SA, Plakans L (2018) Topical knowledge in L2 speaking assessment: Comparing independent and integrated speaking test tasks. Lang Test 35(1):27–49. https://doi.org/10.1177/0265532216677106
Hou YP (2018) A study on the washback effect of the reform of SHNMET listening and speaking test. TEFLE 183(05):25–31
Inoue C, Lam DMK (2021) The effects of extended planning time on candidates’ performance, processes, and strategy use in the lecture listening-into-speaking tasks of the TOEFL iBT® test. TOEFL Research Report No. RR-93. Educational Testing Service, Princeton, NJ. https://doi.org/10.1002/ets2.12322
Ishikawa S (2020) Influence of learner attributes on complexity, accuracy, and fluency in English oral outputs of Japanese learners. In: Mentz O, Papaja K (eds) Focus on language: Challenging language learning and language teaching in peace and global education. LIT, pp 43–68
Iwashita N (2022) Speaking assessment. In: Derwing TM, Munro MJ, Thomson RI (eds) The Routledge handbook of second language acquisition and speaking. Routledge, New York, NY, pp 130–140
Iwashita N, Brown A, McNamara T, O’Hagan S (2008) Assessed levels of second language speaking proficiency: How distinct? Appl Linguist 29(1):24–49. https://doi.org/10.1093/applin/amm017
Jin X (2012) Working memory constraints on L2 learners’ speech production. Foreign Lang Teach Res 44(4):523–535
Jin Y, Wu J (2010) A preliminary study of the validity of the Internet-based CET-4: Factors affecting test-takers’ perception of the performance on the test. Technol Enhanced Foreign Lang Educ 132(2):3–10
Kim HJ (2015) A qualitative analysis of rater behavior on an L2 speaking assessment. Lang Assess Q 12(3):239–261. https://doi.org/10.1080/15434303.2015.1049353
Lin R (2023) Examining the scoring of content integration in a listening-speaking test: A G-theory analysis. Lang Assess Q 20(3):319–338. https://doi.org/10.1080/15434303.2023.2242334
Liu S, Chen YJ (2018) A practical exploration on NMET (Shanghai)-based English listening and speaking teaching. TEFLE 183(05):32–36
Luoma S (2004) Assessing speaking. Cambridge University Press, Cambridge, UK
Kormos J, Suzuki S, Eguchi M (2022) The role of input modality and vocabulary knowledge in alignment in reading-to-speaking tasks. System 108:102854. https://doi.org/10.1016/j.system.2022.102854
Marsh HW, Muthén B, Asparouhov T, Lüdtke O, Robitzsch A, Morin AJS, Trautwein U (2009) Exploratory structural equation modeling, integrating CFA and EFA: Application to students’ evaluations of university teaching. Struct Equation Modeling: Multidisciplinary J 16(3):439–476. https://doi.org/10.1080/10705510903008220
Marsh HW, Lüdtke O, Muthén B, Asparouhov T, Morin AJS, Trautwein U, Nagengast B (2010) A new look at the big five factor structure through exploratory structural equation modeling. Psychol Assess 22(3):471–491. https://doi.org/10.1037/a0019227
Messick S (1987) Validity (TOEFL Report). Educational Testing Service, Princeton, NJ
Ministry of Education of the People’s Republic of China (2020) General senior high school curriculum standards. People’s Education
Pallant J (2020) SPSS survival manual: A step by step guide to data analysis using IBM SPSS, 7th edn. Routledge
Phakiti A (2008) Construct validation of Bachman and Palmer’s (1996) strategic competence model over time in EFL reading tests. Lang Test 25(2):237–272. https://doi.org/10.1177/0265532207086783
Pusey K (2020) Assessing L2 listening at a Japanese university: Effects of input type and response format. Lang Educ Assess 3(1):13–35. https://doi.org/10.29140/lea.v3n1.193
Rui YP, Ji HJ (2017) The impact of multimodal listening & speaking teaching on English speaking anxiety and classroom reticence. TEFLE 178(6):50–55
Rukthong A (2021) MC listening questions vs. integrated listening-to-summarize tasks: What listening abilities do they assess? System 97(1):102439. https://doi.org/10.1016/j.system.2020.102439
Rukthong A, Brunfaut T (2020) Is anybody listening? The nature of second language listening in integrated listening-to-summarize tasks. Lang Test 37(1):31–53. https://doi.org/10.1177/0265532219871470
Swain M, Huang L, Barkaoui K, Brooks L, Lapkin S (2009) The speaking section of the TOEFL iBT™ (SSTiBT): Test-takers’ reported strategic behaviors. TOEFL iBT-10. Educational Testing Service
Swami V, Maïano C, Morin AJS (2023) A guide to exploratory structural equation modeling (ESEM) and bifactor-ESEM in body image research. Body Image 47:101641. https://doi.org/10.1016/j.bodyim.2023.101641
Suzuki S, Kormos J (2023) The multidimensionality of second language oral fluency: Interfacing cognitive fluency and utterance fluency. Stud Second Lang Acquisition 45(1):38–64. https://doi.org/10.1017/S0272263121000899
Tabachnick BG, Fidell LS (2013) Using multivariate statistics, 6th edn. Pearson Education
Tsang A, Lee JS (2023) The making of proficient young FL speakers: The role of emotions, speaking motivation, and spoken input beyond the classroom. System 115:103047. https://doi.org/10.1016/j.system.2023.103047
Van Zyl LE, ten Klooster PM (2022) Exploratory structural equation modeling: Practical guidelines and tutorial with a convenient online tool for Mplus. Front Psychiatry 12(1):1–28. https://doi.org/10.3389/fpsyt.2021.795672
Wang H, Fan TT, Zeng YQ (2018) Investigating the construct of speaking proficiency under the listening-to-speak integrated task. Mod Foreign Lang 41(3):413–424
Wei J, Llosa L (2015) Investigating differences between American and Indian raters in assessing TOEFL iBT speaking tasks. Lang Assess Q 12(3):283–304. https://doi.org/10.1080/15434303.2015.1037446
Xu W (2016) Analysis of National Matriculation English Test (Shanghai) under the new reform of examination and enrollment system: Innovation, elucidation and prospection. Foreign Lang Test Teach 4:24–31
Xu W (2021) Practice of a speaking assessment task in a high-stake test: Taking NMET(Shanghai) as an example. Foreign Lang Test Teach 1:21–27
Xu Y, Huang M, Chen J, Zhang Y (2023a) Investigating a shared-dialect effect between raters and candidates in English speaking tests. Front Psychol 14:1143031. https://doi.org/10.3389/fpsyg.2023.1143031
Xu Y, Li XD, Chen J (2024) The review: Computer-based English Listening and Speaking Test (CELST) of National Matriculation English Test (NMET) Guangdong version in China. Lang Test 42(2):238–249. https://doi.org/10.1177/02655322241255712
Xu Y, Li XD, Wang PC (2023b) Validating an empirically developed rating scale of story retelling task. J PLA Univ Foreign Lang 46(5):11–19
Xu Y, Liao TH, Han S, Wang YQ (2019) Development and validation of the content rubric of a story retelling task. Foreign Lang Test Teach 4:21–30
Xu Y, Liao TH, Han S, Wang YQ (2020) Investigating language features for the listening-to-speak integrated task: A corpus-based approach. Foreign Lang Res 1:56–63
Xu Y, Yang MN, Li XD (2025) Investigating the relationships between listening strategies and speaking performance in integrated listening-to-speak tasks. System 129:103586. https://doi.org/10.1016/j.system.2024.103586
Xu Y, Zhang YQ (2021) Investigating pronunciation features of the integrated listening-to-speak task construct. Foreign Lang Test Teach 3:39–48
Yan X, Cheng LX, Ginther A (2019) Factor analysis for fairness: Examining the impact of task type and examinee L1 background on scores of an ITA speaking test. Lang Test 36(2):207–234. https://doi.org/10.1177/0265532218775764
Yang HC (2009) Exploring the complexity of second language writers’ strategy use and performance on an integrated writing test through structural equation modeling and qualitative approaches. Unpublished doctoral dissertation. The University of Texas
Zhan Y, Wan ZH (2016) Test takers’ beliefs and experiences of a high-stakes Computer-based English Listening and Speaking Test. RELC J 47(3):363–376. https://doi.org/10.1177/0033688216631174
Zeng QM (2011) The efficacy of multi-modal teaching on the development of L2 listening and speaking abilities. J PLA Univ Foreign Lang 6:72–76
Zhang R (2019) Washback effect analysis of NMET(Shanghai) listening and speaking test: Taking J school as an example. Foreign Lang Test Teach 4:47–53
Zhou WJ (2005) Effects of input modes on oral English production. J PLA Univ Foreign Lang 28(6):53–58
Zhang Y, Elder C (2009) Measuring the speaking proficiency of advanced EFL learners in China: The CET-SET solution. Lang Assess Q 6(4):298–314. https://doi.org/10.1080/15434300902990967
Zhou Y, Zeng YQ (2016) Many-facet Rasch model analysis on computer automatic scoring of a computer-based English listening-speaking test. Foreign Lang Test Teach 1:22–31
Zhang WW, Zhang LJ (2022) Understanding assessment tasks: Learners’ and teachers’ perceptions of cognitive load of integrated speaking tasks for TBLT implementation. System 111:102951. https://doi.org/10.1016/j.system.2022.102951
Zhang WW, Zhang DL, Zhang LJ (2021) Metacognitive instruction for sustainable learning: Learners’ perceptions of task difficulty and use of metacognitive strategies in completing integrated speaking tasks. Sustainability 13:6275. https://doi.org/10.3390/su13116275
Zhang WW, Zhao MJ, Zhu Y (2022) Understanding individual differences in metacognitive strategy use, task demand, and performance in integrated L2 speaking assessment tasks. Front Psychol 13:876208. https://doi.org/10.3389/fpsyg.2022.876208