Hybrid journal (it can contain Open Access articles). ISSN (Print) 2049-5986; ISSN (Online) 2049-5994. Published by Inderscience Publishers.
Authors: Okure U. Obot, Samuel S. Udoh, Kingsley F. Attai
Pages: 207 - 222
Abstract: Grading short answers in an examination is a tedious exercise that consumes much of an examiner's time. Fatigue can set in and lead to errors, and sentiment sometimes comes into play. The result is variation in the marks awarded to candidates even when they express the same opinion. In this study, the Jaccard, Cosine, Jaro and Dice similarity measures were used to grade candidates' answers to 647 examination questions. The measures were tested to ascertain which one ranks closest to the average scores awarded by three human examiners working from the same examination answers and marking guides. Results showed that the Jaro similarity measure ranked closest to the examiners' mean score, with a variance absolute error of 0.62%, and covaried strongly with it (97%) at a significance level of 0.001.
Keywords: Jaro; Jaccard; Cosine; Dice; examination; short answers; similarity measures; semantics; lexical; natural language processing; NLP
Citation: International Journal of Quantitative Research in Education, Vol. 5, No. 3 (2021) pp. 207 - 222
PubDate: 2021-12-21T23:20:50-05:00
DOI: 10.1504/IJQRE.2021.119816
Issue No: Vol. 5, No. 3 (2021)
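The four measures named in this abstract are standard set- and string-similarity measures. The sketch below is a minimal illustration of how a candidate answer might be scored against a marking-guide answer; the whitespace tokenisation and the example sentences are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the four similarity measures named in the abstract,
# applied to a candidate answer and a marking-guide answer. Tokenisation
# and scoring choices are illustrative, not the authors' code.
from collections import Counter
from math import sqrt

def tokens(text):
    return text.lower().split()

def jaccard(a, b):
    A, B = set(tokens(a)), set(tokens(b))
    return len(A & B) / len(A | B) if A | B else 0.0

def dice(a, b):
    A, B = set(tokens(a)), set(tokens(b))
    return 2 * len(A & B) / (len(A) + len(B)) if A or B else 0.0

def cosine(a, b):
    A, B = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(A[t] * B[t] for t in A)
    norm = sqrt(sum(v * v for v in A.values())) * sqrt(sum(v * v for v in B.values()))
    return dot / norm if norm else 0.0

def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if not n1 or not n2:
        return 0.0
    window = max(n1, n2) // 2 - 1
    m1, m2 = [False] * n1, [False] * n2
    matches = 0
    for i, c in enumerate(s1):                  # count matching characters within the window
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if not matches:
        return 0.0
    t, k = 0, 0                                 # count transpositions among matched characters
    for i in range(n1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3

guide = "photosynthesis converts light energy into chemical energy"
answer = "light energy is converted to chemical energy by photosynthesis"
for name, fn in [("Jaccard", jaccard), ("Dice", dice), ("Cosine", cosine), ("Jaro", jaro)]:
    print(f"{name}: {fn(answer, guide):.3f}")
```

In practice each candidate answer would be compared with the marking-guide answer for the corresponding question and the similarity rescaled to the marks available for that question.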
Authors: Okure U. Obot, Samuel S. Udoh, Kingsley F. Attai
Pages: 223 - 243
Abstract: The common-item equating design requires two forms of a test that share a set of items in order to control for differences in examinee ability. The common set is subject to compromise when it is used repeatedly, which can become a serious threat to test fairness. If cheating occurs on the common items, the equating process produces inaccurate results, and the inaccuracy may vary with common-item difficulty. This simulation study was conducted to evaluate the impact of the difficulty level of compromised common items on the equating process. The recovery of scaling coefficients and equated scores was assessed using bias and RMSE under various cheating conditions. The results indicated that cheating on higher-difficulty common items produced the greatest overestimation of the scaling coefficients, which in turn caused the greatest inflation in equated true scores for all test takers, whether or not they engaged in cheating.
Keywords: item response theory; linking; equating; common items; test compromise
Citation: International Journal of Quantitative Research in Education, Vol. 5, No. 3 (2021) pp. 223 - 243
PubDate: 2021-12-21T23:20:50-05:00
DOI: 10.1504/IJQRE.2021.119805
Issue No: Vol. 5, No. 3 (2021)
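The abstract refers to recovering scaling (linking) coefficients from common items and evaluating recovery with bias and RMSE. The sketch below shows one standard approach, mean-sigma linking of common-item difficulty estimates; the particular linking method, the parameter values and the variable names are assumptions for illustration, not the study's procedure.

```python
# Minimal mean-sigma linking sketch for a common-item (NEAT) design. b_ref and
# b_new are difficulty estimates of the same common items calibrated separately
# on the reference and new forms; the numbers are made up for illustration.
import numpy as np

b_ref = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # common-item difficulties on the reference scale
b_new = np.array([-1.0, -0.2, 0.3, 1.1, 1.9])   # same items, new-form scale

A = b_ref.std(ddof=1) / b_new.std(ddof=1)        # slope of the linear transformation
B = b_ref.mean() - A * b_new.mean()              # intercept

def to_reference_scale(theta_new):
    """Place new-form ability (or difficulty) estimates on the reference scale."""
    return A * theta_new + B

# Recovery of the coefficients across simulation replications is summarised
# with bias and RMSE against the true generating values:
def bias(est, true):
    return np.mean(np.asarray(est) - true)

def rmse(est, true):
    return np.sqrt(np.mean((np.asarray(est) - true) ** 2))
```

If cheating inflates performance on the common items of the new form, the apparent difficulties b_new shift and the estimated A and B are distorted, which is the mechanism the abstract describes.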
Authors: Amery D. Wu, Minjeong Park, Shun-Fu Hu
Pages: 244 - 267
Abstract: The CELPIP-G test is used by the Canadian federal government to screen immigration eligibility for the skilled-worker class. Differential option functioning (DOF) is a technique used to detect potential bias in the options of multiple-choice items. The purpose of this paper is to investigate DOF in a CELPIP-G reading test form by way of multinomial logistic regression. The results showed that 13.7% of options were flagged for gender DOF, although 11.2% showed only negligible or small DOF. Among options with uniform gender DOF, twice as many were found to function against female immigration applicants as against their male counterparts. Female test-takers were more likely to be disadvantaged when tackling questions that asked them to make direct inferences based on factual but unfamiliar information. In contrast, male test-takers were more likely to be disadvantaged when tackling questions that asked them to develop their own interpretations of different views. Moreover, test questions that required an understanding of more sophisticated ideas in complex language structures and allowed personal interpretation tended to show more marked and non-uniform gender DOF.
Keywords: measurement bias; differential options functioning; gender equity; test fairness; reading comprehension; immigration language testing; CELPIP-general; measurement invariance; multinomial logistic regression
Citation: International Journal of Quantitative Research in Education, Vol. 5, No. 3 (2021) pp. 244 - 267
PubDate: 2021-12-21T23:20:50-05:00
DOI: 10.1504/IJQRE.2021.119811
Issue No: Vol. 5, No. 3 (2021)
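A DOF analysis of this kind typically regresses the option chosen on an ability measure and a group indicator, and tests whether the group terms add explanatory power. The sketch below is one plausible way to do this with a multinomial logit and a likelihood-ratio test; the simulated data, the use of a single ability proxy as the matching variable and the model comparison shown are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch of a multinomial-logistic-regression check for differential option
# functioning (DOF) on one multiple-choice item. 'choice' is the option each
# examinee selected (0..3), 'score' an ability proxy, 'gender' a 0/1 group
# indicator. All data and cut-offs are illustrative assumptions.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
score = rng.normal(size=n)                    # ability proxy (e.g. rest score)
gender = rng.integers(0, 2, size=n)           # 0 = male, 1 = female
# simulate option choices that depend on ability and, for one distractor, on gender
logits = np.column_stack([
    0.8 * score,                              # keyed option
    -0.5 * score + 0.4 * gender,              # distractor with uniform gender DOF
    -0.3 * score,
    np.zeros(n),
])
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
choice = np.array([rng.choice(4, p=row) for row in p])

X_base = sm.add_constant(np.column_stack([score]))
X_full = sm.add_constant(np.column_stack([score, gender, score * gender]))

m_base = sm.MNLogit(choice, X_base).fit(disp=0)
m_full = sm.MNLogit(choice, X_full).fit(disp=0)

# Likelihood-ratio test: do the gender and gender-by-ability terms improve fit?
lr = 2 * (m_full.llf - m_base.llf)
df = m_full.df_model - m_base.df_model
print("LR =", round(lr, 2), " p =", round(1 - stats.chi2.cdf(lr, df), 4))
```

A significant gender main effect corresponds to uniform DOF, while a significant gender-by-ability interaction corresponds to non-uniform DOF; effect-size criteria then separate negligible from small or large DOF.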
Authors: Aolin Xie, Ting-Wei Chiu, Keyu Chen, Gregory Camilli
Pages: 268 - 276
Abstract: This study compared candidates' scores under the normalised model and the two-parameter item response theory (2PL IRT) model using simulated multi-form exam data. Candidates' calculated scores, rankings, qualification status and score ties from the two models were compared with their true values. The results suggest that the 2PL IRT model outperformed the normalised model when the candidate ability distributions varied across forms. Candidate scores based on the 2PL model were more closely related to the true scores, and the qualification status of candidates in the top 10% group was more accurately classified by the 2PL model than by the normalised model when group abilities differed.
Keywords: 2PL IRT model; normalised model; multi-form exam; equating; candidate classification
Citation: International Journal of Quantitative Research in Education, Vol. 5, No. 3 (2021) pp. 268 - 276
PubDate: 2021-12-21T23:20:50-05:00
DOI: 10.1504/IJQRE.2021.119812
Issue No: Vol. 5, No. 3 (2021)
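Under the 2PL model, the probability of a correct response depends on item discrimination and difficulty, and a candidate's score is an ability estimate rather than a normalised raw total. The sketch below shows the 2PL response function and an EAP ability estimate over a quadrature grid; the item parameters, response pattern and choice of EAP scoring are illustrative assumptions, not the simulation design of the paper.

```python
# Sketch of 2PL scoring: response probability and an EAP ability estimate
# for one candidate's response pattern. Parameters and prior are placeholders.
import numpy as np

a = np.array([1.2, 0.8, 1.5, 1.0, 0.6])     # item discriminations
b = np.array([-1.0, -0.3, 0.2, 0.9, 1.6])   # item difficulties
responses = np.array([1, 1, 1, 0, 0])       # one candidate's 0/1 pattern

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# EAP: posterior mean of theta over a grid with a standard-normal prior
grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * grid ** 2)
like = np.ones_like(grid)
for x, ai, bi in zip(responses, a, b):
    p = p_correct(grid, ai, bi)
    like *= p if x == 1 else (1 - p)
post = like * prior
theta_eap = np.sum(grid * post) / np.sum(post)
print(f"EAP ability estimate: {theta_eap:.2f}")
```

Because ability estimates from separately calibrated forms can be placed on a common scale, 2PL scores remain comparable when ability distributions differ across forms, which a within-form normalisation of raw scores cannot guarantee.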
Authors: Soodeh Bordbar, Seyyed Mohammad Alavi
Pages: 277 - 310
Abstract: The present study explores the validity of a high-stakes university entrance exam and considers the role of gender as a source of bias in the different subtests of this language proficiency test. To achieve this, the Rasch model was used to inspect biased items and to examine construct-irrelevant factors. For the DIF analysis, the Rasch model was applied to 5,000 participants selected randomly from the pool of examinees who took the National University Entrance Exam in Iran for Foreign Languages (NUEEFL), a university entrance requirement for English language studies, in 2015. The findings reveal that the test scores are not free from construct-irrelevant variance, and some misfitting items were modified based on the fit statistics. In sum, the fairness of the NUEEFL was not confirmed. The results of such a psychometric assessment could be beneficial for test designers, stakeholders, administrators and teachers.
Keywords: bias; differential item functioning analysis; dimensionality; fairness; Rasch model
Citation: International Journal of Quantitative Research in Education, Vol. 5, No. 3 (2021) pp. 277 - 310
PubDate: 2021-12-21T23:20:50-05:00
DOI: 10.1504/IJQRE.2021.119817
Issue No: Vol. 5, No. 3 (2021)
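The abstract mentions flagging misfitting items with fit statistics under the Rasch model. The sketch below computes the standard outfit and infit mean-square statistics from standardised residuals, given person abilities and item difficulties; the simulated data, parameter values and the 0.7-1.3 flagging range are assumptions for illustration rather than the authors' software output.

```python
# Sketch of Rasch item-fit screening: outfit and infit mean-square statistics
# from a 0/1 response matrix X (persons x items), given ability estimates
# theta and item difficulties delta. Values and thresholds are placeholders.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=500)                      # person abilities
delta = np.array([-1.5, -0.5, 0.0, 0.7, 1.4])     # item difficulties
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - delta[None, :])))  # Rasch probabilities
X = (rng.random(P.shape) < P).astype(int)         # simulated responses

W = P * (1 - P)                                   # response variances
resid = X - P                                     # raw residuals
z2 = resid ** 2 / W                               # squared standardised residuals

outfit = z2.mean(axis=0)                          # unweighted mean square per item
infit = (resid ** 2).sum(axis=0) / W.sum(axis=0)  # information-weighted mean square

for i, (o, f) in enumerate(zip(outfit, infit)):
    flag = "misfit?" if not (0.7 <= o <= 1.3) else ""
    print(f"item {i}: outfit={o:.2f} infit={f:.2f} {flag}")
```

Items with mean squares well above 1 show more noise than the model expects (possible construct-irrelevant variance), while values well below 1 indicate overfit; flagged items are candidates for revision, as described in the abstract.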