- Online Parameter Estimation for Student Evaluation of Teaching
  Authors: Chia-Wen Chen, Chen-Wei Liu
  Abstract: Student evaluation of teaching (SET) uses students' experiences in a class to evaluate the teacher's performance. SET essentially comprises three facets: teaching proficiency, student rating harshness, and item properties. The computerized adaptive testing form of SET with an established item pool has been used in educational environments. However, conventional scoring methods ignore the harshness of students toward teachers and are therefore unable to provide a valid assessment. In addition, simultaneously estimating teachers' teaching proficiency and students' harshness remains an unaddressed issue in the context of online SET. In the current study, we develop and compare three novel methods—marginal, iterative once, and hybrid approaches—to improve the precision of parameter estimates. A simulation study is conducted to demonstrate that the hybrid method is a promising technique that can substantially outperform traditional methods.
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-03-20T06:00:10Z. DOI: 10.1177/01466216231165314
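
The three facets named in this abstract (teaching proficiency, student harshness, item properties) are commonly combined in a Rasch-type facets scoring rule. The sketch below is only a minimal illustration of that kind of model, not the authors' marginal, iterative-once, or hybrid estimators; the logistic form and the parameter names are assumptions.

```python
import numpy as np

def rating_prob(theta, eta, b):
    """P(positive rating) under a simple facets-style model:
    teacher proficiency (theta) minus student harshness (eta)
    minus item difficulty (b), passed through a logistic link."""
    return 1.0 / (1.0 + np.exp(-(theta - eta - b)))

# Toy example: the same teacher rated by a harsh vs. a lenient student.
print(rating_prob(theta=1.0, eta=0.8, b=0.0))   # ~0.55
print(rating_prob(theta=1.0, eta=-0.8, b=0.0))  # ~0.86
```

Ignoring eta (harshness) in such a model is what the abstract means by conventional scoring producing an invalid assessment of theta.
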
- Confidence Screening Detector: A New Method for Detecting Test Collusion
  Authors: Yongze Xu, Ying Cui, Xinyi Wang, Meiwei Huang, Fang Luo
  Abstract: Test collusion (TC) is a form of cheating in which examinees operate in groups to alter normal item responses. TC is becoming increasingly common, especially within high-stakes, large-scale examinations. However, research on TC detection methods remains scarce. The present article proposes a new algorithm for TC detection, inspired by variable selection in high-dimensional statistical analysis. The algorithm relies only on item responses and supports different response similarity indices. Simulation and practical studies were conducted to (1) compare the performance of the new algorithm against the recently developed clique detector approach, and (2) verify the performance of the new algorithm in a large-scale test setting.
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-03-20T05:34:50Z. DOI: 10.1177/01466216231165299
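
The confidence screening detector itself is not described in the abstract; the sketch below only illustrates the raw ingredient such methods build on, a pairwise response-similarity index (here, the simple proportion of identical responses), with an arbitrary flagging threshold. It is a naive screen, not the proposed algorithm.

```python
import numpy as np
from itertools import combinations

def flag_similar_pairs(responses, threshold=0.95):
    """Flag examinee pairs whose proportion of identical item responses
    exceeds a threshold. `responses` is an (examinees x items) array of
    selected options. Naive illustration only."""
    flagged = []
    for i, j in combinations(range(responses.shape[0]), 2):
        match_rate = float(np.mean(responses[i] == responses[j]))
        if match_rate >= threshold:
            flagged.append((i, j, match_rate))
    return flagged

rng = np.random.default_rng(1)
data = rng.integers(0, 4, size=(50, 40))  # 50 examinees, 40 four-option items
data[7] = data[3]                         # plant one colluding pair
print(flag_similar_pairs(data))           # [(3, 7, 1.0)]
```
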
- The Impact of Item Model Parameter Variations on Person Parameter Estimation in Computerized Adaptive Testing With Automatically Generated Items
  Authors: Chen Tian, Jaehwa Choi
  Abstract: Sibling items developed through automatic item generation share similar but not identical psychometric properties. However, modeling sibling item variations can create substantial computational difficulties while yielding little improvement in scoring. Assuming identical characteristics among siblings, this study explores the impact of item model parameter variations (i.e., within-family variation between siblings) on person parameter estimation in linear tests and computerized adaptive testing (CAT). Specifically, we explore (1) what happens if small, medium, or large within-family variance is ignored, (2) whether the effect of larger within-model variance can be compensated by greater test length, (3) whether item model pool properties affect the impact of within-family variance on scoring, and (4) whether the issues in (1) and (2) differ between linear and adaptive testing. A related-siblings model is used for data generation, and an identical-siblings model is assumed for scoring. Manipulated factors include test length, the size of within-model variation, and item model pool characteristics. Results show that as within-family variance increases, the standard error of scores remains at similar levels. For the correlation between true and estimated scores and for RMSE, the effect of larger within-model variance was compensated by test length. For bias, scores are biased toward the center, and the bias was not compensated by test length. Although the within-family variation is random in the current simulations, the item model pool should provide balanced opportunities such that "fake-easy" and "fake-difficult" item instances cancel each other's effects, yielding less biased ability estimates. The results for CAT are similar to those for linear tests, except for higher efficiency.
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-03-18T02:44:35Z. DOI: 10.1177/01466216231165313
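
A small simulation sketch of the data-generating idea above: family-level item parameters are drawn first, each administered sibling is perturbed by a within-family deviation, and scoring ignores that deviation. The Rasch form, the grid-search scoring, and the variance values are assumptions for illustration, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(7)
n_families, within_sd = 30, 0.4                      # within-family SD is the manipulated factor
family_b = rng.normal(0.0, 1.0, n_families)          # item-model (family) difficulties
sibling_b = family_b + rng.normal(0.0, within_sd, n_families)  # instances actually administered

theta_true = 0.5
p = 1.0 / (1.0 + np.exp(-(theta_true - sibling_b)))  # responses generated from the siblings
x = rng.binomial(1, p)

# Scoring assumes the identical-siblings model: grid-search ML using family_b, not sibling_b.
grid = np.linspace(-4, 4, 801)
loglik = [np.sum(x * np.log(1 / (1 + np.exp(-(t - family_b)))) +
                 (1 - x) * np.log(1 - 1 / (1 + np.exp(-(t - family_b))))) for t in grid]
theta_hat = grid[int(np.argmax(loglik))]
print(round(float(theta_hat), 2))                    # ability estimate ignoring within-family variation
```
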
- A Mixed Sequential IRT Model for Mixed-Format Items
  Authors: Junhuan Wei, Yan Cai, Dongbo Tu
  Abstract: To provide more insight into an individual's response process and cognitive process, this study proposes three mixed sequential item response models (MS-IRMs) for mixed-format items consisting of a multiple-choice item and an open-ended item that are answered in sequence and scored sequentially. Relative to existing polytomous models such as the graded response model (GRM), the generalized partial credit model (GPCM), or the traditional sequential Rasch model (SRM), the proposed models employ an appropriate processing function for each task to improve on conventional polytomous models. Simulation studies were carried out to investigate the performance of the proposed models, and the results indicated that all proposed models outperformed the SRM, GRM, and GPCM in terms of parameter recovery and model fit. An application of the MS-IRMs in comparison with traditional models is illustrated using real data from TIMSS 2007.
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-03-17T07:27:14Z. DOI: 10.1177/01466216231165302
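
The abstract does not give the MS-IRM equations; the sketch below shows the generic sequential (continuation-ratio) structure such models build on for a two-step item, with a 2PL step for the multiple-choice task and a Rasch step for the open-ended task. The specific processing functions of the proposed models differ from this; the parameter names are illustrative.

```python
import numpy as np

def step(theta, a, b):
    """Probability of passing one step, 2PL form."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def sequential_probs(theta, a_mc, b_mc, b_oe):
    """Category probabilities for a two-step mixed-format item:
    step 1 = multiple-choice task (2PL); step 2 = open-ended task
    (Rasch, a = 1), attempted only if step 1 is passed."""
    s1 = step(theta, a_mc, b_mc)
    s2 = step(theta, 1.0, b_oe)
    return np.array([1 - s1, s1 * (1 - s2), s1 * s2])  # scores 0, 1, 2

print(sequential_probs(theta=0.0, a_mc=1.2, b_mc=-0.5, b_oe=0.5).round(3))
```
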
- On the Folly of Introducing A (Time-Based UMV), While Designing for B (Time-Based CMV)
  Authors: Alice Brawley Newlin
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-03-15T08:53:19Z. DOI: 10.1177/01466216231165304
- A Likelihood Approach to Item Response Theory Equating of Multiple Forms
  Authors: Michela Battauz, Waldir Leôncio
  Abstract: Test equating is a statistical procedure to make scores from different test forms comparable and interchangeable. Focusing on an IRT approach, this paper proposes a novel method that simultaneously links the item parameter estimates of a large number of test forms. Our proposal differs from the current state of the art by using likelihood-based methods and by taking into account the heteroskedasticity and correlation of the item parameter estimates of each form. Simulation studies show that our proposal yields equating coefficient estimates that are more efficient than those currently available in the literature.
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-01-24T05:23:49Z. DOI: 10.1177/01466216231151702
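
The equating coefficients referred to above enter through the standard linear transformation that places 2PL/3PL item parameters from one form onto the scale of another. The sketch below shows only that transformation; how A and B are estimated jointly across many forms is the article's contribution and is not reproduced here.

```python
def rescale_items(a_params, b_params, A, B):
    """Place item parameters from one form onto a base form's scale
    using equating coefficients A and B: b* = A*b + B, a* = a/A
    (standard IRT linear linking)."""
    a_star = [a / A for a in a_params]
    b_star = [A * b + B for b in b_params]
    return a_star, b_star

# Toy usage: form-2 items moved to the form-1 scale with A = 1.1, B = -0.2.
print(rescale_items([1.0, 0.8], [0.5, -1.0], A=1.1, B=-0.2))
```
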
- A New Approach to Desirable Responding: Multidimensional Item Response Model of Overclaiming Data
  Authors: Kuan-Yu Jin, Delroy L. Paulhus, Ching-Lin Shih
  Abstract: A variety of approaches have been presented for assessing desirable responding in self-report measures. Among them, the overclaiming technique asks respondents to rate their familiarity with a large set of real items and nonexistent items (foils). Applying signal detection formulas to the endorsement rates of real items and foils yields indices of (a) knowledge accuracy and (b) knowledge bias. The overclaiming technique thus reflects both cognitive ability and personality. Here, we develop an alternative measurement model based on multidimensional item response theory (MIRT). We report three studies demonstrating this new model's capacity to analyze overclaiming data. First, a simulation study illustrates that MIRT and signal detection theory yield comparable indices of accuracy and bias, although MIRT provides important additional information. Two empirical examples—one based on mathematical terms and one based on Chinese idioms—are then elaborated. Together, they demonstrate the utility of this new approach for group comparisons and item selection. The implications of this research are illustrated and discussed.
  Citation: Applied Psychological Measurement, Ahead of Print. PubDate: 2023-01-19T08:21:52Z. DOI: 10.1177/01466216231151704
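
For readers unfamiliar with the signal-detection side of the overclaiming technique, the sketch below computes one standard pair of indices from the endorsement rates: d' for accuracy and criterion c for bias. The exact indices used in the article, and the MIRT alternative it proposes, are not shown here.

```python
import numpy as np
from scipy.stats import norm

def overclaiming_indices(hit_rate, fa_rate, eps=1e-3):
    """Signal-detection indices from overclaiming data.
    hit_rate: proportion of real items claimed as familiar.
    fa_rate:  proportion of foils (nonexistent items) claimed as familiar.
    Returns (accuracy d', bias c); rates are clipped away from 0 and 1."""
    h = np.clip(hit_rate, eps, 1 - eps)
    f = np.clip(fa_rate, eps, 1 - eps)
    d_prime = norm.ppf(h) - norm.ppf(f)        # knowledge accuracy
    c = -0.5 * (norm.ppf(h) + norm.ppf(f))     # knowledge bias (lower c = more claiming)
    return float(d_prime), float(c)

print(overclaiming_indices(hit_rate=0.80, fa_rate=0.30))  # accurate, mild overclaiming
print(overclaiming_indices(hit_rate=0.90, fa_rate=0.70))  # heavy overclaiming, lower accuracy
```
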
- The Effects of Rating Designs on Rater Classification Accuracy and Rater Measurement Precision in Large-Scale Mixed-Format Assessments
  Authors: Wenjing Guo, Stefanie A. Wind
  Abstract: In standalone performance assessments, researchers have explored the influence of different rating designs on the sensitivity of latent trait model indicators to different rater effects, as well as the impacts of different rating designs on student achievement estimates. However, the literature provides little guidance on the degree to which different rating designs might affect rater classification accuracy (severe/lenient) and rater measurement precision in both standalone performance assessments and mixed-format assessments. Using results from an analysis of National Assessment of Educational Progress (NAEP) data, we conducted simulation studies to systematically explore the impacts of different rating designs on rater measurement precision and rater classification accuracy (severe/lenient) in mixed-format assessments. The results suggest that the complete rating design produced the highest rater classification accuracy and greatest rater measurement precision, followed by the multiple-choice (MC) + spiral link design and the MC link design. Considering that complete rating designs are not practical in most testing situations, the MC + spiral link design may be a useful choice because it balances cost and performance. We consider the implications of our findings for research and practice.
  Citation: Applied Psychological Measurement, Ahead of Print. First page: 91. PubDate: 2023-01-12T11:34:42Z. DOI: 10.1177/01466216231151705
- A Comparison of Confirmatory Factor Analysis and Network Models for Measurement Invariance Assessment
  Authors: W. Holmes Finch, Brian F. French, Alicia Hazelwood
  Abstract: Social science research is heavily dependent on the use of standardized assessments of a variety of phenomena, such as mood, executive functioning, and cognitive ability. An important assumption when using these instruments is that they perform similarly for all members of the population. When this assumption is violated, the validity evidence of the scores is called into question. The standard approach for assessing the factorial invariance of measures across subgroups within the population is multiple-group confirmatory factor analysis (MGCFA). CFA models typically, but not always, assume that once the latent structure of the model is accounted for, the residual terms for the observed indicators are uncorrelated (local independence). Commonly, correlated residuals are introduced only after a baseline model shows inadequate fit, with modification indices inspected to remedy the fit. An alternative procedure for fitting latent variable models that may be useful when local independence does not hold is based on network models. In particular, the residual network model (RNM) offers promise for fitting latent variable models in the absence of local independence via an alternative search procedure. This simulation study compared the performance of MGCFA and RNM for measurement invariance assessment when local independence is violated and the residual covariances are themselves not invariant. Results revealed that RNM had better Type I error control and higher power than MGCFA when local independence was absent. Implications of the results for statistical practice are discussed.
  Citation: Applied Psychological Measurement, Ahead of Print. First page: 106. PubDate: 2023-01-14T11:55:09Z. DOI: 10.1177/01466216231151700
- Heywood Cases in Unidimensional Factor Models and Item Response Models for Binary Data
  Authors: Selena Wang, Paul De Boeck, Marcel Yotebieng
  Abstract: Heywood cases are known from the linear factor analysis literature as variables with communalities larger than 1.00; in present-day factor models, the problem also shows up as negative residual variances. For binary data, factor models for ordinal data can be applied with either the delta parameterization or the theta parameterization. The former is more common than the latter and can yield Heywood cases when limited-information estimation is used. The same problem shows up as non-convergence in theta-parameterized factor models and as extremely large discriminations in item response theory (IRT) models. In this study, we explain why the same problem appears in different forms depending on the method of analysis. We first discuss the issue using equations and then illustrate our conclusions with a small simulation study in which all three methods, delta- and theta-parameterized ordinal factor models (estimated from polychoric correlations and thresholds) and an IRT model (with full-information estimation), are used to analyze the same datasets. The results generalize across the WLS, WLSMV, and ULS estimators for the factor models for ordinal data. Finally, we analyze real data with the same three approaches. The results of the simulation study and the analysis of real data confirm the theoretical conclusions.
  Citation: Applied Psychological Measurement, Ahead of Print. First page: 141. PubDate: 2023-01-30T02:31:51Z. DOI: 10.1177/01466216231151701
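
The correspondence the authors describe can be seen from the standard relation between a standardized probit loading and the normal-ogive discrimination: with the delta parameterization the residual variance is 1 - lambda^2, and a = lambda / sqrt(1 - lambda^2), so a loading approaching 1 drives the residual variance to 0 and the discrimination toward infinity, while a loading above 1 (a Heywood case) leaves no admissible residual variance. The sketch below simply evaluates that relation (in the probit metric); it is not the authors' simulation.

```python
import numpy as np

for lam in [0.6, 0.9, 0.99, 1.0, 1.05]:
    resid_var = 1.0 - lam**2                  # delta-parameterization residual variance
    if resid_var > 0:
        a = lam / np.sqrt(resid_var)          # normal-ogive (probit) discrimination
        print(f"loading {lam:4.2f}: residual variance {resid_var:6.3f}, discrimination {a:7.2f}")
    else:
        print(f"loading {lam:4.2f}: residual variance {resid_var:6.3f} -> Heywood case, no finite discrimination")
```
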
- Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods
  Authors: Waldir Leôncio, Marie Wiberg, Michela Battauz
  Abstract: Test equating is a statistical procedure to ensure that scores from different test forms can be used interchangeably. Several methodologies are available to perform equating, some based on the Classical Test Theory (CTT) framework and others on the Item Response Theory (IRT) framework. This article compares equating transformations originating from three frameworks: IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The comparisons were made under different data-generating scenarios, including a novel data-generation procedure that allows the simulation of test data without relying on IRT parameters while still providing control over test score properties such as distribution skewness and item difficulty. Our results suggest that IRT methods tend to provide better results than KE even when the data are not generated from IRT processes. KE may still provide satisfactory results if a proper pre-smoothing solution can be found, while also being much faster than the IRT methods. For daily applications, we recommend examining the sensitivity of the results to the equating method, minding the importance of good model fit and of meeting the assumptions of the framework.
  Citation: Applied Psychological Measurement, Ahead of Print. First page: 123. PubDate: 2022-10-04T03:01:31Z. DOI: 10.1177/01466216221124087
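
As context for the comparison above, kernel equating is essentially a smoothed equipercentile transformation. The toy sketch below shows only the unsmoothed equipercentile idea (map a form-X score to the form-Y score with the same percentile rank); it does not implement IRTOSE, KE, or IRTKE, and the score distributions are made up.

```python
import numpy as np

def equipercentile_map(scores_x, scores_y, x_value):
    """Map a score on form X to the form-Y score with the same
    percentile rank (naive, unsmoothed version for illustration)."""
    rank = np.mean(np.asarray(scores_x) <= x_value)   # percentile rank on form X
    return float(np.quantile(scores_y, rank))         # matching quantile on form Y

rng = np.random.default_rng(3)
form_x = rng.binomial(40, 0.55, size=2000)  # form X is slightly easier in this toy setup
form_y = rng.binomial(40, 0.50, size=2000)
print(equipercentile_map(form_x, form_y, x_value=25))  # equated form-Y score for X = 25
```
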
- Targeted Double Scoring of Performance Tasks Using a Decision-Theoretic Approach
  Authors: Sandip Sinharay, Matthew S. Johnson, Wei Wang, Jing Miao
  Abstract: Targeted double scoring, or double scoring of only some (but not all) responses, is used to reduce the burden of scoring performance tasks on several mastery tests (Finkelman, Darby, & Nering, 2008). An approach based on statistical decision theory (e.g., Berger, 1989; Ferguson, 1967; Rudner, 2009) is suggested to evaluate and potentially improve upon the existing strategies for targeted double scoring on mastery tests. An application of the approach to data from an operational mastery test shows that a refinement of the currently used strategy would lead to substantial cost savings.
  Citation: Applied Psychological Measurement, Ahead of Print. First page: 155. PubDate: 2022-09-23T09:05:06Z. DOI: 10.1177/01466216221129271
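
The abstract frames targeted double scoring as a decision problem; the toy rule below illustrates that framing by requesting a second rating only when the expected cost of a pass/fail misclassification near the cut score exceeds a fixed cost per extra rating. The loss values, the normal error model, and the function names are assumptions for illustration, not the authors' procedure.

```python
import numpy as np
from scipy.stats import norm

def expected_misclassification_loss(score, cut, se, loss=10.0):
    """Expected loss from classifying on a single rating: probability that
    the true score falls on the other side of the cut (normal measurement-
    error model) times the loss incurred by a classification error."""
    p_other_side = norm.cdf(cut, loc=score, scale=se) if score >= cut \
        else norm.sf(cut, loc=score, scale=se)
    return loss * p_other_side

def needs_double_scoring(score, cut=70.0, se=3.0, cost_of_second_rating=1.0):
    """Double-score only when the expected misclassification loss
    exceeds the cost of obtaining another rating."""
    return expected_misclassification_loss(score, cut, se) > cost_of_second_rating

for s in [60, 67, 70, 73, 80]:
    print(s, needs_double_scoring(s))   # only scores near the cut are double-scored
```
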