Pages: 942-944
Abstract: Journal of Educational and Behavioral Statistics, Volume 48, Issue 6, Page 942-944, December 2023.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-16T04:53:00Z
DOI: 10.3102/10769986231214154
Issue No: Vol. 48, No. 6 (2023)

Authors: George Leckie, Richard Parker, Harvey Goldstein, Kate Tilling
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. School value-added models are widely applied to study, monitor, and hold schools to account for school differences in student learning. The traditional model is a mixed-effects linear regression of student current achievement on student prior achievement, background characteristics, and a school random intercept effect. The latter is referred to as the school value-added score and measures the mean student covariate-adjusted achievement in each school. In this article, we argue that further insights may be gained by additionally studying the variance in this quantity in each school. These include the ability to identify both individual schools and school types that exhibit unusually high or low variability in student achievement, even after accounting for differences in student intakes. We explore and illustrate how this can be done via fitting mixed-effects location scale versions of the traditional school value-added model. We discuss the implications of our work for research and school accountability systems.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-28T05:49:32Z
DOI: 10.3102/10769986231210808

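A minimal sketch of the traditional value-added model the abstract starts from: a random-intercept regression of current on prior achievement, fit with statsmodels. Column names and the data file are hypothetical; the location scale extension (a second school random effect on the log residual variance) needs specialized software and is not shown.

```python
# Sketch of the traditional school value-added model (not the authors' code).
# Assumes a hypothetical student-level file with the columns used below.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # one row per student (hypothetical)

# Current achievement on prior achievement and background characteristics,
# with a random intercept per school: the school value-added score.
model = smf.mixedlm("current_score ~ prior_score + C(sex) + fsm",
                    data=df, groups=df["school_id"])
fit = model.fit(reml=True)
print(fit.summary())

# Value-added scores are the predicted (empirical Bayes) school intercepts.
value_added = fit.random_effects  # dict: school_id -> estimated intercept
```
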
Authors: Jordan M. Wheeler, Allan S. Cohen, Shiyu Wang
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming more common in educational measurement research as a method for analyzing students’ responses to constructed-response items. Two popular topic models are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). LSA uses linear algebra techniques, whereas LDA uses an assumed statistical model and generative process. In educational measurement, LSA is often used in algorithmic scoring of essays due to its high reliability and agreement with human raters. LDA is often used as a supplemental analysis to gain additional information about students, such as their thinking and reasoning. This article reviews and compares the LSA and LDA topic models. This article also introduces a methodology for comparing the semantic spaces obtained by the two models and uses a simulation study to investigate their similarities.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-28T05:46:02Z
DOI: 10.3102/10769986231209446

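A short illustration of the contrast the abstract draws, using scikit-learn on a few invented constructed responses: LSA as a truncated SVD of a weighted document-term matrix versus LDA as a generative topic model. This is a toy sketch, not the authors' simulation design.

```python
# LSA vs. LDA on toy constructed responses (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

responses = ["gravity pulls the ball down",
             "the ball falls because of gravity",
             "plants need sunlight to grow",
             "sunlight helps the plant make food"]

# LSA: linear-algebraic decomposition of a (TF-IDF weighted) term matrix.
tfidf = TfidfVectorizer().fit_transform(responses)
lsa = TruncatedSVD(n_components=2, random_state=0).fit(tfidf)
lsa_space = lsa.transform(tfidf)      # documents in the latent semantic space

# LDA: documents as mixtures of topics, topics as distributions over words.
counts = CountVectorizer().fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
lda_space = lda.transform(counts)     # per-document topic proportions
```
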
Authors: Francesco Innocenti, Math J. J. M. Candel, Frans E. S. Tan, Gerard J. P. van Breukelen
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based approach must be adopted for at least two reasons: (1) to take into account the correlations between the measures of the same subject, in order to test certain scientific hypotheses and to reduce misclassification of subjects in clinical practice, and (2) to reduce the number of significance tests involved in selecting predictors for the purpose of norming, thus preventing the inflation of the type I error rate. A new multivariate regression-based approach is proposed that combines all measures for an individual through the Mahalanobis distance, thus providing an indicator of the individual’s overall performance. Furthermore, optimal designs for the normative study are derived under five multivariate polynomial regression models, assuming multivariate normality and homoscedasticity of the residuals, and efficient robust designs are presented in case of uncertainty about the correct model for the analysis of the normative sample. Sample size calculation formulas are provided for the new Mahalanobis distance-based approach. The results are illustrated with data from the Maastricht Aging Study (MAAS).
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-23T06:07:34Z
DOI: 10.3102/10769986231210807

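A minimal numerical sketch of the core idea, under assumptions of our own (simulated normative data, age and sex as predictors): regress each measure on the predictors, form an individual's residual vector, and summarize it with a Mahalanobis distance converted to a chi-square percentile. This is not the authors' implementation or design machinery.

```python
# Mahalanobis-distance-based norming sketch on simulated data (illustrative).
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p = 500, 3                                   # normative sample size, number of measures
X = sm.add_constant(np.column_stack([rng.uniform(20, 80, n),   # age
                                     rng.integers(0, 2, n)]))  # sex
Y = X @ rng.normal(size=(3, p)) + rng.normal(size=(n, p))       # p test scores per person

B = np.linalg.lstsq(X, Y, rcond=None)[0]        # multivariate regression coefficients
resid = Y - X @ B
S = np.cov(resid, rowvar=False)                 # residual covariance across measures

def overall_percentile(x_new, y_new):
    """Mahalanobis distance of a new person's residuals -> chi-square percentile."""
    e = y_new - x_new @ B
    d2 = e @ np.linalg.solve(S, e)
    return chi2.cdf(d2, df=p)                   # large values = unusual overall profile

print(overall_percentile(np.array([1.0, 45.0, 1.0]), Y[0]))
```
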
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-17T12:09:26Z
DOI: 10.3102/10769986231207878

Authors: Reagan Mozer, Luke Miratrix, Jackie Eunjung Relyea, James S. Kim
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time- and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one’s understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. The purpose of this article is to provide a pipeline for using machine-based text analytic and data mining tools to augment traditional text-based impact analysis by analyzing impacts across an array of automatically generated text features. In this way, we can explore what an overall impact signifies in terms of how the text has evolved due to treatment. Through a case study based on a recent field trial in education, we show that machine learning can indeed enrich experimental evaluations of text by providing a more comprehensive and fine-grained picture of the mechanisms that lead to stronger argumentative writing in a first- and second-grade content literacy intervention. Relying exclusively on human scoring, by contrast, is a lost opportunity. Overall, the workflow and analytical strategy we describe can serve as a template for researchers interested in performing their own experimental evaluations of text.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-09T06:01:39Z
DOI: 10.3102/10769986231207886

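A toy sketch of the general strategy (not the authors' pipeline): extract machine-generated text features from each document, run one treatment-control comparison per feature, and correct for multiplicity. The documents and assignment vector are invented; real applications would use many documents and richer features.

```python
# Feature-by-feature impact analysis on toy essay data (illustrative only).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

essays = ["the plant died because the soil was dry",
          "plants need water and sunlight",
          "the seed grew because it was watered",
          "animals eat plants"]
treated = np.array([1, 0, 1, 0])                 # hypothetical assignment

# Automatically generated features: word counts plus essay length.
X = CountVectorizer().fit_transform(essays).toarray()
features = np.column_stack([X, [len(e.split()) for e in essays]])

# One impact test per feature, then control the false discovery rate.
pvals = [ttest_ind(f[treated == 1], f[treated == 0]).pvalue for f in features.T]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```
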
Authors: Wim J. van der Linden, Luping Niu, Seung W. Choi
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint distribution of their abilities. The presentation of the model is followed by an optimized MCMC algorithm to update the posterior distribution of each of its ability parameters, select the items to Bayesian optimality, and adaptively move from one subtest to the next. Thanks to extremely rapid convergence of the Markov chain and simple posterior calculations, the algorithm can be used in real-world applications without any noticeable latency. Finally, an empirical study with a battery of short diagnostic subtests is shown to yield score accuracies close to traditional one-level adaptive testing with subtests of double lengths.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-07T06:41:54Z
DOI: 10.3102/10769986231209447

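For readers unfamiliar with the kind of posterior updating an adaptive battery performs, here is a toy random-walk Metropolis update of a single ability under a 2PL model with a normal prior. It is a generic sketch with invented item parameters, not the article's optimized two-level algorithm.

```python
# Toy Metropolis update of one ability posterior (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
a = np.array([1.2, 0.8, 1.5])        # discriminations of administered items
b = np.array([-0.5, 0.3, 1.0])       # difficulties
x = np.array([1, 1, 0])              # observed responses

def log_post(theta, mu=0.0, sigma=1.0):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return loglik - 0.5 * ((theta - mu) / sigma) ** 2

theta, draws = 0.0, []
for _ in range(2000):
    prop = theta + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    draws.append(theta)

print(np.mean(draws[500:]), np.std(draws[500:]))   # posterior mean and SD
```
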
Authors: Joakim Wallmark, James O. Ramsay, Juan Li, Marie Wiberg
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker’s attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-11-07T06:39:37Z
DOI: 10.3102/10769986231207879

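For reference, a short sketch of the parametric comparison model named in the abstract: category response probabilities under the generalized partial credit (GPC) model, with invented parameter values.

```python
# GPC category probabilities (illustrative parameter values).
import numpy as np

def gpc_probs(theta, a, b):
    """P(X = k | theta) for k = 0..m, discrimination a, step parameters b (length m)."""
    steps = np.concatenate([[0.0], np.cumsum(a * (theta - b))])  # cumulative numerators
    num = np.exp(steps)
    return num / num.sum()

print(gpc_probs(theta=0.5, a=1.2, b=np.array([-1.0, 0.0, 1.0])))
```
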
Authors: Steven Andrew Culpepper, Gongjun Xu
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-10-26T02:55:51Z
DOI: 10.3102/10769986231210002

Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-10-13T03:52:08Z
DOI: 10.3102/10769986231204871

Authors: Mark L. Davison, Hao Jia, Ernest C. Davenport
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Researchers examine contrasts between analysis of variance (ANOVA) effects but seldom contrasts between regression coefficients, even though regression coefficients generalize ANOVA effects. Regression weight contrasts can be analyzed by reparameterizing the linear model. Two pairwise contrast models are developed for the study of qualitative differences among predictors. One leads to tests of null hypotheses that the regression weight for a reference predictor equals each of the other weights. The second involves ordered predictors and null hypotheses that the weight for a predictor equals that for the variables just above or below in the ordering. As illustration, qualitative differences in high school math course content are related to math achievement. The models facilitate the study of qualitative differences among predictors and the allocation of resources. They also readily generalize to moderated, hierarchical, and generalized linear forms.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-10-13T01:21:16Z
DOI: 10.3102/10769986231200155

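A small worked sketch of the reparameterization idea with simulated data (our own illustration, not the authors' models): if y = b1*x1 + b2*x2 + b3*x3 and x1 is the reference predictor, then regressing y on z = x1 + x2 + x3 together with x2 and x3 makes the coefficients of x2 and x3 equal to the contrasts b2 - b1 and b3 - b1, so their t tests are tests of those contrasts.

```python
# Reference-predictor reparameterization on simulated data (illustrative).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=["x1", "x2", "x3"])
df["y"] = 0.5 * df.x1 + 0.8 * df.x2 + 0.5 * df.x3 + rng.normal(size=300)

df["z"] = df.x1 + df.x2 + df.x3            # sum of predictors, reference = x1
fit = smf.ols("y ~ z + x2 + x3", data=df).fit()
print(fit.summary())   # coefficients of x2 and x3 estimate b2 - b1 and b3 - b1
```
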
Authors: Junhuan Wei, Liufen Luo, Yan Cai, Dongbo Tu
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Response times (RTs) facilitate the quantification of underlying cognitive processes in problem-solving behavior. To provide more comprehensive diagnostic feedback on strategy selection and attribute profiles with a multistrategy cognitive diagnosis model (CDM) and to utilize the additional information in item RTs, this study develops a multistrategy cognitive diagnosis modeling framework combined with RTs. The proposed model integrates individual response accuracy and RT into a unified framework to define strategy selection, bringing it closer to the individual’s strategy selection process. Simulation studies demonstrated that the proposed model had reasonable parameter recovery and attribute classification accuracy and outperformed existing multistrategy and single-strategy CDMs. Empirical results further illustrated the practical application and the advantages of the proposed model.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-10-03T06:59:18Z
DOI: 10.3102/10769986231200469

Authors: Jyun-Hong Chen, Hsiu-Yi Chao
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. To solve the attenuation paradox in computerized adaptive testing (CAT), this study proposes an item selection method, the integer programming approach based on real-time test data (IPRD), to improve test efficiency. The IPRD method turns information about the population ability distribution, obtained from real-time test data, into feasible test constraints and uses integer programming to assemble shadow tests for item selection, thereby preventing the attenuation paradox. A simulation study was conducted to thoroughly investigate IPRD performance. The results indicate that the IPRD method can efficiently improve CAT performance in terms of the precision of trait estimation and satisfaction of all required test constraints, especially for conditions with stringent exposure control.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-09-26T06:52:02Z
DOI: 10.3102/10769986231197666

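A toy sketch of shadow-test assembly by integer programming, the building block the IPRD method relies on (this is not the IPRD algorithm itself): select a fixed-length test that maximizes information at the current ability estimate subject to a content constraint. Item pool values are invented; requires SciPy 1.9+ for `milp`.

```python
# Shadow-test assembly by 0-1 integer programming (illustrative pool).
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(3)
n_items, test_len = 50, 10
info = rng.uniform(0.1, 1.0, n_items)         # item information at current theta
content = rng.integers(0, 2, n_items)         # 1 = item belongs to content area A

constraints = [LinearConstraint(np.ones(n_items), test_len, test_len),  # test length
               LinearConstraint(content, 4, 6)]                         # 4-6 area-A items

res = milp(c=-info,                            # milp minimizes, so negate information
           constraints=constraints,
           integrality=np.ones(n_items),
           bounds=Bounds(0, 1))
selected = np.flatnonzero(res.x > 0.5)
print(selected)
```
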
Authors: Yannick Rothacher, Carolin Strobl
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Random forests are a nonparametric machine learning method, which is currently gaining popularity in the behavioral sciences. Despite random forests’ potential advantages over more conventional statistical methods, a remaining question is how reliably informative predictor variables can be identified by means of random forests. The present study aims at giving a comprehensible introduction to the topic of variable selection with random forests and providing an overview of the currently proposed selection methods. Using simulation studies, the variable selection methods are examined regarding their statistical properties, and comparisons between their performances and the performance of a conventional linear model are drawn. Advantages and disadvantages of the examined methods are discussed, and practical recommendations for the use of random forests for variable selection are given.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-09-06T12:04:17Z
DOI: 10.3102/10769986231193327

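A minimal sketch of one common selection strategy in this literature, on simulated data of our own: fit a random forest, compute permutation importances, and keep predictors whose importance clearly exceeds zero. The screening rule shown is a simple heuristic, not one of the specific methods the article compares.

```python
# Random forest variable screening via permutation importance (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 10))                     # 10 candidate predictors
y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=400)   # only two are truly informative

forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
imp = permutation_importance(forest, X, y, n_repeats=20, random_state=0)

# Heuristic rule: keep predictors whose mean importance exceeds twice its SD.
keep = np.flatnonzero(imp.importances_mean > 2 * imp.importances_std)
print(keep)
```
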
Authors: Nana Kim, Daniel M. Bolt
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Some previous studies suggest that response times (RTs) on rating scale items can be informative about the content trait, but a more recent study suggests they may also be reflective of response styles. The latter result raises questions about the possible consideration of RTs for content trait estimation, as response styles are generally viewed as nuisance dimensions in the measurement of noncognitive constructs. In this article, we extend previous work exploring the simultaneous relevance of content and response style traits on RTs in self-report rating scale measurement by examining psychometric differences related to fast versus slow item responses. Following a parallel methodology applied with cognitive measures, we provide empirical illustrations of how RTs appear to be simultaneously reflective of both content and response style traits. Our results demonstrate that respondents may exhibit different response behaviors for fast versus slow responses and that both the content trait and response styles are relevant to such heterogeneity. These findings suggest that using RTs as a basis for improving the estimation of noncognitive constructs likely requires simultaneously attending to the effects of response styles.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-08-31T05:59:52Z
DOI: 10.3102/10769986231195260

Authors: Sijia Huang, Li Cai
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. The cross-classified data structure is ubiquitous in education, psychology, and health outcome sciences. In these areas, assessment instruments that are made up of multiple items are frequently used to measure latent constructs. The presence of both the cross-classified structure and multivariate categorical outcomes leads to the so-called item-level data with cross-classified structure. An example of such data structure is the routinely collected student evaluation of teaching (SET) data. Motivated by the lack of research on multilevel IRT modeling with crossed random effects and the need for an approach that can properly handle SET data, this study proposed a cross-classified IRT model, which takes into account both the cross-classified data structure and properties of multiple items in an assessment instrument. A new variant of the Metropolis–Hastings Robbins–Monro (MH-RM) algorithm was introduced to address the computational complexities in estimating the proposed model. A preliminary simulation study was conducted to evaluate the performance of the algorithm for fitting the proposed model to data. The results indicated that model parameters were well recovered. The proposed model was also applied to SET data collected at a large public university to answer empirical research questions. Limitations and future research directions were discussed.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-08-25T06:20:00Z
DOI: 10.3102/10769986231193351

Authors: Albert Yu, Jeffrey A. Douglas
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. We propose a new item response theory growth model with item-specific learning parameters, or ISLP, and two variations of this model. In the ISLP model, either items or blocks of items have their own learning parameters. This model may be used to improve the efficiency of learning in a formative assessment. We show ways that the ISLP model’s learning parameters can be estimated in simulation using Markov chain Monte Carlo (MCMC), demonstrate a way that the model could be used in the context of adaptive item selection to increase the rate of learning, and estimate the learning parameters in an empirical data analysis using the ISLP. In the simulation studies, the one-parameter logistic model was used as the measurement model to generate random response data with various test lengths and sample sizes. Ability growth was modeled with a few variations of the ISLP model, and it was verified that the parameters were accurately recovered. Second, we generated data using the linear logistic test model with known Q-matrix structure for the item difficulties. Using a two-step procedure gave very comparable results for the estimation of the learning parameters even when item difficulties were unknown. The potential benefit of using an adaptive selection method in conjunction with the ISLP model was shown by comparing total improvement in the examinees’ ability parameter to two other methods of item selection that do not utilize this growth model. When the ISLP model held, adaptive item selection consistently led to larger improvements over the other methods. A real data application of the ISLP was given to illustrate its use in a spatial reasoning study designed to promote learning. In this study, interventions were given after each block of ten items to increase ability. Learning parameters were estimated using MCMC.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-08-21T06:40:37Z
DOI: 10.3102/10769986231193096

Authors: David Arthur, Hua-Hua Chang
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Cognitive diagnosis models (CDMs) are assessment tools that provide valuable formative feedback about skill mastery at both the individual and population levels. Recent work has explored the performance of CDMs with small sample sizes but has focused solely on the estimates of individual profiles. The current research focuses on obtaining accurate estimates of skill mastery at the population level. We introduce a novel algorithm (bagging algorithm for deterministic inputs noisy “and” gate) that is inspired by ensemble learning methods in the machine learning literature and produces more stable and accurate estimates of the population skill mastery profile distribution for small sample sizes. Using both simulated data and real data from the Examination for the Certificate of Proficiency in English, we demonstrate that the proposed method outperforms other methods on several metrics in a wide variety of scenarios.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-08-07T07:37:51Z
DOI: 10.3102/10769986231188442

Authors: Clemens Draxler, Andreas Kurz, Can Gürer, Jan Philipp Nolte
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. A modified and improved inductive inferential approach to evaluate item discriminations in a conditional maximum likelihood and Rasch modeling framework is suggested. The new approach involves the derivation of four hypothesis tests. It implies a linear restriction of the assumed set of probability distributions in the classical approach that represents scenarios of different item discriminations in a straightforward and efficient manner. Its improvement is discussed, compared to classical procedures (tests and information criteria), and illustrated in Monte Carlo experiments as well as real data examples from educational research. The results show an improvement in power of up to 0.3 for the modified tests.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-07-20T05:37:03Z
DOI: 10.3102/10769986231183335

Authors: Gerhard Tutz, Pascal Jordan
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. A general framework of latent trait item response models for continuous responses is given. In contrast to classical test theory (CTT) models, which traditionally distinguish between true scores and error scores, the responses are clearly linked to latent traits. It is shown that CTT models can be derived as special cases, but the model class is much wider. It provides, in particular, appropriate modeling of responses that are restricted in some way, for example, if responses are positive or are restricted to an interval. Restrictions of this sort are easily incorporated in the modeling framework. Restriction to an interval is typically ignored in common models, yielding inappropriate models, for example, when modeling Likert-type data. The model also extends common response time models, which can be treated as special cases. The properties of the model class are derived and the role of the total score is investigated, which leads to a modified total score. Several applications illustrate the use of the model including an example, in which covariates that may modify the response are taken into account.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-07-20T05:37:03Z
DOI: 10.3102/10769986231184147

Authors: Jochen Ranger, Christoph König, Benjamin W. Domingue, Jörg-Tobias Kuhn, Andreas Frey
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. In the existing multidimensional extensions of the log-normal response time (LNRT) model, the log response times are decomposed into a linear combination of several latent traits. These models are fully compensatory as low levels on traits can be counterbalanced by high levels on other traits. We propose an alternative multidimensional extension of the LNRT model by assuming that the response times can be decomposed into two response time components. Each response time component is generated by a one-dimensional LNRT model with a different latent trait. As the response time components—but not the traits—are related additively, the model is partially compensatory. In a simulation study, we investigate the recovery of the model’s parameters. We also investigate whether the fully and the partially compensatory LNRT model can be distinguished empirically. Findings suggest that parameter recovery is good and that the two models can be distinctly identified under certain conditions. The utility of the model in practice is demonstrated with an empirical application. In the empirical application, the partially compensatory model fits better than the fully compensatory model.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-07-11T09:09:46Z
DOI: 10.3102/10769986231184153

Authors: Sean Joo, Montserrat Valdivia, Dubravka Svetina Valdivia, Leslie Rutkowski
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Evaluating scale comparability in international large-scale assessments depends on measurement invariance (MI). The root mean square deviation (RMSD) is a standard method for establishing MI in several programs, such as the Programme for International Student Assessment and the Programme for the International Assessment of Adult Competencies. Previous research showed that the RMSD was unable to detect departures from MI when the latent trait distribution was far from item difficulty. In this study, we developed three alternative approaches to the original RMSD: equal, item information, and b-norm weighted RMSDs. Specifically, we considered the item-centered normalized weight distributions to compute the item characteristic curve difference in the RMSD procedure more efficiently. We further compared all methods’ performance via a simulation study; the item information and b-norm weighted RMSDs showed the most promising results. An empirical example is demonstrated, and implications for researchers are discussed.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-07-05T05:45:29Z
DOI: 10.3102/10769986231183326

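A toy illustration of the weighting idea behind the RMSD (our own simplified notation, not the article's exact definitions): the group-specific and pooled item characteristic curves are compared on a theta grid, and the squared differences are averaged under a chosen weight distribution before taking the square root.

```python
# Weighted RMSD between two item characteristic curves (illustrative 2PL item).
import numpy as np
from scipy.stats import norm

theta = np.linspace(-4, 4, 81)

def icc(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

p_int = icc(theta, a=1.0, b=0.5)        # pooled (international) parameters
p_grp = icc(theta, a=1.0, b=0.9)        # group-specific curve

def rmsd(weights):
    w = weights / weights.sum()
    return np.sqrt(np.sum(w * (p_grp - p_int) ** 2))

print(rmsd(norm.pdf(theta, 0, 1)))       # population-density weights (original idea)
print(rmsd(np.ones_like(theta)))         # equal weights over the grid
print(rmsd(norm.pdf(theta, 0.5, 1)))     # item-centered weights, in the spirit of b-norm
```
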
Authors: Justin L. Kern
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Given the frequent presence of slipping and guessing in item responses, models for the inclusion of their effects are highly important. Unfortunately, the most common model for their inclusion, the four-parameter item response theory model, potentially has severe deficiencies related to its possible unidentifiability. With this issue in mind, the dyad four-parameter normal ogive (Dyad-4PNO) model was developed. This model allows for slipping and guessing effects by including binary augmented variables—each indicated by two items whose probabilities are determined by slipping and guessing parameters—which are subsequently related to a continuous latent trait through a two-parameter model. Furthermore, the Dyad-4PNO assumes uncertainty as to which items are paired on each augmented variable. In this way, the model is inherently exploratory. In the current article, the new model, called the Set-4PNO model, is an extension of the Dyad-4PNO in two ways. First, the new model allows for more than two items per augmented variable. Second, these item sets are assumed to be fixed, that is, the model is confirmatory. This article discusses this extension and introduces a Gibbs sampling algorithm to estimate the model. A Monte Carlo simulation study shows the efficacy of the algorithm at estimating the model parameters. A real data example shows that this extension may be viable in practice, with the data fitting a more general Set-4PNO model (i.e., more than two items per augmented variable) better than the Dyad-4PNO, 2PNO, 3PNO, and 4PNO models.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-07-03T12:45:55Z
DOI: 10.3102/10769986231181587

Authors: Joemari Olea, Kevin Carl Santos
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Although the generalized deterministic inputs, noisy “and” gate model (G-DINA; de la Torre, 2011) is a general cognitive diagnosis model (CDM), it does not account for the heterogeneity that is rooted in existing latent groups in the population of examinees. To address this, this study proposes the mixture G-DINA model, a CDM that incorporates the G-DINA model within the finite mixture modeling framework. An expectation–maximization algorithm is developed to estimate the mixture G-DINA model. To determine the viability of the proposed model, an extensive simulation study is conducted to examine the parameter recovery performance, model fit, and correct classification rates. Responses to a reading comprehension assessment were analyzed to further demonstrate the capability of the proposed model.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-06-15T06:27:50Z
DOI: 10.3102/10769986231176012

Authors: Adrian Quintero, Emmanuel Lesaffre, Geert Verbeke
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Bayesian methods to infer model dimensionality in factor analysis generally assume a lower triangular structure for the factor loadings matrix. Consequently, the ordering of the outcomes influences the results. Therefore, we propose a method to infer model dimensionality without imposing any prior restriction on the loadings matrix. Our approach considers a relatively large number of factors and includes auxiliary multiplicative parameters, which may render null the unnecessary columns in the loadings matrix. The underlying dimensionality is then inferred based on the number of nonnull columns in the factor loadings matrix, and the model parameters are estimated with a postprocessing scheme. The advantages of the method in selecting the correct dimensionality are illustrated via simulations and using real data sets.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-06-14T05:23:43Z
DOI: 10.3102/10769986231176023

Authors: Zachary K. Collier, Minji Kong, Olushola Soyoye, Kamal Chawla, Ann M. Aviles, Yasser Payne
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Asymmetric Likert-type items in research studies can present several challenges in data analysis, particularly concerning missing data. These items are often characterized by a skewed scaling, where either there is no neutral response option or an unequal number of possible positive and negative responses. The use of conventional techniques, such as discriminant analysis or logistic regression imputation, for handling missing data in asymmetric items may result in significant bias. It is also recommended to exercise caution when employing alternative strategies, such as listwise deletion or mean imputation, because these methods rely on assumptions that are often unrealistic in surveys and rating scales. This article explores the potential of implementing a deep learning-based imputation method. Additionally, we provide access to deep learning-based imputation to a broader group of researchers without requiring advanced machine learning training. We apply the methodology to the Wilmington Street Participatory Action Research Health Project.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-06-14T05:21:52Z
DOI: 10.3102/10769986231176014

Authors: Yinghan Chen, Shiyu Wang
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Attribute hierarchy, the underlying prerequisite relationship among attributes, plays an important role in applying cognitive diagnosis models (CDMs) for designing efficient cognitive diagnostic assessments. However, there are limited statistical tools to directly estimate attribute hierarchy from response data. In this study, we proposed a Bayesian formulation for attribute hierarchy within the CDM framework and developed an efficient Metropolis within Gibbs algorithm to estimate the underlying hierarchy along with the specified CDM parameters. Our proposed estimation method is flexible and can be adapted to a general class of CDMs. We demonstrated our proposed method via a simulation study, and the results from which show that the proposed method can fully recover or estimate at least a subgraph of the underlying structure across various conditions under a specified CDM model. The real data application indicates the potential of learning attribute structure from data using our algorithm and validating the existing attribute hierarchy specified by content experts.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-06-14T05:19:33Z
DOI: 10.3102/10769986231174918

Authors: Maria Bolsinova, Jesper Tijmstra, Leslie Rutkowski, David Rutkowski
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Profile analysis is one of the main tools for studying whether differential item functioning can be related to specific features of test items. While relevant, profile analysis in its current form has two restrictions that limit its usefulness in practice: It assumes that all test items have equal discrimination parameters, and it does not test whether conclusions about the item-feature effects generalize outside of the considered set of items. This article addresses both of these limitations, by generalizing profile analysis to work under the two-parameter logistic model and by proposing a permutation test that allows for generalizable conclusions about item-feature effects. The developed methods are evaluated in a simulation study and illustrated using Programme for International Student Assessment 2015 Science data.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-06-13T05:10:56Z
DOI: 10.3102/10769986231174927

Authors: Esther Ulitzsch, Steffi Pohl, Lale Khorramdel, Ulf Kroehne, Matthias von Davier
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Questionnaires are by far the most common tool for measuring noncognitive constructs in psychology and educational sciences. Response bias may pose an additional source of variation between respondents that threatens validity of conclusions drawn from questionnaire data. We present a mixture modeling approach that leverages response time data from computer-administered questionnaires for the joint identification and modeling of two commonly encountered types of response bias that, so far, have only been modeled separately: careless and insufficient effort responding, and response styles (RS) in attentive answering. Using empirical data from the Programme for International Student Assessment 2015 background questionnaire and the case of extreme RS as an example, we illustrate how the proposed approach supports gaining a more nuanced understanding of response behavior as well as how neglecting either type of response bias may impact conclusions on respondents’ content trait levels as well as on their displayed response behavior. We further contrast the proposed approach against a more heuristic two-step procedure that first eliminates presumed careless respondents from the data and subsequently applies model-based approaches accommodating RS. To investigate the trustworthiness of results obtained in the empirical application, we conduct a parameter recovery study.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-05-29T04:50:59Z
DOI: 10.3102/10769986231173607

Authors: Kazuhiro Yamaguchi
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Understanding whether or not different types of students master various attributes can aid future learning remediation. In this study, two-level diagnostic classification models (DCMs) were developed to represent the probabilistic relationship between external latent classes and attribute mastery patterns. Furthermore, variational Bayesian (VB) inference and Gibbs sampling Markov chain Monte Carlo methods were developed for parameter estimation of the two-level DCMs. The results of a parameter recovery simulation study show that both techniques appropriately recovered the true parameters; Gibbs sampling in particular was slightly more accurate than VB, whereas VB performed estimation much faster than Gibbs sampling. The two-level DCMs with the proposed Bayesian estimation methods were further applied to fourth-grade data obtained from the Trends in International Mathematics and Science Study 2007 and indicated that mathematical activities in the classroom could be organized into four latent classes, with each latent class connected to different attribute mastery patterns. This information can be employed in educational intervention to focus on specific latent classes and elucidate attribute patterns.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-05-26T04:41:06Z
DOI: 10.3102/10769986231173594

Authors: Youmi Suk, Kyung T. Han
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. As algorithmic decision making is increasingly deployed in every walk of life, many researchers have raised concerns about fairness-related bias from such algorithms. But there is little research on harnessing psychometric methods to uncover potential discriminatory bias inside decision-making algorithms. The main goal of this article is to propose a new framework for algorithmic fairness based on differential item functioning (DIF), which has been commonly used to measure item fairness in psychometrics. Our fairness notion, which we call differential algorithmic functioning (DAF), is defined based on three pieces of information: a decision variable, a “fair” variable, and a protected variable such as race or gender. Under the DAF framework, an algorithm can exhibit uniform DAF, nonuniform DAF, or neither (i.e., non-DAF). For detecting DAF, we provide modifications of well-established DIF methods: Mantel–Haenszel test, logistic regression, and residual-based DIF. We demonstrate our framework through a real dataset concerning decision-making algorithms for grade retention in K–12 education in the United States.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-05-11T05:35:15Z
DOI: 10.3102/10769986231171711

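A minimal sketch of the logistic-regression flavor of this idea on simulated data (our own toy example, not the authors' modified procedures): conditioning on the "fair" variable, a main effect of the protected variable signals uniform DAF, and its interaction with the fair variable signals nonuniform DAF.

```python
# Logistic-regression-style DAF/DIF check on simulated decisions (illustrative).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({"fair": rng.normal(size=n),
                   "protected": rng.integers(0, 2, n)})
logit = -0.5 + 1.2 * df.fair + 0.4 * df.protected      # built-in uniform DAF
df["decision"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

fit = smf.logit("decision ~ fair + protected + fair:protected", data=df).fit()
print(fit.summary())   # protected: uniform DAF; fair:protected: nonuniform DAF
```
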
Authors: Joshua B. Gilbert, James S. Kim, Luke W. Miratrix
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for addressing heterogeneous treatment effects (HTE) fail to address the HTE that may exist within outcome measures. In this study, we present a novel application of the explanatory item response model (EIRM) for assessing what we term “item-level” HTE (IL-HTE), in which a unique treatment effect is estimated for each item in an assessment. Results from data simulation reveal that when IL-HTE is present but ignored in the model, standard errors can be underestimated and false positive rates can increase. We then apply the EIRM to assess the impact of a literacy intervention focused on promoting transfer in reading comprehension on a digital assessment delivered online to approximately 8,000 third-grade students. We demonstrate that allowing for IL-HTE can reveal treatment effects at the item-level masked by a null average treatment effect, and the EIRM can thus provide fine-grained information for researchers and policymakers on the potentially heterogeneous causal effects of educational interventions.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-05-10T05:30:29Z
DOI: 10.3102/10769986231171710

Authors: Giada Spaccapanico Proietti, Mariagiulia Matteucci, Stefania Mignani, Bernard P. Veldkamp
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Classical automated test assembly (ATA) methods assume fixed and known coefficients for the constraints and the objective function. This hypothesis is not true for the estimates of item response theory parameters, which are crucial elements in classical test assembly models. To account for uncertainty in ATA, we propose a chance-constrained version of the maximin ATA model, which allows maximizing the α-quantile of the sampling distribution of the test information function obtained by applying the bootstrap on the item parameter estimation. A heuristic inspired by the simulated annealing optimization technique is implemented to solve the ATA model. The validity of the proposed approach is empirically demonstrated by a simulation study. The applicability is proven by using the real responses to the Trends in International Mathematics and Science Study (TIMSS) 2015 science test.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-05-10T05:24:09Z
DOI: 10.3102/10769986231169039

Authors: Lei Guo, Wenjie Zhou, Xiao Li
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. The testlet design is very popular in educational and psychological assessments. This article proposes a new cognitive diagnosis model, the multiple-choice cognitive diagnostic testlet (MC-CDT) model for tests using testlets consisting of MC items. The MC-CDT model uses the original examinees’ responses to MC items instead of dichotomously scored data (i.e., correct or incorrect) to retain information from different distractors and thus enhance the MC items’ diagnostic power. The Markov chain Monte Carlo algorithm was adopted to calibrate the model using the WinBUGS software. Then, a thorough simulation study was conducted to evaluate the estimation accuracy for both item and examinee parameters in the MC-CDT model under various conditions. The results showed that the proposed MC-CDT model outperformed the traditional MC cognitive diagnostic model. Specifically, the MC-CDT model fits the testlet data better than the traditional model, while also fitting the data without testlets well. The findings of this empirical study show that the MC-CDT model fits real data better than the traditional model and that it can also provide testlet information.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-05-10T05:18:50Z
DOI: 10.3102/10769986231165622

Authors: Youmi Suk
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Machine learning (ML) methods for causal inference have gained popularity due to their flexibility to predict the outcome model and the propensity score. In this article, we provide a within-group approach for ML-based causal inference methods in order to robustly estimate average treatment effects in multilevel studies when there is cluster-level unmeasured confounding. We focus on one particular ML-based causal inference method based on the targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. Through our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Also, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates due to double robustness with flexible modeling, unlike parametric-based inverse propensity weighting methods. We demonstrate our proposed methods and conduct sensitivity analyses with respect to the number of groups and to individual-level unmeasured confounding in evaluating the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-04-25T12:58:24Z
DOI: 10.3102/10769986231162096

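A minimal sketch of one ingredient named in the abstract, on simulated data of our own: within-group propensity scores from a fixed-effects logistic regression, that is, a treatment model with cluster dummies that absorb cluster-level confounding. The full within-group TMLE procedure is not reproduced here.

```python
# Fixed-effects logistic regression propensity scores (illustrative simulation).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 3000
df = pd.DataFrame({"cluster": rng.integers(0, 30, n),
                   "x": rng.normal(size=n)})
# Treatment depends on x and on an unmeasured cluster-level confounder u.
u = rng.normal(size=30)[df.cluster]
df["treat"] = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * df.x + u))))

# Cluster dummies absorb the cluster-level confounding in the treatment model.
ps_fit = smf.logit("treat ~ x + C(cluster)", data=df).fit(disp=False)
df["pscore"] = ps_fit.predict(df)
print(df[["treat", "pscore"]].head())
```
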
Authors: Qianru Liang, Jimmy de la Torre, Nancy Law
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. To expand the use of cognitive diagnosis models (CDMs) to longitudinal assessments, this study proposes a bias-corrected three-step estimation approach for latent transition CDMs with covariates by integrating a general CDM and a latent transition model. The proposed method can be used to assess changes in attribute mastery status and attribute profiles and to evaluate the covariate effects on both the initial state and transition probabilities over time using latent (multinomial) logistic regression. Because stepwise approaches generally yield biased estimates, correction for classification error probabilities is considered in this study. The results of the simulation study showed that the proposed method yielded more accurate parameter estimates than the uncorrected approach. The use of the proposed method is also illustrated using a set of real data.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-04-25T01:02:58Z
DOI: 10.3102/10769986231163320

Authors: Yan Li, Chao Huang, Jia Liu
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. Cognitive diagnostic computerized adaptive testing (CD-CAT) is a cutting-edge technology in educational measurement that aims to provide feedback on examinees’ strengths and weaknesses while increasing test accuracy and efficiency. To date, most CD-CAT studies have made methodological progress under simulated conditions, but few have applied CD-CAT to real educational assessment. The present study developed a Chinese reading comprehension item bank tapping into six validated reading attributes, with 195 items calibrated using data from 28,485 second to sixth graders and item-level cognitive diagnostic models (CDMs). The measurement precision and efficiency of the reading CD-CAT system were compared and optimized in terms of crucial CD-CAT settings, including the CDMs for calibration, item selection methods, and termination rules. The study identified seven dominant reading attribute mastery profiles that stably exist across grades. These major clusters of readers, and how they vary across grades, point to reading development mechanisms that advance and deepen step by step at the primary school level. Results also suggested that compared to traditional linear tests, CD-CAT significantly improved the classification accuracy without imposing much testing burden. These findings may elucidate the multifaceted nature and possible learning paths of reading and raise the question of whether CD-CAT is applicable to other educational domains where formative and fine-grained feedback is needed but test time is limited.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-04-20T07:04:58Z
DOI: 10.3102/10769986231160668

Authors: Mark L. Davison, David J. Weiss, Joseph N. DeWeese, Ozge Ersan, Gina Biancarosa, Patrick C. Kennedy
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. A tree model for diagnostic educational testing is described along with Monte Carlo simulations designed to evaluate measurement accuracy based on the model. The model is implemented in an assessment of inferential reading comprehension, the Multiple-Choice Online Causal Comprehension Assessment (MOCCA), through a sequential, multidimensional, computerized adaptive testing (CAT) strategy. Assessment of the first dimension, reading comprehension (RC), is based on the three-parameter logistic model. For diagnostic and intervention purposes, the second dimension, called process propensity (PP), is used to classify struggling students based on their pattern of incorrect responses. In the simulation studies, CAT item selection rules and stopping rules were varied to evaluate their effect on measurement accuracy along dimension RC and classification accuracy along dimension PP. For dimension RC, methods that improved accuracy tended to increase test length. For dimension PP, however, item selection and stopping rules increased classification accuracy without materially increasing test length. A small live-testing pilot study confirmed some of the findings of the simulation studies. Development of the assessment has been guided by psychometric theory, Monte Carlo simulation results, and a theory of instruction and diagnosis.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-04-05T11:10:08Z
DOI: 10.3102/10769986231158301

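A short sketch of the measurement layer described for dimension RC, with an invented item pool: the three-parameter logistic (3PL) response function and maximum-information item selection at a provisional ability estimate. This illustrates standard CAT machinery, not MOCCA's specific selection and stopping rules.

```python
# 3PL item characteristic curve and maximum-information selection (illustrative).
import numpy as np

def p3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def info3pl(theta, a, b, c):
    p = p3pl(theta, a, b, c)
    return a**2 * ((p - c) ** 2 / (1 - c) ** 2) * ((1 - p) / p)

rng = np.random.default_rng(7)
a = rng.uniform(0.8, 2.0, 100)          # discriminations for a toy pool
b = rng.normal(size=100)                # difficulties
c = rng.uniform(0.1, 0.25, 100)         # pseudo-guessing parameters

theta_hat = 0.3                          # provisional ability estimate
next_item = np.argmax(info3pl(theta_hat, a, b, c))
print(next_item)
```
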
Authors: Kun Su, Robert A. Henson
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. This article provides a process for carefully evaluating whether a content domain is suitable for diagnostic classification models (DCMs), optimized steps for constructing a test blueprint for applying DCMs, and a real-life example illustrating this process. The content domains were carefully evaluated using a set of criteria purposely defined to improve the success rate of DCM implementation. Given the domain, the Q-matrix is determined by a simulation-based approach using correct classification rates as criteria. Finally, a physics test based on the final Q-matrix was developed, administered, and analyzed by the authors and subject-matter experts (SMEs).
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-03-31T05:08:55Z
DOI: 10.3102/10769986231159137

Authors: Mark Wilson
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational measurement, but one of the contributions of this article is that it mainly focuses on the latter two levels. Co-ordination of the content across these two levels can be achieved using the concept of a construct map, which articulates the substantive target property at levels of detail that are appropriate for both teacher planning and within-classroom use. This article then describes a statistical model designed to span these two levels and discusses how best to relate this to the macrolevel. Results from a curriculum and instruction development project on the topic of measurement in the elementary school are demonstrated, showing how they are empirically related.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-03-31T05:07:05Z
DOI: 10.3102/10769986231159006

Authors: Pablo Nájera, Francisco J. Abad, Chia-Yi Chiu, Miguel A. Sorrel
Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print. The nonparametric classification (NPC) method has been proven to be a suitable procedure for cognitive diagnostic assessments at a classroom level. However, its nonparametric nature precludes obtaining a model likelihood, hindering the exploration of crucial psychometric aspects such as model fit or reliability. Reporting the reliability and validity of scores is imperative in any applied context. The present study proposes the restricted deterministic input, noisy “and” gate (R-DINA) model, a parametric cognitive diagnosis model based on the NPC method that provides the same attribute profile classifications as the nonparametric method while making it possible to derive a model likelihood and, subsequently, compute fit and reliability indices. The suitability of the new proposal is examined by means of an exhaustive simulation study and a real data illustration. The results show that the R-DINA model properly recovers the posterior probabilities of attribute mastery, thus becoming a suitable alternative for comprehensive small-scale diagnostic assessments.
Citation: Journal of Educational and Behavioral Statistics
PubDate: 2023-03-31T05:05:16Z
DOI: 10.3102/10769986231158829

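For orientation, a toy sketch of the machinery this model family builds on (standard DINA response probabilities and the resulting posterior over attribute profiles for one examinee), not the R-DINA's specific restriction. The Q-matrix, slip/guess values, and responses are invented, and a flat prior over profiles is assumed.

```python
# DINA likelihood and posterior attribute-profile probabilities (illustrative).
import numpy as np
from itertools import product

Q = np.array([[1, 0], [0, 1], [1, 1]])      # 3 items, 2 attributes
slip = np.array([0.10, 0.10, 0.15])
guess = np.array([0.20, 0.20, 0.10])
x = np.array([1, 0, 1])                     # observed responses of one examinee

profiles = np.array(list(product([0, 1], repeat=2)))
post = []
for alpha in profiles:
    eta = np.all(alpha >= Q, axis=1).astype(float)     # all required attributes mastered?
    p = (1 - slip) ** eta * guess ** (1 - eta)          # DINA success probabilities
    post.append(np.prod(p ** x * (1 - p) ** (1 - x)))   # likelihood x flat prior
post = np.array(post) / np.sum(post)
for alpha, w in zip(profiles, post):
    print(alpha, round(float(w), 3))
```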