Authors: Onur Demirkaya, Ummugul Bezirhan, Jinming Zhang
Abstract: Examinees with item preknowledge tend to obtain inflated test scores that undermine test score validity. With the availability of process data collected in computer-based assessments, research on detecting item preknowledge has progressed to using both item scores and response times. Examinees' item revisit patterns can also be utilized as an additional source of information. This study proposes a new statistic for detecting item preknowledge when the compromised items are known, built on the hierarchical speed–accuracy revisits model. By simultaneously evaluating abnormal changes in examinees' latent abilities, speeds, and revisit propensities, the procedure was found to provide greater statistical power and stronger substantive evidence that an examinee had indeed benefited from item preknowledge.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-02-28T01:06:16Z
DOI: 10.3102/10769986231153403
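The basic contrast such statistics exploit is that an examinee who has seen the compromised items tends to score higher and respond faster on them than on the uncompromised items. A toy numpy sketch of that contrast (function and statistic names are invented here; the article's actual statistic embeds this idea in the hierarchical speed–accuracy revisits model):

```python
import numpy as np

def preknowledge_gaps(scores, log_times, compromised):
    """Toy contrast behind preknowledge detection: compare an examinee's
    accuracy and (log) response time on compromised vs. uncompromised
    items.  Illustrative only; not the article's model-based statistic."""
    comp = np.asarray(compromised, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    log_times = np.asarray(log_times, dtype=float)
    accuracy_gap = scores[comp].mean() - scores[~comp].mean()     # > 0 is suspicious
    speed_gap = log_times[comp].mean() - log_times[~comp].mean()  # < 0 is suspicious
    return accuracy_gap, speed_gap
```

A positive accuracy gap together with a negative speed gap on known-compromised items is the pattern the model-based procedure evaluates jointly, along with revisit behavior.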
Authors: Simon Grund, Oliver Lüdtke, Alexander Robitzsch
Abstract: Multiple imputation (MI) is a popular method for handling missing data. In education research, it can be challenging to use MI because the data often have a clustered structure that needs to be accommodated during MI. Although much research has considered applications of MI to hierarchical data, little is known about its use with cross-classified data, in which observations are clustered in multiple higher-level units simultaneously (e.g., schools and neighborhoods, or transitions from primary to secondary schools). In this article, we consider several approaches to MI for cross-classified data (CC-MI), including a novel fully conditional specification approach, a joint modeling approach, and other approaches based on single- and two-level MI. In this context, we clarify the conditions that CC-MI methods need to fulfill to provide a suitable treatment of missing data, and we compare the approaches both from a theoretical perspective and in a simulation study. Finally, we illustrate the use of CC-MI with real data and discuss the implications of our findings for research practice.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-02-17T04:33:55Z
DOI: 10.3102/10769986231151224
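To make the fully conditional specification idea concrete, here is a deliberately minimal single-level chained-equations loop (names assumed for illustration). Proper MI draws each imputation with noise from a posterior predictive distribution, and the article's CC-MI variants add random effects for each clustering factor; this sketch shows only the column-by-column cycling that gives FCS its name:

```python
import numpy as np

def fcs_impute(X, n_iter=10):
    """Minimal FCS sketch: impute each incomplete column from a linear
    regression on the other columns, cycling until the fills stabilise.
    Deterministic single imputation, shown only to illustrate the loop."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):                 # start from column means
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            # regress column j on all other (currently filled) columns
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[~miss[:, j]], X[~miss[:, j], j],
                                       rcond=None)
            X[miss[:, j], j] = A[miss[:, j]] @ beta   # refill from the fit
    return X
```

On a toy dataset where the second column is exactly twice the first, the loop recovers the missing value from the regression structure.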
Authors: Francesco Bartolucci, Fulvia Pennoni, Giorgio Vittadini
Abstract: In order to evaluate the effect of a policy or treatment with pre- and post-treatment outcomes, we propose an approach based on a transition model, which may be applied with multivariate outcomes and accounts for unobserved heterogeneity. This model is based on potential versions of discrete latent variables representing the individual characteristic of interest and may be cast in the hidden (latent) Markov literature for panel data. It can therefore be estimated by maximum likelihood in a relatively simple way. The approach extends the difference-in-differences method, as it can deal with multivariate outcomes, and causal effects may be expressed in terms of transition probabilities. The proposal is validated through a simulation study and applied to evaluate educational programs administered to pupils in the sixth and seventh grades of middle school. These programs are carried out in an Italian region to improve noncognitive skills (NCSs). We study whether they also affect students' cognitive skills (CSs) in Italian and Mathematics in the eighth grade, exploiting pretreatment test scores available from the fifth grade. The main conclusion is that the educational programs aimed at developing NCSs help the best students maintain their higher cognitive abilities over time.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-02-10T07:21:54Z
DOI: 10.3102/10769986221150033
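For reference, the scalar baseline that the transition-model approach generalizes is the classic difference-in-differences estimator (a sketch with an invented function name; the article's contribution is extending this logic to multivariate outcomes and to effects on transition probabilities):

```python
import numpy as np

def did_estimate(pre_treat, post_treat, pre_ctrl, post_ctrl):
    """Classic difference-in-differences: the treated group's pre-to-post
    change minus the control group's change over the same period."""
    return (np.mean(post_treat) - np.mean(pre_treat)) \
         - (np.mean(post_ctrl) - np.mean(pre_ctrl))
```

If the treated group improves by 3 points while the control group improves by 1, the estimated effect is 2.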
Authors: Patrícia Martinková, František Bartoš, Marek Brabec
Abstract: Inter-rater reliability (IRR), which is a prerequisite of high-quality ratings and assessments, may be affected by contextual variables, such as the rater's or ratee's gender, major, or experience. Identification of such heterogeneity sources in IRR is important for the implementation of policies with the potential to decrease measurement error and to increase IRR by focusing on the most relevant subgroups. In this study, we propose a flexible approach for assessing IRR in cases of heterogeneity due to covariates by directly modeling differences in variance components. We use Bayes factors (BFs) to select the best performing model, and we suggest using Bayesian model averaging as an alternative approach for obtaining IRR and variance component estimates, allowing us to account for model uncertainty. We use inclusion BFs considering the whole model space to provide evidence for or against differences in variance components due to covariates. The proposed method is compared with other Bayesian and frequentist approaches in a simulation study, and we demonstrate its superiority in some situations. Finally, we provide real data examples from grant proposal peer review, demonstrating the usefulness of this method and its flexibility in generalizing to more complex designs.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-02-10T07:16:15Z
DOI: 10.3102/10769986221150517
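The model-averaging step itself is simple once Bayes factors are in hand: convert them to posterior model probabilities and weight the per-model estimates. A generic sketch (function name and inputs are illustrative, not the article's variance-component machinery):

```python
import numpy as np

def bma_estimate(estimates, bayes_factors, prior=None):
    """Bayesian model averaging: turn Bayes factors (each model vs. a
    common reference model) into posterior model probabilities, then
    average the per-model estimates with those weights."""
    bf = np.asarray(bayes_factors, dtype=float)
    prior = np.full(len(bf), 1 / len(bf)) if prior is None else np.asarray(prior)
    post = bf * prior
    post = post / post.sum()                  # posterior model probabilities
    return float(post @ np.asarray(estimates)), post
```

With equal priors and BFs of 9 and 1 for two models whose IRR estimates are 0.6 and 0.8, the averaged estimate is 0.9 * 0.6 + 0.1 * 0.8 = 0.62, so model uncertainty is reflected rather than ignored.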
Authors: Daniel Y. Lee, Jeffrey R. Harring
Abstract: A Monte Carlo simulation was performed to compare methods for handling missing data in growth mixture models. The methods considered in the current study were (a) a fully Bayesian approach using a Gibbs sampler, (b) full information maximum likelihood using the expectation–maximization algorithm, (c) multiple imputation, (d) a two-stage multiple imputation method, and (e) listwise deletion. Of the five methods, it was found that the Bayesian approach and two-stage multiple imputation methods generally produce less biased parameter estimates compared to maximum likelihood or single imputation methods, although key differences were observed. Similarities and disparities among methods are highlighted and general recommendations articulated.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-02-09T07:40:28Z
DOI: 10.3102/10769986221149140
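Why listwise deletion tends to fare worst can be seen in a stylized simulation where the outcome's missingness depends on a fully observed covariate (the MAR mechanism). This is not the article's growth-mixture setting, and single regression imputation is shown purely as a foil for deletion:

```python
import numpy as np

# Stylised MAR illustration: Y is missing mostly when X > 0, so the
# complete cases over-represent low-X (and hence low-Y) observations.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)            # true E[Y] = 0
observed = ~((x > 0) & (rng.random(n) < 0.9))    # dropout driven by x only

cc_mean = y[observed].mean()                     # listwise deletion: biased low
slope, intercept = np.polyfit(x[observed], y[observed], 1)
y_filled = np.where(observed, y, slope * x + intercept)
imp_mean = y_filled.mean()                       # covariate-based fill: near 0
```

Because missingness depends only on x, the regression of y on x is still estimated without bias from the complete cases, so filling in predictions recovers the marginal mean that deletion distorts.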
Authors: Ehsan Bokhari
Abstract: The prediction of dangerous and/or violent behavior is particularly important to the conduct of the U.S. criminal justice system when it makes decisions about restrictions of personal freedom, such as preventive detention, forensic commitment, and parole, and, in some states such as Texas, whether to permit the execution of an individual found guilty of a capital crime to proceed. This article discusses the prediction of dangerous behavior both through clinical judgment and actuarial assessment. The general conclusion drawn is that for both clinical and actuarial prediction of dangerous behavior, we are far from a level of accuracy that could justify routine use. To support this latter negative assessment, two topic areas are emphasized: (1) the MacArthur Study of Mental Disorder and Violence, including the actuarial instrument developed as part of this project (the Classification of Violence Risk), along with all the data collected that helped develop the instrument; and (2) the U.S. Supreme Court case of Barefoot v. Estelle (1983) and the American Psychiatric Association "friend of the court" brief on the (in)accuracy of clinical prediction of the commission of future violence. Although now three decades old, Barefoot v. Estelle is still the controlling Supreme Court opinion regarding the prediction of future dangerous behavior and the imposition of the death penalty in states such as Texas; see, for example, Coble v. Texas (2011) and the Supreme Court's denial of certiorari in that case.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-01-12T06:00:25Z
DOI: 10.3102/10769986221144727
Authors: Joseph B. Lang
Abstract: This article is concerned with the statistical detection of copying on multiple-choice exams. As an alternative to existing permutation- and model-based copy-detection approaches, a simple randomization p-value (RP) test is proposed. The RP test, which is based on an intuitive match-score statistic, makes no assumptions about the distribution of examinees' answer vectors and hence is broadly applicable. Especially important in this copy-detection setting, the RP test is shown to be exact in that its size is guaranteed to be no larger than a nominal α value. Additionally, simulation results suggest that the RP test is typically more powerful for copy detection than the existing approximate tests. The development of the RP test is based on the idea that the copy-detection problem can be recast as a causal inference and missing data problem. In particular, the observed data are viewed as a subset of a larger collection of potential values, or counterfactuals, and the null hypothesis of "no copying" is viewed as a "no causal effect" hypothesis and formally expressed in terms of constraints on potential variables.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2023-01-09T09:57:52Z
DOI: 10.3102/10769986221143515
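The flavor of a match-score randomization test can be sketched in a few lines: count identical responses between suspect and alleged source, then build a null reference distribution by swapping in answer vectors of uninvolved examinees. This is a simplified Monte-Carlo stand-in (names and resampling scheme are ours), not the article's exact RP construction:

```python
import numpy as np

def match_score_p(suspect, source, pool, n_draws=5000, seed=0):
    """Monte-Carlo randomization p-value for answer copying.  The test
    statistic is the number of identical responses between suspect and
    source; the null distribution comes from replacing the suspect's
    answers with vectors drawn from a pool of uninvolved examinees."""
    rng = np.random.default_rng(seed)
    obs = int(np.sum(suspect == source))
    idx = rng.integers(0, len(pool), n_draws)
    null = np.array([int(np.sum(pool[i] == source)) for i in idx])
    # add-one correction keeps the Monte-Carlo p-value strictly positive
    return (1 + int(np.sum(null >= obs))) / (1 + n_draws)
```

A suspect whose answer vector matches the source on every item receives a p-value near 1/(1 + n_draws), while innocent examinees produce match counts typical of the pool.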
Authors: Yajuan Si, Roderick J. A. Little, Ya Mo, Nell Sedransk
Abstract: Nonresponse bias is a widely prevalent problem in education data. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010–2011. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves on existing methods by incorporating both missing-at-random and missing-not-at-random mechanisms, and all analyses can be done straightforwardly with standard statistical software.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-12-15T10:40:32Z
DOI: 10.3102/10769986221141074
Authors: Sally Paganin, Christopher J. Paciorek, Claudia Wehrhahn, Abel Rodríguez, Sophia Rabe-Hesketh, Perry de Valpine
Abstract: Item response theory (IRT) models typically rely on a normality assumption for subject-specific latent traits, which is often unrealistic in practice. Semiparametric extensions based on Dirichlet process mixtures (DPMs) offer a more flexible representation of the unknown distribution of the latent trait. However, the use of such models in the IRT literature has been extremely limited, in good part because of the lack of comprehensive studies and accessible software tools. This article provides guidance for practitioners on semiparametric IRT models and their implementation. In particular, we rely on NIMBLE, a flexible software system for hierarchical models that enables the use of DPMs. We highlight efficient sampling strategies for model estimation and compare inferential results under parametric and semiparametric models.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-12-09T06:04:42Z
DOI: 10.3102/10769986221136105
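The building block of a DPM prior is the stick-breaking construction of the mixture weights, which is easy to sketch on its own (a numpy illustration with assumed names; NIMBLE wraps the full MCMC sampler around such a prior so the latent-trait distribution is learned rather than assumed normal):

```python
import numpy as np

def stick_breaking(alpha, n_atoms, seed=0):
    """Truncated stick-breaking construction of Dirichlet process weights:
    v_k ~ Beta(1, alpha) and w_k = v_k * prod_{j<k} (1 - v_j).  Smaller
    alpha concentrates mass on fewer mixture components."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return w
```

With a moderate truncation level the weights sum to nearly 1, and each draw of weights (paired with atom locations) defines one candidate latent-trait distribution.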
Authors: Yu Wang, Chia-Yi Chiu, Hans Friedrich Köhn
Abstract: The multiple-choice (MC) item format has been widely used in educational assessments across diverse content domains. MC items purportedly allow for collecting richer diagnostic information, and the effectiveness and economy of administering them may have further contributed to their popularity, not just in educational assessment. The MC item format has also been adapted to the cognitive diagnosis (CD) framework. Early approaches simply dichotomized the responses and analyzed them with a CD model for binary responses; obviously, this strategy cannot exploit the additional diagnostic information provided by MC items. De la Torre's MC Deterministic Inputs, Noisy "And" Gate (MC-DINA) model was the first model for the explicit analysis of items having an MC response format. As a drawback, however, it restricts the attribute vectors of the distractors to be nested within the key and within each other. The method presented in this article for the CD of DINA-type items having an MC response format does not require such constraints. Another contribution of the proposed method concerns its implementation using a nonparametric classification algorithm, which makes it especially suitable for small-sample settings such as classrooms, where CD is most needed for monitoring instruction and student learning. In contrast, default parametric CD estimation routines that rely on EM- or MCMC-based algorithms cannot guarantee stable and reliable estimates in small samples, despite their effectiveness and efficiency when samples are large, because of computational feasibility issues caused by insufficient sample sizes. Results of simulation studies and a real-world application are also reported.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-11-28T06:04:48Z
DOI: 10.3102/10769986221133088
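The nonparametric classification idea is easiest to see in the binary-response case: assign each examinee the attribute profile whose ideal (conjunctive, DINA-style) response pattern is closest in Hamming distance to the observed responses. A binary sketch of that idea (names are ours; the article adapts the approach to MC items):

```python
import numpy as np

def npc_classify(responses, Q):
    """Nonparametric cognitive-diagnosis classification for binary items:
    enumerate all 2^K attribute profiles, compute each profile's ideal
    conjunctive response pattern under the Q-matrix, and pick the profile
    minimising Hamming distance to the observed responses."""
    J, K = Q.shape
    profiles = np.array([[(m >> k) & 1 for k in range(K)] for m in range(2**K)])
    # ideal response is 1 iff the profile has every attribute the item requires
    ideal = (profiles @ Q.T == Q.sum(axis=1)).astype(int)        # (2^K, J)
    dist = np.abs(responses[:, None, :] - ideal[None, :, :]).sum(axis=2)
    return profiles[dist.argmin(axis=1)]
```

Because no item parameters are estimated, the classifier works even with a handful of examinees, which is the small-sample advantage the abstract emphasizes.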
Authors: Niels G. Waller
Abstract: Although many textbooks on multivariate statistics discuss the common factor analysis model, few of these books mention the problem of factor score indeterminacy (FSI). Thus, many students and contemporary researchers are unaware of an important fact. Namely, for any common factor model with known (or estimated) model parameters, infinite sets of factor scores can be constructed to fit the model. Because all sets are mathematically exchangeable, factor scores are indeterminate. Our professional silence on this topic is difficult to explain given that FSI was first noted almost 100 years ago by E. B. Wilson, the 24th president (1929) of the American Statistical Association. To help disseminate Wilson's insights, we demonstrate the underlying mathematics of FSI using the language of finite-dimensional vector spaces and well-known ideas of regression theory. We then illustrate the numerical implications of FSI by describing new and easily implemented methods for transforming factor scores into alternative sets of factor scores. An online supplement (and the fungible R package) includes R functions for illustrating FSI.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-11-07T11:06:45Z
DOI: 10.3102/10769986221128810
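A compact numerical illustration of how severe FSI can be uses Guttman's classic result: two equally valid sets of factor scores for the same model can correlate as low as 2ρ² − 1, where ρ² is the squared multiple correlation of the factor on the observed variables. A numpy sketch with arbitrary illustrative loadings (not the article's examples):

```python
import numpy as np

# One-factor model: x = lam * f + e, with Var(f) = 1 and uniquenesses psi.
lam = np.array([0.8, 0.7, 0.6, 0.5])
psi = 1 - lam**2
Sigma = np.outer(lam, lam) + np.diag(psi)   # model-implied covariance

# Squared multiple correlation of the factor on the observed variables,
# and Guttman's minimum correlation between two exchangeable score sets.
rho2 = lam @ np.linalg.solve(Sigma, lam)
min_corr = 2 * rho2 - 1
```

For these loadings ρ² ≈ 0.78, so two fully model-consistent score sets may correlate only about 0.57 with each other, which is the practical bite of indeterminacy.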
Authors: Xiao Li, Hanchen Xu, Jinming Zhang, Hua-hua Chang
Abstract: The adaptive learning problem concerns how to create an individualized learning plan (also referred to as a learning policy) that chooses the most appropriate learning materials based on a learner's latent traits. In this article, we study an important yet less-addressed adaptive learning problem—one that assumes continuous latent traits. Specifically, we formulate the adaptive learning problem as a Markov decision process. We assume latent traits to be continuous with an unknown transition model and apply a model-free deep reinforcement learning algorithm—the deep Q-learning algorithm—that can effectively find the optimal learning policy from data on learners' learning process without knowing the actual transition model of the learners' continuous latent traits. To efficiently utilize available data, we also develop a transition model estimator that emulates the learner's learning process using neural networks. The transition model estimator can be used in the deep Q-learning algorithm so that it can more efficiently discover the optimal learning policy for a learner. Numerical simulation studies verify that the proposed algorithm is very efficient in finding a good learning policy. Especially with the aid of a transition model estimator, it can find the optimal learning policy after training using a small number of learners.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-11-04T06:43:51Z
DOI: 10.3102/10769986221129847
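The MDP formulation can be illustrated with a toy discrete version: mastery levels as states, learning materials as actions, and tabular Q-learning in place of the article's deep Q-network over continuous latent traits (every name and number below is invented for illustration):

```python
import numpy as np

# Toy adaptive-learning MDP: mastery levels 0..4, two learning materials.
# The "right" material for a level advances mastery with probability 0.9
# versus 0.2 for the wrong one; reward arrives on reaching full mastery.
rng = np.random.default_rng(1)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2

def step(state, action):
    # material 0 suits levels 0-1, material 1 suits levels 2-3
    p_advance = 0.9 if action == int(state >= 2) else 0.2
    nxt = min(state + int(rng.random() < p_advance), n_states - 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward

for episode in range(5000):
    s = 0
    while s < n_states - 1:
        # epsilon-greedy action selection, then a standard Q-learning update
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)   # learned material choice per mastery level
```

The learned policy recovers the designed structure (material 0 at low mastery, material 1 at high mastery); the article's transition model estimator plays the role of `step` learned from data, which is what lets the deep version train with few learners.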
Authors: Ziwei Zhang, Corissa T. Rohloff, Nidhi Kohli
Abstract: To model growth over time, statistical techniques are available in both structural equation modeling (SEM) and random effects modeling frameworks. Liu et al. proposed a transformation and an inverse transformation for the linear–linear piecewise growth model with an unknown random knot, an intrinsically nonlinear function, in the SEM framework. This method allowed for the incorporation of time-invariant covariates. While the proposed method made novel contributions in this area of research, the use of transformations introduces some challenges to model estimation and dissemination. This commentary aims to illustrate the significant contributions of the authors' proposed method in the SEM framework, along with presenting the challenges involved in implementing this method and opportunities available in an alternative framework.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-10-06T01:14:16Z
DOI: 10.3102/10769986221126747
Authors: Weicong Lyu, Jee-Seon Kim, Youmi Suk
First page: 3
Abstract: This article presents a latent class model for multilevel data to identify latent subgroups and estimate heterogeneous treatment effects. Unlike sequential approaches that partition data first and then estimate average treatment effects (ATEs) within classes, we employ a Bayesian procedure to jointly estimate mixing probability, selection, and outcome models so that misclassification does not obstruct estimation of treatment effects. Simulation demonstrates that the proposed method finds the correct number of latent classes, estimates class-specific treatment effects well, and provides proper posterior standard deviations and credible intervals of ATEs. We apply this method to Trends in International Mathematics and Science Study data to investigate the effects of private science lessons on achievement scores and find two latent classes, one with a zero ATE and the other with a positive ATE.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-08-17T01:20:03Z
DOI: 10.3102/10769986221115446
Authors: Harold Doran
First page: 37
Abstract: This article is concerned with a subset of numerically stable and scalable algorithms useful to support computationally complex psychometric models in the era of machine learning and massive data. The subset selected here is a core set of numerical methods that should be familiar to computational psychometricians and considers whitening transforms for dealing with correlated data, computational concepts for linear models, multivariable integration, and optimization techniques.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-08-17T01:18:44Z
DOI: 10.3102/10769986221116905
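Of the methods listed, a whitening transform is the quickest to sketch: rotate and rescale centered data so that its sample covariance becomes the identity. A minimal ZCA-style implementation (a generic sketch, not the article's code):

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """ZCA whitening: centre the data, then apply W = V diag(1/sqrt(d)) V'
    from the eigendecomposition of the sample covariance, so the whitened
    data have identity covariance while staying close to the original
    coordinates.  eps guards against tiny eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)            # cov is symmetric PSD
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W
```

After whitening, downstream routines that assume uncorrelated inputs (many optimizers and linear-model computations among them) behave far more stably.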
Authors: Mikkel Helding Vembye, James Eric Pustejovsky, Therese Deocampo Pigott
First page: 70
Abstract: Meta-analytic models for dependent effect sizes have grown increasingly sophisticated over the last few decades, which has created challenges for a priori power calculations. We introduce power approximations for tests of average effect sizes based upon several common approaches for handling dependent effect sizes. In a Monte Carlo simulation, we show that the new power formulas can accurately approximate the true power of meta-analytic models for dependent effect sizes. Lastly, we investigate the Type I error rate and power for several common models, finding that tests using robust variance estimation provide better Type I error calibration than tests with model-based variance estimation. We consider implications for practice with respect to selecting a working model and an inferential approach.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-10-17T07:51:05Z
DOI: 10.3102/10769986221127379
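The textbook baseline these approximations extend is the power of a two-sided z-test for the average effect with k independent effect sizes. A stdlib sketch under that simplification (the article's formulas handle dependent effects and robust variance estimation; this shows only the baseline, with assumed parameter names):

```python
from math import sqrt
from statistics import NormalDist

def power_avg_effect(mu, tau2, v_bar, k, alpha=0.05):
    """Power of the two-sided z-test for an average effect mu, assuming k
    independent effect sizes with common sampling variance v_bar and
    between-study variance tau2, so SE = sqrt((tau2 + v_bar) / k)."""
    se = sqrt((tau2 + v_bar) / k)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lam = mu / se                    # noncentrality of the test statistic
    return NormalDist().cdf(lam - z_crit) + NormalDist().cdf(-lam - z_crit)
```

At mu = 0 the formula returns the nominal alpha, and power grows with k; dependence among effect sizes reduces the effective k, which is exactly what the article's formulas quantify.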
Authors: Qingrong Tan, Yan Cai, Fen Luo, Dongbo Tu
First page: 103
Abstract: To improve the calibration accuracy and calibration efficiency of cognitive diagnostic computerized adaptive testing (CD-CAT) for new items and, ultimately, contribute to the widespread application of CD-CAT in practice, the current article proposed a Gini-based online calibration method that can simultaneously calibrate the Q-matrix and item parameters of new items. Three simulation studies with simulated and real item banks were conducted to investigate the performance of the proposed method and compare it with the joint estimation algorithm (JEA) and the single-item estimation (SIE) methods. The results indicated that the proposed Gini-based online calibration method yielded higher calibration efficiency than the SIE method and outperformed the JEA method on item calibration tasks in terms of both accuracy and efficiency under most experimental conditions.
Citation: Journal of Educational and Behavioral Statistics (Ahead of Print)
PubDate: 2022-10-03T10:54:52Z
DOI: 10.3102/10769986221126741
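The Gini index at the heart of such criteria is a standard impurity measure on a posterior distribution: 0 when one attribute profile is certain, approaching 1 as the posterior spreads out. A minimal sketch of the measure itself (the CD-CAT-specific way the article deploys it is in the paper; the function name is ours):

```python
import numpy as np

def gini_index(posterior):
    """Gini index 1 - sum(p^2) of a (possibly unnormalised) posterior over
    attribute profiles.  Lower values mean the examinee's profile is
    pinned down more sharply; calibration designs can favour examinees or
    items that drive this impurity down."""
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()
    return 1.0 - float(np.sum(p**2))
```

A uniform posterior over four profiles scores 0.75, while a degenerate posterior scores 0, so the index orders examinees by how informative their classification currently is.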