Journal of Educational and Behavioral Statistics
Journal Prestige (SJR): 1.952
Citation Impact (citeScore): 2
Number of Followers: 7  
 
  Hybrid journal (it can contain Open Access articles)
ISSN (Print) 1076-9986 - ISSN (Online) 1935-1054
Published by Sage Publications  [1174 journals]
  • Estimating Heterogeneous Treatment Effects Within Latent Class Multilevel Models: A Bayesian Approach

      Authors: Weicong Lyu, Jee-Seon Kim, Youmi Suk
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      This article presents a latent class model for multilevel data to identify latent subgroups and estimate heterogeneous treatment effects. Unlike sequential approaches that partition data first and then estimate average treatment effects (ATEs) within classes, we employ a Bayesian procedure to jointly estimate mixing probability, selection, and outcome models so that misclassification does not obstruct estimation of treatment effects. Simulation demonstrates that the proposed method finds the correct number of latent classes, estimates class-specific treatment effects well, and provides proper posterior standard deviations and credible intervals of ATEs. We apply this method to Trends in International Mathematics and Science Study data to investigate the effects of private science lessons on achievement scores and then find two latent classes, one with zero ATE and the other with positive ATE.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-08-17T01:20:03Z
      DOI: 10.3102/10769986221115446
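
      The abstract above describes jointly modeling latent class membership and class-specific treatment effects in multilevel data. Below is a minimal base-R sketch of a toy data-generating process in that spirit, with two latent classes of clusters whose average treatment effects are zero and positive; all parameter values and the simple difference-in-means check are illustrative assumptions, not the article's Bayesian estimation procedure.

        set.seed(1)
        n_cluster <- 100; n_per <- 20
        cls <- rbinom(n_cluster, 1, 0.4)                 # latent class of each cluster
        ate <- c(0, 0.5)[cls + 1]                        # class-specific treatment effects
        u   <- rnorm(n_cluster, 0, 0.5)                  # cluster random effects
        cluster <- rep(1:n_cluster, each = n_per)
        treat   <- rbinom(n_cluster * n_per, 1, 0.5)     # individual treatment indicator
        y <- u[cluster] + ate[cluster] * treat + rnorm(n_cluster * n_per)

        # Naive class-wise difference in means (feasible only because the true
        # class labels are known in this simulation):
        tapply(y[treat == 1], cls[cluster][treat == 1], mean) -
          tapply(y[treat == 0], cls[cluster][treat == 0], mean)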
       
  • A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications

      Authors: Harold Doran
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      This article is concerned with a subset of numerically stable and scalable algorithms useful to support computationally complex psychometric models in the era of machine learning and massive data. The subset selected here is a core set of numerical methods that should be familiar to computational psychometricians and considers whitening transforms for dealing with correlated data, computational concepts for linear models, multivariable integration, and optimization techniques.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-08-17T01:18:44Z
      DOI: 10.3102/10769986221116905
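
      One of the numerical recipes named in the abstract is a whitening transform for correlated data. The following base-R sketch (my own illustration, not code from the article) applies a Cholesky whitening transform so that the transformed variables have an approximately identity covariance matrix.

        set.seed(2)
        n <- 500; p <- 3
        Sigma <- matrix(c(1.0, 0.6, 0.3,
                          0.6, 1.0, 0.5,
                          0.3, 0.5, 1.0), nrow = p)      # target covariance
        X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)  # correlated data

        S <- cov(X)
        L <- t(chol(S))                 # lower-triangular factor, S = L %*% t(L)
        W <- t(solve(L))                # whitening matrix
        Z <- scale(X, center = TRUE, scale = FALSE) %*% W

        round(cov(Z), 3)                # approximately the identity matrix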
       
  • Cognitive Diagnosis Modeling Incorporating Response Times and Fixation Counts: Providing Comprehensive Feedback and Accurate Diagnosis

      Authors: Peida Zhan*, Kaiwen Man*, Stefanie A. Wind, Jonathan Malone
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Respondents’ problem-solving behaviors reflect complicated cognitive processes that are often systematically tied to one another. Biometric data, such as visual fixation counts (FCs), which are an important eye-tracking indicator, can be combined with other types of variables that reflect different aspects of problem-solving behavior to quantify variability in problem-solving behavior. To provide comprehensive feedback and accurate diagnosis when using such multimodal data, the present study proposes a multimodal joint cognitive diagnosis model that accounts for latent attributes, latent ability, processing speed, and visual engagement by simultaneously modeling response accuracy (RA), response times, and FCs. We used two simulation studies to test the feasibility of the proposed model. Findings mainly suggest that the parameters of the proposed model can be well recovered and that modeling FCs, in addition to RA and response times, could increase the comprehensiveness of feedback on problem-solving-related cognitive characteristics as well as the accuracy of knowledge structure diagnosis. An empirical example is used to demonstrate the applicability and benefits of the proposed model. We discuss the implications of our findings as they relate to research and practice.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-07-29T06:59:48Z
      DOI: 10.3102/10769986221111085
       
  • Testing Differential Item Functioning Without Predefined Anchor Items Using Robust Regression

      Authors: Weimeng Wang, Yang Liu, Hongyun Liu
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Differential item functioning (DIF) occurs when the probability of endorsing an item differs across groups for individuals with the same latent trait level. The presence of DIF items may jeopardize the validity of an instrument; therefore, it is crucial to identify DIF items in routine operations of educational assessment. While DIF detection procedures based on item response theory (IRT) have been widely used, a majority of IRT-based DIF tests assume predefined anchor (i.e., DIF-free) items. Not only is this assumption strong, but violations of it may also lead to erroneous inferences, for example, an inflated Type I error rate. We propose a general framework to define the effect sizes of DIF without a priori knowledge of anchor items. In particular, we quantify DIF by item-specific residuals from a regression model fitted to the true item parameters in respective groups. Moreover, the null distribution of the proposed test statistic using a robust estimator can be derived analytically or approximated numerically even when there is a mix of DIF and non-DIF items, which yields asymptotically justified statistical inference. The Type I error rate and the power of the proposed procedure are evaluated and compared with those of the conventional likelihood-ratio DIF tests in a Monte Carlo experiment. Our simulation study shows promising results in controlling the Type I error rate and maintaining power to detect DIF items. Even when there is a mix of DIF and non-DIF items, the true and false alarm rates can be well controlled when a robust regression estimator is used.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-07-19T05:08:49Z
      DOI: 10.3102/10769986221109208
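
      To convey the general idea of flagging DIF through item-specific residuals from a robust regression on item parameters, here is a hedged sketch using MASS::rlm (a generic Huber M-estimator shipped with standard R installations). The simulated difficulties, the DIF shift, and the 2.5 cutoff are illustrative assumptions; the article defines its own effect sizes, estimator, and null distribution.

        library(MASS)                            # for rlm()
        set.seed(3)
        n_items <- 40
        b_ref <- rnorm(n_items)                  # reference-group difficulties
        b_foc <- b_ref + rnorm(n_items, 0, 0.05) # focal-group difficulties
        dif_items <- c(1, 7, 23)
        b_foc[dif_items] <- b_foc[dif_items] + 0.8   # uniform DIF shift

        fit <- rlm(b_foc ~ b_ref)                # robust regression (Huber weights)
        res <- residuals(fit) / fit$s            # scaled by the robust sigma estimate
        which(abs(res) > 2.5)                    # flagged items (ideally 1, 7, 23)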
       
  • Zero and One Inflated Item Response Theory Models for Bounded Continuous Data

      Authors: Dylan Molenaar, Mariana Cúri, Jorge L. Bazán
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Bounded continuous data are encountered in many applications of item response theory, including the measurement of mood, personality, and response times and in the analyses of summed item scores. Although different item response theory models exist to analyze such bounded continuous data, most models assume the data to be in an open interval and cannot accommodate data in a closed interval. As a result, ad hoc transformations are needed to prevent scores on the bounds of the observed variables. To motivate the present study, we demonstrate in real and simulated data that this practice of fitting open interval models to closed interval data can majorly affect parameter estimates even in cases with only 5% of the responses on one of the bounds of the observed variables. To address this problem, we propose a zero and one inflated item response theory modeling framework for bounded continuous responses in the closed interval. We illustrate how four existing models for bounded responses from the literature can be accommodated in the framework. The resulting zero and one inflated item response theory models are studied in a simulation study and a real data application to investigate parameter recovery, model fit, and the consequences of fitting the incorrect distribution to the data. We find that neglecting the bounded nature of the data biases parameters and that misspecification of the exact distribution may affect the results depending on the data generating model.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-07-15T07:14:54Z
      DOI: 10.3102/10769986221108455
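
      As a minimal illustration of a zero-and-one inflated bounded response, the sketch below mixes point masses at 0 and 1 with a Beta distribution on the open interval; the mixing proportions and Beta parameters are arbitrary assumptions, and the snippet is not one of the four models accommodated in the article's framework.

        set.seed(4)
        rinflated <- function(n, p0 = 0.05, p1 = 0.10, shape1 = 2, shape2 = 5) {
          u <- runif(n)
          y <- rbeta(n, shape1, shape2)          # interior (0, 1) component
          y[u < p0] <- 0                         # point mass at the lower bound
          y[u >= p0 & u < p0 + p1] <- 1          # point mass at the upper bound
          y
        }
        y <- rinflated(5000)
        c(prop_zero = mean(y == 0), prop_one = mean(y == 1))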
       
  • Pooling Interactions Into Error Terms in Multisite Experiments

      Authors: Wendy Chan, Larry Vernon Hedges
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Multisite field experiments using the (generalized) randomized block design that assign treatments to individuals within sites are common in education and the social sciences. Under this design, there are two possible estimands of interest and they differ based on whether sites or blocks have fixed or random effects. When the average treatment effect is assumed to be identical across sites, it is common to omit site by treatment interactions and “pool” them into the error term in classical experimental design. However, prior work has not addressed the consequences of pooling when site by treatment interactions are not zero. This study assesses the impact of pooling on inference in the presence of nonzero site by treatment interactions. We derive the small sample distributions of the test statistics for treatment effects under pooling and illustrate the impacts on rejection rates when interactions are not zero. We use the results to offer recommendations to researchers conducting studies based on the multisite design.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-07-05T05:45:32Z
      DOI: 10.3102/10769986221104800
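
      The following base-R simulation is a rough illustration (not the article's small-sample derivations) of how pooling nonzero site-by-treatment interactions into the error term of a fixed-effects model can distort the rejection rate of the treatment-effect test when the average effect across the population of sites is zero. All variance components are illustrative assumptions.

        set.seed(5)
        n_sites <- 20; n_per_arm <- 15
        one_rep <- function() {
          site  <- factor(rep(1:n_sites, each = 2 * n_per_arm))
          treat <- rep(rep(0:1, each = n_per_arm), n_sites)
          site_eff <- rnorm(n_sites, 0, 0.5)         # site main effects
          site_trt <- rnorm(n_sites, 0, 0.4)         # site-by-treatment interactions
          y <- site_eff[site] + site_trt[site] * treat + rnorm(length(treat))
          fit <- lm(y ~ treat + site)                # interaction pooled into error
          summary(fit)$coefficients["treat", "Pr(>|t|)"] < 0.05
        }
        mean(replicate(500, one_rep()))              # empirical rejection rate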
       
  • Statistical Power for Estimating Treatment Effects Using Difference-in-Differences and Comparative Interrupted Time Series Estimators With Variation in Treatment Timing

      Authors: Peter Z. Schochet
      First page: 367
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      This article develops new closed-form variance expressions for power analyses for commonly used difference-in-differences (DID) and comparative interrupted time series (CITS) panel data estimators. The main contribution is to incorporate variation in treatment timing into the analysis. The power formulas also account for other key design features that arise in practice: autocorrelated errors, unequal measurement intervals, and clustering due to the unit of treatment assignment. We consider power formulas for both cross-sectional and longitudinal models and allow for covariates. An illustrative power analysis provides guidance on appropriate sample sizes. The key finding is that accounting for treatment timing increases required sample sizes. Further, DID estimators have considerably more power than standard CITS and ITS estimators. An available Shiny R dashboard performs the sample size calculations for the considered estimators.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-02-08T09:21:21Z
      DOI: 10.3102/10769986211070625
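
      As a point of contrast with the closed-form variance expressions derived in the article, here is a generic simulation-based power check for the simplest two-group, two-period DID design; it ignores staggered treatment timing, autocorrelation, and clustering, and every numeric value is an illustrative assumption.

        set.seed(6)
        power_did <- function(n_per_cell = 100, effect = 0.25, reps = 1000) {
          mean(replicate(reps, {
            g <- rep(0:1, each = 2 * n_per_cell)      # control / treated group
            t <- rep(rep(0:1, each = n_per_cell), 2)  # pre / post period
            y <- 0.3 * g + 0.2 * t + effect * g * t + rnorm(4 * n_per_cell)
            fit <- lm(y ~ g * t)
            summary(fit)$coefficients["g:t", "Pr(>|t|)"] < 0.05
          }))
        }
        power_did()          # estimated power for the assumed effect size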
       
  • A Critical View on the NEAT Equating Design: Statistical Modeling and Identifiability Problems

      Authors: Ernesto San Martín, Jorge González
      First page: 406
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      The nonequivalent groups with anchor test (NEAT) design is widely used in test equating. Under this design, two groups of examinees are administered different test forms with each test form containing a subset of common items. Because test takers from different groups are assigned only one test form, missing score data emerge by design rendering some of the score distributions unavailable. The partially observed score data formally lead to an identifiability problem, which has not been recognized as such in the equating literature and has been considered from different perspectives, all of them making different assumptions in order to estimate the unidentified score distributions. In this article, we formally specify the statistical model underlying the NEAT design and unveil the lack of identifiability of the parameters of interest that compose the equating transformation. We use the theory of partial identification to show alternatives to traditional practices that have been proposed to identify the score distributions when conducting equating under the NEAT design.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-04-29T08:58:22Z
      DOI: 10.3102/10769986221090609
       
  • Statistical Inference for G-indices of Agreement

      Authors: Douglas G. Bonett
      First page: 438
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      The limitations of Cohen’s κ are reviewed and an alternative G-index is recommended for assessing nominal-scale agreement. Maximum likelihood estimates, standard errors, and confidence intervals for a two-rater G-index are derived for one-group and two-group designs. A new G-index of agreement for multirater designs is proposed. Statistical inference methods for some important special cases of the multirater design also are derived. G-index meta-analysis methods are proposed and can be used to combine and compare agreement across two or more populations. Closed-form sample-size formulas to achieve desired confidence interval precision are proposed for two-rater and multirater designs. R functions are given for all results.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-04-29T08:56:19Z
      DOI: 10.3102/10769986221088561
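
      For readers unfamiliar with the index, one common form of the two-rater G-index for q nominal categories is G = (q * p_o - 1) / (q - 1), where p_o is the observed agreement proportion. The sketch below computes it with a simple large-sample Wald interval based on the binomial variance of p_o; this is an assumption-laden illustration, not the maximum likelihood estimates or intervals derived in the article.

        g_index <- function(r1, r2, q, conf = 0.95) {
          n   <- length(r1)
          p_o <- mean(r1 == r2)                        # observed agreement
          G   <- (q * p_o - 1) / (q - 1)
          se  <- (q / (q - 1)) * sqrt(p_o * (1 - p_o) / n)
          z   <- qnorm(1 - (1 - conf) / 2)
          c(G = G, lower = G - z * se, upper = G + z * se)
        }

        set.seed(7)
        r1 <- sample(1:3, 80, replace = TRUE)          # rater 1, three categories
        r2 <- ifelse(runif(80) < 0.7, r1, sample(1:3, 80, replace = TRUE))
        g_index(r1, r2, q = 3)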
       
  • Regression Discontinuity Designs With an Ordinal Running Variable: Evaluating the Effects of Extended Time Accommodations for English-Language Learners

      Authors: Youmi Suk, Peter M. Steiner, Jee-Seon Kim, Hyunseung Kang
      First page: 459
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Regression discontinuity (RD) designs are commonly used for program evaluation with continuous treatment assignment variables. But in practice, treatment assignment is frequently based on ordinal variables. In this study, we propose an RD design with an ordinal running variable to assess the effects of extended time accommodations (ETA) for English-language learners (ELLs). ETA eligibility is determined by ordinal ELL English-proficiency categories of National Assessment of Educational Progress data. We discuss the identification and estimation of the average treatment effect (ATE), intent-to-treat effect, and the local ATE at the cutoff. We also propose a series of sensitivity analyses to probe the effect estimates’ robustness to the choices of scaling functions and cutoff scores and remaining confounding.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-04-27T08:57:05Z
      DOI: 10.3102/10769986221090275
       
  • Two Statistical Tests for the Detection of Item Compromise

      Authors: Wim J. van der Linden
      First page: 485
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Two independent statistical tests of item compromise are presented, one based on the test takers’ responses and the other on their response times (RTs) on the same items. The tests can be used to monitor an item in real time during online continuous testing but are also applicable as part of post hoc forensic analysis. The two test statistics are simple, intuitive quantities: the sums of the responses and of the RTs observed for the test takers on the item. Common features of the tests are ease of interpretation and computational simplicity. Both tests are uniformly most powerful under the assumption of known ability and speed parameters for the test takers. Examples of power functions for items with realistic parameter values suggest maximum power for 20–30 test takers with item preknowledge for the response-based test and 10–20 test takers for the RT-based test.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-05-12T06:36:34Z
      DOI: 10.3102/10769986221094789
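
      To make the "sum of the responses" idea concrete, here is a hedged sketch of a one-item check: with known 2PL item parameters and known abilities, the number of correct answers among N test takers has mean sum(p_i) and variance sum(p_i(1 - p_i)), so an unexpectedly large observed sum is evidence of possible compromise. The normal approximation and all parameter values are my illustrative assumptions; the article's exact tests, and their RT-based counterpart, are developed formally there.

        set.seed(8)
        a <- 1.2; b <- 0.3                    # known 2PL item parameters
        theta <- rnorm(300)                   # known abilities of 300 test takers
        p <- plogis(a * (theta - b))          # model-implied success probabilities

        x <- rbinom(length(p), 1, p)          # observed responses (no compromise here)
        z <- (sum(x) - sum(p)) / sqrt(sum(p * (1 - p)))
        pnorm(z, lower.tail = FALSE)          # one-sided p-value for an inflated sum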
       
  • Jenss–Bayley Latent Change Score Model With Individual Ratio of the Growth Acceleration in the Framework of Individual Measurement Occasions

      Authors: Jin Liu
      First page: 507
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Longitudinal data analysis has been widely employed to examine between-individual differences in within-individual changes. One challenge of such analyses is that the rate-of-change is only available indirectly when change patterns are nonlinear with respect to time. Latent change score models (LCSMs), which can be employed to investigate the change in rate-of-change at the individual level, have been developed to address this challenge. We extend an existing LCSM with the Jenss–Bayley growth curve and propose a novel expression for change scores that allows for (1) unequally spaced study waves and (2) individual measurement occasions around each wave. We also extend the existing model to estimate the individual ratio of the growth acceleration (which largely determines the trajectory shape and is viewed as the most important parameter in the Jenss–Bayley model). We evaluate the proposed model with a simulation study and a real-world data analysis. Our simulation study demonstrates that the proposed model estimates the parameters unbiasedly and precisely and achieves the target confidence interval coverage. The simulation study also shows that the proposed model with the novel expression for the change scores outperforms the existing model. An empirical example using longitudinal reading scores shows that the model can estimate the individual ratio of the growth acceleration and generate individual rates-of-change in practice. We also provide the corresponding code for the proposed model.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-06-07T05:26:39Z
      DOI: 10.3102/10769986221099919
       
  • Improving Accuracy and Stability of Aggregate Student Growth Measures Using Empirical Best Linear Prediction

      Authors: J. R. Lockwood, Katherine E. Castellano, Daniel F. McCaffrey
      First page: 544
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      Many states and school districts in the United States use standardized test scores to compute annual measures of student achievement progress and then use school-level averages of these growth measures for various reporting and diagnostic purposes. These aggregate growth measures can vary consequentially from year to year for the same school, complicating their use and interpretation. We develop a method, based on the theory of empirical best linear prediction, to improve the accuracy and stability of aggregate growth measures by pooling information across grades, years, and tested subjects for individual schools. We demonstrate the performance of the method using both simulation and application to 6 years of annual growth measures from a large, urban school district. We provide code for implementing the method in the package schoolgrowth for the R environment.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-06-28T05:23:55Z
      DOI: 10.3102/10769986221101624
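
      The sketch below illustrates only the core shrinkage idea behind empirical best linear prediction, in a deliberately simplified univariate form: each school's observed mean growth is pulled toward the overall mean with a weight equal to its estimated reliability. The article's method pools information across grades, years, and subjects and is implemented in the schoolgrowth R package; everything here is an illustrative assumption.

        set.seed(9)
        n_schools <- 50
        true_growth <- rnorm(n_schools, 0, 3)          # true school growth effects
        se  <- runif(n_schools, 1, 4)                  # sampling SE of each school mean
        obs <- true_growth + rnorm(n_schools, 0, se)   # observed aggregate growth

        grand <- weighted.mean(obs, 1 / se^2)
        tau2  <- max(0, var(obs) - mean(se^2))         # crude between-school variance
        reliability <- tau2 / (tau2 + se^2)
        eb <- grand + reliability * (obs - grand)      # shrunken (EB) estimates

        c(raw_mse = mean((obs - true_growth)^2),       # shrinkage should lower MSE
          eb_mse  = mean((eb  - true_growth)^2))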
       
  • Speed–Accuracy Trade-Off? Not So Fast: Marginal Changes in Speed Have Inconsistent Relationships With Accuracy in Real-World Settings

      Authors: Benjamin W. Domingue, Klint Kanopka, Ben Stenhaug, Michael J. Sulik, Tanesia Beverly, Matthieu Brinkhuis, Ruhan Circi, Jessica Faul, Dandan Liao, Bruce McCandliss, Jelena Obradović, Chris Piech, Tenelle Porter, Project iLEAD Consortium, James Soland, Jon Weeks, Steven L. Wise, Jason Yeatman
      First page: 576
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      The speed–accuracy trade-off (SAT) suggests that time constraints reduce response accuracy. Its relevance in observational settings—where response time (RT) may not be constrained but respondent speed may still vary—is unclear. Using 29 data sets containing data from cognitive tasks, we use a flexible method for identification of the SAT (which we test in extensive simulation studies) to probe whether the SAT holds. We find inconsistent relationships between time and accuracy; marginal increases in time use for an individual do not necessarily predict increases in accuracy. Additionally, the speed–accuracy relationship may depend on the underlying difficulty of the interaction. We also consider the analysis of items and individuals; of particular interest is the observation that respondents who exhibit more within-person variation in response speed are typically of lower ability. We further find that RT is typically a weak predictor of response accuracy. Our findings document a range of empirical phenomena that should inform future modeling of RTs collected in observational settings.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-06-09T05:27:25Z
      DOI: 10.3102/10769986221099906
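
      One simple way to probe a within-person speed-accuracy relationship in observational data, loosely in the spirit of the abstract above, is to standardize log response time within person and regress accuracy on the standardized value. The sketch below is a generic illustration with made-up parameters, not the flexible identification method developed and validated in the article.

        set.seed(10)
        n_person <- 200; n_item <- 20
        person <- rep(1:n_person, each = n_item)
        theta  <- rnorm(n_person)                          # person ability
        log_rt <- rnorm(n_person * n_item, 0, 0.6)         # log response times
        p_correct <- plogis(theta[person] + 0.2 * log_rt)  # mild positive time effect
        correct   <- rbinom(length(p_correct), 1, p_correct)

        z_rt <- ave(log_rt, person,                        # within-person standardization
                    FUN = function(v) (v - mean(v)) / sd(v))
        fit <- glm(correct ~ z_rt, family = binomial)
        coef(summary(fit))["z_rt", ]                       # marginal time-accuracy slope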
       
  • Forced-Choice Ranking Models for Raters’ Ranking Data

      Authors: Su-Pin Hung, Hung-Yu Huang
      First page: 603
      Abstract: Journal of Educational and Behavioral Statistics, Ahead of Print.
      To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees’ performance also contribute to rater bias or errors; consequently, forced-choice items have recently been employed for raters to rate how a ratee performs in certain defined traits. This study develops forced-choice ranking models (FCRMs) for data analysis when performance is evaluated by external raters or experts in a forced-choice ranking format. The proposed FCRMs consider different degrees of raters’ leniency/severity when modeling the selection probability in the generalized unfolding item response theory framework. They include an additional topic facet when multiple tasks are evaluated and incorporate variations in leniency parameters to capture the interactions between ratees and raters. The simulation results indicate that the parameters of the new models can be satisfactorily recovered and that better parameter recovery is associated with more item blocks, larger sample sizes, and a complete ranking design. A technological creativity assessment is presented as an empirical example with which to demonstrate the applicability and implications of the new models.
      Citation: Journal of Educational and Behavioral Statistics
      PubDate: 2022-07-07T07:44:00Z
      DOI: 10.3102/10769986221104207
       
 