Subjects -> EDUCATION (Total: 2346 journals)
    - ADULT EDUCATION (24 journals)
    - COLLEGE AND ALUMNI (10 journals)
    - E-LEARNING (38 journals)
    - EDUCATION (1996 journals)
    - HIGHER EDUCATION (140 journals)
    - INTERNATIONAL EDUCATION PROGRAMS (4 journals)
    - ONLINE EDUCATION (42 journals)
    - SCHOOL ORGANIZATION (14 journals)
    - SPECIAL EDUCATION AND REHABILITATION (40 journals)
    - TEACHING METHODS AND CURRICULUM (38 journals)

EDUCATION (1996 journals)

Showing 401 - 600 of 857 Journals sorted alphabetically
Educació i Història : Revista d'Història de l'Educació     Open Access  
Educacion     Open Access   (Followers: 1)
Educación     Open Access   (Followers: 1)
Educación Física y Ciencia     Open Access  
Educación Química     Open Access   (Followers: 1)
Educación y Educadores     Open Access  
Educación y Humanismo     Open Access  
Educación, Lenguaje y Sociedad     Open Access  
Educar     Open Access  
Educare : International Journal for Educational Studies     Open Access   (Followers: 1)
Educate~     Open Access   (Followers: 4)
Educating Young Children: Learning and Teaching in the Early Childhood Years     Full-text available via subscription   (Followers: 16)
Education     Open Access   (Followers: 9)
Education     Full-text available via subscription   (Followers: 7)
Education + Training     Hybrid Journal   (Followers: 23)
Education 3-13     Hybrid Journal   (Followers: 16)
Education and Culture     Open Access   (Followers: 7)
Education and Information Technologies     Hybrid Journal   (Followers: 53)
Education and Linguistics Research     Open Access   (Followers: 4)
Education and Society     Full-text available via subscription   (Followers: 9)
Education and the Law     Hybrid Journal   (Followers: 16)
Education and Treatment of Children     Full-text available via subscription   (Followers: 4)
Education and Urban Society     Hybrid Journal   (Followers: 14)
Education as Change     Hybrid Journal   (Followers: 13)
Education Economics     Hybrid Journal   (Followers: 8)
Éducation et francophonie     Full-text available via subscription   (Followers: 1)
Éducation et socialisation     Open Access   (Followers: 1)
Education Finance and Policy     Hybrid Journal   (Followers: 22)
Education for Chemical Engineers     Hybrid Journal   (Followers: 5)
Education for Primary Care     Full-text available via subscription   (Followers: 16)
Éducation francophone en milieu minoritaire     Open Access  
Education in the Health Professions     Open Access  
Education in the Knowledge Society     Open Access   (Followers: 1)
Education Inquiry     Open Access   (Followers: 6)
Education Next     Partially Free   (Followers: 1)
Education Policy Analysis Archives     Open Access   (Followers: 12)
Education Reform Journal     Open Access   (Followers: 5)
Éducation relative à l'environnement     Open Access  
Education Research International     Open Access   (Followers: 19)
Education Review // Reseñas Educativas     Open Access   (Followers: 1)
Education, Business and Society : Contemporary Middle Eastern Issues     Hybrid Journal   (Followers: 1)
Education, Citizenship and Social Justice     Hybrid Journal   (Followers: 16)
Education, Knowledge and Economy: A journal for education and social enterprise     Hybrid Journal   (Followers: 21)
Education, Research and Perspectives     Full-text available via subscription   (Followers: 15)
Educational Action Research     Hybrid Journal   (Followers: 20)
Educational Administration Quarterly     Hybrid Journal   (Followers: 19)
Educational and Developmental Psychologist     Hybrid Journal   (Followers: 9)
Educational and Psychological Measurement     Hybrid Journal   (Followers: 17)
Educational Assessment     Hybrid Journal   (Followers: 19)
Educational Assessment, Evaluation and Accountability     Hybrid Journal   (Followers: 24)
Educational Considerations     Open Access  
Educational Evaluation and Policy Analysis     Hybrid Journal   (Followers: 26)
Educational Gerontology     Hybrid Journal   (Followers: 10)
Educational Guidance and Counseling Development Journal     Open Access   (Followers: 2)
Educational Leader (Pemimpin Pendidikan)     Open Access  
Educational Management Administration & Leadership     Hybrid Journal   (Followers: 29)
Educational Measurement: Issues and Practice     Hybrid Journal   (Followers: 8)
Educational Media International     Hybrid Journal   (Followers: 5)
Educational Neuroscience     Full-text available via subscription   (Followers: 2)
Educational Policy     Hybrid Journal   (Followers: 21)
Educational Practice and Theory     Full-text available via subscription   (Followers: 4)
Educational Psychologist     Hybrid Journal   (Followers: 27)
Educational Reflective Practices     Full-text available via subscription   (Followers: 2)
Educational Research     Hybrid Journal   (Followers: 130)
Educational Research for Policy and Practice     Hybrid Journal   (Followers: 10)
Educational Research Review     Hybrid Journal   (Followers: 127)
Educational Researcher     Hybrid Journal   (Followers: 139)
Educational Review     Hybrid Journal   (Followers: 27)
Educational Studies     Hybrid Journal   (Followers: 18)
Educational Studies : A Journal of the American Educational Studies Association     Hybrid Journal   (Followers: 7)
Educational Technology Research and Development     Partially Free   (Followers: 45)
Educationis     Open Access  
Educator     Open Access  
Educazione sentimentale     Full-text available via subscription  
Edufisika : Jurnal Pendidikan Fisika     Open Access  
Edukacyjna Analiza Transakcyjna     Open Access  
Edukasi     Open Access  
Edukasi : Jurnal Pendidikan Islam     Open Access  
Edukasi Journal     Open Access  
EduLite : Journal of English Education, Literature and Culture     Open Access  
Edumatica : Jurnal Pendidikan Matematika     Open Access  
EduMatSains     Open Access  
Edunomic Jurnal Pendidikan Ekonomi     Open Access  
edureligia : Pendidikan Agama Islam i     Open Access  
EduSol     Open Access  
Edutech     Open Access  
Eesti Haridusteaduste Ajakiri. Estonian Journal of Education     Open Access  
Effective Education     Hybrid Journal   (Followers: 5)
EĞİTİM VE BİLİM     Open Access   (Followers: 1)
Ejovoc (Electronic Journal of Vocational Colleges)     Open Access  
eJRIEPS : Ejournal de la recherche sur l'intervention en éducation physique et sport     Open Access  
Eklektika : Jurnal Pemikiran dan Penelitian Administrasi Pendidikan     Open Access  
El Guiniguada. Revista de investigaciones y experiencias en Ciencias de la Educación     Open Access  
El-Hikmah     Open Access   (Followers: 1)
Electronic Journal of Education Sciences     Open Access   (Followers: 1)
Electronic Journal of Research in Educational Psychology / Revista Electrónica de Investigación Psicoeducativa y Psicopedagógica     Open Access   (Followers: 8)
Elementary School Journal     Full-text available via subscription   (Followers: 5)
Elementary School Journal PGSD FIP UNIMED     Open Access  
ELT Forum : Journal of English Language Teaching     Open Access   (Followers: 11)
ELT Journal     Hybrid Journal   (Followers: 25)
ELT Worldwide     Open Access  
ELT-Lectura     Open Access  
Eltin Journal : Journal of English Language Teaching in Indonesia     Open Access  
Em Teia : Revista de Educação Matemática e Tecnológica Iberoamericana     Open Access  
Emotional and Behavioural Difficulties     Hybrid Journal   (Followers: 6)
En Blanco y Negro     Open Access  
En Líneas Generales     Open Access  
Encounters in Theory and History of Education     Open Access  
Encuentro Educacional     Open Access  
Encuentros     Open Access  
Encuentros : Revista de Ciencias Humanas, Teoría Social y Pensamiento Crítico     Open Access  
Encuentros Multidisciplinares     Open Access  
Engaged Scholar Journal : Community-Engaged Research, Teaching, and Learning     Open Access  
English Education Journal     Open Access   (Followers: 2)
English for Specific Purposes     Hybrid Journal   (Followers: 12)
English Franca : Academic Journal of English Language and Education     Open Access  
English in Aotearoa     Full-text available via subscription   (Followers: 2)
English in Australia     Full-text available via subscription   (Followers: 2)
English in Education     Hybrid Journal   (Followers: 12)
English Language Teaching     Open Access   (Followers: 29)
English Teaching & Learning     Hybrid Journal   (Followers: 5)
English Teaching: Practice & Critique     Hybrid Journal   (Followers: 1)
Englisia Journal     Open Access  
Enlace Universitario     Open Access  
Enletawa Journal     Open Access  
Enrollment Management Report     Hybrid Journal   (Followers: 1)
Ensaio Avaliação e Políticas Públicas em Educação     Open Access  
Ensaio Pesquisa em Educação em Ciências     Open Access  
Ensayos : Revista de la Facultad de Educación de Albacete     Open Access  
Ensayos Pedagógicos     Open Access  
Enseñanza de las Ciencias : Revista de Investigación y Experiencias Didácticas     Open Access  
Enseñanza de las Ciencias Sociales     Open Access  
Ensino em Perspectivas     Open Access  
Entramados : educación y sociedad     Open Access  
Entrelinhas     Open Access  
Entrepreneurship Education     Hybrid Journal   (Followers: 1)
Entrepreneurship Education and Pedagogy (EE&P)     Full-text available via subscription   (Followers: 1)
Environmental Education Research     Hybrid Journal   (Followers: 16)
Equine Veterinary Education     Hybrid Journal   (Followers: 10)
Equity & Excellence in Education     Hybrid Journal   (Followers: 11)
Erciyes Journal of Education     Open Access   (Followers: 1)
Erwachsenenbildung     Full-text available via subscription  
Escuela Abierta     Partially Free  
Espacio, Tiempo y Educación     Open Access  
Espacios en Blanco : Revista de educación     Open Access  
Estudios Pedagogicos (Valdivia)     Open Access  
Estudios sobre Educación     Open Access  
Estudos Históricos     Open Access   (Followers: 1)
ETD - Educação Temática Digital     Open Access  
Eternal (English, Teaching, Learning & Research Journal)     Open Access   (Followers: 2)
Ethics and Education     Hybrid Journal   (Followers: 14)
Ethiopian Journal of Education and Sciences     Open Access   (Followers: 5)
Éthique en éducation et en formation : Les Dossiers du GREE     Open Access  
Ethnography and Education: New for 2006     Hybrid Journal   (Followers: 10)
Euclid     Open Access  
European Early Childhood Education Research Journal     Hybrid Journal   (Followers: 13)
European Education     Full-text available via subscription   (Followers: 8)
European Educational Research Journal     Full-text available via subscription   (Followers: 18)
European Journal of Education     Hybrid Journal   (Followers: 32)
European Journal of Engineering Education     Hybrid Journal   (Followers: 9)
European Journal of Investigation in Health, Psychology and Education     Open Access   (Followers: 5)
European Journal of Open, Distance and E-Learning     Open Access   (Followers: 5)
European Journal of Open, Distance and E-Learning - EURODL     Open Access   (Followers: 11)
European Journal of Psychology of Education     Hybrid Journal   (Followers: 7)
European Journal of Special Needs Education     Hybrid Journal   (Followers: 11)
European Journal of Teacher Education     Hybrid Journal   (Followers: 25)
European Physical Education Review     Hybrid Journal   (Followers: 8)
Evaluation     Hybrid Journal   (Followers: 20)
Evaluation & Research in Education     Hybrid Journal   (Followers: 18)
Evolution : Education and Outreach     Open Access   (Followers: 3)
Exceptionality     Hybrid Journal   (Followers: 2)
Extensão em Ação     Open Access  
Extensio : Revista Eletrônica de Extensão     Open Access  
Facets     Open Access  
FAISCA. Revista de Altas Capacidades     Open Access  
Fawawi : English Education Journal     Open Access  
FEM : Revista de la Fundación Educación Médica     Open Access  
Feminist Teacher     Full-text available via subscription   (Followers: 1)
Filosofia e Educação     Open Access  
Filozofia Publiczna i Edukacja Demokratyczna     Open Access  
Fırat Üniversitesi Sosyal Bilimler Dergisi     Open Access  
FIRE : Forum of International Research in Education     Open Access   (Followers: 1)
First Opinions-Second Reactions (FOSR)     Open Access  
Florea : Jurnal Biologi dan Pembelajarannya     Open Access  
Florida Journal of Educational Research     Open Access   (Followers: 1)
Focus on Autism and Other Developmental Disabilities     Hybrid Journal   (Followers: 18)
Focus on Exceptional Children     Open Access  
Focus on Health Professional Education : A Multi-disciplinary Journal     Full-text available via subscription   (Followers: 6)
Fokus Konseling     Open Access  
Form@re - Open Journal per la formazione in rete     Open Access  
Formação Docente : Associação Nacional de Pós-Graduação e Pesquisa em Educação     Open Access  
Foro de Educación     Open Access  
Foro de Profesores de E/LE     Open Access  
FORUM     Open Access  
Forum Oświatowe     Open Access  
Frontiers in Education     Open Access   (Followers: 4)
Frontline     Full-text available via subscription   (Followers: 18)
Frontline Learning Research     Open Access   (Followers: 2)
Frühe Bildung     Hybrid Journal   (Followers: 3)


Educational and Psychological Measurement
Journal Prestige (SJR): 1.588
Citation Impact (CiteScore): 2
Number of Followers: 17  
 
  Hybrid Journal (can contain Open Access articles)
ISSN (Print) 0013-1644 - ISSN (Online) 1552-3888
Published by Sage Publications  [1174 journals]
  • Is the Area Under Curve Appropriate for Evaluating the Fit of Psychometric
           Models?

      Authors: Yuting Han, Jihong Zhang, Zhehan Jiang, Dexin Shi
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      In the literature of modern psychometric modeling, mostly related to item response theory (IRT), the fit of a model is evaluated through known indices, such as χ2, M2, and root mean square error of approximation (RMSEA) for absolute assessments, as well as Akaike information criterion (AIC), consistent AIC (CAIC), and Bayesian information criterion (BIC) for relative comparisons. Recent developments show a merging trend of psychometrics and machine learning, yet there remains a gap in model fit evaluation, specifically the use of the area under curve (AUC). This study focuses on the behaviors of AUC in fitting IRT models. Rounds of simulations were conducted to investigate AUC’s appropriateness (e.g., power and Type I error rate) under various conditions. The results show that AUC possessed certain advantages under some conditions, such as a high-dimensional structure with two-parameter logistic (2PL) and some three-parameter logistic (3PL) models, while its disadvantages were also obvious when the true model is unidimensional. The study cautions researchers about the dangers of using AUC alone to evaluate psychometric models.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-05-25T06:31:44Z
      DOI: 10.1177/00131644221098182
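
      A minimal sketch of the kind of AUC check described above, using NumPy and scikit-learn. Data are simulated from a 2PL model and, purely as a simplifying assumption, the generating item parameters stand in for fitted ones; the point is only to show how model-implied probabilities and observed responses feed into the AUC.

      import numpy as np
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(1)
      n_persons, n_items = 1000, 20
      theta = rng.normal(size=n_persons)          # latent abilities
      a = rng.uniform(0.8, 2.0, size=n_items)     # discriminations
      b = rng.normal(size=n_items)                # difficulties

      # 2PL response probabilities and simulated dichotomous responses
      p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
      y = rng.binomial(1, p)

      # Pool person-by-item cells and score how well the model-implied
      # probabilities separate observed correct from incorrect responses.
      auc = roc_auc_score(y.ravel(), p.ravel())
      print(f"AUC of model-implied probabilities: {auc:.3f}")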
       
  • Scoring Graphical Responses in TIMSS 2019 Using Artificial Neural Networks

      Authors: Matthias von Davier, Lillian Tyack, Lale Khorramdel
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item, and we compare the classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate than, typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace second human raters, reducing the workload and cost of scoring for international large-scale assessments (ILSAs) while improving the validity and comparability of scoring complex constructed-response items.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-05-24T06:37:18Z
      DOI: 10.1177/00131644221098021
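
      The TIMSS item, images, and human scores are not available here, so the following is only a generic sketch of a small convolutional image classifier in tf.keras; the 64×64 grayscale input size, the three scoring categories, and the random placeholder data are assumptions made for illustration.

      import numpy as np
      import tensorflow as tf

      rng = np.random.default_rng(0)
      images = rng.random((200, 64, 64, 1)).astype("float32")   # placeholder "drawings"
      scores = rng.integers(0, 3, size=200)                     # placeholder score categories

      model = tf.keras.Sequential([
          tf.keras.layers.Input(shape=(64, 64, 1)),
          tf.keras.layers.Conv2D(16, 3, activation="relu"),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Conv2D(32, 3, activation="relu"),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Flatten(),
          tf.keras.layers.Dense(64, activation="relu"),
          tf.keras.layers.Dense(3, activation="softmax"),       # one unit per scoring category
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      model.fit(images, scores, epochs=3, batch_size=32, verbose=0)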
       
  • The Impact of Sample Size and Various Other Factors on Estimation of
           Dichotomous Mixture IRT Models

      Authors: Sedat Sen, Allan S. Cohen
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The purpose of this study was to examine the effects of different data conditions on item parameter recovery and classification accuracy of three dichotomous mixture item response theory (IRT) models: the Mix1PL, Mix2PL, and Mix3PL. Manipulated factors in the simulation included the sample size (11 different sample sizes from 100 to 5000), test length (10, 30, and 50), number of classes (2 and 3), the degree of latent class separation (normal/no separation, small, medium, and large), and class sizes (equal vs. nonequal). Effects were assessed using root mean square error (RMSE) and classification accuracy percentage computed between true parameters and estimated parameters. The results of this simulation study showed that more precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Recovery of item parameters decreased as the number of classes increased with the decrease in sample size. Recovery of classification accuracy for the conditions with two-class solutions was also better than that of three-class solutions. Results of both item parameter estimates and classification accuracy differed by model type. More complex models and models with larger class separations produced less accurate results. The effect of the mixture proportions also differentially affected RMSE and classification accuracy results. Groups of equal size produced more precise item parameter estimates, but the reverse was the case for classification accuracy results. Results suggested that dichotomous mixture IRT models required more than 2,000 examinees to be able to obtain stable results as even shorter tests required such large sample sizes for more precise estimates. This number increased as the number of latent classes, the degree of separation, and model complexity increased.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-05-20T04:33:58Z
      DOI: 10.1177/00131644221094325
       
  • Investigating Confidence Intervals of Item Parameters When Some Item
           Parameters Take Priors in the 2PL and 3PL Models

      Authors: Insu Paek, Zhongtian Lin, Robert Philip Chalmers
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      To reduce the chance of Heywood cases or nonconvergence when estimating the 2PL or the 3PL model with marginal maximum likelihood via expectation-maximization (MML-EM), priors can be placed on the item slope parameter in the 2PL model or on the pseudo-guessing parameter in the 3PL model, and the marginal maximum a posteriori (MMAP) estimates and posterior standard errors (PSEs) are then obtained. Confidence intervals (CIs) for these parameters, and for parameters that did not take any priors, were investigated under popular prior distributions, different error covariance estimation methods, test lengths, and sample sizes. A seemingly paradoxical result was that, when priors were used, the error covariance estimation methods known to be better in the literature (the Louis or Oakes method in this study) did not yield the best CI performance, whereas the cross-product method, which tends to overestimate standard errors, exhibited better CI performance. Other important findings for the CI performance are also discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-05-16T11:37:52Z
      DOI: 10.1177/00131644221096431
       
  • Awareness Is Bliss: How Acquiescence Affects Exploratory Factor Analysis

      Authors: E. Damiano D’Urso, Jesper Tijmstra, Jeroen K. Vermunt, Kim De Roover
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Assessing the measurement model (MM) of self-report scales is crucial to obtain valid measurements of individuals’ latent psychological constructs. This entails evaluating the number of measured constructs and determining which construct is measured by which item. Exploratory factor analysis (EFA) is the most-used method to evaluate these psychometric properties, where the number of measured constructs (i.e., factors) is assessed, and, afterward, rotational freedom is resolved to interpret these factors. This study assessed the effects of an acquiescence response style (ARS) on EFA for unidimensional and multidimensional (un)balanced scales. Specifically, we evaluated (a) whether ARS is captured as an additional factor, (b) the effect of different rotation approaches on the content and ARS factors recovery, and (c) the effect of extracting the additional ARS factor on the recovery of factor loadings. ARS was often captured as an additional factor in balanced scales when it was strong. For these scales, failing to extract this additional ARS factor, or rotating to simple structure when extracting it, harmed the recovery of the original MM by introducing bias in loadings and cross-loadings. These issues were avoided by using informed rotation approaches (i.e., target rotation), where (part of) the rotation target is specified according to a priori expectations on the MM. Not extracting the additional ARS factor did not affect the loading recovery in unbalanced scales. Researchers should consider the potential presence of ARS when assessing the psychometric properties of balanced scales and use informed rotation approaches when suspecting that an additional factor is an ARS factor.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-05-16T11:35:07Z
      DOI: 10.1177/00131644221089857
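
      A small NumPy-only illustration of the abstract's first point: when a balanced scale (half the items reverse keyed) is contaminated by a strong acquiescence style, the correlation matrix of the raw responses shows a second large eigenvalue, i.e., ARS surfaces as an additional factor. The loadings, keying pattern, and ARS strength below are invented for the example.

      import numpy as np

      rng = np.random.default_rng(3)
      n, p = 1000, 8
      keys = np.array([1, 1, 1, 1, -1, -1, -1, -1])   # balanced scale: half reverse keyed

      content = rng.normal(size=n)                    # substantive trait
      ars = rng.normal(scale=0.8, size=n)             # acquiescence: same-sign shift on every item

      # Item response = keyed content + ARS + noise
      items = 0.7 * keys * content[:, None] + ars[:, None] + rng.normal(scale=0.6, size=(n, p))

      eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
      print(np.round(eigvals[:3], 2))   # two dominant eigenvalues: content factor + ARS factor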
       
  • Evaluating the Quality of Classification in Mixture Model Simulations

      Authors: Yoona Jang, Sehee Hong
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included or not included in the model. To accomplish this task, Monte Carlo simulations were conducted in which the results of models with and without a covariate were compared. Based on these simulations, it was determined that models without a covariate better predicted the number of classes. These findings in general supported the use of the popular three-step approach, with its quality of classification determined to be more than 70% under various conditions of covariate effect, sample size, and quality of indicators. In light of these findings, the practical utility of evaluating classification quality is discussed relative to issues that applied researchers need to carefully consider when applying latent class models.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-04-29T12:27:42Z
      DOI: 10.1177/00131644221093619
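
      The simulation itself cannot be reproduced from the abstract, but one common way to summarize classification quality in latent class models is the relative-entropy index, sketched below; the posterior class probabilities are fabricated inputs, and the paper's "more than 70%" figure may well refer to a different summary (e.g., percentage correctly classified).

      import numpy as np

      def relative_entropy(post):
          """Entropy-based classification quality for an n-by-K matrix of
          posterior class probabilities (1 = perfect separation, 0 = none)."""
          post = np.clip(post, 1e-12, 1.0)
          n, k = post.shape
          return 1.0 - (-(post * np.log(post)).sum()) / (n * np.log(k))

      # Toy posteriors for 4 respondents and 2 latent classes (illustrative values)
      post = np.array([[0.95, 0.05],
                       [0.90, 0.10],
                       [0.20, 0.80],
                       [0.55, 0.45]])
      print(round(relative_entropy(post), 3))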
       
  • Summary Intervals for Model-Based Classification Accuracy and Consistency
           Indices

      Authors: Oscar Gonzalez
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      When scores are used to make decisions about respondents, it is of interest to estimate classification accuracy (CA), the probability of making a correct decision, and classification consistency (CC), the probability of making the same decision across two parallel administrations of the measure. Model-based estimates of CA and CC computed from the linear factor model have been recently proposed, but parameter uncertainty of the CA and CC indices has not been investigated. This article demonstrates how to estimate percentile bootstrap confidence intervals and Bayesian credible intervals for CA and CC indices, which have the added benefit of incorporating the sampling variability of the parameters of the linear factor model to summary intervals. Results from a small simulation study suggest that percentile bootstrap confidence intervals have appropriate confidence interval coverage, although displaying a small negative bias. However, Bayesian credible intervals with diffused priors have poor interval coverage, but their coverage improves once empirical, weakly informative priors are used. The procedures are illustrated by estimating CA and CC indices from a measure used to identify individuals low on mindfulness for a hypothetical intervention, and R code is provided to facilitate the implementation of the procedures.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-04-29T06:48:03Z
      DOI: 10.1177/00131644221092347
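
      A minimal sketch of the percentile-bootstrap machinery described above. The model-based CA/CC indices from the linear factor model are not reproduced here; the bootstrapped statistic is a simple split-half decision-consistency proxy, and the data, cut score, and scale length are all assumptions made for illustration.

      import numpy as np

      rng = np.random.default_rng(11)
      n, k = 400, 10
      data = rng.integers(1, 6, size=(n, k))      # toy 5-point item responses
      cut = 15                                    # pass/fail cut on a half-scale sum (assumed)

      def consistency(x):
          # Agreement of pass/fail decisions made from the two half-scales
          d1 = x[:, ::2].sum(axis=1) >= cut
          d2 = x[:, 1::2].sum(axis=1) >= cut
          return (d1 == d2).mean()

      boot = np.array([consistency(data[rng.integers(0, n, size=n)]) for _ in range(2000)])
      lo, hi = np.percentile(boot, [2.5, 97.5])
      print(f"estimate = {consistency(data):.3f}, 95% percentile CI = [{lo:.3f}, {hi:.3f}]")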
       
  • On Bank Assembly and Block Selection in Multidimensional Forced-Choice
           Adaptive Assessments

      Authors: Rodrigo S. Kreitchmann, Miguel A. Sorrel, Francisco J. Abad
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, the simulation addressed the effect of (a) different bank assemblies (a randomly assembled bank, an optimally assembled bank, and blocks assembled on the fly considering every possible pair of items) and (b) block selection rules (i.e., T, and Bayesian D- and A-rules) on estimate accuracy, ipsativity, and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a nonadaptive questionnaire was included as a baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian A-rule with questionnaires assembled on the fly, the T-rule under this method led to the worst results. This points to the importance of considering both aspects when designing FC CAT.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-04-29T06:31:20Z
      DOI: 10.1177/00131644221087986
       
  • Performance of Coefficient Alpha and Its Alternatives: Effects of
           Different Types of Non-Normality

      Authors: Leifeng Xiao, Kit-Tai Hau
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle’s omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient H) with continuous and discrete data having different types of non-normality. Results showed the estimation bias was acceptable for continuous data with varying degrees of non-normality when the scales were strong (high loadings). This bias, however, became quite large with moderate-strength scales and increased with increasing non-normality. For Likert-type scales, other than omega h, most indices were acceptable with non-normal data having at least four points, and more points were better. For data following different exponential distributions, omega RT and GLB were robust, whereas the bias of the other indices for the binomial-beta distribution was generally large. An examination of an authentic large-scale international survey suggested that its items were at worst moderately non-normal; hence, non-normality was not a big concern. We recommend that (a) the requirement of continuous and normally distributed data for alpha may be relaxed for less severely non-normal data; (b) for severely non-normal data, we should have at least four scale points, and more points are better; and (c) there is no single gold standard for all data types; other issues such as scale loading, model structure, or scale length are also important.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-04-11T09:41:08Z
      DOI: 10.1177/00131644221088240
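
      For reference, coefficient alpha itself is straightforward to compute from a respondents-by-items matrix; a NumPy-only sketch with invented data follows. The competitors discussed in the abstract (omega variants, GLB, coefficient H) require a fitted factor model or specialized packages and are not shown.

      import numpy as np

      def cronbach_alpha(x):
          """Coefficient alpha for an n-respondents-by-k-items matrix."""
          k = x.shape[1]
          item_vars = x.var(axis=0, ddof=1)
          total_var = x.sum(axis=1).var(ddof=1)
          return k / (k - 1) * (1 - item_vars.sum() / total_var)

      rng = np.random.default_rng(5)
      trait = rng.normal(size=500)
      items = 0.7 * trait[:, None] + rng.normal(scale=0.7, size=(500, 6))   # toy congeneric-ish items
      print(round(cronbach_alpha(items), 3))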
       
  • Croon’s Bias-Corrected Estimation for Multilevel Structural Equation
           Models with Non-Normal Indicators and Model Misspecifications

      Authors: Kyle Cox, Benjamin Kelcey
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Multilevel structural equation models (MSEMs) are well suited for educational research because they accommodate complex systems involving latent variables in multilevel settings. Estimation using Croon’s bias-corrected factor score (BCFS) path estimation has recently been extended to MSEMs and demonstrated promise with limited sample sizes. This makes it well suited for planned educational research which often involves sample sizes constrained by logistical and financial factors. However, the performance of BCFS estimation with MSEMs has yet to be thoroughly explored under common but difficult conditions including in the presence of non-normal indicators and model misspecifications. We conducted two simulation studies to evaluate the accuracy and efficiency of the estimator under these conditions. Results suggest that BCFS estimation of MSEMs is often more dependable, more efficient, and less biased than other estimation approaches when sample sizes are limited or model misspecifications are present but is more susceptible to indicator non-normality. These results support, supplement, and elucidate previous literature describing the effective performance of BCFS estimation encouraging its utilization as an alternative or supplemental estimator for MSEMs.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-12T06:24:56Z
      DOI: 10.1177/00131644221080451
       
  • Range Restriction Affects Factor Analysis: Normality, Estimation, Fit,
           Loadings, and Reliability

      Authors: Alicia Franco-Martínez, Jesús M. Alvarado, Miguel A. Sorrel
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      A sample suffers range restriction (RR) when its variance is reduced compared with its population variance and, in turn, it fails to represent that population. If the RR occurs on the latent factor, not directly on the observed variable, the researcher deals with an indirect RR, common when using convenience samples. This work explores how this problem affects different outputs of the factor analysis: multivariate normality (MVN), estimation process, goodness-of-fit, recovery of factor loadings, and reliability. In doing so, a Monte Carlo study was conducted. Data were generated following the linear selective sampling model, simulating tests varying their sample size (200 and 500 cases), test size (6, 12, 18, and 24 items), loading size (.50, .70, and .90), and restriction size (selection ratios of 1, .90, .80, and so on down to .10). Our results systematically suggest that an interaction between decreasing the loading size and increasing the restriction size affects the MVN assessment, obstructs the estimation process, and leads to an underestimation of the factor loadings and reliability. However, most of the MVN tests and most of the fit indices employed were insensitive to the RR problem. We provide some recommendations to applied researchers.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-10T09:24:55Z
      DOI: 10.1177/00131644221081867
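
      A NumPy-only sketch of the indirect range-restriction mechanism the abstract simulates: one-factor data are generated, cases are then selected on the latent factor, and the restricted sample shows attenuated reliability. The loading of .7, 12 items, and a .30 selection ratio are illustrative choices, not the study's design.

      import numpy as np

      rng = np.random.default_rng(42)
      n, k, loading, selection_ratio = 100000, 12, 0.7, 0.30

      factor = rng.normal(size=n)
      items = loading * factor[:, None] + rng.normal(scale=np.sqrt(1 - loading**2), size=(n, k))

      def alpha(x):
          # Coefficient alpha as a quick reliability summary
          return x.shape[1] / (x.shape[1] - 1) * (1 - x.var(0, ddof=1).sum() / x.sum(1).var(ddof=1))

      # Indirect range restriction: keep only cases in the top 30% of the *latent* factor
      keep = factor >= np.quantile(factor, 1 - selection_ratio)
      print(f"alpha, full sample:       {alpha(items):.3f}")
      print(f"alpha, restricted sample: {alpha(items[keep]):.3f}")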
       
  • Resolving Dimensionality in a Child Assessment Tool: An Application of the
           Multilevel Bifactor Model

      Authors: Hope O. Akaeze, Frank R. Lawrence, Jamie Heng-Chieh Wu
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Multidimensionality and hierarchical data structure are common in assessment data. These design features, if not accounted for, can threaten the validity of the results and inferences generated from factor analysis, a method frequently employed to assess test dimensionality. In this article, we describe and demonstrate the application of the multilevel bifactor model to address these features in examining test dimensionality. The tool for this exposition is the Child Observation Record Advantage 1.5 (COR-Adv1.5), a child assessment instrument widely used in Head Start programs. Previous studies on this assessment tool reported highly correlated factors and did not account for the nesting of children in classrooms. Results from this study show how the flexibility of the multilevel bifactor model, together with useful model-based statistics, can be harnessed to judge the dimensionality of a test instrument and inform the interpretability of the associated factor scores.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-08T07:28:14Z
      DOI: 10.1177/00131644221082688
       
  • Multidimensional Forced-Choice CAT With Dominance Items: An Empirical
           Comparison With Optimal Static Testing Under Different Desirability
           Matching

      Authors: Yin Lin, Anna Brown, Paul Williams
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, although most items developed historically follow dominance response models, research on FC CAT using dominance items is limited. Existing research is heavily dominated by simulations and lacks empirical deployment. This empirical study trialed an FC CAT with dominance items described by the Thurstonian Item Response Theory model with research participants. It investigated important practical issues such as the effects of adaptive item selection and social desirability balancing criteria on score distributions, measurement accuracy, and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage compared with optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-08T07:26:36Z
      DOI: 10.1177/00131644221077637
       
  • Evaluation of Polytomous Item Locations in Multicomponent Measuring
           Instruments: A Note on a Latent Variable Modeling Procedure

      Authors: Tenko Raykov, Martin Pusic
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      This note is concerned with evaluation of location parameters for polytomous items in multiple-component measuring instruments. A point and interval estimation procedure for these parameters is outlined that is developed within the framework of latent variable modeling. The method permits educational, behavioral, biomedical, and marketing researchers to quantify important aspects of the functioning of items with ordered multiple response options, which follow the popular graded response model. The procedure is routinely and readily applicable in empirical studies using widely circulated software and is illustrated with empirical data.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-03T05:48:38Z
      DOI: 10.1177/00131644211072829
       
  • Assessing Ability Recovery of the Sequential IRT Model With Unstructured
           Multiple-Attempt Data

      Authors: Ziying Li, A. Corinne Huggins-Manley, Walter L. Leite, M. David Miller, Eric A. Wright
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The unstructured multiple-attempt (MA) item response data in virtual learning environments (VLEs) are often from student-selected assessment data sets, which include missing data, single-attempt responses, multiple-attempt responses, and unknown growth ability across attempts, leading to a complex and complicated scenario for using this kind of data set as a whole in the practice of educational measurement. It is critical that methods be available for measuring ability from VLE data to improve VLE systems, monitor student progress in instructional settings, and conduct educational research. The purpose of this study is to explore the ability recovery of the multidimensional sequential 2-PL IRT model in unstructured MA data from VLEs. We conduct a simulation study to evaluate the effects of the magnitude of ability growth and the proportion of students who make two attempts, as well as the moderated effects of sample size, test length, and missingness, on the bias and root mean square error of ability estimates. Results show that the model poses promise for evaluating ability in unstructured VLE data, but that some data conditions can result in biased ability estimates.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-03T05:47:13Z
      DOI: 10.1177/00131644211058386
       
  • Implementing a Standardized Effect Size in the POLYSIBTEST Procedure

      Authors: James D. Weese, Ronna C. Turner, Xinya Liang, Allison Ames, Brandon Crawford
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese’s standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the Zwick et al. and Golia’s recommendations, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl’s suggested criterion. The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-01T06:52:09Z
      DOI: 10.1177/00131644221081011
       
  • Power Analysis for Moderator Effects in Longitudinal Cluster Randomized
           Designs

      Authors: Wei Li, Spyros Konstantopoulos
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Cluster randomized control trials often incorporate a longitudinal component where, for example, students are followed over time and student outcomes are measured repeatedly. Besides examining how intervention effects induce changes in outcomes, researchers are sometimes also interested in exploring whether intervention effects on outcomes are modified by moderator variables at the individual (e.g., gender, race/ethnicity) and/or the cluster level (e.g., school urbanicity) over time. This study provides methods for statistical power analysis of moderator effects in two- and three-level longitudinal cluster randomized designs. Power computations take into account clustering effects, the number of measurement occasions, the impact of sample sizes at different levels, covariates effects, and the variance of the moderator variable. Illustrative examples are offered to demonstrate the applicability of the methods.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-01T06:49:50Z
      DOI: 10.1177/00131644221077359
       
  • A New Stopping Criterion for Rasch Trees Based on the Mantel–Haenszel
           Effect Size Measure for Differential Item Functioning

      Authors: Mirka Henninger, Rudolf Debelak, Carolin Strobl
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      To detect differential item functioning (DIF), Rasch trees search for optimal splitpoints in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. What would be more desirable is an approach that is driven more by effect size rather than sample size. In order to achieve this, we suggest to implement an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel–Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel–Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy-to-use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel–Haenszel procedure on tree stopping and DIF item classification.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-03-01T06:48:03Z
      DOI: 10.1177/00131644221077135
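
      A sketch of the Mantel–Haenszel machinery behind the proposed stopping criterion: the common odds ratio is computed across rest-score strata for one dichotomous item, put on the ETS delta scale (MH D-DIF = −2.35 ln α_MH), and classified with A/B/C thresholds. The thresholds below are the simplified magnitude-only version of the ETS rules (the significance-test component is omitted), and the data are simulated with an invented DIF effect.

      import numpy as np

      def mh_d_dif(item, rest_score, group):
          """Mantel-Haenszel common odds ratio for one dichotomous item,
          stratified on the rest score, expressed on the ETS delta scale."""
          num = den = 0.0
          for s in np.unique(rest_score):
              m = rest_score == s
              ref, foc = item[m & (group == 0)], item[m & (group == 1)]
              n_t = len(ref) + len(foc)
              if len(ref) == 0 or len(foc) == 0:
                  continue
              a, b = ref.sum(), len(ref) - ref.sum()      # reference: correct / incorrect
              c, d = foc.sum(), len(foc) - foc.sum()      # focal: correct / incorrect
              num += a * d / n_t
              den += b * c / n_t
          return -2.35 * np.log(num / den)

      # Simulated Rasch-type data with uniform DIF of 0.6 logits on item 0 against the focal group
      rng = np.random.default_rng(8)
      n, k = 2000, 15
      group = rng.integers(0, 2, size=n)
      theta = rng.normal(size=n)
      b = np.linspace(-1.5, 1.5, k)
      dif = np.zeros(k); dif[0] = 0.6
      p = 1 / (1 + np.exp(-(theta[:, None] - b - dif * group[:, None])))
      resp = rng.binomial(1, p)

      rest = resp[:, 1:].sum(axis=1)                      # stratify on the score over the other items
      d = mh_d_dif(resp[:, 0], rest, group)
      label = "A (negligible)" if abs(d) < 1 else "B (moderate)" if abs(d) < 1.5 else "C (large)"
      print(f"MH D-DIF = {d:.2f} -> ETS category {label}")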
       
  • Coefficients of Factor Score Determinacy for Mean Plausible Values of
           Bayesian Factor Analysis

      Authors: André Beauducel, Norbert Hilger
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      In the context of Bayesian factor analysis, it is possible to compute plausible values, which might be used as covariates or predictors or to provide individual scores for the Bayesian latent variables. Previous simulation studies ascertained the validity of mean plausible values by the mean squared difference between the mean plausible values and the generating factor scores. However, the mean correlation of sets of single plausible values of different factors was shown to be an adequate estimator of the correlation between factors. Using sets of single plausible values to compute a mean prediction in secondary analysis implies that their determinacy should be known. Therefore, a plausible value-based determinacy coefficient allowing for estimation of the determinacy of single plausible values was proposed and evaluated by means of two simulation studies. The first simulation study demonstrated that the plausible value-based determinacy coefficient is an adequate estimate of the correlation of single plausible values with the population factor. It is also shown that the plausible value-based determinacy coefficient of mean plausible values approaches the conventional, model parameter-based determinacy coefficient with an increasing number of imputations. The second simulation study revealed that the plausible value-based determinacy coefficient and the model parameter-based determinacy coefficient yield similar results even for misspecified models in small samples. It also revealed that for small sample sizes and a small salient loading size, the coefficients of determinacy overestimate the validity, so it is recommended to report the determinacy coefficients together with a bias correction to estimate the validity of plausible values in empirical settings.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-02-22T07:18:21Z
      DOI: 10.1177/00131644221078960
       
  • Assessing Essential Unidimensionality of Scales and Structural Coefficient
           Bias

      Authors: Xiaoling Liu, Pei Cao, Xinzhen Lai, Jianbing Wen, Yanyun Yang
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Percentage of uncontaminated correlations (PUC), explained common variance (ECV), and omega hierarchical (ωH) have been used to assess the degree to which a scale is essentially unidimensional and to predict structural coefficient bias when a unidimensional measurement model is fit to multidimensional data. The usefulness of these indices has been investigated in the context of bifactor models with balanced structures. This study extends the examination by focusing on bifactor models with unbalanced structures. The maximum and minimum PUC values given the total number of items and factors were derived. The usefulness of PUC, ECV, and ωH in predicting structural coefficient bias was examined under a variety of structural regression models with bifactor measurement components. Results indicated that the performance of these indices in predicting structural coefficient bias depended on whether the bifactor measurement model had a balanced or unbalanced structure. PUC failed to predict structural coefficient bias when the bifactor model had an unbalanced structure. ECV performed reasonably well, but worse than ωH.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-02-08T10:16:27Z
      DOI: 10.1177/00131644221075580
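
      The three indices named in the abstract are simple functions of a bifactor solution's standardized loadings; the sketch below uses the usual formulas for ECV, omega hierarchical, and PUC under an invented pattern with one general and three group factors (all loading values are assumptions for illustration).

      import numpy as np

      # Invented standardized bifactor loadings: 9 items, general factor + 3 group factors
      gen = np.array([.60, .65, .55, .70, .60, .50, .65, .55, .60])
      grp = [np.array([.40, .35, .45]),          # items 1-3 on group factor 1
             np.array([.30, .45, .40]),          # items 4-6 on group factor 2
             np.array([.50, .35, .30])]          # items 7-9 on group factor 3

      err = 1 - gen**2 - np.concatenate(grp)**2  # unique variances (standardized items)

      ecv = (gen**2).sum() / ((gen**2).sum() + sum((g**2).sum() for g in grp))
      omega_h = gen.sum()**2 / (gen.sum()**2 + sum(g.sum()**2 for g in grp) + err.sum())

      p = len(gen)
      contaminated = sum(len(g) * (len(g) - 1) / 2 for g in grp)
      puc = (p * (p - 1) / 2 - contaminated) / (p * (p - 1) / 2)

      print(f"ECV = {ecv:.3f}, omegaH = {omega_h:.3f}, PUC = {puc:.3f}")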
       
  • Using Simulated Annealing to Investigate Sensitivity of SEM to External
           Model Misspecification

      Authors: Charles L. Fisk, Jeffrey R. Harring, Zuchao Shen, Walter Leite, King Yiu Suen, Katerina M. Marcoulides
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Sensitivity analyses encompass a broad set of post-analytic techniques that are characterized as measuring the potential impact of any factor that has an effect on some output variables of a model. This research focuses on the utility of the simulated annealing algorithm to automatically identify path configurations and parameter values of omitted confounders in structural equation modeling (SEM). An empirical example based on a past published study is used to illustrate how strongly related an omitted variable must be to model variables for the conclusions of an analysis to change. The algorithm is outlined in detail and the results stemming from the sensitivity analysis are discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-31T11:07:05Z
      DOI: 10.1177/00131644211073121
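
      The SEM-specific search cannot be reconstructed from the abstract, but the simulated annealing loop itself is generic; the sketch below minimizes a toy two-parameter objective (standing in for the discrepancy being probed) with random perturbations, Metropolis acceptance, and geometric cooling. All settings and the toy objective are illustrative assumptions.

      import numpy as np

      def simulated_annealing(objective, x0, n_iter=5000, step=0.1, t0=1.0, cooling=0.999, seed=0):
          """Generic simulated annealing: perturb, accept by Metropolis rule, cool geometrically."""
          rng = np.random.default_rng(seed)
          x = np.asarray(x0, dtype=float)
          fx, temp = objective(x), t0
          best_x, best_f = x.copy(), fx
          for _ in range(n_iter):
              cand = x + rng.normal(scale=step, size=x.shape)
              fc = objective(cand)
              if fc < fx or rng.random() < np.exp(-(fc - fx) / temp):
                  x, fx = cand, fc
                  if fx < best_f:
                      best_x, best_f = x.copy(), fx
              temp *= cooling
          return best_x, best_f

      # Toy stand-in objective: distance of two candidate path values from a target pattern
      target = np.array([0.4, -0.3])
      objective = lambda v: ((np.asarray(v) - target) ** 2).sum()
      print(simulated_annealing(objective, [0.0, 0.0]))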
       
  • Developing Situated Measures of Science Instruction Through an Innovative
           Electronic Portfolio App for Mobile Devices: Reliability, Validity, and
           Feasibility

      Authors: José Felipe Martínez, Matt Kloser, Jayashri Srinivasan, Brian Stecher, Amanda Edelman
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Adoption of new instructional standards in science demands high-quality information about classroom practice. Teacher portfolios can be used to assess instructional practice and support teacher self-reflection anchored in authentic evidence from classrooms. This study investigated a new type of electronic portfolio tool that allows efficient capture of classroom artifacts in multimedia formats using mobile devices. We assess the psychometric properties of measures of quality instruction in middle school science classrooms derived from the contents of portfolios collected using this novel tool—with instruction operationalized through dimensions aligned to the Next Generation Science Standards. Results reflect low rater error and adequate reliability for several dimensions, a dominant underlying factor, and significant relations to some relevant concurrent indicators. Although no relation was found to student standardized test scores or course grades, portfolio ratings did relate to student self-efficacy perceptions and enjoyment of science. We examine factors influencing measurement error, and consider the broader implications of the results for assessing the validity of portfolio score interpretations, and the feasibility and potential value of this type of tool for summative and formative uses, in the context of large-scale instructional improvement efforts.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-25T04:56:19Z
      DOI: 10.1177/00131644211064923
       
  • Identifying Ability and Nonability Groups: Incorporating Response Times
           Using Mixture Modeling

      Authors: Georgios Sideridis, Ioannis Tsaousis, Khaleel Al-Harbi
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The goal of the present study was to address the analytical complexity of incorporating responses and response times by applying the Jeon and De Boeck mixture item response theory model in Mplus 8.7. Using both simulated and real data, we attempt to identify subgroups of responders who are rapid guessers or who engage knowledge retrieval strategies. When applying the mixture model to a measure of contextual error in linguistics, results pointed to the presence of a knowledge retrieval strategy. That is, a participant either knows the content (morphology, grammar rules) and can identify the error, or lacks the requisite knowledge and cannot benefit from spending more time on an item. In contrast, as item difficulty progressed, the high-ability group utilized the additional time to make informed guesses. The methodology is illustrated using annotated code in Mplus 8.7.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-20T11:34:51Z
      DOI: 10.1177/00131644211072833
       
  • On Effect Size Measures for Nested Measurement Models

      Authors: Tenko Raykov, Christine DiStefano, Lisa Calvocoressi, Martin Volker
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      A class of effect size indices are discussed that evaluate the degree to which two nested confirmatory factor analysis models differ from each other in terms of fit to a set of observed variables. These descriptive effect measures can be used to quantify the impact of parameter restrictions imposed in an initially considered model and are free from an explicit relationship to sample size. The described indices represent the extent to which respective linear combinations of the proportions of explained variance in the manifest variables are changed as a result of introducing the constraints. The indices reflect corresponding aspects of the impact of the restrictions and are independent of their statistical significance or lack thereof. The discussed effect size measures are readily point and interval estimated, using popular software, and their application is illustrated with numerical examples.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-20T11:29:17Z
      DOI: 10.1177/00131644211066845
       
  • Effects of Response Option Order on Likert-Type Psychometric Properties
           and Reactions

      Authors: Chet Robie, Adam W. Meade, Stephen D. Risavy, Sabah Rasheed
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The effects of different response option orders on survey responses have been studied extensively. The typical research design involves examining the differences in response characteristics between conditions with the same item stems and response option orders that differ in valence—either incrementally arranged (e.g., strongly disagree to strongly agree) or decrementally arranged (e.g., strongly agree to strongly disagree). The present study added two additional experimental conditions—randomly incremental or decremental and completely randomized. All items were presented in an item-by-item format. We also extended previous studies by including an examination of response option order effects on: careless responding, correlations between focal predictors and criteria, and participant reactions, all the while controlling for false discovery rate and focusing on the size of effects. In a sample of 1,198 university students, we found little to no response option order effects on a recognized personality assessment vis-à-vis measurement equivalence, scale mean differences, item-level distributions, or participant reactions. However, the completely randomized response option order condition differed on several careless responding indices suggesting avenues for future research.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-13T11:38:12Z
      DOI: 10.1177/00131644211069406
       
  • Diagnostic Classification Model for Forced-Choice Items and Noncognitive
           Tests

      Authors: Hung-Yu Huang
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs) can provide information regarding the mastery status of test takers on latent discrete variables and are more commonly used for cognitive tests employed in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCM for FC items under the higher-order DCM framework to meet the practical demands of simultaneously controlling for response biases and providing diagnostic classification information. By conducting a series of simulations and calibrating the model parameters with a Bayesian estimation, the study shows that, in general, the model parameters can be recovered satisfactorily with the use of long tests and large samples. More attributes improve the precision of the second-order latent trait estimation in a long test, but decrease the classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired comparison items, the specific-attribute condition produces better a parameter estimation than the overlap-attribute condition. Finally, an empirical analysis related to work-motivation measures is presented to demonstrate the applications and implications of the new model.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-08T06:42:54Z
      DOI: 10.1177/00131644211069906
       
  • Bias for Treatment Effect by Measurement Error in Pretest in ANCOVA
           Analysis

      Authors: Yasuo Miyazaki, Akihito Kamata, Kazuaki Uekawa, Yizhi Sun
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      This paper investigated consequences of measurement error in the pretest on the estimate of the treatment effect in a pretest–posttest design with the analysis of covariance (ANCOVA) model, focusing on both the direction and magnitude of its bias. Some prior studies have examined the magnitude of the bias due to measurement error and suggested ways to correct it. However, none of them clarified how the direction of bias is affected by measurement error. This study analytically derived a formula for the asymptotic bias for the treatment effect. The derived formula is a function of the reliability of the pretest, the standardized population group mean difference for the pretest, and the correlation between pretest and posttest true scores. It revealed a concerning consequence of ignoring measurement errors in pretest scores: treatment effects could be overestimated or underestimated, and positive treatment effects can be estimated as negative effects in certain conditions. A simulation study was also conducted to verify the derived bias formula.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-08T06:42:23Z
      DOI: 10.1177/00131644211068801
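      The consequence described above can be illustrated numerically. The sketch below is a minimal simulation of the classical attenuation argument, under assumed values for reliability, slope, and the pretest group difference: the covariate-adjusted slope shrinks by the within-group pretest reliability, leaving residual confounding of roughly (1 - reliability) x slope x pretest mean difference. It is not the exact asymptotic formula derived in the article.

```python
# Minimal sketch (not the article's derivation): bias in the ANCOVA treatment
# effect when the pretest covariate is measured with error.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000          # large n so sampling error is negligible
rho = 0.6            # within-group pretest reliability (assumed)
delta_x = 0.5        # true pretest group mean difference (assumed)
beta = 0.7           # slope of posttest on pretest true score (assumed)
tau = 0.0            # true treatment effect is zero

group = rng.integers(0, 2, n)                              # 0 = control, 1 = treatment
true_x = rng.normal(delta_x * group, 1.0)                  # pretest true scores
obs_x = true_x + rng.normal(0.0, np.sqrt(1 / rho - 1), n)  # observed pretest, reliability rho
y = tau * group + beta * true_x + rng.normal(0.0, 1.0, n)  # posttest scores

# ANCOVA: regress the posttest on group and the fallible observed pretest
X = np.column_stack([np.ones(n), group, obs_x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print("estimated treatment effect:", round(coef[1], 3))
print("(1 - rho) * beta * delta_x =", round((1 - rho) * beta * delta_x, 3))
```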
       
  • A Regression Discontinuity Design Framework for Controlling Selection Bias
           in Evaluations of Differential Item Functioning

      Authors: Natalie A. Koziol, J. Marc Goodrich, HyeonJin Yoon
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-08T06:37:50Z
      DOI: 10.1177/00131644211068440
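      For readers less familiar with the baseline method in this comparison, the sketch below shows the traditional logistic-regression test for uniform DIF (matching on the observed total score and testing the group term). It illustrates the comparison condition only, not the proposed regression discontinuity framework, and every generating value is an illustrative assumption.

```python
# Sketch of the traditional logistic-regression uniform DIF test used as the
# comparison method in the article (the RDD-based framework is not reproduced).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_per_group = 1000
theta = rng.normal(0.0, 1.0, 2 * n_per_group)
group = np.repeat([0, 1], n_per_group)                # 0 = reference, 1 = focal

diffs = np.linspace(-1.5, 1.5, 20)                    # Rasch item difficulties (assumed)
responses = np.zeros((2 * n_per_group, 20), dtype=int)
for j, b in enumerate(diffs):
    shift = 0.6 * group if j == 0 else 0.0            # uniform DIF on item 0 only
    p = 1 / (1 + np.exp(-(theta - b - shift)))
    responses[:, j] = rng.binomial(1, p)

total = responses.sum(axis=1)                         # matching variable
X = sm.add_constant(np.column_stack([total, group]))
fit = sm.Logit(responses[:, 0], X).fit(disp=0)
print("coefficients (const, total, group):", np.round(fit.params, 3))
print("p-values:", np.round(fit.pvalues, 4))          # group term flags uniform DIF
```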
       
  • Testing the Performance of Level-Specific Fit Evaluation in MCFA Models
           With Different Factor Structures Across Levels

      Authors: Bitna Lee, Wonsook Sohn
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      A Monte Carlo study was conducted to compare the performance of a level-specific (LS) fit evaluation with that of a simultaneous (SI) fit evaluation in multilevel confirmatory factor analysis (MCFA) models. We extended previous studies by examining their performance under MCFA models with different factor structures across levels. In addition, various design factors, as well as the interaction between intraclass correlation (ICC) and misspecification type (MT), were considered. The simulation results demonstrate that the LS outperformed the SI in detecting model misspecification at the between-group level, even in the MCFA model with different factor structures across levels. In particular, the performance of the LS fit indices depended on the ICC, group size (GS), and MT. More specifically, the results are as follows. First, the performance of the root mean square error of approximation (RMSEA) was more promising in detecting misspecified between-level models as GS or ICC increased. Second, the effect of ICC on the performance of the comparative fit index (CFI) or Tucker–Lewis index (TLI) depended on the MT. Third, the performance of the standardized root mean squared residual (SRMR) improved as ICC increased, and this pattern was clearer in structure misspecification than in measurement misspecification. Finally, the summary and implications of the results are discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-07T08:21:05Z
      DOI: 10.1177/00131644211066956
       
  • Examining the Robustness of the Graded Response and 2-Parameter Logistic
           Models to Violations of Construct Normality

      Authors: Patrick D. Manapat, Michael C. Edwards
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      When fitting unidimensional item response theory (IRT) models, the population distribution of the latent trait (θ) is often assumed to be normally distributed. However, some psychological theories would suggest a nonnormal θ. For example, some clinical traits (e.g., alcoholism, depression) are believed to follow a positively skewed distribution where the construct is low for most people, medium for some, and high for few. Failure to account for nonnormality may compromise the validity of inferences and conclusions. Although corrections have been developed to account for nonnormality, these methods can be computationally intensive and have not yet been widely adopted. Previous research has recommended implementing nonnormality corrections when θ is not “approximately normal.” This research focused on examining how far θ can deviate from normal before the normality assumption becomes untenable. Specifically, our goal was to identify the type(s) and degree(s) of nonnormality that result in unacceptable parameter recovery for the graded response model (GRM) and 2-parameter logistic model (2PLM).
      Citation: Educational and Psychological Measurement
      PubDate: 2022-01-07T08:18:27Z
      DOI: 10.1177/00131644211063453
       
  • Exploratory Graph Analysis for Factor Retention: Simulation Results for
           Continuous and Binary Data

      Authors: Tim Cosemans, Yves Rosseel, Sarah Gelper
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Exploratory graph analysis (EGA) is a commonly applied technique intended to help social scientists discover latent variables. Yet, the results can be influenced by the methodological decisions the researcher makes along the way. In this article, we focus on the choice regarding the number of factors to retain: We compare the performance of the recently developed EGA with various traditional factor retention criteria. We use both continuous and binary data, as evidence regarding the accuracy of such criteria in the latter case is scarce. Simulation results, based on scenarios resulting from varying sample size, communalities from major factors, interfactor correlations, skewness, and correlation measure, show that EGA outperforms the traditional factor retention criteria considered in most cases in terms of bias and accuracy. In addition, we show that factor retention decisions for binary data are preferably made using Pearson, instead of tetrachoric, correlations, which is contradictory to popular belief.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-12-28T12:02:06Z
      DOI: 10.1177/00131644211059089
       
  • Symptom Presence and Symptom Severity as Unique Indicators of
           Psychopathology: An Application of Multidimensional Zero-Inflated and
           Hurdle Graded Response Models

      Authors: Brooke E. Magnus, Yang Liu
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Questionnaires inquiring about psychopathology symptoms often produce data with excess zeros or the equivalent (e.g., none, never, and not at all). This type of zero inflation is especially common in nonclinical samples in which many people do not exhibit psychopathology, and if unaccounted for, can result in biased parameter estimates when fitting latent variable models. In the present research, we adopt a maximum likelihood approach in fitting multidimensional zero-inflated and hurdle graded response models to data from a psychological distress measure. These models include two latent variables: susceptibility, which relates to the probability of endorsing the symptom at all, and severity, which relates to the frequency of the symptom, given its presence. After estimating model parameters, we compute susceptibility and severity scale scores and include them as explanatory variables in modeling health-related criterion measures (e.g., suicide attempts, diagnosis of major depressive disorder). Results indicate that susceptibility and severity uniquely and differentially predict other health outcomes, which suggests that symptom presence and symptom severity are unique indicators of psychopathology and both may be clinically useful. Psychometric and clinical implications are discussed, including scale score reliability.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-12-27T06:46:20Z
      DOI: 10.1177/00131644211061820
       
  • Extended Multivariate Generalizability Theory With Complex Design
           Structures

      Authors: Robert L. Brennan, Stella Y. Kim, Won-Chan Lee
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and free-response items, with the latter involving variability attributable to both items and raters. In this case, two distinct designs are needed to fully characterize the design and capture potential sources of error associated with each item format. Another example involves tests containing both testlets and one or more stand-alone sets of items. Testlet effects need to be taken into account for the testlet-based items, but not the stand-alone sets of items. This article presents an extension of MGT that faithfully models such complex test designs, along with two real-data examples. Among other things, these examples illustrate that estimates of error variance, error–tolerance ratios, and reliability-like coefficients can be biased if there is a mismatch between the user-specified universe of generalization and the complex nature of the test.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-11-15T05:47:20Z
      DOI: 10.1177/00131644211049746
       
  • Non-iterative Conditional Pairwise Estimation for the Rating Scale Model

      Authors: Mark Elliott, Paula Buttery
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      We investigate two non-iterative estimation procedures for Rasch models, the pair-wise estimation procedure (PAIR) and the Eigenvector method (EVM), and identify theoretical issues with EVM for rating scale model (RSM) threshold estimation. We develop a new procedure to resolve these issues—the conditional pairwise adjacent thresholds procedure (CPAT)—and test the methods using a large number of simulated datasets to compare the estimates against known generating parameters. We find support for our hypotheses, in particular that EVM threshold estimates suffer from theoretical issues which lead to biased estimates and that CPAT represents a means of resolving these issues. These findings are both statistically significant (p 
      Citation: Educational and Psychological Measurement
      PubDate: 2021-09-24T08:38:55Z
      DOI: 10.1177/00131644211046253
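      The pairwise logic behind PAIR is easiest to see in the dichotomous Rasch case: conditional on exactly one of two items being answered correctly, the log odds of which one it was depends only on the difficulty difference. The sketch below implements that Choppin-style pairwise estimator with assumed generating values; it does not reproduce EVM or the article's CPAT procedure for rating-scale thresholds.

```python
# Sketch: Choppin-style pairwise estimation of Rasch item difficulties for
# dichotomous items (the dichotomous analogue of PAIR; not the CPAT extension).
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 2000, 10
theta = rng.normal(0, 1, n_persons)
delta = np.linspace(-2, 2, n_items)                  # true item difficulties (assumed)
p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
X = rng.binomial(1, p)

# n[i, j] = number of persons with item i correct and item j incorrect
n = (X[:, :, None] * (1 - X[:, None, :])).sum(axis=0).astype(float)

# Under the Rasch model, log(n[i, j] / n[j, i]) estimates delta_j - delta_i;
# averaging over i and centering gives the pairwise difficulty estimates.
with np.errstate(divide="ignore", invalid="ignore"):
    log_odds = np.log(n / n.T)
np.fill_diagonal(log_odds, 0.0)
delta_hat = log_odds.mean(axis=0)        # average over rows i for each column j
delta_hat -= delta_hat.mean()            # identify the scale (sum-to-zero)

print(np.round(np.column_stack([delta, delta_hat]), 2))
```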
       
  • Application of Change Point Analysis of Response Time Data to Detect Test
           Speededness

      Authors: Ying Cheng, Can Shao
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Computer-based and web-based testing have become increasingly popular in recent years. Their popularity has dramatically expanded the availability of response time data. Compared with conventional item response data, which are often dichotomous or polytomous, response times have the advantage of being continuous and can be collected in an unobtrusive manner. They therefore have great potential to improve many measurement activities. In this paper, we propose a change point analysis (CPA) procedure to detect test speededness using response time data. Specifically, two test statistics based on CPA, the likelihood ratio test and Wald test, are proposed to detect test speededness. A simulation study has been conducted to evaluate the performance of the proposed CPA procedure, as well as the use of asymptotic and empirical critical values. Results indicate that the proposed procedure leads to high power in detecting test speededness, while keeping the false positive rate under control, even when simplistic and liberal critical values are used. Accuracy of the estimation of the actual change point, however, is highly dependent on the true change point. A real data example is also provided to illustrate the utility of the proposed procedure and its contrast to the response-only procedure. Implications of the findings are discussed at the end.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-09-21T06:18:54Z
      DOI: 10.1177/00131644211046392
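      As a rough illustration of the change point idea, not the article's likelihood ratio or Wald statistics, the sketch below scans one simulated examinee's log response times for the split point that maximizes a standardized mean difference; in practice, critical values would be obtained by simulation, as the article does.

```python
# Generic change point illustration on one examinee's log response times
# (a simple mean-shift scan, not the article's exact test statistics).
import numpy as np

rng = np.random.default_rng(4)
n_items, change_point = 40, 30
log_rt = np.concatenate([
    rng.normal(3.5, 0.4, change_point),            # normal pacing
    rng.normal(2.3, 0.4, n_items - change_point),  # speeded: much faster responses
])

def mean_shift_scan(x):
    """Return (best split, max statistic): standardized mean-difference scan."""
    n = len(x)
    stats = {}
    for c in range(5, n - 5):                      # keep a few items on each side
        a, b = x[:c], x[c:]
        pooled_var = ((len(a) - 1) * a.var(ddof=1)
                      + (len(b) - 1) * b.var(ddof=1)) / (n - 2)
        z = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
        stats[c] = z                               # large positive z: speeding after c
    best = max(stats, key=lambda c: stats[c])
    return best, stats[best]

print(mean_shift_scan(log_rt))   # critical values would come from simulation in practice
```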
       
  • The Effect of Latent and Error Non-Normality on Measures of Fit in
           Structural Equation Modeling

      Authors: Lisa J. Jobst, Max Auerswald, Morten Moshagen
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Prior studies investigating the effects of non-normality in structural equation modeling typically induced non-normality in the indicator variables. This procedure neglects the factor analytic structure of the data, which is defined as the sum of latent variables and errors, so it is unclear whether previous results hold if the source of non-normality is considered. We conducted a Monte Carlo simulation manipulating the underlying multivariate distribution to assess the effect of the source of non-normality (latent, error, and marginal conditions with either multivariate normal or non-normal marginal distributions) on different measures of fit (empirical rejection rates for the likelihood-ratio model test statistic, the root mean square error of approximation, the standardized root mean square residual, and the comparative fit index). We considered different estimation methods (maximum likelihood, generalized least squares, and (un)modified asymptotically distribution-free), sample sizes, and the extent of non-normality in correctly specified and misspecified models to investigate their performance. The results show that all measures of fit were affected by the source of non-normality but with varying patterns for the analyzed estimation methods.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-09-21T06:16:59Z
      DOI: 10.1177/00131644211046201
       
  • Identifying Problematic Item Characteristics With Small Samples Using
           Mokken Scale Analysis

      Authors: Stefanie A. Wind
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Researchers frequently use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, when they have relatively small samples of examinees. Researchers have provided some guidance regarding the minimum sample size for applications of MSA under various conditions. However, these studies have not focused on item-level measurement problems, such as violations of monotonicity or invariant item ordering (IIO). Moreover, these studies have focused on problems that occur for a complete sample of examinees. The current study uses a simulation study to consider the sensitivity of MSA item analysis procedures to problematic item characteristics that occur within limited ranges of the latent variable. Results generally support the use of MSA with small samples (N around 100 examinees) as long as multiple indicators of item quality are considered.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-09-13T07:39:27Z
      DOI: 10.1177/00131644211045347
       
  • A Multilevel Mixture IRT Framework for Modeling Response Times as
           Predictors or Indicators of Response Engagement in IRT Models

      Authors: Gabriel Nagy, Esther Ulitzsch
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based procedures for classifying response engagement and IRT models for response engagement are based on common ideas, and we propose the distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widespread Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on the basis of publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and individuals’ proficiency estimates relative to a conventional IRT model.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-09-13T07:39:09Z
      DOI: 10.1177/00131644211045351
       
  • Detecting Differential Rater Functioning in Severity and Centrality: The
           Dual DRF Facets Model

      Authors: Kuan-Yu Jin, Thomas Eckes
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Performance assessments heavily rely on human ratings. These ratings are typically subject to various forms of error and bias, threatening the assessment outcomes’ validity and fairness. Differential rater functioning (DRF) is a special kind of threat to fairness manifesting itself in unwanted interactions between raters and performance- or construct-irrelevant factors (e.g., examinee gender, rater experience, or time of rating). Most DRF studies have focused on whether raters show differential severity toward known groups of examinees. This study expands the DRF framework and investigates the more complex case of dual DRF effects, where DRF is simultaneously present in rater severity and centrality. Adopting a facets modeling approach, we propose the dual DRF model (DDRFM) for detecting and measuring these effects. In two simulation studies, we found that dual DRF effects (a) negatively affected measurement quality and (b) can reliably be detected and compensated under the DDRFM. Using sample data from a large-scale writing assessment (N = 1,323), we demonstrate the practical measurement consequences of the dual DRF effects. Findings have implications for researchers and practitioners assessing the psychometric quality of ratings.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-09-02T11:33:38Z
      DOI: 10.1177/00131644211043207
       
  • Robustness of Adaptive Measurement of Change to Item Parameter Estimation
           Error

      Authors: Allison W. Cooperman, David J. Weiss, Chun Wang
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Adaptive measurement of change (AMC) is a psychometric method for measuring intra-individual change on one or more latent traits across testing occasions. Three hypothesis tests—a Z test, likelihood ratio test, and score ratio index—have demonstrated desirable statistical properties in this context, including low false positive rates and high true positive rates. However, the extant AMC research has assumed that the item parameter values in the simulated item banks were devoid of estimation error. This assumption is unrealistic for applied testing settings, where item parameters are estimated from a calibration sample before test administration. Using Monte Carlo simulation, this study evaluated the robustness of the common AMC hypothesis tests to the presence of item parameter estimation error when measuring omnibus change across four testing occasions. Results indicated that item parameter estimation error had at most a small effect on false positive rates and latent trait change recovery, and these effects were largely explained by the computerized adaptive testing item bank information functions. Differences in AMC performance as a function of item parameter estimation error and choice of hypothesis test were generally limited to simulees with particularly low or high latent trait values, where the item bank provided relatively lower information. These simulations highlight how AMC can accurately measure intra-individual change in the presence of item parameter estimation error when paired with an informative item bank. Limitations and future directions for AMC research are discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-08-16T09:53:39Z
      DOI: 10.1177/00131644211033902
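      In its simplest two-occasion form, the Z test referred to above is commonly written as follows (a standard form, stated here as an assumption about the variant studied; the article's omnibus tests across four occasions are more general):

          Z = \frac{\hat{\theta}_2 - \hat{\theta}_1}{\sqrt{SE(\hat{\theta}_1)^2 + SE(\hat{\theta}_2)^2}}

      Under no true change, Z is approximately standard normal, so values beyond about plus or minus 1.96 would flag intra-individual change at the .05 level.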
       
  • A Monte Carlo Study of Confidence Interval Methods for Generalizability
           Coefficient

      Authors: Zhehan Jiang, Mark Raymond, Christine DiStefano, Dexin Shi, Ren Liu, Junhua Sun
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Computing confidence intervals around generalizability coefficients has long been a challenging task in generalizability theory. This is a serious practical problem because generalizability coefficients are often computed from designs where some facets have small sample sizes, and researchers have little guidance regarding the trustworthiness of the coefficients. As generalizability theory can be framed as a linear mixed-effects model (LMM), bootstrap and simulation techniques from the LMM paradigm can be used to construct the confidence intervals. The purpose of this research is to examine four proposed LMM-based methods for computing the confidence intervals and to determine their accuracy under six simulated conditions based on the type of test scores (normal, dichotomous, and polytomous data) and the measurement design (p×i×r and p×[i:r]). A bootstrap technique called “parametric methods with spherical random effects” consistently produced more accurate confidence intervals than the three other LMM-based methods. Furthermore, the selected technique was compared with a model-based approach to investigate performance at the level of the variance components in a second simulation study, where the numbers of examinees, raters, and items were varied. We conclude with the recommendation that, when reporting generalizability coefficients, the confidence interval should accompany the point estimate.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-08-07T07:14:26Z
      DOI: 10.1177/00131644211033899
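      To make the estimand concrete, the sketch below computes the generalizability coefficient for a fully crossed p x i design from ANOVA variance-component estimates and attaches a simple parametric-bootstrap percentile interval. It is a generic illustration under assumed data; the article's LMM-based intervals and the "parametric methods with spherical random effects" technique are not reproduced here.

```python
# Generic sketch: generalizability coefficient for a crossed p x i design with a
# parametric-bootstrap percentile interval (illustrative; not the article's methods).
import numpy as np

rng = np.random.default_rng(5)

def variance_components(scores):
    """ANOVA estimators for a p x i design: (var_person, var_item, var_residual)."""
    n_p, n_i = scores.shape
    grand = scores.mean()
    ss_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_i = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i
    ms_p, ms_i = ss_p / (n_p - 1), ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    return (max((ms_p - ms_res) / n_i, 0.0),
            max((ms_i - ms_res) / n_p, 0.0),
            ms_res)

def g_coef(var_p, var_res, n_i):
    """E(rho^2) for relative decisions: person variance over person + error/n_i."""
    return var_p / (var_p + var_res / n_i)

n_p, n_i = 100, 8                                   # assumed design
data = (rng.normal(0, 1.0, (n_p, 1))                # person effects
        + rng.normal(0, 0.5, (1, n_i))              # item effects
        + rng.normal(0, 1.0, (n_p, n_i)))           # residual (pi, e)

vp, vi, vres = variance_components(data)
boot = []
for _ in range(1000):                               # parametric bootstrap
    sim = (rng.normal(0, np.sqrt(vp), (n_p, 1))
           + rng.normal(0, np.sqrt(vi), (1, n_i))
           + rng.normal(0, np.sqrt(vres), (n_p, n_i)))
    bvp, _, bvres = variance_components(sim)
    boot.append(g_coef(bvp, bvres, n_i))

print("E(rho^2) =", round(g_coef(vp, vres, n_i), 3),
      "| bootstrap 95% CI:", np.round(np.percentile(boot, [2.5, 97.5]), 3))
```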
       
  • Polytomous Testlet Response Models for Technology-Enhanced Innovative
           Items: Implications on Model Fit and Trait Inference

      Authors: Hyeon-Ah Kang, Suhwa Han, Doyoung Kim, Shu-Chuan Kao
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c) random-effect testlet model (RTM), and (d) fixed-effect testlet model (FTM). Using data from GPCM, FTM, and RTM, we examine performance of the scoring models in multiple aspects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no or fixed effects, GPCM and FTM led to most desirable measurement outcomes. When testlets had random interaction effects, RTM demonstrated best model fit and yet showed substantially different performance in the trait recovery depending on the estimator. In particular, the advantage of RTM as a scoring model was discernable only when there existed strong random effects and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospect as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-08-03T04:53:13Z
      DOI: 10.1177/00131644211032261
       
  • DIF Detection With Zero-Inflation Under the Factor Mixture Modeling
           Framework

      Authors: Sooyong Lee, Suhwa Han, Seung W. Choi
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Response data containing an excessive number of zeros are referred to as zero-inflated data. When differential item functioning (DIF) detection is of interest, zero-inflation can attenuate DIF effects in the total sample and lead to underdetection of DIF items. The current study presents a DIF detection procedure for response data with excess zeros due to the existence of unobserved heterogeneous subgroups. The suggested procedure utilizes the factor mixture modeling (FMM) with MIMIC (multiple-indicator multiple-cause) to address the compromised DIF detection power via the estimation of latent classes. A Monte Carlo simulation was conducted to evaluate the suggested procedure in comparison to the well-known likelihood ratio (LR) DIF test. Our simulation study results indicated the superiority of FMM over the LR DIF test in terms of detection power and illustrated the importance of accounting for latent heterogeneity in zero-inflated data. The empirical data analysis results further supported the use of FMM by flagging additional DIF items over and above the LR test.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-07-27T05:10:34Z
      DOI: 10.1177/00131644211028995
       
  • The Response Vector for Mastery Method of Standard Setting

      Authors: Dimiter M. Dimitrov
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Proposed is a new method of standard setting referred to as the response vector for mastery (RVM) method. Under the RVM method, the task of panelists who participate in the standard setting process does not involve conceptualization of a borderline examinee or probability judgments, as is the case with the Angoff and bookmark methods. Also, the RVM-based computation of a cut-score is not based on a single item (e.g., marked in an ordered item booklet) but, instead, on a response vector (1/0 scores) on items and their parameters calibrated in item response theory or under the recently developed D-scoring method. Illustrations with hypothetical and real-data scenarios of standard setting are provided, and methodological aspects of the RVM method are discussed.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-07-21T07:06:22Z
      DOI: 10.1177/00131644211032388
       
  • Hybrid Threshold-Based Sequential Procedures for Detecting Compromised
           Items in a Computerized Adaptive Testing Licensure Exam

      Authors: Chansoon Lee, Hong Qian
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Using classical test theory and item response theory, this study applied sequential procedures to a real operational item pool in a variable-length computerized adaptive testing (CAT) to detect items whose security may be compromised. Moreover, this study proposed a hybrid threshold approach to improve the detection power of the sequential procedure while controlling the Type I error rate. The hybrid threshold approach uses a local threshold for each item in an early stage of the CAT administration, and then it uses the global threshold in the decision-making stage. Applying various simulation factors, a series of simulation studies examined which factors contribute significantly to the power rate and lag time of the procedure. In addition to the simulation study, a case study investigated whether the procedures are applicable to the real item pool administered in CAT and can identify potentially compromised items in the pool. This research found that the increment of probability of a correct answer (p-increment) was the simulation factor most important to the sequential procedures’ ability to detect compromised items. This study also found that the local threshold approach improved power rates and shortened lag times when the p-increment was small. The findings of this study could help practitioners implement the sequential procedures using the hybrid threshold approach in real-time CAT administration.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-19T10:40:46Z
      DOI: 10.1177/00131644211023868
       
  • Design Effect in Multilevel Settings: A Commentary on a Latent Variable
           Modeling Procedure for Its Evaluation

      Authors: Tenko Raykov, Christine DiStefano
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      A latent variable modeling-based procedure is discussed that permits researchers to readily obtain point and interval estimates of the design effect index in multilevel settings using widely circulated software. The method provides useful information about how the standard errors of important parameters change when clustering effects are accounted for, relative to conducting single-level analyses. The approach can also be employed as an addendum to point and interval estimation of the intraclass correlation coefficient in empirical research. The discussed procedure makes it easy to evaluate the design effect in two-level studies by utilizing the popular latent variable modeling methodology and is illustrated with an example.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-04T10:59:54Z
      DOI: 10.1177/00131644211019447
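      For reference, the design effect index in question is usually defined, in its simplest equal-cluster-size form (a standard result, not the procedure proposed in the article), as

          DEFF = 1 + (\bar{n} - 1)\rho,

      where \bar{n} is the average cluster size and \rho the intraclass correlation coefficient. For example, with clusters of 25 students and \rho = .10, DEFF = 1 + 24(.10) = 3.4, so ignoring clustering would understate standard errors by a factor of about \sqrt{3.4} \approx 1.8.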
       
  • The Sampling Ratio in Multilevel Structural Equation Models:
           Considerations to Inform Study Design

      Authors: Joseph M. Kush, Timothy R. Konold, Catherine P. Bradshaw
      First page: 409
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Multilevel structural equation modeling (MSEM) allows researchers to model latent factor structures at multiple levels simultaneously by decomposing within- and between-group variation. Yet the extent to which the sampling ratio (i.e., proportion of cases sampled from each group) influences the results of MSEM models remains unknown. This article explores how variation in the sampling ratio in MSEM affects the measurement of Level 2 (L2) latent constructs. Specifically, we investigated whether the sampling ratio is related to bias and variability in aggregated L2 construct measurement and estimation in the context of doubly latent MSEM models utilizing a two-step Monte Carlo simulation study. Findings suggest that while lower sampling ratios were related to increased bias, standard errors, and root mean square error, the overall size of these errors was negligible, making the doubly latent model an appealing choice for researchers. An applied example using empirical survey data is further provided to illustrate the application and interpretation of the model. We conclude by considering the implications of various sampling ratios on the design of MSEM studies, with a particular focus on educational research.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-02T09:34:28Z
      DOI: 10.1177/00131644211020112
       
  • Factor Retention in Exploratory Factor Analysis With Missing Data

      Authors: David Goretzko
      First page: 444
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Determining the number of factors in exploratory factor analysis is arguably the most crucial decision a researcher faces when conducting the analysis. While several simulation studies exist that compare various so-called factor retention criteria under different data conditions, little is known about the impact of missing data on this process. Hence, in this study, we evaluated the performance of different factor retention criteria—the Factor Forest, parallel analysis based on a principal component analysis as well as parallel analysis based on the common factor model, and the comparison data approach—in combination with different missing data methods, namely an expectation-maximization algorithm called Amelia, predictive mean matching, and random forest imputation within the multiple imputation by chained equations (MICE) framework, as well as pairwise deletion, with regard to their accuracy in determining the number of factors when data are missing. Data were simulated for different sample sizes, numbers of factors, numbers of manifest variables (indicators), between-factor correlations, missing data mechanisms, and proportions of missing values. In the majority of conditions and for all factor retention criteria except the comparison data approach, the missing data mechanism had little impact on the accuracy, and pairwise deletion performed comparably to the more sophisticated imputation methods. In some conditions, however, especially small-sample cases and when comparison data were used to determine the number of factors, random forest imputation was preferable to the other missing data methods. Accordingly, depending on data characteristics and the selected factor retention criterion, choosing an appropriate missing data method is crucial to obtain a valid estimate of the number of factors to extract.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-11T07:17:21Z
      DOI: 10.1177/00131644211022031
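      As a point of reference for the retention criteria compared above, the sketch below implements plain PCA-based parallel analysis on complete data; the Factor Forest, the comparison data approach, and the missing-data methods studied in the article are not reproduced, and the toy loadings are assumptions.

```python
# Sketch: PCA-based parallel analysis, one of the retention criteria compared in
# the article (complete-data version; missing-data handling not shown).
import numpy as np

def parallel_analysis(data, n_sims=200, quantile=95, seed=0):
    """Retain components sequentially while the observed eigenvalue exceeds the
    chosen quantile of eigenvalues from same-sized random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.normal(size=(n, p))
        rand[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(rand, quantile, axis=0)
    n_retain = 0
    while n_retain < p and obs[n_retain] > threshold[n_retain]:
        n_retain += 1
    return n_retain

# toy two-factor data: 6 indicators, loadings of .7, N = 300 (assumed values)
rng = np.random.default_rng(8)
factors = rng.normal(size=(300, 2))
loadings = np.array([[0.7, 0.0]] * 3 + [[0.0, 0.7]] * 3)
data = factors @ loadings.T + rng.normal(0, np.sqrt(1 - 0.49), size=(300, 6))
print("factors retained:", parallel_analysis(data))
```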
       
  • Matched and Fully Private? A New Self-Generated Identification Code
           for School-Based Cohort Studies to Increase Perceived Anonymity

      Authors: Maria Calatrava, Jokin de Irala, Alfonso Osorio, Edgar Benítez, Cristina Lopez-del Burgo
      First page: 465
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Anonymous questionnaires are frequently used in research with adolescents in order to obtain sincere answers about sensitive topics. Most longitudinal studies include self-generated identification codes (SGICs) to match information. Typical elements include a combination of letters and digits from personal data. However, these data may make participants feel that their answers are not truly anonymous, and some studies using these types of SGICs have been perceived as not entirely anonymous by some participants. Furthermore, data protection laws could place limits on research carried out with these codes. The objective of our article is to test an SGIC with a higher degree of anonymity. We conducted two studies. In Study 1, we tested the perceived anonymity of this new SGIC. Adolescents aged 12 to 18 years (N = 601) completed an anonymous questionnaire about lifestyles and risk behaviors, which also included the SGIC. Adolescents with and without risk behaviors were compared regarding whether or not they answered the SGIC questions. We did not find any differences to suggest that participants felt identifiable. In Study 2, we assessed the efficiency of the new SGIC. At baseline, 123 students from two high schools (eighth grade) filled in questionnaires consisting of the new SGIC and their full names. Two years later, these same students (then in the 10th grade) were invited to fill in the same information again (116 students responded to this second call). A total of 97 students were present in both waves. The SGIC showed moderate performance, with good enough indices of recall and precision. Evidence suggests that the new SGIC is a suitable tool for the anonymous matching of adolescents in follow-ups of school cohorts.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-08-13T04:58:17Z
      DOI: 10.1177/00131644211035436
       
  • Assessing Measurement Invariance Across Multiple Groups: When Is Fit Good
           Enough?

      Authors: Wilhelmina van Dijk, Christopher Schatschneider, Stephanie Al Otaiba, Sara A. Hart
      First page: 482
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Complex research questions often need large samples to obtain accurate estimates of parameters and adequate power. Combining extant data sets into a large, pooled data set is one way this can be accomplished without expending resources. Measurement invariance (MI) modeling is an established approach to ensure participant scores are on the same scale. There are two major problems when combining independent data sets through MI. First, sample sizes will often be large, leading to small differences being flagged as noninvariant. Second, not all data sets may include the same combination of measures. In this article, we present a method that can deal with both of these problems and is user friendly. It combines generating random normal deviates for variables that are missing completely from a data set with assessing model fit using the root mean square error of approximation (RMSEA) good enough principle, based on the hypothesis that the difference between groups is not zero but small. We demonstrate the method by examining MI across eight independent data sets and compare the MI decisions of the traditional and good enough approaches. Our results show the approach has potential for combining educational data.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-16T07:21:24Z
      DOI: 10.1177/00131644211023567
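      The RMSEA used in the good enough approach is the usual sample estimate (a standard formula rather than anything specific to this article; some software uses N in place of N - 1),

          \hat{\varepsilon} = \sqrt{\max\!\left(\frac{\chi^{2} - df}{df\,(N - 1)},\, 0\right)},

      and the good enough test asks whether misfit stays below a small, prespecified bound (e.g., testing whether \varepsilon \le .05) rather than whether group differences are exactly zero.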
       
  • Poisson Diagnostic Classification Models: A Framework and an Exploratory
           Example

      Authors: Ren Liu, Haiyan Liu, Dexin Shi, Zhehan Jiang
      First page: 506
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Assessments with a large number of small, similar, or often repetitive tasks are used in educational, neurocognitive, and psychological contexts. For example, respondents may be asked to recognize numbers or letters from a large pool, and the number of correct answers is a count variable. In 1960, George Rasch developed the Rasch Poisson counts model (RPCM) to handle that type of assessment. This article extends the RPCM into the world of diagnostic classification models (DCMs), where a Poisson distribution is applied to traditional DCMs. A framework of Poisson DCMs is proposed and demonstrated through an operational dataset. This study is exploratory, with recommendations for future research given at the end.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-08T06:51:31Z
      DOI: 10.1177/00131644211017961
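      For context, the Rasch Poisson counts model that the proposed framework builds on is commonly written in log-linear form (a standard parameterization, not the new Poisson DCM itself):

          X_{pi} \sim \mathrm{Poisson}(\lambda_{pi}), \qquad \log \lambda_{pi} = \theta_p - \beta_i,

      where \theta_p is the ability of person p and \beta_i the difficulty of task i; roughly speaking, the Poisson DCM framework lets the count rate depend on discrete mastery attributes rather than on a continuous \theta_p.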
       
  • The Importance of Thinking Multivariately When Setting Subscale Cutoff
           Scores

      Authors: Edward Kroc, Oscar L. Olvera Astivia
      First page: 517
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Setting cutoff scores is one of the most common practices when using scales to aid in classification purposes. This process is usually done univariately where each optimal cutoff value is decided sequentially, subscale by subscale. While it is widely known that this process necessarily reduces the probability of “passing” such a test, what is not properly recognized is that such a test loses power to meaningfully discriminate between target groups with each new subscale that is introduced. We quantify and describe this property via an analytical exposition highlighting the counterintuitive geometry implied by marginal threshold-setting in multiple dimensions. Recommendations are presented that encourage applied researchers to think jointly, rather than marginally, when setting cutoff scores to ensure an informative test.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-07-14T10:34:16Z
      DOI: 10.1177/00131644211023569
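      The joint geometry described above is easy to quantify in a toy case: with equicorrelated standard-normal subscale scores and a marginal cutoff applied to each, the probability of clearing every cutoff shrinks with each added subscale. The correlation and cutoff below are illustrative assumptions, not the article's analytical results.

```python
# Toy illustration: probability of clearing every marginal cutoff as subscales
# are added (equicorrelated multivariate normal; values are assumptions).
import numpy as np
from scipy.stats import multivariate_normal

cutoff_z = 0.0          # each marginal cutoff at the 50th percentile (assumed)
rho = 0.5               # correlation between any two subscale scores (assumed)
for k in (1, 2, 3, 5, 8):
    cov = np.full((k, k), rho) + (1 - rho) * np.eye(k)
    # By symmetry of the mean-zero MVN, P(all subscales > cutoff) = CDF at -cutoff
    p_pass_all = multivariate_normal(mean=np.zeros(k), cov=cov).cdf(np.full(k, -cutoff_z))
    print(f"{k} subscales: P(pass all) = {p_pass_all:.3f}")
```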
       
  • Semisupervised Learning Method to Adjust Biased Item Difficulty Estimates
           Caused by Nonignorable Missingness in a Virtual Learning Environment

      Authors: Kang Xue, Anne Corinne Huggins-Manley, Walter Leite
      First page: 539
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of nonignorable missing data in the VLE log file data, and this is expected to negatively affect IRT item parameter estimation accuracy, which then negatively affects any future ability estimates utilized in the VLE. In the psychometric literature, methods for handling missing data have been studied mostly around conditions in which the data and the amount of missing data are not as large as those that come from VLEs. In this article, we introduce a semisupervised learning method to deal with a large proportion of missingness contained in VLE data from which one needs to obtain unbiased item parameter estimates. First, we explored the factors relating to the missing data. Then we implemented a semisupervised learning method under the two-parameter logistic IRT model to estimate the latent abilities of students. Last, we applied two adjustment methods designed to reduce bias in item parameter estimates. The proposed framework showed its potential for obtaining unbiased item parameter estimates that can then be fixed in the VLE in order to obtain ongoing ability estimates for operational purposes.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-04T11:12:10Z
      DOI: 10.1177/00131644211020494
       
  • Evaluation of Second- and Third-Level Variance Proportions in Multilevel
           Designs With Completely Observed Populations: A Note on a Latent Variable
           Modeling Procedure

      Authors: Tenko Raykov, Natalja Menold, Jane Leer
      First page: 568
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Two- and three-level designs in educational and psychological research can involve entire populations of Level-3 and possibly Level-2 units, such as schools and educational districts nested within a given state, or neighborhoods and counties in a state. Such a design is of increasing relevance in empirical research owing to the growing popularity of large-scale studies in these and cognate disciplines. The present note discusses a readily applicable procedure for point-and-interval estimation of the proportions of second- and third-level variances in such multilevel settings, which may also be employed in model choice considerations regarding ensuing analyses for response variables of interest. The method is developed within the framework of the latent variable modeling methodology, is readily utilized with widely used software, and is illustrated with an example.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-04-21T10:39:36Z
      DOI: 10.1177/00131644211008643
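      For a three-level random-intercept setup of the kind discussed in this note, the level-specific variance proportions being estimated are the standard ones (the latent variable modeling machinery itself is not reproduced here):

          Y_{ijk} = \gamma_{000} + u_{k} + r_{jk} + e_{ijk}, \qquad
          \pi_{3} = \frac{\sigma^{2}_{u}}{\sigma^{2}_{u} + \sigma^{2}_{r} + \sigma^{2}_{e}}, \qquad
          \pi_{2} = \frac{\sigma^{2}_{r}}{\sigma^{2}_{u} + \sigma^{2}_{r} + \sigma^{2}_{e}},

      where u_k, r_jk, and e_ijk are the Level-3, Level-2, and Level-1 random components, respectively.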
       
  • Estimating Probabilities of Passing for Examinees With Incomplete Data in
           Mastery Tests

      Authors: Sandip Sinharay
      First page: 580
      Abstract: Educational and Psychological Measurement, Ahead of Print.
      Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probabilities of passing of the examinees with incomplete data on mastery tests. However, there is a lack of research on this estimation problem. The goal of this article is to suggest two new approaches—one each based on classical test theory and item response theory—for estimating the probabilities of passing of the examinees with incomplete data on mastery tests. The two approaches are demonstrated to have high accuracy and negligible misclassification rates.
      Citation: Educational and Psychological Measurement
      PubDate: 2021-06-22T06:43:16Z
      DOI: 10.1177/00131644211023797
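      As a generic illustration of the IRT side of this problem (not necessarily the article's proposed approach), the sketch below computes an examinee's posterior for theta under a 2PL model using only the items that were actually scored, and reports the posterior probability of being at or above an assumed passing ability; all item parameters and the cut value are hypothetical.

```python
# Generic sketch: IRT-based estimate of the probability of passing for an
# examinee with missing item scores (illustrative; not the article's methods).
import numpy as np

def prob_passing(a, b, responses, theta_cut, n_quad=201):
    """2PL posterior mass above theta_cut; responses may contain np.nan (missing)."""
    theta = np.linspace(-4, 4, n_quad)
    prior = np.exp(-0.5 * theta**2)                       # standard normal prior
    log_like = np.zeros_like(theta)
    for aj, bj, x in zip(a, b, responses):
        if np.isnan(x):                                   # skip missing items
            continue
        p = 1 / (1 + np.exp(-aj * (theta - bj)))
        log_like += x * np.log(p) + (1 - x) * np.log(1 - p)
    post = prior * np.exp(log_like)
    post /= post.sum()
    return post[theta >= theta_cut].sum()

# hypothetical item parameters, responses with two lost scores, and cut point
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9, 1.3])
b = np.array([-1.0, -0.5, 0.0, 0.3, 0.8, 1.2])
responses = np.array([1, 1, 1, np.nan, np.nan, 0])
print("estimated P(pass):", round(prob_passing(a, b, responses, theta_cut=0.0), 3))
```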
       
 