Authors: Maria Bolsinova; Jesper Tijmstra
Abstract: By considering information about response time (RT) in addition to response accuracy (RA), joint models for RA and RT such as the hierarchical model (van der Linden, 2007) can improve the precision with which ability is estimated over models that only consider RA. The hierarchical model, however, assumes that only the person's speed is informative of ability. This assumption of conditional independence between RT and ability given speed may be violated in practice, and ignores collateral information about ability that may be present in the residual RTs. We propose a posterior predictive check for evaluating the assumption of conditional independence between RT and ability given speed. Furthermore, we propose an extension of the hierarchical model that contains cross-loadings between ability and RT, which enables one to take additional collateral information about ability into account beyond what is possible in the standard hierarchical model. A Bayesian estimation procedure is proposed for the model. Using simulation studies, the performance of the model is evaluated in terms of parameter recovery, and the possible gain in precision over the standard hierarchical model and an RA-only model is considered. The model is applied to data from a high-stakes educational test.
PubDate: 2017-06-21
DOI: 10.1111/bmsp.12104
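The baseline model extended here, van der Linden's (2007) hierarchical model, can be made concrete by simulating from it: ability and speed are correlated bivariate-normal person parameters, accuracy follows a 2PL driven by ability only, and log response times are normal with mean β_i − τ_p. A minimal Python sketch; all parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
P, I = 500, 20                                  # persons, items

# Person parameters: ability theta and speed tau, correlated (rho assumed 0.4)
rho = 0.4
theta, tau = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=P).T

# Item parameters (illustrative values)
a = rng.uniform(0.8, 2.0, I)                    # discrimination
b = rng.normal(0, 1, I)                         # difficulty
alpha = rng.uniform(1.5, 3.0, I)                # time discrimination
beta = rng.normal(0, 0.3, I)                    # time intensity

# Response accuracy: 2PL driven by theta only (conditional independence)
p_correct = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
X = rng.binomial(1, p_correct)

# Response times: log-normal, driven by tau only given the person parameters
logT = rng.normal(beta - tau[:, None], 1 / alpha)
T = np.exp(logT)
```

Under this data-generating scheme, RT carries information about ability only through the correlation between τ and θ, which is exactly the conditional-independence assumption that the proposed check targets and the cross-loading extension relaxes.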

Authors: Tamar Kennet-Cohen; Dvir Kleper; Elliot Turvall
Abstract: A frequent topic of psychological research is the estimation of the correlation between two variables from a sample that underwent a selection process based on a third variable. Due to indirect range restriction, the sample correlation is a biased estimator of the population correlation, and a correction formula is used. In the past, bootstrap standard error and confidence intervals for the corrected correlations were examined with normal data. The present study proposes a large-sample estimate (an analytic method) for the standard error, and a corresponding confidence interval for the corrected correlation. Monte Carlo simulation studies involving both normal and non-normal data were conducted to examine the empirical performance of the bootstrap and analytic methods. Results indicated that with both normal and non-normal data, the bootstrap standard error and confidence interval were generally accurate across simulation conditions (restricted sample size, selection ratio, and population correlations) and outperformed estimates of the analytic method. However, with certain combinations of distribution type and model conditions, the analytic method has an advantage, offering reasonable estimates of the standard error and confidence interval without resorting to the bootstrap procedure's computer-intensive approach. We provide SAS code for the simulation studies.
PubDate: 2017-06-20
DOI: 10.1111/bmsp.12105
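The bootstrap half of the comparison can be sketched as follows. For brevity the sketch uses Thorndike's Case II correction for direct range restriction; the paper's setting is indirect restriction through a third variable, whose correction formula differs but has the same bootstrap logic (resample the restricted sample, correct each replicate, take the SD and percentiles). Function names and data values are illustrative assumptions:

```python
import numpy as np

def correct_r(r, sd_restricted, sd_unrestricted):
    """Thorndike Case II correction (direct restriction; shown for illustration)."""
    u = sd_unrestricted / sd_restricted
    return r * u / np.sqrt(1 - r**2 + (r * u) ** 2)

def bootstrap_ci(x, y, sd_unrestricted, n_boot=2000, seed=0):
    """Bootstrap SE and 95% percentile CI for the corrected correlation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        xb, yb = x[idx], y[idx]
        r = np.corrcoef(xb, yb)[0, 1]
        est[i] = correct_r(r, xb.std(ddof=1), sd_unrestricted)
    return est.std(ddof=1), np.percentile(est, [2.5, 97.5])

# Restricted sample: keep cases with x above its median (direct selection on x)
rng = np.random.default_rng(42)
x = rng.normal(size=2000)
y = 0.5 * x + rng.normal(size=2000) * np.sqrt(0.75)   # population r = 0.5
keep = x > np.median(x)
se, ci = bootstrap_ci(x[keep], y[keep], sd_unrestricted=1.0)
```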

Authors: Yoosun Jamie Kim; Robert A. Cribbie
Abstract: Valid use of the traditional independent samples ANOVA procedure requires that the population variances are equal. Previous research has investigated whether variance homogeneity tests, such as Levene's test, are satisfactory as gatekeepers for identifying when to use or not to use the ANOVA procedure. This research focuses on a novel homogeneity of variance test that incorporates an equivalence testing approach. Instead of testing the null hypothesis that the variances are equal against an alternative hypothesis that the variances are not equal, the equivalence-based test evaluates the null hypothesis that the difference in the variances falls outside or on the border of a predetermined interval against an alternative hypothesis that the difference in the variances falls within the predetermined interval. Thus, with the equivalence-based procedure, the alternative hypothesis is aligned with the research hypothesis (variance equality). A simulation study demonstrated that the equivalence-based test of population variance homogeneity is a better gatekeeper for the ANOVA than traditional homogeneity of variance tests. PubDate: 2017-06-01
DOI: 10.1111/bmsp.12103
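An equivalence test of this kind can be sketched, under normality, as two one-sided F tests on the variance ratio: equivalence is concluded only if the ratio is significantly below the upper bound δ and significantly above 1/δ. The equivalence bound and data values below are assumptions for illustration, and the paper's actual procedure may differ in detail:

```python
import numpy as np
from scipy import stats

def variance_equivalence_test(x1, x2, delta=2.0, alpha=0.05):
    """Two one-sided F tests (TOST) for equivalence of two variances.

    H0: sigma1^2 / sigma2^2 <= 1/delta  or  >= delta   (non-equivalence)
    H1: 1/delta < sigma1^2 / sigma2^2 < delta          (equivalence)
    delta is an assumed equivalence bound on the variance ratio.
    """
    n1, n2 = len(x1), len(x2)
    F = np.var(x1, ddof=1) / np.var(x2, ddof=1)
    # p-value for H0: ratio >= delta (reject when F is small)
    p_upper = stats.f.cdf(F / delta, n1 - 1, n2 - 1)
    # p-value for H0: ratio <= 1/delta (reject when F is large)
    p_lower = stats.f.sf(F * delta, n1 - 1, n2 - 1)
    p = max(p_upper, p_lower)          # TOST: both one-sided tests must reject
    return F, p, p < alpha

rng = np.random.default_rng(7)
x1 = rng.normal(0, 1, 200)
x2 = rng.normal(0, 1, 200)
F, p, equivalent = variance_equivalence_test(x1, x2)
```

With truly equal variances and n = 200 per group, the test should declare equivalence; with small samples it correctly withholds that conclusion, which is what makes it a conservative gatekeeper.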

Authors: Ke-Hai Yuan; Ge Jiang; Ying Cheng
Abstract: Data in psychology are often collected using Likert-type scales, and it has been shown that factor analysis of Likert-type data is better performed on the polychoric correlation matrix than on the product-moment covariance matrix, especially when the distributions of the observed variables are skewed. In theory, factor analysis of the polychoric correlation matrix is best conducted using generalized least squares with an asymptotically correct weight matrix (AGLS). However, simulation studies showed that both least squares (LS) and diagonally weighted least squares (DWLS) perform better than AGLS, and thus LS or DWLS is routinely used in practice. In either LS or DWLS, the associations among the polychoric correlation coefficients are completely ignored. To mend such a gap between statistical theory and empirical work, this paper proposes new methods, called ridge GLS, for factor analysis of ordinal data. Monte Carlo results show that, for a wide range of sample sizes, ridge GLS methods yield uniformly more accurate parameter estimates than existing methods (LS, DWLS, AGLS). A real-data example indicates that estimates by ridge GLS are 9–20% more efficient than those by existing methods. Rescaled and adjusted test statistics as well as sandwich-type standard errors following the ridge GLS methods also perform reasonably well.
PubDate: 2017-05-26
DOI: 10.1111/bmsp.12098
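The ridge idea can be sketched as minimizing a GLS discrepancy in which a ridge constant is added to the estimated weight matrix before inversion, stabilizing an ill-conditioned estimate of the asymptotic covariance of the polychoric correlations. The one-factor setup, the ridge constant k, and the stand-in weight matrix below are all illustrative assumptions, not the paper's estimator:

```python
import numpy as np
from scipy.optimize import minimize

# One-factor model: rho_ij = l_i * l_j. The "sample" correlations are the true
# values plus noise; Gamma_hat stands in for a noisy, possibly ill-conditioned
# estimate of the asymptotic covariance of the polychoric correlations.
rng = np.random.default_rng(3)
l_true = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
p = len(l_true)
iu = np.triu_indices(p, k=1)
r = np.outer(l_true, l_true)[iu] + rng.normal(0, 0.02, len(iu[0]))

A = rng.normal(size=(len(r), len(r)))
Gamma_hat = A @ A.T / len(r)               # stand-in weight-matrix estimate
k = 0.5                                    # ridge tuning constant (assumed value)
W_inv = np.linalg.inv(Gamma_hat + k * np.eye(len(r)))

def discrepancy(l):
    """Ridge GLS fit function: e' (Gamma_hat + k*I)^{-1} e."""
    e = r - np.outer(l, l)[iu]
    return e @ W_inv @ e

fit = minimize(discrepancy, x0=np.full(p, 0.5), method="BFGS")
l_hat = fit.x
```

Setting k = 0 recovers AGLS (unstable weight matrix), while letting k grow large approaches unweighted LS; ridge GLS interpolates between the two.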

Authors: Chen-Wei Liu; Wen-Chung Wang
Abstract: Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable.
PubDate: 2017-04-08
DOI: 10.1111/bmsp.12097

Authors: Michael Smithson; Yiyun Shou
Abstract: This paper introduces a two-parameter family of distributions for modelling random variables on the (0,1) interval by applying the cumulative distribution function of one 'parent' distribution to the quantile function of another. Family members have explicit probability density functions, cumulative distribution functions and quantile functions, in terms of a location parameter and a dispersion parameter. They capture a wide variety of shapes that the beta and Kumaraswamy distributions cannot. They are amenable to likelihood inference, and enable a wide variety of quantile regression models, with predictors for both the location and dispersion parameters. We demonstrate their applicability to psychological research problems and their utility in modelling real data.
PubDate: 2017-03-17
DOI: 10.1111/bmsp.12091

Authors: Maria Umlauft; Frank Konietschke; Markus Pauly
Abstract: Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up, Wald-type statistics and ANOVA-type statistics are the current state of the art. The former is asymptotically exact but rather liberal for small to moderate sample sizes, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal–Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability.
PubDate: 2017-03-15
DOI: 10.1111/bmsp.12089
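The core permutation idea can be sketched with the classical Kruskal–Wallis statistic standing in for the paper's studentized rank statistic: shuffle the group labels, recompute the statistic, and locate the observed value in the permutation distribution. This is only a schematic one-way version, not the factorial-design procedure itself:

```python
import numpy as np
from scipy import stats

def perm_kruskal(groups, n_perm=2000, seed=0):
    """Permutation p-value for the Kruskal-Wallis statistic (one-way sketch)."""
    rng = np.random.default_rng(seed)
    data = np.concatenate(groups)
    sizes = [len(g) for g in groups]
    obs = stats.kruskal(*groups).statistic
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(data)                      # shuffle group labels
        chunks = np.split(perm, np.cumsum(sizes)[:-1])
        if stats.kruskal(*chunks).statistic >= obs:
            count += 1
    return obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(11)
g1 = rng.normal(0.0, 1, 15)
g2 = rng.normal(0.0, 1, 15)
g3 = rng.normal(1.5, 1, 15)                               # shifted third group
stat, pval = perm_kruskal([g1, g2, g3])
```

Under exchangeability the permutation test is finitely exact; the paper's contribution is showing the principle remains asymptotically correct for studentized statistics in general factorial designs.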

Authors: Joe W. Tidwell; Michael R. Dougherty; Jeffrey S. Chrabaszcz; Rick P. Thomas
Abstract: Despite the fact that data and theories in the social, behavioural, and health sciences are often represented on an ordinal scale, there has been relatively little emphasis on modelling ordinal properties. The most common analytic framework used in psychological science is the general linear model, whose variants include ANOVA, MANOVA, and ordinary linear regression. While these methods are designed to provide the best fit to the metric properties of the data, they are not designed to maximally model ordinal properties. In this paper, we develop an order-constrained linear least-squares (OCLO) optimization algorithm that maximizes the linear least-squares fit to the data conditional on maximizing the ordinal fit based on Kendall's τ. The algorithm builds on the maximum rank correlation estimator (Han, 1987, Journal of Econometrics, 35, 303) and the general monotone model (Dougherty & Thomas, 2012, Psychological Review, 119, 321). Analyses of simulated data indicate that when modelling data that adhere to the assumptions of ordinary least squares, OCLO shows minimal bias, little increase in variance, and almost no loss in out-of-sample predictive accuracy. In contrast, under conditions in which data include a small number of extreme scores (fat-tailed distributions), OCLO shows less bias and variance, and substantially better out-of-sample predictive accuracy, even when the outliers are removed. We show that the advantages of OCLO over ordinary least squares in predicting new observations hold across a variety of scenarios in which researchers must decide to retain or eliminate extreme scores when fitting data.
PubDate: 2017-02-27
DOI: 10.1111/bmsp.12090
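The two-stage logic, first maximize ordinal fit, then fit least squares conditional on it, can be sketched crudely. The random direction search below stands in for the paper's actual optimizer (Kendall's τ is a step function of the weights, so gradient methods do not apply directly) and is purely illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
n = 200
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 0.5]) + rng.standard_t(df=2, size=n)   # fat-tailed noise

# Stage 1: crude random search for the weight direction maximizing Kendall's tau
# (a stand-in for the maximum rank correlation step; not the paper's algorithm).
best_tau, best_w = -1.0, None
for _ in range(500):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    tau, _ = stats.kendalltau(X @ w, y)
    if tau > best_tau:
        best_tau, best_w = tau, w

# Stage 2: least-squares rescaling of the rank-optimal composite
z = X @ best_w
slope, intercept = np.polyfit(z, y, 1)
y_hat = intercept + slope * z
```

Because ranks are invariant to the scale of the composite, stage 1 fixes only a direction; stage 2 then restores a metric interpretation by a one-dimensional least-squares fit, mirroring the "least squares conditional on ordinal fit" idea.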

Authors: Siwei Liu
Abstract: This paper compares the multilevel modelling (MLM) approach and the person-specific (PS) modelling approach in examining autoregressive (AR) relations with intensive longitudinal data. Two simulation studies are conducted to examine the influences of sample heterogeneity, time series length, sample size, and distribution of individual level AR coefficients on the accuracy of AR estimates, both at the population level and at the individual level. It is found that MLM generally outperforms the PS approach under two conditions: when the sample has a homogeneous AR pattern, namely, when all individuals in the sample are characterized by AR processes with the same order; and when the sample has heterogeneous AR patterns, but a multilevel model with a sufficiently high order (i.e., an order equal to or higher than the maximum order of individual AR patterns in the sample) is fitted and successfully converges. If a lower-order multilevel model is chosen for heterogeneous samples, the higher-order lagged effects are misrepresented, resulting in bias at the population level and larger prediction errors at the individual level. In these cases, the PS approach is preferable, given sufficient measurement occasions (T ≥ 50). In addition, sample size and distribution of individual level AR coefficients do not have a large impact on the results. Implications of these findings on model selection and research design are discussed.
PubDate: 2017-02-22
DOI: 10.1111/bmsp.12096
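At its simplest, the person-specific (PS) approach fits each individual's AR model separately. A sketch for one person's AR(1) series estimated by OLS regression of y_t on y_{t-1} (series length and coefficient are illustrative):

```python
import numpy as np

def fit_ar1(series):
    """Person-specific AR(1) estimate: OLS regression of y_t on y_{t-1}."""
    y_lag, y_now = series[:-1], series[1:]
    X = np.column_stack([np.ones(len(y_lag)), y_lag])
    (intercept, phi), *_ = np.linalg.lstsq(X, y_now, rcond=None)
    return phi

# Simulate one person's AR(1) process and recover the coefficient
rng = np.random.default_rng(8)
T, phi_true = 200, 0.6
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true * y[t - 1] + rng.normal()
phi_hat = fit_ar1(y)
```

The MLM alternative would pool such person-level regressions with random AR coefficients; the paper's T ≥ 50 guideline reflects the sampling error of the per-person estimate above, which shrinks roughly as 1/√T.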

Authors: Pasquale Anselmi; Luca Stefanutti; Debora Chiusole; Egidio Robusto
Abstract: The gain–loss model (GaLoM) is a formal model for assessing knowledge and learning. In its original formulation, the GaLoM assumes independence among the skills. Such an assumption is not reasonable in several domains, in which some preliminary knowledge is the foundation for other knowledge. This paper presents an extension of the GaLoM to the case in which the skills are not independent, and the dependence relation among them is described by a well-graded competence space. The probability of mastering skill s at the pretest is conditional on the presence of all skills on which s depends. The probabilities of gaining or losing skill s when moving from pretest to posttest are conditional on the mastery of s at the pretest, and on the presence at the posttest of all skills on which s depends. Two formulations of the model are presented, in which the learning path is allowed to change from pretest to posttest or not. A simulation study shows that models based on the true competence space obtain a better fit than models based on false competence spaces, and are also characterized by a higher assessment accuracy. An empirical application shows that models based on pedagogically sound assumptions about the dependencies among the skills obtain a better fit than models assuming independence among the skills.
PubDate: 2017-02-17
DOI: 10.1111/bmsp.12095

Authors: María Rubio-Aparicio; Julio Sánchez-Meca; José Antonio López-López; Juan Botella; Fulgencio Marín-Martínez
Abstract: Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta-analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between-studies variance on the statistical performance of the QB(P) and QB(S) tests for subgroup analyses assuming a mixed-effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between-studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between-studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates.
PubDate: 2017-02-06
DOI: 10.1111/bmsp.12092
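The pooled-estimate variant can be sketched as follows: residual heterogeneity is estimated once from the summed within-subgroup Q statistics (a DerSimonian–Laird-type estimator), then plugged into mixed-effects weights for the between-subgroups statistic QB(P), referred to a chi-square with (number of subgroups − 1) degrees of freedom. Data values are illustrative, and details may differ from the paper's implementation:

```python
import numpy as np
from scipy import stats

def subgroup_QB_pooled(y, v, g):
    """Q_B(P) test with a pooled DerSimonian-Laird residual tau^2 (sketch).

    y: effect sizes, v: sampling variances, g: subgroup labels.
    """
    y, v, g = map(np.asarray, (y, v, g))
    labels = np.unique(g)
    Qw, c, k = 0.0, 0.0, len(y)
    for lab in labels:                       # pooled residual heterogeneity
        yi, vi = y[g == lab], v[g == lab]
        w = 1 / vi
        ybar = np.sum(w * yi) / np.sum(w)
        Qw += np.sum(w * (yi - ybar) ** 2)
        c += np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Qw - (k - len(labels))) / c)
    ws = 1 / (v + tau2)                      # mixed-effects weights, pooled tau^2
    mu = np.sum(ws * y) / np.sum(ws)
    QB = 0.0
    for lab in labels:
        yi, wi = y[g == lab], ws[g == lab]
        mu_j = np.sum(wi * yi) / np.sum(wi)
        QB += np.sum(wi) * (mu_j - mu) ** 2
    return tau2, QB, stats.chi2.sf(QB, len(labels) - 1)

rng = np.random.default_rng(13)
k = 30
g = np.repeat([0, 1], k // 2)
v = rng.uniform(0.02, 0.1, k)
y = np.where(g == 0, 0.2, 0.6) + rng.normal(0, np.sqrt(v + 0.05))
tau2, QB, p = subgroup_QB_pooled(y, v, g)
```

The QB(S) variant would instead compute a separate tau² inside each subgroup loop; the paper's question is which choice keeps the test closer to its nominal level.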

Authors: Paul De Boeck; Haiqin Chen; Mark Davison
Abstract: Based on data from a cognitive test presented in a condition with time constraints per item and a condition without time constraints, the effect of speed on accuracy is investigated. First, if the effect of imposed speed on accuracy is negative it can be explained by the speed–accuracy trade-off, and if it can be captured through the corresponding latent variables, then measurement invariance applies between a condition with and a condition without time constraints. The results do show a negative effect and a lack of measurement invariance. Second, the conditional accuracy function (CAF) is investigated in both conditions, with and without time constraints. The CAF shows an (item-dependent) negative conditional dependence between response time and response accuracy and thus a positive relationship between speed and accuracy, which implies that faster responses are more accurate. In sum, there seem to be two kinds of speed effects: a speed–accuracy trade-off effect induced by imposed speed and an opposite CAF effect associated with speed within conditions. The second effect is interpreted as stemming from a within-person variation of the cognitive capacity during the test which simultaneously favours or disfavours speed and accuracy.
PubDate: 2017-02-03
DOI: 10.1111/bmsp.12094
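An empirical CAF is simple to compute: bin response times (for example by quantiles) and take the accuracy within each bin. The simulated data below build in the negative conditional dependence described above (faster responses more accurate); all generating values are assumptions for illustration:

```python
import numpy as np

def conditional_accuracy(rt, correct, n_bins=5):
    """Empirical CAF: mean accuracy within response-time quantile bins."""
    edges = np.quantile(rt, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, rt, side="right") - 1, 0, n_bins - 1)
    return np.array([correct[bins == b].mean() for b in range(n_bins)])

# Simulated data where accuracy declines with log response time
rng = np.random.default_rng(2)
rt = rng.lognormal(0, 0.4, 5000)
p_correct = 1 / (1 + np.exp(2 * np.log(rt)))
correct = rng.binomial(1, p_correct)
caf = conditional_accuracy(rt, correct)
```

A downward-sloping CAF like this one is the within-condition pattern the paper reports; the between-condition trade-off runs in the opposite direction, which is exactly the tension the abstract describes.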

Authors: Jochen Ranger; Jörg-Tobias Kuhn; Carsten Szardenings
Abstract: Cognitive psychometric models embed cognitive process models into a latent trait framework in order to allow for individual differences. Due to their close relationship to the response process, the models allow for profound conclusions about the test takers. However, before such a model can be used its fit has to be checked carefully. In this manuscript we give an overview of existing tests of model fit and show their relation to the generalized moment test of Newey (Econometrica, 53, 1985, 1047) and Tauchen (Journal of Econometrics, 30, 1985, 415). We also present a new test, the Hausman test of misspecification (Hausman, Econometrica, 46, 1978, 1251). The Hausman test consists of a comparison of two estimates of the same item parameters which should be similar if the model holds. The performance of the Hausman test is evaluated in a simulation study. In this study we illustrate its application to two popular models in cognitive psychometrics, the Q-diffusion model and the D-diffusion model (van der Maas, Molenaar, Maris, Kievit, & Borsboom, Psychological Review, 118, 2011, 339; Molenaar, Tuerlinckx, & van der Maas, Journal of Statistical Software, 66, 2015, 1). We also compare the performance of the test to three alternative tests of model fit, namely the M2 test (Molenaar et al., Journal of Statistical Software, 66, 2015, 1), the moment test (Ranger et al., British Journal of Mathematical and Statistical Psychology, 2016) and the test for binned time (Ranger & Kuhn, Psychological Test and Assessment Modeling, 56, 2014b, 370). The simulation study indicates that the Hausman test is superior to the latter tests. The test closely adheres to the nominal Type I error rate and has higher power in most simulation conditions.
PubDate: 2017-02-03
DOI: 10.1111/bmsp.12082
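The Hausman statistic compares two estimates of the same parameters, one efficient under the model and one merely consistent, and refers their weighted squared difference to a chi-square distribution with dim(θ) degrees of freedom. A sketch with assumed toy estimates and covariance matrices (not values from the paper):

```python
import numpy as np
from scipy import stats

def hausman(theta_eff, V_eff, theta_rob, V_rob):
    """Hausman misspecification statistic.

    theta_eff: efficient estimate (covariance V_eff, valid only if the model
    holds); theta_rob: consistent estimate (covariance V_rob). Under a correct
    model, H ~ chi^2 with dim(theta) degrees of freedom.
    """
    d = theta_rob - theta_eff
    H = d @ np.linalg.inv(V_rob - V_eff) @ d
    return H, stats.chi2.sf(H, len(d))

# Toy item-parameter estimates from two hypothetical estimators
theta_eff = np.array([1.00, 0.48])
V_eff = np.diag([0.010, 0.012])
theta_rob = np.array([1.05, 0.45])
V_rob = np.diag([0.015, 0.020])
H, p = hausman(theta_eff, V_eff, theta_rob, V_rob)
```

The key design choice is that under the null the covariance of the difference simplifies to V_rob − V_eff, so no cross-covariance between the two estimators needs to be estimated.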

Authors: Dylan Molenaar; Maria Bolsinova
Abstract: In generalized linear modelling of responses and response times, the observed response time variables are commonly transformed to make their distribution approximately normal. A normal distribution for the transformed response times is desirable as it justifies the linearity and homoscedasticity assumptions in the underlying linear model. Past research has, however, shown that the transformed response times are not always normal. Models have been developed to accommodate this violation. In the present study, we propose a modelling approach for responses and response times to test and model non-normality in the transformed response times. Most importantly, we distinguish between non-normality due to heteroscedastic residual variances, and non-normality due to a skewed speed factor. In a simulation study, we establish parameter recovery and the power to separate both effects. In addition, we apply the model to a real data set.
PubDate: 2017-02-03
DOI: 10.1111/bmsp.12087
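One of the two mechanisms distinguished above, a skewed speed factor, is easy to illustrate by simulation: even with perfectly normal residuals, a skewed person-level speed distribution leaves the marginal distribution of log response times visibly non-normal. All generating values are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_persons, n_items = 1000, 20

# Skewed speed factor: tau = exp(Z) is right-skewed, residuals stay normal
tau = np.exp(rng.normal(0, 0.5, n_persons))
beta = rng.normal(1.5, 0.2, n_items)                 # item time intensities
log_rt = beta - tau[:, None] + rng.normal(0, 0.3, (n_persons, n_items))

skewness = stats.skew(log_rt.ravel())                # left-skewed marginal
_, p_normal = stats.normaltest(log_rt.ravel())       # normality clearly rejected
```

Heteroscedastic residual variances would produce a different signature (excess kurtosis without the factor-level skew), which is what gives the proposed model the leverage to separate the two effects.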

Authors: Oscar L. Olvera Astivia; Bruno D. Zumbo
Abstract: The purpose of this paper is to highlight the importance of a population model in guiding the design and interpretation of simulation studies used to investigate the Spearman rank correlation. The Spearman rank correlation has been known for over a hundred years to applied researchers and methodologists alike and is one of the most widely used non-parametric statistics. Still, certain misconceptions can be found, either explicitly or implicitly, in the published literature because a population definition for this statistic is rarely discussed within the social and behavioural sciences. By relying on copula distribution theory, a population model is presented for the Spearman rank correlation, and its properties are explored both theoretically and in a simulation study. Through the use of the Iman–Conover algorithm (which allows the user to specify the rank correlation as a population parameter), simulation studies from previously published articles are explored, and it is found that many of the conclusions purported in them regarding the nature of the Spearman correlation would change if the data-generation mechanism better matched the simulation design. More specifically, issues such as small sample bias and lack of power of the t-test and r-to-z Fisher transformation disappear when the rank correlation is calculated from data sampled where the rank correlation is the population parameter. A proof for the consistency of the sample estimate of the rank correlation is shown as well as the flexibility of the copula model to encompass results previously published in the mathematical literature.
PubDate: 2017-01-31
DOI: 10.1111/bmsp.12085
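The Iman–Conover idea can be sketched as follows: generate normal scores with the target rank correlation, then reorder each variable's observed values to match the ranks of those scores. The marginal distributions are preserved exactly, while the rank correlation is controlled as a (near-)population parameter. This is a simplified bivariate version for illustration:

```python
import numpy as np
from scipy import stats

def iman_conover(x, y, target_rho, rng):
    """Rearrange y (and x) so the pair attains approximately the target
    Spearman rank correlation while preserving both marginals."""
    n = len(x)
    # Correlated normal scores carrying the target rank structure
    S = np.array([[1.0, target_rho], [target_rho, 1.0]])
    scores = rng.multivariate_normal([0, 0], S, size=n)
    # Place the i-th smallest observed value where the score ranks dictate
    x_new = np.sort(x)[stats.rankdata(scores[:, 0]).astype(int) - 1]
    y_new = np.sort(y)[stats.rankdata(scores[:, 1]).astype(int) - 1]
    return x_new, y_new

rng = np.random.default_rng(9)
x = rng.exponential(size=2000)        # arbitrary non-normal marginals
y = rng.uniform(size=2000)
x2, y2 = iman_conover(x, y, 0.5, rng)
rho_hat, _ = stats.spearmanr(x2, y2)
```

Because the rearranged data inherit the ranks of the normal scores, the realized Spearman correlation tracks the target (up to the usual O(1/√n) sampling noise and the small normal-copula conversion between product-moment and rank correlation), which is precisely the "rank correlation as population parameter" design the paper advocates.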

Authors: Frank Goldhammer; Merle A. Steinwascher; Ulf Kroehne; Johannes Naumann
Pages: 238–256
Abstract: Completing test items under multiple speed conditions avoids the performance measure being confounded with individual differences in the speed–accuracy compromise, and offers insights into the response process, that is, how response time relates to the probability of a correct response. This relation is traditionally represented by two conceptually different functions: the speed–accuracy trade-off function (SATF) across conditions relating the condition average response time to the condition average of accuracy, and the conditional accuracy function (CAF) within a condition describing accuracy conditional on response time. Using a generalized linear mixed modelling approach, we propose an item response modelling framework that is suitable for item response and response time data from experimental speed conditions. The proposed SATF and CAF model accommodates response time effects between conditions (i.e., person and item SATF slope) and within conditions (i.e., residual CAF slopes), captures person and item differences in these effects, and is suitable for measures with a strong speed component. Moreover, for a single condition a CAF model is proposed distinguishing person, item and residual CAF. The properties of the models are illustrated with an empirical example.
PubDate: 2017-05-05
DOI: 10.1111/bmsp.12099

Authors: Ingmar Visser; Rens Poessé
Pages: 280–296
Abstract: The linear ballistic accumulator (LBA) model (Brown & Heathcote, 2008, Cogn. Psychol., 57, 153) is increasingly popular in modelling response times from experimental data. An R package, glba, has been developed to fit the LBA model using maximum likelihood estimation, which is validated by means of a parameter recovery study. At sufficient sample sizes parameter recovery is good, whereas at smaller sample sizes there can be large bias in parameters. In a second simulation study, two methods for computing parameter standard errors are compared. The Hessian-based method is found to be adequate and is (much) faster than the alternative bootstrap method. The use of parameter standard errors in model selection and inference is illustrated in an example using data from an implicit learning experiment (Visser et al., 2007, Mem. Cogn., 35, 1502). It is shown that typical implicit learning effects are captured by different parameters of the LBA model.
PubDate: 2017-05-05
DOI: 10.1111/bmsp.12100
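Simulating LBA trials makes the model concrete: each accumulator gets a uniform start point in [0, A] and a normally distributed drift rate, and the first accumulator to reach the threshold b determines the choice and (after adding non-decision time t0) the response time. Parameter values below are illustrative assumptions, not taken from glba:

```python
import numpy as np

def simulate_lba(n_trials, A, b, v, s, t0, rng):
    """Simulate the linear ballistic accumulator.

    A: start-point range, b: threshold, v: mean drift per accumulator,
    s: drift-rate SD, t0: non-decision time.
    """
    n_acc = len(v)
    starts = rng.uniform(0, A, (n_trials, n_acc))
    drifts = rng.normal(v, s, (n_trials, n_acc))
    # Accumulators with non-positive drift never reach the threshold
    finish = np.where(drifts > 0, (b - starts) / drifts, np.inf)
    choice = np.argmin(finish, axis=1)
    rt = finish.min(axis=1) + t0
    ok = np.isfinite(rt)                 # drop trials where nothing finishes
    return rt[ok], choice[ok]

rng = np.random.default_rng(6)
rt, choice = simulate_lba(5000, A=0.5, b=1.0, v=[1.0, 0.7], s=0.3, t0=0.2,
                          rng=rng)
```

Within-trial accumulation is deterministic ("ballistic"); all trial-to-trial variability comes from the start points and drift rates, which is what yields the model's closed-form likelihood.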

Authors: Peter W. van Rijn; Usama S. Ali
Pages: 317–345
Abstract: We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures.
PubDate: 2017-05-05
DOI: 10.1111/bmsp.12101
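The scoring-rule framework is consistent with the signed residual time rule (Maris & van der Maas, 2012, Psychometrika): score = (2x − 1)(d − t), where x is accuracy, t the response time and d the item time limit. We sketch that rule as one example, without claiming it is exactly the rule used in this paper:

```python
import numpy as np

def signed_residual_time_score(accuracy, rt, d):
    """Signed residual time scoring rule: (2*accuracy - 1) * (d - rt).

    Fast correct answers score high, fast wrong answers score low,
    and slow answers of either kind score near zero.
    """
    return (2 * accuracy - 1) * (d - rt)

# Four illustrative responses under an assumed 10-second time limit
acc = np.array([1, 1, 0, 0])
rt = np.array([2.0, 9.0, 2.0, 9.0])
scores = signed_residual_time_score(acc, rt, d=10.0)
```

A rule of this form rewards both accuracy and speed in a single observed score, which is what lets an ordinary measurement model for the scores stand in for an explicit joint model of accuracy and response time.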