Abstract: Schad, Daniel J. et al. -- Keywords: Null hypothesis significance testing, Bayesian inference, statistical power. -- Abstract : When researchers carry out a null hypothesis significance test, it is tempting to assume that a statistically significant result lowers Prob(H0), the probability of the null hypothesis being true. Technically, such a statement is meaningless for various reasons: e.g., the null hypothesis does not have a probability associated with it. However, it is possible to relax certain assumptions to compute the posterior probability Prob(H0) under repeated sampling. We show in a step-by-step guide that the intuitively appealing belief that Prob(H0) is low when significant results have been obtained under repeated sampling is in general incorrect and depends greatly on (a) the prior probability of the null hypothesis being true, (b) the Type I error rate, (c) the Type II error rate, and (d) whether the result has been replicated. Through step-by-step simulations using open-source code in the R System for Statistical Computing, we show that uncertainty about the null hypothesis being true often remains high despite a significant result. To help the reader develop intuitions about this common misconception, we provide a Shiny app (https://danielschad.shinyapps.io/probnull/). We expect that this tutorial will help researchers better understand and judge results from null hypothesis significance tests.
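The quantity the tutorial works through follows from Bayes' theorem: Prob(H0 | significant) = alpha * Prob(H0) / (alpha * Prob(H0) + (1 - beta) * (1 - Prob(H0))). As a minimal sketch (not the paper's own code; the input values are illustrative assumptions), it can be computed in R:

```r
# Posterior probability that H0 is true given a significant result,
# computed from Bayes' theorem under repeated sampling.
prob_h0_given_sig <- function(prior_h0, alpha, beta) {
  p_sig_h0 <- alpha        # P(significant | H0 true) = Type I error rate
  p_sig_h1 <- 1 - beta     # P(significant | H0 false) = power
  (p_sig_h0 * prior_h0) /
    (p_sig_h0 * prior_h0 + p_sig_h1 * (1 - prior_h0))
}

# Illustrative values (assumptions, not from the paper):
# prior of .5, alpha = .05, power = .20
prob_h0_given_sig(prior_h0 = 0.5, alpha = 0.05, beta = 0.80)  # about 0.20
```

With power of only .20, a single significant result still leaves Prob(H0) at about .20, illustrating the abstract's point that uncertainty about the null can remain high despite significance.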
Abstract: Aberson, Christopher L. et al. -- Abstract : Many tools exist for power analyses focused on the model $R^2$ (the variance explained by all the predictors together), but tools for estimating power for individual coefficients often require complicated inputs that are neither intuitive nor simple to estimate. Further compounding this issue is the recognition that power to detect effects for all predictors in a model tends to be substantially lower than power to detect individual effects. In short, most available power analysis approaches ignore the probability of detecting all effects and focus on the probability of detecting individual effects. The consequence is designs that are underpowered to detect all effects of interest. The present work presents tools for addressing these issues via simulation approaches provided by the pwr2ppl package (Aberson, 2019) and an associated Shiny app.
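The distinction the abstract draws, power for each coefficient versus power to detect all effects in the same sample, can be sketched with a small simulation in base R (an illustrative sketch under assumed slopes and sample size, not the pwr2ppl implementation):

```r
# Simulate a two-predictor regression and compare per-coefficient power
# with the probability that both predictors are significant at once.
set.seed(1)
n <- 100; reps <- 2000
b1 <- 0.30; b2 <- 0.30               # assumed population slopes
hits <- replicate(reps, {
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- b1 * x1 + b2 * x2 + rnorm(n)
  p  <- summary(lm(y ~ x1 + x2))$coefficients[2:3, 4]  # p-values for x1, x2
  c(p[1] < .05, p[2] < .05, all(p < .05))
})
rowMeans(hits)  # power for x1, power for x2, power to detect both
```

The third estimate is reliably below the first two, which is the gap the pwr2ppl tools are designed to quantify.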
Abstract: Béland, Sébastien et al. -- Keywords: Structural equation model; Interaction; Practice; Psychology; Education. -- Abstract : Structural equation modeling involving latent interactions has garnered much attention from researchers in many disciplines. Interestingly, Becher & Trowler (2001) described academics as living in tribes, each sharing a common set of practices and led by a stable elite. To provide an overview of psychological and educational studies using the latent moderated structural equations (LMS) approach, we produced a scoping review from three databases (ERIC, PsycInfo, and Érudit) and selected 78 articles. The goal of this study is to examine the nature and extent of practices in the use of the LMS method and to recommend good practices. Our results show that there are discrepancies in the way researchers analyze data using LMS.
Abstract: Duplessis-Marcotte, Félix et al. -- Keywords: Regression, multilevel regression, R, modeling. -- Abstract : The reproducibility crisis in psychology is partly caused by the use of statistical analyses that are ill-suited to the data collected. Data often have important characteristics that must be taken into account, as when observations are nested within groups (e.g., several students recruited from different classes). In such cases, the independence assumption of general linear models is violated. Ignoring this independence assumption and fitting a general linear model anyway can lead to erroneous results, such as false positives, bias, or a loss of power. Multilevel regression analyses address this problem and ensure the validity of the results obtained. This article is a tutorial covering the general principles underlying multilevel regressions for analyzing nested data. Pseudorandom data are generated with R and analyzed with multilevel regressions to demonstrate the added value, for the validity of the results, of taking the hierarchical structure of the data into account. In addition, the article provides step-by-step R syntax to help readers carry out multilevel analyses and adapt them to their own data.
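As a minimal sketch of this workflow (illustrative values and variable names; the article's own R syntax should be preferred), nested data can be generated and analyzed with a random-intercept model from the lme4 package:

```r
# Generate students nested within classes, then compare a single-level
# lm() that ignores nesting with a random-intercept multilevel model.
library(lme4)
set.seed(42)
n_class <- 20; n_per <- 15
class_id  <- factor(rep(1:n_class, each = n_per))
class_eff <- rnorm(n_class, sd = 2)[as.integer(class_id)]  # class-level variation
x <- rnorm(n_class * n_per)
y <- 0.5 * x + class_eff + rnorm(n_class * n_per)

fit_lm  <- lm(y ~ x)                      # ignores the nesting
fit_mlm <- lmer(y ~ x + (1 | class_id))   # random intercept per class
summary(fit_mlm)
```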
Abstract: Daryanto, Ahmad -- Keywords: difference in two proportions, z-test, Wald interval, Yates's continuity correction, Agresti-Caffo interval. -- Abstract : I introduce D2prop, an SPSS macro for testing the difference between the proportions of two independent samples. The focus of this paper is to show how to use the macro and how to interpret its output.
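The D2prop macro itself is for SPSS; as a hedged illustration of the quantities it reports, the Wald interval, the Agresti-Caffo interval, and the continuity-corrected z-test can be reproduced in R (the counts below are made up for the example):

```r
x1 <- 40; n1 <- 100   # successes / sample size, group 1 (illustrative)
x2 <- 25; n2 <-  90   # group 2

p1 <- x1 / n1; p2 <- x2 / n2
# Wald interval for p1 - p2
se_wald <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
(p1 - p2) + c(-1, 1) * qnorm(.975) * se_wald

# Agresti-Caffo interval: add one success and one failure to each group
p1t <- (x1 + 1) / (n1 + 2); p2t <- (x2 + 1) / (n2 + 2)
se_ac <- sqrt(p1t * (1 - p1t) / (n1 + 2) + p2t * (1 - p2t) / (n2 + 2))
(p1t - p2t) + c(-1, 1) * qnorm(.975) * se_ac

# Test with Yates's continuity correction
prop.test(c(x1, x2), c(n1, n2), correct = TRUE)
```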
Abstract: Caron, Pier-Olivier et al. -- Keywords: data visualization, R package, data extraction. -- Abstract : Data visualization is an essential and powerful tool for generating hypotheses, uncovering patterns, and disseminating findings. It is crucial that introductory statistics courses train students to become critical authors and consumers of data visualization. Ludic data sets can help in teaching statistics by making graphics more enjoyable, using images as instantaneous feedback, encouraging students to discover hidden patterns, and reducing their focus on traditional hypothesis testing. However, data sets with hidden images can be difficult for teachers to come by. These considerations led to the development of a package that easily creates data sets from images for educational purposes. The purpose of this study is to present image2data, an R package that generates data sets from images. We show how to install the package, explain its basic arguments, and give three examples of how it can be used for teaching. Future studies could evaluate the effectiveness of, and the motivation generated by, hidden images in data sets. Our hope is that hidden images will inspire students to decrypt data sets and discover the unexpected.
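The underlying idea, mapping dark pixels of an image to (x, y) points that reveal the picture only when plotted, can be sketched in a few lines of base R. This is not the image2data API, just an illustration of the principle; the file name is a placeholder:

```r
# install.packages("image2data")  # the package itself, assuming CRAN
library(png)                       # used here only for the sketch
img  <- readPNG("hidden.png")      # placeholder file: rows x cols x channels
gray <- img[, , 1]                 # first channel as a grayscale proxy
dark <- which(gray < 0.5, arr.ind = TRUE)  # coordinates of dark pixels
dat  <- data.frame(x = dark[, "col"],
                   y = nrow(gray) - dark[, "row"])  # flip the vertical axis
plot(dat$x, dat$y, pch = ".")      # the image reappears as a scatterplot
```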
Abstract: Laurencelle, Louis -- Keywords: Kendall's tau, variants of tau, honest critical values, monotonicity, normal approximation. -- Abstract : A measure of the consistency of variation in a bivariate series of ordinal data, Kendall's tau (\tau) has properties and exhibits distributional characteristics that are distinct from those of Pearson's r, each coefficient deserving its own interpretation. We examine in detail these properties of \tau and of two variants, \tau_b and \tau_c, together with the nuances that distinguish them. We then bring out the peculiarities of their probability distribution. The statistical test of \tau opens a discussion of different approaches, including the use of "strict" vs. "honest" critical values and the familiar normal approximation. A fairly extensive table of critical values of both kinds is also provided.
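For reference, the classical normal approximation discussed in the article uses the null variance of \tau, Var(\tau) = 2(2n+5)/(9n(n-1)). A short R illustration (made-up data; the article's "honest" critical values come from its tables, not from this approximation):

```r
# Kendall's tau with its normal approximation and the exact small-sample test
set.seed(7)
x <- rnorm(15); y <- x + rnorm(15)   # illustrative untied data
n   <- length(x)
tau <- cor(x, y, method = "kendall")
z   <- tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
2 * pnorm(-abs(z))                   # approximate two-sided p-value
cor.test(x, y, method = "kendall")   # exact p-value for small n without ties
```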
Abstract: Fitts, Douglas A. -- Keywords: confidence interval, Cohen's d, Hedges' g, simulation, noncentral t distribution. -- Abstract : A standardized mean difference using a pooled standard deviation with paired samples (d_p; paired-pooled design) can be compared directly to a d_p from an independent-samples design, but the unbiased point estimate g_p and confidence interval (CI) for d_p cannot, unless the population correlation \rho between the scores in the paired-pooled design is known, which it rarely is. The \rho is required to calculate the degrees of freedom \nu for the design, and \nu is necessary to calculate g_p and the CI. If a variable sample correlation is substituted for \rho, the \nu is only approximate and the sampling distribution of d_p is unknown. This article uses simulations to compare the characteristics of the unknown distribution to the noncentral t distribution as an approximation, and it provides empirically derived regression equations that compensate for the bias in the approximate CI computed with the noncentral t distribution. The result is an approximate but much more accurate coverage of the CI than was previously available. Tables are supplied to assist sample size planning, and computer programs are provided for the computations. These results are experimental and tentative until the actual distribution can be discovered; the regularity of the deviation in coverage that allows the compensation to work encourages that search.
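The noncentral-t pivoting the article builds on can be illustrated for the simpler independent-groups case (a sketch under assumed inputs; the paired-pooled design additionally requires the approximate \nu discussed in the article):

```r
# CI for a standardized mean difference d by inverting the noncentral t:
# find the noncentrality parameters that place t_obs at the tail quantiles,
# then rescale them to the d metric.
ci_d <- function(t_obs, n1, n2, conf = .95) {
  df <- n1 + n2 - 2
  a  <- (1 - conf) / 2
  lo <- uniroot(function(ncp) pt(t_obs, df, ncp) - (1 - a), c(-50, 50))$root
  hi <- uniroot(function(ncp) pt(t_obs, df, ncp) - a,       c(-50, 50))$root
  c(lo, hi) * sqrt(1 / n1 + 1 / n2)   # convert ncp limits to d limits
}
ci_d(t_obs = 2.5, n1 = 20, n2 = 20)   # illustrative inputs
```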
Abstract: André, Nathalie et al. -- Keywords: Validity, psychometric test validation, factor models, Guttman facet models. -- Abstract : Validity, test validation, psychometric properties or qualities: here are three expressions that designate the operations and procedures that make it possible to judge the relevance of a psychometric measurement instrument, its development, and its use. The expression "psychometric validity" has received various interpretations from the outset and continues to challenge theorists today. Does the validity of a test lie in the measure it produces or in the action derived from it, or does it refer to the interpretation attached to it? Is validity based strictly on the result of a validation procedure? Is a test "demonstrated valid" valid in itself, or is its "validity" relative to the demonstration made of it or to its intended use? Are there different species of validity, or does it represent a unified, molar concept? A clear and consensual definition of the concept of "psychometric validity" has yet to emerge. It is from a pragmatic yet conceptually rigorous perspective that this essay argues for abolishing the expression "psychometric validity": the expression does not correspond to a concept per se but is simply a judgment, just as a judge's pronouncement in court is a judgment in light of the evidence submitted, the respondent concerned, and the circumstances surrounding the trial. Validity is a judgment that the measure, test, or psychometric scale responds adequately and reliably to its intended use. The essay then explores the rich arsenal available to psychometricians and test users to enable them to judge judiciously the situational validity of the test produced.
Abstract: Effatpanah, Farshad et al. -- Keywords: Reduced redundancy, speeded cloze-elide test, scoring methods, item response theory, Rasch partial credit model. -- Abstract : Cloze-elide tests are overall measures of both first-language (L1) and second-language (L2) reading comprehension and communicative skills. Research has shown that a time constraint is an effective way to capture individual differences and to increase the reliability and validity of tests. The purpose of this study is to investigate the psychometric quality of a speeded cloze-elide test using a polytomous Rasch model, the partial credit model (PCM), by inspecting the fit of four different scoring techniques. To this end, the responses of 150 English as a foreign language (EFL) students to a speeded cloze-elide test were analyzed. The comparison of the scoring techniques revealed that scoring based on wrong scores can better explain variability in the data. The results of the PCM indicated that the assumption of unidimensionality holds for the speeded cloze-elide test. However, the partial credit analysis revealed that a number of category thresholds do not increase with the category values. Finally, suggestions for further research on better exploiting the flexibility of item response theory and Rasch models for explaining count data are presented.
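A partial credit model of the kind used in the study can be fitted in R with, for example, the eRm package; the sketch below uses random placeholder scores rather than the study's data and simply shows where ordered (or disordered) category thresholds would be inspected:

```r
# Fit a partial credit model to polytomous scores and inspect thresholds.
library(eRm)
set.seed(3)
# Placeholder person-by-item matrix: 150 examinees, 10 items, categories 0-3
scores <- matrix(sample(0:3, 150 * 10, replace = TRUE), nrow = 150)
fit <- PCM(scores)    # conditional maximum likelihood estimation
thresholds(fit)       # check whether thresholds increase with category values
```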
Abstract: Sambaraju, Prasanth -- Keywords: Power of chi-square test, cumulative distribution function, noncentral. -- Abstract : Noncentral distributions are obtained by transforming their respective central distributions and are identified by a noncentrality parameter. Central distributions describe test statistics when the null hypothesis is true; noncentral distributions are used to calculate the statistical power of a test when the null hypothesis is false. The noncentrality parameter measures the degree to which the mean of the test statistic departs from its value under the null hypothesis. The paper presents Visual Basic for Applications code for Microsoft Excel to compute the cumulative distribution function of noncentral chi-square distributions. The results obtained were found to be comparable with reported values.
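The paper's code is VBA for Excel; in R the same cumulative distribution function is built in, so the power calculation that the noncentral chi-square supports reduces to a few lines (the values below are illustrative):

```r
# Power of a chi-square test via the noncentral chi-square CDF
df     <- 3
lambda <- 10                      # noncentrality parameter (assumed)
alpha  <- 0.05
crit   <- qchisq(1 - alpha, df)   # critical value under the central distribution
power  <- 1 - pchisq(crit, df, ncp = lambda)
power
```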
Abstract: Clement, Leah Mary et al. -- Keywords: moderation, mediation, moderated mediation, tutorial, data analysis. -- Abstract : Interest in moderation and mediation models has gained momentum since the 1980s, and these models have become widespread in numerous fields of research, including clinical, social, and health psychology, as well as behavioral, educational, and organizational research. Resources are available to help users understand a moderated mediation analysis conducted with the PROCESS macro and its resulting output; however, many are in video format (e.g., YouTube) or lack detailed instructions based on real-world examples. To our knowledge, no resource provides a thorough yet accessible step-by-step explanation of the procedure for using PROCESS v4.1 to analyze and interpret a moderated mediation model with real data in SPSS v28. The aim of this guide is to address this gap. An overview of mediation, moderation, and moderated mediation models is presented, followed by instructions for verifying that assumptions are met. Finally, a procedure for analyzing data with PROCESS v4.1 is presented, along with an interpretation of the resulting output.
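For readers working outside SPSS, a moderated mediation model (here assumed, for illustration, to be first-stage moderation of the a path) can be sketched conceptually in R with two regressions and a bootstrapped index of moderated mediation. Everything below (data, names, effect sizes) is an illustrative assumption, not the guide's PROCESS procedure:

```r
# Simulate x -> m -> y with the x -> m path moderated by w,
# then bootstrap the index of moderated mediation (a3 * b).
set.seed(1)
n <- 300
dat <- data.frame(x = rnorm(n), w = rnorm(n))
dat$m <- 0.4 * dat$x + 0.2 * dat$w + 0.3 * dat$x * dat$w + rnorm(n)
dat$y <- 0.5 * dat$m + 0.2 * dat$x + rnorm(n)

index_modmed <- function(d, i) {
  d <- d[i, ]
  a3 <- coef(lm(m ~ x * w, data = d))["x:w"]  # moderation of the a path
  b  <- coef(lm(y ~ m + x, data = d))["m"]    # b path
  a3 * b                                      # index of moderated mediation
}
library(boot)
boot.ci(boot(dat, index_modmed, R = 2000), type = "perc")
```

A percentile bootstrap interval excluding zero would indicate that the indirect effect depends on the moderator, which is the same inference PROCESS reports for its index of moderated mediation.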