Psychometric Properties of the Preschool Language Scales, Fifth Edition (PLS-5) in Russian-Speaking Children: A Classical and Item Response Theory Study

O.I. Talantseva; I.O. An; M.A. Zhukova; A.N. Trubitsyna; A.V. Teedemaa; E.L. Grigorenko

doi:10.17759/cpse.2022110211

Clinical Psychology and Special Education
2022. Vol. 11, no. 2, 174–195
ISSN: 2304-0394 (online)


Abstract

Difficulties in language development are common hallmarks of different neuro-developmental disorders. Early diagnosis is a crucial factor for proper early interventions and better prognosis. Currently, there is a severe shortage of standardized instruments for assessing potential language disorders in Russia. To address this gap, we analyzed the psychometric properties of the Russian version of the Preschool Language Scale, 5th edition (RPLS-5). The sample consisted of 473 children aged 3 to 96 months (Mean=32.64, SD=19.79), including 224 typically developing (TD) children and 240 at-risk (AR) children. To assess the reliability of the Russian version of the PLS-5, we used both Classical Test Theory (CTT) and Item Response Theory (IRT). The results indicated the high reliability of the RPLS-5 based on both types of analyses. According to the results of IRT analysis, the difficulty of items ranged from very easy to very difficult, and with few exceptions, the difficulty parameters consistently increased for each subsequent item, reflecting the hierarchical organization of the test. The discrimination parameters ranged from high to perfect. In general, IRT demonstrated that the RPLS-5 is reliable for low-to-high levels of language abilities.

General Information

Keywords: item-response theory, psychometrics, language, assessment, PLS-5, reliability, validity

Journal rubric: Methods and Techniques

DOI: https://doi.org/10.17759/cpse.2022110211

Funding. The manuscript conceptualization and literature search were supported by the Grant from the President of Russian Federation МК-4217.2021.2 (P.I.: Marina A. Zhukova); data analysis, interpretation and manuscript preparation were funded in part by Sirius University.

Acknowledgements. We are grateful to Mei Tan and Lauren Elderton for their editorial support and to all members and students of the Laboratory of Translational Sciences of Human Development of Saint Petersburg State University, who participated in data collection (specifically Ekaterina Shurdova, Darya Momotenko, Anastasiya Sukmanova, Anastasiya Kudryavtseva, Raisa Romanova, and Diana Garifullina among others).

Received: 03.10.2021

Accepted: 30.04.2022

For citation: Talantseva O.I., An I.O., Zhukova M.A., Trubitsyna A.N., Teedemaa A.V., Grigorenko E.L. Psychometric Properties of the Preschool Language Scales, Fifth Edition (PLS-5) in Russian-Speaking Children: A Classical and Item Response Theory Study [Elektronnyi resurs]. Klinicheskaia i spetsial'naia psikhologiia = Clinical Psychology and Special Education, 2022. Vol. 11, no. 2, pp. 174–195. DOI: 10.17759/cpse.2022110211.

Full text

Introduction

Difficulties and delays in language development can serve as markers for different groups of neurodevelopmental disorders, such as communication disorders, autism spectrum disorder (ASD), specific learning disorders, and others [Alam, 2012; Luyster, 2011]. Importantly, children with language impairment are more vulnerable to social, behavioral, and emotional difficulties and less successful in academic settings compared to their typically developing peers [Christensen, 2017; St Clair, 2011]. Early identification of language difficulties is essential for diagnosis and the provision of intervention services that could lead to better life-long outcomes [Dubois, 2020]. Thus, a comprehensive assessment of preschool children’s speech and language skills is crucial for identifying specific difficulties and for planning appropriate interventions and educational support.

To date, the gold standard for assessing language abilities is an interdisciplinary, multi-source approach that combines medical history, clinical examination, screening checklists for parents, and in-depth assessment of children’s receptive and expressive language abilities with dynamic procedures such as diagnostic teaching [Christensen, 2017; Dubois, 2020]. However, there is no single reference instrument for language assessment, and professionals should flexibly choose the most appropriate one depending on the child's biological and mental age, specific diagnostic aims, comorbid disorders, available time and methodological resources, and clinical expertise in administering specific instruments. To cover these different conditions, a variety of methods have been designed for preschoolers in higher income countries, mostly for English-speaking children [Christensen, 2017; Dubois, 2020]. At the same time, there is an urgent need for standardized language assessment tools for use with Russian-speaking preschoolers. Without such tools, the accurate identification of language difficulties is challenging, especially for the receptive domain, as language comprehension difficulties are less evident without specific instruments [Christensen, 2017]. In the absence of such standardized language assessment tools, there may be an overall under-detection of language disorders in Russia.

One approach to address this issue is to develop new methods for use within specific cultural settings. This may be the preferred method in some circumstances, as it takes into account the peculiarities of a specific language and the context in which native children are raised. It could also facilitate the training of specialists and reduce the costs of training and purchasing an instrument. However, developing a new language-specific test may require a substantial and costly effort and may be challenging due to a lack of expertise in contemporary psychometrics and the administration of contemporary instruments. Also, it may limit the possibility of comparing the obtained data with samples from other countries [Hambleton, 2013]. To the best of our knowledge, the Assessment of the Development of Russian (ORRIA) [Babyonyshev, 2007] is the only standardized tool designed specifically for Russian-speaking children, developed using state-of-the-art diagnostics of speech and language strengths and problems. The ORRIA is a direct assessment of children aged 3 to 9. It assesses phonological, morphological, and syntactic skills of children, as well as vocabulary in both receptive and expressive modalities. The ORRIA has been validated and could be used with typically developing children and children with developmental language disorders [Kornilov, 2016; Prikhoda, 2016].

An alternative approach is to adapt existing assessment tools. A growing body of literature supports the use of tests developed in high-income countries to assess children in other contexts when carefully translated, adapted, normed, and applied [Hambleton, 2013]. This could be essential for research purposes and is crucial to lay the foundations for the expertise needed for the development of new and more culturally appropriate methods. The only method normed for a Russian-speaking sample is the MacArthur Communicative Development Inventories (CDI) [Eliseeva, 2017] — two checklists for caregivers of children from 8 to 30 months. The Vineland Adaptive Behavior Scales, 2nd edition (VABS-II) [Sparrow, 2005], have also been recently adapted and psychometrically evaluated with a Russian-speaking sample [Ovchinnikova, 2018], although no norms are available for the Russian version. However, there are no adapted and standardized norm-referenced instruments for Russian-speaking children for the direct evaluation of children’s language skills.

The Preschool Language Scales, Fifth Edition (PLS-5) [Zimmerman, 2011] is among the most widely used standardized instruments for young children. The PLS-5 is an individually administered play-based instrument designed to evaluate language skills in children from birth to 7 years 11 months. It is used for both clinical and research purposes. The PLS-5 comprises two standard subscales (Auditory Comprehension and Expressive Communication), along with additional indicators (The Language Sample Checklist, Articulation Screener Scale, and Home Communication Questionnaire). The Auditory Comprehension (AC) subscale evaluates the child’s ability to understand spoken language. The Expressive Communication (EC) subscale determines how the child expresses himself/herself verbally and communicates with others. In both scales, items are arranged by complexity according to developmental expectations, which helps to detect changes in language skills over time. The test is used to assess preverbal as well as language skills in the areas of semantics, morphology, syntax, integrative and early literacy skills.

The PLS-5 has been shown to be effective with typically developing children, as well as special populations, such as children with ASD [Riley, 2019] and attention deficit hyperactivity disorder [Ramos, 2019]. Standardization of the original version of the PLS-5 was conducted in the United States with a sample of 1,400 children, including children with various clinical diagnoses. The retest reliability of the PLS-5 showed a good level of stability of results over time (stability coefficients ranged from 0.86 to 0.95). The internal consistency of the tool showed very high values (0.91 for the Auditory Comprehension scale, 0.93 for the Expressive Communication scale, and 0.95 for the overall score). Correlation coefficients for the PLS-5 and the CELF Preschoolers-2 [Wiig, 2004] scores ranged from medium (0.70) to high (0.82) [Zimmerman, 2011]. In addition to the English-language version (also normed for Australia and New Zealand), the authors have adapted and standardized the Spanish-language version of the PLS-5. The PLS-5 also has adapted versions in Turkish [Sahli, 2017], Bengali [American Psychiatric Association, 2013], and Indonesian [Sidarta, 2008].

The Russian version of the PLS-5 (RPLS-5) was translated and adapted for a large-scale research project investigating the developmental effects of rearing in orphanages [Zhukova, 2018]. The preliminary evaluation of the psychometric properties of the RPLS-5 was conducted with a sample of 44 children; 28 of these children (M_age=32.50 months, SD_age=7.50) had a history of institutional care placement, and 16 children (M_age=35.13 months, SD_age=8.08) were raised in biological families. The results demonstrated high reliability (Cronbach’s α coefficients were 0.96 for both subscales) and a high correlation between scores and chronological age on both subscales (r=0.89, p<.001 for AC and r=0.85, p<.001 for EC) [Zhukova, 2018].

In the current study, we aimed to evaluate the psychometric properties of the RPLS-5 with a larger sample of children, using methods from both classical test theory (CTT) and item response theory (IRT).

Methods

Sample

The total sample consisted of 473 children aged 3–96 months (M=32.64, SD=19.79; 201 girls). According to caregivers’ reports, all children in the study were native Russian speakers. Among them, the group of typically developing children (TD) included 224 children (M=27.36, SD=16.22; 109 girls). To assess the test performance in special populations, we included additional clinical and at-risk (AR) groups. The group of children residing in institutional care (IC) consisted of 100 children (M=23.58, SD=10.93; 41 girls). Another group consisted of 42 children with a history of IC who, at the time of data collection, were being raised in foster families (FF; M=29.27, SD=11.67; 28 girls). A group of children with autism spectrum disorders (ASD) consisted of 72 children (M=58.97, SD=17.93; 19 girls). The sample also included 11 children with unspecified neurodevelopmental disorders (UNDD) (M=23.58, SD=24.38; 3 girls). Due to the small number of participants with UNDD, the data obtained from this group were not included in the group analyses.

The inclusion/exclusion criteria for the TD, IC, and FF groups were as follows: absence of uncorrected hearing or sight problems; no diagnosed neurological disorder or neurological symptoms; absence of major genetic syndromes; no history of institutional care placement for the TD group, or at least 6 months of institutionalization for the IC and FF groups. The inclusion criterion for the ASD group was the presence of any pervasive developmental disorder according to the ICD-10.

Participants for the study were mainly recruited through several medical organizations, socio-psychological services, childcare institutions, and social networks.

Assessments

RPLS-5. The PLS-5 is an interactive, play-based assessment of language skills for children from birth to 7 years 11 months. The test includes the Auditory Comprehension (AC) and Expressive Communication (EC) subscales, as well as the Language Sample Checklist, Articulation Screener Scale, and Home Communication Questionnaire. For the current study, we used only the AC and EC subscales.

The PLS-5 comprises 65 auditory comprehension and 67 expressive communication tasks to identify the child’s language skills. Administration time usually ranges from 25 to 50 minutes, depending on the child’s age. The PLS-5 uses a basal and ceiling rule. All items are dichotomous. Passed items receive a score of “1”; items that are not passed are scored “0.” For items administered to children from birth through age 2 years 11 months, the item scoring can be based on elicitation, observation, or caregiver report. The scores are summed for each subscale, and raw scores can be converted to norm-referenced scores, including standard scores, percentile ranks, and age equivalents for AC and EC. The scores can be summed to calculate a norm-referenced total language score.

The PLS-5 has been translated and adapted to the Russian language by a group of experts in language development; the procedure is described in detail elsewhere [Zhukova, 2016]. Most of the tasks designed for children from birth to 3 years were not modified, as they assess universal, culture-free preverbal behavior and emerging language skills, such as eye contact, joint attention, gestural communication, and familiar object recognition. Changes were primarily made to tasks assessing phonological, grammar, and early literacy skills according to the phonology, grammar, and orthography of the Russian language. The pictures for the modified tasks were replaced with more appropriate ones in the Picture Manual.

CDI. To assess concurrent validity, we used the Russian adaptation of the MacArthur-Bates Communicative Development Inventories (CDI) [Fenson, 2007], translated and normed in Russian by Tseitlin and colleagues [Eliseeva, 2017]. The CDI are caregiver-informant checklists that evaluate a child’s early communication and lexical skills. There are two age-specific versions: CDI Words and Gestures (CDI-WG) and CDI Words and Sentences (CDI-WS). CDI-WG is designed for children from 8 to 18 months and is focused on gestural communication and emerging receptive and expressive vocabulary. CDI-WS is designed for children from 18 to 36 months and assesses expressive vocabulary as well as early grammar.

Data analysis

To evaluate the psychometric properties of the RPLS-5, we analyzed the data using both CTT and IRT methods.

The plan of analysis was as follows: 1) describe the subscales and item characteristics across and within the study groups using CTT methods; 2) evaluate the concurrent validity of the RPLS-5 and CDIs; 3) investigate the dimensionality of the RPLS-5 subscales; 4) compare the 1- and 2-parameter logistic models regarding their fit to the data; and 5) examine the subscales and item functioning using the best fitting model.

The statistical analyses were performed in the R programming environment [Piedmont, 2014]. For descriptive and CTT analyses, we used the psy [Falissard, 2012] and CTT [Willse, 2018] packages. The IRT analysis was performed using the ltm [Rizopoulos, 2006] and mirt [Chalmers, 2012] packages.

CTT. CTT is based on the assumption that an observed test score is the sum of a true score and an error score, where the true and the error scores are independent [Lord, 2008]. Thus, it is often called the “true score model.” CTT focuses on the entire test rather than separate items. CTT analyses are easy to perform and are widely used in psychometrics to measure and manage test performance data. However, the psychometric properties measured using CTT methods depend on the sample characteristics and sample size.

To assess the reliability of the RPLS-5 subscales, as well as the whole test, we used the split half coefficient (after Spearman-Brown correction) and Cronbach’s α. The obtained results were interpreted as unacceptable at α<0.70, fair at values from 0.70 to 0.79, good at values from 0.80 to 0.89, and excellent at α>0.90 [Crais]. Additionally, we measured inter-item and item-to-total correlations, item difficulty (within this framework, understood as the proportion of correct responses to the item), Cronbach’s α if an item was deleted, and correlations between scores and chronological age. According to Piedmont [Piedmont, 2014], inter-item correlations within the range of 0.20 to 0.40 are considered optimal because, first, they indicate sufficient homogeneity in the measurement of the latent variable and, second, they have sufficiently unique variance and do not duplicate each other. The concurrent validity was measured as the correlation between the RPLS-5 and the CDI scores.
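The two reliability coefficients named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of the psy or CTT packages: the odd/even split is one common choice among many possible halves, the helper names are hypothetical, and the toy responses are invented for the example.

```python
from statistics import mean, pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of respondents, each a list of 0/1 item scores."""
    k = len(responses[0])  # number of items
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Plain Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half_reliability(responses):
    """Correlation of odd-item and even-item half scores, Spearman-Brown corrected."""
    odd = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)


# Hypothetical data: 5 children x 4 dichotomous items (not from the study).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy), 3), round(split_half_reliability(toy), 3))
```

The Spearman-Brown step, 2r/(1+r), projects the correlation between two half-length tests onto the full test length, which is why split-half values are typically higher than the raw half-test correlation.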

IRT. IRT, also known as latent trait theory, establishes a link between the properties of an instrument, the individual responding to the tasks, and the underlying trait being measured [St Clair, 2011]. In general, it is based on generalized linear mixed models that allow precise measurement across the range of a latent trait at both the item and test levels. In contrast to CTT, IRT does not rely on the assumption of a linear association between the latent trait and the results of a test and is less dependent on sample characteristics and sample size [Reise, 2003]. The most popular IRT models can be categorized into 1-parameter logistic (1PL), 2-parameter logistic (2PL), and 3-parameter logistic (3PL) models, named according to the number of parameters used to model the characteristics of an item. The 1PL model describes test items in terms of only one parameter, i.e., item difficulty (the level of the measured ability at which an individual has a 50% chance of passing or affirming that item). The 2PL model includes two item parameters, i.e., item difficulty and item discrimination (a measure of the item’s ability to accurately differentiate between individuals with higher or lower levels of a latent trait). The 3PL model includes both item parameters of the 2PL model and, additionally, a pseudo-chance “guessing” parameter.
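The three logistic models described above share one response function and differ only in which parameters are free. A minimal sketch follows; the function name icc and the parameter values are illustrative assumptions, not estimates from this study.

```python
import math


def icc(theta, b, a=1.0, c=0.0):
    """Probability of passing an item under the 1PL/2PL/3PL logistic models.

    theta: latent ability; b: difficulty; a: discrimination; c: guessing.
    With a fixed and c = 0 this is the 1PL; freeing a gives the 2PL;
    freeing c as well gives the 3PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters, chosen only for illustration.
print(icc(theta=0.5, b=0.5))                 # 1PL evaluated at theta == b
print(icc(theta=1.0, b=0.5, a=2.0))          # 2PL with a steeper slope
print(icc(theta=-3.0, b=0.5, a=2.0, c=0.2))  # 3PL: probability stays near c
```

Note that the 50% interpretation of the difficulty parameter holds for the 1PL and 2PL; with a non-zero guessing parameter the probability at theta = b is c + (1 - c)/2.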

Given the independence of IRT from sample characteristics, all participants were included in the analysis. The latent dimensionality of the RPLS-5 subscales was examined with a modified parallel analysis (MPA). Subsequently, to choose the most appropriate model to fit the data, we estimated 1- and 2-parameter logistic models. The model fit was investigated based on the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the log-likelihood, with a preference for lower AIC and BIC values and a log-likelihood closer to zero, along with the ANOVA and Bootstrap Likelihood Ratio Test (BLRT, n=1,000). The item fit statistics were analyzed based on Orlando and Thissen's S-χ² item fit index. An item was flagged for misfit if the significance level for the S-χ² index was less than .05 [Orlando].

In addition to the IRT assumption of unidimensionality, we tested the assumption of local independence using the Q₃ statistic with a critical value of |0.2| [Conti-Ramsden, 2012]. Then, the item difficulty and item discrimination were analyzed. Item difficulty was interpreted in relative terms, according to the location of the item characteristic curve on the scale of the measured ability. When interpreting the values of the discrimination parameter, the following gradation was used: 0: none; 0.01–0.34: very low; 0.35–0.64: low; 0.65–1.34: moderate; 1.35–1.69: high; above 1.70: perfect [Baker, 2001].
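The gradation above can be expressed as a small lookup. In this sketch the 0.35–0.64 band is labelled "low", as in Baker's table, and values of 1.70 and above receive the label "perfect" as this article uses it; the function name is hypothetical and negative values are rejected as an assumption of the example.

```python
def classify_discrimination(a):
    """Map a non-negative discrimination estimate to a verbal label."""
    if a < 0:
        raise ValueError("this sketch assumes non-negative discrimination")
    if a == 0:
        return "none"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"        # band taken from Baker's table
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "perfect"        # label used in this article for 1.70 and above


# 2.16 and 24.77 are the bounds reported for the EC subscale; the rest are made up.
for value in (0.0, 0.5, 1.5, 2.16, 24.77):
    print(value, classify_discrimination(value))
```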

Results

Descriptive statistics and CTT results

Table 1 displays the average raw scores and standard deviations generated by children in the RPLS-5 according to 8 age ranges, along with correlation coefficients between raw scores and the ages of participants.

Table 1

Average raw scores and standard deviations obtained by children on the RPLS-5 across 8 age ranges, along with correlation coefficients between raw scores and participants' age, by group

Age (months)	TD		IC		FF		ASD
Age (months)	AC	EC	AC	EC	AC	EC	AC	EC
0–11	15.3±2.4	17.2±3.6	13.7±1.7	15.1±3.6	11.3±2.6	12.2±2.9	NA	NA
12–23	23.6±5.5	23.5±3.1	20.3±4.5	19.6±4.2	24.2±8.0	22.5±1.6	8.0	8.0
24–35	35.0±7.1	30.3±7.1	29.8±4.0	26.7±5.0	32.6±5.8	28.7±5.3	21.2±9.2	17.2±8.0
36–47	42.7±5.7	41.5±7.5	34.8±3.4	32.6±4.9	40.1±5.3	38.8±7.9	21.5±7.1	20.2±6.7
48–59	45.4±7.4	45.2±7.3	42.5±2.1	43.5±7.8	39.0	38.0	23.4±9.3	18.7±9.7
60–71	49.4±7.9	52.3±9.0	NA	NA	NA	NA	23.2±10.0	22.8±8.6
72–83	63.0	61.0	NA	NA	NA	NA	32.1±10.0	25.0±8.6
84–95	50.8±8.0	56.3±7.0	NA	NA	NA	NA	28.4±15.8	22.9±12.3
Correlation between age and raw scores	0.90***	0.87***	0.94***	0.87***	0.83***	0.31**	0.36**	0.30**

Notes. AC: Auditory Comprehension scale; EC: Expressive Communication scale; TD: Typically Developing group; IC: Institutional Care group; FF: Foster Family group; ASD: Autism Spectrum Disorder group; NA: not available (the value could not be calculated). * p<.05; ** p<.01; *** p<.001.

While 18 age ranges were used in the original validation study, we reduced their number in this study because some ranges had too few participants. Table 1 shows that for the TD, IC, and FF groups, average AC and EC raw scores increase systematically with age. For the ASD group, the trend is less clear, likely due to the heterogeneity of language abilities in this group. However, the correlation analysis reveals a statistically significant relationship between age and raw scores for both scales in all groups, although it is weaker for the FF group in EC and for the ASD group in both domains.
A comparison of the results between the TD and at-risk subgroups (see Supplemental Tables 1-2 at https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b) has revealed that in most age ranges children from the IC and ASD subgroups obtained lower scores on both scales. For IC, the effect sizes were statistically significant and ranged from moderate to large (Cohen’s d ranged from 0.55 to 1.54); for ASD, effect sizes were large for all age ranges (Cohen’s d ranged from 1.83 to 3.43), but for children aged 24 to 35 months the difference in AC was not significant (t=2.93 (3.26), p>.06). The large, but not significant effect was also found for children from the FF up to 11 months in AC (t=3.17 (3.34), p<.05, d=1.83 [0.74; 2.92]), but no differences were found for older children from this group.
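For readers unfamiliar with the effect size reported above, Cohen's d is the difference between two group means divided by a pooled standard deviation. The sketch below uses the pooled sample standard deviation, which is one common convention and an assumption here, since the text does not state which variant was computed; the scores are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical raw scores for two small groups (not data from the study).
td_scores = [36, 41, 44, 47, 52]
ic_scores = [30, 33, 35, 37, 40]
print(round(cohens_d(td_scores, ic_scores), 2))
```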

Cronbach's alpha values were excellent across all subscales and groups, ranging from 0.94 to 0.96 (Table 2). Across subgroups, average inter-item correlations ranged from 0.28 to 0.33 for AC and from 0.27 to 0.30 for EC, indicating that the items of both scales measure the investigated constructs with sufficient homogeneity, retain sufficient unique variance, and do not duplicate each other.

Table 2

Internal consistency values and average inter-item correlations for the Auditory Comprehension and Expressive Communication scales

Scale-group	Split-half reliability	Cronbach's α	Average inter-item correlation
AC-TD	0.98	0.96	0.28
AC-IC	0.98	0.94	0.30
AC-FF	0.98	0.95	0.33
AC-ASD	0.98	0.95	0.30
AC-Total	0.98	0.95	0.26
EC-TD	0.98	0.96	0.29
EC-IC	0.97	0.94	0.27
EC-FF	0.98	0.95	0.30
EC-ASD	0.98	0.94	0.30
EC-Total	0.98	0.96	0.28

Notes. AC: Auditory Comprehension scale; EC: Expressive Communication scale; TD: Typically Developing group; IC: Institutional Care group; FF: Foster Family group; ASD: Autism Spectrum Disorder group.

Item descriptives displayed in Supplemental Table 3 (available at: https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b) demonstrate that excluding any single AC or EC item does not increase the internal consistency of either subscale (Cronbach's alpha if an item is deleted remains at 0.96 for all items on both scales). The frequency of correct answers on items varied for both subscales.
As seen in Supplemental Tables 4-6 (available at: https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b), for both scales the items at the beginning are the easiest, and all or most of the participants completed them; as the item number increases, the frequency of correct answers decreases, with the most difficult items at the end of the scales.

Correlation analysis of the RPLS-5 and the CDIs shows a positive and statistically significant relationship between the composite scores of these instruments (r=0.77 (98), p<.001; r=0.83 (220), p<.001 for CDI-18 and CDI-36, respectively). This relationship was also found between each PLS subscale and the CDI composite scores (r=0.78 (98), p<.001 for AC and CDI-18; r=0.64 (98), p<.001 for AC and CDI-36; r=0.75 (220), p<.001 for EC and CDI-18; r=0.83 (220), p<.001 for EC and CDI-36).

IRT analysis

Auditory comprehension. To avoid estimation errors associated with zero or near-zero variance in the IRT analysis, items for which either correct or incorrect answers occurred fewer than 4 times (see Supplemental Table 4 at https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b) were excluded.

The MPA indicated that the second eigenvalue of the observed data was not significantly larger than the average second eigenvalue of data simulated under the assumed 1PL model (Table 3), indicating that the unidimensionality assumption was met and that unidimensional IRT models can be applied to the data.

IRT model-fit assessment revealed that the data showed a significantly better fit to the 2PL model compared to the 1PL model (LRT=529.57, df=54, p<.001, for BLRT p<.001). The AIC, BIC, and log-likelihood also favored the 2PL model (Table 3). The individual item-fit values are provided in the online supplemental materials (see Supplemental Table 6 at https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b). In the 1PL model, 11 items potentially misfit the data; in the 2PL model, the number of items with poor fit was lower (items 10, 11, 16, 25, 26, 51). Thus, we proceeded with further analyses using the 2PL model.

Examination of the residual correlation matrix based on the Q₃ statistic revealed potential local dependence between the following pairs of items: 6-7 (0.23), 7-9 (0.24), 9-10 (0.30), 6-12 (0.23), 12-16 (-0.31), 18-21 (0.26), and 24-26 (-0.29). However, as the share of potentially dependent pairs was small (0.47%) and these “red flag” items assess different receptive skills that typically develop in close age periods, we determined that there was sufficient evidence to support the assumption of local independence for the whole subscale and proceeded with further analysis of the model parameters.

Table 3

Model fit criteria and comparison of observed versus resampled eigenvalues for the 1PL and 2PL models

Model fit criteria	Auditory comprehension		Expressive communication
	Model		Model
	1PL	2PL	1PL	2PL
AIC	9878.22	9456.65	8950.75	8055.39
BIC	10111.13	9914.15	9187.70	8520.97
Log-likelihood	-4883.11	-4618.32	-4418.38	-3915.70
Unidimensional testing	-	-	-	-
Second eigenvalues of the observed data	2.51	2.51	NA	1.74
Average of second eigenvalues in Monte Carlo samples (n=100)	2.11	2.89	NA	1.68
Comparison observed versus resampled eigenvalue	0.27	0.66	NA	0.45

Notes. AIC — Akaike Information Criterion, BIC — Bayesian Information Criterion, 1PL — 1-parameter logistic model, 2PL — 2-parameter logistic model.
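The information criteria in Table 3 follow directly from the log-likelihoods, so they can be checked by hand. The sketch below does this for the Auditory Comprehension models. The parameter counts are assumptions inferred from the reported df=54 (55 retained items, a 1PL with 55 difficulties plus one common discrimination, and a 2PL with 55 difficulties and 55 discriminations); they are not stated in the text, and the function names are hypothetical.

```python
import math


def fit_indices(log_lik, n_params, n_obs):
    """Return (AIC, BIC) computed from a model's log-likelihood."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return aic, bic


def lr_statistic(log_lik_restricted, log_lik_full):
    """Likelihood-ratio statistic for two nested models."""
    return 2 * (log_lik_full - log_lik_restricted)


N = 473                          # total sample size reported in the Sample section
ll_1pl, k_1pl = -4883.11, 56     # log-likelihood from Table 3; k is an assumption
ll_2pl, k_2pl = -4618.32, 110    # log-likelihood from Table 3; k is an assumption

print(fit_indices(ll_1pl, k_1pl, N))
print(fit_indices(ll_2pl, k_2pl, N))
print(lr_statistic(ll_1pl, ll_2pl), k_2pl - k_1pl)
```

Under these assumptions the computed values agree with the tabulated AIC, BIC, and LRT to within rounding of the log-likelihoods, which supports the inferred item count.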

In the 2PL model, the difficulty of the items (see Supplemental Table 6 at https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b) consistently ranged from the easiest at the beginning to the most difficult at the end of the AC subscale (b parameters were from -2.745 to 2.065). The discrimination parameters across the items ranged from high to perfect values, indicating a high capacity to differentiate individuals in the wide range from low to high latent ability (i.e., language comprehension functions).

The Item Characteristic Curves (ICCs) of all the items (Figure 1) indicate that the probability of completing items successfully increased with the increase of the latent trait. At the same time, the larger the item number, the higher the required latent trait ability for the probability of giving a correct answer.

The Test Information Function (TIF), along with the standard error measure and reliability curve (Figure 2) for the overall AC, demonstrates that the scale measured language comprehension with at least 90% reliability for theta values of about -2.5 to 2.5 SD, while the most information (i.e., ability of the subscale to differentiate individuals with the highest precision) is provided for individuals with ability levels of about 1.8. At this point, the standard error is the lowest, and the information is the highest, indicating that AC better discriminates individuals at higher ability levels. For individuals with severe problems in language comprehension, as well as with highly skilled individuals, the subtest has the lowest precision. The marginal estimate of empirical reliability for the 2PL model of AC is high (0.98) and close to the CTT-based estimates reported above.
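The link between test information, standard error, and reliability used in this paragraph can be sketched as follows. Under the 2PL, each item contributes a²P(1−P) to the information at a given ability level. The standard error is the inverse square root of the summed information, and one common convention, assumed here, defines conditional reliability as 1 − SE² for a latent trait with unit variance. The item parameters are hypothetical, not the RPLS-5 estimates.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a single 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Test information: the sum of item informations; items is [(a, b), ...]."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(information):
    """Conditional standard error of the ability estimate."""
    return 1.0 / math.sqrt(information)


def conditional_reliability(information):
    """Reliability as 1 - SE^2, assuming the latent trait has unit variance."""
    return 1.0 - 1.0 / information


# Hypothetical (a, b) pairs; these are not the RPLS-5 estimates.
items = [(2.0, -2.0), (2.5, -1.0), (3.0, 0.0), (2.5, 1.0), (4.0, 2.0)]
for theta in (-2.0, 0.0, 2.0):
    info = scale_information(theta, items)
    print(theta, round(info, 2), round(standard_error(info), 2),
          round(conditional_reliability(info), 2))
```

Under this convention, a reliability of at least 0.90 corresponds to a test information of at least 10, so the 90% band quoted above marks the ability range where the summed item information stays above that level.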

Figure 1. Item Characteristic Curves for Auditory Comprehension Scale

Expressive communication. Based on the frequency of correct and incorrect answers (see Supplemental Table 5 at https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b), items 1-5, 53-54, 61-63, and 65 were excluded from further IRT analysis because of near-zero variance.

The results of the MPA were not statistically significant (Table 3), meaning that the unidimensionality assumption was met.

Comparison of the 1PL and 2PL models based on the analysis of variance revealed that the latter exhibited a better fit (LRT=1005.362, df=55, p<.001, for bootstrap LRT p<.001), which was also supported by the absolute model fit indices (Table 3). The p-values associated with the S-χ² fit statistics ranged from 0.0000001 to 0.98, with an average p-value of 0.25, for the 1PL model and from 0.00036 to 0.94 (M=0.60) for the 2PL model. A comparison of the number of misfit items was not possible (as 2PL model fit statistics could not be calculated for 9 items); however, taken together, the results favor the 2PL model.

Figure 2. Test Information Function (left panel) and Test Reliability Curve (right panel) for the Auditory Comprehension Scale

As in the case of AC, only 8 of 1,485 item pairs (0.54%) had Q₃ values above |0.2|: 6-10 (-0.31), 7-10 (-0.21), 11-14 (-0.29), 13-17 (-0.25), 16-21 (-0.24), 19-22 (0.22), 19-25 (0.31), 20-25 (0.22). All of these items assess different milestones of expressive communication that typically emerge in close periods of early development. Thus, the assumption of local independence for the whole EC subscale could be accepted.
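The percentages of flagged Q₃ pairs reported for the two subscales are simple arithmetic over the number of distinct item pairs. In the sketch below, the item count of 55 is an assumption: it is the count that yields the 1,485 pairs quoted above. The function name is hypothetical.

```python
from math import comb


def flagged_share(n_flagged, n_items):
    """Percentage of item pairs whose |Q3| exceeds the critical value."""
    return 100 * n_flagged / comb(n_items, 2)


n_items = 55                      # assumption: gives comb(55, 2) == 1485 pairs
print(comb(n_items, 2))
print(round(flagged_share(7, n_items), 2))  # AC: 7 flagged pairs
print(round(flagged_share(8, n_items), 2))  # EC: 8 flagged pairs
```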

In the 2PL model, the discrimination parameters ranged from high to perfect values (2.16<α<24.77), indicating the potentially good diagnostic validity of EC. The difficulty of the items ranged from -2.38 to 1.67, with the easiest items located at the beginning of the scale and the most difficult ones at the end (all item parameters are presented in Supplemental Table 6 at https://osf.io/tq3pa/?view_only=e513d5557a304f0baa04adba76c31a1b).

A visual analysis of the ICCs (Figure 3) shows that most of the items exhibit similar patterns: the probability of a correct answer increases with the level of the latent ability, rises sharply upon reaching a certain level that differs for each item, and after that no longer depends on further growth of the ability. Notably, for items № 58-60, 64, and 66, the α-parameters exceeded 20, and their ICCs are the steepest: around a certain point on the ability scale, they resemble a vertical line. This means that to the left of this point the probability of a correct answer is near zero, and to the right of it the probability rises sharply to 1. These items therefore separate individuals below and above that point but make almost no distinction among individuals on the same side of it.
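To illustrate why very large α values produce nearly vertical ICCs, the sketch below evaluates the 2PL response function at the two ends of the discrimination range reported for EC. It assumes the standard logistic form without a scaling constant, and the difficulty value b = 0 is hypothetical, chosen only for illustration; neither detail is stated in the text.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the logistic 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


a_low, a_high = 2.16, 24.77  # ends of the discrimination range reported for EC
b = 0.0                      # hypothetical difficulty, for illustration only

for offset in (-0.25, 0.0, 0.25):
    theta = b + offset
    print(
        f"theta - b = {offset:+.2f}: "
        f"P(a={a_low}) = {icc_2pl(theta, a_low, b):.3f}, "
        f"P(a={a_high}) = {icc_2pl(theta, a_high, b):.3f}"
    )
```

With the larger α, the probability moves from close to 0 to close to 1 within a quarter of a standard deviation on either side of b, whereas with the smaller α it changes gradually over the same interval.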

The TIF with the standard error and the reliability curve (Figure 4) for the overall EC scale reveal that the subtest measured expressive language skills with at least 90% reliability for theta values from -2.2 to 2.6 SD; outside this range, the standard error sharply increases. At the same time, the TIF has several peaks. Surprisingly, the most information is provided for individuals with an ability level of around 2.2 SD. Thus, EC has the lowest reliability for individuals with severe problems in expressive communication and for those with extremely high language expertise. In general, the marginal estimate of empirical reliability for the 2PL model of EC is excellent (0.98).
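The link between test information, standard error, and reliability described above can be sketched as follows. The text does not state which conversion from information to reliability was used, so two common conventions are shown, both assuming a latent variance of 1. The item parameters in the last lines are hypothetical apart from the α value, which is the upper end of the range reported for EC.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of a single 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(info):
    """Standard error of the ability estimate given test information."""
    return 1.0 / math.sqrt(info)


def reliability_one_minus_se2(info):
    """Convention 1: reliability = 1 - SE^2 (unit latent variance assumed)."""
    return 1.0 - 1.0 / info


def reliability_ratio(info):
    """Convention 2: reliability = I / (I + 1) (unit latent variance assumed)."""
    return info / (info + 1.0)


# Test information needed for a reliability of 0.90 under each convention.
for info in (10.0, 9.0):
    print(
        f"I = {info:.0f}: SE = {standard_error(info):.3f}, "
        f"1 - SE^2 = {reliability_one_minus_se2(info):.3f}, "
        f"I / (I + 1) = {reliability_ratio(info):.3f}"
    )

# A very steep item contributes a tall, narrow spike of information.
a_steep, b_item = 24.77, 0.0  # b is hypothetical
print(f"information at b: {item_information_2pl(b_item, a_steep, b_item):.1f}")
print(f"information 0.5 SD away: {item_information_2pl(b_item + 0.5, a_steep, b_item):.4f}")
```

Because a steep item concentrates its information in a narrow band around its difficulty, a few such items can produce the kind of multi-peaked TIF described above, although other factors may also contribute.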

Figure 3. Item Characteristic Curves for Expressive Communication Scale

Figure 4. Test Information Function (left panel) and Test Reliability Curve (right panel) for the Expressive Communication Scale

Discussion

Currently, in Russia, there is an acute shortage of valid and evidence-based standardized instruments for assessing potential language disorders in preschool children [American Psychiatric Association, 2013]. This could lead to mis- or under-diagnoses. Consequently, a significant number of children with language impairments may not receive timely and appropriate interventions, which decreases their potential for better prognoses. Moreover, the lack of such tools significantly limits our ability to compare data from Russian-language samples with results from other language groups. This study of the psychometric parameters of the Russian-language version of the PLS-5, using both CTT and IRT approaches, is a major step towards filling this gap.

The analysis of internal consistency, inter-item, and item-total correlations, as well as the analysis of changes in internal consistency in the case of a deleted item, demonstrates that from the standpoint of CTT, both RPLS-5 scales are highly reliable and comparable with the original PLS-5 [Zimmerman, 2011]. Also, the results of the correlation analysis revealed a statistically significant relationship between both scales and the age of participants, which reflects the tests’ ability to differentiate children by age. Between-group comparisons show that children brought up in orphanages exhibit lower levels of ability according to both RPLS-5 scales in comparison with neurotypical peers under 4 years old raised in biological families. Children with ASD also scored lower on both subtests. Since deviating patterns in language development are common, both among children with a history of institutionalization [Zhukova, 2010] and those with ASD [Levy, 2010; Norrelgen, 2015], the results support the ability of both RPLS-5 scales to identify potential disability in the language domain. The fact that no statistically significant differences were found between children raised in foster families and their neurotypical peers from biological families is consistent with findings that when children are adopted in early childhood, the negative effects of institutionalization can be smoothed out [Windsor, 2011]. At the same time, it should be noted that since conducting a comparative study was not the goal of this work, the described conclusions should be interpreted with caution.

The correlation analysis of the RPLS-5 results and CDI questionnaires shows a statistically significant positive relationship between the instruments, which also attests to the validity of the RPLS-5.

According to the results of the IRT analysis, the difficulty of items in both scales ranged from very easy to very difficult, reflecting the fact that the PLS-5 is designed for a wide age range. In addition, item difficulty generally increased with item number, which reflects the appropriate hierarchical organization of the RPLS-5, as task complexity increases in accordance with age expectations. At the same time, a number of fluctuations were found, indicating that reordering some items (sorting them from the smallest to the largest b parameter) could possibly improve the reliability of the RPLS-5. The discrimination parameters across the items of both scales ranged from high to perfect values, indicating a high capacity to differentiate individuals from low to high receptive and expressive language abilities. At the same time, some items located at the end of the EC scale (№ 58-60, 64, and 66) demonstrated unusually high values of the α-parameters. This may be influenced by the underrepresentation of children between the ages of 6.5 and 8 (the age range that these tasks are aimed at) in the sample that was included in the analysis. On the other hand, it may also be associated with the fact that, since the PLS-5 is limited to the age of 8 years, when appropriate development is achieved, this scale loses its differentiating ability for older children. The lowest reliability of the RPLS-5 is found for individuals with severe problems in the language domain and for those with extremely high language expertise. In general, marginal estimates of empirical reliability for the 2PL model for both scales are excellent, which is comparable with the results obtained with CTT and with the reliability of the original version of the PLS-5.

It is also worth noting that, to our knowledge, this study is the first attempt to assess the psychometric indicators of the PLS-5 using an IRT approach. However, the study has several limitations. First, although we recruited a large sample, the distribution of age in the sample was not even, and some of the age groups were smaller than intended or than those used in the validation of the original PLS and its subsequent versions. This limitation especially influences the results obtained from the descriptive and CTT analyses and the number of items included in the IRT analysis. Next, since the sample included not only children with normative language development but also children from at-risk groups, it can be assumed that the RPLS-5 may exhibit differential functioning in different subgroups. An analysis of such differential functioning remained outside the scope of this study but represents a focus for the future. Future directions should also include an analysis of test-retest reliability.

References

Alam M. Adaptation of Preschool Language Scale-4 (PLS-4) for Screening language development of Bangla speaking children. Master Dissertation. Dhaka, Bangladesh: BRAC University, 2012. 51 p. URL: https://core.ac.uk/download/pdf/61802358.pdf (Accessed: 09.06.2022)
American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 5th ed., Arlington, VA: American Psychiatric Publishing, 2013.
Association of Psychiatrists and Psychologists for Evidence-Based Practice. Klinicheskie rekomendacii "Rasstrojstva autisticheskogo spektra" (deti): ID 594 [Clinical guidelines for children with Autism Spectrum Disorders: ID 594]. Ministry of Health of the Russian Federation, 2020. URL: https://apicr.minzdrav.gov.ru/api.ashx?op=GetClinrecPdf&id=594_1 (Accessed: 09.06.2022)
Babyonyshev M., Hart L., Reich J. et al. Otsenka razvitiya russkogo yazy‘ka [Assessment of the Development of Russian]: Unpublished manual, 2007. (In Russ.).
Baker F.B. The basics of item response theory. 2nd ed. ERIC, 2001. 186 p. URL: https://eric.ed.gov/?id=ED458219 (Accessed: 09.06.2022).
Chalmers R.P. Mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 2012, vol. 48, no. 6, pp. 1–29. DOI: 10.18637/jss.v048.i06
Christensen K.B., Makransky G., Horton M. Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 2017, vol. 41, no. 3, pp. 178–194. DOI: 10.1177/0146621616677520
Cicchetti D.V. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 1994, vol. 6, no. 4, pp. 284–290. DOI: 10.1037/1040-3590.6.4.284
Conti-Ramsden G., Durkin K. Language development and assessment in the preschool period. Neuropsychology Review, 2012, vol. 22, no. 4, pp. 384–401. DOI: 10.1007/s11065-012-9208-z
Crais E. Testing and beyond: Strategies and tools for evaluation and assessment of infants and toddlers. Language, Speech, and Hearing Services in Schools, 2011, vol. 42, no. 3, pp. 341–364. DOI: 10.1044/0161-1461(2010/09-0061)
Dubois P., St-Pierre M.C., Desmarais C. et al. Young adults with developmental language disorder: a systematic review of education, employment, and independent living outcomes. Journal of Speech, Language, and Hearing Research, 2020, vol. 63, no. 11, pp. 3786–3800. DOI: 10.1044/2020_JSLHR-20-00127
Eliseeva M.B., Vershinina E.A., Ryskina V.L. Makarturovskii oprosnik: russkaya versiya. Otsenka rechevogo i kommunikativnogo razvitiya detei rannego vozrasta. Normy razvitiya. Obraztsy analiza. Kommentarii [MacArthur Questionnaire: Russian edition. Assessment of speech and communicative development of children of early age. Developmental norms. Examples of analysis. Commentaries]. Ivanovo: LISTOS, 2017. 76 p. (In Russ.).
Falissard B. Psy: Various procedures used in psychometry, 2012. 20 p. URL: https://CRAN.R-project.org/package=psy (Accessed: 08.06.2022).
Fenson L., Marchman V.A., Thal D.J. et al. MacArthur-Bates Communicative Development Inventories: User's guide and technical manual. 2nd ed. Baltimore, MD: Brookes Publishing, 2007. 208 p.
Hambleton R.K., Lee M.K. Methods for Translating and Adapting Tests to Increase Cross-Language Validity. In D.H. Saklofske, C.R. Reynolds, V. Schwean (eds.), The Oxford Handbook of Child Psychological Assessment. New York, NY: Oxford University Press, 2013. DOI: 10.1093/oxfordhb/9780199796304.013.0008
Kornilov S.A., Lebedeva T.V., Zhukova M.A. et al. Language development in rural and urban Russian-speaking children with and without developmental language disorder. Learning and Individual Differences, 2016, vol. 46, pp. 45–53. DOI: 10.1016/j.lindif.2015.07.001
Levy S.E., Giarelli E., Lee L. et al. Autism spectrum disorder and co-occurring developmental, psychiatric, and medical conditions among children in multiple populations of the United States. Journal of Developmental & Behavioral Pediatrics, 2010, vol. 31, no. 4, pp. 267–275. DOI: 10.1097/DBP.0b013e3181d5d03b
Lord F.M., Novick M.R. Statistical theories of mental test scores. New York, NY: Information Age Publishing, 2008. 592 p.
Luyster R.J., Seery A., Talbott M.R., Tager‐Flusberg H. Identifying early‐risk markers and developmental trajectories for language impairment in neurodevelopmental disorders. Developmental Disabilities Research Reviews, 2011, vol. 17, no. 2, pp. 151–159. DOI: 10.1002/ddrr.1109
Norrelgen F., Fernell E., Eriksson M. Children with autism spectrum disorders who do not develop phrase speech in the preschool years. Autism, 2015, vol. 19, no. 8, pp. 934–943. DOI: 10.1177/1362361314556782
Orlando M., Thissen D. Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 2000, vol. 24, no. 1, pp. 50–64. DOI: 10.1177/01466216000241003
Ovchinnikova I.V., Zhukova M.A., Grigorenko E.L. Aprobaciya metodiki Vineland Adaptive Behavior Scales (VABS) na russkoyazychnoj vyborke [Experimental testing of the technique Vineland Adaptive Behavior Scales (VABS) on a Russian language sample]. Voprosi psychologii=Questions of Psychology, 2018, no. 6, pp. 134–145. (In Russ.).
Piedmont R.L. Inter-item correlations. In A.C. Michalos (ed.), Encyclopedia of Quality of Life and Well-Being Research. Dordrecht: Springer Netherlands, 2014, pp. 3303–3304. DOI: 10.1007/978-94-007-0753-5_1493
Prikhoda N.A. Ocenka razvitiya russkogo yazyka (ORRYA) kak standartizovannaya metodika diagnostiki kommunikativnoj funkcii u detej ot 3 do 9 let [Russian Language Development Assessment as a Standardized Technique for Assessing Communicative Function in Children Aged 3–9 Years]. Psikhologicheskaya nauka i obrazovanie=Psychological Science and Education, 2016, vol. 21, no. 3, pp. 25–33. DOI: 10.17759/pse.2016210304 (In Russ.)
R Core Team. A language and environment for statistical computing, 2019. URL: https://www.r-project.org. (Accessed: 09.06.2022)
Ramos E., Suarez M., Hart K. et al. Syntactic and Semantic Abilities of Bilingual versus Monolingual Preschoolers with Language Impairment and ADHD. International Journal of Language and Linguistics, 2019, vol. 6, no. 2, pp. 1–8. DOI: 10.30845/ijll.v6n2p1
Reise S.P., Henson J.M. A discussion of modern versus traditional psychometrics as applied to personality assessment scales. Journal of Personality Assessment, 2003, vol. 81, no. 2, pp. 93–103. DOI: 10.1207/S15327752JPA8102_01
Riley E., Paynter J., Gilmore L. Comparing the Mullen Scales of Early Learning and the Preschool Language Scale — Fifth Edition for Young Children with Autism Spectrum Disorder. Advances in Neurodevelopmental Disorders, 2019, vol. 3, no. 1, pp. 29–37. DOI: 10.1007/s41252-018-0084-2
Rizopoulos D. ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 2006, vol. 17, no. 5, pp. 1–25. DOI: 10.18637/jss.v017.i05
Sahli A.S., Belgin E. Adaptation, validity, and reliability of the Preschool Language Scale–Fifth Edition (PLS–5) in the Turkish context: The Turkish Preschool Language Scale–5 (TPLS–5). International Journal of Pediatric Otorhinolaryngology, 2017, no. 98, pp. 143–149. DOI: 10.1016/j.ijporl.2017.05.003
Sidarta N., Tulaar A.B., Nasution A. et al. Validity and reliability of Preschool Language Scale 4 for measuring language development in children 48–59 months of age. Universa Medicina, 2008, vol. 27, no. 4, pp. 174–182. DOI: 10.18051/UnivMed.2008.v27.174-182
Sparrow S.S., Cicchetti D.V., Balla D.A. et al. Vineland adaptive behavior scales. Survey forms manual. Circle Pines, MN: American Guidance Service, 2005. 330 p.
St Clair M.C., Pickles A., Durkin K. et al. A longitudinal study of behavioral, emotional and social difficulties in individuals with a history of specific language impairment (SLI). Journal of Communication Disorders, 2011, vol. 44, no. 2, pp. 186–199. DOI: 10.1016/j.jcomdis.2010.09.004
Steinberg L., Thissen D. Item response theory. In J.S. Comer, P.C. Kendall (eds.), The Oxford Handbook of Research Strategies for Clinical Psychology, NY: Oxford University Press, 2013. DOI: 10.1093/oxfordhb/9780199793549.013.0018
Wiig E.H., Secord W.A., Semel E. CELF-Preschool-2: Clinical Evaluation of Language Fundamentals — Preschool. (CELF Preschool-2). Toronto: Harcourt Assessment, 2004. 204 p.
Willse M.J.T. Package ‘CTT’, 2018. 20 p. URL: https://cran.r-project.org/web/packages/CTT/CTT.pdf (Accessed: 09.06.2022).
Windsor J., Benigno J.P., Wing C.A. et al. Effect of foster care on young children’s language learning. Child Development, 2011, vol. 82, no. 4, pp. 1040–1046. DOI: 10.1111/j.1467-8624.2011.01604.x
Zhukova M.A. Psychologicheskie i psychofiziologicheskie osobennosti yazykovogo razvitiya detei i vzroslyh s opytom institucionalizacii: diss. ... kand. psikhol. nauk. [Psychological and psychophysiological characteristics of language development of children and adults with a history of institutionalization. PhD Dissertation]. Saint-Petersburg: Saint-Petersburg State University, 2018. 166 p.
Zhukova M.A., Kornilov S.A., Simmons E. et al. Diagnostika razvitiya yazyka i rechi s pomoshch'yu "yazykovykh shkal dlya doshkol'nikov" (preschool language scales): analiz individual'nogo sluchaya. [Diagnosing speech development with the help of "preschool language scales": a case analysis]. Voprosi psychologii=Questions of Psychology, 2016, no. 5, pp. 154–164. (In Russ.).
Zhukova M.A., Kornilov S.A., Tseitlin S.N. et al. Early lexical development of children raised in institutional care in Russia. British Journal of Developmental Psychology, 2010, vol. 38, no. 2, pp. 239–254. DOI: 10.1111/bjdp.12314
Zimmerman I.L., Steiner V.G., Pond R.E. Preschool language scales — fifth edition (PLS-5). Bloomington, MN: Pearson, 2011. DOI: 10.1037/t15141-000

Information About the Authors

Oksana I. Talantseva, Research Associate, Scientific Center for Cognitive Research, Sirius University of Science and Technology, Federal territory "Sirius", Russian Federation, ORCID: https://orcid.org/0000-0002-7555-1216, e-mail: talantseva.oi@talantiuspeh.ru

Iuliia O. An, Research Engineer of the Laboratory of Translational Developmental Sciences, Saint-Petersburg State University, St.Petersburg, Russian Federation, ORCID: https://orcid.org/0000-0002-4695-7065, e-mail: iuliia.o.an@gmail.com

Marina A. Zhukova, Candidate of Science (Psychology), Postdoctoral Fellow, Boston Children’s Hospital, Harvard Medical School, Boston, United States of America, ORCID: https://orcid.org/0000-0002-3069-570X, e-mail: marina.zhukova@childrens.harvard.edu

Anna N. Trubitsyna, Researcher of the Center for Behavior Analysis, Institute of Psychology and Medicine, Novosibirsk State University, Novosibirsk, Russian Federation, ORCID: https://orcid.org/0000-0002-3702-4693, e-mail: atrubicyna@ngs.ru

Anastasiya V. Teedemaa, Psychologist of the Center for Behavior Analysis, Institute of Psychology and Medicine, Novosibirsk State University, Novosibirsk, Russian Federation, ORCID: https://orcid.org/0000-0002-1982-3555, e-mail: belkavkolese@mail.ru

Elena L. Grigorenko, PhD, Hugh Roy and Lillie Cranz Cullen Distinguished Professor of Psychology, University of Houston, Houston, TX, USA; Adjunct Senior Research Scientist, Moscow State University of Psychology and Education, Moscow, Russia; Professor and Acting Director, Center for Cognitive Sciences, Sirius University of Science and Technology, Federal territory "Sirius", Russia; Adjunct Professor, Child Study Center and Adjunct Senior Research Scientist, Haskins Laboratories, Yale University, New Haven, CT, USA; Research Certified Professor, Baylor College of Medicine, Member of the editorial boards of the journals “Clinical and Special Education”, “Experimental Psychology” and “Psychological Science and Education”, Houston, United States of America, ORCID: https://orcid.org/0000-0001-9646-4181, e-mail: elena.grigorenko@times.uh.edu