Background. Standard assessments of autism spectrum disorders (ASD) rely primarily on negative markers for detection
of the disorders. Research has indicated that certain vocal characteristics of children diagnosed with ASD may differ
consistently from those of typically developing children, suggesting the presence of a positive marker for ASD in child
vocalization activity. However, investigations into the potential clinical utility of such a marker have been limited by two
key challenges: 1) the difficulty of obtaining sample data of sufficient quantity and quality; and 2) the identification of
consistent discriminative vocalization patterns. The LENA System’s advanced technology overcomes these limitations
and provides a unique approach to the detection of ASD with good accuracy.
Development. The LENA System comprises two distinct components: recording hardware and processing software.
The LENA Digital Language Processor (DLP) is a small, lightweight digital recorder that fits into the front pocket of
specially designed children’s clothing and records up to 16 hours of continuous, high-quality audio. Recordings include
all vocalizations produced by the key child (i.e., the child wearing the DLP) and all externally sourced sounds and speech
activity within a radius of approximately 4–6 feet. This unobtrusive approach to data sampling permits naturalistic full-day
recordings to be collected from a child’s home language environment with relative ease, largely eliminating the first
challenge, obtaining adequate child vocalization data. The second challenge, identifying consistent patterns in child
vocalizations that can discriminate a child with ASD from a typically developing child, is addressed by the processing
software as described below.
The LENA System software divides the audio recording into segments ranging from several seconds to several minutes in
duration and assigns each segment a sound category (e.g., key child vocalization, adult male speech, TV/electronic sound,
silence) based on previously developed acoustic models. Key child vocalization segments are then processed further
to estimate the probability that the child’s vocal output is consistent with a predefined classification model for ASD.
We have developed two complementary methods for detecting unique and discriminating patterns in the vocalizations
produced by children with ASD and deriving these classification probabilities.
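As an illustration of the data flow just described, the sketch below represents labeled segments and selects the key child segments that feed the ASD models. It is a minimal Python sketch, not LENA code. The category names, the Segment class, and both helper functions are hypothetical, and the sketch assumes that the acoustic models that assign the labels have already run.

```python
from dataclasses import dataclass
from enum import Enum


class SoundCategory(Enum):
    # Hypothetical label set; the full LENA category inventory is not listed here.
    KEY_CHILD = "key_child"
    ADULT_MALE = "adult_male"
    ADULT_FEMALE = "adult_female"
    TV_ELECTRONIC = "tv_electronic"
    SILENCE = "silence"


@dataclass(frozen=True)
class Segment:
    start_s: float           # segment start, seconds from beginning of recording
    end_s: float             # segment end, seconds
    category: SoundCategory  # label assigned upstream by the acoustic models

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def key_child_segments(segments: list[Segment]) -> list[Segment]:
    """Keep only key child vocalizations, the input to the ASD classification models."""
    return [s for s in segments if s.category is SoundCategory.KEY_CHILD]


def duration_by_category(segments: list[Segment]) -> dict[SoundCategory, float]:
    """Total seconds of audio per sound category."""
    totals: dict[SoundCategory, float] = {}
    for s in segments:
        totals[s.category] = totals.get(s.category, 0.0) + s.duration_s
    return totals
```

Only the output of `key_child_segments` would pass on to the phone-based and cluster-based feature extraction. The other categories describe the surrounding language environment.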
The first method, here called phone-based (PB), defines a unique acoustic feature set using a quantitative approach
that incorporates modified components of the open-source Sphinx automatic speech recognition (ASR) software. Child
vocalization segment data are processed by this software into 46 unique categories that include 39 “phone” and 7 “nonphone”
categories. Note that these “phones” are more broadly defined acoustic approximations of commonly accepted
phoneme categories. Sequential pairs of these “phones” are grouped into “biphones” that are then linearly recombined
and reduced to 50 dimensions following a previously derived principal components analysis. For a more detailed
description of the phone-based approach, please see LENA Technical Report LTR-08-1, “The LENA™
Automatic Vocalization Assessment” (http://www.lenafoundation.org/TechReport.aspx/AVA/LTR-08-1).
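The phone-based feature pipeline can be sketched roughly as follows. The sketch assumes that each key child segment has already been decoded into a sequence of integer category IDs (0–45). Several details are illustrative assumptions rather than details taken from LTR-08-1: normalizing biphone counts to relative frequencies, not counting pairs that cross segment boundaries, and the `fit_pca` helper, which only produces a stand-in for the previously derived components.

```python
import numpy as np

N_CATEGORIES = 46   # 39 "phone" + 7 "nonphone" categories
N_COMPONENTS = 50   # dimensionality after principal components reduction


def biphone_frequencies(segments):
    """Relative frequency of each ordered pair of adjacent categories.

    segments: iterable of sequences of integer category IDs (0..45), one per
    key child segment. Pairs are counted within segments only (assumption).
    Returns a vector of length 46 * 46 = 2116.
    """
    counts = np.zeros((N_CATEGORIES, N_CATEGORIES))
    for phones in segments:
        for a, b in zip(phones[:-1], phones[1:]):
            counts[a, b] += 1
    total = counts.sum()
    return (counts / total if total > 0 else counts).ravel()


def fit_pca(X, n_components=N_COMPONENTS):
    """Stand-in for the previously derived PCA: data mean and top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, components):
    """Linearly recombine biphone frequencies into the 50-dimensional feature space."""
    return components @ (features - mean)
```

In this sketch, `fit_pca` would run once on a development set of biphone vectors, and `project` would then be applied to each new recording.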
The second method, here called cluster-based (CB), applies an unsupervised k-means clustering routine directly to child
vocalization segment data. This self-organized approach uses 64 phone-like clusters derived from mel-frequency cepstrum
(MFC) acoustic features. For this method, as for the phone-based method, the goal is not to recognize or translate speech,
so the resulting clusters or dimensions need not be identifiable as specific phones; the processing need only yield reliable,
consistent results.
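A comparable sketch of the cluster-based features follows. It assumes that MFC feature frames (one row per analysis frame) are already available for the key child segments. The text above specifies only unsupervised k-means with 64 clusters. The plain Lloyd's k-means routine, the random initialization, and the choice to summarize a recording as the normalized histogram of its frames' cluster assignments are assumptions made for illustration.

```python
import numpy as np


def assign(frames, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each frame."""
    d2 = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(frames, k=64, n_iter=20, seed=0):
    """Plain Lloyd's k-means on MFC frames of shape (n_frames, n_coeffs)."""
    rng = np.random.default_rng(seed)
    # Initialize with k frames drawn at random without replacement.
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(frames, centroids)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # an emptied cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def cluster_histogram(frames, centroids):
    """Fraction of a recording's frames falling in each cluster (length k)."""
    labels = assign(frames, centroids)
    counts = np.bincount(labels, minlength=len(centroids)).astype(float)
    return counts / counts.sum()
```

As the text notes, nothing here requires a cluster to correspond to a real phone. The histogram is useful only to the extent that the same vocal behavior lands in the same clusters consistently.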
Finally, a previously derived linear discriminant analysis (LDA) function is applied to the combined PB and CB feature
sets to determine the probability that the child’s vocalizations match the ASD pattern. For convenience and to enhance
interpretability, LDA classification probabilities are reduced to seven ordinal categories using variable thresholds based
on sensitivity and specificity in our development data.
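The final classification step can be sketched with a standard two-class LDA. Under the usual shared-covariance Gaussian assumption, LDA yields a logistic posterior probability. The `fit_lda` helper, the small ridge term, and especially the six cut points in `THRESHOLDS` are hypothetical placeholders. The report's actual discriminant function and thresholds were derived from development data that is not reproduced here.

```python
import numpy as np


def fit_lda(X, y, ridge=1e-6):
    """Two-class LDA with a pooled covariance; y is 1 for ASD, 0 otherwise."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    pooled += ridge * np.eye(X.shape[1])  # small ridge keeps the matrix invertible
    w = np.linalg.solve(pooled, mu1 - mu0)
    # Intercept from the Gaussian log-odds, including log prior odds.
    b = -0.5 * (mu0 + mu1) @ w + np.log(len(X1) / len(X0))
    return w, b


def asd_probability(x, w, b):
    """Posterior P(ASD | x) under the equal-covariance Gaussian model behind LDA."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


# Hypothetical cut points on the probability scale (six cuts give seven categories).
THRESHOLDS = np.array([0.10, 0.25, 0.40, 0.60, 0.75, 0.90])


def ordinal_category(p):
    """Map a probability to categories 1 (least ASD-like) through 7 (most ASD-like)."""
    return int(np.digitize(p, THRESHOLDS)) + 1
```

Binning the probability this way trades resolution for interpretability. Moving a cut point shifts children between adjacent categories without changing the underlying LDA score.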
Performance. Classification performance was assessed on a sample of 190 children aged 24–48 months, using each child’s first
recording after age 24 months and leave-one-out cross-validation (LOOCV) to maximize data usage and enhance generalizability. The
sample included 75 children diagnosed with ASD, 34 children diagnosed with a language delay (LD), and 81 typically developing (TD)
children. The ASD sample was recruited nationwide, and families were required to provide documented confirmation of the ASD diagnosis
from a professional or team of professionals. In addition, parents completed two self-report symptom questionnaires, the Modified
Checklist for Autism in Toddlers (M-CHAT) and the Social Communication Questionnaire (SCQ); the average parent score was 9.5 on the
M-CHAT (SD = 4.8; range 0–19) and 18.7 on the SCQ (SD = 5.7; range 7–32). The performance metric presented here is based on the
Equal Error Rate (EER; see note 1). The following table summarizes EER performance across three comparisons: ASD vs. non-ASD
(TD and LD), ASD vs. LD, and ASD vs. TD.
The LENA Automatic Autism Screen compares favorably with other widely used, non-automated measures. Shown below are some of the
reliability statistics reported for other well-known measures.
Summary. The combined phone-based and cluster-based detection method detailed above demonstrates relatively low classification
error rates, reinforcing the viability of an automated detector for ASD based on child vocalization activity. The DLP provides researchers
the means to collect comprehensive naturalistic language environment data in a simple and unobtrusive manner, and the automated
processing software enables the assessment of ASD-specific vocal characteristics using completely objective measures.
The LENA Foundation is exploring other approaches that incorporate additional information derived from the recording data, which
may further improve accuracy. In addition, the Foundation is seeking to increase the sample size and to include a more diverse sample
as well as younger children with ASD, in the hope that the screen can be extended down to 18 months of age and perhaps even younger.
1 In any classification problem, it is necessary to set a threshold probability value for detecting the target group of interest. This
threshold determines not only the number of correct detections but also the numbers of false acceptances (false positives) and false
rejections (false negatives). There is a trade-off between these two types of error; for example, as the false positive rate decreases, the
false negative rate increases. A generally accepted measure of classification performance is the EER: the classification error at the
threshold where the false positive rate equals the false negative rate. The lower the EER, the fewer classification errors overall.
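The evaluation procedure described under Performance (leave-one-out cross-validation) and the EER defined in this note can be sketched together. The sketch assumes that higher scores mean more ASD-like. It approximates the EER by sweeping thresholds over the observed scores and taking the point where the two error rates are closest. The nearest-centroid scorer is a hypothetical stand-in for the actual PB/CB plus LDA model, included only so that the example runs.

```python
import numpy as np


def loocv_scores(X, y, fit, score):
    """Leave-one-out cross-validation: each child is scored by a model fit on
    all the other children, so no child is scored by a model that saw it."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # mask that excludes child i
        model = fit(X[train], y[train])
        scores[i] = score(model, X[i])
    return scores


def equal_error_rate(scores, labels):
    """Approximate EER: sweep thresholds over the observed scores and return the
    mean of the two error rates where they are closest to equal.
    labels: 1 = target group (ASD), 0 = comparison group; higher score = more ASD-like."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        fpr = np.mean(neg >= t)  # false acceptances: non-ASD classified as ASD
        fnr = np.mean(pos < t)   # false rejections: ASD classified as non-ASD
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return float(eer)


# Hypothetical stand-in scorer so the example runs (not the LENA model):
# score = distance to the non-ASD mean minus distance to the ASD mean.
def fit_centroids(X, y):
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)


def centroid_score(model, x):
    mu0, mu1 = model
    return np.linalg.norm(x - mu0) - np.linalg.norm(x - mu1)
```

Typical use is `equal_error_rate(loocv_scores(X, y, fit_centroids, centroid_score), y)`. For the 190-child sample, LOOCV amounts to 190 model fits, each on 189 children.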