Facilitating Early Identification by Detecting ASD with Audio Recordings of a Child
We created the LENA Language and Autism Screen (LAS) to facilitate the early identification and intervention that are so vital to optimizing the treatment of autism spectrum disorders (ASD).
According to the CDC, an average of 1 in 150 children in the United States has ASD, and the American Academy of Pediatrics (AAP) recommends that pediatricians screen all children for ASD at their 18- and 24-month checkups. Studies have shown that early intervention is one of the most effective ways to treat ASD. Despite these well-publicized facts and recommendations, a survey conducted in 2004 found that only 8 percent of primary care pediatricians routinely screened their patients for ASD. Efficient ASD screening tools are greatly needed.
The LAS is one such tool: a revolutionary, easy-to-use new ASD screen. Using a data collection kit sent by the LENA Foundation, a parent has the child wear a digital recorder that collects a daylong audio recording of the child in the natural environment, and then sends the audio data back for processing.
There are many advantages to collecting data in the natural environment rather than in a clinical setting or during a visit to a doctor's office. First, it is easy to do and saves parents time and effort. Second, a recording made in the natural environment captures the child's true behavior, whereas an unfamiliar clinical environment may make the child uncomfortable and cause him or her to behave differently. Third, observation time during a clinical visit is usually quite short, while an audio recording can span a whole day. Fourth, both direct observation by a doctor and indirect parent questionnaires involve human subjectivity, whereas the audio data processing is machine based and objective. Fifth, the supercomputer can work tirelessly and is available at any time.
It should be noted that the LAS is not a do-it-yourself (DIY) ASD screen. The parents' role is limited to assisting with data collection. The audio recordings sent back to us are processed by a supercomputer using computational models built on thousands of daylong recordings from hundreds of children and their families. The LAS uses pattern recognition models to identify the child's sounds in a recording and then uses the open-source Sphinx speech recognition system, with its adult phone models, to decompose the child sounds into phone-like units and obtain the associated composition information (i.e., the percentage of each sound category). We have found that this composition contains rich discriminant information for ASD detection. More recently, we added a second decomposition that is derived directly from the child sounds themselves by clustering them, and it provides different information from the adult-phone decomposition. Combining these two methods has given the best performance achieved so far: 91 percent accuracy at the operating point where the false positive and false negative rates are equal.
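The composition idea can be illustrated with a minimal sketch, assuming each child-sound segment has already been given one label (a phone-like unit from the recognizer, or a cluster ID from the child-sound clustering). The function name `composition`, the example labels, and the choice to concatenate the two decompositions into one feature vector are illustrative assumptions, not the LAS implementation.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence


def composition(labels: Iterable[Hashable], inventory: Sequence[Hashable]) -> list[float]:
    """Percentage of segments falling in each category of `inventory`.

    `labels` holds one label per child-sound segment, e.g. a phone-like unit
    from a recognizer or a cluster ID. Labels not in `inventory` are ignored.
    """
    known = set(inventory)
    counts = Counter(label for label in labels if label in known)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(inventory)
    return [100.0 * counts[c] / total for c in inventory]


# Hypothetical labels for one child; real inventories are much larger.
phone_part = composition(["AW", "ER", "AA", "AW"], ["AA", "AW", "ER"])
cluster_part = composition([3, 1, 1], [0, 1, 2, 3])
features = phone_part + cluster_part  # one combined composition vector
print(phone_part)  # [25.0, 50.0, 25.0]
```

Because each part sums to 100 percent, the features describe *how* a child's sounds are distributed rather than *how many* sounds the child produced, which is why recordings of different lengths can be compared.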
Our new findings are based on a data set of 76 typically developing children (712 recordings), 30 children with language delay but without autism (290 recordings), and 79 children formally diagnosed with autism across a range of symptom severities. The ASD sample was recruited nationwide, and families were required to provide documented confirmation of the ASD diagnosis from a professional or team of professionals.
Figures 1 and 2 below help explain the discriminant information contained in child sound composition. Figure 1 shows the age-normalized percentage distributions (normalized so that the typically developing group has zero mean and unit variance) for the phone-like sounds AW (the vowel sound in the word "cow") and ER (the vowel sound in the word "hurt"). There is some distinction among the three groups of children, but the overlap is also large. Linear Discriminant Analysis (LDA) is used to exploit the discriminant information in all of the composition features: 46 features from the adult-phone model and 63 features from the child clusters. LDA finds a linear combination of these 109 features such that the resulting single scale best separates the ASD group from the other two groups. Figure 2 shows how all children are distributed on this scale as a stacked histogram, with the vertical axis giving the number of children within each small bin of the scale. It shows good separation, with only a small overlap between the ASD group and the others.
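The normalization and LDA steps described above can be sketched as follows, assuming the composition features already sit in a NumPy matrix with one row per child. The function names are hypothetical. The age adjustment is simplified to a single reference group (a real age normalization would compute the typically developing mean and variance within age bands). The small ridge term in the within-class scatter is our assumption, added because 109 features are estimated from fewer than 200 children. The threshold search picks the point where the false positive and false negative rates are equal, matching the operating point behind the accuracy figure quoted earlier. The demo uses synthetic data and scores the same children used to fit LDA, which is optimistic; an honest accuracy estimate needs held-out children (e.g., cross-validation).

```python
import numpy as np


def normalize_to_reference(X, is_ref):
    """Z-score every feature against a reference group (here: typically developing).

    Afterwards the reference group has zero mean and unit variance per feature.
    Age is ignored in this simplified sketch.
    """
    mu = X[is_ref].mean(axis=0)
    sd = X[is_ref].std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # avoid dividing by zero for constant features
    return (X - mu) / sd


def fisher_lda_direction(Z, is_asd, ridge=1e-3):
    """Two-class Fisher LDA: w proportional to Sw^-1 (mean_asd - mean_other)."""
    pos, neg = Z[is_asd], Z[~is_asd]
    # Within-class scatter: sum of the two groups' scatter matrices.
    Sw = (np.atleast_2d(np.cov(pos, rowvar=False)) * (len(pos) - 1)
          + np.atleast_2d(np.cov(neg, rowvar=False)) * (len(neg) - 1))
    # Ridge shrinkage (an assumption) keeps Sw well conditioned with many features.
    d = Sw.shape[0]
    Sw = Sw + ridge * np.trace(Sw) / d * np.eye(d)
    w = np.linalg.solve(Sw, pos.mean(axis=0) - neg.mean(axis=0))
    return w / np.linalg.norm(w)  # oriented so ASD children score higher on average


def equal_error_threshold(scores, is_asd):
    """Pick the threshold where false positive and false negative rates are closest."""
    pos, neg = scores[is_asd], scores[~is_asd]
    best = (None, np.inf, 0.0, 0.0)
    for t in np.unique(scores):
        fnr = np.mean(pos < t)   # ASD children scored below the threshold
        fpr = np.mean(neg >= t)  # non-ASD children scored at or above it
        if abs(fpr - fnr) < best[1]:
            best = (t, abs(fpr - fnr), fpr, fnr)
    t, _, fpr, fnr = best
    return t, fpr, fnr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_td, n_ld, n_asd, d = 76, 30, 79, 109  # group sizes and feature count from the text
    n = n_td + n_ld + n_asd
    idx = np.arange(n)
    is_td, is_asd = idx < n_td, idx >= n_td + n_ld
    X = rng.normal(size=(n, d))  # synthetic stand-in for composition features
    X[is_asd, :10] += 0.8        # synthetic group difference, not real data
    Z = normalize_to_reference(X, is_td)
    scores = Z @ fisher_lda_direction(Z, is_asd)
    t, fpr, fnr = equal_error_threshold(scores, is_asd)
    print(f"threshold={t:.3f}  FPR={fpr:.2f}  FNR={fnr:.2f}  accuracy~{1 - (fpr + fnr) / 2:.2f}")
```

With two classes (ASD versus the other two groups combined), LDA yields exactly one discriminant direction, which is why the result is a single scale like the one in Figure 2. When the false positive and false negative rates are equal, overall accuracy is simply one minus that shared error rate, regardless of how many children are in each group.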
Figure 1. Example of Percentage Distribution for Phone-Like Sounds

Figure 2. Stacked Histogram in the Scale After LDA Transform

Generally speaking, there is no fundamental difference between the LAS, which analyzes audio data and produces visual results, and other complex medical technologies such as MRI or CT scanning, which use physical principles, signal processing, and computational methods to analyze data and produce an image for detection and treatment.
ASD is a spectrum disorder with diverse symptoms, characterized by functional impairments in social interaction and communication as well as restricted and repetitive behavior. Although the LAS is purely audio based, we believe it is capable of capturing, to one degree or another, information related to each of these types of symptoms. We postulate that, because of restricted and repetitive behaviors, articulatory motor abnormalities, differences in attention, language development issues, social interaction issues, and other conditions, children with autism may produce certain types of sounds more often and others less often than other children do, resulting in a difference in sound composition. As long as a child can produce sounds, not necessarily words, the LAS can be used.
We are excited about our new findings on this fully automatic way of detecting ASD. We are continually collecting more data, and we believe that other audio-based behavior modeling and data analysis techniques remain to be discovered and that better performance and reliability will be achieved in the future.