Development of a Machine Learning Algorithm for the Surveillance of Autism Spectrum Disorder (Extract)
Discussion
These results demonstrate that a machine learning algorithm can discriminate between children who do and do not meet ASD surveillance criteria, among children with developmental concerns. Currently, the ADDM Network employs highly trained clinicians to manually review each child's developmental evaluations (often multiple evaluations per child), requiring an average of 45 to 60 minutes per child. Consequently, as the number of records to review grows, the resources needed to complete this task grow proportionally. In contrast, an automated approach requires relatively fixed resources for nearly any amount of information, and offers the potential to improve the efficiency and timeliness of the surveillance system.
Using only the words and phrases contained in a child's records, the algorithm correctly predicted the clinician-assigned ASD case definition for 86.5% (kappa = 0.73) of the children captured by the surveillance system. This is slightly lower than the clinician inter-rater agreement observed for the overall 2010 ADDM Network (90.7%, kappa = 0.80).[14] Because the algorithm is trained on the clinician-assigned ratings, it is unlikely that agreement between the algorithm and a clinician would ever exceed inter-rater clinician agreement. On the other hand, the algorithm will have perfect inter-rater reliability, as it will always make the same classification for a given set of evaluations. An essential question is: what level of performance — if any — would be considered “acceptable” in order to trust the algorithm's predictions? Of note, the algorithm-clinician agreement was similar to the inter-rater agreement reported by two other groups doing similar ASD classification on the basis of health records (one reported a kappa of 0.73[21], and the other 88% agreement[22]).
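The two agreement statistics quoted above, percent agreement and Cohen's kappa, can be computed from paired labels as in the following minimal sketch. The function names and the ten toy label pairs are illustrative assumptions; they are not the study's code or data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters assign the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label
    # if each labelled independently at their own marginal rates.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both raters used a single identical label
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels (1 = meets case definition, 0 = does not).
clinician = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
algorithm = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

agreement = percent_agreement(clinician, algorithm)
kappa = cohens_kappa(clinician, algorithm)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 children (0.80), chance agreement is 0.50, and kappa is therefore about 0.60. Because kappa discounts the agreement expected by chance, it is always reported alongside, and is lower than, raw percent agreement in the figures above.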
The algorithm was more likely to misclassify children with certain characteristics. In particular, it was less sensitive in identifying ASD among children with fewer evaluations and those who were older when first evaluated. We also observed that the algorithm was more likely to misclassify children who underwent a secondary review by the ADDM clinicians (compared to those who did not undergo secondary review), suggesting that these cases might also be more difficult for the clinicians. It might be possible to address some shortcomings by allowing the algorithm to consider the source or number of evaluations, or the age of the child at each evaluation. Alternatively, the current model might serve as a useful “filter” to select the records that need manual review. As shown in Fig 1, the predictive values at the extreme ends of the score range are quite high, with more misclassification in the middle. These scores could be used to separate records that need clinician review (e.g., a score of 0.50) from those that are “safe bets” (e.g., scores over 0.80 or below 0.20). If, in the future, the surveillance system is able to electronically receive the contents of medical and educational evaluations, this type of “filter” could be immensely useful.
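The “filter” idea can be sketched as a simple three-way triage on the algorithm's score. This is only an illustration: the 0.20 and 0.80 cut-offs are the example values mentioned above rather than validated thresholds, and the function names, bucket labels, and scores are hypothetical.

```python
from collections import Counter


def triage(score, low=0.20, high=0.80):
    """Route one record based on its predicted score in [0, 1].

    Scores over `high` or below `low` are treated as "safe bets";
    everything in between is sent for clinician review.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > high:
        return "auto_asd"
    if score < low:
        return "auto_non_asd"
    return "manual_review"


def triage_summary(scores, low=0.20, high=0.80):
    """Count how many records fall into each triage bucket."""
    return Counter(triage(s, low, high) for s in scores)


# Hypothetical scores for eight records.
example_scores = [0.05, 0.12, 0.50, 0.83, 0.97, 0.35, 0.65, 0.91]
summary = triage_summary(example_scores)
manual_fraction = summary["manual_review"] / len(example_scores)
print(dict(summary), f"manual review fraction = {manual_fraction:.2f}")
```

Widening the middle band sends more records to clinicians and should reduce automated misclassification; narrowing it saves more review time at the cost of more errors. Where to set the cut-offs would depend on the predictive values observed at each score level.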
A previous study used an analogous approach using early intervention (birth to three years) records to predict which children would later be diagnosed with ASD.[23] The best-performing model from that study reported 91.4% precision (PPV) and 58.2% recall (sensitivity); the lower sensitivity is possibly due to a highly imbalanced ratio of ASD to non-ASD children. While this study had somewhat different goals from ours, the two studies suggest that text-based machine learning techniques may one day be useful in a variety of public health applications concerning ASD.
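For readers less familiar with these terms, precision (PPV) and recall (sensitivity) are both derived from counts of true positives, false positives, and false negatives. The sketch below uses hypothetical counts chosen only to show the arithmetic; they are not taken from either study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts.

    precision (PPV)      = TP / (TP + FP): of those flagged, how many are cases
    recall (sensitivity) = TP / (TP + FN): of the true cases, how many are flagged
    """
    if true_pos + false_pos == 0:
        raise ValueError("precision is undefined when nothing is flagged")
    if true_pos + false_neg == 0:
        raise ValueError("recall is undefined when there are no true cases")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts: 40 cases flagged correctly, 10 flagged in error,
# and 20 true cases missed.
precision, recall = precision_recall(true_pos=40, false_pos=10, false_neg=20)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

With these counts, precision is 0.800 and recall is about 0.667: a classifier can be right most of the time when it flags a child while still missing a sizeable share of true cases.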
Other recent studies utilizing electronic health information have focused on using medical billing (ICD) codes to detect individuals with ASD.[24, 25] These approaches are likely well-suited for case-control studies, where PPV might be more important than sensitivity, but will not detect individuals with ASD who do not have ICD codes. Because the algorithm we developed does not consider ICD-9 codes (as special education records do not assign them), the two approaches could be used to jointly classify ASD when both ICD-9 codes and evaluation text are available. In the future, it may be possible to train classification algorithms on ADDM data and distribute them to help identify individuals with ASD from electronic records.
Although these results are promising, additional work is needed to evaluate the utility of this approach for ongoing ASD surveillance. For instance, performance characteristics such as NPV or specificity could be different in other populations. We trained the algorithm on a single year of data from one ADDM site and tested it on the following year's data from the same site; we would need to evaluate whether similar performance could be achieved across ADDM sites or in other populations. We would also need to monitor performance over time to detect any drift or degradation. In particular, the relatively recent changes to the ASD diagnosis in the DSM-5 could affect the terms used to describe ASD symptoms. Likewise, the surveillance case definition for the ADDM Network may change to reflect the DSM-5 criteria. For these and other reasons, it is likely that any long-running system would require some level of continued manual review to assess the performance and quality of the system. Nevertheless, even a partially automated approach, in which a clinician might confirm or augment the algorithm's predictions, could result in a substantial reduction in required resources.
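One reason predictive values can differ between populations is that, even if sensitivity and specificity were to stay fixed, PPV and NPV depend on the proportion of reviewed children who meet the case definition. The sketch below applies Bayes' rule to show this; the sensitivity, specificity, and prevalence values are hypothetical and are not estimates from this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Expected proportions of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical classifier applied to two populations that differ
# only in the proportion of children meeting the case definition.
ppv_high, npv_high = predictive_values(0.80, 0.90, prevalence=0.50)
ppv_low, npv_low = predictive_values(0.80, 0.90, prevalence=0.10)
print(f"prevalence 0.50: PPV = {ppv_high:.3f}, NPV = {npv_high:.3f}")
print(f"prevalence 0.10: PPV = {ppv_low:.3f}, NPV = {npv_low:.3f}")
```

In this illustration, lowering the prevalence from 0.50 to 0.10 reduces PPV and raises NPV with no change to the classifier itself. Sensitivity and specificity may also shift between sites, for example if evaluations are written differently, which this sketch does not model.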
The ADDM clinicians currently code a variety of behavioral symptoms and produce much more information than a dichotomous case classification; it remains to be seen whether these methods could reliably classify specific symptoms in addition to the overall ASD classification. We plan to pursue more granular classification algorithms for specific symptoms, or for the populations where our current performance was weakest (such as girls, children only seen after age 6, or children without an intellectual disability). It would also be useful to estimate how well the algorithm (and the ADDM methods in general) compares to other ASD classifications, such as in-person assessments. Quantifying this textual information in a reproducible way will provide novel opportunities to better understand how children are evaluated for ASD in typical community settings.
This study is based on a large, population-based surveillance system that has routinely performed ASD surveillance in metropolitan Atlanta for more than a decade. The ASD surveillance case definition uses a well-established protocol for ascertaining ASD from record review, including extensive documentation, training materials, and inter-rater reliability for this procedure. As a by-product of conducting surveillance, the ADDM Network generates information that is useful for training text-based ASD classification algorithms. With relatively small modifications, it could efficiently produce a large volume of very specific examples that could be used to identify particular symptoms or behaviors. Ultimately, the approach piloted in this study could be trained on a much larger sample representing a diversity of community providers and behavioral evaluations.
Conclusion
Public health surveillance systems are constantly challenged to become faster or better, or to provide the same information at lower cost.[26] We observed that an automated approach could predict, with high agreement, whether a child would meet ASD surveillance criteria. While there are many logistical issues to consider, these results hint at the potential for using machine learning approaches to identify ASD from unstructured text data.
Citation
Maenner MJ, Yeargin-Allsopp M, Van Naarden Braun K, Christensen DL, Schieve LA (2016) Development of a Machine Learning Algorithm for the Surveillance of Autism Spectrum Disorder. PLoS ONE 11(12): e0168224. doi:10.1371/journal.pone.0168224 Retrieved from http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0168224 on 10 Jan 2017. Adapted and reproduced here under a CC BY 3.0 license.