Simple, Transparent, End-to-end Automated Machine Learning Pipeline (STREAMLINE)

STREAMLINE is an end-to-end automated machine learning (AutoML) pipeline that empowers anyone to easily run, interpret, and apply a rigorous and customizable analysis for data mining or predictive modeling. Notably, this tool is currently limited to supervised learning on tabular binary classification data, but it will be expanded as our development continues. The development of this pipeline focused on (1) overall automation, (2) avoiding and detecting sources of bias, (3) optimizing modeling performance, (4) ensuring complete reproducibility (under certain STREAMLINE parameter settings), (5) capturing complex associations in data (e.g., feature interactions), and (6) enhancing interpretability of output. Overall, the goal of this pipeline is to provide a transparent framework for learning from data and for identifying the strengths and weaknesses of ML modeling algorithms or other AutoML algorithms.

Single-Cell RNA-Seq database for Alzheimer’s Disease (scREAD)

The Single-Cell RNA-Seq database for Alzheimer’s Disease (scREAD) is dedicated to managing all existing scRNA-Seq and snRNA-Seq data sets from human postmortem brain tissue with AD and from mouse models with AD pathology. It provides comprehensive analysis results for 73 data sets from 10 brain regions. These results cover analyses such as control atlas construction, cell-type prediction, identification of differentially expressed genes, and identification of cell-type-specific regulons.

SNOMED

SNOMED CT is one of a suite of designated standards for use in U.S. Federal Government systems for the electronic exchange of clinical health information and is also a required standard in interoperability specifications of the U.S. Healthcare Information Technology Standards Panel. The clinical terminology is owned and maintained by SNOMED International, a not-for-profit association.

Study on Global Ageing and Adult Health (SAGE)

The Study on Global Ageing and Adult Health (SAGE) is part of an ongoing program of work to compile comprehensive longitudinal information on the health and well-being of adult populations and the ageing process. The core SAGE collects data on adults aged 18+ years, with an emphasis on populations aged 50+ years, from nationally representative samples in six countries: China, Ghana, India, Mexico, the Russian Federation, and South Africa. The study has been conducted in three waves.

Wave 1 total sample size is over 40,000 individuals.

Wave 2 data collection was completed in 2014/15 in five countries, and the Wave 2 data were released into the public domain at the end of 2020.

Wave 3 data collection was completed in March 2020.

Survey of Health, Ageing and Retirement in Europe (SHARE)

The Survey of Health, Ageing and Retirement in Europe (SHARE) is a research infrastructure for studying the effects of health, social, economic, and environmental policies over the life course of European citizens and beyond. SHARE contains various types of data on participants, including their health, economic, social, and psychological status, lifestyle, and biomarkers. The study is led by the Munich Center for the Economics of Aging (MEA), which is part of the Max Planck Institute for Social Law and Social Policy in Germany, and is funded by the European Commission and the National Institute on Aging (NIA).

The Alzheimer’s Knowledge Base (AlzKB)

The Alzheimer’s Knowledge Base (AlzKB) is an online database that integrates more than 20 different sources of knowledge about genes, pathways, drugs, and diseases to inform AI analyses. It provides comprehensive information on genetic variations related to Alzheimer’s Disease (AD), including single-nucleotide polymorphism (SNP), insertion, deletion, and other genetic variation data. The database was developed by a research team at the University of California, Los Angeles, and is funded by grant R01 AG066833 from the National Institute on Aging (NIA), National Institutes of Health (NIH).

The Alzheimer’s Disease Genetics Consortium (ADGC)

The Alzheimer’s Disease Genetics Consortium (ADGC) is a collaborative effort that brings together researchers from multiple institutions to study the genetics of Alzheimer’s disease. The primary goal of ADGC is to identify and understand genetic factors that contribute to the risk of developing Alzheimer’s disease and related dementias. By analyzing large-scale genomic data, ADGC aims to uncover genetic variants and mutations associated with the disease, which can lead to a better understanding of the underlying biological mechanisms and potential targets for therapeutic interventions.

The ADGC will provide the most recent and most comprehensive data available. The available data sets are listed on the ADGC website. These data are quality-controlled (QC’ed) by members of the ADGC-AC using a uniform process. As noted in the list of cohorts posted on the ADGC website, some data sets require permission from the PI who contributed them, and it is the responsibility of the SAG PI to seek permission from those PIs to use their data. If requested, imputed data will be provided using the most recent and largest imputation panel (e.g., TOPMed). A minimum phenotype dataset will be provided. If additional phenotypes are needed, the ADGC will work with the investigator to identify and acquire the needed data.

The Alzheimer’s Disease Neuroimaging Initiative (ADNI)

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a longitudinal multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer’s disease (AD). Since its launch more than a decade ago, this landmark public-private partnership has made major contributions to AD research, enabling the sharing of data among researchers around the world. Data from several dementia studies complementary to ADNI are also available through the Image and Data Archive (IDA). These include the DoD-ADNI study, which measures the effects of traumatic brain injury and post-traumatic stress disorder on Alzheimer’s disease in veterans, and the AIBL study (Australian Imaging Biomarkers and Lifestyle Study of Aging).

The Australian Imaging Biomarkers & Lifestyle Flagship Study of Ageing (AIBL)

The Australian Imaging Biomarkers & Lifestyle Flagship Study of Ageing (AIBL) is a large-scale research initiative to discover which biomarkers, cognitive characteristics, and health and lifestyle factors determine subsequent development of symptomatic Alzheimer’s Disease (AD). It includes a prospective longitudinal study of cognition spanning more than 4.5 years, along with biomarker, cognitive, clinical, and imaging data from more than 1,000 participants, including patients with AD, patients with mild cognitive impairment (MCI), and healthy volunteers. Since its launch in 2006, AIBL has been widely used in research on the early detection of AD, the identification of important biomarkers for AD, and new therapies and treatments for AD. The AIBL data are available only to authorized users, but a subset of the data (participants with MRI and PET scans) is available through the Alzheimer’s Disease Neuroimaging Initiative (ADNI).