Simple, Transparent, End-to-end Automated Machine Learning Pipeline (STREAMLINE)

STREAMLINE is an end-to-end automated machine learning (AutoML) pipeline that empowers anyone to easily run, interpret, and apply a rigorous and customizable analysis for data mining or predictive modeling. Notably, this tool is currently limited to supervised learning on tabular, binary classification data but will be expanded as our development continues. The development of this pipeline focused on (1) overall automation, (2) avoiding and detecting sources of bias, (3) optimizing modeling performance, (4) ensuring complete reproducibility (under certain STREAMLINE parameter settings), (5) capturing complex associations in data (e.g. feature interactions), and (6) enhancing interpretability of output. Overall, the goal of this pipeline is to provide a transparent framework to learn from data as well as identify the strengths and weaknesses of ML modeling algorithms or other AutoML algorithms.



TransmogrifAI

TransmogrifAI is an end-to-end AutoML library for structured data, written in Scala, that runs on top of Apache Spark, an open-source unified analytics engine for large-scale data processing. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation and an API that enforces compile-time type safety, modularity, and reuse.

For automation, TransmogrifAI has numerous Transformers and Estimators that make use of Feature abstractions to automate feature engineering, feature validation, and model selection.

For modularity and reuse, TransmogrifAI enforces a strict separation between ML workflow definitions and data manipulation, ensuring that code written using TransmogrifAI is inherently modular and reusable.

For compile-time type-safety, machine learning workflows built using TransmogrifAI are strongly typed. This means developers get to enjoy the many benefits of compile-time type safety, including code completion during development and fewer runtime errors.

For transparency, model insights leverage stored feature metadata and lineage to help debug models while providing insights to the end user, making machine learning models less of a black box.

Link: https://transmogrif.ai/


Tree-based Pipeline Optimization Tool (TPOT)

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data. Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there. TPOT is built on top of scikit-learn, so all of the code it generates should look familiar… if you’re familiar with scikit-learn, anyway.
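
To make the search idea concrete, here is a minimal, standard-library-only sketch of an evolutionary search over tiny pipelines. It is not TPOT's code or API: TPOT uses genetic programming over scikit-learn pipelines, whereas this toy swaps in a simpler genetic algorithm over three fixed slots (a scaler, a neighbour count, and a distance metric). The search space `SPACE`, the synthetic data from `make_data`, and every function name are assumptions made up for illustration.

```python
import random
import statistics

# Hypothetical search space: a pipeline is one choice per slot.
SPACE = {
    "scaler": ["none", "minmax", "standard"],
    "k": [1, 3, 5, 7],
    "metric": ["manhattan", "euclidean"],
}

def make_data(n, rng):
    """Synthetic binary data with two features on very different scales."""
    data = []
    for _ in range(n):
        a, b = rng.uniform(0, 1000), rng.uniform(0, 1)
        data.append(([a, b], int(a / 1000 + b > 1)))
    return data

def fit_scaler(kind, rows):
    """Return a function that rescales one feature vector, fitted on training rows."""
    cols = list(zip(*rows))
    if kind == "minmax":
        params = [(min(c), (max(c) - min(c)) or 1.0) for c in cols]
    elif kind == "standard":
        params = [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in cols]
    else:
        params = [(0.0, 1.0) for _ in cols]
    return lambda x: [(v - shift) / scale for v, (shift, scale) in zip(x, params)]

def evaluate(pipe, train, valid):
    """Validation accuracy of scaler + k-nearest-neighbours for one pipeline."""
    scale = fit_scaler(pipe["scaler"], [x for x, _ in train])
    p = 1 if pipe["metric"] == "manhattan" else 2
    scaled = [(scale(x), y) for x, y in train]
    hits = 0
    for x, y in valid:
        q = scale(x)
        # Ranking by the un-rooted distance gives the same neighbour order.
        nearest = sorted(scaled, key=lambda t: sum(abs(u - v) ** p for u, v in zip(t[0], q)))
        votes = sum(label for _, label in nearest[: pipe["k"]])
        hits += int((votes * 2 > pipe["k"]) == bool(y))
    return hits / len(valid)

def random_pipeline(rng):
    return {slot: rng.choice(options) for slot, options in SPACE.items()}

def evolve(train, valid, rng, generations=5, population_size=8):
    """Keep the better half each generation; refill with mutated crossovers."""
    population = [random_pipeline(rng) for _ in range(population_size)]
    history = []  # best validation accuracy seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: evaluate(p, train, valid), reverse=True)
        history.append(evaluate(ranked[0], train, valid))
        survivors = ranked[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mum, dad = rng.sample(survivors, 2)
            # Crossover: take each slot from one of the two parents.
            child = {slot: rng.choice([mum[slot], dad[slot]]) for slot in SPACE}
            # Mutation: resample one randomly chosen slot.
            slot = rng.choice(list(SPACE))
            child[slot] = rng.choice(SPACE[slot])
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda p: evaluate(p, train, valid))
    history.append(evaluate(best, train, valid))
    return best, history

rng = random.Random(0)
data = make_data(90, rng)
train, valid = data[:60], data[60:]
best, history = evolve(train, valid, rng)
print(best, history)
```

Because the better half of each generation survives unchanged and scoring is deterministic, the best score in `history` never decreases. Scoring on a single held-out split keeps the sketch short, but selecting on that same split will overstate accuracy, so a real run should score candidates with cross-validation or confirm the winner on a separate test set.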



Xcessiv

Xcessiv is an open-source, web-based application developed using Python and JavaScript for automating and visualizing the model selection process, hyperparameter tuning, and feature extraction in machine learning. It provides a user-friendly interface for managing and executing experiments across multiple algorithms and datasets. Xcessiv employs models from the scikit-learn package, supports parallel hyperparameter searches using Bayesian optimization, and enables easy management and comparison of hundreds of different model-hyperparameter combinations, easy creation of stacked ensembles, and automated ensemble construction. It can also export created stacked ensembles as a standalone Python file to support multiple levels of stacking.

Link: https://xcessiv.readthedocs.io/en/stable/
