MLJAR-supervised: Automated Machine Learning Python package that works with tabular data

MLJAR-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save data scientists time by abstracting the common steps of preprocessing the data, constructing machine learning models, and tuning hyperparameters to find the best model. It is not a black box: you can see exactly how each ML pipeline is constructed, with a detailed Markdown report for each ML model. MLJAR-supervised will help you with:
(1) explaining and understanding your data,
(2) trying many different machine learning models,
(3) creating Markdown reports from analysis with details about all models,
(4) saving, re-running and loading the analysis and ML models.

Link: https://supervised.mljar.com/

View Resource

MLme: Machine Learning Made Easy

MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding efforts by integrating four essential functionalities, namely data exploration, AutoML, CustomML, and visualization. MLme serves as a valuable resource that empowers researchers of all technical levels to leverage ML for insightful data analysis and enhance research outcomes. By simplifying and automating various stages of the ML workflow, it enables researchers to allocate more time to their core research tasks, thereby enhancing efficiency and productivity.

doi: 10.1101/2023.07.04.546825

View Resource

Multifactor Dimensionality Reduction (scikit-MDR)

A scikit-learn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. This project is still under active development and we encourage you to check back on this repository regularly for updates. MDR is an effective feature construction algorithm that is capable of modeling higher-order interactions and capturing complex patterns in data sets. MDR currently only works with categorical features and supports both binary classification and regression problems. We are working on expanding the algorithm to cover more problem types and provide more convenience features.
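
The feature construction idea described above can be sketched in plain Python, assuming binary labels. This is not the scikit-MDR API: the function names `fit_mdr` and `transform_mdr` are hypothetical, and sending ties and unseen combinations to the low-risk group is an assumption of this sketch. Each combination of categorical values is labeled high risk (1) when its case-to-control ratio exceeds the ratio in the whole data set, and that label becomes the single constructed feature.

```python
from collections import Counter


def fit_mdr(rows, labels):
    """Map each observed combination of categorical values to 1 (high risk) or 0 (low risk).

    rows: list of tuples of categorical feature values
    labels: list of 0/1 class labels (1 = case, 0 = control)
    """
    cases = Counter()
    controls = Counter()
    for row, label in zip(rows, labels):
        if label == 1:
            cases[row] += 1
        else:
            controls[row] += 1

    n_cases = sum(labels)
    n_controls = len(labels) - n_cases

    mapping = {}
    for combo in set(cases) | set(controls):
        # Compare the cell ratio cases/controls with the overall ratio
        # n_cases/n_controls by cross-multiplying, which avoids dividing by zero.
        # Ties fall into the low-risk group (an assumption of this sketch).
        is_high_risk = cases[combo] * n_controls > controls[combo] * n_cases
        mapping[combo] = 1 if is_high_risk else 0
    return mapping


def transform_mdr(rows, mapping, default=0):
    """Replace each row with its constructed high-risk/low-risk feature."""
    return [mapping.get(row, default) for row in rows]


# Toy data with a pure interaction: the label is the XOR of the two features,
# so neither feature is informative on its own.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in rows]

mapping = fit_mdr(rows, labels)
constructed = transform_mdr(rows, mapping)
```

On this toy data the constructed feature reproduces the label exactly, which is the sense in which MDR collapses an interaction between several features into one new feature.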

View Resource

PyCaret: An open-source, low-code machine learning library in Python

PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the hypothesis-to-insight cycle time in an ML experiment. It enables data scientists to perform end-to-end experiments quickly and efficiently, so you spend less time coding and more time on analysis. Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can perform complex machine learning tasks with only a few lines of code.

Link: https://pycaret.org/
YouTube Link: https://www.youtube.com/channel/UCxA1YTYJ9BEeo50lxyI_B3g

View Resource

RECIPE

RECIPE (REsilient ClassifIcation Pipeline Evolution) is an AutoML framework based on a grammar-based genetic programming algorithm that builds customized classification pipelines. The framework is flexible enough to receive different grammars and can be easily extended to other machine learning tasks. It overcomes the drawbacks of previous evolutionary-based frameworks, such as generating invalid individuals, and organizes a high number of possible suitable data pre-processing and classification methods into a grammar.
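
To illustrate why a grammar helps avoid invalid individuals, here is a minimal sketch of grammar-based pipeline generation. The grammar, the step names, and the functions `derive` and `random_pipeline` are all invented for this example and are not taken from RECIPE; a real system would also evolve and evaluate the pipelines, which this sketch leaves out.

```python
import random

# A toy grammar (hypothetical, not RECIPE's). Keys are non-terminals; each value
# lists the alternative productions for that non-terminal.
GRAMMAR = {
    "<pipeline>": [["<classifier>"], ["<preprocessing>", "<classifier>"]],
    "<preprocessing>": [["<imputer>"], ["<scaler>"], ["<imputer>", "<scaler>"]],
    "<imputer>": [["impute_mean"], ["impute_median"]],
    "<scaler>": [["standardize"], ["min_max_scale"]],
    "<classifier>": [["decision_tree"], ["naive_bayes"], ["logistic_regression"]],
}

CLASSIFIERS = {production[0] for production in GRAMMAR["<classifier>"]}


def derive(symbol, rng):
    """Expand a symbol into a list of terminal step names."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: an actual pipeline step
    production = rng.choice(GRAMMAR[symbol])
    steps = []
    for part in production:
        steps.extend(derive(part, rng))
    return steps


def random_pipeline(seed):
    """Derive one pipeline from the start symbol using a seeded generator."""
    return derive("<pipeline>", random.Random(seed))


# Every derived individual ends in a classifier and orders its preprocessing
# steps as the grammar dictates, so none of them is structurally invalid.
population = [random_pipeline(seed) for seed in range(20)]
```

Because individuals are only ever produced by expanding the grammar, a pipeline such as a scaler with no classifier after it can never appear in the population.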

Link: https://laic-ufmg.github.io/Recipe/docs/

View Resource

Relief-based Algorithm Training Environment (ReBATE)

This package includes a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. These Relief-Based algorithms (RBAs) are designed for feature weighting/selection as part of a machine learning pipeline (supervised learning). Presently this includes the following core RBAs: ReliefF, SURF, SURF*, MultiSURF*, and MultiSURF. Additionally, an implementation of the iterative TuRF mechanism and VLSRelief is included. It is still under active development and we encourage you to check back on this repository regularly for updates. These algorithms offer a computationally efficient way to perform feature selection that is sensitive to feature interactions as well as simple univariate associations, unlike most currently available filter-based feature selection methods. The main benefit of Relief algorithms is that they identify feature interactions without having to exhaustively check every pairwise interaction, thus taking significantly less time than exhaustive pairwise search.
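
The way Relief-style scoring picks up interactions without enumerating feature pairs can be sketched in plain Python. This is a simplified variant of the original Relief idea for discrete features, not the ReBATE implementation or its API: the function name `relief_weights` is hypothetical, distances are Hamming distances, and averaging over tied nearest neighbors is an assumption made here to keep the result deterministic. For each instance, a feature gains weight when it differs from the nearest instance of the other class (a miss) and loses weight when it differs from the nearest instance of the same class (a hit).

```python
from itertools import product


def relief_weights(rows, labels):
    """Score discrete features by comparing each instance with its nearest hit and miss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features

    for i, (x, y) in enumerate(zip(rows, labels)):
        hits, misses = [], []
        for j, (other, other_y) in enumerate(zip(rows, labels)):
            if i == j:
                continue
            distance = sum(a != b for a, b in zip(x, other))  # Hamming distance
            (hits if other_y == y else misses).append((distance, other))

        # Differences to the nearest hits lower a weight; to the nearest misses, raise it.
        for group, sign in ((hits, -1.0), (misses, 1.0)):
            if not group:
                continue
            nearest = min(d for d, _ in group)
            tied = [o for d, o in group if d == nearest]
            for f in range(n_features):
                avg_diff = sum(x[f] != o[f] for o in tied) / len(tied)
                weights[f] += sign * avg_diff

    return [w / len(rows) for w in weights]


# Toy data: the label is the XOR of features 0 and 1; feature 2 is irrelevant.
# Neither relevant feature is associated with the label on its own.
rows = list(product([0, 1], repeat=3))
labels = [a ^ b for a, b, _ in rows]

weights = relief_weights(rows, labels)
```

On this toy data the two interacting features receive higher weights than the irrelevant one, even though the code only compares instances with their neighbors and never tests a pair of features explicitly.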

View Resource

Semi-automated Term Harmonization Pipeline

This repository includes a set of Python-based Jupyter notebooks that comprise a semi-automated term harmonization pipeline applied to harmonize medical history terms across 28 clinical trials of pulmonary arterial hypertension. These notebooks pair with the paper ‘A Semi-Automated Term Harmonization Pipeline Applied to Pulmonary Arterial Hypertension Clinical Trials’. Below, we offer an overview of these pipelines and provide guidance for users on how to adapt these notebooks to their own target harmonization tasks.

View Resource

Simple, Transparent, End-to-end Automated Machine Learning Pipeline (STREAMLINE)

STREAMLINE is an end-to-end automated machine learning (AutoML) pipeline that empowers anyone to easily run, interpret, and apply a rigorous and customizable analysis for data mining or predictive modeling. Notably, this tool is currently limited to supervised learning on tabular, binary classification data but will be expanded as our development continues. The development of this pipeline focused on (1) overall automation, (2) avoiding and detecting sources of bias, (3) optimizing modeling performance, (4) ensuring complete reproducibility (under certain STREAMLINE parameter settings), (5) capturing complex associations in data (e.g. feature interactions), and (6) enhancing interpretability of output. Overall, the goal of this pipeline is to provide a transparent framework to learn from data as well as identify the strengths and weaknesses of ML modeling algorithms or other AutoML algorithms.

View Resource

TransmogrifAI

TransmogrifAI is an end-to-end Auto-ML library for structured data written in Scala that runs on top of Apache Spark, an open-source unified analytics engine for large-scale data processing. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse.

For automation, TransmogrifAI has numerous Transformers and Estimators that make use of Feature abstractions to automate feature engineering, feature validation, and model selection.

For modularity and reuse, TransmogrifAI enforces a strict separation between ML workflow definitions and data manipulation, ensuring that code written using TransmogrifAI is inherently modular and reusable.

For compile-time type-safety, machine learning workflows built using TransmogrifAI are strongly typed. This means developers get to enjoy the many benefits of compile-time type safety, including code completion during development and fewer runtime errors.

For transparency, model insights leverage stored feature metadata and lineage to help debug models while providing insights to the end user, making machine learning models less of a black box.

Link: https://transmogrif.ai/

View Resource

Tree-based Pipeline Optimization Tool (TPOT)

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data. Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there. TPOT is built on top of scikit-learn, so all of the code it generates should look familiar… if you’re familiar with scikit-learn, anyway.

View Resource