GAMA (General Automated Machine learning Assistant): An automated machine learning tool based on genetic programming.

GAMA is an AutoML package for end-users and AutoML researchers. It generates optimized machine learning pipelines given specific input data and resource constraints. A machine learning pipeline contains data preprocessing (e.g. PCA, normalization) as well as a machine learning algorithm (e.g. Logistic Regression, Random Forests), with fine-tuned hyperparameter settings (e.g. number of trees in a Random Forest). To find these pipelines, multiple search procedures have been implemented. GAMA can also combine multiple tuned machine learning pipelines into an ensemble, which on average should improve model performance. At the moment, GAMA is restricted to classification and regression problems on tabular data. In addition to its general-purpose AutoML functionality, GAMA aims to serve AutoML researchers as well: during the optimization process, GAMA keeps an extensive log of progress made, and this log can be used to gain insight into the behavior of the search procedure.
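
As a rough illustration of the kind of evolutionary search GAMA is built on, here is a minimal sketch that evolves toy pipeline configurations. It is a simplified genetic algorithm over fixed-length (preprocessor, model, n_trees) tuples, not GAMA's genetic programming over pipeline trees, and it does not use GAMA's API. The search space and the `toy_fitness` function are hypothetical stand-ins; a real AutoML tool would score each candidate by training and cross-validating it on the data.

```python
import random

# Hypothetical search space: each pipeline is (preprocessor, model, n_trees).
PREPROCESSORS = ["none", "pca", "normalize"]
MODELS = ["logistic_regression", "random_forest"]
N_TREES = [10, 50, 100, 200]  # only meaningful for random_forest in this toy


def random_pipeline(rng):
    return (rng.choice(PREPROCESSORS), rng.choice(MODELS), rng.choice(N_TREES))


def mutate(pipeline, rng):
    # Replace one randomly chosen component with a new random value.
    parts = list(pipeline)
    i = rng.randrange(3)
    parts[i] = rng.choice((PREPROCESSORS, MODELS, N_TREES)[i])
    return tuple(parts)


def crossover(a, b, rng):
    # Uniform crossover: each component is taken from one parent at random.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def toy_fitness(pipeline):
    # Hypothetical stand-in for cross-validated accuracy (no real training).
    pre, model, n_trees = pipeline
    score = 0.70
    if pre == "normalize":
        score += 0.05
    if model == "random_forest":
        score += 0.10 + 0.0001 * n_trees
    return score


def evolve(generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=toy_fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=toy_fitness)


best = evolve()
print(best, round(toy_fitness(best), 4))
```

Because the better half of each generation is carried over unchanged, the best score in this sketch can never decrease from one generation to the next; GAMA's own selection and variation operators may differ.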

Link: https://openml-labs.github.io/gama/master/

H2O AutoML

H2O is an in-memory platform for distributed, scalable machine learning. H2O provides familiar interfaces such as R, Python, Scala, Java, JSON and the Flow notebook/web interface, and integrates with big data technologies such as Hadoop and Spark. H2O provides implementations of many popular algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks, Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Link: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html

Hyperopt-Sklearn

Hyperopt-Sklearn (Hyperparameter optimization for Sklearn) is a Python library for hyperparameter-optimization-based model selection among the machine learning algorithms in the scikit-learn package. Its main goal is to automate and simplify hyperparameter tuning for machine learning models. It builds on Hyperopt, whose search algorithms include Bayesian optimization via the Tree-structured Parzen Estimator (TPE), to reduce the manual effort of tuning and speed up the optimization process. As a result, it can tune hyperparameters and improve the performance of scikit-learn models with little manual intervention.

Link: https://hyperopt.github.io/hyperopt-sklearn/

LAMA: LightAutoML

LightAutoML is an open-source Python library for automated machine learning. It is designed to be lightweight and efficient for various tasks on tabular and text data. LightAutoML provides easy-to-use pipeline creation that enables:
(1) automatic hyperparameter tuning and data processing,
(2) automatic type inference and feature selection,
(3) automatic use of the available time budget,
(4) automatic report creation,
(5) an easy-to-use modular scheme for creating your own pipelines.

Link: https://lightautoml.readthedocs.io/en/latest/

Ludwig: A low-code framework for building custom AI models like LLMs and other deep neural networks

Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks. Ludwig allows you to build custom models with ease: a declarative YAML configuration file is all you need to train a state-of-the-art LLM on your data, and it supports multi-task and multi-modal learning. It is also optimized for scale and efficiency, providing automatic batch size selection, distributed training (DDP, DeepSpeed), parameter-efficient fine-tuning (PEFT), 4-bit quantization (QLoRA), and support for larger-than-memory datasets. You retain full control of your models, down to the activation functions, and Ludwig supports hyperparameter optimization, explainability, and rich metric visualizations. It is modular and extensible and is engineered for production (Docker, HuggingFace).

Link: https://ludwig.ai/latest/

Machine Learning Essentials for Biomedical Data Science

An educational playlist of 11 videos covering the essentials of using machine learning as part of a data science analysis pipeline. While the topics are primarily framed around applications in biomedicine, the content is broadly applicable to other domains. The series was prepared at Cedars-Sinai Medical Center in Los Angeles by Dr. Ryan Urbanowicz of the Department of Computational Biomedicine.

MLBox

MLBox is a powerful AutoML Python library that provides fast reading and distributed data preprocessing/cleaning/formatting, highly robust feature selection and leak detection, accurate hyperparameter optimization in high-dimensional space, state-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, etc.), and prediction with model interpretation.

Link: https://mlbox.readthedocs.io/en/latest/

MLJAR-supervised: An Automated Machine Learning Python package that works with tabular data

MLJAR-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for data scientists. It abstracts the common steps of preprocessing the data, constructing machine learning models, and tuning hyperparameters to find the best model. It is not a black box: you can see exactly how the ML pipeline is constructed, with a detailed Markdown report for each ML model. MLJAR-supervised helps with:
(1) explaining and understanding your data,
(2) trying many different machine learning models,
(3) creating Markdown reports from analysis with details about all models,
(4) saving, re-running and loading the analysis and ML models.

Link: https://supervised.mljar.com/

MLme: Machine Learning Made Easy

MLme addresses the diverse needs of researchers without requiring extensive coding by integrating four core functionalities: data exploration, AutoML, CustomML, and visualization. It enables researchers of all technical levels to use ML for data analysis. By simplifying and automating various stages of the ML workflow, it lets researchers spend more time on their core research tasks, improving efficiency and productivity.

doi: 10.1101/2023.07.04.546825
