modeling – Page 2 – Penn AI Tech

GAMA (General Automated Machine learning Assistant): An automated machine learning tool based on genetic programming.

July 22, 2024 by Elizabeth

GAMA is an AutoML package for end-users and AutoML researchers. It generates optimized machine learning pipelines given specific input data and resource constraints. A machine learning pipeline contains data preprocessing (e.g. PCA, normalization) as well as a machine learning algorithm (e.g. Logistic Regression, Random Forests), with fine-tuned hyperparameter settings (e.g. number of trees in a Random Forest). To find these pipelines, multiple search procedures have been implemented. GAMA can also combine multiple tuned machine learning pipelines into an ensemble, which on average should improve model performance. At the moment, GAMA is restricted to classification and regression problems on tabular data. In addition to its general-use AutoML functionality, GAMA aims to serve AutoML researchers as well. During the optimization process, GAMA keeps an extensive log of the progress made, which can be used to gain insight into the behavior of the search procedure.
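To make the idea of searching over pipelines concrete, here is a minimal, standard-library sketch of an evolutionary search over (preprocessing, model, hyperparameter) combinations. It is not GAMA's implementation or API: the search space, the `toy_score` function (a stand-in for real cross-validated training), and all names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical search space: a pipeline is a preprocessing step, a model,
# and one hyperparameter. None of these names come from GAMA itself.
SEARCH_SPACE = {
    "preprocess": ["none", "normalize", "pca"],
    "model": ["logistic_regression", "random_forest"],
    "n_trees": [10, 50, 100, 200],
}


def random_pipeline(rng):
    """Sample one option per component to form a candidate pipeline."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Copy the pipeline and resample exactly one of its components."""
    child = dict(pipeline)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def toy_score(pipeline):
    """Made-up stand-in for validation accuracy; a real tool would train the pipeline."""
    score = 0.70
    score += {"none": 0.00, "normalize": 0.03, "pca": 0.01}[pipeline["preprocess"]]
    if pipeline["model"] == "random_forest":
        score += 0.05 + 0.0002 * pipeline["n_trees"]
    return round(score, 4)


def evolve(generations=10, population_size=8, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=toy_score, reverse=True)
        survivors = population[: population_size // 2]
        history.append(toy_score(survivors[0]))
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    best = max(population, key=toy_score)
    return best, history


best, history = evolve()
print(best, toy_score(best))
```

Because the survivors are carried over unchanged, the best score in `history` never decreases from one generation to the next. The `history` list plays a role loosely similar to the search log described above.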

Link: https://openml-labs.github.io/gama/master/

Software | AutoML | Technology: Tools, Hardware, and Software

H2O AutoML

July 22, 2024 by Elizabeth

H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Flow notebook/web interface, and works seamlessly with big data technologies like Hadoop and Spark. H2O provides implementations of many popular algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks, Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), Cox Proportional Hazards, K-Means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
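As a rough illustration of the Python interface, the sketch below wraps an H2O AutoML run in a function. It assumes the `h2o` package is installed, that the data is a CSV file, and that the task is classification (hence the `asfactor()` call); the function names, the file path, and the target column are hypothetical placeholders.

```python
def predictor_columns(columns, target):
    """Treat every column except the target as a predictor."""
    return [name for name in columns if name != target]


def run_h2o_automl(csv_path, target, max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return (leaderboard, best model)."""
    # Imported lazily so the helper above works without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)
    # Assumption: a classification task, so the target is made categorical.
    frame[target] = frame[target].asfactor()

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return aml.leaderboard, aml.leader
```

A call such as `run_h2o_automl("train.csv", "label")` (both arguments are placeholders) would return the leaderboard of trained models together with the top-ranked one.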

Link: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html

Software | AutoML | Technology: Tools, Hardware, and Software

Hyperopt-Sklearn

July 22, 2024 by Elizabeth

Hyperopt-Sklearn (Hyperparameter optimization for Sklearn) is a Python library for hyperparameter-optimization-based model selection among machine learning algorithms in the Scikit-learn package. The main goal of Hyperopt-Sklearn is to automate and ease the process of hyperparameter tuning for machine learning models. It utilizes Bayesian optimization techniques to decrease the complexity of hyperparameter tuning and speed up the optimization process. It is a valuable tool for tuning hyperparameters and improving performance of Scikit-learn models without manual intervention.

Link: https://hyperopt.github.io/hyperopt-sklearn/

AutoML | Software | Technology: Tools, Hardware, and Software

LAMA: LightAutoML

July 22, 2024 by Elizabeth

LightAutoML is an open-source Python library aimed at automated machine learning. It is designed to be lightweight and efficient for various tasks on tabular and text data. LightAutoML provides easy-to-use pipeline creation that enables automatic hyperparameter tuning and data processing; automatic typing and feature selection; automatic time utilization; automatic report creation; and an easy-to-use modular scheme for creating your own pipelines.

Link: https://lightautoml.readthedocs.io/en/latest/

Software | AutoML | Technology: Tools, Hardware, and Software

Ludwig: A low-code framework for building custom AI models like LLMs and other deep neural networks

July 22, 2024 by Elizabeth

Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks. A declarative YAML configuration file is all you need to train a state-of-the-art LLM on your data, and Ludwig also supports multi-task and multi-modality learning. You can optimize for scale and efficiency, since it provides automatic batch size selection, distributed training (DDP, DeepSpeed), parameter-efficient fine-tuning (PEFT), 4-bit quantization (QLoRA), and larger-than-memory datasets. With support for hyperparameter optimization, explainability, and rich metric visualizations, you retain full control of your models down to the activation functions. It is modular and extensible and is engineered for production (Docker, HuggingFace).
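The declarative style can be sketched in Python by passing the configuration as a dictionary that mirrors the YAML structure. This is a minimal sketch, assuming the `ludwig` package is installed; the helper names and the column names in the usage note are hypothetical, and the example is a small text classifier rather than an LLM fine-tuning setup.

```python
def build_config(text_column, label_column):
    """Return a minimal Ludwig-style config: one text input, one category output."""
    return {
        "input_features": [{"name": text_column, "type": "text"}],
        "output_features": [{"name": label_column, "type": "category"}],
    }


def train(config, dataset_path):
    """Train a model from the declarative config on a dataset file."""
    # Imported lazily so build_config can be used without Ludwig installed.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    return model.train(dataset=dataset_path)


# Hypothetical column names, for illustration only.
config = build_config("review_text", "sentiment")
print(config)
```

The same structure written as YAML would serve as the configuration file; everything not specified in the config is left to Ludwig's defaults.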

Link: https://ludwig.ai/latest/

Machine Learning | Internal | Training Resources | All Resources

Machine Learning Essentials for Biomedical Data Science

March 5, 2023 by Ray

An educational playlist of 11 videos covering the key essentials for using machine learning as part of a data science analysis pipeline. While topics are primarily framed around applications in biomedicine, this content is broadly applicable to other domains. This series was prepared at the Cedars-Sinai Medical Center in Los Angeles by Dr. Ryan Urbanowicz of the Department of Computational Biomedicine.

Software | AutoML | Technology: Tools, Hardware, and Software

ML-Plan

July 22, 2024 by Elizabeth

ML-Plan is a free, Java-based software library for AutoML that provides a tool to optimize machine learning pipelines in WEKA or scikit-learn. It is part of AILibs, a modular collection of Java libraries related to automated decision making.

Link: https://starlibs.github.io/AILibs/projects/mlplan/

Software | AutoML | Technology: Tools, Hardware, and Software

MLBox

July 22, 2024 by Elizabeth

MLBox is a powerful AutoML Python library that provides fast reading and distributed data preprocessing/cleaning/formatting, highly robust feature selection and leak detection, accurate hyperparameter optimization in high-dimensional space, state-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, etc.), and prediction with model interpretation.

Link: https://mlbox.readthedocs.io/en/latest/

Software | AutoML | Technology: Tools, Hardware, and Software

MLJAR-supervised: Automated Machine Learning Python package that works with tabular data

July 8, 2024 by Elizabeth

MLJAR-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for data scientists. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyperparameter tuning to find the best model. It is not a black box: you can see exactly how the ML pipeline is constructed (with a detailed Markdown report for each ML model). MLJAR-supervised will help you with:
(1) explaining and understanding your data,
(2) trying many different machine learning models,
(3) creating Markdown reports from the analysis with details about all models,
(4) saving, re-running, and loading the analysis and ML models.
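A typical session might look like the sketch below. It assumes the `mljar-supervised` package is installed (it is imported as `supervised`); the settings values, the `fit_automl` helper, and the results directory name are hypothetical choices for illustration.

```python
# Hypothetical settings: "Explain" mode, a 5-minute budget, and a results folder.
AUTOML_SETTINGS = {
    "mode": "Explain",
    "total_time_limit": 300,  # seconds
    "results_path": "automl_results",
}


def fit_automl(X, y, settings=AUTOML_SETTINGS):
    """Fit MLJAR AutoML on tabular features X and target y."""
    # Imported lazily so the settings can be inspected without the package installed.
    from supervised.automl import AutoML

    automl = AutoML(**settings)
    automl.fit(X, y)
    return automl
```

After fitting, the reports and trained models are written under the directory given by `results_path`, and `automl.predict(X_new)` produces predictions for new rows.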

Link: https://supervised.mljar.com/

Software | AutoML | Technology: Tools, Hardware, and Software

MLme: Machine Learning Made Easy

July 22, 2024 by Elizabeth

MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding efforts by integrating four essential functionalities, namely data exploration, AutoML, CustomML, and visualization. MLme serves as a valuable resource that empowers researchers of all technical levels to leverage ML for insightful data analysis and enhance research outcomes. By simplifying and automating various stages of the ML workflow, it enables researchers to allocate more time to their core research tasks, thereby enhancing efficiency and productivity.

doi: 10.1101/2023.07.04.546825
