Introduction

Snap Machine Learning (Snap ML) is a library for training generalized linear models. It is being developed at the IBM Research - Zürich laboratory with the vision to remove training time as a bottleneck for machine learning applications. Snap ML supports a large number of classical machine learning models and scales gracefully to data sets with billions of examples and/or features. It offers distributed training and GPU acceleration, and supports sparse data structures.

"With Snap ML you can train your machine learning model faster than you can snap your fingers!"

The Snap ML library offers two different packages:

pai4sk-snapml

This package offers local (single-node) as well as MPI-based distributed training. The library is exposed to the user via a scikit-learn-like Python interface.

The local version of Snap ML is designed to run on a single machine. It targets small- to medium-scale data that fits in the memory of a single machine. snap-ml-local offers GPU acceleration and supports sparse data structures.

There are two ways to use this library. The first approach is to use the scikit-learn-compatible pai4sk APIs, which integrate seamlessly into existing Python applications. In IBM® Watson Machine Learning Community Edition (WML CE) 1.7.0, this module is built on the scikit-learn 0.22.1 library and can be used as a replacement for scikit-learn. When Snap ML does not support an API, the module automatically falls back to the corresponding scikit-learn (CPU-based) algorithm. The second approach is to invoke the Snap ML APIs directly; these expose more accelerated and distributed ML algorithms than the pai4sk APIs. Refer to the API Documentation for the current set of GPU-accelerated APIs exposed by pai4sk.
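The automatic fallback described above can be pictured as a simple dispatch: use an accelerated implementation when one is registered for the requested estimator, otherwise build the scikit-learn (CPU) one. The following is a hypothetical sketch of that pattern only, not pai4sk's actual implementation; the registries and the placeholder classes (`GpuLogisticRegression`, `CpuLogisticRegression`, `CpuKMeans`) are illustrative assumptions, and the real set of accelerated estimators is listed in the API Documentation.

```python
from typing import Callable, Dict, Tuple


# Placeholder classes standing in for real estimators (hypothetical).
class GpuLogisticRegression:   # stands in for an accelerated (e.g. GPU) estimator
    pass


class CpuLogisticRegression:   # stands in for a scikit-learn estimator
    pass


class CpuKMeans:               # stands in for a scikit-learn estimator
    pass


# Hypothetical registries: which estimator names have an accelerated version,
# and which CPU implementations are available as a fallback.
ACCELERATED: Dict[str, Callable[[], object]] = {
    "LogisticRegression": GpuLogisticRegression,
}
FALLBACK: Dict[str, Callable[[], object]] = {
    "LogisticRegression": CpuLogisticRegression,
    "KMeans": CpuKMeans,
}


def make_estimator(name: str) -> Tuple[object, str]:
    """Prefer the accelerated implementation; otherwise fall back to the CPU one."""
    if name in ACCELERATED:
        return ACCELERATED[name](), "accelerated"
    if name in FALLBACK:
        return FALLBACK[name](), "fallback"
    raise KeyError(f"no implementation registered for {name!r}")


est, backend = make_estimator("KMeans")
print(type(est).__name__, backend)  # CpuKMeans fallback
```

The caller always receives an estimator object and never has to check which backend served it, which is what makes a drop-in replacement for scikit-learn possible.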

pai4sk-snapml also offers distributed training of models across a cluster of machines. This enables training on large-scale datasets that exceed the memory capacity of a single machine. The distributed version also offers GPU acceleration, supports sparse data structures, and is exposed to the user via a scikit-learn-like Python interface.
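To see why distributing the data removes the single-machine memory limit, consider a generic row-wise split: with N workers, each holds only about 1/N of the training examples. The sketch below is an illustration under that assumption and does not describe Snap ML's actual data-distribution scheme; the function name `shard_bounds` and the contiguous-block layout are hypothetical. `rank` plays the role of an MPI rank numbered from 0.

```python
def shard_bounds(n_examples: int, n_workers: int, rank: int) -> range:
    """Return the contiguous block of example indices owned by worker `rank`.

    Hypothetical row-wise partitioning: examples are split into n_workers
    nearly equal contiguous blocks, so each worker stores about
    n_examples / n_workers rows instead of the full dataset.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    if not 0 <= rank < n_workers:
        raise ValueError("rank must be in [0, n_workers)")
    base, extra = divmod(n_examples, n_workers)
    # The first `extra` workers each receive one additional example.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# 10 examples over 3 workers -> blocks of 4, 3 and 3 rows.
for r in range(3):
    print(r, list(shard_bounds(10, 3, r)))
```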

snap-ml-spark

Similar to the distributed pai4sk-snapml, the snap-ml-spark package offers distributed training of models across a cluster of machines. The library is exposed to the user via a spark.ml-like interface and can be integrated seamlessly into existing PySpark applications.
