Apache SINGA


Apache SINGA is an Apache top-level project that develops an open-source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on healthcare applications.

History

The project was initiated at the National University of Singapore in 2014, in collaboration with the database group of Zhejiang University, to support complex analytics at scale and to make database systems more intelligent and autonomic. It focused on distributed deep learning by partitioning the model and data onto nodes in a cluster and parallelizing the training. The prototype was accepted by the Apache Incubator in March 2015, and the project graduated to top-level status in October 2019. Seven versions have been released. Since V1.0, SINGA has provided general support for traditional machine learning models such as logistic regression. Companies such as NetEase use SINGA in their applications.

Software Stack

SINGA's software stack comprises three major components: core, IO, and model. The core component provides memory management and tensor operations; the IO component has classes for reading data from disk and network; and the model component provides data structures and algorithms for machine learning models, e.g., layers for neural network models and optimizers, initializers, metrics, and losses for general machine learning models.
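
For example, the tensor and device abstractions of the core component are exposed directly in Python. The snippet below is a minimal sketch based on SINGA's documented v3 Python API; exact names can differ between releases.

    from singa import device, tensor

    # Pick an execution device; device.create_cuda_gpu() would target a GPU instead.
    dev = device.get_default_device()

    # Core component: memory management and tensor operations.
    a = tensor.Tensor((2, 3), dev)
    a.gaussian(0.0, 1.0)        # fill in place with samples from N(0, 1)
    b = tensor.relu(a)          # element-wise ReLU from the tensor module
    print(tensor.to_numpy(b))   # copy back to a NumPy array for inspection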

Benchmark for Distributed Training

Workload: we use a deep convolutional neural network, ResNet-50, as the application. ResNet-50 has 50 convolution layers for image classification and requires 3.8 GFLOPs to pass a single image through the network. The input image size is 224x224.
Hardware: we use p2.8xlarge instances from AWS, each of which has 8 Nvidia Tesla K80 GPUs (96 GB of GPU memory in total), 32 vCPUs, 488 GB of main memory, and 10 Gbit/s network bandwidth.
Metric: we measure the time per iteration for different numbers of workers to evaluate the scalability of SINGA. The batch size is fixed at 32 per GPU and a synchronous training scheme is applied, so the effective batch size is $32N$, where $N$ is the number of GPUs. We compare with a popular open-source system that uses the parameter-server topology, with the first GPU selected as the server. Throughput and communication cost are reported for each configuration.
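
As a quick sanity check on these figures: one p2.8xlarge instance provides $N = 8$ GPUs, so the effective batch size is $32 \times 8 = 256$ images per iteration, and a single forward pass over that batch costs about $3.8 \times 256 \approx 973$ GFLOPs, with the backward pass adding roughly twice that again.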

Rafiki

Rafiki is a submodule of SINGA that provides machine learning analytics as a service.

Using SINGA

To get started with SINGA, there are tutorials available as Jupyter notebooks.
There is also an online course about SINGA.
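
As an illustration of the training workflow, the sketch below fits a tiny linear classifier with the autograd and opt modules. It follows the structure of the MLP example shipped with SINGA v3, but the toy data and shapes here are hypothetical and API details may differ between releases.

    import numpy as np
    from singa import autograd, device, opt, tensor

    autograd.training = True
    dev = device.get_default_device()

    # Hypothetical toy data: 64 samples, 3 features, 2 classes (one-hot labels).
    x = tensor.Tensor((64, 3), dev)
    x.gaussian(0.0, 1.0)
    labels = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, 64)]
    y = tensor.from_numpy(labels)
    y.to_device(dev)

    # Parameters of a single linear layer, marked as trainable.
    w = tensor.Tensor((3, 2), dev, requires_grad=True, stores_grad=True)
    w.gaussian(0.0, 0.1)
    b = tensor.Tensor((2,), dev, requires_grad=True, stores_grad=True)
    b.set_value(0.0)

    sgd = opt.SGD(lr=0.05)
    for step in range(10):
        out = autograd.add_bias(autograd.matmul(x, w), b)
        loss = autograd.softmax_cross_entropy(out, y)
        # autograd.backward yields (parameter, gradient) pairs for the optimizer.
        for p, g in autograd.backward(loss):
            sgd.update(p, g)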