Ray

(Moritz et al., 2018) PDF

Ray unifies task-parallel and actor programming models in a single dynamic task graph and employs a scalable architecture enabled by the global control store and a bottom-up distributed scheduler. The programming flexibility, high throughput, and low latencies simultaneously achieved by this architecture are particularly important for emerging artificial intelligence workloads, which produce tasks diverse in their resource requirements, duration, and functionality.

Learning Ray by Max Pumperla, Edward Oakes, and Richard Liaw (PDF).

In short, Ray sets up and manages clusters of computers so that you can run distributed tasks on them. A Ray cluster consists of nodes connected to each other over a network. You program against the so-called driver, the program root, which lives on the head node. The driver can run jobs (collections of tasks) on the nodes in the cluster. Specifically, the individual tasks of a job run in worker processes on the worker nodes.

This cluster can also be local, on your laptop. The default number of worker processes is the number of CPUs available on your machine.

GitHub: https://github.com/ray-project/ray

$ pip install "ray[rllib, serve, tune]==2.10.0"

import ray
ray.init()

Libraries:

data processing: Ray Data (Apache Arrow-based)
model training: Ray Train (TensorFlow, PyTorch)
hyperparameter tuning: Ray Tune
model serving: Ray Serve

News:

Amazon’s Exabyte-Scale Migration from Apache Spark to Ray on Amazon EC2 (AWS Open Source Blog)

Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., Jordan, M. I., & Stoica, I. (2018). Ray: A Distributed Framework for Emerging AI Applications.

btbytes.com
