Website | Source | Tutorial
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI libraries for simplifying ML compute.

Learn more about Ray AI Libraries:
- Data: Scalable Datasets for ML
- Train: Distributed Training
- Tune: Scalable Hyperparameter Tuning
- RLlib: Scalable Reinforcement Learning
- Serve: Scalable and Programmable Serving
Or more about Ray Core and its key abstractions:
- Tasks: Stateless functions executed in the cluster.
- Actors: Stateful worker processes created in the cluster.
- Objects: Immutable values accessible across the cluster.
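A minimal sketch of these three abstractions (assuming Ray is installed; the function and class names here are illustrative):

```python
import ray

ray.init()  # starts a local Ray runtime; on a cluster this attaches to it

# Task: a stateless function executed remotely in the cluster.
@ray.remote
def square(x):
    return x * x

# Actor: a stateful worker process; method calls run serially on it.
@ray.remote
class Counter:
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

# Objects: .remote() returns object refs; ray.get resolves the immutable values.
squares = ray.get([square.remote(i) for i in range(4)])  # [0, 1, 4, 9]

counter = Counter.remote()
counts = ray.get([counter.increment.remote() for _ in range(3)])  # [1, 2, 3]

ray.shutdown()
</imports>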
Learn more about Monitoring and Debugging:
Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations.
Install Ray with: `pip install ray`. For nightly wheels, see the Installation page.
Why Ray?
Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.
More Information
Older documents:
Libraries
New Libraries
This section contains libraries that are well made and useful but have not yet been battle-tested by a large user base.
Models and Projects
Ray + LLM
- veRL: Volcano Engine Reinforcement Learning for LLM
- FastChat - Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
- LangChain-Ray - Examples of how to use LangChain and Ray
- Aviary - Ray Aviary: evaluate multiple LLMs easily
- LLM-distributed-finetune - Finetuning Large Language Models Efficiently on a Distributed Cluster; uses Ray AIR to orchestrate training on multiple AWS GPU instances.
- LLMPerf - A library for validating and benchmarking LLMs (updated through 2024)
Reinforcement Learning
- slime - An LLM post-training framework aimed at scaling RL.
- muzero-general - A commented and documented implementation of MuZero based on the Google DeepMind paper (Schrittwieser et al., Nov 2019) and the associated pseudocode.
- rllib-torch-maddpg - PyTorch implementation of MADDPG (Lowe et al.) in RLlib
- MARLlib - A comprehensive Multi-Agent Reinforcement Learning algorithm library
- VMAS - A vectorized differentiable simulator for Multi-Agent Reinforcement Learning benchmarking
Ray Data (Data Processing)
- RayDP - Distributed data processing library that runs Apache Spark on Ray; seamlessly integrates with other Ray libraries for end-to-end data analytics and AI pipelines.
- Google Cloud Platform Ray Preprocessing - Examples of Ray data preprocessing pipelines for model fine-tuning on GCP.
Ray Train (Distributed Training)
- Ray Train Examples - Official Ray Train documentation with PyTorch, TensorFlow, and Hugging Face Accelerate examples for distributed training.
- MinIO with Ray Train - Distributed training examples using Ray Train with MinIO object storage.
Ray Tune (Hyperparameter Optimization)
- Ultralytics YOLO11 with Ray Tune - Efficient hyperparameter tuning for YOLO11 object detection models using Ray Tune.
- Softlearning - Reinforcement learning framework for training maximum entropy policies, official implementation of Soft Actor-Critic algorithm using Ray Tune.
- Flambe - ML framework to accelerate research and its path to production, integrates with Ray Tune.
Ray Serve (Model Serving)
- LangChain Ray Serve - Deploy LangChain applications and OpenAI chains in production using Ray Serve.
Ray + JAX / TPU
- Swarm-jax - Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes
- Alpa - Auto parallelization for large-scale neural networks using Jax, XLA, and Ray
Ray + Database
- Balsa - A learned SQL query optimizer that tailors optimization to your SQL queries to find the best execution plans for your hardware and engine.
- RaySQL - Distributed SQL query engine in Python using Ray
- Quokka - Open-source SQL engine in Python
Ray + X (integration)
Ray-Project
Distributed Computing
- Fugue - A unified interface for distributed computing that lets users execute Python, pandas, and SQL code on Ray without rewrites.
- Daft is a fast, Pythonic and scalable open-source dataframe library built for Python and Machine Learning workloads.
- Flower (flwr) - A framework for building federated learning systems; uses Ray to scale out experiments from a desktop to a single GPU rack or a multi-node GPU cluster.
- Modin: Scale your pandas workflows by changing one line of code. Uses Ray for transparently scaling out to multiple nodes.
- Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workloads.
Ray AIR
Cloud Deployment
- Ray on AWS - Official guide for launching Ray clusters on AWS with CloudWatch monitoring
- Ray on GCP - Official guide for launching Ray clusters on Google Cloud Platform
- Ray on Azure - Official guide for launching Ray clusters on Microsoft Azure
Misc
- AutoGluon AutoML for Image, Text, and Tabular Data
- Aws-samples Ray on Amazon SageMaker/EC2/EKS/EMR
- KubeRay A toolkit to run Ray applications on Kubernetes
- ray-educational-materials - A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
- Metaflow-Ray An extension for Metaflow that enables seamless integration with Ray
Videos
Anyscale Academy & Official Tutorials
Conference Talks
- Ray Summit 2024 - Annual Ray conference with recorded sessions on YouTube (Sep 30 - Oct 2, 2024)
- Ray Summit 2025 - Upcoming conference (Nov 3-5, 2025, San Francisco)
RLlib
Papers
This section contains papers focused on Ray (e.g., whitepapers for Ray-based libraries, research on Ray). Papers implemented in Ray are listed in the Models and Projects section.
Foundational Papers
Tutorials and Blog Posts
2024-2025
Earlier Resources
Books
- Learning Ray - Flexible Distributed Python for Machine Learning
Course
Cheatsheet
Tags: ai, distribution
Last modified 20 January 2026