Anyscale connector

Run Ray on top of your warehouse data.

Data Panda lands your operational and warehouse data in one place and ships it to an Anyscale Ray cluster. Train models, fine-tune LLMs, run batch inference and serve online endpoints, all on the data your business already generates.

About Anyscale

Managed Ray, built by the people who created it.

Anyscale was founded in 2019 by Robert Nishihara, Philipp Moritz and Ion Stoica, the team that built Ray at UC Berkeley's RISELab in 2016 and 2017. Ion Stoica, who also co-founded Databricks, chairs Anyscale today; Philipp Moritz is CTO. Keerti Melkote took over as CEO in 2024.

The product is the Anyscale Platform: a managed, multi-cloud runtime for Ray that runs on AWS, GCP, Azure, CoreWeave and Nebius. It wraps the Ray libraries (Ray Data for distributed data processing, Ray Train for model training, Ray Serve for online inference, Ray Tune for hyperparameter search and RLlib for reinforcement learning) with workspaces, jobs, services, observability, governance and priority-aware GPU scheduling. OpenAI runs Ray to coordinate training of its largest models, including ChatGPT, and Anyscale's own customer list includes Coinbase, Character.ai, Canva, Notion, Runway, Grab, Recursion, TripAdvisor, TwelveLabs, Riot Games and Physical Intelligence. The open-source Ray project has over 41,000 GitHub stars and more than 500 million downloads.

What your Anyscale data is for

What you get once Anyscale is connected.

Cluster spend joined to model output

GPU-hours and Anyscale spend per job, next to the model the job produced and the business workflow that uses it.

  • Spend per Anyscale workspace, job and service, joined to the team and the model artifact each run produced
  • GPU usage per cluster broken down by Ray Data preprocessing, Ray Train epochs and Ray Serve replicas
  • Ray Train job success rate and time-to-completion, plotted against dataset version and model size

Predictions back into the operational stack

Ray Serve scores or batch predictions written straight into the systems your team works in.

  • Lead-scoring model trained on Ray Train, served on Ray Serve, score written to HubSpot or Salesforce on every contact update
  • Churn predictions batched nightly with Ray Data and Ray Train, dropped into the CRM as a tag and into the warehouse as a column
  • Demand-forecast outputs from a Ray Tune sweep written to the ERP planning table the buyer already opens
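The nightly churn flow above can be sketched in plain Python. This is a minimal stand-in, not the production pipeline: in practice the scoring step would run as a Ray Data batch job over the real warehouse table, and the model would come from Ray Train. The scoring rule and field names (`open_tickets`, `orders_90d`, `contact_id`) are illustrative assumptions.

```python
# Stand-in for the nightly churn batch: score rows, then emit both a CRM
# tag and a warehouse column. In production the scoring would run across
# the cluster via Ray Data; churn_score here is a toy rule, not a model.

def churn_score(customer: dict) -> float:
    """Toy rule: more open tickets and no recent orders -> higher churn risk."""
    tickets = customer.get("open_tickets", 0)
    orders_90d = customer.get("orders_90d", 0)
    score = min(1.0, 0.2 * tickets + (0.5 if orders_90d == 0 else 0.0))
    return round(score, 2)

def nightly_churn_batch(rows: list, threshold: float = 0.5):
    """Return (crm_tags, warehouse_rows): one tag per at-risk contact, plus
    the input rows with a churn_score column appended."""
    crm_tags = []
    warehouse_rows = []
    for row in rows:
        score = churn_score(row)
        warehouse_rows.append({**row, "churn_score": score})
        if score >= threshold:
            crm_tags.append({"contact_id": row["contact_id"], "tag": "churn-risk"})
    return crm_tags, warehouse_rows

if __name__ == "__main__":
    rows = [
        {"contact_id": "c1", "open_tickets": 4, "orders_90d": 0},
        {"contact_id": "c2", "open_tickets": 0, "orders_90d": 3},
    ]
    tags, out = nightly_churn_batch(rows)  # c1 gets tagged, c2 does not
```

The same shape covers the demand-forecast case: swap the scoring function and write the output row to the ERP planning table instead of the CRM.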

Train and fine-tune on your own data

Pull warehouse content into a Ray cluster, fine-tune or train, write the artifact to model storage.

  • LLM fine-tuning on internal documents, support tickets and product copy, with Ray Train coordinating the GPUs
  • RLHF and post-training loops on top of an open-weight model with the customer-feedback rows from the warehouse
  • Recommendation and ranking models trained on click and order history, with Ray Tune searching the hyperparameter space
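The hyperparameter-search bullet can be made concrete with a sequential grid search in plain Python. Ray Tune's value is running these trials in parallel across the cluster; this sketch only shows the shape of the search. The search space and the `objective` function are toy assumptions standing in for a real training run.

```python
# Pure-Python stand-in for a Ray Tune sweep over a ranking model's
# hyperparameters. Ray Tune would run the trials in parallel on the
# managed cluster; objective() is a toy surrogate, not real training.
import itertools

SEARCH_SPACE = {
    "lr": [0.01, 0.1],
    "batch_size": [64, 256],
}

def objective(lr: float, batch_size: int) -> float:
    """Toy validation score standing in for a full train-and-evaluate run."""
    return 1.0 - abs(lr - 0.1) - abs(batch_size - 256) / 1000

def grid_search(space: dict) -> dict:
    """Try every combination, keep the best-scoring trial."""
    best = None
    for values in itertools.product(*space.values()):
        params = dict(zip(space.keys(), values))
        score = objective(**params)
        if best is None or score > best["score"]:
            best = {"params": params, "score": score}
    return best

best = grid_search(SEARCH_SPACE)  # best["params"] is the winning config
```

In the real setup each trial is a distributed Ray Train run, and the winning artifact is what gets written to model storage.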

Internal apps that call your own model

Custom tools that read warehouse data and call a Ray Serve endpoint instead of a public LLM API.

  • Triage assistant that classifies support cases against a model trained on your own ticket history
  • Multimodal product-tagging app that runs a vision model on Ray Serve over the catalogue images in S3
  • Internal embedding service that re-indexes warehouse documents on a schedule using Ray Data and a custom model
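A Ray Serve deployment is reachable as an ordinary HTTP endpoint, so an internal app needs nothing more than a JSON POST. The sketch below shows the triage-assistant call; the URL, payload shape and `label` response field are illustrative assumptions, and the HTTP call is injectable so the flow can be exercised without a live cluster.

```python
# Sketch of an internal triage assistant calling a Ray Serve HTTP
# endpoint instead of a public LLM API. URL and payload fields are
# illustrative; Serve endpoints accept plain JSON over HTTP.
import json
import urllib.request

SERVE_URL = "http://ray-serve.internal:8000/triage"  # illustrative URL

def classify_ticket(subject: str, body: str, post=None) -> str:
    """POST the ticket to the Serve endpoint, return the predicted label.

    `post` defaults to a stdlib HTTP call; pass a fake in tests."""
    payload = json.dumps({"subject": subject, "body": body}).encode()
    if post is None:
        def post(url, data):
            req = urllib.request.Request(
                url, data=data, headers={"Content-Type": "application/json"}
            )
            with urllib.request.urlopen(req) as resp:
                return resp.read()
    raw = post(SERVE_URL, payload)
    return json.loads(raw)["label"]

# Fake responder standing in for the live endpoint:
fake = lambda url, data: b'{"label": "billing"}'
label = classify_ticket("Invoice wrong", "Charged twice this month", post=fake)
```

The embedding service and product-tagging app follow the same pattern with a different payload and response schema.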
Use cases

Use cases we deliver with Anyscale data.

A list of concrete reports, automations and AI features we have built on Anyscale data. Pick the one that matches your situation.

Distributed model training: Ray Train spreads PyTorch or XGBoost training across GPU and CPU nodes on AWS, GCP, Azure, CoreWeave or Nebius.
LLM fine-tuning on internal data: an open-weight LLM tuned on warehouse documents, tickets and product copy via Ray Train.
Batch inference at scale: Ray Data and Ray Train score millions of warehouse rows nightly, with the output written back as a column or tag.
Online serving with Ray Serve: a trained model exposed as an autoscaling HTTP endpoint, called from internal apps and CRM workflows.
Hyperparameter search with Ray Tune: Ray Tune sweeps over learning rate, batch size and architecture choices on the same managed cluster.
RLHF and post-training loops: reinforcement-learning fine-tuning on customer-feedback rows using RLlib or community frameworks like SkyRL.
Multimodal data preparation: Ray Data pipelines for video, image, text and audio at terabyte scale before training kicks off.
Embedding generation in batch: re-embed warehouse documents on a schedule with a custom model and write the vectors to your search index.
Multi-cloud and burst capacity: the same Ray code runs on AWS, GCP, Azure, CoreWeave or Nebius without rewrites, bursting to whichever provider has GPUs available.
Cost and usage reporting: GPU-hours and spend per job joined to the model artifact and the workflow that consumes the predictions.
Real business questions

Answers you will finally get.

Which Ray jobs are spending the most, and is the model output being used?

GPU-hours and Anyscale spend per job and per workspace, joined to the model artifact each job produced and to the downstream workflow that calls the model. Surfaces the weekly fine-tune burning the bulk of the cluster budget while its model is still pinned to a stale version, next to the lighter Ray Tune sweep that produced the recommender currently running in production.
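The join behind this report is simple once the data has landed. The sketch below does it in Python over illustrative rows; in practice it is a SQL join across the landed warehouse tables, and every table and field name here is an assumption.

```python
# Sketch of the spend report: Anyscale job spend joined to the model
# artifact each run produced and to the workflow that consumes it.
# All names are illustrative; in production this is a warehouse join.

jobs = [
    {"job_id": "j1", "workspace": "ml-team", "gpu_hours": 120.0,
     "spend_usd": 480.0, "artifact": "recsys-v7"},
    {"job_id": "j2", "workspace": "ml-team", "gpu_hours": 900.0,
     "spend_usd": 3600.0, "artifact": "llm-ft-v3"},
]
consumers = {
    "recsys-v7": "homepage ranking (production)",
    # llm-ft-v3 has no entry: spend with no downstream workflow
}

def spend_report(jobs, consumers):
    """Jobs sorted by spend, each tagged with the workflow using its model."""
    report = []
    for job in sorted(jobs, key=lambda j: j["spend_usd"], reverse=True):
        report.append({
            "job_id": job["job_id"],
            "spend_usd": job["spend_usd"],
            "artifact": job["artifact"],
            "consumer": consumers.get(job["artifact"], "NONE - unused model"),
        })
    return report

report = spend_report(jobs, consumers)  # the big spender with no consumer sorts first
```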

Are our Ray Serve endpoints still earning their GPUs?

Ray Serve request volume, latency and error rate per endpoint, joined to GPU usage and cost. Catches the always-on endpoint serving five calls a day on two reserved GPUs, and the busy endpoint that is autoscaling well past its budget because every internal app added a retry loop.
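The check itself reduces to cost per call plus two thresholds. This is a sketch under assumed field names and thresholds, not the actual report logic.

```python
# Sketch of the endpoint check: flag Ray Serve endpoints that are idle
# on reserved GPUs or autoscaling past budget. Field names and the two
# thresholds are illustrative assumptions.

def endpoint_flags(endpoints, idle_calls_per_day=50, budget_usd_per_day=100.0):
    """Return {endpoint name: flag} for endpoints worth a look."""
    flags = {}
    for ep in endpoints:
        cost_per_call = ep["cost_usd_day"] / max(ep["calls_day"], 1)
        if ep["calls_day"] < idle_calls_per_day:
            flags[ep["name"]] = (
                f"idle: {ep['calls_day']} calls/day at ${cost_per_call:.2f}/call"
            )
        elif ep["cost_usd_day"] > budget_usd_per_day:
            flags[ep["name"]] = "over budget: autoscaling past daily limit"
    return flags

endpoints = [
    {"name": "legacy-scorer", "calls_day": 5, "cost_usd_day": 96.0},  # two reserved GPUs, barely called
    {"name": "triage", "calls_day": 40000, "cost_usd_day": 310.0},    # busy and over budget
]
flags = endpoint_flags(endpoints)  # both endpoints get flagged, for opposite reasons
```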

Is the data we are feeding the cluster fresh enough?

Lineage from the source system through the warehouse table to the dataset version a Ray Train job consumed, with timestamps at every hop. Catches the fine-tuning run that has been training on a feature table that stopped refreshing after a pipeline failure two weeks ago, which is why the model keeps regressing on last month's behaviour.
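With timestamps at every hop, the freshness check is a walk down the lineage chain. A minimal sketch, assuming hop names and a two-day staleness budget that are purely illustrative:

```python
# Sketch of the freshness check: walk the lineage hops from source
# system to the dataset version a Ray Train job consumed, and flag any
# hop whose last refresh is older than the allowed staleness.
from datetime import datetime, timedelta

def stale_hops(lineage, now, max_age):
    """lineage: ordered hops, each {'name': ..., 'refreshed_at': datetime}.
    Returns the names of hops older than max_age."""
    return [hop["name"] for hop in lineage if now - hop["refreshed_at"] > max_age]

now = datetime(2025, 6, 15)
lineage = [
    {"name": "crm_export", "refreshed_at": datetime(2025, 6, 14, 23, 0)},
    {"name": "warehouse.features_v2", "refreshed_at": datetime(2025, 6, 1)},  # pipeline broke
    {"name": "train_dataset_v12", "refreshed_at": datetime(2025, 6, 1)},
]
flagged = stale_hops(lineage, now, max_age=timedelta(days=2))
# the feature table and everything downstream of it get flagged
```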

Value for everyone in the organisation

Where each function gets value.

For finance leaders

GPU-hours and Anyscale spend per team, per workspace and per business unit, joined to the workflow each model serves. The AI line on the budget moves from a single Anyscale invoice to a number sitting next to the predictions consumed in CRM, ERP and the warehouse.

For operations

Ray Serve endpoint health, latency and cost per call, joined to the internal app calling the endpoint. Lets the team retire the always-on endpoint that nobody calls and rightsize the one that every retry loop is hammering.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your Anyscale data lives.

  • Power BI (Microsoft)
  • Microsoft Fabric (Microsoft)
  • Snowflake (data warehouse)
  • BigQuery (Google)
  • Tableau (visualisation)
  • Excel (sheets & pivots)
Three steps

From Anyscale to answers in three steps.

01

Connect securely

OAuth authentication. Read-only by default. We sign a DPA and your admin keeps the keys.

02

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

03

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

  • Anyscale connector configured and running
  • Warehouse set up in your cloud account
  • Clean access for your Power BI, Fabric or Tableau team
  • Documentation on what's in the data model
  • Sync monitoring so you're warned before reports break

Best fit: Teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

  • Everything in Self-serve
  • Dashboards built to the questions your team actually asks
  • Automations between your systems
  • AI workflows scoped to real tasks your team runs
  • Custom apps where a dashboard does not cut it
  • Ongoing delivery at a pace that fits your team

Best fit: Teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric, or a small Postgres instance.

What is the difference between Anyscale and the open-source Ray project?

Ray is the open-source distributed compute framework: Python APIs plus the libraries Ray Data, Ray Train, Ray Serve, Ray Tune and RLlib. You can run Ray yourself on your own Kubernetes or VMs. Anyscale is the company behind Ray and runs the managed Anyscale Platform: a multi-cloud runtime for Ray with workspaces, jobs, services, observability, governance and priority-aware GPU scheduling on AWS, GCP, Azure, CoreWeave and Nebius. Same Ray code, less cluster operations work.

Which clouds does Anyscale run on?

Anyscale runs on AWS, GCP, Azure, CoreWeave and Nebius. The same Ray code is portable across them, and the platform supports multi-cloud GPU pooling so a job can burst to whichever provider has capacity. The cluster lives in your own cloud account in customer-hosted mode, which is the typical setup for teams that need data and compute to stay in their own VPC.

What workloads do teams typically run on Anyscale?

Distributed model training (Ray Train), batch inference and multimodal data preparation (Ray Data), online inference (Ray Serve), hyperparameter search (Ray Tune) and reinforcement learning including RLHF post-training (RLlib and community frameworks like SkyRL and veRL). The same cluster usually runs several of these in production, which is why workspace-level cost and usage reporting matters more than per-job timing.

GDPR-compliant
Data stays in the EU
You own the warehouse

A first deliverable live in four to six weeks.

We review your Anyscale setup and the systems around it. Together we pick the first thing worth building.