GPU Cloud

The AI developer cloud, on demand.

Pods for building. Clusters for scaling. Reserved capacity for shipping. B300, B200, H200, H100, A100, and L40S in 12 regions, billed by the second, with NVLink on every SXM and NVL node.

<90 s provisioning · 12 regions · 99.99% uptime SLA · 1 s billing increment

The lineup

Every NVIDIA class, ready to deploy

From Ada Lovelace inference at $1.49/GPU·hr to Blackwell Ultra B300 for frontier training. Every node is NVMe-backed and billed by the second, and every SXM and NVL node is NVLink-capable.

01

B300

Blackwell Ultra · 288 GB HBM3e

  • Performance: 14 PFLOPS
  • Bandwidth: 8 TB/s

From $7.00/GPU·hr · Limited stock

02

B200

Blackwell · 192 GB HBM3e

  • Performance: 10 PFLOPS
  • Bandwidth: 8 TB/s

From $5.49/GPU·hr · Limited stock

03

H200 SXM

Hopper · 141 GB HBM3e

  • Performance: 3,958 TFLOPS FP8 (with sparsity)
  • Bandwidth: 4.8 TB/s

From $3.99/GPU·hr · Limited stock

04

H100 SXM

Hopper · 80 GB HBM3

  • Performance: 3,958 TFLOPS FP8 (with sparsity)
  • Bandwidth: 3.35 TB/s

From $2.99/GPU·hr · In stock

05

H100 NVL

Hopper · 94 GB HBM3

  • Performance: 3,958 TFLOPS FP8 (with sparsity)
  • Bandwidth: 3.9 TB/s

From $2.59/GPU·hr · In stock

06

A100 SXM

Ampere · 80 GB HBM2e

  • Performance: 312 TFLOPS BF16
  • Bandwidth: 2 TB/s

From $1.89/GPU·hr · In stock

07

L40S

Ada Lovelace · 48 GB GDDR6

  • Performance: 733 TFLOPS FP8
  • Bandwidth: 864 GB/s

From $1.49/GPU·hr · In stock

Reserved capacity available · 1-month+ commitments

Talk to sales

How it works

From zero to inference in four steps

No replatforming. No lock-in. No hyperscaler tax. Bring your container, your framework, your code — we handle the rest.

01

Spin up

Pick a GPU, pick a region, pick a base image. Your pod is ready with SSH access and a public IP in under 90 seconds.

02

Build

Train, fine-tune, or batch-process. Your containers, your framework, your weights — persistent NVMe volumes follow your jobs.

03

Deploy

Push to production with templated inference endpoints, blue-green rollouts, and built-in TLS and autoscaling.

04

Scale

Single pod to thousand-GPU clusters across 12 regions. Reserve capacity for predictable load, burst on demand for spikes.

Workloads

Picked for the work you actually do

Not sure which GPU to start with? Match the workload to the silicon: start with a single node and scale to a cluster when the job demands it.

01

LLM training & pretraining

Frontier model training, full fine-tunes, and large-scale pretraining on multi-node clusters with NVLink fabric.

Recommended: NVIDIA B300 · NVIDIA B200

02

Fine-tuning & adaptation

LoRA, QLoRA, full-parameter tunes, and RLHF on 7B → 70B+ models with shared NVMe checkpoint volumes.

Recommended: NVIDIA H100 SXM · NVIDIA H100 NVL

03

Production inference

High-throughput LLM serving with vLLM, TGI, or TensorRT-LLM. Sub-100ms first-token latency at scale.

Recommended: NVIDIA L40S · NVIDIA H100 NVL · NVIDIA H200 SXM

04

Diffusion, vision & multimodal

SDXL, FLUX, SVD, vision encoders, and embedding pipelines on Ada Lovelace and Ampere-class GPUs with 48 to 80 GB of memory.

Recommended: NVIDIA L40S · NVIDIA A100 SXM

Per-second billing

Pay only for the time a GPU is up. Stop a pod and billing stops within the second.
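Here is how per-second billing works out in practice, as a minimal Python sketch that prices a pod from its runtime in seconds. It uses the "From" on-demand rates in the lineup above. It also assumes, since the page does not say, that the charge is rounded to the nearest cent at the end. `HOURLY_RATE` and `pod_cost` are hypothetical names and are not part of any platform API.

```python
from decimal import Decimal, ROUND_HALF_UP

# "From" on-demand rates in USD per GPU-hour, copied from the lineup.
# Actual rates may differ by region, availability, or commitment.
HOURLY_RATE = {
    "B300": Decimal("7.00"),
    "B200": Decimal("5.49"),
    "H200 SXM": Decimal("3.99"),
    "H100 SXM": Decimal("2.99"),
    "H100 NVL": Decimal("2.59"),
    "A100 SXM": Decimal("1.89"),
    "L40S": Decimal("1.49"),
}

SECONDS_PER_HOUR = Decimal(3600)


def pod_cost(sku: str, gpus: int, seconds: int) -> Decimal:
    """Price a pod billed per second: hourly rate x GPUs x seconds / 3600."""
    raw = HOURLY_RATE[sku] * gpus * seconds / SECONDS_PER_HOUR
    # Assumption: the charge is rounded to the nearest cent at the end.
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # An 8x H100 SXM pod stopped after 37 min 12 s (2,232 s).
    print(pod_cost("H100 SXM", 8, 37 * 60 + 12))  # 14.83
```

Take an 8× H100 SXM pod that runs for 2,232 seconds. It costs $23.92/hr × 0.62 hr ≈ $14.83, where whole-hour billing would have charged the full $23.92. The sketch uses `Decimal` rather than `float` so that per-second fractions of a rate do not pick up binary rounding error.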

NVLink + InfiniBand fabric

NVLink GPU-to-GPU bandwidth of 900 GB/s on Hopper SXM nodes and 1.8 TB/s on Blackwell, plus 3.2 Tbps east-west networking on reserved clusters.

Persistent NVMe volumes

Checkpoint, dataset, and weight volumes that survive pod restarts and follow your jobs.

Single pod to 1,000+ GPUs

Start with a single accelerator, scale into reserved multi-node clusters when you need to.

Need bigger?
Reserve a cluster.

Multi-node H200, B200, and B300 clusters with NVIDIA NVLink fabric (fifth-generation NVLink on Blackwell), dedicated capacity, and committed pricing. From a single 8-GPU node to thousand-GPU training runs, we handle the rest.

  • NVLink fabric in every node

    8× B300 SXM per node · 1.8 TB/s GPU-to-GPU bandwidth · InfiniBand or RoCE between nodes

  • Reserved pricing

    Up to 60% off on-demand · 1-mo to 3-yr commitment

  • Dedicated support

    Priority access, 24/7 coverage, and SLA-backed reliability

  • Rapid provisioning

    Clusters ready in hours, not days

  • Custom networking

    Tailored VPC, routing, and isolation to fit your architecture

  • 99.99% uptime SLA

    Enterprise-grade reliability you can build on

Ready to build?

Talk to our infrastructure team and get a custom quote.

Cluster configuration


NVIDIA B300 SXM · 8-node cluster

NVLink fabric · Redundant power · PCIe 5.0

  • GPUs: 64× NVIDIA B300
  • GPU memory: 18.4 TB (288 GB/GPU)
  • Interconnect: NVLink Switch System 5.0
  • vCPUs per node: 96
  • Networking: 800 Gbps · RDMA
  • Term: 12-month reserved

Secure · Private · Enterprise-ready

FAQ

Common questions

01

How fast can I get a GPU?

On-demand pods provision in under 90 seconds for in-stock SKUs (H100, A100, L40S). Limited-stock SKUs such as B300 and B200 typically take a few minutes. Reserved multi-node clusters are provisioned within 24 hours of the sales handoff.

02

What's included in the per-hour price?

The GPU(s), bundled vCPUs and system RAM, the container disk, a public IP, ingress and egress bandwidth (with a generous monthly allowance), and persistent NVMe storage up to a quota. Egress beyond the allowance is billed separately, as is reserved capacity.

03

Can I run multi-node training jobs?

Yes. SXM nodes are NVLink-connected within each chassis (900 GB/s GPU-to-GPU on Hopper, 1.8 TB/s on Blackwell). Reserved clusters add InfiniBand or RoCE east-west networking (up to 3.2 Tbps) across nodes, purpose-built for distributed training with FSDP, DeepSpeed, or NVIDIA's Megatron-LM stack.

04

How does reserved pricing work?

Commit to capacity for 1 month to 3 years and save up to 60% off the on-demand rate. Reserved capacity is dedicated, region-pinned, and SLA-backed. Talk to sales for a quote tailored to your training schedule.
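Whether a reservation pays off depends on how busy the GPUs stay. Here is a minimal sketch that assumes reserved capacity is paid for every hour of the term, because it is dedicated, while on-demand is paid only for the hours actually used. Under that assumption, reserved is cheaper once utilization exceeds 1 − discount. At the full 60% discount, the break-even point is 40% utilization. The function names are hypothetical, and a real quote may carry a smaller discount.

```python
def breakeven_utilization(discount: float) -> float:
    """Utilization above which reserved beats on-demand.

    Assumption: reserved costs rate * (1 - discount) for every hour of the
    term; on-demand costs rate only for the hours actually used. Reserved
    wins when rate * (1 - discount) < rate * utilization, i.e. when
    utilization > 1 - discount.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_option(on_demand_rate: float, discount: float, utilization: float) -> str:
    """Compare average cost per wall-clock hour over the term."""
    reserved = on_demand_rate * (1.0 - discount)  # billed whether busy or idle
    on_demand = on_demand_rate * utilization      # billed only while running
    return "reserved" if reserved < on_demand else "on-demand"


if __name__ == "__main__":
    # B300 at $7.00 per GPU-hour with the maximum advertised 60% discount.
    print(round(7.00 * (1 - 0.60), 2))       # 2.8
    print(breakeven_utilization(0.60))        # 0.4
    print(cheaper_option(7.00, 0.60, 0.75))   # reserved
```

At the maximum discount, the B300's $7.00/GPU·hr list rate works out to $2.80/GPU·hr reserved. A pod busy 75% of the time would cost $5.25 per wall-clock hour on demand, so the reservation wins. At 30% utilization ($2.10/hr), on-demand is still cheaper.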

05

Do you support spot pricing for interruptible jobs?

Yes. Spot pods are roughly 50–70% cheaper than on-demand and are interruptible with 60 seconds of notice. Best fit for fault-tolerant training with checkpointing, hyperparameter sweeps, and batch inference.
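The checkpointing pattern mentioned above can be sketched in framework-agnostic Python. The page does not say how the 60-second spot notice is delivered. This sketch assumes it arrives as SIGTERM to the container's main process, which is what `docker stop` sends before escalating to SIGKILL. The checkpoint is a small JSON file standing in for real model state. `train`, `save_checkpoint`, and `load_checkpoint` are hypothetical helpers. In practice the checkpoint path would sit on a persistent NVMe volume so that the next pod can resume from it.

```python
import json
import os
import signal
import tempfile

# Assumption: the spot notice arrives as SIGTERM to this process. Adjust if
# the platform delivers it another way (the notice mechanism is not documented here).
_preempted = False


def _on_sigterm(signum, frame):
    # Keep the handler trivial: record the notice and let the loop react.
    global _preempted
    _preempted = True


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file in the same directory, then atomically rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # the old checkpoint survives a failed write


def load_checkpoint(path: str) -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path: str, total_steps: int, checkpoint_every: int = 100) -> dict:
    signal.signal(signal.SIGTERM, _on_sigterm)
    state = load_checkpoint(path)  # resume where a previous pod stopped
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one real training step
        if _preempted:
            save_checkpoint(path, state)
            break  # aim to exit within the 60 s notice window
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    else:
        save_checkpoint(path, state)  # final checkpoint on normal completion
    return state
```

The signal handler only sets a flag, and the loop saves at the next step boundary. This keeps the handler trivial and avoids capturing state halfway through a step. Writing to a temporary file and swapping it in with `os.replace` means an interrupted save leaves the previous checkpoint intact.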

06

Which frameworks are pre-installed?

CUDA 12.4, PyTorch 2.x, JAX, TensorFlow, vLLM, TGI, TensorRT-LLM, and common scientific stacks ship in base images. You can also bring your own Docker image from any public or private registry.

Ready when you are

The AI developer cloud

No replatforming. No lock-in. No hyperscaler tax. Pick a GPU, pick a region, and you're running.

Request a demo