Best GPU Rental Services 2026

We rent A100s and H100s every single day to train models and generate AI video. Here's the honest comparison of every major GPU cloud — real prices, real cold start times, real failure modes.

By Null Agency · Updated June 19, 2026 · Based on 800+ hours of real GPU time across 8 providers

TL;DR — Just Tell Me Where to Rent

Skip: AWS p4/p5 instances (3-4x more expensive than RunPod for identical GPUs), GCP A2 (same problem), Azure NCv4 (same problem). Hyperscalers charge enterprise prices for the exact same NVIDIA silicon. The only reason to use them is if you're already locked into their ecosystem.

Hourly Pricing Comparison (June 2026)

All prices in USD per hour, on-demand. Spot / community / interruptible pricing is typically 30-60% lower. Prices verified June 19, 2026.

ProviderA100 80GBH100 80GBRTX 4090Type
RunPod$1.49/hr$2.69/hr$0.69/hrContainer
Vast.ai$0.80-1.40/hr$1.90-2.50/hr$0.30-0.45/hrMarketplace
Lambda Labs$1.79/hr$2.99/hrN/AVM
Paperspace$3.18/hr$5.95/hr$0.51/hrVM / Notebook
CoreWeave$2.21/hr$4.76/hrN/AK8s / VM
Modal$3.40/hr$5.92/hrN/AServerless
AWS p5.48xlarge$8.05/hr (A100)$12.29/hrN/AVM
GCP A2 Ultra$5.06/hr$11.06/hrN/AVM

The 3-4x markup hyperscalers charge for identical NVIDIA silicon is one of the easiest cost optimizations in AI. If you're running on AWS p4/p5 for anything other than compliance reasons, you're burning money.

The Deep Reviews

🏆 RunPod — Best Overall GPU Rental in 2026

A100: $1.49/hrH100: $2.69/hrNetwork VolumesContainer-native

RunPod is what we use every day at Null Agency. We run an A100 SXM 80GB at $1.49/hr in the US-KS-2 (Kansas) region with a persistent 150GB Network Volume mounted at /workspace. The volume holds our model weights (Wan 2.2 14B is ~28GB, plus a dozen LoRAs and reference embeddings) and survives pod termination — so we can shut down the pod overnight, spin it back up in 90 seconds, and pick up exactly where we left off without re-downloading anything.

What makes RunPod the winner:

Real cold start times we measured: 30-90 seconds when the container image is already cached on the host node, 3-6 minutes for a first-time image pull on a fresh host. Network Volume mount happens in <5 seconds.

Best for: AI video generation with self-hosted Wan 2.2 or Hunyuan, Stable Diffusion / Flux training, LLM fine-tuning, self-hosted XTTS-v2 voice cloning and Stable Audio Open music generation, plus running local coder models behind Continue or Cline — anyone who needs persistent model weights across pod lifecycles, indie devs who want enterprise hardware without enterprise contracts.

Tradeoffs: Community Cloud GPUs occasionally get reclaimed (rare, but happens) — for critical jobs, use Secure Cloud. Limited region availability outside US/EU (no South America, limited Asia). Network Volumes are region-locked — you can't move a volume from US-KS-2 to EU-RO-1 without manually copying.

Start on RunPod See pricing

💰 Vast.ai — Cheapest GPUs on the Internet

RTX 4090: $0.30/hrA100: $0.80/hrMarketplaceReliability varies

Vast.ai is a peer-to-peer marketplace — hosts are individuals or small datacenters renting out their idle GPUs. The result is the lowest prices in the industry by a wide margin. We've rented RTX 4090s for $0.30/hr that match the performance of a $0.69/hr 4090 on RunPod. For batch jobs you can babysit, the savings are real. If you're trying to decide which of the two to standardize on, see our dedicated RunPod vs Vast.ai comparison — it covers persistent storage, reliability, cold-start, and bandwidth tradeoffs in depth.

What works:

What breaks:

Best for: Hobby experimentation, model evaluation runs, batch inference jobs that can restart, anyone with a tight budget who knows their way around Linux.

Browse Vast.ai

🏢 Lambda Labs — Best for Multi-Node Training

A100: $1.79/hrH100: $2.99/hrInfiniBandCapacity-limited

Lambda is the GPU cloud built by ML engineers for ML engineers. Their On-Demand Cloud gives you full Ubuntu VMs (not containers) with NVIDIA drivers, CUDA, PyTorch and the typical ML stack pre-installed. Where they win decisively is multi-GPU and multi-node training: their 8x H100 and 8x A100 instances come with NVLink and NVSwitch fully wired, and their 1-Click Clusters give you InfiniBand-connected nodes for distributed training without you touching a Slurm script.

What works:

What breaks:

Best for: Serious training runs that need 8x or 16x GPU with full NVLink, teams that prefer SSH-into-Ubuntu over container workflows, anyone training foundation models.

Try Lambda Labs

📓 Paperspace by DigitalOcean — Best Notebooks for Beginners

A100: $3.18/hrH100: $5.95/hrJupyter-firstExpensive vs RunPod

Paperspace Gradient is the friendliest entry point into GPU computing if you've never SSH'd into a Linux box and don't want to learn today. Their Notebooks product is a Jupyter environment with a GPU one click away — no Docker, no terminal, no Network Volume configuration. It's the closest thing to "Google Colab but with proper hardware and persistence."

What works:

What breaks:

Best for: Beginners, education, students, teams that want a Jupyter-centric workflow without the ops overhead.

Try Paperspace

🏗️ CoreWeave — Best for Enterprise Scale

H100 SXM at scaleKubernetes-nativeEnterprise SLANot for indies

CoreWeave is what OpenAI, Mistral, and a dozen other foundation model labs use to train their models. They run massive H100 SXM5 and B200 clusters with InfiniBand fabric, Kubernetes orchestration, and the kind of SLAs that involve actual humans answering the phone. If you're training a 70B+ parameter model and need 256x H100 nodes for two months, CoreWeave is the only realistic answer outside of hyperscalers.

What works:

What breaks:

Best for: Foundation model labs, well-funded startups, enterprise ML platform teams, anyone training models with 50+ GPU sustained.

Visit CoreWeave

⚡ Modal — Best Serverless GPU Platform

Pay per secondA100: $3.40/hrSub-second warm startsPremium pricing

Modal is the cleanest serverless GPU experience in 2026. You write Python with decorators, Modal builds a container, deploys it across a fleet of GPUs, and bills you per second of actual compute. There is no pod to manage, no idle bill, no Kubernetes. For inference APIs that don't have constant traffic, this is the right architecture.

What works:

What breaks:

Best for: Inference APIs with spiky or low traffic, batch processing pipelines, scheduled training jobs, anyone allergic to ops.

Try Modal

🤖 Together AI — Best Serverless Inference API

Token-priced200+ modelsOpenAI-compatible

Together AI runs a managed inference platform with 200+ open-source models pre-deployed. You don't rent a GPU — you call an OpenAI-compatible API and pay per token. For LLM inference at production scale, the per-token price is often cheaper than running your own pod, and you get auto-scaling, multi-region, and zero ops.

What works:

What breaks:

Best for: Production LLM apps that want OpenAI-style simplicity with OSS models, RAG pipelines, anyone who'd rather pay per token than manage GPUs.

Try Together AI

🎯 Replicate — Best Model Marketplace

Per-second billing10k+ modelsCog containers

Replicate is the App Store of AI models. Anyone can publish a model packaged with their Cog format, and anyone can run it via API or web UI. The catalog is unmatched — Flux, SDXL, Wan, every image/video/audio model the community has packaged, plus weird niche models you won't find anywhere else. You pay per second of GPU time, no idle billing.

What works:

What breaks:

Best for: Prototyping with exotic models, generative apps that need a long tail of model options, hobby projects.

Browse Replicate

Decision Framework: On-Demand vs Spot vs Serverless

The hardest decision in renting GPUs in 2026 isn't which provider — it's which billing model. We've made this choice for hundreds of jobs. Here's the framework we use:

Use On-Demand Pods (RunPod, Lambda, Paperspace) when:

Use Spot / Community / Interruptible (Vast.ai, RunPod Community) when:

Use Serverless (Modal, RunPod Serverless, Replicate) when:

Use Managed API (Together AI, Replicate public models) when:

Real example from our stack:

For our daily Wan 2.2 video generation: RunPod on-demand A100 SXM 80GB at $1.49/hr with a 150GB Network Volume in US-KS-2. We boot the pod when we start work, generate clips interactively in ComfyUI, then shut down. Average run: 3 hours, $4.47. The Network Volume sits at $0.10/GB/month — $15/month for 150GB — so even when the pod is off, weights persist. If we needed inference scaling, we'd flip to RunPod Serverless with the same model. If we were experimenting with a 1.3B model, we'd drop to a 4090 at $0.69/hr.

Cold Start Times We Actually Measured

Cold start is the single most-lied-about metric in GPU cloud marketing. Here's what we measured running the same job (load Wan 2.2 14B, generate one 5-second 720p clip) cold across providers in June 2026:

ProviderPod bootImage pull (fresh)Model loadTotal to first token
RunPod (cached)30s0s45s (from volume)~90s
RunPod (fresh)30s3-5 min2-4 min (download)5-10 min
Vast.ai45s2-6 min2-5 min5-12 min
Lambda Labs60-120sN/A (VM image)2-4 min3-7 min
Paperspace Notebook2-4 minN/A2-4 min4-8 min
Modal (warm)0s0scached<3s
Modal (cold)5-15scached10-30s15-45s
Replicate publicvariesvaries30-120s30s-2 min

The big takeaway: persistent storage destroys cold start times. A 150GB Network Volume on RunPod with our model weights pre-loaded cuts our cold start from 5-10 minutes to under 90 seconds. If you're going to use any container provider regularly, configure persistent volumes from day one.

Persistent Storage: The Feature Nobody Talks About

Every "best GPU rental" article on the internet focuses on hourly rates. Real users care about storage. When you're working with 50GB model weights, gigabyte LoRAs, and reference datasets that took an hour to assemble, blowing them away every time you terminate a pod is unacceptable.

Container Volume vs Network Volume (RunPod terminology, but universal)

How providers handle persistent storage in 2026:

Our setup: 150GB RunPod Network Volume in US-KS-2. Costs $15/month. Holds Wan 2.2 14B (~28GB), Wan 2.1 (~17GB), Hunyuan Video (~25GB), 12 LoRAs (~6GB), Flux dev (~24GB), and our reference image library. When we spin up a new pod, the volume mounts in <5 seconds and everything is already there. We've saved an estimated 40+ hours of download time over the past 6 months versus re-pulling models per session.

Region Availability: Where the GPUs Actually Are

One thing the marketing pages don't tell you: GPU availability varies enormously by region. Here's what we've observed in mid-2026:

For latency-sensitive inference workloads, picking a region with both GPU capacity and proximity to your users is harder than it should be in 2026. Most teams compromise by using a managed serverless inference provider (Together AI, Modal) and letting them handle global distribution.

Networking: Egress Fees, NAT, and the Hidden Bill

This is where AWS / GCP / Azure absolutely destroy your wallet. If you're running training jobs that download large datasets, or inference that serves large outputs (video, audio, images), egress is a real line item.

The egress fee gap is a primary reason hyperscalers can't compete on AI workloads. RunPod letting you pull 200GB of model weights without a meter is genuinely unmatched.

GPU Selection: Which Card Do You Actually Need?

The GPU rental conversation usually jumps straight to providers without asking the obvious question: which GPU does your workload even need? Renting an H100 to run Stable Diffusion 1.5 is throwing money in a fire. Here's the practical breakdown of what each GPU is good for in 2026.

RTX 3090 / 4090 / 5090 (24-32GB VRAM, $0.30-1.20/hr)

Consumer cards are dramatically underrated for AI workloads. A 4090 has 24GB VRAM and roughly 80% of an A100's FP16 throughput at a fraction of the price. We use 4090s for:

The 5090 (32GB VRAM, available late 2025) extends this envelope upward — you can fit Wan 2.2 14B with aggressive quantization, run 24B-class LLMs comfortably, and train somewhat larger LoRAs. If you're running consumer-grade workloads, skipping straight to a data-center GPU is overpaying.

A100 80GB (PCIe and SXM, $1.49-1.79/hr)

The workhorse. Released in 2020, still the most cost-effective data-center card in 2026 for most workloads. 80GB HBM2e lets you run almost any open-source model without quantization, and the FP16 performance is plenty for everything short of foundation-model training. We use A100 SXM 80GB for:

The SXM variant is meaningfully faster than the PCIe variant for memory-bandwidth-bound workloads (most diffusion models) — when RunPod has SXM stock, take it.

H100 80GB (PCIe and SXM5, $2.69-4.76/hr)

The current performance king for most rental workloads. ~2-3x faster than A100 on FP16 transformer workloads, FP8 support that A100 lacks, and 80GB HBM3 with significantly higher bandwidth. Use H100 when:

At $2.69/hr on RunPod, an H100 that finishes a job 2x faster than an A100 ($1.49/hr) wins on total cost. We routinely swap our Network Volume from an A100 pod to an H100 pod when batching nighttime video generation runs.

H200 / B100 / B200 (premium, $4-10/hr)

Hopper-refresh and Blackwell cards are arriving on rental clouds in 2026. They're worth renting only if your job is bottlenecked on memory bandwidth (some long-context LLM inference) or you're training a model where time-to-result has six-figure business value. For most users, H100 is the better price/performance choice.

L4 / L40 / L40S (24-48GB VRAM, $0.80-2.00/hr)

The Ada-architecture data-center cards are excellent for inference workloads. L40S in particular has 48GB VRAM and FP8 support — great for running 70B LLMs at scale or batched image generation pipelines. Underrated, often available when A100s aren't.

RTX A6000 / A6000 Ada (48GB VRAM, $0.70-1.50/hr)

Workstation cards that sneak onto rental marketplaces, especially Vast.ai. 48GB VRAM at consumer-card pricing is genuinely useful for fitting models that don't quite fit on 24GB without quantization compromises.

Real-World Cost Examples

Theoretical hourly rates are useless without context. Here are five real workloads we run, with actual costs:

Workload 1: Daily Wan 2.2 video generation (our actual stack)

RunPod A100 SXM 80GB, US-KS-2, ~3 hours/day, 5 days/week. Plus 150GB Network Volume.

Workload 2: Stable Diffusion training run, single LoRA

Vast.ai RTX 4090, 6 hours of training, $0.35/hr interruptible.

Workload 3: Production LLM inference API, low traffic

Modal A100, ~30 minutes of actual GPU time per day across calls, $3.40/hr.

Workload 4: Foundation model evaluation across 100 prompts

Together AI Llama 3 70B at $0.88/M output tokens. 100 prompts × ~2k output tokens each = 200k tokens.

Workload 5: Train a custom 1B parameter model from scratch

Lambda Labs 8x H100 reserved cluster, 72 hours, $19.99/hr × 8 = $159.92/hr.

Common Mistakes That Will Burn Your Budget

After 800+ hours of GPU rental, here are the mistakes we see (and have made) most often:

1. Forgetting to terminate idle pods

The single most expensive habit. You spin up an A100 to debug something, get pulled into a meeting, and 8 hours later realize you've burned $12. Set a calendar reminder. Better: configure auto-shutdown if your provider supports it. RunPod has pod timers; use them.

2. Re-downloading model weights every session

Wan 2.2 14B is 28GB. Pulling it fresh every session at 200 MB/s takes ~2.5 minutes — at $1.49/hr that's $0.06 per restart. Doesn't sound like much. Multiply by 5 model swaps per day, 20 days a month: $6 wasted. Worse, you're often paying for the GPU during the download. A persistent volume costs $15/month and pays for itself in three days.

3. Running production on Community Cloud / Vast.ai

Cheap until the host yanks the machine in the middle of your training run. For anything mission-critical, pay the Secure Cloud premium. The 30% price difference is cheaper than one lost training run.

4. Picking GPU by price alone, ignoring memory bandwidth

Diffusion models and LLM inference are memory-bandwidth-bound, not compute-bound. An H100 SXM5 has nearly 3x the memory bandwidth of an A100 PCIe — for the right workload, the H100 finishes in less than half the time, making the higher per-hour rate cheaper overall.

5. Using a hyperscaler "because that's what we already use"

If your company already has an AWS commitment, the temptation to run AI workloads on p4d/p5 is real. The 3-4x cost premium over RunPod is also real. For most ML teams, opening a RunPod account and treating it as a vendor pays for itself in the first month.

6. Not using spot/interruptible for restartable jobs

If your training loop checkpoints every 30 minutes (it should), spot pricing cuts your bill by 40-60% with effectively zero downside. The mental model "spot is unreliable" is outdated for AI workloads — checkpoint, then take the discount.

7. Buying a GPU when you should be renting

Owning a 4090 makes sense if you use it 60+ hours a month for years. For anything less, renting is cheaper, more flexible, and lets you trade up the architecture every 12 months. A 4090 costs $1,800 + ~$30/month in electricity. At $0.69/hr on RunPod, you'd need to rent for 2,600+ hours to break even, ignoring the opportunity cost of capital and the fact that you'll want an upgrade by year two.

Why You Can Trust This Review

We're Null Agency — an AI software company. We don't just review GPU clouds, we use them every day to ship products.

What we actually run on rented GPUs:

Our methodology for this article:

FAQ

What is the cheapest GPU rental service in 2026?
Vast.ai consistently has the lowest hourly rates because it's a peer-to-peer marketplace — an RTX 4090 can be rented for $0.30-0.45/hr from individual hosts. For datacenter-grade reliability at a low price, RunPod Community Cloud offers A100 80GB at $1.49/hr and H100 at $2.69/hr — about a third of what AWS charges for the same silicon.
How long do GPU instances take to cold start?
On RunPod, container pods boot in 30-90 seconds when the image is cached on the host node, 3-6 minutes for a fresh image pull. Vast.ai is similar. Modal serverless functions cold-start in 5-15 seconds for warm-ish functions and sub-second when a worker is already running. Lambda Labs On-Demand VMs take 60-120 seconds to boot. CoreWeave Kubernetes pods can start in under 30 seconds with pre-pulled images. Loading a 28GB model from a persistent volume adds 30-60 seconds; downloading it fresh adds 2-5 minutes.
Does RunPod have persistent storage?
Yes — Network Volumes are persistent across pod restarts and attached to a specific datacenter region. Container Volumes (the pod's local disk) are ephemeral and deleted when the pod is terminated. For any serious AI workload, Network Volumes are essential. We run a 150GB Network Volume in US-KS-2 for $15/month that holds Wan 2.2, Hunyuan, Flux, and a dozen LoRAs — letting us swap GPU pods in under 90 seconds without re-downloading anything.
What's the difference between on-demand, spot, and serverless GPUs?
On-demand: you rent a full pod by the hour, you pay even when idle. Spot/Community/Interruptible: 30-60% cheaper hourly rates but the host can reclaim the GPU with little warning. Serverless: pay per second of actual GPU compute used, with cold-start latency in the 1-30 second range. Use on-demand for long training runs and interactive work, spot for restartable batch jobs, serverless for inference APIs with spiky traffic.
Can I run uncensored AI models on rented GPUs?
Yes. Container-based providers (RunPod, Vast.ai, Lambda Labs, Paperspace, CoreWeave) give you root access inside a Linux container — you can run any open-source model including uncensored Wan 2.2, Hunyuan, fine-tunes, or your own custom training code. Managed inference platforms (Together AI, Replicate) restrict you to their curated model catalog, which generally excludes uncensored models.
What region has the best GPU availability in 2026?
US-Central (Kansas, Iowa, Texas) and US-East (Virginia, Carolinas) have the broadest availability across providers. RunPod's US-KS-2 region consistently has A100 80GB and H100 stock when other regions are sold out. EU-Romania (RO-1) is the cheapest reliable EU region. Asia-Pacific has thin availability across all providers — use Vast.ai if you must run in APAC.
Is Vast.ai safe for production workloads?
Vast.ai is great for experimentation and restartable batch jobs but risky for production. Hosts are individuals running consumer or small-datacenter hardware, uptime varies wildly between hosts, and instances can disappear with short notice. Don't put production secrets in a Vast.ai container. For production inference or training that can't fail, use RunPod Secure Cloud, Lambda Labs, or CoreWeave.

Other Tools We've Reviewed

RunPod vs Vast.ai

Head-to-head GPU rental comparison

Best AI Video Generators

Wan 2.2, Runway, Sora, Pika, Luma

Best AI Image Generators

Midjourney, Flux, SDXL, DALL-E

Best AI Voice Cloning

ElevenLabs, Play.ht, XTTS-v2, F5

Best AI Music Generators

Suno, Udio, Stable Audio, Mubert

Best AI Coding Assistants

Claude Code, Cursor, Copilot, Cline

Best PDF Editors

PhantomEtch vs Adobe vs Smallpdf

Best Privacy Analytics

GhostMetrics vs Plausible vs Fathom

US Economic Dashboard

Live Federal Reserve data

Affiliate disclosure: Some links above are partner referrals. We earn a small commission (often recurring) when you sign up through them, at no extra cost to you. We only recommend GPU providers we actually use and pay for. RunPod has been our daily driver for over a year — the affiliate relationship came after the workflow, not the other way around. Nothing in this comparison is paid placement.