We rent A100s and H100s every single day to train models and generate AI video. Here's the honest comparison of every major GPU cloud — real prices, real cold start times, real failure modes.
By Null Agency · Updated June 19, 2026 · Based on 800+ hours of real GPU time across 8 providers
TL;DR — Just Tell Me Where to Rent
Best overall (container + persistent volume): RunPod (A100 80GB at $1.49/hr, H100 at $2.69/hr, Network Volumes that survive pod swaps)
Cheapest absolute (if you tolerate variance): Vast.ai (RTX 4090 from $0.30/hr on the consumer marketplace)
Best for serious training runs: Lambda Labs (clean InfiniBand interconnects, real datacenter SLAs)
Best serverless inference: Modal (sub-second warm starts, Python-native, pay per second)
Best managed model API: Together AI or Replicate (no infra at all, you call an API)
Best for enterprise / massive clusters: CoreWeave (H100 SXM at scale, Kubernetes-native)
Skip: AWS p4/p5 instances (roughly 5x RunPod's price for identical GPUs), GCP A2/A3 (3-4x), Azure NCv4 (same problem). Hyperscalers charge enterprise prices for the exact same NVIDIA silicon. The only reason to use them is if you're already locked into their ecosystem.
Hourly Pricing Comparison (June 2026)
All prices in USD per hour, on-demand. Spot / community / interruptible pricing is typically 30-60% lower. Prices verified June 19, 2026.
| Provider | A100 80GB | H100 80GB | RTX 4090 | Type |
| --- | --- | --- | --- | --- |
| RunPod | $1.49/hr | $2.69/hr | $0.69/hr | Container |
| Vast.ai | $0.80-1.40/hr | $1.90-2.50/hr | $0.30-0.45/hr | Marketplace |
| Lambda Labs | $1.79/hr | $2.99/hr | N/A | VM |
| Paperspace | $3.18/hr | $5.95/hr | $0.51/hr | VM / Notebook |
| CoreWeave | $2.21/hr | $4.76/hr | N/A | K8s / VM |
| Modal | $3.40/hr | $5.92/hr | N/A | Serverless |
| AWS (p4 for A100, p5.48xlarge for H100; per GPU) | $8.05/hr | $12.29/hr | N/A | VM |
| GCP (A2 Ultra for A100, A3 for H100; per GPU) | $5.06/hr | $11.06/hr | N/A | VM |
Avoiding the 3-5x markup hyperscalers charge for identical NVIDIA silicon is one of the easiest cost optimizations in AI. If you're running on AWS p4/p5 for anything other than compliance reasons, you're burning money.
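A quick way to check the markup figures against the table: divide each hyperscaler's per-GPU rate by RunPod's on-demand rate for the same card. This is a minimal sketch; the prices are copied from the June 2026 table, and the dictionary layout is only for illustration.

```python
# On-demand per-GPU prices in USD/hr, copied from the June 2026 table.
PRICES = {
    "RunPod": {"A100": 1.49, "H100": 2.69},
    "AWS": {"A100": 8.05, "H100": 12.29},
    "GCP": {"A100": 5.06, "H100": 11.06},
}


def markup(provider: str, gpu: str, baseline: str = "RunPod") -> float:
    """How many times more `provider` charges than `baseline` for the same GPU."""
    return PRICES[provider][gpu] / PRICES[baseline][gpu]


if __name__ == "__main__":
    for provider in ("AWS", "GCP"):
        for gpu in ("A100", "H100"):
            print(f"{provider} {gpu}: {markup(provider, gpu):.1f}x RunPod")
```

With these prices the multiplier works out to about 3.4-4.1x for GCP and 4.6-5.4x for AWS.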
RunPod is what we use every day at Null Agency. We run an A100 SXM 80GB at $1.49/hr in the US-KS-2 (Kansas) region with a persistent 150GB Network Volume mounted at /workspace. The volume holds our model weights (Wan 2.2 14B is ~28GB, plus a dozen LoRAs and reference embeddings) and survives pod termination — so we can shut down the pod overnight, spin it back up in 90 seconds, and pick up exactly where we left off without re-downloading anything.
What makes RunPod the winner:
Network Volumes that survive pod swaps. This is the killer feature. Most providers either lock storage to a specific VM (Lambda, Paperspace) or make persistent storage a painful separate product (AWS EBS). RunPod's Network Volumes are first-class — they live in a region, you attach them to whatever GPU pod you spin up, and you can swap from A100 to H100 to 4090 in under a minute without touching your data.
Pricing transparency. Listed price is the price. No egress fees on most workloads, no surprise NAT gateway charges, no per-API-call billing.
Two clouds in one. Secure Cloud (datacenter-grade, T3-rated facilities, used for production) and Community Cloud (lower prices, individual hosts vetted by RunPod, used for experimentation). You pick per pod.
Templates and one-click setups. ComfyUI, A1111, Ollama, Jupyter, vLLM, sglang — all have official templates. Spin up an A100 with ComfyUI pre-installed in under 2 minutes.
Serverless option. RunPod Serverless competes with Modal: pay per second of inference time, with sub-3-second starts when a warm worker is available.
Real cold start times we measured: 30-90 seconds when the container image is already cached on the host node, 3-6 minutes for a first-time image pull on a fresh host. Network Volume mount happens in <5 seconds.
Tradeoffs: Community Cloud GPUs occasionally get reclaimed (rare, but happens) — for critical jobs, use Secure Cloud. Limited region availability outside US/EU (no South America, limited Asia). Network Volumes are region-locked — you can't move a volume from US-KS-2 to EU-RO-1 without manually copying.
Vast.ai is a peer-to-peer marketplace — hosts are individuals or small datacenters renting out their idle GPUs. The result is the lowest prices in the industry by a wide margin. We've rented RTX 4090s for $0.30/hr that match the performance of a $0.69/hr 4090 on RunPod. For batch jobs you can babysit, the savings are real. If you're trying to decide which of the two to standardize on, see our dedicated RunPod vs Vast.ai comparison — it covers persistent storage, reliability, cold-start, and bandwidth tradeoffs in depth.
What works:
Search filters let you pick by DLPerf score, host reliability rating, internet bandwidth, disk speed.
Interruptible (spot) pricing is often 40-60% below on-demand.
Massive consumer GPU selection — 3090, 4090, 5090, A6000 — that providers like Lambda don't carry.
"Reserved" mode locks the GPU for a longer term at a discount.
What breaks:
Host quality varies wildly. We've had instances where the "advertised" 10 Gbit network was actually 200 Mbit residential cable. Always check the DLPerf score and bandwidth test before committing.
No SLA. Hosts can take their machine offline for any reason. For production, this is a dealbreaker.
No persistent volumes that survive instance destruction — everything lives on the host's local disk.
Container security is best-effort. Don't put production secrets in a Vast.ai pod.
Best for: Hobby experimentation, model evaluation runs, batch inference jobs that can restart, anyone with a tight budget who knows their way around Linux.
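The "check before committing" advice can be scripted once you have offer data in hand. This is a minimal sketch that assumes you've already collected offers into Python objects; the `Offer` fields below are hypothetical placeholders, not Vast.ai's actual schema, and the reliability and bandwidth thresholds are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    # Hypothetical fields; map them from whatever your offer listing returns.
    host_id: str
    price_per_hr: float
    dlperf: float          # DLPerf score shown in the marketplace
    reliability: float     # host reliability rating, 0.0-1.0
    inet_down_mbps: float  # measured (not advertised) download bandwidth


def shortlist(offers, min_reliability=0.98, min_down_mbps=1000.0):
    """Drop flaky or slow hosts, then rank the rest by DLPerf per dollar."""
    keep = [
        o for o in offers
        if o.reliability >= min_reliability and o.inet_down_mbps >= min_down_mbps
    ]
    return sorted(keep, key=lambda o: o.dlperf / o.price_per_hr, reverse=True)
```

Filtering on measured bandwidth rather than the advertised figure is what catches the "10 Gbit that's really 200 Mbit" hosts described above.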
Lambda is the GPU cloud built by ML engineers for ML engineers. Their On-Demand Cloud gives you full Ubuntu VMs (not containers) with NVIDIA drivers, CUDA, PyTorch and the typical ML stack pre-installed. Where they win decisively is multi-GPU and multi-node training: their 8x H100 and 8x A100 instances come with NVLink and NVSwitch fully wired, and their 1-Click Clusters give you InfiniBand-connected nodes for distributed training without you touching a Slurm script.
What works:
Real datacenter hardware with real interconnects — if you're training a model that needs all-reduce across 8+ GPUs, this is the cheapest place with proper NVLink/NVSwitch.
Lambda Stack pre-installed. PyTorch, TensorFlow, JAX, CUDA, cuDNN all just work.
Filesystem (Lambda Cloud Storage) attached to instances, persisting across launches.
Reserved Cloud for committed capacity at significant discount.
What breaks:
On-Demand H100 availability is unreliable in 2026. Their reservation queue is real.
Slower to spin up than container providers — figure 60-120 seconds for VM boot vs 30s for a container.
No consumer GPUs (no 4090). If you don't need datacenter cards, you're overpaying.
Best for: Serious training runs that need 8x or 16x GPU with full NVLink, teams that prefer SSH-into-Ubuntu over container workflows, anyone training foundation models.
📓 Paperspace by DigitalOcean — Best Notebooks for Beginners
A100: $3.18/hr · H100: $5.95/hr · Jupyter-first · Expensive vs RunPod
Paperspace Gradient is the friendliest entry point into GPU computing if you've never SSH'd into a Linux box and don't want to learn today. Their Notebooks product is a Jupyter environment with a GPU one click away — no Docker, no terminal, no Network Volume configuration. It's the closest thing to "Google Colab but with proper hardware and persistence."
What works:
One-click Jupyter notebooks with persistent storage built in.
DigitalOcean ownership means billing, support, and enterprise procurement are clean.
Gradient Deployments turn a notebook into a hosted inference endpoint.
Free tier with M4000 GPU for getting started.
What breaks:
Hourly rates are 2x+ what RunPod charges for identical hardware.
Cold starts on Notebooks are slow — 2-4 minutes typical.
Core (VM-based) machines have legitimately confusing pricing tiers.
Best for: Beginners, education, students, teams that want a Jupyter-centric workflow without the ops overhead.
H100 SXM at scale · Kubernetes-native · Enterprise SLA · Not for indies
CoreWeave is what OpenAI, Mistral, and a dozen other foundation model labs use to train their models. They run massive H100 SXM5 and B200 clusters with InfiniBand fabric, Kubernetes orchestration, and the kind of SLAs that involve actual humans answering the phone. If you're training a 70B+ parameter model and need 256x H100 nodes for two months, CoreWeave is the only realistic answer outside of hyperscalers.
What works:
Industrial-scale capacity. They have GPUs when nobody else does.
Kubernetes-native (KubeVirt for VMs, plain K8s for containers) — clean for ML platform teams.
Storage tiers (Tier 1 NVMe scratch, Tier 2 distributed, S3-compatible object) designed for ML workloads.
Real enterprise contracts, real procurement, real SLAs.
What breaks:
Not designed for indies or small teams. Onboarding involves a sales call.
No published pricing for many SKUs — you negotiate.
Minimum commitments on the better deals.
Best for: Foundation model labs, well-funded startups, enterprise ML platform teams, anyone training models with 50+ GPU sustained.
Pay per second · A100: $3.40/hr · Sub-second warm starts · Premium pricing
Modal is the cleanest serverless GPU experience in 2026. You write Python with decorators, Modal builds a container, deploys it across a fleet of GPUs, and bills you per second of actual compute. There is no pod to manage, no idle bill, no Kubernetes. For inference APIs that don't have constant traffic, this is the right architecture.
What works:
Per-second billing. If your function runs 800ms, you pay for 800ms.
Cold starts are genuinely fast — 5-15 seconds for warm-ish functions, sub-second when a worker is already running.
Pythonic API. @app.function(gpu="A100") and you're done.
Built-in volumes, secrets, scheduled functions, web endpoints.
Concurrency primitives that actually work — fan-out 1000 jobs in 3 lines of code.
What breaks:
Effective hourly rate is 2x+ RunPod for the same GPU. You pay for the serverless convenience.
Constant-load workloads are wildly more expensive than on-demand pods.
Cold start under heavy concurrency is still 5-15s — too slow for some latency-sensitive APIs.
Best for: Inference APIs with spiky or low traffic, batch processing pipelines, scheduled training jobs, anyone allergic to ops.
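Whether per-second billing beats an always-on pod comes down to utilization. This is a back-of-the-envelope sketch using the table's A100 rates (Modal $3.40/hr, RunPod $1.49/hr). The 730 hours per month is an assumption, and the model ignores cold-start and keep-warm time, so treat the threshold as approximate.

```python
MODAL_A100 = 3.40      # USD/hr, effective serverless rate from the pricing table
RUNPOD_A100 = 1.49     # USD/hr, on-demand pod rate from the pricing table
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def serverless_monthly(utilization: float, rate: float = MODAL_A100) -> float:
    """Serverless bills only busy time: utilization x wall-clock hours x rate."""
    return utilization * HOURS_PER_MONTH * rate


def pod_monthly(rate: float = RUNPOD_A100) -> float:
    """An always-on pod bills every wall-clock hour, busy or idle."""
    return HOURS_PER_MONTH * rate


def breakeven_utilization(serverless_rate: float = MODAL_A100,
                          pod_rate: float = RUNPOD_A100) -> float:
    """Busy fraction above which the always-on pod becomes the cheaper option."""
    return pod_rate / serverless_rate


print(f"Pod wins above {breakeven_utilization():.0%} utilization")
```

At these rates the crossover sits around 44% busy time. Below that, per-second billing wins; a steadily loaded endpoint is where the "wildly more expensive" warning applies.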
Together AI runs a managed inference platform with 200+ open-source models pre-deployed. You don't rent a GPU — you call an OpenAI-compatible API and pay per token. For LLM inference at production scale, the per-token price is often cheaper than running your own pod, and you get auto-scaling, multi-region, and zero ops.
What works:
Llama, Mixtral, Qwen, DeepSeek, Wan, Flux, every major OSS model deployed and ready.
OpenAI-compatible API — swap base URL and key, your existing code works.
Per-token pricing that's competitive with OpenAI / Anthropic for equivalent-quality models.
Fine-tuning service for adapting models to your data.
Dedicated endpoints for high-traffic production.
What breaks:
You're locked to their model catalog. Want to run a niche fine-tune? Bring your own GPU.
Content moderation on some endpoints — not a problem for most use cases but worth knowing.
No control over batching or quantization choices.
Best for: Production LLM apps that want OpenAI-style simplicity with OSS models, RAG pipelines, anyone who'd rather pay per token than manage GPUs.
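To see when per-token pricing beats renting your own pod, compute the sustained throughput a pod would need to match it. This is a sketch; the $0.90 per million tokens figure is a hypothetical placeholder, not a quoted Together AI price, so substitute the current rate for the model you use.

```python
HYPOTHETICAL_PRICE_PER_M = 0.90  # placeholder USD per 1M tokens; check the real price sheet
RUNPOD_A100 = 1.49               # USD/hr, from the pricing table


def breakeven_tokens_per_sec(pod_rate_per_hr: float, price_per_million: float) -> float:
    """Average tokens/sec a pod must serve, around the clock, to match per-token billing."""
    tokens_per_hour = pod_rate_per_hr / price_per_million * 1_000_000
    return tokens_per_hour / 3600


rate = breakeven_tokens_per_sec(RUNPOD_A100, HYPOTHETICAL_PRICE_PER_M)
print(f"Pod must average {rate:.0f} tokens/s to break even")
```

The figure is an average over every billed hour, so idle time on the pod pushes the real requirement higher and favors the managed API.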
Replicate is the App Store of AI models. Anyone can publish a model packaged with their Cog format, and anyone can run it via API or web UI. The catalog is unmatched — Flux, SDXL, Wan, every image/video/audio model the community has packaged, plus weird niche models you won't find anywhere else. You pay per second of GPU time, no idle billing.
What works:
Massive catalog of community-published models, many with no equivalent elsewhere.
Cog is a clean way to package your own model and host it.
Public model pages are great for prototyping — try a model in the browser before you write code.
Per-second billing on most public models.
What breaks:
Cold starts on infrequently-used public models can be 30-120 seconds — model weights load fresh.
Effective hourly rate is high if you have steady traffic. Move to Modal or RunPod Serverless past a threshold.
Limited control over hardware — you take what the model author chose.
Best for: Prototyping with exotic models, generative apps that need a long tail of model options, hobby projects.
Decision Framework: On-Demand vs Spot vs Serverless
The hardest decision in renting GPUs in 2026 isn't which provider — it's which billing model. We've made this choice for hundreds of jobs. Here's the framework we use:
Use On-Demand Pods (RunPod, Lambda, Paperspace) when:
You're running a long job (4+ hours) that can't tolerate interruption.
Your model weights are 20GB+ and re-downloading them would dominate cost.
You need persistent storage attached.
You're training, not inferring.
Use Spot / Community / Interruptible (Vast.ai, RunPod Community) when:
Your job is restartable from a checkpoint.
You're running batch inference where individual failures are fine.
You're doing hyperparameter sweeps where a few lost runs don't matter.
Budget is the dominant constraint and time is flexible.
Use Serverless (Modal, RunPod Serverless, Replicate) when:
You're building an inference API with spiky or unpredictable traffic.
You'd rather pay 2x more per GPU-second to never pay for idle time.
Your job is short (seconds to a few minutes) and called via API.
You want zero ops overhead.
Use Managed API (Together AI, Replicate public models) when:
The model you need is in their catalog.
You'd rather call an HTTP endpoint than think about containers.
Your usage is small enough that per-token pricing wins.
You don't need fine-grained control over batching, quantization, or sampling.
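One way to encode the checklist above as a function. This is a sketch: the `Job` fields are hypothetical, and the rule order (managed API first, on-demand as the fallback) is a judgment call rather than a hard rule.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical workload description; fields mirror the checklist above.
    restartable: bool              # can resume from a checkpoint
    is_inference_api: bool         # invoked per request via an API
    traffic_spiky: bool            # idle much of the time
    model_in_catalog: bool         # already hosted by a managed API provider
    needs_low_level_control: bool  # batching, quantization, sampling


def billing_model(job: Job) -> str:
    """Return the first billing model whose conditions the job satisfies."""
    if job.is_inference_api and job.model_in_catalog and not job.needs_low_level_control:
        return "managed-api"
    if job.is_inference_api and job.traffic_spiky:
        return "serverless"
    if job.restartable:
        return "spot"
    # Long, non-restartable or storage-heavy work falls through to on-demand.
    return "on-demand"
```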
Real example from our stack:
For our daily Wan 2.2 video generation: RunPod on-demand A100 SXM 80GB at $1.49/hr with a 150GB Network Volume in US-KS-2. We boot the pod when we start work, generate clips interactively in ComfyUI, then shut down. Average run: 3 hours, $4.47. The Network Volume sits at $0.10/GB/month — $15/month for 150GB — so even when the pod is off, weights persist. If we needed inference scaling, we'd flip to RunPod Serverless with the same model. If we were experimenting with a 1.3B model, we'd drop to a 4090 at $0.69/hr.
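The per-session math above extends into a quick monthly estimate. This is a sketch; 20 sessions per month is an assumption (one per working day), and the volume rate is the $0.10/GB-month quoted here, so check current pricing before relying on it.

```python
POD_RATE = 1.49          # USD/hr, RunPod A100 SXM 80GB on-demand
HOURS_PER_SESSION = 3    # average run length quoted above
VOLUME_GB = 150
VOLUME_RATE = 0.10       # USD per GB-month, as quoted above
SESSIONS_PER_MONTH = 20  # assumption: one session per working day


def session_cost() -> float:
    return round(HOURS_PER_SESSION * POD_RATE, 2)


def monthly_cost(sessions: int = SESSIONS_PER_MONTH) -> float:
    """Compute for every session plus the always-on Network Volume."""
    compute = sessions * HOURS_PER_SESSION * POD_RATE
    storage = VOLUME_GB * VOLUME_RATE
    return round(compute + storage, 2)
```

Under those assumptions the stack runs about $104 a month, of which $15 is the volume that keeps the weights around while the pod is off.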
Cold Start Times We Actually Measured
Cold start is the single most-lied-about metric in GPU cloud marketing. Here's what we measured running the same job (load Wan 2.2 14B, generate one 5-second 720p clip) cold across providers in June 2026:
| Provider | Pod boot | Image pull (fresh) | Model load | Total until ready to generate |
| --- | --- | --- | --- | --- |
| RunPod (cached) | 30s | 0s | 45s (from volume) | ~90s |
| RunPod (fresh) | 30s | 3-5 min | 2-4 min (download) | 5-10 min |
| Vast.ai | 45s | 2-6 min | 2-5 min | 5-12 min |
| Lambda Labs | 60-120s | N/A (VM image) | 2-4 min | 3-7 min |
| Paperspace Notebook | 2-4 min | N/A | 2-4 min | 4-8 min |
| Modal (warm) | 0s | 0s | cached | <3s |
| Modal (cold) | 5-15s | cached | 10-30s | 15-45s |
| Replicate public | varies | varies | 30-120s | 30s-2 min |
The big takeaway: persistent storage destroys cold start times. A 150GB Network Volume on RunPod with our model weights pre-loaded cuts our cold start from 5-10 minutes to under 90 seconds. If you're going to use any container provider regularly, configure persistent volumes from day one.
Persistent Storage: The Feature Nobody Talks About
Every "best GPU rental" article on the internet focuses on hourly rates. Real users care about storage. When you're working with 50GB model weights, gigabyte LoRAs, and reference datasets that took an hour to assemble, blowing them away every time you terminate a pod is unacceptable.
Container Volume vs Network Volume (RunPod terminology, but universal)
Container Volume (or "instance disk"): Lives with the pod. Fast (local NVMe). Deleted when pod is terminated. Use for: working scratch space, checkpoints you'll export.
Network Volume (or "persistent disk"): Lives separately, attaches to any pod in the region. Slightly slower (still very fast on modern NVMe-over-fabric). Survives forever until you delete it. Use for: model weights, datasets, anything you want to keep.
How providers handle persistent storage in 2026:
RunPod: First-class Network Volumes. Region-locked. $0.10/GB/mo. Best in class.
Vast.ai: No real persistent option. Storage is on the host's local disk and dies with the instance. Workaround: rsync to cloud storage between runs.
Paperspace: Persistent storage built into Gradient Notebooks by default. Confusing pricing.
CoreWeave: Multi-tier storage (Tier 1 NVMe scratch, Tier 2 distributed). Designed for ML.
Modal: modal.Volume and modal.NetworkFileSystem both work well and mount into serverless functions.
Our setup: 150GB RunPod Network Volume in US-KS-2. Costs $15/month. Holds Wan 2.2 14B (~28GB), Wan 2.1 (~17GB), Hunyuan Video (~25GB), 12 LoRAs (~6GB), Flux dev (~24GB), and our reference image library. When we spin up a new pod, the volume mounts in <5 seconds and everything is already there. We've saved an estimated 40+ hours of download time over the past 6 months versus re-pulling models per session.
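Sizing a volume like this is simple addition plus some headroom. This is a sketch; the model sizes are the approximate figures listed above, and the 20% headroom is an assumed allowance for the reference image library (not sized here) and future downloads.

```python
# Approximate sizes in GB from the setup described above.
MODELS_GB = {
    "Wan 2.2 14B": 28,
    "Wan 2.1": 17,
    "Hunyuan Video": 25,
    "LoRAs (12)": 6,
    "Flux dev": 24,
}


def plan_volume(models_gb: dict, volume_gb: int, headroom: float = 0.2,
                rate_per_gb_month: float = 0.10):
    """Return (used_gb, fits_with_headroom, monthly_cost_usd) for a network volume."""
    used = sum(models_gb.values())
    fits = used * (1 + headroom) <= volume_gb
    return used, fits, round(volume_gb * rate_per_gb_month, 2)


used, fits, cost = plan_volume(MODELS_GB, 150)
print(f"{used} GB of weights, fits with headroom: {fits}, ${cost}/month")
```

The listed weights come to about 100GB, which leaves room for the image library inside 150GB; a 110GB volume would already fail the headroom check.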
Region Availability: Where the GPUs Actually Are
One thing the marketing pages don't tell you: GPU availability varies enormously by region. Here's what we've observed in mid-2026:
US-Central (Kansas, Iowa, Texas): RunPod's deepest A100 / H100 stock is here. We've never been unable to spin up an A100 SXM in US-KS-2.
US-East (Virginia, Carolinas): Lambda Labs' main region. Capacity-constrained on H100s — reservations recommended.
US-West (California, Oregon): CoreWeave's main footprint. Higher prices, more capacity.
EU-Romania (RO-1) & EU-Czech: Cheapest reliable EU regions for RunPod. Often a few cents/hour less than US.
EU-Iceland (Lambda): Hydropower means lower prices, but ~150ms latency from US East Coast.
Asia-Pacific: Generally thin availability across all providers. Higher prices, longer wait queues. Vast.ai is your best bet if you must be APAC.
For latency-sensitive inference workloads, picking a region with both GPU capacity and proximity to your users is harder than it should be in 2026. Most teams compromise by using a managed serverless inference provider (Together AI, Modal) and letting them handle global distribution.
Networking: Egress Fees, NAT, and the Hidden Bill
This is where AWS / GCP / Azure absolutely destroy your wallet. If your training jobs pull large datasets through a NAT gateway or across regions, or your inference serves large outputs (video, audio, images), data transfer is a real line item.
RunPod: No egress fees on most plans. You can pull a 1TB dataset down and ship out generated video without surprise charges.
Vast.ai: Host-dependent. Most hosts don't charge egress, but bandwidth is residential-grade on some.
Lambda Labs: No standard egress fees.
Paperspace: Generous free egress.
Modal / Together / Replicate: Egress baked into per-second / per-token pricing.
CoreWeave: Negotiated per contract.
AWS: inbound data is free, but data out to the internet runs about $0.09/GB. Moving a 100GB dataset out costs $9; serving 10TB of generated video costs $900.
GCP: Similar to AWS.
The egress fee gap is a primary reason hyperscalers can't compete on AI workloads. RunPod letting you pull 200GB of model weights without a meter is genuinely unmatched.
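The egress numbers above are a single multiplication, sketched here with the flat $0.09/GB rate quoted for AWS. This is an approximation: it ignores AWS's volume tiers and free monthly allowance, and uses decimal terabytes to match the $900 figure.

```python
AWS_EGRESS_PER_GB = 0.09  # flat internet-egress rate quoted above (AWS actually tiers it)


def egress_cost(gb_out: float, rate_per_gb: float = AWS_EGRESS_PER_GB) -> float:
    """Data-transfer-out charge for `gb_out` gigabytes at a flat per-GB rate."""
    return round(gb_out * rate_per_gb, 2)


monthly_video_gb = 10 * 1000  # 10 TB of generated video, decimal units
aws_bill = egress_cost(monthly_video_gb)
runpod_bill = egress_cost(monthly_video_gb, rate_per_gb=0.0)  # no egress fee, per the list above
print(f"AWS: ${aws_bill:,.2f}  RunPod: ${runpod_bill:,.2f}")
```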
GPU Selection: Which Card Do You Actually Need?
The GPU rental conversation usually jumps straight to providers without asking the obvious question: which GPU does your workload even need? Renting an H100 to run Stable Diffusion 1.5 is throwing money in a fire. Here's the practical breakdown of what each GPU is good for in 2026.
RTX 4090 / RTX 5090 (24-32GB VRAM)
Consumer cards are dramatically underrated for AI workloads. A 4090 has 24GB VRAM and roughly 80% of an A100's FP16 throughput at a fraction of the price. We use 4090s for:
Stable Diffusion / Flux image generation (any model under 24GB VRAM fits cleanly).
LoRA training for image and small video models — 4-8 hour runs that don't need data-center HBM.
Wan 2.2 1.3B "Lite" video generation.
LLM inference for models up to ~13B parameters (Llama 3 8B and Mistral 7B fit at FP16; 13B-class models need 8-bit quantization to fit in 24GB).
The 5090 (32GB VRAM, available late 2025) extends this envelope upward — you can fit Wan 2.2 14B with aggressive quantization, run 24B-class LLMs comfortably, and train somewhat larger LoRAs. If you're running consumer-grade workloads, skipping straight to a data-center GPU is overpaying.
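Most "does it fit" questions reduce to parameters times bytes per parameter. This is a rough sketch; the 20% overhead factor for activations, KV cache, and CUDA context is an assumed rule of thumb, and real usage grows with context length, batch size, and (for video models) extra components such as text encoders.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights alone: parameters x bytes per parameter (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: weights plus an assumed ~20% overhead must fit in VRAM."""
    return weights_gb(params_billion, precision) * overhead <= vram_gb


print(fits(8, "fp16", 24), fits(13, "fp16", 24), fits(13, "int8", 24))
```

By this estimate an 8B model fits a 4090 at FP16 while a 13B model needs 8-bit weights, and a 14B model overflows a 32GB 5090 at FP16 but fits comfortably once quantized.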
A100 80GB (PCIe and SXM, $1.49-1.79/hr)
The workhorse. Released in 2020, still the most cost-effective data-center card in 2026 for most workloads. 80GB HBM2e lets you run almost any open-source model without quantization, and the FP16 performance is plenty for everything short of foundation-model training. We use A100 SXM 80GB for:
Wan 2.2 14B video generation at full precision.
Hunyuan Video at full precision.
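LLM inference for 70B-class models (Llama 3 70B, Qwen 2.5 72B) with 8-bit weights. The A100 has no native FP8 compute, so this means INT8 or weight-only FP8 quantization.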
Fine-tuning 7B-13B models without sharding.
The SXM variant is meaningfully faster than the PCIe variant for memory-bandwidth-bound workloads (most diffusion models) — when RunPod has SXM stock, take it.
H100 80GB (PCIe and SXM5, $2.69-4.76/hr)
The current performance king for most rental workloads. ~2-3x faster than A100 on FP16 transformer workloads, FP8 support that A100 lacks, and 80GB HBM3 with significantly higher bandwidth. Use H100 when:
Training large models where wall-clock time matters more than per-hour cost.
Running FP8 inference (Llama 3 70B at FP8 fits cleanly with high throughput).
Running video gen at scale (generating 100+ clips per day where the time savings beat the per-hour premium).
At $2.69/hr on RunPod, an H100 that finishes a job 2x faster than an A100 ($1.49/hr) wins on total cost. We routinely swap our Network Volume from an A100 pod to an H100 pod when batching nighttime video generation runs.
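The A100-vs-H100 call comes down to one ratio: the pricier card wins once its speedup exceeds the price ratio. This is a sketch using the RunPod on-demand rates above; the speedups in the example are illustrative inputs, not measurements.

```python
A100, H100 = 1.49, 2.69  # RunPod on-demand, USD/hr


def breakeven_speedup(fast_rate: float, slow_rate: float) -> float:
    """Speedup the pricier GPU needs before it costs less per job."""
    return fast_rate / slow_rate


def job_cost(hours: float, rate: float) -> float:
    return hours * rate


a100_job = job_cost(1.0, A100)        # baseline: a 1-hour A100 job
h100_2x = job_cost(1.0 / 2.0, H100)   # same job at an assumed 2x speedup
h100_15x = job_cost(1.0 / 1.5, H100)  # same job at an assumed 1.5x speedup
print(f"break-even: {breakeven_speedup(H100, A100):.2f}x")
```

At these prices the H100 needs to be about 1.8x faster. A 2x speedup saves roughly 10% per job, while a 1.5x speedup costs about 20% more.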
H200 / B100 / B200 (premium, $4-10/hr)
Hopper-refresh and Blackwell cards are arriving on rental clouds in 2026. They're worth renting only if your job is bottlenecked on memory bandwidth (some long-context LLM inference) or you're training a model where time-to-result has six-figure business value. For most users, H100 is the better price/performance choice.
L4 / L40 / L40S (24-48GB VRAM, $0.80-2.00/hr)
The Ada-architecture data-center cards are excellent for inference workloads. L40S in particular has 48GB VRAM and FP8 support, which makes it great for batched image generation pipelines and for 70B LLMs when quantized to 4-bit or split across two cards. Underrated, and often available when A100s aren't.
RTX A6000 / A6000 Ada (48GB VRAM, $0.70-1.50/hr)
Workstation cards that sneak onto rental marketplaces, especially Vast.ai. 48GB VRAM at consumer-card pricing is genuinely useful for fitting models that don't quite fit on 24GB without quantization compromises.
Real-World Cost Examples
Theoretical hourly rates are useless without context. Here are five real workloads we run, with actual costs:
Workload 1: Daily Wan 2.2 video generation (our actual stack)
Same on CoreWeave with negotiated discount: ~$8,000-9,000
Same on AWS p5.48xlarge: ~$25,000+
Common Mistakes That Will Burn Your Budget
After 800+ hours of GPU rental, here are the mistakes we see (and have made) most often:
1. Forgetting to terminate idle pods
The single most expensive habit. You spin up an A100 to debug something, get pulled into a meeting, and 8 hours later realize you've burned $12. Set a calendar reminder. Better: configure auto-shutdown if your provider supports it. RunPod has pod timers; use them.
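If your provider has no built-in timer, a small watchdog running on the pod can do the same job. This is a sketch: the `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` query is standard, but the 5% idle threshold and 30-minute limit are assumptions, and `stop_pod` is a hypothetical callback you supply (for example, one that calls your provider's stop command or API).

```python
from __future__ import annotations

import subprocess
import time

IDLE_THRESHOLD_PCT = 5  # assumption: below this, a GPU counts as idle
IDLE_LIMIT_MIN = 30     # assumption: stop after 30 idle minutes
POLL_SEC = 60


def parse_utilization(csv_text: str) -> list[int]:
    """Parse nvidia-smi CSV output (one integer percentage per GPU, per line)."""
    return [int(tok) for tok in csv_text.split()]


def gpu_utilization() -> list[int]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


def is_idle(samples: list[int], threshold: int = IDLE_THRESHOLD_PCT) -> bool:
    """Idle only if we got readings and every GPU is under the threshold."""
    return bool(samples) and max(samples) < threshold


def watch(stop_pod, poll_sec: int = POLL_SEC, limit_min: int = IDLE_LIMIT_MIN) -> None:
    """Call the hypothetical `stop_pod()` once all GPUs stay idle for `limit_min` minutes."""
    idle_since = None
    while True:
        if is_idle(gpu_utilization()):
            if idle_since is None:
                idle_since = time.monotonic()
            if time.monotonic() - idle_since >= limit_min * 60:
                stop_pod()
                return
        else:
            idle_since = None  # any activity resets the idle clock
        time.sleep(poll_sec)
```

Keep the limit generous for interactive work, since a GPU sits near 0% while you're tweaking a workflow between generations.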
2. Re-downloading model weights every session
Wan 2.2 14B is 28GB. Pulling it fresh at 200 MB/s takes about 2 minutes 20 seconds, and because the GPU bills the whole time, that's roughly $0.06 per restart at $1.49/hr. Doesn't sound like much. Multiply by 5 model swaps per day, 20 days a month, and you've spent about $6 of GPU time plus nearly 4 hours watching progress bars. A persistent volume costs $15/month; on GPU dollars alone that's more than the $6 it saves, but getting back nearly 4 hours a month of waiting makes it an easy call.
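The re-download arithmetic, spelled out so you can plug in your own model size, bandwidth, and swap frequency. This is a sketch; the 200 MB/s link speed and 5 swaps per day over 20 days are the example figures from the paragraph above.

```python
from __future__ import annotations


def redownload_cost(size_gb: float, mb_per_s: float, gpu_rate_per_hr: float) -> tuple[float, float]:
    """Seconds spent and GPU dollars burned pulling the weights once."""
    seconds = size_gb * 1000 / mb_per_s
    return seconds, seconds / 3600 * gpu_rate_per_hr


secs, dollars = redownload_cost(28, 200, 1.49)  # Wan 2.2 14B over a 200 MB/s link
swaps_per_month = 5 * 20                         # 5 swaps/day, 20 days/month
wasted_dollars = dollars * swaps_per_month
wasted_hours = secs * swaps_per_month / 3600
print(f"{secs:.0f}s and ${dollars:.2f} per pull; ${wasted_dollars:.2f} and {wasted_hours:.1f}h per month")
```

Both numbers scale inversely with bandwidth, so a slower link makes the volume pay off in GPU dollars too.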
3. Running production on Community Cloud / Vast.ai
Cheap until the host yanks the machine in the middle of your training run. For anything mission-critical, pay the Secure Cloud premium. The 30% price difference is cheaper than one lost training run.
4. Picking GPU by price alone, ignoring memory bandwidth
Diffusion models and LLM inference are memory-bandwidth-bound, not compute-bound. An H100 SXM5 has roughly 1.7x the memory bandwidth of an A100 80GB PCIe (3.35 TB/s vs about 1.9 TB/s), on top of much higher tensor throughput and FP8 support. For the right workload the H100 finishes in less than half the time, making the higher per-hour rate cheaper overall.
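A simple roofline makes the bandwidth point concrete for LLM decoding: at batch size 1, each generated token streams every weight from memory once, so bandwidth divided by weight size caps tokens per second. This is a sketch using NVIDIA's published peak bandwidth figures; real throughput lands below the cap, and batching shifts the bottleneck toward compute.

```python
# Published peak memory bandwidth in GB/s (NVIDIA datasheets).
BANDWIDTH_GBPS = {
    "A100 80GB PCIe": 1935,
    "A100 80GB SXM": 2039,
    "H100 SXM5": 3350,
}


def max_decode_tokens_per_sec(gpu: str, weights_gb: float) -> float:
    """Upper bound for batch-1 decoding: each token reads all weights once."""
    return BANDWIDTH_GBPS[gpu] / weights_gb


# An 8B-parameter model at FP16 is about 16 GB of weights.
for gpu in BANDWIDTH_GBPS:
    print(f"{gpu}: <= {max_decode_tokens_per_sec(gpu, 16):.0f} tokens/s")
```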
5. Using a hyperscaler "because that's what we already use"
If your company already has an AWS commitment, the temptation to run AI workloads on p4d/p5 is real. The roughly 5x cost premium over RunPod is also real. For most ML teams, opening a RunPod account and treating it as a vendor pays for itself in the first month.
6. Not using spot/interruptible for restartable jobs
If your training loop checkpoints every 30 minutes (it should), spot pricing cuts your bill by 40-60% with effectively zero downside. The mental model "spot is unreliable" is outdated for AI workloads — checkpoint, then take the discount.
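How much of the spot discount survives depends on how often you're interrupted and how much work each interruption throws away. This is a first-order sketch: the interruption rate and restart overhead are assumptions you supply, each interruption is assumed to redo half a checkpoint interval on average, and interruptions during the redone work are ignored.

```python
def spot_job_cost(work_hours: float, on_demand_rate: float, discount: float,
                  interruptions_per_hour: float, checkpoint_every_h: float,
                  restart_overhead_h: float) -> float:
    """Expected billed cost of a checkpointed job on interruptible capacity."""
    spot_rate = on_demand_rate * (1 - discount)
    # On average, half a checkpoint interval is lost, plus time to restart and reload.
    lost_per_interrupt = checkpoint_every_h / 2 + restart_overhead_h
    billed_hours = work_hours * (1 + interruptions_per_hour * lost_per_interrupt)
    return billed_hours * spot_rate


# 10-hour A100 job, 50% discount, ~1 interruption per 10 hours (assumed),
# 30-minute checkpoints, 6-minute restart (assumed).
good = spot_job_cost(10, 1.49, 0.5, 0.1, 0.5, 0.1)
# Same job with frequent interruptions and 4-hour checkpoints.
bad = spot_job_cost(10, 1.49, 0.5, 0.5, 4.0, 0.1)
on_demand = 10 * 1.49
```

With 30-minute checkpoints the job costs about $7.71 versus $14.90 on demand; with rare checkpoints and frequent interruptions the "discount" can end up costing more than on-demand.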
7. Buying a GPU when you should be renting
Owning a 4090 makes sense if you use it 60+ hours a month for years. For anything less, renting is cheaper, more flexible, and lets you trade up the architecture every 12 months. A 4090 costs $1,800 + ~$30/month in electricity. At $0.69/hr on RunPod, you'd need to rent for 2,600+ hours to break even, ignoring the opportunity cost of capital and the fact that you'll want an upgrade by year two.
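The buy-vs-rent break-even is purchase price divided by what each rental hour would have cost you beyond running your own card. This is a sketch that models electricity per hour of use rather than as a flat monthly figure; the 0.45 kW draw and $0.15/kWh are assumptions, and resale value and cost of capital are ignored.

```python
def breakeven_rental_hours(purchase_usd: float, rental_rate: float,
                           power_kw: float = 0.0, usd_per_kwh: float = 0.0) -> float:
    """Rental hours at which buying the card would have cost the same."""
    own_cost_per_hour = power_kw * usd_per_kwh  # electricity only while the owned card works
    if rental_rate <= own_cost_per_hour:
        raise ValueError("renting costs less than the electricity alone; buying never breaks even")
    return purchase_usd / (rental_rate - own_cost_per_hour)


hours_no_power = breakeven_rental_hours(1800, 0.69)
hours_with_power = breakeven_rental_hours(1800, 0.69, power_kw=0.45, usd_per_kwh=0.15)
years_at_60h_month = hours_no_power / 60 / 12
```

Ignoring electricity gives about 2,600 hours, which at 60 hours a month is a bit over three and a half years; counting power under these assumptions pushes it to roughly 2,900 hours.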
Why You Can Trust This Review
We're Null Agency — an AI software company. We don't just review GPU clouds, we use them every day to ship products.
What we actually run on rented GPUs:
Daily AI video generation for product demos and marketing. We run Wan 2.2 14B on an A100 SXM 80GB on RunPod in US-KS-2, with a 150GB Network Volume holding our model weights. The pod runs at $1.49/hr. We've put 800+ hours on this exact configuration in the last six months.
Stable Diffusion / Flux generation for thumbnail and image assets across our product line — PhantomEtch, Faceoff, and others.
LoRA training for character-consistent video — typically 8-12 hours per training run on an A100, sometimes 4090 for the smaller models.
LLM fine-tuning for internal tools — Qwen-based fine-tunes, run on H100 nodes when we need them.
Our methodology for this article:
Spent $2,200+ across these providers in 2026 specifically for comparison testing.
Measured cold start times on each provider with identical workloads (Wan 2.2 14B → one 5-sec clip).
Verified pricing on each provider's official page, last checked June 19, 2026.
Ran our actual production workload on each container provider for at least 20 hours to assess real-world reliability.
Affiliate links are marked. We only recommend providers we actually use and pay for. RunPod has been our daily driver for over a year and was our pick long before we joined their affiliate program.
FAQ
What is the cheapest GPU rental service in 2026?
Vast.ai consistently has the lowest hourly rates because it's a peer-to-peer marketplace; an RTX 4090 can be rented for $0.30-0.45/hr from individual hosts. For datacenter-grade reliability at a low price, RunPod offers A100 80GB at $1.49/hr and H100 at $2.69/hr on demand, roughly a fifth of AWS's per-GPU rate for the same silicon.
How long do GPU instances take to cold start?
On RunPod, container pods boot in 30-90 seconds when the image is cached on the host node, 3-6 minutes for a fresh image pull. Vast.ai is similar. Modal serverless functions cold-start in 5-15 seconds for warm-ish functions and sub-second when a worker is already running. Lambda Labs On-Demand VMs take 60-120 seconds to boot. CoreWeave Kubernetes pods can start in under 30 seconds with pre-pulled images. Loading a 28GB model from a persistent volume adds 30-60 seconds; downloading it fresh adds 2-5 minutes.
Does RunPod have persistent storage?
Yes — Network Volumes are persistent across pod restarts and attached to a specific datacenter region. Container Volumes (the pod's local disk) are ephemeral and deleted when the pod is terminated. For any serious AI workload, Network Volumes are essential. We run a 150GB Network Volume in US-KS-2 for $15/month that holds Wan 2.2, Hunyuan, Flux, and a dozen LoRAs — letting us swap GPU pods in under 90 seconds without re-downloading anything.
What's the difference between on-demand, spot, and serverless GPUs?
On-demand: you rent a full pod by the hour and pay even when it's idle. Spot/Community/Interruptible: 30-60% cheaper hourly rates, but the host can reclaim the GPU with little warning. Serverless: pay per second of actual GPU compute used, with startup latency ranging from under 3 seconds on a warm Modal worker to 15-45 seconds cold, and up to 2 minutes for rarely used Replicate public models. Use on-demand for long training runs and interactive work, spot for restartable batch jobs, serverless for inference APIs with spiky traffic.
Can I run uncensored AI models on rented GPUs?
Yes. Self-managed providers (RunPod, Vast.ai, Lambda Labs, Paperspace, CoreWeave) give you root access inside a Linux container or VM, so you can run any open-source model, including uncensored Wan 2.2, Hunyuan, fine-tunes, or your own custom training code. Managed inference platforms (Together AI, Replicate) restrict you to their curated model catalog, which generally excludes uncensored models.
What region has the best GPU availability in 2026?
US-Central (Kansas, Iowa, Texas) and US-East (Virginia, Carolinas) have the broadest availability across providers. RunPod's US-KS-2 region consistently has A100 80GB and H100 stock when other regions are sold out. EU-Romania (RO-1) is the cheapest reliable EU region. Asia-Pacific has thin availability across all providers — use Vast.ai if you must run in APAC.
Is Vast.ai safe for production workloads?
Vast.ai is great for experimentation and restartable batch jobs but risky for production. Hosts are individuals running consumer or small-datacenter hardware, uptime varies wildly between hosts, and instances can disappear with short notice. Don't put production secrets in a Vast.ai container. For production inference or training that can't fail, use RunPod Secure Cloud, Lambda Labs, or CoreWeave.
Affiliate disclosure: Some links above are partner referrals. We earn a small commission (often recurring) when you sign up through them, at no extra cost to you. We only recommend GPU providers we actually use and pay for. RunPod has been our daily driver for over a year — the affiliate relationship came after the workflow, not the other way around. Nothing in this comparison is paid placement.