Understanding vGPU and MIG: How They Work for Multi-Tenant Environments


If you’re carving one big GPU across many teams, you’ve got two main tools: vGPU and MIG. One slices time. The other slices hardware. Use them right and you get high utilization without cross-tenant noise. Use them wrong and you’ll chase flaky performance and weird failures. Here’s a plain-English map.

Quick definitions

Let’s agree on terms before we compare trade-offs.

  • vGPU lets multiple VMs share a single physical GPU through the hypervisor. Profiles define how big each vGPU is, and a scheduler time-slices GPU engines between VMs.
  • MIG (Multi-Instance GPU) splits a supported GPU into several hardware partitions. Each partition gets dedicated SMs and its own path through L2 and memory controllers, giving stronger isolation and steadier QoS.


How vGPU actually works

Understanding the scheduler helps you predict “noisy neighbor” effects.

By default, a time-sliced vGPU runs workloads in series. Each vGPU takes a turn, then yields to the next. Admins can switch policies: best-effort, equal-share, or fixed-share, and even change time-slice length in milliseconds. That’s handy when you’re balancing latency vs throughput.
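As a rough mental model, the three policies differ mainly in what they divide engine time by. This sketch is a deliberate simplification, not NVIDIA's actual scheduler; only the policy names are real.

```python
# Toy model of vGPU scheduling policies. The policy names are NVIDIA's;
# the arithmetic is a simplification of the real time-slice scheduler.
def engine_share(policy, busy, powered_on, max_vgpus):
    """Approximate fraction of GPU engine time one busy vGPU receives.

    busy       -- vGPUs currently submitting work
    powered_on -- vGPUs whose VMs are running
    max_vgpus  -- vGPUs the profile allows on this physical GPU
    """
    if policy == "best-effort":
        return 1.0 / busy          # time divided among vGPUs with pending work
    if policy == "equal-share":
        return 1.0 / powered_on    # idle-but-running VMs still hold their share
    if policy == "fixed-share":
        return 1.0 / max_vgpus     # share is constant even when neighbors are off
    raise ValueError(f"unknown policy: {policy}")
```

With one busy tenant on a GPU provisioned for eight vGPUs and two VMs powered on, best-effort hands it the whole GPU, equal-share gives half, and fixed-share still only an eighth. That gap is exactly the latency-vs-predictability trade-off admins tune.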

Where vGPU shines: classic VDI and GPU-accelerated desktops, bursty dev workloads, or compute jobs that tolerate small jitter in exchange for higher overall density. Hypervisors such as VMware vSphere, KVM variants, and Citrix Hypervisor are supported; check NVIDIA’s vGPU product support matrix for the exact host, driver, and guest combinations.

How MIG actually works

MIG is different because it’s spatial partitioning.

On A100, H100/H200, and newer Blackwell-class parts, you can create multiple GPU Instances (and Compute Instances inside them). Each instance has dedicated SMs plus isolated L2 and memory controller slices. That’s why a chatty neighbor can’t steal your cache or flood your DRAM bus. Result: predictable throughput and latency per tenant.
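To make the slice budget concrete, here’s a first-pass check that a set of instance profiles fits an A100-40GB (7 compute slices, 40 GB). The profile shapes follow NVIDIA’s published A100 table; the driver’s real placement rules also constrain where slices may sit, so treat this as a sketch, not a substitute for the docs.

```python
# First-pass MIG capacity check for an A100-40GB. Profile shapes follow
# NVIDIA's published table; the driver additionally restricts slice
# placement, so a plan passing here can still be rejected on the host.
A100_40GB = {"g_slices": 7, "mem_gb": 40}
PROFILES = {               # profile -> (compute "g" slices, memory GB)
    "1g.5gb":  (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}

def plan_fits(requested):
    """True if the requested instances stay within the GPU's slice budget."""
    g = sum(PROFILES[p][0] for p in requested)
    mem = sum(PROFILES[p][1] for p in requested)
    return g <= A100_40GB["g_slices"] and mem <= A100_40GB["mem_gb"]
```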

Where MIG shines: multi-tenant inference, small to mid training runs, and regulated or noisy environments where you want hard isolation with consistent QoS.

Can you use them together? Yes

On newer stacks you can back a vGPU with a MIG slice.

vGPU “with MIG” lets you first carve the GPU into MIG instances, then hand a single MIG slice to each VM as a vGPU. You get VM-level management and live-migration features from the hypervisor, plus MIG’s hardware isolation inside the box. Recent vGPU releases document this for vSphere.

Kubernetes and containers

Two common sharing modes in clusters: time-slicing and MIG.

  • Time-sliced sharing in K8s interleaves GPU access across pods on a node, with no memory or fault isolation between them. It’s simple and good for dev or bursty jobs, with trade-offs similar to vGPU time slicing.
  • MIG in K8s exposes resources like nvidia.com/mig-1g.5gb through the NVIDIA device plugin and GPU Operator. Scheduling sees each slice as a separate, isolated “mini-GPU.”
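A minimal pod spec requesting one slice looks like the sketch below, built as a Python dict for brevity. The pod name and image are placeholders; the `nvidia.com/mig-*` resource key is the documented form the device plugin exposes.

```python
# Build a minimal pod spec that requests one MIG slice. The name and image
# are placeholders; nvidia.com/mig-1g.5gb is the extended resource name
# the NVIDIA device plugin exposes for that profile.
def mig_pod(name, image, profile="mig-1g.5gb"):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {f"nvidia.com/{profile}": 1}},
            }],
        },
    }
```

Serialize it to YAML (or hand the dict to the Kubernetes Python client) and the scheduler treats the slice like any other countable resource.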


Isolation and QoS differences (the heart of it)

Pick this based on how strict you need to be.

  • vGPU (time-sliced): fairness is a scheduling policy. Tenants share engines over time, so latency can jitter under load, especially with many vGPUs. Policies like equal-share or fixed-share help, but this is still temporal sharing.
  • MIG (spatial): each slice owns dedicated SMs, L2 banks, crossbar ports, and memory controllers. That eliminates most interference and gives steadier per-tenant performance.


Performance shape and capacity planning

A few rules keep surprises down.

  • Small, steady workloads: MIG gives stable latency and avoids context-switch tax. Good for inference services and multi-tenant APIs.
  • Interactive or spiky dev use: vGPU time slicing pushes density up and can be tuned for responsiveness with shorter time slices or equal-share scheduling.
  • Strict SLOs with noisy neighbors nearby: prefer MIG or vGPU-with-MIG. That’s the predictable path when teams share hardware.
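For a back-of-envelope density estimate, you can treat framebuffer as the binding constraint for time-sliced vGPU and compute slices as the constraint for MIG. That’s a simplification (real limits come from the profile table for your specific GPU), but it keeps planning honest:

```python
# Back-of-envelope tenants-per-GPU estimate. Assumes framebuffer bounds
# time-sliced vGPU density and compute slices bound MIG density; check the
# real profile table for your GPU before committing to these numbers.
def max_tenants(mode, gpu_mem_gb=40, vgpu_profile_gb=5, g_per_tenant=1, total_g=7):
    if mode == "vgpu-time-sliced":
        return gpu_mem_gb // vgpu_profile_gb  # framebuffer is hard-partitioned
    if mode == "mig":
        return total_g // g_per_tenant        # e.g. 7 "g" slices on A100/H100
    raise ValueError(f"unknown mode: {mode}")
```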


Profiles and naming you’ll see in the wild

You’ll bump into profile strings during setup and scheduling.

  • MIG profiles define how many GPU “g” slices and how much memory a slice gets, like 1g.5gb or larger shapes on newer parts. These show up as schedulable resources in K8s.
  • vGPU profiles map to memory and feature sets for each VM. You pick them in your hypervisor when attaching a vGPU to a guest.
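The MIG naming is regular enough to parse mechanically, which is handy in automation that has to reason about slice shapes. A small sketch:

```python
import re

def parse_mig_profile(name):
    """Split a MIG profile string like '1g.5gb' into (g_slices, memory_gb)."""
    m = re.fullmatch(r"(\d+)g\.(\d+)gb", name)
    if m is None:
        raise ValueError(f"not a MIG profile: {name!r}")
    return int(m.group(1)), int(m.group(2))
```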


Supported GPUs and where MIG exists

Don’t assume your card can do it.

MIG starts on Ampere and continues through Hopper and Blackwell. NVIDIA keeps a live table of supported GPUs and profiles. Check it before you promise a split-tenancy plan on older hardware.

Licensing and editions in brief

Budget and procurement will ask.

NVIDIA vGPU features require licensing on the host and guests. Editions include vPC, vApps, and vWS for graphics, plus licensing guidance for compute virtualization in current docs. Without licenses, features run in a reduced state. Plan this up front to avoid “it works in eval, not in prod.”

Setup snapshots

  • vGPU on hypervisors: install the vGPU Manager on the host, pick the vGPU profile, install guest drivers, and configure licensing. Supported stacks are listed in NVIDIA’s product support matrix.
  • MIG on bare metal or K8s: enable MIG mode, create instances, then let the device plugin or GPU Operator surface them as resources. Start small with one node before you roll cluster-wide.
  • vGPU with MIG on vSphere: enable MIG, create slices, then assign a MIG-backed vGPU to each VM. Treat each slice like a mini-GPU you can vMotion and manage.
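The MIG half of that flow maps to a handful of documented nvidia-smi invocations. The sketch below only assembles the argv lists, so it runs anywhere; pass each list to `subprocess.run()` on a host that actually has the GPU.

```python
# Assemble the documented nvidia-smi commands for the MIG setup flow.
# Building argv lists (rather than shelling out) keeps this runnable on a
# GPU-less machine; execute them with subprocess.run() on the real host.
def mig_setup_commands(gpu_index, profiles):
    return [
        ["nvidia-smi", "-i", str(gpu_index), "-mig", "1"],        # enable MIG mode
        ["nvidia-smi", "mig", "-lgip"],                           # list creatable profiles
        ["nvidia-smi", "mig", "-cgi", ",".join(profiles), "-C"],  # create GPU instances, -C adds compute instances
    ]
```

A MIG mode change may require a GPU reset (or reboot) before the instances can be created, so script that step too.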


Common pitfalls and fixes

  • “We enabled vGPU but latency is spiky.” You’re seeing time-slice effects. Try equal-share or fixed-share, adjust the slice length, or move strict tenants to MIG.
  • “Our K8s pods still fight each other.” If you used time-slicing, switch to MIG resources like nvidia.com/mig-1g.5gb for true isolation.
  • “Can we run Blackwell slices the same way as A100?” Yes in principle, but check the current MIG profile table and driver requirements for that generation before scripting everything.

Choosing between vGPU, MIG, and vGPU-with-MIG

A quick decision map you can actually use.

  • Desktop, IDEs, light GPU tools for many users → vGPU time-sliced profiles. Tune scheduler for responsiveness.
  • Multi-tenant inference or small training with SLOs → MIG slices per tenant or per service for stable QoS.
  • You need VM management and hard isolation → vGPU-with-MIG on a supported hypervisor. Good balance of ops control and determinism.
  • Kubernetes only → prefer MIG for production isolation, use time-slicing for dev pools. Device plugin and GPU Operator handle both.
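The decision map above is simple enough to encode directly, which is useful when you want platform tooling to make the call consistently. The mode strings here are ours, not official product names:

```python
# Encode the decision map as a function. The returned mode strings are
# our shorthand, not official product names; extend the rules as your
# tenancy policy evolves.
def pick_sharing_mode(workload, strict_slo=False, needs_vm_mgmt=False):
    if strict_slo and needs_vm_mgmt:
        return "vGPU-with-MIG"     # VM management plus hardware isolation
    if strict_slo:
        return "MIG"               # dedicated slices for stable QoS
    if workload in ("desktop", "dev"):
        return "time-sliced vGPU"  # density-first, tune the scheduler
    return "MIG"                   # default to isolation when unsure
```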

A short checklist to avoid surprises

  • Confirm your GPU model supports MIG and note the exact profiles you plan to expose.
  • Pick your sharing mode per tenant: time-sliced vGPU, MIG, or vGPU-with-MIG. Document why.
  • If using vGPU, set an explicit scheduler policy and time-slice length instead of leaving defaults. Track p95 latency.
  • If using K8s, install the NVIDIA device plugin and GPU Operator, then request nvidia.com/mig-* resources in deployments.
  • Bake license steps into your runbook so hosts don’t drop into reduced capability after reboot.
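For the p95 latency tracking step, a nearest-rank percentile is enough to start. This is a sketch; swap in your metrics stack’s quantile function for production.

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    if not samples:
        raise ValueError("no samples")
    s = sorted(samples)
    return s[max(0, math.ceil(0.95 * len(s)) - 1)]
```

Watch p95 (not just the mean) after any scheduler or profile change: time-slice effects show up in the tail first.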

Bottom line

vGPU boosts density by sharing time. MIG delivers steadier QoS by sharing hardware. If you match the tool to the tenant and wire the scheduler or profiles deliberately, multi-tenant GPUs behave like you expect instead of like a lottery.

