Aximmetry Dual-GPU Compute Decoupling

June 3, 2026 Aximmetry Tutorials AximmetryCN

Under the extreme workloads of 4K/8K broadcast-grade virtual production (VP), technical teams working on a single workstation often hit the “GPU Compute Redline”.

With Lumen, fully dynamic ray tracing, and Nanite enabled in Unreal Engine 5 (UE5), rendering a cinematic-grade complex scene can already push GPU usage toward 90%. Now suppose the same GPU also has to handle the following tasks:

Receiving two channels of 4K 60fps 10-bit SDI physical camera input;
Running high-precision, sub-pixel real-time advanced green screen keying (Chroma Keyer);
Performing 3D depth map compositing, redistortion, and 3D garbage mask calculations;
Finally outputting one channel of 4K 60fps video to the broadcast truck.

The result is VRAM bandwidth saturation on the GPU and congestion on the physical PCIe bus.
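To put a rough number on the video I/O load listed above, the following sketch computes the uncompressed active-picture data rate. It assumes 10-bit YCbCr 4:2:2 sampling (20 bits per pixel on average), which the article does not specify; the function name is illustrative only.

```python
def stream_rate_gbit(width, height, fps, bits_per_pixel):
    """Active-picture payload of one uncompressed stream, in Gbit/s."""
    return width * height * fps * bits_per_pixel / 1e9

# Assumption: 10-bit YCbCr 4:2:2, i.e. 20 bits per pixel on average.
BITS_PER_PIXEL_422_10 = 20

per_stream = stream_rate_gbit(3840, 2160, 60, BITS_PER_PIXEL_422_10)
total = 3 * per_stream  # two camera inputs plus one program output

print(f"per stream: {per_stream:.2f} Gbit/s")
print(f"three streams: {total:.2f} Gbit/s ({total / 8:.2f} GB/s)")
```

This counts only the raw streams crossing the bus. Once frames are unpacked into wider GPU texture formats and read several times by the keyer and compositor, the VRAM traffic is a multiple of this figure.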

Under a single-GPU architecture, UE5's heavy 3D rendering threads and the high-throughput video I/O threads compete fiercely for CUDA cores and DMA (Direct Memory Access) channels within the graphics card. This resource contention leads directly to micro-stuttering and can even cause Direct3D 12 driver response timeouts, resulting in a catastrophic “D3D12 Device Lost” error (a graphics driver crash) and an instant black screen during the live broadcast.

Native UE5 has extremely poor support for parallel rendering across multiple GPUs (SLI is dead, DX12 mGPU has a very high barrier to entry and is highly unstable).

Aximmetry, with its industrial-grade “Dual-GPU Heterogeneous Compute Decoupling” (Dual-GPU Offloading) architecture, severs the resource entanglement between 3D rendering and video I/O.

I. Physical Isolation: Low-Level Partitioning of GPU Affinity

Aximmetry's first step in solving compute contention is to perform “physical zoning” of the hardware at the operating system level.

In Aximmetry's system settings, engineers can enable the hardcore GPU Affinity scheduling strategy:

GPU 0: Dedicated 3D Rendering Sandbox

Aximmetry assigns the first graphics card (e.g., RTX 6000 Ada) completely and cleanly to Unreal Engine 5. During runtime, UE5 can only “see” and use GPU 0. All of its VRAM and CUDA compute power is 100% dedicated to calculating complex 3D geometry, Lumen light fields, and material shading. No video capture, keying, or output tasks ever consume any bandwidth from GPU 0.

GPU 1: Dedicated I/O and Compositing Engine

Aximmetry assigns the second graphics card to itself. Aximmetry's extremely powerful 2D/3D hybrid compositing engine, SDI/NDI video input/output stream control, real-time Keyer (chroma keyer), and post-processing Color Lookup Table (LUT) calculations all run on the physical chip of GPU 1.
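The split described above can be pictured as a fixed routing table from workload type to GPU index. The sketch below is only a toy model of that policy: the `Workload` class, the `GPU_FOR_KIND` table, and the `assign` helper are hypothetical names for illustration and are not Aximmetry settings or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str  # "render" or "io"


# Hypothetical policy mirroring the split described in the text:
# rendering goes to GPU 0, video I/O and compositing go to GPU 1.
GPU_FOR_KIND = {"render": 0, "io": 1}


def assign(workloads):
    """Return {gpu_index: [workload names]} under the fixed policy."""
    plan = {0: [], 1: []}
    for w in workloads:
        plan[GPU_FOR_KIND[w.kind]].append(w.name)
    return plan


workloads = [
    Workload("UE5 geometry and Lumen", "render"),
    Workload("UE5 material shading", "render"),
    Workload("SDI/NDI capture and output", "io"),
    Workload("chroma keyer", "io"),
    Workload("LUT post-processing", "io"),
]

plan = assign(workloads)
for gpu, names in plan.items():
    print(f"GPU {gpu}: {', '.join(names)}")
```

The point of a static table rather than a dynamic scheduler is that no I/O task can ever land on the rendering card, which is the isolation property the section describes.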

This “one country, two systems” physical isolation ensures that even if UE5 experiences a momentary stutter while rendering an extremely complex scene, the broadcast-grade video output stream managed by Aximmetry will still output with an absolutely stable 59.94Hz lock, never sending a single corrupted frame to the broadcast output.
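Why a decoupled output stage survives a renderer stutter can be shown with a small timing simulation. This is a simplified model, not Aximmetry's scheduler: it assumes the renderer works back to back, and that the output stage ticks at 59.94 Hz (60000/1001) and always sends the newest finished frame, repeating the previous one when nothing new has arrived.

```python
from fractions import Fraction

TICK = Fraction(1001, 60000)  # seconds per output frame at 59.94 Hz


def simulate(render_times_ms, ticks):
    """Return the rendered-frame index shown at each output tick.

    The renderer works back to back; the output stage never waits and
    always shows the newest frame finished by the tick time.
    """
    done, t = [], Fraction(0)
    for ms in render_times_ms:
        t += Fraction(ms, 1000)
        done.append(t)  # completion time of each rendered frame
    shown = []
    for k in range(1, ticks + 1):
        now = k * TICK
        finished = sum(1 for d in done if d <= now)
        shown.append(finished - 1)  # -1 would mean nothing rendered yet
    return shown


# The third frame (index 2) takes 40 ms, a stutter; the others take 15 ms.
shown = simulate([15, 15, 40, 15, 15, 15], ticks=6)
print(shown)  # [0, 1, 1, 1, 2, 4]
```

During the 40 ms stutter the output repeats frame 1 for three ticks, so the background freezes briefly, but every tick still carries a complete frame and the output cadence never breaks.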

II. Direct GPU Connection: Zero-CPU-Copy Cross-Card Transfer Based on P2P

Since rendering and compositing are split across two different graphics cards, an unavoidable problem arises: how do you transfer the 3D virtual background rendered by GPU 0 to GPU 1 for final keying and compositing?

Under the traditional Windows driver architecture, cross-card transfer requires an extremely inefficient round trip: GPU 0 VRAM -> PCIe Bus -> System Memory (CPU) -> PCIe Bus -> GPU 1 VRAM. This double hop not only adds latency but also moves every frame across the PCIe bus twice, eating into the bandwidth that video I/O needs and delaying frames.

Aximmetry adopts low-level PCIe Peer-to-Peer (P2P) and DirectGMA technology (DirectGMA is AMD's name for direct GPU memory access; the NVIDIA counterpart is GPUDirect):

Bypassing the CPU for Physical Direct Connection

With hardware support (e.g., supporting NVLink, or enabling Resizable BAR on PCIe 4.0/5.0 slots), Aximmetry establishes a direct physical peer-to-peer channel between the graphics cards.

Direct Memory Mapping

Aximmetry directly maps the final 3D render target output by GPU 0 into the physical address space of GPU 1. When GPU 1 performs keying and compositing, it reads the pixel data directly and transparently from GPU 0's VRAM via the PCIe bus. The transfer is carried out at the hardware level by the DMA controllers on the graphics cards, with essentially no CPU involvement in the copy, so its cost is bounded by a single crossing of the PCIe link rather than by a staging copy through system memory.
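A back-of-the-envelope comparison of the two paths, as a sketch under stated assumptions: the render target is taken to be 3840x2160 RGBA with 16 bits per channel (8 bytes per pixel), and the link is PCIe 4.0 x16 at its theoretical rate (16 GT/s per lane, 128b/130b encoding). Neither figure comes from the article, and measured throughput is lower than the theoretical rate.

```python
# Theoretical PCIe 4.0 x16 rate per direction, in bytes per second:
# 16 GT/s per lane * 16 lanes * 128/130 encoding efficiency / 8 bits.
PCIE4_X16 = 16e9 * 16 * (128 / 130) / 8


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def transfer_ms(num_bytes, hops, link_bytes_per_s):
    """Time to move num_bytes across the link `hops` times, in ms."""
    return hops * num_bytes / link_bytes_per_s * 1000


size = frame_bytes(3840, 2160, 8)          # assumed RGBA16 render target
staged = transfer_ms(size, 2, PCIE4_X16)   # GPU 0 -> system memory -> GPU 1
direct = transfer_ms(size, 1, PCIE4_X16)   # GPU 0 -> GPU 1 peer-to-peer
budget_ms = 1001 / 60                      # one frame period at 59.94 Hz

print(f"frame size: {size / 1e6:.1f} MB")
print(f"staged: {staged:.2f} ms, direct: {direct:.2f} ms, "
      f"frame budget: {budget_ms:.2f} ms")
```

Under these assumptions the direct path halves the bus time per frame. The model ignores protocol overhead, and it also leaves out the system memory traffic and CPU synchronization that the staged path adds on top.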

III. Dynamic Load Balancing: Optimal Configuration for Maximizing Dual-GPU Compute

Under the dual-GPU decoupling architecture, Aximmetry also allows technical directors to configure fine-grained load balancing in the Flow Graph to handle different production requirements:

Heavy Keying Offloading

If the on-site signal is ultra-high-spec 4K 120fps live footage, the computational load for keying rises sharply: per-pixel work scales with pixel throughput, which doubles relative to 4K 60fps while the time available per frame is halved. Aximmetry automatically schedules heavy post-processing shaders like Gaussian blur, edge feathering, and vignette compensation to run on GPU 1, ensuring GPU 0 (UE5) can output smoothly.
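The scaling from 60fps to 120fps can be checked with simple arithmetic. The helper names below are illustrative, and the sketch only counts pixels; the real cost of a keyer also depends on how many shader passes touch each pixel.

```python
def pixel_rate(width, height, fps):
    """Pixels that must be processed per second."""
    return width * height * fps


def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000 / fps


r60 = pixel_rate(3840, 2160, 60)
r120 = pixel_rate(3840, 2160, 120)

print(f"4K60:  {r60 / 1e6:.0f} Mpixel/s, {frame_budget_ms(60):.2f} ms per frame")
print(f"4K120: {r120 / 1e6:.0f} Mpixel/s, {frame_budget_ms(120):.2f} ms per frame")
print(f"throughput ratio: {r120 / r60:.1f}x")
```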

Light Load Adaptive Merging

In lightweight scenarios that don't require extreme performance, Aximmetry's dynamic scheduler can also merge the two graphics cards for use, achieving maximum energy efficiency by intelligently distributing render pipelines.

Conclusion: An Industrial-Grade Voltage Regulator That Squeezes Hardware Dry

On the battlefield of live production, where not a single error is tolerated, the brute-force rendering mode of “one card does it all” is destined to become the most dangerous system time bomb when facing the dual assault of ultimate image quality and broadcast-grade high-bandwidth I/O.

Unreal Engine 5 is a high-energy visual beast that needs to be carefully fed and protected, not drained of its last ounce of strength by messy external signals.

Aximmetry, with its hardcore architecture of dual-GPU compute decoupling, acts as the perfect “Industrial-Grade Voltage Regulator”.

It rigidly isolates rendering and I/O through GPU affinity, uses P2P direct VRAM access to cut cross-card transfer latency, and squeezes the physical hardware performance of the workstation to its limit. Under Aximmetry's command, the two streams of virtual and real compute power run at high speed on their respective tracks, leveraging each other without interference, jointly supporting a cinematic-grade real-time vision that combines ultimate image quality with steel-like stability.
