      Reducing the Carbon Footprint: Energy-Saving Strategies for Data Centers
      FEATURED INSIGHT OF THE WEEK

      Data centers, the backbone of our digital world, are massive energy consumers. As their demand surges, utilizing renewable energy sources becomes imperative. This article explores energy consumption in data centers, projected future usage, energy-saving strategies, and the critical role of renewables in ensuring a sustainable future.

      4 minute read

          Cybersecurity Trends 2026: What Changed, What Broke, and What Leaders Must Do Next
          DGX B300 Core Computing Architecture

          The NVIDIA DGX B300 serves as a high-performance foundation for AI factories, exemplified by deployments capable of 9 quintillion calculations per second. It integrates eight Blackwell Ultra GPUs, each built from two dies that software sees as a single logical GPU. To accelerate reasoning, the architecture uses NVFP4 precision, which reduces memory usage by about 1.8× compared with FP8, and doubles special function unit (SFU) throughput for 2× faster attention-layer performance. The system provides 2.3 TB of HBM3e memory in total (288 GB per GPU), with 8 TB/s of memory bandwidth per GPU, so massive models can stay resident. Scaling is enabled by NVLink 5 (1.8 TB/s per GPU) and 800 Gb/s networking, all within a 10 RU chassis. Effective integration requires meticulous planning of power and cooling, supported by deployment guidance from the Semifly Marketplace.
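
The 1.8× figure follows from NVFP4's storage layout as NVIDIA describes it: each weight is a 4-bit E2M1 value, and every block of 16 weights shares one 8-bit E4M3 scale, so a weight costs about 4 + 8/16 = 4.5 bits versus 8 bits in FP8. The short Python sketch below runs that arithmetic and applies it to the 2.3 TB system total. It is a back-of-the-envelope model under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), ignores the single per-tensor FP32 scale, and its function names are illustrative rather than taken from any NVIDIA tool.

```python
# Effective storage cost per weight, assuming NVFP4's published block layout:
# a 4-bit E2M1 value plus one 8-bit E4M3 scale shared by each block of 16 values.
# The single FP32 per-tensor scale amortizes to ~0 bits and is ignored here.
NVFP4_BITS_PER_VALUE = 4 + 8 / 16   # 4.5 bits
FP8_BITS_PER_VALUE = 8
FP16_BITS_PER_VALUE = 16

GPU_HBM_GB = 288        # per Blackwell Ultra GPU (from the text)
GPUS_PER_SYSTEM = 8     # DGX B300


def weight_footprint_gb(num_params: float, bits_per_value: float) -> float:
    """Decimal GB needed for the weights alone (no KV cache, no activations)."""
    return num_params * bits_per_value / 8 / 1e9


def max_resident_params(bits_per_value: float) -> float:
    """Upper bound on parameters if weights could use all of the system's HBM."""
    system_bytes = GPU_HBM_GB * GPUS_PER_SYSTEM * 1e9
    return system_bytes * 8 / bits_per_value


if __name__ == "__main__":
    print(f"System HBM: {GPU_HBM_GB * GPUS_PER_SYSTEM} GB")
    print(f"NVFP4 vs FP8: {FP8_BITS_PER_VALUE / NVFP4_BITS_PER_VALUE:.2f}x smaller")
    print(f"NVFP4 vs FP16: {FP16_BITS_PER_VALUE / NVFP4_BITS_PER_VALUE:.2f}x smaller")
    print(f"Upper bound at NVFP4: {max_resident_params(NVFP4_BITS_PER_VALUE) / 1e12:.1f}T params")
```

Under these assumptions the script reports 2304 GB of HBM, a 1.78× reduction versus FP8 (3.56× versus FP16), and a ceiling of roughly 4.1 trillion NVFP4 parameters before any memory is reserved for the KV cache.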

          8 minute read

          NVIDIA B300 and Generative AI

          The NVIDIA B300, based on the Blackwell Ultra architecture, is designed to support the AI Factory model by treating high-volume inference and generative AI reasoning as its primary workloads. This infrastructure shift responds to the difficulty enterprises face in running large generative models reliably and at scale. The B300 tackles memory, the defining bottleneck, by integrating 288 GB of HBM3e capacity and 8 TB/s of bandwidth, enabling support for multi-trillion-parameter models and extended context windows. Crucially, native NVFP4 inference significantly changes the economics of deployment, delivering up to 4x higher performance and 25–50x greater energy efficiency compared to FP8, while maintaining accuracy via dual-level scaling. Furthermore, specialized attention-layer acceleration and the second-generation Transformer Engine provide 11–15x higher LLM throughput per GPU, establishing a new baseline for large-scale production inference.
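
Dual-level scaling pairs a coarse per-tensor scale with a fine per-block scale. As NVIDIA describes NVFP4, values are stored as 4-bit E2M1, each 16-value block carries an FP8 E4M3 scale, and the whole tensor carries one FP32 scale. The pure-Python sketch below illustrates that scheme under simplifying assumptions: rounding ties go to the smaller magnitude instead of following hardware rounding rules, the per-tensor scale uses a simple max-magnitude heuristic, and all function names are hypothetical. It is not NVIDIA's implementation or the API of any NVIDIA library.

```python
import math
import random

# Magnitudes representable in FP4 E2M1 (the sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK_SIZE = 16    # values that share one E4M3 block scale


def round_e4m3(x: float) -> float:
    """Round a non-negative float to a nearby FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    if x >= E4M3_MAX:
        return E4M3_MAX
    if x < 2.0 ** -6:                                # subnormal range: fixed step of 2^-9
        return round(x / 2.0 ** -9) * 2.0 ** -9
    step = 2.0 ** (math.floor(math.log2(x)) - 3)     # 3 mantissa bits per power of two
    return min(round(x / step) * step, E4M3_MAX)


def round_e2m1(x: float) -> float:
    """Round to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    magnitude = min(E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(magnitude, x)


def quantize_two_level(values):
    """Return (tensor_scale, [(block_scale, codes), ...]) for a flat list of floats."""
    amax = max(abs(v) for v in values)
    # Level 1: FP32 per-tensor scale chosen so the largest block scale fits in E4M3.
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        # Level 2: E4M3 block scale maps the block's largest magnitude onto 6.0.
        block_scale = round_e4m3(max(abs(v) for v in block) / E2M1_MAX / tensor_scale)
        denom = block_scale * tensor_scale
        codes = [round_e2m1(v / denom) if denom > 0 else 0.0 for v in block]
        blocks.append((block_scale, codes))
    return tensor_scale, blocks


def dequantize_two_level(tensor_scale, blocks):
    """Rebuild approximate floats: code * block_scale * tensor_scale."""
    out = []
    for block_scale, codes in blocks:
        out.extend(code * block_scale * tensor_scale for code in codes)
    return out


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 1.0) for _ in range(64)]
    scale, blocks = quantize_two_level(weights)
    restored = dequantize_two_level(scale, blocks)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{len(blocks)} blocks, worst absolute error {worst:.3f}")
```

Because each block gets its own scale, a single outlier coarsens only the other values in its 16-value block rather than the whole tensor, which is the intuition behind the accuracy claim.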

          9 minute read

          B300 and Networking: A Technical Architecture Overview

          The NVIDIA B300, or Blackwell Ultra, is engineered for massive AI workloads, featuring 288 GB of HBM3e memory and a 50% increase in compute performance over its predecessor. Its architecture addresses data bottlenecks through NVLink 5, which provides 1.8 TB/s of GPU-to-GPU bandwidth for each GPU. For multi-node scaling, B300 systems use 800 Gb/s InfiniBand or Ethernet connectivity via ConnectX-8 adapters. These capabilities are delivered through the DGX B300 turnkey appliance and the modular HGX B300 platform. Together, they facilitate large-scale model training and high-speed inference by ensuring that compute is not left idle waiting on slow data movement. Think of the B300 as a high-performance racing engine: without a wide, high-speed highway (the network), it cannot reach its top speed when working as part of a fleet.
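
To see why the network is the narrower road, it helps to put both links in the same units. The sketch below assumes the 1.8 TB/s NVLink 5 figure is the bidirectional total per GPU (900 GB/s each way) and that 800 Gb/s is the per-direction line rate of one adapter port; it ignores latency, encoding, and protocol overhead, and the 10 GB payload is hypothetical.

```python
# Link rates from the text, converted to gigabytes per second per direction.
NVLINK5_TOTAL_GB_S = 1800.0                     # 1.8 TB/s per GPU, both directions (assumed)
NVLINK5_PER_DIR_GB_S = NVLINK5_TOTAL_GB_S / 2   # 900 GB/s each way
NIC_LINE_RATE_GBIT_S = 800.0                    # 800 Gb/s ConnectX-8 port
NIC_PER_DIR_GB_S = NIC_LINE_RATE_GBIT_S / 8     # bits to bytes: 100 GB/s each way


def transfer_ms(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Idealized transfer time (size / bandwidth) in milliseconds."""
    return payload_gb / bandwidth_gb_s * 1000.0


if __name__ == "__main__":
    payload_gb = 10.0   # hypothetical 10 GB shard of gradients or KV cache
    print(f"NVLink 5: {transfer_ms(payload_gb, NVLINK5_PER_DIR_GB_S):.1f} ms")
    print(f"800 Gb/s NIC: {transfer_ms(payload_gb, NIC_PER_DIR_GB_S):.1f} ms")
    print(f"NVLink advantage: {NVLINK5_PER_DIR_GB_S / NIC_PER_DIR_GB_S:.0f}x per direction")
```

With these assumptions NVLink moves the shard in about 11.1 ms versus 100 ms per network port, a 9× gap. That gap is why the most communication-heavy traffic, such as tensor parallelism, is usually kept inside the NVLink domain while less frequent traffic crosses the network.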

          17 minute read

          B300 and Networking: A Technical Introduction 

          The NVIDIA B300, or Blackwell Ultra, is engineered for massive AI workloads, featuring 288 GB of HBM3e memory and a 50% increase in compute performance over its predecessor. Its architecture addresses data bottlenecks through NVLink 5, which provides 1.8 TB/s of GPU-to-GPU bandwidth for each GPU. For multi-node scaling, B300 systems use 800 Gb/s InfiniBand or Ethernet connectivity via ConnectX-8 adapters. These capabilities are delivered through the DGX B300 turnkey appliance and the modular HGX B300 platform. Together, they facilitate large-scale model training and high-speed inference by ensuring that compute is not left idle waiting on slow data movement. Think of the B300 as a high-performance racing engine: without a wide, high-speed highway (the network), it cannot reach its top speed when working as part of a fleet.

          17 minute read

          NVIDIA Blackwell Ultra GPUs: A Pillar of Modern Data Centers

          The NVIDIA Blackwell Ultra (B300) defines a new standard for AI infrastructure, shifting the industry focus from merely adding more GPUs to maximizing efficiency, measured by tokens-per-watt and cost-per-million-tokens. B300 achieves dramatic performance gains over Hopper (7.5× dense throughput) by transitioning to a dual-die unified GPU architecture (208B transistors) and introducing the inference-optimized NVFP4 precision format. The platform is designed to scale as an "AI fabric" via the NVL72 system, where 72 GPUs operate as a single logical computer, achieving 1.1 exaFLOPS of FP4 compute. Although B300 requires direct liquid cooling (DLC) because of its 1,400 W power draw, this shift ultimately lowers OpEx through increased cooling efficiency. Economically, this efficiency enables systems like the GB200 NVL72 to deliver returns as high as 15× the initial investment.
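
The two efficiency metrics named above reduce to simple ratios. The sketch below computes them from one measurement window and also divides the quoted NVL72 compute figure by 72 to get a per-GPU share. The sample numbers are placeholders, not B300 results, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ServingSample:
    """One hypothetical measurement window; no field is a published B300 figure."""
    tokens_per_second: float   # sustained generated tokens per second
    power_watts: float         # average power drawn while serving
    cost_per_hour: float       # amortized capex plus opex per hour, any currency


def tokens_per_watt(s: ServingSample) -> float:
    # (tokens / s) / (J / s) simplifies to tokens per joule.
    return s.tokens_per_second / s.power_watts


def cost_per_million_tokens(s: ServingSample) -> float:
    tokens_per_hour = s.tokens_per_second * 3600
    return s.cost_per_hour / tokens_per_hour * 1_000_000


# Per-GPU share of the NVL72 figure quoted in the text: 1.1 exaFLOPS across 72 GPUs.
PER_GPU_FP4_PFLOPS = 1.1 * 1000 / 72   # roughly 15.3 PFLOPS

if __name__ == "__main__":
    sample = ServingSample(tokens_per_second=50_000, power_watts=120_000, cost_per_hour=300.0)
    print(f"{tokens_per_watt(sample):.3f} tokens per joule")
    print(f"{cost_per_million_tokens(sample):.2f} per million tokens")
    print(f"{PER_GPU_FP4_PFLOPS:.1f} PFLOPS FP4 per GPU")
```

Because both metrics are ratios, a platform can improve on them even while drawing more power per GPU, as long as throughput grows faster than power and cost.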

          9 minute read

          NVIDIA B300 Features and Capabilities

          The NVIDIA DGX B300, launched in March 2025 and built on the Blackwell Ultra architecture, is an advanced AI infrastructure designed to handle complex reasoning, real-time inference, and generative AI workloads simultaneously. It supports the entire AI lifecycle (training, fine-tuning, and inference) on a single platform, reducing delays and fragmentation. The system features eight Blackwell Ultra GPUs with 288 GB of HBM3e each, totaling 2.3 TB, enabling high throughput for models that process extremely long context windows. Data flow is managed by a fifth-generation NVLink internal fabric (14.4 TB/s aggregate bandwidth) and external ConnectX-8 SuperNICs (up to 800 Gb/s) for multi-node clustering. To maintain performance, the system separates AI compute from infrastructure control: a BlueField-3 DPU handles networking, storage, and security tasks, so the GPUs can focus purely on model execution. The operational backbone is managed by software layers such as Mission Control, NVIDIA AI Enterprise, and the Dynamo inference layer. Access to the B300 is streamlined through the Semifly Marketplace, which offers configurations and deployment guidance.
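
The system-level figures above are products of per-GPU numbers, and long-context capacity is largely a KV-cache budget. The sketch below reproduces the 2.3 TB and 14.4 TB/s totals and applies the standard transformer KV-cache formula (keys plus values, per layer, per token). The model shape is hypothetical, and the estimate ignores weights, activations, and framework overhead.

```python
GPUS_PER_SYSTEM = 8
HBM_PER_GPU_GB = 288
NVLINK5_PER_GPU_TB_S = 1.8

total_hbm_gb = GPUS_PER_SYSTEM * HBM_PER_GPU_GB                 # 2304 GB, quoted as 2.3 TB
aggregate_nvlink_tb_s = GPUS_PER_SYSTEM * NVLINK5_PER_GPU_TB_S  # 14.4 TB/s


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: float, batch: int = 1) -> float:
    """KV-cache size in decimal GB: the factor 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value * batch / 1e9


if __name__ == "__main__":
    # Hypothetical model: 80 layers, 8 KV heads (grouped-query attention), head_dim 128,
    # with the cache stored in FP8 (1 byte per value).
    one_million_ctx = kv_cache_gb(80, 8, 128, 1_000_000, bytes_per_value=1)
    print(f"System HBM {total_hbm_gb} GB, NVLink aggregate {aggregate_nvlink_tb_s:.1f} TB/s")
    print(f"KV cache for one 1M-token sequence: {one_million_ctx:.1f} GB")
```

Under these assumptions a single million-token sequence needs about 164 GB of cache, so the memory left after loading weights determines how many such sequences can be served concurrently.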

          8 minute read

          NVIDIA B300 Software Stack: What You Need to Know

          The B300 GPU is explicitly optimized for generative AI and complex reasoning workloads, and it requires the B300 Software Stack to maximize low-precision performance (such as NVFP4) and to manage its dual-die hardware. The Foundational Infrastructure layer runs on NVIDIA DGX OS and requires CUDA Toolkit 13.1 or later. A key innovation is NVIDIA CUDA Tile, which updates the programming model to abstract hardware complexity, letting developers work with logical data "tiles" for improved performance and code portability. Specialized APIs, including MLOPart and Static SM Partitioning, enable predictable multi-tenancy and efficient resource isolation. The stack also includes accelerated frameworks such as TensorRT-LLM and orchestration tools such as NVIDIA Mission Control and NVIDIA AI Enterprise, providing a production-grade foundation for large-scale GenAI deployment.
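
CUDA Tile's actual API is not reproduced here. As a language-neutral illustration of the idea, the plain-Python sketch below writes a matrix multiply in terms of logical tiles: the algorithm states only which tile of A meets which tile of B, and the tile size is a parameter. In a tile-based model, mapping tiles onto threads, shared memory, and tensor cores is left to the compiler; here ordinary Python loops stand in for that mapping. The function names and default tile size are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference C = A @ B, one dot product per output element."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """C = A @ B computed one (tile x tile) output block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # output tile row
        for j0 in range(0, m, tile):          # output tile column
            for k0 in range(0, k, tile):      # step along the shared K dimension
                # Multiply the A tile (rows i0.., cols k0..) by the B tile
                # (rows k0.., cols j0..) and accumulate into the C tile.
                # min() clips the edge tiles when sizes are not multiples of `tile`.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[float(i + j) for j in range(5)] for i in range(3)]   # 3 x 5
    b = [[float(i - j) for j in range(4)] for i in range(5)]   # 5 x 4
    assert matmul_tiled(a, b, tile=2) == matmul_naive(a, b)
    print("tiled result matches the naive product")
```

Changing `tile` changes only how the work is grouped, not the result, which is the portability property a tile abstraction aims for.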

          9 minute read

          Dell XE9680 AI Benchmark

          The Dell PowerEdge XE9680 is a flagship 8-GPU, 6U server engineered to overcome infrastructure bottlenecks and move enterprises from experimental AI to full-scale production. It is built around dual 4th or 5th Gen Intel Xeon Scalable processors and supports up to 4 TB of DDR5 memory with PCIe Gen 5.0 I/O. A key advantage is its flexible accelerator ecosystem, which allows a choice among NVIDIA (H100/H200), AMD (MI300X), and Intel (Gaudi 3) GPUs without requiring a platform redesign. Performance benchmarks show up to 1.8× faster BERT pre-training and 2× higher inference throughput, demonstrating minimal communication bottlenecks and sustained utilization. The XE9680 also provides operational efficiency; for example, AMD configurations offer 10–20% acquisition savings, enabling organizations to balance cost and performance across diverse AI workloads. Security is maintained through a Cyber Resilient Architecture and TPM 2.0.
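
Acquisition savings translate into better value only if performance on the target workload holds up. The sketch below makes that trade-off explicit: a configuration that is a fraction s cheaper matches the baseline's performance per unit cost when it delivers at least (1 − s) of the baseline's performance. It considers acquisition cost only (not power, software, or support), and the 95%/15% case is hypothetical.

```python
def relative_value(perf_ratio: float, cost_ratio: float) -> float:
    """Performance per unit cost versus a baseline; both ratios are candidate / baseline."""
    return perf_ratio / cost_ratio


def break_even_perf(acquisition_savings: float) -> float:
    """Minimum relative performance for a cheaper configuration to match baseline value."""
    return 1.0 - acquisition_savings


if __name__ == "__main__":
    for savings in (0.10, 0.20):   # the 10–20% range quoted in the text
        print(f"{savings:.0%} cheaper: break-even at {break_even_perf(savings):.0%} of baseline performance")
    # Hypothetical case: 15% cheaper and 95% of the baseline's throughput.
    print(f"relative value: {relative_value(0.95, 0.85):.2f}x")
```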

          9 minute read

          H200 NVL AI Inference Benchmarks: Setting a New Standard for Enterprise AI Performance 

          The NVIDIA H200 NVL GPU redefines enterprise AI inference performance by focusing on higher throughput and efficiency for complex workloads such as large language models and computer vision systems. The architecture features a substantial upgrade to 141 GB of HBM3e memory with 4.8 TB/s of bandwidth, enabling larger AI models to fit entirely within GPU memory, which minimizes latency and the need for partitioning. The H200 NVL uses fourth-generation NVLink for direct GPU-to-GPU communication at up to 900 GB/s, crucial for efficient multi-GPU scaling in generative AI deployments. MLPerf Inference benchmarks confirmed the H200 NVL's advancements, demonstrating up to 1.8x higher LLM inference performance compared to the H100 PCIe configuration. Furthermore, it offers superior performance per watt, resulting in lower energy and infrastructure costs for enterprises scaling their AI services. The NVL form factor is specifically designed for inference and provides flexible deployment options, excelling in recommendation systems and mixed-workload environments.
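
Whether a model "fits entirely within GPU memory" is a quick calculation, and the 4.8 TB/s figure also sets an idealized ceiling on single-stream decoding, since each generated token must read every weight once. The sketch below applies both to a hypothetical 70B-parameter model with an assumed 20 GB reserve for KV cache and activations. The ceiling ignores KV-cache reads and kernel efficiency, so real throughput at batch size 1 will be lower.

```python
H200_NVL_HBM_GB = 141.0
H200_NVL_BW_TB_S = 4.8


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    # (params_billion * 1e9 params) * bytes per param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def fits_on_one_gpu(params_billion: float, bytes_per_param: float, reserve_gb: float) -> bool:
    """True if the weights plus a reserve for KV cache and activations fit in HBM."""
    return weights_gb(params_billion, bytes_per_param) + reserve_gb <= H200_NVL_HBM_GB


def decode_tokens_per_s_ceiling(params_billion: float, bytes_per_param: float) -> float:
    """Bandwidth roofline at batch size 1: every generated token reads all weights once."""
    return H200_NVL_BW_TB_S * 1000 / weights_gb(params_billion, bytes_per_param)


if __name__ == "__main__":
    # Hypothetical 70B-parameter model with a 20 GB reserve.
    for label, nbytes in (("FP16", 2), ("FP8", 1)):
        print(label, fits_on_one_gpu(70, nbytes, reserve_gb=20),
              f"{decode_tokens_per_s_ceiling(70, nbytes):.0f} tokens/s ceiling")
```

In this example FP16 weights (140 GB) leave no room for the reserve, while FP8 weights fit with headroom and double the bandwidth ceiling. Batching raises aggregate throughput above this per-stream ceiling because one pass over the weights serves every sequence in the batch.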

          12 minute read
