

# Xilinx 5G Telco Accelerator Cards

Xilinx Wired and Wireless Group

© Copyright 2020 Xilinx

### **Overview**

 Xilinx is expanding our 5G product offering with the addition of Telco Accelerator Cards



The first card in our Telco Accelerator series is called "T1"

 T1 is a two-in-one 75W "plug and play" card that accelerates Fronthaul and L1 Functions and is already in limited sampling

The "Virtualization of 5G" provides a great opportunity to expand our existing footprint in 2<sup>nd</sup> and 3<sup>rd</sup> phases of 5G deployments



### **Radio Access Network**



## What is the Virtualization of 5G?

### **Traditional Model (LTE)**

- OEMs create Proprietary equipment for Wireless Deployments
- 5G Operators are moving away from this model
  - "Vendor Lock" creates a lack of competition
  - Difficult for Operators to deploy Software Services
    - VR, Gaming, Automotive etc.

### **Open-RAN Model (5G)**

- Virtual BBU implemented in a standard server form factor
  - Similar to what we saw with "Open Compute" 10 years ago
  - Utilize Open Interfaces for Multi-Vendor Compatibility
  - New players can drive competition and innovation
  - Software Services can now be deployed all the way to the Edge!







5G vBBU (Standard Rack-Mount Server)





## **Open RAN Concept is Growing in Popularity... Fast**

 In 2019 and prior, most Xilinx 5G customer demand came from traditional OEMs

- In 2020, demand dramatically shifted towards Open and Virtualized RAN architectures
  - O-RAN Radio Unit (O-RU)
  - O-RAN Distributed Units (O-DU)

 Xilinx Telco Accelerator Cards address the O-DU portion of the Open vRAN Market



Source: Data extracted from ABI Research (www.abiresearch.com)





# **5G Virtualization**



### **Traditional Base Band Unit**



### Traditional Baseband Unit (BBU) from an OEM

Let's look at what's inside...



### What's Inside a Traditional Base Band Unit?

**Traditional BBU** 

#### Chips inside a traditional BBU



#### General Purpose Processor

L2/L3 Protocol Layer Processing

Chips Used  $\rightarrow$  x86 or ARM

#### **Fronthaul FPGA**

Terminates CPRI traffic to/from Radio Unit

Chips Used  $\rightarrow$  Mid-sized FPGAs like Kintex or Zynq

#### Layer 1 Baseband ASIC or FPGA

Low-PHY and High-PHY Functions

Chips Used  $\rightarrow$  First Deployments are FPGA, then ASICs in 2<sup>nd</sup> Gen

## Introducing T1 – Fronthaul and L1 Offload FPGA Card



#### Same Chips – But in an O-RAN Compliant PCIe Card!

Frees up GPPs for Software at the Edge

### **O-RAN Virtual BBU in a Commodity Server**



#### Standard Server

Can be ruggedized or not, based on environment

Available from Dell, SuperMicro, HPE etc.

#### **General Purpose Processor**

Upper-Layer Protocol Layer Processing (Open RAN etc.)

Can be x86 or ARM

#### Fronthaul and L1 Offload in PCIe Cards

Processors cannot handle high-volume 5G traffic alone

L1 and Fronthaul functions managed in PCIe Cards





## New Open RAN Virtual Base Band Unit (vBBU)



#### **New vBBU Distributed Unit** Open, Disaggregated, <u>FPGA Accelerated</u>!





# **T1 Deeper Dive**



## **T1 Offloads Difficult Functions from the CPU**

### **O-RAN Stack**



### **Demonstrated T1 Performance using FlexRAN on Dell R740**

| Stand-Alone Server (No T1): Server | |
|---|---|
| MAC<br>Split 6<br>Channel Encoding<br>HARQ<br>Rate Matching<br>Scrambling<br>Modulation<br>Layer Mapping<br>Fronthaul eCPRI<br>PTP Timing | MAC<br>Channel Decoding<br>HARQ<br>Rate De-Matching<br>Descrambling<br>Demodulation<br>IDFT<br>Equalization<br>Channel Estimation<br>Fronthaul eCPRI<br>PTP Timing |

#### L1: Single Thread XEON Gold in Dell R740

| L1 Performance | Throughput | Latency |
|----------------|------------|---------|
| Encoder        | 0.718 Gbps | 45 us   |
| Decoder        | 0.183 Gbps | 62.7 us |

#### FH: 4T4R @100 MHz OBW

| Fronthaul                      | FH Bandwidth  | # XEON Cores |
|--------------------------------|---------------|--------------|
| 2 Sectors<br>w/ Redundant Port | NIC-Dependent | 24           |
| 4 Sectors                      | NIC-Dependent | 64           |

| Server With T1 Card: Server + T1 Card | |
|---|---|
| MAC<br>Split 6<br>Channel Encoding<br>HARQ<br>Rate Matching<br>Scrambling<br>Modulation<br>Layer Mapping<br>Fronthaul eCPRI<br>PTP Timing | MAC<br>Channel Decoding<br>HARQ<br>Rate De-Matching<br>Descrambling<br>Demodulation<br>IDFT<br>Equalization<br>Channel Estimation<br>Fronthaul eCPRI<br>PTP Timing |

#### L1: Single Thread XEON + T1 Card in Dell R740

| L1 Performance | Throughput | Latency  |
|----------------|------------|----------|
| Encoder        | 17.7 Gbps  | 14.15 us |
| Decoder        | 7.8 Gbps   | 16.21 us |

#### FH: 4T4R @100 MHz OBW

| Fronthaul                      | FH Bandwidth  | # XEON Cores |
|--------------------------------|---------------|--------------|
| 2 Sectors<br>w/ Redundant Port | 2x 23.48 Gbps | 1            |
| 4 Sectors                      | 46.96 Gbps    | 2            |
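
As a quick sanity check on the fronthaul figures above, the sketch below derives the per-sector rate from the 4-sector total and counts the XEON cores freed by the card. The bandwidth and core counts are copied from the tables; the 25 Gbps nominal SFP28 port rate and all variable names are assumptions made for illustration.

```python
# Figures copied from the fronthaul tables (4T4R @ 100 MHz OBW).
FOUR_SECTOR_GBPS = 46.96      # total FH bandwidth for 4 sectors with T1
CORES_NO_T1 = {2: 24, 4: 64}  # sectors -> XEON cores, stand-alone server
CORES_WITH_T1 = {2: 1, 4: 2}  # sectors -> XEON cores, server with T1

# Assumption: nominal SFP28 port rate of 25 Gbps.
SFP28_GBPS = 25.0

per_sector_gbps = FOUR_SECTOR_GBPS / 4
two_sector_gbps = 2 * per_sector_gbps
fits_one_port = two_sector_gbps <= SFP28_GBPS

print(f"per sector : {per_sector_gbps:.2f} Gbps")
print(f"two sectors: {two_sector_gbps:.2f} Gbps (fits one 25G port: {fits_one_port})")

# Cores returned to the host for each deployment size.
cores_freed = {s: CORES_NO_T1[s] - CORES_WITH_T1[s] for s in (2, 4)}
for sectors, freed in cores_freed.items():
    print(f"{sectors} sectors: {freed} XEON cores freed")
```

Under these assumptions, two sectors fit on a single 25G port, which is consistent with the card carrying 4 sectors over 2x SFP28.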

#### T1 Provides L1 Performance

- **24x** Higher Encoder Throughput
- **42x** Higher Decoder Throughput
- 3.2x Lower Encoder Latency
- 3.8x Lower Decoder Latency
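
The L1 gains can be recomputed directly from the two L1 performance tables. The sketch below does only that arithmetic; the dictionary layout and the `ratios` helper are illustrative names, not part of any Xilinx tooling.

```python
# L1 figures copied from the two performance tables (Dell R740, single XEON thread).
baseline = {"encoder": {"gbps": 0.718, "latency_us": 45.0},
            "decoder": {"gbps": 0.183, "latency_us": 62.7}}
with_t1 = {"encoder": {"gbps": 17.7, "latency_us": 14.15},
           "decoder": {"gbps": 7.8, "latency_us": 16.21}}

def ratios(block):
    """Return (throughput gain, latency reduction) for 'encoder' or 'decoder'."""
    gain = with_t1[block]["gbps"] / baseline[block]["gbps"]
    reduction = baseline[block]["latency_us"] / with_t1[block]["latency_us"]
    return gain, reduction

for block in ("encoder", "decoder"):
    gain, reduction = ratios(block)
    print(f"{block}: {gain:.1f}x throughput, {reduction:.1f}x lower latency")
```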

#### T1 Provides Real 5G Fronthaul

- Sub-ns PTP Timestamping
- **ORAN Layer Mapping in NIC**
- HW Redundancy and Fallback

\*Traditional NICs provided none of this


Number of iterations = 8, N = 3456, K = 2816
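
The slide does not define N and K. Assuming they are the codeword length and the number of information bits of the benchmarked code (the usual convention), the implied code rate can be worked out as follows.

```python
from fractions import Fraction

# Assumption: N is the codeword length and K the number of information bits.
N = 3456
K = 2816

rate = Fraction(K, N)      # exact ratio, reduced to lowest terms
parity_bits = N - K

print(f"code rate K/N = {rate} ~= {float(rate):.3f}")
print(f"parity bits per codeword = {parity_bits}")
```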

### **Available Reference Designs**

L1 Software

- **BBDev API**
- L1 Stack
- QDMA Driver

T1 Card

- PCIe Gen3 QDMA
- CB CRC Attach / Detach
- Rate Matching / De-matching
- LDPC Encode
- LDPC Decode
- HARQ Engine

L1 Reference Design
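
To make one of the offloaded steps concrete, here is a minimal bit-level sketch of what code-block (CB) CRC attach and detach do. It assumes the 24-bit CRC24B generator polynomial that 3GPP TS 38.212 specifies for NR code-block CRC. The function names are hypothetical, and this is plain Python for illustration, not the T1 IP or the BBDev API.

```python
CRC24B_POLY = 0x1800063  # D^24 + D^23 + D^6 + D^5 + D + 1 (assumed: NR code-block CRC)

def crc24b(bits):
    """Compute the 24 parity bits for a list of 0/1 message bits (zero initial state)."""
    reg = 0
    for bit in list(bits) + [0] * 24:   # 24 appended zeros = multiply message by D^24
        reg = (reg << 1) | bit
        if reg & (1 << 24):             # degree-24 term set: subtract the generator
            reg ^= CRC24B_POLY
    return [(reg >> i) & 1 for i in range(23, -1, -1)]

def cb_crc_attach(payload_bits):
    """Return the payload followed by its CRC24B parity bits."""
    return list(payload_bits) + crc24b(payload_bits)

def cb_crc_detach(block_bits):
    """Split off the parity bits and report whether the check passes."""
    payload, parity = block_bits[:-24], block_bits[-24:]
    return payload, crc24b(payload) == parity

payload = [1, 0, 1, 1, 0, 0, 1, 0] * 4
block = cb_crc_attach(payload)
recovered, ok = cb_crc_detach(block)
print("clean block passes:", ok, "| payload recovered:", recovered == payload)

block[3] ^= 1  # flip one bit to emulate a channel error
print("corrupted block passes:", cb_crc_detach(block)[1])
```

In the reference design this step runs on the card next to LDPC encode and decode, so the host L1 stack only hands over and receives code blocks.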

**Fronthaul Software**

- C/U/S/M-Plane Software
- DPDK APIs and IQ Streaming I/F
- Linux TCP Stack and **QDMA** Driver

T1 Card

- PCIe Gen3 **QDMA**
- S/M Plane Queues
- IQ Streaming IP
- **O-RAN Framer**
- Synchronization
- eCPRI Framer
- Packet Parser and Interconnect
- 2x 25G + PTP

Fronthaul Reference Design

#### **Fully Operational Reference Designs**

Removes adoption barriers for companies that are not FPGA savvy

#### FlexRAN Software Stack

Layer-1 → BBDev standard APIs

Fronthaul → DPDK Drivers

#### **Standard QDMA Interface**

Same interface used by Alveo

### **FPGA IP and Integration is already done!**

No need for RTL team or additional 3<sup>rd</sup> parties



## **T1 Deployment Scenarios**

#### **2 Sector with Redundancy**

- High Reliability Deployment
- One SFP28 for Traffic
- Second SFP28 for Fallback
- Full ORAN Classification on Card



#### **Oversubscribed with FHGW**

Highest RU / DU Ratio

Requires external FHGW for oversubscription

Moves DU closer to Core

Can merge DU and CU



#### **4 Sector Direct to RUs**

- High RU / DU ratio w/o FHGW
- Scalable (more cards = more radios)
- Uses Radio Daisy Chaining



### **Highly Scalable**

- More T1 Cards = More Radios
- Fronthaul and L1 scale together
- Frees up XEONs for Operator Services
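
The scaling claim can be sketched as a small sizing helper. It takes the per-card figures from the fronthaul tables in this deck (4 sectors and 2 XEON cores per T1, versus 64 cores for 4 sectors without T1) and assumes both scale linearly with card count. That linearity, and the `plan` helper itself, are assumptions made for illustration.

```python
import math

# Per-card figures taken from the fronthaul tables in this deck.
SECTORS_PER_T1 = 4                 # 4-sector scenario, one T1 card
CORES_PER_T1 = 2                   # XEON cores for 4 sectors with T1
CORES_PER_4_SECTORS_NO_T1 = 64     # XEON cores for 4 sectors without T1

def plan(sectors):
    """Return (T1 cards needed, XEON cores with T1, XEON cores without T1).

    Assumes linear scaling per group of 4 sectors, which the deck does not state.
    """
    cards = math.ceil(sectors / SECTORS_PER_T1)
    return cards, cards * CORES_PER_T1, cards * CORES_PER_4_SECTORS_NO_T1

for sectors in (4, 8, 12):
    cards, with_t1, without_t1 = plan(sectors)
    print(f"{sectors} sectors: {cards} T1 card(s), {with_t1} vs {without_t1} XEON cores")
```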







## **Review of T1**

#### **Replaces two incumbent cards with a single T1**

Fronthaul and L1 on a single 75W card

|              | T1 (Sampling Now)            |
|--------------|------------------------------|
| Form Factor  | FHHL PCIe Card               |
| Optimization | Hybrid (Fronthaul + L1)      |
| FH Ports     | 2x SFP28 + 1588              |
| FH BW        | 4 sectors of 4TRX @ 100 MHz * |
| IEEE 1588    | Yes – Stamp at PHY           |
| L1 Encode    | 17.7 Gbps *                  |
| L1 Decode    | 7.8 Gbps *                   |



| T1 Performance Advantage   |                           |
|----------------------------|---------------------------|
| 24x                        | Higher Encoder Throughput |
| 42x                        | Higher Decoder Throughput |
| 3.2x                       | Lower Encoder Latency     |
| 3.8x                       | Lower Decoder Latency     |
| Sub-ns PTP Timestamping    |                           |
| ORAN Layer Mapping in NIC  |                           |
| HW Redundancy and Fallback |                           |


# **Thank You**




## Xilinx Mission

# Building the Adaptable, Intelligent World