

# Vivado Adapt 2021 Design Closure

**Timing Closure Assistance Tools** 

February 12, 2021



# **Design Closure Sessions**



#### Session 1

Methodology, tips, and tricks for achieving better Quality-of-Results

#### Session 2

Using Timing Closure Assistance tools to address tough timing issues

#### Session 3

Power constraints and best practices for an accurate Report Power estimate





- Shorter Design Cycles with QoR Assessment
- Fewer Iterations and Quicker Analysis with QoR Suggestions
- Versal and Vitis
- Full Automation using I<sup>2</sup> Flow

# **Fewer Wasted Design Cycles**

#### Report QoR Assessment (RQA)



- Scores range from 1 (bad) to 5 (good)
- The score is based on the best metrics available at each stage
  - Post-opt: netlist, utilization, and estimated netlist timing
  - Post-place: accurate clock skew; congestion picture available

```tcl
# Run RQA after each step; give every report its own file name
synth_design
report_qor_assessment -file postsynth_rqa.rpt
# ...
report_methodology -file methodology.rpt
opt_design
report_qor_assessment -file postopt_rqa.rpt
# ...
place_design
report_qor_assessment -file postplace_rqa.rpt
# ...
phys_opt_design
report_qor_assessment -file postphysopt_rqa.rpt
# ...
route_design
report_qor_assessment -file postroute_rqa.rpt
```

## **QoR Assessment and Next Step Guidance** report\_qor\_assessment (RQA)

#### Assessment Scores

- 1 Implementation will fail
- 2 Timing will fail
- 3 Timing difficult
- 4 Timing fair
- 5 Timing easy to meet

Rule of thumb: an early estimate is typically within ±1 of the final score
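In scripted flows it can help to turn the score into readable text. The helper below is a hypothetical sketch in plain Tcl (no Vivado commands; `rqa_score_meaning` and `rqa_expected_final_range` are invented names): it maps a 1-5 score to the meanings listed above and applies the ±1 rule of thumb, clamped to the valid scale. It does not parse a real RQA report.

```tcl
# Hypothetical helpers: score meanings as listed on the slide.
proc rqa_score_meaning {score} {
    set meanings {
        1 "Implementation will fail"
        2 "Timing will fail"
        3 "Timing difficult"
        4 "Timing fair"
        5 "Timing easy to meet"
    }
    if {![dict exists $meanings $score]} {
        error "RQA score must be an integer from 1 to 5, got '$score'"
    }
    return [dict get $meanings $score]
}

# Rule of thumb: the final score is usually within +/-1 of an early
# estimate; clamp the range to the 1..5 scale.
proc rqa_expected_final_range {score} {
    set lo [expr {max(1, $score - 1)}]
    set hi [expr {min(5, $score + 1)}]
    return [list $lo $hi]
}

puts [rqa_score_meaning 3]
puts [rqa_expected_final_range 5]
```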

#### ML Strategy Availability

Checks whether the implementation directives used qualify the run for ML strategy generation

**1. Overall Assessment Summary**

| Item | Value |
|------|-------|
| QoR Assessment Score | 2 - Implementation may complete. Ti… |
| Flow Guidance | Run report_methodology and fix or waive critical warnings |

**2. QoR Assessment Details**

| Name | Threshold | Actual | Used | Available | Status |
|------|-----------|--------|------|-----------|--------|
| **Utilization** (SLRs - 1) | | | | | |
| Registers | 50.00 | 51.79 | 447444 | 864000 | REVIEW |
| Control Sets | 7.50 | 10.13 | 10936 | 108000 | REVIEW |
| **Clocking** | | | | | |
| Setup Skew | -0.350 | -0.720 | - | - | REVIEW |
| Hold Skew | 0.350 | 0.410 | - | - | REVIEW |
| **Congestion** (number of Level 5 areas) | | | | | |
| Global | 0 | 4 | - | - | REVIEW |
| Short | 0 | | | | REVIEW |
| **Timing** | | | | | |
| WNS | - | -8.628 | - | - | REVIEW |
| TNS | | -165133 | | | REVIEW |
| WHS | | -0.571 | | | REVIEW |
| THS | | -2291.48 | | | REVIEW |

**3. ML Strategy Availability**

| Conditions for ML Strategy Availability | Value |
|-----------------------------------------|-------|
| opt_design directive | Explore |
| place_design directive | |
| phys_opt_design directive | |

#### Flow Guidance

What is the best thing to do next?

- Review methodology
- Improve the design using RQS
- Run ML strategies
- Run the incremental flow

#### Assessment Details

- Items marked REVIEW bring the score below 5
- Compare the Threshold and Actual columns
- Simplified in 2020.2
  - Clock skew checks
  - Multi checks
**E** XILINX.

© Copyright 2021 Xilinx

# **Using Report QoR Assessment**



#### 1. Overall Assessment Summary

| Item | Value |
|------|-------|
| QoR Assessment Score | 3 - Design runs have a small chance of success |
| Flow Guidance | Try running report_qor_suggestions and fix items marked REVIEW |

#### 2. QoR Assessment Details



## Automated Design Improvements report\_qor\_suggestions (RQS)

- Automates the analysis and resolution of issues that lower QoR
  - Simplifies timing closure and delivers higher QoR with less effort



- Implement, analyze, then rerun with fixes
- QoR focused
  - Focus on internal FPGA timing
  - Very limited XDC constraint suggestions
  - No suggestions for I/O timing, HLS, IP, power optimization, or runtime
- 75% implementation / 25% synthesis suggestions
  - Mostly applies properties, occasionally command switches
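The implement, analyze, rerun-with-fixes loop could look like this in a non-project Vivado Tcl flow. This is a sketch under assumptions: `post_route.dcp`, the output file names, and `read_sources.tcl` (a script that reads the HDL/XDC and sets `top` and `part`) are all hypothetical. The RQS file is read before `synth_design` so that both synthesis and implementation suggestions can take effect.

```tcl
# Pass 1 (implement + analyze): open a routed checkpoint and save the
# generated suggestions to an RQS file.
open_checkpoint post_route.dcp
report_qor_suggestions -file rqs.rpt
write_qor_suggestions -force rqs.rqs
close_design

# Pass 2 (rerun with fixes): load sources, then the RQS file, then
# rerun the full flow so the suggested properties are applied.
source read_sources.tcl            ;# hypothetical: sets $top and $part
read_qor_suggestions rqs.rqs
synth_design -top $top -part $part
opt_design
place_design
phys_opt_design
route_design
report_qor_assessment -file rerun_rqa.rpt
```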

#### QoR Gain



- Best case: 40% gain
  - Clocking (synchronous CDCs, safe clock startup) and congestion
- Typical case: 4-12%
  - Smaller congestion, placement, and timing path issues
- Last mile case: < 2%
  - Most difficult timing path / placement issues

#### Automated QoR Suggestions: report\_qor\_suggestions (RQS)





# **Best Use of QoR Suggestions**

#### GENERATED\_AT

- Post opt
  - Accurate utilization numbers
  - Netlist based timing checks
  - Repeatable
- Post place
  - Accurate clock skew numbers
    - Repeatable
  - Initial congestion picture
  - Timing issues due to placement
    - Run dependent
- Post route
  - Accurate congestion picture
  - Fully routed timing paths
  - Run dependent



#### Optimal Strategy

- Early, when the design is new or modules are added
  - Run after opt\_design / place\_design to resolve basic issues
- Later, during design closure
  - Running after route\_design is enough

#### Simple Strategy

- Run after route\_design
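The two strategies map onto where `report_qor_suggestions` is called in the flow. A minimal sketch, assuming a synthesized design is already open in a non-project session; the report file names are hypothetical.

```tcl
# Optimal strategy, early phase: post-opt checks are netlist-based and
# repeatable; post-place adds accurate clock skew and a first
# congestion picture.
opt_design
report_qor_suggestions -file rqs_post_opt.rpt
place_design
report_qor_suggestions -file rqs_post_place.rpt
phys_opt_design
route_design

# Simple strategy (or later during closure): post-route suggestions
# give the accurate congestion picture and fully routed paths.
report_qor_suggestions -file rqs_post_route.rpt
write_qor_suggestions -force rqs_post_route.rqs
```

Post-place timing findings and post-route results are run dependent, so suggestions from those stages may differ between runs.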

## RQS – Synthesis Retiming Suggestions

- 3 Retiming suggestions
  - Can overlap

#### RETIMING\_FORWARD / BACKWARD

- 2020.1
- Failing path targeted
- Moves the register across one level of logic

#### BLOCK\_SYNTH.RETIMING

- 2020.2
- Limits logic levels for the entire module
- Harder to predict final outcome
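Applied by hand, the three retiming suggestions correspond to properties like the following, read as XDC/Tcl before `synth_design`. The cell and instance names are hypothetical, and a value of 1 is assumed for the per-register properties; in the RQS flow these properties are written for you.

```tcl
# Targeted retiming on one failing path (2020.1+). Braces keep Tcl from
# treating the bus index [3] as command substitution.
set_property RETIMING_FORWARD  1 [get_cells {u_core/pipe_q_reg[3]}]
set_property RETIMING_BACKWARD 1 [get_cells {u_core/out_q_reg}]

# Module-level retiming (2020.2+): synthesis may retime the whole
# hierarchical instance, so the final outcome is harder to predict.
set_property BLOCK_SYNTH.RETIMING 1 [get_cells u_core/u_datapath]
```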

*Screenshot: QoR Suggestions window (GENERATED\_AT, Timing category) listing RQS\_TIMING-33-1, RQS\_TIMING-44-1 and RQS\_TIMING-27-1, all with APPLICABLE\_FOR = synth\_design. Descriptions shown: "Improve timing on critical path using RETIMING\_FORWARD property.", "Improve timing on critical path using RETIMING\_BACKWARD property." and "Improve module level timing by using BLOCK\_SYNTH.RETIMING property." Status message: "ML Strategies are available only in default/explore at successfully routed design."*






#### **QoR Suggestions – Implementation**: Congestion and Clocking

#### 2020.2 Congestion Strategy

1. First ensure clocking and utilization are optimal
2. Address GT clocking issues
3. Focus on congestion at the edge of the die / PS corner
4. Then address the largest congested area (the top item in 2020.1)

#### RQS\_CLOCK-8

- Optimizes CLOCK\_ROOT for synchronous CDCs
- Shortens the clock tree and reduces clock uncertainty
- Lower priority than the CLOCK\_DELAY\_GROUP suggestion
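For reference, both clocking fixes can also be written directly as net properties. This sketch uses hypothetical net names and the user-settable `USER_CLOCK_ROOT` property (UltraScale and later); treating it as the equivalent of the CLOCK\_ROOT suggestion is an assumption, and in the RQS flow the tool writes its own properties.

```tcl
# Two synchronously related clocks (e.g. outputs of the same MMCM) in
# one CLOCK_DELAY_GROUP so their insertion delays are balanced.
set_property CLOCK_DELAY_GROUP grp_sync_cdc [get_nets {clk_a_BUFG clk_b_BUFG}]

# Alternatively, move the clock root of one net to a chosen clock region
# to shorten the clock tree (region X2Y3 is an arbitrary example).
set_property USER_CLOCK_ROOT X2Y3 [get_nets clk_a_BUFG]
```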



# **ML Strategies and Last Mile Timing Closure**

#### ML Strategies



- Strategy that is optimized for your design
  - 7% gain over the Default directive
  - 3-4% gain over the Explore directive
- Training run requirements
  - opt\_design directive: Default or Explore
  - place\_design, phys\_opt\_design, and route\_design directives
    - All Default, or
    - All Explore
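A training run and strategy generation might look like the sketch below, assuming a synthesized design is open. The `-max_strategies` and `-strategy_dir` options and the `RQS` directive are used as understood from UG906/UG835; the output directory and strategy file name are hypothetical. The rerun part is commented out because it belongs in a fresh session.

```tcl
# Training run: directives all Explore (all Default also qualifies).
opt_design      -directive Explore
place_design    -directive Explore
phys_opt_design -directive Explore
route_design    -directive Explore

# Generate suggestions plus up to three predicted ML strategies and
# write one RQS file per strategy into a directory.
report_qor_suggestions -max_strategies 3 -file rqs_ml.rpt
write_qor_suggestions -strategy_dir ./ml_strategies -force ./rqs_ml.rqs

# Fresh run with one strategy (file name hypothetical):
# read_qor_suggestions ./ml_strategies/strategy_1.rqs
# synth_design ...
# opt_design      -directive RQS
# place_design    -directive RQS
# phys_opt_design -directive RQS
# route_design    -directive RQS
```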

#### Last Mile Timing Closure



- Success is not guaranteed
  - 20% success rate
    - Suggestions target the worst-case paths
    - Design WNS better than -0.100 ns
  - Possible
    - Design WNS better than -0.450 ns

# Versal, Vitis and DFX

#### Versal

- Support for RQA / RQS
  - Suggestions and training data are limited, but scale well from UltraScale+ (Usc+)
  - Clocking suggestions specific to Versal
- ML Strategies + Last Mile
  - Work in progress

#### Vitis

- Vitis designs perform well with RQS
  - Not natively supported; run RQS on the design in a Vivado project

#### DFX

- 2020.2 added -cell support to RQA and RQS
  - Must be called manually; automatic in 2021.1
  - One cell at a time
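A per-cell call for one reconfigurable partition could look like this. It assumes an implemented DFX design is open; `u_rp_inst` is a hypothetical instance name, and the option is assumed to be spelled `-cells` (the slide writes it as -cell; Vivado also accepts unambiguous option prefixes).

```tcl
# One cell per call (2020.2). u_rp_inst is a hypothetical
# reconfigurable-partition instance.
report_qor_assessment  -cells [get_cells u_rp_inst] -file rqa_u_rp_inst.rpt
report_qor_suggestions -cells [get_cells u_rp_inst] -file rqs_u_rp_inst.rpt
```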







# I<sup>2</sup> Flow Details




# I<sup>2</sup> Flow: Customer Designs

#### Customer Design

- Baseline WNS -1.354
- Good improvement through the stages

**I2 (Intelligent x Iterative) Flow Summary** (columns: I2 Flow Stage, I2 Flow Step, PostRoute-WNS, PostRoute-RQA, Route-Status, Suggestions & Strategy files, Status)

- RQS Design Improvements: steps include Utilization, Clocking, and Congestion; suggestion file timing\_step.rqs
- Expand Placer Solution: strategy files strategy\_stage1.rqs, strategy\_stage2.rqs, and strategy\_stage3.rqs
- Post-route WNS reported across these stages: -1.354 (baseline, RQA 2), then -0.409, -0.388, -0.215, -0.024, and -0.464 (RQA 3)
- Last Mile (RQS-Incremental, incr\_stage.rqs): WNS 0.008, RQA 5, Routed

Three strategies are predicted and run in parallel, which improves the hit rate.








# **I2 Flow – QoR numbers**

- Significant QoR gains
  - Greater than RQS or ML strategies alone

#### QoR Measurement

- 0% = WNS of the Default-directive run (baseline)
- 100% = Timing closure
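The slide fixes only the two end points of this scale. Assuming the scale is linear in WNS (an assumption, not stated on the slide), a run's normalized QoR would be:

$$
\mathrm{QoR}\ (\%) = \frac{\mathrm{WNS} - \mathrm{WNS}_{\mathrm{default}}}{0 - \mathrm{WNS}_{\mathrm{default}}} \times 100
$$

Worked with the customer design above, treating its -1.354 ns baseline as the Default-directive WNS:

$$
\frac{-0.409 - (-1.354)}{1.354} \times 100 \approx 69.8\,\%, \qquad \frac{0.008 - (-1.354)}{1.354} \times 100 \approx 100.6\,\%
$$

A value of 100% or more means timing is closed.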





# **NI – EA Partner**

#### ► NI

- Helped improve RQS
- I2 flow early-access (EA) partner
- Robert Atkinson





# Engineer Ambitiously.

# Our Mission:

NI equips engineers and enterprises with systems that accelerate productivity, innovation, and discovery.

Dr. Franklin Chang Díaz, CEO and Founder, Ad Astra Rocket Company

## **NI SOFTWARE-CONNECTED SOLUTIONS**


ENABLING INNOVATION AND PRODUCTIVITY ACROSS THE PRODUCT DEVELOPMENT CYCLE

**Lifecycle phases:** Research, Design and Validation, Production, Deployment and Maintenance

- **Application and de…:** LabVIEW, TestStand, VeriStand, C#/.NET, …
- **Modular hardware:** PXI, CompactDAQ, VST, CompactRIO, USRP

ni.com





MODULAR HARDWARE

Industry-Leading Hardware to Fit Your Needs

The industry's broadest portfolio of products for automated test and automated measurement is:

- Best-in-class, precise, and accurate, from a trusted provider
- Scalable and flexible to adapt quickly to your evolving test and measurement needs
- Highly configurable and software-connected so you can get your job done in a timely manner

# Optimization Project & Complexity

#### **Optimization:**

- Cost and power optimization of an existing system element
- Fit a Kintex UltraScale KU115 design into a KU085
- Increased compilation difficulty
  - Fewer resources in the smaller device
- Meeting timing was already challenging on the KU115 (success rate ~25%)
- On the KU085 the success rate falls to 0%

#### **Complexity:**

- 76 routed clocks
- Wide interfaces caused by PCIe and DDR
- Heavy DSP & RAM usage



| Site Type | KU085 Utilization (%) | KU115 Utilization (%) |
|-----------|-----------------------|-----------------------|
| LUTs      | 60.7                  | 45.7                  |
| Registers | 45.6                  | 34.2                  |
| Block RAM | 66.5                  | 49.9                  |
| DSP48E2   | 81.3                  | 60.4                  |

# NI Build Goals

- Empower software team members to change and compile the FPGA themselves
- Need minimal cycle/compile time to keep team efficient
  - Ideally one optimal set of directives can be found
  - Increase build success rate
- Minimal Vivado knowledge should be needed by team to be successful
  - Avoid floorplanning the software-configurable portion, which would also require detailed FPGA knowledge


# Fit Challenges

- The lower SLR contains all user I/O and MGTs because of the selected package
  - The PXI form factor often requires the smallest package, with ~100% I/O utilization
- The upper SLR is used only for logic
- The upper SLR consistently appears underutilized compared to the lower SLR
- Congestion around the DDR4 and PCIe logic, and its interaction with DSP usage, causes most timing problems



# **RQS** Results

- RQS helps with congestion:
  - High MUXF\* and CARRY usage leads to congestion
    - MUXF\_REMAP and CARRY\_REMAP properties use LUTs instead
  - Synthesis replication creates overlapping equivalent nets
    - Setting the EQUIVALENT\_DRIVER\_OPT property merges the drivers
  - LUT combining can be too aggressive; using additional LUTs can spread out the design
    - Resetting the SOFT\_HLUTNM property separates the LUTs of interest
- RQS also helps with clocking and timing suggestions
- Generate an RQS file and import it into the Vivado project
  - Properties are applied only where needed during opt\_design
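The congestion fixes above correspond to cell properties like the ones below, as they might appear in an XDC file. All object names are hypothetical, the values (TRUE, 1, MERGE) are assumed typical settings, and applying EQUIVALENT\_DRIVER\_OPT to the driver cells is an assumption about its object type; a generated RQS file applies the equivalent properties automatically.

```tcl
# Remap a MUXF cell and a short CARRY chain to LUTs to relieve
# congestion (acted on by opt_design).
set_property MUXF_REMAP  TRUE [get_cells u_dsp_ctrl/sel_muxf7]
set_property CARRY_REMAP 1    [get_cells u_dsp_ctrl/cnt_carry8]

# Merge synthesis-replicated, functionally equivalent drivers.
set_property EQUIVALENT_DRIVER_OPT MERGE \
    [get_cells {u_pcie_if/rst_q_reg u_pcie_if/rst_q_reg_rep}]

# Undo LUT combining for two LUTs so they can be placed apart.
reset_property SOFT_HLUTNM [get_cells {u_ddr_if/lut_a u_ddr_if/lut_b}]
```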



# **RQS** Result

- Met timing on first attempt after including generated RQS file!
- Options moving forward:
  - Integrate RQS flow into our automated build flow
    - Requires an internal development cycle, which would take too long
  - Port QoR suggestions manually to XDC
    - Extensible to earlier Vivado versions
    - Less flexible as the design and floorplan change
  - Automatic build-flow optimization and RQS use are highly desirable
    - Enter the I2 flow!


# I2 Flow

- Iterates through optimization techniques
- Machine Learning algorithms choose implementation strategies
- Last Mile timing closure uses incremental improvements
- ~6x increase in compile time when all I2 stages run
- Closer to meeting timing than we had ever been before!
- Processing time is much cheaper than the human effort of optimizing the flow

| Second Design: Stage | Step          | WNS (ns) | TNS (ns) | Time     |
|----------------------|---------------|----------|----------|----------|
| Stage 1              | Optimization  | -0.530   | -564.3   | 54 hours |
| Stage 2              | ML Strategy 1 | -0.852   | -1560.7  | 13 hours |
| Stage 2              | ML Strategy 2 | -0.460   | -35.279  | 9 hours  |
| Stage 2              | ML Strategy 3 | -0.100   | -1.109   | 10 hours |
| Stage 3              | Last Mile     | -0.006   | -0.006   | 12 hours |


# I2 Flow ML Strategies

Optimal strategies could take a long time to discover by trial and error

| Strategy | opt_design                           | place_design        | phys_opt_design   | route_design       |
|----------|--------------------------------------|---------------------|-------------------|--------------------|
| #1       | -merge_equivalent_drivers<br>Explore | ExtraNetDelay_low   |                   |                    |
| #2       | ExploreWithRemap                     | EarlyBlockPlacement | AggressiveExplore | NoTimingRelaxation |
| #3       | ExploreSequentialArea                | Explore             | -                 |                    |

- Winning strategies, coupled with the best optimizations found, can be reused outside the I2 flow to shorten compile cycles while improving the success rate
- Strategies are used only when timing is not close enough to go straight to the Last Mile step
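Reusing a winning strategy outside the I2 flow amounts to running the same directives in a normal build. A sketch for strategy #2 from the table, assuming a synthesized design is already open; the report file name is hypothetical.

```tcl
# Strategy #2 from the table, applied step by step.
opt_design      -directive ExploreWithRemap
place_design    -directive EarlyBlockPlacement
phys_opt_design -directive AggressiveExplore
route_design    -directive NoTimingRelaxation
report_qor_assessment -file rqa_strategy2.rpt
```

Strategy #1 combines Explore with -merge\_equivalent\_drivers for opt\_design; check the opt\_design documentation on whether a directive can be combined with individual options before scripting it the same way.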

# Summary

- After months of struggling to close timing, Vivado's QoR suggestions, ML strategies, and the I2 flow closed it, opening the door for further progress
- For day-to-day compiles, RQS integration has boosted our productivity by increasing the success rate, which reduces overall compile time
- For very challenging timing issues, the I2 flow is a push-button way to be really aggressive and achieve timing closure
- The I2 flow is invaluable for determining a successful build strategy without a trial-and-error search of the full solution space


# **More Resources and Feedback**

#### Documentation

- UG906: Design Analysis and Closure Techniques
  - https://www.xilinx.com/content/dam/xilinx/support/documentation/sw\_manuals/xilinx2020\_1/ug906-vivado-design-analysis.pdf
- UG938: Design Analysis and Closure Techniques Tutorial
  - Lab 2: Increasing Design Performance using Report QoR Suggestions
  - https://www.xilinx.com/support/documentation/sw\_manuals/xilinx2020\_1/ug938-vivado-design-analysis-closure-tutorial.pdf
- Blogs
  - QoR Suggestions
    - https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Improving-QoR-with-report-qor-suggestions-in-Vivado/ba-p/1033308
  - QoR Assessment
    - https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Using-the-Report-QoR-Assessment-Command/ba-p/1110761
  - Feedback
    - Try the commands and reply to these blogs with your experience; we will monitor the replies





Eliminate wasted full design cycles with Report QoR Assessment



Get > 10% QoR gain using Report QoR Suggestions



Post feedback to us using the blog link



# **ML-based Implementation Strategies**

#### Automatically identify top 3 strategies to improve QoR





# Last Mile Timing Closure

#### For designs that have nearly closed timing


