







Rui (Ray) Xu
Dept. of Electrical Engineering
Columbia University

On behalf of the COLUTAV4 Team

Sept 2, 2021







#### **Outline**



- History of The COLUTA ASIC Development
- COLUTAV4
- Select Modifications From CV3
  - Analog Settling
  - Digital Correction Redundancy
  - I2C Robustness
- Responses To PDR Recommendations
  - Simulation Strategy
  - VREF And PSRR Behavior With VDD
  - FEB2 Integration Study
  - SEE Tolerance Strategy
- Looking Ahead
- Closing Remarks



#### **Historical Background**



- Summer 2017: COLUTAV1 (1-ch DRE, 1-ch SAR)
- Summer 2018: COLUTAV2 (2-ch DRE+SAR)
- Summer 2019: COLUTAV3 (4-ch DRE+SAR, 4-ch MDAC+SAR)
- → Summer 2021: COLUTAV4 (8-ch MDAC+SAR)
  - V4 builds on the excellent performance of V3
- Averaging ~1 tapeout every 12 months (barring COVID-19's impact on testing V3)
- Multi-disciplinary team across two main institutions and departments
  - Columbia, Nevis Labs: FPGA/digital design, software, testing, FEB2 integration
  - Columbia, Electrical Engineering: analog design, chip integration and verification, testing
  - UT Austin, Physics: software, chip packaging, mass/automated testing
  - UT Austin, Electrical Engineering: analog design



#### **COLUTAV4**





#### Submitted Sept 1st

**TSMC 65nm LP 1.2V** 

5.584 x 5.456 mm<sup>2</sup>

100 I/O & 366 bondpads

4.3 million transistors

4.4 million total discrete instances



#### **Modification: Analog Settling**



- As discussed in PDR, two minor issues affecting analog settling of the MDAC were observed in COLUTAV3 measurements
- A complex-impedance source induces nonlinearity when the MDAC makes a transition to a new subrange
  - An extra, optional clock phase was added into the MDAC to reset the sampling capacitor after each cycle
  - Solution verified with simulation studies, with and without the reset phase, against various input network topologies
- Parasitic resistance caused an insufficient reset which manifests as nonlinearity
  - Some analog nets in the MDAC were re-routed from thin-metal to higher thick-metal layers
  - Solution verified with extensive R+C+CC simulations



#### **Modification: Digital Correction**



- If the analog range was less than the exact radix-2 value, then there would exist a discontinuity that cannot be calibrated out by the on-chip digital arithmetic unit
  - A sign bit was added to the MDAC correction constant's word to allow for additional correction redundancy
  - Otherwise, this would have resulted in a yield loss
  - Solution verified with system-level simulations mimicking the calibration and sine wave data-taking procedures used in measurements
- Material in the backup slides graphically illustrates the problem and solution
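
The discontinuity can be illustrated with a toy two-stage model. This is a minimal sketch, not the COLUTA on-chip arithmetic: the subrange count, SAR resolution, gain ratio, and the names `adc_codes` and `correction` are all assumptions chosen for illustration. The MDAC picks a subrange, the SAR digitizes the residue, and the digital output adds a per-subrange step equal to the nominal radix-2 value plus a programmable correction.

```python
def adc_codes(gain_ratio, correction, n_sub=4, sar_bits=6, pts=1000):
    """Toy MDAC+SAR transfer function over a slow input ramp.

    gain_ratio: actual analog residue range / designed radix-2 range (assumed).
    correction: value added to the nominal subrange step 2**sar_bits (assumed).
    """
    nominal = 1 << sar_bits          # designed radix-2 step between subranges
    step = nominal + correction      # step actually applied by the digital logic
    codes = []
    for k in range(n_sub):           # MDAC subrange index
        for j in range(pts):         # ramp position inside the subrange
            residue = (j / pts) * gain_ratio
            sar = int(residue * nominal)   # SAR code for the residue
            codes.append(k * step + sar)
    return codes


def largest_jump(codes):
    """Largest code increment between neighbouring ramp points."""
    return max(b - a for a, b in zip(codes, codes[1:]))


ideal = largest_jump(adc_codes(1.0, 0))           # range equals radix-2 value
unsigned_only = largest_jump(adc_codes(0.9, 0))   # range 10% short, no negative correction
signed = largest_jump(adc_codes(0.9, -6))         # range 10% short, negative correction allowed
print(ideal, unsigned_only, signed)
```

With these assumed parameters the largest jump is 1 code in the ideal case, 7 codes when the range is 10% short and the correction cannot go negative, and 1 code again once a negative correction is allowed.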



#### **Modification: I2C Robustness**



- NO-ACK was systematically observed in radiation testboard and in RTL simulation at arbitrary timing relationships between SCL and the state machine's on-chip 40MHz clock
  - Schmitt trigger I/O pads are now used for I2C signals
  - Clock domain synchronizer was added into the I2C state machine RTL
  - CV3 and prior: IP was taken as-is
  - I2C worked flawlessly on precision testboards, analog test board, and FEB2 slice board → this issue is very "hit or miss" to reproduce
  - CERN Microelectronics Group recognized the issue during CV3 measurements and sent out a memo strongly recommending the use of Schmitt trigger I/O pads
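
To show why hysteresis helps on a slow or noisy SCL edge, here is a minimal behavioral sketch. The thresholds, ripple amplitude, and ramp are hypothetical values chosen for illustration; they are not taken from the pad library or from measurements.

```python
def count_rising_edges(samples, v_rise, v_fall):
    """Count low-to-high transitions seen by an input buffer.

    v_rise == v_fall models a plain buffer with a single threshold.
    v_rise > v_fall models a Schmitt trigger with hysteresis.
    """
    state = False
    edges = 0
    for v in samples:
        if not state and v > v_rise:
            state = True
            edges += 1
        elif state and v < v_fall:
            state = False
    return edges


VDD = 1.2   # volts
N = 200     # samples along one slow rising edge (hypothetical)
# Slow ramp from 0 to VDD with a deterministic +/-50 mV ripple (hypothetical).
scl = [VDD * i / (N - 1) + (0.05 if i % 2 == 0 else -0.05) for i in range(N)]

plain_edges = count_rising_edges(scl, v_rise=0.6, v_fall=0.6)
schmitt_edges = count_rising_edges(scl, v_rise=0.8, v_fall=0.4)
print(plain_edges, schmitt_edges)
```

In this sketch the single-threshold buffer reports several rising edges for one physical SCL transition, while the hysteretic input reports exactly one.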



## **Simulation Strategy**



<u>PDR Rec:</u> It is recommended that full-channel simulations should be performed with the extracted netlist with parasitics ... considerations of available time and resources.

Additionally, questions arose during the PDR about how the digital synthesized blocks are co-simulated with the analog blocks.





- Simulation "breadth": simulation should mimic measurement procedure. To name a few:
  - Calibration simulation with on-chip digital arithmetic
  - Sine wave measurement with on-chip digital arithmetic
  - I2C functionality; Signal-in → signal-out for whole chip
  - Depending on the nature of the simulation goal, digital blocks may be simulated at the transistor level alongside the analog blocks
- Simulation "depth": design should not be critically sensitive to R+C+CC parasitics
  - Digital blocks: behavioral simulation → extracted P&R simulation
  - Analog blocks: schematic simulation → extracted R+C+CC simulation
- Putting it all together
  - (New) Full-chip extraction simulation now possible with careful control of the design hierarchy, e.g. critical nets at the top-level are extracted and simulated while preserving the design hierarchy (thanks to help from Mentor Graphics Support)





 Example: A "pinpointed" simulation only to verify the modification addressing the issue of analog settling with a complex-impedance source (i.e. bandpass filter)

Figure: SNDR vs. input amplitude (quantized; no noise; no mismatch; with filter), 1.0169491MHz @ 118 FFT point



#### Procedure:

- 1) Understand the issue and implement the fix
- 2) Run calibration testbench to obtain weights from dedicated calibration circuit
- 3) Run sine wave testbench without fix to replicate issue (blue)
- 4) Run sine wave testbench with fix to see issue is gone (red)

Certain simulation features or tangential circuit blocks can be left out, such as the addition of transient noise, in order to expedite simulation. This study took 1 day to complete and generate all the points on the left.

- Simulation mimics measurement procedure
- Simulation "closes the loop" between various measurements, for example, calibration and data acquisition
- Simulation is used to prove or disprove a non-confounded hypothesis. Simulation is never used to design.
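
The sine wave step of such a testbench can be sketched as follows. This is an illustration under stated assumptions, not the team's testbench: it assumes a 40 MHz sampling clock and reads "1.0169491MHz @ 118 FFT point" as 3 input cycles in a 118-sample coherent record, which reproduces the quoted frequency. The ideal quantizer and the names `coherent_sine` and `sndr_db` are hypothetical.

```python
import math


def coherent_sine(n_points, cycles, amplitude, bits):
    """Coherently sampled sine passed through an ideal rounding quantizer."""
    scale = 1 << (bits - 1)
    samples = []
    for i in range(n_points):
        x = amplitude * math.sin(2 * math.pi * cycles * i / n_points)
        samples.append(round(x * scale) / scale)
    return samples


def sndr_db(samples, cycles):
    """SNDR from a sine fit at the known coherent frequency.

    The fitted fundamental is the signal; everything left over
    (noise plus distortion) is the residual.
    """
    n = len(samples)
    mean = sum(samples) / n
    w = 2 * math.pi * cycles / n
    a = 2 / n * sum((s - mean) * math.cos(w * i) for i, s in enumerate(samples))
    b = 2 / n * sum((s - mean) * math.sin(w * i) for i, s in enumerate(samples))
    signal_power = (a * a + b * b) / 2
    resid_power = sum(
        (s - mean - a * math.cos(w * i) - b * math.sin(w * i)) ** 2
        for i, s in enumerate(samples)
    ) / n
    return 10 * math.log10(signal_power / resid_power)


FS_HZ = 40e6     # assumed sampling clock
N_POINTS = 118   # assumed record length
CYCLES = 3       # assumed number of input cycles in the record
f_in_hz = FS_HZ * CYCLES / N_POINTS

sndr_8 = sndr_db(coherent_sine(N_POINTS, CYCLES, 0.9, 8), CYCLES)
sndr_12 = sndr_db(coherent_sine(N_POINTS, CYCLES, 0.9, 12), CYCLES)
print(f_in_hz, sndr_8, sndr_12)
```

Sweeping `amplitude` and plotting the result would give a curve of the same kind as SNDR vs. input amplitude, here for an ideal quantizer only.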





 Example: An "exploratory" simulation to verify basic <u>analog</u> signal-in-signal-out functionality, with no particular emphasis on known issues



Whole-chip analog simulation at schematic level: ~1 week

- Simulation done on the chip top-level model which encompasses all connections inside the chip
- Simulation mimics measurement procedure
- Simulation "closes the loop" between various measurements, for example, calibration and data acquisition
- Simulation is used to prove or disprove the basic functionality and signal paths of the chip.





 Example: An "exploratory" simulation to verify basic <u>digital I2C-SC</u> functionality, with no particular emphasis on known issues



Whole-chip digital simulation checking all digital paths and I2C transactions at schematic level: ~2 weeks

Analog simulation checking R+C+CC delays on the top-level digital buses across 225 corners: ~1 week

- Simulation done on the chip top-level model which encompasses all connections inside the chip
- Simulation mimics measurement procedure
- Simulation is used to prove or disprove the basic functionality and signal paths of the chip.



#### **VREF/PSRR Versus VDD**



**PDR Rec:** The team should verify the performance of the ADC over the proposed power supply range of  $1.2V \pm 10\%$  (between 1.08V and 1.32V) to investigate whether there are any issues related to the voltage reference generation, noise performance, etc.

- VREFP and VREFN maintain a fixed headroom (100mV) w.r.t. VDD and GND
  - Implemented by a constant-current-source from bandgap into a resistive DAC
- Design choice: a compromise between maximizing the full-swing range and keeping transistors in saturation at a nominal VDD of 1.2V
- Measurements were done on CV3 from 1.08V through 1.32V with this topology
  - Conclusion: transfer function remains linear at the different VREFP and VREFN
- Implications:
  - The PA/S and ADC should share a common 1.2V LDO to maintain the same full-swing range in light of, for example, radiation-induced drift in VDD
  - The LDO output should be heavily filtered for the analog rails
  - PSRR at non-DC is determined by off-chip bypass capacitors that form a lowpass response with an on-chip resistive DAC
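
The fixed-headroom scheme implies that the reference span tracks VDD. A minimal arithmetic sketch, assuming VREFP = VDD - 100 mV and VREFN = GND + 100 mV exactly (the function name `references` is hypothetical):

```python
HEADROOM_V = 0.100  # fixed headroom from each rail, as stated on the slide


def references(vdd):
    """Return (VREFP, VREFN, span) for a given supply voltage."""
    vrefp = vdd - HEADROOM_V
    vrefn = HEADROOM_V
    return vrefp, vrefn, vrefp - vrefn


spans = {}
for vdd in (1.08, 1.20, 1.32):
    vrefp, vrefn, span = references(vdd)
    spans[vdd] = span
    print(f"VDD={vdd:.2f} V  VREFP={vrefp:.2f} V  VREFN={vrefn:.2f} V  span={span:.2f} V")
```

Under this assumption a ±10% change in VDD moves the reference span by ±0.12 V around 1.00 V, which is why a common supply for the PA/S and ADC keeps their full-swing ranges matched.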



#### **FEB2 Integration Study**



<u>PDR Rec:</u> The reviewers suggest that given the very challenging specifications of the ADC, the design team should take the time to fully characterize the Preamp/Shaper/ADC system and analyze the results prior to submitting the COLUTAv4, as this could lead to a production-ready chip, potentially saving one design iteration.

- The Slice Testboard is the current 32-channel pre-prototype of the FEB2:
  - 8 PA/S ASICs
  - 8 COLUTAV3 ASICs
  - 8 lpGBT ASICs
- The analog performance of the Slice Testboard is currently being studied in detail, a few results of which are in the following slides





#### **Measured Pulse Shapes**



Measured output pulse shapes on HI (left) and LO (right) gain, as a function of the amplitude of the triangular input current injected into the PA/S input







#### **Measured Energy and Timing Resolution**



Preliminary measured energy (left) and timing (right) resolution, as a function of the amplitude of the triangular input current injected into the PA/S input





#### For large pulses:

- Energy resolution < 0.1 %
- Timing resolution ~ 50 ps (dominated by system jitter)

Both exceed spec.



#### **Measured Integral Nonlinearity**



Preliminary measured linearity (left) and INL (right) for LO gain, as a function of the amplitude of the triangular input current injected into the PA/S input





Note that INL << 1%





- Additional studies done in simulation for CV4
- Thanks to Mietek for initiating this study: https://indico.cern.ch/event/1008978/contributions/4249037/attachments/2197011/3715777/210225\_ALFE2\_update\_LAr.pdf
- His work demonstrates that the PA/S output stage interfaced with the MDAC switched-cap input network behaves linearly.
- However, the study neglects transmission line effects.
- Follow-up simulations were done with a FEB2 transmission line EM model
  - A Thevenin model representing the PA/S avoids the need to set up a full multi-technology simulation testbench with both PA/S (130nm) and ADC (65nm)
  - Sine waves make linear analysis easy compared to a LAr pulse shape





- FEB2 Slice Board has both buried and non-buried differential transmission lines as an experimental feature
- Both were modeled in Cadence using their pseudo-EM library with appropriate lengths
- https://community.cadence.com/cadence\_blogs\_8/b/rf/posts/have-you-tried-the-new-transmission-line-library-rftlinelib







- Transient simulation with sine wave
- Goal: ensure the interaction between MDAC sampling and TLine is linear
  - Reminder: already shown that the PA/S output is linear w/ MDAC
  - TLines are inherently linear-time-invariant components







- Significant error is seen in simulation without MDAC reset phase when the input spans more than one subrange
  - No termination on TLine = high-order ringing
  - Mechanism of error is identical to the issue of insufficient analog settling with complex-impedance sources that was investigated in CV3
  - Insufficient settling/high-order ringing + data-dependency from the MDAC → erroneous output





Green crosses "excluded data" are points after an MDAC transition (ignore legend title)

Errors are on the order of 100 LSBs

Error magnitude is inversely related to the lossiness of the transmission line





- No error (< 1 LSB) is seen with MDAC reset phase
  - MDAC reset phase enforces linear settling even in the presence of high-order ringing, i.e. the reset phase removes the data-dependency that manifests as nonlinearity
- MDAC reset phase does its job as designed
- Same conclusion for simulation with surface transmission line
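
The role of the reset phase can be illustrated with a toy first-order model. This is a sketch under strong assumptions, not the MDAC circuit and not the TLine simulation: settling is modeled as a fixed fractional error, the sampling capacitor is assumed to hold the previous MDAC residue when it is not reset, and all parameter values and function names are hypothetical.

```python
def sample_ramp(reset, settle_error=0.02, n_sub=4, n_points=1000):
    """Sample a slow ramp with incomplete settling.

    reset=True : capacitor starts every cycle from zero.
    reset=False: capacitor starts from the previous residue, which depends
                 on the subrange the MDAC selected (data-dependent).
    """
    xs, vs = [], []
    cap = 0.0
    for i in range(n_points):
        x = i / n_points
        start = 0.0 if reset else cap
        # A fraction `settle_error` of the initial difference is left unsettled.
        v = x + (start - x) * settle_error
        xs.append(x)
        vs.append(v)
        k = min(int(v * n_sub), n_sub - 1)   # subrange chosen by the MDAC
        cap = v - k / n_sub                  # residue left on the capacitor
    return xs, vs


def worst_line_fit_residual(xs, vs):
    """Largest deviation of vs from its least-squares straight-line fit."""
    n = len(xs)
    mx = sum(xs) / n
    mv = sum(vs) / n
    slope = sum((x - mx) * (v - mv) for x, v in zip(xs, vs)) / sum(
        (x - mx) ** 2 for x in xs
    )
    offset = mv - slope * mx
    return max(abs(v - (slope * x + offset)) for x, v in zip(xs, vs))


with_reset = worst_line_fit_residual(*sample_ramp(reset=True))
without_reset = worst_line_fit_residual(*sample_ramp(reset=False))
print(with_reset, without_reset)
```

In this toy model the reset turns the settling error into a pure gain error, which a straight line absorbs, while the un-reset case leaves a subrange-dependent residual that no straight line removes.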



Signal RMS = 6.571675E-01; Residuals (all) RMS = 6.998510E-05; Residuals (excluded only) RMS = 1.430579E-04
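
As a quick reading aid, the quoted RMS values can be expressed as signal-to-residual ratios in dB. The units of the RMS values are not stated on the slide, so only the dimensionless ratio is computed; `ratio_db` is a hypothetical helper name.

```python
import math

SIGNAL_RMS = 6.571675e-01
RESID_ALL_RMS = 6.998510e-05
RESID_EXCLUDED_RMS = 1.430579e-04


def ratio_db(signal, residual):
    """Amplitude ratio expressed in decibels."""
    return 20 * math.log10(signal / residual)


all_db = ratio_db(SIGNAL_RMS, RESID_ALL_RMS)
excluded_db = ratio_db(SIGNAL_RMS, RESID_EXCLUDED_RMS)
print(f"all points: {all_db:.1f} dB, excluded points only: {excluded_db:.1f} dB")
```

This gives roughly 79.5 dB over all points and roughly 73.2 dB over the excluded points alone.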





- Measurements and simulation results were presented on the integration of COLUTAV3 & V4 into FEB2
- Measurements, with no special treatment of the transmission line, exceed spec
  - Energy resolution < 0.1 % @ large amplitudes
  - Jitter ~ 50 ps limited by system jitter @ large amplitudes
  - INL < 1% @ small amplitudes
- Simulation predicts errors on the order of ~100s of ADC counts when the input exceeds one subrange <u>without</u> the MDAC reset phase and <u>with</u> a FEB2 TLine model
  - Mechanism is understood; implemented fix in CV4 will mitigate this
  - Slice Testboard measurements: not immediately obvious because (1) this phenomenon does not impact all pulses measured and (2) simulation was with sinusoid
  - The energy resolution is expected to improve with the implemented CV4 fix
  - Precision Testboard: not observed because of differences in the input network configuration



## **SEE Tolerance Strategy**



**PDR Rec:** The team should discuss with the CERN CHIPS team about the techniques for SEU protection and SEU/SET verification methodology to improve the SEU robustness since physically separating the TMR DFF might not be sufficient in case SET becomes dominant, even though the SEU may not be a major concern for LAr operation compared to the ITk. It is suggested to run a digital simulation (Verilog netlist+sdf) while simulating I2C transactions, and randomly introducing SEU and SET in all circuit nodes. In addition, operational plans for checking the ASIC configuration integrity in-system should be proposed and reviewed.

- Following the discussion, TMRG and new synthesis/P&R scripts were used in CV4
  - TMRG ensures sufficient triplication without impacting the behavioral functionality of existing non-triplicated Verilog code
  - New synthesis/P&R scripts enforce spacing of redundant logic according to CERN recommendation (> 15 um linear distance)
- All asynchronous resets were modified to synchronous resets
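
The principle behind triplication with voting, and why spacing of the redundant copies matters, can be sketched behaviorally. This is a generic bitwise majority voter with exhaustive single-bit upset injection; it is an illustration only and does not reproduce the logic that TMRG generates. The register value and function names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote across three register copies."""
    return (a & b) | (b & c) | (a & c)


def single_upsets_corrected(value, bits):
    """Flip every bit of every copy, one at a time, and check the vote."""
    for copy in range(3):
        for bit in range(bits):
            copies = [value, value, value]
            copies[copy] ^= 1 << bit          # inject one SEU
            if majority(*copies) != value:
                return False
    return True


def double_upset(value, bit):
    """Same bit flipped in two copies, e.g. one strike hitting adjacent cells."""
    flipped = value ^ (1 << bit)
    return majority(flipped, flipped, value)


REG = 0b10110010  # hypothetical 8-bit configuration word
all_single_ok = single_upsets_corrected(REG, 8)
after_double = double_upset(REG, 0)
print(all_single_ok, bin(after_double))
```

Every single upset is voted out, but the same bit flipped in two copies wins the vote, which is the case that physical separation of the redundant cells is meant to make unlikely.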



## **Looking Ahead**



- Packaging: QFN first, followed by BGA
- Testing:
  - Precision testing (QFN & BGA)
  - Radiation testing
  - Integration into FEB2
  - Automated mass testing @ UT Austin
- Mass testing robot to arrive ~October (Left)
  - Some custom CV4-specific parts for the robot already fabricated









## **Closing Remarks**



- COLUTAV4 builds on top of the successes of V3
  - Eight MDAC+SAR channels; final prototype ADC following series of 3 pre-prototypes
  - 5.584 x 5.456 mm<sup>2</sup>; 4.3 million transistors
  - Largest and most complex chip to date in the COLUTA group
  - It naturally follows that this chip has been subjected to more pre-tapeout simulation & verification than any past COLUTA design
- We look forward to testing V4! Submitted Sept 1<sup>st</sup>
  - Chip dies expected back early December of this year.
- Many efforts happening in parallel (packaging, testing, firmware, software, etc)
- Questions/comments?



# **Backup Slides**







Ideal case: analog range is exactly equal to the radix-2 designed values

#### Ideal DDPU Correction, No Redundancy







Scenario: analog range is slightly less than the radix-2 designed values

#### **Actual DDPU Correction, No Redundancy**







Fix: Allow programmability in the MSB of correction constant S2. In two's complement, this is the sign bit. Allows S2 to move up or down.
 Actual DDPU Correction,



By allowing S2 to move up or down, the DDPU digital code output will be continuous. S2 will have a programmable range equal to twice the designed range.
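
The doubling of the programmable range follows directly from two's-complement encoding. A minimal sketch, assuming a hypothetical 4-bit magnitude for S2 (the actual word width is not stated here):

```python
def decode_twos_complement(word, bits):
    """Interpret an unsigned register word as a two's-complement integer."""
    word &= (1 << bits) - 1
    if word & (1 << (bits - 1)):      # MSB set: negative value
        return word - (1 << bits)
    return word


def programmable_range(bits, signed):
    """Smallest and largest value representable in `bits` bits."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


MAG_BITS = 4  # hypothetical magnitude width, for illustration only
lo_u, hi_u = programmable_range(MAG_BITS, signed=False)      # no sign bit
lo_s, hi_s = programmable_range(MAG_BITS + 1, signed=True)   # sign bit added as MSB
print((lo_u, hi_u), (lo_s, hi_s))
print(decode_twos_complement(0b11111, MAG_BITS + 1))  # all ones decodes to -1
```

With the extra MSB the constant covers twice as many values and can move below the designed value as well as above it.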





- Was this digital non-redundancy an issue encountered in CV3?
  - This was not studied in V3 pre-tapeout simulations
  - Measurements of V3 revealed this may manifest as a yield issue
  - Out of 32 independent MDAC channels, most were ~50 SAR LSB counts or more away from being discontinuous.
  - 1 SAR LSB ≈ 200 uVpp diff.
  - 1 out of the 32 channels measured was 5 LSB counts away
  - Not a lot of margin!
- This <u>was</u> studied in V4 pre-tapeout simulations
  - V4 MDAC calibration was extensively simulated
  - The calibrated MDAC transfer function would become discontinuous if the MDAC output range was less than the designed range



## **SEE Tolerance Strategy (cont.)**



- Example: slow-control blocks between CV3 and CV4
  - Manages 544 individual bits in a single channel
  - SEU vulnerability in state-machine was discovered in CV3 radiation testing
- CV3 (pre-TMRG): 454 x 160 um; 15520 instances
  - Only the 544 output registers are pre-triplicated
  - Distancing not enforced
- CV4 (TMRG+distancing): 1100 x 200 um; 28064 instances
  - 544 output registers + state-machine handled by TMRG
  - Distancing is enforced
  - A lot of area overhead for routing distance-enforced cells!
- With TMRG + enforced distancing, this vulnerability is expected to be mitigated in CV4
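
The overhead quoted above can be put into numbers using only the block dimensions and instance counts listed on this slide (the dictionary layout is just for illustration):

```python
cv3 = {"width_um": 454, "height_um": 160, "instances": 15520}
cv4 = {"width_um": 1100, "height_um": 200, "instances": 28064}


def area_um2(block):
    """Bounding-box area of the slow-control block in square microns."""
    return block["width_um"] * block["height_um"]


area_ratio = area_um2(cv4) / area_um2(cv3)
instance_ratio = cv4["instances"] / cv3["instances"]
density_cv3 = cv3["instances"] / area_um2(cv3)
density_cv4 = cv4["instances"] / area_um2(cv4)

print(f"area: {area_um2(cv3)} -> {area_um2(cv4)} um^2 ({area_ratio:.2f}x)")
print(f"instances: {instance_ratio:.2f}x")
print(f"density: {density_cv3:.3f} -> {density_cv4:.3f} instances/um^2")
```

Area grows by about 3.0x while the instance count grows by about 1.8x, so the instance density drops, consistent with the routing overhead of distance-enforced cells.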