Thanks Walter for the summary. I just want to apologize quickly for the sudden
cut-off of the audio at the end, my finger touched the "hang up" button too
soon, before I could say thanks to everyone for joining...
Arpad
===============================================================
From: ibis-macro-bounce@xxxxxxxxxxxxx [mailto:ibis-macro-bounce@xxxxxxxxxxxxx]
On Behalf Of Walter Katz
Sent: Tuesday, September 11, 2018 3:06 PM
To: IBIS-ATM <ibis-macro@xxxxxxxxxxxxx>
Subject: [ibis-macro] Re: Summary of DDR5 issues - and one more items to add to
this list
All,
This is the e-mail I shared in today's IBIS-ATM meeting.
Walter
1. Asymmetric rise and fall times of a single ended channel.
* Both Cadence and SiSoft believe that this can be done by the EDA tool
without any changes to the standard
* Keysight believes that the standard is incomplete because
i. It does not define how to generate the Impulse Response input to AMI_Init
ii. It does not define how to generate the waveform input to the Rx
AMI_GetWave (mostly this)
iii. Or the AMI methodology is invalid for single ended DDR5 DQ channels
1. Adding a DC Offset, or replacing the Impulse Response input to AMI_Init
with a Step Response
* Both are equivalent, since a step response can be derived from an
Impulse Response and DC Offset, and vice versa
* In any case, one of these needs to be done
Impact of Tx equalization on the offset
1. VrefDQ
* The physical memory DDR5 buffer has a register that must be set by the
controller to define the VrefDQ in the chip.
* This will be very close to the DC Offset defined above, but not
necessarily equal to it.
* Need to define how an EDA tool handles the impairment caused by the
VrefDQ register resolution, and by the fact that a single VrefDQ register may
control several DQ channels with slightly different DC Offsets
2. Clock Ticks
* The DQS to DQ skew in the DDR5 memory receiver is defined by the
Controller. This skew is determined by simulation, or by a hardware training
algorithm
* One way to handle this is to put a CDR in the memory DQ Rx and assume
that this CDR will find, use and report the optimal DQS/DQ phase.
* A possibly useful reserved parameter is the DQS/DQ interconnect skew.
* Another way is to have the Controller Tx AMI Model generate clock
ticks that the Memory Rx AMI Model reads and uses. A BIRD 147 protocol can be
defined between the Tx and Rx to optimize this skew (and the Rx DFE taps as
well).
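To illustrate the equivalence claimed in the DC Offset item above, here is a minimal Python sketch, not taken from the IBIS specification. It assumes a uniformly sampled impulse response in V/s, a single rising step, and a DC Offset that is the voltage half-way between the step response low and high limits. The function names and the sample values are hypothetical.

```python
def step_from_impulse(impulse, dt, dc_offset):
    """Integrate an impulse response (V/s) into a rising step response (V).

    Assumption: dc_offset is the midpoint between the settled low and
    high levels, so the step starts at dc_offset - swing / 2.
    """
    swing = sum(impulse) * dt          # total low-to-high transition in volts
    level = dc_offset - swing / 2.0    # settled low level before the edge
    step = []
    for h in impulse:
        level += h * dt                # running integral of the impulse response
        step.append(level)
    return step


def impulse_and_offset_from_step(step, dt, low):
    """Differentiate a step response back into an impulse response.

    Assumptions: `low` is the settled level before the edge and the last
    sample of `step` is the settled high level.
    """
    impulse = []
    previous = low
    for s in step:
        impulse.append((s - previous) / dt)
        previous = s
    dc_offset = (low + step[-1]) / 2.0
    return impulse, dc_offset
```

The sketch covers one edge only. It does not address the asymmetric rise and fall question in item 1, which would need separate rising and falling responses.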
I did leave out one issue "Component Based AMI Simulations"
1. Both Cadence and SiSoft believe that this should be dealt with by the EDA
tool. It knows the DQS/DQ interconnect skew for each DQ in a "Component", and
therefore can determine the required skew training parameters or the impairment
added to the nibble. Note that a component in this context can be a single
memory chip or multiple memory chips in a module. Similarly, the EDA tool knows
the Vcent for each DQ channel and can calculate the ideal VrefDQ for the module
and the impairment. There is little or no difference between DDR4 and DDR5 in
this regard.
2. Keysight believes that IBIS AMI needs to be enhanced (or a new
methodology is needed) to deal with Component Level AMI Simulations for DDR5.
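As a rough illustration of the EDA-tool view in item 1, the following Python sketch picks one shared VrefDQ for several DQ channels and reports the per-channel voltage impairment. It is a sketch under stated assumptions, not a description of any tool: the "ideal" shared value is taken to be the midrange of the Vcent values (which minimizes the worst-case error), and the register step and minimum voltage are hypothetical parameters, not DDR5 register values.

```python
def shared_vref_impairments(vcent, register_step, register_min):
    """Choose one quantized VrefDQ for a group of DQ channels.

    vcent         : per-DQ ideal reference voltages (V)
    register_step : hypothetical VrefDQ register resolution (V)
    register_min  : hypothetical voltage at register code 0 (V)
    Returns (register code, realized VrefDQ, per-DQ impairment in volts).
    """
    # Assumption: midrange of Vcent is the target, minimizing the worst case.
    ideal = (min(vcent) + max(vcent)) / 2.0
    # Quantize to the nearest register code.
    code = round((ideal - register_min) / register_step)
    vref = register_min + code * register_step
    # Impairment per DQ is how far its own Vcent sits from the shared VrefDQ.
    impairments = [abs(v - vref) for v in vcent]
    return code, vref, impairments
```

For example, `shared_vref_impairments([0.548, 0.552, 0.555, 0.550], 0.005, 0.5)` lands on a single register code for all four channels, and each channel's impairment is its distance from that one voltage.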
Power Aware
BIRD required to do all of the above:
1. Define a new AMI Reserved Parameter DC_Offset that is the voltage
half-way between the step response low and high limits.
That's all folks!
* Cadence and SiSoft do not think a BIRD is required to deal with the
asymmetric rise and fall times, although we do need to convince both DDR5 AMI
Model developers and users that the solutions we have implemented are
sufficiently accurate.
* We do not need to create a VrefDQ reserved parameter; it just becomes
a voltage impairment that can be included in Rx_Receiver_Sensitivity. In any
case, since the model is told what the DC Offset is, it can determine its
VrefDQ granularity impairment.
* I will defer to the Controller IC Vendors as to the need of training
the DQS/DQ skew using clock ticks created by the Tx and used by the Rx, but if
required it does not need a BIRD. It can be done under the auspices of a BIRD
147 protocol. The Tx AMI_GetWave can simply write out the clock ticks to a file
that the Rx AMI_GetWave reads. AMI Time domain simulations typically run at a
rate of 1 Million bits per minute. It takes 4 seconds to write an 8 Megabyte
file - a 12% performance hit over using the clock_times memory. If the 12%
performance hit is an issue, then we can always write a BIRD that would support
using clock_times. I think writing this BIRD now is a classic example of
premature optimization.
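The file hand-off described in the last bullet could look roughly like the Python sketch below. It is illustrative only: real AMI models are compiled libraries, and the file name, the raw-double file layout, the one-tick-per-bit assumption, and the 6.4 Gb/s unit interval are all hypothetical choices, not part of any BIRD 147 protocol definition.

```python
import array
import os
import tempfile


def write_clock_ticks(path, ticks):
    """Tx side: dump clock tick times (seconds) as raw native doubles."""
    with open(path, "wb") as f:
        array.array("d", ticks).tofile(f)


def read_clock_ticks(path):
    """Rx side: read back every tick the Tx wrote."""
    ticks = array.array("d")
    count = os.path.getsize(path) // ticks.itemsize
    with open(path, "rb") as f:
        ticks.fromfile(f, count)
    return list(ticks)


def demo(num_bits=1_000_000, unit_interval=1.0 / 6.4e9):
    """Write one tick per bit, read it back, and return the file size in bytes."""
    ticks = [n * unit_interval for n in range(num_bits)]
    path = os.path.join(tempfile.mkdtemp(), "clock_ticks.bin")
    write_clock_ticks(path, ticks)
    assert read_clock_ticks(path) == ticks   # doubles round-trip exactly
    return os.path.getsize(path)
```

With one 8-byte double per bit, `demo()` for 1,000,000 bits should produce an 8,000,000-byte file, which is in line with the 8 Megabyte figure quoted above. The sketch makes no claim about how long the write takes.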
Comments?
Walter
Walter Katz
wkatz@xxxxxxxxxx<mailto:wkatz@xxxxxxxxxx>
Office 978.461-0449 x 133
Mobile 720.417-3762