All,
Curtis pointed out that I left the following out of paragraph 2:
c. "Impact of TX equalization on the DC offset".
It is there now.
Walter
Walter Katz
wkatz@xxxxxxxxxx <mailto:wkatz@xxxxxxxxxx>
Office 978.461-0449 x 133
Mobile 720.417-3762
On Tue, Sep 11, 2018 at 4:05 PM, Walter Katz <wkatz@xxxxxxxxxx <mailto:wkatz@xxxxxxxxxx>> wrote:
All,
This is the e-mail I shared in today’s IBIS-ATM meeting.
Walter
1. Asymmetric rise and fall times of a single ended channel.
a. Both Cadence and SiSoft believe that this can be done by the EDA tool
without any changes to the standard
b. Keysight believes that the standard is incomplete because
i. It does not define how to generate the Impulse Response input to AMI_Init
ii. It does not define how to generate the waveform input to the Rx AMI_GetWave
(mostly this)
iii. Or the AMI methodology is invalid for single ended DDR5 DQ channels
2. Adding a DC Offset, or replacing the Impulse Response input to AMI_Init
with a Step Response
a. Both are equivalent since a step response can be derived from an
Impulse Response and DC Offset and vice-versa
b. In any case, one of these needs to be done
c. Impact of Tx equalization on the DC Offset
3. VrefDQ
a. The physical memory DDR5 buffer has a register that must be set by the
controller to define the VrefDQ in the chip.
b. This will usually be very close to the DC Offset defined above, but not
necessarily equal to it.
c. Need to define how an EDA tool handles the impairment caused by the
VrefDQ register resolution, and by the fact that a single VrefDQ register may
control several DQ channels with slightly different DC Offsets
4. Clock Ticks
a. The DQS to DQ skew in the DDR5 memory receiver is defined by the
Controller. This skew is determined by simulation, or by a hardware training
algorithm
b. One way to handle this is to put a CDR in the memory DQ Rx and assume
that this CDR will find, use and report the optimal DQS/DQ phase.
c. A possibly useful reserved parameter is the DQS/DQ interconnect skew.
d. Another way is to have the Controller Tx AMI Model generate clock ticks
that the Memory Rx AMI Model reads and uses. A BIRD 147 protocol can be
defined between the Tx and Rx to optimize this skew (and the Rx DFE taps as
well).
I did leave out one issue: “Component Based AMI Simulations”
1. Both Cadence and SiSoft believe that this should be dealt with by the
EDA tool. It knows the DQS/DQ interconnect skew for each DQ in a “Component”,
and therefore can determine the required skew training parameters or the
impairment added to the nibble. Note that a component in this context can be
a single memory chip or multiple memory chips in a module. Similarly, the
EDA tool knows the Vcent for each DQ channel and can calculate the ideal
VrefDQ for the module and the impairment. There is little or no difference
between DDR4 and DDR5 in this regard.
2. Keysight believes that IBIS AMI needs to be enhanced (or a new
methodology defined) to deal with Component Level AMI Simulations for DDR5.
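To illustrate the EDA-tool calculation described in item 1 above, here is a minimal Python sketch. It is only an assumption of how a tool might do it: the function name, the register range and step size are hypothetical (not taken from the DDR5 specification), and "ideal" is taken here to mean the setting that minimizes the worst-case distance to any DQ's Vcent.

```python
def shared_vrefdq(vcents, vref_min, vref_step, n_codes):
    """Pick one VrefDQ register code for several DQ channels.

    vcents    : per-DQ Vcent values in volts (known to the EDA tool)
    vref_min  : voltage of register code 0 (hypothetical value)
    vref_step : register resolution in volts per code (hypothetical value)
    n_codes   : number of register codes (hypothetical value)
    """
    # Assumption: the ideal shared setting minimizes the worst-case error,
    # i.e. it is the midpoint of the lowest and highest Vcent.
    ideal = 0.5 * (min(vcents) + max(vcents))

    # Quantize to the nearest available register code and clamp to range.
    code = round((ideal - vref_min) / vref_step)
    code = max(0, min(n_codes - 1, code))
    vref = vref_min + code * vref_step

    # Per-DQ voltage impairment: distance between the shared setting
    # and what each DQ would have wanted on its own.
    impairments = [abs(vref - v) for v in vcents]
    return code, vref, impairments


# Example with four DQ channels of one nibble (made-up numbers).
code, vref, impairments = shared_vrefdq(
    [0.548, 0.552, 0.561, 0.555], vref_min=0.35, vref_step=0.005, n_codes=128
)
```

The per-DQ values in `impairments` are the kind of voltage impairment that could be folded into the receiver sensitivity for each channel.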
Power Aware
BIRD required to do all of the above:
1. Define a new AMI Reserved Parameter DC_Offset that is the voltage
half-way between the step response low and high limits.
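As a sanity check on the equivalence claimed in item 2a, here is a minimal Python sketch, assuming a uniformly sampled impulse response in V/s and a step response that carries one extra leading sample holding its low limit. The function names are hypothetical and are not part of IBIS or of any proposed BIRD.

```python
def step_from_impulse(impulse, dt, dc_offset):
    """Build a step response (volts) from an impulse response and DC_Offset.

    The result has len(impulse) + 1 samples: step[0] is the low limit and
    step[-1] is the high limit, centered on dc_offset.
    """
    swing = sum(impulse) * dt           # total low-to-high voltage swing
    level = dc_offset - 0.5 * swing     # low limit
    step = [level]
    for h in impulse:
        level += h * dt                 # running integral of the impulse
        step.append(level)
    return step


def impulse_and_offset_from_step(step, dt):
    """Recover the impulse response (V/s) and DC_Offset from a step response."""
    impulse = [(b - a) / dt for a, b in zip(step, step[1:])]
    # DC_Offset as defined above: half-way between low and high limits.
    dc_offset = 0.5 * (step[0] + step[-1])
    return impulse, dc_offset


# Round trip with made-up numbers: 0.3 V swing centered on 0.55 V.
step = step_from_impulse([0.0, 2.0e9, 1.0e9, 0.0], dt=1e-10, dc_offset=0.55)
impulse, dc_offset = impulse_and_offset_from_step(step, dt=1e-10)
```

The round trip returns the original impulse response and offset to within floating-point error, which is the sense in which passing either form to AMI_Init carries the same information.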
That's all folks!
* Cadence and SiSoft do not think a BIRD is required to deal with the
asymmetric rise and fall times, although we do need to convince both DDR5
AMI Model developers and users that the solutions we have implemented are
sufficiently accurate.
* We do not need to create a VrefDQ reserved parameter; it just
becomes a voltage impairment that can be included in
Rx_Receiver_Sensitivity. In any case, since the model is told what the DC
Offset is, it can determine its own VrefDQ granularity impairment.
* I will defer to the Controller IC Vendors as to the need for
training the DQS/DQ skew using clock ticks created by the Tx and used by the
Rx, but if required it does not need a BIRD. It can be done under the
auspices of a BIRD 147 protocol. The Tx AMI_GetWave can simply write out the
clock ticks to a file that the Rx AMI_GetWave reads. AMI Time domain
simulations typically run at a rate of 1 Million bits per minute. It takes 4
seconds to write an 8 Megabyte file – a 12% performance hit over using the
clock_times memory. If the 12% performance hit is an issue, then we can
always write a BIRD that would support using clock_times. I think writing this
BIRD now is a classic example of premature optimization.
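The file hand-off in the last bullet can be sketched in a few lines. This is only an illustration in Python (real AMI models are compiled libraries), and the function names, file layout and unit interval are assumptions rather than anything defined by BIRD 147. It stores each clock tick as an 8-byte double, so one million ticks come to 8 megabytes, which is presumably where the file size quoted above comes from.

```python
import os
import tempfile
from array import array


def write_clock_ticks(path, ticks):
    """Tx side: dump clock tick times (seconds) as raw 8-byte doubles."""
    with open(path, "wb") as f:
        array("d", ticks).tofile(f)


def read_clock_ticks(path):
    """Rx side: read back every clock tick the Tx wrote."""
    ticks = array("d")
    count = os.path.getsize(path) // ticks.itemsize
    with open(path, "rb") as f:
        ticks.fromfile(f, count)
    return list(ticks)


# Example: 1000 ticks at a made-up unit interval, via a temporary file.
ui = 1.0 / 4.8e9
sent = [k * ui for k in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "clock_ticks.bin")
write_clock_ticks(path, sent)
received = read_clock_ticks(path)
```

How the Tx and Rx agree on the file name is left open here; that is the kind of detail the Tx/Rx protocol would have to carry.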
Comments?
Walter
Walter Katz
wkatz@xxxxxxxxxx <mailto:wkatz@xxxxxxxxxx>
Office 978.461-0449 x 133
Mobile 720.417-3762