How do I decide? Is LPDDR3 or Wide I/O the right memory technology for my next smartphone and tablet?
Marc Greenberg, Director, Product [emailprotected]
Mobile Forum Taiwan and Korea 2012
Customer Drivers In the Industry
Performance, Power and Area (PPA) enable end-product differentiation
Mobility is Key: Faster, Denser, Low power Chips
What the fuss is all about …
* Source: ECN Magazine, March 2011
Products - A Deeper Look
Samsung Wide-IO Memory for Mobile
Products - A Deeper Look
Xilinx brings 3D interconnect to commercialization phase in digital FPGA world
* Source: EDN Magazine, Sep 2010
• Enables 100x Improvement in Die-to-Die Bandwidth Per Watt
• 2-3x Capacity Advantage Over Monolithic Devices
Future DRAM Bandwidth prediction
[Chart: predicted DRAM bandwidth by generation. Labels: LPDDR1-400, LPDDR2-800, LPDDR2-1066, LPDDR3 or Wide-IO; 3-channel DDR3-1333, 3-channel DDR3-1866, 3-channel DDR4-2400 or high-bandwidth Wide-IO?]
Future DRAM Bandwidth capability
• 1Ch LPDDR2-800 = 25Gbit/s
• 1Ch LPDDR2-1066 = 34Gbit/s
• 1Ch LPDDR3-1600 = 50Gbit/s
• 2Ch LPDDR2-800 = 50Gbit/s
• 2Ch LPDDR2-1066 = 68Gbit/s
• 2Ch LPDDR3-1600 = 100Gbit/s
• Wide-IO = 100Gbit/s
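The bandwidth figures on this slide appear to follow from channel count × channel width × transfer rate. The sketch below reproduces them under two assumptions taken from elsewhere in the deck: each LPDDR2/LPDDR3 channel is 32 bits wide, and Wide-IO is four 128-bit channels at 200 MT/s. The function name `peak_bandwidth_gbit_s` is illustrative only.

```python
def peak_bandwidth_gbit_s(channels: int, bits_per_channel: int,
                          mtransfers_per_s: float) -> float:
    """Peak bandwidth in Gbit/s: channels * width (bits) * rate (MT/s) / 1000."""
    return channels * bits_per_channel * mtransfers_per_s / 1000.0


# (channels, bits per channel, MT/s); the 32-bit LPDDR channel width is an assumption
configs = {
    "1Ch LPDDR2-800": (1, 32, 800),
    "1Ch LPDDR2-1066": (1, 32, 1066),
    "1Ch LPDDR3-1600": (1, 32, 1600),
    "2Ch LPDDR2-800": (2, 32, 800),
    "2Ch LPDDR2-1066": (2, 32, 1066),
    "2Ch LPDDR3-1600": (2, 32, 1600),
    "Wide-IO": (4, 128, 200),
}

for name, (channels, width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbit_s(channels, width, rate):.1f} Gbit/s")
```

The slide rounds these results, so 25.6 appears as 25Gbit/s and 102.4 as 100Gbit/s.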
CPU to DRAM Existing inter-die connection methods
Parallel Connection across a PCB or PoP
• Most common CPU/SoC to DRAM connection today
• Well-understood and extensible
• Many pins required for high bandwidth
– ~60 signal pins for a 32-bit LPDDR2 interface (2012 low-mid range smartphone)
– ~120 signal pins for a 2-channel LPDDR2 interface (2012 mid-high end smartphone)
– ~300 signal pins for a 3-channel 64-bit DDR3 interface (2012 PC)
Serial Connection across a PCB
• Fewer pins than parallel connection
• Common for PCIe and other SERDES-based standards
• Can provide data transfer over longer physical distances if needed
• Potential latency and power considerations
• Not commonly used for DRAM at present; future solution?
Pin count, power, latency concerns?
Package-on-Package (PoP)
• Package-on-Package
– Common in cellphones and tablets
– Limited number of dies, connections
Example cross-section (not to scale), labels from top to bottom:
• Upper die (for example, DRAM)
• Flipchip bumps to upper package PCB
• Upper package PCB
• Package balls connect upper and lower PCBs
• Lower die (for example, CPU or app processor)
• Lower Package PCB with landing pads on top
• Package balls
• System PCB
New inter-die connection method: TSV
Bump pitches represent minimum practical pitch. Refer to JEDEC Standard for Wide-IO diameter and pitch
General benefits of TSVs
[Figure: PCB or PoP, Silicon Interposer, and Direct C2C stacking compared by number of connections, capacitance per connection, average connection length, and relative power (proportional to f, c, # connections). Callouts: Improved ~10X, Improved ~6X, Improved ~200X, Improved ~6X.]
What is Wide-IO DRAM?
[Chart: Bandwidth versus possible time of introduction]
JESD229 Standard
• 4 128-bit channels
• Total 512bits to DRAM
• 200MHz SDR
• 100Gbit/s bandwidth
Possible Future Standard
• 4-8 64-128-bit channels
• Total 256-512bits to DRAM
• DDR Interface
• 200-400Gbit/s bandwidth (25-51GByte/sec)
Possible Future non-mobile
• 4 128-bit channels
• Total 512bits to DRAM
• 1066MHz DDR (2133MT/s)
• 1Tbit/s bandwidth
Possible Future non-mobile
• 2Tbit/s bandwidth
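The Wide-IO figures above mix clock rates, SDR versus DDR signalling, and bit versus byte units. As a rough check, the sketch below derives bandwidth from total width, clock, and transfers per clock (1 for SDR, 2 for DDR). The class `WideInterface` and its field names are hypothetical helpers, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class WideInterface:
    name: str
    total_bits: int           # total data width to the DRAM, in bits
    clock_mhz: float          # interface clock in MHz
    transfers_per_clock: int  # 1 for SDR, 2 for DDR

    def gbit_per_s(self) -> float:
        return self.total_bits * self.clock_mhz * self.transfers_per_clock / 1000.0

    def gbyte_per_s(self) -> float:
        return self.gbit_per_s() / 8.0


jesd229 = WideInterface("JESD229 Wide-IO", total_bits=512, clock_mhz=200,
                        transfers_per_clock=1)
non_mobile = WideInterface("Possible future non-mobile", total_bits=512,
                           clock_mhz=1066, transfers_per_clock=2)

for dev in (jesd229, non_mobile):
    print(f"{dev.name}: {dev.gbit_per_s():.1f} Gbit/s "
          f"({dev.gbyte_per_s():.1f} GByte/s)")
```

Under these inputs the JESD229 case comes to about 102 Gbit/s and the non-mobile case to a little over 1 Tbit/s, in line with the rounded values on the slide.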
Why do you need Wide-IO DRAM?
[Chart: Bandwidth Requirements of Future Mobile Devices. Predicted bandwidth requirement (GByte/sec, axis from 2 to 17) for Tablet and Cellphone, 2012 to 2015.]
Why Wide-IO is Driving TSV
• DRAM is the ideal candidate to drive TSV technology
– Uneconomic or impossible to place large capacity (Gbits) of DRAM on same die as CPU/Logic
– DRAM is usually manufactured on a non-logic process
– Requires high bandwidth connection between CPU and DRAM
– Low power connection between dies desirable
– Possibility of different memory configurations using the same CPU die
Wide-IO DRAM Controller and PHY
Challenges and Solutions
Merge existing and new technology
• Start with high performance, low power base architecture
• Re-add SDR support
• Add new Wide-IO feature support
• Create DFI extensions for Controller-PHY connection
New testing requirements
• Extend BIST engine to test for new classes of error introduced by TSV
Verification
• Create memory model of Wide-IO device
• Extend verification environment for Wide-IO
PHY
• Lightweight PHY, or PHY suitable for characterization?
• Next generation: probably needs PHY again
I/Os
• Need appropriate IOs
• ESD?
What are the Wide-IO challenges?
Manufacturing Wide-IO DRAM and assembly:
• Test Memory Wafer after production using FC bumps
• Thin the wafer to ~50-100µm thickness
• Form TSVs and fill with metal
– Requires elevated temperatures: extra anneal step
• Apply backside metal and bumps
– No opportunity to test here
– Backside metal bump pitch too fine for most tester heads
• Handle dies while avoiding mechanical damage
– They are now the approximate aspect ratio of a postage stamp
• Attach dies (and interposers, if present) together
– Does it still work?
Thermal Issues:
• Where does the heat go?
• Some new tablets placing LPDDR2 DRAM on opposite side of board from CPU instead of PoP
Ecosystem Issues:
• New Technology
• How many parties involved in stack production?
• How are responsibilities divided?
• How are liabilities divided?
Low Power Memory: LPDDR3 adds Bandwidth over existing LPDDR2 technology
LPDDR2 | LPDDR3
Specification release 2009 | Specification release May 2012
DDR-1066 (533MHz) | DDR-1600 (800MHz), 50% increase
1.2V HSUL Unterminated I/Os | 1.2V HSUL I/O with On-Die Termination (ODT)
Read training | Read training, Command/Address (CA) training and Write Leveling
I/O Capacitance 2.5pF | I/O Capacitance 1.8pF
Low Power consumption | Expected to be less from lower I/O capacitance and more advanced process
LPDDR3 vs Wide-I/O - characteristics
Attribute | LPDDR3 | Wide I/O
Bandwidth per die | 51.2Gbit/s (X32) | 102.4Gbit/s
Bandwidth per package | 102.4Gbit/s (dual-channel) | 102.4Gbit/s
Dies per package | Up to 4 (in theory) | Up to 4 (in theory)
System configurations | PoP or normal PCB interconnect | Silicon Interposer or direct chip-to-chip
General | Improved, Evolutionary Technology | New, Revolutionary Technology
Compatibility | Backwards compatible with LPDDR2 | May be forwards compatible with Wide-IO2
SoC Construction
Wide I/O:
- TSV Ballout dictates SoC Construction
- Each channel contained within 2.5mm
- May be possible to reach all IOs within channel without pipelining
LPDDR3:
- PoP Ballout dictates SoC Construction
- Command and data separated 5-15mm on the SoC?
- Extra pipeline flop stages required to transmit data edge to edge; adds latency and power
[Diagram labels: Channel A Data, Channel A Cmd, Channel A, Channel B, Channel C, Channel D, Memory Controller, Flop, CPU; dimensions 2.5mm (Wide I/O) and 5-15mm on the SoC? (LPDDR3)]
Note: Only one channel shown
System Power Comparison
Attribute | LPDDR3 2Channel | Wide I/O
Peak Bandwidth | 102Gbit/s | 102Gbit/s
Core power | Predicted to be similar for both technologies
I/O Voltage | 1.2V | 1.2V
I/O Capacitance | 1.8pF | 0.5pF
Full-bandwidth, all chip I/O Power (1/2 f c V^2) | 64 * 0.5 * 1600 * cV^2 = 51200 cV^2 | 512 * 0.5 * 200 * cV^2 = 51200 cV^2
First-order approximation: the difference in IO power is proportional to c
Powerdown, Self-Refresh and DPD capability | One power state for each channel, one channel per die, 1-2 channels per system | 4 channels per die
SoC Power | PHY may require DLL/PLL | DLL/PLL not required
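To put numbers on the first-order I/O power comparison, the sketch below evaluates the slide's 1/2 f c V^2 expression with the listed capacitances (1.8pF and 0.5pF) and 1.2V. It assumes f is the per-pin data rate in Mbit/s, as the slide's arithmetic suggests, and it ignores termination current, command/address and clock pins, and PHY overhead. The function name `io_power_watts` is illustrative.

```python
def io_power_watts(data_pins: int, mbit_per_s_per_pin: float,
                   capacitance_pf: float, voltage: float) -> float:
    """First-order switching power: pins * 1/2 * f * C * V^2 (assumed model)."""
    f_hz = mbit_per_s_per_pin * 1e6
    c_farads = capacitance_pf * 1e-12
    return data_pins * 0.5 * f_hz * c_farads * voltage ** 2


# Inputs from the comparison table: pin count, per-pin rate, capacitance, voltage
lpddr3_w = io_power_watts(data_pins=64, mbit_per_s_per_pin=1600,
                          capacitance_pf=1.8, voltage=1.2)
wide_io_w = io_power_watts(data_pins=512, mbit_per_s_per_pin=200,
                           capacitance_pf=0.5, voltage=1.2)

print(f"LPDDR3 2-channel I/O power: {lpddr3_w * 1e3:.1f} mW")
print(f"Wide I/O power:             {wide_io_w * 1e3:.1f} mW")
print(f"Ratio LPDDR3 / Wide I/O:    {lpddr3_w / wide_io_w:.1f}x")
```

Because pins × rate is the same for both (51200), the ratio reduces to the capacitance ratio, 1.8 / 0.5 = 3.6, which is the slide's point that the difference is proportional to c.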
System Energy Comparison
• Calculate Energy used by DRAM
– For example:
$$E_{\text{DRAM}} = \sum_{\text{Time}=0}^{\text{Recharge interval of device}} \big[(\text{power mode 1} \times \text{time in mode 1}) + (\text{power mode 2} \times \text{time in mode 2}) + \dots\big]$$
• Battery mass, volume, and cost are roughly proportional to the energy stored by the battery
• Enter tangible battery cost into System BoM Budget
• Intangible benefits of less power:
– Less product mass
– Less product volume
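The energy sum over one recharge interval can be sketched as a power × time total across power modes. The mode names, power levels, and durations below are hypothetical placeholders chosen only to show the arithmetic; they are not measurements from the presentation.

```python
def dram_energy_joules(modes):
    """Sum power (W) * time (s) over each power mode in one recharge interval."""
    return sum(power_w * seconds for _, power_w, seconds in modes)


# Hypothetical 24-hour recharge interval: (mode name, power in W, time in s)
example_modes = [
    ("active", 0.300, 2 * 3600),
    ("power-down", 0.020, 6 * 3600),
    ("self-refresh", 0.002, 16 * 3600),
]

energy_j = dram_energy_joules(example_modes)
energy_wh = energy_j / 3600.0  # 1 Wh = 3600 J
print(f"DRAM energy per recharge interval: {energy_j:.1f} J ({energy_wh:.3f} Wh)")
```

Running the same calculation with LPDDR3 and Wide-IO power estimates gives two energy totals that can then be converted into battery cost for the BoM comparison.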
Budget-based decision
Item | LPDDR3 | Wide-IO
DRAM Cost | (Packaged Die) | (Bare Die)
Cost of SoC Area | (DDR PHY + HSUL IO) | (Many IOs + 4 channels)
Cost for TSV (SoC and DRAM if needed) | 0 | ?
Silicon Interposer (if present) | 0 | ?
Assembly and test | ? | ?
Failed stacks: probability * cost | ? | ?
Cost of ~120 signal pins on package | ? | 0
% of system power consumed by DRAM * power usage of DRAM * cost of battery | ? | 10-20% less?
Other items (NRE, IP Costs, SPB codesign, etc) | ? | ?
Total | ? | ?
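One way to work with a budget table that still contains unknowns is to total the known line items and list the ones that remain open. The sketch below does that; the helper `tally_budget` and every cost value in the example are hypothetical, chosen only to show the mechanics, and `None` stands for a "?" entry.

```python
from typing import Dict, List, Optional, Tuple


def tally_budget(items: Dict[str, Optional[float]]) -> Tuple[float, List[str]]:
    """Return (sum of known costs, names of items whose cost is still unknown)."""
    known_total = sum(cost for cost in items.values() if cost is not None)
    unknown = [name for name, cost in items.items() if cost is None]
    return known_total, unknown


# Hypothetical per-unit costs for one option; None marks an unresolved "?"
example_option = {
    "DRAM": 8.00,
    "SoC area": 1.50,
    "TSV": 0.0,
    "Silicon interposer": 0.0,
    "Assembly and test": None,
    "Failed stacks (probability * cost)": None,
    "Signal pins on package": 0.40,
    "Battery cost attributable to DRAM": None,
    "Other items": None,
}

total, open_items = tally_budget(example_option)
print(f"Known subtotal: {total:.2f}")
print("Still to estimate:", ", ".join(open_items))
```

Filling one such table per technology and comparing totals only once no items remain open keeps the unknowns visible rather than silently treating them as zero.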
Conclusion
• Now you have some of the tools to make an LPDDR3 vs Wide-IO decision
– LPDDR3: Evolutionary and proven, but more power
– Wide-IO: New and exciting, with less power
• Cadence Memory Solutions include:
– LPDDR3/LPDDR2/DDR4/DDR3 Controller and PHY
– Wide-IO Controller and PHY
– Flash Controller and PHY
– Memory Models
– Verification IP
– Signal Integrity Reference Designs
– Design, Verification, Physical Verification, and Test tools for TSV-based chip designs