This document discusses persistent memory and the Linux software stack. It begins by covering the evolution of non-volatile memory from battery-backed RAM to emerging technologies like PCM and memristors. It then outlines the persistent memory Linux software stack, including the kernel subsystem and NVDIMM architecture. Finally, it discusses using and emulating persistent memory on Linux, including kernel configuration, hardware options, and libraries for programming with persistent memory.
4. Persistent Memory
Yesterday : Battery-backed RAM
Today : NVDIMM with RAM + flash
Power down: copy to flash; power up: copy back to RAM
Emerging NVDIMM : PCM - 3D XPoint - Memristor - etc…
Claimed up to 1000x faster than NAND -> closer to RAM
Characteristics as seen by software : Synchronous Model
Load / Store memory instruction
5. New Generation HW NVM is no longer the bottleneck
But still limited by block stack latency + asynchronous model
6. Asynchronous Model : NVMe
“When Poll is Better than Interrupt”, Yang et al., USENIX FAST 2012 https://www.usenix.org/legacy/events/fast12/tech/full_papers/Yang.pdf
● Active polling (SYNC) gives lower latency, at the expense of CPU, than MSI-X interrupts (ASYNC)
● Used in Intel SPDK
14. BTT vs DAX
BTT : Block Translation Table
Provides atomic sector update semantics for persistent memory devices,
so applications that rely on sector writes not being torn can continue to do so.
For legacy applications
DAX : Direct Access
Allows mapping a pmem range directly into userspace via mmap
If the application is aware of persistent, byte-addressable memory and can use it to its advantage, DAX is the best path for it
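The atomic-sector guarantee of BTT can be illustrated with a toy model. The sketch below is not the kernel's on-media layout: `ToyBTT`, its fields, and the `crash_before_commit` flag are hypothetical names chosen for illustration. It only shows the core idea: new data is written out of place into a spare block, and the logical-to-physical map entry is switched afterwards, so a reader sees either the old sector or the new one, never a mix.

```python
class ToyBTT:
    """Toy model of a block translation table (illustrative, not the kernel layout)."""

    def __init__(self, nblocks, sector_size=512):
        self.sector_size = sector_size
        # One extra physical block acts as the spare for the next write.
        self.blocks = [bytes(sector_size) for _ in range(nblocks + 1)]
        self.map = list(range(nblocks))  # logical block -> physical block
        self.free = nblocks              # index of the spare physical block

    def read(self, lba):
        return self.blocks[self.map[lba]]

    def write(self, lba, data, crash_before_commit=False):
        if len(data) != self.sector_size:
            raise ValueError("data must be exactly one sector")
        # Step 1: write the new data into the spare block, out of place.
        self.blocks[self.free] = bytes(data)
        if crash_before_commit:
            return  # simulated power loss: the map still points at the old data
        # Step 2: commit by switching the map entry; the old block becomes the spare.
        old = self.map[lba]
        self.map[lba] = self.free
        self.free = old


btt = ToyBTT(nblocks=4)
btt.write(0, b"A" * 512)
btt.write(0, b"B" * 512, crash_before_commit=True)
print(btt.read(0)[:4])  # b'AAAA': the interrupted write is never visible
```

With DAX there is no such indirection: the application stores directly into the mapped range and is itself responsible for ordering and flushing.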
16. Kernel Config ( > 4.2 )
Enable NVDIMM dynamic debug before you start playing with NVDIMMs
Add to the kernel command line:
libnvdimm.dyndbg nfit.dyndbg nd_pmem.dyndbg nd_blk.dyndbg ignore_loglevel
17. Pick your PMEM
Use ACPI 6.0 compatible NVDIMM hardware or legacy NVDIMMs
Use virtual NVDIMMs provided by hypervisor
RAM as persistent memory
PCMSIM: NVM-disk Emulation
18. Emulation : RAM as PMEM
Bare metal :
Adding 'memmap=16G!16G' to the kernel boot parameters reserves 16G of memory, starting at the 16G offset.
Example from a running system (here with two 1G ranges, at 5G and 7G):
cat /proc/cmdline :
BOOT_IMAGE=/boot/vmlinuz-4.3.0-1-default root=UUID=39635fd6-64ee-4538-9964-7de6bb181181 resume=/dev/sda1 splash=silent quiet showopts memmap=1G!5G memmap=1G!7G
BTT works
20. Playing with DAX
Only ext2, ext4 and xfs currently support DAX
Note that block size should match page size
mkfs.ext4 -b 4096 /dev/pmem1
mount -t ext4 -o dax /dev/pmem1 /tmp/dax/
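After mounting, it can be worth checking that the `dax` option was actually applied. The sketch below scans text in the `/proc/mounts` format; `dax_mounts` is a hypothetical helper, the sample text is hand-written for illustration rather than captured output, and accepting `dax=always` (the spelling assumed for newer kernels) alongside plain `dax` is an assumption.

```python
def dax_mounts(mounts_text):
    """Return (device, mountpoint) pairs whose mount options enable DAX."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        opts = options.split(",")
        if "dax" in opts or "dax=always" in opts:
            found.append((device, mountpoint))
    return found


# Hand-written sample in /proc/mounts format (illustrative only).
sample = (
    "/dev/sda2 / ext4 rw,relatime 0 0\n"
    "/dev/pmem1 /tmp/dax ext4 rw,relatime,dax 0 0\n"
)
print(dax_mounts(sample))  # [('/dev/pmem1', '/tmp/dax')]
```

On a live system the same function can be fed the contents of `/proc/mounts`.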
21. Playing with DAX - Cont
Then you just have to mmap it!
But remember: CLFLUSH and similar cache-flush instructions are needed for durability
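A minimal sketch of the map-store-flush pattern follows. Python cannot issue cache-flush instructions directly, so this example relies on `mmap.flush()` (which maps to `msync`) as the portable stand-in; `persist_message` is a hypothetical helper, and the temporary file is a placeholder for a file on the DAX mount such as one under /tmp/dax/.

```python
import mmap
import os
import tempfile


def persist_message(path, payload, length=mmap.PAGESIZE):
    """Map `path`, store `payload` at offset 0, and flush it to the backing file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, length)  # the file must be at least as large as the mapping
        with mmap.mmap(fd, length, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:len(payload)] = payload  # plain store into the mapping
            m.flush()                   # msync: without it, durability is not guaranteed
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    # Stand-in path; on real hardware this would be a file on the DAX filesystem.
    path = os.path.join(d, "demo.pmem")
    persist_message(path, b"hello pmem")
    with open(path, "rb") as f:
        print(f.read(10))  # b'hello pmem'
```

The store itself is an ordinary memory write; only the explicit flush step makes it durable, which is the point of the reminder above.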
22. NVML : Let somebody else do the heavy lifting
http://pmem.io/
libpmem – Basic persistency handling
libvmmalloc - Transparently converts all dynamic memory allocations into persistent memory allocations
libpmemblk – Block access to pmem
libpmemlog - Log file on pmem (append-mostly)
libpmemobj - Transactional Object Store on pmem
Many more… pynvm, C++ bindings, etc.
24. Remote NVMe : using RDMA to transfer NVMe commands & data
http://blog.pmcs.com/flash-memory-summit-2015-special-nvm-express-rdma-awesome/
25. Transitioning from Indirect to Direct Flow
● Project Donard ( PMC - Microsemi)
● struct page backed pmem patch (I/O memory is normally accessed via PFN only)
26. Comes with a Challenge : Durability vs Visibility
http://www.snia.org/sites/default/files/SDC15_presentations/persistant_mem/ChetDouglas_RDMA_with_PM.pdf
29. Peer 2 Peer : Bypassing CPU + SW bottleneck
● NVM HW - Expose BAR address
● March 16 : RFC patchset for DAX allowing DMA to I/O mem
● CCIX fabric
● Use case:
○ Pre-process in data path
○ Avoid RAM buffer (HMM style)
○ SW only fetches what is necessary
30. Future Hyperscale Architecture
NVMe gravy train for 3-5 years
Transition to pmem-optimised apps and natural evolution of Ethernet-connected drives => fabric-connected pmem
Durable Array of Wimpy Nodes
Direct PMEM
Low power High perf K/V storage
Use pluggable front end
31. Links
Drivers specs: http://pmem.io/documents/
NVDIMM Namespace Specification: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
NVDIMM Drivers Writers Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
NVDIMM DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
Linux docs: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/nvdimm.txt
Qemu : https://github.com/xiaogr/qemu/tree/nvdimm-v9
Seabios : http://www.seabios.org/pipermail/seabios/2015-September/009770.html
Libraries:
https://github.com/pmem/nvml/
https://github.com/perone/pynvm
http://opennvm.github.io/index.html
https://github.com/spdk/spdk
Project :
PMFS : https://github.com/linux-pmfs/pmfs
NOVA: NOn-Volatile memory Accelerated log-structured file system https://github.com/NVSL/NOVA
PCMSIM : https://code.google.com/p/pcmsim/
Patch :
Donard: A PCIe Peer-2-Peer kernel patch https://github.com/sbates130272/donard
Adds struct page backing for I/O memory, which allows I/O memory to be used as a DMA target : http://www.spinics.net/lists/linux-mm/msg103990.html