Persistent Memory
Dr. Benoit Hudzia
@blopeur
benoit@stratoscale.com
Agenda
NVM Evolution
Persistent Memory Linux Software Stack
Using and Emulating PMEM on Linux
Remote PMEM
Micro Storage Architecture
NVM Evolution
Persistent Memory
Yesterday : Battery Backed RAM
Today : NVDIMM with RAM + FLASH
Power Down - copy to Flash, Power Up copy Back to RAM
Emerging NVDIMM : PCM - 3D XPoint - Memristor - etc.
Offers up to 1000x the speed of NAND -> closer to RAM
Characteristics as seen by software : Synchronous Model
Load / Store memory instruction
New Generation HW: NVM is no longer the bottleneck
But still limited by block stack latency + asynchronous model
Asynchronous Model : NVMe
“When Poll is Better than Interrupt”, Yang et al., USENIX FAST 2012 https://www.usenix.org/legacy/events/fast12/tech/full_papers/Yang.pdf
● Active polling (SYNC) gives lower latency (at the expense of CPU) vs MSI-X interrupts (ASYNC)
● Used in Intel SPDK
Enter Persistent Memory
[Figure: 4KB read vs. 64B read. Source: Intel]
Moving away from Block I/O
[Figure axes: Latency, Access]
Leads to a New Tiered Software Stack
Challenge: Durability
PMEM Linux Software Stack
Linux kernel (>= 4.2) subsystem
NVDIMM Software Architecture
http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
BTT vs DAX
BTT : Block Translation Table
Provides atomic sector update semantics for persistent memory devices
Applications that rely on sector writes not being torn can continue to do so
For legacy applications
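The atomic sector update idea can be sketched in a few lines. This is a toy model only, not the on-media BTT format: the class name `ToyBTT`, the single spare block and the `crash_before_publish` flag are all assumptions made for illustration, and the real BTT's arenas, log and per-CPU lanes are left out.

```python
class ToyBTT:
    """Toy block translation table: logical block -> physical block."""

    def __init__(self, n_logical, block_size=512):
        self.block_size = block_size
        # One spare physical block in addition to the logical ones.
        self.media = [bytes(block_size) for _ in range(n_logical + 1)]
        self.map = list(range(n_logical))  # logical -> physical
        self.free = n_logical              # index of the spare block

    def write(self, lba, data, crash_before_publish=False):
        if len(data) != self.block_size:
            raise ValueError("whole-sector writes only")
        # 1. Write the new data into the free block; the old data is untouched.
        self.media[self.free] = bytes(data)
        if crash_before_publish:
            return  # simulated power loss: the map still points at the old data
        # 2. Publish by updating a single map entry (the atomic step).
        old = self.map[lba]
        self.map[lba] = self.free
        # 3. The previously mapped block becomes the new free block.
        self.free = old

    def read(self, lba):
        return self.media[self.map[lba]]
```

A reader sees either the complete old sector or the complete new one, never a mix, because the data is never overwritten in place.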
DAX : Direct Access
Allows mapping a pmem range directly into userspace via mmap
If the application is aware of persistent, byte-addressable memory and can use it to its advantage, DAX is the best path for it
Using and Emulating PMEM on Linux
Kernel Config ( >= 4.2 )
Enable NVDIMM dynamic debug before you start playing with NVDIMMs
Add to the kernel cmd line:
libnvdimm.dyndbg nfit.dyndbg nd_pmem.dyndbg nd_blk.dyndbg ignore_loglevel
Pick your PMEM
Use ACPI 6.0 compatible NVDIMM hardware or legacy NVDIMMs
Use virtual NVDIMMs provided by hypervisor
RAM as persistent memory
PCMSIM: NVM-disk Emulation
Emulation : RAM as PMEM
Bare metal :
Adding 'memmap=16G!16G' to the kernel boot parameters reserves 16G of memory, starting at the 16G offset.
cat /proc/cmdline :
BOOT_IMAGE=/boot/vmlinuz-4.3.0-1-default root=UUID=39635fd6-64ee-4538-9964-7de6bb181181 resume=/dev/sda1 splash=silent quiet showopts memmap=1G!5G memmap=1G!7G
BTT works
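To double-check which ranges a command line reserves, the `memmap=size!start` options can be decoded with a small helper. This is a rough sketch: `parse_memmap` is a hypothetical name, and it handles only decimal values with K/M/G suffixes, which is narrower than what the kernel itself accepts.

```python
import re

# Multipliers for the suffixes used in the examples above.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_memmap(cmdline):
    """Return a list of (start_bytes, size_bytes) for each memmap=size!start."""
    regions = []
    pattern = r"\bmemmap=(\d+)([KMG]?)!(\d+)([KMG]?)"
    for size, size_unit, start, start_unit in re.findall(pattern, cmdline):
        regions.append((int(start) * _UNITS[start_unit],
                        int(size) * _UNITS[size_unit]))
    return regions


if __name__ == "__main__":
    line = "splash=silent quiet showopts memmap=1G!5G memmap=1G!7G"
    for start, size in parse_memmap(line):
        print(f"reserved {size >> 30} GiB starting at {start >> 30} GiB")
```

For the command line shown above, this gives two 1G regions, one starting at 5G and one at 7G.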
QEMU NVDIMM
QEMU :
qemu-system-x86_64 -object memory-backend-file,share,id=mem1,mem-path=/dax/D1 -device nvdimm,memdev=mem1,reserve-label-data,id=nv1 -m 2048,maxmem=100G,slots=10 ….
Not yet in Upstream Qemu :
https://github.com/xiaogr/qemu/tree/nvdimm-v9
Seabios integration :
http://www.seabios.org/pipermail/seabios/2015-September/009770.html
Playing with DAX
Only ext2, ext4 and xfs currently support DAX
Note that block size should match page size
mkfs.ext4 -b 4096 /dev/pmem1
mount -t ext4 -o dax /dev/pmem1 /tmp/dax/
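The block-size note can be checked after mounting. A small sketch, assuming `statvfs` reports the filesystem block size in `f_bsize` (as it does for ext4 on Linux); the helper name is made up for illustration.

```python
import mmap
import os


def block_size_matches_page_size(mount_point):
    """True if the filesystem block size equals the system page size."""
    fs_block = os.statvfs(mount_point).f_bsize
    return fs_block == mmap.PAGESIZE


if __name__ == "__main__":
    # /tmp/dax/ is the mount point used in the commands above.
    print(block_size_matches_page_size("/tmp/dax/"))
```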
Playing with DAX - Cont
Then you just have to mmap it!
But remember: CLFLUSH, etc. for durability
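A minimal sketch of the map, store, flush sequence. Python cannot issue CLFLUSH itself, so `mmap.flush()` (which calls msync) stands in here as the durability point; the function name and the 4096-byte default size are assumptions, and the path would be a file on the DAX mount.

```python
import mmap
import os


def store_and_flush(path, payload, size=4096):
    """Store payload at offset 0 of a shared file mapping, then flush it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)            # the file must cover the mapping
        with mmap.mmap(fd, size) as m:    # shared, read/write mapping by default
            m[:len(payload)] = payload    # plain stores into the mapping
            m.flush()                     # msync(): not durable before this returns
    finally:
        os.close(fd)


if __name__ == "__main__":
    store_and_flush("/tmp/dax/example.bin", b"hello pmem")
```

Until the flush completes, the stores may still sit in CPU caches, which is the point of the reminder above.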
NVML : Let somebody else do the heavy lifting
http://pmem.io/
libpmem – Basic persistency handling
libvmmalloc - Transparently converts all dynamic memory allocations into persistent memory allocations
libpmemblk – Block access to pmem
libpmemlog - Log file on pmem (append-mostly)
libpmemobj - Transactional Object Store on pmem
Many more: pynvm, C++ bindings, etc.
Remote PMEM
Remote NVMe : using RDMA to transfer NVMe commands & data
http://blog.pmcs.com/flash-memory-summit-2015-special-nvm-express-rdma-awesome/
Transitioning from Indirect to Direct Flow
● Project Donard ( PMC - Microsemi)
● struct page-backed pmem patch (I/O memory is normally accessed via PFN only)
Comes with a Challenge : Durability vs Visibility
http://www.snia.org/sites/default/files/SDC15_presentations/persistant_mem/ChetDouglas_RDMA_with_PM.pdf
RDMA + DDIO
RDMA + Non Allocating write
Peer 2 Peer : Bypassing CPU + SW bottleneck
● NVM HW - Expose BAR address
● March 16 : RFC patchset for DAX allowing DMA to I/O mem
● CCIX fabric
● Use case:
○ Pre-process in data path
○ Avoid RAM buffer (HMM style)
○ SW only fetches what is necessary
Future Hyperscale Architecture
NVMe gravy train for 3-5 years
Transition to Pmem optimised apps and
Natural evolution of Ethernet Connected Drive => Fabric connected Pmem
Durable Array of Wimpy Nodes
Direct PMEM
Low power High perf K/V storage
Use pluggable front end
Links
Drivers specs: http://pmem.io/documents/
NVDIMM Namespace Specification: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
NVDIMM Drivers Writers Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
NVDIMM DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
Linux docs: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/nvdimm.txt
Qemu : https://github.com/xiaogr/qemu/tree/nvdimm-v9
Seabios : http://www.seabios.org/pipermail/seabios/2015-September/009770.html
Libraries:
https://github.com/pmem/nvml/
https://github.com/perone/pynvm
http://opennvm.github.io/index.html
https://github.com/spdk/spdk
Project :
PMFS : https://github.com/linux-pmfs/pmfs
NOVA: NOn-Volatile memory Accelerated log-structured file system https://github.com/NVSL/NOVA
PCMSIM : https://code.google.com/p/pcmsim/
Patch :
Donard: A PCIe Peer-2-Peer kernel patch https://github.com/sbates130272/donard
Adds struct page backing for I/O memory and as such allows I/O memory to be used as a DMA target : http://www.spinics.net/lists/linux-mm/msg103990.html
Thank You!
Questions ?
NVDIMM block I/O path
