2. Related topics
• Optimization for VM performance improvement
• Measurement: tools & methods
• High performance computing in virtual machines
3. Background
• Performance is a permanent issue!
– no best, only better
– global optimization -> infrastructure, architecture, ...
– local optimization -> CPU, memory, I/O, storage, ...
• How to arbitrate the performance?
– principles & standards vs. feasibility
– tools & methods vs. implementation
• Various applications focus on different aspects
– application deployment
– case study
5. Virtualization infrastructure
• Operating system support for virtual machines.
USENIX ATC’03
– examine and reduce the large overhead for Type II VMMs
(e.g., SimOS, UML, UMLinux)
7. Virtualization infrastructure
• A comparison of software and hardware techniques
for x86 virtualization. ASPLOS’06
– conclusion: the hardware VMM often suffers lower performance
than the pure software VMM
– shortcomings of the hardware VMM
• no support for MMU virtualization
• fails to co-exist with existing software techniques for MMU
virtualization
Look ahead for
nested paging hardware
8. Virtualization infrastructure
• Accelerating two dimensional page walks for
virtualized systems. ASPLOS’08
– present an in-depth examination of the 2D page table walk
overhead and options for decreasing it
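To make the cost of the 2D walk concrete, the following minimal sketch counts the memory references needed to resolve one TLB miss. It assumes plain radix page tables with no page-walk caches or nested TLB; the function names are illustrative and do not come from the paper.

```python
def native_walk_refs(levels: int) -> int:
    """References for a native page walk: one read per page-table level."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """References for a 2D (nested) page walk with no caching (assumption).

    Each guest page-table entry lives at a guest-physical address, so reading
    it costs a full host walk (host_levels reads) plus the read of the guest
    entry itself (1 read). The final guest-physical address of the data then
    needs one more host walk.
    """
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


if __name__ == "__main__":
    print("native, 4 levels:", native_walk_refs(4))
    print("nested, 4x4 levels:", nested_walk_refs(4, 4))
```

With four-level tables in both dimensions this gives 24 references per miss, against 4 for a native walk, which is the overhead the paper sets out to reduce.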
10. Optimization
• Satori: Enlightened page sharing. USENIX ATC’09
– system for sharing memory in virtualized systems
– detect sharing opportunities and manage the surplus
memory
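The idea of sharing identical pages and reclaiming the surplus can be sketched as below. This is generic content-based deduplication written for illustration only; it is an assumption, not Satori's actual detection mechanism, and the name `share_pages` is hypothetical.

```python
def share_pages(pages):
    """Map identical page contents onto a single backing frame.

    pages: dict mapping (vm_name, page_number) -> bytes content.
    Returns (mapping, frames, surplus), where mapping sends each
    (vm_name, page_number) to a frame index, frames holds the unique
    contents, and surplus is the number of frames freed by sharing.
    """
    frame_of_content = {}  # content -> frame index (dict compares full bytes)
    frames = []
    mapping = {}
    for key, content in pages.items():
        frame = frame_of_content.get(content)
        if frame is None:
            # First time this content is seen: allocate a new frame.
            frame = len(frames)
            frames.append(content)
            frame_of_content[content] = frame
        mapping[key] = frame
    surplus = len(pages) - len(frames)
    return mapping, frames, surplus


if __name__ == "__main__":
    demo = {
        ("vm1", 0): b"kernel-text",
        ("vm2", 0): b"kernel-text",  # same content as vm1's page 0
        ("vm2", 1): b"private-data",
    }
    mapping, frames, surplus = share_pages(demo)
    print(mapping, len(frames), surplus)
```

A real system would also need copy-on-write to break sharing when a guest writes to a shared page, and a policy for handing the surplus memory back to the VMs; both are omitted here.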
11. Optimization
• High performance VMM-Bypass I/O in virtual
machines. USENIX ATC’06
– allows time-critical I/O operations to be carried out
directly in guest VMs without involvement of the VMM
and/or a privileged VM
12. Optimization
• Optimizing network virtualization in Xen.
USENIX ATC’06
– redefine the virtual network interfaces of guest domains
to incorporate high-level network offload features
– optimize the implementation of the data transfer path
between guest and driver domains
– provide support for guest operating systems to effectively
utilize advanced virtual memory features such as
superpages and global page mappings
13. Optimization
• High performance and scalable I/O virtualization via
self-virtualized devices. HPDC’07
– self-virtualized devices, which offload selected
virtualization functionality from the hypervisor
– self-virtualized network interface (SV-NIC)
14. Optimization
• Bridging the gap between software and hardware
techniques for I/O virtualization. USENIX ATC’08
– Problem 1: paravirtualized I/O causes high CPU overhead.
– Problem 2: direct I/O removes the benefits of the driver
domain model.
– Solution: bridge the performance gap between the driver
domain model and direct I/O
15. Optimization
• XenLoop: a transparent high performance inter-VM
network loopback. HPDC’08
– a fully transparent, high-performance inter-VM network
loopback channel
– intercepts outgoing network packets and shepherds the
packets destined to co-resident VMs through a high-speed
inter-VM shared memory channel
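The interception step described above can be sketched roughly as follows. The class and attribute names are hypothetical, and the deques merely stand in for the shared memory channels and the regular virtualized network path; this is not XenLoop's code.

```python
from collections import deque


class LoopbackRouter:
    """Toy model of diverting packets for co-resident VMs (illustrative only)."""

    def __init__(self, co_resident_vms):
        # One stand-in "shared memory channel" per co-resident VM.
        self.channels = {vm: deque() for vm in co_resident_vms}
        # Stand-in for the standard virtualized network interface path.
        self.network_path = deque()

    def send(self, dst, packet):
        """Route one outgoing packet and report which path it took."""
        channel = self.channels.get(dst)
        if channel is not None:
            # Destination is on the same host: bypass the network path.
            channel.append(packet)
            return "shared-memory"
        self.network_path.append((dst, packet))
        return "network"


if __name__ == "__main__":
    router = LoopbackRouter(co_resident_vms=["vm2"])
    print(router.send("vm2", b"hello"))
    print(router.send("remote-host", b"hello"))
```

Because the decision is made below the application, the sender uses the same send call in both cases, which is the transparency property the slide emphasizes.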
16. Optimization
• Virtualization Polling Engine (VPE): Using dedicated
CPU cores to accelerate I/O virtualization. ICS’09
– takes advantage of dedicated CPU cores to help with the
virtualization of I/O devices by using an event-driven
execution model with dedicated polling threads.
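A dedicated polling thread can be sketched as below. This is a simplified illustration under stated assumptions: the request rings are modeled with `queue.Queue`, the thread is not pinned to a core, and the names `polling_loop` and `run_demo` are hypothetical rather than taken from VPE.

```python
import queue
import threading


def polling_loop(rings, handle, stop):
    """Poll every ring in turn instead of waiting for interrupts.

    Exits only after stop is set and a full pass found no pending work,
    so requests queued before the stop signal are still served.
    """
    while True:
        found_work = False
        for ring in rings:
            try:
                request = ring.get_nowait()
            except queue.Empty:
                continue
            handle(request)
            found_work = True
        if stop.is_set() and not found_work:
            return


def run_demo():
    """Queue a few fake I/O requests and drain them with one poller."""
    rings = [queue.Queue(), queue.Queue()]
    rings[0].put("vm0-read")
    rings[0].put("vm0-write")
    rings[1].put("vm1-read")
    handled = []
    stop = threading.Event()
    poller = threading.Thread(target=polling_loop,
                              args=(rings, handled.append, stop))
    poller.start()
    stop.set()
    poller.join()
    return handled


if __name__ == "__main__":
    print(run_demo())
```

The trade-off is that the polling thread spends a core's cycles even when idle in exchange for avoiding interrupt and context-switch costs on the I/O path.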
18. Optimization
• I/O scheduling model of virtual machine based on
multi-core dynamic partitioning. HPDC’10
– Problem: scheduling of I/O tasks is treated as a
secondary concern compared with scheduling of
processor resources.
• This can seriously degrade I/O performance and
make virtualization less desirable for I/O-intensive applications.
– Solution: monitor I/O operations and divide the processor cores
into 3 subsets, each of which takes on a different task.
19. Measurement
• Measuring CPU overhead for I/O processing in the
Xen virtual machine monitor. USENIX ATC’05
– a light weight monitoring system
– measure the CPU usage of different virtual machines
caused by I/O processing
– “page-flipping” technique of Xen
• the memory page containing the I/O data in the driver domain is
exchanged with an unused page provided by the guest OS.
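The page exchange can be illustrated with a small ownership model. This is only a sketch: pages are represented by integer frame numbers, and `flip_page` is a hypothetical helper, not a Xen interface.

```python
def flip_page(driver_pages, guest_pages, data_page, free_page):
    """Swap ownership of two pages instead of copying the I/O data.

    driver_pages / guest_pages: sets of frame numbers owned by each domain.
    data_page: driver-domain frame that holds the received I/O data.
    free_page: unused frame offered by the guest in exchange.
    """
    if data_page not in driver_pages:
        raise ValueError("data page is not owned by the driver domain")
    if free_page not in guest_pages:
        raise ValueError("free page is not owned by the guest")
    driver_pages.remove(data_page)
    guest_pages.remove(free_page)
    driver_pages.add(free_page)   # driver domain gets a fresh empty page
    guest_pages.add(data_page)    # guest now owns the page with the data


if __name__ == "__main__":
    driver = {100, 101}
    guest = {200, 201}
    flip_page(driver, guest, data_page=100, free_page=201)
    print(sorted(driver), sorted(guest))
```

No payload bytes move in this model; only ownership changes, which is why the CPU cost of the exchange has to be measured and attributed separately.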
20. Measurement
• Diagnosing performance overheads in the Xen virtual
machine environment. VEE’05
– Xenoprof: a system-wide statistical profiling toolkit
implemented for Xen
• enable coordinated profiling of multiple VMs in a system to obtain
the distribution of hardware events (e.g., clock cycles, cache and
TLB misses)
– use the toolkit to analyze performance overheads incurred
by networking applications running in Xen VMs
21. Measurement
• Xenprobes, a lightweight user-space probing
framework for Xen virtual machine. USENIX ATC’07
– a lightweight framework to probe the guest kernels
– useful for various purposes
• monitor real-time status of production systems
• analyze performance bottlenecks
• log specific events to trace problems
– introduces some unique advantages
• puts the breakpoint handlers in user space => easy to use
• allows probing multiple guests at the same time
• supports all kinds of OSes supported by Xen
22. Measurement
• An analysis of HPC benchmarks in virtual machine
environments. Euro-Par’08
– Problem: predicting application performance is
very difficult in virtual environments.
– Research: investigate the behavior and identify patterns of
various overheads for HPC benchmark applications.
23. Measurement
• Application performance modeling in a virtualized
environment. HPCA’09
– build performance models for applications in virtualized
environments
– propose an iterative model training technique based on
artificial neural networks which is found to be accurate
across a range of applications
24. Measurement
• Performance comparison of two virtual machine
scenarios using an HPC application. HPCVirt’09
– compare the performance implications using an HPC
application
– two VM node configurations
• 2 VMs with 1 process/VM
• 1 VM with 2 processes/VM
– the difference in overall performance impact is around 3%
25. HPC
• A case for high performance computing with virtual
machines. ICS’06
– Two key ideas: VMM bypass I/O and scalable VM image
management.
26. HPC
• Virtualization for high-performance computing.
OSR 2006(vol.40)
– discuss the trends, motivations, and issues in hardware
virtualization with emphasis on their value in HPC
environments
27. HPC
• Improving performance by embedding HPC
applications in lightweight Xen domains. HPCVirt’08
– HPC application and its execution environment can be
embedded within a lightweight guest domain
28. Summary: research areas
• Reduce virtualization overhead
– infrastructure
• Xen vs. KVM vs. VMware
• cloud computing related
– CPU and memory
• At the low level, software strategies are becoming less important
than hardware support.
• At the high level, optimization is increasingly derived from
algorithms rather than architecture.
– I/O
• continues to be a hot topic!
• network, disk, filesystem, ...
29. Summary: research areas
• Measurement and tools
– benchmark
– diagnosis and performance bottleneck
– implementation of practical tools
• Application driven performance improvement
– behavior analysis of specific applications, especially of
the behavior that triggers virtualization overhead
– locally optimize and customize VMs for a specific application
scenario
30. Our past work
• Optimizing virtual machines using hybrid
virtualization. SAC’11
31. TCR: to be expected ...
• VM security
• Virtualization technology and platform
• Novel memory architecture
• Cloud computing
• App. case study under virtualization environment
• VM miscellaneous (e.g., migration, time keeping)
Editor's notes
Type I: IBM’s VM/370, Disco, and VMware’s ESX Server
Type II: SimOS, User-Mode Linux, and UMLinux
Hybrid: operate mostly on the physical hardware but use the host OS to perform I/O
VMM-bypass I/O extends the idea of OS-bypass, which originated in user-level communication.
Left fig.: VMM-Bypass I/O (I/O Handled by VMM Directly)
Right fig.: VMM-Bypass I/O (I/O Handled by Another VM)
Self-virtualized devices: an I/O virtualization approach which improves I/O performance by offloading selected virtualization functionality onto the device.
SV-NIC:
(1) provides virtual interfaces (VIFs) to guest virtual machines for an underlying physical device, the network interface;
(2) manages the way in which the device’s physical resources are used by guest operating systems;
(3) provides high performance, low overhead network access to guest domains.
1. TCP/UDP based network communication tends to perform poorly when used between co-resident VMs, but has the advantage of being transparent to user applications.
2. Other solutions exploit inter-domain shared memory mechanisms to improve communication latency and bandwidth, but require applications or user libraries to be rewritten against customized APIs, which is not practical for a large majority of distributed applications.
XenLoop: intercepts outgoing network packets beneath the network layer and shepherds the packets destined to co-resident VMs through a high-speed inter-VM shared memory channel that bypasses the virtualized network interface.
Pool the I/O resources, then assign dedicated cores to maintain the pool.
Virtualization technology has been gaining acceptance in the scientific community due to its overall flexibility in running HPC applications.
In this paper, we show how an HPC application and its execution environment can be embedded within a lightweight guest domain, alongside a domain that runs a conventional OS which is only used for administrative purpose. That permits the execution environment to take advantage of kernel-level facilities to improve performance, which would be hard to achieve in the traditional process model because of lack of support or excessive overhead.