This presentation was made by Emad Benjamin of VMware Technical Marketing. Normally I wouldn't upload someone else's preso but I really insisted this get posted and he asked me to help him out.
This deck covers tips and best practices for virtualizing latency-sensitive apps on vSphere in general, and takes a deep dive into virtualizing vFabric GemFire, a high-performance, distributed, memory-optimized key/value store.
Best practices include how to configure the virtual machines and how to tune them appropriately to the hardware the application runs on.
2. Agenda
The Data Challenge and Latency Sensitive Workloads
VMware vFabric Cloud Application Platform
High Performance Data with vFabric GemFire
Primary GemFire Topologies and Usage
Design and Sizing
Best Practices
Customer Case Study
Next Steps
4. Data Challenges in Modern Application Architectures
Explosive data growth
• 60% year over year
Bridging data supply with data demand
• Indeterminate user load, 24x7 access, new device types driving increased application use
Business challenges
• How to outpace competitors by delivering superior service and experience
IT challenges
• Scalability
• Performance
• Data reliability
• Geographic distribution
5. Latency Sensitive
Do 10ms to 100ms matter?
• Then this is a latency-sensitive application
• High chatter between VMs – many small data packets – many updates
9. Your Apps Are Cloud-Friendly… but What About Your Data?
"The big glaring hole [with cloud] is data handling."
– Adrian Kunzle, MD, Head of Engineering & Architecture, JPMorgan Chase
File Systems | Databases | Other Systems
11. What is VMware vFabric GemFire?
Data moves to the middle tier
• Closer to where it is needed
Scalability
• Easily accommodate more application users
High performance
• Dramatic application performance gains – execute from memory
Data reliability
• Data written through or behind to disk
Geographic distribution
• WAN connectivity
12. vFabric GemFire in a Nutshell
Enterprise data-consuming applications sit on top of the vFabric GemFire data fabric, which in turn sits on conventional data storage systems (file systems, databases, other data systems).
The fabric provides: reliable event notification, continuous querying, parallel execution, data durability, high throughput, low latency, high scalability, continuous availability, and WAN distribution.
13. Enabling Extreme Data Scalability and Elasticity
Primary Use Cases
Web session cache, L2 cache
• Shopping cart state management
App data cache, in-memory DB
• High performance OLTP
Grid data fabric: client compute
• Shared data grid accessed by many clients executing app logic
Grid data fabric: fabric compute
• Shared data grid where app logic is executed within the data fabric itself
(Diagram: application data "lives" in the GemFire tier and "sleeps" in file systems, databases, and mainframes/other systems.)
14. GemFire Features
• Java, C++, and .NET clients
• Rich objects
• Ultra-low latency, RAM durability
• Elastic growth without pausing
• Partitioned active data
• Redundancy (redundant copies) for instant fault tolerance
• Co-located active data
• Replicated master data
• Ultra-fast co-located transactions
• Distributed transactions*
• Server-side event listeners
• Client-side durable subscriptions
• Parallel MapReduce function execution
• Parallel OQL queries
• Continuous queries
• LRU overflow to disk in native format for fast retrieval
• Parallel, shared-nothing persistence to disk with online backup
• Synchronous or asynchronous write-through, read-through
• Uni- or bi-directional cluster synchronization over WAN
* Available in v7.0
(Diagram: Java, C++, and .NET clients issue updates and requests through Java O-R mappers and OQL against regions such as Customers, Orders, Products, and Promotions, with redundant copies held across servers.)
16. Primary GemFire Topologies
Peer-to-peer
• Intercommunicating set of vFabric GemFire servers that do not have clients accessing them
• For example, back-office or backend types of processing
(Diagram: GemFire Server 1 and GemFire Server 2 communicating as peers.)
17. Primary GemFire Topologies
Client/Server is the most common topology used in practice
(Diagram: a client tier of standalone client caches 1–4 connecting to a server tier of GemFire Server 1 and GemFire Server 2.)
18. Primary GemFire Topologies – Global Multisite
(Diagram: three sites – New York, Tokyo, and London – each running multiple GemFire servers. Each site has a primary gateway and a standby gateway, with primary and standby gateway paths connecting the sites for WAN replication.)
22. Design and Sizing – Three Basic Steps
Step 1: Determine the vFabric GemFire server JVM heap size needed to house region data for both RR and PR regions
Step 2: Benchmark vertical scalability to determine the VM size needed for the GemFire server building-block VM
Step 3: Benchmark horizontal scalability to determine how many vFabric GemFire servers are needed in a cluster
23. Design and Sizing – Understanding JVM Memory Segments
(Diagram: VM memory for GemFire divides into memory for the guest OS and JVM memory for GemFire. The JVM memory consists of the heap – initial size -Xms growing to max heap -Xmx – plus Perm Gen (-XX:MaxPermSize), Java stacks (-Xss per thread), and other native memory, both direct and non-direct, within the JVM's virtual address space.)
24. Design and Sizing – Understanding JVM Memory Segments
VM Memory for GemFire = Guest OS Memory + JVM Memory for GemFire
JVM Memory for GemFire =
JVM Max Heap (-Xmx value) +
JVM Perm Size (-XX:MaxPermSize) +
NumberOfConcurrentThreads * (-Xss) + "other mem"
Guest OS memory is approximately 0.5-1GB (depends on the OS and other processes)
Perm size is an area in addition to the -Xmx (max heap) value and is not garbage-collected, because it contains class-level information.
"Other mem" is additional memory required for NIO buffers, the JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal information.
25. Design and Sizing – Step 1: Calculating Region Data
Formula 1
TotalMemoryPerGemFireSystemWithHeadRoom = TotalMemoryPerGemFireSystem * 1.5
Formula 2
TotalMemoryPerGemFireSystem =
TotalOfAllMemoryForAllRegions +
TotalOfAllMemoryForIndicesInAllRegions +
TotalMemoryForSocketsAndThreads
Formula 3
NumberOfGemFireServers = NumberOfVMsInSystem = NumberOfJVMsInSystem =
TotalMemoryPerGemFireSystemWithHeadRoom / 32GB
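Formulas 1-3 can be sketched as follows; the region, index, and socket/thread sizes are illustrative placeholders (not numbers from the deck), and rounding up to a whole number of servers is my assumption:

```python
import math

# Sketch of Formulas 1-3: total GemFire memory, headroom, and server count.
# All inputs in GB; the example inputs below are made-up placeholders.
def total_memory_gb(regions_gb, indices_gb, sockets_threads_gb):
    # Formula 2: sum of region data, index data, and socket/thread memory
    return regions_gb + indices_gb + sockets_threads_gb

def with_headroom_gb(total_gb):
    # Formula 1: add 50% headroom
    return total_gb * 1.5

def num_gemfire_servers(total_with_headroom_gb, per_jvm_gb=32):
    # Formula 3: one JVM (= one VM = one GemFire server) per ~32GB,
    # rounded up to whole servers (my assumption; the slide divides directly)
    return math.ceil(total_with_headroom_gb / per_jvm_gb)

total = total_memory_gb(regions_gb=150, indices_gb=30, sockets_threads_gb=12)
headroom = with_headroom_gb(total)    # 288.0
print(num_gemfire_servers(headroom))  # 9
```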
26. Design and Sizing – Step 1: Calculating Region Data (cont.)
Formula 4
ApproxServerMachineRAM =
TotalMemoryPerGemFireSystemWithHeadRoom *
(DataLossTolerancePercentage / (NumberOfRedundantCopies + 1))
27. Design and Sizing – Step 1: Calculating Region Data (cont.)
Worked example (per VM):
• Guest OS memory: 500m
• JVM memory for GemFire: 31455m, made up of:
  - Max heap: -Xmx 29696m (initial heap -Xms also 29696m)
  - Perm Gen: -XX:MaxPermSize 256m
  - Java stacks: -Xss 192k * 100 threads
  - Other mem: 1484m
• Total VM memory: 31955m
Set the memory reservation to 31955m, or to the active memory used by the VM, which could be lower.
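As a sanity check, the slide-24 formula with the worked-example values above can be sketched as:

```python
# Recomputing the slide-27 worked example with the slide-24 formula (values in MB).
def jvm_memory_mb(max_heap_mb, perm_mb, threads, stack_kb, other_mb):
    # JVM memory = max heap + perm gen + thread stacks + "other mem"
    return max_heap_mb + perm_mb + threads * stack_kb / 1024 + other_mb

jvm_mb = jvm_memory_mb(max_heap_mb=29696, perm_mb=256,
                       threads=100, stack_kb=192, other_mb=1484)
vm_mb = jvm_mb + 500  # plus guest OS memory

print(round(jvm_mb))  # 31455 -> matches the slide's JVM memory for GemFire
print(round(vm_mb))   # 31955 -> the recommended memory reservation
```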
28. What Is the Practical Limit for JVM Memory Sizing? (not to scale)
• 64-bit Java theoretical limit: 16 exabytes
• Guest OS limit: 1 to 16 TB
• ESXi 5 limit: 32 vCPU and 1TB RAM per VM
• Physical server limit: <1TB RAM
• The most limiting practical sizing factor is the per-NUMA-node RAM: ~256GB per NUMA node
29. Design and Sizing – NUMA Considerations
NUMA Node Local Mem = Total RAM on Server / Number of NUMA nodes
Example 1
• Assume a 2-socket server with 8 cores per socket (8 pCPU per node) and a total of 196GB RAM
• This server has 2 NUMA nodes
• Each NUMA node will have 196GB/2 => 98GB RAM
• Hence the largest virtual machine should not exceed 8 vCPU and 98GB RAM
Example 2
• 2 sockets with a quad core on each socket (4 pCPU per node) and a total of 64GB
• Each NUMA node would get 64/2 => 32GB
• Hence the largest GemFire virtual machine should be sized as 4 vCPU and 32GB RAM
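The per-node rule above is a one-liner; this sketch just replays the two examples:

```python
# Sketch of the NUMA sizing rule: the largest VM should fit inside one NUMA node.
def numa_node_local_mem_gb(total_ram_gb, numa_nodes):
    # Local memory per NUMA node = total server RAM / number of NUMA nodes
    return total_ram_gb / numa_nodes

print(numa_node_local_mem_gb(196, 2))  # 98.0 -> Example 1
print(numa_node_local_mem_gb(64, 2))   # 32.0 -> Example 2
```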
30. 2vCPU VMs
(Diagram: the ESX scheduler places 2vCPU VMs, each with less than 32GB RAM, within NUMA node boundaries on a server with 128GB RAM – 128/2 = 64GB per NUMA node.)
31. 4vCPU VM Alongside 2vCPU VMs
(Diagram: on ESX 4.1, a 4vCPU VM is split by the ESX scheduler into 2 NUMA clients, while 2vCPU VMs with less than 32GB RAM each fit within a NUMA node. The server has 128GB RAM – 128/2 = 64GB per NUMA node.)
32. 4vCPU VMs
(Diagram: 4vCPU VMs, each with less than 32GB RAM, fit within NUMA node boundaries on a server with 128GB RAM – 128/2 = 64GB per NUMA node.)
33. Steps 2 and 3: Benchmark Vertical and Horizontal Scalability
Step 2 – Vertical Scalability Test: establish the vFabric GemFire building-block VM
• Size within the NUMA boundaries of the ESX host
• Establish the JVM heap size
• Size the building-block VM that houses the vFabric GemFire server
• If the building block has an app/VM configuration problem, adjust and iterate
Step 3 – Iterative Horizontal Scalability Test: determine how many VMs are needed
• Scale out by repeating the building-block VM
• How many VMs do you need to meet your response-time SLAs without reaching 70%-80% CPU saturation?
• Establish your horizontal scalability factor before bottlenecks appear in your application
• If horizontal scaling is bottlenecked, investigate the bottlenecked layer (network, storage, application configuration, or vSphere), mitigate, and iterate the scale-out test
• The test is complete when the SLA is met
34. Design and Sizing – Step 1: Calculating Region Data (cont.)
Formula 6 – for Global Multisite Topology
Maximum Throughput (bits/second) =
TCP Window Size in Bits / Round-Trip Latency in Seconds
Use WAN accelerators
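Formula 6 is the classic bandwidth-delay-product bound and can be sketched as follows; the 64KB window and 100ms RTT are illustrative values, not from the deck:

```python
# Sketch of Formula 6: achievable WAN throughput is bounded by
# TCP window size divided by round-trip latency.
def max_throughput_bps(tcp_window_bytes, rtt_seconds):
    return tcp_window_bytes * 8 / rtt_seconds

# A 64KB window across a 100ms-RTT WAN link:
bps = max_throughput_bps(tcp_window_bytes=64 * 1024, rtt_seconds=0.100)
print(bps / 1e6)  # ~5.24 Mbit/s regardless of link capacity
```

This bound is why the slide recommends WAN accelerators: with a fixed window, longer round trips directly cap throughput.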
36. vFabric GemFire on VMware – Best Practices
Best practices paper here:
• http://www.vmware.com/resources/techresources/10231
vFabric GemFire on VMware
• Set an appropriate memory reservation
• Leave hyperthreading enabled; if needed, size based on vCPU = 1.25 pCPU
• RHEL 6 and SLES 11 SP1 have tickless kernels that do not rely on a high-frequency interrupt-based timer, and are therefore much friendlier to virtualized latency-sensitive workloads
• Do not overcommit memory
37. vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware
• Put vSphere Distributed Resource Scheduler (DRS) in manual mode
• Locator processes should not be migrated with VMware vSphere® vMotion®; doing so could otherwise lead to network split-brain problems
• Use vMotion over 10Gbps when doing scheduled maintenance
• Disable VMware HA
• Use affinity and anti-affinity rules to avoid placing redundant copies on the same VMware ESX®/ESXi host
38. vFabric GemFire on VMware – Best Practices
(Diagram: many enterprise apps consuming data from GemFire, with both the consuming apps and the GemFire VM running within NUMA boundaries.)
39. vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware
• Disable NIC interrupt coalescing on the physical and virtual NIC
• Extremely helpful in reducing latency for latency-sensitive virtual machines
• Disable virtual interrupt coalescing for VMXNET3
• This can lead to performance penalties for other virtual machines on the ESXi host, as well as higher CPU utilization to deal with the higher rate of interrupts from the physical NIC
• This implies it is best to use a dedicated ESX cluster for vFabric GemFire workloads
• All hosts are configured the same way for latency sensitivity, and this ensures non-GemFire workloads are not negatively impacted
40. vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware – JVM tuning
• Size with 50% headroom
• Use -XX:+UseCompressedOops
• Use JDK 1.6.0_24 or later
• Set -Xms equal to -Xmx
• Use the -XX:+UseConcMarkSweepGC low-pause collector with the parallel young-generation collector
41. vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware – JVM tuning
• -XX:+DisableExplicitGC
• -XX:CMSInitiatingOccupancyFraction=<50-75>
• Set -Xmn to roughly 33% of -Xmx, and ideally no more than about 2GB
42. vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware – General
• All peer-to-peer members of the distributed system must have the same version of vFabric GemFire
• Clients can be up to one major release behind. For example, any 6.x client interoperates with any 6.x or 7.x server, but not with an 8.x server
• Set cache-server max-connections and max-threads
• Use the GFMon and VSD tools for monitoring
• When troubleshooting performance problems, check that you are not impacted by SYN cookies
• SYN cookies are the key element of a technique used to guard against SYN flood attacks. Daniel J. Bernstein, the technique's primary inventor, defines SYN cookies as "particular choices of initial TCP sequence numbers by TCP servers"
44. Airline Industry
Client/Server topology
Re-architecture of their main Web store
• To speed up the search and checkout/booking process
In 2010:
• 80+ million passengers carried
• 12B in revenue
(Diagram: clients connecting to four Next Gen Session Servers.)
Sizing per data center:
• Number of servers per data center: 4
• Number of JVMs per server: 1
• Heap size per JVM: 34GB (-Xms34G and -Xmx34G)
• Available heap memory per JVM: 34GB
• Available RAM per JVM: 17GB (includes 50% ratio for churn)
• Total RAM needed per data center: 136GB
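The sizing figures above can be cross-checked with a short sketch: 4 JVMs at 34GB heap each gives the 136GB per data center, and the 50% churn headroom yields the 17GB of usable RAM per JVM.

```python
# Sketch checking the airline case-study sizing arithmetic.
servers_per_dc = 4
heap_gb_per_jvm = 34                       # -Xms34G / -Xmx34G
usable_gb_per_jvm = heap_gb_per_jvm * 0.5  # 50% headroom ratio for churn

total_ram_per_dc_gb = servers_per_dc * heap_gb_per_jvm
print(usable_gb_per_jvm)    # 17.0
print(total_ram_per_dc_gb)  # 136
```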
48. Consistency Model
(Diagram: multiple data fabric nodes, plus an archival, OLAP, and regulatory RDBMS backed by a database node and storage device.)
• Synchronous consistency within the fabric
• Eventual consistency with the archival database
• Eventual consistency with other fabric instances
49. Memory-Based Performance
vFabric GemFire uses memory on peer machines to make data updates durable, allowing the updating thread to return 10x to 100x faster than with updates written through to disk, without risking any data loss. Typical latencies are in the few hundreds of microseconds instead of tens to hundreds of milliseconds.
vFabric GemFire can optionally write updates to disk, or to a data warehouse, asynchronously and reliably.
50. Cloud Ready
Elastic
• Add or remove data servers dynamically
The fabric is elastic, so it can grow or shrink dynamically with no interruption of service or data loss.
51. Distributed Events
Active
• Targeted, guaranteed delivery, event notification, and continuous queries
52. Partitioning and Co-Location Example
Replicated regions model many-to-many relationships
(Diagram: counterparty descriptions, settlement instructions, and netting agreements replicated across data fabric nodes.)
• Many-to-many, many-to-one, and one-to-many relationships can be modeled
• Co-location of related data eliminates distributed transactions
• All entities within the transaction are located on a single machine
• Targeted procedures have all the data entities they need locally
53. Partitioning and Co-Location Example
Partitioned regions model one-to-many and many-to-one relationships
(Diagram: position data, trade data, market data, instrument data, and rating information partitioned across data fabric nodes.)
• Many-to-many, many-to-one, and one-to-many relationships can be modeled
• Co-location of related data eliminates distributed transactions
• All entities within the transaction are located on a single machine
• Targeted procedures have all the data entities they need locally
54. Parallel Queries
(Diagram: a parallel batch controller or client issues scatter-gather (map-reduce) queries and functions across the fabric.)
55. Fault Tolerant, Data-Aware Function Routing
(Diagram: a targeted batch controller or client routes a data-aware function.)
vFabric GemFire provides "data-aware function routing" – moving the behavior to the correct data instead of moving the data to the behavior.
56. Multisite Capability
Active Everywhere
Data replication for disaster recovery is done with the fault-tolerant, bi-directional, shared-nothing, store-and-forward gateways.
57. Data Distribution
Distribute
vFabric GemFire can keep clusters that are distributed around the world "eventually consistent" in near real time, and can operate reliably in disconnected, intermittent, and low-bandwidth network environments.
59. 12vCPU VM
(Diagram: a 12vCPU VM can be scheduled across NUMA nodes through vSocket/vNUMA in ESX 5. The server has 128GB RAM – 128/2 = 64GB per NUMA node.)
60. Primary GemFire Usage – Hibernate Cache
Hibernate configuration (hibernate.cfg.xml)
• Enable the second-level cache:
<property name="hibernate.cache.use_second_level_cache">true</property>
• Set region.factory_class to GemFireRegionFactory (hibernate.cfg.xml, version 3.3+):
<property name="hibernate.cache.region.factory_class">
  com.gemstone.gemfire.modules.hibernate.GemFireRegionFactory
</property>
61. Enabling Extreme Data Scalability and Elasticity
Key Capabilities
Low-latency, linearly scalable, memory-based data fabric
• Data distribution, replication, partitioning, and co-location
• Pools memory and disk across many nodes
Data-aware execution
• Move functionality to the data for peak performance
Active/continuous querying and event notification
• Changes are propagated to one or more "active" copies
(Diagram: application data "lives" in the GemFire tier and "sleeps" in file systems, databases, and mainframes/other systems.)
62. GemFire in Mission-Critical Wall Street Applications
Reference data (top-3 US-based bank)
• Large amounts of in-memory data, mostly static but with some intraday updates
• 5x-10x performance increase
• Global distribution – consistent global views
• Domain-specific and regional edge caches
Market data (top-3 Japan-based financial firm)
• Ultra-low latency for value-added "derived" market data
• Fault-tolerant store-and-forward global data distribution
• Global consistency
Risk processing system (top-3 US-based bank)
• Credit risk, market risk, trader risk
• Over 1TB of credit risk data processing
• Processing moving from batch toward real time
• Consistent snapshot of data across long-running calculation/analysis
Editor's notes
NOTE: If this is the first time the customer has heard of vFabric GemFire, you can do a high-level executive presentation, located at: https://www.gosavo.com/vmware/Document/Document.aspx?id=1959790&view= This slide deck provides answers for customers who ask, "We know a little about vFabric High Performance Data with GemFire; we have seen the 15-minute executive presentation. We would like to learn more." This can be delivered in about 1 hour. Core VMware account teams can use this to introduce VMware vFabric GemFire.
This is a customer facing presentation. It covers the main concepts behind High Performance Data Cloud with vFabric GemFire.
Let's start by reviewing what's happening around data. Analyst firms show that data is growing 60% year-over-year. That's huge. Feeding up-to-date data to the right applications and supporting a variety of device access types 24x7 has become really difficult. In some sense, demand for data has outstripped IT's ability to supply it. (Or at minimum, IT can't supply the necessary data quickly enough.) Modern applications are heavily data dependent: new social media, travel, and financial applications. Consumer applications are leading the evolution of enterprise applications. On the business side, consumers aren't very sticky and will abandon Web sites and applications quickly if the experience is poor or slow. IT is challenged in four key areas around data: How to scale out data supply in a cost-effective way to meet demand as user load shifts indeterminately? How to maintain high performance at any given load? How to ensure data remains highly reliable? How to provide data across geographies?
The VMware Cloud Application Platform combines the Spring Framework for building new applications together with a complete set of application platform services required to run and manage these applications. [CLICK] Spring Framework – Spring is a comprehensive family of developer frameworks and tools that enable developers to build innovative new applications in a familiar and productive way while enabling the choice of where to run those applications, whether inside the datacenter or on private, hybrid, or public clouds. Spring enables developers to create applications that: provide a rich, modern user experience across a range of platforms, browsers, and personal devices; integrate applications using proven enterprise application integration patterns, including batch processing; access data in a wide range of structured and unstructured formats; and leverage popular social media services and cloud service APIs. [CLICK] VMware vFabric – VMware vFabric is a comprehensive family of application services uniquely optimized for cloud computing, including a lightweight application server, global data management, cloud-ready messaging, dynamic load balancing, and application performance management.
[CLICK] The products behind these services include: Lightweight Application Server – tc Server, an enterprise version of Apache Tomcat, is optimized for Spring and VMware vSphere and can be instantaneously provisioned to meet the scalability needs of modern applications. Data Management Services – GemFire speeds application performance and eliminates database bottlenecks by providing real-time access to globally distributed data. Cloud-Ready Messaging Service – RabbitMQ facilitates communications between applications inside and outside the datacenter. Dynamic Load Balancer – ERS, an enterprise version of the Apache web server, ensures optimal performance by distributing and balancing application load. Application Performance Management – Hyperic enables proactive performance management through transparent visibility into modern applications deployed across physical, virtual, and cloud environments. Policy-Driven Automation – Project Napa is the code name for a new offering still under development that is focused on policy-based automation of application and platform configuration and provisioning tasks.
We've talked a lot about the needs of new applications. One important thing to understand is that these new applications are very data-intensive. And for these new applications to realize the full promise of cloud, new approaches to data management are needed that support applications deployed across elastic, highly scalable, geographically distributed architectures. Cloud computing is a distributed deployment model, and for that reason caching, messaging, and global data accessibility are of far greater strategic importance than before.
So what's the problem with today's IT architecture when it comes to providing data to many types of modern applications? This is a really simple view of a typical architecture: end-user clients at the top accessing applications through a variety of devices. End-user application requests travel through Web and application servers. As applications need data, the application server will fetch it from a database. As user load increases and more requests come in, IT is well versed at horizontally scaling out the Web tier and application server tier. [CLICK] [CLICK] But how do you scale the data tier? [CLICK] Traditional databases don't scale horizontally very well. Typically you scale the database tier vertically; in other words, buy a larger and more expensive database. Scaling this way does not help address the need to scale out quickly and easily as load changes. In addition, accessing data from the database adds another hop across the network to the database tier. So performance limitations in the form of network traffic and generally higher overall latency are introduced.
vFabric GemFire solves this by moving data to the middle tier. vFabric GemFire is an elastic, distributed, in-memory data management platform. One core principle of GemFire is to pool and share memory and disk resources to create one larger virtual space. GemFire then manipulates data in that shared memory, delivering dramatically faster application performance. GemFire is also designed to scale out horizontally on commodity hardware. This matches what's already being done today with Web and application server functions. GemFire enhances data reliability by writing it behind, or through, to disk in parallel. GemFire clusters also span the globe with WAN capabilities. This creates a single, global view of data.
Use Cases 3 and 4 are where GemFire excels, and where we start to get well beyond simple caching technologies. Use Case 3: The customer is creating a true data grid: lots of clients connecting to many nodes, some of which may cross WANs. The data is kept in sync across all nodes at all times and so is highly available. Data can be persisted using lazy writes to achieve database-like reliability. The data grid can be used to publish and subscribe messages across topologies, so for many use cases it acts as the message broker as well as the data store, eliminating the need for a messaging layer altogether. There are typically many clients. One of the scenarios we often see is where the clients are desktops, executing macros and pulling real-time data from the data grid. This is in fact mostly what the financial services customers are doing on Wall Street; almost every bank is using GemFire today (primarily for their trading apps). The clients are pulling data from the grid and executing some kind of logic on the data, similar to a typical messaging use case with a message broker, except much faster because the data is always local to the app that needs it. In this way you get huge improvements in throughput. Use Case 4: The primary difference with #4 is that the execution is actually being done in the data layer. Both GemFire and Coherence have the ability to execute code within the process, allowing for even greater throughput when compared to scenario #3. In this case it blurs the line between the tiers of the typical 3-tier architecture, so it is a relatively high-end use case. That being said, it is in fact what high-end financial services customers are doing with the product. The end result is really about faster throughput.
Very High Throughput: 10x more read/write throughput compared with disk-based databases. Low and Predictable Latency: uses a highly optimized caching layer that minimizes context switching among threads. High Scalability: achieved through dynamic partitioning of data across many members. Continuous Availability: guaranteed consistent copies of data in memory across servers; applications can synchronously or asynchronously persist the data to disk on one or more nodes. Reliable Event Notifications: applications simply insert/update/delete objects and GemFire delivers the changes to interested subscribers of the event; upon receiving the event, the subscriber has direct access to the related data. Continuous Querying: applications can express complex interest in data through OQL and receive notification with predictable latency. Parallelize Application Behavior on Data Nodes: the data-aware function execution service allows for the routing and execution of business logic to where the data lives; execution can happen in parallel.
Client/Server is essentially all the benefits of a database without the drawbacks: database capabilities in memory, plus access to the more sophisticated GemFire features, such as data-aware function routing.
So who’s using GemFire today?
It's heavily used in many industries where high volumes of data need to be accessed rapidly by applications. One of our customers operates a very large online travel site. Consumers are constantly searching, reserving, and booking travel itineraries including flights, hotels, and rental cars. This customer could calculate how their application performance impacted their revenue; truly, every second counts for them. Deploying GemFire enabled them to support peak loads while simultaneously increasing application performance. GemFire led directly to improved customer satisfaction and revenue.
[CLICK] This diagram shows how you first need to conduct a scale-up test to establish what is called the "building-block VM" and then use this as a repeatable unit for your scale-out test. [CLICK] Note: by adding additional JVM instances on a VM you do incur a cost in the vicinity of 5% overhead, so don't be afraid to drive load on a single JVM, for example with 4GB of heap space, although at some point a very large heap can incur GC cost. [CLICK] Now you conduct the scale-out test. For example, if in scale-up you determined 100 concurrent users but you need 300, then obviously you would symmetrically repeat the building-block VM. [CLICK] [CLICK] NOTE: keeping virtual machines symmetrical helps simplify your load balancer layer (choosing a least-connection algorithm, etc.). [CLICK] If all is OK and the SLA is met, you are done. Otherwise, if there are issues: [CLICK] if in the scale-out test you see performance issues, most likely they are inter-tier related, meaning load balancer, HTTP thread configuration, Java threads, and JDBC connection pools. [CLICK] In addition to potentially ESX. In some cases you may find that you need to go back and re-adjust the vertical scalability test; you might have found that at a certain point you hit a database deadlock and have to go back and re-assess your building-block size. [CLICK] The most common configurations I see are around 2 vCPU, with a 1:1 ratio between JVM and vCPU, meaning 2 JVMs on one 2vCPU VM. One size does not fit all, but for a regular registration-like web application, 250-500 concurrent users per vCPU would be something we generally see. As you conduct the scale-up test you will need to experiment with JVM tuning; this is not a restriction that vSphere places, but something you would do whenever you move from one platform to another if you want to run optimally. If you choose not to do it, vSphere doesn't really care.
To see online demos, download whitepapers, and more, head to vmware.com/go/gemfire
We recommend displaying this slide after finishing the presentation and starting a Q and A session (not required).
Another GemFire customer is in financial services. This customer uses GemFire to keep data current and available worldwide for use with foreign exchange. GemFire is able to handle near-continuous market data updates, enabling fast position recalculations, while also ensuring regulatory compliance with a permanent archive of every trade. This customer is really exercising many GemFire capabilities, including WAN routing.