Galera Cluster for MySQL, Percona XtraDB Cluster and MariaDB Cluster (the three “flavours” of Galera Cluster) make use of the Galera WSREP libraries to handle synchronous replication. MySQL Cluster is the official clustering solution from Oracle, while Galera Cluster for MySQL is slowly but surely establishing itself as the de facto clustering solution in the wider MySQL ecosystem.
In this webinar, we will look at all these alternatives and present an unbiased view on their strengths/weaknesses and the use cases that fit each alternative.
This webinar will cover the following:
MySQL Cluster architecture: strengths and limitations
Galera Architecture: strengths and limitations
Deployment scenarios
Data migration
Read and write workloads (Optimistic/pessimistic locking)
WAN/Geographical replication
Schema changes
Management and monitoring
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
1. MySQL Cluster (NDB) vs Galera for MySQL
Confidential
December 11, 2014
Alex Yu
alex@severalnines.com
2. Copyright Severalnines AB
Webinar Housekeeping
• This webinar is being recorded
• A link to the recording & slides will be posted on severalnines.com
• We welcome questions: enter questions into the chat box and we will respond at the end of the presentation
• Think of something later?
  • Email Severalnines at info@severalnines.com
3. Agenda
• MySQL Cluster (NDB) and Galera Architecture Overview
• Read and write workloads
• Deployment scenarios
• WAN/Geographical replication
• Data migration & Schema changes
• Management and performance monitoring
4. MySQL Cluster (NDB Storage Engine)
• Distributed shared-nothing real-time database cluster
• Multi-master, auto-sharding, in-memory & disk-data storage
• Near-linear scalability with transparent load balancing
• SQL and NoSQL interfaces
• Local and geographical replication
• Synchronous and asynchronous replication
• 99.999% availability, no single point of failure
• Telecom “carrier grade” legacy
5. MySQL Cluster Application Examples
• Subscriber databases (telecom HLR/HSS systems)
  • Massive volume of write traffic (location and updates)
  • Response time < 3 ms
• eCommerce
  • Payment processing and fulfilment
  • High batch and real-time loads
• Service delivery platforms
  • High volume of traffic
  • Mixed read/write loads
6. MySQL Cluster Architecture
[Diagram: two MySQL SQL nodes, four data nodes and two management nodes. Labels: web app clients; NDB API (C++); SQL-based clients; MySQL client/server protocol; MGM C API; management client; SQL nodes; data nodes; management nodes (default arbitrator); synchronous replication within a node group.]
7. Automatic Sharding
[Diagram: table T with 8 rows spread across 4 data nodes in 2 node groups.]
8. Automatic Sharding (cont.)
[Diagram: table T with 8 rows; 4 data nodes (1 to 4) in 2 node groups.]
• Sharding is based on hashing the primary key or a user-defined key
• Each node stores the primary fragment for one partition and the backup fragment for another
• Number of node groups = number of data nodes / number of replicas
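The three rules on this slide can be tried out with a minimal Python sketch. It assumes one partition per data node, as in the 4-node example, and uses an illustrative hash; the function names (`node_groups`, `partition_for`, `fragment_layout`) and the exact placement order are hypothetical and are not taken from NDB itself.

```python
import hashlib

def node_groups(data_nodes: int, replicas: int) -> int:
    """Number of node groups = data nodes / replicas (must divide evenly)."""
    if replicas <= 0 or data_nodes <= 0 or data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a positive multiple of replicas")
    return data_nodes // replicas

def partition_for(primary_key, partitions: int) -> int:
    """Map a key to a partition by hashing it (illustrative hash, an assumption)."""
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions

def fragment_layout(data_nodes: int, replicas: int) -> dict:
    """Give each partition a primary node and backup node(s) in one node group."""
    groups = node_groups(data_nodes, replicas)
    layout = {}
    for p in range(data_nodes):  # assumption: one partition per data node
        group = p % groups
        # 1-based ids of the data nodes that belong to this node group
        members = [group * replicas + i + 1 for i in range(replicas)]
        primary = members[(p // groups) % replicas]
        backups = [n for n in members if n != primary]
        layout[p] = {"group": group, "primary": primary, "backups": backups}
    return layout
```

With 4 data nodes and 2 replicas this yields 2 node groups, and every node ends up holding the primary fragment of one partition and the backup fragment of another, matching the slide.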
13. Automatic Sharding (cont.)
[Diagram: 4 data nodes in 2 node groups holding 4 partitions (1 to 4), each with a primary and a secondary fragment.]
• The cluster is fully operational as long as at least one node is up in each node group
• If all nodes in a single node group are gone, the cluster shuts down gracefully
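The availability rule above reduces to a simple check. The sketch below is only an illustration of that rule: the function name `cluster_is_up` is hypothetical, and arbitration during network partitions is deliberately ignored.

```python
def cluster_is_up(node_groups, alive):
    """Return True if every node group still has at least one live data node."""
    return all(any(node in alive for node in group) for group in node_groups)
```

For example, with node groups `[[1, 2], [3, 4]]`, losing nodes 2 and 4 keeps the cluster up, while losing nodes 3 and 4 takes out a whole node group and the cluster shuts down.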
15. Primary Key Requests
[Diagram: three web clients, two MySQL SQL nodes, four data nodes and two management nodes.]
• A PK lookup goes directly to the node with the primary fragment
• Parallel operations, transparent load balancing
16. Joins, Index and Table Scans
[Diagram: three web clients, two MySQL SQL nodes, four data nodes and two management nodes.]
• Table and index scans run in parallel on all nodes
• Joins execute on the data nodes; merged results are sent back to the SQL node
17. Migration to MySQL Cluster
• Limitations
  • 14K row size, 512 attributes (columns + indexes) per table
  • 32 attributes per key; only the first 3072 bytes of a column can be used for an index
  • No fulltext or spatial indexes; temporary tables cannot be created using the NDB storage engine
• Every table must have a primary key
  • A hidden PK is created automatically if none is defined
• In-memory or disk-based tables
  • Dataset exceeds available system memory for the cluster?
• Network, local and global checkpoints
  • Write intensive: dimension the disk subsystem accordingly
  • Dedicated >= 1 Gb/s networking
• ALTER TABLE … ENGINE NDB
  • Alternatively: MySQL replication, or backup & restore
18. Deployment Scenarios
[Diagram: a tier of MySQL SQL nodes above a tier of data nodes; both tiers can be extended with more nodes.]
• Scale up to 48 data nodes
• Limit of 255 nodes in total (regardless of type)
19. Deployment Scenarios (cont.)
[Diagram: two clusters, West (master) and East (slave/standby), each with two MySQL SQL nodes and four data nodes. Labels: MySQL asynchronous replication; single point of failure; synchronous replication within a node group.]
• Multiple replication topologies available
  • Master - Master (bi-directional)
    • Conflict detection and resolution
  • Master - Slave(s)
  • Circular
  • etc.
20. Deployment Scenarios (cont.)
[Diagram: two clusters, West (master) and East (slave/standby), each with two MySQL SQL nodes and four data nodes. Labels: Primary; Secondary/Standby; START SLAVE only on Primary!]
• Master - Slave(s)
• Standby replication channel
• Manual failover
21. Deployment Scenarios (cont.)
[Diagram: two clusters, West and East, both masters, each with two MySQL SQL nodes and four data nodes. Labels: Primary; Secondary/Standby.]
• Master - Master (bi-directional)
• Conflict detection and resolution
  • “Timestamp based”
  • Row by row, not transaction based
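A minimal Python sketch of what “timestamp based, row by row” resolution means in practice. It assumes each row carries a timestamp column and that the newer timestamp wins; the function name `apply_replicated_rows` and the column name `ts` are hypothetical, and this is a model of the idea rather than MySQL Cluster's implementation.

```python
def apply_replicated_rows(local, incoming, ts_col="ts"):
    """Apply each incoming row only if its timestamp is newer than the local copy.

    Rows are resolved one at a time, so one part of a remote transaction may
    be applied while another part is rejected.
    """
    rejected = []
    for pk, row in incoming.items():
        current = local.get(pk)
        if current is None or row[ts_col] > current[ts_col]:
            local[pk] = dict(row)   # incoming row is newer (or new): accept it
        else:
            rejected.append(pk)     # local row is at least as new: keep it
    return rejected
```

Because the decision is made per row, a multi-row transaction from the other site can be partially applied, which is the trade-off the slide points out.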
22. Galera Cluster for MySQL
• Synchronous (virtually) multi-master replication
• Read and write on any node
• No master failover! No slave lag!
• Guaranteed write consistency
• Cluster-wide conflict resolution (certification)
• Automatic node provisioning
• Highly available and scalable
  • No SPOF
  • Read and write (parallel applier threads) scalability
• Geographical replication (mix MySQL async & Galera sync)
• Codership, Percona XtraDB Cluster, MariaDB Galera Cluster
[Diagram: three clients send reads and writes (R/W) through a load balancer (LB) to three MySQL [WSREP] nodes linked by synchronous Galera replication.]
23. Galera Cluster for MySQL (cont.)
• Recommended minimum of 3 nodes
  • Network partition/split-brain
• Blocking SST (rsync, mysqldump)
• Higher probability of “deadlocks”
  • Cluster-wide optimistic locking
  • Locking conflicts detected at commit
  • First to commit succeeds
• Replication performance depends on
  • Network latency
  • Performance of the “slowest” or the farthest node (RTT)
  • Number of deployed nodes
[Diagram: three clients send reads and writes (R/W) through a load balancer (LB) to three MySQL [WSREP] nodes linked by synchronous Galera replication.]
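The “first to commit succeeds” rule of cluster-wide optimistic locking can be illustrated with a toy model. The `Certifier` class below is hypothetical and much simpler than Galera's actual certification: it only tracks, per key, the sequence number of the last committed write.

```python
class Certifier:
    """Toy cluster-wide certification: first committer wins on conflicting keys."""

    def __init__(self):
        self.seqno = 0          # last committed global sequence number
        self.last_write = {}    # key -> seqno of the last committed write

    def begin(self):
        """Return the snapshot seqno a new transaction starts from."""
        return self.seqno

    def commit(self, snapshot, keys):
        """Certify a write set; return True on commit, False on conflict."""
        for key in keys:
            if self.last_write.get(key, 0) > snapshot:
                return False    # another transaction committed this key first
        self.seqno += 1
        for key in keys:
            self.last_write[key] = self.seqno
        return True
```

Two transactions that start from the same snapshot and touch the same row cannot both commit: the second one fails at commit time, which the application sees as a deadlock-style error and should retry.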
24. Galera Concepts
• Primary Component (PC)
  • The whole cluster is a PC during normal operation
• Node and network failures
  • Split the cluster into several components
  • Only the PC can continue to modify state
• Quorum algorithm invoked to select a PC during cluster partitioning
  • Majority rules
  • Minority tries to reconnect with the PC
[Diagram: three MySQL [WSREP] nodes forming the Primary Component.]
25. Galera Concepts (cont.)
• State Snapshot Transfer (SST)
  • A transfer of a consistent snapshot of a node's state corresponding to a certain GTID
  • Initializes the state of a newly joining cluster node from an already initialized node (donor)
• Incremental State Transfer (IST)
  • Catch up with the cluster by replaying missing transactions
    • Known initial node state
    • Enough transactions cached at the donor
    • gcache.size < database size
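The two IST preconditions (a known node state and enough transactions cached at the donor) can be expressed as a small decision function. This is a sketch under simplifying assumptions: the function name and parameters are hypothetical, and the donor's writeset cache is modelled as a contiguous range of sequence numbers.

```python
def choose_state_transfer(joiner_seqno, donor_cache_first, donor_cache_last):
    """Pick IST when the donor still caches every write set the joiner is missing."""
    if joiner_seqno is None:
        return "SST"    # unknown node state: a full snapshot is needed
    if joiner_seqno >= donor_cache_last:
        return "NONE"   # joiner is already up to date
    if joiner_seqno + 1 >= donor_cache_first:
        return "IST"    # every missing write set is still in the cache
    return "SST"        # the gap starts before the oldest cached write set
```

A node that was down briefly will usually find its missing write sets in the donor's cache and rejoin via IST; one that was down longer than the cache covers falls back to a full SST.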
26. High Latency Network
• Galera 2.x WAN replication (MySQL 5.5)
  • Point-to-point connections between all nodes!
  • Transaction latency depends on the slowest link
[Diagram: three MySQL [WSREP] nodes in DC1 and three in DC2.]
27. High Latency Network (cont.)
• Galera 3.x WAN optimization (MySQL 5.6)
  • “Cluster” segment ID to group nodes by location
  • Replication between segments goes over a single connection
  • Replication writesets are distributed peer to peer within each segment
  • Segment connection/gateway can change per transaction
[Diagram: DC1 (gmcast.segment = 1) and DC2 (gmcast.segment = 2), three MySQL [WSREP] nodes each, connected through a segment gateway.]
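The effect of segments on WAN traffic is easy to quantify. The sketch below counts how many copies of a single writeset cross data-centre boundaries with and without segments; `wan_copies` is a hypothetical helper that models only the two behaviours described on these slides.

```python
def wan_copies(segment_sizes, origin_segment, use_segments):
    """Count copies of one write set that cross segment (data centre) boundaries."""
    remote = [size for seg, size in segment_sizes.items() if seg != origin_segment]
    if use_segments:
        return len(remote)   # one copy per remote segment, relayed locally there
    return sum(remote)       # one copy per remote node (point to point)
```

For the 3 + 3 node layout shown here, a write originating in DC1 crosses the WAN three times without segments and once with them.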
29. Network Partition/Split Brain
• Quorum-based system
  • “Majority” (>50%) partition continues operation
  • “Minority” partition blocks operations
    • Until reconnected with the Primary Component
• Use an odd number of nodes
  • Minimum 3 (5, 7, 9, etc.)
• Galera Arbitrator (garbd)
  • Useful if you have an even number of nodes
  • Nodes across DCs
  • Replication relay
[Diagram: clients connect through a load balancer to three MySQL [WSREP] nodes spread over DC1 and DC2; a Galera Arbitrator in DC3 acts as a replication relay.]
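The majority rule, and why an odd number of members matters, can be shown in a few lines. This sketch assumes every member counts equally and ignores any node weighting; `has_quorum` is a hypothetical name.

```python
def has_quorum(partition_size, cluster_size):
    """A partition stays Primary only with a strict majority (>50%) of members."""
    return 2 * partition_size > cluster_size
```

A 4-node cluster split 2/2 leaves neither half with quorum (`has_quorum(2, 4)` is false), whereas adding an arbitrator as a fifth voting member lets one side reach 3 of 5.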
30. Migration to Galera Cluster
• Only the InnoDB storage engine
  • Limited MyISAM support, not recommended
• Every table should have a primary key
  • DELETE operations are unsupported on tables without a primary key
  • Rows in tables without a primary key may appear in a different order on different nodes (for certification, an md5sum pseudo key is built from the full row)
• Transaction size
  • A writeset is processed as a single memory-resident buffer
  • Extremely large transactions, e.g. LOAD DATA, can affect performance
  • wsrep_load_data_splitting = ON | OFF # 10K inserts/transaction
  • wsrep_max_ws_rows (128K), wsrep_max_ws_size (1GB)
31. Migration to Galera Cluster (cont.)
• Auto increments
  • Managed automatically
    • Node-1: 1, 4, 7
    • Node-2: 2, 5, 8
    • Node-3: 3, 6, 9
  • Auto-increment sequences have gaps if inserts hit different nodes randomly
• Triggers fire only on the Galera node that executes the transaction
• Events fire on all nodes
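The interleaved sequences above follow from giving each node a distinct offset and a step equal to the cluster size (in MySQL terms, `auto_increment_offset` and `auto_increment_increment`). A minimal sketch, with hypothetical helper names:

```python
from itertools import count, islice

def auto_increment_values(offset, increment):
    """Yield the ids one node generates: offset, offset + increment, ..."""
    return count(start=offset, step=increment)

def first_ids(node_index, cluster_size, n=3):
    """First n ids for the node at 1-based position node_index."""
    return list(islice(auto_increment_values(node_index, cluster_size), n))
```

Calling `first_ids(2, 3)` gives `[2, 5, 8]`, the Node-2 row above. Since each node only ever hands out its own residue class, ids never collide, but the overall sequence has gaps whenever the nodes receive uneven numbers of inserts.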
32. Schema Changes
• DDLs are replicated in statement format
• Two main methods
  • TOI - Total Order Isolation
  • RSU - Rolling Schema Upgrade
  • wsrep_osu_method = TOI | RSU
• wsrep_desync=ON + wsrep_on=OFF
  • Disconnect from cluster and stop writeset replication (standalone MySQL server)
• Dropping node
  • SET GLOBAL wsrep_cluster_address=gcomm://
  • Joining must be through IST
• Percona Toolkit
  • pt-online-schema-change
33. Schema Changes (cont.)
• TOI - Total Order Isolation
  • Default DDL replication method
  • Strict consistency: all nodes get the same change
  • No schema backwards compatibility
  • Strict commit order forces every transaction to wait until the DDL is completed
  • Cluster performance degradation
34. Schema Changes (cont.)
• RSU - Rolling Schema Upgrade
  • Desynchronizes the node from replication until the DDL completes
  • Incoming replication is buffered; nothing is replicated out of the node
  • After the DDL completes, the node automatically rejoins the cluster and catches up on missed transactions from the writeset cache (gcache.size)
  • Potentially no cluster performance degradation
  • Schema changes need to be backwards compatible
    • Applications should be able to use old and new schemas
  • Only one RSU operation at a time
  • Rolling operation across the cluster is manual
35. Deployment Scenarios
[Diagram: users connect over http to two HAProxy load balancers behind a virtual IP (VIP), which send reads and writes (R/W) to three MySQL [WSREP] nodes linked by synchronous Galera replication; an admin reaches ClusterControl over http.]
http://support.severalnines.com/entries/23612682-Install-HAProxy-and-Keepalived-Virtual-IP- subnet
36. Deployment Scenarios (cont.)
[Diagram: Galera as MySQL slave. One of three MySQL [WSREP] nodes is a slave of a standalone MySQL master via MySQL replication, with wsrep_mysql_replication_bundle=N.]
• Replication events can be bundled to commit as a single group
• Fewer waits for replication synchronization
• wsrep_mysql_replication_bundle=n
  • Groups n MySQL replication transactions into one large transaction
37. Deployment Scenarios (cont.)
[Diagram: Galera as MySQL replication master. Three MySQL [WSREP] nodes, two of them labelled Master, feed two MySQL slaves via MySQL replication; sites DC1 and DC2.]
• Backups & reports
• Disaster recovery
38. Deployment Scenarios (cont.)
[Diagram: disaster recovery. A master cluster of three MySQL [WSREP] nodes in DC1 and a standby cluster of three MySQL [WSREP] nodes in DC2, linked by MySQL replication.]
• “Manual” replication failover
• Slave lag
39. Deployment Scenarios (cont.)
[Diagram: multi-source sink. Three MySQL [WSREP] nodes (one labelled Master, two labelled Slave), two standalone MySQL masters and one MySQL slave, connected by MySQL replication.]
http://www.severalnines.com/blog/multi-source-replication-galera-cluster-mysql
40. Severalnines - ClusterControl
• Monitor and manage heterogeneous database clusters
  • MySQL Cluster, Galera Cluster for MySQL, MongoDB
• Automatic
  • Node and cluster recovery
  • Scheduled backups
• Add/remove nodes
• Create single DB node and cluster
• Alerts/email
• Host and DB metrics
44. Thank You!
• Severalnines recorded webinars
  • http://www.severalnines.com/resources/webinars
• Severalnines Blog
  • www.severalnines.com/blog
• Galera Cluster for MySQL Intro
  • http://www.severalnines.com/clustercontrol-mysql-galera-tutorial
• MySQL Cluster Training
  • http://www.severalnines.com/mysql-cluster-training
• More questions? Contact us at:
  • info@severalnines.com