Matt Lord - Principal Engineer, Vitess @
Vitess: VReplication
@mattalord
Standing on the Shoulders of a MySQL Giant
Agenda
@vitessio
❖ Vitess Overview
❖ VReplication Overview
❖ What’s New and Resources
Vitess
A database clustering system for horizontal scaling of MySQL
• CNCF graduated project
• Open source, Apache 2.0 license
• Contributors from around the community
• Written in Golang
Vitess
Cloud-native distributed database
- Runs in Kubernetes; Vitess Operator (VTOP)
Scalable
Highly available
Durability guarantees
Illusion of “single database”
- Single dedicated connection
- MySQL 5.7 or 8.0
- Compatible with frameworks / ORMs etc.
Vitess Serves Millions of QPS in Production
Concepts
Keyspace
- Logical database
Shard
- Subset or partition of a logical database
Cell
- Failure domain (e.g. DC or AZ)
Vitess Architecture Basics
A common replicated database cluster with primary and replicas
Each MySQL server is assigned a vttablet
- A daemon/sidecar
- Controls the mysqld process
- Interacts with the mysqld server
- Typically on same host as mysqld
In production you have multiple keyspaces, each with 1 or more shards
User and application traffic is routed via vtgate
- A smart, stateless proxy
- Speaks the MySQL protocol
- Impersonates a monolithic MySQL server
- Relays queries to vttablets
- Coordinates scatter-gather queries when needed
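The scatter-gather idea can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the shard names, the in-memory rows, and the helpers `query_shard` and `scatter_gather` are hypothetical, and real vtgate query planning and result merging are considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the rows held by each shard.
SHARDS = {
    "commerce/-80": [{"order_id": 1, "price": 10}, {"order_id": 4, "price": 25}],
    "commerce/80-": [{"order_id": 2, "price": 7}],
}

def query_shard(shard, min_price):
    """Run the same filter against one shard (stand-in for a vttablet call)."""
    return [row for row in SHARDS[shard] if row["price"] >= min_price]

def scatter_gather(min_price):
    """Send the query to every shard in parallel, then merge and sort the rows."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: query_shard(s, min_price), SHARDS))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["order_id"])

rows = scatter_gather(min_price=8)
```

A query that carries the sharding key can skip the scatter step and go to a single shard; the fan-out is only needed when the target shard cannot be determined from the query.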
A Vitess deployment will run multiple vtgate servers for scale out
vtgate will transparently route queries to the correct keyspaces, shards, and vttablets
[Diagram: app, app; commerce shard 0; commerce shard 1; internal shard 1 (unsharded)]
Queries are routed based on schema & sharding scheme (vindexes)
[Diagram: app, app; commerce/-80; commerce/80-; internal/-]
USE commerce;
SELECT order_id, price
FROM orders
WHERE customer_id=4;
d2fd8867d50d2dfe
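As a rough sketch of how a key range picks a shard, the Python below assumes that the hex string on this slide is the keyspace ID the vindex produced for `customer_id=4`. The vindex function itself is not reproduced here, and the helper names `parse_key_range` and `shard_for` are hypothetical.

```python
from typing import List, Optional, Tuple

def parse_key_range(shard: str) -> Tuple[bytes, Optional[bytes]]:
    """Turn a shard name such as '-80' or '80-' into (start, end) byte bounds."""
    start_hex, end_hex = shard.split("-")
    start = bytes.fromhex(start_hex)                   # '' means "from the minimum"
    end = bytes.fromhex(end_hex) if end_hex else None  # '' means "to the maximum"
    return start, end

def shard_for(keyspace_id: bytes, shards: List[str]) -> str:
    """Return the shard whose half-open range [start, end) contains the ID."""
    for shard in shards:
        start, end = parse_key_range(shard)
        if keyspace_id >= start and (end is None or keyspace_id < end):
            return shard
    raise ValueError("no shard covers this keyspace ID")

# Assumed keyspace ID for customer_id=4, taken from the slide.
target = shard_for(bytes.fromhex("d2fd8867d50d2dfe"), ["-80", "80-"])
```

Because the first byte `0xd2` is at or above `0x80`, the lookup lands on the `80-` shard in this sketch.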
topo: distributed key/value store and coordination service
- Stores the state of Vitess: schemas, shards, sharding scheme, tablets, roles, etc.
- Provides a shared locking service
- etcd/ZooKeeper/Consul/Kubernetes
- Small dataset, mostly cached by vtgate
vtctld: control daemon
- Runs ad hoc operations
- API server
- Reads/writes state in topo
- Uses locks in topo
- Operates on vttablets
Vitess Architecture Summary
VReplication
A Framework For Creating And Managing Data Streams And Workflows
When data matches some defined criteria, execute a defined workflow:
● Sharding
● Filtered Replication
● Transformations
● Materialized Views
● Online Migrations / Schema Changes
● Event Streams / CDC / Job Queues
● ...
Getting Data Into Vitess
● Add a tablet for the external/unmanaged MySQL instance
● Add this temporary tablet to Vitess
● Use MoveTables to move [and shard] the data into a Vitess managed keyspace
[Diagram: okdb (RDS); Unmanaged Tablet; leetdb/-80; leetdb/80-]
Vertical / Functional Sharding
● You have many/all tables in a single keyspace
● You want to move some of the tables to a new keyspace
● You want to achieve this without making significant changes to your application or incurring downtime
● Use MoveTables to split the tables into N keyspaces
[Diagram: alldata/-; products/-; orders/-]
Horizontal Sharding
● Going from 1 to 2 shards, 2 to 4, 4 to 8, to … as your dataset and usage grow
○ Can also shrink by merging shards and/or keyspaces when needed
● Add new tablets to manage new keyspace range splits
● Use Reshard to redistribute the data
[Diagram: leetdb/-80; leetdb/80-; leetdb/80-c0; leetdb/c0-; leetdb/-40; leetdb/40-80]
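The shard names on this slide follow from halving key ranges. The Python sketch below shows that arithmetic only; `split_shard` is a hypothetical helper, not a Vitess command, and an actual Reshard also copies and replicates the data, which is not modeled here.

```python
def split_shard(shard: str):
    """Split a key range such as '-80' into two halves at its midpoint."""
    start_hex, end_hex = shard.split("-")
    # Work with one extra byte of precision so the midpoint is always exact.
    nbytes = max(len(start_hex), len(end_hex)) // 2 + 1
    start = int(start_hex.ljust(2 * nbytes, "0"), 16) if start_hex else 0
    end = int(end_hex.ljust(2 * nbytes, "0"), 16) if end_hex else 1 << (8 * nbytes)
    mid_hex = format((start + end) // 2, f"0{2 * nbytes}x")
    # Drop trailing zero bytes so 0x4000 is written as '40'.
    while len(mid_hex) > 2 and mid_hex.endswith("00"):
        mid_hex = mid_hex[:-2]
    return f"{start_hex}-{mid_hex}", f"{mid_hex}-{end_hex}"

two_shards = split_shard("-")      # unsharded range into two shards
lower_half = split_shard("-80")    # lower shard split again
upper_half = split_shard("80-")    # upper shard split again
```

Applied to the ranges above, the helper reproduces the names in the diagram: `-80` and `80-`, then `-40` and `40-80`, and `80-c0` and `c0-`.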
VDiff
- Runs a diff between the source and target shards
- One VDiff per workflow
- This is a blocking call
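The comparison at the heart of a diff like this can be sketched as a merge over two row streams ordered by primary key. The Python below is a simplified illustration, not VDiff's implementation; the function name `diff_rows` and the report keys are assumptions made for the example.

```python
def diff_rows(source, target):
    """Compare two lists of (primary_key, row) pairs, each sorted by primary key."""
    report = {"missing_on_target": [], "extra_on_target": [], "mismatched": []}
    i = j = 0
    while i < len(source) and j < len(target):
        (spk, srow), (tpk, trow) = source[i], target[j]
        if spk == tpk:
            if srow != trow:              # same key, different contents
                report["mismatched"].append(spk)
            i += 1
            j += 1
        elif spk < tpk:                   # row exists only on the source
            report["missing_on_target"].append(spk)
            i += 1
        else:                             # row exists only on the target
            report["extra_on_target"].append(tpk)
            j += 1
    # Whatever remains on either side has no counterpart.
    report["missing_on_target"].extend(pk for pk, _ in source[i:])
    report["extra_on_target"].extend(pk for pk, _ in target[j:])
    return report

source_rows = [(1, "a"), (2, "b"), (4, "d")]
target_rows = [(1, "a"), (2, "x"), (3, "c")]
report = diff_rows(source_rows, target_rows)
```

Walking both sides in key order means neither side has to be held fully in memory in a streaming version, which matters when the tables are large.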
Materialized Views
● Real-time views into a [transformed] [subset] of data from db1 in db2
● Aggregated views on certain data to perform analytics against
● Local copies of a “global” lookup table (e.g. country, state, postcodes)
● This data is automatically kept correct and up-to-date
● Use Materialize to set up the materialized view
[Diagram: products/-; alldata/-; orders/-]
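To make "automatically kept correct and up-to-date" concrete for an aggregated view, here is a toy Python sketch that maintains a per-customer total from row change events. The class name `OrderTotals`, the column names, and the before/after event shape are assumptions for illustration and do not describe how Materialize is implemented.

```python
from collections import defaultdict

class OrderTotals:
    """Toy aggregated view: total price per customer, updated from row events."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, before, after):
        """Apply one change event; `before` and `after` are row dicts or None."""
        if before is not None:   # UPDATE or DELETE: retract the old row image
            self.totals[before["customer_id"]] -= before["price"]
        if after is not None:    # INSERT or UPDATE: add the new row image
            self.totals[after["customer_id"]] += after["price"]

view = OrderTotals()
view.apply(None, {"customer_id": 4, "price": 10})                            # insert
view.apply(None, {"customer_id": 4, "price": 5})                             # insert
view.apply({"customer_id": 4, "price": 5}, {"customer_id": 4, "price": 8})   # update
view.apply({"customer_id": 4, "price": 10}, None)                            # delete
```

Each event adjusts the aggregate incrementally, so the view never has to be recomputed from the full source table.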
Online Schema Changes
● Non-blocking, monitorable, revertable, cancelable, configurable throttling
● Supports typical SQL statements as well as declarative
● Resilient to failures, failovers, and topology changes
● Lazy, phased cleanup (to avoid cost of dropping large tables in prod)
● Uses VReplication, driven by primary tablet in each shard
$ vtctlclient ApplySchema -ddl_strategy "online" -sql "..." <keyspace>
● Has its own SQL statements:
mysql> show vitess_migrations; alter vitess_migration …;
mysql> show vitess_migration_logs; …
● Has its own set of vtctl commands:
$ vtctlclient OnlineDDL <keyspace> [show,retry,cancel] [<migration_uuid>,all,running,complete,failed]
Change Data Capture / Event Streams
● Use a Vitess managed message bus, job queue, or event stream
○ For example, managing “offline” processing data pipelines
● CREATE a standard [sharded] table with required fields and comments
○ https://vitess.io/docs/reference/features/messaging
● DMLs against the table generate events
● SUBSCRIBERs receive and acknowledge events via SQL or gRPC
[Diagram: app, app; commerce/-80; commerce/80-; internal/-]
STREAM *
FROM vt_job_queue;
INSERT INTO vt_job_queue …;
UPDATE vt_job_queue
SET time_acked = NOW()
WHERE id = 100;
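The insert, stream, and acknowledge cycle shown in these statements can be mimicked with a toy in-memory queue. This Python sketch is only a mental model: the `JobQueue` class and its methods are hypothetical, and it ignores delivery guarantees, retries, and sharding that a real message table would involve.

```python
import itertools

class JobQueue:
    """Toy in-memory stand-in for a message table such as vt_job_queue."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.rows = {}  # id -> {"message": ..., "time_acked": ...}

    def insert(self, message):
        """Like the INSERT: a new row becomes an event for subscribers."""
        row_id = next(self._ids)
        self.rows[row_id] = {"message": message, "time_acked": None}
        return row_id

    def stream(self):
        """Like the STREAM: yield rows that have not been acknowledged yet."""
        for row_id, row in self.rows.items():
            if row["time_acked"] is None:
                yield row_id, row["message"]

    def ack(self, row_id, now):
        """Like the UPDATE ... SET time_acked: mark one message as handled."""
        self.rows[row_id]["time_acked"] = now

queue = JobQueue()
first = queue.insert("resize image")
second = queue.insert("send email")
queue.ack(first, now=100)
pending = list(queue.stream())
```

After the first job is acknowledged, only the second one is still offered to subscribers.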
Any New Built-in Workflows...
● This will grow over time based on new use cases that present themselves
● Most recently, Vitess Native OnlineDDL
● Or your own custom workflows and pipelines! For example:
https://medium.com/bolt-labs/streaming-vitess-at-bolt-f8ea93211c3f
Safety Mechanisms
● VDiff
○ Compare the full set of logical rows on both sides using a consistent snapshot
● Limit impact on production traffic
○ configurable tablet throttling
○ --tablet_types, --max_replication_lag, --filtered_replication_wait_time, --vreplication_copy_phase_max_innodb_history_list_length, --vreplication_copy_phase_max_mysql_replication_lag, …
● -cells to avoid cross-AZ traffic
● Custom RoutingRules to make cutovers seamless for apps/users
● Rollbacks via $ vtctl ReverseTraffic
...
VDiff2
(Diffs done on tablets)
New* and Upcoming Features
- 14.0 release GA late June 2022
- VReplication based Online DDL - 10.0 (GA in 14.0)
- Continuous benchmarking - since 11.0
- More supported query constructs - (MySQL 8.0) ongoing
- Gen4 query planner - GA in 13.0 (default in 14.0)
- MySQL compatible collations - 13.0
- Multi-column VIndexes - 13.0
- VTAdmin
- Automatic failure detection and handling (VTorc)
- VDiff2, running on tablets (and other improvements)
- Distributed x-shard transactions
*ish
Resources
Docs: vitess.io/docs/
Code: github.com/vitessio/vitess
Slack: vitess.slack.com
Thank you!
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
