This presentation tackles a particularly challenging situation that often occurs when creating a distributed relational database.
In this presentation you will learn:
- What a ‘shard conflict’ is
- How to identify ‘shard conflicts’
- How to resolve ‘shard conflicts’ in a distributed database
- How ‘shard conflicts’ affect query processing
2. 2
The Database Scalability: The Shard Conflict
3. 3
Traditional Databases vs. Distributed Databases
Traditional Monolithic DB
• Made up of tables of data that are related to one another
• All of the data is located in one place and is easily accessible
• The data relationship is stored deep in the database and can be easily analyzed and queried using conventional methods
Modern Distributed DB
• Data distribution is necessary for scalability
• Information is spread across various servers (instances)
• Related data can be distributed into different partitions, or shards, making related query requests difficult to process
4. 4
So, What Is a ‘Shard Conflict’?
At ScaleBase, we have coined the term ‘shard conflict’ to describe a situation where:
• A given statement cannot be executed as is, unchanged, on one or all partitions and still be relied upon to yield a correct result.
Let’s take a look at the following examples…
5. 5
Identifying the Conflict
Example #1
Choosing ‘id’ as the shard key of the ‘Employee Table’ presents a shard conflict, because there is no guarantee that employees are in the same shard as their corresponding departments.
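To make the conflict concrete, here is a minimal Python sketch. It assumes a hypothetical two-shard cluster, made-up rows, and simple modulo placement (none of which come from the slides), and it models shards as in-memory lists rather than real database instances. Both tables are sharded by their own ‘id’, the join is run unchanged on each shard, and the combined output is compared with the correct result.

```python
# Hypothetical setup: 2 shards, modulo placement, made-up rows.
NUM_SHARDS = 2

departments = [(1, "Sales"), (2, "Engineering")]                   # (id, name)
employees = [(1, "Ann", 2), (2, "Bob", 1), (3, "Cid", 1), (4, "Dee", 2)]  # (id, name, department_id)

def shard_of(key):
    # Simple modulo placement; a real system may hash or range-partition instead.
    return key % NUM_SHARDS

def join(emps, depts):
    # Stand-in for: employee e JOIN department d ON e.department_id = d.id
    dept_names = {d_id: d_name for d_id, d_name in depts}
    return [(name, dept_names[dept_id]) for _, name, dept_id in emps if dept_id in dept_names]

# Both tables are sharded by their own 'id'.
dept_shards = [[d for d in departments if shard_of(d[0]) == s] for s in range(NUM_SHARDS)]
emp_shards = [[e for e in employees if shard_of(e[0]) == s] for s in range(NUM_SHARDS)]

# Correct answer, computed as if all data lived in one place.
expected = join(employees, departments)

# Same statement executed unchanged on every shard, results concatenated.
per_shard = [row for s in range(NUM_SHARDS) for row in join(emp_shards[s], dept_shards[s])]

print(len(expected), "rows expected,", len(per_shard), "rows returned")
```

With this data, only employees who happen to land on the same shard as their department appear in the per-shard result, and nothing signals that rows are missing.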
6. 6
Resolving the Conflict
Example #2
The Method:
• Choose ‘department_id’ as the ‘Employee Table’ shard key
The Outcome:
• The join query is optimized because all department-related data is stored in the same partition
• No joins cross partition boundaries
• The join statement can now safely be executed on all partitions
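The outcome can be checked with a small Python sketch. As an illustration only, it assumes a hypothetical two-shard cluster, made-up rows, and modulo placement, with shards modeled as in-memory lists. The ‘Employee Table’ is placed by ‘department_id’ and the department table by its ‘id’, so matching rows share a shard and the per-shard join returns the full result.

```python
# Hypothetical setup: 2 shards, modulo placement, made-up rows.
NUM_SHARDS = 2

departments = [(1, "Sales"), (2, "Engineering")]                   # (id, name)
employees = [(1, "Ann", 2), (2, "Bob", 1), (3, "Cid", 1), (4, "Dee", 2)]  # (id, name, department_id)

def shard_of(key):
    # The same placement function must be used for both tables.
    return key % NUM_SHARDS

def join(emps, depts):
    # Stand-in for: employee e JOIN department d ON e.department_id = d.id
    dept_names = {d_id: d_name for d_id, d_name in depts}
    return [(name, dept_names[dept_id]) for _, name, dept_id in emps if dept_id in dept_names]

# Department is placed by its id; Employee is placed by department_id.
dept_shards = [[d for d in departments if shard_of(d[0]) == s] for s in range(NUM_SHARDS)]
emp_shards = [[e for e in employees if shard_of(e[2]) == s] for s in range(NUM_SHARDS)]

expected = join(employees, departments)
per_shard = [row for s in range(NUM_SHARDS) for row in join(emp_shards[s], dept_shards[s])]

print(sorted(per_shard) == sorted(expected))
```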
7. 7
Wait a Minute...There’s Still a Conflict
SELECT e.first_name, e.last_name, m.first_name, m.last_name
FROM employee e JOIN employee m ON e.manager_id = m.id
When the ‘Employee Table’ is joined with itself to find each employee’s manager, there is no guarantee that an employee and their manager are in the same shard.
The ‘Employee Table’ cannot be sharded by both ‘id’ and ‘manager_id’ at the same time.
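The self-join problem can be sketched in Python as follows. The rows, the two-shard layout, and the modulo placement are hypothetical, and shards are modeled as in-memory lists. The sketch runs the manager lookup per shard under two different shard keys and shows that each choice loses rows in this example.

```python
# Hypothetical setup: 2 shards, modulo placement, made-up rows.
NUM_SHARDS = 2

# (id, first_name, manager_id, department_id)
employees = [
    (1, "Ann", None, 1),  # top-level manager, has no manager
    (2, "Bob", 1, 1),
    (3, "Cid", 1, 2),
    (4, "Dee", 3, 2),
]

def self_join(emps):
    # Stand-in for: employee e JOIN employee m ON e.manager_id = m.id
    by_id = {e[0]: e for e in emps}
    return [(e[1], by_id[e[2]][1]) for e in emps if e[2] in by_id]

def per_shard_self_join(emps, key_index):
    # Place each row by the column at key_index, then join within each shard only.
    shards = {}
    for e in emps:
        shards.setdefault(e[key_index] % NUM_SHARDS, []).append(e)
    return [row for shard in shards.values() for row in self_join(shard)]

full = self_join(employees)                      # correct result on unsharded data
by_department = per_shard_self_join(employees, 3)  # sharded by department_id
by_id = per_shard_self_join(employees, 0)          # sharded by id

print(len(full), len(by_department), len(by_id))
```

Here Cid reports to Ann, who works in another department, so sharding by ‘department_id’ drops that pair; sharding by ‘id’ drops every pair whose two ids land on different shards.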
8. 8
‘Shard Conflict’ Effects on Query Processing
• It is clear from the examples that when dealing with a foreign key and two tables, a common key can be utilized to resolve certain (but not all) conflicts
• Distributed data can become quite complex if not handled correctly
• This kind of problem is not always obvious and can yield incorrect results that go unnoticed
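One way to flag such conflicts before they produce wrong results is a metadata check. The Python sketch below is a conservative heuristic, not ScaleBase's method: the function name `is_shard_local` and the `shard_keys` mapping are hypothetical, and it assumes every table uses the same placement function. An equi-join is treated as safe to run per shard only when each side joins on its own table's shard key.

```python
def is_shard_local(join_left, join_right, shard_keys):
    """Return True if an equi-join can be evaluated shard by shard.

    join_left / join_right: (table, column) pairs from the join condition.
    shard_keys: hypothetical mapping of table name -> shard key column.
    Assumes all tables share one placement function.
    """
    (left_table, left_col), (right_table, right_col) = join_left, join_right
    return shard_keys.get(left_table) == left_col and shard_keys.get(right_table) == right_col

# Employee sharded by 'id': the department join is flagged as a conflict.
conflict = is_shard_local(
    ("employee", "department_id"), ("department", "id"),
    {"employee": "id", "department": "id"},
)

# Employee sharded by 'department_id': the same join is shard-local.
resolved = is_shard_local(
    ("employee", "department_id"), ("department", "id"),
    {"employee": "department_id", "department": "id"},
)

# Self-join on manager_id = id: one table has one shard key, so this never passes.
self_join_ok = is_shard_local(
    ("employee", "manager_id"), ("employee", "id"),
    {"employee": "department_id"},
)

print(conflict, resolved, self_join_ok)
```

A statement that fails the check needs different handling, such as a different shard key or a rewritten query, rather than being sent unchanged to every partition.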
9. 9
ScaleBase Can Help
ScaleBase is a modern, distributed MySQL database management
system. It is optimized for the cloud and deploys in minutes to enable you
to scale out to an unlimited number of users, data and transactions.
It is a horizontally scalable database cluster built on MySQL that
dynamically optimizes workloads and availability by logically distributing
data across public, private and geo-distributed clouds.
Contact Us
sales@scalebase.com
or
Download free software
ScaleBase Software
http://www.scalebase.com/software/
Use your relational DBA skills
and get NoSQL capabilities
10. 10
Start Using ScaleBase Today
Check out ScaleBase’s software
• ScaleBase on Amazon
• ScaleBase on Rackspace