17. A greater number of smaller ranges rebuilds a replacement node faster than a single token per node.
For the same amount of data, many small ranges are faster than a few large ones, because the replacement node can stream from more peers in parallel (copying from three nodes is slower than copying from five).
20. uneven data distribution
even data distribution
http://docs.datastax.com/en/archived/cassandra/1.1/docs/cluster_architecture/partitioning.html#
Ring: the full data range formed by all nodes in the cluster
Token: each node is assigned one or more tokens on the ring
Token Range: the range of values (previous token, current token]
Walk Clockwise: walk clockwise from the key's position to the first node encountered
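The "walk clockwise" lookup above can be sketched with a sorted map of token → node, where each node owns the range (previous token, current token]. This is an illustrative sketch, not Cassandra's actual implementation; class and method names are made up:

```java
import java.util.TreeMap;

// Sketch of token-ring lookup: a sorted map of token -> node.
// A node owns the token range (previous token, current token].
public class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(long token, String node) {
        ring.put(token, node);
    }

    // Walk clockwise: the first node whose token is >= the key's token
    // owns the key; wrap around to the smallest token if we pass the end.
    public String nodeFor(long keyToken) {
        Long owner = ring.ceilingKey(keyToken);
        if (owner == null) owner = ring.firstKey(); // wrap around the ring
        return ring.get(owner);
    }
}
```

With nodes at tokens 0, 100, and 200, a key hashing to 50 lands on the node at token 100, and a key hashing to 250 wraps around to the node at token 0.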
Assign tokens evenly within each datacenter, but take care that token assignments never overlap across datacenters.
33. If no token is specified for the new node,
Cassandra automatically splits the token
range of the busiest node in the cluster.
The “busy” node streams half of its
data to the new node in the cluster.
When the node finishes bootstrapping,
it is available for client requests.
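The "splits the token range of the busiest node" step amounts to taking the midpoint of that node's range, so the busy node streams half of its data. A minimal sketch (names are illustrative, not Cassandra's API):

```java
// Sketch of the single-token bootstrap rule: with no token specified,
// the new node takes the midpoint of the busiest node's range
// (start, end], so the busy node keeps half and streams the other half.
public class TokenSplit {
    public static long midpoint(long start, long end) {
        return start + (end - start) / 2;
    }
}
```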
Vnodes simplify many tasks in Cassandra:
• You no longer have to calculate and assign tokens to each node.
• Rebalancing a cluster is no longer necessary when adding or removing nodes. When a node joins the cluster, it assumes responsibility
for an even portion of data from the other nodes in the cluster. If a node fails, the load is spread evenly across other nodes in the cluster.
• Rebuilding a dead node is faster because it involves every other node in the cluster and because
data is sent to the replacement node incrementally instead of waiting until the end of the validation phase.
• Improves the use of heterogeneous machines in a cluster. You can assign a proportional number of vnodes to smaller and larger machines.
When joining the cluster, a new node
receives data from all other nodes.
The cluster is automatically balanced after
the new node finishes bootstrapping.
Adding Capacity with or without VNodes
34. cluster = Cluster.builder()
        .addContactPoints("192.168.50.100", "192.168.50.101")
        // prefer nodes in datacenter DC1; use remote DCs only as a fallback
        .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))
        // retry failed requests at a lower consistency level
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .build();
    session = cluster.connect(keyspace);
• Each node handles client requests, but the balancing policy is configurable
• Round Robin – evenly distributes queries across all nodes in the cluster, regardless of datacenter
• DC-Aware Round Robin – prefers hosts in the local datacenter and only uses nodes in remote
datacenters when local hosts cannot be reached
• Token-Aware – queries are routed directly to a replica that owns the requested data
Load Balancing - Driver
Retry Policy - Client Driver
A policy that defines the default behavior to adopt when a request returns an exception.
Such a policy centralizes the handling of query retries, minimizing the need for
exception catching/handling in business code.
DowngradingConsistencyRetryPolicy - A retry policy that retries a query with a lower
consistency level than the one initially requested.
42. The client sends a mutation (insert/update/delete) to a node in the cluster.
That node serves as the coordinator for this transaction
Writing Data
RF=3
45. Writing Data
And the coordinator sends a successful response to the client.
RF=3
46. What if a node is down?
Only two nodes respond.
The consistency level chosen by the client determines whether the write still counts as successful.
Write Consistency Level = 2/Quorum
RF=3
• ONE Returns data from the nearest replica.
• QUORUM Returns the most recent data from the majority of replicas.
• ALL Returns the most recent data from all replicas.
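The replica counts behind these levels are simple arithmetic: QUORUM is a strict majority, floor(RF / 2) + 1. A one-line sketch:

```java
// Replica counts behind the consistency levels, for replication factor rf:
// ONE needs 1 ack, ALL needs rf acks, and QUORUM needs a strict majority.
public class ConsistencyMath {
    public static int quorum(int rf) {
        return rf / 2 + 1; // integer division = floor(rf / 2) + 1
    }
}
```

With RF=3, quorum is 2, which is why the examples that follow survive one node being down.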
47. CL = QUORUM
Will this write succeed? YES!
A majority of replicas received the mutation.
RF=3
What if a node is down?
48. CL = QUORUM
Will this write succeed? NO.
Failed to write a majority of replicas.
RF=3
What if a node is down?
49. The client can still decide how to proceed
CL = QUORUM
DataStax Driver = DowngradingConsistencyRetryPolicy
Will this write succeed? YES!
With consistency downgraded to ONE, the write will succeed.
RF=3
50. Multi DC Writes
The coordinator forwards the mutation to local replicas and a remote coordinator.
DC1
RF=3
DC2
RF=3
51. The remote coordinator forwards the mutation to replicas in the remote DC
Multi DC Writes
DC1
RF=3
DC2
RF=3
57. What if the nodes disagree?
Data was written with QUORUM when one node was down.
The write was successful, but that node missed the update.
RF=3
WRITE
58. Now the node is back online, and it responds to a read request.
It has older data than the other replicas.
RF=3
What if the nodes disagree?
READ
59. The coordinator resolves the discrepancy and sends the newest data to the client.
READ REPAIR
The coordinator also notifies the “out of date” node that it has old data.
The “out of date” node receives updated data from another replica.
RF=3
What if the nodes disagree?
NEWEST
60. What if I’m only reading from a single node?
How will Cassandra know that a node has stale data?
C* will occasionally request a hash from other nodes to compare.
RF=3
Read Repair Chance
HASH
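The hash comparison behind this check can be sketched as follows: replicas return a digest instead of the full row, and any mismatch tells the coordinator some replica is stale. The hash function and class names here are illustrative, not Cassandra's actual digest scheme:

```java
import java.util.List;

// Sketch of the digest check behind read repair: replicas return a
// hash of their data; a mismatch means at least one replica is stale.
public class DigestCheck {
    public static boolean replicasAgree(List<String> replicaValues) {
        int first = replicaValues.get(0).hashCode();
        for (String v : replicaValues) {
            if (v.hashCode() != first) return false; // stale replica detected
        }
        return true;
    }
}
```

Only when the digests disagree does the coordinator need to fetch full data and repair the out-of-date replica.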
61. Hints provide a recovery mechanism for writes targeting offline nodes
• Coordinator can store a hint if target node for a write is down or fails to acknowledge
Hinted Handoff
HINT
62. The write is replayed when the target node comes online
Hinted Handoff
HINT
63. If all replica nodes are down, the write can still succeed once a hint has been written.
Note that if all replica nodes are down at write time, an ANY write will not be
readable until the replica nodes have recovered.
What if the hint is enough?
HINT
CL=ANY
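Hinted handoff can be sketched as a queue per down node: the coordinator stores mutations as hints and replays them in order once the target comes back. This is a conceptual sketch with made-up names, not Cassandra's hint storage:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Sketch of hinted handoff: while the target node is down, the
// coordinator queues mutations as hints; when the target comes back
// online, the hints are replayed in order.
public class HintQueue {
    private final Deque<String> hints = new ArrayDeque<>();

    public void storeHint(String mutation) {
        hints.add(mutation);
    }

    // Replay all stored hints against the recovered target node;
    // returns how many mutations were replayed.
    public int replay(Consumer<String> targetNode) {
        int replayed = 0;
        while (!hints.isEmpty()) {
            targetNode.accept(hints.poll());
            replayed++;
        }
        return replayed;
    }
}
```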
64. During a read, does the coordinator really forward the query to all replicas?
That seems unnecessary!
Rapid Read Protection
RF=3
65. NO
Cassandra performs only as many requests as necessary to
meet the requested Consistency Level.
Cassandra routes requests to the most-responsive replicas.
Rapid Read Protection
RF=3
66. If a replica doesn’t respond quickly, Cassandra will try another node.
This is known as an “eager retry”
Rapid Read Protection
RF=3
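The eager-retry idea above can be sketched as trying replicas in order of observed responsiveness and moving to the next one when a response doesn't arrive within a threshold. The latencies and threshold below are simulated; this is not Cassandra's speculative-retry code:

```java
// Sketch of eager retry / rapid read protection: query the most
// responsive replica first; if it does not answer within a threshold,
// try the next replica instead of waiting.
public class EagerRetry {
    // replicas are ordered by observed responsiveness; latencyMs[i] is
    // the simulated response time of replicas[i], with -1 = no response.
    public static String read(String[] replicas, long[] latencyMs, long thresholdMs) {
        for (int i = 0; i < replicas.length; i++) {
            if (latencyMs[i] >= 0 && latencyMs[i] <= thresholdMs) {
                return replicas[i]; // this replica answered in time
            }
            // otherwise: eager retry against the next replica
        }
        return null; // no replica answered within the threshold
    }
}
```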