Oracle NoSQL Database
Compared to Cassandra and HBase
Overview
 Oracle NoSQL Database is licensed under the AGPL, while Cassandra and HBase are licensed under Apache 2.0.
 Oracle NoSQL Database, a NoSQL implementation that leverages BerkeleyDB in its storage layer, is in many respects a commercialization of the early NoSQL implementations that led to the adoption of this category of technology. Several of the earliest NoSQL solutions were based on BerkeleyDB, and some still are, e.g. LinkedIn’s Voldemort. Oracle NoSQL Database is a Java-based key-value store that supports a value abstraction layer currently implementing Binary and JSON types. Its key structure is designed to facilitate large-scale distribution and storage locality with range-based search and retrieval. The implementation uniquely supports built-in cluster load balancing and a full range of transaction semantics, from ACID to relaxed eventual consistency. In addition, the technology is integrated with important open-source technologies like Hadoop / MapReduce and an increasing number of Oracle software solutions and tools, and can be found on Oracle Engineered Systems.
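The major/minor key structure mentioned above can be illustrated with a toy in-memory model: hashing the major key path picks a shard (large-scale distribution), while all minor keys under a major key stay co-located and sorted (storage locality and range retrieval). This is an illustrative sketch under those assumptions, not the Oracle NoSQL Database API.

```python
# Toy model of a major/minor key-value store. Not the ONDB API: the class,
# method names, and sharding rule here are hypothetical simplifications.
from bisect import insort

class ToyStore:
    def __init__(self, num_shards=3):
        # each shard maps a major path to a sorted list of (minor_path, value)
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, major):
        # the major key alone decides placement, so related minor keys co-locate
        return self.shards[hash(major) % len(self.shards)]

    def put(self, major, minor, value):
        entries = self._shard(major).setdefault(major, [])
        entries[:] = [(m, v) for m, v in entries if m != minor]
        insort(entries, (minor, value))     # keep minor keys in sorted order

    def get(self, major, minor):
        for m, v in self._shard(major).get(major, []):
            if m == minor:
                return v
        return None

    def multi_get(self, major):
        # all minor keys for a major key live on one shard, already sorted
        return list(self._shard(major).get(major, []))

store = ToyStore()
store.put(("users", "alice"), ("email",), "a@example.com")
store.put(("users", "alice"), ("phone",), "555-0100")
assert store.get(("users", "alice"), ("email",)) == "a@example.com"
assert [m for m, _ in store.multi_get(("users", "alice"))] == [("email",), ("phone",)]
```

Because every minor key under `("users", "alice")` lives on one shard, a multi-key read or an atomic multi-key update never crosses shard boundaries.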
 Cassandra is a key-value store that supports a single value abstraction known as table structure. It uses partition-based hashing over a ring-based architecture in which every node in the system can handle any read/write request; nodes act as coordinators of requests when they do not actually hold the data involved in the request operation.
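The ring-based placement described above can be sketched in a few lines: each node owns a token on a hash ring, and a row key is placed on the first node whose token follows the key's hash (wrapping around). This is a deliberately simplified sketch of the idea, omitting Cassandra's virtual nodes and replication.

```python
# Minimal consistent-hash ring sketch. Node names and the single-token-per-node
# layout are illustrative assumptions, not Cassandra's actual implementation.
import hashlib
from bisect import bisect_right

def token(s):
    # stable hash of a key or node name onto the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)

    def owner(self, key):
        # first node clockwise from hash(key), wrapping at the end of the ring
        tokens = [t for t, _ in self.ring]
        i = bisect_right(tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
# any node could coordinate the request; the ring tells it where the data lives
assert ring.owner("row-42") in {"node-a", "node-b", "node-c"}
```

The coordinator role falls out naturally: a node that receives a request computes `owner(key)` and forwards the operation if the owner is someone else.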
 HBase is a key-value store that supports a single value abstraction known as table structure (popularly referred to as a column family). It is based on the Google BigTable design and is written entirely in Java. HBase is designed to work on top of the HDFS file system. Unlike Hive, HBase does not use MapReduce in its implementation; it accesses HDFS storage blocks directly and stores a natively managed file type. The physical storage is similar to a column-oriented database, so HBase works particularly well for queries involving aggregations, similar to shared-nothing analytic databases such as Aster Data and Greenplum.
Comparison
The table below gives a high-level comparison of HBase, Cassandra, and Oracle NoSQL Database (ONDB) features/capabilities. Low-level details can be found in links to the Oracle and Cassandra online documentation.
Foundations
- HBase: based on Google's BigTable.
- Cassandra: based on Amazon's Dynamo design. Initially developed at Facebook by former Amazon engineers, which is one reason Cassandra supports multiple data centers. Rackspace is a big contributor to Cassandra because of that multi-data-center support.
- ONDB: based on Oracle Berkeley DB Java Edition, a mature, log-structured, high-performance, transactional database.
Infrastructure
- HBase: uses the Hadoop infrastructure (Zookeeper, NameNode, HDFS). Organizations that will deploy Hadoop anyway may be comfortable leveraging their Hadoop knowledge by using HBase.
- Cassandra: started and evolved separately from Hadoop; its infrastructure and operational knowledge requirements differ from Hadoop's. However, for analytics, many Cassandra deployments use Cassandra + Storm (which uses Zookeeper) and/or Cassandra + Hadoop.
- ONDB: has simple infrastructure requirements and does not use Zookeeper. Hadoop-based analytics are supported via an ONDB/Hadoop connector.
Infrastructure Simplicity and SPOF
- HBase: the HBase/Hadoop infrastructure has several "moving parts": Zookeeper, NameNode, HBase Master, and Data Nodes. Zookeeper is clustered and naturally fault tolerant; the NameNode must be clustered to be fault tolerant.
- Cassandra: uses a single node type. All nodes are equal and perform all functions; any node can act as a coordinator, ensuring no SPOF. Adding Storm or Hadoop, of course, adds complexity to the infrastructure.
- ONDB: uses a single node type to store data and satisfy read requests. Any node can accept a request and forward it if necessary; there is no SPOF. In addition, a simple watchdog process (the Storage Node Agent, or SNA) on each machine ensures high availability and automatically restarts any data storage node after a process-level failure. The SNA also helps with administration of the store.
Read-Intensive Use Cases
- HBase: optimized for reads, supported by a single-write master and the resulting strict consistency model, as well as Ordered Partitioning, which supports row scans. HBase is well suited to range-based scans.
- Cassandra: excellent single-row read performance as long as eventual consistency semantics are sufficient for the use case. Quorum reads, which are required for strict consistency, will naturally be slower than HBase reads. Cassandra does not support range-based row scans, which may be limiting in certain use cases. It is well suited to single-row queries, or to selecting multiple rows based on a column-value index.
- ONDB: provides 1) strict consistency reads at the master, 2) eventual consistency reads with optional time constraints on the recency of data, and 3) application-level read-your-writes consistency. All reads contact just a single storage node, making read operations very efficient. ONDB also supports range-based scans.
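The three read options above amount to different rules for which replica may answer, which can be sketched with per-replica "commit version" numbers: strict reads go to the master, while a read-your-writes read may use any replica that has caught up to the client's own last write. The version numbers here are an illustrative stand-in, not ONDB's actual consistency tokens.

```python
# Sketch of replica selection under three consistency modes. Replica names and
# the integer-version model are hypothetical simplifications.
def choose_replica(replicas, mode, client_last_write=0):
    # replicas: dict name -> last applied version; "master" is always current
    if mode == "strict":
        return "master"                     # only the master is guaranteed fresh
    if mode == "read_your_writes":
        # any replica that has applied the client's last write is acceptable
        ok = [n for n, v in replicas.items() if v >= client_last_write]
        return sorted(ok)[0]
    return sorted(replicas)[0]              # eventual: any replica will do

replicas = {"master": 10, "replica-1": 9, "replica-2": 7}
assert choose_replica(replicas, "strict") == "master"
assert choose_replica(replicas, "read_your_writes", client_last_write=9) in {"master", "replica-1"}
```

The payoff of the weaker modes is load spreading: only strict reads are pinned to the master, so the replicas absorb the rest of the read traffic.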
Multi-Data-Center Support and Disaster Recovery
- HBase: provides asynchronous replication of an HBase cluster across a WAN. HBase clusters cannot be set up to achieve zero RPO, but in steady state HBase should be roughly failover-equivalent to any other DBMS that relies on asynchronous replication over a WAN. Fall-back processes and procedures (e.g. after failover) are TBD.
- Cassandra: Random Partitioning provides row replication of a single row across a WAN, either asynchronous (write.ONE, write.LOCAL_QUORUM) or synchronous (write.QUORUM, write.ALL). Cassandra clusters can therefore be set up to achieve zero RPO, but each write then requires at least one WAN ACK back to the coordinator.
- ONDB: [Release 3.0 provides asynchronous cascaded replication across data centers.]
Write.ONE Durability
- HBase: writes are replicated in a pipeline fashion: the first data node for the region persists the write, then sends it to the next natural endpoint, and so on down the pipeline. HBase's commit log "acks" a write only after *all* of the nodes in the pipeline have written the data to their OS buffers. The first region server in the pipeline must also have persisted the write to its WAL.
- Cassandra: coordinators send parallel write requests to all natural endpoints. The coordinator "acks" the write after exactly one natural endpoint has "acked" it, which means that node has also persisted the write to its WAL. The write may or may not have been committed to any other natural endpoint.
- ONDB: a request with ReplicaAckPolicy.NONE (the ONDB equivalent of Write.ONE) is considered complete once the change has been written to the master's log buffer; the change is propagated to the other members of the replication group via an efficient asynchronous stream-based protocol.
Ordered Partitioning
- HBase: only supports Ordered Partitioning. This means that rows for a CF are stored in RowKey order in HFiles, where each HFile contains a "block" or "shard" of all the rows in a CF. HFiles are distributed across all data nodes in the cluster.
- Cassandra: officially supports Ordered Partitioning, but no production user of Cassandra uses it because of the "hot spots" it creates and the operational difficulties those hot spots cause. Random Partitioning is the only recommended Cassandra partitioning scheme, and rows are distributed across all nodes in the cluster.
- ONDB: only supports random partitioning. Prevailing experience indicates that other forms of partitioning are really hard to administer in practice.
RowKey Range Scans
- HBase: because of Ordered Partitioning, HBase queries can be formulated with partial start and end row keys, and can locate rows inclusive or exclusive of those partial row keys. The start and end row keys in a range scan need not even exist in HBase.
- Cassandra: because of random partitioning, partial row keys cannot be used; row keys must be known exactly. Counting rows in a CF is complicated. For these use cases it is highly recommended to store the data in columns, not in rows.
- ONDB: range requests can be defined with partial start and end row keys. The start and end row keys in a range scan need not exist in the store.
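The point that range bounds "need not exist" follows directly from keeping keys sorted: a scan from a start bound (inclusive) to an end bound (exclusive) is just two binary searches, regardless of whether either bound is a real key. A minimal sketch of that mechanism:

```python
# Range scan over sorted keys via binary search. The key format "user#NNNN"
# is an illustrative assumption, not a convention of any of these stores.
from bisect import bisect_left

def range_scan(sorted_keys, start, end):
    lo = bisect_left(sorted_keys, start)   # first key >= start
    hi = bisect_left(sorted_keys, end)     # first key >= end (excluded)
    return sorted_keys[lo:hi]

keys = ["user#0001", "user#0003", "user#0007", "user#0042"]
# neither bound exists in the data, yet the scan is well defined
assert range_scan(keys, "user#0002", "user#0010") == ["user#0003", "user#0007"]
```

Under random partitioning this trick is unavailable, because hashing destroys the key order the binary search relies on.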
Linear Scalability for Large Tables and Range Scans
- HBase: due to Ordered Partitioning, HBase easily scales horizontally while still supporting row-key range scans.
- Cassandra: if data is stored in columns to support range scans, the practical limit on row size is tens of megabytes; larger rows cause problems with compaction overhead and time.
- ONDB: there are no limits on range scans across major or minor keys. Range scans across major keys require access to each shard in the store; Release 3 will support major-key and index range scans that are parallelized across all the nodes in the store. Minor-key scans are serviced by the single shard that contains the data associated with the minor-key range.
Atomic Compare and Set
- HBase: supports atomic compare-and-set, and supports transactions within a row.
- Cassandra: does not support atomic compare-and-set. Counters require dedicated counter column families which, because of eventual consistency, require all replicas at all natural endpoints to be read and updated with an ACK. However, hinted-handoff mechanisms can make even these built-in counters suspect for accuracy. FIFO queues are difficult (if not impossible) to implement with Cassandra.
- ONDB: supports atomic compare-and-set, making it simple to implement counters. ONDB also supports atomic modification of multiple minor key/value pairs under the same major key.
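The "counters are simple with compare-and-set" claim rests on a standard pattern: read the current value, attempt a conditional write, and retry if another writer got there first. A sketch of the pattern (the class and method names are illustrative, and a local lock stands in for server-side atomicity):

```python
# Counter built on compare-and-set with a retry loop. Illustrative of the
# pattern only; not any particular database client API.
import threading

class CASCell:
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()   # stands in for server-side atomicity

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        # atomically: write `new` only if the value is still `expected`
        with self._lock:
            if self._value != expected:
                return False            # someone else won the race
            self._value = new
            return True

def increment(cell):
    while True:                         # retry until our CAS wins
        current = cell.get()
        if cell.compare_and_set(current, current + 1):
            return current + 1

cell = CASCell()
threads = [threading.Thread(target=lambda: [increment(cell) for _ in range(100)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert cell.get() == 400                # no lost updates despite the races
```

Without CAS, the read-modify-write above has a window where two clients read the same value and one increment is silently lost, which is exactly the accuracy problem the Cassandra counter discussion describes.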
Read Load Balancing (Single Row)
- HBase: does not support read load balancing against a single row; a single row is served by exactly one region server at a time, and other replicas are used only in case of a node failure. Scalability is primarily provided by partitioning, which statistically distributes reads of different rows across multiple data nodes.
- Cassandra: supports read load balancing against a single row, but primarily via Read.ONE, so eventual consistency must be taken into consideration. Scalability is primarily provided by partitioning, which distributes reads of different rows across multiple data nodes.
- ONDB: supports read load balancing. Only absolute-consistency reads need to be directed to the master; eventual-consistency reads may be served by any replica that can satisfy the read consistency requirements of the request.
Bloom Filters
- HBase: Bloom filters can be used as another form of indexing. They work on the basis of RowKey or RowKey+ColumnName to reduce the number of data blocks HBase has to read to satisfy a query. (Bloom filters may exhibit false positives, reading too much data, but never false negatives, reading not enough data.)
- Cassandra: uses Bloom filters for key lookup.
- ONDB: in the LSM-tree-based storage underlying HBase and Cassandra, Bloom filters minimize reads to SST files that do not contain a requested key. There is no need to create and maintain Bloom filters in the log-structured storage architecture used by ONDB.
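The false-positive/no-false-negative trade-off described above is inherent to the data structure: a Bloom filter sets k hash-chosen bits per key, so a membership test can wrongly say "maybe present" (extra read) but can never say "absent" for a key that was added. A minimal sketch (the sizes m and k are arbitrary illustrative choices):

```python
# Minimal Bloom filter: k hash positions per key over an m-bit array.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0                   # m-bit array packed into one int

    def _positions(self, key):
        # derive k independent bit positions from salted hashes of the key
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # True only if every position is set; added keys always pass,
        # absent keys usually fail (and occasionally false-positive)
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("row-1")
assert bf.might_contain("row-1")        # never a false negative
```

In an LSM tree, each "absent" answer lets the store skip an entire SST file read, which is why HBase and Cassandra rely on these filters.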
Triggers
- HBase: triggers are supported by the CoProcessor capability. They allow HBase to observe get/put/delete events on a table (CF) and then execute the trigger logic. Triggers are coded as Java classes.
- Cassandra: does not support coprocessor-like functionality (as far as we know).
- ONDB: does not support triggers.
Secondary Indexes
- HBase: does not natively support secondary indexes, but one use case for triggers is that a trigger on a "put" can automatically keep a secondary index up to date, removing that burden from the application (client).
- Cassandra: supports secondary indexes on column families where the column name is known (not on dynamic columns).
- ONDB: Release 3.0 will support secondary indexes.
Simple Aggregation
- HBase: CoProcessors support simple aggregations out of the box: SUM, MIN, MAX, AVG, STD. Other aggregations can be built by defining Java classes to perform the aggregation.
- Cassandra: aggregations are not supported by the Cassandra nodes; the client must perform them. When the aggregation spans multiple rows, Random Partitioning makes aggregation very difficult for the client. The recommendation is to use Storm or Hadoop for aggregations.
- ONDB: aggregation is not supported.
HIVE Integration
- HBase: HIVE can access HBase tables directly (using de-serialization under the hood that is aware of the HBase file format).
- Cassandra: work in progress (https://issues.apache.org/jira/browse/CASSANDRA-4131).
- ONDB: no HIVE integration currently.
PIG Integration
- HBase: PIG has native support for writing into / reading from HBase.
- Cassandra: supported since Cassandra 0.7.4.
- ONDB: no PIG integration currently.
CAP Theorem Focus
- HBase: Consistency, Availability.
- Cassandra: Availability, Partition Tolerance.
- ONDB: Consistency, Availability; limited partition tolerance if there is a simple majority of nodes on one side of a partition (https://sleepycat.oracle.com/trac/wiki/JEKV/CAP has a detailed discussion).
Consistency
- HBase: strong.
- Cassandra: eventual (strong is optional).
- ONDB: offers different read consistency models: 1) strict consistency reads at the master, 2) eventual consistency reads with optional time constraints on the recency of data, and 3) read-your-writes consistency.
Single Write Master
- HBase: yes.
- Cassandra: no (requires R + W > N to get strong consistency).
- ONDB: yes.
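The quorum condition behind "no single write master" is simple arithmetic: with N replicas, reading R of them and writing W of them is strongly consistent exactly when R + W > N, because any read set then overlaps any write set in at least one replica that holds the latest value. A one-function sketch:

```python
# Quorum overlap rule: a read of R replicas must intersect a write of W
# replicas out of N total, which holds iff R + W > N.
def is_strongly_consistent(n, r, w):
    return r + w > n

assert is_strongly_consistent(n=3, r=2, w=2)       # QUORUM reads + QUORUM writes
assert not is_strongly_consistent(n=3, r=1, w=1)   # ONE/ONE: stale reads possible
assert is_strongly_consistent(n=3, r=1, w=3)       # write.ALL lets reads use ONE
```

This is why the table pairs Read.QUORUM with Write.QUORUM for strong consistency: with N=3, R=2 and W=2 give R + W = 4 > 3.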
Optimized For
- HBase: reads.
- Cassandra: writes.
- ONDB: both reads and writes. Log-structured storage permits append-only writes, with each change written once to disk. Reads can be serviced at any replica based upon the read consistency requirements of the request, and can be satisfied at a single node with a single request to disk. There are no Bloom filters to maintain and no risk of false positives causing multiple disk reads.
Main Data Structure
- HBase: CF, RowKey, name-value pair set.
- Cassandra: CF, RowKey, name-value pair set.
- ONDB: major key, or minor key with its associated value.
Dynamic Columns
- HBase: yes.
- Cassandra: yes.
- ONDB: provides equivalent functionality; multiple minor keys can be dynamically associated with a major key.
Column Names as Data
- HBase: yes.
- Cassandra: yes.
- ONDB: provides equivalent functionality via minor keys, which can be treated as data.
Static Columns
- HBase: no.
- Cassandra: yes.
- ONDB: [R3.0 will support static columns.]

RowKey Slices
- HBase: yes.
- Cassandra: no.
- ONDB: no.
Static Column Value Indexes
- HBase: no.
- Cassandra: yes.
- ONDB: [R3.0].

Sorted Column Names
- HBase: yes.
- Cassandra: yes.
- ONDB: [R3.0].

Cell Versioning Support
- HBase: yes.
- Cassandra: no.
- ONDB: no.

Bloom Filters
- HBase: yes.
- Cassandra: yes (only on key).
- ONDB: not necessary.
CoProcessors
- HBase: yes.
- Cassandra: no.
- ONDB: no.

Triggers
- HBase: yes (part of CoProcessor).
- Cassandra: no.
- ONDB: no.

Push-Down Predicates
- HBase: yes (part of CoProcessor).
- Cassandra: no.
- ONDB: no.

Atomic Compare and Set
- HBase: yes.
- Cassandra: no.
- ONDB: yes.

Explicit Row Locks
- HBase: yes.
- Cassandra: no.
- ONDB: no.

Row Key Caching
- HBase: yes.
- Cassandra: yes.
- ONDB: yes.
Partitioning Strategy
- HBase: Ordered Partitioning.
- Cassandra: Random Partitioning recommended.
- ONDB: Random Partitioning.

Rebalancing
- HBase: automatic.
- Cassandra: not needed with Random Partitioning.
- ONDB: not needed with Random Partitioning.

Availability
- HBase: N replicas across nodes.
- Cassandra: N replicas across nodes.
- ONDB: N replicas across nodes.
Data Node Failure
- HBase: graceful degradation.
- Cassandra: graceful degradation.
- ONDB: graceful degradation, as described in the availability section.

Data Node Failure (Replication)
- HBase: N replicas preserved.
- Cassandra: (N-1) replicas preserved + hinted handoff.
- ONDB: (N-1) replicas preserved.
Data Node Restoration
- HBase: same as node addition.
- Cassandra: requires node-repair admin action.
- ONDB: the node catches up automatically by replaying changes from a member of the replication group.

Data Node Addition
- HBase: rebalancing is automatic.
- Cassandra: rebalancing requires token-assignment adjustment.
- ONDB: new nodes are added through the Admin service, which automatically redistributes data across the new nodes.

Data Node Management
- HBase: simple (roll in, roll out).
- Cassandra: human admin action required.
- ONDB: human admin action required.
Cluster Admin Nodes
- HBase: Zookeeper, NameNode, HMaster.
- Cassandra: all nodes are equal.
- ONDB: has a highly available Admin service for administrative actions, e.g. adding new nodes, replacing failed nodes, software updates, etc., but the Admin service is not required for steady-state operation of the store. A lightweight SNA process (described earlier) on each machine ensures high availability and restarts any data storage node in case of failure.

SPOF
- HBase: all the admin nodes are now fault tolerant.
- Cassandra: all nodes are equal.
- ONDB: there is no SPOF, as described in the availability section.
Write.ANY
- HBase: no, but replicas are node agnostic.
- Cassandra: yes (writes never fail if this option is used).
- ONDB: no.

Write.ONE
- HBase: standard; HA; strong consistency.
- Cassandra: yes (often used); HA; weak consistency.
- ONDB: yes; requires that the master be reachable.

Write.QUORUM
- HBase: no (not required).
- Cassandra: yes (often used with Read.QUORUM for strong consistency).
- ONDB: yes; this is the default.

Write.ALL
- HBase: yes (performance penalty).
- Cassandra: yes (performance penalty, not HA).
- ONDB: yes (performance penalty, not HA).
Asynchronous WAN Replication
- HBase: yes, but it needs testing on corner cases.
- Cassandra: yes (replicas can span data centers).
- ONDB: asynchronous replication is routine. Nodes local to the master will typically keep up, and nodes separated by high-latency WANs will have changes replayed asynchronously via an efficient stream-based protocol.

Synchronous WAN Replication
- HBase: no.
- Cassandra: yes, with Write.QUORUM or Write.EACH_QUORUM.
- ONDB: yes, for requests that require acknowledgements (ReplicaAckPolicy.SIMPLE_MAJORITY or ReplicaAckPolicy.ALL). The acknowledging nodes will be synchronized with the master.

Compression Support
- HBase: yes.
- Cassandra: yes.
- ONDB: no.
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 

Dernier (20)

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 

Oracle NoSQL Database Compared to Cassandra and HBase

Overview

- Oracle NoSQL Database is licensed under the AGPL, while Cassandra and HBase are licensed under Apache 2.0.

- Oracle NoSQL Database, a NoSQL implementation that uses Berkeley DB in its storage layer, is in many respects a commercialization of the early NoSQL implementations that led to the adoption of this category of technology. Several of the earliest NoSQL solutions were built on Berkeley DB, and some still are, e.g. LinkedIn's Voldemort. Oracle NoSQL Database is a Java-based key-value store that supports a value abstraction layer currently implementing binary and JSON types. Its key structure is designed to facilitate large-scale distribution and storage locality, with range-based search and retrieval. The implementation uniquely supports built-in cluster load balancing and a full range of transaction semantics, from ACID to relaxed eventual consistency. In addition, the technology is integrated with important open source technologies such as Hadoop/MapReduce and with a growing number of Oracle software solutions and tools, and it can be found on Oracle Engineered Systems.

- Cassandra is a key-value store that supports a single value abstraction known as a table structure. It uses partition-based hashing over a ring-based architecture in which every node can handle any read or write request; a node acts as a coordinator for requests whose data it does not itself hold.

- HBase is a key-value store that supports a single value abstraction known as a table structure (popularly referred to as a column family). It is based on Google's Bigtable design and is written entirely in Java. HBase is designed to work on top of the HDFS file system. Unlike Hive, HBase does not use MapReduce in its implementation; it accesses HDFS storage blocks directly and stores a natively managed file type. The physical storage is similar to a column-oriented database, and as such it works particularly well for queries involving aggregations, similar to shared-nothing analytic databases such as Aster Data and Greenplum.
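The major/minor key structure described above can be illustrated with a small sketch. This is a hypothetical in-memory model, not the actual ONDB API: the names `make_key`, `put`, and `minor_range_scan` are illustrative. The point is that records sharing a major key sort together, so a scan over a minor-key range touches one contiguous run of the sorted keyspace, which is what makes range-based retrieval and storage locality cheap.

```python
# Hypothetical sketch of a major/minor key layout (not the real ONDB API).
# Records that share a major key sort together, so a minor-key range scan
# is a single contiguous sweep over the sorted keyspace.

def make_key(major, minor):
    """Encode a (major path, minor path) pair as a sortable tuple."""
    return (tuple(major), tuple(minor))

store = {}  # stand-in for a sorted key/value store: key tuple -> value

def put(major, minor, value):
    store[make_key(major, minor)] = value

def minor_range_scan(major, start, end):
    """Return values under one major key whose first minor component
    falls in [start, end) -- all of them live on the same shard."""
    lo, hi = make_key(major, [start]), make_key(major, [end])
    return [v for k, v in sorted(store.items()) if lo <= k < hi]

put(["user", "alice"], ["email", "home"], "a@home")
put(["user", "alice"], ["email", "work"], "a@work")
put(["user", "alice"], ["phone"], "555-1234")
put(["user", "bob"],   ["email", "home"], "b@home")

print(minor_range_scan(["user", "alice"], "email", "f"))
# -> ['a@home', 'a@work']
```

In the real product the partial start and end keys of a range scan need not exist in the store, which the half-open interval above also models.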
Comparison

The table below gives a high-level comparison of Oracle NoSQL Database (ONDB), HBase, and Cassandra features and capabilities. Lower-level details can be found in the Oracle and Cassandra online documentation.

Foundations
  HBase: Based on Bigtable (Google).
  Cassandra: Based on Dynamo (Amazon). Initially developed at Facebook by former Amazon engineers, which is one reason Cassandra supports multiple data centers. Rackspace is a big contributor to Cassandra due to its multi-data-center support.
  ONDB: Based on Oracle Berkeley DB Java Edition, a mature log-structured, high-performance, transactional database.

Infrastructure
  HBase: Uses the Hadoop infrastructure (ZooKeeper, NameNode, HDFS). Organizations that will deploy Hadoop anyway may be comfortable leveraging their Hadoop knowledge by using HBase.
  Cassandra: Started and evolved separately from Hadoop, and its infrastructure and operational knowledge requirements differ from Hadoop's. However, for analytics, many Cassandra deployments use Cassandra + Storm (which uses ZooKeeper) and/or Cassandra + Hadoop.
  ONDB: Has simple infrastructure requirements and does not use ZooKeeper. Hadoop-based analytics are supported via an ONDB/Hadoop connector.

Infrastructure Simplicity and SPOF
  HBase: The HBase/Hadoop infrastructure has several "moving parts": ZooKeeper, NameNode, HBase Master, and data nodes. ZooKeeper is clustered and naturally fault tolerant; the NameNode must be clustered to be fault tolerant.
  Cassandra: Uses a single node type. All nodes are equal and perform all functions; any node can act as a coordinator, ensuring no SPOF. Adding Storm or Hadoop, of course, adds complexity to the infrastructure.
  ONDB: Uses a single node type to store data and satisfy read requests. Any node can accept a request and forward it if necessary, so there is no SPOF. In addition, a simple watchdog process (the Storage Node Agent, or SNA for short) on each machine ensures high availability and automatically restarts any data storage node in case of process-level failures. The SNA also helps with administration of the store.

Read Intensive Use Cases
  HBase: Optimized for reads, supported by a single-write master and the resulting strict consistency model, as well as ordered partitioning, which supports row scans. HBase is well suited to range-based scans.
  Cassandra: Excellent single-row read performance as long as eventual consistency semantics are sufficient for the use case. Quorum reads, which are required for strict consistency, will naturally be slower than HBase reads. Cassandra does not support range-based row scans, which may be limiting in certain use cases; it is well suited to single-row queries, or to selecting multiple rows based on a column-value index.
  ONDB: Provides (1) strictly consistent reads at the master, (2) eventually consistent reads with optional time constraints on the recency of the data, and (3) application-level read-your-writes consistency. Every read contacts just a single storage node, making read operations very efficient. ONDB also supports range-based scans.

Multi-Data Center Support and Disaster Recovery
  HBase: Provides asynchronous replication of an HBase cluster across a WAN. HBase clusters cannot be set up to achieve zero RPO, but in steady state HBase should be roughly failover-equivalent to any other DBMS that relies on asynchronous replication over a WAN. Fall-back processes and procedures (e.g. after failover) are TBD.
  Cassandra: Random partitioning provides row replication of a single row across a WAN, either asynchronous (write.ONE, write.LOCAL_QUORUM) or synchronous (write.QUORUM, write.ALL). Cassandra clusters can therefore be set up to achieve zero RPO, but each write then requires at least one WAN ACK back to the coordinator.
  ONDB: [Release 3.0 provides asynchronous cascaded replication across data centers.]

Write.ONE Durability
  HBase: Writes are replicated in pipeline fashion: the first data node for the region persists the write and then sends it to the next natural endpoint, and so on down the pipeline. HBase's commit log "acks" a write only after *all* of the nodes in the pipeline have written the data to their OS buffers; the first region server in the pipeline must also have persisted the write to its WAL.
  Cassandra: The coordinator sends parallel write requests to all natural endpoints and "acks" the write after exactly one natural endpoint has acked it, which means that node has also persisted the write to its WAL. The write may or may not have been committed at any other natural endpoint.
  ONDB: A request with ReplicaAckPolicy.NONE (the ONDB equivalent of Write.ONE) is considered complete once the change has been written to the master's log buffer; the change is propagated to the other members of the replication group via an efficient asynchronous stream-based protocol.

Ordered Partitioning
  HBase: Supports only ordered partitioning. Rows of a CF are stored in RowKey order in HFiles, where each HFile contains a "block" or "shard" of all the rows in a CF; HFiles are distributed across all data nodes in the cluster.
  Cassandra: Officially supports ordered partitioning, but no production user of Cassandra uses it because of the "hot spots" it creates and the operational difficulties such hot spots cause. Random partitioning is the only recommended Cassandra partitioning scheme, and rows are distributed across all nodes in the cluster.
  ONDB: Supports only random partitioning. Prevailing experience indicates that other forms of partitioning are very hard to administer in practice.

RowKey Range Scans
  HBase: Because of ordered partitioning, queries can be formulated with partial start and end row keys, and can locate rows inclusive or exclusive of these partial row keys. The start and end row keys of a range scan need not even exist in HBase.
  Cassandra: Because of random partitioning, partial row keys cannot be used; row keys must be known exactly. Counting rows in a CF is complicated. For these use cases it is highly recommended to store the data in columns, not in rows.
  ONDB: Range requests can be defined with partial start and end row keys, and the start and end keys of a range scan need not exist in the store.

Linear Scalability for Large Tables and Range Scans
  HBase: Due to ordered partitioning, HBase scales horizontally easily while still supporting row-key range scans.
  Cassandra: If data is stored in columns to support range scans, the practical limit on row size is tens of megabytes; larger rows cause problems with compaction overhead and time.
  ONDB: There are no limits on range scans across major or minor keys. Range scans across major keys require access to each shard in the store; Release 3 will support major-key and index range scans parallelized across all the nodes in the store. Minor-key scans are serviced by the single shard that contains the data associated with the minor-key range.

Atomic Compare and Set
  HBase: Supports atomic compare-and-set, and supports transactions within a row.
  Cassandra: Does not support atomic compare-and-set. Counters require dedicated counter column families, which, because of eventual consistency, require that all replicas at all natural endpoints be read and updated with ACK; even then, hinted-handoff mechanisms can make these built-in counters suspect for accuracy. FIFO queues are difficult (if not impossible) to implement with Cassandra.
  ONDB: Supports atomic compare-and-set, making it simple to implement counters. ONDB also supports atomic modification of multiple minor key/value pairs under the same major key.

Read Load Balancing - Single Row
  HBase: Does not support read load balancing against a single row. A single row is served by exactly one region server at a time; other replicas are used only in case of node failure. Scalability is primarily supported by partitioning, which statistically distributes reads of different rows across multiple data nodes.
  Cassandra: Supports read load balancing against a single row, but primarily via Read.ONE, so eventual consistency must be taken into consideration. Scalability is primarily supported by partitioning, which distributes reads of different rows across multiple data nodes.
  ONDB: Supports read load balancing. Only absolute-consistency reads must be directed to the master; eventual-consistency reads may be served by any replica that can satisfy the read-consistency requirements of the request.

Bloom Filters
  HBase: Bloom filters can be used as another form of indexing. They work on the basis of RowKey or RowKey+ColumnName to reduce the number of data blocks HBase has to read to satisfy a query. (Bloom filters may exhibit false positives (reading too much data), but never false negatives (reading not enough data).)
  Cassandra: Uses Bloom filters for key lookup.
  ONDB: In the LSM-tree-based storage underlying HBase and Cassandra, Bloom filters minimize reads of SSTable files that do not contain a requested key. There is no need to create and maintain Bloom filters in the log-structured storage architecture used by ONDB.

Triggers
  HBase: Supported by the coprocessor capability, which allows HBase to observe get/put/delete events on a table (CF) and then execute the trigger logic. Triggers are coded as Java classes.
  Cassandra: Does not support coprocessor-like functionality (as far as we know).
  ONDB: Does not support triggers.

Secondary Indexes
  HBase: Does not natively support secondary indexes, but one use case of triggers is that a trigger on a "put" can automatically keep a secondary index up to date, so the burden does not fall on the application (client).
  Cassandra: Supports secondary indexes on column families where the column name is known (not on dynamic columns).
  ONDB: Release 3.0 will support secondary indexes.

Simple Aggregation
  HBase: Coprocessors support simple aggregations out of the box: SUM, MIN, MAX, AVG, STD. Other aggregations can be built by defining Java classes to perform the aggregation.
  Cassandra: Aggregations are not supported by the Cassandra nodes; the client must provide them. When the aggregation spans multiple rows, random partitioning makes aggregation very difficult for the client. The recommendation is to use Storm or Hadoop for aggregations.
  ONDB: Aggregation is not supported.

Hive Integration
  HBase: Hive can access HBase tables directly (it uses deserialization under the hood that is aware of the HBase file format).
  Cassandra: Work in progress (https://issues.apache.org/jira/browse/CASSANDRA-4131).
  ONDB: No Hive integration currently.

Pig Integration
  HBase: Pig has native support for writing into and reading from HBase.
  Cassandra: Cassandra 0.7.4+.
  ONDB: No Pig integration currently.

CAP Theorem Focus
  HBase: Consistency, availability.
  Cassandra: Availability, partition tolerance.
  ONDB: Consistency, availability; limited partition tolerance if there is a simple majority of nodes on one side of a partition (https://sleepycat.oracle.com/trac/wiki/JEKV/CAP has a detailed discussion).
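The quorum-based strong consistency mentioned above for Cassandra (Write.QUORUM paired with Read.QUORUM) comes down to simple overlap arithmetic, which a small sketch can make concrete. The function names here are illustrative, not part of any product's API: with N replicas, a read is guaranteed to see the latest write whenever the read set and the write set must intersect, i.e. R + W > N.

```python
# Sketch of the quorum rule behind "Write.QUORUM + Read.QUORUM gives
# strong consistency": with N replicas, any R-replica read must overlap
# any W-replica write exactly when R + W > N.

def is_strongly_consistent(n, r, w):
    """True if every read set of size r intersects every write set of size w."""
    return r + w > n

def quorum(n):
    """Size of a simple majority of n replicas."""
    return n // 2 + 1

n = 3
print(quorum(n))                                        # 2
print(is_strongly_consistent(n, quorum(n), quorum(n)))  # True
print(is_strongly_consistent(n, 1, 1))                  # False: Read.ONE + Write.ONE
print(is_strongly_consistent(n, 1, n))                  # True: Read.ONE + Write.ALL
```

This is also why quorum reads are slower than HBase's single-master reads: each strongly consistent read must contact a majority of replicas rather than one node.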
Consistency
  HBase: Strong.
  Cassandra: Eventual (strong is optional).
  ONDB: Offers different read-consistency models: (1) strictly consistent reads at the master, (2) eventually consistent reads with optional time constraints on the recency of the data, and (3) read-your-writes consistency.

Single Write Master
  HBase: Yes.
  Cassandra: No (R + W > N to get strong consistency).
  ONDB: Yes.

Optimized For
  HBase: Reads.
  Cassandra: Writes.
  ONDB: Both reads and writes. Log-structured storage permits append-only writes, with each change written once to disk. Reads can be serviced at any replica that meets the read-consistency requirements of the request, and can be satisfied by a single node with a single request to disk; there are no Bloom filters to maintain and no risk of false positives causing multiple disk reads.

Main Data Structure
  HBase: CF, RowKey, name-value pair set.
  Cassandra: CF, RowKey, name-value pair set.
  ONDB: Major key, or minor key, with its associated value.

Dynamic Columns
  HBase: Yes.  Cassandra: Yes.  ONDB: Equivalent functionality; multiple minor keys can be dynamically associated with a major key.

Column Names as Data
  HBase: Yes.  Cassandra: Yes.  ONDB: Equivalent functionality via minor keys, which can be treated as data.

Static Columns
  HBase: No.  Cassandra: Yes.  ONDB: [R3.0 will support static columns.]

RowKey Slices
  HBase: Yes.  Cassandra: No.  ONDB: No.

Static Column Value Indexes
  HBase: No.  Cassandra: Yes.  ONDB: [R3.0]

Sorted Column Names
  HBase: Yes.  Cassandra: Yes.  ONDB: [R3.0]

Cell Versioning Support
  HBase: Yes.  Cassandra: No.  ONDB: No.

Bloom Filters
  HBase: Yes.  Cassandra: Yes (only on keys).  ONDB: Not necessary for ONDB.

CoProcessors
  HBase: Yes.  Cassandra: No.  ONDB: No.

Triggers
  HBase: Yes (part of coprocessors).  Cassandra: No.  ONDB: No.

Push-Down Predicates
  HBase: Yes (part of coprocessors).  Cassandra: No.  ONDB: No.

Atomic Compare and Set
  HBase: Yes.  Cassandra: No.  ONDB: Yes.

Explicit Row Locks
  HBase: Yes.  Cassandra: No.  ONDB: No.

Row Key Caching
  HBase: Yes.  Cassandra: Yes.  ONDB: Yes.

Partitioning Strategy
  HBase: Ordered partitioning.  Cassandra: Random partitioning recommended.  ONDB: Random partitioning.

Rebalancing
  HBase: Automatic.  Cassandra: Not needed with random partitioning.  ONDB: Not needed with random partitioning.

Availability
  HBase: N replicas across nodes.  Cassandra: N replicas across nodes.  ONDB: N replicas across nodes.

Data Node Failure
  HBase: Graceful degradation.  Cassandra: Graceful degradation.  ONDB: Graceful degradation, as described in the availability section.

Data Node Failure - Replication
  HBase: N replicas preserved.  Cassandra: (N-1) replicas preserved, plus hinted handoff.  ONDB: (N-1) replicas preserved.

Data Node Restoration
  HBase: Same as node addition.  Cassandra: Requires a node-repair admin action.  ONDB: The node catches up automatically by replaying changes from a member of its replication group.

Data Node Addition
  HBase: Rebalancing is automatic.  Cassandra: Rebalancing requires token-assignment adjustment.  ONDB: New nodes are added through the Admin service, which automatically redistributes data across the new nodes.

Data Node Management
  HBase: Simple (roll in, roll out).  Cassandra: Human admin action required.  ONDB: Human admin action required.

Cluster Admin Nodes
  HBase: ZooKeeper, NameNode, HMaster.
  Cassandra: All nodes are equal.
  ONDB: Has a highly available Admin service for administrative actions (e.g. adding new nodes, replacing failed nodes, software updates), but the Admin service is not required for steady-state operation. A lightweight SNA process (described earlier) on each machine ensures high availability and restarts any data storage node in case of failure.

SPOF
  HBase: All the admin nodes are now fault tolerant.  Cassandra: All nodes are equal.  ONDB: There is no SPOF, as described in the availability section.

Write.ANY
  HBase: No, but replicas are node-agnostic.  Cassandra: Yes (writes never fail if this option is used).  ONDB: No.

Write.ONE
  HBase: Standard; HA; strong consistency.  Cassandra: Yes (often used); HA; weak consistency.  ONDB: Yes; requires that the master be reachable.

Write.QUORUM
  HBase: No (not required).  Cassandra: Yes (often used with Read.QUORUM for strong consistency).  ONDB: Yes; this is the default.

Write.ALL
  HBase: Yes (performance penalty).  Cassandra: Yes (performance penalty, not HA).  ONDB: Yes (performance penalty, not HA).

Asynchronous WAN Replication
  HBase: Yes, but it needs testing on corner cases.
  Cassandra: Yes (replicas can span data centers).
  ONDB: Asynchronous replication is routine in ONDB. Nodes local to the master will typically keep up, and nodes separated by high-latency WANs have the changes replayed asynchronously via an efficient stream-based protocol.

Synchronous WAN Replication
  HBase: No.
  Cassandra: Yes, with Write.QUORUM or Write.EACH_QUORUM.
  ONDB: Yes, for requests that require acknowledgements (ReplicaAckPolicy.SIMPLE_MAJORITY or ReplicaAckPolicy.ALL); the acknowledging nodes will be synchronized with the master.

Compression Support
  HBase: Yes.  Cassandra: Yes.  ONDB: No.
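The atomic compare-and-set capability the table attributes to HBase and ONDB ("making it simple to implement counters") boils down to a version-checked read-modify-write loop. The sketch below is a hypothetical in-memory store, not any product's real client API; `put_if_version` mirrors the version-guarded write pattern, and `increment` is the classic retry loop built on top of it.

```python
# Sketch of a version-based compare-and-set counter (hypothetical
# in-memory store, not a real client API). A write succeeds only if the
# stored version is unchanged, so concurrent incrementers never lose updates.

class Store:
    def __init__(self):
        self._data = {}          # key -> (value, version)
        self._next_version = 1

    def put(self, key, value):
        """Unconditional write; assigns a fresh version."""
        self._data[key] = (value, self._next_version)
        self._next_version += 1

    def get(self, key):
        return self._data[key]   # (value, version)

    def put_if_version(self, key, value, expected_version):
        """Compare-and-set: write only if the version is still the one we read."""
        if self._data[key][1] != expected_version:
            return False         # a concurrent writer won; caller must retry
        self.put(key, value)
        return True

def increment(store, key):
    while True:                  # classic CAS retry loop
        value, version = store.get(key)
        if store.put_if_version(key, value + 1, version):
            return value + 1

s = Store()
s.put("hits", 0)
print(increment(s, "hits"))      # 1
print(increment(s, "hits"))      # 2
```

Without such a primitive (the Cassandra column above), a counter needs either dedicated counter column families or external coordination, which is why the table calls Cassandra's built-in counters suspect for accuracy.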