This chapter covers document databases and graph databases: their basic structure, features, applications, limitations, and use cases.
2. Document Databases
Documents are the main concept in document
databases.
The database stores and retrieves documents,
which can be XML, JSON, BSON, and so on.
These documents are self-describing, hierarchical
tree data structures which can consist of maps,
collections, and scalar values.
The documents stored are similar to each other but
do not have to be exactly the same.
Document databases store documents in the value
part of a key-value store; think of a document
database as a key-value store where the value is
examinable.
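As a rough illustration of the "examinable value" idea, the sketch below models a key-value store whose values are nested Python dictionaries and searches inside them. The key, the field names, and the helper find_by_path are invented for this example and do not come from any particular database.

```python
# In a pure key-value store the value is opaque; a document database
# can look inside it. Here the "store" is just a dict of documents.
store = {
    "order:1001": {                                    # key -> document
        "customer": {"name": "Asha", "city": "Pune"},  # nested map
        "items": [                                     # collection
            {"sku": "A-1", "qty": 2},
            {"sku": "B-7", "qty": 1},
        ],
        "total": 59.5,                                 # scalar value
    }
}

_MISSING = object()  # marks a path that does not exist in a document


def find_by_path(documents, path, expected):
    """Return keys of documents whose value at a dotted path equals expected."""
    matches = []
    for key, doc in documents.items():
        node = doc
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                node = _MISSING
                break
        if node is not _MISSING and node == expected:
            matches.append(key)
    return matches


cities = find_by_path(store, "customer.city", "Pune")
```

A pure key-value store could only return the whole value for "order:1001"; the lookup by customer.city is what examining the value adds.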
3. What Is a Document Database?
Document databases are considered to be non-
relational (or NoSQL) databases.
Instead of storing data in fixed rows and columns,
document databases use flexible documents.
Document databases are among the most popular
alternatives to tabular, relational databases.
Documents do not have a fixed number of fields, and
there are no empty slots: missing information is
simply omitted rather than stored as an empty value.
Data can be added, edited, removed, and queried.
4. The keys assigned to each document are unique
identifiers required to access data within the
database, usually a path, string or Uniform Resource
Identifier. IDs tend to be indexed in the database to
speed up data retrieval.
The following list helps draw a parallel between the
two types of databases:
SQL                       MongoDB
Table                     Collection
Row                       Document
Column                    Field
Primary key               ObjectId
Index                     Index
View                      View
Nested table or object    Embedded document
Array                     Array
5. What are documents?
A document is a record in a document database. A
document typically stores information about one
object and any of its related metadata.
Documents store data in field-value pairs. The
values can be a variety of types and structures,
including strings, numbers, dates, arrays, or objects.
Documents can be stored in formats like
JSON, BSON, and XML.
7. Collections
A collection is a group of documents.
Collections typically store documents that have
similar contents.
Not all documents in a collection are required to
have the same fields, because document databases
have a flexible schema.
8. CRUD operations
Document databases typically have an API or query
language that allows developers to execute the CRUD
(create, read, update, and delete) operations.
Create:
Documents can be created in the database. Each document
has a unique identifier.
Read:
Documents can be read from the database. The API or query
language allows developers to query for documents using their
unique identifiers or field values. Indexes can be added to the
database in order to increase read performance.
Update:
Existing documents can be updated — either in whole or in
part.
Delete:
Documents can be deleted from the database.
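The four operations can be sketched with a toy in-memory collection. This is only an illustration of the ideas above, not the API of any real document database; the class DocumentCollection and its method names are assumptions made for the example.

```python
import uuid


class DocumentCollection:
    """Toy in-memory collection; not a real database client."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        doc_id = str(uuid.uuid4())      # each document gets a unique identifier
        self._docs[doc_id] = dict(doc)
        return doc_id

    def read(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def find(self, **field_values):
        """Read by field values instead of by identifier."""
        return [
            dict(doc) for doc in self._docs.values()
            if all(doc.get(k) == v for k, v in field_values.items())
        ]

    def update(self, doc_id, changes):
        """Partial update: only the given fields change."""
        self._docs[doc_id].update(changes)

    def replace(self, doc_id, doc):
        """Whole-document update."""
        if doc_id not in self._docs:
            raise KeyError(doc_id)
        self._docs[doc_id] = dict(doc)

    def delete(self, doc_id):
        return self._docs.pop(doc_id, None) is not None


posts = DocumentCollection()
post_id = posts.create({"title": "Hello", "tags": ["intro"]})
other_id = posts.create({"title": "Untagged"})   # different fields are allowed
posts.update(post_id, {"views": 1})
```

The second document has no tags field, which is the flexible schema described under Collections.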
9. Features
Consistency
Availability
Transactions
Document model
Flexible schema
Distributed and resilient
Querying through an API or query language
10. Consistency:
In MongoDB, consistency is configured through replica
sets, by choosing to wait for writes to be replicated
to all the slaves or to a given number of slaves.
Every write can specify the number of servers the write
has to be propagated to before it returns as successful.
Similar to various options available for read, you can
change the settings to achieve strong write consistency, if
desired.
By default, a write is reported successful once the
database receives it; you can change this so as to wait for
the writes to be synced to disk or to propagate to two or
more slaves.
This is known as WriteConcern.
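The idea of waiting for a number of acknowledgements can be sketched with a small simulation. This is not MongoDB driver code; the class ReplicaSetSim and the parameter w are assumptions used only to show how requiring more acknowledgements trades speed for stronger guarantees.

```python
class ReplicaSetSim:
    """Toy model of one primary and several replicas (not a MongoDB client)."""

    def __init__(self, replica_up):
        # replica_up: list of booleans, True if that replica is reachable
        self.primary = []
        self.replicas = [[] for _ in replica_up]
        self.replica_up = list(replica_up)

    def write(self, doc, w=1):
        """Apply a write and report success only if at least `w` servers
        (the primary counts as one) have acknowledged it."""
        self.primary.append(doc)
        acks = 1
        for log, up in zip(self.replicas, self.replica_up):
            if up:
                log.append(doc)
                acks += 1
        # In this sketch a write that misses its target is still applied
        # on the primary; it is only reported as unsuccessful.
        return acks >= w


cluster = ReplicaSetSim(replica_up=[True, False])
fast = cluster.write({"n": 1}, w=1)     # acknowledged by the primary alone
safe = cluster.write({"n": 2}, w=2)     # primary plus one replica
strict = cluster.write({"n": 3}, w=3)   # needs both replicas; one is down
```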
11. Availability:
The CAP theorem dictates that we can have only
two of Consistency, Availability, and Partition
Tolerance.
Document databases try to improve on availability by
replicating data using the master-slave setup.
The same data is available on multiple nodes and
the clients can get to the data even when the primary
node is down.
Usually, the application code does not have to
determine if the primary node is available or not.
MongoDB implements replication, providing high
availability using replica sets.
12. Transactions:
Transactions at the single-document level are known as
atomic transactions.
Transactions involving more than one operation have
traditionally not been possible (MongoDB added
multi-document transactions only in version 4.0), although
there are products such as RavenDB that do support
transactions across multiple operations.
By default, all writes are reported as successful.
A finer control over the write can be achieved by using
WriteConcern parameter.
13. Document model
Data is stored in documents (unlike other databases that
store data in structures like tables or graphs).
Documents map to objects in most popular programming
languages, which allows developers to rapidly develop
their applications.
Flexible schema:
Document databases have a flexible schema, meaning
that not all documents in a collection need to have the
same fields.
Note that some document databases support schema
validation, so the schema can be optionally locked down.
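Optional schema validation can be pictured as a check that runs before a document is accepted. The sketch below is a hand-written validator, not the validation feature of any specific product; the function validate and the rules for a "users" collection are hypothetical.

```python
def validate(doc, required, types):
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing field: {name}" for name in required if name not in doc]
    for name, expected in types.items():
        if name in doc and not isinstance(doc[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


# Hypothetical rules for a "users" collection.
required_fields = ["email"]
field_types = {"email": str, "age": int}

# Extra fields such as "nickname" are still allowed: only the listed
# rules are locked down, the rest of the schema stays flexible.
ok = validate({"email": "a@example.com", "nickname": "al"},
              required_fields, field_types)
bad = validate({"age": "forty"}, required_fields, field_types)
```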
14. Distributed and resilient:
Document databases are distributed, which allows for
horizontal scaling (typically cheaper than vertical scaling)
and data distribution.
Document databases provide resiliency through
replication.
Querying through an API or query language:
Document databases have an API or query language that
allows developers to execute the CRUD operations on the
database.
Developers have the ability to query for documents based
on unique identifiers or field values.
15. Suitable Use Cases
Event Logging
Content Management Systems, Blogging
Platforms
Web Analytics or Real-Time Analytics
E-Commerce Applications
16. Event Logging
Applications have different event logging needs; within
the enterprise, there are many different applications that
want to log events.
Document databases can store all these different types of
events and can act as a central data store for event
storage.
This is especially true when the type of data being
captured by the events keeps changing.
Events can be sharded by the name of the application
where the event originated or by the type of event such as
order_processed or customer_logged.
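Sharding events by application name or by event type amounts to routing each event on the value of one field. A minimal sketch, assuming a hash-based scheme (real products may use range-based or other strategies); the function shard_for and the sample events are made up for illustration.

```python
import hashlib


def shard_for(event, shard_count, key="application"):
    """Pick a shard from a stable hash of the chosen shard key."""
    value = str(event[key]).encode("utf-8")
    digest = hashlib.sha256(value).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Events from different applications carry different fields.
events = [
    {"application": "billing", "type": "order_processed", "order_id": 17},
    {"application": "billing", "type": "order_processed", "order_id": 18},
    {"application": "portal", "type": "customer_logged", "user": "u42"},
]

by_application = [shard_for(e, shard_count=4) for e in events]
by_type = [shard_for(e, shard_count=4, key="type") for e in events]
```

Events that share the shard key value always land on the same shard, so all events of one application (or one type) can be read from a single place.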
17. Content Management Systems, Blogging
Platforms
Since document databases have no predefined schemas
and usually understand JSON documents, they work well
in content management systems or applications for
publishing websites, managing user comments, user
registrations, profiles, web-facing documents.
Web Analytics or Real-Time Analytics
Document databases can store data for real-time
analytics; since parts of the document can be updated, it’s
very easy to store page views or unique visitors, and new
metrics can be easily added without schema changes.
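Updating part of a document to count page views can be sketched as below. The dictionary page_stats stands in for a collection of per-page documents, and increment is a hypothetical helper, not a database call.

```python
from collections import defaultdict

page_stats = defaultdict(dict)   # page URL -> metrics document


def increment(page, metric, amount=1):
    """Update one field of a page's document, creating it if needed."""
    doc = page_stats[page]
    doc[metric] = doc.get(metric, 0) + amount
    return doc[metric]


increment("/home", "page_views")
increment("/home", "page_views")
increment("/home", "unique_visitors")
increment("/home", "shares")      # a new metric, no schema change required
```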
18. E-Commerce Applications
E-commerce applications often need to have flexible
schema for products and orders, as well as the ability to
evolve their data models without expensive database
refactoring or data migration.
19. Examples of Document Databases
Amazon DocumentDB
MongoDB
Cosmos DB
ArangoDB
Couchbase Server
CouchDB
20. Advantages:
The document model is ubiquitous, intuitive, and
enables rapid software development.
The flexible schema allows for the data model to
change as an application's requirements change.
Document databases have rich APIs and query
languages that allow developers to easily interact
with their data.
Document databases are distributed (allowing for
horizontal scaling as well as global data distribution)
and resilient.
21. Disadvantages:
Weak Atomicity:
Many document databases lack support for multi-document ACID
transactions. A change that involves two collections requires
two separate queries, one for each collection, which breaks
atomicity requirements.
Consistency Check Limitations:
References between collections are generally not verified by
the database. One can search for documents that are not
connected to, for example, an author collection, but doing so
can hurt database performance.
Security:
Many web applications have security weaknesses that can lead
to leakage of sensitive data, so attention must be paid to
web application vulnerabilities.
22. Graph Databases
A graph database is a type of database used to
represent the data in the form of a graph.
A graph database is a type of NoSQL database that
is designed to handle data with complex
relationships and interconnections.
In a graph database, data is stored as nodes and
edges, where nodes represent entities and edges
represent the relationships between those entities.
The concept of a Graph Database is based on the
theory of graphs. It was introduced in the year 2000.
23. They are commonly classed as NoSQL databases because
data is stored as nodes, relationships, and properties
instead of in the tables of traditional databases.
A graph database is very useful for heavily
interconnected data. Here relationships between
data are given priority and therefore the relationships
can be easily visualized. They are flexible, as new
data can be added without disturbing existing data.
They are useful in the fields of social networking,
fraud detection, AI Knowledge graphs etc.
24. A graph database has three components:
nodes, relationships, and properties.
Nodes:
They represent objects or instances.
They are roughly equivalent to a row in a relational database.
A node acts as a vertex in a graph.
Nodes are grouped by applying a label to each
member.
25. Relationships:
They are the edges in the graph.
They have a specific direction and type, and they form
patterns in the data.
They establish relationships between nodes.
Properties:
They are the information associated with the nodes.
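The three components can be sketched as plain data structures. The class and field names below are assumptions chosen for illustration; attaching properties to relationships as well as to nodes is also an assumption, common in property-graph models but not stated above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str                       # groups nodes, for example "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int                       # id of the start node (direction: start -> end)
    end: int                         # id of the end node
    rel_type: str                    # for example "FRIEND"
    properties: dict = field(default_factory=dict)


alice = Node(1, "Person", {"name": "Alice"})
bob = Node(2, "Person", {"name": "Bob"})
friendship = Relationship(alice.node_id, bob.node_id, "FRIEND", {"since": 2015})
```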
28. Once we have a graph of these nodes and edges
created, we can query the graph in many ways.
A query on the graph is also known as traversing the
graph.
An advantage of the graph databases is that we can
change the traversing requirements without having to
change the nodes or edges.
In graph databases, traversing the joins or relationships
is very fast.
The relationship between nodes is not calculated at
query time but is actually persisted as a relationship.
Traversing persisted relationships is faster than
calculating them for every query.
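A traversal over persisted relationships can be sketched with adjacency lists and a breadth-first search. The names and the sample friendship data are invented for the example; a real graph database stores and traverses relationships with its own engine.

```python
from collections import deque

# Relationships are stored with each node, so a traversal only follows
# existing links instead of computing joins at query time.
friends = {
    "Anna": ["Barbara", "Carol"],
    "Barbara": ["Dawn"],
    "Carol": ["Dawn"],
    "Dawn": ["Elizabeth"],
    "Elizabeth": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable in at most max_hops steps."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue                      # do not expand beyond the limit
        for neighbour in graph[current]:
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return sorted(name for name, distance in hops.items() if distance > 0)


direct_friends = within_hops(friends, "Anna", 1)
friends_of_friends = within_hops(friends, "Anna", 2)
```

Changing the traversal requirement here means changing max_hops; the stored nodes and edges stay as they are.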
30. Consistency
Since graph databases are operating on connected
nodes, most graph database solutions usually do not
support distributing the nodes on different servers.
There are some solutions, however, that support node
distribution across a cluster of servers, such as Infinite
Graph.
Within a single server, data is always consistent,
especially in Neo4J which is fully ACID-compliant.
When running Neo4J in a cluster, a write to the master is
eventually synchronized to the slaves, while slaves are
always available for read.
31. Writes to slaves are allowed and are immediately
synchronized to the master; other slaves will not be
synchronized immediately, though—they will have to wait
for the data to propagate from the master.
Graph databases ensure consistency through
transactions. They do not allow dangling relationships:
The start node and end node always have to exist, and
nodes can only be deleted if they don’t have any
relationships attached to them.
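The rule against dangling relationships can be sketched as two checks: one when a relationship is added and one when a node is deleted. The class SimpleGraph is a toy written for this example and does not reflect how any product implements the rule.

```python
class SimpleGraph:
    """Toy graph that refuses dangling relationships (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()               # (start, rel_type, end) tuples

    def add_node(self, name):
        self.nodes.add(name)

    def add_relationship(self, start, rel_type, end):
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("start and end nodes must both exist")
        self.edges.add((start, rel_type, end))

    def remove_relationship(self, start, rel_type, end):
        self.edges.discard((start, rel_type, end))

    def delete_node(self, name):
        if any(name in (start, end) for start, _, end in self.edges):
            raise ValueError("node still has relationships attached")
        self.nodes.discard(name)


g = SimpleGraph()
g.add_node("Anna")
g.add_node("Barbara")
g.add_relationship("Anna", "FRIEND", "Barbara")

try:
    g.delete_node("Anna")                # refused: a relationship is attached
    deleted = True
except ValueError:
    deleted = False
```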
32. Transactions
Neo4J is ACID-compliant. Before changing any nodes or
adding any relationships to existing nodes, we have to
start a transaction.
A transaction has to be marked as success, otherwise
Neo4J assumes that it was a failure and rolls it back
when finish is issued.
Setting success without issuing finish also does not
commit the data to the database.
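The success and finish protocol described above can be mimicked with a small class. This is a sketch of the behaviour only and is not the Neo4J API; the class Transaction, its methods, and the list standing in for the database are assumptions.

```python
class Transaction:
    """Toy transaction following the success/finish protocol (not Neo4J code)."""

    def __init__(self, database):
        self._database = database
        self._pending = []
        self._success = False

    def add(self, item):
        self._pending.append(item)

    def success(self):
        self._success = True             # marks intent only; nothing is committed yet

    def finish(self):
        if self._success:
            self._database.extend(self._pending)   # commit
        self._pending = []                          # otherwise the changes are dropped
        self._success = False


database = []

tx = Transaction(database)
tx.add("node:Martin")
tx.finish()                              # never marked as success: rolled back

tx = Transaction(database)
tx.add("node:Pramod")
tx.success()
after_success_only = list(database)      # finish() not called yet
tx.finish()                              # now the change is committed
```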
33. Availability
Neo4J, as of version 1.8, achieves high availability by
providing for replicated slaves.
These slaves can also handle writes: When they are
written to, they synchronize the write to the current
master, and the write is committed first at the master and
then at the slave.
Other slaves will eventually get the update.
Neo4J uses Apache ZooKeeper [ZooKeeper] to keep
track of the last transaction IDs persisted on each slave
node and the current master node.
If the server is the first one to join the cluster, it becomes
the master; when a master goes down, the cluster elects
a master from the available nodes, thus providing high
availability.
34. Query Features
Neo4J also has the Cypher [Cypher] query language
for querying the graph.
Cypher needs a node to START the query. The start
node can be identified by its node ID, a list of node IDs,
or index lookups.
Cypher uses the MATCH keyword for matching
patterns in relationships; the WHERE keyword filters
the properties on a node or relationship. The RETURN
keyword specifies what gets returned by the query:
nodes, relationships, or fields on the nodes or
relationships.
35. Outside these query languages, Neo4J allows you to
query the graph for properties of the nodes, traverse
the graph, or navigate the node relationships using
language bindings.
Properties of a node can be indexed using the indexing
service.
Similarly, properties of relationships or edges can be
indexed, so a node or edge can be found by the value.
Indexes should be queried to find the starting node to
begin a traversal.
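The sequence of steps (look up a start node through an index, match a relationship pattern, filter on properties, return selected fields) can be imitated in plain Python. This is not Cypher and not a Neo4J language binding; the data, the index name_index, and the function friends_in_city are hypothetical.

```python
# Hypothetical data: nodes keyed by id, plus directed relationships.
nodes = {
    1: {"name": "Barbara", "city": "Chicago"},
    2: {"name": "Carol", "city": "Boston"},
    3: {"name": "Dawn", "city": "Chicago"},
}
relationships = [(1, "FRIEND", 2), (1, "FRIEND", 3), (2, "FRIEND", 3)]

# Index on the "name" property: property value -> node ids.
name_index = {}
for node_id, props in nodes.items():
    name_index.setdefault(props["name"], []).append(node_id)


def friends_in_city(start_name, city):
    results = []
    for start_id in name_index.get(start_name, []):          # index lookup for the start node
        for source, rel_type, target in relationships:
            if source == start_id and rel_type == "FRIEND":   # pattern matching
                if nodes[target]["city"] == city:             # property filter
                    results.append(nodes[target]["name"])     # field that is returned
    return results


chicago_friends = friends_in_city("Barbara", "Chicago")
```

Without the index, finding the node named Barbara would mean scanning every node before the traversal could begin.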
36. Advantages
Relationships can also be established with external
sources.
No joins are required, since relationships are already
specified.
Query performance depends on the concrete relationships
traversed and not on the total amount of data.
It is flexible and agile.
It is easy to manage the data in terms of a graph.
Efficient data modeling:
Graph databases allow for efficient data modeling by
representing data as nodes and edges. This allows for more
flexible and scalable data modeling than traditional relational
databases.
37. Flexible relationships:
Graph databases are designed to handle complex relationships
and interconnections between data elements. This makes them
well-suited for applications that require deep and complex queries,
such as social networks, recommendation engines, and fraud
detection systems.
High performance:
Graph databases are optimized for handling large and complex
datasets, making them well-suited for applications that require
high levels of performance and scalability.
Scalability:
Many graph databases can be scaled horizontally by adding
servers to the cluster to handle increased data volume or
traffic, although, as noted under Consistency, distributing a
single graph across servers is not always supported.
Easy to use:
Graph databases are typically easier to use than traditional
relational databases. They often have a simpler data model and
query language, and can be easier to maintain and scale.
38. Disadvantages
Searching often becomes slower for complex
relationships.
The query language is platform dependent.
They are inappropriate for transactional data.
They have a smaller user base.
Limited use cases: Graph databases are not suitable
for all applications. They may not be the best choice
for applications that require simple queries or that
deal primarily with data that can be easily
represented in a traditional relational database.
39. Specialized knowledge:
Graph databases may require specialized knowledge and
expertise to use effectively, including knowledge of graph
theory and algorithms.
Immature technology:
The technology for graph databases is relatively new and
still evolving, which means that it may not be as stable or
well-supported as traditional relational databases.
Integration with other tools:
Graph databases may not be as well-integrated with other
tools and systems as traditional relational databases,
which can make it more difficult to use them in
conjunction with other technologies.
40. Use cases of graph databases
Fraud detection
Connected Data
Recommendation engines
Route optimization
Pattern discovery
Knowledge management