10 Types of Databases (2019)


This blog explains the types of NoSQL databases used in 2019, introducing the characteristics and examples of the four main categories: column, key-value, document, and graph databases.


The term NoSQL describes an approach to data management rather than a rigid definition.

NoSQL databases

NoSQL databases choose which of the three CAP properties (consistency, availability, partition tolerance) to prioritize. For instance,

  • HBase and Accumulo are consistent and partition tolerant.
  • CouchDB is partition tolerant and available.
  • Neo4j is available and consistent.


Choosing the correct NoSQL database requires an understanding of factors such as the requirements, SLAs (latency), initial size and growth projections, costs, features and functionality, and concurrency.


It’s also important to understand that NoSQL databases have their own defaults and may have the capability of changing their priorities for CAP. It is also important to understand the roadmap of these NoSQL databases because they are evolving quickly.


NoSQL systems have been divided into four major categories, namely, Column Databases, Key-Value Databases, Document Databases, and Graph Databases.

Column Databases

1. Column Databases: These systems partition a table by column into column families, where each column family is stored in its own files. They also allow versioning of data values.


Column-oriented data stores are used when key-value stores reach their limits: when you want to store a very large number of records, each carrying more information than the simple nature of a key-value store allows.


The main benefit of using columnar databases is that you can quickly access a large amount of data.


A row in an RDBMS is a continuous disk entry and multiple rows are stored in different disk locations, which makes them more difficult to access; in contrast, in columnar databases, all cells that are part of a column are stored contiguously.


As an example, consider performing a lookup for all blog titles. With the millions of records common on the web, such a lookup might be costly in terms of disk entries in an RDBMS, whereas in a columnar database it would require only one access.


Such databases are very handy for retrieving large amounts of data from a specific column family, but the tradeoff is that they lack flexibility. The most widely used columnar databases are Google Cloud Bigtable and, especially, Apache HBase and Cassandra.


One of the other benefits of columnar databases is ease of scaling because data are stored in columns; these columns are highly scalable in terms of the amount of information they can store. This is why they are used mainly for keeping nonvolatile, long-living information and in scaling use cases.


2. Key-Value Databases: These systems have a simple data model based on fast access by the key to the value associated with the key; the value can be a record or an object or a document or even have a more complex data structure.

Key-Value Databases

Key-value databases are the easiest NoSQL databases to understand. These data stores basically act like a dictionary and work by matching a key to a value.


They are often used for high-performance use cases in which basic information needs to be stored—for example, when session information may need to be written and retrieved very quickly.


These data stores really perform well and are efficient for this kind of use case; they are also usually highly scalable.
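The dictionary analogy can be made concrete with a short sketch; the class and method names below are illustrative, not any product's API:

```python
class KeyValueStore:
    """A toy key-value store: a thin wrapper around a dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Any value can be stored: a string, a record, a whole document.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookups go straight to the key -- no query planning involved.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:42", {"user": "alice", "expires": 1735689600})
print(store.get("session:42")["user"])  # alice
```

This is exactly why key-value stores are fast for session data: every operation is a direct lookup by key.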


Key-value data stores can also be used in a queuing context to ensure that data will not be lost, such as in logging architecture or search engine indexing architecture use cases.


Redis and Riak KV are the most famous key-value data stores; Redis is more widely used and is an in-memory K-V store with optional persistence.


Redis is often used in web applications, such as Node.js or PHP applications, to store session-related data; it can serve thousands of session retrievals per second without degrading performance.


3. Document Databases: These systems store data in the form of documents using well-known formats such as JavaScript Object Notation (JSON). Documents are accessible via their document id, but can also be accessed rapidly using other indexes.


Columnar databases are not the best for structuring data that contain deeper nesting structures—that is where document-oriented data stores come into play.


Data are indeed stored as key-value pairs, but these are all grouped into what is called a document. A document relies on a structure or encoding such as XML, but most of the time it relies on JavaScript Object Notation (JSON).


Document-oriented databases are used whenever there is a need to nest information. For instance, an account in your application may have the following information:


Basic information: first name, last name, birthday, profile picture, URL, creation date, and so on


Additional information: address, authentication method (password, Facebook, etc.), interests, and so on

NoSQL document stores are often used in web applications because representing an object with nested objects is fairly easy; moreover, integrating with front-end JavaScript technology is seamless because both technologies work with JSON.
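Such an account can be represented as a single nested document; the field names and values below are purely illustrative:

```python
import json

# A hypothetical "account" document nesting basic and additional information.
account = {
    "_id": "A1",
    "basic": {
        "first_name": "Jane",
        "last_name": "Doe",
        "birthday": "1990-05-14",
        "profile_picture_url": "https://example.com/jane.png",
        "creation_date": "2019-01-01",
    },
    "additional": {
        "address": {"city": "Pune", "country": "India"},
        "auth_methods": ["password", "facebook"],
        "interests": ["databases", "movies"],
    },
}

# The same structure serializes directly to JSON for the front end.
print(json.dumps(account, indent=2))
```

The whole object round-trips through JSON unchanged, which is what makes the hand-off to front-end JavaScript seamless.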


Although document databases are structurally more expressive for representing data, they also have a downside: they typically need to fetch the whole document even when reading a single field, which can dramatically affect performance. Famous NoSQL document databases are MongoDB, Couchbase, and Apache CouchDB.


4. Graph Databases: These systems store data as nodes and relationships (edges), allowing data to be queried by traversing relationships rather than by computing joins.

 Graph Databases


Characteristics of NoSQL Systems

1. Availability, Replication, and Eventual Consistency: Many applications that use NoSQL systems require continuous system availability. To accomplish this, data are replicated over two or more nodes in a transparent manner, so that if one node fails, the data are still available on other nodes.


Replication improves data availability and can also improve read performance because read requests can often be serviced from any of the replicated data nodes.


However, writing becomes more cumbersome because an update must be applied to every copy of the replicated data items; this can slow down write performance if serializable consistency is required.


 Many NoSQL applications do not require serializable consistency, so more relaxed forms of consistency known as eventual consistency are used.


2. Replication: Two major replication models are used in NoSQL systems: master-slave and master-master replication:


a. Master-slave replication requires one copy to be the master copy; all write operations must be applied to the master copy and then propagated to the slave copies, usually with eventual consistency; that is, the slave copies will eventually become the same as the master copy. For reads, the master-slave paradigm can be configured in various ways.


One configuration requires all reads to also be at the master copy, so this would be similar to the primary site or primary copy methods of distributed concurrency control, with similar advantages and disadvantages.


Another configuration would allow reads at the slave copies but would not guarantee that the values are the latest writes since writes to the slave nodes can be done after they are applied to the master copy.


b. The master-master replication allows reads and writes at any of the replicas but may not guarantee that reads at nodes that store different copies see the same values.


Different users may write the same data item concurrently at different nodes of the system, so the values of the item will be temporarily inconsistent.

A reconciliation method to resolve conflicting write operations of the same data item at different nodes must be implemented as part of the master-master replication scheme.
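A minimal sketch of one common reconciliation method, last-write-wins by timestamp (the node names and tie-break rule are illustrative; real systems often use vector clocks or application-level merging instead):

```python
# Each replica holds (value, timestamp, node_id) per key; the merge keeps
# the write with the newest timestamp, breaking exact ties by node id.
def reconcile(replica_a, replica_b):
    merged = {}
    for key in set(replica_a) | set(replica_b):
        candidates = [r[key] for r in (replica_a, replica_b) if key in r]
        merged[key] = max(candidates, key=lambda v: (v[1], v[2]))
    return merged

a = {"cart": ("book", 100, "node-a")}
b = {"cart": ("book+pen", 105, "node-b")}
print(reconcile(a, b)["cart"][0])  # book+pen
```

The key point is that both replicas accepted writes independently; the conflict is only resolved when the copies are brought back together.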


3. Sharding of Files:


In many NoSQL applications, files (or collections of data objects) can have many millions of records (or documents or objects), and these records can be accessed concurrently by thousands of users. So, it is not practical to store the whole file in one node.


Sharding or horizontal partitioning of the file records is often employed in NoSQL systems. This serves to distribute the load of accessing the file records to multiple nodes.


The combination of sharding the file records and replicating the shards works in tandem to improve load balancing as well as data availability.
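A minimal sketch of how sharding and replication can work together: the key's hash picks a primary node, and the shard is also copied to the next node in line (node names and replica count are illustrative):

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]
REPLICAS = 2  # each record lives on its primary node plus one neighbor

def nodes_for(key):
    """Pick a primary node by hashing the key, then add replica nodes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    primary = h % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICAS)]

print(nodes_for("user:1001"))
```

Reads can be served by any of the returned nodes (load balancing); if one fails, the other still holds the record (availability).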


4. High-Performance Data Access:

In many NoSQL applications, it is necessary to find individual records or objects (data items) from among the millions of data records or objects in a file.


The majority of accesses to an object will be by providing the key value rather than by using complex query conditions. The object key is similar to the concept of object id. To achieve this, most systems use one of two techniques: hashing or range partitioning on object keys:


a. Hashing: a hash function h(K) is applied to the key K, and the location of the object with key K is determined by the value of h(K).


b. Range partitioning: the location is determined via a range of key values; for example, location i would hold the objects whose key values K are in the range Ki_min ≤ K ≤ Ki_max.


In applications that require range queries, where multiple objects within a range of key values are retrieved, range partitioning is preferred.
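Range partitioning can be sketched as a sorted list of boundary values; the boundaries and location names below are illustrative:

```python
import bisect

# Location i holds keys K with bounds[i-1] < K <= bounds[i];
# the last location holds everything above the final boundary.
bounds = [1000, 2000, 3000]
locations = ["loc0", "loc1", "loc2", "loc3"]

def location_for(key):
    return locations[bisect.bisect_left(bounds, key)]

print(location_for(1500))  # loc1
# A range query touches only locations whose ranges overlap the query:
print({location_for(k) for k in (1500, 1700, 1900)})  # {'loc1'}
```

This is why range partitioning wins for range queries: all keys in [1500, 1900] land on a single location, whereas hashing would scatter them across every node.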


Other indexes can also be used to locate objects based on attribute conditions different from the key K.


5. Scalability: There are two kinds of scalability in distributed systems: scale-up and scale-out. Scale-up refers to expanding the storage and computing power of existing nodes, whereas scale-out expands the distributed system by adding more nodes for data storage and processing as the volume of data grows.


NoSQL systems generally employ scale-out scalability, and they do so while the system is operational, so techniques for distributing the existing data among new nodes without interrupting system operation are necessary.


NoSQL Characteristics Related to Data Models and Query Languages

Data Models

1. Not Requiring a Schema:

The flexibility of not requiring a schema is achieved in many NoSQL systems by allowing semistructured, self-describing data. The users can specify a partial schema in some systems to improve storage efficiency, but it is not required to have a schema in most of the NoSQL systems.


As there may not be a schema to specify constraints, any constraints on the data would have to be programmed in the application programs that access the data items.


There are various languages for describing semistructured data, such as JavaScript Object Notation (JSON) and Extensible Markup Language (XML). JSON is used in several NoSQL systems, but other methods for describing semi-structured data can also be used.


2. Less Powerful Query Languages:

Many applications that use NoSQL systems may not require a powerful query language such as SQL, because search (read) queries in these systems often locate single objects in a single file based on their object keys.


NoSQL systems typically provide a set of functions and operations as an application programming interface (API), so reading and writing the data objects are accomplished by the programmer calling the appropriate operations.


In many cases, the operations are called CRUD operations, to create, read, update, and delete. In other cases, they are known as SCRUD because of an added Search (or Find) operation.


Some NoSQL systems also provide a high-level query language, but it may not have the full power of SQL; only a subset of SQL querying capabilities would be provided.


In particular, many NoSQL systems do not provide join operations as part of the query language itself; the joins need to be implemented in the application programs.
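Such an application-side join can be sketched as follows, using hypothetical project and worker documents as sample data:

```python
# With no join in the query language, the application joins the data itself:
# here, attaching each project's worker names by matching on ProjectId.
projects = [{"_id": "P1", "Pname": "ProductL"}]
workers = [
    {"_id": "W1", "Ename": "Amitabh Bacchan", "ProjectId": "P1"},
    {"_id": "W2", "Ename": "Priyanka Chopra", "ProjectId": "P1"},
]

def join_workers(projects, workers):
    by_project = {}
    for w in workers:
        by_project.setdefault(w["ProjectId"], []).append(w["Ename"])
    return [{**p, "Workers": by_project.get(p["_id"], [])} for p in projects]

print(join_workers(projects, workers)[0]["Workers"])
# ['Amitabh Bacchan', 'Priyanka Chopra']
```

The application fetches both collections (or the relevant shards of them) and stitches the results together in memory.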


3. Versioning: Some NoSQL systems provide storage of multiple versions of the data items, with timestamps of when each data version was created.


Column Databases


Query languages for column family databases may look similar to SQL. The query language can support:

  • SQL-like terms such as INSERT, UPDATE, DELETE, and SELECT
  • Column family-specific operations such as CREATE COLUMNFAMILY


Google BigTable

BigTable, used by Google for many of its own important projects, is a distributed storage system designed to manage petabytes of data across several thousand commodity servers, allowing for even further horizontal scaling.


At the hardware level, these thousands of commodity servers are grouped into distributed clusters, which, in turn, are connected through a central switch.


Each cluster is constituted by various racks and a rack can, in turn, consist of several commodity computing machines that communicate with each other through the rack switches.


In fact, these are the switches that are connected to the central switch to help in inter-rack or intercluster communications.


BigTable deals efficiently with latency and data size issues. It is a compressed, proprietary, high-performance storage system, underpinned by the Google File System (GFS) for log and data file storage and by the Google SSTable (Sorted String Table), which stores the BigTable data internally.


SSTable offers a persistent, immutable ordered map with keys and values as byte strings, with the key being the primary means of searching. The SSTable is a collection of blocks of a typical (though configurable) size of 64 KB.


Unlike a relational database, it is a sparse, multi-dimensional sorted map, indexed by a row key, a column key, and a timestamp. The map stores each value as an uninterpreted array of bytes.


The BigTable data model has no support for Atomicity, Consistency, Isolation, and Durability (ACID) transactions across row keys. Because the model is not relational in nature, it does not support join operations, nor does it offer SQL-style language support.


This may pose challenges to users who rely largely on SQL-like languages for data manipulation. In the BigTable architecture, at the hardware level, inter-rack communication is less efficient than intra-rack communication, that is, communication within a rack through the rack switches.


In a write operation, a valid transaction is logged in the tablet log. Commits are group-committed to improve throughput. At the moment the transaction commits, the data are inserted into an in-memory table (memtable).


During the write operation, the size of the memtable continues to grow, and once it reaches the threshold, the current memtable freezes and a new memtable is generated. The former is converted into an SSTable and finally written into GFS.


In a read operation, the SSTables and the memtable form an efficient merged view and the valid read operation is performed on it.


HBase Data Model and Versioning

Data Model


Versions and timestamps: HBase can keep several versions of a data item, along with the timestamp associated with each version. The timestamp is a long integer number that represents the system time when the version was created, so newer versions have larger timestamp values.


HBase uses midnight “January 1, 1970, UTC” as timestamp value zero, and uses a long integer that measures the number of milliseconds since that time as the system timestamp value (this is similar to the value returned by the Java utility java.util.Date.getTime() and is also used in MongoDB).


It is also possible for the user to define the timestamp value explicitly in a date format rather than using the system-generated timestamp.


d. Cells: A cell holds a basic data item in HBase. The key (address) of a cell is specified by a combination of (table, row, column family, column qualifier, timestamp).


If the timestamp is left out, the latest version of the item is retrieved unless a default number of versions is specified, say the last three versions.


The default number of versions to be retrieved, as well as the default number of versions that the system needs to keep, are parameters that can be specified during table creation.


e. Namespaces: A namespace is a collection of tables. A namespace basically specifies a collection of one or more tables that are typically used together by user applications, and it corresponds to a database that contains a collection of tables in relational terminology.
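The versioning behavior described above (latest version by default, a configurable number of versions kept) can be sketched for a single cell; the class is illustrative, not HBase's implementation:

```python
import time

class VersionedCell:
    """Keep the last N versions of a cell, each stamped with
    milliseconds since the Unix epoch (HBase-style timestamps)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp_ms, value)

    def put(self, value, timestamp_ms=None):
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        self.versions.append((timestamp_ms, value))
        self.versions.sort()                                # order by timestamp
        self.versions = self.versions[-self.max_versions:]  # keep newest N

    def get(self, timestamp_ms=None):
        if timestamp_ms is None:       # no timestamp: return the latest version
            return self.versions[-1][1]
        for ts, value in reversed(self.versions):
            if ts <= timestamp_ms:     # latest version at or before the timestamp
                return value
        return None

cell = VersionedCell(max_versions=3)
for ts, v in [(100, "a"), (200, "b"), (300, "c"), (400, "d")]:
    cell.put(v, ts)
print(cell.get())     # d
print(cell.get(250))  # b
```

Note that the oldest version ("a") has been discarded because only three versions are kept.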




HBase CRUD Operations

HBase has low-level CRUD (create, read, update, delete) operations, as in many of the NoSQL systems. The formats of some of the basic CRUD operations in HBase are illustrated below.


Creating a table: create <tablename>, <column family>, <column family>, …

Inserting data: put <tablename>, <rowid>, <column family>:<column qualifier>, <value>

Reading all data in a table: scan <tablename>

Retrieving one data item: get <tablename>, <rowid>


Riak

Riak is an open source key-value NoSQL data model developed by Basho Technologies. The model is considered a fine implementation of Amazon’s Dynamo principles; it distributes data across nodes by applying consistent hashing and organizes the key-value pairs into buckets, which are simply namespaces.


Riak provides client libraries for several programming languages, such as Erlang, Java, PHP, Python, Ruby, and C/C++, that help in setting or retrieving the value of a key, the most prominent operations a user performs with Riak.


The model has received wide acceptance in companies such as AT&T, AOL, and Ask.com.


Similar to other efficient NoSQL data models, Riak offers fault-tolerant availability and by default replicates each key-value entry to three places across the nodes of the cluster.


In unfavorable circumstances such as a hardware failure, a node outage will not completely shut down write operations; rather, Riak’s masterless peer-to-peer architecture allows a neighboring node (beyond the default three) to respond to the write operation at that moment, and the data can be read back later.


Riak Features

1. Consistency: In distributed key-value store implementations such as Riak, the eventually consistent model of consistency is implemented.


Since the value may have already been replicated to other nodes, Riak has two ways of resolving update conflicts: either the newest write wins and the older writes lose, or both (all) values are returned, allowing the client to resolve the conflict.


2. Transactions: Riak uses the concept of a quorum, implemented by supplying the W value, the write quorum, during the write API call. Consider a Riak cluster with a replication factor of 5, and suppose we supply a W value of 3.


When writing, the write is reported as successful only when it is written and reported as a success on at least three of the nodes.


This gives Riak write tolerance; in our example, with N equal to 5 and a W value of 3, the cluster can tolerate N − W = 2 nodes being down for write operations, though we would still have lost some data on those nodes for reading.
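The quorum arithmetic can be sketched directly (N and W as in the example above):

```python
# Riak-style quorum writes: with replication factor N and write quorum W,
# a write succeeds once at least W replicas acknowledge it, so the cluster
# tolerates N - W replica failures for writes.
N = 5   # replication factor
W = 3   # write quorum

def write_succeeds(replicas_up):
    return replicas_up >= W

print(write_succeeds(N))      # True: all replicas up
print(write_succeeds(N - 2))  # True: tolerates N - W = 2 failures
print(write_succeeds(N - 3))  # False: only 2 of the required 3 acks
```

The same idea applies symmetrically to reads with an R value: R + W > N guarantees that a read quorum overlaps the latest write quorum.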


3. Query: Key-value stores can query by the key.


4. Scaling: Many key-value stores scale by using sharding: the value of the key determines on which node the key is stored.


Document Databases



MongoDB Features

1. Consistency: Consistency in the MongoDB database is configured by using replica sets and choosing to wait for the writes to be replicated to all the slaves or to a given number of slaves. Every write can specify the number of servers the write has to be propagated to before it returns as successful.


2. Transactions: In traditional RDBMSs, transactions mean modifying the database with insert, update, or delete commands over different tables and then deciding whether to keep the changes by using commit or rollback.


These constructs are generally not available in NoSQL solutions—a write either succeeds or fails. Transactions at the single-document level are known as “atomic” transactions.


By default, all writes are reported as successful. Finer control over a write can be achieved by using the WriteConcern parameter; for example, we can ensure that a write is propagated to more than one node before it is reported successful by using WriteConcern.REPLICAS_SAFE.


Different levels of WriteConcern let you choose the safety level during writes; for example, when writing log entries, you can use the lowest level of safety, WriteConcern.NONE.


3. Availability: MongoDB implements replication, providing high availability using replica sets. In a replica set, there are two or more nodes participating in asynchronous master-slave replication.


The replica-set nodes elect the master, or primary, among themselves. Assuming all the nodes have equal voting rights, some nodes can be favored for being closer to the other servers, for having more RAM, and so on; users can affect this by assigning a priority—a number between 0 and 1,000—to a node.


All requests go to the master node, and the data are replicated to the slave nodes. If the master node goes down, the remaining nodes in the replica set vote among themselves to elect a new master; all future requests are routed to the new master, and the slave nodes start getting data from the new master.


When the node that had failed comes back online, it joins in as a slave and catches up with the rest of the nodes by pulling all the data it needs to get current.


4. Query: One of the good features of document databases, as compared to key-value stores, is that we can query the data inside the document without having to retrieve the whole document by its key and then introspect the document. This feature brings these databases closer to the RDBMS query model.


MongoDB has a query language expressed via JSON, with constructs such as $query for the where clause, $orderby for sorting the data, and $explain for showing the execution plan of the query. There are many more constructs like these that can be combined to create a MongoDB query.



5. Scaling: Scaling implies adding nodes or changing data storage without simply migrating the database to a bigger box. Scaling for heavy-read loads can be achieved by adding more read slaves, so that all the reads can be directed to the slaves.


Given a heavy-read application, with our three-node replica-set cluster we can add more read capacity as the read load increases just by adding more slave nodes to the replica set and executing reads with the slaveOk flag.


When a new node is added, it will sync up with the existing nodes, join the replica set as a secondary node, and start serving read requests. An advantage of this setup is that we do not have to restart any other nodes, and there is no downtime for the application either.


MongoDB Data Model

The operation createCollection is used to create each collection. Each document in a collection has a unique ObjectId field, called _id, which:

1. Can be specified by the user: user-generated ObjectIds can have any value specified by the user as long as it uniquely identifies the document, so these Ids are similar to primary keys in relational systems.


2. Can be system-generated if the user does not specify an _id field for a particular document. System-generated ObjectIds have a specific format, which combines the timestamp when the object is created (4 bytes, in an internal MongoDB format), the node id (3 bytes), the process id (2 bytes), and a counter (3 bytes) into a 12-byte Id value.
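A sketch of generating such a system-generated id is shown below; the field sources are illustrative, and MongoDB's internal encoding differs in detail:

```python
import hashlib
import itertools
import os
import socket
import struct
import time

_counter = itertools.count()

def make_object_id():
    """Build a 12-byte id: 4-byte timestamp + 3-byte machine id +
    2-byte process id + 3-byte counter, returned as 24 hex characters."""
    ts = struct.pack(">I", int(time.time()))                           # 4 bytes
    machine = hashlib.md5(socket.gethostname().encode()).digest()[:3]  # 3 bytes
    pid = struct.pack(">H", os.getpid() % 0xFFFF)                      # 2 bytes
    count = struct.pack(">I", next(_counter) % 0xFFFFFF)[1:]           # 3 bytes
    return (ts + machine + pid + count).hex()

print(make_object_id())  # 24 hex characters
```

The layout makes collisions unlikely without any coordination: ids generated at the same instant still differ by machine, process, or counter.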


A collection does not have a schema. The structure of the data fields in documents is chosen based on how documents will be accessed and used, and the user can choose a normalized design (similar to normalized relational tuples) or a denormalized design (similar to XML documents or complex objects).


Interdocument references can be specified by storing in one document the ObjectId or ObjectIds of other related documents.

The workers’ information is embedded in the project document, so there is no need for a separate “worker” collection:


{ _id: "P1",
  Pname: "ProductL",
  Plocation: "Pune",
  Workers: [
    { Ename: "Amitabh Bacchan", Hours: 32.5 },
    { Ename: "Priyanka Chopra", Hours: 20.0 }
  ]
}



This is known as the denormalized pattern, which is similar to creating a complex object or an XML document.

Another option is where worker references are embedded in the project document, but the worker documents themselves are stored in a separate “worker” collection:


{ _id: "P1",
  Pname: "ProductL",
  Plocation: "Pune",
  WorkerIds: ["W1", "W2"]
}

{ _id: "W1",
  Ename: "Amitabh Bacchan",
  Hours: 32.5
}

{ _id: "W2",
  Ename: "Priyanka Chopra",
  Hours: 20.0
}


A third option is to use a normalized design, similar to First Normal Form (1NF) relations:


{ _id: "P1",
  Pname: "ProductL",
  Plocation: "Pune"
}

{ _id: "W1",
  Ename: "Amitabh Bacchan",
  ProjectId: "P1",
  Hours: 32.5
}

{ _id: "W2",
  Ename: "Priyanka Chopra",
  ProjectId: "P1",
  Hours: 20.0
}



The choice of which design option to use depends on how the data will be accessed.


MongoDB CRUD Operations

MongoDB has several CRUD operations, where CRUD stands for create, read, update, delete. Documents can be created and inserted into their collections using the insert operation, whose format is:

db.<collection_name>.insert(<document(s)>)

The parameters of the insert operation can include either a single document or an array of documents, as shown below:

db.project.insert( { _id: "P1", Pname: "ProductL", Plocation: "Pune" } )

db.worker.insert( [ { _id: "W1", Ename: "Amitabh Bacchan", ProjectId: "P1", Hours: 32.5 },
                    { _id: "W2", Ename: "Priyanka Chopra", ProjectId: "P1", Hours: 20.0 } ] )

The delete operation is called remove, and the format is:

db.<collection_name>.remove(<condition>)

The documents to be removed from the collection are specified by a Boolean condition on some of the fields in the collection documents.


There is also an update operation, which has a condition to select certain documents, and a $set clause to specify the update. It is also possible to use the update operation to replace an existing document with another one but keep the same ObjectId.


For read queries, the main command is called find, and the format is:

db.<collection_name>.find(<condition>)


General Boolean conditions can be specified as <condition>, and the documents in the collection that return true are selected for the query result.
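The selection logic of find can be sketched as a filter over documents; this is an emulation for illustration, not MongoDB's implementation:

```python
# A document matches when every field in the condition equals the
# corresponding field in the document (MongoDB's implicit AND).
workers = [
    {"_id": "W1", "Ename": "Amitabh Bacchan", "ProjectId": "P1", "Hours": 32.5},
    {"_id": "W2", "Ename": "Priyanka Chopra", "ProjectId": "P1", "Hours": 20.0},
]

def find(collection, condition):
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in condition.items())]

print(find(workers, {"ProjectId": "P1", "Hours": 20.0}))
# [{'_id': 'W2', 'Ename': 'Priyanka Chopra', 'ProjectId': 'P1', 'Hours': 20.0}]
```

An empty condition matches every document, which corresponds to a find() call with no arguments.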


1. Hash partitioning: Hash partitioning applies a hash function h(K) to each shard key K, and the partitioning of keys into chunks is based on the hash values. If most searches retrieve one document at a time, hash partitioning may be preferable because it randomizes the distribution of shard key values into chunks.


2. Range partitioning: In general, if range queries are commonly applied to a collection (for example, retrieving all documents whose shard key value is between 200 and 400), then range partitioning is preferred because each range query will typically be submitted to a single node that contains all the required documents in one shard.


When sharding is used, MongoDB queries are submitted to a module called the query router, which keeps track of which nodes contain which shards based on the particular partitioning method used on the shard keys. The query (CRUD operation) will be routed to the nodes that contain the shards that hold the documents that the query is requesting.


If the system cannot determine which shards hold the required documents, the query will be submitted to all the nodes that hold shards of the collection.


Sharding and replication are used together; sharding focuses on improving performance via load balancing and horizontal scalability, whereas replication focuses on ensuring system availability when certain nodes fail in the distributed system.




OrientDB

OrientDB is an open source graph-document NoSQL data model. It is primarily a graph data model, but it combines the features of both document and graph data models to a certain extent.


At the data level, it is document based for schema-less content, whereas for traversing relationships it is graph oriented; therefore, it fully supports schema-less, schema-full, and schema-mixed data.


The database is completely distributed in nature and can span several servers. It supports state-of-the-art multi-master replication.


It is a fully ACID-compliant data model and also offers role-based security profiles to users. The database engine is lightweight, written in Java, and hence portable and platform independent; it can run on Windows, Linux, etc.


One of the salient features of OrientDB is its fast indexing system for lookups and insertions, based on the MVRB-Tree algorithm, which originated from the Red-Black Tree and the B+ Tree.


OrientDB relies on SQL for basic operations and uses some graph operator extensions to avoid SQL joins when dealing with relationships in data; a graph traversal language is used for query processing and can loosely be termed OrientDB’s SQL.

OrientDB has out-of-the-box support for web technologies such as HTTP, REST, and JSON, without any external intermediaries.


Neo4j Features


5. Scaling: With graph databases, sharding is difficult because graph databases are not aggregate oriented but relationship oriented. Since any given node can be related to any other node, storing related nodes on the same server is better for graph traversal.


Since traversing a graph when the nodes are on different machines is not good for performance, graph database scaling can be achieved by using some common techniques:


We can add enough RAM to the server so that the working set of nodes and relationships is held entirely in memory. This technique is helpful only if the dataset that we are working with fits in a realistic amount of RAM.


We can improve the read scaling of the database by adding more slaves with read-only access to the data, with all the writes going to the master.


This pattern of writing once and reading from many servers is useful when the dataset is large enough to not fit in a single machine’s RAM but small enough to be replicated across multiple machines.


Slaves can also contribute to availability and read scaling, as they can be configured to never become a master, remaining always read-only. When the dataset size makes replication impractical, we can shard the data from the application side using domain-specific knowledge.


NoSQL databases for big data


Cloud storage

Like so many modern computing terms, the Cloud sounds friendly, comforting, inviting, and familiar, but ‘the Cloud’ is actually, as mentioned earlier, just a way of referring to a network of interconnected servers housed in data centers across the world. These data centers provide a hub for storing big data.


Through the Internet we share the use of these remote servers, provided (on payment of a fee) by various companies, to store and manage our files, to run apps, and so on. As long as your computer or other device has the requisite software to access the Cloud, you can view your files from anywhere and give permission for others to do so.


You can also use software that ‘resides’ in the Cloud rather than on your computer. So it’s not just a matter of accessing the Internet but also of having the means to store and process information—hence the term ‘Cloud computing’.


Our individual Cloud storage needs are not that big, but scaled up the amount of information stored is massive.


Amazon is the biggest provider of Cloud services but the amount of data managed by them is a commercial secret. We can get some idea of their importance in Cloud computing by looking at an incident that occurred in February 2017 when Amazon Web Services’ Cloud storage system, S3, suffered a major outage (i.e. service was lost).


This lasted for approximately five hours and resulted in the loss of connection to many websites and services, including Netflix, Expedia, and the US Securities and Exchange Commission.


Amazon later reported the human error as the cause, stating that one of their employees had been responsible for inadvertently taking servers offline. Rebooting these large systems took longer than expected but was eventually completed successfully.