Relational vs Non Relational Databases

12 November 2017

Relational vs Non Relational Databases

Filed under: Data Architecture — Vincent Rainardi @ 6:35 am

Most of us database professionals are familiar with relational databases like Oracle and SQL Server. This is understandable because most companies are using relational databases. But we also have graph database, object databases, document databases, key-value databases, columnar databases. All of these are known as non relational databases. They are also called NoSQL databases.

So let’s find out what a relational database is, and what a non relational database is. A data architect should understand non relational databases, even though they are not as popular as relational databases.

A. Relational Databases

A relation means a table, which consists of columns (aka attributes) and rows (aka tuples). More formally, a relation is a header with a body. A header is a set of attribute names with the constraints, and the body is a set of tuples from a domain.

A table is called a relation because it is a relation between the header and the body, between the attribute and the tuples. For example: Employee relation consists of a header with 2 attributes: Name and Date of Birth. The Name attribute can contain any value up to 10 characters, as long as it is alphanumeric. The Date of Birth attribute can contain any valid date. So “Andy”, “Bob” and “Carol” relate to the Name attribute. “25 Dec 2000” and “1 Jan 1995” relate to the Date of Birth attribute. That is why a table is called a relation, and the whole thing is called Relational Model, by Edgar F Codd in 1969.

A relational database is a database where the data is stored as values in tables. Data inside a relational database is retrieved using a query called SQL, Structured Query Language. SQL is also used to add and update the data.

A1. Relvar, attribute value, database schema

Actually, a relation does not mean a table. A relation means a table definition (does not contain data). A relation variable means the table instance which contains data. A relation variable is called a relvar.

An attribute means a column. An attribute value means the value of an attribute for a particular tuple (row).

A database schema is a set of relations (tables), the header (columns) associated with those relations, and their predicates (constraints).

A2. Constraints

A constraint can be of one several forms:

Domain constraint is a restriction on the type of values that an attribute can have, e.g. numbers, dates, or alphanumeric. Today a domain constraint is known as a data type.
Data constraint is a restriction on the possible values that an attribute can have, e.g. only 1 to 10. Today a data constraint is known as a check constraint.
Key constraint is a way to identify a tuple using the values of certain attributes.

There several types of key constraint:

A superkey is a set of attributes which identifies a tuple (row), which is chosen to be implemented. There are no two tuples which have the same superkey. A superkey is now known as a primary key.
A foreign key is an attribute which identifies a tuple in another relation.
A candidate key is an attribute (or a set of attributes) which identifies a tuple. It can be chosen as a superkey. If not chosen it becomes an alternate key.
An alternate key is a candidate key which is not chosen as a superkey.

A3. Query

A query means getting data from a set of relations in the database. The output of a query is called an answer, or a result set.

A query can do several operations:

Restrict means limiting the rows returned on the result set.
Project means limiting the columns returned on the result set.
Product means combining all tuples in a relation to all tuples in another relation (Cartesian explosion)
Join means doing a product, and then filter the result to only the tuples we want.
Union means combining a table with similar structures.
Difference means listing the tuples in a relation which are not found in another relation.
Intersect means listing the tuples found in in both relations.

We also have other operations: outer join, outer union, aggregation (e.g. sum), group and ungroup.

A4. What A Relational Database Is

So a Relational Database is a database in the form of tables (relation), which consists of columns (attribute) and rows (tuples). The database is created, populated and queried using SQL language. Each column (attribute) has a domain (data type). Each table has an identifier (superkey or primary key) and can have foreign keys. In SQL we have join, product, union, project, restrict, difference and intersect.

Relation means table. The word “Relational” here means the relation between the Table Header (the data structure) and the Table Body (the data content). Not between one table and another table. It is based on Relational Model by Edgar F Codd in 1969.

SQL Server, Oracle, MySQL, DB2, Sybase, Teradata, Informix are examples of Relational Databases.

In a relational database, the data domain (data type) must be known first, before we can store the data. And secondly, the data structure must be designed first, before we can store the data in the database.

B. Non Relational Databases (aka NoSQL Databases)

Non Relational Databases are databases which are not in the form of tables, where the data attributes are put as table header, and the data is put in the table body. The attributes can be stored in the table body. Or they can be stored as a node or as a file. Or the data might not have attributes. The data might not be numbers, text and dates, but images, videos and documents. The data could be DNA sequences, social network, or chemical molecules. Storing them as text and numbers might require a lot of space, or might not be suitable to how the data is used/queried.

Non Relational Database is known as NoSQL, stands for Not Only SQL (No doesn’t mean no here, but it means not only).

There are various different types of Non Relational Database. The 5 most famous ones are #1 to #5 below: (#6 no longer exists and #7 is a combination of #1 to #5)

Key-Value Pair Database
Document Database
Graph Database
Object Database
Columnar Database
Hierarchical Database
Multi Model Database

B1. Key-Value Database

The most widely used Non Relational Database is Key-Value database, aka Key-Value store. It consists of files consisting of 2 fields: Key field and Value field. It is very simple. Because there is no data type, we can store anything in there. The benefit of using Key-Value database is we do not need to define the data structure (the schema). The other benefit is that it can be distributed across many nodes (file servers), so the performance is highly scalable. But the winning benefit is the money: it can store a large amount of data at much lower cost than relational database.

A Key-Value database is ideal for storing large amount of unstructured data. In some implementation of the Key-Value Database we can store not only strings but also lists of string. In other implementation, the Key-Value Database is in-memory, boosting the query performance.

The main disadvantage of Key-Value Databases is that the data consistency is not guaranteed. After 1 second it is consistent, but before 1 second it is not.

Examples of Key-Value Databases are Redis, Aerospike, Riak KV, Berkeley DB, LevelDB and Apache Ignite. Amazon DynamoDB, Azure CosmosDB and Oracle NoSQL are not only a Key-Value Database, they are also a Document Database, which is why they are classified as a Multi Model Database (see point B7 below).

Hadoop is essentially Key-Value store, it stores Key-Value data. MapReduce operates on Key-Value data stored in Hadoop. But Hadoop, unlike Redis or Aerospike, is not a Key-Value database. Hadoop is not a database system, it is merely a file system (HDFS).

B2. Document Database

Document Database stores JSON and XML documents. It doesn’t stores binary files such as videos, PDFs and images. The word “document” in Document Database means JSON and XML, not a file we normally work in the office. Microsoft Office files such as xlsx, docx, pptx (2007 and above) are XML files, but they are zipped so they becomes binary. So Excel and Word files are not what we store in Document Database. This is why some people prefer to call it “Document-oriented Database”.

Why is it important to store JSON and XML files? Because they contain data, such as loan, claim or patient data. It’s just that the format is JSON or XML. Data is widely stored as text files, i.e. CSV or tab delimited. But the limitation is that it is not validated. If a row in a CSV file is missing a field, it causes problems when we try to import it using an ETL tool. So we prefer XML files. Using a Schema Definition (XSD), the maker of the file can automatically validate the file, e.g. the data type for each field is, what values are permitted, etc. So the consumer of the file is guaranteed that the file is “good”.

That was in 2012, when XML is the de-facto standard. In 2017 JSON practically replaces XML, and is now the de-facto standard. People prefer putting data into JSON files than into XML files because JSON files are smaller and the processing is faster and easier. And JSON files are easier to read than XML (link, link). You can’t win against these 3 arguments.

The power of putting data in JSON documents is the hierarchical navigation. If we put a product hierarchy into JSON, we can query all nodes under product A up to 3 levels deep, or all the siblings of product A under the same grandparent. Hierarchical query like this is the main advantage of using a Document Database, compared to a Relational Database.

In a Document Database, the data structure is contained within the document. Whether we use XML files or JSON files, the data structure is inside the data file. So we don’t need to define the data structure in advance, like in a Relational Database.

A Document Database enables us to update a JSON document (or XML), adding data into it or deleting data from it. If we want to add a new field, we just add it into the JSON document. There is schema or data structure that we need to modify like in a Relational Database.

Examples of Document Database are: MongoDB, RavenDB, Apache CouchDB, RethinkDB, RaptorDB, JasDB, IBM Cloudant.

B3. Graph Database

In Graph Database data is stored as nodes connected to other nodes using lines. For example, a Graph Database containing social network data, a node can be: a person, a company, a city or a place (e.g. a restaurant, a shop, a school). The line shows the relationship between 2 nodes. A line has a time property, which shows when the relationship occurred.

A line between 2 persons can be: “is a friend of” or “is a colleague of”. A line between a person and a company can be “is an employee of” or “is a customer of”. A line between a place and a person can be: “had dinner in” or “bought something in”. The time property of a line can be “on 3^rd March 2016”, “from 4^th April 2016 to 5^th May 2017” or “every Monday”.

A property of a “person” node can be name, date of birth, address. A property of a “place” node can be name, type of place and rating.

In the case of a hospital, the node can be patient, healthcare professional, medication, ward and illness. In the case of an electricity company, the node can be customer, product, tariff, engineer and bill.

In the case of social media, examples of the queries on a graph database are:

Who likes “Amari Noodle Bar” in the last 3 months?
The friends of the people who likes “Amari Noodle Bar” (up to 3 hops) – where do they live (just street / post code, not the house number)
What are the education background of people who have ever worked for both IBM and Microsoft?

As you can see from these queries, a Graph Database is a very powerful tool to have, if it is populated with good data. It is very different to a relational database.

Examples of Graph Database are Neo4J, GraphBase, Gephi, and InfiniteGraph.

B4. Object Database

Object Database stores data as objects. By objects I mean as in classes in Java or C#.NET. For example: a customer is an object, with name, date of birth and address as the attributes. The Java code defines this customer object, populates for 10 customers then stores these 10 customers in the Object Database (each customer is called an “instance”, and the customer itself is a “class”). The Java code can then update the address of customer #5, delete customer #3 and insert customer #11, all in 1 database call. It can also copy customer #6 into a new customer with all attributes having the same values except the name.

To query an Object Database, we use JPA (Java Persistence API). JPA is the object oriented version of SQL. Like SQL, JPA uses SELECT FROM WHERE GROUP BY.

For example: SELECT C.Name, C.Address.City FROM Customer as C WHERE C.DateOfBirth.Month = “December”

About the “C.Address.City” above: because it is an Object Database, Address is an object which has attributes, such as Street, City and Post Code. JPA also have JOIN, ORDER BY and SUB QUERY. JPA has UPDATE and DELETE (no INSERT) but this is not the recommended way to update objects.

Object Databases are not popular because object oriented languages such as Java and .NET stores their object data in Relational Databases. The OO language has an interface to save objects as tables in Relational Databases. This interface is called Object-Relational Mapping (link). For example, .NET has TableAdapter (link), which enables us to do this:

CustomerTableAdapter.Insert(Customer.Name, Customer.DateOfBirth, Customer.Address)

And to update or delete, just replace the word Insert above with Update and Delete. To query we use the standard SQL language, because the data is stored in the Customer table in Relational Database. This makes the query a lot easier, especially when aggregating data.

Examples of Object Databases are: ObjectDB, GemStone, Ignite ObjectStore, Ekky ObjectDatabase++.

B5. Columnar Database

Columnar Databases store the data in tables. But instead of storing it as rows, the data is stored as columns. For example, if we have a table 1 million orders, with 3 fields: date, product and quantity. The order table in the Columnar Database looks like this: (red numbers are the ID of the order)

Date: 1/11/2017: 1,2,3,4,5. 2/11/2017: 6,7,8,9. 3/11/2017: …
Product: Product1: 1,3,4,6,7,15. Product2: 2,5,8,9,10. Product3: …
Quantity: 1: 1,2,3,5,6,9,10. 2: 4,7,8,11,12,13. 3: …

Because orders with the same date are stored together, orders with the same products are stored together, and orders with the same quantity are stored together, then when we query how many orders are for Product2 and date = 2/11/2017 and quantity = 1, it will be very fast to return the result.

Examples of Columnar Databases are: Greenplum, MariaDB AX, MonetDB, InfiniDB, Apache Kudu.

B6. Hierarchical Database

In a Hierarchical Data Model the data is organised like a tree. Today there is no Hierarchical Database, not as a stand-alone commercial product. Today Hierarchical Data Models are now mainly implemented in Document Databases, such as MongoDB (see B2 above). In a Document Database we can do hierarchical navigation on the data, i.e. query the data hierarchically. For example: in a product hierarchy we can query all nodes under product A up to 3 levels deep, or all the siblings of product A under the same grandparent.

B7. Multi Model Database

Multi Model Databases can store data as Key-Value, Document, Graph, Object or Columnar. It does not have to be all five formats; two formats are enough to be called as Multi Model. Most Multi Model NoSQL Databases can store data as Key-Value and Document. AWS is arguably the most complete, it can store data as Key-Value, Document, Object and Columnar.

Oracle NoSQL, Amazon DynamoDB, Azure Cosmos DB, MUMPS, OrientDB

Reference:

Paraha Sarathi, A Deep Dive into NoSQL: A complete list of NoSQL databases, link
Josh Wyse, Why JSON is better than XML, link
W3Schools, JSON vs XML, link
Couchbase, Comparing document-oriented and relational data, link
org, Definition of Object DBMS: link
Wikipedia, NoSQL databases, link
Wikipedia, Column-oriented DMBS, link

Comments (1)

1 Comment »

That’s good to know that relational databases will have things like headers and columns for you to enter the data. I have been considering getting a database set up, and I could see how those would make it easier to organize the data. If I decide to have some help me set up a database, I’ll have to ask for a relational one.

Comment by Tyler Johnson — 7 July 2020 @ 2:06 pm | Reply

RSS feed for comments on this post. TrackBack URI

Data Platform and Data Science

12 November 2017