August 24, 2015

Modern Databases of the 21st Century

BY :     August 24, 2015

databasesOne size fits all or one size fits none? The relational database model was invented by Ted Codd in the early 1970’s. There were a few early prototypes built, but the model became mainstream when IBM built and marketed DB2 in 1984. By the early 1990’s, relational database systems began to be considered the standard and the only way to solve all database related problems. It didn’t matter what kind of workload one needed to support – relational databases were considered the panacea. In other words, a “one size fits all” concept evolved and stayed until mid to late 2000’s.

It was at this time that researchers and technology leaders started to question the concept. They realized that for every kind of problem, there is a better ways of solving it than the traditional relational database system. Said differently, they realized that “one size fits none”!

Modern Databases for Online Analytical Processing (OLAP) workloads:

Columnar Databases: Datawarehouse models, in a majority of cases, are represented as star schemas (or a variation thereof) with Fact tables at the Center connected to the Dimension tables via primary-key, foreign-key relationships. Analytical queries are written against the Fact tables and typically query and aggregate using only a small fraction of columns available in the tables. On the other hand, traditional relational databases are row-oriented, causing a dataset much larger than required, to get loaded into the memory from the disk. This is because the entire record needs to be loaded into the memory even when only a few columns of the record are referred to in the client query.

The modern take on improving the design is to shift the paradigm by 90 degrees and store data column-by-column rather than row-by-row on the disk. The approach offers multiple advantages over the traditional approach. First of all, it improves performance by allowing the system to load only the data actually required to serve the client query from the disk into the memory. Secondly, the data in a column can be encoded and compressed to a much higher degree than that in a row. This is because column data is of the same kind, whereas row data is not. Data compression leads to reduction in storage required, and reduces the amount of data that needs to be moved from disk to memory and vice versa.  Thirdly, the query executor performs much better in a column-oriented database when the decision to select a value is dependent on only a few columns. Hence, in a majority of cases a columnar database suits an OLAP workload much better. HP’s Vertica, SAP’s HANA and Amazon’s Redshift are a few examples of such commercially available databases.

Modern Database Systems for Online Transaction Processing (OLTP) workloads:

  1. Main Memory Databases: Moore’s law, over the years, has allowed the amount of main memory available in database systems to increase exponentially. On the other hand, data needs of OLTP systems (leaving aside Facebook like systems) typically do not grow exponentially. The entire catalog of Amazon, for example, is of the order of 1 Terabyte and does not grow exponentially as the growth is usually linked with the growth of the company’s business. Surprisingly enough, 1 Terabyte of RAM costs $30,000 or less and is definitely not a big deal for a company like Amazon. It also is a fact that, on an average, around 30% of time spent on executing queries is wasted on managing the Buffer Cache of the database system. Therefore, many modern OLTP database systems attempt to load the entire database in the main memory and get rid of the concept of Buffer Cache and complexities associated with it such as encoding/decoding of data, and sending data back and forth between main memory and the disk.
  2. Single Threaded Databases: Row level locking is a necessary evil in traditional database systems. It is necessary because these systems are multi-threaded and allow multiple threads to attempt to update the same data at the same time. It is evil because managing locks and associated latches is time consuming. Locking and latching, on an average, take about 30% of time available to a database. This time should ideally be spent on running database queries. A primary reason that traditional databases are multi-threaded is that they don’t want to keep a transaction waiting while the one ahead of it is spending I/O cycles loading data into memory from the disk. Modern main memory-oriented databases, on the other hand, can afford to be single threaded, as they don’t need to send data back and forth between memory and the disk. These modern single threaded databases are therefore able to spend more time on running user queries. Volt DB is one such memory oriented, single threaded commercially available system.
  3. NoSQL Databases: Another category of modern databases is NoSQL databases. NoSQL databases can prove to be a good fit for certain types of OLTP workloads. They provide the ability to scale horizontally, are fault tolerant, provide a flexible schema, and can accommodate data other than relational such as JSON documents. They also support a variety of data models ranging from the most simple key-value pairs to the more sophisticated models such as document-oriented, column family and graph databases. However, one needs to tread carefully before going for a NoSQL database as they sacrifice certain essential database characteristics such as the ability to “join” and the ability to wrap multiple DML operations in a transaction. Cassandra, MongoDB and CouchDB are some of the popular NoSQL databases out there.
Ashish Kashyap

About

Ashish Kashyap is a Senior IT Leader with over 15 years of software consulting experience in a variety of industries such as pharmaceutical, retail, airline, automobile, marketing, and utility. He is an experienced solutions architect and has designed & developed B2B and B2C kind of applications on a variety of platforms including J2EE, COTS, and SAAS.

More on Ashish Kashyap.

Related Posts

Your email address will not be published. Required fields are marked *

7 + 5 =


  1. Erik Haahr · August 24, 2015 Reply

    I totally agree that a lot of new database technology is entering the market – I’d say Hadoop is the most well known example of the new entrants though not the first.

    But I see far too many discarding the relevance of the RDBMS in favor of one or more of the new technologies.

    But some RDBMS are a lot better at handling various workloads than the rumor claim. An often overlooked example, that handles most of the requirements perhaps except storing “unstructured” data, is Teradata RDBMS with its massively parallel, shared nothing architecture allows for a mix of row-based and columnar storage. It also uses data temperature to automatically place data where it makes most sense – Hot data in memory, warm data on SSD-drives, tempered data on the fastest (sectors of) disks and cold data on the slowest (sectors of) disks. Due to the architecture you get linear scalability up to 61 PB of uncompressed user data.

  2. Jaap Bloem · August 24, 2015 Reply

    Nice & thorough, thanks!

  3. Sridhar Yenamandra · September 10, 2015 Reply

    Thanks Ashish for an excellent quick summary of databases. Summarizing your article leaves below points:

    1) Star Schema is used in OLAP systems, however an improvement can be made using column-column approach rather than traditional row-row approach. HP’s Vertica, SAP’s HANA and Amazon’s Redshift are examples.
    2) Rather than having a database that does hard disk storage-memory writes and reads , thus wasting lot of time in that , one can opt for main memory and single threaded databases like Volt DB.
    3) No SQL databases provides all together a different dimension of scalability, felxibility and supports various kinds of data models such as Cassandra, MongoDB and CouchDB.

    Questions:
    1. Can columnar databases be used in OLTP kind of systems ? If not ( not better suited) why ? If yes how ?
    2. Main memory loading and single threading looks more like an charesteristic of a system ( or system architecture and RAM availability) than database. Can’t this be applied to traditional relational databases or any database ?

    Would appreciate if you write another article detailing more on No SQL databases

    Thanks again !

  4. Sridhar Yenamandra · September 10, 2015 Reply

    Thanks Ashish for an excellent quick summary of databases. Summarizing your article leaves below points:
    1) Star Schema is used in OLAP systems, however an improvement can be made using column-column approach rather than traditional row-row approach. HP’s Vertica, SAP’s HANA and Amazon’s Redshift are examples.
    2) Rather than having a database that does hard disk storage-memory writes and reads , thus wasting lot of time in that , one can opt for main memory and single threaded databases like Volt DB.
    3) No SQL databases provides all together a different dimension of scalability, felxibility and supports various kinds of data models such as Cassandra, MongoDB and CouchDB.

    Questions:
    1. Can columnar databases be used in OLTP kind of systems ? If not ( not better suited) why ? If yes how ?
    2. Main memory loading and single threading looks more like an charesteristic of a system ( or system architecture and RAM availability) than database. Can’t this be applied to traditional relational databases or any database ?
    Would appreciate if you write another article detailing more on No SQL databases
    Thanks again !

*Opinions expressed on this blog reflect the writer’s views and not the position of the Sogeti Group