Over the holidays, we will repost the top ten most popular blog posts of the year. This is one:

NoSQL databases have been in the spotlight for some time now, because in certain use cases they really are better suited. At the beginning of a new project it is now important to consider all the options for data storage, and not only relational databases. In some cases the relational option is still the right one, but other alternatives can often be very effective, especially for performance, scalability and schema flexibility. So NoSQL is not just a buzzword. Let’s demystify a few of the NoSQL concepts…

What is NoSQL?

First of all, “NoSQL” does not mean “No SQL at all” but “Not Only SQL”. In other words, NoSQL is not here to replace existing SQL databases; it just brings another approach to data storage that can be used when a classical relational database has reached its limits. With such a vague definition we could consider any storage system without the SQL language to be a NoSQL database… But the term “NoSQL” is usually used only for the following categories of databases…

1) “Key-value” data stores

This type of database works like a big cache stored on a remote server. Each value is associated with a key. The value is “opaque” (no specific structure, not indexed). The storage is often “in-memory” (sometimes with optional persistence to files). The main “key-value” solutions are Redis (http://redis.io/), Memcached (http://memcached.org/) and Riak (http://basho.com/riak/).

2) “Column-oriented” databases

In this kind of database, the data is also associated with a “key”, but it is organized by “columns”, and the columns can be grouped into “families”. With its notion of columns, this type of database is not so far from relational databases, but there is no fixed column definition (no schema), hence the storage is more flexible.

3) “Document-oriented” databases

In this type, each key is associated with a “document” (usually formatted in JSON or XML). Anything can be stored in each “document”, so this type is well suited to combining structured data (JSON or XML) with data store flexibility (no schema). Examples of “document-oriented” databases: MongoDB (http://www.mongodb.org/), CouchBase (http://www.couchbase.com/), CouchDB (http://couchdb.apache.org/).

4) “Graph” databases

This type is specially designed to store graphs (nodes and all the links between nodes). The best-known graph database is Neo4J (http://www.neo4j.org/).

Why use a NoSQL database?

NoSQL is very often associated with “Big Data” because it scales out naturally, can improve performance and can store a huge amount of data. But that is not the only use case…

When to use NoSQL databases?
- When the volume of data is big enough to require more than 1 physical server.
- When availability is more important than consistency (see the “CAP theorem” below)
- When the data to be stored must be very flexible (no fixed structure)
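
To make the four categories described earlier more concrete, here is a minimal sketch that mimics each data model with plain Python structures. It does not use any real database client; the keys, field names and the helper functions `columns_of` and `reports_of` are purely illustrative assumptions.

```python
import json

# Key-value: the store only sees an opaque value behind each key.
key_value = {"session:42": b"opaque-serialized-bytes"}

# Column-oriented: each row key maps to column families, and each row
# may carry a different set of columns (no fixed schema).
column_store = {
    "user:1": {"profile": {"name": "Ada", "city": "London"}},
    "user:2": {"profile": {"name": "Linus"}, "stats": {"logins": 7}},
}

# Document-oriented: each key maps to a structured document (JSON here),
# and two documents do not need to share the same fields.
document_store = {
    "product:1": json.dumps({"title": "Laptop", "tags": ["it", "hardware"]}),
    "product:2": json.dumps({"title": "Desk", "size": {"w": 160, "d": 80}}),
}

# Graph: nodes plus explicit, typed links between them.
graph = {
    "nodes": {"alice", "bob", "carol"},
    "edges": {("alice", "manages", "bob"), ("bob", "works_with", "carol")},
}


def columns_of(row_key):
    """Return the sorted column names actually stored for one row."""
    row = column_store[row_key]
    return sorted(col for family in row.values() for col in family)


def reports_of(manager):
    """Follow the 'manages' edges that start at one node."""
    return sorted(
        dst for src, rel, dst in graph["edges"]
        if src == manager and rel == "manages"
    )
```

Note how `columns_of("user:1")` and `columns_of("user:2")` return different column sets: this is the schema flexibility mentioned in the list above.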
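The CAP theorem

The CAP theorem states that a distributed data store cannot guarantee all three of the following properties at the same time:
- Consistency (a read sees all previously completed writes)
- Availability (reads and writes always succeed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)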
Databases are commonly classified by the two properties they favor:
- Classical relational databases: Availability + Consistency
- NoSQL databases: Availability + Partition tolerance, or Consistency + Partition tolerance
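
The trade-off between the two NoSQL choices can be illustrated with a toy model. The sketch below is not how any real database works: `TinyCluster`, its `mode` values ("CP" and "AP") and the `partitioned` flag are hypothetical names chosen for this example. It only shows what happens to a write when the two replicas cannot talk to each other.

```python
class Replica:
    """One copy of the data, held in a plain dict."""

    def __init__(self):
        self.data = {}


class TinyCluster:
    """Two replicas and a switch that simulates a network partition.

    mode "CP": favor consistency, refuse writes during a partition.
    mode "AP": favor availability, accept writes and let replicas diverge.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: fail rather than let the copies differ.
                raise RuntimeError("write refused: replicas are partitioned")
            # Availability first: accept the write on the reachable copy only.
            replica.data[key] = value
            return
        # No partition: the write reaches both copies.
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)
```

In "CP" mode a write during the partition raises an error, so both replicas keep returning the same value. In "AP" mode the write succeeds, but a read on the other replica returns the old value until the partition heals and the copies are reconciled (reconciliation is left out of this sketch).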
Different types of databases can also be combined within the same application:
- Very often an application uses different kinds of data
- It integrates information from different sources
- A single technology (like a relational database) is not sufficient to meet all the users’ expectations
For example, a single application could use:
- User sessions stored in a “key-value” database (like Redis)
- Employee relationships stored in a “graph” database (like Neo4J)
- The product catalog stored in a “document” database for flexibility (like MongoDB)
- The analytics data stored in a “column-oriented” database (like Cassandra)
- Financial data stored in a “classical relational” database (like Oracle)
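
One way to picture the split listed above is a thin routing layer that sends each kind of data to its own store. The sketch below uses in-memory stand-ins instead of real database clients; `InMemoryStore`, `STORES`, `store_for` and `save` are hypothetical names, and in a real application each entry would wrap the client library of the corresponding database.

```python
class InMemoryStore:
    """Stand-in for a real database client (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.items = {}

    def put(self, key, value):
        self.items[key] = value

    def get(self, key):
        return self.items.get(key)


# One store per kind of data, mirroring the list above.
STORES = {
    "session": InMemoryStore("key-value"),
    "org_chart": InMemoryStore("graph"),
    "catalog": InMemoryStore("document"),
    "analytics": InMemoryStore("column-oriented"),
    "finance": InMemoryStore("relational"),
}


def store_for(kind):
    """Pick the store configured for one kind of data."""
    try:
        return STORES[kind]
    except KeyError:
        raise ValueError(f"no store configured for data kind {kind!r}") from None


def save(kind, key, value):
    """Write a value to the store for its kind; return that store's type."""
    store = store_for(kind)
    store.put(key, value)
    return store.name
```

The application code only names the kind of data (`save("session", ...)`, `save("finance", ...)`), so the choice of technology behind each kind stays in one place.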
Database name | Type | API protocol | Written in |
Redis | Key-Value | Native/socket | C |
Memcached | Key-Value | Native/socket | C |
Riak | Key-Value | Prot.Buf, REST | Erlang/C/C++ |
Cassandra | Column oriented | Thrift (or CQL) | Java |
HBase | Column oriented | Prot.Buf, REST, Thrift | Java |
BigTable | Column oriented | | |
MongoDB | Document/JSON | Native/socket | C++ |
CouchDB | Document/JSON | | Erlang |
CouchBase | Document/JSON | Memcached Prot. | Erlang/C/C++ |
Neo4J | Graph | | Java |