NoSQL databases have been under the spotlight for some time now. It’s because in certain use cases they are really better suited. At the beginning of a new project it’s now important to consider all the options for the data storage, and not only relational databases. In some cases the relational option is still the right one, but very often other alternatives can be very efficient, especially for performance, scalability and schema flexibility. So, NoSQL is not just a buzzword. Let’s demystify a few of the NoSQL concepts… What is NoSQL ? First of all “NoSQL” does not mean “No SQL at all” but “Not Only SQL”. In other words, NoSQL is not here to replace existing SQL databases, it just brings another approach for data storage that can be used when a classical “Relational Database” has reached its limits. With a so vague definition we could consider that any storage system without SQL language is a NoSQL database… But usually the “NoSQL” term is used only for the following categories of databases… 1) “key-value” data store This type of database works as a big cache stored on a remote server. Each value is associated with a key. The value is “opaque” (no specific structure, not indexed). The storage is usually entirely “in-memory” (with sometimes an optional storage on files). The main “key-value” solutions are Redis (http://redis.io/ ), Memcached (http://memcached.org/) and Riak (http://basho.com/riak/ ) 2) “Column-oriented” databases In this kind of database, the data is also associated with a “key” but it is organized by “columns” and the columns can be grouped by “family”. With the column notion, this type of database is not so far of the relational databases but there’s no fixed column definition (no schema) hence the storage is more flexible. 3) “Document-oriented” databases In this type each key is associated with a “document” (usually formatted in JSON or XML). Obviously it’s possible to store anything in each “document” so this is the best solution to combine structured data (JSON or XML) with data store flexibility (no schema). Examples of “document oriented” databases : MongoDB (http://www.mongodb.org/ ), CouchBase (http://www.couchbase.com/ ), CouchDB (http://couchdb.apache.org/ ). 4) “Graph” databases This type is especially designed to store graphs (nodes and all the links between nodes) The best known graph database is Neo4J ( http://www.neo4j.org/ ) Why use a NoSQL database ? NoSQL is very often associated with “Big Data” because it provides natural scalability, improves the performances and can store a huge amount of data. But it’s not the only use case… When to use NoSQL databases ? – When the volume of data is big enough to require more than 1 physical server. – When availability is more important than consistency (see the “CAP theorem” below) – When the data to be stored must be very flexible (no fixed structure) Cloud computing integration NoSQL databases are very often available in the Cloud. For example, Google uses “Big Table” for data storage in “Google App Engine” (http://cloud.google.com ). It’s also very easy to host a database instance on Amazon or Azure servers with a provider like Garantia Data ( “Redis” and “Memcached” hosting on http://garantiadata.com/ ). They are also present in many “Plateform as a Service (PaaS)” offers (CloudBees for example http://www.cloudbees.com ) Schema flexibility Most of the NoSQL databases are “schema less”. It means that you can store unstructured data. And hence the content can evolve without any administration operation on the database (no “create table”, “alter table”, etc.. ). This flexibility is very useful for time to market constraints. It’s probably the most important advantage. It can bring a better reactivity and adaptability for any kind of application independently of the amount of data to be stored. Scalability A NoSQL databases provides “horizontal scalability” (scalability by adding machines into a pool of resources) thanks to the data partitioning. The limits The CAP theorem The “CAP theorem” (also known as Brewer’s theorem) states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (a read sees all previously completed writes)
- Availability (reads and writes always succeed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
- Classical relational databases : Availability + Consistency
- NoSQL databases : Availability + Partition tolerance OR Consistency + Partition tolerance
Database name | Type | API protocole | Written in |
Redis | Key-Value | Native/socket | C |
Memcached | Key-Value | Native/socket | C |
Riak | Key-Value | Prot.Buf, REST | Erlang/C/C++ |
Cassandra | Column oriented | Thrift (or CQL) | Java |
HBase | Column oriented | Prot.Buf, REST, Thrift | Java |
BigTable | Column oriented | ||
MongoDB | Document/JSON | Native/socket | C++ |
CouchDB | Document/JSON | Erlang | |
CouchBase | Document/JSON | Memcached Prot. | Erlang/C/C++ |
Neo4J | Graph | Java | |
Another good resource on this subject is: Introduction to NoSQL by Martin Fowler http://www.youtube.com/watch?v=qI_g07C_Q5I
I have also done some analysis of NoSQL. 200+ slide deck available on slideshare:
http://www.slideshare.net/VeryFatBoy/bcs-2013
PDF download is far better than viewing online.
Thank you for this link.
Great job !
The article is interesting and No SQL seems easy to implement…..but I think in practice it isn’t always the case ….and really I don’t know why? …..In my professional career I often found that even though there’s a No SQL implemented, people prefer to use the old méthods and technics …….paper report for data saving……..