NoSQL has been around for a while and has gained recognition from the developer community. It is now used in many projects, especially in the world of cloud-native applications, thanks to the myriad NoSQL offerings from public cloud vendors.
What are the reasons behind this success? Are they good reasons? Could there be a trap? Before giving some clues, let’s go back to the roots of NoSQL.
After years of indexed sequential access method (ISAM) files, the database world came to be dominated by SQL. However, the rise of the Internet created a need for applications capable of managing millions of concurrent users and petabytes of data. Besides, the business model of those applications required the lowest possible cost. Did I forget to mention that they also had to be fully resilient?
NoSQL was born to tackle this challenge. Nobody ever woke up one day shouting out loud “I’ll invent a database that will not have any SQL capability, and I will rule the world!”. Rather than being the goal, NoSQL was the consequence of a new paradigm required to solve the problem: to scale massively and handle huge workloads, bottlenecks had to be eliminated.
Data distribution was one bottleneck: classic clustering systems deal with this problem by using dedicated instances to manage the routing. An architecture using a statistical distribution mechanism based on a hash function made it possible to remove those instances and replace them with a direct connection to the cluster node hosting the required data. It also improved resilience by eliminating points of failure. Amazon’s Dynamo model uses such a statistical distribution and has inspired many NoSQL implementations.
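To make the idea concrete, here is a minimal sketch of hash-based routing done on the client side, using consistent hashing (the flavor of hash distribution described for Dynamo). The class name `HashRing`, the node names and the number of virtual positions per node are assumptions made for illustration; this is not the code of any particular product.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical client-side router: key -> node, with no routing instance."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas  # virtual positions per node (assumed value)
        self._ring = []            # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node at several pseudo-random positions on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        """Drop every position owned by the node, for example after a failure."""
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        """Return the node owning the first position at or after the key's hash."""
        if not self._ring:
            raise ValueError("ring is empty")
        index = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # the client can now talk to this node directly
```

Every client that knows the list of nodes computes the same owner for a given key, so no central router is needed. When a node disappears, only the keys it owned move to another node; the others stay where they were.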
Locking was another big issue: the locking mechanism implemented by classic databases is a serious bottleneck. By moving to optimistic locking, the early NoSQL implementations sped things up by simply eliminating the lock. A new paradigm was born. Considering that an operation will succeed without conflict in the vast majority of cases, the new way of working was simply: do not lock, perform the action, and check afterwards whether somebody had the same idea at the same time. This approach, based on a distributed versioning indicator called a vector clock, eliminated the locking bottleneck and made a brand new scale of transactions per second possible. This came at a cost: the ‘eventual consistency’ model requires complex algorithms to manage conflicts. Sometimes manual resolution is preferred, because automatic resolution would nullify the performance advantage (the double hotel booking example is a typical case: it is a design choice for the sake of performance, not a bug).
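The "perform the action and check afterwards" step can be sketched with a toy vector clock. The function names (`increment`, `compare`, `merge`) and the node names are hypothetical, and real implementations add details such as clock pruning that are left out here.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: dict, b: dict) -> str:
    """Classify two clocks as 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a: b can safely replace a
    if b_le_a:
        return "after"       # a descends from b: a can safely replace b
    return "concurrent"      # neither saw the other: a conflict to resolve


def merge(a: dict, b: dict) -> dict:
    """Clock to attach to the value produced once a conflict is resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Hypothetical hotel room record, written once and then booked on two nodes.
room = increment({}, "node-a")
booking_1 = increment(room, "node-a")   # first client, handled by node-a
booking_2 = increment(room, "node-b")   # second client, handled by node-b
status = compare(booking_1, booking_2)  # "concurrent": the double booking
```

Neither write waited for a lock. The price is paid later, when the two concurrent versions are discovered and someone, a program or a person, has to decide which booking wins.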
Besides those huge setups, which require a level of performance unreachable with classic databases, NoSQL entered a market of smaller setups where its cost-effectiveness was a strong argument against SQL databases (large SQL clusters are out of reach for many companies). This was especially relevant for startups, unless they were heavily funded.
And then developers started to love it. They liked NoSQL because it is fast, because it has a high-tech image, and because it is dead simple to program. In the end, the last reason became the most important one, and developers came to love NoSQL for this sole reason: ease of use. Insidiously, the simple key-value or key-object database, with neither a schema nor complex queries to design, became the “quick and dirty” holy grail.
Today, NoSQL has become more mature, and many products have deviated from the original model by re-implementing features such as locking, customizable data distribution, advanced query management, and ACID compliance. Those features often serve a strong domain-oriented specialization, as in document databases, graph databases… NoSQL has serious justifications, but it is still not as general-purpose as SQL. Projects that start with NoSQL by default are not rare, and in those projects it is quite common to see complex code being written just to mimic what a good old SQL query would have done… better and faster.
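As a toy illustration of that last point, compare an aggregation written by hand over key-value data with the equivalent SQL query. The `orders` data and both function names are made up for the example, and SQLite stands in for "a good old SQL database" only because it ships with Python.

```python
import sqlite3
from collections import defaultdict

# Hypothetical orders, as they might sit in a simple key-value store.
orders = {
    "order:1": {"customer": "alice", "amount": 30},
    "order:2": {"customer": "bob", "amount": 15},
    "order:3": {"customer": "alice", "amount": 20},
}


def totals_by_hand(store):
    """Scan every value and aggregate in application code."""
    totals = defaultdict(int)
    for order in store.values():
        totals[order["customer"]] += order["amount"]
    return sorted(totals.items())


def totals_with_sql(store):
    """Load the same data into a table and let the database aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(o["customer"], o["amount"]) for o in store.values()],
    )
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows
```

Both functions return the same totals here. The hand-written version stays short only because the question is simple: add a join, a filter and a sort order, and the application code grows quickly while the SQL version remains a single query.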
We love NoSQL and would like you to love it. Just love it for the right reasons!