Ever since BigTable-style data stores came into use at Google and Facebook, the performance of these portals has been a regular talking point. Many articles, blogs and stories have been published on the subject, and nearly all of them end up at NoSQL. Netflix successfully migrated to NoSQL early this year, and now almost everyone wants to evaluate the technique further. With the first major NoSQL Now Conference held in August (August 23 – 25), this post gives a quick introduction to NoSQL and touches on pointers that will help in evaluating its use in existing or future projects.
NoSQL is commonly expanded to ‘Not Only SQL’. In my observation, it is mostly considered an alternative to RDBMS, because RDBMS is so popular that we are used to ‘SQL’ meaning RDBMS. In fact, it’s not an alternative; it’s a different database model with a different set of objectives. One more point to note here: NoSQL is not a single system or solution; rather, it’s a class of database management systems that differ from the classic RDBMS.
Recently I was working on a performance issue in one of my projects. It involved intensive file I/O and a huge set of DB operations. Due to the complexity of the queries and the size of the data, the module fell short of the desired performance. The eventual solution was to de-normalize the existing DB schema, add batch processing, and then distribute the batch processing across multiple nodes to speed it up further. At the concept level I don’t see much difference between this solution and NoSQL; NoSQL simply takes the concept to a broader level. I would say NoSQL engines rely on distributed storage systems and parallel processing across different nodes.
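To make the idea concrete, here is a minimal sketch of that approach: de-normalized records, split into batches, processed in parallel and then merged. It is not the code from the project; the `ORDERS` data, the function names and the batch size are all hypothetical, and worker threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical de-normalized records: each row already carries the customer
# name, so no join is needed at processing time.
ORDERS = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 20},
    {"order_id": 3, "customer": "alice", "amount": 50},
    {"order_id": 4, "customer": "carol", "amount": 10},
    {"order_id": 5, "customer": "bob", "amount": 40},
]


def make_batches(rows, batch_size):
    """Split rows into consecutive batches of at most batch_size items."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def process_batch(batch):
    """Aggregate one batch independently (the work a single node would do)."""
    totals = {}
    for row in batch:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals


def merge_totals(partials):
    """Combine the per-batch results into one final result."""
    merged = {}
    for partial in partials:
        for customer, amount in partial.items():
            merged[customer] = merged.get(customer, 0) + amount
    return merged


def run(rows, batch_size=2, workers=3):
    batches = make_batches(rows, batch_size)
    # Worker threads stand in for separate nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_batch, batches))
    return merge_totals(partials)


if __name__ == "__main__":
    print(run(ORDERS))
```

Because every batch can be processed without looking at any other batch, the work spreads across as many workers as are available; only the small merge step at the end is sequential.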
NoSQL implementations are generally categorized as below:
- Document Store – Apache CouchDB, MongoDB, SimpleDB etc.
- Graph – AllegroGraph, OrientDB etc.
- Key-value / Column-family Stores – Apache Cassandra, BigTable, Apache HBase (built on Apache Hadoop) etc.
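As a rough illustration of how these categories differ, the sketch below models each one with plain Python data structures. None of this is the API of the products listed above; the sample data and the helper names `cities` and `reachable` are made up for the example.

```python
# Document store: one self-contained, nested record per identifier.
document = {
    "_id": "user:42",
    "name": "Alice",
    "addresses": [{"city": "Pune"}, {"city": "London"}],
}

# Graph: nodes plus explicit edges between them.
graph = {
    "alice": {"follows": ["bob", "carol"]},
    "bob": {"follows": ["carol"]},
    "carol": {"follows": []},
}

# Key-value store: an opaque value looked up by its key.
key_value = {"session:abc123": b"serialized-session-bytes"}


def cities(doc):
    """Read nested data straight out of a document, with no join."""
    return [address["city"] for address in doc["addresses"]]


def reachable(g, start):
    """Collect every node reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in g[node]["follows"]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```

The document keeps related data together in one record, the graph makes relationships the primary thing you query, and the key-value store knows nothing about the value beyond the key it is filed under.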
So why, or when, would I use NoSQL?
- For certain use cases, the much-admired ACID nature of RDBMS takes its toll on application performance and, eventually, on the availability of the application. Some applications may not need relationships between the data they store at all.
- One of the most talked-about aspects is SCALABILITY. With the classic RDBMS approach, scalability is achieved by architecting the database properly. NoSQL data structures, on the other hand, have no predefined schemas. NoSQL focuses only on data structures that can scale, and restricting itself to these data structures yields significantly higher horizontal scalability.
- High Availability is another important dimension of NoSQL, and it comes at a relatively cheaper cost than with RDBMS. With data replicated across nodes (synchronously, or asynchronously in many engines), the system stays available when a node fails.
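The scalability and availability points above can be sketched in a few lines: keys are spread over nodes by hashing, and every write goes to more than one node so a read still succeeds when one node is down. This is a toy model under my own assumptions, not how any particular NoSQL engine is implemented; `ShardedStore`, its parameters and the `down` set are hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy key-value store: hash-based sharding plus synchronous replication."""

    def __init__(self, node_count=4, replicas=2):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes = [dict() for _ in range(node_count)]  # one dict per node
        self.replicas = replicas
        self.down = set()  # indexes of nodes simulated as failed

    def _owners(self, key):
        """Pick the nodes holding a key: a hashed start node and its successors."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write to every live replica before returning (synchronous)."""
        written = 0
        for index in self._owners(key):
            if index not in self.down:
                self.nodes[index][key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replica for key")

    def get(self, key):
        """Read from the first live replica that has the key."""
        for index in self._owners(key):
            if index not in self.down and key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)


if __name__ == "__main__":
    store = ShardedStore()
    store.put("user:1", "alice")
    store.down.add(store._owners("user:1")[0])  # simulate one node failing
    print(store.get("user:1"))  # still served by the surviving replica
```

Adding capacity here means adding nodes rather than buying a bigger machine. Note that the simple modulo placement used in this sketch moves most keys when the node count changes, which is why real systems often use consistent hashing instead.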
As NoSQL is still evolving, migrating from RDBMS to NoSQL is not a simple step; the approach followed by Netflix is a very good example to consider here.
Though I have compared NoSQL with RDBMS throughout this post, that was mainly to highlight its features and to explore using NoSQL for problems faced with RDBMS. RDBMS is not going anywhere. In evaluating a few NoSQL engines, I have found them more complex than RDBMS. That said, I think NoSQL implementations on the Cloud will definitely be the next step forward.