Storing and processing data, and extracting valuable knowledge from it, has been the most common use case in the last 40 years of business applications. Over that time, data storage, manipulation and access have followed a wave-like trend that still repeats: technologies move from general approaches to specialised techniques, and then back to general approaches built on revisited techniques.
Looking back 15 years, we were at the peak of specialisation: OLAP cubes were the tools for manipulating application-specific data, built on top of data marts, fed by dedicated ETL tools and accessed through XML/A (XML for Analysis) interfaces. Knowledge management and analytics were able to scale thanks to very specific knowledge bases and de-normalisation.
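To make the idea of a de-normalised cube concrete, here is a minimal sketch in plain Python. It is not how any particular OLAP product works; the fact rows, the dimension order (region, product, year) and the `build_cube` helper are all hypothetical, made up for illustration. The point is only that every combination of dimensions is aggregated up front, so a query becomes a lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales)
FACTS = [
    ("EU", "laptop", 2012, 100),
    ("EU", "phone", 2012, 50),
    ("US", "laptop", 2012, 70),
    ("US", "laptop", 2013, 30),
]


def build_cube(facts):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(int)
    for *dims, measure in facts:
        positions = range(len(dims))
        for size in range(len(dims) + 1):
            for kept in combinations(positions, size):
                # "*" marks a dimension that has been rolled up (aggregated away)
                key = tuple(dims[i] if i in kept else "*" for i in positions)
                cube[key] += measure
    return cube


cube = build_cube(FACTS)
print(cube[("EU", "*", "*")])      # total sales in the EU
print(cube[("*", "laptop", "*")])  # total laptop sales across regions and years
```

The trade-off is the one described above: reads are cheap because the answers are stored redundantly, but the cube is tied to the dimensions chosen when it was built.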
Five years later, thanks to the shift to Cloud computing and the increased availability of resources, generalisation came back with distributed, normalised data-stores and ORM frameworks that were able to abstract the persistence layer and split datasets intelligently, while still using our old friend the Structured Query Language (SQL) underneath (remember that Facebook was using MySQL databases).
SQL databases, however, imposed ACID constraints that were unnecessary in many use cases: not all applications require referential integrity or transactional capabilities. So 5 to 6 years ago specialisation returned with the emergence of the NoSQL paradigm. Key-value stores implementing the Bigtable structure such as HBase, column-oriented stores such as Cassandra and document stores such as MongoDB emerged quickly, demonstrating performance increases of several orders of magnitude for certain types of applications. However, these models required a lot of specialisation, meaning that, for example, an application tailored to HBase could not easily be converted into a MongoDB application.
A key element in that shift back to specialisation was the concept of MapReduce, which enables batch processing of enormous amounts of data to extract knowledge (an answer to the BI approaches of 10 years earlier).
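For readers who have not met the model, the following is a minimal single-process sketch of the MapReduce idea in Python, using the classic word count. It is an illustration only: a real framework distributes the map and reduce phases across many machines and performs the grouping step itself. The function names and the sample records are assumptions made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records
records = ["SQL is back", "NoSQL is here", "SQL on NoSQL"]
counts = map_reduce(records)
print(counts)
```

Because each call to `map_phase` depends on a single record and each call to `reduce_phase` on a single key, both phases can be spread over as many workers as there are records or keys, which is what makes the batch model scale.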
Over the last 4 years, most large-scale Internet applications have come to store their data in a NoSQL data-store, but they are now realising that specialisation severely restricts the flexibility with which data can be queried. Impala, Hive, Kiji, Pig, etc. are shifting the trend again: our old friend SQL is returning, this time on top of NoSQL data-stores.
To summarise, NoSQL data-stores are very important for enabling Internet applications to scale. However, do not underestimate the potential of BI technologies from 15 years ago. OLAP cubes still rock!