Not Only SQL, Not Only Hadoop
Manuel Sevilla, the Chief Technical Officer for the Global BIM TLI of Capgemini, writes that Big Data is not only Hadoop.
NoSQL doesn’t mean “No SQL” but “Not only SQL”! And “SQL” here refers to relational databases, not to the SQL language. The “No SQL” expression may be confusing, but it sounds really good, and this is why it is still used today. It covers many technologies such as Cassandra, Neo4j, MongoDB, HBase and, by extension, Hadoop.
Hadoop is not a single tool but a combination of many tools that together deliver a solution that is “very scalable at low cost” and can “work with non-modeled and non-structured data”.
A Big Data project is first and foremost a business plan to demonstrate the value of investing in Big Data. The implementation itself involves a few steps:
Data acquisition: from internal databases, external sources, machines and people, using a full portfolio of tools and subject to intellectual property (IP), legal and privacy constraints
Data marshaling: all acquired data has to be sorted, so that non-useful data is removed and the rest is stored in the most suitable format (Hadoop or NoSQL solutions, but also BI appliances, in-memory solutions…)
Analytics: all this data has to be mined and used for predictive analytics, for alerting and for finding innovative correlations
Action: once something is discovered in the Analytics phase, it has to be fed into the transactional systems to turn the insights into money (cost reduction or revenue increase)
Data governance: all of this is utopian without data quality and an efficient master data management solution
I think Hadoop is part of the solution: it covers part of data storage with HBase and HDFS, and part of the Analytics phase with MapReduce. But it doesn’t cover the other phases, and very often it cannot cover 100% of Data Marshaling and Analytics.
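To make the MapReduce model a little more concrete, here is a minimal single-process Python sketch of its map, shuffle and reduce steps, applied to counting words in a few hypothetical log lines. It does not use Hadoop or its Java API, and the function names are illustrative assumptions. A real Hadoop job distributes these same steps across a cluster, reading its input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key, which the framework
    # does between the map and reduce steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(records):
    # Hypothetical driver chaining the three steps in one process.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(map_reduce(logs))
```

Note that the sketch only shows computation over data that has already been acquired and cleaned. Nothing in it acquires data, decides what to keep, pushes results back into transactional systems or enforces data quality.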