There are too many books and resources, and it’s easy to get lost in the wild. Here are some references to get started with different technologies.


O’Reilly Learning Spark is a nice book to get started with Spark. Here is a review of the book.


The books on HBase (HBase: The Definitive Guide and HBase in Action) are the ones to get started with HBase. However, both books are a couple of years old and are not being updated. The official HBase reference documentation gives a quick overview of the various topics and is updated continuously with the latest HBase features.

Machine Learning

For most of us it’s easier to grasp a concept when we can visualize it than when we think about it at an abstract level. The Shape of Data is a blog that tries to explain the different concepts around ML from a geometrical perspective. It’s a nice collection for those who are getting started with ML.


Today it’s MapReduce (MR); tomorrow it might be some other computing model. New models are developed all the time to address the challenges of the existing ones. Most of the Big Data frameworks are being ported to YARN for the sake of efficiency. YARN has been around for some time, but Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 is the only book exclusively on YARN.

Note that some of the above links are Amazon affiliate links.