Review of the O’Reilly Learning Spark Book

Learning Spark: Lightning-Fast Big Data Analytics is a work-in-progress (WIP) book, so this review is far from complete. O’Reilly calls its WIP books Rough Cuts and makes them available on Safari with the proper subscription. Once the book is complete and has been published, I will write a detailed review here. As of now only 6 chapters have been published, and the TOC lists only those 6 chapters.

Here are some of the good things about the book

  • Spark provides bindings (APIs) for Python/Scala/Java, though not all the features are ported to Python. Most of the examples in the book are given in all three languages. Once the book is complete, I guess all the examples will be available in Python/Scala/Java.
  • The language in the book is crisp and easy to understand.

Here are some of the things missing in the book

  • The book starts with what Spark is, moves on to installing Spark, and then immediately jumps into the programming aspects around RDDs. The big picture (architecture) of how Spark works is missing. To architect and code efficiently, it’s important to know how Spark works behind the API.
  • I started by reading the online documentation around Spark, played with the Spark installation and then moved on to reading the Learning Spark book. The online documentation is really good and the book as of now is not adding much above it. I guess a lot of additional information will be included once the book is complete.
  • A quick introduction to functional programming (2-3 pages) would be a good way to get started with Spark programming. For those not familiar with functional programming concepts, the Spark code might be a bit cryptic.
  • Comparing Spark/RDD with the existing models (Hadoop/MapReduce) would help readers appreciate the beauty of Spark/RDD.
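
To give a flavour of the functional style mentioned above, here is a minimal sketch of a word count in plain Python. It does not use the Spark API at all; it only mirrors the shape of a chain of flatMap, map and reduceByKey steps on an RDD, using ordinary Python collections. The names word_count and merge are made up for illustration and do not come from the book.

```python
from functools import reduce


def word_count(lines):
    """Count words with flatMap/map/reduce-style steps (plain Python, no Spark)."""
    # flatMap-like step: split every line into words and flatten into one list
    words = [word for line in lines for word in line.split()]

    # map-like step: pair each word with an initial count of 1
    pairs = map(lambda word: (word, 1), words)

    # reduceByKey-like step: fold the pairs into a dict, summing counts per word
    def merge(counts, pair):
        word, n = pair
        return {**counts, word: counts.get(word, 0) + n}

    return reduce(merge, pairs, {})


counts = word_count(["spark is fast", "spark is fun"])
print(counts)
```

Each step takes a collection and a small function and returns a new collection without modifying the input, which is the habit of mind that makes Spark code easier to read once it is familiar.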

Conclusion

The online documentation around Spark is really good and the Learning Spark book is not offering much above it as of now. The book as-is is very crisp and clear, but it needs to go into a bit more depth on how Spark runs behind the API. As the book progresses and more chapters are added, it should get more interesting.