I often get the question ‘I am familiar with X, Y and Z; is that good enough to get started with Big Data?’. This post addresses that question. Here we will look at what is required to get started with Big Data and the rationale behind it.
There is a lot of information on the internet about each of these skills, and it is easy to get lost. So, references for getting started with each of them are also included.
Linux : Big Data (the Hadoop revolution) started on the Linux OS. Even now, most Big Data software is developed on Linux first, and porting to Windows has been an afterthought. Microsoft partnered with Hortonworks to speed up the porting of Big Data software to Windows.
To get started with the latest Big Data software, knowledge of Linux is a must. The good thing about Linux is that it is free and it opens up a lot of opportunities. I would recommend going through all the tutorials here, except the seventh one.
Java : Most Big Data software is developed in Java. I say most because there are exceptions: Spark is developed in Scala, Impala in C++, and so on.
To extend Big Data software, knowledge of Java is a must. Also, the documentation is sometimes not up to the mark, so it may be necessary to read the underlying source code to see how something works or why it is not working as expected.
For the reasons mentioned above, knowledge of Java is a must. The basics of core Java are enough; knowledge of enterprise Java is not required. Go through the Java Basics section and the Java Object Oriented section here.
SQL : Not everyone is comfortable programming in Java and other languages. That’s why SQL abstractions have been introduced on top of the different Big Data frameworks. Those from a database background can get started with Big Data very easily because of these SQL abstractions. Hive, Impala and Phoenix are a few examples of such software.
Expertise in SQL is not required. The basics of DDL and DML operations are more than enough. Here are some nice tutorials to get started with SQL.
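To give a feel for how little SQL that actually means, here is a minimal sketch of the basic DDL and DML statements. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in, so it runs without any Big Data setup. The table name page_views and its columns are made up for illustration, and the SQLite dialect is an assumption: Hive, Impala and Phoenix each have their own syntax details, and support for UPDATE and DELETE varies between engines.

```python
import sqlite3


def run_basics():
    # In-memory SQLite database; a stand-in for a real SQL-on-Hadoop engine.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # DDL: define the structure of a hypothetical table.
    cur.execute(
        "CREATE TABLE page_views (user_id INTEGER, page TEXT, views INTEGER)"
    )

    # DML: insert a few rows.
    cur.executemany(
        "INSERT INTO page_views (user_id, page, views) VALUES (?, ?, ?)",
        [(1, "home", 3), (2, "home", 5), (1, "about", 1)],
    )

    # DML: update one row.
    cur.execute(
        "UPDATE page_views SET views = views + 1 "
        "WHERE user_id = 1 AND page = 'about'"
    )

    # DML: delete the rows of one user.
    cur.execute("DELETE FROM page_views WHERE user_id = 2")

    # Query: total views per page, sorted by page name.
    cur.execute(
        "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_basics())
```

Anyone comfortable reading and writing statements at this level (create a table, insert, update, delete, and a simple aggregate query) knows roughly what the SQL abstractions expect to begin with.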
Others : The skills mentioned above are good enough to get started with Big Data. As one gets deeper into Big Data, I would also recommend looking at R, Python and Scala. Each of these languages has its strengths and weaknesses, and depending on the requirement, the appropriate one can be picked to write Big Data programs.
To become good at Big Data, an aspirant needs a good overview of the different technologies. The guide above mentions what is required and where to start reading about them.
Best of luck!