Prerequisites to get started with Big Data

I am often asked, ‘I am familiar with X, Y and Z, is that good enough to get started with Big Data?’. This post addresses that question. Here we will look at what is required to get started with Big Data and the rationale behind it.

There is a lot of information on the internet about each of these prerequisites, and it is easy to get lost. So, references for getting started with each technology are also included.

Linux : Big Data (the Hadoop revolution) started on the Linux OS. Even now, most Big Data software is initially developed on Linux, and porting to Windows has been an afterthought. Microsoft partnered with Hortonworks to speed up the porting of Big Data software to Windows.

To get started with the latest Big Data software, knowledge of Linux is a must. The good thing about Linux is that it is free and it opens up a lot of opportunities. I would recommend going through all the tutorials here, except the seventh one.

There are more than 100 different flavors of Linux, and Ubuntu is one of the popular distributions for those who are new to Linux.

Java : Most Big Data software is developed in Java. I say most because there are exceptions: Spark has been developed in Scala, Impala in C/C++, and so on.

To extend Big Data software, knowledge of Java is a must. Also, the documentation is sometimes not up to the mark, so it might be necessary to go through the underlying code of the Big Data software to see how something works, or why it is not working the way it is expected to.

For the above reasons, knowledge of Java is a must. The basics of core Java are enough; knowledge of enterprise Java is not required. Go through the Java Basics section and the Java Object Oriented section here.
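To give a rough idea of what "basics of core Java" means, here is a minimal sketch of a word count, the classic first exercise in Big Data, written in plain Java with no Hadoop or other framework involved. The class name WordCount and the method count are made up for this illustration. If a class, a static method, a loop and a Map all look familiar here, that level of Java should be enough to get started.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; it does not come from any Big Data framework.
class WordCount {

    // Counts how often each whitespace-separated word occurs, ignoring case.
    static Map<String, Integer> count(String text) {
        // TreeMap keeps the words in alphabetical order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by blank input
            }
            // Add 1 to the existing count, or start at 1 for a new word.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("big data needs big clusters");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Frameworks like Hadoop spread this same counting idea across many machines, but the Java used to express it stays at about this level.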

Java programs can be developed with something as simple as Notepad. But developing in an IDE like Eclipse makes it a piece of cake. Here is a nice tutorial on Eclipse.

SQL : Not everyone is comfortable with programming in Java and other languages. That’s the reason why SQL abstractions have been introduced on top of the different Big Data frameworks. Those who come from a database background can get started with Big Data very easily because of the SQL abstraction. Hive, Impala and Phoenix are a few examples of such software.

Expertise in SQL is not required. The basics of DDL and DML operations are more than enough. Here are some nice tutorials to get started with SQL.

Others : The above-mentioned skills are good enough to get started with Big Data. As one gets deeper into Big Data, I would also recommend looking at R, Python and Scala. Each of these languages has its strengths and weaknesses, and depending on the requirement, the appropriate one can be picked to write Big Data programs.

To become good at Big Data, an aspirant needs a good overview of the different technologies. The above guide covers what is required and where to start reading about each of them.

Best of luck !!!
