Thursday, 27 October 2016

best hadoop training center in delhi

It has been observed that Hadoop has gained immense popularity because of big data processing. The MapReduce paradigm has proven to be really powerful for many data analytics applications. However, optimizing Hadoop and MapReduce requires proper expertise.

After getting the best Hadoop training from one of the best institutes in Delhi, I have decided to share some tips that can improve the performance of your Hadoop jobs. That is the aim of today's article, so let's have a look.

Use More Reducers
A common mistake made by first-time users is to run a job with a single reducer. While a single reducer is fine for some applications, it is not for the majority. Be sure to use a large number of reducers, at least one per node in your cluster.
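As a sketch, a cluster-wide default reducer count can be set in mapred-site.xml (the property name below is the Hadoop 2.x one; Hadoop 1.x called it mapred.reduce.tasks, and individual jobs can still override it at submission time). The value of 20 is purely illustrative; pick a number based on your node count:

```xml
<!-- mapred-site.xml: illustrative default reducer count (Hadoop 2.x property name) -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>20</value>
</property>
```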
Increase HDFS Blocksize, Namenode & Datanode Threads

You should know that the default block size of files in HDFS is 64M. It should be set to at least 128M; however, you may find that 256M or even 512M works better on many systems. You can set this via the dfs.block.size parameter in Hadoop 1.0 and the dfs.blocksize parameter in Hadoop 2.0.
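For example, raising the block size to 128M under the Hadoop 2.x property name might look like the fragment below (the value is in bytes; Hadoop 2.x also accepts a suffixed form such as 128m):

```xml
<!-- hdfs-site.xml: raise the HDFS block size to 128M (Hadoop 2.x name: dfs.blocksize) -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 * 1024 * 1024 bytes -->
</property>
```

Note that this only affects newly written files; existing files keep the block size they were written with.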

On the other hand, both the HDFS namenode and HDFS datanode use 10 handler threads by default to serve requests. Keep the default of 10 only if your cluster is small; on larger clusters you should raise these counts. You can modify them by setting the dfs.datanode.handler.count and dfs.namenode.handler.count parameters in hdfs-site.xml.
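A minimal sketch of raising both counts in hdfs-site.xml; the value 20 here is an assumption for illustration, not a recommendation for every cluster:

```xml
<!-- hdfs-site.xml: raise namenode/datanode handler threads (20 is illustrative) -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>20</value>
</property>
```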

Enable Compression
If the amount of data sent around the network during the MapReduce shuffle phase is large, you can enable compression to reduce the time spent in the shuffle and improve your job performance. Also, if the final output of your job is large, it is ideal to compress the data before writing it to disk. This way, time spent on I/O writes can be saved.
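As a sketch, both kinds of compression can be switched on in mapred-site.xml (property names below are the Hadoop 2.x ones; the Snappy codec assumes the native Hadoop libraries are installed, so swap in another codec if they are not):

```xml
<!-- mapred-site.xml: compress map output for the shuffle, and the final job output -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
```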

Monitor Your System
The most vital thing you can do to improve job performance is to monitor the entire system thoroughly and tune it accordingly. Without proper monitoring, you will not know how much memory your job uses, how heavy the I/O is or is not, or whether you have failing hardware in your cluster that needs maintenance.

Hope you have found these Hadoop optimization tips useful. If you want to build your career in Hadoop, enroll in a reputed training center today and get the best Hadoop training.