Hadoop MapReduce is a widely used framework for processing large-scale data. Hadoop stores data across clusters of nodes, each with its own computing resources and memory. MapReduce is a programming model and an associated implementation for processing and generating large data sets; hundreds of MapReduce programs have been implemented, and upwards of one thousand MapReduce jobs are executed on Google's clusters every day. Improving the performance of the Hadoop framework is therefore a current and challenging issue. This paper focuses on optimizing job scheduling in Hadoop. A Workload Characteristic and Resource Aware (WCRA) Hadoop scheduler is proposed, in which nodes in the cluster are classified, based on their performance, as CPU busy or Disk I/O busy. Before a job is scheduled on a node, the amount of primary memory available on that node is ensured to be more than 25%. Performance parameters of Map tasks, such as the time required to parse the input data, execute the map function, and sort and merge the output, and of Reduce tasks, such as the time to merge, parse and reduce, are used to categorize a job as CPU bound or Disk I/O bound. Tasks are assigned priorities based on their minimum Estimated Completion Time, and jobs are scheduled on a compute node in such a way that the jobs already running on it are not affected. Experimental results show a 30% improvement in performance compared to Hadoop's FIFO, Fair and Capacity schedulers.
Keywords: Hadoop, MapReduce.
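The scheduling policy summarized above can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' implementation: all class names, fields, and the exact classification rule are assumptions; only the more-than-25%-free-memory requirement and the minimum-Estimated-Completion-Time ordering come from the abstract.

```python
# Hypothetical sketch of the WCRA scheduling ideas; names and the
# classification heuristic are assumptions, not the paper's actual code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    free_memory_fraction: float  # fraction of primary memory available
    cpu_busy: bool               # node currently classified as CPU busy
    disk_busy: bool              # node currently classified as Disk I/O busy

@dataclass
class Job:
    name: str
    parse_time: float            # Map: time to parse the input data
    map_time: float              # Map: time to run the map function
    sort_merge_time: float       # Map: time to sort and merge the output
    reduce_merge_time: float     # Reduce: time to merge, parse and reduce
    est_completion_time: float   # minimum Estimated Completion Time

def classify(job: Job) -> str:
    """Categorize a job as CPU bound or Disk I/O bound from phase timings."""
    cpu_cost = job.map_time
    io_cost = job.parse_time + job.sort_merge_time + job.reduce_merge_time
    return "cpu" if cpu_cost > io_cost else "io"

def pick_node(job: Job, nodes: list) -> Optional[Node]:
    """Pick a node with >25% free memory whose busy resource does not
    conflict with the job's dominant demand."""
    kind = classify(job)
    for node in nodes:
        if node.free_memory_fraction <= 0.25:
            continue  # abstract's primary-memory requirement
        if kind == "cpu" and node.cpu_busy:
            continue  # avoid stacking CPU-bound work on a CPU-busy node
        if kind == "io" and node.disk_busy:
            continue  # likewise for Disk I/O-bound work
        return node
    return None

def schedule(jobs: list, nodes: list) -> None:
    """Dispatch jobs in order of minimum Estimated Completion Time."""
    for job in sorted(jobs, key=lambda j: j.est_completion_time):
        node = pick_node(job, nodes)
        if node is not None:
            print(f"{job.name} -> {node.name}")
```

Under this sketch, a CPU-bound job is steered away from CPU-busy nodes (and an I/O-bound job away from Disk I/O-busy nodes), which is one plausible reading of how already-running jobs are left unaffected.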