
Write a MapReduce program (run via Hadoop streaming) to compute the total salary for each department. Each input record has the fields:

Empno EmpName Dept Salary

Mapper.py

#!/usr/bin/env python
import sys

# Read records from standard input; each record is: Empno EmpName Dept Salary
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    if len(words) < 2:
        continue  # skip blank or malformed lines
    # emit Dept as the key and Salary as the value, tab-separated
    print('%s\t%s' % (words[-2], words[-1]))
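The mapper can be checked on its own before submitting the job, by piping the input file through it (this assumes Mapper.py and the Input.txt listed below have been saved in the local home directory):

[cloudera@quickstart ~]$ cat Input.txt | python Mapper.py
CSE	50000
ECE	45000
Mech	45000

and so on, one Dept/Salary pair per input record, in input order.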

Reducer.py

#!/usr/bin/env python
import sys

current_dept = None
current_sal = 0

# Hadoop streaming sorts the mapper output by key before it reaches the
# reducer, so all records for one department arrive consecutively.
for line in sys.stdin:
    line = line.strip()
    dept, sal = line.split('\t', 1)
    try:
        sal = int(sal)
    except ValueError:
        continue  # skip records whose salary is not a number
    if current_dept == dept:
        current_sal += sal
    else:
        if current_dept:
            # key changed: emit the total for the previous department
            print('%s\t%s' % (current_dept, current_sal))
        current_dept = dept
        current_sal = sal

# emit the total for the last department
if current_dept:
    print('%s\t%s' % (current_dept, current_sal))
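The whole job can also be simulated locally without Hadoop: the sort between the two scripts stands in for the shuffle phase, which is what guarantees the reducer sees each department's records grouped together. Assuming both scripts and the Input.txt listed below are in the current directory:

[cloudera@quickstart ~]$ cat Input.txt | python Mapper.py | sort -k1,1 | python Reducer.py
CSE	211000
ECE	91000
EEE	50000
Mech	80000

The result should match the part-00000 file produced by the real job (shown under Output below).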

Input.txt

1011 Abc CSE 50000

1012 Def ECE 45000

1013 Efg Mech 45000

1014 Ghi CSE 55000

1015 Jkl CSE 75000

1016 Mno Mech 35000

1017 Pqr ECE 46000

1018 Stu EEE 25000

1019 Vwx CSE 31000

1020 Yzz EEE 25000
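A typical way to run the job on the Cloudera quickstart VM is sketched below. The streaming jar path and the HDFS input directory (/4094_in here) are assumptions that may need adjusting for a different setup; the output directory /4094_out_Dept matches the job log that follows.

[cloudera@quickstart ~]$ hadoop fs -mkdir /4094_in
[cloudera@quickstart ~]$ hadoop fs -put Input.txt /4094_in
[cloudera@quickstart ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /4094_in/Input.txt \
    -output /4094_out_Dept \
    -mapper Mapper.py \
    -reducer Reducer.py \
    -file Mapper.py -file Reducer.py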

Output

18/09/26 01:55:11 INFO mapreduce.Job: map 0% reduce 0%

18/09/26 01:55:24 INFO mapreduce.Job: map 50% reduce 0%

18/09/26 01:55:25 INFO mapreduce.Job: map 100% reduce 0%


18/09/26 01:55:31 INFO mapreduce.Job: map 100% reduce 100%

18/09/26 01:55:32 INFO mapreduce.Job: Job job_1537948147353_0006 completed successfully

18/09/26 01:55:32 INFO mapreduce.Job: Counters: 50

File System Counters

FILE: Number of bytes read=128

FILE: Number of bytes written=355974

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=494

HDFS: Number of bytes written=42

HDFS: Number of read operations=9

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Killed map tasks=1

Launched map tasks=2

Launched reduce tasks=1

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=19173

Total time spent by all reduces in occupied slots (ms)=5223

Total time spent by all map tasks (ms)=19173

Total time spent by all reduce tasks (ms)=5223

Total vcore-seconds taken by all map tasks=19173

Total vcore-seconds taken by all reduce tasks=5223


Total megabyte-seconds taken by all map tasks=19633152

Total megabyte-seconds taken by all reduce tasks=5348352

Map-Reduce Framework

Map input records=10

Map output records=10

Map output bytes=102

Map output materialized bytes=134

Input split bytes=206

Combine input records=0

Combine output records=0

Reduce input groups=4

Reduce shuffle bytes=134

Reduce input records=10

Reduce output records=4

Spilled Records=20

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=269

CPU time spent (ms)=1880

Physical memory (bytes) snapshot=575602688

Virtual memory (bytes) snapshot=4511502336

Total committed heap usage (bytes)=392306688

Shuffle Errors

BAD_ID=0
CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=288

File Output Format Counters

Bytes Written=42

18/09/26 01:55:32 INFO streaming.StreamJob: Output directory: /4094_out_Dept

[cloudera@quickstart ~]$ hadoop fs -ls /4094_out_Dept

Found 2 items

-rw-r--r-- 1 cloudera supergroup 0 2018-09-26 01:55 /4094_out_Dept/_SUCCESS

-rw-r--r-- 1 cloudera supergroup 42 2018-09-26 01:55 /4094_out_Dept/part-00000

[cloudera@quickstart ~]$ hadoop fs -cat /4094_out_Dept/part-00000

CSE 211000

ECE 91000

EEE 50000

Mech 80000

[cloudera@quickstart ~]$
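As a quick sanity check, the totals can be verified by hand against Input.txt: CSE = 50000 + 55000 + 75000 + 31000 = 211000, ECE = 45000 + 46000 = 91000, EEE = 25000 + 25000 = 50000, and Mech = 45000 + 35000 = 80000, matching part-00000.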
