Advanced MapReduce in Hadoop
Introduction
You, this course and Us (1:20)
Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API
Parallelize the reduce phase - use the Combiner (14:40)
Not all Reducers are Combiners (14:31)
How many mappers and reducers does your MapReduce have? (8:23)
Parallelizing reduce using Shuffle And Sort (14:55)
MapReduce is not limited to the Java language - Introducing the Streaming API (5:05)
Python for MapReduce (12:19)
MapReduce Customizations For Finer Grained Control
Setting up your MapReduce to accept command line arguments (13:47)
The Tool, ToolRunner and GenericOptionsParser (12:36)
Configuring properties of the Job object (10:41)
Customizing the Partitioner, Sort Comparator, and Group Comparator (15:16)
The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
The heart of search engines - The Inverted Index (14:41)
Generating the inverted index using MapReduce (10:25)
Custom data types for keys - The Writable Interface (10:23)
Represent a Bigram using a WritableComparable (13:13)
MapReduce to count the Bigrams in input text (8:26)
Test your MapReduce job using MRUnit (13:41)
Input and Output Formats and Customized Partitioning
Introducing the File Input Format (12:48)
Text And Sequence File Formats (10:21)
Data partitioning using a custom partitioner (7:11)
Make the custom partitioner real in code (10:25)
Total Order Partitioning (10:10)
Input Sampling, Distribution, Partitioning and configuring these (9:04)
Secondary Sort (14:34)
The Tool, ToolRunner and GenericOptionsParser