Introduction | StackSkills

Taming Big Data with Apache Spark and Python - Hands On!

Getting Started with Spark

Introduction (1:46)
How to Use This Course
[Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies. (14:41)
[Activity] Installing the MovieLens Movie Rating Dataset (3:35)
[Activity] Run your first Spark program! Ratings histogram example. (6:11)

Spark Basics and Simple Examples

Introduction to Spark (10:11)
The Resilient Distributed Dataset (RDD) (12:35)
Ratings Histogram Walkthrough (13:27)
Key/Value RDDs, and the Average Friends by Age Example (16:08)
[Activity] Running the Average Friends by Age Example (5:40)
Filtering RDDs, and the Minimum Temperature by Location Example (8:11)
[Activity] Running the Minimum Temperature Example, and Modifying it for Maximums (5:06)
[Activity] Running the Maximum Temperature by Location Example (3:19)
[Activity] Counting Word Occurrences using flatMap() (7:24)
[Activity] Improving the Word Count Script with Regular Expressions (4:42)
[Activity] Sorting the Word Count Results (7:46)
[Exercise] Find the Total Amount Spent by Customer (4:01)
[Exercise] Check your Results, and Now Sort them by Total Amount Spent. (5:09)
Check Your Sorted Implementation and Results Against Mine. (2:44)

Advanced Examples of Spark Programs

[Activity] Find the Most Popular Movie (5:53)
[Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers (8:25)
Find the Most Popular Superhero in a Social Graph (4:29)
[Activity] Run the Script - Discover Who the Most Popular Superhero is! (6:00)
Superhero Degrees of Separation: Introducing Breadth-First Search (7:56)
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark (6:44)
[Activity] Superhero Degrees of Separation: Review the Code and Run it (9:35)
Item-Based Collaborative Filtering in Spark, cache(), and persist() (10:10)
[Activity] Running the Similar Movies Script using Spark's Cluster Manager (10:55)
[Exercise] Improve the Quality of Similar Movies (3:05)

Running Spark on a Cluster

Introducing Elastic MapReduce (5:09)
[Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY (9:58)
Partitioning (4:21)
Create Similar Movies from One Million Ratings - Part 1 (5:10)
[Activity] Create Similar Movies from One Million Ratings - Part 2 (11:26)
Create Similar Movies from One Million Ratings - Part 3 (3:30)
Troubleshooting Spark on a Cluster (3:43)
More Troubleshooting, and Managing Dependencies (6:02)

SparkSQL, DataFrames, and Datasets

Introducing SparkSQL (6:08)
Executing SQL commands and SQL-style functions on a DataFrame (8:16)
Using DataFrames instead of RDDs (5:52)

Other Spark Technologies and Libraries

Introducing MLlib (8:09)
[Activity] Using MLlib to Produce Movie Recommendations (2:55)
Analyzing the ALS Recommendations Results (4:53)
Using DataFrames with MLlib (7:31)
Spark Streaming (8:04)
[Activity] Structured Streaming in Python (8:47)
GraphX (2:11)

You Made It! Where to Go from Here.

Learning More about Spark and Data Science (3:43)
