Real Time Spark Project for Beginners: Hadoop, Spark, Docker

Course overview

In many data centers, different types of servers generate large amounts of data in real time. Each event in this case is the status of a server in the data center. This data needs to be processed in real time to generate insights for the people who monitor the servers and the data center: they have to track server status regularly and resolve issues as they occur, for better server stability. Since the data is huge and arrives in real time, we need to choose the right architecture with scalable storage and computation frameworks. Hence we build a real-time data pipeline using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker to generate insights from this data. The Spark project (the data pipeline) is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster that runs on Docker. The data visualization is built using the Django web framework and Flexmonster.
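To make the idea of a server-status event concrete, here is a minimal sketch in Python of how such events might be simulated and summarized. The field names (`server_id`, `status`, `event_time`), the server IDs, and the status values are assumptions made for illustration; the course's actual event schema is not shown in this overview. The sketch uses only the standard library and does not talk to Kafka or Spark.

```python
import json
import random
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical values: the real event schema is not given in the course overview.
SERVER_IDS = ["server-01", "server-02", "server-03"]
STATUSES = ["RUNNING", "DEGRADED", "DOWN"]


def make_event(rng: random.Random) -> Dict[str, str]:
    """Build one simulated server-status event."""
    return {
        "server_id": rng.choice(SERVER_IDS),
        "status": rng.choice(STATUSES),
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def simulate(count: int, seed: int = 0) -> List[str]:
    """Return `count` events serialized as JSON strings, one per message."""
    rng = random.Random(seed)
    return [json.dumps(make_event(rng)) for _ in range(count)]


def count_by_status(messages: List[str]) -> Dict[str, int]:
    """Tally events per status, the kind of aggregate a dashboard could show."""
    counts: Dict[str, int] = {}
    for message in messages:
        status = json.loads(message)["status"]
        counts[status] = counts.get(status, 0) + 1
    return counts


if __name__ == "__main__":
    messages = simulate(5)
    for message in messages:
        print(message)
    print(count_by_status(messages))
```

In a pipeline like the one described, each JSON string would presumably be published to a Kafka topic, and the per-status tally would be computed by Spark Structured Streaming rather than by a plain Python loop.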

What will I learn?

Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker
Setting up Single Node Hadoop and Spark Cluster on Docker
Features of Spark Structured Streaming using Spark with Scala
Features of Spark Structured Streaming using Spark with Python (PySpark)
How to use PostgreSQL with Spark Structured Streaming
Basic understanding of Apache Kafka
How to build Data Visualization using Django Web Framework and Flexmonster
Fundamentals of Docker and Containerization
Basic understanding of a programming language

Requirements

Basic understanding of a programming language

Curriculum for this course

24 Lessons 06:34:31 Hours

Introduction

2 Lessons 00:41:09 Hours

Introduction to Apache Spark
00:32:28
Real Time Spark Project Overview Building End to End Streaming Data Pipeline
00:08:41

Environment Setup

6 Lessons 01:38:10 Hours

Setting up Docker Environment
00:09:55
Create Single Node Kafka Cluster on Docker
00:08:16
Create Single Node Apache Hadoop and Spark Cluster on Docker
00:35:07
Setting up IntelliJ IDEA Community Edition IDE
00:21:01
Setting up PyCharm Community Edition IDE
00:16:41
Setting up Django Web Framework
00:07:10

Development - Project Code Walk-through

5 Lessons 01:46:25 Hours

Event Simulator using Python Server Status Detail
00:19:16
Building Streaming Data Pipeline using Scala Spark Structured Streaming
00:30:58
Building Streaming Data Pipeline using PySpark Spark Structured Streaming
00:28:54
Setting up PostgreSQL Database Events Database
00:04:56
Building Dashboard using Django Web Framework and Flexmonster Visualization
00:22:21

Complete Project Demo

2 Lessons 00:24:43 Hours

Real Time Spark Project Demo
00:14:32
Running Real Time Streaming Data Pipeline using Spark Cluster On Docker
00:10:11

Bonus Tutorial - Docker Tutorial for Beginners

9 Lessons 02:04:04 Hours

Introduction to Docker
00:11:38
Install Docker on Ubuntu 18.04
00:09:57
Docker Commands Commonly Used
00:10:34
Create First Docker Image and Container
00:09:49
Create MySQL Docker Container
00:10:59
Cassandra on Docker Container
00:09:05
MongoDB on Docker Container
00:08:01
Setting up Docker Compose
00:18:35
How to create Docker Volume
00:35:26

Resources

0 Lessons 00:00:00 Hours

About instructor

Handson Hybrid

0 Reviews | 3 Students | 5 Courses

Student feedback

0 Reviews

₹4990 ₹499

Includes:

06:34:31 Hours of on-demand videos
24 Lessons
Access on mobile and TV
Full lifetime access