Real Time Spark Project for Beginners: Hadoop, Spark, Docker

Building Real Time Data Pipeline Using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker

Beginner | 0 (0 Ratings) | 1 Student enrolled | English
Created by Handson Hybrid
Last updated Mon, 02-Jan-2023
Course overview

In many data centers, different types of servers generate large amounts of data in real time. Each event in this case is the status of a server in the data center. This data needs to be processed in real time to generate insights for the people who monitor the servers and the data center: they have to track server status regularly and resolve issues as they occur to keep the servers stable. Since the data is large and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies. Hence we want to build a Real Time Data Pipeline using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker to generate insights from this data. The Spark Project/Data Pipeline is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster running on Docker. The data visualization is built using the Django Web Framework and Flexmonster.
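To make the idea of a server-status event concrete, here is a minimal sketch in plain Python of what an event simulator and a per-status count over one batch might look like. The field names (`server_id`, `server_type`, `status`, `event_time`) and the status values are assumptions made for illustration, not the course's actual schema, and the list of JSON strings stands in for messages that would be published to a Kafka topic in the real pipeline.

```python
import json
import random
from collections import Counter
from datetime import datetime, timezone
from typing import List

# Hypothetical values for illustration; the course's simulator may use others.
SERVER_TYPES = ["web", "database", "cache"]
STATUSES = ["OK", "WARNING", "CRITICAL"]


def make_event(rng: random.Random, event_id: int) -> dict:
    """Build one simulated server-status event as a dictionary."""
    return {
        "event_id": event_id,
        "server_id": f"server-{rng.randint(1, 5)}",
        "server_type": rng.choice(SERVER_TYPES),
        "status": rng.choice(STATUSES),
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def simulate(n: int, seed: int = 0) -> List[str]:
    """Return n events serialized as JSON strings (stand-ins for Kafka messages)."""
    rng = random.Random(seed)
    return [json.dumps(make_event(rng, i)) for i in range(n)]


def count_by_status(messages: List[str]) -> Counter:
    """Parse each JSON message in one batch and count events per status."""
    return Counter(json.loads(m)["status"] for m in messages)


if __name__ == "__main__":
    batch = simulate(10)
    print(batch[0])                      # one raw JSON event
    print(dict(count_by_status(batch)))  # event counts per status
```

In the pipeline described above, a count like this would be computed continuously by Spark Structured Streaming rather than over an in-memory list, and the results would be stored in PostgreSQL for the dashboard.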

What will I learn?

  • Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker
  • Setting up Single Node Hadoop and Spark Cluster on Docker
  • Features of Spark Structured Streaming using Spark with Scala
  • Features of Spark Structured Streaming using Spark with Python (PySpark)
  • How to use PostgreSQL with Spark Structured Streaming
  • Basic understanding of Apache Kafka
  • How to build Data Visualization using Django Web Framework and Flexmonster
  • Fundamentals of Docker and Containerization
  • Basic understanding of a programming language
Requirements
  • Basic understanding of a programming language
Curriculum for this course
24 Lessons 06:34:31 Hours
Introduction
2 Lessons 00:41:09 Hours
  • Introduction to Apache Spark
    00:32:28
  • Real Time Spark Project Overview Building End to End Streaming Data Pipeline
    00:08:41
Environment Setup
6 Lessons 01:38:10 Hours
  • Setting up Docker Environment
    00:09:55
  • Create Single Node Kafka Cluster on Docker
    00:08:16
  • Create Single Node Apache Hadoop and Spark Cluster on Docker
    00:35:07
  • Setting up IntelliJ IDEA Community Edition IDE
    00:21:01
  • Setting up PyCharm Community Edition IDE
    00:16:41
  • Setting up Django Web Framework
    00:07:10
Development - Project Code Walk-through
5 Lessons 01:46:25 Hours
  • Event Simulator using Python Server Status Detail
    00:19:16
  • Building Streaming Data Pipeline using Scala Spark Structured Streaming
    00:30:58
  • Building Streaming Data Pipeline using PySpark Spark Structured Streaming
    00:28:54
  • Setting up PostgreSQL Database Events Database
    00:04:56
  • Building Dashboard using Django Web Framework and Flexmonster Visualization
    00:22:21
Complete Project Demo
2 Lessons 00:24:43 Hours
  • Real Time Spark Project Demo
    00:14:32
  • Running Real Time Streaming Data Pipeline using Spark Cluster On Docker
    00:10:11
Bonus Tutorial - Docker Tutorial for Beginners
9 Lessons 02:04:04 Hours
  • Introduction to Docker
    00:11:38
  • Install Docker on Ubuntu 18.04
    00:09:57
  • Docker Commands Commonly Used
    00:10:34
  • Create First Docker Image and Container
    00:09:49
  • Create MySQL Docker Container
    00:10:59
  • Cassandra on Docker Container
    00:09:05
  • MongoDB on Docker Container
    00:08:01
  • Setting up Docker Compose
    00:18:35
  • How to create Docker Volume
    00:35:26
Resources
0 Lessons 00:00:00 Hours
About instructor

Handson Hybrid

0 Reviews | 3 Students | 5 Courses
₹4990 ₹499
Includes: