CII-IIT Madras Certification Course on Big Data: 20 Hrs: Aug/Sep '21

IIT Madras – CII Certification Course

“Harnessing the Power of Big Data”

Data -> Insights -> Action

August 07, 14, 21 & 28 and September 04, 2021: Virtual

(Total Duration: 20 hours: 4 hours per day on 5 Saturdays)

Big Data is the term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data that is important. It is what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

The spread of the COVID-19 global pandemic has generated an extraordinary and rapidly growing volume of data. Harnessing this data can improve our understanding of big data management and help us better anticipate and respond to such unforeseen ‘black swan’ events and risks. For instance, firms are using big data analytics to manage and mitigate uncertainties and bottlenecks in supply chains.

With this background, the Confederation of Indian Industry (CII) & the Indian Institute of Technology Madras (IITM) are jointly delivering a Certification Course on "Harnessing the Power of Big Data" on August 07, 14, 21 & 28 and September 04, 2021. The total duration of this certification course will be 20 hours, at 4 hours per day, including the theory and hands-on aspects.

This certificate course will outline how businesses can harness big data to study emerging challenges and address them for better decision making. It is designed based on the job requirements of a Big Data Analyst and will have an equal balance of theory and practical knowledge. It aims to get delegates started on using big data platforms, programming elementary applications on Apache Spark, and using big data frameworks.

This course will be handled by IITM Professors, Industry Leaders and Experts from GITAA Pvt Ltd. GITAA’s data practitioners bring a unique perspective based on their experience in the big data analytics sector, which also enables them to illustrate key aspects through real-life case studies. The course uses a case-study methodology to help participants understand the nuances of dealing with big data and to give them real-life project experience.

Components of the course

Interactive live online hands-on sessions using a case-study approach
Each module will have an assignment that will need to be completed

Module Name

Topics covered

Introduction to big data & PySpark Installation

Introduction to big data
Characteristics of big data
Challenges with big data
Big data frameworks
Framework for solving data science problems
Typology of data science problems
Installing and configuring Python, Spark, and Jupyter
Basics of PySpark: illustration using examples

Distributed Computing

What and Why of Distributed Systems?
Distributed File System
Distributed Programming Model
Parallel Processing explained with WordCount
Concept of Cloud Computing
Big Data and Cloud Computing – Benefits

Getting started with Spark
Understanding the Spark environment with Spark-Shell & User Interface

Hadoop and MapReduce

Introduction to Hadoop
How does MapReduce work?
Parallelism in MapReduce
Example: K-means Clustering – Sequential and with MapReduce
When does MapReduce work, and why?
