DATA SCIENCE PROGRAM

Module 1: Python

Introduction

Data Types
Operators
Strings
Lists
Tuples
Sets
Dictionaries
Conditional Statements
Loops
List and Dictionary Comprehensions
Functions
Anonymous Functions
Generators
Modules
Exceptions and Error Handling
Classes and Objects (OOP)
Date And Time
Regex

Module 2: EDA and Data Visualization Using Python Libraries

Exploratory Data Analysis (EDA) using pandas and NumPy

Introduction to pandas library for data manipulation and data analysis.
Overview of NumPy, a fundamental package for scientific computing with Python.
Data cleaning techniques, handling missing values, dealing with outliers.
Statistical analysis of data using NumPy functions.

Data Visualization using the matplotlib and seaborn libraries

Introduction to data visualization.
Exploring different types of plots.
Customizing the plots with labels, titles, colors.
Introduction to seaborn.
Advanced plotting techniques with seaborn.

Module 3: Statistics

Descriptive Statistics

Types of data
Measures of central tendency - Mean, Median, Mode
Measures of dispersion - Variance, Standard deviation, Range, IQR
Measures of shape - Skewness and kurtosis
Covariance
Correlation - Pearson correlation & Spearman's rank correlation
Probability - Events, Sample Space, Mutually exclusive events
Classical and Conditional Probability
Probability distribution - Discrete and Continuous
Uniform Distribution
Expected values, Variance, and means
Gaussian/Normal Distribution
Properties, mean, variance, empirical rule of normal distribution
Standard normal distribution and Z-score
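
As a small illustration of the Z-score topic above, the sketch below standardizes a list of values using only Python's standard library. The function name `z_scores` and the sample data are hypothetical, chosen for illustration; the population standard deviation is assumed.

```python
import statistics

def z_scores(values):
    """Return the z-score of each value: (x - mean) / population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / std for x in values]

# Hypothetical sample data, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
scores = z_scores(heights_cm)
print([round(z, 3) for z in scores])
```

After standardization the values have mean 0 and standard deviation 1, which is what lets them be compared against the standard normal distribution.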

Inferential Statistics

Central Limit Theorem
Hypothesis testing - Null and Alternate hypothesis
Type I and Type II errors
Critical value, significance level, p-value
One-tailed and two-tailed test
T-test - one sample, two-sample, and paired t-test
F-test
One-way and two-way ANOVA
Chi-Square test
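
To make the one-sample t-test listed above concrete, here is a minimal sketch that computes the test statistic t = (sample mean - mu0) / (s / sqrt(n)) by hand. The helper name `one_sample_t` and the sample values are hypothetical. Looking up the p-value from the t distribution is left out, since it would normally be done with a statistics library.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("t statistic is undefined for a constant sample")
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, dof = one_sample_t(sample, mu0=4)
print(round(t_stat, 4), dof)
```

The statistic is then compared with the critical value for the chosen significance level and the given degrees of freedom.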

Module 4: Machine Learning

Introduction to Machine Learning

Introduction to Machine Learning and its types (supervised, unsupervised, reinforcement learning)
Setting up the development environment (Python, Jupyter Notebook, libraries: NumPy, pandas, scikit-learn)
Overview of the Machine Learning workflow and common data preprocessing techniques

Introduction to data science and its applications

Definition of data science and its role in various industries.
Explanation of the data science lifecycle and its key stages.
Overview of the different types of data: structured, unstructured, and semi-structured.
Discussion of the importance of data collection, data quality, and data preprocessing.

Data Engineering and Preprocessing

Introduction to Data Engineering: Data cleaning, transformation, and integration
Data cleaning and Handling missing values: Imputation, deletion, and outlier treatment
Feature Engineering techniques: Creating new features, handling date and time variables, and encoding categorical variables
Data Scaling and Normalization: Standardization, min-max scaling, etc.
Dealing with categorical variables: One-hot encoding, label encoding, etc.
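
The scaling and encoding topics above can be sketched in plain Python as follows. This is only an illustration under simple assumptions (a single numeric column, a single categorical column); the function names and sample values are hypothetical, and in the course these steps would typically be done with a library.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(categories):
    """Return (sorted category names, one 0/1 row per input value)."""
    names = sorted(set(categories))
    rows = [[1 if c == name else 0 for name in names] for c in categories]
    return names, rows

# Hypothetical columns, for illustration only.
ages = [20, 30, 40]
cities = ["Pune", "Delhi", "Pune"]

scaled = min_max_scale(ages)
names, encoded = one_hot(cities)
print(scaled)
print(names, encoded)
```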

Model Evaluation and Hyperparameter Tuning

Cross-validation and model evaluation techniques
Hyperparameter tuning using GridSearchCV and RandomizedSearchCV
Model selection and comparison
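
The idea behind k-fold cross-validation can be sketched without any library: split the sample indices into k folds and let each fold serve once as the test set. The generator `k_fold_indices` below is a hypothetical helper written for illustration, and it assumes contiguous folds with no shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fitted on each `train` split and scored on the matching `test` split; the average score over the folds is what a grid or randomized search compares across hyperparameter settings.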

Supervised Learning - Regression

Introduction to Regression: Definition, types, and use cases
Linear Regression: Theory, cost function, gradient descent, residual analysis, Q-Q Plot, Interaction Terms, and assumptions
Polynomial Regression: Adding polynomial terms, degree selection, and overfitting
Lasso and Ridge Regression: Regularization techniques for controlling model complexity
Evaluation metrics for regression models: Mean Squared Error (MSE), R-squared, and Mean Absolute Error (MAE)
Hands-On - House Price Prediction
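
As a rough sketch of the cost function and gradient descent topics above, the code below fits a line y = w * x + b by batch gradient descent on mean squared error. The function name, learning rate, epoch count, and data are assumptions made for illustration, not values prescribed by the course.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

With a learning rate that is too large for the data the updates diverge instead of converging, which is one reason feature scaling is usually applied first.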

Supervised Learning - Classification

Introduction to Classification: Definition, types, and use cases
Logistic Regression: Theory, logistic function, binary and multiclass classification
Decision Trees: Construction, splitting criteria, pruning, and visualization
Random Forests: Ensemble learning, bagging, and feature importance
Evaluation metrics for classification models: Accuracy, Precision, Recall, F1-score, and ROC curves
Implementation of classification models using scikit-learn library
Hands-On - Heart Disease Detection & Food Order Prediction
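
The precision, recall, and F1-score metrics listed above can be computed directly from true and predicted labels. The sketch below assumes binary labels with 1 as the positive class; the function name and the label lists are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Returning 0.0 when a denominator is zero is a convention chosen here for simplicity; libraries may warn or behave differently in that case.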

SVM, KNN & Naive Bayes

Support Vector Machines (SVM): Study SVM theory, different kernel functions (linear, polynomial, radial basis function), and the margin concept. Implement SVM classification and regression, and evaluate the models.
K-Nearest Neighbors (KNN): Understand the KNN algorithm, distance metrics, and the concept of K in KNN. Implement KNN classification and regression, and evaluate the models.
Naive Bayes: Learn about the Naive Bayes algorithm, conditional probability, and Bayes' theorem. Implement Naive Bayes classification, and evaluate the model's performance.
Hands-On - Contact Tracing & Sarcasm Detection

Ensemble Methods and Boosting

AdaBoost: Boosting technique, weak learners, and iterative weight adjustment
Gradient Boosting (XGBoost): Gradient boosting algorithm, Regularization, and hyperparameter tuning
Evaluation and fine-tuning of ensemble models: Cross-validation, grid search, and model selection
Handling imbalanced datasets: Techniques for dealing with class imbalance, such as oversampling and undersampling
Hands-On - Medical Insurance Price Prediction

Unsupervised Learning - Clustering

Introduction to Clustering: Definition, types, and use cases
K-means Clustering: Algorithm steps, initialization methods, and elbow method for determining the number of clusters
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Core points, density reachability, and epsilon-neighborhoods
Evaluation of clustering algorithms: Silhouette score, cohesion, and separation metrics
Hands-On - Credit Card Clustering
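
The K-means algorithm steps mentioned above (assign each point to its nearest centroid, then move each centroid to the mean of its cluster) can be sketched as follows. For reproducibility this illustration assumes the initial centroids are passed in explicitly rather than chosen at random; the function name and the points are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid-update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
```

Running this for several values of k and plotting the within-cluster sum of squared distances is the basis of the elbow method.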

Unsupervised Learning - Dimensionality Reduction

Introduction to Dimensionality Reduction: Curse of dimensionality, feature extraction, and feature selection
Principal Component Analysis (PCA): Eigenvectors, eigenvalues, variance explained, and dimensionality reduction
Implementation of PCA using scikit-learn library
Hands-On - MNIST Data

Recommendation Systems

Introduction to Recommendation Systems: Understand the concept of recommendation systems, different types (collaborative filtering, content-based, hybrid), and evaluation metrics.
Collaborative Filtering: Explore collaborative filtering techniques, including user-based and item-based approaches, and implement a collaborative filtering model.
Content-Based Filtering: Study content-based filtering methods, such as TF-IDF and cosine similarity, and build a content-based recommendation system.
Deployment and Future Directions: Discuss the deployment of recommendation systems and explore advanced topics in NLP and recommendation systems.
Hands-On - News Recommendation System
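
A content-based recommender built on TF-IDF and cosine similarity, as described above, can be sketched in a few lines. This uses one common TF-IDF variant (term frequency times log(N / df) + 1) and whitespace tokenization; both are simplifying assumptions, and library implementations weight and normalize differently. The article titles are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one TF-IDF vector per document over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] / len(toks) * idf[w] for w in vocab])
    return vectors

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical article titles, for illustration only (documents must be non-empty).
docs = ["python data science", "python machine learning", "cooking pasta recipes"]
vecs = tf_idf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(round(sim_related, 3), sim_unrelated)
```

To recommend items for an article a user has read, the remaining articles would be ranked by their similarity to it.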

Reinforcement Learning

Introduction to Reinforcement Learning: Agent, environment, state, action, and reward
Markov Decision Processes (MDP): Markov property, transition probabilities, and value functions
Q-Learning algorithm: Exploration vs. exploitation, Q-table, and learning rate
Hands-on reinforcement learning projects and exercises
Hands-On - Working with OpenAI Gym
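
The Q-table and learning rate topics above come together in the Q-learning update rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)). The sketch below applies a single update to a tiny hand-written table; the state names, action names, reward, and the alpha and gamma values are assumptions chosen for illustration.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    best_next = max(q_table[next_state].values())  # greedy value of the next state
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state table, for illustration only.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}
new_value = q_update(q_table, "s0", "right", reward=0.5, next_state="s1")
print(round(new_value, 4))
```

In a full agent this update runs after every environment step, with actions chosen by an exploration strategy such as epsilon-greedy.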

Developing API using Flask / Webapp with Streamlit

Introduction to Flask / Streamlit web framework
Creating a Flask / Streamlit application for ML model deployment
Integrating data preprocessing and ML model
Designing a user-friendly web interface

Deployment of ML Models

Building a web application for Machine Learning models: Creating forms, handling user input, and displaying results
Deployment using AWS (Amazon Web Services): Setting up an AWS instance, configuring security groups, and deploying the application
Deployment using PythonAnywhere: Uploading Flask application files, configuring WSGI, and launching the application

Project Work and Consolidation

Work on a real-world Machine Learning project: Identify a problem, gather data, and define project scope
Apply the learned concepts and algorithms: Data collection, preprocessing, model building, and evaluation
Deployment of the project on AWS or PythonAnywhere: Showcase the developed application and share the project with others
Presentation and discussion of the project: Demonstrate the project, explain design decisions, and receive feedback

Module 5: NLP

Natural Language Processing (NLP)

Introduction to NLP: Understand the basics of NLP, its applications, and challenges.
Named Entity Recognition (NER): Understand the various approaches and tools used for NER, such as rule-based systems, statistical models, and deep learning.
Text Preprocessing: Learn about tokenization, stemming, lemmatization, stop word removal, and other techniques for text preprocessing.
Text Representation: Explore techniques such as Bag-of-Words (BoW), TF-IDF, and word embeddings (e.g., Word2Vec, GloVe) for representing text data.
Sequential Models: Introduction to RNN, LSTM, Hands-on Keras LSTM.
Sentiment Analysis: Study sentiment analysis techniques, build a sentiment analysis model using supervised learning, and evaluate its performance.
Hands-On: Real-Time Sentiment Analysis.

Module 6: Deep Learning

Rise of Deep Learning

Introduction
History of Deep Learning
Perceptrons
Multi-Layer Perceptrons
Representations
Training Neural Networks
Activation Functions

Artificial Neural Networks

Introduction
Deep Learning
Understanding Human Brain
In-Depth Perceptrons
Example for Perceptron
Multi Classifier
Neural Networks
Input Layer
Output Layer
Sigmoid Function
Introduction to TensorFlow and Keras
CPU VS GPU
Introduction to Google Colaboratory (Colab)
Training Neural Network
Understanding Notations
Activation Functions
Hyperparameter Tuning in Keras
Feed-Forward Networks
Online vs. Offline Mode
Bidirectional RNN
Understanding Dimensions
Back Propagation
Loss Function
SGD
Regularization
Training for Batches
Hands-On: Facial Emotion Recognition

Computer Vision and Machine Learning Modules

CNN - Convolutional Neural Networks

Introduction to CNN
Applications of CNN
Idea behind CNN
Understanding Images
Understanding Videos
Convolutions
Striding and Padding
Max Pooling
Edges, Gradients, and Textures
Understanding Channels
Formulas
Weight and Bias
Feature Map
Pooling
Combining
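
For the convolution, striding, and padding topics above, a short sketch may help. The first function gives the standard output-size formula, (n + 2p - f) // s + 1, for one spatial dimension; the second slides a kernel over a small image with stride 1 and no padding, without flipping the kernel (the cross-correlation form that deep learning libraries usually call convolution). Function names and the sample arrays are hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension for a convolution or pooling window."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh)
    out_w = conv_output_size(len(image[0]), kw)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 3x3 "image" and 2x2 kernel, for illustration only.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
print(conv_output_size(28, 3), conv_output_size(28, 3, padding=1), conv_output_size(28, 2, stride=2))
```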

CNN - Transfer Learning

Introduction
AlexNet
GoogLeNet
ResNet
Transfer learning using Keras
Hands-On - Face Mask Detection

RNN - Recurrent Neural Networks

Introduction to RNNs
Training RNNs
RNN Formula
Architecture
Batch Data
Simplified Notations
Types of RNNs
LSTM
GRUs
Training RNN
One to many
Vanishing Gradient problem
Hands-On - COVID-19 Cases Prediction

Generative Models and GANs

Introduction to Generative Models
Understanding GANs (Generative Adversarial Networks)
GAN Architecture
GAN Training
Evaluating GAN Performance
GAN Variants and Applications

Computer Vision

Intro to OpenCV
Reading and Writing Images
Saving images
Draw shapes using OpenCV
Face detection and eye detection using OpenCV
CNN with Keras
VGG
Hands-On - Real Time Pose Estimator

Projects & Case Study

Real-Time Rain Prediction using ML

Install necessary libraries
Obtain an API key
Fetch live weather data
Preprocess the data
Train a machine learning model
Evaluate the model
Integrate the model with Flask
Display the results
Test and debug
Deploy the application
Continuously update the weather data

Real Time Drowsiness Detection Alert System

Dataset collection
Data preprocessing
Feature extraction
Labeling
Model selection
Model training
Model evaluation
Real-time implementation
Alert mechanism
Continuous improvement

House Price Prediction using LSTM

Identify a reliable source for house price data
Understand the website structure
Perform web scraping
Preprocess the scraped data
Define the problem
Split the data
Train the model
Evaluate the model
Fine-tune the model (optional)
Deploy the model
Continuously update the dataset and retrain the model

Customizable Chatbot using OpenAI API

Define chatbot goals and scope
Gather training data
Data preprocessing
API integration
Model customization
User input handling
Response generation
Post-processing and filtering
Error handling and fallback mechanisms
Continuous improvement

Fire and Smoke Detection using CNN

Data collection
Data preprocessing
Dataset augmentation
Model architecture
Training
Model evaluation
Fine-tuning
Real-time inference
Thresholding and alerts
Model optimization
