DATA SCIENCE PROGRAM



Module 1: Python

Introduction

  • Data Types
  • Operators
  • Strings
  • Lists
  • Tuples
  • Sets
  • Dictionaries
  • Conditional Statements
  • Loops
  • List and Dictionary Comprehensions
  • Functions
  • Anonymous Functions
  • Generators
  • Modules
  • Exceptions and Error Handling
  • Classes and Objects (OOP)
  • Date and Time
  • Regex
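A compact sketch touching several Module 1 topics: comprehensions, a lambda sort key, a generator, exception handling, a class with date arithmetic, and a regex. The names `countdown`, `safe_divide`, and `Student` are hypothetical examples, not part of any library.

```python
import re
from datetime import date

# List and dictionary comprehensions
squares = [n * n for n in range(6)]                        # [0, 1, 4, 9, 16, 25]
even_squares = {n: n * n for n in range(6) if n % 2 == 0}  # {0: 0, 2: 4, 4: 16}

# Anonymous function (lambda) as a sort key: by length, then alphabetically
libraries = ["pandas", "numpy", "seaborn", "scipy"]
by_length = sorted(libraries, key=lambda name: (len(name), name))

# Generator: produces values lazily, one at a time
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# Exception handling: return None instead of crashing on division by zero
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

# Classes and objects, combined with date arithmetic
class Student:
    def __init__(self, name, enrolled_on):
        self.name = name
        self.enrolled_on = enrolled_on

    def days_enrolled(self, today):
        return (today - self.enrolled_on).days

# Regex: extract every integer from a string
numbers = [int(match) for match in re.findall(r"\d+", "Module 4 has 12 topics")]
```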

Module 2: EDA and Data Visualization Using Python Libraries

Exploratory Data Analysis (EDA) using pandas and NumPy

  • Introduction to the pandas library for data manipulation and data analysis.
  • Overview of NumPy, a fundamental package for scientific computing with Python.
  • Data cleaning techniques: handling missing values and dealing with outliers.
  • Statistical analysis of data using NumPy functions.
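The cleaning steps above can be sketched with Python's standard `statistics` module; in practice pandas and NumPy would normally do this work (for example with `Series.fillna`). This sketch assumes missing values are stored as `None`, and it uses median imputation plus Tukey's 1.5 × IQR fences, which is one common outlier rule among several. The `ages` data is made up for illustration.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical column of ages with two missing values and one extreme value
ages = [23, 25, None, 31, 29, None, 27, 95]
cleaned = impute_median(ages)
print(cleaned)                # the two None entries become 28.0
print(iqr_outliers(cleaned))  # [95]
```

The median is used instead of the mean here because the extreme value (95) would pull a mean-based fill upward.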

Data Visualization using matplotlib and seaborn libraries

  • Introduction to data visualization.
  • Exploring different types of plots.
  • Customizing plots with labels, titles, and colors.
  • Introduction to seaborn.
  • Advanced plotting techniques with seaborn.

Module 3: Statistics

Descriptive Statistics

  • Types of data
  • Measures of central tendency - Mean, median, mode
  • Measures of dispersion - Variance, standard deviation, range, IQR
  • Measures of shape - Skewness and kurtosis
  • Covariance
  • Correlation - Pearson correlation & Spearman's rank correlation
  • Probability - Events, Sample Space, Mutually exclusive events
  • Classical and Conditional Probability
  • Probability distribution - Discrete and Continuous
  • Uniform Distribution
  • Expected value, variance, and mean
  • Gaussian/Normal Distribution
  • Properties, mean, variance, and the empirical rule of the normal distribution
  • Standard normal distribution and Z-score
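A minimal sketch of the descriptive statistics above using only the standard library (`statistics.NormalDist` needs Python 3.8+). The data list is a made-up example chosen so the results are easy to check by hand; `pearson` is a hypothetical helper written out explicitly to show the formula.

```python
import statistics
from statistics import NormalDist

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # 5
median = statistics.median(data)        # 4.5
mode = statistics.mode(data)            # 4
value_range = max(data) - min(data)     # 7
pop_std = statistics.pstdev(data)       # population standard deviation: 2.0
sample_var = statistics.variance(data)  # sample variance, divides by n - 1

# Z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mean) / pop_std for x in data]

def pearson(x, y):
    """Pearson correlation: sample covariance over the product of sample standard deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Empirical rule: about 68% of a normal distribution lies within 1 standard deviation
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)
```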

Inferential Statistics

  • Central Limit Theorem
  • Hypothesis testing - Null and alternative hypotheses
  • Type I and Type II errors
  • Critical value, significance level, p-value
  • One-tailed and two-tailed tests
  • t-test - one-sample, two-sample, and paired t-test
  • F-test
  • One-way and two-way ANOVA
  • Chi-Square test
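As an illustration of the hypothesis-testing workflow, here is a one-sample t-test computed by hand. The delivery-time data is hypothetical, and the critical value 2.262 is the standard t-table entry for a two-tailed test at α = 0.05 with 9 degrees of freedom. A library such as SciPy (`scipy.stats.ttest_1samp`) would also report an exact p-value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    return (mean - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical delivery times in minutes; H0: mu = 30, H1: mu != 30
times = [31, 29, 34, 35, 30, 33, 32, 36, 28, 32]
t_stat, df = one_sample_t(times, mu0=30)

T_CRITICAL = 2.262  # two-tailed critical value for alpha = 0.05, df = 9 (standard t-table)
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, reject_h0)  # 2.449 9 True
```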

Module 4: Machine Learning

Introduction to Machine Learning

  • Introduction to Machine Learning and its types (supervised, unsupervised, reinforcement learning)
  • Setting up the development environment (Python, Jupyter Notebook, libraries: NumPy, pandas, scikit-learn)
  • Overview of the Machine Learning workflow and common data preprocessing techniques

Introduction to data science and its applications

  • Definition of data science and its role in various industries.
  • Explanation of the data science lifecycle and its key stages.
  • Overview of the different types of data: structured, unstructured, and semi-structured.
  • Discussion of the importance of data collection, data quality, and data preprocessing.

Data Engineering and Preprocessing

  • Introduction to Data Engineering: Data cleaning, transformation, and integration
  • Data cleaning and handling missing values: Imputation, deletion, and outlier treatment
  • Feature Engineering techniques: Creating new features, handling date and time variables, and encoding categorical variables
  • Data Scaling and Normalization: Standardization, min-max scaling, etc.
  • Dealing with categorical variables: One-hot encoding, label encoding, etc.
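The scaling and encoding techniques above reduce to a few lines each. This is a plain-Python sketch for illustration; in the course these would typically be scikit-learn transformers. The function names are hypothetical, and `min_max_scale` assumes the values are not all identical.

```python
import statistics

def min_max_scale(values):
    """Rescale to [0, 1]; assumes max(values) != min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Z-score standardization: subtract the mean, divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def label_encode(categories):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping

def one_hot_encode(categories):
    """One binary column per distinct category, in sorted order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories], levels
```

In a real pipeline the min/max or mean/standard deviation should be computed on the training split only and then reused on the test split, to avoid data leakage. Label encoding implies an order between categories, which is why one-hot encoding is usually preferred for nominal features.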

Model Evaluation and Hyperparameter Tuning

  • Cross-validation and model evaluation techniques
  • Hyperparameter tuning using GridSearchCV and RandomizedSearchCV
  • Model selection and comparison
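Cross-validation can be sketched as a loop over k train/test splits. This version uses contiguous folds without shuffling, where the first `n_samples % k` folds get one extra sample (the same fold sizing as scikit-learn's `KFold` with `shuffle=False`). The `fit_mean`/`neg_mae` pair is a hypothetical baseline model used only to show the plumbing. `GridSearchCV` runs this kind of loop for every hyperparameter combination and keeps the one with the best mean score.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_score(fit, score, X, y, k=5):
    """Train on k-1 folds, score on the held-out fold, and collect one score per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return scores

# Hypothetical baseline: "model" is just the mean of the training targets
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def neg_mae(model, X_test, y_test):
    return -sum(abs(t - model) for t in y_test) / len(y_test)
```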

Supervised Learning - Regression

  • Introduction to Regression: Definition, types, and use cases
  • Linear Regression: Theory, cost function, gradient descent, residual analysis, Q-Q Plot, Interaction Terms, and assumptions
  • Polynomial Regression: Adding polynomial terms, degree selection, and overfitting
  • Lasso and Ridge Regression: Regularization techniques for controlling model complexity
  • Evaluation metrics for regression models: Mean Squared Error (MSE), R-squared, and Mean Absolute Error (MAE)
  • Hands-On - House Price Prediction
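A minimal sketch of simple linear regression trained by batch gradient descent on the MSE cost, together with the three evaluation metrics listed above. The learning rate and epoch count are assumptions tuned for this toy data; a much larger learning rate can make the updates diverge.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w*x + b by batch gradient descent on the MSE cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J(w, b) = (1/n) * sum((w*x + b - y)**2)
        dw = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        db = (2 / n) * sum(errors)
        w -= lr * dw
        b -= lr * db
    return w, b

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

w, b = fit_linear_gd([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # data generated from y = 2x + 1
```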

Supervised Learning - Classification

  • Introduction to Classification: Definition, types, and use cases
  • Logistic Regression: Theory, logistic function, binary and multiclass classification
  • Decision Trees: Construction, splitting criteria, pruning, and visualization
  • Random Forests: Ensemble learning, bagging, and feature importance
  • Evaluation metrics for classification models: Accuracy, Precision, Recall, F1-score, and ROC curves
  • Implementation of classification models using scikit-learn library
  • Hands-On - Heart Disease Detection & Food Order Prediction
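The classification metrics above follow directly from confusion-matrix counts. This sketch thresholds logistic (sigmoid) outputs at 0.5 and then computes accuracy, precision, recall, and F1. The scores and labels are hypothetical, and `binary_metrics` is an illustrative helper, not scikit-learn's API.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def binary_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}

# Hypothetical model scores turned into probabilities, then thresholded at 0.5
probabilities = [sigmoid(z) for z in (2.0, -1.5, -0.3, 1.2, 0.4, 3.1)]
y_pred = [1 if p >= 0.5 else 0 for p in probabilities]
y_true = [1, 0, 1, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```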

SVM, KNN & Naive Bayes

  • Support Vector Machines (SVM): Study SVM theory, different kernel functions (linear, polynomial, radial basis function), and the margin concept. Implement SVM classification and regression, and evaluate the models.
  • K-Nearest Neighbors (KNN): Understand the KNN algorithm, distance metrics, and the concept of K in KNN. Implement KNN classification and regression, and evaluate the models.
  • Naive Bayes: Learn about the Naive Bayes algorithm, conditional probability, and Bayes' theorem. Implement Naive Bayes classification and evaluate the model's performance.
  • Hands-On - Contact Tracing & Sarcasm Detection
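To make the KNN idea concrete, here is a minimal classifier using Euclidean distance (`math.dist`, Python 3.8+) and a majority vote among the k nearest training points. The two toy clusters are hypothetical. Because KNN is distance-based, features should normally be scaled first, and an odd k helps avoid tied votes in binary problems.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label the query point by majority vote among its k nearest training points."""
    distances = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # a
print(knn_predict(train_X, train_y, (8.5, 8.5)))  # b
```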

Ensemble Methods and Boosting

  • AdaBoost: Boosting technique, weak learners, and iterative weight adjustment
  • Gradient Boosting (XGBoost): Gradient boosting algorithm, Regularization, and hyperparameter tuning
  • Evaluation and fine-tuning of ensemble models: Cross-validation, grid search, and model selection
  • Handling imbalanced datasets: Techniques for dealing with class imbalance, such as oversampling and undersampling
  • Hands-On - Medical Insurance Price Prediction
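The "iterative weight adjustment" in AdaBoost can be shown with a single boosting round, assuming labels in {-1, +1} and sample weights that sum to 1. The weak learner's predictions below are hypothetical. The learner's vote weight is α = ½·ln((1 - ε)/ε), where ε is its weighted error. Misclassified samples are scaled up by e^α, correct ones down by e^-α, and after renormalization the misclassified samples always carry exactly half the total weight.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost re-weighting step for labels in {-1, +1}.

    Assumes the weights sum to 1 and the weighted error is strictly between 0 and 1.
    Returns the learner's vote weight alpha and the renormalized sample weights.
    """
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - error) / error)
    # t * p is -1 for a mistake (weight grows by e^alpha) and +1 otherwise (shrinks by e^-alpha)
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]  # the weak learner gets only the last sample wrong
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```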

Unsupervised Learning - Clustering

  • Introduction to Clustering: Definition, types, and use cases
  • K-means Clustering: Algorithm steps, initialization methods, and elbow method for determining the number of clusters
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Core points, density reachability, and epsilon-neighborhoods
  • Evaluation of clustering algorithms: Silhouette score, cohesion, and separation metrics
  • Hands-On - Credit Card Clustering
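The K-means steps above (initialize, assign, update, repeat until stable) can be sketched as follows. For initialization this uses a deterministic farthest-point heuristic, a simplification in the spirit of k-means++ rather than k-means++ itself. The function also returns the inertia (sum of squared distances to the assigned centroid), which is the quantity plotted against k in the elbow method. The points are a made-up example.

```python
import math

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: each centroid moves to the mean of its members (kept in place if empty)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            )
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return centroids, labels, inertia

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10)]
centroids, labels, inertia = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

For the elbow method, one would call `kmeans` for k = 1, 2, 3, ... and look for the k where the inertia stops dropping sharply.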

Unsupervised Learning - Dimensionality Reduction

  • Introduction to Dimensionality Reduction: Curse of dimensionality, feature extraction, and feature selection
  • Principal Component Analysis (PCA): Eigenvectors, eigenvalues, variance explained, and dimensionality reduction
  • Implementation of PCA using scikit-learn library
  • Hands-On - MNIST Data
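PCA's first principal component is the top eigenvector of the data's covariance matrix. As an illustration without NumPy, this sketch finds it by power iteration (repeatedly multiplying a vector by the covariance matrix and renormalizing). It then reports the explained-variance ratio, which is that eigenvalue divided by the trace (the sum of all eigenvalues). The all-ones starting vector is an assumption. Power iteration can converge slowly when the top two eigenvalues are close, and scikit-learn's `PCA` uses an SVD instead.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix (divides by n - 1) of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]

def first_principal_component(data, iterations=1000, tol=1e-12):
    """Top covariance eigenvector via power iteration, plus its explained-variance ratio."""
    cov = covariance_matrix(data)
    d = len(cov)
    v = [1.0] * d  # assumed start; must not be orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))  # Rayleigh quotient v^T C v for unit-length v
    total_variance = sum(cov[i][i] for i in range(d))  # trace = sum of all eigenvalues
    return v, eigenvalue / total_variance

# Points lying exactly on the line y = 2x: one component explains all the variance
component, ratio = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
```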

Recommendation Systems

  • Introduction to Recommendation Systems: Understand the concept of recommendation systems, different types (collaborative filtering, content-based, hybrid), and evaluation metrics.
  • Collaborative Filtering: Explore collaborative filtering techniques, including user-based and item-based approaches, and implement a collaborative filtering model.
  • Content-Based Filtering: Study content-based filtering methods, such as TF-IDF and cosine similarity, and build a content-based recommendation system.
  • Deployment and Future Directions: Discuss the deployment of recommendation systems and explore advanced topics in NLP and recommendation systems.
  • Hands-On - News Recommendation System
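A content-based recommender in miniature: represent each article as a TF-IDF vector (term frequency × log(N / document frequency)), then recommend the articles with the highest cosine similarity to a given one. The headlines are hypothetical, and tokenization is a plain lowercase whitespace split. scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default, so its weights will differ from this plain `log(N / df)` version.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One sparse {term: weight} dict per document, with tf = count/len and idf = log(N/df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (c / len(tokens)) * idf[term] for term, c in counts.items()})
    return vectors

def cosine_similarity(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(docs, index, top_n=2):
    """Indices of the top_n documents most similar to docs[index], excluding itself."""
    vectors = tfidf_vectors(docs)
    scores = [(i, cosine_similarity(vectors[index], v)) for i, v in enumerate(vectors) if i != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:top_n]]

articles = [
    "stock market rally lifts shares",
    "shares fall as stock market slides",
    "local team wins football final",
    "football fans celebrate team victory",
]
print(recommend(articles, 0, top_n=1))  # [1]
```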

Reinforcement Learning

  • Introduction to Reinforcement Learning: Agent, environment, state, action, and reward
  • Markov Decision Processes (MDP): Markov property, transition probabilities, and value functions
  • Q-Learning algorithm: Exploration vs. exploitation, Q-table, and learning rate
  • Hands-on reinforcement learning projects and exercises
  • Hands-On - Working with OpenAI Gym
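Q-learning with an ε-greedy policy can be shown on a tiny hand-written environment before moving to OpenAI Gym. The environment here is a hypothetical five-cell corridor, not a Gym environment: the agent starts at cell 0 and earns reward 1 for reaching cell 4. The hyperparameters (α = 0.5, γ = 0.9, ε = 0.1) are illustrative choices. Each update moves Q(s, a) toward r + γ·max Q(s′, ·).

```python
import random

N_STATES = 5       # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy environment: reward 1.0 only when the goal cell is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # one row per state
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q_table[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target: reward plus discounted best value of the next state (no future value if terminal)
            target = reward if done else reward + gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (target - q_table[state][action])  # alpha = learning rate
            state = next_state
            if done:
                break
    return q_table

q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # [1, 1, 1, 1]: always move right toward the goal
```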

Developing an API Using Flask / Web App with Streamlit

  • Introduction to the Flask / Streamlit web frameworks
  • Creating a Flask / Streamlit application for ML model deployment
  • Integrating data preprocessing and ML model
  • Designing a user-friendly web interface
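A minimal sketch of a Flask prediction endpoint, assuming Flask is installed. The route `/predict`, the JSON field names, and `predict_price` are hypothetical. `predict_price` is a placeholder formula standing in for a trained model, which in practice would be loaded once at startup. Input validation doubles as the "integrate preprocessing" step: bad input returns HTTP 400 instead of crashing the server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_price(area_sqft, bedrooms):
    """Hypothetical placeholder for a trained model (e.g. one loaded from disk at startup)."""
    return 50_000 + 120 * area_sqft + 10_000 * bedrooms

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    try:
        # Preprocess raw input the same way as during training (here: simple type casting)
        area = float(payload["area_sqft"])
        bedrooms = int(payload["bedrooms"])
    except (KeyError, TypeError, ValueError):
        return jsonify({"error": "expected numeric 'area_sqft' and 'bedrooms'"}), 400
    return jsonify({"predicted_price": predict_price(area, bedrooms)})
```

For local development, `app.run(debug=True)` starts Flask's built-in server. `app.test_client()` can exercise the endpoint without starting a server at all.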

Deployment of ML Models

  • Building a web application for Machine Learning models: Creating forms, handling user input, and displaying results
  • Deployment using AWS (Amazon Web Services): Setting up an AWS instance, configuring security groups, and deploying the application
  • Deployment using PythonAnywhere: Uploading Flask application files, configuring WSGI, and launching the application

Project Work and Consolidation

  • Work on a real-world Machine Learning project: Identify a problem, gather data, and define project scope
  • Apply the learned concepts and algorithms: Data collection, preprocessing, model building, and evaluation
  • Deployment of the project on AWS or PythonAnywhere: Showcase the developed application and share the project with others
  • Presentation and discussion of the project: Demonstrate the project, explain design decisions, and receive feedback

Module 5: NLP

Natural Language Processing (NLP)

  • Introduction to NLP: Understand the basics of NLP, its applications, and challenges.
  • Named Entity Recognition (NER): Understand the various approaches and tools used for NER, such as rule-based systems, statistical models, and deep learning.
  • Text Preprocessing: Learn about tokenization, stemming, lemmatization, stop word removal, and other techniques for text preprocessing.
  • Text Representation: Explore techniques such as Bag-of-Words (BoW), TF-IDF, and word embeddings (e.g., Word2Vec, GloVe) for representing text data.
  • Sequential Models: Introduction to RNNs and LSTMs, with hands-on LSTM modeling in Keras.
  • Sentiment Analysis: Study sentiment analysis techniques, build a sentiment analysis model using supervised learning, and evaluate its performance.
  • Hands-On: Real-Time Sentiment Analysis.
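The text preprocessing and Bag-of-Words steps above can be sketched in plain Python: lowercase the text, tokenize with a regex, drop stop words, and count terms over a shared vocabulary. The stop-word list is a tiny illustrative one, not NLTK's. Stemming and lemmatization are omitted here because they normally come from a library such as NLTK or spaCy.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "was", "and", "but", "it", "this", "of", "to"}  # illustrative only

def preprocess(text):
    """Lowercase, tokenize on runs of letters/apostrophes, and remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

vocab, vectors = bag_of_words(["The movie was great, great fun!", "The plot was dull."])
print(vocab)    # ['dull', 'fun', 'great', 'movie', 'plot']
print(vectors)  # [[0, 1, 2, 1, 0], [1, 0, 0, 0, 1]]
```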

Module 6: Deep Learning

RISE OF DEEP LEARNING

  • Introduction
  • History of Deep Learning
  • Perceptrons
  • Multi-Layer Perceptrons
  • Representations
  • Training Neural Networks
  • Activation Functions
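A single perceptron with a step activation, trained with the classic perceptron learning rule, shows the core idea behind the topics above. The logical AND data, learning rate, and epoch cap are illustrative choices. Training stops early once an epoch passes with no mistakes.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Perceptron rule: after each mistake, add lr * error * input to the weights."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, y):
            error = target - predict(weights, bias, x)
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias

X_and = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [0, 0, 0, 1]
weights, bias = train_perceptron(X_and, y_and)
print([predict(weights, bias, x) for x in X_and])  # [0, 0, 0, 1]
```

The same training loop cannot learn XOR, because XOR is not linearly separable. That limitation is what motivates multi-layer perceptrons with non-linear activation functions.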

Artificial Neural Networks

  • Introduction
  • Deep Learning
  • Understanding Human Brain
  • In-Depth Perceptrons
  • Example for Perceptron
  • Multi-Class Classifier
  • Neural Networks
  • Input Layer
  • Output Layer
  • Sigmoid Function
  • Introduction to TensorFlow and Keras
  • CPU VS GPU
  • Introduction to Google Colaboratory
  • Training Neural Network
  • Understanding Notations
  • Activation Functions
  • Hyperparameter Tuning in Keras
  • Feed-Forward Networks
  • Online vs. Offline Mode
  • Bidirectional RNN
  • Understanding Dimensions
  • Back Propagation
  • Loss Function
  • SGD
  • Regularization
  • Training for Batches
  • Hands-On: Facial Emotion Recognition

Computer Vision and Machine Learning Modules

CNN - Convolutional Neural Networks

  • Introduction to CNN
  • Applications of CNN
  • Idea behind CNN
  • Understanding Images
  • Understanding Videos
  • Convolutions
  • Striding and Padding
  • Max Pooling
  • Edges, Gradients, and Textures
  • Understanding Channels
  • Formulas
  • Weight and Bias
  • Feature Map
  • Pooling
  • Combining
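The convolution, striding, padding, and pooling topics above fit in a short plain-Python sketch for a single-channel image. As in CNN libraries, the "convolution" here is actually cross-correlation, meaning the kernel is not flipped. The output size along each axis follows the formula ⌊(n + 2p − k) / s⌋ + 1. The image and the diagonal kernel are made-up examples.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

def zero_pad(image, p):
    """Surround a 2-D list with p rows/columns of zeros."""
    width = len(image[0]) + 2 * p
    rows = [[0] * p + list(row) + [0] * p for row in image]
    return [[0] * width for _ in range(p)] + rows + [[0] * width for _ in range(p)]

def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel CNN-style convolution (cross-correlation: kernel is not flipped)."""
    padded = zero_pad(image, padding)
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh, stride, padding)
    out_w = conv_output_size(len(image[0]), kw, stride, padding)
    return [
        [sum(padded[i * stride + a][j * stride + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [max(feature_map[i * stride + a][j * stride + b] for a in range(size) for b in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]

image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
diagonal_kernel = [[1, 0], [0, 1]]
print(conv2d(image, diagonal_kernel))  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
print(max_pool(image))                 # [[2, 3], [3, 2]]
```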

CNN - Transfer Learning

  • Introduction
  • AlexNet
  • GoogLeNet
  • ResNet
  • Transfer learning using Keras
  • Hands-On - Face Mask Detection

RNN - Recurrent Neural Networks

  • Introduction to RNNs
  • Training RNNs
  • RNN Formula
  • Architecture
  • Batch Data
  • Simplified Notations
  • Types of RNNs
  • LSTM
  • GRUs
  • Training RNN
  • One to many
  • Vanishing Gradient problem
  • Hands-On - COVID-19 Cases Prediction
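The RNN formula and the vanishing gradient problem can both be seen in a scalar RNN, h_t = tanh(w_x·x_t + w_h·h_{t−1} + b). The weights below are hypothetical. By the chain rule, ∂h_T/∂h_0 is the product of the factors w_h·(1 − h_t²), one per time step. When |w_h| < 1, each factor has magnitude at most |w_h|, so the gradient shrinks geometrically with sequence length. LSTMs and GRUs use gating to mitigate this.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); returns all hidden states."""
    hidden, h = [], h0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        hidden.append(h)
    return hidden

def gradient_wrt_h0(hidden, w_h):
    """d h_T / d h_0 = product over t of w_h * (1 - h_t**2), since tanh'(z) = 1 - tanh(z)**2."""
    grad = 1.0
    for h in hidden:
        grad *= w_h * (1 - h ** 2)
    return grad

short_grad = gradient_wrt_h0(rnn_forward([1.0] * 5, w_x=0.5, w_h=0.8, b=0.0), w_h=0.8)
long_grad = gradient_wrt_h0(rnn_forward([1.0] * 20, w_x=0.5, w_h=0.8, b=0.0), w_h=0.8)
print(short_grad, long_grad)  # the 20-step gradient is far smaller than the 5-step one
```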

Generative Models and GANs

  • Introduction to Generative Models
  • Understanding GANs (Generative Adversarial Networks)
  • GAN Architecture
  • GAN Training
  • Evaluating GAN Performance
  • GAN Variants and Applications

Computer Vision

  • Intro to OpenCV
  • Reading and Writing Images
  • Saving images
  • Draw shapes using OpenCV
  • Face detection and eye detection using OpenCV
  • CNN with Keras
  • VGG
  • Hands-On - Real Time Pose Estimator

Projects & Case Study

Real-Time Rain Prediction using ML

  • Install necessary libraries
  • Obtain an API key
  • Fetch live weather data
  • Preprocess the data
  • Train a machine learning model
  • Evaluate the model
  • Integrate the model with Flask
  • Display the results
  • Test and debug
  • Deploy the application
  • Continuously update the weather data

Real-Time Drowsiness Detection Alert System

  • Dataset collection
  • Data preprocessing
  • Feature extraction
  • Labeling
  • Model selection
  • Model training
  • Model evaluation
  • Real-time implementation
  • Alert mechanism
  • Continuous improvement

House Price Prediction using LSTM

  • Identify a reliable source for house price data
  • Understand the website structure
  • Perform web scraping
  • Preprocess the scraped data
  • Define the problem
  • Split the data
  • Train the model
  • Evaluate the model
  • Fine-tune the model (optional)
  • Deploy the model
  • Continuously update the dataset and retrain the model

Customizable Chatbot using OpenAI API

  • Define chatbot goals and scope
  • Gather training data
  • Data preprocessing
  • API integration
  • Model customization
  • User input handling
  • Response generation
  • Post-processing and filtering
  • Error handling and fallback mechanisms
  • Continuous improvement

Fire and Smoke Detection using CNN

  • Data collection
  • Data preprocessing
  • Dataset augmentation
  • Model architecture
  • Training
  • Model evaluation
  • Fine-tuning
  • Real-time inference
  • Thresholding and alerts
  • Model optimization
