GST Technologies
Data Science Training
🔹 DATA SCIENCE SYLLABUS
MODULE 1: Introduction to Data Science
What is Data Science?
Data Science Lifecycle (Data Collection → Wrangling → Analysis → Modeling → Communication)
Roles: Data Scientist, Data Analyst, Data Engineer
Applications: Finance, Healthcare, Marketing, E-commerce, Social Media
MODULE 2: Mathematics & Statistics for Data Science
A. Linear Algebra
Vectors, Matrices, Tensors
Matrix Operations
Eigenvalues & Eigenvectors
Applications in ML (e.g., PCA)
B. Calculus
Derivatives and Gradients
Partial Derivatives
Chain Rule
Applications in Optimization (e.g., Gradient Descent)
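Illustrative sketch for the optimization topic above: gradient descent repeatedly steps against the derivative until it approaches a minimum. The function name `gradient_descent`, the learning rate, the step count, and the example function f(x) = (x - 3)^2 are assumptions chosen for illustration, not part of the course material.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        # Move a small step in the direction that decreases the function.
        x = x - learning_rate * grad(x)
    return x


# Example (assumed): f(x) = (x - 3)^2 has derivative f'(x) = 2 * (x - 3)
# and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 6))
```

With a learning rate that is too large the updates can overshoot and diverge, which is why the step size is usually tuned.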
C. Probability and Statistics
Descriptive Statistics: Mean, Median, Mode, Variance, Skewness
Probability Theory: Bayes' Theorem, Conditional Probability
Distributions: Normal, Binomial, Poisson, Uniform
Hypothesis Testing: p-value, t-test, ANOVA, Chi-Square
Confidence Intervals and Z-scores
Central Limit Theorem
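Illustrative sketch for the statistics topics above, using only Python's standard `statistics` module. The sample data, the helper name `bayes_posterior`, and the screening-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are made-up assumptions for illustration.

```python
import statistics

# Descriptive statistics on a small, made-up sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    # Total probability of a positive test (true positives + false positives).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(mean, median, mode, variance, round(posterior, 3))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the condition is rare, which is the usual motivating example for conditional probability.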
MODULE 3: Programming for Data Science
Python or R (Primary Language)
Variables, Loops, Functions
List Comprehensions, Lambda Functions
Error Handling, File I/O
Object-Oriented Programming (OOP)
Python Libraries:
NumPy (Arrays and Linear Algebra)
Pandas (DataFrames, Data Cleaning, Merging)
Matplotlib & Seaborn (Data Visualization)
Scikit-learn (ML Models)
Statsmodels (Statistical Analysis)
MODULE 4: Data Wrangling & Preprocessing
Data Collection Techniques: APIs, Web Scraping, SQL
Handling Missing Values
Data Cleaning: Duplicates, Typos, Outliers
Data Transformation: Normalization, Standardization
Feature Engineering
Encoding Categorical Variables (One-Hot, Label Encoding)
Date/Time Handling
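Illustrative sketch for the transformation and encoding topics above, written in plain Python so the arithmetic is visible. The function names are hypothetical, and the sketch assumes the input values are not all identical (otherwise the denominators would be zero). In practice, libraries such as Pandas or Scikit-learn are typically used for these steps.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector, one column per sorted category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


normalized = min_max_normalize([10, 20, 30])
standardized = standardize([10, 20, 30])
encoded = one_hot(["red", "blue", "red"])  # columns: blue, red
print(normalized, standardized, encoded)
```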
MODULE 5: Exploratory Data Analysis (EDA)
Univariate Analysis
Bivariate & Multivariate Analysis
Correlation Analysis
Boxplots, Histograms, Heatmaps
Detecting Outliers
Business Understanding from Data Patterns
MODULE 6: Machine Learning Fundamentals
Supervised Learning:
Linear Regression
Logistic Regression
Decision Trees and Random Forests
K-Nearest Neighbors (KNN)
Support Vector Machines (SVM)
Naive Bayes Classifier
Unsupervised Learning:
Clustering: K-means, Hierarchical, DBSCAN
Dimensionality Reduction: PCA, t-SNE
Model Evaluation:
Train-Test Split, Cross-Validation
Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
Confusion Matrix
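Illustrative sketch for the evaluation metrics above: the four confusion-matrix counts for a binary classifier, and the metrics derived from them. The function names and the example labels are assumptions for illustration; Scikit-learn provides equivalent, more general utilities.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred), metrics)
```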
MODULE 7: Advanced Machine Learning & Deep Learning
Ensemble Methods: Bagging, Boosting (XGBoost, LightGBM)
Neural Networks (ANN)
CNNs (Computer Vision)
RNNs and LSTM (Time Series / NLP)
Deep Learning Frameworks: TensorFlow, Keras, PyTorch
MODULE 8: Time Series Analysis
Components: Trend, Seasonality, Noise
AR, MA, ARMA, ARIMA models
Forecasting Techniques
Exponential Smoothing
Prophet Model by Facebook
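Illustrative sketch of simple exponential smoothing, one of the techniques listed above: each smoothed value is a weighted average of the current observation and the previous smoothed value. The function name, the choice to initialize with the first observation, and the sample series are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumed initialization: first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


smoothed = exponential_smoothing([10, 12, 14], alpha=0.5)
print(smoothed)
```

A smaller `alpha` gives a smoother curve that reacts more slowly to recent changes; a larger `alpha` tracks the raw series more closely.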
MODULE 9: Natural Language Processing (NLP)
Text Preprocessing: Tokenization, Stop Words, Stemming, Lemmatization
Bag of Words, TF-IDF
Word Embeddings: Word2Vec, GloVe
Sentiment Analysis
Topic Modeling: LDA
Transformer Models (introduction to BERT and GPT)
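Illustrative sketch of TF-IDF from the list above, in plain Python. It assumes whitespace tokenization and the basic weighting tf × log(N / df); libraries such as Scikit-learn use smoothed variants, so their numbers will differ. The function names and sample documents are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document ("the") gets weight 0, while a term unique to one document ("sat") gets the highest weight in that document.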
MODULE 10: Data Visualization and Storytelling
Principles of Effective Data Visualization
Dashboards (Tableau / Power BI)
Plotly, Altair, and Streamlit for Interactive Visuals
Communicating Data Insights to Stakeholders
Building Infographics
MODULE 11: Big Data & Cloud Platforms
Introduction to Big Data: Characteristics and Use Cases
Hadoop Ecosystem (HDFS, MapReduce)
Apache Spark (PySpark for ML at scale)
Cloud Platforms:
AWS (S3, EC2, SageMaker)
Google Cloud (BigQuery)
Azure ML
MODULE 12: Data Engineering Essentials
ETL/ELT Pipelines
Relational Databases (MySQL, PostgreSQL)
NoSQL Databases (MongoDB)
Data Warehousing Concepts
Airflow for Scheduling Pipelines
CI/CD in Data Science
MODULE 13: MLOps & Deployment
Model Saving (Pickle, Joblib)
Deployment Tools: Flask, FastAPI, Docker
Model Monitoring and Retraining
Version Control (Git)
MLflow for Experiment Tracking
DevOps + Data Science Integration
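Illustrative sketch of model saving with Pickle, as listed above. The "model" here is a hypothetical stand-in (a dict of linear-model parameters) so the example runs without any ML library; the file name `model.pkl` and the `predict` helper are also assumptions.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: parameters of a linear model.
model = {"intercept": 1.5, "coefficients": [0.2, -0.7]}


def predict(model, features):
    """Linear prediction: intercept + sum of coefficient * feature."""
    return model["intercept"] + sum(
        w * x for w, x in zip(model["coefficients"], features)
    )


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    # Serialize the model to disk.
    with path.open("wb") as f:
        pickle.dump(model, f)
    # Load it back, as a deployment service would at startup.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(predict(restored, [1.0, 2.0]))
```

Pickle files can execute arbitrary code when loaded, so only unpickle files from a trusted source.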
MODULE 14: Ethics and Governance in Data Science
Data Privacy (GDPR, HIPAA)
Fairness and Bias in AI
Explainable AI (XAI)
Model Interpretability (LIME, SHAP)
Responsible AI Principles
🔹 TOOLS & SOFTWARE
Languages: Python, R, SQL
Libraries: NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, NLTK, spaCy
Visualization: Tableau, Power BI, Matplotlib, Plotly, Seaborn
Big Data: Hadoop, Spark, Kafka
Cloud: AWS, GCP, Azure