GST Technologies
Data Science Training

    🔹 DATA SCIENCE SYLLABUS

    MODULE 1: Introduction to Data Science

    • What is Data Science?
    • Data Science Lifecycle (Data Collection → Wrangling → Analysis → Modeling → Communication)
    • Roles: Data Scientist, Data Analyst, Data Engineer
    • Applications: Finance, Healthcare, Marketing, E-commerce, Social Media

    MODULE 2: Mathematics & Statistics for Data Science
      A. Linear Algebra
      • Vectors, Matrices, Tensors
      • Matrix Operations
      • Eigenvalues & Eigenvectors
      • Applications in ML (e.g., PCA)
      B. Calculus
      • Derivatives and Gradients
      • Partial Derivatives
      • Chain Rule
      • Applications in Optimization (e.g., Gradient Descent)
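
The link between derivatives and optimization in the Calculus topics above can be illustrated with a minimal gradient descent sketch. The function name `gradient_descent`, the objective f(x) = (x - 3)^2, the learning rate, and the step count are illustrative assumptions, not part of the syllabus.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        # Move a small step in the direction opposite to the gradient.
        x -= learning_rate * grad(x)
    return x


# Illustrative objective: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
def f_prime(x):
    return 2 * (x - 3)


minimum = gradient_descent(f_prime, x0=0.0)
print(round(minimum, 4))
```

With these settings the estimate moves toward x = 3, the minimizer of the illustrative objective. A learning rate that is too large can make the iterates diverge instead.
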
      C. Probability and Statistics
      • Descriptive Statistics: Mean, Median, Mode, Variance, Skewness
      • Probability Theory: Bayes' Theorem, Conditional Probability
      • Distributions: Normal, Binomial, Poisson, Uniform
      • Hypothesis Testing: p-value, t-test, ANOVA, Chi-Square
      • Confidence Intervals and Z-scores
      • Central Limit Theorem
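
As a small worked example of Bayes' Theorem and conditional probability from the list above, the sketch below computes the probability of a condition given a positive test. The function name `bayes_posterior` and the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are made up for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of observing the evidence E.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Illustrative numbers only: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the prior is so low, which is the usual point of this exercise.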

    MODULE 3: Programming for Data Science
    • Python or R (Primary Language)
    • Variables, Loops, Functions
    • List Comprehensions, Lambda Functions
    • Error Handling, File I/O
    • Object-Oriented Programming (OOP)
    • Python Libraries:
      • NumPy (Arrays and Linear Algebra)
      • Pandas (DataFrames, Data Cleaning, Merging)
      • Matplotlib & Seaborn (Data Visualization)
      • Scikit-learn (ML Models)
      • Statsmodels (Statistical Analysis)

    MODULE 4: Data Wrangling & Preprocessing
    • Data Collection Techniques: APIs, Web Scraping, SQL
    • Handling Missing Values
    • Data Cleaning: Duplicates, Typos, Outliers
    • Data Transformation: Normalization, Standardization
    • Feature Engineering
    • Encoding Categorical Variables (One-Hot, Label Encoding)
    • Date/Time Handling
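
The transformation and encoding topics above can be sketched in plain Python as follows. In practice these steps are usually done with Pandas or Scikit-learn; the helper names and sample data here are hypothetical, and the scaling functions assume the input values are not all identical (otherwise they would divide by zero).

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector, one column per sorted category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


ages = [20, 30, 40, 50]            # illustrative numeric feature
cities = ["Pune", "Delhi", "Pune"]  # illustrative categorical feature

normalized = min_max_normalize(ages)
standardized = standardize(ages)
encoded = one_hot(cities)
print(normalized, standardized, encoded)
```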

    MODULE 5: Exploratory Data Analysis (EDA)
    • Univariate Analysis
    • Bivariate & Multivariate Analysis
    • Correlation Analysis
    • Boxplots, Histograms, Heatmaps
    • Detecting Outliers
    • Business Understanding from Data Patterns
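
One common way to approach the outlier detection topic above is the interquartile range (IQR) rule that also underlies boxplot whiskers. The sketch below is one possible version; the function name `iqr_outliers`, the multiplier 1.5, and the sample data are assumptions for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 95]  # illustrative sample with one extreme value
outliers = iqr_outliers(data)
print(outliers)
```

Quartile definitions differ slightly between tools, so the exact cut-offs may not match those reported by Pandas or a plotting library.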

    MODULE 6: Machine Learning Fundamentals
      Supervised Learning:
      • Linear Regression
      • Logistic Regression
      • Decision Trees and Random Forests
      • K-Nearest Neighbors (KNN)
      • Support Vector Machines (SVM)
      • Naive Bayes Classifier
      Unsupervised Learning:
      • Clustering: K-means, Hierarchical, DBSCAN
      • Dimensionality Reduction: PCA, t-SNE
      Model Evaluation:
      • Train-Test Split, Cross-Validation
      • Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC
      • Confusion Matrix
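
To show how the evaluation metrics above follow from the confusion matrix, here is a minimal sketch for binary classification. Scikit-learn provides equivalent functions; the names below and the sample labels are hypothetical, and the code assumes at least one positive label and one positive prediction so that no division by zero occurs.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) counts for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # illustrative model predictions
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```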

    MODULE 7: Advanced Machine Learning & Deep Learning
    • Ensemble Methods: Bagging, Boosting (XGBoost, LightGBM)
    • Artificial Neural Networks (ANNs)
    • Convolutional Neural Networks (CNNs) for Computer Vision
    • Recurrent Neural Networks (RNNs) and LSTMs for Time Series and NLP
    • Deep Learning Frameworks: TensorFlow, Keras, PyTorch

    MODULE 8: Time Series Analysis
    • Components: Trend, Seasonality, Noise
    • AR, MA, ARMA, and ARIMA Models
    • Forecasting Techniques
    • Exponential Smoothing
    • Prophet (forecasting library from Facebook)
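
The Exponential Smoothing topic in this module can be sketched in its simplest form, where each smoothed value is a weighted average of the new observation and the previous smoothed value. The function name, the choice of the first observation as the starting level, the value of `alpha`, and the sample series are all illustrative assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series; alpha in (0, 1] weights recent observations."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


sales = [10.0, 12.0, 14.0, 13.0]  # illustrative data
smoothed = simple_exponential_smoothing(sales, alpha=0.5)
print(smoothed)
```

This simple form does not model trend or seasonality; extended variants of exponential smoothing add those components.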

    MODULE 9: Natural Language Processing (NLP)
    • Text Preprocessing: Tokenization, Stop Words, Stemming, Lemmatization
    • Bag of Words, TF-IDF
    • Word Embeddings: Word2Vec, GloVe
    • Sentiment Analysis
    • Topic Modeling: LDA
    • Introduction to Transformer Models (BERT, GPT)
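
As an illustration of the TF-IDF topic in this module, the sketch below uses the textbook definition: term frequency multiplied by the logarithm of the inverse document frequency. Libraries such as Scikit-learn apply smoothed variants by default, so their numbers will differ. The function name `tf_idf`, the whitespace tokenizer, and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]  # naive tokenization
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["data science is fun", "data engineering is hard", "science is fun"]
scores = tf_idf(docs)
print(scores[1])
```

A term that appears in every document, such as "is" here, gets a score of zero, while a term unique to one document scores highest in that document.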

    MODULE 10: Data Visualization and Storytelling
    • Principles of Effective Data Visualization
    • Dashboards (Tableau / Power BI)
    • Plotly, Altair, and Streamlit for Interactive Visuals
    • Communicating Data Insights to Stakeholders
    • Building Infographics

    MODULE 11: Big Data & Cloud Platforms
    • Introduction to Big Data: Characteristics and Use Cases
    • Hadoop Ecosystem (HDFS, MapReduce)
    • Apache Spark (PySpark for ML at scale)
    • Cloud Platforms:
      • AWS (S3, EC2, SageMaker)
      • Google Cloud (BigQuery)
      • Azure ML

    MODULE 12: Data Engineering Essentials
    • ETL/ELT Pipelines
    • Relational Databases (MySQL, PostgreSQL)
    • NoSQL Databases (MongoDB)
    • Data Warehousing Concepts
    • Airflow for Scheduling Pipelines
    • CI/CD in Data Science

    MODULE 13: MLOps & Deployment
    • Model Saving (Pickle, Joblib)
    • Deployment Tools: Flask, FastAPI, Docker
    • Model Monitoring and Retraining
    • Version Control (Git)
    • MLflow for Experiment Tracking
    • Integrating DevOps Practices with Data Science
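
The Model Saving topic in this module can be illustrated with a minimal Pickle round trip. The "model" here is a hypothetical stand-in (a dictionary of linear coefficients) rather than a trained Scikit-learn estimator, and the file name and `predict` helper are assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: parameters of a linear predictor.
model = {"intercept": 1.5, "coefficients": [0.2, -0.7]}

# Save the model to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as a deployment service would do at start-up.
with open(path, "rb") as f:
    restored = pickle.load(f)


def predict(params, features):
    """Apply the linear predictor described by `params` to one feature vector."""
    weighted = sum(c * x for c, x in zip(params["coefficients"], features))
    return params["intercept"] + weighted


print(predict(restored, [1.0, 2.0]))
```

Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be unpickled.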

    MODULE 14: Ethics and Governance in Data Science
    • Data Privacy (GDPR, HIPAA)
    • Fairness and Bias in AI
    • Explainable AI (XAI)
    • Model Interpretability (LIME, SHAP)
    • Responsible AI Principles

    🔹 TOOLS & SOFTWARE
    • Languages: Python, R, SQL
    • Libraries: NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, NLTK, spaCy
    • Visualization: Tableau, Power BI, Matplotlib, Plotly, Seaborn
    • Big Data: Hadoop, Spark, Kafka
    • Cloud: AWS, GCP, Azure
