Projects
-
Video Segmentation Tool
Computer Vision
Overview: A semi-automatic video segmentation tool that accelerates the preparation of image segmentation data for video sequences using GUI-based annotation and AI propagation.
Technologies: Python, PyQt6, PyTorch, OpenCV, OSVOS (One-Shot Video Object Segmentation)
Key Features:
• Manual annotation tools (pencil, polygon)
• Semi-automatic label propagation using Optical Flow and OSVOS
• Frame-by-frame navigation with multi-label support
• MVC architecture for scalable design
Impact: Combines manual annotation capabilities with automated segmentation algorithms to significantly reduce manual labeling effort for video datasets.
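A minimal sketch of the optical-flow propagation idea behind the semi-automatic labeling, assuming OpenCV's dense Farneback flow and a binary mask; the function name and parameters are illustrative, not the tool's actual API, and the real pipeline additionally refines masks with OSVOS.

```python
import cv2
import numpy as np

def propagate_mask(prev_frame, next_frame, prev_mask):
    """Warp a binary label mask from one frame to the next using dense optical flow.

    Illustrative sketch only; the tool combines this with OSVOS-based refinement.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

    # Dense Farneback flow: a motion vector (dx, dy) for every pixel.
    # Positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Backward-sample the previous mask at each pixel's estimated source location.
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)

    warped = cv2.remap(prev_mask.astype(np.float32), map_x, map_y,
                       interpolation=cv2.INTER_LINEAR)
    return (warped > 0.5).astype(np.uint8)
```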
-
Skin Cancer Detection
Computer Vision
Overview: A comprehensive medical AI system for skin cancer detection using Swin Transformer with parameter-efficient bottleneck adapters. Supports both binary classification (benign vs. malignant) and multiclass classification (the 7-class HAM10000 dataset) for clinical-grade dermatoscopic image analysis.
Technologies: Python, PyTorch, Swin Transformer, Vision Transformer (ViT), Parameter-Efficient Fine-Tuning, Bottleneck Adapters, Medical Imaging
Key Features:
• Parameter-efficient fine-tuning with bottleneck adapters
• Swin Transformer and ViT architectures for medical imaging
• Binary and 7-class classification capabilities
• HAM10000 dataset integration for dermatoscopic analysis
Impact: Advances dermatological AI by combining efficient transfer learning techniques with state-of-the-art vision transformers for accurate skin cancer detection and classification in clinical settings.
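A minimal sketch of a bottleneck adapter as used for parameter-efficient fine-tuning, assuming it is added as a residual branch inside a frozen transformer backbone; dimensions, initialization, and the backbone named in the comment are illustrative rather than the project's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck inserted into a frozen transformer block.

    Only these few parameters (plus the classifier head) are trained,
    which is what makes the fine-tuning parameter-efficient.
    """

    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        hidden = max(dim // reduction, 8)
        self.down = nn.Linear(dim, hidden)   # project down
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)     # project back up
        nn.init.zeros_(self.up.weight)       # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Usage sketch: freeze the pretrained backbone, train only adapters + head.
# backbone = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True)
# for p in backbone.parameters():
#     p.requires_grad = False
```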
-
Face Demography Analysis
Computer Vision
Overview: A real-time facial emotion detection system using YOLOv8 for face detection and FER for emotion analysis across 7 categories.
Technologies: Python, YOLOv8, OpenCV, Facial Expression Recognition (FER), ONNX
Key Features:
• Real-time emotion recognition across 7 categories
• Multiple input sources (images, videos, webcam)
• Two-stage detection pipeline
• Command-line interface with flexible parameters
Applications: Pairs YOLOv8 face detection with emotion recognition on real-time video streams, making it suitable for demographic analysis and human-computer interaction.
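A minimal sketch of the two-stage pipeline, assuming the ultralytics and fer packages and a YOLOv8 checkpoint fine-tuned for faces (the weight file name below is a placeholder); the project's actual CLI and ONNX export path are not shown.

```python
import cv2
from ultralytics import YOLO
from fer import FER

face_detector = YOLO("yolov8n-face.pt")  # placeholder face-detection weights
emotion_detector = FER()

cap = cv2.VideoCapture(0)  # webcam; a video file path works the same way
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Stage 1: detect face bounding boxes.
    for box in face_detector(frame, verbose=False)[0].boxes.xyxy:
        x1, y1, x2, y2 = map(int, box.tolist())
        face = frame[y1:y2, x1:x2]
        if face.size == 0:
            continue

        # Stage 2: classify the emotion of the cropped face.
        emotion, score = emotion_detector.top_emotion(face)
        if emotion is None or score is None:
            continue
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{emotion} {score:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)

    cv2.imshow("emotions", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```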
-
Baseline Models for EEG Analysis
Signal Processing
Overview: A comprehensive framework for EEG signal analysis and classification, providing baseline implementations of classical machine learning models for the research community.
Technologies: Python, NumPy, SciPy, scikit-learn, PyTorch, MNE-Python, XGBoost
Key Features:
• Support for multiple EEG datasets (CHB-MIT, BCI Competition 2a, LEE, Klinik)
• Classical ML models with signal processing techniques
• Modular architecture with cross-validation
• Integration with Weights & Biases for experiment tracking
Impact: Provides a standardized framework for EEG signal processing, covering epilepsy detection and motor imagery classification, with reproducible baseline performance metrics across datasets.
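A minimal sketch of the classical baseline pattern described above: band-pass filtering, band-power features, and a scikit-learn classifier with cross-validation. Function names, frequency bands, and the sampling rate are illustrative, not the framework's actual API.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def bandpass(x, low, high, fs, order=4):
    """Zero-phase band-pass filter along the last (time) axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)

def band_power_features(epochs, fs):
    """Mean PSD in canonical EEG bands per channel.
    epochs: array of shape (n_trials, n_channels, n_samples)."""
    freqs, psd = welch(epochs, fs=fs, axis=-1)
    bands = [(4, 8), (8, 13), (13, 30)]  # theta, alpha, beta
    feats = [psd[..., (freqs >= lo) & (freqs <= hi)].mean(axis=-1)
             for lo, hi in bands]
    return np.concatenate(feats, axis=1)  # (n_trials, n_channels * n_bands)

# Illustrative evaluation with 5-fold cross-validation (fs is a placeholder):
# X = band_power_features(bandpass(epochs, 1, 40, fs=250), fs=250)
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# print(cross_val_score(clf, X, y, cv=5).mean())
```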
-
Vibration Noise Detection
Signal Processing
Overview: A signal processing system for detecting and analyzing vibration-induced noise patterns using machine learning and acoustic signal analysis.
Technologies: Python, Signal Processing, Machine Learning, Audio Analysis, Spectral Analysis
Key Features:
• Vibration pattern recognition and classification
• Spectral analysis for noise characterization
• Real-time signal processing capabilities
• Anomaly detection in mechanical systems
Applications: Enables predictive maintenance and fault detection in mechanical systems by analyzing vibration signatures and identifying abnormal noise patterns for industrial monitoring.
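A minimal sketch of one possible spectral-feature plus anomaly-detection setup, assuming Welch power spectra and scikit-learn's IsolationForest; the sampling rate, band count, and variable names are illustrative.

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import IsolationForest

def spectral_features(signal, fs, n_bands=16):
    """Summarize a vibration segment by the log mean power in equal-width bands."""
    freqs, psd = welch(signal, fs=fs, nperseg=1024)
    bands = np.array_split(psd, n_bands)
    return np.log1p([band.mean() for band in bands])

# Fit on recordings from healthy operation, flag deviating spectra as anomalies.
# Variable names and the sampling rate depend on the sensor setup (placeholders):
# healthy = np.stack([spectral_features(seg, fs=10_000) for seg in healthy_segments])
# detector = IsolationForest(contamination=0.01, random_state=0).fit(healthy)
# features = spectral_features(new_segment, fs=10_000).reshape(1, -1)
# is_anomaly = detector.predict(features)[0] == -1
```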
-
Self-supervised EEG Embedding
Signal Processing
Overview: Learns meaningful EEG representations from unlabeled data across multiple tasks and datasets using self-supervised learning techniques.
Technologies: PyTorch Lightning, Python, Neural Networks, EEG Signal Processing
Key Features:
• Self-supervised learning framework for EEG
• Task-agnostic embeddings for multiple datasets
• Modular neural architecture with "Minion Networks"
• Cross-validation support for robust evaluation
Impact: Creates transferable embeddings across different neurological signal analysis tasks using auxiliary tasks like temporal context prediction and channel reconstruction.
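A minimal sketch of a channel-reconstruction pretext task of the kind mentioned above, with a plain 1-D convolutional encoder standing in for the project's "Minion Networks"; all module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    """1-D conv encoder mapping (batch, channels, time) to a fixed-size embedding."""
    def __init__(self, n_channels: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, stride=2, padding=3), nn.GELU(),
            nn.Conv1d(64, 128, kernel_size=7, stride=2, padding=3), nn.GELU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, x):
        return self.proj(self.net(x).squeeze(-1))

def channel_reconstruction_step(encoder, decoder, x, optimizer):
    """One pretext-task step: hide one channel, reconstruct it from the embedding.

    `decoder` can be as simple as nn.Linear(emb_dim, n_samples).
    """
    b, c, t = x.shape
    masked = x.clone()
    target_ch = torch.randint(0, c, (1,)).item()
    masked[:, target_ch] = 0.0                       # zero out the chosen channel
    recon = decoder(encoder(masked))                 # (batch, n_samples)
    loss = nn.functional.mse_loss(recon, x[:, target_ch])
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```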
-
EEG-LTENT
Signal Processing
Overview: EEG-LTENT (Learned Task-agnostic Embeddings for Neural Time-series) provides a foundation model approach for EEG signal analysis. Instead of training task-specific models from scratch, this framework learns universal EEG representations that can be fine-tuned for various downstream applications.
Technologies: Python, PyTorch, Conformer Architecture, Vector Quantization, Masked Autoencoding, Self-Supervised Learning, CNN-Transformer Hybrid
Key Features:
• Task-agnostic learning: Pre-train once, adapt to many tasks (classification, regression, clustering)
• Self-supervised approach using masked autoencoding on unlabeled EEG data
• Discrete representations through vector quantization for interpretable embedding spaces
• Universal embeddings that transfer across datasets, tasks, and subjects
Impact: Establishes a foundation-model approach to EEG analysis that learns from unlabeled data and transfers knowledge across neurological tasks, significantly improving performance when labeled data is limited.
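A minimal sketch of the two core ingredients named above, masked autoencoding and vector quantization, with a simple codebook lookup and straight-through gradient; the Conformer encoder itself is omitted and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Map each latent vector to its nearest codebook entry (straight-through gradient)."""
    def __init__(self, n_codes: int = 512, dim: int = 128, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)
        self.beta = beta

    def forward(self, z):                               # z: (batch, seq, dim)
        b, s, d = z.shape
        flat = z.reshape(-1, d)
        dist = torch.cdist(flat, self.codebook.weight)  # (batch*seq, n_codes)
        codes = dist.argmin(dim=-1)
        z_q = self.codebook(codes).view(b, s, d)
        # Codebook + commitment losses, straight-through estimator for gradients.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, codes.view(b, s), loss

def mask_patches(x, mask_ratio=0.5):
    """Randomly zero out a fraction of time patches; the encoder must reconstruct them."""
    b, s, d = x.shape
    mask = torch.rand(b, s, device=x.device) < mask_ratio
    return x.masked_fill(mask.unsqueeze(-1), 0.0), mask
```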
-
Image Captioning
Computer Vision
Overview: A comprehensive image captioning system that automatically generates descriptive text for images using deep learning techniques and attention mechanisms.
Technologies: Python, PyTorch, Computer Vision, Natural Language Processing, CNN, RNN, Attention
Key Features:
• Encoder-decoder architecture with attention mechanism
• CNN-based image feature extraction
• RNN-based text generation
• Multi-modal learning approach
Impact: Bridges computer vision and natural language processing to generate human-like descriptions of visual content for accessibility and automated content description applications.
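A minimal sketch of an additive (Bahdanau-style) attention module of the kind used in such encoder-decoder captioners; layer names and dimensions are illustrative, not the project's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Score each image region against the decoder state and return a context vector."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, n_regions, feat_dim), hidden: (batch, hidden_dim)
        scores = self.v(torch.tanh(self.w_feat(features) +
                                   self.w_hidden(hidden).unsqueeze(1)))  # (b, n, 1)
        alpha = torch.softmax(scores, dim=1)
        context = (alpha * features).sum(dim=1)          # weighted sum of regions
        return context, alpha.squeeze(-1)

# Decoder step sketch: at each time step the LSTM cell consumes the previous word
# embedding concatenated with the attended image context.
# h, c = lstm_cell(torch.cat([word_emb, context], dim=1), (h, c))
# logits = output_layer(h)
```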
-
PDF-QA using LLM
NLP
Overview: A Python application that processes PDF documents and extracts structured information using Large Language Models, designed specifically for medical reports.
Technologies: OpenAI GPT, Meta's LLaMA, Python, PyTorch, Hugging Face Transformers, LangChain
Key Features:
• Dual LLM support (GPT and LLaMA)
• Automated PDF content extraction with structured output
• Batch processing capabilities
• Resource usage monitoring
Impact: Uses large language models to parse PDF files and extract patient information, test results, and panel summaries with flexible LLM backends and structured CSV/JSON outputs.
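A minimal sketch of the GPT-backed extraction path, assuming the pypdf and openai packages; the model name, prompt, and output fields are illustrative placeholders, and the LLaMA/Hugging Face backend is not shown.

```python
import json
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_report_fields(pdf_path: str) -> dict:
    """Pull raw text from a PDF and ask the model to return structured JSON.

    Field names below are illustrative, not the project's actual schema.
    """
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract patient_name, test_results, and panel_summary "
                        "from the medical report. Reply with JSON only."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```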
-
Persian Text Classification using GloVe
NLP
Overview: Classifies Persian text into ten classes using GloVe embeddings, addressing the unique challenges of Persian language processing in a multi-class setting.
Technologies: Python, Jupyter Notebook, GloVe Embedding, Machine Learning
Key Features:
• Multi-class Persian text classification framework
• GloVe word embedding implementation for Persian
• Comprehensive preprocessing pipeline
• Specialized handling of Persian language characteristics
Applications: Leverages GloVe embeddings to convert Persian text into numerical representations for effective machine learning classification across ten distinct text categories.
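A minimal sketch of the embedding-and-classify approach: average the GloVe vectors of a document's tokens and train a scikit-learn classifier. The embeddings file name and the (much richer) Persian preprocessing are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def load_glove(path: str) -> dict:
    """Load a GloVe text file: one token per line followed by its vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def embed(text: str, vectors: dict, dim: int = 300) -> np.ndarray:
    """Average the embeddings of known tokens; zero vector if none are known."""
    words = [vectors[w] for w in text.split() if w in vectors]
    return np.mean(words, axis=0) if words else np.zeros(dim, dtype=np.float32)

# Illustrative usage (file name is a placeholder; real preprocessing involves
# Persian-specific normalization and tokenization):
# glove = load_glove("glove.fa.300d.txt")
# X = np.stack([embed(doc, glove) for doc in documents])
# clf = LogisticRegression(max_iter=1000)
# print(cross_val_score(clf, X, labels, cv=5).mean())
```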
-
Tumor Classification Using Machine Learning
Computer Vision
Overview: A comprehensive machine learning project for tumor subtype classification using mutation and copy number variation data from genomic analysis.
Technologies: Python, Scikit-learn, Pandas, NumPy, Jupyter Notebook, Deep Neural Networks
Key Features:
• Classification of tumor subtypes (PDM vs SCM)
• Multiple ML approaches including SVM, XGBoost, and few-shot learning
• Feature selection and dimensionality reduction techniques
• 5-fold cross-validation evaluation
Results: Achieved 69.17% accuracy with Logistic Regression on tumor mutation burden, copy number variation, and missense-mutation features, using metaheuristic optimization for feature selection.
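A minimal sketch of the 5-fold cross-validation evaluation with Logistic Regression, assuming a feature table holding tumor mutation burden, CNV, and missense-mutation columns; column and file names are placeholders, and the metaheuristic feature selection is not reproduced.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_subtype_classifier(features: pd.DataFrame, labels: pd.Series) -> float:
    """Mean 5-fold stratified CV accuracy for Logistic Regression on genomic features.

    `features` is assumed to hold tumor mutation burden, CNV counts, and
    missense-mutation indicators as columns (names are illustrative).
    """
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(model, features, labels, cv=cv, scoring="accuracy").mean()

# Usage sketch (file and column names are placeholders):
# df = pd.read_csv("genomic_features.csv")
# print(evaluate_subtype_classifier(df.drop(columns=["subtype"]), df["subtype"]))
```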