Projects
-
Video Segmentation Tool
Computer Vision
Overview: A semi-automatic video segmentation tool that accelerates the preparation of image segmentation data for video sequences using GUI-based annotation and AI propagation.
Technologies: Python, PyQt6, PyTorch, OpenCV, OSVOS (One-Shot Video Object Segmentation)
Key Features:
• Manual annotation tools (pencil, polygon)
• Semi-automatic label propagation using Optical Flow and OSVOS
• Frame-by-frame navigation with multi-label support
• MVC architecture for scalable design
Impact: Combines manual annotation capabilities with automated segmentation algorithms to significantly reduce manual labeling effort for video datasets.
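A minimal sketch of the optical-flow propagation idea behind the semi-automatic labeling, assuming OpenCV's dense Farneback flow and a binary mask; the function name and parameters are illustrative, not the tool's actual API, and the real pipeline additionally refines masks with OSVOS.

```python
import cv2
import numpy as np

def propagate_mask(prev_frame, next_frame, prev_mask):
    """Warp a binary label mask from one frame to the next using dense optical flow.

    Illustrative sketch only; the tool combines this with OSVOS-based refinement.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

    # Dense Farneback flow: a motion vector (dx, dy) for every pixel.
    # Positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Backward-sample the previous mask at each pixel's estimated source location.
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)

    warped = cv2.remap(prev_mask.astype(np.float32), map_x, map_y,
                       interpolation=cv2.INTER_LINEAR)
    return (warped > 0.5).astype(np.uint8)
```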
-
Skin Cancer Detection
Computer Vision
Overview: A comprehensive medical AI system for skin cancer detection using Swin Transformer with parameter-efficient bottleneck adapters. Supports both binary classification (benign vs. malignant) and multiclass classification (the 7-class HAM10000 dataset) for clinical-grade dermatoscopic image analysis.
Technologies: Python, PyTorch, Swin Transformer, Vision Transformer (ViT), Parameter-Efficient Fine-Tuning, Bottleneck Adapters, Medical Imaging
Key Features:
• Parameter-efficient fine-tuning with bottleneck adapters
• Swin Transformer and ViT architectures for medical imaging
• Binary and 7-class classification capabilities
• HAM10000 dataset integration for dermatoscopic analysis
Impact: Advances dermatological AI by combining efficient transfer learning techniques with state-of-the-art vision transformers for accurate skin cancer detection and classification in clinical settings.
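A minimal sketch of a bottleneck adapter as used for parameter-efficient fine-tuning, assuming it is added as a residual branch inside a frozen transformer backbone; dimensions, initialization, and the backbone named in the comment are illustrative rather than the project's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck inserted into a frozen transformer block.

    Only these few parameters (plus the classifier head) are trained,
    which is what makes the fine-tuning parameter-efficient.
    """

    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        hidden = max(dim // reduction, 8)
        self.down = nn.Linear(dim, hidden)   # project down
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)     # project back up
        nn.init.zeros_(self.up.weight)       # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Usage sketch: freeze the pretrained backbone, train only adapters + head.
# backbone = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True)
# for p in backbone.parameters():
#     p.requires_grad = False
```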
-
Face Demography Analysis
Computer Vision
Overview: A real-time facial emotion detection system using YOLOv8 for face detection and FER for emotion analysis across 7 categories.
Technologies: Python, YOLOv8, OpenCV, Facial Expression Recognition (FER), ONNX
Key Features:
• Real-time emotion recognition across 7 categories
• Multiple input sources (images, videos, webcam)
• Two-stage detection pipeline
• Command-line interface with flexible parameters
Applications: Pairs YOLOv8 face detection with emotion recognition on real-time video streams, making it suitable for demographic analysis and human-computer interaction.
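A minimal sketch of the two-stage pipeline, assuming the ultralytics and fer packages and a YOLOv8 checkpoint fine-tuned for faces (the weight file name below is a placeholder); the project's actual CLI and ONNX export path are not shown.

```python
import cv2
from ultralytics import YOLO
from fer import FER

face_detector = YOLO("yolov8n-face.pt")  # placeholder face-detection weights
emotion_detector = FER()

cap = cv2.VideoCapture(0)  # webcam; a video file path works the same way
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Stage 1: detect face bounding boxes.
    for box in face_detector(frame, verbose=False)[0].boxes.xyxy:
        x1, y1, x2, y2 = map(int, box.tolist())
        face = frame[y1:y2, x1:x2]
        if face.size == 0:
            continue

        # Stage 2: classify the emotion of the cropped face.
        emotion, score = emotion_detector.top_emotion(face)
        if emotion is None or score is None:
            continue
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{emotion} {score:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)

    cv2.imshow("emotions", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```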
-
Baseline Models for EEG Analysis
Signal Processing
Overview: A comprehensive framework for EEG signal analysis and classification, providing baseline implementations of classical machine learning models for the research community.
Technologies: Python, NumPy, SciPy, scikit-learn, PyTorch, MNE-Python, XGBoost
Key Features:
• Support for multiple EEG datasets (CHB-MIT, BCI Competition 2a, LEE, Klinik)
• Classical ML models with signal processing techniques
• Modular architecture with cross-validation
• Integration with Weights & Biases for experiment tracking
Impact: Provides a standardized framework for EEG signal processing, covering epilepsy detection and motor imagery classification, with reproducible baseline performance metrics across datasets.
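A minimal sketch of the classical baseline pattern described above: band-pass filtering, band-power features, and a scikit-learn classifier with cross-validation. Function names, frequency bands, and the sampling rate are illustrative, not the framework's actual API.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def bandpass(x, low, high, fs, order=4):
    """Zero-phase band-pass filter along the last (time) axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)

def band_power_features(epochs, fs):
    """Mean PSD in canonical EEG bands per channel.
    epochs: array of shape (n_trials, n_channels, n_samples)."""
    freqs, psd = welch(epochs, fs=fs, axis=-1)
    bands = [(4, 8), (8, 13), (13, 30)]  # theta, alpha, beta
    feats = [psd[..., (freqs >= lo) & (freqs <= hi)].mean(axis=-1)
             for lo, hi in bands]
    return np.concatenate(feats, axis=1)  # (n_trials, n_channels * n_bands)

# Illustrative evaluation with 5-fold cross-validation (fs is a placeholder):
# X = band_power_features(bandpass(epochs, 1, 40, fs=250), fs=250)
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# print(cross_val_score(clf, X, y, cv=5).mean())
```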
-
Vibration Noise Detection
Signal Processing
Overview: A signal processing system for detecting and analyzing vibration-induced noise patterns using machine learning and acoustic signal analysis.
Technologies: Python, Signal Processing, Machine Learning, Audio Analysis, Spectral Analysis
Key Features:
• Vibration pattern recognition and classification
• Spectral analysis for noise characterization
• Real-time signal processing capabilities
• Anomaly detection in mechanical systems
Applications: Enables predictive maintenance and fault detection in mechanical systems by analyzing vibration signatures and identifying abnormal noise patterns for industrial monitoring.
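A minimal sketch of one possible spectral-feature plus anomaly-detection setup, assuming Welch power spectra and scikit-learn's IsolationForest; the sampling rate, band count, and variable names are illustrative.

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import IsolationForest

def spectral_features(signal, fs, n_bands=16):
    """Summarize a vibration segment by the log mean power in equal-width bands."""
    freqs, psd = welch(signal, fs=fs, nperseg=1024)
    bands = np.array_split(psd, n_bands)
    return np.log1p([band.mean() for band in bands])

# Fit on recordings from healthy operation, flag deviating spectra as anomalies.
# Variable names and the sampling rate depend on the sensor setup (placeholders):
# healthy = np.stack([spectral_features(seg, fs=10_000) for seg in healthy_segments])
# detector = IsolationForest(contamination=0.01, random_state=0).fit(healthy)
# features = spectral_features(new_segment, fs=10_000).reshape(1, -1)
# is_anomaly = detector.predict(features)[0] == -1
```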
-
Self-supervised EEG Embedding
Signal Processing
Overview: Learns meaningful EEG representations from unlabeled data across multiple tasks and datasets using self-supervised learning techniques.
Technologies: PyTorch Lightning, Python, Neural Networks, EEG Signal Processing
Key Features:
• Self-supervised learning framework for EEG
• Task-agnostic embeddings for multiple datasets
• Modular neural architecture with "Minion Networks"
• Cross-validation support for robust evaluation
Impact: Creates transferable embeddings across different neurological signal analysis tasks using auxiliary tasks like temporal context prediction and channel reconstruction.
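A minimal sketch of a channel-reconstruction pretext task of the kind mentioned above, with a plain 1-D convolutional encoder standing in for the project's "Minion Networks"; all module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    """1-D conv encoder mapping (batch, channels, time) to a fixed-size embedding."""
    def __init__(self, n_channels: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, stride=2, padding=3), nn.GELU(),
            nn.Conv1d(64, 128, kernel_size=7, stride=2, padding=3), nn.GELU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, x):
        return self.proj(self.net(x).squeeze(-1))

def channel_reconstruction_step(encoder, decoder, x, optimizer):
    """One pretext-task step: hide one channel, reconstruct it from the embedding.

    `decoder` can be as simple as nn.Linear(emb_dim, n_samples).
    """
    b, c, t = x.shape
    masked = x.clone()
    target_ch = torch.randint(0, c, (1,)).item()
    masked[:, target_ch] = 0.0                       # zero out the chosen channel
    recon = decoder(encoder(masked))                 # (batch, n_samples)
    loss = nn.functional.mse_loss(recon, x[:, target_ch])
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```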
-
EEG-LTENT
Signal Processing
Overview: EEG-LTENT (Learned Task-agnostic Embeddings for Neural Time-series) provides a foundation model approach for EEG signal analysis. Instead of training task-specific models from scratch, this framework learns universal EEG representations that can be fine-tuned for various downstream applications.
Technologies: Python, PyTorch, Conformer Architecture, Vector Quantization, Masked Autoencoding, Self-Supervised Learning, CNN-Transformer Hybrid
Key Features:
• Task-agnostic learning: Pre-train once, adapt to many tasks (classification, regression, clustering)
• Self-supervised approach using masked autoencoding on unlabeled EEG data
• Discrete representations through vector quantization for interpretable embedding spaces
• Universal embeddings that transfer across datasets, tasks, and subjects
Impact: Establishes a foundation-model approach to EEG analysis that learns from unlabeled data and transfers knowledge across neurological tasks, significantly improving performance when labeled data is limited.
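A minimal sketch of the two core ingredients named above, masked autoencoding and vector quantization, with a simple codebook lookup and straight-through gradient; the Conformer encoder itself is omitted and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Map each latent vector to its nearest codebook entry (straight-through gradient)."""
    def __init__(self, n_codes: int = 512, dim: int = 128, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)
        self.beta = beta

    def forward(self, z):                               # z: (batch, seq, dim)
        b, s, d = z.shape
        flat = z.reshape(-1, d)
        dist = torch.cdist(flat, self.codebook.weight)  # (batch*seq, n_codes)
        codes = dist.argmin(dim=-1)
        z_q = self.codebook(codes).view(b, s, d)
        # Codebook + commitment losses, straight-through estimator for gradients.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, codes.view(b, s), loss

def mask_patches(x, mask_ratio=0.5):
    """Randomly zero out a fraction of time patches; the encoder must reconstruct them."""
    b, s, d = x.shape
    mask = torch.rand(b, s, device=x.device) < mask_ratio
    return x.masked_fill(mask.unsqueeze(-1), 0.0), mask
```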
-
Image Captioning
Computer Vision
Overview: A comprehensive image captioning system that automatically generates descriptive text for images using deep learning techniques and attention mechanisms.
Technologies: Python, PyTorch, Computer Vision, Natural Language Processing, CNN, RNN, Attention
Key Features:
• Encoder-decoder architecture with attention mechanism
• CNN-based image feature extraction
• RNN-based text generation
• Multi-modal learning approach
Impact: Bridges computer vision and natural language processing to generate human-like descriptions of visual content for accessibility and automated content description applications.
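A minimal sketch of an additive (Bahdanau-style) attention module of the kind used in such encoder-decoder captioners; layer names and dimensions are illustrative, not the project's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Score each image region against the decoder state and return a context vector."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, n_regions, feat_dim), hidden: (batch, hidden_dim)
        scores = self.v(torch.tanh(self.w_feat(features) +
                                   self.w_hidden(hidden).unsqueeze(1)))  # (b, n, 1)
        alpha = torch.softmax(scores, dim=1)
        context = (alpha * features).sum(dim=1)          # weighted sum of regions
        return context, alpha.squeeze(-1)

# Decoder step sketch: at each time step the LSTM cell consumes the previous word
# embedding concatenated with the attended image context.
# h, c = lstm_cell(torch.cat([word_emb, context], dim=1), (h, c))
# logits = output_layer(h)
```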
-
PDF-QA using LLM
NLP
Overview: A Python application that processes PDF documents and extracts structured information using Large Language Models, designed specifically for medical reports.
Technologies: OpenAI GPT, Meta's LLaMA, Python, PyTorch, Hugging Face Transformers, LangChain
Key Features:
• Dual LLM support (GPT and LLaMA)
• Automated PDF content extraction with structured output
• Batch processing capabilities
• Resource usage monitoring
Impact: Uses large language models to parse PDF files and extract patient information, test results, and panel summaries with flexible LLM backends and structured CSV/JSON outputs.
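A minimal sketch of the GPT-backed extraction path, assuming the pypdf and openai packages; the model name, prompt, and output fields are illustrative placeholders, and the LLaMA/Hugging Face backend is not shown.

```python
import json
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_report_fields(pdf_path: str) -> dict:
    """Pull raw text from a PDF and ask the model to return structured JSON.

    Field names below are illustrative, not the project's actual schema.
    """
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract patient_name, test_results, and panel_summary "
                        "from the medical report. Reply with JSON only."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```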
-
Persian Text Classification using GloVe
NLP
Overview: Classifies Persian text into ten classes using GloVe embeddings, addressing the unique challenges of Persian language processing in a multi-class setting.
Technologies: Python, Jupyter Notebook, GloVe Embedding, Machine Learning
Key Features:
• Multi-class Persian text classification framework
• GloVe word embedding implementation for Persian
• Comprehensive preprocessing pipeline
• Specialized handling of Persian language characteristics
Applications: Leverages GloVe embeddings to convert Persian text into numerical representations for effective machine learning classification across ten distinct text categories.
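A minimal sketch of the embedding-and-classify approach: average the GloVe vectors of a document's tokens and train a scikit-learn classifier. The embeddings file name and the (much richer) Persian preprocessing are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def load_glove(path: str) -> dict:
    """Load a GloVe text file: one token per line followed by its vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def embed(text: str, vectors: dict, dim: int = 300) -> np.ndarray:
    """Average the embeddings of known tokens; zero vector if none are known."""
    words = [vectors[w] for w in text.split() if w in vectors]
    return np.mean(words, axis=0) if words else np.zeros(dim, dtype=np.float32)

# Illustrative usage (file name is a placeholder; real preprocessing involves
# Persian-specific normalization and tokenization):
# glove = load_glove("glove.fa.300d.txt")
# X = np.stack([embed(doc, glove) for doc in documents])
# clf = LogisticRegression(max_iter=1000)
# print(cross_val_score(clf, X, labels, cv=5).mean())
```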
-
Tumor Classification Using Machine Learning
Computer Vision
Overview: A comprehensive machine learning project for tumor subtype classification using mutation and copy number variation data from genomic analysis.
Technologies: Python, Scikit-learn, Pandas, NumPy, Jupyter Notebook, Deep Neural Networks
Key Features:
• Classification of tumor subtypes (PDM vs SCM)
• Multiple ML approaches including SVM, XGBoost, and few-shot learning
• Feature selection and dimensionality reduction techniques
• 5-fold cross-validation evaluation
Results: Achieved 69.17% accuracy with Logistic Regression on tumor mutation burden, copy number variation, and missense-mutation features, using metaheuristic optimization for feature selection.
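A minimal sketch of the 5-fold cross-validation evaluation with Logistic Regression, assuming a feature table holding tumor mutation burden, CNV, and missense-mutation columns; column and file names are placeholders, and the metaheuristic feature selection is not reproduced.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_subtype_classifier(features: pd.DataFrame, labels: pd.Series) -> float:
    """Mean 5-fold stratified CV accuracy for Logistic Regression on genomic features.

    `features` is assumed to hold tumor mutation burden, CNV counts, and
    missense-mutation indicators as columns (names are illustrative).
    """
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(model, features, labels, cv=cv, scoring="accuracy").mean()

# Usage sketch (file and column names are placeholders):
# df = pd.read_csv("genomic_features.csv")
# print(evaluate_subtype_classifier(df.drop(columns=["subtype"]), df["subtype"]))
```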