Synthetic Data Generation for Network Intrusion Detection Systems (NIDS)
Developed a boxplot-constrained, CTGAN-based synthetic data generator to address class imbalance in NIDS using 100% synthetic data. Achieved accuracies of 96.21% (GB), 96.19% (RF, via a scalable PySpark pipeline), 96.10% (DT), and 93.59% (SVM) on the CIC-IDS2017 dataset.
Duration: Jan 2025 – Mar 2025
Technologies: Python, PySpark, Pandas, NumPy, CTGAN, Scikit-learn, SciPy, Matplotlib, Seaborn
Key Features: Applied Random Forest-based feature selection and undersampling to improve model efficiency. Validated synthetic data fidelity using PCA, boxplots, and classifier performance.
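Illustrative sketch: the entry does not say how the boxplot constraint is applied, so the snippet below shows one plausible reading, in which generated rows are rejected when any feature falls outside the boxplot whiskers (Q1 - 1.5*IQR, Q3 + 1.5*IQR) of the real data. The function names, the 1.5 whisker factor, and the toy arrays are assumptions for illustration and are not taken from the project.

```python
import numpy as np

def boxplot_bounds(real_col, whisker=1.5):
    """Return the (lower, upper) boxplot whisker limits of one real feature column."""
    q1, q3 = np.percentile(real_col, [25, 75])
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr

def filter_synthetic(real, synthetic, whisker=1.5):
    """Keep only synthetic rows whose every feature lies within the real data's whiskers."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    keep = np.ones(len(synthetic), dtype=bool)
    for j in range(real.shape[1]):
        lo, hi = boxplot_bounds(real[:, j], whisker)
        keep &= (synthetic[:, j] >= lo) & (synthetic[:, j] <= hi)
    return synthetic[keep]

# Toy data (hypothetical): two features, five real rows, three generated rows.
real = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 13.0], [5.0, 14.0]])
synthetic = np.array([[2.5, 11.5], [50.0, 12.0], [3.0, -40.0]])

kept = filter_synthetic(real, synthetic)
print(kept)  # only the first generated row lies inside both whisker ranges
```

In a pipeline of this kind, the rows passed as `synthetic` would come from the trained generator's samples, and the filter would typically be applied per attack class so that each class keeps its own feature ranges.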

