This project aims to ensure stable and interpretable feature selection in RNA-seq data for classifying kidney disease progression stages. It emphasizes the importance of reproducibility in biomedical machine learning pipelines by assessing how consistently features are selected under different sampling conditions.
Stability refers to how consistently a feature (gene) is selected when feature selection is repeated on different subsamples of the dataset.
**Reliability:** Helps ensure that selected genes reflect the underlying biology rather than noise or random chance.
**Interpretability:** Builds trust in the features used for predictions.
**Reproducibility:** Essential for scientific research and clinical decision-making.
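The stability definition above can be made concrete by counting how often each gene is selected across repeated subsamples. The sketch below is illustrative, not the project's actual pipeline. It assumes a univariate ANOVA F-test as the selector, top-k selection, stratified 80% subsamples, and a normalized, log-transformed expression matrix; all function names and parameters (`selection_frequency`, `k`, `n_runs`, `frac`) are hypothetical. scikit-learn's `SelectKBest` with `f_classif` computes the same F-statistic, but the sketch uses only NumPy to stay self-contained.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic for each gene (column of X) across class labels y."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += len(Xc) * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    # Small epsilon avoids division by zero for constant genes.
    return (ss_between / df_between) / (ss_within / df_within + 1e-12)


def stratified_subsample(y, frac, rng):
    """Draw a fraction `frac` of the samples from each class, without replacement."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        m = max(1, int(frac * len(members)))
        idx.append(rng.choice(members, size=m, replace=False))
    return np.concatenate(idx)


def selection_frequency(X, y, k=10, n_runs=100, frac=0.8, seed=0):
    """Fraction of subsampling runs in which each gene lands in the top-k by F-score."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_runs):
        idx = stratified_subsample(y, frac, rng)
        scores = anova_f_scores(X[idx], y[idx])
        top_k = np.argsort(scores)[-k:]  # indices of the k highest scores
        counts[top_k] += 1
    return counts / n_runs


# Synthetic stand-in for an RNA-seq matrix: 4 classes (one per subtype),
# 40 samples each, 200 genes; only the first 5 genes carry class signal.
rng = np.random.default_rng(42)
y = np.repeat(np.arange(4), 40)
X = rng.normal(size=(160, 200))
X[:, :5] += 2.0 * y[:, None]

freq = selection_frequency(X, y, k=10)
print(np.round(freq[:8], 2))
```

A gene with frequency close to 1.0 is selected in nearly every run and is stable in the sense defined above; noise genes fill the remaining top-k slots only sporadically. Stratifying the subsamples keeps all four subtypes present in every run, which the F-test needs.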
**Domain:** Gene expression (RNA-seq)
**Use Case:** Classification of kidney disease into four subtypes:
Early Progressive
Early Stable
Late Progressive
Late Stable
Standard feature selection methods may identify different gene sets each time they're run on slightly different data, which undermines model reliability.
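One common way to quantify this disagreement between runs is the mean pairwise Jaccard similarity of the selected gene sets, where 1.0 means every run picked exactly the same genes. Whether the project uses Jaccard or another stability index is not stated, so treat this as an assumption; the gene names below are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(gene_sets):
    """Average Jaccard similarity over all pairs of runs (1.0 = identical selections)."""
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        raise ValueError("need at least two gene sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 gene sets from three runs on different subsamples.
runs = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B", "GENE_D"},
    {"GENE_A", "GENE_E", "GENE_F"},
]
print(round(mean_pairwise_jaccard(runs), 2))  # 0.3
```

Here only `GENE_A` is selected every time, so the runs agree on less than a third of their genes on average, which is the kind of instability that undermines model reliability.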
**Goal:** Find genes that are consistently informative, not just occasionally selected.
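Given per-gene selection frequencies from repeated subsampling, "consistently informative" can be operationalized as a frequency cutoff, similar to the thresholding step of stability selection. The sketch below assumes a threshold of 0.8; the function name `stable_genes`, the threshold value, and the example frequencies and gene names are all hypothetical, not taken from the project.

```python
import numpy as np


def stable_genes(frequencies, gene_names, threshold=0.8):
    """Return (gene, frequency) pairs selected in at least `threshold` of runs, most stable first."""
    freq = np.asarray(frequencies, dtype=float)
    keep = np.flatnonzero(freq >= threshold)  # genes meeting the cutoff
    order = keep[np.argsort(-freq[keep], kind="stable")]  # sort by descending frequency
    return [(gene_names[i], float(freq[i])) for i in order]


freqs = [1.0, 0.35, 0.92, 0.05, 0.8]
names = ["GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"]
print(stable_genes(freqs, names))
# [('GENE_1', 1.0), ('GENE_3', 0.92), ('GENE_5', 0.8)]
```

Genes that were selected only occasionally (`GENE_2`, `GENE_4`) are dropped even if they ranked highly in some individual run.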