Built a data-driven framework to investigate tumor-specific signaling using large-scale phosphoproteomics data from CPTAC (~18,000 phosphosites, 165 paired tumor–normal samples).
Key contributions:
- Designed a paired tumor–normal statistical pipeline to reduce inter-patient variability
- Constructed a differential phosphosite co-regulation network using Spearman correlation
- Identified signaling modules via Louvain community detection
- Performed kinase enrichment analysis (KEA) to infer the upstream kinases regulating each module
- Discovered a CDK-dominant signaling module and prioritized candidate “dark phosphosites” (ILF3, RFC1, NCL)
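
The first two steps of the pipeline can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code. It assumes that "differential" means per-patient tumor minus normal differences on log-scale intensities, and that an edge is kept when the absolute Spearman correlation of those differences passes a threshold. The site names, toy values, function names, and the 0.8 cutoff are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def paired_differences(tumor, normal):
    """Per-patient tumor minus normal value for every site (assumed log scale)."""
    return {
        site: [t - n for t, n in zip(tumor[site], normal[site])]
        for site in tumor
    }


def coregulation_edges(diffs, min_abs_rho=0.8):
    """Keep site pairs whose paired differences are strongly rank-correlated."""
    edges = []
    for a, b in combinations(sorted(diffs), 2):
        rho = spearman(diffs[a], diffs[b])
        if abs(rho) >= min_abs_rho:
            edges.append((a, b, rho))
    return edges


# Toy data: 3 hypothetical sites measured in 5 hypothetical patients.
tumor = {
    "siteA": [2.0, 3.0, 4.5, 5.0, 7.0],
    "siteB": [1.5, 2.5, 4.0, 4.0, 6.0],
    "siteC": [3.0, 1.0, 2.5, 0.5, 2.0],
}
normal = {
    "siteA": [1.0, 1.5, 1.0, 2.0, 1.5],
    "siteB": [1.0, 1.0, 1.5, 1.0, 1.0],
    "siteC": [1.0, 1.0, 1.0, 1.0, 1.0],
}

edges = coregulation_edges(paired_differences(tumor, normal))
for a, b, rho in edges:
    print(f"{a} -- {b}: rho = {rho:.2f}")
```

On the toy data only the siteA/siteB pair passes the cutoff. In a real run, the resulting weighted edge list would be handed to a community detection routine (for example `louvain_communities` in NetworkX) to obtain signaling modules; at ~18,000 sites, a vectorized correlation from NumPy or SciPy would be needed instead of this pairwise loop.
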
The entire workflow was implemented on Azure Machine Learning, enabling scalable computation and reproducible analysis of high-dimensional omics data.


