Sarwan Ali

Postdoctoral Research Scientist

Taub Institute  ·  Columbia University Irving Medical Center  ·  New York, NY

I build machine learning methods for biological sequence data — protein function, variant interpretation, and the genetic architecture of aging-related disease.

Currently a postdoc with Giuseppe Tosto at Columbia, where I work on multi-ancestry polygenic risk modeling and longitudinal cognitive trajectories across the ADSP-R5, MESA, FHS, MHAS, and U19 aging cohorts. Previously, I completed my PhD in Computer Science at Georgia State University with Murray Patterson, with research stints at IBM Research, Bosch, Boston College, and Emory.

Open to collaboration on sequence-based deep learning for regulatory genomics, variant interpretation, and the biology of aging. Please reach out.

Research

four threads

Sequence → Function

Transformer encoders, protein language models (ESM-2, ProtBERT), chaos-game representations, and hashing-based sketches for protein and DNA sequence analysis at scale.

Variant Interpretation

Calibration and difficulty-class evaluation of missense pathogenicity predictors (AlphaMissense, EVE, ESM-1b); multi-ancestry polygenic risk modeling for Alzheimer's disease.

Aging & Cognition

Longitudinal trajectories, polygenic architecture, and population genetics of aging-related cognitive decline across five major cohorts (ADSP, MESA, FHS, MHAS, U19).

ML Methods

Training dynamics of Transformers, conformal prediction under contamination, optimizer stability theory, and rigorous benchmarking practices for applied ML.

Affiliation

2021 → present
Jan 2025 – present
Postdoctoral Research Scientist
Columbia University Irving Medical Center · Taub Institute
Advisor: Giuseppe Tosto · REC Scholar (Alzheimer's Disease Research Center)
Jan 2021 – Dec 2024
PhD, Computer Science
Georgia State University · Atlanta, GA
Advisor: Murray Patterson · Dean Dissertation Grant · MBD Fellowship · U.S. Patent
Jun – Oct 2024
Research Specialist (Biomedical Informatics)
Emory University · Atlanta, GA
Advisor: Selen Bozkurt. Relapse prediction in young lung-cancer patients from EHR data.
Aug – Dec 2022
Visiting Research Scholar
Boston College · Newton, MA
Advisor: José Bento. 3D geometric ML for protein-aptamer interactions.
May – Aug 2022
Knowledge Engineering Intern
Bosch · Sunnyvale, CA
Advisor: Hyeongsik Kim. Sequence + interrupt patterns in manufacturing data → U.S. patent.
Sep 2021 – May 2022
Research Collaborator
IBM T. J. Watson Research Center · AI Foundations Group
Advisor: Pin-Yu Chen. ML robustness for SARS-CoV-2 genome classification → Nature Scientific Reports.

Selected Publications

all 65+ →
2026

Murmur2Vec: A Theoretically-Grounded Hashing-Based Embedding for Large-Scale Biological Sequence Analysis

Sarwan Ali

Bioinformatics · under review

k-mer spectrum kernel sketching with closed-form bias/variance, Johnson–Lindenstrauss concentration, and excess-risk bounds. Scales sequence ML to multi-million-sequence regimes.

sequence DL · theory
2023

Benchmarking Machine Learning Robustness in COVID-19 Genome Sequence Classification

Sarwan Ali, Bikram Sahoo, Alexander Zelikovsky, Pin-Yu Chen, Murray Patterson

Nature Scientific Reports · IF 4.9

robustness · sequence DL
2024

Molecular Sequence Classification Using Efficient Kernel-Based Embedding

Sarwan Ali, Tamkanat E. Ali, Taslim Murad, Haris Mansoor, Murray Patterson

Information Sciences · IF 8.1

sequence DL · kernel methods

Recent news

Oct 2025
Became a Research and Education Core (REC) Scholar at Columbia University's Alzheimer's Disease Research Center.
Oct 2025
Paper "Nearest Neighbor CCP-Based Molecular Sequence Analysis" accepted at IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Aug 2025
Paper "Anderson Acceleration For Molecular Sequencing" accepted at Springer Neural Computing & Applications (IF 5.1).
Aug 2025
Paper "Explicit Path CGR: Maintaining Sequence Fidelity in Geometric Representations" accepted at ACM CIKM (CORE Rank A).
Jan 2025
Started as Postdoctoral Research Scientist at Columbia University Irving Medical Center (Taub Institute), working with Giuseppe Tosto.
Dec 2024
Defended my PhD at Georgia State University. Thesis advisor: Murray Patterson.