DPhil Candidate, Genomic Medicine and Statistics Programme
Health resources such as the UK Biobank (UKBB) offer a wide range of genotypic and phenotypic data on a large number of participants, allowing us to explore disease heterogeneity and comorbidity. However, disease diagnoses are sometimes missing and often not of gold-standard quality, which makes it impossible to assess with confidence how diseases are associated with other healthcare data and, in turn, to reveal their substructure. We therefore intend to infer gold-standard disease diagnoses from other phenotypic data in the UKBB, using established genetic associations from cleaner data as a guide to the quality of our inference. Once such diagnoses are obtained, we intend to explore disease substructure by identifying which clinical features are associated with multiple diseases simultaneously. Insights into disease substructure will help target both existing and new treatments to the forms of disease in which they are most likely to be effective.
I am also exploring disease substructure from a genetic (rather than a phenotypic) perspective, by finding genetic variants associated with different subtypes of irritable bowel syndrome, and comparing these to genetic associations found in related diseases.
Prior to this, I worked with Chris Wallace at the MRC Biostatistics Unit in Cambridge. There, I developed the R package Peaky to map physical, regulatory interactions between different genomic regions to aid in the functional interpretation of disease-associated variants within them.