What is extreme phenotype sequencing? Taking STEPS to improve genome-wide association studies


A new statistical technique in genomic studies finds a more complete picture of the genetic variations associated with diseases and other traits.

Sequencing the genomes of patients to pinpoint disease-causing genes is a powerful technology that is leading to new treatments. Genome-wide association studies (GWAS) can identify thousands of genetic variations associated with diseases and other traits.

Genetic origins of disease: finding a needle in a stack of needles

A major challenge in genome-wide association is designing epidemiological and clinical studies to include all the traits that offer clues to deciphering the genetic origins of a disease pathology. First, there is the problem of deciding which patients to sequence. Second, it’s difficult to identify the culprit genes when one disease is caused by multiple genetic mutations. It’s like trying to find a specific needle in a stack of needles. When spotting gene mutations, finding the most obvious candidate is not enough; you need to find the less obvious ones as well.

Secondary insights into genetic origins of diseases?

Let’s use the search for genes that cause high blood pressure as an example. To reduce the cost of whole-genome sequencing, one way to narrow the number of patients would be to sequence the genomes of only the patients with the highest blood pressure and the patients with the lowest blood pressure. This design, called extreme phenotype sequencing, would offer insights into the genetics of high blood pressure. The approach has proven promising for detecting the complex genetic origins of diseases.
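To make the selection step concrete, here is a minimal Python sketch under stated assumptions. The function name select_extremes, the 10 percent cutoff for each tail and the simulated blood pressure values are all hypothetical choices made for illustration; none of them comes from an actual study.

```python
import random


def select_extremes(values, fraction=0.1):
    """Return (low, high): indices of the lowest and highest `fraction` of values."""
    n = len(values)
    k = max(1, int(n * fraction))  # number of subjects taken from each tail
    order = sorted(range(n), key=lambda i: values[i])  # indices, ascending by trait
    return order[:k], order[-k:]


# Simulated systolic blood pressure (mmHg) for 1,000 hypothetical patients.
rng = random.Random(0)
blood_pressure = [rng.gauss(120, 15) for _ in range(1000)]

# Only the two tails would be sent for sequencing.
low, high = select_extremes(blood_pressure, fraction=0.1)
print(f"Sequencing {len(low) + len(high)} of {len(blood_pressure)} patients")
```

Only the patients in the two tails are sequenced, which is where the cost savings come from.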

Blood pressure is the primary trait in this example. Secondary traits to consider could include cardiovascular disease or renal disease. Testing for associations with secondary traits without accounting for the extreme phenotype sequencing design introduces the risk of false positives, meaning some of the genetic variants identified may not have a valid association with the secondary traits. What if biostatistical methods could avoid these false positives?
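A toy simulation can illustrate where such false positives come from. The sketch below is not the STEPS method; it only shows the problem. Every number in it (sample size, allele frequency, effect size, the 10 percent tails) is a hypothetical assumption, and the helper pearson is written here purely for illustration. By construction, the simulated variant affects the primary trait only, while the secondary trait is correlated with the primary trait through a shared non-genetic factor.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n, maf = 20000, 0.3  # hypothetical cohort size and minor allele frequency

genotype, primary, secondary = [], [], []
for _ in range(n):
    g = (rng.random() < maf) + (rng.random() < maf)  # 0, 1 or 2 copies of the allele
    shared = rng.gauss(0, 1)  # non-genetic factor shared by both traits
    genotype.append(g)
    primary.append(0.5 * g + shared + rng.gauss(0, 1))  # variant affects primary trait
    secondary.append(shared + rng.gauss(0, 1))  # variant has NO effect here

# Extreme phenotype sampling: keep the lowest and highest 10% of the primary trait.
k = n // 10
order = sorted(range(n), key=lambda i: primary[i])
eps = order[:k] + order[-k:]

corr_full = pearson(genotype, secondary)
corr_eps = pearson([genotype[i] for i in eps], [secondary[i] for i in eps])
print(f"variant vs. secondary trait, full cohort:   {corr_full:.3f}")
print(f"variant vs. secondary trait, extremes only: {corr_eps:.3f}")
```

In this setup the variant is unrelated to the secondary trait in the full cohort, yet a naive correlation within the selected extremes tends to come out positive, because both the variant and the shared factor push patients into the tails. An analysis that ignores how the patients were selected would report that association as real.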

A STEP in the valid and robust direction

Our methods enable researchers designing extreme phenotype sequencing studies to perform “secondary trait association testing.” The process uses a maximum likelihood approach that accounts for how patients were chosen for sequencing, namely on the basis of a primary trait that is correlated with the secondary traits, so the resulting associations are more likely to be genetically relevant to the study.

Published in the journal Biostatistics, our process is a novel secondary trait analysis method researchers can apply to achieve valid genetic association analysis. We dubbed our technique “STs under EPS designs,” or STEPS.

For this publication, we looked at a disorder called benign ethnic neutropenia, a natural condition marked by abnormally low white blood cell counts, as an example to evaluate the performance of STEPS.

Genetic studies of neutropenia use white blood cell count as a primary trait. For example, in its extreme phenotype sequencing effort, our project selected subjects with the lowest and highest white blood cell counts to decipher the genetics of neutropenia. What could these same data reveal about platelet count, a measure of the cell fragments that help blood clot? We compared several statistical methods for identifying genetic variations associated with secondary traits that are correlated with a primary trait.

Our method found more genetic variants

In our demonstration, we used about 1,000 samples from people with benign ethnic neutropenia to show the performance of STEPS. We considered seven secondary traits, including platelet count, in the analysis. To test the methods, we divided the identified genetic variants into three groups with a high, moderate or low likelihood of being correlated with secondary traits, based on existing results in the GWAS Catalog. Our STEPS method identified more genetic variants in the high-likelihood group than the other methods did. This suggests our method is more valid and robust for identifying genetic variants associated with secondary traits that are correlated with the primary trait.

We made STEPS publicly available so any researcher can apply this process to their data and determine whether their secondary trait analysis is valid.

About the author

Guolian Kang, PhD, is an associate faculty member of the Biostatistics Department at St. Jude Children’s Research Hospital.
