Increasing Diversity in Genomic Research

We live in an age of ever-increasing data. In just about every field imaginable, data is being generated at a mind-blowing pace. Luckily, this data isn’t simply sitting around collecting dust; it’s being mined for meaningful insights that can fuel progress across our society.

This is particularly true in genomics as the technologies for sequencing genomes continue to advance. While sequencing the first genome cost billions of dollars, today a person can have their genome sequenced for only a few hundred dollars. Driving down the cost of sequencing has allowed the inclusion of more and more people, in turn accelerating the pace of genomic research. Yet genomic research also has a data problem: a lack of diversity.

This problem is particularly apparent when considering where, or more specifically from whom, most genomic data comes. While people of European ancestry make up less than 25% of the global population, they represent the lion’s share of the participants in genetic research, specifically genome-wide association studies (GWAS). This Euro-centric data also has its own diversity problem. From 2005 to 2018, just three countries contributed the majority of GWAS participants: the United States, the United Kingdom, and Iceland.

Figure 1. Representation of different ethnic groups in genomic datasets. From Morales et al., “A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog.”

As a result, other ethnicities are vastly under-represented in genomic research. Asians, for example, account for 60% of the world’s population but only 11% of GWAS participants. Other groups, including African, African American, and Latino populations, represent just 4% of GWAS participants.

This lack of diversity in genomic data is problematic. First and foremost, it reflects major disparities in how different ethnicities interact with — and benefit from — biomedical research. Sadly, such inequities are not new across medicine and science. 

The Euro-centric nature of genomic data is alarming for another reason: it stifles discovery by undermining scientists’ efforts to determine how genomic differences across populations contribute to health and disease. 

While humans are remarkably similar at the genomic level — two unrelated individuals share about 99.9% of their DNA sequences — the small percentage that does differ can hold vital genetic clues about our traits and health, from hair color to the risk of Alzheimer’s disease. 

For example, sickle cell disease is a blood disorder that disproportionately affects people with African ancestry. It is caused by a mutation in the gene for hemoglobin, a protein abundant in red blood cells. Similarly, cystic fibrosis, also caused by mutations in a single gene, is more common in European populations. Of course, these conditions can, and do, affect individuals from other parts of the world, too.

If scientists study only one group, they can miss important information that influences the health of a large portion of the world’s population, such as why asthma-related deaths are four to five times higher in people of African, Puerto Rican, and Mexican descent. A recent study discovered genetic variants in these populations that correspond with a decreased sensitivity to albuterol, a drug commonly found in inhalers, which could help explain the unusual severity of the disease in those populations.

At the same time, researchers are likely missing discoveries that can help not just the particular group being studied, but everyone. Consider a brand-new class of cholesterol-lowering drugs known as PCSK9 inhibitors. These drugs were developed because of the finding that a single, non-functional copy of the PCSK9 gene was associated with remarkably low levels of cholesterol — a finding that came from genomic studies of people with African ancestry.

At Nebula, we are deeply committed to increasing the diversity of genomic data and helping to ensure that everyone benefits from the insights that flow from genomic research. Importantly, our method for analyzing the human genome uses low-coverage whole-genome sequencing, which samples the genome without bias. Microarray-based genotyping, by contrast, tests only a set of pre-determined genetic variants originally identified in Europeans. That means low-coverage whole-genome sequencing can capture and make sense of the variation present within a genome regardless of a user’s ethnicity. Figure 2 shows how the accuracy of low-coverage whole-genome sequencing compares to microarray-based genotyping in different populations. We also published an extended comparison to other genetic testing options in an earlier blog post.


Figure 2. Whole-genome sequencing at 0.4x coverage produced significantly more accurate results than microarray-based genotyping for individuals of non-European ancestries. Adapted from the Gencove Blog.

The affordability of low-coverage whole-genome sequencing makes it a critical tool not just for uncovering the information within your genome, but also for ensuring a more equitable future for genomic research — and human health. You can order our sequencing kit today to begin your journey of learning more about your genes and the insights they can impart.
