Diversity in Current Genomic Research: Complications and Implications of Improvement

Krishna Jaladanki
Apr 24, 2021
A recent study mapped out genetic diversity across the world. The red indicates high diversity, while the blue indicates low diversity (14).

Although all humans are 99.9% genetically identical, even the remaining 0.1% can cause drastic differences among us, as depicted in the image above. To compare genetic variation across people, researchers use genome-wide association studies (GWAS), an essential component of genetic research. GWAS analyze genetic data from many individuals and associate specific genetic variations with particular diseases. Physicians can then use these associations for genetic screens. GWAS have multiple benefits, but they have one major flaw: 78% of the data present in genomic databases is derived from individuals of European descent. As of 2019, only 4% of the data in these studies came from individuals of neither European nor Asian descent (9).
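
To make the idea of "associating a variant with a disease" concrete, here is a minimal sketch of one common single-variant test: a chi-square comparison of allele counts between cases and controls. The function name and the counts are hypothetical and chosen only for illustration. Real GWAS typically use regression models with covariates and a stringent genome-wide significance threshold, so treat this as a simplified picture rather than an actual study pipeline.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference alleles.
    Assumes every row and column total is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: the alternate allele is more common among cases.
print(allelic_chi_square(60, 40, 40, 60))  # 8.0
# Identical allele frequencies in both groups give no evidence of association.
print(allelic_chi_square(50, 50, 50, 50))  # 0.0
```

A larger statistic means the allele frequencies differ more between cases and controls than chance alone would suggest. Because the test only sees the individuals who were sampled, a variant that is rare or absent in the sampled ancestry cannot be detected this way.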

Percentage breakdown of ancestral representation in GWAS (15)

The high prevalence of European populations in GWAS causes genetic findings to be biased towards this group, since distinct population-specific factors impact an individual’s phenotypic expression, including modifier genes, allelic variation, and environmental factors (4). For cystic fibrosis (CF), a genetic pulmonary disease, the causative allele ΔF508 is found in 70% of CF cases in European populations. However, this allele accounts for only 29% of CF cases in people of African descent (11).

The bias towards individuals of European descent in GWAS limits the generalizability of study findings. The same mutation in both a European and non-European subject may not cause the same disease. Conversely, different genetic mutations in different populations may cause the same genetic disorder (7). One indicator of how well this data transfers to other populations is the polygenic risk score (PRS), which estimates the combined effect of an individual’s genetic variants on a phenotype. PRS have a 4.5 times higher prediction accuracy for European individuals than for African individuals and two times higher accuracy than for Asian individuals (3).
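
A PRS is commonly computed as a weighted sum: for each variant, the number of risk alleles a person carries is multiplied by an effect size estimated in a GWAS, and the products are added up. The sketch below illustrates that arithmetic under simple assumptions. The variant names, weights, and genotype are hypothetical, and treating a missing genotype as zero copies is a simplification made for brevity.

```python
from typing import Dict


def polygenic_risk_score(genotype: Dict[str, int],
                         effect_sizes: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts over all variants that have a weight."""
    score = 0.0
    for variant, weight in effect_sizes.items():
        # Number of risk-allele copies (0, 1, or 2); missing variants count as 0 here
        copies = genotype.get(variant, 0)
        score += weight * copies
    return score


# Hypothetical per-allele effect sizes, as if taken from a GWAS summary table
effect_sizes = {"variant_a": 0.25, "variant_b": 0.5, "variant_c": -0.25}
# Hypothetical individual: two copies of variant_a, one of variant_b, none of variant_c
genotype = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

print(polygenic_risk_score(genotype, effect_sizes))  # 1.0
```

The score is only as good as its weights. When the effect sizes are estimated mostly from European cohorts, variants that matter in other ancestries may be missing from the table or carry poorly fitting weights, which is consistent with the lower accuracy reported above.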

Skewed data sets in GWAS also miss important gene-disease relationships that are prominent in non-European populations. For example, GWAS are used to find single-nucleotide polymorphisms (SNPs), which are mutations present in a significant portion of a specific population. SNPs help researchers locate genes associated with diseases (5). However, without genetically diverse data to discover SNPs in non-European populations, scientists are unaware of potentially useful markers to recognize gene-disease relationships. SNPs are also associated with drug metabolism, which could affect drug efficacy and resistance (12). The absence of data to find SNPs in non-European ancestries could lead to complications in the medical setting.

There are several potential benefits of implementing a more ethnically diverse population sample in genetic studies, including novel insight into human traits and diseases, improvements in healthcare, and an enhanced understanding of health disparities. Already, initiatives such as TOPMed and PAGE II have made tremendous strides to increase the percentage of non-European ancestries in GWAS. This new inclusion has allowed for comparative analysis across different populations, leading to the discovery of new causative loci for various diseases, such as obesity, diabetes, and cleft palate (1). If this trend continues, researchers will be able to identify loci for additional genetically influenced diseases, leading to improved genetic screens relevant to any given ancestry (13).

In medicine, a lack of diversity in GWAS limits the usefulness of precision medicine efforts for non-Europeans. These individuals are more likely to receive ambiguous genetic test results and false-positive or false-negative diagnoses due to missing information about disease-causing variants. The inclusion of this information would likely decrease the number of false diagnoses, which would help patients receive more effective treatment. PRS would also become more accurate with increased population representation, which would allow doctors to better predict the risk of disease in a non-European individual (3). Another field that holds potential for improvement is pharmacogenetics. An increase in diversity would allow researchers to find new links between genetic variants and drug safety and efficacy in non-European patients. Based on this information, doctors would be able to prescribe the medicine that is safest and most effective for each patient (8).

Finally, researchers could find disease relationships that disproportionately affect particular ancestries through epigenetics. Epigenetics is the study of changes in gene expression that occur without changes to the DNA sequence itself, and these changes can be shaped by environmental and sociodemographic factors (6). Certain populations, such as African Americans, are more likely to experience adverse situations during childhood, which may induce epigenetic changes. These changes may have long-term impacts on gene expression and could possibly lead to disease onset (1). Studying epigenetics with a more diverse population representation would give scientists more insight into the relationship between sociodemographic factors and disease in non-European populations.

It is evident that the current ancestral representation in GWAS limits scientists’ understanding of genetics and genetic diseases. The inclusion of a more diverse representation in genetic studies holds tremendous potential to benefit our understanding of human genetics and will improve our ability to recognize and combat genetic diseases in all ancestries.

The solution for diversity in genomic research is in our “hands”


  1. Aroke, Edwin N et al. “Could epigenetics help explain racial disparities in chronic pain?” Journal of pain research vol. 12 701–710. 18 Feb. 2019, doi:10.2147/JPR.S191848
  2. Bentley, Amy R et al. “Evaluating the promise of inclusion of African ancestry populations in genomics.” NPJ genomic medicine vol. 5 5. 25 Feb. 2020, doi:10.1038/s41525-019-0111-x
  3. Broad Institute of MIT and Harvard. “Need to increase diversity within genetic data sets: Diversifying population-level genetic data beyond Europeans will expand the power of polygenic scores.” ScienceDaily. ScienceDaily, 29 March 2019. <www.sciencedaily.com/releases/2019/03/190329134743.htm>.
  4. “Genetics for All.” Nature News, Nature Publishing Group, 29 Mar. 2019, www.nature.com/articles/s41588-019-0394-y#citeas.
  5. Guan, Boxin et al. “Detecting Disease-Associated SNP-SNP Interactions Using Progressive Screening Memetic Algorithm.” IEEE/ACM transactions on computational biology and bioinformatics, vol. PP 10.1109/TCBB.2020.3019256. 28 Aug. 2020, doi:10.1109/TCBB.2020.3019256
  6. Kaliman, Perla. “Epigenetics and meditation.” Current opinion in psychology vol. 28 (2019): 76–80. doi:10.1016/j.copsyc.2018.11.010
  7. Kammenga, Jan E. “The background puzzle: how identical mutations in the same gene lead to different disease symptoms.” The FEBS journal vol. 284,20 (2017): 3362–3373. doi:10.1111/febs.14080
  8. Landry, Latrice G et al. “Lack Of Diversity In Genomic Databases Is A Barrier To Translating Precision Medicine Research Into Practice.” Health affairs (Project Hope) vol. 37,5 (2018): 780–785. doi:10.1377/hlthaff.2017.1595
  9. Morales, Joannella et al. “A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog.” Genome biology vol. 19,1 21. 15 Feb. 2018, doi:10.1186/s13059-018-1396-2
  10. Peterson, Roseann E et al. “Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations.” Cell vol. 179,3 (2019): 589–603. doi:10.1016/j.cell.2019.08.051
  11. Sirugo, Giorgio, et al. “The Missing Diversity in Human Genetic Studies.” Cell, vol. 177, no. 1, 21 Mar. 2019, doi:10.1016/j.cell.2019.02.048
  12. Wishart, David S et al. “DrugBank 5.0: a major update to the DrugBank database for 2018.” Nucleic acids research vol. 46,D1 (2018): D1074-D1082. doi:10.1093/nar/gkx1037
  13. Wojcik, Genevieve L et al. “Genetic analyses of diverse populations improves discovery for complex traits.” Nature vol. 570,7762 (2019): 514–518. doi:10.1038/s41586-019-1310-4
  14. Jakobsen, R. K. “Global genetic diversity mapped by new study.” ScienceNordic, 7 Jan. 2017, https://sciencenordic.com/biodiversity-denmark-dna/global-genetic-diversity-mapped-by-new-study/1440946.
  15. Wu, K. J. “Lack of diversity in genetic research could be costing us our health.” PBS, 21 Mar. 2019, https://www.pbs.org/wgbh/nova/article/lack-diversity-genetic-research-could-be-costing-us-our-health/.