TY - JOUR
T1 - Summix
T2 - A method for detecting and adjusting for population structure in genetic summary data
AU - Arriaga-MacKenzie, Ian S.
AU - Matesi, Gregory
AU - Chen, Samuel
AU - Ronco, Alexandria
AU - Marker, Katie M.
AU - Hall, Jordan R.
AU - Scherenberg, Ryan
AU - Khajeh-Sharafabadi, Mobin
AU - Wu, Yinfei
AU - Gignoux, Christopher R.
AU - Null, Megan
AU - Hendricks, Audrey E.
N1 - Publisher Copyright: © 2021 American Society of Human Genetics
PY - 2021/7/1
Y1 - 2021/7/1
AB - Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestries (African [AFR], non-Finnish European [EUR], East Asian [EAS], Indigenous American [IAM], and South Asian [SAS]), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match the respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. With an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
KW - allele frequency
KW - ancestry
KW - common controls
KW - external controls
KW - gnomAD
KW - population stratification
KW - population structure
KW - summary
UR - http://www.scopus.com/inward/record.url?scp=85108873414&partnerID=8YFLogxK
DO - 10.1016/j.ajhg.2021.05.016
M3 - Article
C2 - 34157305
AN - SCOPUS:85108873414
SN - 0002-9297
VL - 108
SP - 1270
EP - 1282
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 7
ER -