TxArray Design
What is the TxArray?
Designed by the iGeneTRAIN consortium, the Tx GWAS Array (using the Axiom Affymetrix technology) is a genotyping array developed to study genomic loci relevant to transplantation. It provides comprehensive, cost-effective and accurate coverage of approximately 782,000 transplant markers. Content is customized for deep capture of variants across HLA, KIR, pharmacogenomic and metabolic loci.
Why was the panel developed?
The Tx GWAS Array was designed to identify genomic associations that will help improve patient outcome in transplantation. It is targeted toward developing biomarkers for matching donor-recipient pairs and minimizing complications associated with immunosuppressant drugs. As discussed immediately below, it is designed to be accurate and comprehensive.
Does it perform well?
Our benchmarks for performance are accuracy and comprehensiveness.
Accuracy: In a study by Li et al., we investigated concordance between 279,061 genotypes common to the TxArray and HapMap2 (r22, b36). We tested 22,944,075 sample-SNP combinations for concordance and observed a concordance rate of 0.996. Concordance rates for the three populations (African, Asian, and European) were very similar.
As our array is specifically set up to test MHC and X chromosome SNPs, we also tested SNPs in these two regions. Results are comparable to the overall concordance: The concordance rate for the MHC SNPs is 0.994, and 0.998 for SNPs on the X chromosome. We also performed this analysis using data from the 1000 Genomes Project reference panel and observed comparably high concordance rates.
Overall, we show that the genotyping quality of the TxArray is high, which enables accurate association testing, as well as imputation of ungenotyped SNPs.
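The concordance rate reported above is the fraction of compared sample-SNP genotype calls that agree between the two sources. The following is a minimal sketch of that calculation, not the pipeline used in the study. It assumes genotypes are stored as two-letter strings keyed by (sample, SNP), that missing calls are skipped, and that strand has already been harmonized. All names and the toy data are hypothetical.

```python
from typing import Dict, Optional, Tuple

CallKey = Tuple[str, str]             # (sample_id, snp_id), hypothetical layout
Calls = Dict[CallKey, Optional[str]]  # genotype such as "AG"; None = missing call


def normalize(genotype: str) -> str:
    """Treat genotypes as unordered allele pairs, so "GA" equals "AG"."""
    return "".join(sorted(genotype.upper()))


def concordance_rate(array_calls: Calls, reference_calls: Calls) -> float:
    """Fraction of shared, non-missing sample-SNP calls that agree."""
    compared = 0
    matched = 0
    for key in set(array_calls) & set(reference_calls):
        a, r = array_calls[key], reference_calls[key]
        if a is None or r is None:
            continue  # assumption: missing calls are excluded from the denominator
        compared += 1
        if normalize(a) == normalize(r):
            matched += 1
    if compared == 0:
        raise ValueError("no comparable genotype calls")
    return matched / compared


# Toy data for illustration only.
array_calls = {
    ("s1", "rs1"): "AG", ("s1", "rs2"): "CC",
    ("s2", "rs1"): "GA", ("s2", "rs2"): None,
}
reference_calls = {
    ("s1", "rs1"): "GA", ("s1", "rs2"): "CT",
    ("s2", "rs1"): "AG", ("s2", "rs2"): "CC",
}
rate = concordance_rate(array_calls, reference_calls)
print(f"concordance = {rate:.3f}")
```

Restricting the keys to SNPs in a given region (for example the MHC or the X chromosome) before calling the function would give the region-specific rates in the same way.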
Comprehensiveness: The ~782,000 markers on the TxArray can be sub-divided into five categories, all of which are comprehensively covered. The categories are summarized below:
- A. Cross-Platform “Cosmopolitan” Genome-wide Coverage markers (~350,000 markers)
  - Compatibility markers (~18K markers)
  - Additional Coverage for non-European populations (~50,000 markers)
- B. Module-specific content from the UK Biobank Core Array (~36K markers)
  - Targeted MHC and Transplant-specific modules
  - MHC and KIR content for fine-mapping and imputation
- C. Transplant-specific Content
  - Pharmacogenomic (~9,500 SNPs)
  - Candidate Genes associated with Transplant Outcomes (~91,900 polymorphic sites)
- D. Functional Variants Modules
  - Affymetrix Biobank array content (~250,000 SNPs)
  - Human Gene Mutation Database (3,571 variants)
  - Additional LoF variants (~8,500 unique putative exonic SNVs)
  - Untranslated Region (UTR) Coverage (~184,000 SNPs)
  - A priori Associations (8,136 SNPs for known phenotypes)
  - Copy Number Variations (CNVs) and Polymorphisms (CNPs) (~2,200 manually curated CNV regions)
- E. GWAS Booster (~135,363 additional markers)
What is the technology?
The Tx GWAS Array is designed using the Affymetrix Axiom genotyping platform and assay technology. The Axiom platform uses a two-color, ligation-based assay with 30-mer oligonucleotide probes synthesized in situ onto a microarray substrate. There are ~1.38 million features available for experimental content, each ~3 μm² in size. Each SNP feature contains a unique oligonucleotide sequence complementary to the sequence flanking the polymorphic site on either the forward or the reverse strand. Solution probes bearing attachment sites for one of two dyes, depending on the 3′ (SNP-site) base (A or T versus C or G), are hybridized to the target complex, followed by ligation for specificity.
Can you use a different GWAS array?
Yes. Our imputation pipeline accommodates all conventional GWAS arrays (we have used over 10,000 samples from additional GWAS platforms).
What loci are covered in the TxArray?
The transplant-specific modules and genome-wide content for the Tx GWAS Array were designed using a tiered system built on the main Affymetrix GWAS imputation grids for the major human populations, as defined by the HapMap Project and subsequent high-density population reference studies. These reference datasets include representative individuals of European ancestry (Utah residents with ancestry from Northern and Western Europe [CEU]), of Asian descent (Japanese from Tokyo, Japan [JPT] and Han Chinese from Beijing, China [CHB]), and of African ancestry (Yoruba in Ibadan, Nigeria [YRI] and Americans of African Ancestry in the Southwest USA [ASW]). In addition to this core content, further modules of SNPs were added sequentially, with no redundant SNPs added, so that maximal economy of markers was retained. We describe the tiers sequentially below:
Genome-wide Imputation Grid (~296K markers): The TxArray’s core imputation grid consists of ~296K genome-wide SNPs shared with the conventional Affymetrix Biobank Array. These include a set of 246K SNPs, also included in the UK Biobank array, that provide high-density coverage across European populations (CEU): mean r² > 0.81 at minor allele frequencies (MAFs) > 1% and mean r² > 0.90 at MAFs > 5%.
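Coverage figures of this kind are commonly summarized from, for each reference SNP above a MAF cutoff, the highest r² it reaches with any marker on the array. The sketch below shows one such summary under that assumption. The source does not state exactly how its means were computed, and the function name, input layout, and the 0.8 cutoff used for the "well tagged" fraction are hypothetical.

```python
from typing import Dict, Tuple


def coverage_summary(max_r2_by_snp: Dict[str, float],
                     threshold: float = 0.8) -> Tuple[float, float]:
    """Return (mean of max r2, fraction of SNPs with max r2 >= threshold).

    max_r2_by_snp maps each reference SNP to its best r2 with any array
    marker; a SNP that is itself on the array has a value of 1.0.
    The r2 values are assumed to be precomputed by an LD tool.
    """
    values = list(max_r2_by_snp.values())
    if not values:
        raise ValueError("no reference SNPs supplied")
    mean_r2 = sum(values) / len(values)
    fraction_tagged = sum(v >= threshold for v in values) / len(values)
    return mean_r2, fraction_tagged


# Toy values for illustration only.
example = {"rs_a": 1.0, "rs_b": 0.9, "rs_c": 0.5}
mean_r2, fraction_tagged = coverage_summary(example)
print(f"mean max r2 = {mean_r2:.2f}, fraction tagged = {fraction_tagged:.2f}")
```

Running the summary separately on SNPs with MAF > 1% and MAF > 5% would give two figures comparable in form to those quoted above.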
Additional Coverage for non-European populations: A further set of ~50K SNPs, covered in the 1KGP Phase I reference panel, was extracted from the Affymetrix Biobank array to improve the mean coverage achieved in African and other populations. These SNPs were chosen to achieve comprehensive overlap with the existing UK Biobank Axiom Array and the Axiom Biobank Array, facilitating collaborative efforts that require joint or meta-analyses of samples genotyped across these and other conventional GWAS platforms.
Compatibility markers (~18K markers): This module was designed to optimize and standardize genotyping quality control (QC) and sample validation. It includes polymorphisms capturing ancestry informative markers (AIMs); fingerprinting panels; mitochondrial and Y-chromosome markers; and miRNA binding sites or target regions.
Module-specific content from the UK Biobank Core Array (~36K markers): These markers were identified based on reported GWAS signals and candidate gene associations across pharmacogenomic and metabolic phenotypes. Again, to enable cross-platform analysis, we included, where feasible, markers directly overlapping the UK-Biobank array, along with additional markers for the transplant-specific content. The following UK-Biobank array modules were included:
- HLA and KIR region markers (7,348 and 1,546 variants respectively)
- Known phenotype associations curated by the National Human Genome Research Institute (NHGRI) GWAS Catalog, (8,136 variants)
- Known CNVs (2,369 variants)
- Expression quantitative trait loci, or eQTLs (17,115 variants)
- Lung-tissue specific or pulmonary function-associated markers (8,645 variants)
Targeted MHC and Transplant-specific modules: Specific modular content was incorporated in the array to address transplant community research goals. Aside from the above-described modules overlapping with the UK-Biobank array, we expanded modules dedicated to non-HLA MHC region markers, deep coverage of known and predicted loss-of-function (LoF) variants, and a module specific to untranslated regions (UTRs). Note that all positions and variants referenced herein are based on the human genome build hg19/build37.
MHC and KIR content for fine-mapping and imputation: The TxArray provides the most current and densest coverage of the extended MHC [Chr 6:25.5 Mb to 34 Mb, hg19/build37]. While the UK-Biobank array includes dense HLA-specific coverage, a number of MHC genes and markers mapping variants outside of the HLA-encoding regions are critical players in immune function, and some have known roles in histocompatibility (e.g. MICA, MICB). Thus, we included a comprehensive set of MHC markers in addition to the conventional HLA-coding regions.
Additionally, given the important role of KIR in allo-recognition through its interaction with HLA, we included additional KIR SNPs to enable fine-mapping, imputation, and structural variation association analysis. These SNPs also enable interaction analyses across KIR and HLA Class I, which has a known role in histocompatibility in hematopoietic cell transplantation (HCT), as well as other MHC loci.
To build this content and attempt to preserve significant overlap with state-of-the-art, popular genotyping platforms, we curated and included in our design content from the following resources and platforms:
- UK-Biobank array (8,894 total variants), including 7,348 HLA markers and 1,546 KIR markers.
- Multiethnic HLA haplotype tagging SNPs (421 SNPs).
- The Type 1 Diabetes Genetics Consortium (T1DGC) imputation panel (4,794 SNPs directly tiling, or tagging by LD, the SNPs in the HLA imputation panel for SNP2HLA).
- Non-redundant MHC validated SNPs from existing genotyping platforms used in large-scale studies: (i) Metabochip (1,123 SNPs) and (ii) Immunochip (12,609 variants).
The content above includes 10,820 non-redundant SNP markers. We maximized the coverage of this content using a non-redundant set of best-tagging variants to achieve satisfactory tagging of the major HapMap continental populations including African (ASW and YRI), European (CEU) and Asian (CHD, JPT, CHB).
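When working with marker lists from several platforms, it can be useful to select those falling inside the extended MHC interval quoted above (Chr 6:25.5 Mb to 34 Mb, hg19/build37). The sketch below is one simple way to do this. It assumes 1-based hg19 positions and inclusive interval bounds, neither of which the source specifies, and the function and marker names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Extended MHC interval as quoted in the text (hg19/build37).
XMHC_CHROM = "6"
XMHC_START = 25_500_000
XMHC_END = 34_000_000

Marker = Tuple[str, str, int]  # (marker_id, chromosome, position)


def in_extended_mhc(chrom: str, pos: int) -> bool:
    """True if the position lies in the extended MHC (bounds assumed inclusive)."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return name == XMHC_CHROM and XMHC_START <= pos <= XMHC_END


def extended_mhc_markers(markers: Iterable[Marker]) -> List[str]:
    """Return the IDs of markers inside the extended MHC, in input order."""
    return [mid for mid, chrom, pos in markers if in_extended_mhc(chrom, pos)]


# Toy markers for illustration only; IDs and positions are made up.
markers = [
    ("m1", "chr6", 31_000_000),
    ("m2", "6", 25_499_999),
    ("m3", "chr16", 31_000_000),
    ("m4", "6", 34_000_000),
]
print(extended_mhc_markers(markers))
```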
Pharmacogenomic: Drug Absorption, Metabolism, Excretion and Toxicity markers (n ~7,500 SNPs), including markers derived from PharmGKB. As these SNPs were of key relevance to this array, we also included one or more tagging SNPs to cover common variants present in the 1KGP database. A literature search was also performed (see below) for serious adverse events and pharmacogenomic studies related to immunosuppressive therapy (IST) and other therapeutics relating to transplantation. Previous candidate gene/pathway genotyping results from the DeKAF study were also included (n ~2,000 SNPs).
Candidate Genes associated with Transplant Outcomes: Over 600 transplantation-related genetic association studies were curated from the PubMed repository using approaches outlined in the Supplementary Materials. To maximize coverage in the CEU, YRI, and ASN populations, we selected an additional non-redundant set of 23.8K variants to boost coverage, for a total of 91.9K polymorphic sites included in these loci. The SNPs were chosen using an algorithm that attempts to maximize the expected mean coverage across all three populations simultaneously, rather than one at a time, by first selecting the tagging SNP that tags the most SNPs across all three populations; this strategy enables identification of minimal SNP sets for maximal cross-ethnic coverage.
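The selection strategy described above resembles a greedy set-cover procedure scored jointly over populations. The sketch below illustrates that general idea. It is not the consortium's algorithm, whose details are in the Supplementary Materials. It assumes that, for each candidate SNP and population, the set of target SNPs it tags (for example at some r² cutoff) has already been computed. All names and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# tagged_by[candidate][population] = target SNPs that candidate tags there
TagMap = Dict[str, Dict[str, Set[str]]]


def greedy_multipop_tags(tagged_by: TagMap, budget: int) -> List[str]:
    """Pick up to `budget` candidates, each time taking the one that newly
    covers the most (population, target SNP) pairs across all populations."""
    cover: Dict[str, Set[Tuple[str, str]]] = {
        cand: {(pop, snp) for pop, snps in pops.items() for snp in snps}
        for cand, pops in tagged_by.items()
    }
    uncovered: Set[Tuple[str, str]] = set().union(*cover.values()) if cover else set()
    selected: List[str] = []
    while uncovered and len(selected) < budget:
        # sorted() makes ties resolve to the alphabetically first candidate
        best = max(sorted(cover), key=lambda c: len(cover[c] & uncovered))
        gain = cover[best] & uncovered
        if not gain:
            break
        selected.append(best)
        uncovered -= gain
    return selected


# Toy data for illustration only.
tagged_by = {
    "rsA": {"CEU": {"t1", "t2"}, "YRI": {"t1"}, "ASN": {"t1", "t2"}},
    "rsB": {"CEU": {"t3"}, "YRI": {"t2", "t3"}, "ASN": set()},
    "rsC": {"CEU": {"t1", "t2", "t3"}, "YRI": set(), "ASN": set()},
}
print(greedy_multipop_tags(tagged_by, budget=2))
```

In the toy data, a CEU-only choice would favor rsC, whereas the joint score favors rsA, which tags targets in all three populations.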
Aside from the modules noted as being shared with the UK-Biobank, the following categories of variants were included in the design for this component.
Affymetrix Biobank array content: We considered a total of ~250,000 SNPs from the Axiom Biobank Genotyping Array, including 86,000 putative exonic SNVs and putative LoF variants. As not all of these have been validated and many are not polymorphic in the general populations, we used one of the largest whole-exome sequencing reference datasets available at the time of the design, comprising over 32,000 samples, to annotate and filter these variants based on the observed minor allele counts (MACs). We included only those variants with MACs greater than five observations in this database, which yielded ~168K exonic or coding variants and over 16K putative LoF variants. A total of 178,680 unique variants were selected in this module.
Human Gene Mutation Database (HGMD): We curated variants from the HGMD LoF database (up until August 1st, 2013). As above, we included only variants with MACs greater than five, for a total of 3,571 variants.
Additional LoF variants: Using the above-noted database of over 32,000 human exomes (http://exac.broadinstitute.org), we identified additional putative LoFs included in the Affymetrix Biobank Genotyping Array, UK Biobank Axiom Array or HGMD databases. Again filtering the observed SNVs and indels for MACs greater than five, we obtained a conservative set of 8,557 unique putative exonic SNVs and/or putative LoF variants.
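The three modules above share the same minor allele count (MAC) filter: a variant is kept only if its MAC in the exome reference exceeds five. A minimal sketch of such a filter follows. It assumes each variant comes with an alternate-allele count and a total allele number (as in the AC and AN fields commonly found in VCF files); the record layout and names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class VariantCount(NamedTuple):
    variant_id: str
    alt_count: int       # observations of the alternate allele
    total_alleles: int   # number of called alleles (2 x called diploid samples)


def minor_allele_count(v: VariantCount) -> int:
    """The minor allele is whichever of reference/alternate is rarer."""
    return min(v.alt_count, v.total_alleles - v.alt_count)


def filter_by_mac(variants: Iterable[VariantCount],
                  min_mac_exclusive: int = 5) -> List[VariantCount]:
    """Keep variants whose MAC is strictly greater than the cutoff."""
    return [v for v in variants if minor_allele_count(v) > min_mac_exclusive]


# Toy counts for illustration only.
variants = [
    VariantCount("var1", 5, 64_000),       # MAC 5: dropped
    VariantCount("var2", 6, 64_000),       # MAC 6: kept
    VariantCount("var3", 63_997, 64_000),  # reference allele is minor, MAC 3: dropped
    VariantCount("var4", 32_000, 64_000),  # MAC 32,000: kept
]
kept = filter_by_mac(variants)
print([v.variant_id for v in kept])
```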
Untranslated Region (UTR) Coverage: To provide maximal coverage of SNPs that may affect gene expression, we additionally focused on coverage of the 5′ and 3′ UTRs. The 5′ UTR is defined as the exonic region between the transcriptional and translational start sites, and the 3′ UTR as the exonic region between the translational and transcriptional stop sites, as annotated in either the RefSeq or Ensembl human genome (hg19) reference sequences in June 2013. Using MAF cutoffs for inclusion of > 1% in CEU and > 5% in AFR (ASW + YRI) populations, we included a total of ~184,000 SNPs, as described in the Supplementary Material.
A priori Associations: To focus on known phenotypes, we included 8,136 SNPs reported in the NHGRI GWAS Catalog (as of December 2012) that reached the conventional genome-wide significance (GWS) threshold of P < 5 × 10⁻⁸ for both quantitative traits and disease-specific phenotypes.
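As a small illustration of the threshold above, the sketch below keeps the unique SNPs with at least one reported association below 5 × 10⁻⁸. It assumes catalog entries have been reduced to (SNP ID, P value) pairs; the names and toy values are hypothetical and do not reflect the GWAS Catalog's actual file format.

```python
from typing import Iterable, List, Tuple

GWS_THRESHOLD = 5e-8  # conventional genome-wide significance cutoff


def genome_wide_significant(associations: Iterable[Tuple[str, float]],
                            threshold: float = GWS_THRESHOLD) -> List[str]:
    """Unique SNP IDs with at least one association at P < threshold."""
    return sorted({snp for snp, p_value in associations if p_value < threshold})


# Toy associations for illustration only; a SNP may appear for several traits.
associations = [
    ("rs1", 3e-9),
    ("rs2", 5e-8),    # equal to the cutoff, so not strictly below it
    ("rs1", 1e-3),
    ("rs3", 4.9e-8),
]
significant = genome_wide_significant(associations)
print(significant)
```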
Copy Number Variations (CNVs) and Polymorphisms (CNPs)
CNP tagging and Regional Coverage: To cover common genomic structural elements by SNP tagging, we included 5,410 markers, and we used an additional 21,960 variants to cover ~2,200 manually curated CNV regions, as described in the Supplementary Materials.