GWAS data

LASA filenames: LASA_Affy, LASA_Illumina

Contact: Natasja van Schoor

Background

Genotyping array data (often referred to as GWAS data) are currently used to identify single nucleotide polymorphisms (SNPs) associated with various traits in Genome-Wide Association Studies (GWAS). SNP arrays cover variants located in protein-coding regions as well as in non-coding regions of the DNA.

GWAS data can be used to perform association studies (genome-wide or candidate gene studies) and analyses developed specifically for these data, such as polygenic risk scoring and the estimation of SNP-based heritability and genetic correlations. In LASA, genotyping data are available for the first, second and third cohorts.
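In its simplest form, a polygenic risk score is a weighted sum of a participant's effect-allele dosages, with the weights (for example effect sizes) taken from an independent GWAS. The minimal Python sketch below assumes that SNP ids have already been matched and alleles aligned between the two inputs; the SNP ids, dosages and weights are hypothetical, and practical PRS methods add steps such as clumping, p-value thresholding or effect-size shrinkage that are not shown here.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    dosages: SNP id -> effect-allele dosage (0..2) for one participant
    weights: SNP id -> effect size from an external GWAS (assumed already aligned)
    """
    shared = dosages.keys() & weights.keys()  # SNPs missing from either input are skipped
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical SNP ids, dosages and weights, for illustration only.
person = {"rs0001": 2.0, "rs0002": 0.9, "rs0003": 0.0}
gwas_betas = {"rs0001": 0.10, "rs0002": -0.05, "rs0004": 0.20}
print(polygenic_score(person, gwas_betas))
```

Only `rs0001` and `rs0002` contribute here, because `rs0003` has no weight and `rs0004` has no dosage.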

Measurements in LASA


Blood collection

Blood samples were drawn from respondents participating in the LASA medical interview in the C-cycle (1995-1996), G-cycle (2008-2009), 2B-cycle (2002-2003) and 3B-cycle (2012-2013). In the first and second cohorts, DNA was isolated from buffy coats in the C and 2B cycles or from full blood samples in the G-cycle. For participants who had both full blood samples and buffy coats available, the full blood samples were used to extract DNA. In the third cohort, full blood samples drawn at baseline (3B-cycle) were used for DNA isolation. In all samples, DNA was extracted using standard procedures.

Measurement procedure & quality control (QC)

Genotyping for the first cohort was done using two arrays: the Axiom-NL Array (Affymetrix Inc, Santa Clara, CA., USA) at the Avera Institute for Human Genetics, Sioux Falls, SD., USA, and the Infinium Global Screening Array (GSA) (Illumina Inc, San Diego, CA., USA), as part of the EU GSA consortium at the Human Genomics Facility (HuGe-F), Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands. Initially, we were able to genotype 623 participants with the Axiom-NL Array, selecting participants who had data on the C-cycle and D-cycle. Later, we genotyped the remaining participants with available blood samples using the GSA.

Genotyping for the second and third cohort was done using Infinium Global Screening Array-24-v1.0 (GSA) (Illumina Inc, San Diego, CA., USA), as part of the EU GSA consortium at Human Genomics Facility (HuGe-F), Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands.

The Axiom-NL Array [1] targets approximately 610 000 SNPs. It includes SNPs commonly found on other genotyping platforms as well as specific markers selected from previous GWAS results and SNPs associated with psychiatric disorders, fertility and twinning.

The GSA-24 v1.0 targets approximately 690 000 SNPs and includes common SNPs as well as other SNPs relevant to clinical and precision medicine research.

Because of technical differences between the arrays, quality control (QC) and imputation were done separately for each array. For both arrays, QC was performed using Ricopili (Rapid Imputation for COnsortias PIpeLIne), an established tool developed by the Psychiatric Genomics Consortium [2]. During QC, samples with a sex mismatch (genetic sex does not match reported sex), duplicate samples, samples with excess heterozygosity and samples with a call rate < 0.98 were removed. SNPs with a call rate < 0.98 and SNPs with a minor allele frequency (MAF) < 0.01 were also excluded.
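The call-rate and MAF thresholds described above can be illustrated with a small Python sketch. This is not Ricopili's implementation: it covers only the call-rate and MAF filters (sex check, duplicate and heterozygosity checks are omitted), it assumes genotypes coded as 0/1/2 alternate-allele counts with `None` for a missing call, and the order of the two steps (samples first, then SNPs) is an assumption.

```python
from typing import List, Optional, Tuple

# One genotype call: 0, 1 or 2 copies of the alternate allele, or None if missing.
Genotype = Optional[int]

CALL_RATE_MIN = 0.98  # sample and SNP call-rate threshold from the text
MAF_MIN = 0.01        # minor allele frequency threshold from the text


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF computed from the non-missing calls only."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def qc_filter(matrix: List[List[Genotype]]) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is the call for sample i at SNP j.

    Returns the indices of the samples and SNPs that pass.
    """
    # Step 1 (assumed order): drop samples with call rate < 0.98.
    samples = [i for i, row in enumerate(matrix) if call_rate(row) >= CALL_RATE_MIN]
    if not samples:
        return [], []
    # Step 2: on the retained samples, drop SNPs with call rate < 0.98 or MAF < 0.01.
    snps = []
    for j in range(len(matrix[0])):
        column = [matrix[i][j] for i in samples]
        if call_rate(column) >= CALL_RATE_MIN and minor_allele_frequency(column) >= MAF_MIN:
            snps.append(j)
    return samples, snps


# Toy data: sample 3 misses one of three calls; SNP 2 is monomorphic.
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [2, 0, 0],
    [None, 1, 0],
]
print(qc_filter(toy))  # ([0, 1, 2], [0, 1])
```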

Table 1 summarizes the number of individuals and SNPs available per array.

Ancestry and relatedness

Principal components (PCs) were calculated for each array, and the data were plotted together with the 1000 Genomes dataset (see Fig. 1 and 4). Using the 1000 Genomes data as a reference, samples of non-European ancestry were identified and removed. Then, 10 PCs were calculated for each array (see Fig. 2 and 5). These PCs are available and are recommended as covariates in all analyses to adjust for population stratification. To check whether these PCs properly correct for population stratification, we ran a GWAS on height (see Fig. 3 and 6).
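The general idea of computing PCs from genotypes and using them as covariates can be sketched with numpy. This is a minimal illustration, not the procedure used for LASA: it assumes a complete samples × SNPs matrix of 0/1/2 genotypes (no missing calls), the common standardization of each SNP by its expected binomial standard deviation, and ordinary least squares for the association test. Real pipelines typically compute PCs on an LD-pruned SNP set and use dedicated software.

```python
import numpy as np


def genotype_pcs(genotypes: np.ndarray, n_pcs: int = 10) -> np.ndarray:
    """Sample scores on the top principal components of a genotype matrix.

    genotypes: samples x SNPs array of 0/1/2 alternate-allele counts, no missing values.
    """
    p = genotypes.mean(axis=0) / 2.0          # alternate-allele frequency per SNP
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs (zero variance)
    g, p = genotypes[:, keep], p[keep]
    # Centre each SNP and scale by its expected binomial standard deviation.
    standardized = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD: U * S gives the sample coordinates on each component.
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def snp_effect_adjusted(phenotype: np.ndarray, dosage: np.ndarray, pcs: np.ndarray) -> float:
    """Least-squares effect of one SNP on a phenotype, with PCs as covariates."""
    design = np.column_stack([np.ones(len(dosage)), dosage, pcs])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return float(coef[1])  # coefficient of the SNP dosage
```

For example, `pcs = genotype_pcs(G, n_pcs=10)` followed by `snp_effect_adjusted(height, G[:, j], pcs)` for each SNP `j` mirrors, in miniature, a PC-adjusted GWAS such as the height check described above (`G` and `height` are hypothetical arrays).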

The data were further checked for relatedness between participants, and a list of related individuals is available.
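Which member of a related pair to exclude depends on the phenotype being analysed. A minimal Python sketch of one way to do this, assuming the list of related individuals can be read as pairs of participant ids (the ids below are hypothetical), is a greedy pass that removes one member of each pair and prefers to keep participants who have the phenotype of interest. A greedy pass is simple but is not guaranteed to retain the largest possible unrelated set.

```python
from typing import Iterable, Set, Tuple


def select_unrelated(
    related_pairs: Iterable[Tuple[str, str]],
    participants: Iterable[str],
    has_phenotype: Set[str],
) -> Set[str]:
    """Greedily remove one member of each related pair.

    The member without phenotype data is removed when possible; if both or
    neither have it, the second member of the pair is removed.
    """
    keep = set(participants)
    for a, b in related_pairs:
        if a not in keep or b not in keep:
            continue  # pair already broken by an earlier removal
        if a not in has_phenotype and b in has_phenotype:
            keep.discard(a)
        else:
            keep.discard(b)
    return keep


# Hypothetical participant ids and related pairs, for illustration only.
pairs = [("p1", "p2"), ("p3", "p4"), ("p4", "p5")]
ids = ["p1", "p2", "p3", "p4", "p5"]
print(sorted(select_unrelated(pairs, ids, has_phenotype={"p2", "p3", "p4"})))
# ['p2', 'p3', 'p5']
```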

Imputation

Both datasets were imputed using the Haplotype Reference Consortium (HRC) reference panel version 1.1 [3].
Imputation was performed on the Michigan Imputation Server for the autosomal chromosomes (1-22).

A description of the GWAS data can be found in the LASA cohort update paper: “Hoogendijk, E.O., Deeg, D.J.H., de Breij, S. et al. The Longitudinal Aging Study Amsterdam: cohort update 2019 and additional data collections. Eur J Epidemiol 35, 61–74 (2020). https://doi.org/10.1007/s10654-019-00541-2”

Table 1: Number of samples and SNPs available per array

Array        Genotyped             QC-ed                 Europeans (excl. related samples)
Affymetrix   623 (631 028 SNPs)    620 (600 950 SNPs)    590
Illumina     1880 (686 082 SNPs)   1791 (471 977 SNPs)   1689
Total        2503                  2411                  2279


Variable information

The raw genotypes, QC-ed genotypes and imputed data files are available. Specific SNPs can be extracted upon request.

The data also include:

  • A list of participants of non-European ancestry, as indicated by principal component analysis of the genotype data using subjects from the 1000 Genomes panel as a population reference.
  • 10 ancestry-informative principal components for all participants of European ancestry.
  • A list of related participants based on the genotype data.
  • An overall information sheet on the imputed data, including the imputation quality and minor allele frequency of each imputed SNP.


Suggestions

  • Use the QC-ed data of the European ancestry samples or re-run QC and principal component analysis yourself if you want to use the raw data.
  • Select unrelated subjects based on phenotype availability.
  • No post-imputation QC has been done. Please perform it after obtaining GWAS results, in accordance with the requirements of the GWAS consortium you are involved in.
  • When requesting a selection of SNPs, always ask for the info files as well. We suggest using only high-quality imputed SNPs (R2 ≥ 0.8 and MAF > 0.01).
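The suggested imputation-quality filter (R2 ≥ 0.8 and MAF > 0.01) can be applied to an info file with a few lines of Python. This is a minimal sketch assuming a tab-separated file with a header row containing columns named `SNP`, `MAF` and `Rsq`, as in minimac-style info files; the actual LASA info sheet may use different column names or a different layout, so check it before use. The SNP ids in the example are hypothetical.

```python
import csv
import io
from typing import List, TextIO

R2_MIN = 0.8    # minimum imputation quality suggested in the text
MAF_MIN = 0.01  # MAF must be strictly greater than this


def high_quality_snps(info_file: TextIO) -> List[str]:
    """Return ids of SNPs with Rsq >= 0.8 and MAF > 0.01.

    Assumes tab-separated columns named 'SNP', 'MAF' and 'Rsq' (adjust if needed).
    """
    reader = csv.DictReader(info_file, delimiter="\t")
    return [
        row["SNP"]
        for row in reader
        if float(row["Rsq"]) >= R2_MIN and float(row["MAF"]) > MAF_MIN
    ]


example = io.StringIO(
    "SNP\tMAF\tRsq\n"
    "1:1000:A:G\t0.20\t0.95\n"   # passes both filters
    "1:2000:C:T\t0.005\t0.99\n"  # fails: MAF too low
    "1:3000:G:A\t0.30\t0.50\n"   # fails: imputation quality too low
)
print(high_quality_snps(example))  # ['1:1000:A:G']
```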


Please take these into account in your analyses!

We are open to collaborations in GWAS meta-analyses involving phenotypes related to ageing. To facilitate this, a special GWAS meta-analysis form is available.

See here for the graphical presentation of the PCA for each array.

Availability of information per wave¹

Wave         B    C    D    E    2B*  F    G    H    3B*  MB*  I    J    K*
Affymetrix   -    Me+  -    -    -    -    Me+  -    -    -    -    -
Illumina     -    Me+  -    -    Me+  -    Me+  -    Me+  -    -    -

¹ More information about the LASA data collection waves is available here.

* 2B = baseline second cohort; 3B = baseline third cohort; MB = migrants: baseline first cohort; K = under construction

Me = data collected in medical interview
+ = see Blood collection description

Previous use in LASA


References

  1. Ehli, E.A., et al., A method to customize population-specific arrays for genome-wide association testing. European Journal of Human Genetics, 2017. 25(2): p. 267-270.
  2. Lam, M., Awasthi, S., Watson, H.J., et al., RICOPILI: Rapid Imputation for COnsortias PIpeLIne. Bioinformatics, 2020. 36(3): p. 930-933.
  3. The Haplotype Reference Consortium, A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics, 2016. 48: p. 1279-1283. https://doi.org/10.1038/ng.3643


Date of last update: April 22, 2020