Multiple testing in high-throughput sequence data: Experiences from Group 8 of Genetic Analysis Workshop 17

Inke R. König, Jeremie Nsengimana, Charalampos Papachristou, Matthew A. Simonson, Kai Wang, Jason A. Weisburd

Research output: Contribution to journal, Article

2 Citations (Scopus)

Abstract

The use of high-throughput sequence data in genetic epidemiology allows the investigation of common and rare variants in the entire genome, thus increasing the amount of information and the potential number of statistical tests performed within one study. As a consequence, the problem of multiple testing may become even more pressing than in previous studies. As an important challenge, the exact number of statistical tests depends on the actual statistical method used. Furthermore, many statistical approaches for the analysis of sequence data require permutation. Thus it may be difficult to also use permutation to estimate correct type I error levels as in genome-wide association studies. In view of this, a separate group at Genetic Analysis Workshop 17 was formed with a focus on multiple testing. Here, we present the approaches used for the workshop. Apart from tackling the multiple testing problem, the new group focused on different issues. Some contributors developed and investigated modifications of existing collapsing methods. Others aimed at improving the identification of functional variants through a reduction and analysis of the underlying data dimensions. Two research groups investigated the overall accumulation of rare variation across the genome and its value in predicting phenotypes. Finally, other investigators left the path of traditional statistical analyses by reversing null and alternative hypotheses and by proposing a novel resampling method. We describe and discuss all these approaches.

Original language: English (US)
Journal: Genetic Epidemiology
Volume: 35
Issue number: SUPPL. 1
DOI: 10.1002/gepi.20651
State: Published - Dec 5 2011
Externally published: Yes

Fingerprint

Education
Genome
Molecular Epidemiology
Genome-Wide Association Study
Sequence Analysis
Research Personnel
Phenotype
Research

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Genetics (clinical)

Cite this

König, Inke R.; Nsengimana, Jeremie; Papachristou, Charalampos; Simonson, Matthew A.; Wang, Kai; Weisburd, Jason A. / Multiple testing in high-throughput sequence data: Experiences from Group 8 of Genetic Analysis Workshop 17. In: Genetic Epidemiology. 2011; Vol. 35, No. SUPPL. 1.
@article{37d937e68c9d4c0694a7f886806f27d9,
title = "Multiple testing in high-throughput sequence data: Experiences from Group 8 of Genetic Analysis Workshop 17",
abstract = "The use of high-throughput sequence data in genetic epidemiology allows the investigation of common and rare variants in the entire genome, thus increasing the amount of information and the potential number of statistical tests performed within one study. As a consequence, the problem of multiple testing may become even more pressing than in previous studies. As an important challenge, the exact number of statistical tests depends on the actual statistical method used. Furthermore, many statistical approaches for the analysis of sequence data require permutation. Thus it may be difficult to also use permutation to estimate correct type I error levels as in genome-wide association studies. In view of this, a separate group at Genetic Analysis Workshop 17 was formed with a focus on multiple testing. Here, we present the approaches used for the workshop. Apart from tackling the multiple testing problem, the new group focused on different issues. Some contributors developed and investigated modifications of existing collapsing methods. Others aimed at improving the identification of functional variants through a reduction and analysis of the underlying data dimensions. Two research groups investigated the overall accumulation of rare variation across the genome and its value in predicting phenotypes. Finally, other investigators left the path of traditional statistical analyses by reversing null and alternative hypotheses and by proposing a novel resampling method. We describe and discuss all these approaches.",
author = "K{\"o}nig, {Inke R.} and Jeremie Nsengimana and Charalampos Papachristou and Simonson, {Matthew A.} and Kai Wang and Weisburd, {Jason A.}",
year = "2011",
month = "12",
day = "5",
doi = "10.1002/gepi.20651",
language = "English (US)",
volume = "35",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "SUPPL. 1",

}

TY - JOUR

T1 - Multiple testing in high-throughput sequence data

T2 - Experiences from Group 8 of Genetic Analysis Workshop 17

AU - König, Inke R.

AU - Nsengimana, Jeremie

AU - Papachristou, Charalampos

AU - Simonson, Matthew A.

AU - Wang, Kai

AU - Weisburd, Jason A.

PY - 2011/12/5

Y1 - 2011/12/5

N2 - The use of high-throughput sequence data in genetic epidemiology allows the investigation of common and rare variants in the entire genome, thus increasing the amount of information and the potential number of statistical tests performed within one study. As a consequence, the problem of multiple testing may become even more pressing than in previous studies. As an important challenge, the exact number of statistical tests depends on the actual statistical method used. Furthermore, many statistical approaches for the analysis of sequence data require permutation. Thus it may be difficult to also use permutation to estimate correct type I error levels as in genome-wide association studies. In view of this, a separate group at Genetic Analysis Workshop 17 was formed with a focus on multiple testing. Here, we present the approaches used for the workshop. Apart from tackling the multiple testing problem, the new group focused on different issues. Some contributors developed and investigated modifications of existing collapsing methods. Others aimed at improving the identification of functional variants through a reduction and analysis of the underlying data dimensions. Two research groups investigated the overall accumulation of rare variation across the genome and its value in predicting phenotypes. Finally, other investigators left the path of traditional statistical analyses by reversing null and alternative hypotheses and by proposing a novel resampling method. We describe and discuss all these approaches.

UR - http://www.scopus.com/inward/record.url?scp=82455192519&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=82455192519&partnerID=8YFLogxK

U2 - 10.1002/gepi.20651

DO - 10.1002/gepi.20651

M3 - Article

C2 - 22128061

AN - SCOPUS:82455192519

VL - 35

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - SUPPL. 1

ER -