Defining window-boundaries for genomic analyses using smoothing spline techniques

Authors:	Timothy M Beissinger, Guilherme JM Rosa, Shawn M Kaeppler, Daniel Gianola, Natalia de Leon

Affiliation:	Department of Plant Sciences, University of California, Davis, 95616 USA; Department of Animal Sciences, University of Wisconsin, Madison, 53706 USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, 53792 USA; Department of Agronomy, University of Wisconsin, Madison, 53706 USA; Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, 53706 USA; Department of Dairy Science, University of Wisconsin, Madison, 53706 USA

Abstract:	Background: High-density genomic data are often analyzed by combining information over windows of adjacent markers. Interpreting data grouped in windows, rather than at individual locations, may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, the use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. The method first fits a cubic smoothing spline to the data and then identifies the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium and can therefore be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, no arbitrary choice of window size is necessary, because window sizes are determined empirically and allowed to vary along the genome.

Results: Simulations applying this method were performed to identify selection signatures from pooled sequencing F_ST data, for which allele frequencies were estimated from a pool of individuals. The ratio of true to false positives was twice that generated by existing techniques. A comparison of the approach with a previous study involving pooled sequencing F_ST data from maize suggested that outlying windows were more clearly separated from their neighbors than when using a standard sliding window approach.

Conclusions: We have developed a novel technique to identify window boundaries for subsequent analysis protocols. When applied to selection studies based on F_ST data, this method provides a high discovery rate and minimizes false positives. The method is implemented in the R package GenWin, which is publicly available from CRAN.
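The two steps described in the abstract (fit a cubic smoothing spline, then use its inflection points as window boundaries) can be illustrated with a minimal Python sketch. This is not the GenWin implementation: it substitutes SciPy's `UnivariateSpline` for whatever spline fitting GenWin uses, and the function names `spline_window_boundaries` and `window_means`, the `smoothing` and `grid_size` parameters, and the synthetic data are all assumptions made for illustration. Inflection points are located approximately, as sign changes of the second derivative on a dense grid.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def spline_window_boundaries(positions, values, smoothing=None, grid_size=2000):
    """Fit a cubic smoothing spline and return (spline, window edges).

    Edges are the first position, the approximate inflection points of the
    fitted spline, and the last position. Hypothetical helper, not GenWin.
    """
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(positions)
    x, y = positions[order], values[order]

    # k=3 gives a cubic spline; s is SciPy's smoothing factor (assumed tuning knob).
    spline = UnivariateSpline(x, y, k=3, s=smoothing)

    # Evaluate the second derivative on a dense grid and look for sign changes.
    grid = np.linspace(x[0], x[-1], grid_size)
    curvature = spline(grid, nu=2)
    signs = np.sign(curvature)
    idx = np.where(signs[:-1] * signs[1:] < 0)[0]

    # Linearly interpolate the zero crossing inside each grid cell.
    x0, x1 = grid[idx], grid[idx + 1]
    c0, c1 = curvature[idx], curvature[idx + 1]
    inflections = x0 - c0 * (x1 - x0) / (c1 - c0)

    edges = np.concatenate(([x[0]], inflections, [x[-1]]))
    return spline, edges


def window_means(positions, values, edges):
    """Mean of `values` within each window defined by consecutive edges."""
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    n_win = len(edges) - 1
    # Assign each marker to a window; the last edge is included in the last window.
    win = np.clip(np.searchsorted(edges, positions, side="right") - 1, 0, n_win - 1)
    counts = np.bincount(win, minlength=n_win)
    sums = np.bincount(win, weights=values, minlength=n_win)
    means = np.full(n_win, np.nan)  # empty windows stay NaN
    np.divide(sums, counts, out=means, where=counts > 0)
    return means


if __name__ == "__main__":
    # Synthetic stand-in for a per-marker statistic along a chromosome.
    rng = np.random.default_rng(0)
    pos = np.linspace(0.0, 100.0, 400)
    stat = np.sin(pos / 8.0) + rng.normal(scale=0.1, size=pos.size)
    _, edges = spline_window_boundaries(pos, stat, smoothing=pos.size * 0.1**2)
    print(len(edges) - 1, "windows")
    print(window_means(pos, stat, edges))
```

Because the boundaries come from the fitted curve rather than a fixed marker count, window widths vary along the sequence. The amount of smoothing still has to be chosen; here it is passed in by hand, which is a simplification of this sketch.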

