1.

GWAsimulator: a rapid whole-genome simulation program

Li C Li M 《Bioinformatics (Oxford, England)》2008,24(1):140-142

SUMMARY: GWAsimulator implements a rapid moving-window algorithm to simulate genotype data for case-control or population samples from genomic SNP chips. For case-control data, the program generates cases and controls according to a user-specified multi-locus disease model, and can simulate specific regions if desired. The program uses phased genotype data as input and has the flexibility of simulating genotypes for different populations and different genomic SNP chips. When the HapMap phased data are used, the simulated data have local LD patterns similar to those of the HapMap data. As genome-wide association (GWA) studies become increasingly popular and new GWA data analysis methods are being developed, we anticipate that GWAsimulator will be an important tool for evaluating performance of new GWA analysis methods. AVAILABILITY: The C++ source code, executables for Linux, Windows and MacOS, manual, example data sets and analysis program are available at http://biostat.mc.vanderbilt.edu/GWAsimulator

2.

Population Structure of Peronospora effusa in the Southwestern United States

Rebecca Lyon James Correll Chunda Feng Burt Bluhm Sandesh Shrestha Ainong Shi Kurt Lamour 《PloS one》2016,11(2)

Peronospora effusa is an obligate pathogen that causes downy mildew on spinach and is considered the most economically important disease of spinach. The objective of the current research was to assess genetic diversity of known historical races and isolates collected in 2014 from production fields in Yuma, Arizona and Salinas Valley, California. Candidate neutral single nucleotide polymorphisms (SNPs) were identified by comparing sequence data from reference isolates of known races of the pathogen collected in 2009 and 2010. Genotypes were assessed using targeted sequencing on genomic DNA extracted directly from infected plant tissue. Genotyping 26 historical and 167 contemporary samples at 46 SNP loci revealed 82 unique multi-locus genotypes. The unique genotypes clustered into five groups and the majority of isolates collected in 2014 were genetically closely related, regardless of source location. The historical samples, representing several races, showed greater genetic differentiation. Overall, the SNP data indicate much of the genotypic variation found within fields was produced during asexual development, whereas overall genetic diversity may be influenced by sexual recombination on broader geographical and temporal scales.

3.

Underestimated effect sizes in GWAS: fundamental limitations of single SNP analysis for dichotomous phenotypes

Stringer S Wray NR Kahn RS Derks EM 《PloS one》2011,6(11):e27964

Complex diseases are often highly heritable. However, for many complex traits only a small proportion of the heritability can be explained by observed genetic variants in traditional genome-wide association (GWA) studies. Moreover, for some of those traits few significant SNPs have been identified. Single SNP association methods test for association at a single SNP, ignoring the effect of other SNPs. We show using a simple multi-locus odds model of complex disease that moderate to large effect sizes of causal variants may be estimated as relatively small effect sizes in single SNP association testing. This underestimation effect is most severe for diseases influenced by numerous risk variants. We relate the underestimation effect to the concept of non-collapsibility found in the statistics literature. As described, continuous phenotypes generated with linear genetic models are not affected by this underestimation effect. Since many GWA studies apply single SNP analysis to dichotomous phenotypes, previously reported results potentially underestimate true effect sizes, thereby impeding identification of true effect SNPs. Therefore, when a multi-locus model of disease risk is assumed, a multi SNP analysis may be more appropriate.
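
The non-collapsibility effect described in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' model: it assumes a logistic (log-odds additive) disease model with one binary SNP and one independent binary second risk factor, and all parameter values and function names are hypothetical choices for illustration.

```python
import math


def expit(x):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_odds_ratio(intercept, beta_snp, beta_other, freq_other):
    """Odds ratio for the SNP when the second risk factor is ignored.

    Disease risk is averaged over the second (independent, binary) risk
    factor, which is what a single-SNP analysis implicitly does.
    """
    def risk(g):
        base = intercept + beta_snp * g
        return (1.0 - freq_other) * expit(base) + freq_other * expit(base + beta_other)

    def odds(p):
        return p / (1.0 - p)

    return odds(risk(1)) / odds(risk(0))


# Hypothetical parameters: the SNP has a conditional odds ratio of exactly 2.
conditional_or = 2.0
beta_snp = math.log(conditional_or)

with_other_factor = marginal_odds_ratio(-2.0, beta_snp, beta_other=3.0, freq_other=0.5)
without_other_factor = marginal_odds_ratio(-2.0, beta_snp, beta_other=0.0, freq_other=0.5)

print("conditional OR:", conditional_or)
print("marginal OR with a strong second risk factor:", round(with_other_factor, 3))
print("marginal OR when the second factor has no effect:", round(without_other_factor, 3))
```

Under these assumptions the marginal odds ratio falls between 1 and the conditional value even though the two risk factors are independent, so no confounding is involved.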

4.

Reference-free SNP calling: improved accuracy by preventing incorrect calls from repetitive genomic regions

Dou J Zhao X Fu X Jiao W Wang N Zhang L Hu X Wang S Bao Z 《Biology direct》2012,7(1):17-9

ABSTRACT: BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome. RESULTS: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in non-model organisms without a reference genome. The iML algorithm incorporates the mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulation and real sequencing datasets, we demonstrate that in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for reference-free genotyping in diploid genomes with high repeat contents. CONCLUSIONS: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in non-model organisms without a reference genome.

5.

SNP discovery in nonmodel organisms: strand bias and base‐substitution errors reduce conversion rates

Anders Gonçalves da Silva William Barendse James W. Kijas Wes C. Barris Sean McWilliam Rowan J. Bunch Russell McCullough Blair Harrison A. Rus Hoelzel Phillip R. England 《Molecular ecology resources》2015,15(4):723-736

Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms apply a strategy for identifying putative SNPs based on filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep‐sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact of not accounting for systematic sequencing errors when filtering identified polymorphisms during SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize ‘bycatch’, that is, polymorphisms caused by sequencing or assembly error. An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand‐bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single‐copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.

6.

Discovery of cSNPs in pig using full-length enriched cDNA libraries of the Korean native pig as a source of genetic diversity

Vijaya R. Dirisala Juhyun Kim Kwangha Park Hoon-Taek Lee Chankyu Park 《Biotechnology and Bioprocess Engineering》2007,12(4):424-432

Clones from full-length enriched cDNA libraries serve as valuable resources for functional genomic studies. We analyzed 3,210 chromatograms obtained from sequencing the 5′-ends of brainstem, liver, neocortex, and spleen clones derived from full-length enriched cDNA libraries of Korean native pigs. In addition, 50,000 pig EST sequence trace files were obtained from GenBank and combined with our sequencing information for SNP identification in silico. For the SNP analysis, neocortex and liver libraries were newly constructed, whereas the sequencing results from brainstem and spleen libraries were from previously constructed libraries. The putative SNPs from the in silico analysis were confirmed by genomic PCR from a group of 20 pigs of four different breeds. Using this approach, 86% of cSNPs identified in silico were confirmed and the SNP detection frequency was 1 SNP per 338 bp. Interestingly, we found a valine deletion at amino acid position 126 of the neuronal and endocrine protein gene in the Korean native pig. We confirmed that this deletion was caused by alternative splicing at the NAGNAG acceptors. Our study shows that large-scale EST sequencing of Korean native pigs can be effectively employed for natural polymorphism-based pig genome analysis.

7.

Extensive linkage disequilibrium and parallel adaptive divergence across threespine stickleback genomes

Hohenlohe PA Bassham S Currey M Cresko WA 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2012,367(1587):395-408

Population genomic studies are beginning to provide a more comprehensive view of dynamic genome-scale processes in evolution. Patterns of genomic architecture, such as genomic islands of increased divergence, may be important for adaptive population differentiation and speciation. We used next-generation sequencing data to examine the patterns of local and long-distance linkage disequilibrium (LD) across oceanic and freshwater populations of threespine stickleback, a useful model for studies of evolution and speciation. We looked for associations between LD and signatures of divergent selection, and assessed the role of recombination rate variation in generating LD patterns. As predicted under the traditional biogeographic model of unidirectional gene flow from ancestral oceanic to derived freshwater stickleback populations, we found extensive local and long-distance LD in fresh water. Surprisingly, oceanic populations showed similar patterns of elevated LD, notably between large genomic regions previously implicated in adaptation to fresh water. These results support an alternative biogeographic model for the stickleback radiation, one of a metapopulation with appreciable bi-directional gene flow combined with strong divergent selection between oceanic and freshwater populations. As predicted by theory, these processes can maintain LD within and among genomic islands of divergence. These findings suggest that the genomic architecture in oceanic stickleback populations may provide a mechanism for the rapid re-assembly and evolution of multi-locus genotypes in newly colonized freshwater habitats, and may help explain genetic mapping of parallel phenotypic variation to similar loci across independent freshwater populations.

8.

Multiple target loci assembly sequencing (mTAS)

Han H Yoon JK Cho BC Kim H Bang D 《Analytical biochemistry》2011,415(2):218-220

Here we present multiple target loci assembly sequencing (mTAS), a method for examining multiple genomic loci in a single DNA sequencing read. The key to the success of mTAS target sequencing is the uniform amplification of multiple target genomic loci into a single DNA fragment using polymerase cycling assembly (PCA). Using this strategy, we successfully collected multiloci sequence information from a single DNA sequencing run. We applied mTAS to examine 29 different sets of human genomic loci, each containing from 2 to 11 single-nucleotide polymorphisms (SNPs) present in different exons. We believe mTAS can be used to reduce the cost of Sanger sequencing-based genetic analysis.

9.

Evaluation of multi-locus models for genome-wide association studies: a case study in sugar beet

T Würschum T Kraft 《Heredity》2015,114(3):281-290

Association mapping has become a widely applied genomic approach to dissect the genetic architecture of complex traits. A major issue for association mapping is the need to control for the confounding effects of population structure, which is commonly done by mixed models incorporating kinship information. In this case study, we employed experimental data from a large sugar beet population to evaluate multi-locus models for association mapping. As in linkage mapping, markers are selected as cofactors to control for population structure and genetic background variation. We compared different biometric models with regard to important quantitative trait locus (QTL) mapping parameters like the false-positive rate, the QTL detection power and the predictive power for the proportion of explained genotypic variance. Employing different approaches we show that the multi-locus model, that is, incorporating cofactors, outperforms the other models, including the mixed model used as a reference model. Thus, multi-locus models are an attractive alternative for association mapping to efficiently detect QTL for knowledge-based breeding.

10.

Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster

Ober U Ayroles JF Stone EA Richards S Zhu D Gibbs RA Stricker C Gianola D Schlather M Mackay TF Simianer H 《PLoS genetics》2012,8(5):e1002685

Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ~2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
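
As a rough illustration of how a genomic relationship matrix can be built from SNP data, here is a minimal sketch. The abstract does not say which formulation the authors used; this example assumes the commonly cited allele-frequency-centred form G = ZZ'/(2 Σ p(1−p)), and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from allele counts.

    genotypes: one list per individual, each holding allele counts
    (0, 1 or 2) at every SNP. Assumes at least one polymorphic SNP,
    otherwise the scaling term would be zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Allele frequency at each SNP, estimated from the sample itself.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]

    # Scaling term: total expected genotype variance across SNPs.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)

    # Centre each genotype by twice the allele frequency.
    centered = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]

    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Toy data: four inbred lines (homozygous, so counts are 0 or 2) at three SNPs.
toy_genotypes = [
    [0, 2, 2],
    [2, 2, 0],
    [0, 0, 2],
    [2, 0, 0],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print([round(value, 3) for value in row])
```

In a GBLUP analysis a matrix of this kind would serve as the covariance structure of the genetic values; fitting the mixed model itself is outside the scope of this sketch.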

11.

CropSNPdb: a database of SNP array data for Brassica crops and hexaploid bread wheat

Armin Scheben Brent Verpaalen Cynthia T. Lawley Chon‐Kit K. Chan Philipp E. Bayer Jacqueline Batley David Edwards 《The Plant journal : for cell and molecular biology》2019,98(1):142-152

Advances in sequencing technology have led to a rapid rise in the genomic data available for plants, driving new insights into the evolution, domestication and improvement of crops. Single nucleotide polymorphisms (SNPs) are a major component of crop genomic diversity, and are invaluable as genetic markers in research and breeding programs. High‐throughput SNP arrays, or ‘SNP chips’, can generate reproducible sets of informative SNP markers and have been broadly adopted. Although there are many public repositories for sequencing data, which are routinely uploaded, there are no formal repositories for crop SNP array data. To make SNP array data more easily accessible, we have developed CropSNPdb (http://snpdb.appliedbioinformatics.com.au), a database for SNP array data produced by the Illumina Infinium hexaploid bread wheat (Triticum aestivum) 90K and Brassica 60K arrays. We currently host SNPs from datasets covering 526 Brassica lines and 309 bread wheat lines, and provide search, download and upload utilities for users. CropSNPdb provides a useful repository for these data, which can be applied for a range of genomics and molecular crop‐breeding activities.

12.

Double‐digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq‐ion) with nonmodel organisms

Hans Recknagel Arne Jacobs Pawel Herzyk Kathryn R. Elmer 《Molecular ecology resources》2015,15(6):1316-1329

Research in evolutionary biology involving nonmodel organisms is rapidly shifting from using traditional molecular markers such as mtDNA and microsatellites to higher throughput SNP genotyping methodologies to address questions in population genetics, phylogenetics and genetic mapping. Restriction site associated DNA sequencing (RAD sequencing or RADseq) has become an established method for SNP genotyping on Illumina sequencing platforms. Here, we developed a protocol and adapters for double‐digest RAD sequencing for Ion Torrent (Life Technologies; Ion Proton, Ion PGM) semiconductor sequencing. We sequenced thirteen genomic libraries of three different nonmodel vertebrate species on Ion Proton with PI chips: Arctic charr Salvelinus alpinus, European whitefish Coregonus lavaretus and common lizard Zootoca vivipara. This resulted in ~962 million single‐end reads overall and a mean of ~74 million reads per library. We filtered the genomic data using Stacks, a bioinformatic tool to process RAD sequencing data. On average, we obtained ~11 000 polymorphic loci per library of 6–30 individuals. We validate our new method by technical and biological replication, by reconstructing phylogenetic relationships, and using a hybrid genetic cross to track genomic variants. Finally, we discuss the differences between using the different sequencing platforms in the context of RAD sequencing, assessing possible advantages and disadvantages. We show that our protocol can be used for Ion semiconductor sequencing platforms for the rapid and cost‐effective generation of variable and reproducible genetic markers.

13.

SNPs by AFLP (SBA): a rapid SNP isolation strategy for non-model organisms

Nicod JC Largiadèr CR 《Nucleic acids research》2003,31(5):e19

Despite the great potential of single nucleotide polymorphism (SNP) markers in evolutionary studies, in particular for inferring population genetic parameters, SNP analysis has almost exclusively been limited to humans and ‘genomic model’ organisms, due to the lack of available sequence data in non-model organisms. Here, we describe a rapid and cost-effective method to isolate candidate SNPs in non-model organisms. This SNP isolation strategy consists essentially of the direct sequencing of amplified fragment length polymorphism bands. In a first application of this method, 10 unique DNA fragments that contained 24 SNPs were discovered in 11.11 kb of sequenced genomic DNA of a non-model species, the brown trout (Salmo trutta).

14.

Next-generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout

Hohenlohe PA Amish SJ Catchen JM Allendorf FW Luikart G 《Molecular ecology resources》2011,11(Z1):117-122

The increased numbers of genetic markers produced by genomic techniques have the potential to both identify hybrid individuals and localize chromosomal regions responding to selection and contributing to introgression. We used restriction-site-associated DNA sequencing to identify a dense set of candidate SNP loci with fixed allelic differences between introduced rainbow trout (Oncorhynchus mykiss) and native westslope cutthroat trout (Oncorhynchus clarkii lewisi). We distinguished candidate SNPs from homeologs (paralogs resulting from whole-genome duplication) by detecting excessively high observed heterozygosity and deviations from Hardy-Weinberg proportions. We identified 2923 candidate species-specific SNPs from a single Illumina sequencing lane containing 24 barcode-labelled individuals. Published sequence data and ongoing genome sequencing of rainbow trout will allow physical mapping of SNP loci for genome-wide scans and will also provide flanking sequence for design of qPCR-based TaqMan assays for high-throughput, low-cost hybrid identification using a subset of 50-100 loci. This study demonstrates that it is now feasible to identify thousands of informative SNPs in nonmodel species quickly and at reasonable cost, even if no prior genomic information is available.
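
The homeolog screen mentioned in this abstract relies on excess heterozygosity and departures from Hardy-Weinberg proportions. The sketch below shows one generic way such a check could be computed, using a one-degree-of-freedom chi-square goodness-of-fit test. It is an assumption for illustration, not the authors' pipeline, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    Returns (chi-square statistic, p-value, heterozygosity excess), where
    the excess is observed minus expected heterozygote frequency.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p

    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    het_excess = n_ab / n - 2 * p * q
    return chi2, p_value, het_excess


# Hypothetical locus where all 24 individuals appear heterozygous, the kind
# of pattern expected when reads from two duplicated copies are merged.
chi2, p_value, het_excess = hwe_chi_square(0, 24, 0)
print("chi-square:", chi2, "p-value:", p_value, "heterozygosity excess:", het_excess)
```

With small samples an exact test is often preferred over the chi-square approximation; the approximation is used here only to keep the sketch short.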

15.

The construction of a haplotype reference panel using extremely low coverage whole genome sequences and its application in genome-wide association studies and genomic prediction in Duroc pigs

《Genomics》2022,114(1):340-350

Extremely low coverage whole genome sequencing (lcWGS) is an economical technique to obtain high-density single nucleotide polymorphisms (SNPs). Here, we explored the feasibility of constructing a haplotype reference panel (lcHRP) using lcWGS and evaluated the effects of lcHRP through a genome-wide association study (GWAS) and genomic prediction in pigs. A total of 297 and 974 Duroc pigs were genotyped using lcWGS and a 50 K SNP array, respectively. We obtained 19,306,498 SNPs using lcWGS with an accuracy of 0.984. With the help of lcHRP, the accuracy of imputation from the SNP array to lcWGS was 0.922. Compared to the SNP array findings, those from the imputation-based GWAS identified more signals across four traits. With the integration of the top 1% imputation-based GWAS findings as genomic features, the accuracies of genomic prediction were improved by 6.0% to 13.2%. This study showed the great potential of lcWGS in pigs' molecular breeding.

16.

SNP Discovery with EST and NextGen Sequencing in Switchgrass (Panicum virgatum L.)

Elhan S. Ersoz Mark H. Wright Jasmyn L. Pangilinan Moira J. Sheehan Christian Tobias Michael D. Casler Edward S. Buckler Denise E. Costich 《PloS one》2012,7(9)

Although yield trials for switchgrass (Panicum virgatum L.), a potentially high value biofuel feedstock crop, are currently underway throughout North America, the genetic tools for crop improvement in this species are still in the early stages of development. Identification of high-density molecular markers, such as single nucleotide polymorphisms (SNPs), that are amenable to high-throughput genotyping approaches, is the first step in a quantitative genetics study of this model biofuel crop species. We generated and sequenced expressed sequence tag (EST) libraries from thirteen diverse switchgrass cultivars representing both upland and lowland ecotypes, as well as tetraploid and octoploid genomes. We followed this with reduced genomic library preparation and massively parallel sequencing of the same samples using the Illumina Genome Analyzer technology platform. EST libraries were used to generate unigene clusters and establish a gene-space reference sequence, thus providing a framework for assembly of the short sequence reads. SNPs were identified utilizing these scaffolds. We used a custom software program for alignment and SNP detection and identified over 149,000 SNPs across the 13 short-read sequencing libraries (SRSLs). Approximately 25,000 additional SNPs were identified from the entire EST collection available for the species. This sequencing effort generated data that are suitable for marker development and for estimation of population genetic parameters, such as nucleotide diversity and linkage disequilibrium. Based on these data, we assessed the feasibility of genome wide association mapping and genomic selection applications in switchgrass. Overall, the SNP markers discovered in this study will help facilitate quantitative genetics experiments and greatly enhance breeding efforts that target improvement of key biofuel traits and development of new switchgrass cultivars.

17.

A simple route to single-nucleotide polymorphisms in a nonmodel species: identification and characterization of SNPs in the Artic ringed seal (Pusa hispida hispida)

Olsen MT Volny VH Bérubé M Dietz R Lydersen C Kovacs KM Dodd RS Palsbøll PJ 《Molecular ecology resources》2011,11(Z1):9-19

Although single-nucleotide polymorphisms (SNPs) have become the marker of choice in the field of human genetics, these markers are only slowly emerging in ecological, evolutionary and conservation genetic analyses of nonmodel species. This is partly because of difficulties associated with the discovery and characterization of SNP markers. Herein, we adopted a simple straightforward approach to identifying SNPs, based on screening of a random genomic library. In total, we identified 768 SNPs in the ringed seal, Pusa hispida hispida, in samples from Greenland and Svalbard. Using three seal samples, SNPs were discovered at a rate of one SNP per 402 bp, whereas re-sequencing of 96 seals increased the density to one SNP per 29 bp. Although applicable to any species of interest, the approach is especially well suited for SNP discovery in nonmodel organisms and is easily implemented in any standard genetics laboratory, circumventing the need for prior genomic data and use of next-generation sequencing facilities.

18.

Noninvasive genome sampling in chimpanzees

Kohn MH 《Molecular ecology》2010,19(24):5328-5331

The inevitable has happened: genomic technologies have been added to our noninvasive genetic sampling repertoire. In this issue of Molecular Ecology, Perry et al. (2010) demonstrate how DNA extraction from chimpanzee faeces, followed by a series of steps to enrich for target loci, can be coupled with next-generation sequencing. These authors collected sequence and single-nucleotide polymorphism (SNP) data at more than 600 genomic loci (chromosome 21 and the X) and the complete mitochondrial DNA. By design, each locus was 'deep sequenced' to enable SNP identification. To demonstrate the reliability of their data, the work included samples from six captive chimps, which allowed for a comparison between presumably genuine SNPs obtained from blood and potentially flawed SNPs deduced from faeces. Thus, with this method, anyone with the resources, skills and ambition to do genome sequencing of wild, elusive, or protected mammals can enjoy all of the benefits of noninvasive sampling.

19.

A resource of genome‐wide single‐nucleotide polymorphisms generated by RAD tag sequencing in the critically endangered European eel

J. M. Pujolar M. W. Jacobsen J. Frydenberg T. D. Als P. F. Larsen G. E. Maes L. Zane J. B. Jian L. Cheng M. M. Hansen 《Molecular ecology resources》2013,13(4):706-714

Reduced representation genome sequencing such as restriction‐site‐associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single‐nucleotide polymorphisms (SNPs) in model and nonmodel species. Using the RAD sequencing approach, we generated a unique resource of novel SNP markers for the European eel that were simultaneously identified and scored in a genome‐wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome‐wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long‐term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
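
The effective population size range quoted in this abstract is consistent with the standard neutral expectation that relates nucleotide diversity to mutation rate in a diploid population. The mutation rates used below (10^-8 and 10^-9 per site per generation) are an assumption inferred from the reported range; the abstract does not state them.

```latex
\[
\pi \approx 4 N_e \mu
\quad\Longrightarrow\quad
N_e \approx \frac{\pi}{4\mu}
\]
\[
\mu = 10^{-8}:\; N_e \approx \frac{0.00529}{4 \times 10^{-8}} \approx 1.32 \times 10^{5},
\qquad
\mu = 10^{-9}:\; N_e \approx \frac{0.00529}{4 \times 10^{-9}} \approx 1.32 \times 10^{6}
\]
```

Under that assumption the tenfold uncertainty in the mutation rate translates directly into the tenfold width of the reported range.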

20.

Analysis of genome-wide association study data using the protein knowledge base

Sara Ballouz Jason Y Liu Martin Oti Bruno Gaeta Diane Fatkin Melanie Bahlo Merridee A Wouters 《BMC genetics》2011,12(1):1-20

Background

Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease to identify causal candidates. A major benefit of multi-locus comparison is that it compensates for some shortcomings of current statistical analyses that test the frequency of each SNP in isolation for the phenotype population versus control.

Results

Here we developed and benchmarked several protocols for GWAS data analysis using different in-silico gene prediction and prioritisation methodologies. We adopted a high sensitivity approach to the data, using less conservative statistical SNP associations. Multiple gene search spaces, either of fixed-widths or proximity-based, were generated around each SNP marker. We used the candidate disease gene prediction system Gentrepid to identify candidates based on shared biomolecular pathways or domain-based protein homology. Predictions were made either with phenotype-specific known disease genes as input; or without a priori knowledge, by exhaustive comparison of genes in distinct loci. Because Gentrepid uses biomolecular data to find interactions and common features between genes in distinct loci of the search spaces, it takes advantage of the multi-locus aspect of the data.

Conclusions

Results suggest testing multiple SNP-to-gene search spaces compensates for differences in phenotypes, populations and SNP platforms. Surprisingly, domain-based homology information was more informative when benchmarked against gene candidates reported by GWA studies compared to previously determined disease genes, possibly suggesting a larger contribution of gene homologs to complex diseases than Mendelian diseases.