Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach.
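The pooling step that the abstract attributes to meta-analysis can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance-weighted combination of per-study effect estimates for a single SNP, which is one common choice and not necessarily the method the review recommends. The function name `inverse_variance_meta` and the example numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios).
    ses:   per-study standard errors, same order as betas.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equally long")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical example: two studies with the same estimate and precision.
beta, se, z, p = inverse_variance_meta([0.10, 0.10], [0.05, 0.05])
print(beta, se, z, p)
```

Only summary statistics enter the calculation, which is why this kind of pooling remains possible when the original genotype data are unavailable.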

2.
The objective of this review paper is to describe the development and application of a suite of more than 40 computerized dairy farm decision support tools hosted on the University of Wisconsin-Madison (UW) Dairy Management website http://DairyMGT.info. These data-driven decision support tools aim to help dairy farmers improve their decision-making, environmental stewardship and economic performance. Dairy farm systems are highly dynamic: changing market conditions and prices, evolving policies and environmental restrictions, together with increasingly variable climate conditions, determine performance. Dairy farm systems are also highly integrated, with heavily interrelated components such as the dairy herd, soils, crops, weather and management. Under these premises, it is critical to evaluate a dairy farm following a dynamic integrated system approach. For this approach, it is crucial to use meaningful data records, which are increasingly available. These data records should be used within decision support tools for optimal decision-making and economic performance. Decision support tools on the UW-Dairy Management website (http://DairyMGT.info) have been developed using a combination and adaptation of multiple methods together with empirical techniques, always with the primary goal for these tools to be: (1) highly user-friendly, (2) built on the latest software and computer technologies, (3) farm and user specific, (4) grounded on the best scientific information available, (5) relevant throughout time and (6) able to provide fast, concrete and simple answers to complex farmers’ questions. DairyMGT.info is a translational innovative research website in various areas of dairy farm management that include nutrition, reproduction, calf and heifer management, replacement, price risk and environment. This paper discusses the development and application of 20 selected decision support tools from http://DairyMGT.info.

3.
A critical step in any SAGE, MPSS and SBS data analysis is tag-to-gene assignment. Currently available tools are limited by a tag-by-tag annotation process and/or do not provide the dataset that is used to produce a complete tag-to-gene mapping. We developed ACTG, a web-based application that allows large-scale tag-to-gene mapping using several reference datasets. ACTG can annotate SAGE (14 or 21 bp), MPSS (17 or 20 bp) and SBS (16 bp) data for both human and mouse. AVAILABILITY: http://retina.med.harvard.edu/ACTG/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. As a result of current high throughput whole-genome single-nucleotide polymorphism (SNP) arrays, we now have datasets that simultaneously have integer copy numbers in CNV regions as well as SNP genotypes. At the same time, haplotypes, which have been shown to offer advantages over genotypes in identifying disease traits, are available for SNP genotypes but largely unavailable for CNV/SNP data due to insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme ‘Tree-Based Deterministic Sampling CNV’ (TDSCNV). We compare our method with polyHap(v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. We have found that both algorithms show similar accuracy but TDSCNV is an order of magnitude faster while scaling linearly with the number of markers and number of individuals and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package which is available for download at http://www.ee.columbia.edu/~anastas/tdscnv.

5.
A user-friendly graphical data analysis application for stability analysis of genotype × environment interactions, using Tai's stability model and additive main effects and multiplicative interaction (AMMI) biplots, is presented here. This practical approach integrates statistical and graphical analysis tools available in SAS systems and provides user-friendly applications to perform complete stability analyses without writing SAS program statements or using pull-down menu interfaces, by running the SAS macros in the background. By using this macro approach, agronomists and plant breeders can effectively perform stability analysis and spend more time on data exploration and on interpretation of graphs and output, rather than on debugging their program errors. The necessary MACRO-CALL files can be downloaded from the author's home page at http://www.ag.unr.edu/gf. The nature and the distinctive features of the graphics produced by these applications are illustrated by using published data.

6.
Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster‐based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single‐ and multilocus models that can efficiently conduct the association tests on such high‐dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.
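The idea of "clusters of loci connected by high LD" can be sketched as connected components of a graph whose edges join SNP pairs with high squared correlation. The sketch below assumes genotypes coded 0/1/2, squared Pearson correlation as the LD measure, and a hypothetical threshold of 0.8; it covers only the clustering step, not the windowing or the principal component summary, and it is not the authors' implementation. The names `pearson_r` and `ld_clusters` are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy / math.sqrt(sxx * syy)


def ld_clusters(genotypes, r2_threshold=0.8):
    """Group SNPs into connected components of the graph r^2 >= threshold.

    genotypes: list of SNPs, each a list of 0/1/2 codes per individual.
    Returns a sorted list of clusters, each a sorted list of SNP indices.
    """
    m = len(genotypes)
    parent = list(range(m))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(m):
        for j in range(i + 1, m):
            if pearson_r(genotypes[i], genotypes[j]) ** 2 >= r2_threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(m):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy data: SNPs 0-2 are in perfect LD, SNP 3 is independent.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 1, 0, 2, 1, 0],
    [0, 0, 1, 2, 2, 1],
]
print(ld_clusters(snps))
```

The pairwise loop is quadratic in the number of SNPs, which is one reason for restricting the comparison to windows before clustering, as the abstract describes.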

7.
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a flexible and adaptive test for gene sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn’s disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available.

8.
MOTIVATION: An important contribution to the Gene Ontology (GO) project is to develop tools that facilitate the creation, maintenance and use of ontologies. Several tools have been created for communicating and using the GO project. However, a limitation of most of these tools is that they lack a comprehensive search facility. We developed a web application, GOfetcher, with a very comprehensive search facility for the GO project and a variety of output formats for the results. GOfetcher has three different levels for searching the GO: 'Quick Search', 'Advanced Search' and 'Upload Files'. The application includes a unique search option which generates gene information given a nucleotide or protein accession number, which can then be used in generating GO information. The output data in GOfetcher can be saved in several different formats, including spreadsheet, comma-separated values and the extensible markup language (XML) format. The database is available at http://mcbc.usm.edu/gofetcher/.

9.
Esophageal squamous-cell carcinoma (ESCC) is one of the most lethal malignancies in the world and occurs at particularly high frequency in China. While several genome-wide association studies (GWAS) of germline variants and whole-genome or whole-exome sequencing studies of somatic mutations in ESCC have been published, there is no comprehensive database publicly available for this cancer. Here, we developed the Chinese Cancer Genomic Database-Esophageal Squamous Cell Carcinoma (CCGD-ESCC), which contains the associations of 69,593 single nucleotide polymorphisms (SNPs) with ESCC risk in 2022 cases and 2039 controls, survival time of 1006 ESCC patients (survival GWAS) and gene expression (expression quantitative trait loci, eQTL) in 94 ESCC patients. Moreover, this database also provides the associations between 8833 somatic mutations and survival time in 675 ESCC patients. Our user-friendly database is a resource useful for biologists and oncologists not only in identifying the associations of genetic variants or somatic mutations with the development and progression of ESCC but also in studying the underlying mechanisms for tumorigenesis of the cancer. CCGD-ESCC is freely accessible at http://db.cbi.pku.edu.cn/ccgd/ESCCdb.

10.
SUMMARY: SpA is a web-accessible system for the management, visualization and statistical analysis of T-cell receptor spectratype data. Users upload data from their spectratype analyzers to SpA, which saves the raw data and user-defined supplementary covariates to a secure database. The statistical engine performs several data analyses and statistical summaries. The visualization engine displays spectratype histograms in a Java applet and in an image file suitable for download. All of these results are also saved to the database and remain accessible to the user. Additional statistical tools specific to the analysis of multiple spectratypes are also available through the SpA interface. AVAILABILITY: The service is freely accessible via the web at http://www.duke.edu/~kepler/spa.html. Additional technical support and specialized statistical analysis and consultation are available by arrangement with the authors and, depending on the service requested, may be subject to a fee.

11.
12.
MOTIVATION: Cluster analysis is one of the most important data mining tools for investigating high-throughput biological data. The existence of many scattered objects that should not be clustered has been found to hinder performance of most traditional clustering algorithms in such a high-dimensional complex situation. Very often, additional prior knowledge from databases or previous experiments is also available in the analysis. Excluding scattered objects and incorporating existing prior information are desirable to enhance the clustering performance. RESULTS: In this article, a class of loss functions is proposed for cluster analysis and applied to high-throughput genomic and proteomic data. Two major extensions of K-means are involved: penalization and weighting. The additive penalty term allows a set of scattered objects to remain unclustered. Weights are introduced to account for prior information on preferred or prohibited cluster patterns to be identified. Their relationship with the classification likelihood of Gaussian mixture models is explored. Incorporation of good prior information is also shown to improve the global optimization issue in clustering. Applications of the proposed method to simulated data as well as high-throughput data sets from tandem mass spectrometry (MS/MS) and microarray experiments are presented. Our results demonstrate its superior performance over most existing methods and its computational simplicity and extensibility in the application to large complex biological data sets. AVAILABILITY: http://www.pitt.edu/~ctseng/research/software.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
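The penalization idea in this abstract can be sketched as a K-means variant whose loss adds a fixed cost for every object left unclustered. The sketch assumes the loss is the sum of squared distances for clustered points plus `penalty` times the number of scattered points, so a point is left out whenever its nearest centroid is farther than `penalty` in squared distance. The weighting extension is omitted, and the exact loss in the paper may differ; the names `penalized_kmeans`, `assign` and `penalty` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids, penalty):
    """Label each point with its nearest centroid, or -1 if too far away."""
    labels = []
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        best = min(range(len(centroids)), key=dists.__getitem__)
        # Paying the penalty is cheaper than clustering a distant point.
        labels.append(best if dists[best] < penalty else -1)
    return labels


def penalized_kmeans(points, init_centroids, penalty, n_iter=20):
    """Alternate assignment and centroid updates, skipping scattered points."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        labels = assign(points, centroids, penalty)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so labels match the returned centroids.
    return assign(points, centroids, penalty), centroids


# Hypothetical toy data: two tight groups and one far-away scattered point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, -50)]
labels, centroids = penalized_kmeans(points, [(0, 0), (10, 10)], penalty=25.0)
print(labels)
```

Because scattered points never enter a centroid update, a single distant outlier cannot drag a cluster centre away from its members, which is the behaviour the abstract motivates.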

13.
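Eggplant is an important horticultural crop and one of the most widely cultivated vegetables in the Solanaceae. Fruit-related agronomic traits of eggplant are complex quantitative traits, and conventional breeding for them is inefficient and slow. The rapid development of high-throughput sequencing and bioinformatics has given genome-wide association studies (GWAS) great promise for dissecting the genetic basis of complex fruit-related agronomic traits in eggplant. This paper reviews progress in GWAS of fruit-related agronomic traits in eggplant, such as fruit shape and fruit color. Addressing the "missing heritability" problem that is common in genetic studies of quantitative traits in eggplant, it proposes strategies for the future development of eggplant GWAS, starting from current application hotspots of four GWAS strategies in research on fruit-related agronomic traits. In light of the practical needs of eggplant genetic improvement, it also looks ahead to the broad prospects of GWAS strategies in eggplant molecular breeding. This paper provides a theoretical basis and reference for using GWAS to dissect the genetic basis of fruit-related traits in eggplant and for breeding fruit materials that meet consumer demand.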

14.
Maria Masotti  Bin Guo  Baolin Wu 《Biometrics》2019,75(4):1076-1085
Genetic variants associated with disease outcomes can be used to develop personalized treatment. To reach this precision medicine goal, hundreds of large‐scale genome‐wide association studies (GWAS) have been conducted in the past decade to search for promising genetic variants associated with various traits. They have successfully identified tens of thousands of disease‐related variants. However, in total these identified variants explain only part of the variation for most complex traits. There remain many genetic variants with small effect sizes to be discovered, which calls for the development of (a) GWAS with more samples and more comprehensively genotyped variants, for example, the NHLBI Trans‐Omics for Precision Medicine (TOPMed) Program is planning to conduct whole genome sequencing on over 100 000 individuals; and (b) novel and more powerful statistical analysis methods. The current dominating GWAS analysis approach is the “single trait” association test, despite the fact that many GWAS are conducted in deeply phenotyped cohorts including many correlated and well‐characterized outcomes, which can help improve the power to detect novel variants if properly analyzed, as suggested by increasing evidence that pleiotropy, where a genetic variant affects multiple traits, is the norm in genome‐phenome associations. We aim to develop pleiotropy informed powerful association test methods across multiple traits for GWAS. Since it is generally very hard to access individual‐level GWAS phenotype and genotype data for those existing GWAS, due to privacy concerns and various logistical considerations, we develop rigorous statistical methods for pleiotropy informed adaptive multitrait association test methods that need only summary association statistics publicly available from most GWAS. We first develop a pleiotropy test, which has powerful performance for truly pleiotropic variants but is sensitive to the pleiotropy assumption. We then develop a pleiotropy informed adaptive test that has robust and powerful performance under various genetic models. We develop accurate and efficient numerical algorithms to compute the analytical P‐value for the proposed adaptive test without the need of resampling or permutation. We illustrate the performance of proposed methods through application to joint association test of GWAS meta‐analysis summary data for several glycemic traits. Our proposed adaptive test identified several novel loci missed by individual trait based GWAS meta‐analysis. All the proposed methods are implemented in a publicly available R package.

15.
SUMMARY: Microarray data management and processing (MAD) is a set of Windows integrated software for microarray analysis. It consists of a relational database for data storage with many user interfaces for data manipulation, several text file parsers and Microsoft Excel macros for automation of data processing, and a generator to produce text files that are ready for cluster analysis. AVAILABILITY: The executable is available free of charge at http://pompous.swmed.edu. The source code is also available upon request.

16.
MOTIVATION: RNA H-type pseudoknots are ubiquitous pseudoknots that are found in almost all classes of RNA and thought to play very important roles in a variety of biological processes. Detection of these RNA H-type pseudoknots can improve our understanding of RNA structures and their associated functions. However, the currently existing programs for detecting such RNA H-type pseudoknots are still time consuming and sometimes even ineffective. Therefore, efficient and effective tools for detecting the RNA H-type pseudoknots are needed. RESULTS: In this paper, we have adopted a heuristic approach to develop a novel tool, called HPknotter, for efficiently and accurately detecting H-type pseudoknots in an RNA sequence. In addition, we have demonstrated the applicability and effectiveness of HPknotter by testing on some sequences with known H-type pseudoknots. Our approach can be easily extended and applied to other classes of more general pseudoknots. AVAILABILITY: The web server of our HPknotter is available for online analysis at http://bioalgorithm.life.nctu.edu.tw/HPKNOTTER/ CONTACT: cllu@mail.nctu.edu.tw, chiu@cc.nctu.edu.tw

17.
MolIDE: a homology modeling framework you can click with   Cited by: 1 (self-citations: 0; citations by others: 1)
SUMMARY: Molecular Integrated Development Environment (MolIDE) is an integrated application designed to provide homology modeling tools and protocols under a uniform, user-friendly graphical interface. Its main purpose is to combine the most frequent modeling steps in a semi-automatic, interactive way, guiding the user from the target protein sequence to the final three-dimensional protein structure. The typical basic homology modeling process is composed of building sequence profiles of the target sequence family, secondary structure prediction, sequence alignment with PDB structures, assisted alignment editing, side-chain prediction and loop building. All of these steps are available through a graphical user interface. MolIDE's user-friendly and streamlined interactive modeling protocol allows the user to focus on the important modeling questions, hiding from the user the raw data generation and conversion steps. MolIDE was designed from the ground up as an open-source, cross-platform, extensible framework. This allows developers to integrate additional third-party programs into MolIDE. AVAILABILITY: http://dunbrack.fccc.edu/molide/molide.php CONTACT: rl_dunbrack@fccc.edu.

18.
MOTIVATION: An important application of protein microarray data analysis is identifying a serodiagnostic antigen set that can reliably detect patterns and classify antigen expression profiles. This work addresses this problem using antibody responses to protein markers measured by a novel high-throughput microarray technology. The findings from this study have direct relevance to rapid, broad-based diagnostic and vaccine development. RESULTS: Protein microarray chips are probed with sera from individuals infected with the bacterium Francisella tularensis, a category A biodefense pathogen. A two-step approach to the diagnostic process is presented: (1) feature (antigen) selection and (2) classification using antigen response measurements obtained from F. tularensis microarrays (244 antigens, 46 infected and 54 healthy human sera measurements). To select antigens, a ranking scheme based on the identification of significant immune responses and differential expression analysis is described. Classification methods including k-nearest neighbors, support vector machines (SVM) and k-Means clustering are applied to training data using selected antigen sets of various sizes. SVM based models yield prediction accuracy rates in the range of approximately 90% on validation data, when antigen set sizes are between 25 and 50. These results strongly indicate that the top-ranked antigens can be considered high-priority candidates for diagnostic development. AVAILABILITY: All software programs are written in R and available at http://www.igb.uci.edu/index.php?page=tools and at http://www.r-project.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
In the course of evolution, the genomes of grasses have maintained an observable degree of gene order conservation. The information available for already sequenced genomes can be used to predict the gene order of nonsequenced species by means of comparative colinearity studies. The “Wheat Zapper” application presented here performs on-demand colinearity analysis between wheat, rice, Sorghum, and Brachypodium in a simple, time efficient, and flexible manner. This application was specifically designed to provide plant scientists with a set of tools, comprising not only synteny inference, but also automated primer design, intron/exon boundaries prediction, visual representation using the graphic tool Circos 0.53, and the possibility of downloading FASTA sequences for downstream applications. Quality of the “Wheat Zapper” prediction was confirmed against the genome of maize, with good correlation (r > 0.83) observed between the gene order predicted on the basis of synteny and their actual position on the genome. Further, the accuracy of “Wheat Zapper” was calculated at 0.65 considering the “Genome Zipper” application as the “gold” standard. The differences between these two tools are amply discussed, making the point that “Wheat Zapper” is an accurate and reliable on-demand tool that is sure to benefit the cereal scientific community. The Wheat Zapper is available at http://wge.ndsu.nodak.edu/wheatzapper/.

20.
Menda N  Buels RM  Tecle I  Mueller LA 《Plant physiology》2008,147(4):1788-1799
The amount of biological data available in the public domain is growing exponentially, and there is an increasing need for infrastructural and human resources to organize, store, and present the data in a proper context. Model organism databases (MODs) invest great efforts to functionally annotate genomes and phenomes by in-house curators. The SOL Genomics Network (SGN; http://www.sgn.cornell.edu) is a clade-oriented database (COD), which provides a more scalable and comparative framework for biological information. SGN has recently spearheaded a new approach by developing community annotation tools to expand its curational capacity. These tools effectively allow some curation to be delegated to qualified researchers, while, at the same time, preserving the in-house curators' full editorial control. Here we describe the background, features, implementation, results, and development road map of SGN's community annotation tools for curating genotypes and phenotypes. Since the inception of this project in late 2006, interest and participation from the Solanaceae research community has been strong and growing continuously to the extent that we plan to expand the framework to accommodate more plant taxa. All data, tools, and code developed at SGN are freely available to download and adapt.
