Similar Literature
1.
Genomic deletions have long been known to play a causative role in microdeletion syndromes. Recent whole-genome genetic studies have shown that deletions can increase the risk for several psychiatric disorders, suggesting that genomic deletions play an important role in the genetic basis of complex traits. However, the association between genomic deletions and common, complex diseases has not yet been systematically investigated in gene mapping studies. Likelihood-based statistical methods for identifying disease-associated deletions have recently been developed for familial studies of parent-offspring trios. The purpose of this study is to develop statistical approaches for detecting genomic deletions associated with complex disease in case–control studies. Our methods are designed to be used with dense single nucleotide polymorphism (SNP) genotypes to detect deletions in large-scale or whole-genome genetic studies. As SNP genotype data from genome-wide association studies become increasingly available, sophisticated statistical approaches that make use of these data will be needed. Our proposed methods can be applied in SNP-by-SNP analyses and in cluster analyses that combine evidence from multiple SNPs. Using simulated SNP data sets, we found that these methods are useful for detecting disease-associated deletions and are robust to linkage disequilibrium. Furthermore, we applied the proposed methods to chromosome 6p SNP genotype data for 868 rheumatoid arthritis patients and 1,197 controls from the North American Rheumatoid Arthritis Consortium. We detected disease-associated deletions within the human leukocyte antigen region, where genomic deletions had previously been discovered in rheumatoid arthritis patients.

2.
Kebing Yu, Arthur R. Salomon. Proteomics, 2010, 10(11): 2113-2122
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases for an array of species and of protein information databases have led to a deluge of proteomic data. A lab‐based software platform that automates the collection, processing, storage, and visualization of expansive proteomic data sets is therefore critically important. The high‐throughput autonomous proteomic pipeline described here was designed from the ground up to provide the flexibility needed for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. The pipeline consists of software that controls the acquisition of mass spectral data and automates post‐acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user‐configurable lab‐based relational database. Its design focuses on accommodating diverse workflows and supplying software functionality that is missing for a wide range of proteomic researchers, in order to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may resemble existing tools, the true novelty of the approach lies in the synergistic and flexible combination of these tools, which provides an integrated and efficient analysis of proteomic samples.

4.
In this article we highlight recent developments in computational functional genomics to identify networks of functionally related genes and proteins based on diverse sources of genomic data. Our specific focus is on statistical methods to identify genetic networks. We discuss integrated analysis of microarray datasets, methods to combine heterogeneous data sources, the analysis of high-dimensional phenotyping screens and describe efforts to establish a reliable and unbiased gold standard for method comparison and evaluation.

6.
One of the most important goals of biological investigation is to uncover functional relations between genes. In this study we propose a framework for extracting and integrating gene functional relations from diverse biological data sources, including gene expression data, the biological literature, and genomic sequence information. We introduce a two-layered Bayesian network approach to integrate relations from multiple sources into a genome-wide functional network. An experimental study was conducted using Arabidopsis thaliana as a test bed. Evaluation of the integrated network demonstrated that relation integration could improve the reliability of relations by combining evidence from different data sources. Domain expert judgments on the gene functional clusters in the network confirmed the validity of our approach for relation integration and network inference.
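The abstract does not spell out the two-layered Bayesian network. As a much simpler stand-in, the sketch below combines evidence for one gene pair by adding per-source log-likelihood ratios to the prior log-odds, which is what a naive Bayes model does if the sources are assumed conditionally independent given the true relation. The function name, the source list, and all probabilities are hypothetical illustrations, not the paper's model or data.

```python
import math

def posterior_functional_link(prior, likelihoods):
    """Posterior probability that two genes are functionally related (naive Bayes sketch).

    prior: prior probability of a functional relation.
    likelihoods: list of (P(evidence | related), P(evidence | unrelated)) pairs,
                 one per data source, assumed conditionally independent.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for p_rel, p_unrel in likelihoods:
        log_odds += math.log(p_rel / p_unrel)  # each source adds its log-likelihood ratio
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence for one gene pair from three sources
evidence = [
    (0.6, 0.2),  # co-expression in expression data
    (0.3, 0.1),  # co-mention in the literature
    (0.5, 0.5),  # a sequence feature that is uninformative for this pair
]
print(posterior_functional_link(0.01, evidence))
```

With a 1% prior and two sources that each triple the odds, the posterior rises to about 0.083, while the uninformative source leaves it unchanged. A real integration model would also have to handle correlated sources, which this sketch ignores.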

7.
Gene identification in novel eukaryotic genomes by self-training algorithm (total citations: 8; self-citations: 0; citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene-finding tools tuned for previously studied species are rarely suitable for effective gene hunting in the DNA sequence of a new genome. Gene identification methods based on mapping cDNA and expressed sequence tags (ESTs) to genomic DNA, or on alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes to estimate gene model parameters. In practice, neither type of data may be available in sufficient amounts until rather late in the sequencing of a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with estimating the statistical model directly from still-anonymous genomic DNA. The proposed method, which runs gene prediction in parallel with model parameter estimation, follows the path of iterative Viterbi training: rounds of labeling the genomic sequence into coding and non-coding regions alternate with rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could otherwise redirect the iteration away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed, and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
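The label-then-re-estimate loop of Viterbi training can be sketched with a toy two-state hidden Markov model (state 0 = non-coding, state 1 = coding) that emits single nucleotides. This is a minimal illustration under strong assumptions: real eukaryotic gene finders use far richer models (codon-position states, splice sites, length distributions), and the clipping bounds `lo`/`hi` are a hypothetical stand-in for the paper's dynamically changing parameter restrictions. All function names are illustrative.

```python
import math

ALPHABET = "ACGT"

def viterbi(seq, start, trans, emit):
    """Most probable state path under an HMM, computed in log space."""
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][seq[0]]) for s in range(n_states)]
    back = []
    for ch in seq[1:]:
        ptr, new = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: score[r] + math.log(trans[r][s]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][s]) + math.log(emit[s][ch]))
        back.append(ptr)
        score = new
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back the best predecessors
        state = ptr[state]
        path.append(state)
    return path[::-1]

def reestimate(seq, path, n_states=2, pseudo=1.0, lo=0.05, hi=0.6):
    """Re-estimate emissions and transitions from counts along the labelled path.

    Emission frequencies are clipped to [lo, hi] and then renormalised, a crude
    hypothetical substitute for restricting the allowed parameter range.
    """
    emit_counts = [{c: pseudo for c in ALPHABET} for _ in range(n_states)]
    trans_counts = [[pseudo] * n_states for _ in range(n_states)]
    for i, (ch, s) in enumerate(zip(seq, path)):
        emit_counts[s][ch] += 1
        if i + 1 < len(path):
            trans_counts[s][path[i + 1]] += 1
    emit = []
    for counts in emit_counts:
        total = sum(counts.values())
        clipped = {c: min(max(v / total, lo), hi) for c, v in counts.items()}
        norm = sum(clipped.values())
        emit.append({c: v / norm for c, v in clipped.items()})
    trans = [[v / sum(row) for v in row] for row in trans_counts]
    return trans, emit

def viterbi_train(seq, emit, trans, start=(0.5, 0.5), rounds=10):
    """Alternate labelling and re-estimation until the labelling stops changing."""
    path = None
    for _ in range(rounds):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
    return path, trans, emit

# Toy sequence: an AT-rich stretch followed by a GC-rich stretch
seq = "AT" * 30 + "GC" * 30
init_emit = [{"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
             {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}]
init_trans = [[0.9, 0.1], [0.1, 0.9]]
path, trans, emit = viterbi_train(seq, init_emit, init_trans)
```

Starting from only weakly informative emissions, the loop labels the first 60 positions as state 0 and the last 60 as state 1, then sharpens the emission estimates from those labels. Without some restriction on the parameters, a poor first labelling could drive such a loop toward a degenerate solution, which is the failure mode the abstract's parameter restrictions are meant to prevent.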

10.
Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false‐positive associations. Spurious associations may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent work on PD and EA tests has focused largely on improving test corrections for these confounding effects. Despite significant algorithmic improvements, a number of open questions remain: how to check that false discoveries are under control, how to implement test corrections, and how to combine statistical tests from multiple genome scan methods. This tutorial study provides detailed answers to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well‐calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets.
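The inflation factors mentioned above can be illustrated with the standard genomic-control factor λ, the median of the observed 1-df chi-square statistics divided by the median of the null chi-square distribution (about 0.4549). The sketch below is a minimal version using only the Python standard library; it is not necessarily the exact calibration procedure the tutorial recommends, and the simulated statistics are artificial.

```python
import math
import random
from statistics import NormalDist, median

# The median of a 1-df chi-square equals the squared upper quartile of N(0, 1), about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(chi2_stats):
    """Genomic-control inflation factor: median(observed) / median(null)."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def chi2_1df_sf(x):
    """Upper-tail p-value of a 1-df chi-square statistic: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def gc_correct(chi2_stats):
    """Deflate statistics by lambda (never inflate them) and return corrected p-values."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [chi2_1df_sf(x / lam) for x in chi2_stats], lam

# Artificial genome scan whose statistics are uniformly inflated by a factor of 1.5
random.seed(1)
inflated = [1.5 * random.gauss(0, 1) ** 2 for _ in range(50_000)]
pvals, lam = gc_correct(inflated)
print(round(lam, 2))
```

The estimated λ should be close to 1.5, and after correction the p-values are centred on 0.5 again. Dividing by a single λ assumes the confounding inflates all loci uniformly; the linear mixed models the abstract mentions are the usual alternative when that assumption fails.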

11.
Evolution of exon-intron structure and alternative splicing (total citations: 1; self-citations: 0; citations by others: 1)
Despite significant advances in high-throughput DNA sequencing, many important species remain understudied at the genome level. In this study we addressed the question of what can be predicted about the genome-wide characteristics of less-studied species based on genomic data from completely sequenced species. Using NCBI databases, we performed a comparative genome-wide analysis of characteristics such as alternative splicing and the numbers of genes, gene products, and exons in 36 completely sequenced model species. We fitted statistical regression models to these data and applied them to loblolly pine (Pinus taeda L.), an important species whose genome has not yet been completely sequenced. Using these models, genome-wide characteristics such as the total numbers of genes and exons can be roughly predicted from parameters estimated from the limited genomic data available, e.g. exon length and exon/gene ratio.

12.

Background  

An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease diagnosis and prognosis. Thus it is of interest to develop efficient statistical methods that can simultaneously identify important biomarkers from such high-throughput genomic data and construct appropriate classification rules. It is also of interest to develop methods for evaluation of classification performance and ranking of identified biomarkers.
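The background above states the goal (ranking candidate biomarkers among many assayed genes) without describing a method. One common baseline, shown here purely as an illustrative assumption rather than the authors' approach, is to rank genes by the absolute Welch two-sample t-statistic between cases and controls. The gene names and expression values below are made up, and the sketch does not build a classification rule.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch two-sample t-statistic (unequal variances allowed)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def rank_biomarkers(expr, labels):
    """Rank genes by |t| between cases (label 1) and controls (label 0).

    expr: dict mapping gene -> list of expression values, one per sample.
    labels: list of 0/1 class labels aligned with the samples.
    """
    scores = {}
    for gene, values in expr.items():
        cases = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = welch_t(cases, controls)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

# Hypothetical toy data: three controls followed by three cases
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [5.1, 5.3, 4.9, 8.0, 8.2, 7.9],  # strongly up in cases
    "geneB": [3.0, 3.2, 2.9, 3.1, 3.0, 3.3],  # essentially unchanged
    "geneC": [6.0, 6.4, 5.8, 5.0, 4.7, 5.2],  # moderately down in cases
}
print(rank_biomarkers(expr, labels))
```

With tens of thousands of genes, such a ranking would normally be paired with a multiple-testing correction and with cross-validation of whatever classifier is built on the top-ranked genes.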

13.
Dissecting evolutionary dynamics of ecologically important traits is a long-term challenge for biologists. Attempts to understand natural variation and molecular mechanisms have motivated a move from laboratory model systems to non-model systems in diverse natural environments. Next generation sequencing methods, along with an expansion of genomic resources and tools, have fostered new links between diverse disciplines, including molecular biology, evolution, ecology, and genomics. Great progress has been made in a few non-model wild plants, such as Arabidopsis relatives, monkey flowers, and wild sunflowers. Until recently, the lack of comprehensive genomic information has limited evolutionary and ecological studies to larger QTL (quantitative trait locus) regions rather than single gene resolution, and has hindered recognition of general patterns of natural variation and local adaptation. Further efforts in accumulating genomic data and developing bioinformatic and biostatistical tools are now poised to move this field forward. Integrative national and international collaborations and research communities are needed to facilitate development in the field of evolutionary and ecological genomics.

14.
Almost a decade ago, a new phylogeny of bilaterian animals was inferred from small-subunit ribosomal RNA (rRNA) that claimed the monophyly of two major groups of protostome animals: Ecdysozoa (e.g., arthropods, nematodes, onychophorans, and tardigrades) and Lophotrochozoa (e.g., annelids, molluscs, platyhelminths, brachiopods, and rotifers). However, it received little additional support. In fact, several multigene analyses strongly argued against this new phylogeny. These latter studies were based on a large amount of sequence data and therefore showed an apparently strong statistical support. Yet, they covered only a few taxa (those for which complete genomes were available), making systematic artifacts of tree reconstruction more probable. Here we expand this sparse taxonomic sampling and analyze a large data set (146 genes, 35,371 positions) from a diverse sample of animals (35 species). Our study demonstrates that the incongruences observed between rRNA and multigene analyses were indeed due to long-branch attraction artifacts, illustrating the enormous impact of systematic biases on phylogenomic studies. A refined analysis of our data set excluding the most biased genes provides strong support in favor of the new animal phylogeny and in addition suggests that urochordates are more closely related to vertebrates than are cephalochordates. These findings have important implications for the interpretation of morphological and genomic data.

15.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, most of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic positions of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference that requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can infer close familial relationships with accuracy similar to that of a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
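Statistics that need no allele frequencies can be built purely from counts of genotype-pair classes shared by two individuals. The sketch below computes two such quantities from hard genotype calls: the ratio R0 (opposite homozygotes over shared heterozygotes) and the KING-robust kinship coefficient. Treating these as the paper's exact estimators is an assumption, and because the sketch takes called genotypes it does not cover the low-depth setting where the abstract says genotypes cannot be called; function names are illustrative.

```python
def pairwise_counts(g1, g2):
    """Count genotype-pair classes for two individuals.

    Genotypes are alternate-allele counts 0/1/2; None marks a missing call.
    """
    both_het = opposite_hom = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either individual
        both_het += a == 1 and b == 1
        opposite_hom += {a, b} == {0, 2}  # AA in one individual, aa in the other
        het1 += a == 1
        het2 += b == 1
    return both_het, opposite_hom, het1, het2

def r0_and_king(g1, g2):
    """R0 and KING-robust kinship; assumes at least one shared heterozygous site."""
    both_het, opposite_hom, het1, het2 = pairwise_counts(g1, g2)
    r0 = opposite_hom / both_het
    king = (both_het - 2 * opposite_hom) / (het1 + het2)
    return r0, king

# An individual compared with itself behaves like a pair of identical twins
g = [0, 1, 2, 1, 1, 0, 2]
print(r0_and_king(g, g))
```

For identical genotypes R0 is 0 and the kinship estimate is 0.5, the value expected for monozygotic twins; unrelated pairs drive R0 up and the kinship estimate toward zero or below.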

16.
Array-based comparative genomic hybridization (arrayCGH) is a microarray technique that has been used to compare tumor genomes with normal genomes, thus providing rapid genomic assays of tumor genomes in terms of copy-number variations of chromosomal segments that have been gained or lost. When properly interpreted, these assays are likely to shed important light on genes and mechanisms involved in the initiation and progression of cancer. Specifically, chromosomal segments deleted in one or both copies of the diploid genome in a group of patients with cancer point to locations of tumor-suppressor genes (TSGs) implicated in the cancer. In this study, we focused on automatic methods for reliable detection of such genes and their locations, and we devised an efficient statistical algorithm to map TSGs, using a novel multipoint statistical score function. The proposed algorithm estimates the location of TSGs by analyzing segmental deletions (hemi- or homozygous) in the genomes of patients with cancer and the spatial relation of the deleted segments to any specific genomic interval. The algorithm assigns, to an interval of consecutive probes, a multipoint score that parsimoniously captures the underlying biology. It also computes a P value for every putative TSG by using concepts from the theory of scan statistics. Furthermore, it can identify smaller sets of predictive probes that can be used as biomarkers for diagnosis and therapeutics. We validated our method using different simulated artificial data sets and one real data set, and we report encouraging results. We discuss how, with suitable modifications to the underlying statistical model, this algorithm can be applied generally to a wider class of problems (e.g., detection of oncogenes).
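The idea of scoring intervals of consecutive probes by how patients' deletions overlap them, then attaching a scan-type P value, can be illustrated with a much cruder score than the paper's multipoint function. In the sketch below, each window's score is simply the number of patients whose deletion covers the whole window, and the P value is a Monte Carlo estimate under a hypothetical null in which every deletion keeps its length but is placed uniformly at random along the array. It is not the authors' score and does not use analytical scan-statistic theory; names and data are illustrative.

```python
import random

def window_scores(deletions, n_probes, width):
    """For each window of `width` consecutive probes, count patients whose deletion covers it.

    deletions: list of (start, end) probe indices, inclusive, one per patient.
    """
    scores = []
    for w_start in range(n_probes - width + 1):
        w_end = w_start + width - 1
        scores.append(sum(1 for s, e in deletions if s <= w_start and e >= w_end))
    return scores

def scan_pvalue(deletions, n_probes, width, n_perm=1000, seed=0):
    """Best window start, its score, and a Monte Carlo p-value for the maximum score.

    Null model (an assumption): each deletion keeps its length but is placed
    uniformly at random along the probe array, independently across patients.
    """
    rng = random.Random(seed)
    observed = window_scores(deletions, n_probes, width)
    best = max(observed)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in deletions:
            length = e - s
            new_s = rng.randint(0, n_probes - 1 - length)
            shuffled.append((new_s, new_s + length))
        if max(window_scores(shuffled, n_probes, width)) >= best:
            hits += 1
    return observed.index(best), best, (hits + 1) / (n_perm + 1)

# Hypothetical data: eight nested deletions sharing probes 97..103 on a 200-probe array
dels = [(90 + i, 110 - i) for i in range(8)]
start, best, p = scan_pvalue(dels, n_probes=200, width=3, n_perm=200)
```

Taking the maximum over all windows in both the observed and the permuted data is what turns this into a genome-wide (scan) P value rather than a per-window one.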

18.
Massively parallel sequencing (MPS), since its debut in 2005, has transformed the field of genomic studies. These new sequencing technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They have also begun to deliver on their promise to explain some of the missing heritability from genome-wide association studies (GWAS) of complex traits. We anticipate a rapidly growing number of MPS-based studies for a diverse range of applications in the near future. One crucial and nearly inevitable step is to detect SNPs and call genotypes at the detected polymorphic sites from the sequencing data. Here, we review statistical methods that have been proposed in the past five years for this purpose. In addition, we discuss emerging issues and future directions related to SNP detection and genotype calling from MPS data.
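A common starting point for likelihood-based genotype calling combines a binomial read model with a Hardy-Weinberg prior. The sketch assumes independent reads, a single per-read error rate, and a known allele frequency; real callers also model base and mapping qualities and estimate allele frequencies jointly across samples, so treat this as a minimal illustration rather than any specific reviewed method.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Binomial likelihoods P(reads | g) for g = 0, 1, 2 copies of the alternate allele.

    Assumes reads are independent and each is miscalled with probability `error`.
    The binomial coefficient is omitted because it is the same for every genotype.
    """
    likes = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error  # chance a read shows the alt allele
        likes.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likes

def call_genotype(n_ref, n_alt, allele_freq, error=0.01):
    """Posterior-maximising genotype under a Hardy-Weinberg prior; returns (genotype, posterior)."""
    q = allele_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    joint = [l * p for l, p in zip(genotype_likelihoods(n_ref, n_alt, error), prior)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    best = max(range(3), key=lambda g: posterior[g])
    return best, posterior[best]

print(call_genotype(n_ref=5, n_alt=5, allele_freq=0.2))
```

At low depth the prior matters much more: with only one reference and one alternate read the heterozygote is still favoured, but its posterior is far from 1, which is why the reviewed methods often carry genotype uncertainty forward instead of committing to hard calls.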

19.
Understanding the genomic basis of adaptation in maize is important for gene discovery and the improvement of breeding germplasm, but much remains a mystery despite significant population genetics and archaeological research. Identifying the signals underpinning adaptation is challenging because adaptation often coincided with genetic drift and the base genomic diversity of the species is massive. In this study, tGBS technology was used to genotype 1,143 diverse maize accessions, including landraces collected from 20 countries and elite breeding lines from tropical lowland, highland, subtropical/midaltitude and temperate ecological zones. Based on 355,442 high‐quality single nucleotide polymorphisms, 13 genomic regions were detected as being under selection using the bottom‐up searching strategy EigenGWAS. Of these 13 selection regions, 10 are reported here for the first time, two were associated with environmental parameters via EnvGWAS, and 146 genes were enriched. Combining large‐scale genomic and ecological data in this diverse maize panel, our study supports a polygenic adaptation model of maize and offers a framework to enhance our understanding of both the mechanistic basis and the evolutionary consequences of maize domestication and adaptation. The regions identified here are promising candidates for further, targeted exploration to identify beneficial alleles and haplotypes for deployment in maize breeding.
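The EigenGWAS idea treats an eigenvector of the genetic relationship matrix as if it were a phenotype and runs a single-marker association scan against it, so that SNPs strongly associated with a major axis of population structure become candidates for selection. The sketch below is an EigenGWAS-style illustration under stated assumptions (no missing data, SNP standardisation by expected binomial variance, a plain genomic-control correction); it is not the published tool's implementation, and the function name is illustrative.

```python
import numpy as np

# Median of a 1-df chi-square distribution, used for the genomic-control factor.
CHI2_1DF_MEDIAN = 0.4549364

def eigengwas(genotypes, pc=0):
    """Regress each SNP on one eigenvector of the genetic relationship matrix.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts, no missing data.
    Returns 1-df chi-square statistics for the polymorphic SNPs, before and after
    genomic-control correction.
    """
    x = np.asarray(genotypes, dtype=float)
    freq = x.mean(axis=0) / 2
    keep = (freq > 0) & (freq < 1)                      # drop monomorphic SNPs
    z = (x[:, keep] - 2 * freq[keep]) / np.sqrt(2 * freq[keep] * (1 - freq[keep]))
    grm = z @ z.T / z.shape[1]                          # genetic relationship matrix
    _, eigvecs = np.linalg.eigh(grm)                    # eigenvalues in ascending order
    y = eigvecs[:, -1 - pc]                             # chosen eigenvector as the "phenotype"
    y = (y - y.mean()) / y.std()
    n = z.shape[0]
    ss = (z ** 2).sum(axis=0)                           # columns of z are already centred
    beta = z.T @ y / ss                                 # per-SNP least-squares slope
    resid_var = ((y[:, None] - z * beta) ** 2).sum(axis=0) / (n - 2)
    chi2 = beta ** 2 / (resid_var / ss)                 # squared Wald statistic
    lam = max(np.median(chi2) / CHI2_1DF_MEDIAN, 1.0)  # genomic-control inflation factor
    return chi2, chi2 / lam
```

In a simulated panel of two groups that differ strongly at a handful of SNPs, the leading eigenvector separates the groups and those SNPs receive by far the largest statistics. Dividing by λ is meant to absorb the genome-wide differentiation that drift alone produces along the same axis.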

20.
The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or how they depend on aspects of experimental design. Here, we present and evaluate a method of SNP discovery using RNA sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has experienced extreme fluctuations, including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all‐in‐one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other nonmodel systems.
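δaδi fits demographic models by comparing an observed allele frequency spectrum (SFS) with the spectrum its diffusion approximation predicts. The sketch below only builds the observed spectrum (unfolded, and folded for when ancestral alleles are unknown) together with the standard-neutral expectation θ/i that demographic fits are contrasted with. It does not use δaδi's own API, and the function names and example counts are hypothetical.

```python
def unfolded_sfs(derived_counts, n_chrom):
    """Unfolded SFS: entry i counts SNPs whose derived allele is carried by i of n_chrom chromosomes."""
    sfs = [0] * (n_chrom + 1)
    for c in derived_counts:
        sfs[c] += 1
    return sfs

def fold(sfs):
    """Fold an SFS onto minor-allele counts, for when the ancestral allele is unknown."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        folded[min(i, n - i)] += count
    return folded

def neutral_expected_sfs(theta, n_chrom):
    """Expected unfolded SFS under the standard neutral model: theta / i for i = 1..n-1."""
    return [0.0] + [theta / i for i in range(1, n_chrom)] + [0.0]

# Eight diploid individuals give 16 sampled chromosomes; the counts are hypothetical
derived = [1, 1, 2, 3, 1, 8, 15, 4, 2, 1]
observed = fold(unfolded_sfs(derived, n_chrom=16))
expected = fold(neutral_expected_sfs(theta=3.0, n_chrom=16))
```

Folding both spectra in the same way keeps the comparison fair when ancestral states are unknown, as is typical for a nonmodel species without an outgroup genome.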
