Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
An essential step in the discovery of molecular mechanisms contributing to disease phenotypes and efficient experimental planning is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study that uses a distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidate genes contributing to pathogenesis of spina bifida. The analysis resulted in prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children; these mutations cause narrowing of the outlet channel and therefore reduce the folate permeation rate. The described approach also enabled correct identification of several genes previously shown to contribute to pathogenesis of spina bifida, and suggested additional genes for experimental validation. The study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.

2.
3.
As DNA sequencing technology has markedly advanced in recent years [2], it has become increasingly evident that the amount of genetic variation between any two individuals is greater than previously thought [3]. In contrast, array-based genotyping has failed to identify a significant contribution of common sequence variants to the phenotypic variability of common disease [4,5]. Taken together, these observations have led to the evolution of the Common Disease / Rare Variant hypothesis, which suggests that the majority of the "missing heritability" in common and complex phenotypes is instead due to an individual's personal profile of rare or private DNA variants [6-8]. However, characterizing how rare variation impacts complex phenotypes requires the analysis of many affected individuals at many genomic loci, and is ideally compared to a similar survey in an unaffected cohort. Despite the sequencing power offered by today's platforms, a population-based survey of many genomic loci and the subsequent computational analysis required remain prohibitive for many investigators. To address this need, we have developed a pooled sequencing approach [1,9] and a novel software package [1] for highly accurate rare variant detection from the resulting data. The ability to pool genomes from entire populations of affected individuals and survey the degree of genetic variation at multiple targeted regions in a single sequencing library provides excellent cost and time savings over traditional single-sample sequencing methodology. With a mean sequencing coverage per allele of 25-fold, our custom algorithm, SPLINTER, uses an internal variant calling control strategy to call insertions, deletions and substitutions up to four base pairs in length with high sensitivity and specificity from pools of up to 1 mutant allele in 500 individuals.
Here we describe the method for preparing the pooled sequencing library followed by step-by-step instructions on how to use the SPLINTER package for pooled sequencing analysis (http://www.ibridgenetwork.org/wustl/splinter). We show a comparison using pooled sequencing of 947 individuals, all of whom also underwent genome-wide array genotyping, at over 20 kb of sequence per person. Concordance between genotyping of tagged and novel variants called in the pooled sample was excellent. This method can be easily scaled up to any number of genomic loci and any number of individuals. By incorporating the internal positive and negative amplicon controls at ratios that mimic the population under study, the algorithm can be calibrated for optimal performance. This strategy can also be modified for use with hybridization capture or individual-specific barcodes and can be applied to the sequencing of naturally heterogeneous samples, such as tumor DNA.
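The coverage figures in this abstract can be turned into a quick back-of-the-envelope calculation. The sketch below is not part of SPLINTER; it only illustrates the arithmetic, assuming diploid individuals (so a pool of N individuals carries 2N alleles) and evenly distributed reads. The function names are hypothetical.

```python
def pooled_depth(individuals: int, coverage_per_allele: float = 25.0,
                 ploidy: int = 2) -> float:
    """Total read depth per position needed to reach the target
    mean coverage per allele in a pool (assumes diploid by default)."""
    if individuals <= 0 or ploidy <= 0:
        raise ValueError("individuals and ploidy must be positive")
    alleles = individuals * ploidy
    return alleles * coverage_per_allele


def expected_variant_reads(depth: float, mutant_alleles: int,
                           individuals: int, ploidy: int = 2) -> float:
    """Expected number of reads carrying the variant, assuming reads
    are drawn evenly from every allele in the pool."""
    return depth * mutant_alleles / (individuals * ploidy)


# A pool of 500 individuals at 25-fold coverage per allele.
depth = pooled_depth(500)
reads = expected_variant_reads(depth, mutant_alleles=1, individuals=500)
print(f"depth per position: {depth:.0f}, expected variant reads: {reads:.0f}")
```

Under these assumptions a single mutant allele in a pool of 500 individuals is expected to be supported by about as many reads as the per-allele coverage, which is why the per-allele figure is the quantity to budget for.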

4.
Carriers of germ line mutations in breast cancer susceptibility gene BRCA1 have an increased risk of developing breast and ovarian cancers; missense mutations have, however, been difficult to assess for disease association. Here we have used a biophysical approach to classify these variants. We established an assay for measuring the thermodynamic stability of the BRCA1 BRCT domains and investigated the effects of 36 missense mutations. The mutations show a range of effects. Some do not change the stability, whereas others destabilize the protein by as much as 6 kcal mol−1; one-third of the mutants could not be expressed in soluble form in Escherichia coli, and we conclude that these destabilize the protein by an even greater amount. We tested several computer algorithms for their ability to predict the mutant effects and found that by grouping them into two classes (destabilizing by less than or more than 2.2 kcal mol−1), the algorithms could predict the stability changes. Importantly, with the exception of the few mutants located in the binding site, none showed a significant reduction in affinity for phosphorylated substrate. These results indicate that despite very large losses in stability, the integrity of the structure is not compromised by the mutations. Thus, the majority of mutations cause loss of function by reducing the proportion of BRCA1 molecules that are in the folded state and increasing the proportion of molecules that are unfolded. Consequently, small molecule stabilization of the structure could be a generally applicable preventative therapeutic strategy for rescuing many BRCA1 mutations.
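The link between a loss of stability and a smaller folded population can be illustrated with the standard two-state folding model, in which the folded fraction is 1 / (1 + exp(-ΔG / RT)) for an unfolding free energy ΔG. The sketch below is an illustration under that assumption, not the authors' analysis; the wild-type stability of 5 kcal mol−1 is a hypothetical value chosen only for the example, while the 2.2 kcal mol−1 class boundary comes from the abstract.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fraction_folded(dg_unfold: float, temp_k: float = 298.15) -> float:
    """Folded fraction in a two-state model, given the free energy of
    unfolding in kcal/mol (positive means the folded state is favored)."""
    return 1.0 / (1.0 + math.exp(-dg_unfold / (R_KCAL_PER_MOL_K * temp_k)))


def stability_class(ddg: float, cutoff: float = 2.2) -> str:
    """Two-class grouping by destabilization (kcal/mol), using the
    2.2 kcal/mol boundary mentioned in the abstract."""
    return "more destabilizing" if ddg > cutoff else "less destabilizing"


WT_STABILITY = 5.0  # hypothetical wild-type stability, kcal/mol

for ddg in (0.0, 2.2, 6.0):
    folded = fraction_folded(WT_STABILITY - ddg)
    print(f"ddG = {ddg:.1f} kcal/mol: {stability_class(ddg)}, "
          f"folded fraction = {folded:.3f}")
```

With these assumed numbers, a mutation that removes more free energy than the wild-type protein has to spare pushes most molecules into the unfolded state, which matches the loss-of-function mechanism the abstract describes.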

5.
Predicted species lists are generated from regional species pools, coupled to codified habitat, microhabitat and biological trait data for the species. At site level, habitat association data are used to tailor the predicted list to site conditions, and comparison between predicted and observed species lists is used to explore elements of site quality and site management options, drawing on microhabitat and trait data for the species. This approach makes it possible for non-specialist site managers to interpret insect species lists. At larger geographic scales, attributes of regional lists are identified. Throughout, the approach is considered in the context of its potential to help resolve issues relating to maintenance of biodiversity in Europe; the taxonomic group employed is the Syrphidae (Diptera).

6.
7.
8.
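Histone variants are important epigenetic regulators that can replace canonical histones at specific chromatin locations, maintaining chromatin structure and thereby ensuring that transcriptional activation or repression proceeds smoothly. The regulatory functions of histone variants have become a hot topic in plant research. In recent years, as studies of the biological functions of plant histone variants have deepened, histone variants have been found to play important roles in multiple biological processes, including plant growth and development and the regulation of environmental responses. This article briefly introduces...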

9.
Advances in sequencing technologies are allowing genome-wide association studies at an ever-growing scale. The interpretation of these studies requires dealing with statistical and combinatorial challenges, owing to the multi-factorial nature of human diseases and the huge space of genomic markers that are being monitored. Recently, it was proposed that using protein–protein interaction network information could help in tackling these challenges by restricting attention to markers or combinations of markers that map to proteins that are close in the network. In this review, we survey techniques for integrating genomic variation data with network information to improve our understanding of complex diseases and reveal meaningful associations.

10.
Array CGH for the detection of genomic copy number variants has replaced G-banded karyotype analysis. This paper describes the technology and its application in a clinical diagnostic service laboratory. DNA extracted from a patient’s sample (blood, saliva or other tissue types) is labeled with a fluorochrome (either cyanine 5 or cyanine 3). A reference DNA sample is labeled with the opposite fluorochrome. There follows a cleanup step to remove unincorporated nucleotides before the labeled DNAs are mixed and resuspended in a hybridization buffer and applied to an array comprising ~60,000 oligonucleotide probes from loci across the genome, with high probe density in clinically important areas. Following hybridization, the arrays are washed, then scanned and the resulting images are analyzed to measure the red and green fluorescence for each probe. Software is used to assess the quality of each probe measurement, calculate the ratio of red to green fluorescence and detect potential copy number variants.
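The last step described above, turning red and green intensities into a ratio and flagging possible copy number changes, can be sketched as follows. This is a simplified illustration, not the laboratory's software: it assumes the patient DNA carries the red label and the reference the green one, and the log2 thresholds of ±0.3 are hypothetical values picked for the example. Real pipelines also normalize intensities and require several consecutive probes before calling a variant.

```python
import math
from typing import List


def log2_ratios(red: List[float], green: List[float]) -> List[float]:
    """Per-probe log2(red / green) ratio; both lists must be the same
    length and contain positive intensities."""
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    return [math.log2(r / g) for r, g in zip(red, green)]


def call_probe(log2_ratio: float, gain: float = 0.3,
               loss: float = -0.3) -> str:
    """Label a single probe using hypothetical thresholds."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "normal"


red = [200.0, 100.0, 50.0]     # patient channel (assumed)
green = [100.0, 100.0, 100.0]  # reference channel (assumed)
for ratio in log2_ratios(red, green):
    print(f"log2 ratio {ratio:+.2f}: {call_probe(ratio)}")
```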

11.
12.
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology software paper.

13.
Groupwise functional analysis of gene variants is becoming standard in next-generation sequencing studies. As the function of many genes is unknown and their classification to pathways is scant, functional associations between genes are often inferred from large-scale omics data. Such data types, including protein–protein interactions and gene co-expression networks, are used to examine the interrelations of the implicated genes. Statistical significance is assessed by comparing the interconnectedness of the mutated genes with that of random gene sets. However, interconnectedness can be affected by confounding bias, potentially resulting in false positive findings. We show that genes implicated through de novo sequence variants are biased in their coding-sequence length and that longer genes tend to cluster together, which leads to exaggerated p-values in functional studies; we present here an integrative method that addresses these biases. To discern molecular pathways relevant to complex disease, we have inferred functional associations between human genes from diverse data types and assessed them with a novel phenotype-based method. Examining the functional association between de novo gene variants, we control for the heretofore unexplored confounding bias in coding-sequence length. We test different data types and networks and find that the disease-associated genes cluster more significantly in an integrated phenotypic-linkage network than in other gene networks. We present a tool of superior power to identify functional associations among genes mutated in the same disease even after accounting for significant sequencing study bias and demonstrate the suitability of this method to functionally cluster variant genes underlying polygenic disorders.
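One common way to control for a length bias of this kind is to draw the random comparison sets from genes of similar coding-sequence length instead of from all genes. The sketch below shows that general idea on toy data; it is not the method presented in the paper. The bin width, the gene names, the toy network, and all function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def matched_random_set(genes: List[str], lengths: Dict[str, int],
                       rng: random.Random, width: int = 1000) -> List[str]:
    """Draw one random gene per query gene from the same length bin,
    without replacement. Query genes are themselves candidates, so
    every bin always holds enough genes."""
    by_bin = defaultdict(list)
    for gene, length in lengths.items():
        by_bin[length // width].append(gene)
    chosen: List[str] = []
    for gene in genes:
        pool = [c for c in by_bin[lengths[gene] // width] if c not in chosen]
        chosen.append(rng.choice(pool))
    return chosen


def edges_within(genes: List[str], edges: List[Tuple[str, str]]) -> int:
    """Number of network edges with both endpoints in the gene set."""
    members = set(genes)
    return sum(1 for a, b in edges if a in members and b in members)


def empirical_p(genes: List[str], lengths: Dict[str, int],
                edges: List[Tuple[str, str]], n_perm: int = 1000,
                seed: int = 0) -> float:
    """Empirical p-value for interconnectedness against length-matched
    random sets (add-one correction keeps it above zero)."""
    rng = random.Random(seed)
    observed = edges_within(genes, edges)
    hits = sum(
        edges_within(matched_random_set(genes, lengths, rng), edges) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy data: three short genes and two long, connected genes.
lengths = {"A": 500, "B": 700, "C": 900, "D": 5200, "E": 5600}
edges = [("A", "B"), ("D", "E")]
print(empirical_p(["D", "E"], lengths, edges))
```

In the toy network the two long genes are connected, but they are also the only genes in their length bin, so every matched random set is just as connected and the clustering is not significant once length is controlled.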

14.
15.
DNA variants, such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), are unevenly distributed across the human genome. Currently, dbSNP contains more than 6 million human SNPs, and whole-genome genotyping arrays can assay more than 4 million of them simultaneously. In our study, we first asked whether the assays used in published genome-wide association studies (GWASs) cover all regions of the genome well. Using dbSNP build 135 data, we identified 50 genomic regions longer than 100 kb that do not contain any common SNPs, i.e., those with minor allele frequency (MAF) ≥ 1%. Secondly, because conserved regions are generally of functional importance, we tested genes in those large genomic regions without common SNPs. We found 97 genes, which were enriched for reproductive function. In addition, we further filtered out regions with CNVs listed in the Database of Genomic Variants (DGV), segmental duplications from the Human Genome Project and common variants identified by personal genome sequencing (UCSC). No region survived this filtering. Our analysis suggests that, while there may not be many large genomic regions free of common variants, there are still some “holes” in the current human genomic map for common SNPs. Because GWASs have focused only on common SNPs, interpretation of GWAS results should take this limitation into account. In particular, two recent GWASs of fertility may be incomplete due to the map deficit. Additional SNP discovery efforts should pay close attention to these regions.
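The first screening step, finding stretches longer than 100 kb with no common SNP, amounts to scanning the gaps between sorted SNP positions. The sketch below illustrates that scan for a single chromosome; it is not the authors' pipeline. It assumes 1-based coordinates and an input of (position, MAF) pairs, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def common_positions(snps: Iterable[Tuple[int, float]],
                     min_maf: float = 0.01) -> List[int]:
    """Keep positions of SNPs whose minor allele frequency is at least
    min_maf (1% matches the definition of 'common' in the abstract)."""
    return sorted({pos for pos, maf in snps if maf >= min_maf})


def snp_free_regions(positions: Iterable[int], chrom_length: int,
                     min_len: int = 100_000) -> List[Tuple[int, int]]:
    """Return (start, end) intervals, 1-based and inclusive, that are
    longer than min_len and contain none of the given positions."""
    regions = []
    prev = 0  # sentinel just before the first base
    for pos in sorted(set(positions)):
        if pos - prev - 1 > min_len:
            regions.append((prev + 1, pos - 1))
        prev = pos
    if chrom_length - prev > min_len:  # tail after the last SNP
        regions.append((prev + 1, chrom_length))
    return regions


# Toy chromosome of 400 kb; the SNP at 120,000 is rare and is ignored.
snps = [(50_000, 0.20), (120_000, 0.001), (200_001, 0.05), (250_000, 0.30)]
print(snp_free_regions(common_positions(snps), chrom_length=400_000))
```

The later filters described in the abstract (CNVs, segmental duplications, variants from personal genomes) would then be applied by discarding any candidate region that overlaps one of those annotations.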

16.
17.
Approximately 70% of sequenced bacterial genomes contain prophage-like structures, yet little effort has been made to use this information to determine the functions of these elements. The recent genomic sequencing of the marine bacterium Silicibacter sp. strain TM1040 revealed five prophage-like elements in its genome. The genomes of these prophages (named prophages 1 to 5) are approximately 74, 30, 39, 36, and 15 kb long, respectively. To understand the function of these prophages, cultures of TM1040 were treated with mitomycin C to induce the production of viral particles. A significant increase in viral counts and a decrease in bacterial counts after treatment with mitomycin C suggested that prophages were induced from TM1040. Transmission electron microscopy revealed one dominant type of siphovirus, while pulsed-field gel electrophoresis demonstrated two major DNA bands, equivalent to 35 and 75 kb, in the lysate. PCR amplification with primer sets specific to each prophage detected the presence of prophages 1, 3, and 4 in the viral lysate, suggesting that these prophages are inducible, but not necessarily to the same level, while prophages 2 and 5 are likely defective or non-mitomycin C-inducible phages. The combination of traditional phage assays and modern microbial genomics provides a quick and efficient way to investigate the functions and inducibility of prophages, particularly for a host harboring multiple prophages with similar sizes and morphological features.

18.
Identification of important nodes in complex networks has attracted increasing attention over the last decade. Various measures have been proposed to characterize the importance of nodes in complex networks, such as the degree, betweenness and PageRank. Different measures consider different aspects of complex networks. Although there are numerous results reported on undirected complex networks, few results have been reported on directed biological networks. Based on network motifs and principal component analysis (PCA), this paper introduces a new measure to characterize node importance in directed biological networks. Investigations on five real-world biological networks indicate that the proposed method can robustly identify important nodes in different networks, such as command interneurons, global regulators, and nodes that are not hubs but are evolutionarily conserved and important. Receiver Operating Characteristic (ROC) curves for the five networks indicate remarkable prediction accuracy of the proposed measure. The proposed index provides an alternative complex network metric. Potential implications of the related investigations include identifying network control and regulation targets, biological network modeling and analysis, as well as networked medicine.

19.
Germ Cell Tumors (GCT) have a high cure rate, but we currently lack the ability to accurately identify the small subset of patients who will die from their disease. We used a combined genomic and expression profiling approach to identify genomic regions and underlying genes that are predictive of outcome in GCT patients. We performed array-based comparative genomic hybridization (CGH) on 53 non-seminomatous GCTs (NSGCTs) treated with cisplatin-based chemotherapy and defined altered genomic regions using Circular Binary Segmentation. We identified 14 regions associated with two-year disease-free survival (2yDFS) and 16 regions associated with five-year disease-specific survival (5yDSS). From corresponding expression data, we identified 101 probe sets that showed significant changes in expression. We built several models based on these differentially expressed genes, then tested them in an independent validation set of 54 NSGCTs. These predictive models correctly classified outcome in 64–79.6% of patients in the validation set, depending on the endpoint utilized. Survival analysis demonstrated a significant separation of patients with good versus poor predicted outcome when using a combined gene set model. Multivariate analysis using clinical risk classification with the combined gene model indicated that they were independent prognostic markers. This novel set of predictive genes from altered genomic regions is almost entirely independent of our previously identified set of predictive genes for patients with NSGCTs. These genes may aid in the identification of the small subset of patients who are at high risk of poor outcome.

20.
In reacting to global competition and rapidly changing customer demands, industrial business organizations have developed a strong interest in flexible automation. Flexible automation aims to achieve agility in handling uncertainties from internal or external environments. Modeling complex structures, promoting reuse, and shortening the development cycle are particularly significant aspects in the analysis and design of CIM systems, where heterogeneous elements have to be integrated in a complex control architecture. The design methodology for FMS control software involves the abstraction of an FMS and the estimation of the system performance. The aim of this activity is to suggest the optimal configuration of an FMS for given specifications, through simulation tools. In the software engineering field, object-oriented (OO) approaches have proven to be a powerful technique with respect to such aspects. The unified modeling language (UML), by using OO design methodologies, can offer reusability, extendibility, and modifiability in software design. Also, it bridges the gap that exists between the OO analysis and design area and the area of OO programming by creating an integrative metamodel of OO concepts. The specific goal of this paper is to formulate a new methodology for developing reusable, extendible, and modifiable control software for an FMS in an object-oriented environment. It is demonstrated that, with few diagrams, UML can be used to model such systems without being associated with other modeling tools.
