Similar Documents
1.
The use of high-throughput DNA sequencing and proteomic methods has led to an unprecedented increase in the amount of genomic and proteomic data. The application of computing technologies and the development of computational tools to analyze and present these data have not kept pace with the accumulation of information. Here, we discuss the use of different database systems to store biological information and mention some of the emerging computing technologies that are likely to play a key role in the future of bioinformatics.

2.
Recent years have seen an explosion in the amount of available biological data. More and more genomes are being sequenced and annotated, and protein and gene interaction data are accumulating. Biological databases have been invaluable for managing these data and making them accessible. Depending on the data they contain, the databases fulfil different functions. However, although they are architecturally similar, their integration has so far proved problematic.

3.
4.

5.
6.
7.
Zhu Fangfang, Li Jiang, Liu Juan, Min Wenwen. BMC Genetics, 2021, 22(1): 1-10
Background

Next-generation sequencing (NGS) has profoundly changed the approach to genetic and genomic research. In particular, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic mutations by NGS has also been adopted as a new approach to quality control of the cell substrates used to manufacture biopharmaceuticals. However, quality evaluation of cell substrates by NGS depends largely on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA sample using the same NGS platform and assay design. The LOD was defined as the allele frequency at which the relative standard deviation (RSD) is 30%, and it was estimated from a moving-average curve of RSD plotted against allele frequency.

Results

Allele frequencies of 20 mutations in the reference material, pre-validated by droplet digital PCR (ddPCR), were obtained from 5, 15, 30, or 40 gigabase pairs (Gbp) of sequencing data per run. The allele frequencies measured by WES were significantly associated with those pre-validated by ddPCR, and the p-value of this association decreased as the sequencing data size increased. Using this method, the LOD of allele frequency in WES with 15 Gbp or more of sequencing data was estimated to be between 5% and 10%.

Conclusions

Properly interpreting WES data on somatic mutations requires a cutoff threshold for low allele frequencies. The in-house LOD estimated by the simple method presented in this study provides a rationale for setting this cutoff.
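The LOD procedure in the Background (RSD of replicate allele-frequency measurements, smoothed by a moving average, read off where it reaches 30%) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the window size, and the rule of taking the first smoothed point at or below the cutoff are hypothetical choices, and the example allele frequencies are invented for illustration.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (sample SD divided by mean), in percent."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def estimate_lod(replicates, window=3, rsd_cutoff=30.0):
    """Estimate an allele-frequency LOD from replicate WES runs (sketch).

    replicates: dict mapping a mutation ID to the allele frequencies (%)
    measured for it in independent runs (at least two per mutation).
    Points (mean AF, RSD) are sorted by mean AF and smoothed with a simple
    moving average over `window` points (an assumed smoothing rule). The
    mean AF of the first smoothed point whose RSD is at or below
    `rsd_cutoff` is returned; None if the cutoff is never reached.
    """
    points = sorted(
        (statistics.fmean(afs), rsd_percent(afs)) for afs in replicates.values()
    )
    for i in range(len(points) - window + 1):
        chunk = points[i:i + window]
        mean_af = statistics.fmean(af for af, _ in chunk)
        mean_rsd = statistics.fmean(rsd for _, rsd in chunk)
        if mean_rsd <= rsd_cutoff:
            return mean_af
    return None


# Hypothetical replicate allele frequencies (%) for four mutations.
runs = {
    "mut1": [1.0, 3.0],
    "mut2": [4.0, 6.0],
    "mut3": [9.0, 11.0],
    "mut4": [19.0, 21.0],
}
print(estimate_lod(runs, window=2))  # 7.5
```

Larger windows smooth out noisy individual mutations at the cost of a coarser LOD estimate; with only 20 reference mutations, the window size is a judgment call that should be reported alongside the result.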

8.
9.
MOTIVATION: Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on evaluating knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes applying a software engineering metric known as the found/fixed graph to evaluate both the processes by which genomic knowledge bases are built and the completeness of their contents. RESULTS: Well-understood patterns of change in the found/fixed graph are observed in two large, publicly available knowledge bases. These patterns suggest that current manual curation processes will take far too long to complete the annotation of even the most important model organisms, and that at their current rate of production they will never suffice to complete the annotation of all currently available proteomes.
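To make the found/fixed graph concrete, the sketch below computes cumulative "found" and "fixed" curves and the open backlog between them, then extrapolates linearly how long the backlog would take to clear. Mapping "found" to annotation needs identified and "fixed" to completed curations is an assumption of this sketch, as are the function names and the example counts; the article's own data and analysis are not reproduced here.

```python
import math
from itertools import accumulate


def found_fixed_curves(found_per_period, fixed_per_period):
    """Cumulative 'found' and 'fixed' counts per period, plus the open backlog."""
    found = list(accumulate(found_per_period))
    fixed = list(accumulate(fixed_per_period))
    backlog = [f - x for f, x in zip(found, fixed)]
    return found, fixed, backlog


def periods_to_clear(backlog, fix_rate, find_rate=0.0):
    """Naive linear extrapolation: periods until the backlog reaches zero,
    or None if items are found at least as fast as they are fixed."""
    net = fix_rate - find_rate
    if net <= 0:
        return None  # the two curves never meet
    return math.ceil(backlog / net)


# Hypothetical per-period counts.
found, fixed, backlog = found_fixed_curves([10, 5, 5], [2, 3, 4])
print(backlog)                     # [8, 10, 11]
print(periods_to_clear(11, 4, 5))  # None: the backlog keeps growing
```

A widening gap between the two cumulative curves is the pattern that signals a process which, at its current rate, will not finish.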

10.
11.
We are witnessing the growing menace of rising numbers of cases caused by both drug-sensitive and drug-resistant Mycobacterium tuberculosis strains, alongside the challenge of producing the first new tuberculosis (TB) drug in well over 40 years. Having invested in extensive high-throughput screening efforts, the TB community now faces the question of how best to leverage these data to move from a hit to a lead, to a clinical candidate and, potentially, to a new drug. Cheminformatic techniques, applied on a much smaller scale, have complemented this approach; this review examines them. We suggest that these computational approaches should be optimally integrated with experimental approaches within a single workflow to accelerate TB drug discovery.

12.
Data mining, that is, finding and integrating information about proteins of interest, is an essential component of modern biological and biomedical research. Even when the focus is a single organism and a small number of proteins, there are often dozens of data sources containing relevant information. We are developing PRIME, a protein information environment, to serve as a virtual central database that integrates distributed, heterogeneous information about proteins, linked by a common identifier. PRIME can visualize all kinds of protein annotation in specialized views. These views can be displayed side by side and synchronized to show different aspects of the same proteins simultaneously. These features provide a quick and comprehensive overview of the properties of single proteins or protein sets.
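The integration idea described above (distributed sources joined on a common protein identifier, then shown in synchronized side-by-side views) can be sketched as follows. This is a hypothetical illustration, not PRIME's actual design or API: the function names, source names and protein identifiers are invented.

```python
from collections import defaultdict


def integrate(sources):
    """Merge per-source annotations into one view keyed by a shared identifier.

    sources: dict mapping a source name to {protein_id: annotation}.
    Returns {protein_id: {source_name: annotation}}.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for protein_id, annotation in records.items():
            merged[protein_id][source_name] = annotation
    return dict(merged)


def synchronized_views(merged, protein_ids, view_sources):
    """One view per source, all restricted to the same proteins in the same
    order, so side-by-side displays line up; missing data shows as None."""
    return {
        source: [(pid, merged.get(pid, {}).get(source)) for pid in protein_ids]
        for source in view_sources
    }


# Hypothetical sources and identifiers.
sources = {
    "structure": {"protA": "s1"},
    "function": {"protA": "kinase", "protB": "transporter"},
}
merged = integrate(sources)
views = synchronized_views(merged, ["protA", "protB"], ["structure", "function"])
print(views["structure"])  # [('protA', 's1'), ('protB', None)]
```

Keeping the merge keyed on the identifier, rather than copying each source into a fixed central schema, is what makes the database "virtual": adding a source only adds another key per protein.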

13.
Mining genomic databases to identify novel hydrogen producers (cited 7 times: 0 self-citations, 7 by others)
The realization that fossil fuel reserves are limited, together with the adverse environmental effects of fossil fuels, has forced us to look for alternative sources of energy. Hydrogen is a strong contender as a future fuel. Biological hydrogen production ranges from 0.37 to 3.3 mol H2 per mol of glucose; given the higher theoretical yield (4.0 mol H2 per mol of glucose), approaches to increase hydrogen yields are worth exploring. Screening the untapped microbial population is a promising possibility. Sequence analysis and pathway alignment of hydrogen metabolism in complete and incomplete genomes have led to the identification of potential hydrogen producers.
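The yields quoted above can be expressed as a fraction of the theoretical maximum; the short calculation below shows that 0.37 to 3.3 mol H2 per mol of glucose corresponds to roughly 9% to 83% of the 4.0 mol theoretical value. The constant and function names are hypothetical.

```python
THEORETICAL_H2_PER_GLUCOSE = 4.0  # mol H2 per mol glucose (value quoted above)


def percent_of_theoretical(observed_h2_per_glucose):
    """Observed hydrogen yield as a percentage of the theoretical maximum."""
    return 100.0 * observed_h2_per_glucose / THEORETICAL_H2_PER_GLUCOSE


for observed in (0.37, 3.3):
    pct = percent_of_theoretical(observed)
    print(f"{observed} mol H2/mol glucose = {pct:.2f}% of theoretical")
```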

14.
Exon discovery by genomic sequence alignment (cited 5 times: 0 self-citations, 5 by others)
MOTIVATION: During evolution, functional regions in genomic sequences tend to be more highly conserved than randomly mutating 'junk DNA', so local sequence similarity often indicates biological function. This fact can be used to identify functional elements in large eukaryotic DNA sequences by cross-species sequence comparison. In recent years, several gene-prediction methods have been proposed that compare anonymous genomic sequences, for example from human and mouse. Their main advantage is that they are based on simple and generally applicable measures of (local) sequence similarity; unlike standard gene-finding approaches, they do not depend on species-specific training data or on the presence of cognate genes in databases. Like all comparative sequence-analysis methods, these new comparative gene-finding approaches rely critically on the quality of the underlying sequence alignments. RESULTS: Herein, we describe a new implementation of the sequence-alignment program DIALIGN that has been developed for aligning large genomic sequences. We compare our method with the alignment programs PipMaker, WABA and BLAST and show that the local similarities identified by these programs are highly correlated with protein-coding regions. In our test runs, PipMaker was the most sensitive method, while DIALIGN was the most specific. AVAILABILITY: The program can be downloaded from the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/.
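Comparisons like the one above are often scored at the nucleotide level, with sensitivity as the fraction of annotated coding positions covered by predicted local similarities and specificity as the fraction of predicted positions that fall in annotated coding regions (the convention common in gene-prediction benchmarks). The sketch below assumes those definitions and half-open coordinate intervals; whether the DIALIGN evaluation used exactly these measures is not stated here, and the function names and intervals are hypothetical.

```python
def covered_positions(intervals):
    """Set of positions covered by half-open [start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def sensitivity_specificity(predicted, annotated):
    """Nucleotide-level Sn = TP / annotated positions, Sp = TP / predicted positions."""
    pred = covered_positions(predicted)
    true = covered_positions(annotated)
    tp = len(pred & true)  # positions both predicted and annotated as coding
    sn = tp / len(true) if true else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


local_similarities = [(0, 10), (20, 30)]  # hypothetical aligned segments
coding_exons = [(5, 25)]                  # hypothetical annotated exon
print(sensitivity_specificity(local_similarities, coding_exons))  # (0.5, 0.5)
```

Under these definitions, a method that aligns long stretches scores high sensitivity but pays in specificity, which is the trade-off the abstract reports between PipMaker and DIALIGN.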

15.
Analysis of large gene databases for discovery of novel therapeutic agents (cited once: 0 self-citations, 1 by others)
During the 1980s and early 1990s, the recombinant DNA revolution provided a vital source of therapeutic targets and agents for pharmaceutical research. However, during the early 1990s it became apparent that the identification and cloning of novel human cDNAs was a rate-limiting step in drug discovery and that new technological approaches were required to address this challenge. There was a growing realisation that the new science of 'genomics', together with the associated large gene sequence databases, would provide a radically new means of generating targets. SmithKline Beecham has been at the forefront of this breakthrough in pharmaceutical research. The productivity of this strategy is illustrated by our work on novel enzymes, chemokines and receptors, and by new approaches linking genes to pathological processes.

16.
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been accidentally incorporated into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have resulted from simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated through contamination or other accidents during cloning. Some cases involved unusual rearrangements and regions of the vector distant from the normal insertion sites. Matches to vector were found in 0.23% of the 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is growing faster than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries and will affect database management. The incorporated vector fragments described here may also provide a crude estimate of the fidelity of sequence information in the database: in alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
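A crude way to flag vector contamination is to look for exact k-mer matches between an entry and a vector sequence on either strand, then report percent identity over an aligned segment. The sketch below is a simplified stand-in for alignment-based screening, not the method used in the study; the k value, function names and toy sequences are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def vector_hits(entry, vector, k=20):
    """Start positions in `entry` whose k-mer occurs exactly in `vector`
    on either strand."""
    vector_kmers = kmers(vector, k) | kmers(reverse_complement(vector), k)
    return [i for i in range(len(entry) - k + 1) if entry[i:i + k] in vector_kmers]


def percent_identity(aligned_a, aligned_b):
    """Identity of two equal-length, gap-free aligned segments, in percent."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


vector = "GGGGGAAAAACCCCCTTTTT"                # toy "vector" sequence
entry = "ATATAT" + "AAAAACCCCC" + "GCGCGC"    # toy entry with a vector fragment
print(vector_hits(entry, vector, k=10))      # [6]
print(percent_identity("ACGT", "ACGA"))      # 75.0
```

Exact k-mer matching misses diverged or short fragments, which is one reason real screens use alignment with scoring thresholds; the identity figures quoted in the abstract come from such alignments, not from a filter like this.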

17.
Biochemical databases will be best served by new, specialized database management systems whose storage managers are based on metric-space indexing techniques, and by database query languages whose semantics derive from biochemical models of similarity and evolution. Important biochemical data types cannot be effectively mapped to the low-dimensional coordinate systems on which O(log n) indexing methods rely. An abundance of bioinformatic discoveries makes clear that biochemical data are not random and exhibit interesting clustering structure. Metric-space indexing exploits a data set's intrinsic clustering to speed up similarity queries, even when the data cannot be mapped to a coordinate system. Database management systems that seamlessly integrate semantically rich query languages with a metric storage and retrieval mechanism will allow biologists to develop, simply and concisely, informatic studies that have traditionally been large and labor-intensive.
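The pruning that makes metric-space indexing work without coordinates can be shown with a small pivot table (in the spirit of LAESA-style indexes), using edit distance, which is a metric on strings. For each stored item the distances to a few pivots are precomputed; at query time the triangle inequality gives |d(q,p) - d(x,p)| <= d(q,x), so any item with |d(q,p) - d(x,p)| > r for some pivot p cannot lie within radius r and is skipped without computing d(q,x). This is a hypothetical illustration of the general technique, not a design taken from the article; the class, function names and sequences are invented.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


class PivotIndex:
    """Range queries in a metric space using precomputed pivot distances."""

    def __init__(self, items, pivots, dist):
        self.items = list(items)
        self.pivots = list(pivots)
        self.dist = dist
        # table[k][m] = dist(items[k], pivots[m]), computed once at build time
        self.table = [[dist(x, p) for p in self.pivots] for x in self.items]
        self.distance_calls = 0  # distance evaluations in the last query

    def range_query(self, query, radius):
        """Return all items x with dist(query, x) <= radius."""
        dq = [self.dist(query, p) for p in self.pivots]
        self.distance_calls = len(self.pivots)
        hits = []
        for x, row in zip(self.items, self.table):
            # Triangle inequality: dist(query, x) >= |dq[m] - row[m]|.
            if any(abs(a - b) > radius for a, b in zip(dq, row)):
                continue  # pruned without computing dist(query, x)
            self.distance_calls += 1
            if self.dist(query, x) <= radius:
                hits.append(x)
        return hits


seqs = ["ACGT", "ACGA", "TTTT", "ACGTT", "GGGG"]
index = PivotIndex(seqs, pivots=["ACGT", "TTTT"], dist=levenshtein)
print(index.range_query("ACGT", 1))  # ['ACGT', 'ACGA', 'ACGTT']
```

Here "TTTT" and "GGGG" are discarded by the pivot test alone. On clustered data the savings grow, because most items sit far from the query's cluster and fail the bound for at least one pivot.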

18.
Light-weight integration of molecular biological databases (cited once: 0 self-citations, 1 by others)
MOTIVATION: Owing to the increasing number of molecular biological databases and the exponential growth of their contents, database integration is an important research topic in bioinformatics. Existing approaches in this area have in common that considerable effort is needed to provide integrated access to heterogeneous data sources. RESULTS: This article describes the LIMBO architecture, a light-weight approach to molecular biological database integration. Building systems on this architecture can significantly reduce the effort needed for database integration. AVAILABILITY: To illustrate the usefulness in principle of the underlying ideas, a prototype implementation based on the LIMBO architecture is described. This implementation relies exclusively on freely available open-source components such as the PostgreSQL database management system and the BioRuby project. Additional files and modified components are available from the author upon request.

19.
BIOSILICO, 2003, 1(4): 134-142
The increasing amount of data produced by large-scale biological experiments has highlighted the inadequacy of traditional scientific data management methods such as laboratory notebooks. Databases designed to store biological information are becoming increasingly common, but the literature offers little guidance on best practices for biological database design. This paper suggests such best practices and provides examples of how to implement them.

20.
MS-based proteomics characterizes the protein contents of biological samples. The most common approach is first to match observed MS/MS peptide spectra against theoretical spectra derived from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching the protein sequence database together with a randomized version of it and comparing the score distributions of matches to randomized versus nonrandomized sequences. This work introduces a straightforward method based on isotonic regression to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as the other methods used for comparison but also has the advantages of being (i) monotonic in the score, (ii) computationally simple, and (iii) independent of assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification from summaries of the peptide spectrum scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both FDRs and LFDRs, we showed that for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53% and 60%).
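The target-decoy estimate and an isotonic (monotone) fit can be sketched as below. This is an assumption-laden illustration, not the authors' estimator: it assumes a concatenated search against equal-sized target and decoy databases, estimates the cumulative FDR as decoys over targets at or above each score, and obtains a local FDR from a pool-adjacent-violators (PAVA) fit of the decoy indicator, converted with LFDR ≈ p/(1 - p) and capped at 1. This sketch applies the isotonic fit only to the local estimate; function names are hypothetical, and ties in score are broken by input order.

```python
def pava_nondecreasing(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit (unit weights)."""
    blocks = []  # each block: [mean, size]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, n2 = blocks.pop()
            mean1, n1 = blocks.pop()
            n = n1 + n2
            blocks.append([(mean1 * n1 + mean2 * n2) / n, n])
    fitted = []
    for mean, n in blocks:
        fitted.extend([mean] * n)
    return fitted


def target_decoy_fdr(scores, is_decoy):
    """Cumulative FDR at each match's score: #decoys / #targets at or above it."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdr[i] = min(1.0, n_decoy / n_target) if n_target else 1.0
    return fdr


def isotonic_lfdr(scores, is_decoy):
    """Local FDR from an isotonic fit of P(decoy | score).

    Ordered from best to worst score, the decoy indicator is fitted with a
    non-decreasing curve p. With equal-sized databases, incorrect target
    matches at a given score are about as frequent as decoys, so
    LFDR ~ p / (1 - p), capped at 1 (reached when p >= 0.5).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    p = pava_nondecreasing([1 if is_decoy[i] else 0 for i in order])
    lfdr = [0.0] * len(scores)
    for i, pi in zip(order, p):
        lfdr[i] = 1.0 if pi >= 0.5 else pi / (1.0 - pi)
    return lfdr


# Hypothetical peptide-spectrum match scores and decoy flags.
scores = [10, 9, 8, 7, 6, 5]
is_decoy = [False, False, False, True, False, True]
print(target_decoy_fdr(scores, is_decoy))
print(isotonic_lfdr(scores, is_decoy))
```

In the toy data the raw cumulative FDR is not monotonic in the score (it dips from 1/3 to 0.25), while the isotonic local estimate is monotonic by construction, which illustrates advantage (i) claimed in the abstract.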

