Similar articles
 20 similar articles found (search time: 31 ms)
1.

Background  

Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions to strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items.
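
The idea of treating exceptions to a strong positive association rule as candidate annotation errors can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function name `rule_exceptions`, the entry layout (a dict with `id` and `items`), the toy annotation terms and the confidence threshold are all hypothetical, and the sketch covers only positive rules, not the negative rules the abstract goes on to study.

```python
def rule_exceptions(entries, antecedent, consequent, min_confidence=0.9):
    """Check the rule `antecedent -> consequent` over annotated entries.

    Returns (confidence, exception_ids). Exceptions are reported only when
    the rule is strong, i.e. its confidence reaches `min_confidence`.
    """
    # Entries that carry the antecedent annotation item.
    with_antecedent = [e for e in entries if antecedent in e["items"]]
    if not with_antecedent:
        return 0.0, []

    # Entries that break the rule: antecedent present, consequent missing.
    exceptions = [e["id"] for e in with_antecedent if consequent not in e["items"]]
    supporting = len(with_antecedent) - len(exceptions)
    confidence = supporting / len(with_antecedent)

    if confidence < min_confidence:
        # A weak rule says little about errors, so nothing is flagged.
        return confidence, []
    return confidence, exceptions


# Toy data (hypothetical): nine consistent entries and one outlier.
toy_entries = [
    {"id": f"P{i}", "items": {"kinase domain", "ATP binding"}} for i in range(9)
]
toy_entries.append({"id": "P9", "items": {"kinase domain", "DNA binding"}})

confidence, flagged = rule_exceptions(
    toy_entries, "kinase domain", "ATP binding", min_confidence=0.8
)
print(confidence, flagged)
```

A flagged entry is only a candidate for review; a legitimate exception and a mis-annotation look the same to this check.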

2.
3.

Background  

A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on the annotation of genome sequences, which involves two successive steps: the prediction of coding sequences (CDS) and their function assignment. The annotation process takes time. The available methods often encounter difficulties when dealing with unfinished, error-containing genomic sequences.

4.

Background  

Renowned for their fast growth, valuable wood properties and wide adaptability, Eucalyptus species are amongst the most planted hardwoods in the world, yet they are still at the early stages of domestication because conventional breeding is slow and costly. Thus, there is huge potential for marker-assisted breeding programs to improve traits such as wood properties. To this end, the sequencing, analysis and annotation of a large collection of expressed sequence tags (ESTs) from genes involved in wood formation in Eucalyptus would provide a valuable resource.

5.

Background  

The two main types of automatic gene annotation frameworks are ab initio and alignment-based, the latter splitting into two subgroups. The first subgroup is used for intra-species alignments and includes successful methods with high specificity and speed. The second subgroup contains more sensitive methods, which are usually applied to aligning inter-species sequences.

6.

Background  

Repetitive DNA is a major fraction of eukaryotic genomes and occurs particularly often in plants. Currently, the sequencing of the sugar beet (Beta vulgaris) genome is under way and knowledge of repetitive DNA sequences is critical for the genome annotation. We generated a c0t-1 library, representing highly to moderately repetitive sequences, for the characterization of the major B. vulgaris repeat families. While highly abundant satellites are well described, minisatellites have been only poorly investigated in plants. Therefore, we focused on the identification and characterization of these tandemly repeated sequences.

7.

Background  

The Minimal Information Requested In the Annotation of biochemical Models (MIRIAM) is a set of guidelines for the annotation and curation processes of computational models, in order to facilitate their exchange and reuse. An important part of the standard consists in the controlled annotation of model components, based on Uniform Resource Identifiers. In order to enable interoperability of this annotation, the community has to agree on a set of standard URIs, corresponding to recognised data types. MIRIAM Resources are being developed to support the use of those URIs.

8.

Background  

Complete sequencing and annotation of the 96.2 kb Bacillus anthracis plasmid, pXO2, predicted 85 open reading frames (ORFs). Bacillus cereus and Bacillus thuringiensis isolates that ranged in genomic similarity to B. anthracis, as determined by amplified fragment length polymorphism (AFLP) analysis, were examined by PCR for the presence of sequences similar to 47 pXO2 ORFs.

9.
10.

Background  

To meet the needs of gene annotation for newly sequenced organisms, optimized spaced seeds can be implemented into cross-species sequence alignment programs to accurately align gene sequences to the genome of a related species. So far, seed performance has been tested for comparisons between closely related species, such as human and mouse, or on simulated data. As the number and variety of genomes increase, it becomes desirable to identify a small set of universal seeds that perform optimally or near-optimally on a large range of comparisons.
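
For readers unfamiliar with spaced seeds, the following minimal sketch shows how a seed pattern tolerates mismatches at its "don't care" positions. It is an illustration under stated assumptions, not the implementation of any particular alignment program: the function name `seed_hits` and the pattern `"1101"` are hypothetical, and the pattern is not one of the optimized seeds the abstract refers to. In the pattern, `1` marks a position that must match and `0` a position that is ignored.

```python
def seed_hits(query, target, seed="1101"):
    """Return (query_pos, target_pos) pairs where the spaced seed matches."""
    # Offsets within the seed window that must match exactly.
    care = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every seed-length window of the target by its "care" letters.
    index = {}
    for t in range(len(target) - span + 1):
        key = "".join(target[t + i] for i in care)
        index.setdefault(key, []).append(t)

    # Look up each window of the query in the index.
    hits = []
    for q in range(len(query) - span + 1):
        key = "".join(query[q + i] for i in care)
        hits.extend((q, t) for t in index.get(key, []))
    return hits


# The two sequences differ at the third base. The spaced seed skips that
# position and still reports a hit; a contiguous seed of equal length does not.
print(seed_hits("ACGT", "ACTT", seed="1101"))
print(seed_hits("ACGT", "ACTT", seed="1111"))
```

Seed hits like these are normally only the starting points that an aligner then extends into full alignments.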

11.

Background  

Identifying domains in protein sequences is an important step in protein structural and functional annotation. Existing domain recognition methods typically evaluate each domain prediction independently of the rest. However, the majority of proteins are multidomain, and pairwise domain co-occurrences are highly specific and non-transitive.

12.

Background  

Viroids, satellite RNAs, satellite viruses and the human hepatitis delta virus form the 'brotherhood' of the smallest known infectious RNA agents, known as the subviral RNAs. For most of these species, it is generally accepted that characteristics such as cell movement, replication, host specificity and pathogenicity are encoded in their RNA sequences and their resulting RNA structures. Although many sequences are indexed in publicly available databases, these sequence annotation databases do not provide the advanced search and data manipulation capabilities needed for identifying and characterizing subviral RNA motifs.

13.

Background  

Databases for sequence, annotation, or microarray experiment data are extremely beneficial to the research community, as they centrally gather information from experiments performed by different scientists. However, data from different sources develop their full capacities only when combined. The idea of a data warehouse directly addresses this problem and solves it by integrating all required data into one single database; hence there are already many data warehouses available to genetics. For the model legume Medicago truncatula, there is currently no such single data warehouse that integrates all freely available gene sequences, the corresponding gene expression data, and annotation information. Thus, we created the data warehouse TRUNCATULIX, an integrative database of Medicago truncatula sequence and expression data.

14.

Background  

Olea europaea L. is a traditional tree crop of the Mediterranean basin with a high economic impact worldwide. Unlike other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and their subsequent computational annotation by means of different software.

15.

Background  

The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences. We describe a novel method, GOtcha, for predicting gene product function by annotation with Gene Ontology (GO) terms. GOtcha predicts GO term associations with term-specific probability (P-score) measures of confidence. Term-specific probabilities are a novel feature of GOtcha and allow the identification of conflicts or uncertainty in annotation.

16.

Background  

Enzymes belonging to the acyl:CoA synthetase (ACS) superfamily activate a wide variety of substrates and play a major role in increasing the structural and functional diversity of various secondary metabolites in microbes and plants. However, due to the large sequence divergence within the superfamily, it is difficult to predict their substrate preference by annotation transfer from the closest homolog. Therefore, a large number of ACS sequences present in public databases lack any functional annotation at the level of substrate specificity. Recently, several examples have been reported in which enzymes showing high sequence similarity to luciferases or coumarate:CoA ligases have surprisingly been found to activate fatty acyl substrates in experimental studies. In this work, we have investigated the relationship between the substrate specificity of ACS enzymes and their sequence/structural features, and developed a novel computational protocol for in silico assignment of substrate preference.

17.

Background  

The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept, which leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups.

18.

Background  

Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is the rate-limiting enzyme in photosynthesis. The catalytic large subunit of the green-algal enzyme from Chlamydomonas reinhardtii is ~90% identical to the flowering-plant sequences, although they confer diverse kinetic properties. To identify the regions that may account for species variation in kinetic properties, directed mutagenesis and chloroplast transformation were used to create four amino-acid substitutions in the carboxy terminus of the Chlamydomonas large subunit to mimic the sequence of higher-specificity plant enzymes.

19.

Background  

Most existing in silico phosphorylation site prediction systems use a machine learning approach, which requires preparing a good set of classification data in order to build the classification knowledge. Furthermore, phosphorylation is catalyzed by kinase enzymes, and hence the kinase information of the phosphorylated sites has been used as major classification data in most existing systems. Since the number of kinase annotations in protein sequences is far smaller than the number of proteins sequenced to date, prediction systems that use information from the small clique of kinase-annotated proteins cannot be considered fully reliable for predicting outside the clique. Hence these systems are not generalized. In this paper, a novel generalized prediction system, PPRED (Phosphorylation PREDictor), is proposed that ignores the kinase information and uses only the evolutionary information of proteins for classifying phosphorylation sites.

20.

Background  

Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily.
