Similar articles
20 similar articles found (search time: 15 ms)
1.
MOTIVATION: Expressed Sequence Tags (ESTs) are, next to cDNA sequences, the most direct way to locate genes in the genome in silico and determine their structure. Currently ESTs make up more than 60% of all database entries. The goal of this work is to develop a new program, DNA Intelligent Analysis for ESTs (DIANA-EST), which combines Artificial Neural Networks (ANNs) and statistics to characterize the coding regions within ESTs and reconstruct the encoded protein. RESULTS: 89.7% of the nucleotides in an independent test set of 127 ESTs were correctly predicted as coding or non-coding. AVAILABILITY: The program is available upon request from the author. CONTACT: Present address: Department of Genetics, University of Pennsylvania, School of Medicine, 475 Clinical Research Building, 415 Curie Boulevard, Philadelphia, PA 19104-6145, USA. artemis@pcbi.upenn.edu.

2.
Partial cDNA sequencing to obtain expressed sequence tags (ESTs) has led to the identification of tags to about 8000 of the estimated 20 000 genes in Arabidopsis thaliana. This figure represents four to five times the number of complete coding sequences from this organism available in international databases. In contrast to mammals, many proteins are encoded by multigene families in A. thaliana. Using ribosomal protein gene families as an example, it is possible to construct relatively long sequences from overlapping ESTs which are of sufficiently high quality to be able to unambiguously identify tags to individual members of multigene families, even when the sequences are highly conserved. A total of 106 genes encoding 50 different cytoplasmic ribosomal protein types have been identified, most proteins being encoded by at least two and up to four genes. Coding sequences of members of individual gene families are almost always very highly conserved and derived amino acid sequences are almost, if not completely, identical in the vast majority of cases. Sequence divergence is observed in untranslated regions which allows the definition of gene-specific probes. The method can be used to construct high-quality tags to any protein.

3.
Translation initiation start prediction in human cDNAs with high accuracy   (Total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: Correct identification of the Translation Initiation Start (TIS) in cDNA sequences is an important issue for genome annotation. The aim of this work is to improve upon current methods and provide a performance-guaranteed prediction. METHODS: This is achieved by using two modules, one sensitive to the conserved motif and the other sensitive to the coding/non-coding potential around the start codon. Both modules are based on Artificial Neural Networks (ANNs). By applying a simplified form of the ribosome scanning model, the algorithm starts a linear search at the beginning of the coding ORF and stops once the combination of the two modules predicts a positive score. RESULTS: According to the results of the test group, 94% of the TIS were correctly predicted. A confident decision is obtained through the use of the Las Vegas algorithm idea. The incorporation of this algorithm leads to a highly accurate recognition of the TIS in human cDNAs for 60% of the cases. AVAILABILITY: The program is available upon request from the author.
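
The linear search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the function names `scan_for_tis` and `toy_score` are hypothetical, the scan simply visits every ATG from the 5' end rather than only those within the coding ORF, and the toy scorer (a purine at position -3 and a G at position +4, a Kozak-like context) is only a stand-in for the combined output of the two ANN modules.

```python
from typing import Callable, Optional


def scan_for_tis(
    cdna: str,
    score: Callable[[str, int], float],
    threshold: float = 0.0,
) -> Optional[int]:
    """Return the 0-based index of the first ATG whose score exceeds threshold.

    Mimics a simplified ribosome scanning model: walk 5' to 3' and stop at
    the first start codon that the scorer accepts. Returns None if no ATG
    is accepted.
    """
    seq = cdna.upper()
    pos = seq.find("ATG")
    while pos != -1:
        if score(seq, pos) > threshold:
            return pos
        pos = seq.find("ATG", pos + 1)
    return None


def toy_score(seq: str, pos: int) -> float:
    """Hypothetical scorer standing in for the two ANN modules.

    Positive only when the ATG has a purine three bases upstream and a G
    immediately after the codon.
    """
    has_purine = pos >= 3 and seq[pos - 3] in "AG"
    has_g = pos + 3 < len(seq) and seq[pos + 3] == "G"
    return 1.0 if (has_purine and has_g) else -1.0


if __name__ == "__main__":
    # The first ATG (index 3) has a weak context; the second (index 15) is accepted.
    print(scan_for_tis("CCTATGCCCGCCACCATGGCG", toy_score))
```

Stopping at the first accepted codon, rather than ranking all candidates, is what makes the search linear; a real scorer would replace `toy_score` without changing the scanning loop.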

4.
NetAffx (http://www.affymetrix.com) details and annotates probesets on Affymetrix GeneChip microarrays. These annotations include (i) static information specific to the probeset composition; (ii) sequence annotations extracted from public databases; and (iii) protein sequence-level annotations derived from public domain programs, as well as libraries of hidden Markov models (HMMs) developed at Affymetrix. For each probeset, NetAffx lists the probe sequences, and the consensus sequence interrogated by the probes; for the larger chip sets, interactive maps display this sequence data in genomic context. Sequence annotations include Gene Ontology (GO) terms and depiction of GO graph relationships; predicted protein domains and motifs; orthologous sequences; links to relevant pathways; and links to public databases including UniGene, LocusLink, SWISS-PROT and OMIM.

5.
Babnigg G, Giometti CS. Proteomics, 2006, 6(16): 4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
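
A sequence-derived identifier of this kind can be sketched in a few lines of Python. The abstract does not spell out the derivation, so the scheme below is an assumption based on how SEGUIDs are commonly described: the SHA-1 digest of the upper-cased sequence, Base64-encoded with the trailing padding removed. The function name `seguid` and the whitespace handling are illustrative choices, not taken from the paper.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a SEGUID-style identifier for a primary sequence.

    Assumed scheme: SHA-1 of the upper-cased sequence, Base64-encoded,
    with the '=' padding stripped.
    """
    # Drop whitespace and upper-case so that formatting does not change the ID.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    # A 20-byte digest encodes to 28 Base64 characters, one of them padding.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # The same protein formatted differently maps to the same identifier.
    print(seguid("MKTAYIAKQR") == seguid("mktay\niakqr"))
```

Because the identifier depends only on the residues, it stays the same when a database renames or re-annotates an entry, which is the property the abstract relies on for linking databases.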

6.
7.
Three multivariate statistical techniques (Multiway Principal Component Analysis, Multiway Partial Least Squares, and Stepwise Linear Discriminant Analysis) and one artificial intelligence method (Artificial Neural Networks) were evaluated to detect and predict early abnormal behaviors of wine fermentations. The techniques were tested with data on thirty-two variables at different stages of fermentation from industrial wine fermentations of Cabernet Sauvignon. All the techniques studied included a pre-treatment to obtain a homogeneous space and reduce overfitting. The results were encouraging; it was possible to classify 100% of the fermentations correctly at 72 h with three variables using Multiway Partial Least Squares and Artificial Neural Networks. Additional and complementary results were obtained with Stepwise Linear Discriminant Analysis, which found that ethanol, sugar and density measurements are able to discriminate abnormal behavior.

8.
9.
This study used the Discriminant Analysis statistical technique and Artificial Neural Networks (multilayer perceptron) to classify three fish species sampled in the state of Rio de Janeiro, Brazil: Geophagus brasiliensis (acaras), Tilapia rendalli (tilapias) and Mugil liza (mullets). These fish were sexed when possible, weighed, measured, and had their Gonadosomatic and Hepatosomatic Indices calculated, as well as their Condition Factor. The Artificial Neural Network (ANN) gave satisfactory results, even though the groups were composed of animals of very diverse sizes. Since it does not require that statistical assumptions remain unviolated, among other considerations, the Artificial Neural Network was found to be an excellent alternative for classification problems with unbalanced data, such as the one presented in this study.

10.
CRITICA: coding region identification tool invoking comparative analysis.   (Total citations: 34; self-citations: 0; citations by others: 34)
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
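
The noncomparative dicodon signal mentioned in this abstract can be illustrated with a short Python sketch. It is not CRITICA's code: the function names, the add-one pseudocount and the log-odds form are assumptions chosen for illustration, and the sketch omits both the iterative re-estimation of the frequencies and the comparative (alignment-based) component.

```python
import math
from collections import Counter
from typing import Iterable


def hexamer_counts(seqs: Iterable[str], step: int) -> Counter:
    """Count hexanucleotides, sampling every `step` bases.

    step=3 counts in-frame dicodons of coding sequences;
    step=1 counts every hexamer of background sequence.
    """
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    return counts


def dicodon_log_odds(
    candidate: str,
    coding: Counter,
    background: Counter,
    pseudo: float = 1.0,
) -> float:
    """Sum log(P_coding / P_background) over in-frame dicodons of candidate."""
    n_hexamers = 4 ** 6  # number of possible hexanucleotides
    coding_total = sum(coding.values()) + pseudo * n_hexamers
    background_total = sum(background.values()) + pseudo * n_hexamers
    seq = candidate.upper()
    score = 0.0
    for i in range(0, len(seq) - 5, 3):
        hexamer = seq[i:i + 6]
        p_coding = (coding[hexamer] + pseudo) / coding_total
        p_background = (background[hexamer] + pseudo) / background_total
        score += math.log(p_coding / p_background)
    return score


if __name__ == "__main__":
    coding = hexamer_counts(["ATGGCCATGGCCATGGCC"], step=3)
    background = hexamer_counts(["TTTTTTTTTTTT"], step=1)
    # A positive score means the frame resembles the coding training data.
    print(dicodon_log_odds("ATGGCCATG", coding, background) > 0)
```

In an iterative setting of the kind the abstract describes, the `coding` counts would be rebuilt from the regions predicted in the previous round instead of from database annotations.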

11.
We develop and study two neural network models of perceptual alternations. Both models have a star-like architecture of connections with a central element connected to a set of peripheral elements. A particular perception is simulated in terms of partial synchronization between the central element and some sub-group of peripheral elements. The first model is constructed from phase oscillators and the mechanism of perceptual alternations is based on chaotic intermittency under fixed parameter values. Similar to experimental evidence, the distribution of times between perceptual alternations is represented by the gamma distribution. The second model is built of spiking neurons of the Hodgkin–Huxley type. The mechanism of perceptual alternations is based on plasticity of inhibitory synapses which increases the inhibition from the central unit to the neural assembly representing the current percept. As a result another perception is formed. Simulations show that the second model is in good agreement with behavioural data on switching times between percepts of ambiguous figures and with experimental results on binocular rivalry of two and four percepts. This article is part of a special issue on Neuronal Dynamics of Sensory Coding. This special issue is in honour of Professor Pepe Segundo who is one of the pioneers in the study of neural coding. Pepe has been an active participant in many Neural Coding Workshops sharing his great knowledge and experience of research in this field. I (R. Borisyuk) was very happy to meet Pepe for the first time in Prague when attending the first Neural Coding Workshop in 1995. From that time we regularly met at Neural Coding Workshops and these meetings have always been very stimulating and fruitful for my research. Remarkably, the first paper I studied at the beginning of my scientific career was a seminal paper by Moore et al. (1970). 
For me, this paper provided a great opportunity to learn the basic statistical techniques for the analysis of multiple spike trains and neural coding. According to the Institute of Scientific Information, this paper has been cited 380 times! This exciting paper has inspired my research into the synaptic and functional connectivity of neural circuits derived from spike-train recordings (Borisyuk et al. 1985; Stuart et al. 2005) and guided my search for new ideas on neural coding.

12.
13.
14.
15.
This work examines the use of Hybrid Intelligent Systems in the pattern recognition system of an artificial nose. The connectionist approaches Multi-Layer Perceptron and Time Delay Neural Networks, and the hybrid approaches Feature-Weighted Detector and Evolving Neural Fuzzy Networks were investigated. A Wavelet Filter is evaluated as a preprocessing method for odor signals. The signals were generated by an artificial nose composed of an array of conducting polymer sensors and exposed to two different odor databases.

16.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.

17.
HUGE is a database of large human proteins newly identified by the Kazusa cDNA project, which aims to predict protein primary structures from the sequences of large human cDNAs (>4 kb). In particular, cDNA clones capable of coding for large proteins (>50 kDa) are the current targets of the project. More than 700 human cDNA sequences (average size, 5.1 kb) have been determined to date and deposited in the public databases. Notable information inferred from the cDNAs and the predicted protein sequences can be obtained through HUGE via the World Wide Web at http://www.kazusa.or.jp/huge

18.
A new system to recognize protein-coding genes in coronavirus genomes, especially suitable for SARS-CoV genomes, is proposed in this paper. Compared with some existing systems, the new program package has the merits of simplicity, high accuracy, reliability, and speed. The system, ZCURVE_CoV, has been run on each of the 11 newly sequenced SARS-CoV genomes. Consequently, six genomes not annotated previously have been annotated, and some problems with previous annotations in the remaining five genomes have been pointed out and discussed. In addition to the polyprotein chain ORFs 1a and 1b and the four genes coding for the major structural proteins, spike (S), small envelope (E), membrane (M), and nucleocapsid (N), respectively, ZCURVE_CoV also predicts 5-6 putative proteins between 39 and 274 amino acids in length with unknown functions. Some single nucleotide mutations within these putative coding sequences have been detected and their biological implications are discussed. A web service is provided, by which a user can obtain the annotated result immediately by pasting the SARS-CoV genome sequences into the input window on the web site (http://tubic.tju.edu.cn/sars/). The ZCURVE_CoV software can also be downloaded freely from the web address mentioned above and run on computers under Windows or Linux.

19.
20.
The EMBL Nucleotide Sequence Database   (Total citations: 8; self-citations: 3; citations by others: 5)
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
