Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Next to cDNA sequences, Expressed Sequence Tags (ESTs) are the most direct way to locate genes in a genome in silico and determine their structure. Currently ESTs make up more than 60% of all database entries. The goal of this work is the development of a new program called DNA Intelligent Analysis for ESTs (DIANA-EST), based on a combination of Artificial Neural Networks (ANN) and statistics, for the characterization of the coding regions within ESTs and the reconstruction of the encoded protein. RESULTS: 89.7% of the nucleotides from an independent test set of 127 ESTs were correctly predicted as coding or non-coding. AVAILABILITY: The program is available upon request from the author. CONTACT: Present address: Department of Genetics, University of Pennsylvania, School of Medicine, 475 Clinical Research Building, 415 Curie Boulevard, Philadelphia, PA 19104-6145, USA. artemis@pcbi.upenn.edu.
2.
Richard Cooke Monique Raynal Michele Laudié Michel Delseny 《The Plant journal : for cell and molecular biology》1997,11(5):1127-1140
Partial cDNA sequencing to obtain expressed sequence tags (ESTs) has led to the identification of tags to about 8000 of the estimated 20 000 genes in Arabidopsis thaliana. This figure represents four to five times the number of complete coding sequences from this organism available in international databases. In contrast to mammals, many proteins are encoded by multigene families in A. thaliana. Using ribosomal protein gene families as an example, it is possible to construct relatively long sequences from overlapping ESTs which are of sufficiently high quality to unambiguously identify tags to individual members of multigene families, even when the sequences are highly conserved. A total of 106 genes encoding 50 different cytoplasmic ribosomal protein types have been identified, most proteins being encoded by at least two and up to four genes. Coding sequences of members of individual gene families are almost always very highly conserved, and derived amino acid sequences are almost, if not completely, identical in the vast majority of cases. Sequence divergence is observed in untranslated regions, which allows the definition of gene-specific probes. The method can be used to construct high-quality tags to any protein.
3.
Hatzigeorgiou AG 《Bioinformatics (Oxford, England)》2002,18(2):343-350
MOTIVATION: Correct identification of the Translation Initiation Start (TIS) in cDNA sequences is an important issue for genome annotation. The aim of this work is to improve upon current methods and provide a performance-guaranteed prediction. METHODS: This is achieved by using two modules, one sensitive to the conserved motif and the other sensitive to the coding/non-coding potential around the start codon. Both modules are based on Artificial Neural Networks (ANNs). By applying the simplified method of the ribosome scanning model, the algorithm starts a linear search at the beginning of the coding ORF and stops once the combination of the two modules predicts a positive score. RESULTS: According to the results of the test group, 94% of the TIS were correctly predicted. A confident decision is obtained through the use of the Las Vegas algorithm idea. The incorporation of this algorithm leads to a highly accurate recognition of the TIS in human cDNAs for 60% of the cases. AVAILABILITY: The program is available upon request from the author.
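The scanning search described in this abstract can be illustrated with a minimal Python sketch. It is not the author's program: the two scoring functions below are toy stand-ins for the two ANN modules (a purine at position -3 for the motif module, absence of an in-frame stop codon for the coding-potential module), and the names `motif_score`, `coding_score`, `find_tis` and the `window` parameter are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def motif_score(seq: str, i: int) -> float:
    """Toy stand-in for the motif module: positive if position -3 is a purine."""
    if i < 3:
        return -1.0
    return 1.0 if seq[i - 3] in "AG" else -1.0


def coding_score(seq: str, i: int, window: int = 20) -> float:
    """Toy stand-in for the coding-potential module.

    Positive if none of the next `window` in-frame codons after the ATG
    is a stop codon (codons running past the end of seq are ignored).
    """
    for k in range(1, window + 1):
        codon = seq[i + 3 * k : i + 3 * k + 3]
        if len(codon) < 3:
            break
        if codon in STOP_CODONS:
            return -1.0
    return 1.0


def find_tis(seq: str, window: int = 20) -> Optional[int]:
    """Scan 5' to 3' and return the index of the first ATG that both modules accept."""
    seq = seq.upper()
    for i in range(len(seq) - 2):
        if seq[i : i + 3] != "ATG":
            continue
        # Assumption: the "combination" is positive only if both scores are positive.
        if min(motif_score(seq, i), coding_score(seq, i, window)) > 0:
            return i  # stop at the first accepted start codon
    return None


if __name__ == "__main__":
    cdna = "CCCATGTAAGGGACCATGGCTGCTGCTTGA"
    print(find_tis(cdna, window=3))
```

In this example the first ATG is rejected (no purine at -3 and an immediate in-frame stop), so the scan continues to the second ATG. A real implementation would replace both toy scorers with trained models.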
4.
Liu G Loraine AE Shigeta R Cline M Cheng J Valmeekam V Sun S Kulp D Siani-Rose MA 《Nucleic acids research》2003,31(1):82-86
NetAffx (http://www.affymetrix.com) details and annotates probesets on Affymetrix GeneChip microarrays. These annotations include (i) static information specific to the probeset composition; (ii) sequence annotations extracted from public databases; and (iii) protein sequence-level annotations derived from public domain programs, as well as libraries of hidden Markov models (HMMs) developed at Affymetrix. For each probeset, NetAffx lists the probe sequences, and the consensus sequence interrogated by the probes; for the larger chip sets, interactive maps display this sequence data in genomic context. Sequence annotations include Gene Ontology (GO) terms and depiction of GO graph relationships; predicted protein domains and motifs; orthologous sequences; links to relevant pathways; and links to public databases including UniGene, LocusLink, SWISS-PROT and OMIM.
5.
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
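As an illustration of how a sequence-derived identifier can link entries across databases, here is a minimal Python sketch. The abstract itself only says that SEGUIDs are derived from the primary sequence; the recipe used here (SHA-1 digest of the uppercased sequence, Base64-encoded with the trailing padding removed) is the commonly documented SEGUID definition and should be checked against the SEGUID distribution before relying on it. The helper `group_by_seguid` and the example identifiers are hypothetical.

```python
import base64
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def seguid(sequence: str) -> str:
    """Return a SEGUID-style checksum for a protein sequence.

    Assumed recipe: SHA-1 of the uppercased sequence, Base64-encoded,
    with the trailing '=' padding stripped.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def group_by_seguid(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (identifier, sequence) pairs from any databases by sequence checksum."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for identifier, sequence in records:
        groups[seguid(sequence)].append(identifier)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical identifiers from two databases; the first two share a sequence.
    records = [("dbA:1", "MKV"), ("dbB:9", "mkv"), ("dbA:2", "MKL")]
    for checksum, ids in group_by_seguid(records).items():
        print(checksum, ids)
```

Because the key depends only on the sequence, identifiers that point to the same protein in different databases fall into the same group, regardless of how each database annotates the entry.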
6.
7.
Three multivariate statistical techniques (Multiway Principal Component Analysis, Multiway Partial Least Squares, and Stepwise Linear Discriminant Analysis) and one artificial intelligence method (Artificial Neural Networks) were evaluated to detect and predict early abnormal behaviors of wine fermentations. The techniques were tested with data on thirty-two variables at different stages of fermentation from industrial wine fermentations of Cabernet Sauvignon. All the techniques studied included a pre-treatment step to obtain a homogeneous space and reduce overfitting. The results were encouraging: at 72 h it was possible to classify 100% of the fermentations correctly with three variables using Multiway Partial Least Squares and Artificial Neural Networks. Additional and complementary results were obtained with Stepwise Linear Discriminant Analysis, which found that ethanol, sugar and density measurements are able to discriminate abnormal behavior.
8.
9.
R.A. Hauser-Davis T.F. Oliveira A.M. Silveira T.B. Silva R.L. Ziolli 《Ecological Informatics》2010,5(6):474-478
This study used the Discriminant Analysis statistical technique and Artificial Neural Networks (multilayer perceptron) to classify three fish species sampled in the state of Rio de Janeiro, Brazil: Geophagus brasiliensis (acaras), Tilapia rendalli (tilapias) and Mugil liza (mullets). These fish were sexed when possible, weighed, measured, and had their Gonadosomatic and Hepatosomatic Indices calculated, as well as their Condition Factor. The Artificial Neural Network (ANN) gave satisfactory results, even though the groups were composed of animals of very diverse sizes. Because it does not depend on statistical assumptions being met, the Artificial Neural Network was found to be an excellent alternative for classification problems with unbalanced data, such as the one presented in this study.
10.
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
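The noncomparative (dicodon bias) component described in this abstract can be sketched as a log-odds score over in-frame hexanucleotides. This is a simplified illustration under stated assumptions, not CRITICA's code: it uses fixed training sets rather than the iterative derivation the abstract mentions, add-one pseudocounts are an arbitrary choice, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

ALL_HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_counts(seqs: Iterable[str], in_frame: bool) -> Counter:
    """Count hexamers at codon-aligned positions (in_frame) or at every position."""
    counts: Counter = Counter()
    step = 3 if in_frame else 1
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i : i + 6]] += 1
    return counts


def log_odds_table(coding: Iterable[str], background: Iterable[str]) -> Dict[str, float]:
    """Log ratio of hexamer frequency in coding frame versus other contexts."""
    cod = hexamer_counts(coding, in_frame=True)
    bg = hexamer_counts(background, in_frame=False)
    n_cod = sum(cod[h] for h in ALL_HEXAMERS) + len(ALL_HEXAMERS)  # add-one smoothing
    n_bg = sum(bg[h] for h in ALL_HEXAMERS) + len(ALL_HEXAMERS)
    return {
        h: math.log((cod[h] + 1) / n_cod) - math.log((bg[h] + 1) / n_bg)
        for h in ALL_HEXAMERS
    }


def dicodon_score(seq: str, table: Dict[str, float], frame: int = 0) -> float:
    """Sum log-odds over hexamers read in the given frame; unknown hexamers score 0."""
    seq = seq.upper()
    return sum(table.get(seq[i : i + 6], 0.0) for i in range(frame, len(seq) - 5, 3))


if __name__ == "__main__":
    table = log_odds_table(["ATGGCTGCTGCTGCTGCT"], ["TTTTTTTTTTTTTTTTTTTT"])
    print(dicodon_score("GCTGCTGCT", table), dicodon_score("TTTTTTTTT", table))
```

A positive score means the hexamers in that reading frame look more like the coding training set than the background; scoring all frames and comparing them is one simple way to pick a candidate frame.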
11.
We develop and study two neural network models of perceptual alternations. Both models have a star-like architecture of connections
with a central element connected to a set of peripheral elements. A particular perception is simulated in terms of partial
synchronization between the central element and some sub-group of peripheral elements. The first model is constructed from
phase oscillators and the mechanism of perceptual alternations is based on chaotic intermittency under fixed parameter values.
Similar to experimental evidence, the distribution of times between perceptual alternations is represented by the gamma distribution.
The second model is built of spiking neurons of the Hodgkin–Huxley type. The mechanism of perceptual alternations is based
on plasticity of inhibitory synapses which increases the inhibition from the central unit to the neural assembly representing
the current percept. As a result another perception is formed. Simulations show that the second model is in good agreement
with behavioural data on switching times between percepts of ambiguous figures and with experimental results on binocular
rivalry of two and four percepts.
This article is part of a special issue on Neuronal Dynamics of Sensory Coding.
This special issue is in honour of Professor Pepe Segundo who is one of the pioneers in the study of neural coding. Pepe has
been an active participant in many Neural Coding Workshops sharing his great knowledge and experience of research in this
field. I (R. Borisyuk) was very happy to meet Pepe for the first time in Prague when attending the first Neural Coding Workshop
in 1995. From that time we regularly met at Neural Coding Workshops and these meetings have always been very stimulating and
fruitful for my research. Remarkably, the first paper I studied at the beginning of my scientific career was a seminal paper
by Moore et al. (1970). For me, this paper provided a great opportunity to learn the basic statistical techniques for the
analysis of multiple spike trains and neural coding. According to the Institute of Scientific Information, this paper has
been cited 380 times! This exciting paper has inspired my research into the synaptic and functional connectivity of neural
circuits derived from spike-train recordings (Borisyuk et al. 1985; Stuart et al. 2005) and guided my search for new ideas
on neural coding.
12.
13.
14.
15.
This work examines the use of Hybrid Intelligent Systems in the pattern recognition system of an artificial nose. The connectionist approaches Multi-Layer Perceptron and Time Delay Neural Networks, and the hybrid approaches Feature-Weighted Detector and Evolving Neural Fuzzy Networks, were investigated. A Wavelet Filter is evaluated as a preprocessing method for odor signals. The signals were generated by an artificial nose composed of an array of conducting polymer sensors and exposed to two different odor databases.
16.
Stoesser G Baker W van den Broek A Garcia-Pastor M Kanz C Kulikova T Leinonen R Lin Q Lombard V Lopez R Mancuso R Nardone F Stoehr P Tuli MA Tzouvara K Vaughan R 《Nucleic acids research》2003,31(1):17-22
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
17.
HUGE: a database for human large proteins identified by Kazusa cDNA sequencing project.
HUGE is a database for human large proteins newly identified by the Kazusa cDNA project, which aims to predict protein primary structures from sequences of human large cDNAs (>4 kb). In particular, cDNA clones capable of coding for large proteins (>50 kDa) are current targets of the project. More than 700 sequences of human cDNAs (average size, 5.1 kb) have been determined to date and deposited in the public databases. Notable information implied from the cDNAs and the predicted protein sequences can be obtained through HUGE via the World Wide Web at URL http://www.kazusa.or.jp/huge
18.
ZCURVE_CoV: a new system to recognize protein coding genes in coronavirus genomes, and its applications in analyzing SARS-CoV genomes
Chen LL Ou HY Zhang R Zhang CT 《Biochemical and biophysical research communications》2003,307(2):382-388
A new system to recognize protein coding genes in coronavirus genomes, especially suitable for the SARS-CoV genomes, is proposed in this paper. Compared with some existing systems, the new program package has the merits of simplicity, high accuracy, reliability, and speed. The system ZCURVE_CoV has been run for each of the 11 newly sequenced SARS-CoV genomes. Consequently, six genomes not annotated previously have been annotated, and some problems of previous annotations in the remaining five genomes have been pointed out and discussed. In addition to the polyprotein chain ORFs 1a and 1b and the four genes coding for the major structural proteins, spike (S), small envelope (E), membrane (M), and nucleocapsid (N), respectively, ZCURVE_CoV also predicts 5-6 putative proteins between 39 and 274 amino acids in length with unknown functions. Some single nucleotide mutations within these putative coding sequences have been detected and their biological implications are discussed. A web service is provided, by which a user can obtain the annotated result immediately by pasting the SARS-CoV genome sequences into the input window on the web site (http://tubic.tju.edu.cn/sars/). The software ZCURVE_CoV can also be downloaded freely from the web address mentioned above and run on computers under Windows or Linux.
19.
20.
Guenter Stoesser Wendy Baker Alexandra van den Broek Evelyn Camon Maria Garcia-Pastor Carola Kanz Tamara Kulikova Rasko Leinonen Quan Lin Vincent Lombard Rodrigo Lopez Nicole Redaschi Peter Stoehr Mary Ann Tuli Katerina Tzouvara Robert Vaughan 《Nucleic acids research》2002,30(1):21-26
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.