Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
ESTAP--an automated system for the analysis of EST data   Cited by: 2 (self-citations: 0, citations by others: 2)
The EST Analysis Pipeline (ESTAP) is a set of analytical procedures that automatically verify, cleanse, store and analyze ESTs generated on high-throughput platforms. It uses a relational database to store sequence data and analysis results, which facilitates both the search for specific information and statistical analysis. ESTAP provides for easy viewing of the original and cleansed data, as well as the analysis results via a Web browser. It also allows the data owner to submit selected sequences to dbEST in a semi-automated fashion.

3.
Expressed sequence tags (ESTs) have been widely used in gene survey research in recent years. The EST Pipeline System, software developed by Hangzhou Genomics Institute (HGI), can automatically analyze EST data sets of different scales using suitable methods. All the analysis reports, including those of vector masking, sequence assembly, gene annotation, Gene Ontology classification, and some other analyses, can be browsed, searched, and downloaded in Excel format from the web interface, sparing researchers routine data processing so that they can concentrate on the biological rules embedded in the data.

4.
5.
Examination of trees for the presence of particular nodes is a fundamental aspect of systematics, and is the basis of phylogenetic sensitivity analysis, but becomes unwieldy when performed manually for complex nodes or over large numbers of trees. The program Cladescan is presented here as a stand-alone application to facilitate the detection of nodes in such situations. Cladescan includes features useful for phylogenetic sensitivity analysis, such as automatic generation of "Navajo rug" sensitivity plots. In addition, researchers may find it useful for general comparisons among large data sets.
© The Willi Hennig Society 2009.
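The node-detection task described in the abstract above can be illustrated with a minimal sketch. It is not Cladescan's implementation: the function names `clades_of`, `has_clade` and `clade_frequency` are hypothetical, and the parser assumes rooted trees in plain Newick format with unquoted taxon names and no branch lengths, support values or internal node labels.

```python
def clades_of(newick):
    """Return the leaf set of every internal node of a simple Newick tree.

    Assumption: plain taxon names only (no quotes, branch lengths,
    support values, or internal node labels).
    """
    clades = set()
    stack = []   # one set of leaf names per currently open parenthesis
    token = ""   # taxon name being read
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if token:
                stack[-1].add(token)
                token = ""
            if ch == ")":
                done = stack.pop()
                clades.add(frozenset(done))
                if stack:
                    # Leaves of a closed node also belong to its parent.
                    stack[-1] |= done
        elif not ch.isspace():
            token += ch
    return clades


def has_clade(newick, taxa):
    """True if some node of the tree contains exactly the given taxa."""
    return frozenset(taxa) in clades_of(newick)


def clade_frequency(trees, taxa):
    """Fraction of trees in which the given taxa form a clade."""
    return sum(has_clade(tree, taxa) for tree in trees) / len(trees)
```

Applied to a set of trees obtained under different analysis parameters, `clade_frequency` gives the kind of per-node summary that a sensitivity plot displays, one cell per parameter set.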

6.
7.
Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

8.
MOTIVATION: The SWISS-PROT sequence database contains keywords of functional annotations for many proteins. In contrast, information about the sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords describing protein function. We developed LOCkey, a fully automated method for lexical analysis of SWISS-PROT keywords that assigns sub-cellular localization. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing functional information already automatically available. RESULTS: The method reached a level of more than 82% accuracy in a full cross-validation test. Due to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.
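The general idea of inferring localization from functional keywords can be sketched as a simple voting scheme. This is an illustration only and should not be read as the LOCkey algorithm, whose details the abstract does not give. The functions `train` and `predict` are hypothetical, and the model simply counts how often each keyword co-occurs with each localization in proteins whose localization is already known.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Count keyword/localization co-occurrences.

    annotated: iterable of (keywords, localization) pairs for proteins
    with known localization (an assumed input format).
    """
    counts = defaultdict(Counter)
    for keywords, localization in annotated:
        for keyword in keywords:
            counts[keyword.lower()][localization] += 1
    return counts


def predict(counts, keywords):
    """Return the localization with the most keyword votes, or None.

    None means that none of the keywords was seen in training, which
    mirrors the case where no localization can be inferred.
    """
    votes = Counter()
    for keyword in keywords:
        votes.update(counts.get(keyword.lower(), {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Returning `None` for unseen keywords keeps the sketch from guessing when the annotations carry no usable signal, which is the situation the abstract reports for more than half of the proteins.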

9.
An automated phylogenetic key for classifying homeoboxes   Cited by: 3 (self-citations: 0, citations by others: 3)
When novel gene sequences are discovered, they are usually identified, classified, and annotated based on aggregate measures of sequence similarity. This method is prone to errors, however. Phylogenetic analysis is a more accurate basis for gene classification and ortholog identification, but it is relatively labor-intensive and computationally demanding. Here we report and demonstrate a rapid new method for gene classification based on phylogenetic principles. Given the phylogeny of a minimal sample of gene family members, our method automatically identifies amino acids that are phylogenetically characteristic of each class of sequences in the family; it then classifies a novel sequence based on the presence of these characteristic attributes in its sequence. Using a subset of homeobox protein sequences as a test case, we show that our method approximates classification based on full-scale phylogenetic analysis with very high accuracy in a tiny fraction of the time.
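The two steps named in the abstract, finding class-characteristic residues and then classifying a new sequence by their presence, can be sketched as follows. This is a simplification under stated assumptions and not the authors' method: it takes the classes as given rather than deriving them from a phylogeny, it requires pre-aligned sequences of equal length, and it treats a residue as characteristic only if every member of the class has it at that position and no member of any other class does. The names `characteristic_sites` and `classify` are hypothetical.

```python
def characteristic_sites(classes):
    """Find (position, residue) pairs unique to each class.

    classes: dict mapping a class name to a list of aligned sequences
    of equal length (an assumed input format).
    """
    sites = {}
    for name, seqs in classes.items():
        others = [
            seq
            for other, group in classes.items()
            if other != name
            for seq in group
        ]
        reference = seqs[0]
        sites[name] = {
            (i, reference[i])
            for i in range(len(reference))
            if all(seq[i] == reference[i] for seq in seqs)
            and all(seq[i] != reference[i] for seq in others)
        }
    return sites


def classify(sites, query):
    """Assign the query to the class whose characteristic residues it
    matches most often; return None if it matches none of them."""
    scores = {
        name: sum(query[i] == residue for i, residue in marks)
        for name, marks in sites.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Because classification only compares the query against a small set of stored positions, it avoids rebuilding a tree for every new sequence, which is the source of the speed advantage the abstract describes.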

10.
Y RNAs are small 'cytoplasmic' RNAs which are components of the Ro ribonucleoprotein (RNP) complex. The core of this complex, which is found in the cell nuclei of higher eukaryotes as well as the cytoplasm, is composed of a complex between the 60 kDa Ro protein and Y RNAs. Human cells contain four distinct Y RNAs (Y1, Y3, Y4 and Y5), while other eukaryotes contain a variable number of Y RNA homologues. When detected in a particular species, the Ro RNP has been present in every cell type within that particular organism. This characteristic, along with its high conservation among vertebrates, suggests an important function for Ro RNP in cellular metabolism; however, this function has not yet been definitively elucidated. In order to identify conserved features of Y RNA sequences and structures which may be directly involved in Ro RNP function, a phylogenetic comparative analysis of Y RNAs has been performed. Sequences of Y RNA homologues from five vertebrate species have been obtained and, together with previously published Y RNA sequences, used to predict Y RNA secondary structures. A novel RNA secondary structure comparison algorithm, the suboptimal RNA analysis program, has been developed and used in conjunction with available algorithms to find phylogenetically conserved secondary structure models for Y1, Y3 and Y4 RNAs. Short, conserved sequences within the Y RNAs have been identified and are invariant among vertebrates, consistent with a direct role for Y RNAs in Ro function. A subset of these are located wholly or partially in looped regions in the Y3 and Y4 RNA predicted model structures, in accord with the possibility that these Y RNAs base pair with other cellular nucleic acids or are sites of interaction between the Ro RNP and other macromolecules.

11.
Sequence-based phylogenetic analyses typically are based on a small number of character sets and report gene trees which may not reflect the true species tree. We employed an EST mining strategy to suppress such incongruencies, and recovered the most robust phylogeny for five species of plant-parasitic nematode (Meloidogyne arenaria, M. chitwoodi, M. hapla, M. incognita, and M. javanica), three closely related tylenchid taxa (Heterodera glycines, Globodera pallida, and G. rostochiensis) and a distant taxon, Caenorhabditis elegans. Our multiple-gene approach is based on sampling more than 80,000 publicly available tylenchid EST sequences to identify phylum-wide orthologues. Bayesian inference, minimum evolution, maximum likelihood and protein distance methods were employed for phylogenetic reconstruction and hypothesis tests were constructed to elucidate differential selective pressures across the phylogeny for each gene. Our results place M. incognita and M. javanica as sister taxa, with M. arenaria as the next most closely related nematode. Significant differences in selective pressure were revealed for some genes under some hypotheses, though all but one gene are exclusively under purifying selection, indicating conservation across the orthologous groups. This EST-based multi-gene analysis is a first step towards accomplishing genome-wide coverage for tylenchid evolutionary analyses.

12.

Background  

Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation.

13.
The MIKC MADS-box gene family has been shaped by extensive gene duplications giving rise to subfamilies of genes with distinct functions and expression patterns. However, within these subfamilies the functional assignment is not that clear-cut, and considerable functional redundancy exists. One way to investigate the diversity in regulation present in these subfamilies is promoter sequence analysis. With the advent of genome sequencing projects, we are now able to perform a comparative analysis of Arabidopsis and poplar promoters of MADS-box genes belonging to the same subfamily. Based on the principle of phylogenetic footprinting, sequences conserved between the promoters of homologous genes are thought to be functional. Here, we have investigated the evolution of MADS-box genes at the promoter level and show that many genes have diverged in their regulatory sequences after duplication and/or speciation. Furthermore, using phylogenetic footprinting, a distinction can be made between redundancy, neo/nonfunctionalization, and subfunctionalization.

14.
15.

Background  

Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds.

16.
17.
18.
SUMMARY: Clann has been developed in order to provide methods of investigating phylogenetic information through the application of supertrees. AVAILABILITY: Clann has been precompiled for Linux, Apple Macintosh and Windows operating systems and is available from http://bioinf.may.ie/software/clann. Source code is available on request from the authors. SUPPLEMENTARY INFORMATION: Clann has been written in the C programming language. Source code is available on request.

19.
Phylogenetic diversity enhances ecosystem functioning but restoration ecology has not taken advantage of this knowledge. We propose plant facilitation as a mechanism to promote phylogenetic diversity in restoration practices. We planted three functionally different species (Gypsophila struthium, Sedum album, and Limonium sucronicum) in a degraded gypsum ecosystem in Spain and found that after 7 years, the species with nurse traits (G. struthium) survived longer and facilitated the establishment of new species forming phylogenetically diverse neighborhoods. These facilitation‐driven phylodiverse communities may potentially produce a cascade of benefits on ecosystem functioning.

20.
The amplified fragment length polymorphism (AFLP) technique is an increasingly popular component of the phylogenetic toolbox, particularly for plant species. Technological advances in capillary electrophoresis now allow very precise estimates of DNA fragment mobility and amplitude, and current AFLP software allows greater control of data scoring and the production of the binary character matrix. However, for AFLP to become a useful modern tool for large data sets, improvements to automated scoring are required. We design a procedure that can be used to optimize AFLP scoring parameters to improve phylogenetic resolution and demonstrate it for two AFLP scoring programs (GeneMapper and GeneMarker). In general, we found that there was a trade-off between getting more characters of lower quality and fewer characters of high quality. Conservative settings that gave the least error did not give the best phylogenetic resolution, as too many useful characters were discarded. For example, in GeneMapper, we found that bin width was a crucial parameter, and that although reducing bin width from 1.0 to 0.5 base pairs increased the error rate, it nevertheless improved resolution due to the increased number of informative characters. For our 30-taxon data sets, moving from default to optimized parameter settings gave between 3 and 11 extra internal edges with >50% bootstrap support, in the best case increasing the number of resolved edges from 14 to 25 out of a possible 27. Nevertheless, improvements to current AFLP software packages are needed to (1) make use of replicate profiles to calibrate the data and perform error calculations and (2) perform tests to optimize scoring parameters in a rigorous and automated way. This is true not only when AFLP data are used for phylogenetics, but also for other applications, including linkage mapping and population genetics.
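The effect of bin width on the binary character matrix can be illustrated with a small sketch. It does not reproduce the scoring logic of GeneMapper or GeneMarker: the function `score_profiles`, its input format and the `min_height` threshold are assumptions made for illustration, and each fragment is assigned to a fixed-width bin by integer division of its size.

```python
import math


def score_profiles(profiles, bin_width=1.0, min_height=50):
    """Turn AFLP peak lists into a binary presence/absence matrix.

    profiles: dict mapping a sample name to a list of
    (fragment_size_bp, peak_height) tuples (an assumed input format).
    Peaks below min_height are ignored. Returns the sorted bin indices
    and, per sample, a 0/1 list aligned with those bins.
    """
    called = {
        sample: {
            math.floor(size / bin_width)
            for size, height in peaks
            if height >= min_height
        }
        for sample, peaks in profiles.items()
    }
    bins = sorted(set().union(*called.values()))
    matrix = {
        sample: [int(b in present) for b in bins]
        for sample, present in called.items()
    }
    return bins, matrix
```

With narrower bins, two fragments that differ by a fraction of a base pair fall into separate characters instead of being merged. This yields more characters, but also makes the scoring more sensitive to sizing error, which is the trade-off the abstract reports.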
