Similar articles
1.
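Forty-two sequences of rbcL, the chloroplast gene encoding the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase, were analyzed from the major lineages of Liliales. Using the relative-rate tests implemented in RRTree, we examined in detail how the synonymous and nonsynonymous substitution rates of rbcL vary among seven families of Liliales. The relative-rate tests showed that, within Liliales, Colchicaceae has the fastest synonymous and nonsynonymous substitution rates, Campynemataceae the slowest synonymous rate, and Liliaceae the slowest nonsynonymous rate; however, differences among the families of Liliales were not significant for either synonymous or nonsynonymous rates.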
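The synonymous/nonsynonymous distinction that these rate comparisons rest on can be illustrated with a minimal sketch. This is not RRTree's estimator (proper rate estimates count synonymous and nonsynonymous sites and correct for multiple hits); it only classifies codon differences between two aligned coding sequences under the standard genetic code, skipping codons that differ at more than one position. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons are enumerated with
# bases in TCAG order, first position varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (synonymous, nonsynonymous) differences between two aligned
    coding sequences, considering only codons that differ at exactly one site."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons with gaps or ambiguous bases
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # multi-difference codons need pathway averaging (not done here)
        if CODE[c1] == CODE[c2]:
            syn += 1  # same amino acid: synonymous change
        else:
            nonsyn += 1  # different amino acid (or stop): nonsynonymous change
    return syn, nonsyn
```

For example, CTT to CTC (both leucine) counts as synonymous, while GCA to GTA (alanine to valine) counts as nonsynonymous.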

2.
MOTIVATION: Recent advances in gene sequencing have provided complete sequence information for a number of genomes, and as a result the amount of data in the sequence databases is growing at an exponential rate. We introduce here a new program, DbW, that automates the updating of a functional family-specific multiple alignment and tries to include all relevant sequences. The program is based on the use of different sources of information: sequences and annotations in databases. RESULTS: The advantages of DbW are demonstrated using the 20 families of aminoacyl-tRNA synthetases, where DbW detects as many homologous sequences as possible in the Swiss-Prot and SPTREMBL databases. The global specificity of DbW in this test is 98.4% (1.6% of the sequences included in the alignment did not belong to the family according to their function), and the global sensitivity of DbW is estimated to be 95.2%. Thus, DbW provides a reliable basis for the many applications that rely on accurate multiple alignments, e.g. functional residue identification, 2D/3D structure prediction or homology modeling. AVAILABILITY: The DbW software is available for download at ftp://ftp-igbmc.u-strasbg.fr/pub/DbW/DbW.tar and online at http://titus.u-strasbg.fr/DbW CONTACT: prigent@igbmc.u-strasbg.fr.
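As the abstract uses them, "specificity" is the share of sequences included in the alignment that truly belong to the family (what is often called precision), and "sensitivity" is the share of true family members that were found. A minimal sketch of both measures, assuming sets of sequence identifiers as input (the function name is hypothetical and not part of DbW):

```python
def alignment_quality(included: set[str], true_family: set[str]) -> tuple[float, float]:
    """Return (specificity, sensitivity) in the sense used by the DbW abstract:
    specificity = fraction of included sequences that truly belong to the family,
    sensitivity = fraction of true family members that were included."""
    true_pos = len(included & true_family)
    specificity = true_pos / len(included) if included else 0.0
    sensitivity = true_pos / len(true_family) if true_family else 0.0
    return specificity, sensitivity
```

With this definition, a specificity of 98.4% means that 1.6% of the included sequences are false positives, matching the parenthetical in the abstract.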

3.
SUMMARY: Characterizing genetic diversity through genotyping short amplicons is central to evolutionary biology. Next-generation sequencing (NGS) technologies have changed the scale at which this type of data is acquired. SESAME is a web application package that assists genotyping of multiplexed individuals for several markers based on NGS amplicon sequencing. It automatically assigns reads to loci and individuals, corrects reads if standard samples are available and provides an intuitive graphical user interface (GUI) for allele validation based on the sequences and associated decision-making tools. The aim of SESAME is to help allele identification among a large number of sequences. AVAILABILITY: SESAME and its documentation are freely available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence for Windows and Linux from http://www1.montpellier.inra.fr/CBGP/NGS/ or http://tinyurl.com/ngs-sesame.
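The abstract does not describe how SESAME assigns reads. As a hypothetical illustration only, assume each read begins with an individual-specific tag followed immediately by a locus-specific primer; demultiplexing then reduces to prefix matching. Real pipelines typically tolerate sequencing errors in tags and primers, which this exact-match sketch does not.

```python
from typing import Optional


def assign_read(read: str, tags: dict[str, str],
                primers: dict[str, str]) -> Optional[tuple[str, str, str]]:
    """Hypothetical read layout: <individual tag><locus primer><amplicon>.
    Returns (individual, locus, amplicon) or None if no tag/primer matches."""
    for tag, individual in tags.items():
        if not read.startswith(tag):
            continue
        rest = read[len(tag):]  # strip the individual tag
        for primer, locus in primers.items():
            if rest.startswith(primer):
                return individual, locus, rest[len(primer):]
    return None  # unassigned read
```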

4.
SUMMARY: JaDis is a Java application for computing evolutionary distances between nucleic acid sequences and G+C base frequencies. It allows specific comparison of coding sequences, of non-coding sequences or of a non-coding sequence with coding sequences. AVAILABILITY: http://pbil.univ-lyon1.fr/software/jadis.html
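The abstract does not list which distance models JaDis implements. As one common example, the sketch below computes G+C content and the Jukes-Cantor (1969) distance, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites between two aligned sequences. Function names are illustrative, not JaDis's.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences;
    sites with gaps or ambiguous bases in either sequence are skipped."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differences
    if p >= 0.75:
        return math.inf  # saturated: the distance is undefined
    return -0.75 * math.log(1 - 4 * p / 3)
```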

5.
A strategy for finding regions of similarity in complete genome sequences (cited 3 times: 2 self-citations, 1 by others)
MOTIVATION: Complete genomic sequences will become available in the future. New methods to deal with very large sequences (sizes beyond 100 kb) efficiently are required. One of the main aims of such work is to increase our understanding of genome organization and evolution. This requires studies of the locations of regions of similarity. RESULTS: We present here a new tool, ASSIRC ('Accelerated Search for SImilarity Regions in Chromosomes'), for finding regions of similarity in genomic sequences. The method involves three steps: (i) identification of short exact chains of fixed size, called 'seeds', common to both sequences, using hashing functions; (ii) extension of these seeds into putative regions of similarity by a 'random walk' procedure; (iii) final selection of regions of similarity by assessing alignments of the putative sequences. We used simulations to estimate the proportion of regions of similarity not detected for particular region sizes, base identity proportions and seed sizes. This approach can be tailored to the user's specifications. We looked for regions of similarity between two yeast chromosomes (V and IX). The efficiency of the approach was compared with that of the conventional programs BLAST and FASTA by assessing the CPU time required and the regions of similarity found for the same data set. AVAILABILITY: Source programs are freely available at the following address: ftp://ftp.biologie.ens.fr/pub/molbio/assirc.tar.gz CONTACT: vincens@biologie.ens.fr, hazout@urbb.jussieu.fr
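Step (i) above, finding exact fixed-size seeds shared by two sequences through hashing, can be sketched with a dictionary keyed by k-mers. This is a generic illustration, not ASSIRC's code, and it covers only seed detection; the random-walk extension (ii) and final selection (iii) are not shown.

```python
from collections import defaultdict


def find_seeds(seq1: str, seq2: str, k: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs such that seq1[i:i+k] == seq2[j:j+k],
    found by hashing every k-mer of seq1 into a dictionary."""
    index = defaultdict(list)
    for i in range(len(seq1) - k + 1):
        index[seq1[i:i + k]].append(i)  # k-mer -> all start positions in seq1
    seeds = []
    for j in range(len(seq2) - k + 1):
        # .get() avoids creating empty entries in the defaultdict
        for i in index.get(seq2[j:j + k], []):
            seeds.append((i, j))
    return seeds
```

Building the index is linear in the length of the first sequence, and each lookup is a constant-time hash probe, which is what makes seed detection fast on chromosome-sized inputs compared with full dynamic programming.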

6.
PredAcc: prediction of solvent accessibility (cited 2 times: 0 self-citations, 2 by others)
SUMMARY: PredAcc is a tool for predicting the solvent accessibility of protein residues from the sequence at different relative accessibility levels (0-55%). The prediction rate varies between 70.7% (for 25% relative accessibility) and 85.7% (for 0% relative accessibility). Amino acids are predicted in four categories: almost certainly hidden and almost certainly exposed with a given a posteriori prediction error, probably hidden and probably exposed otherwise. AVAILABILITY: http://condor.urbb.jussieu.fr/PredAccCfg.html CONTACT: tuffery@urbb.jussieu.fr

7.
In the context of the international project aiming at sequencing the whole genome of Bacillus subtilis, we have developed NRSub, a non-redundant database of sequences from this organism. Starting from the B. subtilis sequences available in the repository collections, we have removed all encountered duplications, then added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage index). We have also added cross-references with the EMBL/GenBank/DDBJ, MEDLINE, SWISS-PROT and ENZYME databases. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access the database through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).
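The abstract does not detail how NRSub detects duplications (real redundancy removal must also handle overlapping fragments and corrected sequences). The simplest case, identical sequences submitted under several accession numbers, can be sketched as follows; the function name is hypothetical.

```python
def remove_duplicates(entries: dict[str, str]) -> dict[str, list[str]]:
    """Collapse identical sequences: map each unique (upper-cased) sequence
    to the accession numbers of all entries that carried it, in input order."""
    merged: dict[str, list[str]] = {}
    for accession, seq in entries.items():
        merged.setdefault(seq.upper(), []).append(accession)
    return merged
```

Keeping the list of merged accession numbers, rather than discarding them, preserves the links back to the original repository entries.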

8.
BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.
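A reference database like this is typically used by scoring a program's alignment against the reference. One widely used measure is a sum-of-pairs (SP) score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. A minimal sketch, assuming both alignments contain the same sequences in the same order with `-` as the gap character (function names are illustrative):

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Residue pairs (seq_a, pos_a, seq_b, pos_b) placed in the same column;
    '-' marks a gap and positions count ungapped residues from 0."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        for (a, i), (b, j) in combinations(residues, 2):
            pairs.add((a, i, b, j))
    return pairs


def sp_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0
```

For the reference `["AC-G", "ACTG"]` and the test `["ACG-", "ACTG"]`, two of the three reference pairs (A/A and C/C) are reproduced, giving an SP score of 2/3.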

9.
MOTIVATION: The analysis of repeated elements in genomes is a fascinating domain of research that is lacking relevant tools for transposable elements (TEs), the most complex ones. The dynamics of TEs, which provides the main mechanism of mutation in some genomes, is an essential component of genome evolution. In this study we introduce a new concept of domain, a segmentation unit useful for describing the architecture of different copies of TEs. Our method extracts occurrences of a terminus-defined family of TEs, aligns the sequences, finds the domains in the alignment and searches the distribution of each domain in sequences. After a classification step relative to the presence or the absence of domains, the method results in a graphical view of sequences segmented into domains. RESULTS: Analysis of the new non-autonomous TE AtREP21 in the model plant Arabidopsis thaliana reveals copies of very different sizes and various combinations of domains which show the potential of our method. AVAILABILITY: DomainOrganizer web page is available at www.irisa.fr/symbiose/DomainOrganizer/.

10.
Repseek, a tool to retrieve approximate repeats from large DNA sequences (cited 2 times: 0 self-citations, 2 by others)
Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases both at the seed detection and alignment levels. AVAILABILITY: http://wwwabi.snv.jussieu.fr/public/RepSeek/
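The second step of such seed-and-extend methods can be sketched as follows. This is a deliberately simplified illustration, not Repseek's algorithm: Repseek's extension allows indels and weighted substitutions within its statistical framework, whereas this sketch extends an exact seed to the right without gaps, using an "X-drop" rule that stops once the running score falls more than `xdrop` below the best score seen. The scoring parameters are arbitrary assumptions.

```python
def extend_seed(seq: str, i: int, j: int, k: int, match: int = 1,
                mismatch: int = -1, xdrop: int = 3) -> tuple[int, int, int]:
    """Extend an exact seed seq[i:i+k] == seq[j:j+k] to the right without gaps.
    Returns (best_score, end_i, end_j) with exclusive end coordinates."""
    score = best = k * match  # the seed itself is an exact match
    best_end = 0              # number of extra columns in the best extension
    step = 0
    while i + k + step < len(seq) and j + k + step < len(seq):
        a, b = seq[i + k + step], seq[j + k + step]
        score += match if a == b else mismatch
        step += 1
        if score > best:
            best, best_end = score, step
        elif best - score > xdrop:
            break  # score dropped too far below the best: stop extending
    return best, i + k + best_end, j + k + best_end
```

For instance, in `"ACGTCCAGTTACGTCCAGAA"` the seed `ACGT` at positions 0 and 10 extends to the full repeat `ACGTCCAG` (score 8, ending at 8 and 18). A leftward extension works symmetrically.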

11.
RESULTS: CpGProD is an application for identifying mammalian promoter regions associated with CpG islands in large genomic sequences. Although it is strictly dedicated to this particular promoter class corresponding to approximately 50% of the genes, CpGProD exhibits a higher sensitivity and specificity than other tools used for promoter prediction. Notably, CpGProD uses different parameters according to species (human, mouse) studied. Moreover, CpGProD predicts the promoter orientation on the DNA strand. AVAILABILITY: http://pbil.univ-lyon1.fr/software/cpgprod.html SUPPLEMENTARY INFORMATION: http://pbil.univ-lyon1.fr/software/cpgprod.html
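CpG islands are commonly characterized by G+C content and by the observed/expected CpG ratio, computed as count(CG) * length / (count(C) * count(G)). The abstract does not give CpGProD's own parameters; the thresholds below (length at least 200 bp, G+C above 50%, ratio above 0.6) are the classic Gardiner-Garden and Frommer criteria, used here only as assumed defaults in a minimal sketch.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA window."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" occurrences cannot overlap, so str.count is exact
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds (assumed defaults)."""
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and ratio > min_ratio
```

In practice such statistics are computed over a sliding window along the genome, and neighboring windows that pass are merged into candidate islands.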

12.
SUMMARY: AliasServer provides services that facilitate the assembly of data or datasets that use different identifiers to refer to the same protein. This resource relies on a database which contains, for a given organism, a non-redundant list of protein sequences associated with a set of aliases. AVAILABILITY: AliasServer is available as an interactive Web server at http://cbi.labri.fr/outils/alias/ and as a web service using a SOAP interface. The complete tool, including sources and data, is available for local installations upon request. SUPPLEMENTARY INFORMATION: Technical documentation is available at http://cbi.labri.fr/outils/alias/asdoc.pdf

13.
14.
As the number of publicly available complete microbial genomes keeps growing, the problem of annotation quality in these very large sequences remains unsolved. Indeed, the number of annotations associated with complete genomes is usually lower than that of the shorter entries found in the repository collections. Moreover, classical sequence database management systems have difficulties in handling entries of such size. In this context, the Enhanced Microbial Genomes Library (EMGLib) was developed to try to alleviate these problems. This library contains all the already sequenced complete genomes from prokaryotes (bacteria and archaea) and the yeast genome, in GenBank format. The annotations are improved by the introduction of data on codon usage, gene orientation on the chromosome and gene families. It is possible to access EMGLib through two database systems set up on WWW servers: the PBIL server at http://pbil.univ-lyon1.fr/emglib.html and the MICADO server at http://locus.jouy.inra.fr/micado
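The abstract does not specify which codon usage statistics EMGLib stores. The raw material for any of them is a per-gene codon count, which can be sketched as follows (a generic illustration; the function name is hypothetical):

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence, ignoring a trailing
    partial codon and any codon containing characters other than A, C, G, T."""
    s = cds.upper()
    full_length = len(s) - len(s) % 3  # drop an incomplete final codon
    codons = (s[i:i + 3] for i in range(0, full_length, 3))
    return Counter(c for c in codons if set(c) <= set("ACGT"))
```

Indices such as relative synonymous codon usage are then derived from these counts by comparing each codon with the other codons for the same amino acid.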

15.
Arigon AM, Perrière G, Gouy M. Biochimie. 2008;90(4):609-614.
The number of available genomic sequences is growing very fast, due to the development of massive sequencing techniques. Sequence identification is needed and contributes to the assessment of gene and species evolutionary relationships. Automated bioinformatics tools are thus necessary to carry out these identification operations accurately and quickly. We developed HoSeqI (Homologous Sequence Identification), a software environment allowing this kind of automated sequence identification using homologous gene family databases. HoSeqI is accessible through a Web interface (http://pbil.univ-lyon1.fr/software/HoSeqI/) that allows users to identify one or several sequences and to visualize the resulting alignments and phylogenetic trees. We also implemented another application, MultiHoSeqI, to quickly add a large set of sequences to a family database in order to identify them, to update the database, or to help automatic genome annotation. More recently, we developed an application, ChiSeqI (Chimeric Sequence Identification), to automate the identification of bacterial 16S ribosomal RNA sequences and the detection of chimeric sequences.

16.
Microsatellite flanking regions are not necessarily unique sequences, but they may group into sequence families. Microsatellites occurring within such families are likely to give multiple banding patterns during polymerase chain reaction amplifications. microfamily (version 1) is a program that detects flanking-region similarities between different microsatellite-containing sequences, thus allowing potentially problematic sequences to be eliminated prior to primer design. The program also accomplishes some otherwise tedious sequence editing, such as checking for nonpermitted characters, and eliminates poorly readable extremities or potential vector/adapter contamination. microfamily is written in Perl and available for Linux and Windows systems at http://www.up.univ-mrs.fr/local/egee/dir/meglecz/microfamily.html.

17.
Progress in NGS technologies has opened up new opportunities for characterizing biodiversity, both for individual specimen identification and for environmental barcoding. Although the amount of data available to biologists is increasing, user-friendly tools to facilitate data analysis have yet to be developed. Our aim, with |SE|S|AM|E| Barcode, is to provide such support through a unified platform. The sequences are analysed through a pipeline that (i) processes NGS amplicon runs, filtering markers and samples, (ii) builds reference libraries and finally (iii) identifies (barcodes) the sequences in each amplicon from the reference library. We use a simulated data set for specimen identification and a recently published data set for environmental barcoding to validate the method. The results obtained are consistent with the expected characterizations (in silico and previously published, respectively). |SE|S|AM|E| Barcode and its documentation are freely available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence for Windows and Linux from http://www1.montpellier.inra.fr/CBGP/NGS/ .

18.
MOTIVATION: Searching for RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNAs, often interacting with other ligands. Although several programs exist for RNA motif search, none can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules. RESULTS: We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and that are related to each other by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat allows efficient searching for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii. AVAILABILITY: http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.

19.
SubtiList: the reference database for the Bacillus subtilis genome (cited 6 times: 0 self-citations, 6 by others)
SubtiList is the reference database dedicated to the genome of Bacillus subtilis 168, the paradigm of Gram-positive endospore-forming bacteria. Developed in the framework of the B.subtilis genome project, SubtiList provides a curated dataset of DNA and protein sequences, combined with the relevant annotations and functional assignments. Information about gene functions and products is continuously updated by linking relevant bibliographic references. Recently, sequence corrections arising from both systematic verifications and submissions by individual scientists were included in the reference genome sequence. SubtiList is based on a generic relational data schema and a World Wide Web interface developed for the handling of bacterial genomes, called GenoList. The World Wide Web interface was designed to allow users to easily browse through genome data and retrieve information according to common biological queries. SubtiList also provides more elaborate tools, such as pattern searching, which are tightly connected to the overall browsing system. SubtiList is accessible at http://genolist.pasteur.fr/SubtiList/. Similar bacterial databases are accessible at http://genolist.pasteur.fr/.
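The abstract does not describe the pattern syntax SubtiList supports. As a generic sketch, a DNA pattern search usually scans both strands; the example below uses Python regular expressions, a zero-width lookahead so that overlapping hits are reported, and converts reverse-strand hits back to forward-strand coordinates. Function names are illustrative.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def search_both_strands(genome: str, pattern: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based start on the forward strand) for each match of
    a regular-expression pattern; the lookahead allows overlapping matches."""
    rx = re.compile(f"(?=({pattern}))")
    hits = [("+", m.start()) for m in rx.finditer(genome)]
    rc = reverse_complement(genome)
    n = len(genome)
    for m in rx.finditer(rc):
        length = len(m.group(1))
        # a hit at rc[s:s+length] covers genome[n-s-length : n-s]
        hits.append(("-", n - m.start() - length))
    return hits
```

A palindromic site such as the BamHI recognition sequence GGATCC is reported once on each strand at the same forward coordinate.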

20.
EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417