Found 20 similar articles (search time: 15 ms)
1.
Varani AM Monteiro-Vitorello CB de Almeida LG Souza RC Cunha OL Lima WC Civerolo E Van Sluys MA Vasconcelos AT 《Genetics and molecular biology》2012,35(1):149-152
The Xylella fastidiosa comparative genomic database is a scientific resource that aims to provide a user-friendly interface for accessing high-quality, manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are its high-quality genomic annotation, its functional and comparative genomic analysis, and the identification and mapping of prophage-like elements. It is available at http://www.xylella.lncc.br.
2.
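In November 2009, scientists from the United States, the United Kingdom and other countries announced the first draft of the domestic pig genome. Over the past two years, as whole-genome sequence has been progressively released and more and more sequenced fragments have been correctly assembled, annotating and analyzing pig functional genes at the whole-genome level has become especially urgent. Using the annotation of the cofilin 1 (CFL1) gene as an example, this article describes how Otterlace, software developed by the Sanger Institute, is used to manually analyze and annotate immune gene sequences across the pig genome. It explains in detail how to use the three annotation tools Zmap, Blixem and Dotter, and sets out the main steps of the annotation process, in the hope of offering a starting point for wider use of Otterlace. Of the 243 immune-related genes analyzed with Otterlace, 180 were fully or partially annotated, which lays a foundation for subsequent in-depth functional studies of these genes.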
3.
Elnitski L Riemer C Petrykowska H Florea L Schwartz S Miller W Hardison R 《Genomics》2002,80(6):681-690
Sequence conservation between species is useful both for locating coding regions of genes and for identifying functional noncoding segments. Hence interspecies alignment of genomic sequences is an important computational technique. However, its utility is limited without extensive annotation. We describe a suite of software tools, PipTools, and related programs that facilitate the annotation of genes and putative regulatory elements in pairwise alignments. The alignment server PipMaker uses the output of these tools to display detailed information needed to interpret alignments. These programs are provided in a portable format for use on common desktop computers and both the toolkit and the PipMaker server can be found at our Web site (http://bio.cse.psu.edu/). We illustrate the utility of the toolkit using annotation of a pairwise comparison of the mouse MHC class II and class III regions with orthologous human sequences and subsequently identify conserved, noncoding sequences that are DNase I hypersensitive sites in chromatin of mouse cells.
4.
Most multiple gene sequence alignment methods rely on conventions that score a multiple alignment in pairwise fashion. Therefore, as the number of sequences increases, the runtime grows exponentially. To solve this problem, this paper presents a multiple sequence alignment method that uses a linear-time suffix tree algorithm to cluster similar sequences at once, without pairwise alignment. After the search for common subsequences, cross-matching common subsequences were generated, and inexact matches were sometimes found. A procedure for masking the inexact cross-matching pairs is therefore proposed. In addition, BLAST was combined with a clustering tool in order to annotate the clusters generated by suffix tree clustering. The proposed method for clustering and annotating genes consists of the following steps: (1) construction of a suffix tree; (2) searching and overlapping common subsequences; (3) grouping subsequence pairs; (4) masking cross-matching pairs; (5) clustering gene sequences; (6) annotating gene clusters by BLAST search. The performance of the proposed system, CLAGen, was successfully evaluated with 42 gene sequences from the bacterial TCA (citrate) cycle. The system generated 11 clusters and found the longest subsequences of each cluster, which are biologically significant.
5.
Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
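One of the alignment-free features named in this abstract, dinucleotide relative abundance, can be illustrated with a minimal Python sketch. It assumes the common odds-ratio form rho_xy = f_xy / (f_x * f_y); the abstract does not state the exact definition the authors used, and this version does not symmetrize over the reverse-complement strand. The function name is hypothetical.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq):
    """Return rho[xy] = f_xy / (f_x * f_y) for every dinucleotide seen in seq.

    Assumption: plain single-strand odds ratio; the cited study may have
    used a strand-symmetrized variant.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    mono = Counter(seq)                                # single-base counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))   # overlapping pairs

    rho = {}
    for pair, count in di.items():
        f_xy = count / (n - 1)       # observed dinucleotide frequency
        f_x = mono[pair[0]] / n      # frequency of the first base
        f_y = mono[pair[1]] / n      # frequency of the second base
        rho[pair] = f_xy / (f_x * f_y)
    return rho
```

Values near 1 indicate that a dinucleotide occurs about as often as its base composition predicts; the resulting vector could then serve as one input to a principal component or discriminant analysis.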
6.
A novel algorithm, GS-Aligner, that uses bit-level operations was developed for aligning genomic sequences. GS-Aligner is efficient in terms of both time and space for aligning two very long genomic sequences and for identifying genomic rearrangements such as translocations and inversions. It is suitable for aligning fairly divergent sequences such as human and mouse genomic sequences. It consists of several efficient components: bit-level coding, search for matching segments between the two sequences as alignment anchors, longest increasing subsequence (LIS), and optimal local alignment. Efforts have been made to reduce the execution time of the program to make it truly practical for aligning very long sequences. Empirical tests suggest that for relatively divergent sequences, such as sequences from different mammalian orders or from a mammal and a nonmammalian vertebrate, GS-Aligner performs better than existing methods. The program and data can be downloaded from http://pondside.uchicago.edu/~lilab/ and http://webcollab.iis.sinica.edu.tw/~biocom.
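The longest increasing subsequence (LIS) component mentioned in this abstract can be sketched as follows. This is not GS-Aligner's implementation: it is a generic O(n log n) LIS over hypothetical anchors given as (position in sequence A, position in sequence B) pairs, and it handles only forward-strand, collinear anchors (no inversions or translocations).

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Return a largest subset of anchors that increases strictly in both coordinates.

    anchors: iterable of (pos_a, pos_b) tuples (hypothetical representation).
    """
    # Sort by pos_a; break ties by descending pos_b so that two anchors
    # sharing pos_a can never both enter a strictly increasing chain.
    ordered = sorted(anchors, key=lambda p: (p[0], -p[1]))

    tails = []       # tails[k]: smallest pos_b ending a chain of length k + 1
    tail_idx = []    # index in `ordered` of the anchor that ends that chain
    prev = [None] * len(ordered)   # back-pointers for chain reconstruction

    for i, (_, b) in enumerate(ordered):
        k = bisect_left(tails, b)
        if k == len(tails):
            tails.append(b)
            tail_idx.append(i)
        else:
            tails[k] = b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

In an anchor-based aligner, the regions between consecutive anchors of the returned chain would then be filled in by a local alignment step.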
7.
The contribution of transposable elements (TEs) to genome structure and evolution as well as their impact on genome sequencing, assembly, annotation and alignment has generated increasing interest in developing new methods for their computational analysis. Here we review the diversity of innovative approaches to identify and annotate TEs in the post-genomic era, covering both the discovery of new TE families and the detection of individual TE copies in genome sequences. These approaches span a broad spectrum in computational biology including de novo, homology-based, structure-based and comparative genomic methods. We conclude that the integration and visualization of multiple approaches and the development of new conceptual representations for TE annotation will further advance the computational analysis of this dynamic component of the genome.
8.
Venditti R De Gregorio E Silvestro G Bertocco T Salza MF Zarrilli R Di Nocera PP 《FEMS microbiology letters》2007,276(2):193-201
We describe the structural organization of Enterococcus faecalis repeats (EFAR), palindromic DNA sequences identified in the genome of the Enterococcus faecalis V583 strain by in silico analyses. EFAR are a novel type of miniature insertion sequences, which vary in size from 42 to 650 bp. Length heterogeneity results from the variable assembly of 16 different sequence types. Most elements measure 170 bp, and can fold into peculiar L-shaped structures resulting from the folding of two independent stem-loop structures (SLSs). Homologous chromosomal regions lacking or containing EFAR sequences were identified by PCR among 20 E. faecalis clinical isolates of different genotypes. Sequencing of a representative set of 'empty' sites revealed that 24-37 bp-long sequences, unrelated to each other but all able to fold into SLSs, functioned as targets for the integration of EFAR. In the process, most of the SLS had been deleted, but part of the targeted stems had been retained at EFAR termini.
9.
Bright LA Mujahid N Nanduri B McCarthy FM Costa LR Burgess SC Swiderski CE 《Animal genetics》2011,42(4):395-405
The equine genome sequence enables the use of high-throughput genomic technologies in equine research, but accurate identification of expressed gene products and interpreting their biological relevance require additional structural and functional genome annotation. Here, we employ the equine genome sequence to identify predicted and known proteins using proteomics and model these proteins into biological pathways, identifying 582 proteins in normal cell-free equine bronchoalveolar lavage fluid (BALF). We improved structural and functional annotation by directly confirming the in vivo expression of 558 (96%) proteins, which were computationally predicted previously, and adding Gene Ontology (GO) annotations for 174 proteins, 108 of which lacked functional annotation. Bronchoalveolar lavage is commonly used to investigate equine respiratory disease, leading us to model the associated proteome and its biological functions. Modelling of protein functions using Ingenuity Pathway Analysis identified carbohydrate metabolism, cell-to-cell signalling, cellular function, inflammatory response, organ morphology, lipid metabolism and cellular movement as key biological processes in normal equine BALF. Comparative modelling of protein functions in normal cell-free bronchoalveolar lavage proteomes from horse, human, and mouse, performed by grouping GO terms sharing common ancestor terms, confirms conservation of functions across species. Ninety-one of 92 human GO categories and 105 of 109 mouse GO categories were conserved in the horse. Our approach confirms the utility of the equine genome sequence to characterize protein networks without antibodies or mRNA quantification, highlights the need for continued structural and functional annotation of the equine genome and provides a framework for equine researchers to aid in the annotation effort.
10.
The use of next-generation sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user-friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing for an in-depth analysis with no prior knowledge of programming.
11.
12.
A new potential energy function representing the conformational preferences of sequentially local regions of a protein backbone is presented. This potential is derived from secondary structure probabilities such as those produced by neural network-based prediction methods. The potential is applied to the problem of remote homolog identification, in combination with a distance-dependent inter-residue potential and position-based scoring matrices. This fold recognition jury is implemented in a Java application called JThread. These methods are benchmarked on several test sets, including one released entirely after development and parameterization of JThread. In benchmark tests to identify known folds structurally similar to (but not identical with) the native structure of a sequence, JThread performs significantly better than PSI-BLAST, with 10% more structures identified correctly as the most likely structural match in a fold library, and 20% more structures correctly narrowed down to a set of five possible candidates. JThread also improves the average sequence alignment accuracy significantly, from 53% to 62% of residues aligned correctly. Reliable fold assignments and alignments are identified, making the method useful for genome annotation. JThread is applied to predicted open reading frames (ORFs) from the genomes of Mycoplasma genitalium and Drosophila melanogaster, identifying 20 new structural annotations in the former and 801 in the latter.
13.
Gianpiero Marconi Flavia Landucci Roberto Venanzoni Emidio Albertini 《Plant biosystems》2019,153(5):660-668
Aquatic habitats are vulnerable to invasion by alien species, so early warning protocols are necessary for eradication. The presence in Italy of two alien duckweeds in freshwaters has been documented: Lemna minuta, which has shown high invasiveness, and L. valdiviana, still confined to south Lazio. These two species may be mistaken for each other and for the domestic L. minor and L. gibba due to morphological variation. Here, we assess the applicability of DNA barcoding as a complement to morphological analysis for monitoring the spread of alien Lemna. We chose two chloroplast genome sequences for their ability to discriminate all Lemna species: the 5’ intron of the trnK gene and the matK gene. Among 48 samples of Lemna collected at 20 sites in Central Italy, 20 were identified as L. minor, 19 as L. minuta, five as L. trisulca and four as L. gibba. L. minuta was present at most sampling sites; in particular, at six locations of Lake Trasimeno, eight L. minuta samples were found. We demonstrate that DNA sequence analyses with cost-effective barcoding techniques can effectively support expert efforts in species determination for an early alert system of invasive Lemna species.
14.
We introduce a novel, linguistic-like method of genome analysis. We propose a natural approach to characterizing genomic sequences based on occurrences of fixed-length words from a predefined, sufficiently large set of words (strings over the alphabet {A, C, G, T}). A measure based on this approach is called the compositional spectrum and is in effect a histogram of imperfect word occurrences. Our results indicate that the compositional spectrum is an overall characteristic of a long sequence, i.e., a complete genome or an uninterrupted part of a chromosome. This attribute is manifested in the similarity of spectra obtained on different stretches of the same genome, and simultaneously in a broad range of dissimilarities between spectral representations of different genomes. The approach is highly flexible because matching is imperfect, and as a result sets of relatively long words can be considered. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.
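The compositional spectrum described in this abstract can be sketched in Python as a histogram of imperfect word occurrences. The abstract does not give the word length, the size of the word set, or the mismatch tolerance, so the function name, the `max_mismatches` parameter and its default value are assumptions made only for illustration. The scan is deliberately naive (quadratic in practice) rather than optimized.

```python
def compositional_spectrum(seq, words, max_mismatches=1):
    """Count, for each word, the windows of seq matching it with few mismatches.

    seq: DNA string; words: iterable of equal- or mixed-length strings over ACGT.
    max_mismatches: hypothetical tolerance defining an "imperfect" occurrence.
    """
    seq = seq.upper()
    spectrum = {}
    for word in words:
        length = len(word)
        hits = 0
        # Slide a window of the word's length along the sequence.
        for i in range(len(seq) - length + 1):
            window = seq[i:i + length]
            mismatches = sum(1 for a, b in zip(window, word) if a != b)
            if mismatches <= max_mismatches:
                hits += 1
        spectrum[word] = hits
    return spectrum
```

Two sequences could then be compared by computing their spectra over the same word set and measuring a distance between the resulting count vectors.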
15.
In this study, an in silico approach was developed to identify homologies existing between livestock microsatellite flanking sequences and GenBank nucleotide sequences. Initially, 1955 bovine, 1570 porcine and 1121 chicken microsatellites were downloaded and the flanking sequences were compared with the nr and dbEST databases of GenBank. A total of 74 bovine, 44 porcine and 37 chicken microsatellite flanking sequences passed our criteria and had at least one significant match to human genomic sequence, genes/expressed sequence tags (ESTs) or both. GenBank annotation and BLAT searches of the UCSC human genome assembly revealed that 38 bovine, 13 porcine and 17 chicken microsatellite flanking sequences were highly similar to known human genes. Map locations were available for 67 bovine, 44 porcine and 21 chicken microsatellite flanking sequences, providing useful links in the comparative maps of humans and livestock. In support of our approach, 112 alignments with both microsatellite and match mapping information were located in the expected chromosomal regions based on previously reported syntenic relationships. The development of this in silico mapping approach has significantly increased the number of genes and EST sequences anchored to the bovine, porcine and chicken genome maps and the number of links in various human-livestock comparative maps.
16.
An evolutionary model for maximum likelihood alignment of DNA sequences (total citations: 16; self-citations: 0; citations by others: 16)
Jeffrey L. Thorne Hirohisa Kishino Joseph Felsenstein 《Journal of molecular evolution》1991,33(2):114-124
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.
17.
Takuji Yamada Alison S Waller Jeroen Raes Aleksej Zelezniak Nadia Perchat Alain Perret Marcel Salanoubat Kiran R Patil Jean Weissenbach Peer Bork 《Molecular systems biology》2012,8(1)
Despite the current wealth of sequencing data, one‐third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16 345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome‐scale metabolic models with these new sequence–function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction.
18.
Ursula Pieper Ranyee Chiang Jennifer J. Seffernick Shoshana D. Brown Margaret E. Glasner Libusha Kelly Narayanan Eswar J. Michael Sauder Jeffrey B. Bonanno Subramanyam Swaminathan Stephen K. Burley Xiaojing Zheng Mark R. Chance Steven C. Almo John A. Gerlt Frank M. Raushel Matthew P. Jacobson Patricia C. Babbitt Andrej Sali 《Journal of structural and functional genomics》2009,10(2):107-125
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member), to illustrate how this structure-focused approach can be used to generate hypotheses about sequence–structure–function relationships.
Andrej Sali (corresponding author). URL: http://salilab.org
19.
In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software package called SWORDS. Using sequences available in public-domain databases on the Internet, we demonstrate how SWORDS can be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences and assess their statistical significance.
20.
Sansom C 《Briefings in bioinformatics》2000,1(1):22-32
This review of sequence database searching aims to set out current practice in the area, in order to give practical guidelines to the experimental biologist. It describes the basic principles behind the programs and enumerates the range of databases available in the public domain. Of these, the most important are the equivalent DNA databases European Molecular Biology Laboratory (EMBL), GenBank and DNA Databank of Japan (DDBJ), and the protein databases Swiss-Prot and TrEMBL. The commonly used BLAST and FASTA algorithms are described in detail and alternative approaches mentioned briefly. Scoring matrices used to compare amino acid types during protein database searches are compared, with an emphasis on the PAM and BLOSUM series of observed substitution matrices.