Similar articles
1.
In search of RNase P RNA from microbial genomes (cited by 2; self-citations 0; citations by others 2)
Li Y, Altman S. RNA (New York, N.Y.), 2004, 10(10): 1533-1540
A simple procedure has been developed to quickly retrieve and validate the DNA sequence encoding the RNA subunit of ribonuclease P (RNase P RNA) from microbial genomes. RNase P RNA sequences were identified in 94% of the complete bacterial and archaeal genomes in which no RNase P RNA had previously been annotated. A sequence was found in camelpox virus, highly conserved in all orthopoxviruses (including smallpox virus), which could fold into a putative RNase P RNA in terms of conserved primary features and secondary structure. New structural features of RNase P RNA that enable one to distinguish bacteria from archaea and eukarya were found. This RNA is yet another RNA that can serve as a molecular criterion to divide the living world into three domains (bacteria, archaea, and eukarya). The catalytic center of this RNA, and its detection in some environmental whole-genome shotgun sequences, are also discussed.

2.
3.
The set of proteins that are conserved across families of microbes contains important targets for new anti-microbial agents. We have developed a simple and efficient computational tool that determines concordances of putative gene products, showing the sets of proteins conserved across one set of user-specified genomes and not present in another set of user-specified genomes. The thresholds and the homology scoring criterion are selectable, allowing the user to decide the stringency of the homologies. The system uses a relational database to store protein coding regions from different genomes, and to store the results of a complete comparison of all sequences against all sequences using the FASTA program. Using Web technology, the display of all the related proteins for a given sequence and the calculation of multiple sequence alignments (using CLUSTALW) can be performed with the click of a button. The current database holds 97,365 sequences from 19 complete or partial genomes and 8,798,905 FASTA comparison results. An example concordance is presented which demonstrates that the target of the quinolone antibiotics could have been identified using this tool.
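The concordance idea in this abstract (proteins with homologs in every genome of one user-chosen set and in no genome of another) can be illustrated with plain set operations. This is only a minimal sketch under stated assumptions: the function name `concordance`, the `hits` data layout, and the protein and genome names are hypothetical and are not taken from the tool described above, which stores FASTA results in a relational database.

```python
def concordance(hits, present_in, absent_from):
    """Return proteins with a homolog in every genome of `present_in`
    and in no genome of `absent_from`.

    `hits` maps a protein ID to the set of genome names in which a homolog
    passed the user-chosen score threshold (hypothetical data layout).
    """
    present_in = set(present_in)
    absent_from = set(absent_from)
    return sorted(
        protein
        for protein, genomes in hits.items()
        if present_in <= genomes and not (absent_from & genomes)
    )


# Made-up example data, for illustration only.
example_hits = {
    "protA": {"g1", "g2", "g3"},        # conserved in the target set only
    "protB": {"g1", "g2", "g3", "g4"},  # also present in the excluded genome
    "protC": {"g1"},                    # not conserved across the target set
}

selected = concordance(example_hits, ["g1", "g2", "g3"], ["g4"])
print(selected)
```

Tightening or loosening the homology threshold would change which genomes appear in each set of `hits`, which is how the stringency setting mentioned in the abstract would affect the result in this simplified picture.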

4.
We developed a highly accurate method to predict polyketide (PK) and nonribosomal peptide (NRP) structures encoded in microbial genomes. PKs/NRPs are polymers of carbonyl/peptidyl chains synthesized by polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). We analyzed domain sequences corresponding to specific substrates and physical interactions between PKSs/NRPSs in order to predict which substrates (carbonyl/peptidyl units) are selected and assembled into highly ordered chemical structures. The predicted PKs/NRPs were represented as sequences of carbonyl/peptidyl units so that structural motifs could be extracted efficiently. We applied our method to 4529 PKSs/NRPSs and found 619 PKs/NRPs. We also collected 1449 PKs/NRPs whose chemical structures have been determined experimentally. The structural sequences were compared using the Smith-Waterman algorithm and grouped into 271 clusters. From the compound clusters, we extracted 33 structural motifs that are significantly associated with bioactivity. We used the structural motifs to infer the functions of 13 novel PK/NRP clusters produced by Pseudomonas spp. and Burkholderia spp. and found a putative virulence factor. The integrative analysis of genomic and chemical information given here provides a strategy to predict the chemical structures, the biosynthetic pathways, and the biological activities of PKs/NRPs, which is useful for the rational design of novel PKs/NRPs.
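Because the abstract compares compounds written as sequences of building-block units, the Smith-Waterman step can be sketched on lists of unit names rather than on characters. The scoring values (match 2, mismatch -1, linear gap -1) and the unit labels below are assumptions chosen for illustration; the abstract does not state the scoring scheme the authors used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences of units.

    Uses the standard Smith-Waterman recurrence with a linear gap penalty
    (an assumption; the scoring parameters are illustrative only).
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row
    best = 0
    for x in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical unit sequences sharing the local run mal, mmal, mal.
compound_1 = ["mal", "mmal", "mal", "gly"]
compound_2 = ["ala", "mal", "mmal", "mal"]
print(smith_waterman_score(compound_1, compound_2))
```

Pairwise scores of this kind could then feed any clustering procedure; how the 271 clusters were actually formed is not described in the abstract.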

5.

Background and Aims

Although monocotyledonous plants comprise one of the two major groups of angiosperms and include >65 000 species, comprehensive genome analysis has been focused mainly on the Poaceae (grass) family. Due to this bias, most of the conclusions that have been drawn for monocot genome evolution are based on grasses. It is not known whether these conclusions apply to many other monocots.

Methods

To extend our understanding of genome evolution in the monocots, Asparagales genomic sequence data were acquired and the structural properties of asparagus and onion genomes were analysed. Specifically, several available onion and asparagus bacterial artificial chromosomes (BACs) with contig sizes >35 kb were annotated and analysed, with a particular focus on the characterization of long terminal repeat (LTR) retrotransposons.

Key Results

The results reveal that LTR retrotransposons are the major components of the onion and garden asparagus genomes. These elements are mostly intact (i.e. with two LTRs), have mainly inserted within the past 6 million years and are piled up into nested structures. Analysis of shotgun genomic sequence data and the observation of two copies for some transposable elements (TEs) in annotated BACs indicate that some families have become particularly abundant, as high as 4–5% (asparagus) or 3–4% (onion) of the genome for the most abundant families, as also seen in large grass genomes such as wheat and maize.

Conclusions

Although previous annotations of contiguous genomic sequences have suggested that LTR retrotransposons were highly fragmented in these two Asparagales genomes, the results presented here show that this was largely due to the methodology used. In contrast, the current work indicates an ensemble of genomic features similar to those observed in the Poaceae.

6.
The problem of rational target selection for protein structure determination in structural genomics projects on microbes is addressed. A flexible computational procedure is described that directly incorporates the whole body of annotation available in the PEDANT genome database into the sequence clustering and selection process in order to identify proteins that are likely to possess currently unknown structural domains. Filtering out gene products based on predicted structural features, such as known three-dimensional structures and transmembrane regions, allows one to reduce the complexity of neighbor relationships between sequences and all but eliminates the need for further partitioning of single-linkage clusters into disjoint protein groups corresponding to homologous families. The results of a large-scale computational experiment in which exemplary target selection for 32 prokaryotic genomes was conducted are presented.

7.

Background

Pseudomonas aeruginosa is an important opportunistic pathogen responsible for many infections in hospitalized and immunocompromised patients. Previous reports estimated that approximately 10% of its 6.6 Mbp genome varies from strain to strain and is therefore referred to as “accessory genome”. Elements within the accessory genome of P. aeruginosa have been associated with differences in virulence and antibiotic resistance. As whole genome sequencing of bacterial strains becomes more widespread and cost-effective, methods to quickly and reliably identify accessory genomic elements in newly sequenced P. aeruginosa genomes will be needed.

Results

We developed a bioinformatic method for identifying the accessory genome of P. aeruginosa. First, the core genome was determined based on sequence conserved among the completed genomes of twelve reference strains using Spine, a software program developed for this purpose. The core genome was 5.84 Mbp in size and contained 5,316 coding sequences. We then developed an in silico genome subtraction program named AGEnt to filter out core genomic sequences from P. aeruginosa whole genomes and thereby identify the accessory genomic sequences of these reference strains. This analysis determined that the accessory genome of P. aeruginosa ranged from 6.9% to 18.0% of the total genome, was enriched for genes associated with mobile elements, and consisted mostly of genes with unknown or unclear function. Using these genomes, we showed that AGEnt performed well compared to other publicly available programs designed to detect accessory genomic elements. We then demonstrated the utility of the AGEnt program by applying it to the draft genomes of two previously unsequenced P. aeruginosa strains, PA99 and PA103.

Conclusions

The P. aeruginosa genome is rich in accessory genetic material. The AGEnt program accurately identified the accessory genomes of newly sequenced P. aeruginosa strains, even when draft genomes were used. As P. aeruginosa genomes become available at an increasingly rapid pace, this program will be useful in cataloging the expanding accessory genome of this bacterium and in discerning correlations between phenotype and accessory genome makeup. The combination of Spine and AGEnt should be useful in defining the accessory genomes of other bacterial species as well.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-737) contains supplementary material, which is available to authorized users.

8.
All complete or nearly complete mitochondrial genomes of Metazoa (2819) have been subjected to bioinformatic analysis to investigate the distribution and features of repeated and palindromic sequences. Repeats are ubiquitous, with 29.9% of genomes containing at least one and 1.95% of total genome length being repeated. Repeat boundaries were tested for the presence of secondary structure motifs, consensus sequences or small repeats, features generally reported as associated with duplications. No significant relationship was detected, suggesting that such features are not ubiquitous. A mechanism related to gene conversion is proposed to explain the origin of small interspersed repeats.

9.
FORRepeats: detects repeats on entire chromosomes and between genomes (cited by 1; self-citations 0; citations by others 1)
MOTIVATION: As more and more whole genomes become available, there is a need for new methods to compare large sequences and transfer biological knowledge from annotated genomes to related new ones. BLAST is not suitable for comparing multimegabase DNA sequences. MegaBLAST is designed to compare closely related large sequences. Some tools to detect repeats in large sequences, such as MUMmer and REPuter, have already been developed, but they also have time or space restrictions. Moreover, in terms of applications, REPuter only computes repeats and MUMmer works better with related genomes. RESULTS: We present a heuristic method, named FORRepeats, which is based on a novel data structure called the factor oracle. In the first step it detects exact repeats in large sequences. Then, in the second step, it computes approximate repeats and performs pairwise comparison. We compared its computational characteristics with those of BLAST and REPuter. The results demonstrate that it is fast and space-economical. We show FORRepeats' ability to perform intra-genomic comparison and to detect repeated DNA sequences in the complete genome of the model plant Arabidopsis thaliana.

10.
Recent sequencing of the Brassica rapa and Brassica oleracea genomes revealed extremely contrasting genomic features such as the abundance and distribution of transposable elements between the two genomes. However, whether and how these structural differentiations may have influenced the evolutionary rates of the two genomes since their split from a common ancestor are unknown. Here, we investigated and compared the rates of nucleotide substitution between two long terminal repeats (LTRs) of individual orthologous LTR‐retrotransposons, the rates of synonymous and non‐synonymous substitution among triplicated genes retained in both genomes from a shared whole genome triplication event, and the rates of genetic recombination estimated/deduced by the comparison of physical and genetic distances along chromosomes and ratios of solo LTRs to intact elements. Overall, LTR sequences and genic sequences showed more rapid nucleotide substitution in B. rapa than in B. oleracea. Synonymous substitution of triplicated genes retained from a shared whole genome triplication was detected at higher rates in B. rapa than in B. oleracea. Interestingly, non‐synonymous substitution was observed at lower rates in the former than in the latter, indicating shifted densities of purifying selection between the two genomes. In addition to evolutionary asymmetry, orthologous genes differentially regulated and/or disrupted by transposable elements between the two genomes were also characterized. Our analyses suggest that local genomic and epigenomic features, such as recombination rates and chromatin dynamics reshaped by independent proliferation of transposable elements and elimination between the two genomes, are perhaps partially the causes and partially the outcomes of the observed inter‐specific asymmetric evolution.

11.
Arakawa K, Saito R, Tomita M. FEBS Letters, 2007, 581(2): 253-258
Bacterial chromosomes are highly polarized in their nucleotide composition through mutational selection related to replication. Using compositional skews such as the GC skew, the replication origin and terminus can be predicted in silico by observing the shift points. However, the genome sequence is affected by myriad functional requirements and selection on numerous subgenomic features, and elimination of this "noise" should lead to better predictions. Here, we present a noise-reduction approach that uses low-pass filtering through the fast Fourier transform coupled with cumulative skew graphs. It increases the prediction accuracy of the replication termini compared with previously documented methods based on genomic base composition.
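The cumulative skew graph mentioned in this abstract can be sketched in a few lines. The sketch below covers only the unfiltered baseline: it computes the windowed GC skew, (G - C) / (G + C), accumulates it along the sequence, and reports the windows holding the minimum and maximum. It does not implement the FFT low-pass filtering that the paper proposes. The function names and the window size are hypothetical, and reading the minimum as the origin and the maximum as the terminus is a commonly used convention assumed here, not something the abstract states.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative (G - C) / (G + C) over consecutive non-overlapping windows."""
    totals = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C contribute zero skew.
        running += (g - c) / (g + c) if (g + c) else 0.0
        totals.append(running)
    return totals


def predict_shift_points(seq, window=1000):
    """Return (origin_window, terminus_window) as the indices of the
    cumulative-skew minimum and maximum (assumed convention)."""
    curve = cumulative_gc_skew(seq, window)
    origin = min(range(len(curve)), key=curve.__getitem__)
    terminus = max(range(len(curve)), key=curve.__getitem__)
    return origin, terminus


# Toy sequence whose skew changes sign twice, for illustration only.
toy = "GGGG" + "CCCCCCCC" + "GGGG"
print(cumulative_gc_skew(toy, window=4))
print(predict_shift_points(toy, window=4))
```

On real genomes the raw curve is jagged, which is the "noise" the abstract refers to; smoothing the windowed skew before locating the extrema is where a low-pass filter would enter.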

12.
13.
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.

14.
Albrecht-Buehler G. Gene, 2012, 498(1): 20-27
The existence of fractal sets of DNA sequences has long been suspected on the basis of statistical analyses of genome data. In this article we explicitly identify, for the first time, the GA-sequences as a class of fractal genomic sequences that are easy to recognize and to extract, and are scattered densely throughout the chromosomes of a large number of genomes from different species and kingdoms, including the human genome. Their existence and their fractality may have significant consequences for our understanding of the origin and evolution of genomes. Furthermore, as universal and natural markers they may be used to chart and explore the non-coding regions.

15.

Background

Rigorous study of mitochondrial functions and cell biology in the budding yeast, Saccharomyces cerevisiae, has advanced our understanding of mitochondrial genetics. This yeast is now a powerful model for population genetics, owing to large genetic diversity and highly structured populations among wild isolates. Comparative mitochondrial genomic analyses between yeast species have revealed broad evolutionary changes in genome organization and architecture. A fine-scale view of recent evolutionary changes within S. cerevisiae has not been possible due to low numbers of complete mitochondrial sequences.

Results

To address challenges of sequencing AT-rich and repetitive mitochondrial DNAs (mtDNAs), we sequenced two divergent S. cerevisiae mtDNAs using a single-molecule sequencing platform (PacBio RS). Using de novo assemblies, we generated highly accurate complete mtDNA sequences. These mtDNA sequences were compared with 98 additional mtDNA sequences gathered from various published collections. Phylogenies based on mitochondrial coding sequences and intron profiles revealed that intraspecific diversity in mitochondrial genomes generally recapitulated the population structure of nuclear genomes. Analysis of intergenic sequence indicated a recent expansion of mobile elements in certain populations. Additionally, our analyses revealed that certain populations lacked introns previously believed conserved throughout the species, as well as the presence of introns never before reported in S. cerevisiae.

Conclusions

Our results revealed that the extensive variation in S. cerevisiae mtDNAs is often population specific, thus offering a window into the recent evolutionary processes shaping these genomes. In addition, we offer an effective strategy for sequencing these challenging AT-rich mitochondrial genomes for small-scale projects.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1664-4) contains supplementary material, which is available to authorized users.

16.
This review presents current views on the plastid genomes of higher plants and summarizes data on the size, structural organization, gene content, and other features of plastid DNAs. Special emphasis is placed on the properties of organization of land plant plastid genomes (nucleoids) that distinguish them from bacterial genomes. The prospects of genetic engineering of chloroplast genomes are discussed.

17.
It has been reported earlier that the relative di-nucleotide frequency (RDF) is similar in different parts of a genome, while it varies among different genomes. RDF is therefore termed a genome signature in bacteria. It is not known whether the constancy in RDF is governed by genome-wide mutational bias or by selection. Here we carried out a comparative analysis of RDF between the inter-genic and the coding sequences in seventeen bacterial genomes for which gene expression data were available. The constraint on di-nucleotides was found to be higher in the coding sequences than in the inter-genic regions, and the constraint at the 2nd codon position was greater than that at the 3rd position within a genome. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than in the whole genomes or in the low expression genes (LEG). We analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences that were computationally generated by keeping the codon usage bias (CUB) according to genome G+C composition and the sequence of amino acids unaltered. In the simulated coding sequences, the constraint observed was significantly lower, and no significant difference was observed between the HEG and the LEG in terms of di-nucleotide constraint. This indicated that the greater constraint on di-nucleotides in the HEG was due to stronger selection on CUB in these genes than in the LEG within a genome. Further, we carried out comparative analyses of the RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB in di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. The analysis revealed that selection on CUB is an important attribute for the constraint on di-nucleotides at these positions in bacterial genomes. We believe that this study provides major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.
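To make the quantity concrete, a relative di-nucleotide frequency can be computed as the observed frequency of a di-nucleotide divided by the product of the frequencies of its two bases. That observed-over-expected ratio is the usual definition of di-nucleotide relative abundance and is assumed here; the abstract does not spell out its exact formula, and the function name is hypothetical. The sketch counts overlapping di-nucleotides on a single strand and does not separate codon positions as the study does.

```python
from collections import Counter


def relative_dinucleotide_frequency(seq):
    """Observed / expected frequency for every di-nucleotide present in seq.

    Expected frequency is the product of the two mononucleotide frequencies
    (assumed definition; single strand, overlapping di-nucleotides).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    total_di = n - 1
    rdf = {}
    for pair, count in di.items():
        x, y = pair
        expected = (mono[x] / n) * (mono[y] / n)
        rdf[pair] = (count / total_di) / expected
    return rdf


# Toy sequence, for illustration only.
ratios = relative_dinucleotide_frequency("ACAC")
print(ratios)
```

A ratio near 1 means the di-nucleotide occurs about as often as its base composition predicts; values well above or below 1 indicate over- or under-representation.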

18.
Toward a molecular paleontology of primate genomes (cited by 12; self-citations 0; citations by others 12)
KpnI restriction of anthropoid primate DNAs, from a New World monkey to man, releases a series of segments that are remarkable among all of the alphoid DNAs in the constancy of their relative amounts in the various primate genomes, in their long-range organization, and in their internal sequence structure. These segments are labeled the KpnI A, B, C and D segments. Cross-hybridization analysis by Southern filter-transfer hybridization indicates that the KpnI segments represent separate and distinct families of alphoid DNAs. These families are termed the KpnI A, B, C and D families of alphoid sequences, of which only the KpnI A and B families were studied in detail here. Evidence is presented suggesting that the KpnI segments do not exist as long, tandemly repeated sequences in the primate genome: rather, they may occur interspersed among other, perhaps nonalphoid sequences. From the stained gel patterns and from Southern filter-transfer hybridization experiments, the KpnI families appear to be absent from the genomes of the two prosimians studied, the galago and the black lemur. The KpnI A and B families are found among all of the anthropoid primates, including the New World capuchin monkey. The KpnI C family was detected in the genomes of the Old World anthropoid primates whereas the KpnI D family was detected only among the great apes and man. The results are in accord with the observation (Musich et al., 1980) that with the continued evolutionary development of the primate Order, there has been a parallel trend toward an increased number and variety of alphoid DNA sequences. The properties of the KpnI families suggest that these sequences, unique among the alphoid DNAs, have been conservatively maintained throughout primate phylogeny and that they are among the most ancient of all primate DNAs.

19.
Genome synthesis gives scientists the ability to create, de novo, genomes that are absent in nature by thoroughly redesigning DNA sequences and introducing numerous custom features. However, genome synthesis is labor‐ and time‐consuming, and it is a challenge to verify and quantify the synthetic genome rapidly and precisely. For this reason, specific DNA sequences that differ from the native genomic sequences, namely genomic markers, are designed into synthetic genomes during synthesis. Genomic markers can be easily detected by PCR, whole‐genome sequencing (WGS) and a variety of other methods to distinguish the synthetic genome from the native one. Here, we review the types and applications of genomic markers used in synthetic genomes, with the hope of providing guidance for future work.

20.
A new system to recognize protein coding genes in coronavirus genomes, especially suitable for the SARS-CoV genomes, is proposed in this paper. Compared with some existing systems, the new program package has the merits of simplicity, high accuracy, reliability, and speed. The system, ZCURVE_CoV, has been run on each of the 11 newly sequenced SARS-CoV genomes. Consequently, six genomes not annotated previously have been annotated, and some problems with previous annotations of the remaining five genomes have been pointed out and discussed. In addition to the polyprotein chain ORFs 1a and 1b and the four genes coding for the major structural proteins, spike (S), small envelope (E), membrane (M), and nucleocapsid (N), respectively, ZCURVE_CoV also predicts 5-6 putative proteins of between 39 and 274 amino acids in length with unknown functions. Some single nucleotide mutations within these putative coding sequences have been detected and their biological implications are discussed. A web service is provided, by which a user can obtain the annotated result immediately by pasting the SARS-CoV genome sequences into the input window on the web site (http://tubic.tju.edu.cn/sars/). The software ZCURVE_CoV can also be downloaded freely from the web address mentioned above and run on computers under the Windows or Linux platforms.
