
1.

Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability

Vetriselvi Rangannan Manju Bansal 《Journal of biosciences》2007,32(1):851-862


2.

Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata

Leah J DeRose-Wilson Brandon S Gaut 《BMC evolutionary biology》2007,7(1):66

Background

There has been remarkably little study of nucleotide substitution rate variation among plant nuclear genes, in part because orthology is difficult to establish. Orthology is even more problematic for intergenic regions of plant nuclear genomes, because plant genomes generally harbor a wealth of repetitive DNA. In theory orthologous intergenic data is valuable for studying rate variation because nucleotide substitutions in these regions should be under little selective constraint compared to coding regions. As a result, evolutionary rates in intergenic regions may more accurately reflect genomic features, like recombination and GC content, that contribute to nucleotide substitution.

3.

Relationship between codon usage and sequence-dependent curvature of genomes

Jáuregui R O'Reilly F Bolivar F Merino E 《Microbial & comparative genomics》1998,3(4):243-253

Static DNA curvature distributions of full-sequenced genomes and large DNA contigs from different organisms were calculated. Very distinctive differences among histogram profiles coming from archaebacteria, eubacteria, and eukaryotes were observed. Eubacterial profiles were, on average, more curved than were archaeal and eukaryotic profiles. A comparative analysis between real and randomized DNA sequences revealed that eubacterial genomes presented, overall, higher curvature values than random sequences. An opposite portrait was exhibited by archaeal and eukaryotic genomes. They displayed a lower frequency of curved regions than their corresponding randomized sequences. The contributions of coding and intergenic regions to the curvature profile were also analyzed. Intergenic regions, on average, were found to be more curved than the overall genomic sequences, especially in prokaryotic organisms. Nevertheless, because of their small size with respect to coding regions, the contribution of intergenic sequences to the overall curvature profile tended to be minor. A clear relationship between codon usage and DNA curvature was demonstrated, and a proposal of the possible coevolution of both systems is discussed. Finally, we present a procedure to quantify the deviation of a curvature profile from randomness through a formal statistical analysis.

4.

Detecting non-coding selective pressure in coding regions

Chen H Blanchette M 《BMC evolutionary biology》2007,7(Z1):S9


5.

Hidden Markov model variants and their application

Winters-Hilt S 《BMC bioinformatics》2006,7(Z2):S14


6.

Long‐read sequence capture of the haemoglobin gene clusters across codfish species

Siv Nam Khang Hoff Helle T. Baalsrud Ave Tooming‐Klunderud Morten Skage Todd Richmond Gregor Obernosterer Reza Shirzadi Ole Kristian Tørresen Kjetill S. Jakobsen Sissel Jentoft 《Molecular ecology resources》2019,19(1):245-259

Combining high‐throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short‐read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long‐read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom‐made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several, lineage‐specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which has previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long‐read capture as a versatile approach for comparative genomic studies by generation of a cross‐species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes.

7.

Realistic artificial DNA sequences as negative controls for computational genomics

Juan Caballero Arian F. A. Smit Leroy Hood Gustavo Glusman 《Nucleic acids research》2014,42(12):e99

A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
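For orientation only, the "shuffling" baseline that this abstract contrasts with its model-based generator can be sketched in a few lines of Python. This is not the authors' method: it is a plain mononucleotide shuffle, and the function name `shuffle_sequence` and the example sequence are hypothetical. It preserves base composition but destroys repeats and other higher-order structure, which is one plausible reason such controls can look unrealistically easy to a prediction tool.

```python
import random
from collections import Counter
from typing import Optional


def shuffle_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Return a mononucleotide shuffle of seq (hypothetical helper).

    Base counts are preserved exactly; base order is randomized.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Illustrative input: an invented sequence with obvious repeat structure.
real = "ATATATATGCGCGCGCATATATAT"
control = shuffle_sequence(real, seed=0)

# Same composition and length as the real sequence, by construction.
print(Counter(real) == Counter(control), len(real) == len(control))
```

A shuffle that preserves dinucleotide or k-mer frequencies would be a closer match to real sequence, but it needs a different algorithm and is not shown here.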

8.

Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory

Azad RK Borodovsky M 《Briefings in bioinformatics》2004,5(2):118-130

In this paper, we review developments in probabilistic methods of gene recognition in prokaryotic genomes with the emphasis on connections to the general theory of hidden Markov models (HMM). We show that the Bayesian method implemented in GeneMark, a frequently used gene-finding tool, can be augmented and reintroduced as a rigorous forward-backward (FB) algorithm for local posterior decoding described in the HMM theory. Another earlier developed method, prokaryotic GeneMark.hmm, uses a modification of the Viterbi algorithm for HMM with duration to identify the most likely global path through hidden functional states given the DNA sequence. GeneMark and GeneMark.hmm programs are worth using in concert for analysing prokaryotic DNA sequences that arguably do not follow any exact mathematical model. The new extension of GeneMark using the FB algorithm was implemented in the software program GeneMark.fba. Given the DNA sequence, this program determines an a posteriori probability for each nucleotide to belong to a coding or non-coding region. Also, for any open reading frame (ORF), it assigns a score defined as a probabilistic measure of all paths through hidden states that traverse the ORF as a coding region. The prediction accuracy of GeneMark.fba determined in our tests compared favourably with the accuracy of the initial (standard) GeneMark program. Comparison to the prokaryotic GeneMark.hmm has also demonstrated a certain, yet species-specific, degree of improvement in raw gene detection, i.e. detection of the correct reading frame (and stop codon). Exact gene prediction, which concerns precise prediction of the gene start (which in a prokaryotic genome unambiguously defines the reading frame and stop codon, thus, the whole protein product), still remains more accurate in GeneMarkS, which uses a more elaborate HMM to specifically address this task.
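The forward-backward posterior decoding mentioned in this abstract can be illustrated with a toy two-state HMM in Python. This is a minimal sketch under stated assumptions, not GeneMark.fba: the states, all probabilities, and the function name `posterior_decode` are invented for illustration, and real gene finders use codon-position-dependent, higher-order models rather than single-base emissions.

```python
def posterior_decode(obs, states, start, trans, emit):
    """Per-position posterior state probabilities via scaled forward-backward."""
    n = len(obs)
    if n == 0:
        return []

    # Forward pass, rescaling each column to sum to 1 to avoid underflow.
    fwd, scales = [], []
    for t, sym in enumerate(obs):
        row = {}
        for s in states:
            if t == 0:
                prior = start[s]
            else:
                prior = sum(fwd[t - 1][r] * trans[r][s] for r in states)
            row[s] = prior * emit[s][sym]
        c = sum(row.values())
        scales.append(c)
        fwd.append({s: row[s] / c for s in states})

    # Backward pass, reusing the forward scaling factors.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r]
                   for r in states) / scales[t + 1]
            for s in states
        }

    # With this scaling, fwd * bwd is already the normalized posterior.
    return [{s: fwd[t][s] * bwd[t][s] for s in states} for t in range(n)]


# Invented toy parameters: "coding" favours G/C, "noncoding" favours A/T.
states = ("coding", "noncoding")
start = {"coding": 0.5, "noncoding": 0.5}
trans = {"coding": {"coding": 0.9, "noncoding": 0.1},
         "noncoding": {"coding": 0.1, "noncoding": 0.9}}
emit = {"coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}}

post = posterior_decode("ATATATGCGCGCGC", states, start, trans, emit)
for base, p in zip("ATATATGCGCGCGC", post):
    print(base, round(p["coding"], 3))
```

Unlike Viterbi decoding, which returns one best global path, this gives a probability for every position, so uncertain boundaries show up as intermediate values instead of a hard call.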

9.

Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements

Weiwei Zhang Tim D Spector Panos Deloukas Jordana T Bell Barbara E Engelhardt 《Genome biology》2015,16(1)


10.

RNA polymerase V-dependent small RNAs in Arabidopsis originate from small, intergenic loci including most SINE repeats

Tzuu-fen Lee Sai Guna Ranjan Gurazada Jixian Zhai Shengben Li Stacey A. Simon Marjori A. Matzke Xuemei Chen Blake C. Meyers 《Epigenetics》2012,7(7):781-795

In plants, heterochromatin is maintained by a small RNA-based gene silencing mechanism known as RNA-directed DNA methylation (RdDM). RdDM requires the non-redundant functions of two plant-specific DNA-dependent RNA polymerases (RNAP), RNAP IV and RNAP V. RNAP IV plays a major role in siRNA biogenesis, while RNAP V may recruit DNA methylation machinery to target endogenous loci for silencing. Although small RNA-generating regions that are dependent on both RNAP IV and RNAP V have been identified previously, the genomic loci targeted by RNAP V for siRNA accumulation and silencing have not been described extensively. To characterize the RNAP V-dependent, heterochromatic siRNA-generating regions in the Arabidopsis genome, we deeply sequenced the small RNA populations of wild-type and RNAP V null mutant (nrpe1) plants. Our results showed that RNAP V-dependent siRNA-generating loci are associated predominantly with short repetitive sequences in intergenic regions. Suppression of small RNA production from short repetitive sequences was also prominent in RdDM mutants including dms4, drd1, dms3 and rdm1, reflecting the known association of these RdDM effectors with RNAP V. The genomic regions targeted by RNAP V were small, with an estimated average length of 238 bp. Our results suggest that RNAP V affects siRNA production from genomic loci with features dissimilar to known RNAP IV-dependent loci. RNAP V, along with RNAP IV and DRM1/2, may target a set of small, intergenic transposable elements located in dispersed genomic regions for silencing. Silencing at these loci may be actively reinforced by RdDM.

11.

progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement (total citations: 1; self-citations: 0; citations by others: 1)

Aaron E. Darling Bob Mau Nicole T. Perna 《PloS one》2010,5(6)

Background

Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.

Methodology/Principal Findings

We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence.

Conclusions

The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.

12.

Population Genomic Analysis of 962 Whole Genome Sequences of Humans Reveals Natural Selection in Non-Coding Regions

Fuli Yu Jian Lu Xiaoming Liu Elodie Gazave Diana Chang Srilakshmi Raj Haley Hunter-Zinck Ran Blekhman Leonardo Arbiza Cris Van Hout Alanna Morrison Andrew D. Johnson Joshua Bis L. Adrienne Cupples Bruce M. Psaty Donna Muzny Jin Yu Richard A. Gibbs Alon Keinan Andrew G. Clark Eric Boerwinkle 《PloS one》2015,10(3)


13.

Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm

Supatcha Lertampaiporn Chinae Thammarongtham Chakarida Nukoolkit Boonserm Kaewkamnerdpong Marasri Ruengjitchatchawalya 《Nucleic acids research》2014,42(11):e93

To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.

14.

Transcriptional landscape and essential genes of Neisseria gonorrhoeae

Christian W. Remmele Yibo Xian Marco Albrecht Michaela Faulstich Martin Fraunholz Elisabeth Heinrichs Marcus T. Dittrich Tobias Müller Richard Reinhardt Thomas Rudel 《Nucleic acids research》2014,42(16):10579-10595


15.

Predicting antisense RNAs in the genomes of Escherichia coli and Salmonella typhimurium using promoter-search algorithm PlatProm

Ozoline ON Deev AA 《Journal of bioinformatics and computational biology》2006,4(2):443-454


16.

Comparison of Ultra-Conserved Elements in Drosophilids and Vertebrates

Igor V. Makunin Viktor V. Shloma Stuart J. Stephen Michael Pheasant Stepan N. Belyakin 《PloS one》2013,8(12)


17.

Gene density and organization in a small region of the Arabidopsis thaliana genome

L. Le Guen M. Thomas M. Kreis 《Molecular genetics and genomics : MGG》1994,245(3):390-396


18.

Detection of prokaryotic promoters from the genomic distribution of hexanucleotide pairs

Pierre-Étienne Jacques Sébastien Rodrigue Luc Gaudreau Jean Goulet Ryszard Brzezinski 《BMC bioinformatics》2006,7(1):423-14


19.

A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion (total citations: 3; self-citations: 0; citations by others: 3)


Singleton TL Levin HL 《Eukaryotic cell》2002,1(1):44-55

The successful dispersal of transposons depends on the critical balance between the fitness of the host and the ability of the transposon to insert into the host genome. One method transposons may use to avoid the disruption of coding sequences is to target integration into safe havens. We explored the interaction between the long terminal repeat retrotransposon Tf1 and the genome of the yeast Schizosaccharomyces pombe. Using techniques that were specifically designed to detect integration of Tf1 throughout the genome and to avoid bias in this detection, we generated 51 insertion events. Although 60.2% of the genome of S. pombe is coding sequence, all but one of the insertions occurred in intergenic regions. We also found that Tf1 was significantly more likely to insert into intergenic regions that included polymerase II promoters than into regions between convergent gene pairs. Interestingly, 8 of the 51 insertion sites were isolated multiple times from genetically independent cultures. This result suggests that specific sites in intergenic regions are targeted by Tf1. Perhaps the most surprising observation was that per kilobase of nonrepetitive sequence, Tf1 was significantly more likely to insert into chromosome 3 than into one of the other two chromosomes. This preference was found not to be due to differences in the distribution or composition of intergenic sequences within the three chromosomes.

20.

A novel method for accurate operon predictions in all sequenced prokaryotes (total citations: 25; self-citations: 6; citations by others: 19)


Price MN Huang KH Alm EJ Arkin AP 《Nucleic acids research》2005,33(3):880-892
