1.

Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

Marta Matvienko Alexander Kozik Lutz Froenicke Dean Lavelle Belinda Martineau Bertrand Perroud Richard Michelmore 《PloS one》2013,8(2)

2.

Primary structure of an alpha-tubulin gene of Physarum polycephalum (total citations: 9; self-citations: 0; citations by others: 9)

M J Monteiro R A Cox 《Journal of molecular biology》1987,193(3):427-438

3.

Structure of the rat prolactin gene (total citations: 17; self-citations: 0; citations by others: 17)

E J Gubbins R A Maurer M Lagrimini C R Erwin J E Donelson 《The Journal of biological chemistry》1980,255(18):8655-8662

The organization and sequence of the rat preprolactin gene have been investigated. Analysis of two different plasmids containing pituitary cDNA inserts has provided the complete 681-nucleotide coding sequence of preprolactin as well as 17 nucleotides preceding the initiation codon and 90 nucleotides following the termination codon. Digestion of rat chromosomal DNA with the restriction endonuclease EcoRI followed by size fractionation and hybridization to a labeled prolactin cDNA probe has demonstrated that prolactin genomic sequences are located on 6.0-, 3.9-, and 2.9-kilobase fragments. The 6.0- and 3.9-kilobase fragments were isolated from a library of cloned rat DNA fragments. The sequence of more than 1800 nucleotides of the cloned DNA has been determined. The sequenced region contains coding regions of 180 and 189 nucleotides which specify the COOH-terminal 123 amino acids of the 227-amino-acid sequence of rat preprolactin. These coding regions are separated by an intervening sequence of 597 nucleotides. At least one other large intervening sequence separates this region from the region coding for the NH2-terminal portion of preprolactin. Hybridization experiments suggested that the intervening sequences of the rat prolactin gene contain DNA sequences which are repeated elsewhere in the rat genome.
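
The lengths quoted in the abstract above are mutually consistent under the standard three-nucleotides-per-codon rule, which the short sketch below checks. The function name `codon_count` is illustrative and does not come from the paper.

```python
CODON_LENGTH = 3  # nucleotides per codon


def codon_count(nucleotides: int) -> int:
    """Return the number of whole codons in a coding stretch of the given length."""
    if nucleotides % CODON_LENGTH != 0:
        raise ValueError("length is not a whole number of codons")
    return nucleotides // CODON_LENGTH


# Complete coding sequence: 681 nucleotides for the 227-residue preprolactin.
full_length_residues = codon_count(681)

# The two sequenced coding regions together encode the COOH-terminal portion.
c_terminal_residues = codon_count(180 + 189)

print(full_length_residues, c_terminal_residues)
```

The two results match the 227 and 123 amino acids reported in the abstract, which also implies that the 681-nucleotide figure does not include the termination codon.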

4.

NfCR1, the first non-LTR retrotransposon characterized in the Australian lungfish genome, Neoceratodus forsteri, shows similarities to CR1-like elements

Sirijovski N Woolnough C Rock J Joss JM 《Journal of experimental zoology. Part B. Molecular and developmental evolution》2005,304(1):40-49

The genomes of lungfish, together with those of some urodele amphibians, are the largest of all vertebrate genomes. It has been assumed that the bulk of the DNA making up these large genomes has been derived from repeat elements, like the noncoding DNA of those genomes that have been sequenced (e.g., human). In an attempt to characterize repeat sequences in the lungfish genome, we have isolated, by restriction enzyme digestion of genomic DNA, sequences of a repeat element in Neoceratodus forsteri, the most primitive of the living lungfishes. The fragments sequenced from the EcoRI and BglII digests were used to perform genome walking PCR in order to obtain the full sequence of the repeat element. This element shares homology with the non-LTR (LINE) element, Chicken Repeat 1 (CR1), described for several vertebrates and some invertebrates; we have called it N. forsteri CR1 (NfCR1). NfCR1 shares all the domains of other CR1 elements but it also has several unique features that suggest it may no longer be active in the lungfish genome. It occurs in both full-length and 5'-truncated versions and in its present "inactive" form represents approximately 0.05% of the lungfish genome.

5.

Compositional properties of nuclear genes from cold-blooded vertebrates

Giacomo Bernardi Giorgio Bernardi 《Journal of molecular evolution》1991,33(1):57-67

Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distribution of GC levels of the third codon positions of genes from cold-blooded vertebrates is distinctly different from that of warm-blooded vertebrates in that it does not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.
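
The per-position GC levels discussed in the abstract above can be computed directly from an in-frame coding sequence. The following is a minimal sketch, not the authors' pipeline: the function name `gc_by_codon_position` and the toy sequence are assumptions made for illustration.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return the G+C fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    Trailing nucleotides that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    gc_counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            gc_counts[i % 3] += 1
    n_codons = usable // 3
    return (gc_counts[0] / n_codons,
            gc_counts[1] / n_codons,
            gc_counts[2] / n_codons)


# Toy sequence (hypothetical): codons ATG GCC AAG TAA.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAGTAA")
print(gc1, gc2, gc3)
```

Plotting the third-position value against the mean of the first two for many genes would give the kind of scatter on which the second correlation in the abstract is based.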

6.

Exploration of multivariate analysis in microbial coding sequence modeling

Mehmood T Bohlin J Bråthen Kristoffersen A Sæbø S Warringer J Snipen L 《BMC bioinformatics》2012,13(1):97

ABSTRACT: BACKGROUND: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequence substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than both IMM with protein (p < 0.001) and with DNA (p < 0.001). Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.

7.

In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists

Saeys Y Rouzé P Van de Peer Y 《Bioinformatics (Oxford, England)》2007,23(4):414-420

MOTIVATION: Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to the biased usage of those codons. For statistical reasons, the longer the sequences, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.

8.

Genomic exploration of the hemiascomycetous yeasts: 3. Methods and strategies used for sequence analysis and annotation

Tekaia F Blandin G Malpertuy A Llorente B Durrens P Toffano-Nioche C Ozier-Kalogeropoulos O Bon E Gaillardin C Aigle M Bolotin-Fukuhara M Casarégola S de Montigny J Lépingle A Neuvéglise C Potier S Souciet J Wésolowski-Louvel M Dujon B 《FEBS letters》2000,487(1):17-30

9.

Identification of coding and non-coding sequences using local Holder exponent formalism (total citations: 2; self-citations: 0; citations by others: 2)

Kulkarni OC Vigneshwar R Jayaraman VK Kulkarni BD 《Bioinformatics (Oxford, England)》2005,21(20):3818-3823

MOTIVATION: Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of distinct scaling relations in coding and non-coding sequences has led to new perspectives in the understanding of DNA sequences. This has motivated us to exploit the differences in the local singularity distributions for characterization and classification of coding and non-coding sequences. RESULTS: The local singularity density distribution in the coding and non-coding sequences of four genomes was first estimated using the wavelet transform modulus maxima methodology. A support vector machine classifier was then trained with the extracted features. The trained classifier is able to provide an average test accuracy of 97.7%. The local singularity features in a DNA sequence can be exploited for successful identification of coding and non-coding sequences. CONTACT: Available on request from bd.kulkarni@ncl.res.in.

10.

Genome-wide comparative analysis of simple sequence coding repeats among 25 insect species

Behura SK Severson DW 《Gene》2012,504(2):226-232

We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p < 0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeat coding for single amino acid/amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.

11.

Cloning and structural analysis of genes coding for tumor necrosis factor and lymphotoxin in rabbits

A N Shakhov D V Kuprash R L Turetskaia M M Azizov A V Andreeva S A Nedospasov 《Molekuliarnaia biologiia》1989,23(6):1743-1750

12.

Cloning and sequencing of Plasmodium falciparum DNA fragments containing repetitive regions potentially coding for histidine-rich proteins: identification of two overlapping reading frames (total citations: 2; self-citations: 0; citations by others: 2)

R Lenstra L d'Auriol B Andrieu J Le Bras F Galibert 《Biochemical and biophysical research communications》1987,146(1):368-377

DNA sequences potentially coding for histidine-rich proteins were isolated from a P. falciparum genomic library using an oligonucleotide probe consisting of histidine codon repeats. Sequencing revealed that the different DNA fragments contain long repetitive regions highly homologous to the probe. One clone was fully sequenced and contains two open reading frames that overlap in the repetitive region but are located on opposite strands. Analysis suggests that both are coding. One frame could code for a small histidine-rich protein, the other for a protein containing many aspartic acid residues. Southern blotting revealed that these sequences are conserved in all three P. falciparum strains studied.

13.

Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence (total citations: 2; self-citations: 0; citations by others: 2)

Yin C Yau SS 《Journal of theoretical biology》2007,247(4):687-694

With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) from genomic sequences. Although much progress has been made in the identification of protein coding regions by computational methods during the last two decades, the performance and efficiency of the prediction methods still need to be improved. In addition, it is indispensable to develop different prediction methods since combining different methods may greatly improve the prediction accuracy. A new method to predict protein coding regions is developed in this paper based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this unique feature. The method computes the 3-base periodicity and the background noise of the stepwise DNA segments of the target DNA sequences using nucleotide distributions in the three codon positions of the DNA sequences. Exon and intron sequences can be identified from trends of the ratio of the 3-base periodicity to the background noise in the DNA sequences. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
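
To illustrate the idea of measuring 3-base periodicity from the nucleotide counts at the three codon positions, the sketch below computes the Fourier power at frequency 1/3 from those counts and divides it by the sequence length, which equals the average spectral power of the four base-indicator sequences. This is a simplified stand-in and not necessarily the ratio defined in the paper; the function name `three_base_periodicity` is hypothetical.

```python
import cmath


def three_base_periodicity(seq: str) -> float:
    """Return the Fourier power at period 3, normalised by the number of bases.

    Only the counts of each base at positions 0, 1 and 2 (mod 3) are needed:
    the transform of each indicator sequence at frequency 1/3 reduces to
    c0 + c1*w + c2*w**2 with w = exp(-2*pi*i/3).
    """
    counts = {base: [0, 0, 0] for base in "ACGT"}
    n_bases = 0
    for i, base in enumerate(seq.upper()):
        if base in counts:  # ambiguous symbols such as N are skipped
            counts[base][i % 3] += 1
            n_bases += 1
    if n_bases == 0:
        raise ValueError("sequence contains no A, C, G or T")
    w = cmath.exp(-2j * cmath.pi / 3)
    power = sum(abs(c[0] + c[1] * w + c[2] * w ** 2) ** 2
                for c in counts.values())
    return power / n_bases


# A perfectly periodic toy sequence scores high; a homopolymer scores near zero.
print(three_base_periodicity("ATG" * 10))
print(three_base_periodicity("A" * 30))
```

Applied to a sliding window along a genomic sequence, a score of this kind would be expected to rise inside exons and fall in introns, which is the trend the abstract describes.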

14.

Heuristic approach to deriving models for gene finding. (total citations: 21; self-citations: 2; citations by others: 19)

J Besemer M Borodovsky 《Nucleic acids research》1999,27(19):3911-3920

Computer methods of accurate gene finding in DNA sequences require models of protein coding and non-coding regions derived either from experimentally validated training sets or from large amounts of anonymous DNA sequence. Here we propose a new, heuristic method producing fairly accurate inhomogeneous Markov models of protein coding regions. The new method needs such a small amount of DNA sequence data that the model can be built 'on the fly' by a web server for any DNA sequence >400 nt. Tests on 10 complete bacterial genomes performed with the GeneMark.hmm program demonstrated the ability of the new models to detect 93.1% of annotated genes on average, while models built by traditional training predict an average of 93.9% of genes. Models built by the heuristic approach could be used to find genes in small fragments of anonymous prokaryotic genomes and in genomes of organelles, viruses, phages and plasmids, as well as in highly inhomogeneous genomes where adjustment of models to local DNA composition is needed. The heuristic method also gives an insight into the mechanism of codon usage pattern evolution.

15.

Differential distribution of simple sequence repeats in eukaryotic genome sequences. (total citations: 34; self-citations: 0; citations by others: 34)

M V Katti P K Ranjekar V S Gupta 《Molecular biology and evolution》2001,18(7):1161-1167

Complete chromosome/genome sequences available from humans, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae were analyzed for the occurrence of mono-, di-, tri-, and tetranucleotide repeats. In all of the genomes studied, dinucleotide repeat stretches tended to be longer than other repeats. Additionally, tetranucleotide repeats in humans and trinucleotide repeats in Drosophila also seemed to be longer. Although the trends for different repeats are similar between different chromosomes within a genome, the density of repeats may vary between different chromosomes of the same species. The abundance or rarity of various di- and trinucleotide repeats in different genomes cannot be explained by nucleotide composition of a sequence or potential of repeated motifs to form alternative DNA structures. This suggests that in addition to nucleotide composition of repeat motifs, characteristic DNA replication/repair/recombination machinery might play an important role in the genesis of repeats. Moreover, analysis of complete genome coding DNA sequences of Drosophila, C. elegans, and yeast indicated that expansions of codon repeats corresponding to small hydrophilic amino acids are tolerated more, while strong selection pressures probably eliminate codon repeats encoding hydrophobic and basic amino acids. The locations and sequences of all of the repeat loci detected in genome sequences and coding DNA sequences are available at http://www.ncl-india.org/ssr and could be useful for further studies.
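
A scan for perfect mono- to tetranucleotide repeats of the kind surveyed in the abstract above can be sketched with regular expressions. The abstract does not state the detection thresholds, so the `min_repeats` default below is an assumption, as are the function names `is_primitive` and `find_ssrs`.

```python
import re
from typing import List, Tuple


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k))


def find_ssrs(seq: str, min_repeats: int = 4,
              max_motif_len: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect tandem repeat found."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence (hypothetical): five CAG repeats flanked by TT.
print(find_ssrs("TT" + "CAG" * 5 + "TT"))
```

Filtering on primitive motifs keeps a run such as AAAAAAAA from being reported a second time as a dinucleotide repeat of AA.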

16.

Comparison of synonymous codon distribution patterns of bacteriophage and host genomes. (total citations: 5; self-citations: 0; citations by others: 5)

T Kunisawa S Kanaya E Kutter 《DNA research》1998,5(6):319-326

Synonymous codon usage patterns of bacteriophage and host genomes were compared. Two indexes, G + C base composition of a gene (fgc) and fraction of translationally optimal codons of the gene (fop), were used in the comparison. Synonymous codon usage data of all the coding sequences on a genome are represented as a cloud of points in the plane of fop vs. fgc. The Escherichia coli coding sequences appear to exhibit two phases, "rising" and "flat" phases. Genes that are essential for survival and are thought to be native are located in the flat phase, while foreign-type genes from prophages and transposons are found in the rising phase with a slope of nearly unity in the fgc vs. fop plot. Synonymous codon distribution patterns of genes from temperate phages P4, P2, N15 and lambda are similar to the pattern of E. coli rising phase genes. In contrast, genes from the virulent phage T7 or T4, for which a phage-encoded DNA polymerase is identified, fall in a linear curve with a slope of nearly zero in the fop vs. fgc plane. These results may suggest that the G + C contents for T7, T4 and E. coli flat phase genes are subject to the directional mutation pressure and are determined by the DNA polymerase used in the replication. There is significant variation in the fop values of the phage genes, suggesting an adjustment to gene expression level. Similar analyses of codon distribution patterns were carried out for Haemophilus influenzae, Bacillus subtilis, Mycobacterium tuberculosis and their phages with complete genomic sequences available.
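
The two indexes named in the abstract above can be sketched as simple fractions over a coding sequence. This is a simplified reading of the definitions given there: the set of optimal codons is species-specific and is not listed in the abstract, so the set used below is a placeholder, and published definitions of fop may exclude some codons (for example stop codons) that this sketch counts.

```python
from typing import Set


def fgc(cds: str) -> float:
    """Fraction of G and C nucleotides in the coding sequence."""
    cds = cds.upper()
    if not cds:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in cds) / len(cds)


def fop(cds: str, optimal_codons: Set[str]) -> float:
    """Fraction of complete in-frame codons that belong to the optimal set."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon in optimal_codons for codon in codons) / len(codons)


# Placeholder optimal set and toy gene, both hypothetical.
toy_optimal = {"CTG", "AAA"}
toy_gene = "ATGCTGAAACTGTAA"  # codons ATG CTG AAA CTG TAA
print(fgc(toy_gene), fop(toy_gene, toy_optimal))
```

Computing both values for every gene in a genome gives one point per gene in the fop vs. fgc plane, the representation on which the "rising" and "flat" phases are described.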

17.

Nucleotide sequence of a macronuclear DNA molecule coding for alpha-tubulin from the ciliate Stylonychia lemnae. Special codon usage: TAA is not a translation termination codon. (total citations: 37; self-citations: 9; citations by others: 28)

E Helftenbein 《Nucleic acids research》1985,13(2):415-433

18.

Internal correspondence analysis of codon and amino-acid usage in thermophilic bacteria

Lobry JR Chessel D 《Journal of applied genetics》2003,44(2):235-261

Starting from two datasets of codon usage in coding sequences from mesophilic and thermophilic bacteria, we used internal correspondence analysis to study the variability of codon usage within and between species, and within and between amino acids. The first dataset included 18,958,458 codons from 58,482 coding sequences from completely sequenced genomes of 25 species, along with 6,793,581 dinucleotides from 21,876 intergenic spaces. The second dataset, with partially sequenced genomes, included 97,095,873 codons from 293 bacterial species. Results were consistent between the two datasets. The trend for the amino-acid composition of thermophilic proteins was found to be under the control of a pressure at the nucleic acid level, not a selection at the protein level. This effect was not present in intergenic spaces, ruling out a pressure at the DNA level. The pattern at the mRNA level was more complex than a simple purine enrichment of the sense strand of coding sequences. Outliers in the partial genome dataset introduced a note of caution about the interpretation of temperature as the direct determinant of the trend observed in thermophiles. The surprising lack of selection on the amino-acid content of thermophilic proteins suggests that the amino-acid repertoire was set up in a hot environment.

19.

Relationship between codon usage and sequence-dependent curvature of genomes

Jáuregui R O'Reilly F Bolivar F Merino E 《Microbial & comparative genomics》1998,3(4):243-253

Static DNA curvature distributions of fully sequenced genomes and large DNA contigs from different organisms were calculated. Very distinctive differences among histogram profiles coming from archaebacteria, eubacteria, and eukaryotes were observed. Eubacterial profiles were, on average, more curved than were archaeal and eukaryotic profiles. A comparative analysis between real and randomized DNA sequences revealed that eubacterial genomes presented, overall, higher curvature values than random sequences. The opposite picture was exhibited by archaeal and eukaryotic genomes, which displayed a lower frequency of curved regions than their corresponding randomized sequences. The contributions of coding and intergenic regions to the curvature profile were also analyzed. Intergenic regions, on average, were found to be more curved than the overall genomic sequences, especially in prokaryotic organisms. Nevertheless, because of their small size with respect to coding regions, the contribution of intergenic sequences to the overall curvature profile tended to be minor. A clear relationship between codon usage and DNA curvature was demonstrated, and a proposal of the possible coevolution of both systems is discussed. Finally, we present a procedure to quantify the deviation of a curvature profile from randomness through a formal statistical analysis.

20.

Genomic strategies to identify mammalian regulatory sequences

Pennacchio LA Rubin EM 《Nature reviews. Genetics》2001,2(2):100-109

With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and genome-wide expression profiling, integrated with various computational tools, are poised to contribute to the decoding of genomic sequence and to the identification of those sequences that orchestrate gene regulation. In this review, we highlight several genomic approaches that are being used to identify regulatory sequences in mammalian genomes.