Found 20 similar documents (search time: 0 ms)
1.
2.
Detecting selection in noncoding regions of nucleotide sequences (cited by 2; self-citations: 0; citations by others: 2)
We present a maximum-likelihood method for examining the selection pressure and detecting positive selection in noncoding regions using multiple aligned DNA sequences. The rate of substitution in noncoding regions relative to the rate of synonymous substitution in coding regions is modeled by a parameter zeta. When a site in a noncoding region is evolving neutrally, zeta = 1, while zeta > 1 indicates the action of positive selection and zeta < 1 suggests negative selection. Using a combined model for the evolution of noncoding and coding regions, we develop two likelihood-ratio tests for the detection of selection in noncoding regions. Data analysis of both simulated and real viral data is presented. Using the new method we show that positive selection in viruses is acting primarily in protein-coding regions and is rare or absent in noncoding regions.
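As a rough illustration of the kind of likelihood-ratio test the abstract above mentions, the sketch below compares a null model with zeta fixed at 1 against an alternative with zeta free. The function name, the example log-likelihoods, and the use of a chi-square approximation with one degree of freedom are all assumptions made for illustration; the abstract does not state the paper's actual models or test distributions.

```python
import math

def lrt_pvalue_1df(lnl_null, lnl_alt):
    """Return (statistic, p_value) for a nested-model likelihood-ratio test.

    Assumes the alternative model has exactly one extra free parameter
    (here, hypothetically, zeta free versus zeta fixed at 1), so the
    statistic is compared against a chi-square distribution with 1 df.
    """
    # LRT statistic: twice the log-likelihood gain of the richer model.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, not taken from the paper.
stat, p_value = lrt_pvalue_1df(lnl_null=-1000.0, lnl_alt=-998.0)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value here would only say that fixing zeta at 1 fits worse than leaving it free; whether zeta is above or below 1 has to be read from the fitted estimate.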
3.
A A Sprizhitsky YuA Nechipurenko YuD Alexandrov M V Volkenstein 《Journal of biomolecular structure & dynamics》1988,6(2):345-358
A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences in run distributions among the DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and a deficiency of such runs in the noncoding regions. However, some interesting exceptions to this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
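To make the notion of a "nucleotide run" concrete, here is a minimal sketch that tallies runs of a single base by length. The function name and the example sequence are hypothetical, and this is only one plausible way to count runs, not necessarily the procedure used in the paper.

```python
from collections import Counter
from itertools import groupby

def run_length_counts(seq, base):
    """Count maximal runs of `base` in `seq`, keyed by run length."""
    counts = Counter()
    # groupby splits the sequence into maximal blocks of identical symbols.
    for symbol, group in groupby(seq.upper()):
        if symbol == base:
            counts[sum(1 for _ in group)] += 1
    return counts

# Made-up example sequence.
print(run_length_counts("AAGGGACGG", "G"))
```

Comparing such tallies between annotated coding and noncoding segments would give the kind of run-distribution contrast the abstract describes.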
4.
The majority of metazoan genomes consist of nonprotein-coding regions, although the functional significance of most noncoding DNA sequences remains unknown. Highly conserved noncoding sequences (CNSs) have proven to be reliable indicators of functionally constrained sequences such as cis-regulatory elements and noncoding RNA genes. However, CNSs may arise from nonselective evolutionary processes such as genomic regions with extremely low mutation rates known as mutation "cold spots." Here we combine comparative genomic data from recently completed insect genome projects with population genetic data in Drosophila melanogaster to test predictions of the mutational cold spot model of CNS evolution in the genus Drosophila. We find that point mutations in intronic and intergenic CNSs exhibit a significant reduction in levels of divergence relative to levels of polymorphism, as well as a significant excess of rare derived alleles, compared with either the nonconserved spacer regions between CNSs or with 4-fold silent sites in coding regions. Controlling for the effects of purifying selection, we find no evidence of positive selection acting on Drosophila CNSs, although we do find evidence for the action of recurrent positive selection in the spacer regions between CNSs. We estimate that approximately 85% of sites in Drosophila CNSs are under constraint with selection coefficients (N(e)s) on the order of 10-100, and thus, the estimated strength and number of sites under purifying selection is greater for Drosophila CNSs relative to those in the human genome. These patterns of nonneutral molecular evolution are incompatible with the mutational cold spot hypothesis to explain the existence of CNSs in Drosophila and, coupled with similar findings in mammals, argue against the general likelihood that CNSs are generated by mutational cold spots in any metazoan genome.
5.
R H Stanley N V Dokholyan S V Buldyrev S Havlin H E Stanley 《Journal of biomolecular structure & dynamics》1999,17(1):79-87
We develop a quantitative method for analyzing repetitions of identical short oligomers in coding and noncoding DNA sequences. We analyze sequences presently available in GenBank separately for primate, mammal, vertebrate, rodent, invertebrate and plant taxonomic partitions. We find that some oligomers "cluster" more than they would if randomly distributed, while other oligomers "repel" each other. To quantify this degree of clustering, we define clustering measures. We find that (i) clustering significantly differs in coding and noncoding DNA; (ii) in most cases, monomers, dimers and tetramers cluster in noncoding DNA but appear to repel each other in coding DNA; (iii) the degree of clustering for different sources (primates, invertebrates, and plants) is more conserved among these sources in the case of coding DNA than in the case of noncoding DNA; (iv) in contrast to other oligomers, trimers always prefer to cluster; and (v) clustering of each particular oligomer is conserved within the same organism.
6.
SUMMARY: CREDO is a user-friendly, web-based tool that integrates the analysis and results of different algorithms widely used for the computational detection of conserved sequence motifs in noncoding sequences. It enables easy comparison of the individual results. CREDO offers intuitive interfaces for easy and rapid configuration of the applied algorithms and convenient views on the results in graphical and tabular formats. AVAILABILITY: http://mips.gsf.de/proj/regulomips/credo.htm.
7.
A relationship between pyrimidine distribution patterns and radiosensitivity (Z) of DNA molecules of different species was derived by computer analysis of the recurrence frequency of pyrimidine clusters. Blocking factors (beta) and Z for coding and non-coding DNA sequences of species from different taxonomic classes have been calculated within a new model. The radiosensitivity of coding DNA sequences practically does not vary, whereas Z values increased during evolution from the simplest to higher organisms. The beta and Z values calculated for several groups of individual genes were shown to vary considerably.
8.
The entropies of protein coding genes from Escherichia coli were calculated according to Boltzmann's formula. Entropies of the coding regions were compared to the entropies of noncoding or miscoding ones. With nucleotides as code units, the entropies of the coding regions, when compared to the entropies of complete sequences (leader and coding region as well as trailer), were seen to be lower but with a marginal statistical significance. With triplets of nucleotides as code units, the entropies of correct reading frames were significantly lower than the entropies of frameshifts +1 and -1. With amino acids as code units, the results were opposite: Biologically functional proteins had significantly higher entropies than proteins translated from the frameshifted sequences. We attempt to explain this paradox with the hypothesis that the genetic code may have the ability of lowering information content (increasing entropy) of proteins while translating them from DNA. This ability might be beneficial to bacteria because it would make the functional proteins more probable (having a higher entropy) than nonfunctional proteins translated from frameshifted sequences.
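The comparison of reading frames described above can be sketched as follows. Note the substitution: the abstract says entropies were computed with Boltzmann's formula, whereas this sketch uses the Shannon entropy of the observed unit frequencies as a stand-in, since the abstract does not give the exact formulation. The function names and the example sequence are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(units):
    """Shannon entropy (bits per unit) of the observed unit frequencies."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def triplets(seq, frame=0):
    """Split `seq` into complete, non-overlapping triplets from `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

# Made-up sequence; frame 0 is treated as the "correct" frame, and
# frames 1 and 2 play the role of the shifted reading frames.
seq = "ATGGCTGCTGCTAAAGCTGCTTAA"
print("nucleotides:", round(shannon_entropy(seq), 3))
for frame in (0, 1, 2):
    print("frame", frame, round(shannon_entropy(triplets(seq, frame)), 3))
```

On real genes the frames would be compared over many sequences with a significance test; a single short example like this one shows only the mechanics.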
9.
10.
11.
12.
Polyomavirus tumor induction in mice: influences of viral coding and noncoding sequences on tumor profiles. (cited by 8; self-citations: 13; citations by others: 8)
R Freund G Mandel G G Carmichael J P Barncastle C J Dawe T L Benjamin 《Journal of virology》1987,61(7):2232-2239
We determined the DNA sequences of the noncoding regions of two polyomavirus strains that differ profoundly in their abilities to induce tumors in mice. Differences between strains were found, both on the late side of the replication origin in the region containing known enhancer elements and on the early side of the origin, affecting the number and location of large-T-antigen-binding sites. By constructing and analyzing recombinant viruses between these high- and low-tumor strains, we attempted to localize determinants which affect the frequency and histotype of tumors. Seven recombinants were constructed and propagated in vitro, and the tumor profile of each was established by inoculation into newborn C3H mice. Recombinants containing noncoding sequences from the high-tumor strain and coding sequences from the low-tumor strain behaved like the latter, inducing tumors at a low frequency and strictly of mesenchymal origin. Reciprocal recombinants with noncoding sequences of the low-tumor strain linked to structural determinants from the high-tumor strain induced several types of epithelial tumors typical of the high-tumor strain but at reduced frequency, in addition to mesenchymal tumors. A high frequency and full diversity of epithelial tumors required, in addition to structural regions from the high-tumor strain, noncoding sequences on the early side of the origin also present in this strain. A high-tumor profile thus resulted from the combined effects of structural and regulatory determinants in the high-tumor strain, with the former affecting primarily the tissue tropism and the latter affecting the frequency of tumors. No differential effects of the enhancer regions from the late side of the origin in the two virus strains were seen in this study.
13.
MOTIVATION: Accurate detection of positive Darwinian selection can provide important insights to researchers investigating the evolution of pathogens. However, many pathogens (particularly viruses) undergo frequent recombination and the phylogenetic methods commonly applied to detect positive selection have been shown to give misleading results when applied to recombining sequences. We propose a method that makes maximum likelihood inference of positive selection robust to the presence of recombination. This is achieved by allowing tree topologies and branch lengths to change across detected recombination breakpoints. Further improvements are obtained by allowing synonymous substitution rates to vary across sites. RESULTS: Using simulation we show that, even for extreme cases where recombination causes standard methods to reach false positive rates >90%, the proposed method decreases the false positive rate to acceptable levels while retaining high power. We applied the method to two HIV-1 datasets for which we have previously found that inference of positive selection is invalid owing to high rates of recombination. In one of these (env gene) we still detected positive selection using the proposed method, while in the other (gag gene) we found no significant evidence of positive selection. AVAILABILITY: A HyPhy batch language implementation of the proposed methods and the HIV-1 datasets analysed are available at http://www.cbio.uct.ac.za/pub_support/bioinf06. The HyPhy package is available at http://www.hyphy.org, and it is planned that the proposed methods will be included in the next distribution. RDP2 is available at http://darwin.uvigo.es/rdp/rdp.html
14.
Kappa-chain constant-region gene sequences in genus Rattus: coding regions are diverging more rapidly than noncoding regions (cited by 2; self-citations: 0; citations by others: 2)
We have determined the nucleotide sequence of a 1,200-base pair (bp)
genomic fragment that includes the kappa-chain constant-region gene (C
kappa) from two species of native Australian rodents, Rattus leucopus
cooktownensis and Rattus colletti. Comparison of these sequences with each
other and with other rodent C kappa genes shows three surprising features.
First, the coding regions are diverging at a rate severalfold higher than
that of the nearby noncoding regions. Second, replacement changes within
the coding region are accumulating at a rate at least as great as that of
silent changes. Third, most of the amino acid replacements are localized in
one region of the C kappa domain--namely, the carboxy-terminal "bends" in
the alpha-carbon backbone. These three features have previously been
described from comparisons of the two allelic forms of C kappa genes in R.
norvegicus. These data imply the existence of considerable evolutionary
constraints on the noncoding regions (based on as yet undetermined
functions) or powerful positive selection to diversify a portion of the
constant-region domain (whose physiological significance is not known).
These surprising features of C kappa evolution appear to be characteristic
only of closely related C kappa genes, since comparison of rodent with
human sequences shows the expected greater conservation of coding regions,
as well as a predominance of silent nucleotide substitutions within the
coding regions.
15.
Previous studies of the small Southern Hemisphere family Atherospermataceae have drawn contradictory conclusions regarding the number of transantarctic disjunctions and role of transoceanic dispersal in its evolution. Clarification of intergeneric relationships is critical to resolving (1) whether the two Chilean species, Laurelia sempervirens and Laureliopsis philippiana, are related to different Austral-Pacific species, implying two transantarctic disjunctions as suggested by morphology; (2) where the group is likely to have originated; and (3) whether observed disjunctions reflect the breakup of Gondwana. We analyzed chloroplast DNA sequences from six regions (the rbcL gene, the rpl16 intron, and the trnL-trnF, trnT-trnL, psbA-trnH, and atpB-rbcL spacer regions; for all six regions, 4,372 bp) for all genera and most species of Atherospermataceae, using parsimony and maximum likelihood (ML). The family's sister group, the Chilean endemic Gomortega nitida (Gomortegaceae), was used to root the tree. Parsimony and ML yielded identical single best trees that contain three well-supported clades (≥ 75% bootstrap): Daphnandra and Doryphora from south-eastern Australia; Atherosperma and Nemuaron from Australia-Tasmania and New Caledonia, respectively; and Laurelia novae-zelandiae and Laureliopsis philippiana from New Zealand and Chile, respectively. The second Chilean species, Laurelia sempervirens, is sister to this last clade. Likelihood ratio testing did not reject the molecular clock assumption for the rbcL data, which can therefore be used for divergence time estimates. The atherosperm fossil record, which goes back to the Upper Cretaceous, includes pollen, wood, and leaf fossils from Europe, Africa, South America, Antarctica, New Zealand, and Tasmania. Calibration of rbcL substitution rates with the fossils suggests an initial diversification of the family at 100-140 million years ago (MYA), probably in West Gondwana, early entry into Antarctica, and long-distance dispersal to New Zealand and New Caledonia at 50-30 MYA by the ancestors of L. novae-zelandiae and Nemuaron.
16.
Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.
17.
Carvalho AM Freitas AT Oliveira AL Sagot MF 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(2):126-140
We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The proposed algorithm, named RISO, uses a new data structure, called box-link, to store the information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. This type of conserved regions, called structured motifs, is extremely relevant in the research of gene regulatory mechanisms since it can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than four orders of magnitude. The application of the method to biological data sets shows its ability to extract relevant consensi.
18.
Key for protein coding sequences identification: computer analysis of codon strategy. (cited by 3; self-citations: 3; citations by others: 0)
The signal qualifying an AUG or GUG as an initiator in mRNAs processed by E. coli ribosomes is not found to be a systematic, literal homology sequence. In contrast, stability analysis reveals that initiators always occur within nucleic acid domains of low stability, for which a high A/U content is observed. Since no amino acid selection pressure can be detected at the N-termini of the proteins, the A/U enrichment results from a biased usage of the code degeneracy. A computer analysis is presented which allows easy detection of the codon strategy. N-terminal codons carry rather systematically A or U in third position, which suggests a mechanism for translation initiation and helps to detect protein coding sequences in sequenced DNA.
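The third-position bias reported above can be checked with a few lines of code. The sketch below is not the paper's program; the function name, the default window of 10 codons, and the example sequence are assumptions chosen for illustration.

```python
def third_position_au_fraction(cds, n_codons=10):
    """Fraction of the first `n_codons` codons that end in A or U.

    Accepts DNA or RNA; T is treated as U. The window size is a
    hypothetical choice, not a value taken from the paper.
    """
    seq = cds.upper().replace("T", "U")
    # Keep only complete codons, then restrict to the N-terminal window.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)][:n_codons]
    if not codons:
        return 0.0
    return sum(codon[2] in "AU" for codon in codons) / len(codons)

# Made-up coding sequence starting at the initiator codon.
print(third_position_au_fraction("AUGAAAGCUGGC"))
```

Computing the same fraction in the other two reading frames, or further downstream, gives a baseline against which an N-terminal enrichment could be judged.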
19.
Compared to protein-coding sequences, the evolution of noncoding sequences and the selective constraints placed on these sequences is not well characterized. To compare the evolution of coding and noncoding sequences, we have conducted a survey for DNA polymorphism at five randomly chosen loci among a diverse collection of 81 strains of Saccharomyces cerevisiae. Average rates of both polymorphism and divergence are 40% lower at noncoding sites and 90% lower at nonsynonymous sites in comparison to synonymous sites. Although noncoding and coding sequences show substantial variability in ratios of polymorphism to divergence, two of the loci, MLS1 and PDR10, show a higher rate of polymorphism at noncoding compared to synonymous sites. The high rate of polymorphism is not accompanied by a high rate of divergence and is limited to a few small regions. These hypervariable regions include sites with three segregating bases at a single site and adjacent polymorphic sites. We show that this clustering of polymorphic sites is significantly greater than one would expect on the basis of the spacing between polymorphic fourfold degenerate sites. Although hypervariable noncoding sequences could result from selection on regulatory mutations, they could also result from transient mutational hotspots.
20.
The nuclear internal transcribed spacers, the 5.8S subunit, ~560 bp of the small subunit, and ~320 bp of the large subunit of the nuclear ribosomal DNA repeat from 17 species of Monilinia and eight species of closely related genera were sequenced. Phylogenies were constructed using maximum parsimony. The results support the hypothesis that Monilinia is not monophyletic. A fundamental distinction was found between the section Junctoriae and the section Disjunctoriae. Four evolutionary lineages were identified within the Disjunctoriae: one species on Crataegus, one group of species on dry stone fruits of rosaceous hosts, one group of species on capsular fruits of ericaceous hosts, and one group of species on sweet berry fruits of ericaceous hosts. Comparisons between branching topologies of hosts and Monilinia species suggest that although cospeciation among hosts and parasites has been the rule, several host jumps have taken place. Sclerotinia pirolae was determined to be a true member of the Disjunctoriae. The closest taxon groups to the Junctoriae were found to be Botrytis and Sclerotinia, with Ciborinia being the closest taxon group to the Disjunctoriae. There is evidence of an increased rate of ssrRNA evolution in the lineage of species that attack ericaceous berries.