Similar articles
20 similar articles found.
1.
A computer-assisted analysis was made of 24 complete nucleotide sequences selected from the vertebrate retroviruses to represent the ten viral groups. The conclusions of this analysis extend and strengthen the previously made hypothesis on the Moloney murine leukemia virus: The evolution of the nucleotide sequence appears to have occurred mainly through at least three overlapping levels of duplication: (1) The distributions of overrepresented (3–6)-mers are consistent with the universal rule of a trend toward TG/CT excess and with the persistence of a certain degree of symmetry between the two strands of DNA. This suggests one or several original tandemly repeated sequences and some inverted duplications. (2) The existence of two general core consensuses at the level of these (3–6)-mers supports the hypothesis of a common evolutionary origin of vertebrate retroviruses. Consensuses more specific to certain sequences are compatible with phylogenetic trees established independently. The consensuses could correspond to intermediary evolutionary stages. (3) Most of the (3–6)-mers with a significantly higher than average frequency appear to be internally repeated (with monomeric or oligomeric internal iterations) and seem to be at least partly the cause of the bias observed by other researchers at the level of retroviral nucleotide composition. They suggest a third evolutionary stage by slippage-like stepwise local duplications. Received: 3 January 1996 / Accepted: 27 March 1996
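The idea of an "overrepresented (3–6)-mer" can be made concrete with an observed/expected ratio. The sketch below is a minimal illustration, not the authors' statistic: it assumes a zero-order model (expected count = number of windows × product of single-base frequencies), and the function names and the `min_ratio`/`min_count` thresholds are hypothetical.

```python
from collections import Counter
from math import prod

def kmer_obs_exp(seq, k):
    """Map each k-mer in seq to (observed count, observed/expected ratio).

    Assumption: expected counts come from a zero-order model, i.e. the number
    of windows times the product of single-base frequencies.
    """
    seq = seq.upper()
    freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    windows = len(seq) - k + 1
    observed = Counter(seq[i:i + k] for i in range(windows))
    return {
        kmer: (obs, obs / (windows * prod(freq[b] for b in kmer)))
        for kmer, obs in observed.items()
    }

def overrepresented(seq, k_min=3, k_max=6, min_ratio=2.0, min_count=2):
    """(kmer, count, obs/exp) for k-mers passing both hypothetical thresholds."""
    hits = []
    for k in range(k_min, k_max + 1):
        for kmer, (obs, ratio) in kmer_obs_exp(seq, k).items():
            if obs >= min_count and ratio >= min_ratio:
                hits.append((kmer, obs, ratio))
    return sorted(hits, key=lambda h: h[2], reverse=True)

seq = "TGCTGCTGCTGC" + "AACGATCGGATC"
for kmer, count, ratio in overrepresented(seq, 3, 3):
    print(kmer, count, round(ratio, 2))
```

A higher-order Markov background (e.g. conditioning on dinucleotide frequencies) would give a stricter expectation; the zero-order model is used here only because it is the simplest to read.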

2.
To study the mechanisms for local evolutionary changes in DNA sequences involving slippage-type insertions and deletions, an alignment approach is explored that can consider the posterior probabilities of alignment models. Various patterns of insertion and deletion that can link the ancestor and descendant sequences are proposed and evaluated by simulation and compared by the Markov chain Monte Carlo (MCMC) method. Analyses of pseudogenes reveal that the introduction of the parameters that control the probability of slippage-type events markedly augments the probability of the observed sequence evolution, arguing that a cryptic involvement of slippage occurrences is manifested as insertions and deletions of short nucleotide segments. Strikingly, approximately 80% of insertions in human pseudogenes and approximately 50% of insertions in murid pseudogenes are likely to be caused by the slippage-mediated process, as represented by BC in ABCD → ABCBCD. We suggest that, in both humans and murids, even very short repetitive motifs, such as CAGCAG, CACACA, and CCCC, are approximately 10- to 15-fold more susceptible to insertions and deletions than nonrepetitive sequences. Our protocol, namely, indel-MCMC, thus seems to be a reasonable approach for statistical analyses of the early phase of microsatellite evolution.
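The slippage-type insertion "BC in ABCD → ABCBCD" means the inserted segment is a copy of the segment right next to it. A minimal sketch of that check, assuming the descendant differs from the ancestor by exactly one inserted segment (the function names are hypothetical, and this is a deterministic classifier, not the indel-MCMC method). Because an insertion can often be placed at several equivalent positions (here "CB" at position 2 gives the same descendant), every placement is tested.

```python
def slippage_placements(ancestor, descendant):
    """Placements of a single insertion that turn ancestor into descendant
    and copy an adjacent segment (slippage-type).

    Assumes descendant = ancestor with exactly one inserted segment.
    Returns a list of (position in ancestor, inserted segment).
    """
    k = len(descendant) - len(ancestor)
    if k <= 0:
        return []
    hits = []
    for p in range(len(ancestor) + 1):
        ins = descendant[p:p + k]
        if ancestor[:p] + ins + ancestor[p:] != descendant:
            continue  # this placement does not explain the descendant
        left = ancestor[max(0, p - k):p]
        right = ancestor[p:p + k]
        if ins == left or ins == right:  # tandem copy of a neighbour
            hits.append((p, ins))
    return hits

def is_slippage_insertion(ancestor, descendant):
    return bool(slippage_placements(ancestor, descendant))

print(slippage_placements("ABCD", "ABCBCD"))                 # [(1, 'BC'), (3, 'BC')]
print(is_slippage_insertion("TTCAGCAGTT", "TTCAGCAGCAGTT"))  # True
print(is_slippage_insertion("ABCD", "ABXYCD"))               # False
```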

3.
Gene duplication is an important evolutionary mechanism that can result in functional divergence in paralogs due to neo-functionalization or sub-functionalization. Consistent with functional divergence after gene duplication, recent studies have shown accelerated evolution in retained paralogs. However, little is known in general about the impact of this accelerated evolution on the molecular functions of retained paralogs. For example, do new functions typically involve changes in enzymatic activities, or changes in protein regulation? Here we study the evolution of posttranslational regulation by examining the evolution of important regulatory sequences (short linear motifs) in retained duplicates created by the whole-genome duplication in budding yeast. To do so, we identified short linear motifs whose evolutionary constraint has relaxed after gene duplication with a likelihood-ratio test that can account for heterogeneity in the evolutionary process by using a non-central chi-squared null distribution. We find that short linear motifs are more likely to show changes in evolutionary constraints in retained duplicates compared to single-copy genes. We examine changes in constraints on known regulatory sequences and show that for the Rck1/Rck2, Fkh1/Fkh2, Ace2/Swi5 paralogs, they are associated with previously characterized differences in posttranslational regulation. Finally, we experimentally confirm our prediction that for the Ace2/Swi5 paralogs, Cbk1-regulated localization was lost along the lineage leading to SWI5 after gene duplication. Our analysis suggests that changes in posttranslational regulation mediated by short regulatory motifs systematically contribute to functional divergence after gene duplication.
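The key statistical step is a likelihood-ratio test whose null is a non-central rather than a central chi-squared distribution. A minimal sketch, assuming one degree of freedom and a non-centrality `nc` that is already known (the abstract gives neither the degrees of freedom nor how `nc` is estimated, so both are assumptions, and the function names are hypothetical). With one degree of freedom the tail has a closed form, because such a variable is (Z + √nc)² for a standard normal Z.

```python
import math

def _norm_sf(z: float) -> float:
    """Upper-tail probability of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def ncx2_sf_1df(x: float, nc: float) -> float:
    """P(X > x) for X ~ non-central chi-squared with 1 df, non-centrality nc >= 0.

    X = (Z + sqrt(nc))**2, so P(X > x) = P(Z > r - mu) + P(Z < -r - mu).
    """
    if x <= 0.0:
        return 1.0
    r, mu = math.sqrt(x), math.sqrt(nc)
    return _norm_sf(r - mu) + _norm_sf(r + mu)

def lrt_pvalue(loglik_null: float, loglik_alt: float, nc: float = 0.0) -> float:
    """p-value of the statistic 2 * (lnL_alt - lnL_null) under the assumed null."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return ncx2_sf_1df(stat, nc)

# Hypothetical log-likelihoods for one motif: constrained (null) vs relaxed (alternative).
print(lrt_pvalue(-1234.6, -1230.1, nc=0.0))
print(lrt_pvalue(-1234.6, -1230.1, nc=2.0))
```

Setting `nc = 0` recovers the usual central chi-squared test; a positive `nc` shifts the null to the right, so the same statistic gives a larger (more conservative) p-value, which is how heterogeneity is absorbed in this sketch.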

4.
The present study is a detailed computer-assisted analysis of the feline leukemia virus gag gene nucleotide sequence together with its flanking sequences (ST-FeLV GAG), which is compared with the aligned sectors of the Moloney strain of murine leukemia virus (Mo-MuLV GAG) and of three strains of feline sarcoma virus. It shows that perfectly matched repeated oligomers up to 13 nucleotides long are overrepresented and scattered throughout both ST-FeLV GAG and Mo-MuLV GAG, in noncoding and coding sectors, with no stringent correlation to codon usage in ST-FeLV gPr80gag. Many repeated oligomers share a core consensus that is intriguingly part of the inverted repeat at the termini of the long terminal repeat. Local scrambled repetitions of nucleotide subsequences have been found; they suggest a model of molecular evolution by slippage-like mechanisms. Thus, viral genomes could be subject to the same evolutionary mechanisms that are now known to be operating extensively in eukaryotic genomes. The data are discussed in light of putative patterns of molecular evolution.
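Finding "perfectly matched repeated oligomers up to 13 nucleotides long" amounts to indexing every k-mer by position and keeping those seen more than once. A minimal sketch (hypothetical function names; overlapping occurrences are counted, which is an assumption about how repeats are defined):

```python
from collections import defaultdict

def repeated_oligomers(seq, k):
    """Map each k-mer occurring at least twice in seq to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}

def longest_repeat_length(seq, max_k=13):
    """Largest k <= max_k for which some oligomer is repeated exactly (0 if none)."""
    for k in range(max_k, 0, -1):
        if repeated_oligomers(seq, k):
            return k
    return 0

print(repeated_oligomers("ACGTACGT", 4))  # {'ACGT': [0, 4]}
print(longest_repeat_length("ACGTACGT"))  # 4
```

For whole genomes a suffix array would avoid re-scanning the sequence for every k; the dictionary version is kept for readability.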

5.
Due to the low complexity associated with their sequences, uncovering the evolutionary and functional relationships in highly repetitive proteins such as elastin, spider silks, resilin and abductin represents a significant challenge. Using the polymeric extracellular protein elastin as a model system, we present a novel computational approach to the study of sequence, function and evolutionary relationships in repetitive proteins. To address the absence of accurate sequence annotation for repetitive proteins such as elastin, we have constructed a new database repository, ElastoDB (http://theileria.ccb.sickkids.ca/elastin), dedicated to the storage and retrieval of elastin sequence- and meta-data. To analyse their sequence relationships we have devised a new method based on the identification of overrepresented 'fuzzy' motifs. Applying this method to elastin sequences derived from mammals, chicken, Xenopus and zebrafish resulted in the identification of both highly conserved and taxon- and species-specific motifs that likely represent important functional and/or structural elements. The relative spacing and organization of these elements suggest that exon duplication events have played an important role in the evolution of elastin. Clustering of similarity profiles generated for sets of exons and introns revealed a pattern of putative duplication events involving exons 15-30 in mammalian and chicken elastins, exons 20-31 in both zebrafish elastins, exons 15-20 in fugu elastin and exons 35-50 in Xenopus elastin 1. The success of this approach for elastin offers a promising route to the elucidation of sequence, structure, function and evolutionary relationships for many other proteins with sequences of low complexity.
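The abstract does not define a "fuzzy" motif. As one possible reading, the sketch below treats a fuzzy match as an occurrence with at most `max_mismatches` substitutions (Hamming distance); this is an assumption, not the published method, and the function names and the example pentapeptide VPGVG are illustrative only.

```python
def fuzzy_matches(seq, motif, max_mismatches=1):
    """Start positions where motif matches seq with at most max_mismatches substitutions."""
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits

def fuzzy_count_table(seqs, motif, max_mismatches=1):
    """Number of fuzzy occurrences of motif in each named sequence."""
    return {name: len(fuzzy_matches(s, motif, max_mismatches)) for name, s in seqs.items()}

print(fuzzy_matches("VPGVGVPGAGVPGVG", "VPGVG", 1))  # [0, 5, 10]
print(fuzzy_matches("VPGVGVPGAGVPGVG", "VPGVG", 0))  # [0, 10]
```

Comparing such counts across taxa (one row per species, one column per motif) is one simple way to separate conserved motifs from taxon-specific ones.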

6.
Eukaryotic genomes contain many endogenous retroviral sequences (ERVs). ERVs are often severely mutated and therefore difficult to detect. A platform-independent (Java) program package, RetroTector (ReTe), was constructed. It has three basic modules: (i) detection of candidate long terminal repeats (LTRs), (ii) detection of chains of conserved retroviral motifs fulfilling distance constraints and (iii) attempted reconstruction of original retroviral protein sequences, combining alignment, codon statistics and properties of protein ends. Other features are prediction of additional open reading frames, automated database collection, graphical presentation and automatic classification. ReTe favors elements longer than 1000 bp because it depends on the order of and distances between retroviral fragments. It detects single or low-copy-number elements. ReTe assigned a 'retroviral' score of 890-2827 to 10 exogenous retroviruses from seven genera, and accurately predicted their genes. In a simulated model, ReTe was robust against mutational decay. The human genome was analyzed in 1-2 days on a LINUX cluster. Retroviral sequences were detected in divergent vertebrate genomes. Most ReTe-detected chains coincided with RepeatMasker output and the HERVd database. ReTe did not report most of the evolutionarily old HERV-L-related and MaLR sequences, and is not yet tailored for single LTR detection. Nevertheless, ReTe rationally detects and annotates many retroviral sequences.
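Module (ii), "chains of conserved retroviral motifs fulfilling distance constraints", can be illustrated with a standard chaining dynamic program. This is a minimal sketch, not RetroTector's code: it assumes motif hits are given as `(motif_index, position, score)`, that motifs have a known expected order and hypothetical expected offsets along the element, and that spacing may deviate by at most `tolerance` bases.

```python
from typing import List, Tuple

# A hit is (motif_index, position, score); motif_index follows the expected 5'->3' order.
Hit = Tuple[int, int, float]

def best_chain(hits: List[Hit], expected_offset: List[int], tolerance: int):
    """Highest-scoring chain of hits whose order and spacing fit expected_offset.

    Hit a can precede hit b if b's motif comes later in the expected order and
    their distance is within `tolerance` of the expected distance.
    """
    hits = sorted(hits, key=lambda h: h[1])
    n = len(hits)
    if n == 0:
        return 0.0, []
    best = [h[2] for h in hits]  # best chain score ending at each hit
    prev = [-1] * n
    for j in range(n):
        mj, pj, sj = hits[j]
        for i in range(j):
            mi, pi, _ = hits[i]
            if mi >= mj:
                continue  # wrong motif order
            expected = expected_offset[mj] - expected_offset[mi]
            if abs((pj - pi) - expected) <= tolerance and best[i] + sj > best[j]:
                best[j] = best[i] + sj
                prev[j] = i
    end = max(range(n), key=best.__getitem__)
    total = best[end]
    chain = []
    while end != -1:
        chain.append(hits[end])
        end = prev[end]
    return total, chain[::-1]

hits = [(0, 100, 1.0), (1, 1150, 2.0), (2, 2600, 1.5), (3, 4050, 1.0), (1, 3000, 5.0)]
print(best_chain(hits, expected_offset=[0, 1000, 2500, 4000], tolerance=200))
```

Here the isolated high-scoring hit at position 3000 loses to the four weaker hits whose spacing is consistent, which is the point of chaining under distance constraints.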

7.
Following the original idea of Maynard Smith on evolution of the protein sequence space, a novel tool is developed that allows the "space walk", from one sequence to its likely evolutionary relative and further on. At a given identity threshold between consecutive steps, walks of many steps are possible. The sequences at the ends of the walks may differ substantially from one another. In a sequence space of randomized (shuffled) sequences the walks are very short. The approach opens new perspectives for protein evolutionary studies and sequence annotation.
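A "space walk" can be pictured as a breadth-first search in which each step moves to a sequence whose identity with the current one is at least a threshold. A minimal sketch, assuming the sequences are already aligned, gap-free and of equal length (a real tool would compute identity from pairwise alignments); the function names are hypothetical.

```python
from collections import deque

def identity(a, b):
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def space_walk(seqs, start, threshold):
    """Breadth-first walk from seqs[start]; each step needs identity >= threshold.

    Returns {index: number of steps from start} for every reachable sequence.
    """
    steps = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, s in enumerate(seqs):
            if j not in steps and identity(seqs[i], s) >= threshold:
                steps[j] = steps[i] + 1
                queue.append(j)
    return steps

seqs = ["AAAAAAAAAA", "AAAAAAAABB", "AAAAAABBBB", "AAAABBBBBB", "BBBBBBBBBB"]
print(space_walk(seqs, 0, 0.8))    # {0: 0, 1: 1, 2: 2, 3: 3}
print(identity(seqs[0], seqs[3]))  # 0.4
```

Every step stays at 80% identity, yet the two ends of the walk share only 40%, which mirrors the observation that walk endpoints may differ substantially.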

8.
9.
In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence order independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.
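The branch-and-bound principle can be shown on a toy version of the problem: choose one window of width `width` per sequence to maximize the sum-of-pairs match count. The sketch below uses only a simple score-based bound; it does not implement the paper's consistency bound, order-independent search tree or redundancy elimination, and all names are hypothetical. It assumes every sequence is at least `width` long.

```python
from itertools import combinations, product

def matches(a, b):
    return sum(x == y for x, y in zip(a, b))

def sp_score(windows):
    """Sum-of-pairs score: identical positions over all pairs of windows."""
    return sum(matches(a, b) for a, b in combinations(windows, 2))

def bb_motif_alignment(seqs, width):
    """Exact gapless motif alignment by depth-first branch and bound."""
    cands = [[s[o:o + width] for o in range(len(s) - width + 1)] for s in seqs]
    n = len(seqs)
    best = {"score": -1, "windows": None}

    def bound(chosen, score):
        # Optimistic completion: each remaining sequence gets its best possible
        # score against the chosen windows, and every remaining pair matches fully.
        depth = len(chosen)
        remaining = n - depth
        extra = sum(max(sum(matches(c, w) for w in chosen) for c in cands[k])
                    for k in range(depth, n))
        return score + extra + width * remaining * (remaining - 1) // 2

    def dfs(chosen, score):
        if len(chosen) == n:
            if score > best["score"]:
                best["score"], best["windows"] = score, list(chosen)
            return
        if bound(chosen, score) <= best["score"]:
            return  # this subtree cannot beat the incumbent
        # Try the most promising windows first so the bound tightens early.
        options = sorted(((sum(matches(c, w) for w in chosen), c)
                          for c in cands[len(chosen)]), reverse=True)
        for gain, c in options:
            chosen.append(c)
            dfs(chosen, score + gain)
            chosen.pop()

    dfs([], 0)
    return best["score"], best["windows"]

def brute_force_motif_alignment(seqs, width):
    """Exhaustive reference solution, feasible only for tiny inputs."""
    cands = [[s[o:o + width] for o in range(len(s) - width + 1)] for s in seqs]
    winner = max(product(*cands), key=sp_score)
    return sp_score(winner), list(winner)

seqs = ["TTGACAAT", "CTTGACAG", "GATTGACA", "ATGACGTT"]
print(bb_motif_alignment(seqs, 5))
```

The bound is valid because, for any completion, pairs among the unplaced sequences can contribute at most `width` each, and each unplaced sequence can contribute no more than its best window against the windows already chosen.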

10.
MOTIVATION: Comparative sequence analysis is the essence of many approaches to genome annotation. Heuristic alignment algorithms utilize similar seed pairs to anchor an alignment. Some applications of local alignment algorithms (e.g. phylogenetic footprinting) would benefit from including prior knowledge (e.g. binding site motifs) in the alignment building process. RESULTS: We introduce predefined sequence patterns as anchor points into a heuristic local alignment strategy. We extended the BLASTZ program for this purpose. A set of seed patterns is either given as consensus sequences in IUPAC code or position-weight-matrices. Phylogenetic footprinting of promoter regions is one of many potential applications for the SITEBLAST software. AVAILABILITY: The source code is freely available to the academic community from http://corg.molgen.mpg.de/software
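Using a consensus in IUPAC code as a seed amounts to translating the degenerate symbols into character classes and pairing the hits found in the two sequences. A minimal sketch of that anchoring step only (forward strand, no position-weight matrices, no alignment extension); it is not SITEBLAST or BLASTZ code, and the function names and the TATAWA example are illustrative.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    return "".join(IUPAC[c] for c in consensus.upper())

def find_sites(seq, consensus):
    """Start positions of consensus in seq (forward strand, overlaps allowed)."""
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]

def anchor_pairs(seq1, seq2, consensus):
    """All (pos1, pos2) pairs where both sequences contain the pattern; each pair
    is a candidate anchor from which a local alignment could be extended."""
    return [(i, j) for i in find_sites(seq1, consensus) for j in find_sites(seq2, consensus)]

print(iupac_to_regex("TATAWA"))                                # TATA[AT]A
print(anchor_pairs("GGTATAAAGG", "CCTATATACC", "TATAWA"))      # [(2, 2)]
```

The zero-width lookahead `(?=...)` lets overlapping sites be reported, which matters for low-complexity motifs such as TATA boxes.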

11.
12.
We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of the substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is the aim, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.

13.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide- and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
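For case (1), a dinucleotide-preserving permutation can be generated as a random Eulerian walk: each adjacent pair in the sequence is an edge of a directed multigraph on the four bases. The sketch below is one standard construction (a random "last exit" edge is drawn for every vertex except the final base, accepted only if those edges form a tree pointing to the final base, and all other edges are shuffled); it illustrates the technique and is not necessarily the paper's exact procedure. Function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def dinucleotide_counts(seq):
    return Counter(zip(seq, seq[1:]))

def _reaches(v, target, last_edge):
    """True if following last-exit edges from v arrives at target without a cycle."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True

def dinucleotide_shuffle(seq, rng=None):
    """Random permutation of seq with the same dinucleotide counts and end bases."""
    rng = rng or random.Random()
    if len(seq) <= 2:
        return seq
    first, last = seq[0], seq[-1]
    edges = defaultdict(list)  # multigraph: one edge a -> b per adjacent pair
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    vertices = set(seq)
    while True:
        # Random final exit for every vertex except the one the walk must end on.
        last_edge = {v: rng.choice(edges[v]) for v in vertices if v != last}
        if all(_reaches(v, last, last_edge) for v in last_edge):
            break  # last exits form a tree rooted at `last`
    order = {}
    for v, outs in edges.items():
        outs = outs[:]
        if v in last_edge:
            outs.remove(last_edge[v])
            rng.shuffle(outs)
            outs.append(last_edge[v])  # the chosen last exit is used last
        else:
            rng.shuffle(outs)
        order[v] = outs
    # Walk the multigraph, consuming each vertex's edges in the prepared order.
    result, used, cur = [first], defaultdict(int), first
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        result.append(nxt)
        cur = nxt
    return "".join(result)

s = "ACGTTGCAACGGTACCTTAGGCA"
print(dinucleotide_shuffle(s, random.Random(1)))
```

Forcing the last exits to form a tree pointing at the final base is what guarantees the walk never strands unused edges, so every accepted draw yields a valid permutation.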

14.
A locus harboring a human endogenous retroviral LTR (long terminal repeat) was mapped on the short arm of human chromosome 7 (7p22), and its evolutionary history was investigated. Sequences of two human genome fragments that were homologous to the LTR-flanking sequences were found in human genome databases: (1) an LTR-containing DNA fragment from region 3p13 of the human genome, which includes clusters of olfactory receptor genes and pseudogenes; and (2) a fragment of region 21q22.1 lacking LTR sequences. PCR analysis demonstrated that LTRs with highly homologous flanking sequences could be found in the genomes of human, chimp, gorilla, and orangutan, but were absent from the genomes of gibbon and New World monkeys. A PCR assay with a primer set corresponding to the sequence from human Chr 3 allowed us to detect LTR-containing paralogous sequences on human chromosomes 3, 4, 7, and 11. The divergence times for the LTR-flanking sequences on chromosomes 3 and 7, and the paralogous sequence on chromosome 21, were evaluated and used to reconstruct the order of duplication events and retroviral insertions. (1) An initial duplication event, which occurred 14-17 Mya, before LTR insertion, produced two loci: one corresponding to that located on Chr 21, and a second that was the ancestor of the loci on chromosomes 3 and 7. (2) Insertion of the LTR (most probably as a provirus) into this ancestral locus took place 13 Mya. (3) Duplication of the LTR-containing ancestral locus occurred 11 Mya, forming the paralogous modern loci on Chr 3 and 7.
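Divergence times such as "14-17 Mya" are typically obtained from a molecular clock: a corrected distance d between paralogous copies, divided by twice the substitution rate, T = d / (2r). A minimal sketch using the Jukes-Cantor correction; the abstract does not state which distance model or rate was used, so both the model and the rate below (an illustrative 0.001 substitutions per site per million years) are assumptions, and the function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site; requires p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def divergence_time_myr(a, b, rate_per_site_per_myr):
    """Time since two copies split, T = d / (2r), in millions of years."""
    d = jukes_cantor(p_distance(a, b))
    return d / (2.0 * rate_per_site_per_myr)

a, b = "ACGTACGTAC", "ACGTACGTAA"
print(p_distance(a, b), round(jukes_cantor(0.1), 4))  # 0.1 0.1073
print(round(divergence_time_myr(a, b, 0.001), 1))      # 53.7
```

The factor of 2 reflects that both copies accumulate substitutions independently after the duplication.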

15.
Simple sequences     
Simple sequences (or microsatellites) are stretches of monotonous repetitions of short (1–5 bp) nucleotide motifs that are distributed across the whole genome in eukaryotes. They are probably generated by slippage during replication and their primary mutation rate seems to be controlled predominantly by the efficiency of the mismatch repair system. Although most mutations in simple sequence loci appear to be neutral, some mutations in particular stretches have been implicated as having a role in human genetic diseases.
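Perfect simple sequences of 1–5 bp units can be located with a regular expression that captures a short unit and requires it to repeat. A minimal sketch; the minimum copy number (4) and the function name are hypothetical choices, and imperfect (interrupted) repeats are not detected.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=5, min_copies=4):
    """(start, unit, copies) for perfect tandem repeats of min_unit..max_unit bp units."""
    seq = seq.upper()
    # Lazy {min,max}? tries the shortest unit first; \1 requires exact repeats of it.
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

print(find_microsatellites("GGCACACACACAGTT"))   # [(2, 'CA', 5)]
print(find_microsatellites("TTCAGCAGCAGCAGTT"))  # [(2, 'CAG', 4)]
```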

16.
Rapoport AE, Trifonov EN. Gene, 2011, 488(1-2): 41-45.
Linguistic (word count) analysis of prokaryotic genome sequences, by Shannon N-gram extension, reveals that the dominant hidden motifs in A+T rich genomes are T(A)(T)A and G(A)(T)C with uncertain number of repeating A and T. Since prokaryotic sequences are largely protein-coding, the motifs would correspond to amphipathic alpha-helices with alternating lysine and phenylalanine as preferential polar and non-polar residues. The motifs are also known in eukaryotes, as nucleosome positioning patterns. Their existence in prokaryotes as well may serve for binding of histone-like proteins to DNA. In this case the above patterns in prokaryotes may be considered as "anticipated" nucleosome positioning patterns which, quite likely, existed in prokaryotic genomes before the evolutionary separation between eukaryotes and prokaryotes.
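Counting the motifs T(A)(T)A and G(A)(T)C with "uncertain number of repeating A and T" can be sketched with regular expressions. This assumes the notation means: a T, a run of one or more A's, a run of one or more T's, then an A (and likewise G…C); the original typography may have carried superscripts that were lost, and this is not the N-gram extension procedure itself. Names are hypothetical.

```python
import re

# Assumed reading: T, run of A's, run of T's, A (run lengths left open); G...C likewise.
MOTIFS = {"T(A)(T)A": re.compile(r"TA+T+A"), "G(A)(T)C": re.compile(r"GA+T+C")}

def count_motifs(seq):
    """Non-overlapping occurrences of each motif in seq."""
    seq = seq.upper()
    return {name: len(pat.findall(seq)) for name, pat in MOTIFS.items()}

def motif_density_per_kb(seq):
    """Occurrences per 1000 bases, for comparing genomes of different sizes."""
    return {name: 1000.0 * c / len(seq) for name, c in count_motifs(seq).items()}

print(count_motifs("GGTAATTAGGGAATCCTATA"))  # {'T(A)(T)A': 2, 'G(A)(T)C': 1}
```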

17.
In the class of repeated sequences that occur in DNA, minisatellites have been found to be polymorphic and have become useful tools in genetic mapping and forensic studies. They consist of a heterogeneous tandem array of a short repeat unit. The slightly different units along the array are called variants. Minisatellites evolve mainly through tandem duplications and tandem deletions of variants. Jeffreys et al. (1997) devised a method to obtain the sequence of variants along the array in a digital code and called such sequences maps. Minisatellite maps give access to the detail of mutation processes at work on such loci. In this paper, we design an algorithm to compare two maps under an evolutionary model that includes deletion, insertion, mutation, tandem duplication, and tandem deletion of a variant. Our method computes an optimal alignment in reasonable time, and the alignment score, i.e., the weighted sum of its elementary operations, is a distance metric between maps. The main difficulty is that the optimal sequence of operations depends on the order in which they are applied to the map. Taking the maps of the minisatellite MSY1 of 609 men, we computed all pairwise distances and reconstructed an evolutionary tree of these individuals. MSY1 (DYF155S1) is a hypervariable locus on the Y chromosome. In our tree, the populations of some haplogroups are monophyletic, showing that one can decipher a microevolutionary signal using minisatellite maps comparison.
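The flavor of comparing maps with tandem operations can be conveyed by a weighted edit distance in which inserting or deleting a variant identical to its left neighbour (a tandem duplication or deletion) is cheaper than an ordinary indel. This is a deliberately simplified stand-in, not the paper's algorithm: it ignores the order-dependence of operations that the abstract identifies as the main difficulty, and the costs and names are hypothetical.

```python
def map_distance(x, y, sub=1.0, indel=1.0, dup=0.5):
    """Weighted edit distance between two minisatellite maps (strings of variant codes).

    Deleting x[i-1] or inserting y[j-1] costs `dup` when that variant equals its
    left neighbour in the same map (a tandem event), otherwise `indel`.
    """
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + (dup if i > 1 and x[i - 1] == x[i - 2] else indel)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + (dup if j > 1 and y[j - 1] == y[j - 2] else indel)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            del_cost = dup if i > 1 and x[i - 1] == x[i - 2] else indel
            ins_cost = dup if j > 1 and y[j - 1] == y[j - 2] else indel
            D[i][j] = min(
                D[i - 1][j - 1] + (0.0 if x[i - 1] == y[j - 1] else sub),  # match / mutation
                D[i - 1][j] + del_cost,                                     # (tandem) deletion
                D[i][j - 1] + ins_cost,                                     # (tandem) insertion
            )
    return D[n][m]

print(map_distance("abc", "abbc"))  # 0.5  (tandem duplication of variant b)
print(map_distance("abc", "abxc"))  # 1.0  (ordinary insertion of variant x)
```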

18.
19.
Comparative studies of vertebrate gene promoter regions seldom detect gross rearrangements ('promoter shuffling') since such analyses usually employ relatively similar DNA sequences. Conversely, attempts to compare evolutionarily more divergent promoter sequences have been largely unsuccessful owing to the inability of conventional alignment procedures to deal with gross rearrangements. These limitations have been circumvented in the present study by using the novel technique of complexity analysis to identify modular components ('blocks') in the growth hormone (GH) gene promoter sequences of some 22 vertebrate species, from salmon to human. Significant rearrangement of blocks was found to have occurred, indicating that they have evolved as independent units. Some blocks appear to be ubiquitous, whereas others are restricted to a specific taxon. Considerable variation between orthologous GH gene promoters was apparent in terms of block length, copy number and relative location. It may be inferred that a wide variety of different mutational mechanisms have operated upon the GH gene promoter over evolutionary time. These include gross changes such as deletion, duplication, amplification, elongation, contraction, transposition, inversion and fusion, as well as the slow, steady accumulation of single base-pair substitutions. Thus the patchwork structure of the modular GH promoter region, and those of its paralogous GH2 and prolactin (PRL) counterparts, have continually been shuffled into new combinations through the rearrangement of pre-existing blocks. Although some of these changes may have had no influence on promoter function, others could have served to alter either the level of gene expression or the responsiveness of the promoter to external stimuli.

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417