Similar literature
Found 20 similar articles (search time: 15 ms)
1.

Background  

Human subtelomeric segmental duplications ('subtelomeric repeats') comprise about 25% of the most distal 500 kb and 80% of the most distal 100 kb in human DNA. A systematic analysis of the duplication substructure of human subtelomeric regions was done in order to develop a detailed understanding of subtelomeric sequence organization and a nucleotide sequence-level characterization of subtelomeric duplicon families.

2.
An estimated 5% of the human genome consists of interspersed duplications that have arisen over the past 35 million years of evolution. Two categories of such recently duplicated segments can be distinguished: segmental duplications between nonhomologous chromosomes (transchromosomal duplications) and duplications mainly restricted to a particular chromosome (chromosome-specific duplications). Many of these duplications exhibit an extraordinarily high degree of sequence identity at the nucleotide level (>95%) and span large genomic distances (1-100 kb). Preliminary analyses indicate that these same regions are targets for rapid evolutionary turnover among the genomes of closely related primates. The dynamic nature of these regions because of recurrent chromosomal rearrangement, and their ability to create fusion genes from juxtaposed cassettes suggest that duplicative transposition was an important force in the evolution of our genome.

3.
An unexpected finding of the human genome was the large fraction of the genome organized as blocks of interspersed duplicated sequence. We provide a comparative and phylogenetic analysis of a highly duplicated region of 16p12.2, which is composed of at least four different segmental duplications spanning in excess of 160 kb. We contrast the dispersal of two different segmental duplications (LCR16a and LCR16u). LCR16a, a 20 kb low-copy repeat sequence A from chromosome 16, was shown previously to contain a rapidly evolving novel hominoid gene family (morpheus) that had expanded within the last 10 million years of great ape/human evolution. We compare the dispersal of this genomic segment with a second adjacent duplication called LCR16u. The duplication contains a second putative gene family (KIAA0220/SMG1) that is represented approximately eight times within the human genome. A high degree of sequence identity (approximately 98%) was observed among the various copies of LCR16u. Comparative analyses with Old World monkey species show that LCR16a and LCR16u originated from two distinct ancestral loci. Within the human genome, at least 70% of the LCR16u copies were duplicated in concert with the LCR16a duplication. In contrast, only 30% of the chimpanzee loci show an association between LCR16a and LCR16u duplications. The data suggest that the two copies of genomic sequence were brought together during the chimpanzee/human divergence and were subsequently duplicated as a larger cassette specifically within the human lineage. The evolutionary history of these two chromosome-specific duplications supports a model of rapid expansion and evolutionary turnover among the genomes of man and the great apes.

4.
Standard methods of DNA sequence analysis assume that sequences evolve independently, yet this assumption may not be appropriate for segmental duplications that exchange variants via interlocus gene conversion (IGC). Here, we use high quality multiple sequence alignments from well-annotated segmental duplications to systematically identify IGC signals in the human reference genome. Our analysis combines two complementary methods: (i) a paralog quartet method that uses DNA sequence simulations to identify a statistical excess of sites consistent with inter-paralog exchange, and (ii) the alignment-based method implemented in the GENECONV program. One-quarter (25.4%) of the paralog families in our analysis harbor clear IGC signals by the quartet approach. Using GENECONV, we identify 1477 gene conversion tracks that cumulatively span 1.54 Mb of the genome. Our analyses confirm the previously reported high rates of IGC in subtelomeric regions and Y-chromosome palindromes, and identify multiple novel IGC hotspots, including the pregnancy specific glycoproteins and the neuroblastoma breakpoint gene families. Although the duplication history of a paralog family is described by a single tree, we show that IGC has introduced incredible site-to-site variation in the evolutionary relationships among paralogs in the human genome. Our findings indicate that IGC has left significant footprints in patterns of sequence diversity across segmental duplications in the human genome, out-pacing the contributions of single base mutation by orders of magnitude. Collectively, the IGC signals we report comprise a catalog that will provide a critical reference for interpreting observed patterns of DNA sequence variation across duplicated genomic regions, including targets of recent adaptive evolution in humans.

5.
6.
Nuclear integrations of mitochondrial DNA (numts) are widespread among eukaryotes, although their prevalence differs greatly among taxa. Most knowledge of numt evolution comes from analyses of whole-genome sequences of single species or, more recently, from genomic comparisons across vast phylogenetic distances. Here we employ a comparative approach using human and chimpanzee genome sequence data to infer differences in the patterns and processes underlying numt integrations. We identified 66 numts that have integrated into the chimpanzee nuclear genome since the human–chimp divergence, which is significantly greater than the 37 numts observed in humans. By comparing these closely related species, we accurately reconstructed the preintegration target site sequence and deduced nucleotide changes associated with numt integration. From >100 species-specific numts, we quantified the frequency of small insertions, deletions, duplications, and instances of microhomology. Most human and chimpanzee numt integrations were accompanied by microhomology and short indels of the kind typically observed in the nonhomologous end-joining pathway of DNA double-strand break repair. Human-specific numts have integrated into regions with a significant deficit of transposable elements; however, the same was not seen in chimpanzees. From a separate data set, we also found evidence for an apparent increase in the rate of numt insertions in the last common ancestor of humans and the great apes using a polymerase chain reaction–based screen. Last, phylogenetic analyses indicate that mitochondrial-numt alignments must be at least 500 bp, and preferably >1 kb in length, to accurately reconstruct hominoid phylogeny and recover the correct point of numt insertion.

7.
Vertebrate plasma apolipoproteins and a small number of lipases involved in their metabolism show sequence homologies and are grouped into gene families. The four-exon apolipoprotein gene family includes nine human genes; the divergence rate of their sequences places the first ancestral gene very deep in the phylogenetic tree. However, a more recent duplication of the apolipoprotein C-I gene, dating from 40 million years ago, may be a phylogenetic marker for the radiation of monkeys. Pancreatic lipase (PL) and its isoforms, lipoprotein lipase (LPL), and hepatic triacylglycerol lipase (HL) form, by their homologies, a "superfamily" of genes, which also includes the yolk proteins of dipteran eggs. Sequence homologies of PL, LPL and HL are analysed and compared using multiple alignments of amino acids and nucleotides on spreadsheets. From these comparisons we may characterize four classes of phylogenetic markers: 1) repetitive DNA sequences (Alu, B1, PRE-1) that appeared during mammalian evolution, 2) short insertions or deletions (within the N-terminal domain) and a gene conversion in the guinea-pig lineage, 3) a progressive reduction in intron number during lipase evolution, and 4) several gene duplications that produced the five genes of this superfamily currently known in the human genome.

8.
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.

9.
Neurotrophins are structurally related proteins regulating brain development and function. Molecular evolution studies of neurotrophins and their receptors are essential for understanding the mechanisms underlying the coevolution of these gene families and how they correlate with the increased complexity of the vertebrate nervous system. In order to improve our current knowledge of the molecular evolution of neurotrophins and receptors, we have collected all information available in the literature and analyzed the genome database for each of them. Statistical analysis of amino acid and nucleotide sequences of the neurotrophin and Trk family genes was applied to both complete genes and mature sequences, and different phylogenetic methods were used to compare amino acid and nucleotide sequence variability among the different species. All collected data favor a model in which several rounds of genome duplications might have facilitated the generation of the many different neurotrophins and the acquisition of specific different functions correlated with the increased complexity of the vertebrate nervous system during evolution. We report findings that refine the structure of the evolutionary trees for the neurotrophin and Trk receptor families, indicate different rates of evolution for each member of the two families, and newly demonstrate that the NGF-like genes found in Fowlpox and Canarypox viruses are closely related to reptile NGF.

10.
Yu Z  Wright SI  Bureau TE 《Genetics》2000,156(4):2019-2031
While genome-wide surveys of abundance and diversity of mobile elements have been conducted for some class I transposable element families, little is known about the nature of class II transposable elements on this scale. In this report, we present the results from analysis of the sequence and structural diversity of Mutator-like elements (MULEs) in the genome of Arabidopsis thaliana (Columbia). Sequence similarity searches and subsequent characterization suggest that MULEs exhibit extreme structure, sequence, and size heterogeneity. Multiple alignments at the nucleotide and amino acid levels reveal conserved, potentially transposition-related sequence motifs. While many MULEs share common structural features with Mu elements in maize, some groups lack characteristic long terminal inverted repeats. High sequence similarity and phylogenetic analyses based on nucleotide sequence alignments indicate that many of these elements with diverse structural features may remain transpositionally competent and that multiple MULE lineages may have been evolving independently over long time scales. Finally, there is evidence that MULEs are capable of the acquisition of host DNA segments, which may have implications for adaptive evolution, both at the element and host levels.

11.
12.
Recent segmental and gene duplications in the mouse genome   Total citations: 2 (self-citations: 0, citations by others: 2)

Background

The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies.
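The two thresholds quoted above (at least 5 kb long, at least 90% identical) can be illustrated with a small filter over BLAST tabular output. This is only a sketch under stated assumptions, not the authors' pipeline: it assumes the standard 12-column tabular format (`-outfmt 6`), and the names `Hit`, `parse_hit`, `is_self_hit` and `candidate_duplications` are hypothetical helpers introduced here for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One row of BLAST tabular output (standard 12-column -outfmt 6)."""
    query: str
    subject: str
    pident: float   # percent identity of the alignment
    length: int     # alignment length in bp
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_hit(line: str) -> Hit:
    # Columns: qseqid sseqid pident length mismatch gapopen
    #          qstart qend sstart send evalue bitscore
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[3]),
               int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def is_self_hit(h: Hit) -> bool:
    # A sequence aligned to its own coordinates is not a duplication.
    same_span = sorted((h.qstart, h.qend)) == sorted((h.sstart, h.send))
    return h.query == h.subject and same_span


def candidate_duplications(lines: Iterable[str],
                           min_length: int = 5_000,
                           min_identity: float = 90.0) -> Iterator[Hit]:
    """Yield hits that meet the length and identity thresholds."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        h = parse_hit(line)
        if is_self_hit(h):
            continue
        if h.length >= min_length and h.pident >= min_identity:
            yield h
```

A real analysis would also need to merge overlapping hits and mask common repeats before alignment; those steps are outside this sketch.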

Results

We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.

Conclusion

Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.

13.
Relative to genomes of other sequenced organisms, the human genome appears particularly enriched for large, highly homologous segmental duplications (≥90% sequence identity and ≥10 kbp in length). The molecular basis for this enrichment is unknown. We sought to gain insight into the mechanism of origin, by systematically examining sequence features at the junctions of duplications. We analyzed 9,464 junctions within regions of high-quality finished sequence from a genomewide set of 2,366 duplication alignments. We observed a highly significant (P<.0001) enrichment of Alu short interspersed element (SINE) sequences near or within the junction. Twenty-seven percent of all segmental duplications terminated within an Alu repeat. The Alu junction enrichment was most pronounced for interspersed segmental duplications separated by ≥1 Mb of intervening sequence. Alu elements at the junctions showed higher levels of divergence, consistent with Alu-Alu-mediated recombination events. When we classified Alu elements into major subfamilies, younger elements (AluY and AluS) accounted for the enrichment, whereas the oldest primate family (AluJ) showed no enrichment. We propose that the primate-specific burst of Alu retroposition activity (which occurred 35-40 million years ago) sensitized the ancestral human genome for Alu-Alu-mediated recombination events, which, in turn, initiated the expansion of gene-rich segmental duplications and their subsequent role in nonallelic homologous recombination.
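The quantity behind "near or within the junction" can be illustrated as the fraction of junction coordinates that fall inside, or within a fixed window of, an annotated repeat interval. The sketch below is an assumption-laden illustration and not the authors' method: coordinates are taken to be on a single chromosome with inclusive interval ends, and `near_any` and `fraction_in_repeats` are hypothetical names. The junction count in the abstract is consistent with four junctions per pairwise alignment (2,366 × 4 = 9,464), presumably two ends for each of the two copies.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, same chromosome


def near_any(pos: int, intervals: Sequence[Interval], window: int = 0) -> bool:
    """True if pos lies inside an interval or within `window` bp of one."""
    return any(start - window <= pos <= end + window
               for start, end in intervals)


def fraction_in_repeats(junctions: Sequence[int],
                        intervals: Sequence[Interval],
                        window: int = 0) -> float:
    """Fraction of junction positions that are in or near a repeat."""
    if not junctions:
        return 0.0
    hits = sum(near_any(p, intervals, window) for p in junctions)
    return hits / len(junctions)


# Toy data: four junctions, two hypothetical Alu annotations.
junctions = [100, 250, 400, 900]
alus = [(90, 120), (380, 420)]
inside = fraction_in_repeats(junctions, alus)             # strictly within
nearby = fraction_in_repeats(junctions, alus, window=130)  # within 130 bp
```

Judging whether such a fraction is an enrichment requires comparing it with a null expectation, for example from randomly placed junctions; that comparison is not shown here.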

14.
15.
Reticulate, or non-bifurcating, evolution is now recognized as an important phenomenon shaping the histories of many organisms. It appears to be particularly common in plants, especially in ferns, which have relatively few barriers to intra- and interspecific hybridization. Reticulate evolutionary patterns have been recognized in many fern groups, though very few have been studied rigorously using modern molecular phylogenetic techniques in order to determine the causes of the reticulate patterns. In the current study, we examine patterns of branching and reticulate evolution in the genus Dryopteris, the woodferns. The North American members of this group have long been recognized as a classic example of reticulate evolution in plants, and we extend analysis of the genus to all 30 species in the New World, as well as numerous taxa from other regions. We employ sequence data from the plastid and nuclear genomes and use maximum parsimony (MP), maximum likelihood (ML), Bayesian inference (BI), and divergence time analyses to explore the relationships of New World Dryopteris to other regions and to reconstruct the timing and events which may have led to taxa displaying reticulate rather than strictly branching histories. We find evidence for reticulation among both the North and Central/South American groups of species, and our data support a classic hypothesis for reticulate evolution via allopolyploid speciation in the North American taxa, including an extinct diploid progenitor in this group. In the Central and South American species, we find evidence of extensive reticulation involving unknown ancestors from Asia, and we reject deep coalescent processes such as incomplete lineage sorting in favor of more recent intercontinental hybridization and chloroplast capture as an explanation for the origin of the Latin American reticulate taxa.

16.
Tillier ER  Biro L  Li G  Tillo D 《Proteins》2006,63(4):822-831
Approaches for the determination of interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.
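The earlier idea that the abstract builds on, scoring two protein families by the similarity of their evolutionary distance matrices, can be sketched as a Pearson correlation over the off-diagonal entries. This is not Codep, which works on site-level interdependencies between alignments; it is a minimal illustration that assumes both matrices are symmetric and list the same species in the same order. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with >= 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant distances")
    return cov / (sx * sy)


def matrix_similarity(d1: Matrix, d2: Matrix) -> float:
    """Correlation between the pairwise distances of two families."""
    return pearson(upper_triangle(d1), upper_triangle(d2))


# Toy example: receptor distances are exactly twice the ligand distances.
ligands = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
receptors = [[0, 2, 8], [2, 0, 6], [8, 6, 0]]
# The same receptor matrix with the first two sequences swapped.
receptors_swapped = [[0, 2, 6], [2, 0, 8], [6, 8, 0]]

r_matched = matrix_similarity(ligands, receptors)
r_swapped = matrix_similarity(ligands, receptors_swapped)
```

In the toy data the correctly paired ordering gives the higher correlation, which is the signal that ordering-based partner prediction tries to maximize.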

17.
18.
In phylogenetic inference, an evolutionary model describes the substitution processes along each edge of a phylogenetic tree. Misspecification of the model has important implications for the analysis of phylogenetic data. Conventionally, however, the selection of a suitable evolutionary model is based on heuristics or relies on the choice of an approximate input tree. We introduce a method for model Selection in Phylogenetics based on linear INvariants (SPIn), which uses recent insights on linear invariants to characterize a model of nucleotide evolution for phylogenetic mixtures on any number of components. Linear invariants are constraints among the joint probabilities of the bases in the operational taxonomic units that hold irrespective of the tree topologies appearing in the mixtures. SPIn therefore requires no input tree and is designed to deal with nonhomogeneous phylogenetic data consisting of multiple sequence alignments showing different patterns of evolution, for example, concatenated genes, exons, and/or introns. Here, we report on the results of the proposed method evaluated on multiple sequence alignments simulated under a variety of single-tree and mixture settings for both continuous- and discrete-time models. In the simulations, SPIn successfully recovers the underlying evolutionary model and is shown to perform better than existing approaches.

19.

Background  

Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates.
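A simple way to see the footprint of CpG hypermutability in a sequence is the observed/expected CpG ratio, which compares the CpG count with what independent C and G frequencies would predict. The sketch below is a descriptive statistic only and is not the NF or TF substitution model discussed in the abstract; `dinucleotide_counts` and `cpg_obs_exp` are hypothetical helper names, and the sequence is assumed to be a plain string of A, C, G and T.

```python
from collections import Counter
from typing import Dict


def dinucleotide_counts(seq: str) -> Dict[str, int]:
    """Count overlapping dinucleotides (tuples of length 2)."""
    s = seq.upper()
    return dict(Counter(s[i:i + 2] for i in range(len(s) - 1)))


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Values well below 1 indicate CpG depletion, as expected where
    CpG sites mutate faster than other dinucleotides.
    """
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # ratio undefined; report 0 by convention here
    n_cg = dinucleotide_counts(s).get("CG", 0)
    return n_cg * len(s) / (n_c * n_g)


counts = dinucleotide_counts("ACGCG")
ratio = cpg_obs_exp("ACGCG")
```

Treating each dinucleotide as an independent tuple, as `dinucleotide_counts` does, mirrors the tuple view of sequences described above, though a substitution model additionally needs aligned sequences and rate parameters.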

20.
Extensive intraspecific variation in the chloroplast trnL(UAA)-trnF(GAA) spacer of model plant Arabidopsis lyrata is caused by multiple copies of a tandemly repeated trnF pseudogene undergoing parallel independent changes in copy number. Linkage disequilibrium and secondary structure analyses indicate that the diversification of pseudogene copies is driven by complex processes of structurally mediated illegitimate recombination. Dispersed repeats sharing similar secondary structures interact, facilitating reciprocal exchange of structural motifs between copies via intramolecular and intermolecular recombinations, forming chimeric sequences and iterative expansion and contraction in pseudogene copy numbers. Widely held assumptions that chloroplast sequence evolution is simple and structural changes are informative are violated. Our findings have important implications for the use of this highly variable region in Brassicaceae studies. The reticulate evolution and nonindependent nucleotide substitution render the pseudogene inappropriate for standard phylogenetic reconstruction, but over short evolutionary timescales they may be useful for assessing gene flow, hybridization and introgression.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号