Similar articles
 Found 20 similar articles (search time: 27 ms)
1.
Garcia SP, Pinho AJ. PloS one, 2011, 6(12): e29344
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species.

2.
It is generally assumed that mitochondrial genomes are uniparentally transmitted, homoplasmic and nonrecombining. However, these assumptions draw largely from early studies on animal mitochondrial DNA (mtDNA). In this review, we show that plants, animals and fungi are all characterized by episodes of biparental inheritance, recombination among genetically distinct partners, and selfish elements within the mitochondrial genome, but that the extent of these phenomena may vary substantially across taxa. We argue that occasional biparental mitochondrial transmission may allow organisms to achieve the best of both worlds by facilitating mutational clearance but continuing to restrict the spread of selfish genetic elements. We also show that methodological biases and disproportionately allocated study effort are likely to have influenced current estimates of the extent of biparental inheritance, heteroplasmy and recombination in mitochondrial genomes from different taxa. Despite these complications, there do seem to be discernible similarities and differences in transmission dynamics and likelihood of recombination of mtDNA in plant, animal and fungal taxa that should provide an excellent opportunity for comparative investigation of the evolution of mitochondrial genome dynamics.

3.
MOTIVATION: Analysis of statistical properties of DNA sequences is important for evolutionary biology as well as for DNA probe and PCR technologies. These technologies, in turn, can be used for organism identification, which implies applications in the diagnosis of infectious diseases, environmental studies, etc. RESULTS: We present results of the correlation analysis of distributions of the presence/absence of short nucleotide subsequences of different lengths ('n-mers', n = 5-20) in more than 1500 microbial and virus genomes, together with five genomes of multicellular organisms (including human). We calculate whether a given n-mer is present or absent (frequency of presence) in a given genome, which is not the usually calculated number of appearances of n-mers in one or more genomes (frequency of appearance). For organisms that are not close relatives of each other, the presence/absence of different 7-20-mers in their genomes is not correlated. For close biological relatives, some correlation of the presence of n-mers in this range appears, but is not as strong as expected. Suppressed correlations among the n-mers present in different genomes lead to the possibility of using random sets of n-mers (with appropriately chosen n) to discriminate genomes of different organisms, and possibly individual genomes of the same species (including human), with a low probability of error.
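
The distinction drawn above between frequency of presence and frequency of appearance can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `presence_vector` and `pearson` are hypothetical, only one strand is scanned, and the Pearson coefficient is one plausible choice of correlation measure, since the abstract does not say which measure the authors used.

```python
from itertools import product


def presence_vector(genome, n, alphabet="ACGT"):
    """Return a 0/1 vector over all n-mers: 1 if the n-mer occurs at least once.

    Assumption: single-strand scan, no reverse complements.
    """
    seen = {genome[i:i + n] for i in range(len(genome) - n + 1)}
    return [1 if "".join(p) in seen else 0 for p in product(alphabet, repeat=n)]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


# Hypothetical toy "genomes"; real inputs would be whole-genome sequences.
genome_a = "ACGTACGTTAGC"
genome_b = "ACGTTTGACCAG"
print(pearson(presence_vector(genome_a, 3), presence_vector(genome_b, 3)))
```

Note that a presence vector ignores how many times an n-mer occurs, which is exactly the difference from the usual count-based statistics.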

4.
Naya H, Romero H, Carels N, Zavala A, Musto H. FEBS letters, 2001, 501(2-3): 127-130
In unicellular species codon usage is determined by mutational biases and natural selection. Among prokaryotes, the influence of these factors differs depending on whether the genome is skewed towards AT or GC, since in AT-rich organisms translational selection is absent. On the other hand, in AT-rich unicellular eukaryotes the two factors are present. In order to understand if GC-rich genomes display a similar behavior, the case of Chlamydomonas reinhardtii was studied. Since we found that translational selection strongly influences codon usage in this species, we conclude that there is no common pattern among unicellular organisms.

5.
Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still a lack of a strategy for the classification of non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
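
For readers unfamiliar with the term, a minimal absent word is commonly defined as a word that does not occur in the sequence although its longest proper prefix and longest proper suffix both do. The following Python sketch enumerates such words by brute force. It is a minimal illustration, not the authors' method: the function name `minimal_absent_words` is hypothetical, words of length 1 are ignored, and the approach is only practical for short sequences and small `max_len`.

```python
def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Brute-force minimal absent words of length 2..max_len.

    A candidate w is reported when w is absent from seq while both
    w[:-1] and w[1:] are present.
    """
    # Every substring of seq of length 1..max_len.
    present = {seq[i:i + k]
               for k in range(1, max_len + 1)
               for i in range(len(seq) - k + 1)}
    maws = set()
    for prefix in present:
        if len(prefix) >= max_len:
            continue  # extending it would exceed max_len
        for c in alphabet:
            cand = prefix + c
            # prefix is present by construction; check absence and the suffix.
            if cand not in present and cand[1:] in present:
                maws.add(cand)
    return sorted(maws)


# Toy example on a two-letter alphabet.
print(minimal_absent_words("AAC", 3, alphabet="AC"))  # ['AAA', 'CA', 'CC']
```

Deciding which of these absent words are statistically significant, as the abstract describes, would additionally require a background model of expected occurrence, which this sketch does not attempt.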

6.
Meiosis in triploids faces the seemingly insuperable difficulty of dividing an odd number of chromosome sets by two. Triploid vertebrates usually circumvent this problem through either asexuality or some forms of hybridogenesis, including meiotic hybridogenesis that involves a reproductive community of different ploidy levels and genome composition. Batura toads (Bufo baturae; 3n = 33 chromosomes), however, present an all-triploid sexual reproduction. This hybrid species has two genome copies carrying a nucleolus-organizing region (NOR+) on chromosome 6, and a third copy without it (NOR-). Males only produce haploid NOR+ sperm, while ova are diploid, containing one NOR+ and one NOR- set. Here, we conduct sibship analyses with co-dominant microsatellite markers so as (i) to confirm the purely clonal and maternal transmission of the NOR- set, and (ii) to demonstrate Mendelian segregation and recombination of the NOR+ sets in both sexes. This new reproductive mode in vertebrates ('pre-equalizing hybrid meiosis') offers an ideal opportunity to study the evolution of non-recombining genomes. Elucidating the mechanisms that allow simultaneous transmission of two genomes, one of Mendelian, the other of clonal inheritance, might shed light on the general processes that regulate meiosis in vertebrates.

7.
Warm-blooded vertebrates show large-scale variation in G + C content along their chromosomes, a pattern which appears to be largely absent from cold-blooded vertebrates. However, compositional variation in poikilotherms has generally been studied by ultracentrifugation rather than sequence analysis. In this paper, we investigate the compositional properties of coding sequences from a broad range of vertebrate poikilotherms using DNA sequence analysis. We find that on average poikilotherms have lower third-codon position GC contents (GC3) than homeotherms but that some poikilotherms have higher mean GC3 values. We find that most poikilotherms have lower variation in GC3 than homeotherms but that there is a correlation between GC12 and GC3 for some species, indicating that there is systematic variation in base composition across their genomes. We also demonstrate that the GC3 of genes in the zebrafish, Danio rerio, is correlated with that in humans, suggesting that vertebrates share a basic isochore structure. However, we find no correlation between either the mean GC3 or the standard deviation in GC3 and body temperature.
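
The GC3 and GC12 statistics mentioned above are simple to compute from an in-frame coding sequence. The Python sketch below shows one way to do so. The function names `gc3` and `gc12` are hypothetical, and the code assumes the sequence starts at the first base of a codon; any trailing partial codon is ignored.

```python
def _usable_length(cds):
    """Length of the sequence truncated to a whole number of codons."""
    return len(cds) - len(cds) % 3


def gc3(cds):
    """Fraction of G or C at third codon positions."""
    cds = cds.upper()
    third = cds[2:_usable_length(cds):3]
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def gc12(cds):
    """Fraction of G or C at first and second codon positions combined."""
    cds = cds.upper()
    usable = _usable_length(cds)
    bases = [cds[i] for i in range(usable) if i % 3 != 2]
    if not bases:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in bases) / len(bases)


# Toy coding sequence: codons ATG and GCC.
print(gc3("ATGGCC"), gc12("ATGGCC"))  # 1.0 0.5
```

Computing both values per gene and plotting GC12 against GC3 across a genome gives the kind of correlation the abstract refers to.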

8.
E R Waters, B A Schaal. Génome, 1996, 39(1): 150-154
Hybridization is a common phenomenon that results in complex genomes. How ancestral genomes interact in hybrids has long been of great interest. Recombination among ancestral genomes may increase or decrease genetic variation. This study examines rDNA from members of the Brassica triangle for evidence of gene conversion across ancestral genomes. Gene conversion is a powerful force in the evolution of multigene families. It has previously been shown that biased gene conversion can act to homogenize rDNA repeats within hybrid genomes. Here, we find no evidence for biased gene conversion or unequal crossing over across ancestral genomes in allotetraploid Brassica species. We suggest that, while basic genomic processes are shared by all organisms, the relative frequency of these processes and their evolutionary importance may differ among lineages. Key words: Brassica, rDNA, gene conversion, allotetraploids.

9.
The rates and patterns of molecular evolution in many eukaryotic organisms have been shown to be influenced by the compartmentalization of their genomes into fractions of distinct base composition and mutational properties. We have examined the Drosophila genome to explore relationships between the nucleotide content of large chromosomal segments and the base composition and rate of evolution of genes within those segments. Direct determination of the G + C contents of yeast artificial chromosome clones containing inserts of Drosophila melanogaster DNA ranging from 140 to 340 kb revealed significant heterogeneity in base composition. The G + C content of the large segments studied ranged from 36.9% for a clone containing the hunchback locus in polytene region 85, to 50.9% for a clone that includes the rosy region in polytene region 87. Unlike other organisms, however, there was no significant correlation between the base composition of large chromosomal regions and the base composition at fourfold degenerate nucleotide sites of genes encompassed within those regions. In contrast to the situation seen in mammals, there was also no significant association between base composition and rate of nucleotide substitution. These results suggest that nucleotide sequence evolution in Drosophila differs from that of many vertebrates and does not reflect distinct mutational biases, as a function of base composition, in different genomic regions. Significant negative correlations between codon-usage bias and rates of synonymous site divergence, however, provide strong support for an argument that selection among alternative codons may be a major contributor to variability in evolutionary rates within Drosophila genomes.

10.
Phylogenetic analyses based on mitochondrial DNA have yielded widely differing relationships among members of the arthropod lineage Arachnida, depending on the nucleotide coding schemes and models of evolution used. We enhanced taxonomic coverage within the Arachnida greatly by sequencing seven new arachnid mitochondrial genomes from five orders. We then used all 13 mitochondrial protein-coding genes from these genomes to evaluate patterns of nucleotide and amino acid biases. Our data show that two of the six orders of arachnids (spiders and scorpions) have experienced shifts in both nucleotide and amino acid usage in all their protein-coding genes, and that these biases mislead phylogeny reconstruction. These biases are most striking for the hydrophobic amino acids isoleucine and valine, which appear to have evolved asymmetrical exchanges in response to shifts in nucleotide composition. To improve phylogenetic accuracy based on amino acid differences, we tested two recoding methods: (1) removing all isoleucine and valine sites and (2) recoding amino acids based on their physicochemical properties. We find that these methods yield phylogenetic trees that are consistent in their support of ancient intraordinal divergences within the major arachnid lineages. Further refinement of amino acid recoding methods may help us better delineate interordinal relationships among these diverse organisms.

11.
Early biochemical experiments measuring nearest neighbor frequencies established that the set of dinucleotide relative abundance values (dinucleotide biases) is a remarkably stable property of the DNA of an organism. Analyses of currently available genomic sequence data have extended these earlier results, showing that the dinucleotide biases evaluated for successive 50 kb segments of a genome are significantly more similar to each other than to those of sequences from more distant organisms. From this perspective, the set of dinucleotide biases constitutes a 'genomic signature' that can discriminate sequences from different organisms. The dinucleotide biases appear to reflect species-specific properties of DNA stacking energies, modification, replication, and repair mechanisms. The genomic signature is useful for detecting pathogenicity islands in bacterial genomes.
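
The dinucleotide relative abundance referred to above is usually written as rho_XY = f_XY / (f_X * f_Y), the observed dinucleotide frequency divided by the product of its component mononucleotide frequencies. The Python sketch below computes it for a single strand and compares two sequences by the mean absolute difference over the 16 dinucleotides. The function names are hypothetical, the input is assumed to contain only A, C, G and T, and the published signature is typically computed on the sequence combined with its reverse complement, which this simplified version omits.

```python
from collections import Counter
from itertools import product

PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_bias(seq):
    """Return {XY: f_XY / (f_X * f_Y)} for every dinucleotide observed in seq."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    rho = {}
    for pair, count in di.items():
        fx = mono[pair[0]] / n_mono
        fy = mono[pair[1]] / n_mono
        rho[pair] = (count / n_di) / (fx * fy)
    return rho


def signature_distance(rho_a, rho_b):
    """Mean absolute difference over all 16 dinucleotides (unobserved pairs count as 0)."""
    return sum(abs(rho_a.get(p, 0.0) - rho_b.get(p, 0.0)) for p in PAIRS) / len(PAIRS)


# Hypothetical toy segments; the abstract works with 50 kb windows.
seg1 = "ACGTACGTACGGCGCGTATA"
seg2 = "ATATATTTAAATATGCATAT"
print(signature_distance(dinucleotide_bias(seg1), dinucleotide_bias(seg2)))
```

Applied to successive windows of one genome and to windows from another organism, a distance of this kind is what lets the signature discriminate between sources.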

12.
Trapping is a common sampling technique used to estimate fundamental population metrics of animal species such as abundance, survival and distribution. However, capture success for any trapping method can be heavily influenced by individuals’ behavioural plasticity, which in turn affects the accuracy of any population estimates derived from the data. Funnel trapping is one of the most common methods for sampling aquatic vertebrates, although, apart from fish studies, almost nothing is known about the effects of behavioural plasticity on trapping success. We used a full factorial experiment to investigate the effects that two common environmental parameters (predator presence and vegetation density) have on the trapping success of tadpoles. Using odds ratios based on fitted model means, we estimated that the odds of tadpoles being captured in traps were 4.3 times higher when predators were absent compared to present and 2.1 times higher when vegetation density was high compared to low. The odds of tadpoles being detected in traps were also 2.9 times higher in predator-free environments. These results indicate that common environmental factors can trigger behavioural plasticity in tadpoles that biases trapping success. We issue a warning to researchers and surveyors that trapping biases may be commonplace when conducting surveys such as these, and urge caution in interpreting data without consideration of important environmental factors present in the study system. Left unconsidered, trapping biases in capture success have the potential to lead to incorrect interpretations of data sets, and misdirection of limited resources for managing species.

13.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distribution of GC levels of the third codon positions of genes from cold-blooded vertebrates is distinctly different from that of warm-blooded vertebrates in that it does not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

14.
Since 2006, numerous cases of bacterial symbionts with extraordinarily small genomes have been reported. These organisms represent independent lineages from diverse bacterial groups. They have diminutive gene sets that rival some mitochondria and chloroplasts in terms of gene numbers and lack genes that are considered to be essential in other bacteria. These symbionts have numerous features in common, such as extraordinarily fast protein evolution and a high abundance of chaperones. Together, these features point to highly degenerate genomes that retain only the most essential functions, often including a considerable fraction of genes that serve the hosts. These discoveries have implications for the concept of minimal genomes, the origins of cellular organelles, and studies of symbiosis and host-associated microbiota.

15.
Parametric methods for identifying laterally transferred genes exploit the directional mutational biases unique to each genome. Yet the development of new, more robust methods (as well as the evaluation and proper implementation of existing methods) relies on an arbitrary assessment of performance using real genomes, where the evolutionary histories of genes are not known. We have used the framework of a generalized hidden Markov model to create artificial genomes modeled after genuine genomes. To model a genome, “core” genes (those displaying patterns of mutational biases shared among large numbers of genes) are identified by a novel gene clustering approach based on the Akaike information criterion. Gene models derived from multiple “core” gene clusters are used to generate an artificial genome that models the properties of a genuine genome. Chimeric artificial genomes, representing those having experienced lateral gene transfer, were created by combining genes from multiple artificial genomes, and the performance of the parametric methods for identifying “atypical” genes was assessed directly. We found that a hidden Markov model that included multiple gene models, each trained on sets of genes representing the range of genotypic variability within a genome, could produce artificial genomes that mimicked the properties of genuine genomes. Moreover, different methods for detecting foreign genes performed differently (i.e., they had different sets of strengths and weaknesses) when identifying atypical genes within chimeric artificial genomes.

16.
Jianping Xu. Génome, 2005, 48(6): 951-958
Unlike nuclear genes and genomes, the inheritance of organelle genes and genomes does not follow Mendel's laws. In this mini-review, I summarize recent research progress on the patterns and mechanisms of the inheritance of organelle genes and genomes. While most sexual eukaryotes show uniparental inheritance of organelle genes and genomes in some progeny at least part of the time, increasing evidence indicates that strictly uniparental inheritance is rare and that organelle inheritance patterns are very diverse and complex. In contrast with the predominance of uniparental inheritance in multicellular organisms, organelle genes in eukaryotic microorganisms, such as protists, algae, and fungi, typically show a greater diversity of inheritance patterns, with sex-determining loci playing significant roles. The diverse patterns of inheritance are matched by the rich variety of potential mechanisms. Indeed, many factors, both deterministic and stochastic, can influence observed patterns of organelle inheritance. Interestingly, in multicellular organisms, progeny from interspecific crosses seem to exhibit more frequent paternal leakage and biparental organelle genome inheritance than those from intraspecific crosses. The recent observation of a sex-determining gene in the basidiomycete yeast Cryptococcus neoformans, which controls mitochondrial DNA inheritance, has opened up potentially exciting research opportunities for identifying specific molecular genetic pathways that control organelle inheritance, as well as for testing evolutionary hypotheses regarding the prevalence of uniparental inheritance of organelle genes and genomes.

17.
Along the gene, nucleotides in various codon positions tend to exert a slight but observable influence on the nucleotide choice at neighboring positions. Such context biases are different in different organisms and can be used as genomic signatures. In this paper, we will focus specifically on the dinucleotide composed of a third codon position nucleotide and its succeeding first position nucleotide. Using the 16 possible dinucleotide combinations, we calculate how well individual genes conform to the observed mean dinucleotide frequencies of an entire genome, forming a distance measure for each gene. It is found that genes from different genomes can be separated with a high degree of accuracy, according to these distance values. In particular, we address the problem of recent horizontal gene transfer, and how imported genes may be evaluated by their poor assimilation to the host's context biases. By concentrating on the third- and succeeding first position nucleotides, we eliminate most spurious contributions from codon usage and amino-acid requirements, focusing mainly on mutational effects. Since imported genes are expected to converge only gradually to genomic signatures, it is possible to question whether a gene present in only one of two closely related organisms has been imported into one organism or deleted in the other. Striking correlations between the proposed distance measure and poor homology are observed when Escherichia coli genes are compared to Salmonella typhi, indicating that sets of outlier genes in E. coli may contain a high number of genes that have been imported into E. coli, and not deleted in S. typhi. (Received: 16 January 2001 / Accepted: 30 August 2001)
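
The per-gene distance described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, the genome mean is taken here as the unweighted average of per-gene frequency vectors, and the sum of absolute differences stands in for the distance measure, whose exact form the abstract does not give.

```python
from collections import Counter
from itertools import product

PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def context_frequencies(cds):
    """Frequencies of the 16 dinucleotides spanning codon position 3 and the next position 1."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    # i runs over third codon positions that are followed by another codon.
    counts = Counter(cds[i:i + 2] for i in range(2, usable - 1, 3))
    total = sum(counts[p] for p in PAIRS)
    if total == 0:
        raise ValueError("need at least two complete codons of A/C/G/T")
    return [counts[p] / total for p in PAIRS]


def mean_frequencies(genes):
    """Assumed genome reference: average of the per-gene frequency vectors."""
    vectors = [context_frequencies(g) for g in genes]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def gene_distance(cds, genome_mean):
    """Assumed distance: sum of absolute differences from the genome mean."""
    return sum(abs(f - m) for f, m in zip(context_frequencies(cds), genome_mean))


# Hypothetical toy genes; a real analysis would use all annotated coding sequences.
genes = ["ATGGCCAAATAA", "ATGGCGAAGTAA", "ATGTTTCCCTAA"]
reference = mean_frequencies(genes)
print([round(gene_distance(g, reference), 3) for g in genes])
```

Genes with unusually large distances would be the candidates for recent import, in the sense the abstract describes.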

18.
Hoolahan AH, Blok VC, Gibson T, Dowton M. Genetica, 2012, 140(1-3): 19-29
Recombination is typically assumed to be absent in animal mitochondrial genomes (mtDNA). However, the maternal mode of inheritance means that recombinant products are indistinguishable from their progenitor molecules. The majority of studies of mtDNA recombination assess past recombination events, where patterns of recombination are inferred by comparing the mtDNA of different individuals. Few studies assess contemporary mtDNA recombination, where recombinant molecules are observed as direct mosaics of known progenitor molecules. Here we use the potato cyst nematode, Globodera pallida, to investigate past and contemporary recombination. Past recombination was assessed within and between populations of G. pallida, and contemporary recombination was assessed in the progeny of experimental crosses of these populations. Breeding of genetically divergent organisms may cause paternal mtDNA leakage, resulting in heteroplasmy and facilitating the detection of recombination. To assess contemporary recombination we looked for evidence of recombination between the mtDNA of the parental populations within the mtDNA of progeny. Past recombination was detected between a South American population and several UK populations of G. pallida, as well as between two South American populations. This suggests that these populations may have interbred, paternal mtDNA leakage occurred, and the mtDNA of these populations subsequently recombined. This evidence challenges two dogmas of animal mtDNA evolution: no recombination and maternal inheritance. No contemporary recombination between the parental populations was detected in the progeny of the experimental crosses. This supports current arguments that mtDNA recombination events are rare. More sensitive detection methods may be required to adequately assess contemporary mtDNA recombination in animals.

19.
Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomic samples, as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
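
As a rough illustration of signature-based binning with an adjustable word length, the Python sketch below builds a k-mer frequency signature and assigns a fragment to the closest reference. It is a simplified stand-in rather than the study's classifier: the function names are hypothetical, Euclidean distance and nearest-reference assignment are assumptions, and reverse complements are not merged.

```python
import math
from collections import Counter


def kmer_signature(seq, k):
    """Relative frequencies of all k-mers in seq that consist only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def signature_distance(a, b):
    """Euclidean distance between two sparse signatures (missing words count as 0)."""
    return math.sqrt(sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in set(a) | set(b)))


def assign_bin(fragment, references, k=4):
    """Return the name of the reference whose signature is closest to the fragment's."""
    sig = kmer_signature(fragment, k)
    return min(references,
               key=lambda name: signature_distance(sig, kmer_signature(references[name], k)))


# Hypothetical toy references; real references would be complete genomes.
references = {"at_rich": "ATATATATATATAT", "gc_rich": "GCGCGCGCGCGCGC"}
print(assign_bin("ATATATAT", references, k=4))  # at_rich
```

Raising `k` from 4 to 7 increases the number of possible words from 256 to 16,384, which is the trade-off between resolving power and computation that the abstract discusses.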

20.
Efficient enumeration of phylogenetically informative substrings. (Total citations: 1; self-citations: 0; citations by others: 1)
We study the problem of enumerating substrings that are common amongst genomes that share evolutionary descent. For example, one might want to enumerate all identical (therefore conserved) substrings that are shared between all mammals and not found in non-mammals. Such a collection of substrings may be used to identify conserved subsequences or to construct sets of identifying substrings for branches of a phylogenetic tree. For two disjoint sets of genomes on a phylogenetic tree, a substring is called a tag if it is found in all of the genomes of one set and none of the genomes of the other set. We present a near-linear time algorithm that finds all tags in a given phylogeny, and a sublinear space algorithm (at the expense of running time) that is more suited for very large data sets. Under a stochastic model of evolution, we show that a simple process of tag-generation essentially captures all possible ways of generating tags. We use this insight to develop a faster tag discovery algorithm with a small chance of error. However, since tags are not guaranteed to exist in a given data set, we generalize the notion of a tag from a single substring to a set of substrings. We present a linear programming-based approach for finding approximate generalized tag sets. Finally, we use our tag enumeration algorithm to analyze a phylogeny containing 57 whole microbial genomes. We find tags for all nodes in the phylogeny except the root, for which we find generalized tag sets.
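
The tag definition above (a substring found in every genome of one set and in none of the other) can be made concrete with a brute-force Python sketch. This is not the near-linear algorithm the abstract announces. It is a naive set-based illustration restricted to substrings of one fixed length `k`, and the function names are hypothetical.

```python
def kmers(genome, k):
    """All distinct substrings of length k in genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def find_tags(in_group, out_group, k):
    """Length-k substrings present in every in_group genome and absent from every out_group genome."""
    if not in_group:
        raise ValueError("in_group must contain at least one genome")
    shared = set.intersection(*(kmers(g, k) for g in in_group))
    for genome in out_group:
        shared -= kmers(genome, k)
    return sorted(shared)


# Toy example: "ACG" and "CGT" occur in both in-group sequences,
# but "CGT" also occurs in the out-group sequence.
print(find_tags(["ACGTA", "TACGT"], ["CGTT"], 3))  # ['ACG']
```

An empty result for a given node corresponds to the situation the abstract mentions, where no single tag exists and a generalized tag set is needed instead.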
