1.
2.
Horizontal gene transfer among microbial genomes: new insights from complete genome analysis (total citations: 30; self-citations: 0; citations by others: 30)
Eisen JA 《Current opinion in genetics & development》2000,10(6):606-611
The determination and analysis of complete genome sequences has led to the suggestion that horizontal gene transfer may be much more extensive than previously appreciated. Many of these studies, however, rely on evidence that could be generated by forces other than gene transfer including selection, variable evolutionary rates, and biased sampling.
3.
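[Background] Strain LM1212 is a member of the entomopathogenic bacterium Bacillus thuringiensis (Bt). Its spores and crystals are produced in spore-forming cells and crystal-producing cells, respectively, giving it a distinctive cell-differentiation phenotype. Compared with the wild-type strain LM1212, the mutant strain LM1212-DB has a markedly lower proportion of spore-forming cells and a higher proportion of crystal-producing cells, which makes LM1212-DB an excellent experimental material for studying how crystal-producing cells form and for improving the insecticidal activity of the strain. [Objective] To compare the genomes of strains LM1212 and LM1212-DB in order to reveal the causes of the phenotypic differences between them. [Methods] The whole genomes of both strains were sequenced using single-molecule real-time (SMRT) sequencing on the PacBio RS II platform. Differences in chromosomes and plasmids, two-component signal systems, and insertion sequences were analyzed, and phylogenetic trees of genes related to the phenotypic traits were constructed. [Results] Genome analysis showed that both LM1212 and LM1212-DB are rich in insertion sequences and two-component signal systems, suggesting that both strains are highly prone to gene rearrangement and have strong environmental adaptability. Compared with LM1212, the mutant LM1212-DB had undergone deletions of chromosomal and plasmid fragments, plasmid rearrangement, and plasmid copy-number variation. Functional analysis of the deleted genes showed that some environmental stress response genes (such as sigB) and sporulation-related genes (such as abrB) were missing. Analysis of plasmid copy-number variation showed that the copy number of the plasmid carrying the transcription factor CpcR, which increases the proportion of crystal-producing cells, had increased by one. Evolutionary analysis of CpcR further showed that the strains carrying its most closely related genes also display a cell-differentiation phenotype similar to that of LM1212. The loss and copy-number variation of these functionally important genes may be the cause of the phenotypic differences between the two strains. In addition, the mutant LM1212-DB lacks the type I restriction-modification system, which gives it better compatibility with exogenous DNA than the wild-type strain LM1212. [Conclusion] Structural variation in the chromosome and plasmids of the mutant LM1212-DB is a potential cause of its phenotypic differences from wild-type LM1212, and this finding will guide research on the mechanism of crystal-producing cell differentiation in strain LM1212.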
4.
E. N. Trifonov 《Journal of molecular evolution》1995,40(3):337-342
A theory of an early stage of genome evolution by combinatorial fusion of circular DNA units is suggested, based on protein sequence fossil evidence. The evidence includes a preference of protein sequence lengths for certain sizes: multiples of 123 aa for eukaryotes and multiples of 152 aa for prokaryotes. At the DNA level these sizes correspond to 350–450 base pairs, the known optimal range for DNA ring closure. The methionine residues repeatedly appear along the sequences with the same period of about 120 aa (in eukaryotes), presumably marking the sites of insertion of the early genes (rings of protein-coding DNA). The absence of torsional constraint in this DNA results in a very sharp estimate of the helical periodicity of the early DNA, indistinguishable from the experimental mean value for extant DNA. According to the combinatorial fusion theory, based on the above evidence, in the pregenomic, prerecombinational stage the genes and the noncoding sequences existed in the form of autonomously replicating DNA rings of close to standard size, randomly segregating between dividing cells, as modern plasmids do. In the recombinational early genomic stage the rings started to fuse, forming larger DNA molecules consisting of several unit genes connected in various combinations and forming long protein-coding sequences (combinatorial fusion). This process, which involved, perhaps, noncoding sequences as well, eventually resulted in the formation of large genomes. The dispersed circular DNA (or, rather, evolutionarily advanced derivatives thereof) may still exist in the form of various mobile DNA elements.
5.
Plants contain large mitochondrial genomes, which are several times as complex as those in animals, fungi or algae. However, genome size is not correlated with information content. The mitochondrial genome (mtDNA) of Arabidopsis specifies only 58 genes in 367 kb, whereas the 184 kb mtDNA in the liverwort Marchantia polymorpha codes for 66 genes, and the 58 kb genome in the green alga Prototheca wickerhamii encodes 63 genes. In Arabidopsis’ mtDNA, genes for subunits of complex II, for several ribosomal proteins and for 16 tRNAs are missing, some of which have been transferred recently to the nuclear genome. Numerous integrated fragments originate from alien genomes, including 16 sequence stretches of plastid origin, 41 fragments of nuclear (retro)transposons and two fragments of fungal viruses. These immigrant sequences suggest that the large size of plant mitochondrial genomes is caused by secondary expansion as a result of integration and propagation, and is thus a derived trait established during the evolution of land plants.
6.
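The non-random usage of k-mers in genome sequences and its biological significance have long attracted attention, but no fundamental progress has yet been made. Using all gene sequences of seven species as samples, we obtained the 8-mer frequency spectrum of each species' genome sequences. The spectra of dog and cattle have three peaks, whereas those of zebrafish, medaka, Caenorhabditis elegans, and Saccharomyces cerevisiae have only one peak; the shape of the chicken spectrum is intermediate between the two. When the set of 8-mers was classified by XY dinucleotide content, only under the CG dinucleotide classification did the spectra of the three subsets 0CG, 1CG, and 2CG each form an independent unimodal distribution. Comparison with random sequences showed that 0CG motifs evolve randomly whereas 1CG and 2CG motifs evolve directionally; the usage frequencies of the latter are far below the random frequencies, and this pattern of independent evolutionary separation holds across species. The distance between the spectra of the three CG subsets is the fundamental cause of the unimodal or multimodal phenomenon. After normalizing the genome sequences of the seven species to 10^9 bp, comparison showed that the 1CG and 2CG subset spectra are significantly correlated with species evolution, whereas the 0CG subset spectra are not. The three kinds of CG motifs can therefore be considered to perform distinct biological functions. The independent separation pattern of 8-mers in genome sequences offers a new way of thinking about genome structure, genome evolution, and the biological functions of motifs.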
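The CG-based grouping of 8-mers described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `count_cg` and `kmer_counts_by_cg`, the toy sequence, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter

def count_cg(kmer):
    """Number of CG dinucleotides in a k-mer (occurrences of 'CG' cannot overlap)."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == "CG")

def kmer_counts_by_cg(seq, k=8):
    """Count every k-mer in seq, grouped by how many CG dinucleotides it contains.

    Returns a dict mapping CG count (0, 1, 2, ...) to a Counter of k-mers.
    """
    seq = seq.upper()
    groups = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # assumption: ignore windows with N or other ambiguity codes
        groups.setdefault(count_cg(kmer), Counter())[kmer] += 1
    return groups

# Hypothetical toy input; a real analysis would stream whole-genome FASTA records.
toy_groups = kmer_counts_by_cg("ACGTACGTA", k=8)
```

The per-group counters could then be turned into frequency spectra (how many distinct 8-mers occur a given number of times) for the 0CG, 1CG, and 2CG subsets.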
7.
Dominique Mouchiroud Gwennaele Fichant Giorgio Bernardi 《Journal of molecular evolution》1987,26(3):198-204
Summary: The compositional distribution of coding sequences from five vertebrates (Xenopus, chicken, mouse, rat, and human) is shifted toward higher GC values compared to that of the DNA molecules (in the 35–85-kb size range) isolated from the corresponding genomes. This shift is due to the lower GC levels of intergenic sequences compared to coding sequences. In the cold-blooded vertebrate, the two distributions are similar in that GC-poor genes and GC-poor DNA molecules are largely predominant. In contrast, in the warm-blooded vertebrates, GC-rich genes are largely predominant over GC-poor genes, whereas GC-poor DNA molecules are largely predominant over GC-rich DNA molecules. As a consequence, the genomes of warm-blooded vertebrates show a compositional gradient of gene concentration. The compositional distributions of coding sequences (as well as of DNA molecules) showed remarkable differences between chicken and mammals, and between mouse (or rat) and human. Differences were also detected in the compositional distribution of housekeeping and tissue-specific genes, the former being more abundant among GC-rich genes.
8.
An improved quantitative model describing a protective function of eukaryotic genomic noncoding sequences was developed. In this new model, two factors affecting gene protection from chemical mutagens are considered: (1) the ratio of the total lengths of coding and noncoding genomic sequences and (2) the volume of the cell nucleus. An increase in the noncoding DNA in the genome reduces the number of mutagen-damaged nucleotides in the coding region, whereas an increase in the volume of the nucleus decreases the flow of mutagens per unit of nuclear volume that attacks its surface.
9.
The complete sequence of the carp mitochondrial genome of 16,575 base pairs has been determined. The carp mitochondrial genome encodes the same set of genes (13 proteins, 2 rRNAs, and 22 tRNAs) as do other vertebrate mitochondrial DNAs. Comparison of this teleostean mitochondrial genome with those of other vertebrates reveals a similar gene order and compact genomic organization. The codon usage of proteins of the carp mitochondrial genome is similar to that of other vertebrates. The phylogenetic relationship for mitochondrial protein genes is more apparent than that for the mitochondrial tRNA and rRNA genes. Correspondence to: F. Huang
10.
11.
Ashby MK 《FEMS microbiology letters》2004,233(2):277-281
The numbers of potential response regulator genes were determined from the complete and annotated genome sequences of Archaea and Bacteria. The numbers of each class of response regulators are shown for each organism, determined principally from BLASTP searches, but with reference to the gene category lists where available. The survey shows that for Bacteria there is a link between the total number of potential response regulator genes and both the genome complexity (number of potential protein-coding genes) and the organism's lifestyle/habitat. Increasingly complex lifestyles and genome complexities are matched by an increase in the average number of potential response regulator genes per genome, indicating that a higher degree of complexity requires a higher level of control of gene expression and cellular activity. Detailed results of this study are available online at and.
12.
Minoru Kanehisa 《Quantitative Biology》2013,1(3):192
The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database to be compared for automated annotation and interpretation of newly determined genomes.
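The idea of evaluating module completeness from a Boolean expression of K numbers can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not KEGG's own evaluator: it assumes that whitespace-separated items are combined with AND, comma-separated alternatives with OR, that AND binds more tightly than OR, and that parentheses group. Other notation that real module definitions may use (for example `+` and `-` for complex subunits) is rejected rather than interpreted. The function name `module_complete` and the toy definition are hypothetical.

```python
import re

# Tokens: a K number (K + 5 digits), a parenthesis, a comma, or whitespace.
_TOKEN = re.compile(r"K\d{5}|[(),]|\s+")

def module_complete(definition, present):
    """Return True if the set of K numbers `present` satisfies `definition`."""
    raw = _TOKEN.findall(definition)
    if "".join(raw) != definition:
        raise ValueError("definition contains unsupported characters")
    tokens = [t for t in raw if not t.isspace()]  # juxtaposition means AND
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            rhs = parse_and()  # always parse, even if result is already True
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] not in (",", ")"):
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of definition")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in (",", ")"):
            raise ValueError("unexpected token %r" % tok)
        return tok in present

    value = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value

# Toy definition (not a real KEGG module): step 1 AND (either of two alternatives).
toy_definition = "K00001 (K00002,K00003)"
is_complete = module_complete(toy_definition, {"K00001", "K00003"})
```

Given the K numbers annotated in a genome, a call like the last line reports whether every step of the toy module has at least one matching gene.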
13.
Sabot F Guyot R Wicker T Chantret N Laubin B Chalhoub B Leroy P Sourdille P Bernard M 《Molecular genetics and genomics : MGG》2005,274(2):119-130
Triticeae species (including wheat, barley and rye) have huge and complex genomes due to polyploidization and a high content of transposable elements (TEs). TEs are known to play a major role in the structure and evolutionary dynamics of Triticeae genomes. During the last 5 years, substantial stretches of contiguous genomic sequence from various species of Triticeae have been generated, making it necessary to update and standardize TE annotations and nomenclature. In this study we propose standard procedures for these tasks, based on structure, nucleic acid and protein sequence homologies. We report statistical analyses of TE composition and distribution in large blocks of genomic sequences from wheat and barley. Altogether, 3.8 Mb of wheat sequence available in the databases was analyzed or re-analyzed, and compared with 1.3 Mb of re-annotated genomic sequences from barley. The wheat sequences were relatively gene-rich (one gene per 23.9 kb), although wheat gene-derived sequences represented only 7.8% (159 elements) of the total, while the remainder mainly comprised coding sequences found in TEs (54.7%, 751 elements). Class I elements [mainly long terminal repeat (LTR) retrotransposons] accounted for the major proportion of TEs, in terms of sequence length as well as element number (83.6% and 498, respectively). In addition, we show that the gene-rich sequences of wheat genome A seem to have a higher TE content than those of genomes B and D, or of barley gene-rich sequences. Moreover, among the various TE groups, MITEs were most often associated with genes: 43.1% of MITEs fell into this category. Finally, the TRIM and copia elements were shown to be the most active TEs in the wheat genome. The implications of these results for the evolution of diploid and polyploid wheat species are discussed.
14.
We used complete sequence data from 30 complete Herpesviridae genomes to investigate phylogenetic relationships and patterns of genome evolution. The approach was to identify orthologous gene clusters among taxa and to generate a genomic matrix of gene content. We identified 17 genes with homologs in all 30 taxa and concatenated a subset of 10 of these genes for phylogenetic inference. We also constructed phylogenetic trees on the basis of gene content data. The amino acid and gene content phylogenies were largely concordant, but the amino acid data had much higher internal support. We mapped gene gain events onto the phylogenetic tree by assuming that genes were gained only once during the evolution of herpesviruses. Thirty genes were inferred to be present in the ancestor of all herpesviruses, a number smaller than previously hypothesized. Few genes of recent origin within herpesviruses could be identified as originating from transfer between virus and vertebrate hosts. Inferred rates of gene gain were heterogeneous, with both taxonomic and temporal biases. Nonetheless, the average rate of gene gain was approximately 3.5 × 10^-7 genes gained per year, which is an order of magnitude higher than the nucleotide mutation rate for these large DNA viruses.
15.
Summary: We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (r_o) of Mood (1940, Ann. Math. Stat. 11, 367–392). The probability density of r_o for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong α-helix propensity show a strong tendency to cluster whereas those with β-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic “patterns” that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
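The binary-encoding run test described in the abstract above can be sketched in Python. The code below uses the standard Wald–Wolfowitz runs test with its normal approximation; whether this matches the exact definition of r_o in Mood (1940) or the sign convention used in the paper is an assumption. The function name `runs_test_z` and the toy sequences are hypothetical.

```python
import math

def runs_test_z(seq, residues):
    """Standardized runs statistic for the residues of interest in seq.

    Residues in `residues` are coded 1, all others 0. Under a random
    arrangement the statistic is approximately N(0, 1). With this sign
    convention, negative values mean fewer runs than expected (clustering)
    and positive values mean more runs than expected (even spacing).
    """
    bits = [1 if aa in residues else 0 for aa in seq]
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n0 + n1
    if n0 == 0 or n1 == 0:
        raise ValueError("need at least one residue of each class")
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n * n * (n - 1.0))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    return (runs - mean) / math.sqrt(var)

# Toy sequences: methionines fully clustered versus strictly alternating.
z_clustered = runs_test_z("MMMMAAAA", {"M"})
z_alternating = runs_test_z("MAMAMAMA", {"M"})
```

Passing a set such as a group of hydrophobic residues instead of a single amino acid gives the set-based variant mentioned in the abstract. The normal approximation is only reasonable for sequences much longer than these toy inputs.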
16.
Accurate estimation of any phylogeny is important as a framework for evolutionary analysis of form and function at all levels of organization from sequence to whole organism. Using alignments of nonrepetitive components of opossum, human, mouse, rat, and dog genomes we evaluated two alternative tree topologies for eutherian evolution. We show with very high confidence that there is a basal split between rodents (as represented by the mouse and rat) and a branch joining primates (as represented by humans) and carnivores (as represented by dogs), consistent with some but not the most widely accepted mammalian phylogenies. The result was robust to substitution model choice, with equivalent inference returned from a spectrum of models including a general time-reversible model, a model that treated nucleotides as either purines or pyrimidines, and variants of these that incorporated rate heterogeneity among sites. By determining this particular branching order we are able to show that the rate of molecular evolution is almost identical in rodent and carnivore lineages and that sequences evolve approximately 11%-14% faster in these lineages than in the primate lineage. In addition, by applying the chicken as outgroup, the analyses suggested that the rate of evolution in all eutherian lineages is approximately 30% slower than in the opossum lineage. This pattern of relative rates is inconsistent with the hypothesis that generation time is an important determinant of substitution rates and, by implication, mutation rates. Possible factors causing rate differences between the lineages include differences in DNA repair and replication enzymology, and shifts in nucleotide pools. Our analysis demonstrates the importance of using multiple sequences from across the genome to estimate phylogeny and relative evolutionary rate in order to reduce the influence of distorting local effects evident even in relatively long sequences.
17.
Chromosomal distribution of interstitial telomeric sequences in nine neotropical primates (Platyrrhini): possible implications in evolution and phylogeny
Francesca Dumas Helenia Cuttaia Luca Sineo 《Journal of Zoological Systematics and Evolutionary Research》2016,54(3):226-236
To localize interstitial telomeric sequences (ITSs) and to test whether their pattern of distribution could be linked to chromosomal evolution, we hybridized telomeric sequence probes (peptide nucleic acid, PNA) on metaphases of New World monkeys: Callithrix argentata, Callithrix jacchus, Cebuella pygmaea, Saguinus oedipus, Saimiri sciureus, Aotus lemurinus griseimembra, Aotus nancymaae (Cebidae), Lagothrix lagotricha (Atelidae) and Callicebus moloch (Pithecidae), characterized by a rapid radiation and a high rate of chromosomal rearrangements. Our analysis of the probe signal localization showed, in all the species analysed, the expected telomeric location at the terminal ends of chromosomes, as well as unexpected signal distributions in some species. Indeed, in three of the nine species studied, Aotus lemurinus griseimembra, Aotus nancymaae (Cebidae) and Lagothrix lagotricha (Atelidae), we found high variability in the localization and degree of amplification of interstitial telomeric sequences, especially for those found at centromeric or pericentromeric positions (het‐ITS). A comparative analysis, between species, of chromosomes homologous to human syntenies on which we found positive interspersed PNA signals allowed us to explain the observed pattern of ITS distribution as the result of chromosomal rearrangements in the neotropical primates analysed. This evidence permitted us to discuss the possible use of ITSs as phylogenetic markers for closely related species. Moreover, reviewing previous literature data on ITS distribution in Primates in the light of our results, we suggest that ITSs have been underestimated and highlight the importance of the molecular cytogenetics approach in characterizing ITSs, whose role is still not clarified.
18.
Ogawa S Yoshino R Angata K Iwamoto M Pi M Kuroe K Matsuo K Morio T Urushihara H Yanagisawa K Tanaka Y 《Molecular & general genetics : MGG》2000,263(3):514-519
We present an overview of the gene content and organization of the mitochondrial genome of Dictyostelium discoideum. The mitochondrial genome consists of 55,564 bp with an A + T content of 72.6%. The identified genes include those for two ribosomal RNAs (rnl and rns), 18 tRNAs, ten subunits of the NADH dehydrogenase complex (nad1, 2, 3, 4, 4L, 5, 6, 7, 9 and 11), apocytochrome b (cytb), three subunits of the cytochrome oxidase (cox1/2 and 3), four subunits of the ATP synthase complex (atp1, 6, 8 and 9), 15 ribosomal proteins, and five other ORFs, excluding intronic ORFs. Notable features of D. discoideum mtDNA include the following. (1) All genes are encoded on the same strand of the DNA and a universal genetic code is used. (2) The cox1 gene has no termination codon and is fused to the downstream cox2 gene. The 13 genes for ribosomal proteins and four ORF genes form a cluster 15.4 kb long with several gene overlaps. (3) The number of tRNAs encoded in the genome is not sufficient to support the synthesis of mitochondrial protein. (4) In total, five group I introns reside in rnl and cox1/2, and three of those in cox1/2 contain four free-standing ORFs. We compare the genome to other sequenced mitochondrial genomes, particularly that of Acanthamoeba castellanii.
Received: 5 July 1999 / Accepted: 17 January 2000
19.
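Sequencing and analysis of the complete mitochondrial genome of the cabbage white butterfly (total citations: 2; self-citations: 0; citations by others: 2)
Studies of complete mitochondrial genome sequences of butterflies and of their molecular evolution are still scarce. In this study, the complete mitochondrial genome of the cabbage white butterfly Pieris rapae Linnaeus was sequenced by long PCR and primer walking and given a preliminary analysis. The results show that the P. rapae mitochondrial genome is 15,157 bp long and contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and one non-coding control region, with lengths of 11,196 bp, 1,474 bp, 2,093 bp, and 393 bp, respectively. The positions of the 37 genes are essentially the same as those reported for other butterflies. Ten pairs of genes overlap by a total of 59 bp, with individual overlaps of 1 to 35 bp. There are 13 intergenic spacers totaling 120 bp, ranging from 1 to 46 bp in length; the longest (46 bp) lies between the tRNAIle and tRNAGln genes. In addition, NJ and MP phylogenetic trees of 11 representative butterfly species were reconstructed from the amino acid sequences of the 13 protein-coding genes. The trees show the papilionids (including swallowtails and parnassians) as one major lineage, and the pierids, lycaenids, and nymphalids (including nymphalines and acraeines) as another. The results do not support a monophyletic group formed by Pieridae and Papilionidae (including swallowtails and parnassians), but instead indicate that Pieridae, Lycaenidae, and Nymphalidae together form a monophyletic group.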
20.
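Cloning and sequence analysis of the complete genome of the Yongkang, Zhejiang strain of goose circovirus (total citations: 9; self-citations: 0; citations by others: 9)
To investigate the mechanism behind large-scale outbreaks of avian influenza in waterfowl, we screened waterfowl influenza cases for co-infecting pathogens, particularly immunosuppressive ones. A pair of detection primers was designed from published goose circovirus (GoCV) sequences and used for PCR amplification of samples from geese that had died of avian influenza in Yongkang, Zhejiang. A DNA fragment matching the expected size of 552 bp was obtained and confirmed by sequencing to be a GoCV-specific sequence, suggesting that GoCV was present in the samples. Inverse amplification primers were then designed from the determined sequence, and the full-length GoCV genome sequence was obtained after amplification, sequencing, and assembly. Genome sequence analysis showed that the Yongkang, Zhejiang strain GoCV_yk01 is 1821 bp long and has the features common to circoviruses, including the stem-loop structure associated with viral replication and the conserved motifs of the Rep protein. It shares 91%-93% homology at the whole-genome level with sequences published from Germany and Taiwan, China, and 94%-97% homology at the amino acid level in the Rep and capsid proteins. Phylogenetic tree analysis using the ClustalW method showed that the GoCV_yk01 sequence does not fall in the same branch as either the German or the Taiwanese strains. Circoviruses can infect rapidly proliferating cells such as lymphocytes and cause immunosuppression, leading to concurrent and secondary infections by other pathogens; GoCV is therefore suspected to have played a synergistic role in the goose influenza outbreak in Yongkang in early 2004. GoCV_yk01 is the first goose circovirus in mainland China to be detected, confirmed, and fully sequenced.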