Similar Documents
20 similar documents found (search time: 15 ms)
1.
The de novo origin of coding sequences remains a poorly understood issue in molecular evolution. One possible route for adding DNA segments to a gene, or removing them from it, is a shift of its stop codon. A single-nucleotide substitution can destroy the existing stop codon, so that translation continues up to the next stop codon in the gene's reading frame, or it can create a premature stop codon through a nonsense mutation. In addition, frameshifts caused by short indels near the end of a gene may produce premature stop codons or allow translation past the existing stop codon. Here, we describe how the coding-sequence length of prokaryotic genes evolves through changes in stop codon position. We observed cases in which regions of the 3′ UTR were added to genes through mutations at the existing stop codon, and cases in which C-terminal coding segments were removed through nonsense mutations upstream of the stop codon. Many of the observed stop codon shifts cannot be attributed to sequencing errors or to rare deleterious variants segregating within bacterial populations. Additions of 3′ UTR regions tend to occur in genes where they are facilitated by nearby downstream in-frame triplets that can serve as new stop codons. Conversely, subtractions of coding sequence often give rise to in-frame stop codons located nearby. The amino acid composition of the added regions is significantly biased relative to the overall amino acid composition of the genes. Our results show that in prokaryotes, stop codon shifts are an underappreciated contributor to the functional evolution of gene length.
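The two single-nucleotide routes described above (stop-codon loss and a new nonsense codon) can be illustrated with a minimal Python sketch. It assumes the standard bacterial stop codons TAA, TAG, and TGA, reads a single frame from a known start, and uses a hypothetical toy sequence; the function names are illustrative only and are not the method used in the study.

```python
from typing import Optional

# Standard stop codons (shared by the standard and bacterial genetic codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_end(seq: str, start: int = 0) -> Optional[int]:
    """Index just past the first in-frame stop codon read from `start`, or None if there is none."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def point_mutant(seq: str, pos: int, base: str) -> str:
    """Return `seq` with a single-nucleotide substitution at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


def coding_length_change(ref: str, alt: str, start: int = 0) -> Optional[int]:
    """Change in coding length (nt, stop codon included); None if either sequence lacks an in-frame stop."""
    ref_end, alt_end = cds_end(ref, start), cds_end(alt, start)
    if ref_end is None or alt_end is None:
        return None
    return alt_end - ref_end


# Hypothetical toy gene: ATG CAG GCT TAA | GGC TGA CC  (the bar marks the start of the 3' UTR)
REF = "ATGCAGGCTTAAGGCTGACC"
stop_loss = point_mutant(REF, 9, "C")  # TAA -> CAA: read-through to the downstream in-frame TGA
nonsense = point_mutant(REF, 3, "T")   # CAG -> TAG: premature stop codon
print(coding_length_change(REF, stop_loss))  # 6
print(coding_length_change(REF, nonsense))   # -6
```

In the stop-loss case the gene gains six nucleotides of former 3′ UTR because a downstream in-frame TGA happens to lie nearby, which mirrors the abstract's observation that such additions are facilitated by nearby in-frame stop triplets.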

3.
In bacteriophage T4, there is a strong tendency for genes that encode interacting proteins to be clustered on the chromosome. There is 1.6 kb of DNA between the DNA helicase (gene 41) and the DNA primase (gene 61) genes of this virus. The DNA sequence of this region suggests that it contains five genes, designated as open reading frames (ORFs) 61.1 to 61.5, predicted to encode proteins ranging in size from 5.94 to 22.88 kDa. Are these ORFs actually genes? As one test, we compared the DNA sequence of this region in bacteriophages T2, T4, and T6 and found that ORFs 61.1, 61.3, 61.4, and 61.5 are highly conserved among the three closely related viruses. In contrast, ORF 61.2 is conserved between phages T4 and T6 yet is absent from phage T2, where it is replaced by another ORF, T2 ORF 61.2, which is not found in the T4 and T6 genomes. As a second, independent test for coding sequences, we calculated the codon base position preferences for all ORFs in this region that could encode proteins that contain at least 30 amino acids. Both the T4/T6 and T2 versions of ORF 61.2, as well as the other ORFs, have codon base position preferences that are indistinguishable from those of known T4 genes (coefficients of 0.81 to 0.94); the six other possible ORFs of at least 90 bp in this region are ruled out as genes by this test (coefficients less than zero). Thus, both evolutionary conservation and codon usage patterns lead us to conclude that ORFs 61.1 to 61.5 represent important protein-coding sequences for this family of bacteriophages. Because they are located between the genes that encode the two interacting proteins of the T4 primosome (DNA helicase plus DNA primase), one or more may function in DNA replication by modulating primosome function.
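The abstract does not define how its codon base position preference coefficient is computed. As a hedged stand-in, the sketch below builds a 12-value profile (frequency of A, C, G, T at each of the three codon positions) for an ORF and compares it with a reference profile using a Pearson correlation, so that values near 1 indicate gene-like composition and negative values do not. The names `position_profile` and `pearson` are hypothetical, and this is not claimed to be the authors' statistic.

```python
from math import sqrt

BASES = "ACGT"


def position_profile(orf: str) -> list:
    """Base frequencies at codon positions 1, 2 and 3 (12 values: A, C, G, T per position)."""
    orf = orf.upper()
    counts = [[0] * 4 for _ in range(3)]
    # Only complete codons are counted; ambiguous bases (e.g. N) are skipped.
    for i in range((len(orf) // 3) * 3):
        if orf[i] in BASES:
            counts[i % 3][BASES.index(orf[i])] += 1
    profile = []
    for pos_counts in counts:
        total = sum(pos_counts) or 1  # avoid division by zero for an empty position
        profile.extend(c / total for c in pos_counts)
    return profile


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical usage: reference = position_profile(concatenated_known_genes)
#                     score = pearson(position_profile(candidate_orf), reference)
p = position_profile("ATGATGATG")
print(p)
print(pearson(p, p))
```

A profile correlated with itself gives 1.0; in practice the reference would be built from many known genes of the same genome, and a candidate ORF read in the wrong frame would be expected to score much lower.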

4.
The complete nucleotide sequence of the Clostridium thermocellum celE gene, which encodes an endo-beta-1,4-glucanase (endoglucanase E; EGE) with xylan-hydrolysing activity, has been determined. The structural gene consists of an open reading frame (ORF) of 2442 bp that begins with a GTG start codon and is followed by a TAA stop codon. The nucleotide sequence was confirmed by comparing the predicted amino acid sequence with that obtained by N-terminal amino acid sequencing of the purified protein. The EGE sequence contains a region homologous to the reiterated domain found at the C terminus of other endoglucanases from the same organism. BAL 31 deletions of the structural gene revealed the extent to which this conserved sequence is necessary for endoglucanase and xylanase activity. A region of DNA upstream of the structural gene has also been sequenced, and a ribosome-binding site and putative promoter sequences have been identified. A second ORF, which ends 349 bp upstream (5') of the GTG start codon of the celE gene, has also been identified; its encoded product has a C terminus homologous to those of other C. thermocellum endoglucanases.

6.
The nucleotide (nt) sequence of a DNA segment containing most of a gene, cloned from Bacillus thuringiensis DSIR517, that encodes a 130 kDa insecticidal crystal protein has been determined. Sequence analysis reveals an open reading frame (ORF) of 3453 nt. The ATG initiation codon, which is preceded by a potential ribosome-binding site, was confirmed by N-terminal amino acid sequencing. The ORF extends beyond the 3' end of the cloned fragment; however, the high degree of homology between the deduced amino acid sequence of this ORF and those of other Cry proteins suggests that the clone lacks only the five C-terminal amino acids. Under this assumption, the complete ORF of 3468 nt encodes a protein of 1156 amino acids with an estimated molecular mass of 129,700 Da. The deduced amino acid sequence shows a number of features characteristic of Cry proteins. Alignment of the Cry 517 protein sequence with other Cry proteins suggests that it is most closely related to the cryIA-E genes but is sufficiently different to form a new cryI gene subclass.
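The length and mass figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above and assumes, as the abstract's own figures imply, that the 3468-nt ORF is counted as 1156 residue-coding codons (stop codon not included). The resulting mean residue mass can be compared with the commonly used rule of thumb of roughly 110 Da per amino acid.

```python
SEQUENCED_ORF_NT = 3453     # ORF length within the cloned fragment
MISSING_C_TERMINAL_AA = 5   # inferred from homology to other Cry proteins
REPORTED_MASS_DA = 129_700  # estimated molecular mass of the full-length protein

# Each missing amino acid corresponds to one 3-nt codon.
full_orf_nt = SEQUENCED_ORF_NT + 3 * MISSING_C_TERMINAL_AA
assert full_orf_nt % 3 == 0, "ORF length should be a whole number of codons"

# Assumption: every codon in the 3468 nt encodes a residue (no stop codon counted).
residues = full_orf_nt // 3
mean_residue_mass = REPORTED_MASS_DA / residues

print(full_orf_nt, residues, round(mean_residue_mass, 1))  # 3468 1156 112.2
```

The result, about 112 Da per residue, is consistent with the stated 129,700 Da for a 1156-residue protein.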

7.
The nucleotide sequences of cDNA clones encoding the three major BIIIB high-sulfur wool keratin proteins (BIIIB2, BIIIB3, and BIIIB4), and the structures of a BIIIB4 gene and a BIIIB3 pseudogene, are reported. Although Southern blot analysis indicates that the BIIIB genes form a multigene family in the sheep genome, they are poorly represented in genomic DNA libraries. Sequence homology within the family extends beyond the coding region into the 5' and 3' untranslated regions and the proximal 5' flanking region of the BIIIB3 and BIIIB4 genes. These homologies suggest that the BIIIB3 and BIIIB4 genes represent the most recent gene duplication event in the evolution of the BIIIB multigene family. Like the genes encoding other wool keratin matrix proteins, the BIIIB genes carry the conserved 18-bp sequence immediately upstream (5') of the initiation codon and also appear to lack introns.

9.
The nucleotide sequences of all six genes of the Rubisco small subunit (rbcS) multigene family of Mesembryanthemum crystallinum (common ice plant) were determined. Five of the genes are arranged in a tandem array spanning 20 kb, while the sixth gene is not closely linked to this array. The mature small-subunit coding regions are highly conserved and encode four distinct polypeptides of equal length, with up to five amino acid differences distinguishing individual genes. The transit peptide coding regions are more divergent in both amino acid sequence and length, encoding five distinct peptides that range from 55 to 61 amino acids in length. Each gene has two introns located at conserved sites within the mature-peptide coding region. The first introns vary in sequence and length, ranging from 122 bp to 1092 bp, whereas five of the six second introns are highly conserved in sequence and length. Two genes, rbcS-4 and rbcS-5, are identical at the nucleotide level from 121 bp upstream of the ATG initiation codon to 9 bp downstream of the stop codon, including the sequences of both introns, indicating recent gene duplication and/or gene conversion. Functionally important regulatory elements identified in rbcS promoters of other species are absent from the upstream regions of all but one of the ice plant rbcS genes. Relative expression levels determined for the rbcS genes indicate that they are differentially expressed in leaves.

10.
A 6.3 kb fragment of E. coli RFL57 DNA encoding the type IV restriction-modification system Eco57I was cloned and expressed in E. coli RR1. Sequencing of a 5775 bp region of the cloned fragment revealed three open reading frames (ORFs). The methylase gene is 1623 bp long, corresponding to a protein of 543 amino acids (62 kDa); the endonuclease gene is 2991 bp long (997 amino acids, 117 kDa). The two genes are transcribed convergently from opposite strands, and their 3' ends are separated by 69 bp. A third, short ORF (186 bp, 62 amino acids) was also identified; it precedes the methylase ORF and overlaps it by 7 nucleotides. Comparison of the deduced Eco57I endonuclease and methylase amino acid sequences revealed three regions of significant similarity. Two of them resemble the conserved sequence motifs characteristic of DNA [adenine-N6] methylases. The third shares similarity with corresponding regions of the PaeR7I, TaqI, CviBIII, PstI, BamHI, and HincII methylases. Homologs of this sequence are also found in the sequences of the PaeR7I, PstI, and BamHI restriction endonucleases. This is the first example of a family of cognate restriction endonucleases and methylases sharing homologous regions. Analysis of these structural relationships suggests that type IV enzymes represent an intermediate in the evolutionary pathway between type III and type II enzymes.

13.
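Cloning and Sequence Analysis of Three Members of the Ethylene Synthase Gene Family in Sugarcane   Total citations: 4; self-citations: 0; citations by others: 4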
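ACC (1-aminocyclopropane-1-carboxylic acid) synthase is the rate-limiting enzyme in the ethylene biosynthesis pathway of higher plants. Degenerate primers were designed from homologous sequences of previously cloned plant ACS (1-aminocyclopropane-1-carboxylic acid synthase) genes, and PCR amplification with total DNA from sugarcane leaves as template yielded three highly specific fragments: Sc-ACS1 (1,041 bp), Sc-ACS2 (1,345 bp), and Sc-ACS3 (1,707 bp). Homology searches against the GenBank nucleotide database showed that all three fragments are ACS genes, with deduced protein sequences of 326, 242, and 310 amino acids, respectively. Sc-ACS1 and Sc-ACS3 share the highest homology, 98% at the nucleotide level and 96% at the amino acid level, and are also highly homologous to ACS gene family members from other grasses (Poaceae), such as maize ZmACS6, rice OS-ACS2, and moso bamboo, with 88% to 98% nucleotide identity and 73% to 81% amino acid identity. Sugarcane Sc-ACS2 shares 91% nucleotide and 79% amino acid homology with rice OS-ACS5, but only 45% and 49% amino acid homology with Sc-ACS1 and Sc-ACS3, respectively. Phylogenetic analysis showed that Sc-ACS1 and Sc-ACS3 are most closely related to maize ZmACS6, whereas Sc-ACS2 is most closely related to rice OS-ACS5. Southern blot hybridization confirmed that all three genes are present in the genome and occur as multiple copies. The three sequences have been deposited in GenBank under accession numbers AY620985, AY620986, and AY788919.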
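The abstract reports pairwise nucleotide and amino acid "homology" percentages without stating how they were calculated. A common measure is percent identity over an existing alignment. The sketch below assumes two pre-aligned, equal-length sequences and ignores every column in which either sequence has a gap, which is only one of several conventions; the function name and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACGT-A", "ACGA-A"))  # 80.0  (4 of 5 gap-free columns match)
print(percent_identity("AC-T", "ACGT"))      # 100.0 (the one-sided gap column is ignored)
```

Because gap handling changes the denominator, identities reported by different tools for the same pair of sequences can differ by several percentage points.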

17.
The coding region of the alpha-amylase inhibitor (HaimII) gene from the producing strain Streptomyces griseosporeus YM-25 was localized to an 800-base-pair DNA segment. The nucleotide sequence of a 1,191-base-pair region including the HaimII gene was determined by the dideoxy chain-termination method. The sequence data predicted an open reading frame of 363 base pairs that starts with an ATG initiation codon and ends with a TGA translational stop codon. The amino acid sequence deduced from the nucleotide sequence indicated that the presumptive pre-HaimII protein extends 37 amino acids beyond the amino terminus and 6 amino acids beyond the carboxyl terminus of the mature HaimII protein. The pre-HaimII protein is believed to be processed both during and after secretion. Two forms of the inhibitor with a higher molecular weight than the HaimII protein isolated from S. griseosporeus were partially purified from the culture filtrate of Streptomyces lividans carrying the cloned HaimII gene.
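The precursor and mature lengths implied by these figures can be worked out with a short calculation. This sketch assumes that the 363-bp ORF includes both the ATG start and the TGA stop codon (the stop codon contributes no residue) and that the 37- and 6-residue extensions are simply removed from the precursor; the mature length it prints is derived arithmetic under those assumptions, not a value stated in the abstract.

```python
ORF_BP = 363               # assumption: includes the ATG start and the TGA stop codon
N_TERMINAL_EXTENSION = 37  # residues before the mature N terminus
C_TERMINAL_EXTENSION = 6   # residues after the mature C terminus

assert ORF_BP % 3 == 0, "ORF length should be a whole number of codons"
codons = ORF_BP // 3       # total codons, stop included
precursor_aa = codons - 1  # the TGA stop codon encodes no amino acid
mature_aa = precursor_aa - N_TERMINAL_EXTENSION - C_TERMINAL_EXTENSION

print(codons, precursor_aa, mature_aa)  # 121 120 77
```

If the 363 bp were instead counted without the stop codon, each derived length would increase by one residue.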

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417