Similar Articles
20 similar articles found
2.
We have determined the nucleotide sequence of the uvrA gene of Escherichia coli. The coding region of the gene is 2820 base pairs long and specifies a protein of 940 amino acids with Mr = 103,874. The polypeptide sequence predicted from the DNA sequence was confirmed by analyzing the UvrA protein: the sequence of the first 7 NH2-terminal amino acids, as well as the amino acid composition of the pure protein, agreed with those predicted from the nucleotide sequence. By comparing the sequence of UvrA protein with the amino acid sequences of other ATPases, we found that two regions in the UvrA protein, separated from one another by about 600 amino acids, have the highly conserved G-X4-GKT(S)-X6-I(V) sequence found at the active sites of many, but not all, ATPases. Our findings suggest that UvrA protein may have two ATP binding sites.
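The motif comparison described above can be illustrated with a regular-expression scan of a one-letter protein sequence. This is a minimal sketch, not the authors' method: the function name `find_atp_motifs` and the synthetic test sequence are hypothetical, and the pattern is the strict G-X4-GKT(S)-X6-I(V) form only. Run on a full UvrA sequence, two hits roughly 600 residues apart would correspond to the two putative ATP binding sites.

```python
import re

# G-X4-GKT(S)-X6-I(V): Gly, any 4 residues, Gly-Lys-Thr (or Ser),
# any 6 residues, then Ile (or Val). The lookahead also reports overlapping hits.
ATP_MOTIF = re.compile(r"(?=(G.{4}GK[TS].{6}[IV]))")


def find_atp_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in ATP_MOTIF.finditer(protein.upper())]


# Synthetic example sequence (hypothetical, not UvrA).
print(find_atp_motifs("MAGAAAAGKTAAAAAAIPP"))  # [(3, 'GAAAAGKTAAAAAAI')]
```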

5.
Bacteriophage T4 gene 44 protein is a DNA polymerase accessory protein that is required for T4 DNA replication. We have isolated the gene for 44 protein from a previously constructed lambda-T4 hybrid phage (Wilson, G. G., Tanyashin, V. I., and Murray, N. E. (1977) Mol. Gen. Genet. 156, 203-214). We report here the nucleotide sequence of gene 44 and of about 60 nucleotides 5' upstream of its coding region, which is immediately adjacent to gene 45. We have also purified 44 protein from T4-infected cells and subjected it to extensive protein-chemical characterization. Considerable portions of the protein sequence predicted from the DNA sequence were thus confirmed by direct protein sequencing of peptides or by matching the amino acid compositions of purified peptides; in total, 84% of the predicted amino acid residues were confirmed by the protein data. These studies indicate that gene 44 codes for a polypeptide of 319 amino acids with a calculated Mr = 35,371. The coding region of gene 44 is preceded by a potential regulatory region containing sequences homologous to the Escherichia coli (-10) RNA polymerase binding region and to a conserved sequence at -25 to -30 found in other T4 middle genes. In addition, there are sequence similarities in the translation initiation regions of genes 44, 45, and rIIB, all of which are subject to regulation by regA protein.
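A calculated Mr such as the 35,371 quoted above is usually obtained by summing average residue masses over the predicted sequence and adding one water for the free termini. The sketch below assumes the standard average residue masses listed and an unmodified chain; it is not necessarily the mass table or procedure the authors used, so small differences from published values are expected. The name `average_mr` is hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mr(protein: str) -> float:
    """Average molecular mass of an unmodified polypeptide from its sequence."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(round(average_mr("GG"), 2))  # 132.12 (diglycine)
```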

6.
Genomic DNA containing the protein coding region for the Drosophila cAMP-dependent protein kinase catalytic subunit has been cloned and sequenced. The probe used to detect and isolate the gene fragment was constructed from two partially complementary synthetic oligonucleotides and contains 60 base pairs that encode (using Drosophila codon preferences) amino acids 195-214 of the beef heart catalytic subunit. Under reduced-stringency hybridization conditions, the probe recognizes two target sites in fly genomic DNA with 85% homology. One of these sites is in the cAMP-dependent protein kinase catalytic subunit gene, which was isolated as a 3959-base pair HindIII fragment. This fragment contains all of the protein coding portion, 900 base pairs upstream of the initiator ATG, and 2000 base pairs downstream of the termination codon (TAG). The coding portion of the gene contains no introns and encodes a protein of 352 amino acids. There is a 2-amino acid insertion near the N terminus of the fly protein relative to the beef and mouse enzymes. Of the remaining 350 amino acids, 273 are invariant in the three species. A probe derived from the coding sequence of the HindIII clone hybridizes strongly to a 5100-base poly(A)+ RNA and weakly to 4100- and 3400-base poly(A)+ RNAs expressed in adult flies. A 2100-base pair EcoRI genomic fragment containing the second site recognized by the 60-base pair probe has also been cloned. DNA sequence analysis demonstrates that this fragment is part of the cGMP-dependent protein kinase gene or a close homolog. The catalytic subunit gene and the cGMP-dependent protein kinase gene have been located in regions 30C and 21D, respectively, of chromosome 2.
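A codon-preference probe of the kind described above comes from back-translating a peptide with one preferred codon per residue: the 20 residues 195-214 give 3 × 20 = 60 nucleotides. This is a minimal sketch; the `PREFERRED_CODON` table below is a hypothetical placeholder with one valid codon per amino acid, not the Drosophila codon preferences the authors actually used, and `back_translate` is an illustrative name.

```python
# Hypothetical "preferred codon" table: one valid codon per amino acid.
# A real probe design would take these from a measured codon-usage table.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def back_translate(peptide: str) -> str:
    """Return one DNA coding sequence for the peptide using the table above."""
    return "".join(PREFERRED_CODON[aa] for aa in peptide.upper())


print(back_translate("MKW"))  # ATGAAGTGG
```

A single-codon table gives one deterministic probe; degenerate probes would instead enumerate several codons per residue.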

8.
Structure-based prediction of DNA target sites by regulatory proteins   Total citations: 15, self-citations: 0, other citations: 15
Kono H, Sarai A. Proteins 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on protein-DNA complexes provide clues to the mechanism of target recognition by regulatory proteins. Analyses of the propensities of base-amino acid interactions observed in these structural data show that there is no one-to-one correspondence in the interaction, although clear preferences exist. On the other hand, analysis of the spatial distribution of amino acids around bases shows that even amino acids with a strong base preference, such as Arg with G, are distributed over a wide space around the bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility of the interaction suggest that there are no simple rules for sequence recognition and that its prediction is not straightforward. However, the spatial distributions of amino acids around bases suggest that structural data could be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been used successfully to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could likewise be used to predict DNA target sites for regulatory proteins, because determining the DNA sequences that bind to a particular protein structure should be similar to finding the amino acid sequences that fold into a particular structure. Here we demonstrate that structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were derived empirically from the structural data. These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict the DNA binding sites recognized by regulatory proteins. To test the applicability of this method to target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on binding specificity, and we predicted binding sites in real promoters and compared the predictions with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.
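The combinatorial threading step can be sketched as exhaustive enumeration: every DNA sequence of the site length is placed onto a fixed set of base-residue contacts and scored by summing pairwise potentials. This is a minimal sketch under stated assumptions: `CONTACTS` and `POTENTIAL` are hypothetical toy values, not the empirical potentials derived in the paper, and the sketch ignores DNA deformation and cooperativity. Enumeration grows as 4^n, so it is practical only for short sites.

```python
from itertools import product

# Hypothetical contacts: residues of the protein structure touching each
# base position of a 3-bp site (the third position has no contact).
CONTACTS = [["R"], ["N"], []]

# Hypothetical pairwise potentials E(base, residue); lower is more favourable.
# Pairs not listed score 0.
POTENTIAL = {("G", "R"): -1.0, ("A", "N"): -0.8, ("T", "R"): 0.5}


def score(dna, contacts=CONTACTS, potential=POTENTIAL):
    """Sum base-residue potentials over all contacts for one DNA sequence."""
    return sum(potential.get((base, aa), 0.0)
               for base, residues in zip(dna, contacts)
               for aa in residues)


def thread_all(contacts=CONTACTS, potential=POTENTIAL):
    """Thread every possible DNA sequence onto the contacts; lowest score first."""
    candidates = ("".join(p) for p in product("ACGT", repeat=len(contacts)))
    return sorted(candidates, key=lambda s: score(s, contacts, potential))


ranking = thread_all()
print(ranking[:4])  # best-scoring candidate sites
```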

9.
We have analyzed micrococcal nuclease (MNase) DNA cleavage patterns at the sequence level by examining 2.3 × 10^3 base pairs of data derived from the Drosophila melanogaster 44D larval cuticle locus. Within this region, MNase preferentially cleaved 140 sites. Clusters of these sites appear to generate the preferential MNase cleavage sites in eukaryotic DNA seen on agarose gels at roughly 100 to 300 base-pair intervals. These clusters of preferential cleavage sites rarely occur within gene coding regions. The analysis revealed that the duplex DNA sequences preferentially cleaved by MNase are generally determined by a single-strand sequence: d(A-T)n, where n ≥ 1, flanked by a 5' dC or dG. Cleavage of the other strand is generally staggered 5' by several nucleotides and occurs even if such sequences are absent on that strand. An empirical predictive DNA cleavage model derived from a statistical analysis of the sequence-level data was applied to seven eukaryotic gene loci of known sequence. The predicted patterns were in good general agreement with the previously observed eukaryotic gene/spacer cleavage pattern. Statistical analysis also revealed that predicted preferential DNA cleavage sites occur less frequently in protein coding regions than in randomized sequences of the same length and nucleotide content. Comparison of the MNase cleavage patterns with the sequence-dependent pattern of binding energies between duplex DNA strands indicates that MNase preferentially cleaves sequences with low helix stability.
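The single-strand rule above, a d(A-T)n run (n ≥ 1) with a 5' dC or dG neighbour, can be turned into a simple site scanner. This is a minimal sketch that applies only the consensus rule to one strand read 5'→3'; it is not the authors' statistical cleavage model, and `candidate_mnase_sites` is a hypothetical name.

```python
import re

# A run of A/T (n >= 1) whose 5' neighbour is C or G, on one strand read 5'->3'.
MNASE_SITE = re.compile(r"[CG]([AT]+)")


def candidate_mnase_sites(strand):
    """Return (0-based start of the A/T run, the run) for each C/G-flanked run."""
    return [(m.start(1), m.group(1)) for m in MNASE_SITE.finditer(strand.upper())]


print(candidate_mnase_sites("GGCATTAGCTT"))  # [(3, 'ATTA'), (9, 'TT')]
```

Counting hits in sliding windows would be one way to look for the clusters of sites the abstract associates with gel-visible cleavage bands.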

10.
The gene that codes for the surface antigen of Plasmodium knowlesi sporozoites (CS protein) is unsplit and present in the genome in only one copy. The CS protein, as deduced from DNA sequence analysis of the structural gene, has an unusual structure, with the central 40% of the polypeptide chain present as 12 tandemly repeated amino acid peptide units flanked by regions of highly charged amino acids. The protein has an amino-terminal hydrophobic signal sequence and a hydrophobic carboxy-terminal anchor sequence. The coding sequence of the gene has an AT content of 53%, compared with 70% AT in the 5′ and 3′ flanking sequences, and is contained entirely within an 11 kb EcoRI genomic DNA fragment. This genomic fragment expresses the CS protein in E. coli, indicating that the parasite promoter and ribosome binding site signals can be recognized in E. coli.
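AT-content comparisons like the 53% (coding) versus 70% (flanking) contrast above are simple base counts, and a sliding-window profile shows where the composition shifts along a fragment. A minimal sketch; `at_content` and `at_profile` are hypothetical names, and the window and step sizes are left to the caller.

```python
def at_content(seq):
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_profile(seq, window, step):
    """AT fraction in consecutive windows, as (0-based window start, fraction)."""
    return [(i, at_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


print(at_profile("AAAAGGGG", 4, 4))  # [(0, 1.0), (4, 0.0)]
```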

11.

Background  

Restriction enzymes can produce easily definable segments from DNA sequences by using a variety of cut patterns. There are, however, no software tools that can aid in gene building, that is, modifying wild-type DNA sequences so that they express the same wild-type amino acid sequences but with enhanced codons, specific cut sites, unique post-translational modifications, and other engineered-in components for recombinant applications. This paper presents a fast DNA pattern design algorithm, ICRPfinder, and applies it to find or create potential recognition sites in target coding sequences.
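The core question behind this kind of site design, where a recognition site already exists or could be introduced by silent mutations, can be illustrated with a brute-force check: overwrite each window of an in-frame coding sequence with the site and keep positions where the translated protein is unchanged. This is a minimal sketch, not the ICRPfinder algorithm; it assumes an exact (non-degenerate) recognition site, a coding sequence in frame from position 0, and the standard genetic code, and it does not check whether the site is unique elsewhere. `silent_sites` is a hypothetical name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def translate(cds):
    """Translate an in-frame coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def silent_sites(cds, site):
    """Return (0-based position, needs_mutation) where `site` fits without changing the protein."""
    cds, site = cds.upper(), site.upper()
    protein = translate(cds)
    hits = []
    for i in range(len(cds) - len(site) + 1):
        candidate = cds[:i] + site + cds[i + len(site):]
        if translate(candidate) == protein:
            hits.append((i, candidate != cds))
    return hits


# EcoRI site GAATTC can be created silently at position 3 (GAG->GAA, both Glu).
print(silent_sites("ATGGAGTTCTAA", "GAATTC"))  # [(3, True)]
```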

12.
The nucleotide sequence of the P gene of human parainfluenza virus type 1 (PIV1) was determined from cloned cDNA copies of the mRNA. By analogy with the gene organization of Sendai virus, two open reading frames in the mRNA sense of the gene were identified as coding sequences for the P protein (568 amino acids with an estimated molecular weight of 64,655) and the C protein (204 amino acids with an estimated molecular weight of 24,108). Comparison of the deduced amino acid sequences of the P and C proteins of PIV1 with those of Sendai virus showed a high degree of homology. However, the sequence for the cysteine-rich V protein, which is considered a common feature of other paramyxoviruses, was interrupted by multiple stop codons. Sequence analysis of three P-gene-specific cDNA clones generated from genomic RNA by polymerase chain reaction and one additional clone generated from mRNA confirmed that the coding sequence for the cysteine-rich region is silent in the PIV1 gene and thus is not translated into protein. Two potential editing sites with the consensus sequence 3'UUYUCCC were found in the PIV1 P gene at positions 564 to 570 and 1430 to 1436. However, examination of the PIV1 mRNA population by a primer extension method indicated that neither of these sites is used. These results indicate that the PIV1 P gene has a coding strategy different from those of other paramyxovirus P genes.

13.
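Cloning and bioinformatic analysis of the rabbit BMP7 gene   Total citations: 1, self-citations: 0, other citations: 1
Li Ming, Zhao Qiaohui, Chen Qixin, Liu Mengzhou, Shi Xiaowei. Hereditas (Beijing) 2008, 30(7): 885-892
Based on analysis of the known partial coding sequence (CDS), the unknown 3′ and 5′ terminal sequences of the rabbit BMP7 gene were cloned by stepwise RT-PCR amplification and RACE, and were analyzed with bioinformatic methods. Combined analysis of the sequencing results showed that the obtained sequence totals 1,654 bp and includes the nearly full-length propeptide CDS, the full-length mature-peptide CDS, and the 3′ untranslated region (3′UTR) of rabbit BMP7, extending the previously known sequence by 395 bp toward the 5′ end and 628 bp toward the 3′ end. Sequence alignment showed that the cloned rabbit BMP7 CDS shares 91.89% and 89.32% homology with the corresponding human and mouse sequences, respectively, and that the predicted amino acid sequence shares 96.51% and 96.01% homology, respectively. The rabbit BMP7 3′UTR is 446 bp long, shares 57.38% and 45.57% homology with the corresponding human and mouse sequences, respectively, and contains two transcription termination signal sites. The mature rabbit BMP7 protein is predicted to contain the seven positionally fixed cysteine residues characteristic of BMPs and the TGF-β family signature. Alternative use of the transcription termination signals in the rabbit BMP7 3′UTR may be related to post-transcriptional regulation of the gene.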
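Percentages like those reported for the rabbit, human, and mouse comparisons are typically computed from a pairwise alignment as identical positions divided by compared positions. This is a minimal sketch assuming the two sequences are already aligned to equal length with "-" for gaps; conventions for the denominator vary (here, columns where either sequence has a gap are excluded), so values may differ from those of a particular alignment tool. `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
                if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```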

14.
Genomic DNA sequence for human C-reactive protein   Total citations: 12, self-citations: 0, other citations: 12
The gene for the prototype acute-phase reactant, C-reactive protein, has been isolated from two lambda phage libraries containing inserted human DNA fragments using synthetic oligonucleotide probes. Nucleotide sequence analysis indicates that, after the sequence coding for a signal peptide of 18 amino acids and the first two amino acids of the mature protein, there is an intron of 278 base pairs followed by the nucleotide sequence for the remaining 204 amino acids. The intron is unusual in that it contains, on the positive strand, a poly(A) stretch 16 nucleotides long and a poly(GT) region 30 nucleotides long that could adopt the Z-form of DNA. The nucleotide sequence reported here confirms the amino acid sequence of mature C-reactive protein as originally reported, except that it codes for an additional 19 amino acids beginning at position 62. Thus, DNA sequence analysis predicts that the mature protein consists of 206 amino acids rather than 187 as originally reported. The mRNA cap site is located 104 nucleotides from the start of the signal peptide, and there is a 3' noncoding region 1.2 kilobase pairs in length. The gene has a typical promoter containing the sequences TATAAAT and CAAT 29 and 81 base pairs upstream, respectively, of the cap site.
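Simple-sequence features such as the 16-nt poly(A) stretch and the 30-nt poly(GT) region can be located with a tandem-run search. This is a minimal sketch with a hypothetical function name and a synthetic test sequence; it finds maximal runs of a given repeat unit at or above a length threshold, reports 1-based start and length, and only detects (GT)n runs that begin with G.

```python
import re


def find_runs(seq, unit, min_length):
    """Return (1-based start, length) of maximal tandem runs of `unit` >= min_length nt."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    return [(m.start() + 1, len(m.group()))
            for m in pattern.finditer(seq.upper())
            if len(m.group()) >= min_length]


# Synthetic sequence: a 16-nt poly(A) stretch, then a 30-nt (GT)15 run.
seq = "C" + "A" * 16 + "C" + "GT" * 15 + "C"
print(find_runs(seq, "A", 16), find_runs(seq, "GT", 30))  # [(2, 16)] [(19, 30)]
```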

15.
We study to what degree patterns of amino acid substitution vary between genes using two models of protein-coding gene evolution. The first divides the amino acids into groups, with one substitution rate for pairs of residues in the same group and a second for those in differing groups. Unlike previous applications of this model, the groups themselves are estimated from data by simulated annealing. The second model makes substitution rates a function of the physical and chemical similarity between two residues. Because we model the evolution of coding DNA sequences as opposed to protein sequences, artifacts arising from the differing numbers of nucleotide substitutions required to bring about various amino acid substitutions are avoided. Using 10 alignments of related sequences (five of orthologous genes and five gene families), we do find differences in substitution patterns. We also find that, although patterns of amino acid substitution vary temporally within the history of a gene, variation is not greater in paralogous than in orthologous genes. Improved understanding of such gene-specific variation in substitution patterns may have implications for applications such as sequence alignment and phylogenetic inference.
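The first model can be made concrete at the codon level: only single-nucleotide changes between sense codons get a nonzero rate, synonymous changes get a reference rate, and nonsynonymous changes get one rate within an amino acid group and another between groups. Working on codons this way is what avoids the multi-step artifact mentioned above. This is a minimal sketch of that rate structure only; the `GROUPS` partition and the rate values are hypothetical (the study estimates groups by simulated annealing), and it omits the transition/transversion bias and codon frequencies found in most practical codon models.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# Hypothetical partition of the 20 amino acids into groups.
GROUPS = ["AVLIMFWC", "STNQGPY", "DE", "KRH"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def codon_rate(c1, c2, omega_in, omega_out):
    """Relative rate from codon c1 to c2 under a two-rate grouped model."""
    a1, a2 = CODE[c1], CODE[c2]
    n_diff = sum(x != y for x, y in zip(c1, c2))
    if n_diff != 1 or "*" in (a1, a2):
        return 0.0  # only single-nucleotide steps between sense codons
    if a1 == a2:
        return 1.0  # synonymous change (arbitrary reference rate)
    return omega_in if GROUP_OF[a1] == GROUP_OF[a2] else omega_out


print(codon_rate("TTT", "TTA", 0.5, 0.1))  # Phe -> Leu, same group: 0.5
```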

16.
Biological functions of proteins and their active 3D structures are determined by their linear amino acid sequences. The resonant recognition model (RRM) is a physico-mathematical model developed for structure/function analysis of protein and DNA sequences. Here, we compare the results of RRM analysis [1,2] of protease proteins using the electron-ion interaction potential (EIIP) and the ionisation constant (IC) of amino acids. The results reveal that the IC parameter can be used successfully to determine the characteristic patterns of different functional protease subgroups.
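In the RRM, each residue is replaced by a numeric parameter (EIIP or, as compared here, IC), the resulting signal is Fourier-transformed, and spectra of proteins sharing a function are multiplied to reveal a common characteristic frequency. This is a minimal sketch under stated assumptions: the EIIP values are as commonly tabulated for the RRM and should be checked against [1,2]; an IC table could be passed through `values` instead; subtracting the mean and evaluating the transform on a shared 0 to 0.5 frequency grid (so sequences of different lengths can be combined) are simplifications, not necessarily the procedure of the cited work. Function names are hypothetical.

```python
import cmath

# EIIP values as commonly tabulated for the RRM (verify against refs [1,2]).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def spectrum(seq, freqs, values=EIIP):
    """Magnitude of the Fourier transform of the mean-centred numeric sequence."""
    x = [values[aa] for aa in seq.upper()]
    mean = sum(x) / len(x)
    x = [v - mean for v in x]
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * f * n) for n, v in enumerate(x)))
            for f in freqs]


def common_peak(seqs, points=101, values=EIIP):
    """Frequency in [0, 0.5] where the product of all sequence spectra is largest."""
    freqs = [0.5 * i / (points - 1) for i in range(points)]
    combined = [1.0] * points
    for seq in seqs:
        combined = [c * s for c, s in zip(combined, spectrum(seq, freqs, values))]
    return freqs[max(range(points), key=combined.__getitem__)]


# Strictly alternating low/high EIIP residues give a peak at f = 0.5.
print(common_peak(["LD" * 10, "ID" * 8], points=51))  # 0.5
```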
