Similar Articles
20 similar articles found (search time: 46 ms)
1.
Structure and evolution of the bovine prothrombin gene (Total citations: 6; self-citations: 0; citations by others: 6)
The cloned bovine prothrombin gene has been characterized by partial DNA sequence analysis, including the 5' and 3' flanking sequences and all the intron-exon junctions. The gene is approximately 15.4 × 10³ base pairs in length and comprises 14 exons interrupted by 13 introns. The exons coding for the prepro-leader peptide and the gamma-carboxyglutamic acid-containing region are similar in organization to the corresponding exons in the factor IX and protein C genes. This region has probably evolved as a result of recent gene duplication and exon shuffling events. The exons coding for the kringles and the serine protease region of the prothrombin gene are different in organization from the homologous regions in other genes, suggesting that introns have been inserted into these regions after the initial gene duplication events.

2.
The complete human dihydrofolate reductase (DHFR) gene has been cloned from four recombinant lambda libraries constructed with the DNA from a methotrexate-resistant human cell line with amplified DHFR genes. The detailed organization of the gene has been determined by restriction mapping of the cloned fragments and DNA sequencing of all the protein coding regions and adjacent intron segments, and shown to correspond to that of the native human DHFR gene. The gene spans a length of approximately 29 × 10³ bases from the ATG initiator codon to the end of the 3' untranslated region, and contains five introns that interrupt the protein coding sequence. The number and positions of introns are identical to those found in the mouse gene. By contrast, the size of the homologous introns (with the exception of the first one) varies greatly, up to several fold, in the genes from man, mouse and Chinese hamster; the intron sequences also exhibit a great divergence, except in the junction regions. A striking sequence homology, extending over several hundred nucleotides, exists between the human and mouse gene 5' non-coding regions. These regions are characterized by an unusually high G + C content, 72% and 66% in the human and mouse genes, respectively, which is maintained in the first coding segment and first intron, and is in sharp contrast to the relatively low G + C content (approximately 40%) of the remainder of the gene.
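The G + C percentages quoted in this abstract are base-composition fractions over a stretch of sequence. A minimal Python sketch of that calculation, for a whole sequence and for named sub-regions, is given here; the function names and the choice to exclude ambiguous bases (such as N) from the count are assumptions for illustration, not the authors' procedure.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T) in a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def gc_by_region(seq: str, regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """G + C fraction for named regions given as 0-based, half-open (start, end) slices."""
    return {name: gc_content(seq[start:end]) for name, (start, end) in regions.items()}


if __name__ == "__main__":
    print(gc_content("GGCCAT"))  # 4 of the 6 bases are G or C
    print(gc_by_region("GGGGAAAA", {"first half": (0, 4), "second half": (4, 8)}))
```

Comparing a 5' region against the rest of a gene, as the abstract does, would simply mean passing two hypothetical coordinate ranges to `gc_by_region`.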

3.
S. S. Sommer. FASEB Journal, 1992, 6(10):2767-2774
Germline mutations cause or predispose to most disease. Hemophilia B is a useful model for studying the underlying pattern of recent germline mutations in humans because the observed pattern of mutation in factor IX more closely reflects the underlying pattern of mutation than the observed pattern for many other genes. In addition, it is possible to identify and correct for biases inherent in ascertaining only those mutations that cause hemophilia. Aspects of the pattern of germline mutation in the factor IX gene are becoming clear: 1) in the United States, two-thirds of mutations causing mild disease arose from three founders whereas almost all the mutations resulting in either moderate or severe disease arose independently, generally within the past 150 years; 2) direct estimates of the rates of mutation in humans indicate that transitions are more frequent than transversions, which in turn are more frequent than deletions and insertions; 3) transitions at CpG are elevated approximately 24-fold relative to transitions at non-CpG dinucleotides; 4) transversions at CpG are elevated approximately eightfold relative to transversions at non-CpG dinucleotides; 5) the sum total of the dinucleotide mutation rates produces a bias against G and C bases that would be sufficient to maintain the G+C content of the factor IX gene at its evolutionarily conserved level of 40%; and 6) the pattern of mutation is similar for Caucasians residing in the United States and for Asians residing in Asia. Two ideas emerge from this and from an analysis of the pattern of recent deleterious mutations compared with ancient neutral mutations that have been fixed during evolution into the factor IX gene. First, the bulk of germline mutations are likely to arise from endogenous processes rather than environmental mutagens. 
Second, the factor IX protein is composed mostly of two classes of amino acids: critical residues in which all single-base missense changes will disrupt protein function, and "spacer" residues in which the precise nature of the residue is unimportant but the peptide bond is necessary to keep the critical residues in register. More work is necessary to assess the veracity and generality of these ideas.
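The rate comparisons above rest on classifying each point mutation as a transition or a transversion, and as occurring at a CpG or a non-CpG site. The Python sketch below shows one way to make that call for a single substitution read on one strand; it is an illustrative assumption, not the study's classification code, and `classify_substitution` is a hypothetical name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(seq: str, pos: int, alt: str) -> tuple[str, bool]:
    """Classify a single-base change at 0-based position `pos` of `seq`.

    Returns (kind, at_cpg): kind is "transition" when both bases are purines or
    both are pyrimidines, otherwise "transversion"; at_cpg is True when the
    reference base is the C or the G of a CpG dinucleotide in `seq`.
    """
    seq, alt = seq.upper(), alt.upper()
    if not 0 <= pos < len(seq):
        raise IndexError("position outside the sequence")
    ref = seq[pos]
    if len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError("expected single unambiguous bases")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    kind = "transition" if same_class else "transversion"
    # A CpG C->T change on one strand is a CpG G->A change on the other,
    # so both the C and the G of a CpG are treated as CpG sites.
    at_cpg = (ref == "C" and seq[pos + 1:pos + 2] == "G") or (
        ref == "G" and pos > 0 and seq[pos - 1] == "C"
    )
    return kind, at_cpg


if __name__ == "__main__":
    print(classify_substitution("ACGT", 1, "T"))  # C->T at the C of a CpG
    print(classify_substitution("AAGT", 2, "T"))  # G->T outside any CpG
```

Tallying such calls over a set of independent mutations would give the raw counts from which relative rates like those above are estimated; any correction for ascertainment bias is a separate step not shown here.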

5.
Why does the human factor IX gene have a G + C content of 40%? (Total citations: 20; self-citations: 2; citations by others: 18)
The factor IX gene has a G + C content of approximately 40% in all mammalian species examined. In human factor IX, C→T and G→A transitions at the dinucleotide CpG are elevated at least 24-fold relative to other transitions. Can the G + C content be explained solely by this hot spot of mutation? Using our mathematical model, we show that the elevation of mutation at CpG cannot alone lower the G + C content below 45%. To search for other hot spots of mutation that might contribute to the reduction of G + C content, we assessed the relative rates of base substitution in our sample of 160 families with hemophilia B. Seventeen independent single-base substitutions are reported herein for a total of 96 independent point mutations in our sample. The following conclusions emerge from the analysis of our data and, where appropriate, the data of others: (1) Transversions at CpG are elevated an estimated 7.7-fold relative to other transversions. (2) The mutation rates at non-CpG dinucleotides are remarkably uniform; none of the observed rates are either more than twofold above the median for transitions or more than threefold above the median for transversions. (3) The pattern of recent mutation is compatible with the pattern during mammalian evolution that has maintained the G + C content of the factor IX gene at approximately 40%.
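A common generic way to quantify how far CpG hot spots have depleted a sequence is the observed/expected CpG ratio, (CpG count × length) / (C count × G count). The abstract does not report this statistic; the Python sketch is offered only as an assumed illustration of measuring CpG depletion, and the function name is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    Values well below 1 indicate CpG depletion relative to the sequence's own
    C and G content.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        raise ValueError("sequence must contain both C and G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


if __name__ == "__main__":
    print(cpg_observed_expected("CCGG"))  # 1 CpG observed, 2 * 2 / 4 = 1 expected
```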

7.
Secretin is a 27-amino acid gastrointestinal hormone that stimulates the secretion of bicarbonate-rich pancreatic fluid. We isolated and analyzed the coding region of the gene for the rat secretin precursor. The entire coding region spans 692 base pairs and is divided into four regions corresponding to the signal peptide and NH2-terminal peptide, the secretin peptide and processing signal sequences, a part of the COOH-terminal peptide, and the remainder of the COOH-terminal peptide, which are interrupted by three short introns (81, 105, and 104 base pairs). The organization is similar to those of the genes for other members of the secretin family, glucagon and VIP/PHI-27 precursors, supporting the assumption that the genes for the secretin family peptide precursors originated from a common ancestral gene. We also demonstrated that the secretin precursor gene is widely expressed in the brain and in the hypophysis. The regional expression pattern of the secretin precursor gene in the brain is quite different from those of the glucagon and VIP/PHI-27 precursor genes. The secretin precursor gene is highly expressed in the medulla oblongata and pons of the brain and the hypophysis, the expression levels of which are comparable to those in the duodenum. The secretin precursor mRNA in the brain and the hypophysis has the same coding sequence as that in the duodenum, indicating that secretin in the brain and the hypophysis is produced from the same secretin precursor protein as that in the duodenum. This is the first evidence to be reported that the secretin precursor gene is definitely expressed in the brain.

8.
We have isolated recombinant DNA clones which include cDNA and chromosomal DNA sequences of the major heat shock-inducible gene of Drosophila. With the cDNA fragments used as specific hybridization probes, DNA:DNA reassociation and in situ hybridization analysis demonstrated that the DNA sequences are repeated approximately 7 times in the haploid Drosophila genome, and that gene sequences are present at both the 87A and 87C loci on the cytological map. The cloned cDNA and homologous cloned chromosomal DNA hybridized to mRNA which translated in vitro into the major 70K heat shock-specific protein. Here we summarize a study of the organization of genes coding for the 70K heat shock-specific protein contained in the two recombinant chromosomal DNA plasmids pG3 and pG5. On the basis of R loop hybridization experiments and restriction enzyme analysis, we conclude that a 14 kb fragment, G3, contains three copies of the gene coding for the 70K protein. A second 9.2 kb fragment, G5, contains one copy of the gene coding for the 70K protein. Hybridization of labeled poly(A)-containing RNA to restriction endonuclease-cleaved DNA indicates that the mRNA coding regions in G3 and G5 are each approximately 2100 bp long. The three tandemly repeated genes of G3 are separated by approximately 1400 bp of spacer DNA. The two internal spacer regions in G3 appear to be identical, whereas differences in restriction enzyme sites indicate that the sequences adjacent to the cluster differ from the internal spacer and from each other.

10.
Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of the codon third base shows a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of polyubiquitin codons reflects the genome G+C content through AT/GC substitutions at the third codon position. The G+C content of the ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of the coding regions of the compiled genes, indicating that the choice among synonymous codons reflects the average codon usage pattern of the corresponding species. On the other hand, the monoubiquitin gene, which differs from the polyubiquitin gene in gene organization, gene expression, and the function of the encoded protein, shows a codon usage pattern different from that of the polyubiquitin gene. From comparisons of the levels of synonymous substitution among ubiquitin repeats and of the amino acid sequence homology of the tails of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: multiple primitive ubiquitin sequences were dispersed across the genome in ancestral eukaryotes. Some of them, situated in a particular genomic environment, fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After the divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded are reflected in the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.
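The analysis relates the G+C content at the third codon position (often called GC3) to genome G+C by linear regression. The Python sketch below computes both quantities; the function names are hypothetical, the regression is plain ordinary least squares (an assumption, since the abstract does not name its fitting method), and no species values from the compiled genes are reproduced here.

```python
def gc3(cds: str) -> float:
    """Fraction of codons in an in-frame coding sequence whose third base is G or C.

    Every codon is counted, including a terminal stop codon if present.
    """
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third_bases = cds[2::3]
    return sum(1 for b in third_bases if b in "GC") / len(third_bases)


def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    print(gc3("ATGGCCTAA"))  # third bases G, C, A
```

In use, `xs` would hold each species' genome G+C and `ys` the GC3 of its ubiquitin genes; the slope of that fit is the quantity the abstract interprets.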

11.
Y. Tsujimoto, Y. Suzuki. Cell, 1979, 18(2):591-600
Sequence analysis of the cloned genomic fibroin gene and of cDNA containing the sequence complementary to fibroin mRNA has been carried out for the regions covering the 5′ flanking, mRNA coding, entire intervening sequence and its borders, and fibroin coding sequences. The sequences determined on the gene extend from nucleotide -552 to +1497 (assigning +1 to the cap locus); sequence analysis of the cDNA has confirmed our previous mapping of the cap locus (Tsujimoto and Suzuki, 1979). Comparison of the nucleotide sequence of the genomic gene with that of the cDNA has confirmed the existence of an intervening sequence 970 bases long. The sequence comparison also pinpointed the 5′ coding-intervening junction at +64 to +66 and the 3′ intervening-coding junction at +1034 to +1036. Both the 5′ and 3′ junctions of the fibroin gene (insect) share homologous segments of about 10 nucleotides each with the published sequences of β-globin (mammal), immunoglobulin (mammal) and ovalbumin (avian) genes. A long inverted repeat sequence (17 of 23 bases matching) has been found next to the junctions within the intervening sequence of the fibroin gene. The repetitious sequence that codes for the Gly-Ala peptide characteristic of fibroin protein begins at position +1448. The characteristics of the N-terminal portion of fibroin protein (or its precursor) are discussed, as are the features of the 5′ flanking sequence of the gene and the mRNA sequence (with special attention to the putative promoter sequence for the gene), the possible secondary structure, and a sequence complementary to the 3′ end of 18S ribosomal RNA at the 5′ proximal region of fibroin mRNA.

13.
Human liver cDNA coding for protein C has been synthesized, cloned and sequenced. The abundance of protein C message is approximately 0.02% of total mRNA. Three overlapping clones contain 1,798 nucleotides of contiguous sequence, which approximates the size of the protein's mRNA, based upon Northern hybridization. The cDNA sequence consists of 73 5'-noncoding bases, coding sequence for a 461 amino acid nascent polypeptide precursor, a TAA termination codon, 296 3'-noncoding bases, and a 38 base polyadenylation segment. The nascent protein consists of a 33 amino acid "signal", a 9 amino acid propeptide, a 155 amino acid "light" chain, a Lys-Arg connecting dipeptide, and a 262 amino acid "heavy" chain. Human protein C and Factor IX and X precursors possess about one third identical amino acids (59% in the gamma-carboxyglutamate domain), including two forty-six amino acid segments homologous to epidermal growth factor. Human protein C also has similar homology with prothrombin in the "leader", gamma-carboxyglutamate and serine protease domains, but lacks the two "kringle" domains found in prothrombin.
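The listed domain lengths should account exactly for the 461-residue nascent precursor. This short Python check of that arithmetic uses only the numbers stated in the abstract; the second line also counts the coding nucleotides for 461 codons plus the TAA stop codon.

```python
# Domain lengths (amino acids) of the nascent protein C precursor, as listed in the abstract.
domains = {
    "signal": 33,
    "propeptide": 9,
    "light chain": 155,
    "Lys-Arg connecting dipeptide": 2,
    "heavy chain": 262,
}

total = sum(domains.values())
print(total)  # 461

# 461 sense codons plus one TAA stop codon, three nucleotides each.
print(3 * (total + 1))  # 1386
```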

14.
The complete sequence of Musa acuminata bacterial artificial chromosome (BAC) clones is presented and, consequently, the first analysis of the banana genome organization. One clone (MuH9) is 82,723 bp long with an overall G+C content of 38.2%. Twelve putative protein-coding sequences were identified, representing a gene density of one per 6.9 kb, which is slightly less than that previously reported for Arabidopsis but similar to rice. One coding sequence was identified as a partial M. acuminata malate synthase, while the remaining sequences showed a similarity to predicted or hypothetical proteins identified in genome sequence data. A second BAC clone (MuG9) is 73,268 bp long with an overall G+C content of 38.5%. Only seven putative coding regions were discovered, representing a gene density of only one gene per 10.5 kb, which is strikingly lower than that of the first BAC. One coding sequence showed significant homology to the soybean ribonucleotide reductase (large subunit). A transition point between coding regions and repeated sequences was found at approximately 45 kb, separating the coding upstream BAC end from its downstream end that mainly contained transposon-like sequences and regions similar to known repetitive sequences of M. acuminata. This gene organization resembles Gramineae genome sequences, where genes are clustered in gene-rich regions separated by gene-poor DNA containing abundant transposons.
Communicated by J.S. Heslop-Harrison
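The gene densities quoted for the two BAC clones follow directly from clone length divided by the number of putative genes. A minimal Python sketch reproducing those two figures from the abstract's numbers; `gene_density_kb` is a hypothetical helper name.

```python
def gene_density_kb(length_bp: int, n_genes: int) -> float:
    """Average number of kilobases of sequence per predicted gene."""
    if n_genes <= 0:
        raise ValueError("need at least one gene")
    return length_bp / n_genes / 1000


for name, length, genes in [("MuH9", 82_723, 12), ("MuG9", 73_268, 7)]:
    print(f"{name}: one gene per {gene_density_kb(length, genes):.1f} kb")
# MuH9: one gene per 6.9 kb
# MuG9: one gene per 10.5 kb
```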

15.
The complete nucleotide sequence of the Pseudomonas chromosomal gene coding for the enzyme carboxypeptidase G2 (CPG2) has been determined. The nucleotide sequence obtained has been confirmed by comparing the predicted amino acid sequence with that of randomly derived peptide fragments and by N-terminal sequencing of the purified protein. The gene has been shown to code for a 22 amino acid signal peptide at its N-terminus which closely resembles the signal peptides of other secreted proteins. An alternative 36 amino acid signal peptide which may function in Pseudomonas has also been identified. The codon utilisation of the gene is influenced by the high G + C content (67.2%) of the DNA and exhibits a 92.8% preference for codons ending in G or C. This unusual codon preference may contribute to the generally observed weak expression of Pseudomonas genes in Escherichia coli. A region of DNA upstream of the structural gene has also been sequenced and a ribosome binding site and two putative promoter sequences identified.

16.
Homologous "propeptide" regions are present in a family of vitamin K-dependent mammalian proteins, including clotting factors II, VII, IX, X, protein C, protein S and bone "gla" proteins. To test the hypothesis that the propeptide is a signal for the correct gamma-carboxylation of the adjacent gamma-carboxy region, we have mutated amino acid -4 of human factor IX from an arginine to a glutamine residue, by M13-directed site-specific mutagenesis of a cDNA clone. After expression of mutant factor IX in dog kidney cells, we find that it is secreted into the medium in a precursor form containing the propeptide, and is inefficiently gamma-carboxylated compared to the control, wild-type, recombinant factor IX. This result supports the hypothesis that the propeptide region is required for efficient gamma-carboxylation of factor IX in dog kidney cells. Furthermore, it confirms previous results that arginine at amino acid -4 is required for correct propeptide processing.
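The abstract does not say which codon change was introduced at position -4. Purely as an illustration, the Python sketch below lists which arginine codons can be converted to a glutamine codon by a single base change, using the standard genetic code. The helper name is hypothetical, and an oligonucleotide-directed mutation need not be limited to one base.

```python
# Standard genetic code entries (DNA codons) for the two residues involved.
ARG_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]
GLN_CODONS = ["CAA", "CAG"]


def single_base_changes(src_codons, dst_codons):
    """Return (src, dst, position) for codon pairs differing at exactly one 0-based position."""
    changes = []
    for s in src_codons:
        for d in dst_codons:
            diffs = [i for i in range(3) if s[i] != d[i]]
            if len(diffs) == 1:
                changes.append((s, d, diffs[0]))
    return changes


print(single_base_changes(ARG_CODONS, GLN_CODONS))
# [('CGA', 'CAA', 1), ('CGG', 'CAG', 1)]
```

Both single-step routes are a G→A transition at the middle codon position, which is the G of a CpG dinucleotide.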

18.
The gene for the extracellular alpha antigen of Mycobacterium bovis BCG was cloned by using a single probe restricted to G or C in the third position. This technique should have great potential for the isolation of mycobacterial antigen genes. The gene analysis revealed that the alpha antigen gene encoded 323 amino acid residues, including 40 amino acids for signal peptide followed by 283 amino acids for mature protein. This is the first report on the structure of the mycobacterial signal peptide. The promoter-like sequence and ribosome-binding site were observed upstream of the open reading frame. In the coding region, the third position of the codon showed high G + C content (86%). The gene was expressed as an unfused protein in Escherichia coli by using an E. coli expression vector. This protein, which reacted with polyclonal antibody raised against alpha antigen from Mycobacterium tuberculosis, would be applicable to the immunodiagnosis of tuberculosis.

19.
The nucleotide sequence of the Pseudomonas saccharophila gene encoding maltotetraohydrolase (G4-forming amylase) has been determined. The coding region for the G4-forming amylase precursor contained 1653 nucleotides. The deduced precursor protein included an N-terminal 21-residue putative signal peptide; the deduced mature form of G4-forming amylase contains 530 amino acid residues with a calculated molecular mass of 57,740 Da. Sequence similarities between the G4-forming amylase and other amylolytic enzymes of species ranging from prokaryotes to eukaryotes are quite limited. However, three regions, which are involved in both the catalytic and substrate-binding sites of various amylolytic enzymes, are highly conserved in the G4-forming amylase of P. saccharophila.
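The stated lengths are internally consistent: 1653 nucleotides is 551 codons, matching 21 signal plus 530 mature residues, which suggests (an inference, not stated in the abstract) that the 1653 count excludes the stop codon. The mean residue mass of about 109 Da is also in line with the roughly 110 Da per residue commonly used as a rule of thumb. A short Python check using only the abstract's numbers:

```python
coding_nt = 1653  # nucleotides in the precursor coding region (from the abstract)
signal_aa = 21    # putative signal peptide residues
mature_aa = 530   # mature enzyme residues
mass_da = 57_740  # calculated molecular mass of the mature form, Da

assert coding_nt % 3 == 0, "a coding region should be a whole number of codons"
codons = coding_nt // 3
print(codons, signal_aa + mature_aa)  # 551 551
print(round(mass_da / mature_aa, 1))  # 108.9 (mean residue mass, Da)
```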

20.
MicroRNAs and other tiny endogenous RNAs in C. elegans (Total citations: 8; self-citations: 0; citations by others: 8)


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417