Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
3.
4.
A whole-genome in silico analysis of 64 free-living prokaryotic species was performed to determine the number, length, distribution, and localization of direct and inverted intragenomic lengthy repeated sequences (LRS). Three main types of lengthy (≥500 bp) repeated sequences were revealed: (a) those associated with ribosomal RNA genes; (b) those associated with copies of protein-coding genes; (c) those associated with IS elements and genes encoding hypothetical transposases. Lengthy repeated sequences related to transposases comprise 50 to 95% of the total number of LRS, depending on the species. Intragenomic LRS associated with transposases and IS elements may reflect the recombination potential of different prokaryotic species, which determines both the capability for adaptive gene rearrangements and the cell's capacity to integrate foreign genes acquired through horizontal transfer.

5.
Using the transcriptome to annotate the genome   Cited by: 35 (self-citations: 0; citations by others: 35)
A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified approximately 15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another approximately 10,000-20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed. As the in silico approaches identified a smaller number of genes than anticipated, we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach, that can be used to rapidly identify novel genes and exons.

6.
We report here on the complete structure of the human COL3A1 and COL5A2 genes. Collagens III and V, together with collagens I, II and XI, make up the group of fibrillar collagens, all of which share a similar structure and function. However, despite the similar size of the major triple-helical domain, the number of exons coding for the domain differs between the genes for the major fibrillar collagens characterized so far (I, II, and III) and the minor ones (V and XI). The main triple-helical domain is encoded by 49-50 exons, including the junction exons, in the COL5A1, COL11A1 and COL11A2 genes, but by 43-44 exons in the genes for the major fibrillar collagens. Characterization of the genomic structure of the COL3A1 gene confirmed its association with the major fibrillar collagen genes, but surprisingly, the genomic organization of the COL5A2 gene was found to be similar to that of the COL3A1 gene. We also confirmed that the two genes are located in tail-to-tail orientation with an intergenic distance of approximately 22 kb. Phylogenetic analysis suggested that they have evolved from a common ancestor gene. Analysis of the genomic sequences identified a novel single nucleotide polymorphism and a novel dinucleotide repeat. These polymorphisms should be useful for linkage analysis of the Ehlers-Danlos syndrome and related disorders.

7.
Long M, Wang W, Zhang J. Gene, 1999, 238(1): 135-141
This paper deals with a general question posed by the origin of new processed chimeric genes: when a new retrosequence inserts into a new genome position, how does it become activated and acquire novel protein function by recruiting new functional domains and regulatory elements? Jingwei (jgw), a newly evolved functional gene with a chimeric structure in Drosophila, provides an opportunity to examine such questions. The source of the exon encoding its C-terminal peptide has been identified as an Adh retrosequence, which extends the concept of exon shuffling from recombination to retroposition as a general molecular mechanism for the origin of a new gene. However, the origin of the 5' exons remains unclear. We examined two hypotheses concerning the origin of these non-Adh-derived jgw exons: (i) these exons might originate from a unique genomic sequence that fortuitously evolved a standard intron-exon structure and regulatory sequence for jgw; (ii) these exons might be a duplicate of an unrelated previously existing gene. Genomic Southern analysis, in conjunction with construction and screening of a genomic bookshelf (sub-library), was conducted in a group of Drosophila species. The results demonstrated that there are duplicate genes containing the same structure as the recruited portion of jgw. We name this duplicate gene in Drosophila teissieri and Drosophila yakuba, and its orthologous gene in Drosophila melanogaster, yellow-emperor (ymp). Thus, the 5' exons/introns originated from a previously existing gene that provided new modules with specific sub-function to create jgw.

8.
Glucuronidation is a major pathway of androgen metabolism and is catalyzed by UDP-glucuronosyltransferase (UGT) enzymes. UGT2B15 and UGT2B17 are 95% identical in primary structure, and are expressed in steroid target tissues where they conjugate C19 steroids. Despite the similarities, the regulation of their expression differs; however, the promoter region and genomic structure of only the UGT2B17 gene have been characterized to date. To isolate the UGT2B15 gene and other novel steroid-conjugating UGT2B genes, eight P1-derived artificial chromosome (PAC) clones varying in length from 30 kb to 165 kb were isolated. The entire UGT2B15 gene was isolated and characterized from the PAC clone 21598 of 165 kb. The UGT2B15 and UGT2B17 genes are highly conserved, are both composed of six exons spanning approximately 25 kb, have identical exon sizes and have identical exon-intron boundaries. The homology between the two genes extends into the 5'-flanking region, which contains several conserved putative cis-acting elements including Pbx-1, C/EBP, AP-1, Oct-1 and NF-kappaB. However, transfection studies revealed differences in basal promoter activity between the two genes, which correspond to regions containing non-conserved potential elements. The high degree of homology in the 5'-flanking region between the two genes is lost upstream of -1662 in UGT2B15, which suggests a site of genetic recombination involved in duplication of UGT2B genes. Fluorescence in situ hybridization mapped the UGT2B15 gene to chromosome 4q13.3-21.1. The other PAC clones isolated contain exons from the UGT2B4, UGT2B11 and UGT2B17 genes. Five novel exons, which are highly homologous to exon 1 of known UGT2B genes, were also identified; however, these exons contain premature stop codons and represent the first recognized pseudogenes of the UGT2B family. The localization of highly homologous UGT2B genes and pseudogenes as a cluster on chromosome 4q13 reveals the complex nature of this gene locus, and other novel homologous UGT2B genes encoding steroid-conjugating enzymes are likely to be found in this region of the genome.

9.
We describe a genomic DNA-based signal sequence trap method, signal-exon trap (SET), for the identification of genes encoding secreted and membrane-bound proteins. SET is based on the coupling of an exon trap to the translation of captured exons, which allows screening of the exon-encoded polypeptides for signal peptide function. Since most signal sequences are expected to be located in the 5′-terminal exons of genes, we first demonstrate that trapping of these exons is feasible. To test the applicability of SET for the screening of complex genomic DNA, we evaluated two critical features of the method. Specificity was assessed by the analysis of random genomic DNA and efficiency was demonstrated by screening a 425 kb YAC known to contain the genes of four secretory or membrane-bound proteins. All trapped clones contained a translation initiation signal followed by a hydrophobic stretch of amino acids representing either a known signal peptide, transmembrane domain or novel sequence. Our results suggest that SET is a potentially useful method for the isolation of signal sequence-containing genes and may find application in the discovery of novel members of known secretory gene clusters, as well as in other positional cloning approaches.

10.
11.
Efficiency and specificity of gene isolation by exon amplification   Cited by: 2 (self-citations: 0; citations by others: 2)
Exon amplification is an increasingly popular approach to the identification of transcribed sequences and will complement other strategies to isolate genes. We have used this system to amplify candidate exons from 32 cosmids, including 8 cosmids which span a well characterized 185-kb region of the human major histocompatibility class II region on Chromosome (Chr) 6. We have examined the efficiency, specificity, and reproducibility of the system in isolating exons from genes known to be present on particular cosmids and have determined the nature and frequency of artefact amplifications in routine cosmid screening. We were able to clone at least one exon from 88% (7/8) of all known genes tested (including exons which are differentially spliced) and obtained artefacts from 19% (6/32) of the cosmids tested. Such artefacts generally arise from the amplification of noncoding sequences flanked by regions with high homology to acceptor and donor splice junctions. We show that the exon amplification procedure can be used successfully with a wide variety of cosmids which have different numbers of genes and gene structures and describe several approaches to the characterization of novel exons cloned in this study.

12.
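Using serological analysis of recombinant cDNA expression libraries (SEREX), in which a cDNA expression library from hepatocellular carcinoma tissue is screened with the tumor-antigen-specific antibodies present in the serum of hepatocellular carcinoma patients, we identified two novel antigens that can induce an antibody immune response in these patients: the HCA519 gene (GenBank AF146731) and its variant, the HCA90 gene (GenBank AF287265). Both map to chromosome 20q11.2; HCA519 contains 18 exons and HCA90 contains 19 exons. The exon unique to HCA90 is 108 bp long, is a fragment of an Alu repeat, and is inserted between exons 10 and 11 of HCA519, where it was originally HCA519 intronic sequence. The inserted fragment lies within the HCA519 open reading frame and does not alter its reading frame, extending the 747 amino acids encoded by HCA519 to 783 amino acids in HCA90. Northern blot and RT-PCR analysis showed that the HCA519 and HCA90 genes are highly expressed in 9 9 and 6 9 hepatocellular carcinoma tissue samples, respectively. RT-PCR showed that their transcripts are expressed at very low levels in normal liver and other normal tissues, and these low-abundance transcripts could not be detected by Northern blotting. The HCA519 protein is shown for the first time to induce an antibody immune response in tumor patients and is a novel tumor-associated antigen. Its variant, the HCA90 antigen gene, is a newly discovered gene. Its function may be related to malignant cell proliferation, and its feasibility as a target molecule for clinical tumor therapy and diagnosis merits further study.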

13.
A bovine genomic clone that hybridized to HLA-DQ beta cDNA was isolated, and fragments containing the beta 1, beta 2 and transmembrane (TM) exons were subcloned. The nucleotide sequences of the exons and flanking intron regions were determined. Comparisons of these exon nucleotide sequences and derived amino acid sequences to human class II beta-chain sequences showed that this gene is only 77% identical to HLA-DQ beta and about 75% identical to bovine DQ beta-like genes. The exon sequences were more divergent from other class II beta-chain genes. However, structural features such as conserved cysteines and regions of amino acids strongly suggest this to be a class II beta-chain gene. When exon-containing fragments were used as hybridization probes on Southern blots of bovine genomic DNA digested with EcoRI or PvuII, each exon hybridized to a single band. Based on these results we have referred to this gene as a novel bovine class II beta-chain gene, BoLA-DIB.

14.
15.
Genomic duplication, followed by divergence, contributes to organismal evolution. Several mechanisms, such as exon shuffling and alternative splicing, are responsible for novel gene functions, but they generate homologous domains and do not usually lead to drastic innovation. Major novelties can potentially be introduced by frameshift mutations, and this idea can explain the creation of novel proteins. Here, we employ a strategy using simulated protein sequences and identify 470 human and 108 mouse frameshift events that give rise to new gene segments. No obvious interspecies overlap was observed, suggesting high rates of acquisition of evolutionary events. This inference is supported by a deficiency of TpA dinucleotides in the protein-coding sequences, which decreases the occurrence of translational termination, even on the complementary strand. Increased usage of the TGA codon as the termination signal in newer genes also supports our inference. This suggests that tolerated frameshift changes are a prevalent mechanism for the rapid emergence of new genes and that protein-coding sequences can be derived from existing or ancestral exons rather than from events that result in noncoding sequences becoming exons.
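The link drawn above between TpA dinucleotides and translational termination can be illustrated with a small sketch. It is not the authors' method. It only counts stop codons in each of the three forward reading frames, plus TpA dinucleotides, for a made-up toy sequence. The function names `stops_per_frame` and `tpa_count` and the example sequence are assumptions introduced for illustration.

```python
from typing import List

# The three standard stop codons; two of them (TAA, TAG) contain a TpA dinucleotide.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_per_frame(seq: str) -> List[int]:
    """Count stop codons in each of the three forward reading frames."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Walk complete codons only, starting at the frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(1 for codon in codons if codon in STOP_CODONS))
    return counts


def tpa_count(seq: str) -> int:
    """Count occurrences of the dinucleotide TA (TpA) in the sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TA")


if __name__ == "__main__":
    toy = "ATGCTGACCTAGGCA"  # hypothetical sequence, not taken from the paper
    print(stops_per_frame(toy), tpa_count(toy))
```

A sequence with few stops in its shifted frames can be read through after a frameshift, which is the property the abstract ties to a low TpA content.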

16.
17.
18.
19.
In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
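The scanning idea in this abstract can be sketched in a simplified form. The sketch assumes an additive score and treats two exons as compatible whenever the first ends before the second begins. It ignores reading-frame compatibility and the Gene Model entirely, so it is an illustration of the stored-and-updated best gene, not the published algorithm. The names `Exon` and `best_gene` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    acceptor: int  # first position of the exon
    donor: int     # last position of the exon (acceptor <= donor)
    score: float


def best_gene(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest additive score and the exon chain achieving it.

    After the two sorts, the sweep is linear: each exon is visited once in
    acceptor order and absorbed once in donor order.
    """
    n = len(exons)
    if n == 0:
        return 0.0, []
    by_acceptor = sorted(range(n), key=lambda i: exons[i].acceptor)
    by_donor = sorted(range(n), key=lambda i: exons[i].donor)

    best_ending_at = [0.0] * n  # best gene score ending in exon i
    previous = [-1] * n         # predecessor exon in that best gene
    running_best, running_idx = 0.0, -1  # best gene among exons already ended
    d = 0
    for i in by_acceptor:
        # Absorb every exon whose donor lies upstream of this acceptor.
        # Such an exon also has a smaller acceptor, so its score is final.
        while d < n and exons[by_donor[d]].donor < exons[i].acceptor:
            j = by_donor[d]
            if best_ending_at[j] > running_best:
                running_best, running_idx = best_ending_at[j], j
            d += 1
        best_ending_at[i] = exons[i].score + running_best
        previous[i] = running_idx

    # Trace back from the best final exon.
    end = max(range(n), key=lambda i: best_ending_at[i])
    chain = []
    k = end
    while k != -1:
        chain.append(exons[k])
        k = previous[k]
    chain.reverse()
    return best_ending_at[end], chain


if __name__ == "__main__":
    pool = [Exon(1, 10, 5), Exon(5, 20, 8), Exon(12, 30, 4),
            Exon(25, 40, 3), Exon(35, 50, 6)]
    print(best_gene(pool))
```

A quadratic version would compare each exon against every preceding exon. Here the single variable `running_best` replaces that inner comparison, which is where the saving comes from in this simplified setting.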

20.
Grapevine is an important perennial fruit crop for the wine industry, and has implications for the health industry, with some of its constituents shown to reduce heart disease. Since the sequencing and assembly of grapevine cultivar Pinot Noir, several studies have contributed to its genome annotation. This new study further contributes toward genome annotation efforts by conducting a proteogenomics analysis using the latest genome annotation from CRIBI, a legacy proteomics dataset from cultivar Cabernet Sauvignon and a large RNA-seq dataset. A total of 341 novel annotation events are identified, consisting of five frame-shifts, 37 translated UTRs, 15 exon boundaries, one novel splice, nine novel exons, 159 gene boundaries, 112 reverse strands, and one novel gene event in 213 genes and 323 proteins. From this proteogenomics evidence, the Augustus gene prediction tool predicted 52 novel and revised genes (54 protein isoforms), 11 genes of which are associated with key traits such as stress tolerance and floral and fruity wine characteristics. This study also highlights a likely over-assembly of the genome, particularly on chromosome 7.
