Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.
Transforming acidic coiled-coil proteins (TACC1, 2, and 3) are essential proteins associated with the assembly of spindle microtubules and maintenance of bipolarity. Dysregulation of TACCs is associated with tumorigenesis, but studies of microsatellite instability in TACC genes have not been extensive. Microsatellite, or simple sequence repeat (SSR), instability is known to cause many types of cancer. The present in silico analysis of SSRs in human TACC gene sequences shows the presence of mono- to hexa-nucleotide repeats, with the highest densities found for mono- and di-nucleotide repeats. The density of repeats is higher in introns than in exons. Some of the repeats are present in regulatory regions and retained introns. Human TACC genes show conservation of many repeat classes. Microsatellites in TACC genes could be valuable markers for monitoring numerical chromosomal aberrations and/or cancer.

3.
Ankyrin repeats are present in a great variety of proteins of eukaryotes, prokaryotes and some viruses, and they function as protein-protein interaction domains. We have searched for all the ankyrin repeats present in Arabidopsis proteins and determined their consensus sequence. We identified a total of 509 ankyrin repeats present in 105 proteins. Ankyrin repeat-containing proteins can be classified into 16 groups of structurally similar proteins. The most abundant group contains proteins with ankyrin repeats and transmembrane domains (AtAnkTm). Sequence similarity analysis indicates that these proteins are divided into six families. Some of the AtAnkTm genes are organized in tandem arrays and others are present in duplicated parts of the Arabidopsis genome. The expression of several AtAnkTm genes was analyzed, revealing a wide variety of expression patterns even within the same family. The likely functions of these proteins are discussed in comparison with the known functions of proteins with similar organization in other species.

4.
The evolutionary expansion of CAG repeats in human triplet expansion disease genes is intriguing because of their deleterious phenotype. In the past, this expansion has been suggested to reflect a broad genomewide expansion of repeats, which would imply that mutational and evolutionary processes acting on repeats differ between species. Here, we tested this hypothesis by analyzing repeat- and flanking-sequence evolution in 28 repeat-containing genes that had been sequenced in humans and mice and by considering overall lengths and distributions of CAG repeats in the two species. We found no evidence that these repeats were longer in humans than in mice. We also found no evidence for preferential accumulation of CAG repeats in the human genome relative to mice from an analysis of the lengths of repeats identified in sequence databases. We then investigated whether sequence properties, such as base and amino acid composition and base substitution rates, showed any relationship to repeat evolution. We found that repeat-containing genes were enriched in certain amino acids, presumably as the result of selection, but that this did not reflect underlying biases in base composition. We also found that regions near repeats showed higher nonsynonymous substitution rates than the remainder of the gene and lower nonsynonymous rates in genes that contained a repeat in both the human and the mouse. Higher rates of nonsynonymous mutation in the neighborhood of repeats presumably reflect weaker purifying selection acting in these regions of the proteins, while the very low rate of nonsynonymous mutation in proteins containing a CAG repeat in both species presumably reflects a high level of purifying selection. Based on these observations, we propose that the mutational processes giving rise to polyglutamine repeats in human and murine proteins do not differ. 
Instead, we propose that the evolution of polyglutamine repeats in proteins results from an interplay between mutational processes and selection.

5.
Molecular markers derived from the complete chloroplast genome can provide effective tools for species identification and phylogenetic resolution. Complete chloroplast (cp) genome sequences of Capsicum species have been reported. We herein report the complete chloroplast genome sequence of Capsicum baccatum var. baccatum, a wild Capsicum species. The total length of the chloroplast genome is 157,145 bp with 37.7% overall GC content. One pair of inverted repeats, 25,910 bp in length, was separated by a small single-copy region (17,974 bp) and a large single-copy region (87,351 bp). This region contains 86 protein-coding genes, 30 tRNA genes, and 4 rRNA genes; 11 genes contain one or two introns. Pair-wise alignments of chloroplast genomes were performed for genome-wide comparison. Analysis revealed a total of 134 simple sequence repeat (SSR) motifs and 282 insertion or deletion variants in the C. baccatum var. baccatum cp genome. The types and abundances of repeat units in Capsicum species were relatively conserved, and these loci could be used in future studies to investigate and conserve the genetic diversity of the Capsicum species.
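The region sizes quoted in this abstract can be checked against the reported total. In the usual quadripartite plastome layout, the total length is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The following is a minimal sketch of that arithmetic; the function name is hypothetical and only the numbers come from the abstract.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular plastome with one pair of inverted repeats.

    lsc, ssc: large and small single-copy region lengths (bp)
    ir: length of ONE inverted-repeat copy (bp); it is counted twice
    """
    return lsc + ssc + 2 * ir


# Values as quoted in the abstract for C. baccatum var. baccatum.
total = quadripartite_length(lsc=87_351, ssc=17_974, ir=25_910)
print(total)  # compare with the reported 157,145 bp
```

The sum reproduces the stated genome length, so the reported region sizes are internally consistent.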

6.
Complete sequence determination of the brachiopod Lingula anatina mtDNA (28,818 bp) revealed an organization that is remarkably atypical for an animal mt-genome. In addition to the usual set of 37 animal mitochondrial genes, which make up only 57% (16,555 bp) of the entire sequence, the genome contains lengthy unassigned sequences. All the genes are encoded in the same DNA strand, generally in a compact way, whereas the overall gene order is highly divergent in comparison with known animal mtDNA. Individual genes are generally longer and deviate considerably in sequence from their homologues in other animals. The genome contains two major repeat regions, in which 11 units of unassigned sequences and six genes (atp8, trnM, trnQ, trnV, and part of cox2 and nad2) are found in repetition, in the form of nested direct repeats of unparalleled complexity. One of the repeat regions contains unassigned repeat units dispersed among several unique sequences, a novel repetitive structure for animal mtDNAs. Each of those unique sequences contains an open reading frame for a polypeptide between 80 and 357 amino acids long, potentially encoding a functional molecule, but none of them has been identified with known proteins. In both repeat regions, tRNA genes or tRNA gene-like sequences flank major repeated units, supporting the view that those structures play a role in mitochondrial gene rearrangements. Although the intricate repeated organization of this genome can be explained by recurrent tandem duplications and subsequent deletions mediated by replication errors, other mechanisms, such as nonhomologous recombination, appear to explain certain structures more easily.

7.
8.
9.
The Pacific oyster (Crassostrea gigas) is globally distributed and is one of the most commercially and ecologically important marine organisms. However, little is known about the genome of this species. In this study, a C. gigas fosmid library was constructed that contains 459,936 clones with an average insert size of approximately 40 kb, representing 22.34-fold haploid genome equivalents. End sequencing generated 90,240 fosmid end sequences (FESs) with an average length of 384.27 base pairs (bp), covering approximately 2.58% of the Pacific oyster genome. The FESs were subsequently assembled and annotated, resulting in 6332 sequences with predicted open reading frames ≥300 and 1,189,100 bp of repeats. Furthermore, a total of 3200 microsatellite repeats were identified, and dinucleotide repeats were found to occur most abundantly, with AG and AAT being the most abundant repeat classes among dinucleotides and trinucleotides, respectively. We also found that the repeat number was generally inversely related to the repeat element length. Microsatellite composition differed between the transcribed sequences and the genomic sequences. Point mutations in microsatellites were non-random and under strong selection pressure. Overall, a comprehensive sequence resource for the Pacific oyster was created, including annotated transposable elements, tandem repeats, protein-coding sequences and microsatellites. These initial findings will serve as resources for further in-depth studies of physical mapping, gene discovery, microsatellite marker development and evolution.

10.
11.
Genes containing multiple coding mini- and microsatellite repeats are highly dynamic components of genomes. Frequent recombination events within these tandem repeats lead to changes in repeat numbers, which in turn alter the amino acid sequence of the corresponding protein. In bacteria and yeasts, the expansion of such coding repeats in cell wall proteins is associated with alterations in immunogenicity, adhesion, and pathogenesis. We hypothesized that identification of repeat-containing putative cell wall proteins in the human pathogen Aspergillus fumigatus may reveal novel pathogenesis-related elements. Here, we report that the genome of A. fumigatus contains as many as 292 genes with internal repeats. Fourteen of 30 selected genes showed size variation of their repeat-containing regions among 11 clinical A. fumigatus isolates. Four of these genes, Afu3g08990, Afu2g05150 (MP-2), Afu4g09600, and Afu6g14090, encode putative cell wall proteins containing a leader sequence and a glycosylphosphatidylinositol anchor motif. All four genes are expressed and produce variable-size mRNA encoding a discrete number of repeat amino acid units. Their expression was altered during development and in response to cell wall-disrupting agents. Deletion of one of these genes, Afu3g08990, resulted in a phenotype characterized by rapid conidial germination and reduced adherence to extracellular matrix, suggestive of an alteration in cell wall characteristics. The Afu3g08990 protein was localized to the cell walls of dormant and germinating conidia. Our findings suggest that a subset of the A. fumigatus cell surface proteins may be hypervariable due to recombination events in their internal tandem repeats. This variation may provide the functional diversity in cell surface antigens which allows rapid adaptation to the environment and/or elusion of the host immune system.

12.
We describe the structure of a gene expressed in the salivary gland cells of the dipteran Chironomus tentans and show that it encodes 1 of the approximately 15 secretory proteins exported by the gland cells. This sp115,140 gene consists of approximately 65 copies of a 42-bp sequence in a central uninterrupted core block, surrounded by short nonrepetitive regions. The repeats within the gene are highly similar to each other, but divergent repeats are present in a pattern which suggests that the repeat structure has been remodeled during evolution. The 42-bp repeat in the gene is a simple variant of the more complex repeat unit present in the Balbiani ring genes, encoding four of the other secretory proteins. The structure of the sp115,140 gene suggests that related repeat structures have evolved from a common origin and resulted in the set of genes whose secretory proteins interact in the assembly of the secreted protein fibers.

13.
Trinucleotide repeat (TNR) expansions in the genome cause a number of degenerative diseases. A prominent TNR expansion involves the triplet CAG in the huntingtin (HTT) gene responsible for Huntington's disease (HD). Pathology is caused by protein and RNA generated from the TNR regions including small siRNA‐sized repeat fragments. An inverse correlation between the length of the repeats in HTT and cancer incidence has been reported for HD patients. We now show that siRNAs based on the CAG TNR are toxic to cancer cells by targeting genes that contain long reverse complementary TNRs in their open reading frames. Of the 60 siRNAs based on the different TNRs, the six members in the CAG/CUG family of related TNRs are the most toxic to both human and mouse cancer cells. siCAG/CUG TNR‐based siRNAs induce cell death in vitro in all tested cancer cell lines and slow down tumor growth in a preclinical mouse model of ovarian cancer with no signs of toxicity to the mice. We propose to explore TNR‐based siRNAs as a novel form of anticancer reagents.
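The counts in this abstract (60 TNR-based siRNAs, six members in the CAG/CUG family) are consistent with one simple reading, sketched below. The abstract does not state how the 60 repeats or the family were defined, so two assumptions are made here: the 60 are the 64 possible RNA triplets minus the four homopolymers, and a "family" is the set of frame shifts (rotations) of a triplet together with the rotations of its reverse complement. All function names are hypothetical.

```python
from itertools import product

BASES = "ACGU"
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def rotations(triplet: str) -> set:
    """The three reading frames of a repeated triplet, e.g. CAG, AGC, GCA."""
    return {triplet[i:] + triplet[:i] for i in range(3)}


def tnr_family(triplet: str) -> set:
    """ASSUMED family definition: frame shifts of the triplet and of its
    reverse complement."""
    return rotations(triplet) | rotations(reverse_complement(triplet))


triplets = ["".join(p) for p in product(BASES, repeat=3)]
# ASSUMPTION: homopolymers (AAA, CCC, GGG, UUU) are excluded.
non_homopolymer = [t for t in triplets if len(set(t)) > 1]

print(len(non_homopolymer))       # 60
print(sorted(tnr_family("CAG")))  # ['AGC', 'CAG', 'CUG', 'GCA', 'GCU', 'UGC']
```

Under these assumptions the enumeration gives 60 repeats and a six-member CAG/CUG family, matching the numbers quoted above.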

14.
Chung HJ, Jung JD, Park HW, Kim JH, Cha HW, Min SR, Jeong WJ, Liu JR. Plant Cell Reports, 2006, 25(12): 1369-1379
The complete nucleotide sequence of the chloroplast genome of potato Solanum tuberosum L. cv. Desiree was determined. The circular double-stranded DNA, which consists of 155,312 bp, contains a pair of inverted repeat regions (IRa, IRb) of 25,595 bp each. The inverted repeat regions are separated by small and large single-copy regions of 18,373 and 85,749 bp, respectively. The genome encodes 79 proteins, 30 tRNAs, and 4 rRNAs, along with unidentified genes. A comparison of the chloroplast genomes of seven Solanaceae species revealed that the gene content and relative gene positions of S. tuberosum are similar to those of the other six Solanaceae species. However, undefined open reading frames (ORFs) in the LSC region were highly diverged in Solanaceae species except N. sylvestris. Detailed comparison identified numerous indels in the intergenic regions, mostly located in the LSC region. Among them, a single large 241-bp deletion, which was not associated with direct repeats and was found only in S. tuberosum, clearly discriminates the cultivated potato from the wild potato species Solanum bulbocastanum. The extent of sequence divergence may provide the basis for evaluating genetic diversity within the Solanaceae species, and will be useful to examine the evolutionary processes in potato landraces.

15.
The Arabidopsis genome was searched to identify predicted proteins containing armadillo (ARM) repeats, a motif known to mediate protein-protein interactions in a number of different animal proteins. Using domain database predictions and models generated in this study, 108 Arabidopsis proteins were identified that contained a minimum of two ARM repeats with the majority of proteins containing four to eight ARM repeats. Clustering analysis showed that the 108 predicted Arabidopsis ARM repeat proteins could be divided into multiple groups with wide differences in their domain compositions and organizations. Interestingly, 41 of the 108 Arabidopsis ARM repeat proteins contained a U-box, a motif present in a family of E3 ligases, and these proteins represented the largest class of Arabidopsis ARM repeat proteins. In 14 of these U-box/ARM repeat proteins, there was also a novel conserved domain identified in the N-terminal region. Based on the phylogenetic tree, representative U-box/ARM repeat proteins were selected for further study. RNA-blot analyses revealed that these U-box/ARM proteins are expressed in a variety of tissues in Arabidopsis. In addition, the selected U-box/ARM proteins were found to be functional E3 ubiquitin ligases. Thus, these U-box/ARM proteins represent a new family of E3 ligases in Arabidopsis.

16.
Streptococcus pyogenes expresses a fibronectin-binding surface protein (Sfb protein) which mediates adherence to human epithelial cells. The nucleotide sequence of the sfb gene was determined and the primary sequence of the Sfb protein was analysed. The protein consists of 638 amino acids and comprises five structurally distinct domains. The protein starts with an N-terminal signal peptide followed by an aromatic domain. The central part of the protein is formed by four proline-rich repeats which are flanked by non-repetitive spacer sequences. A second repeat region, consisting of four repeats that are distinct from the proline repeats and have been shown to form the fibronectin-binding domain, is located in the C-terminal part of the protein. The protein ends with a typical cell wall and membrane anchor region. Comparative sequence analysis of the N-terminal aromatic domain revealed similarities with carbohydrate-binding sites of other proteins. The proline repeat region of the Sfb protein shares characteristic features with proline-rich repeats of functionally distinct surface proteins from pathogenic Gram-positive cocci. Immunoelectron microscopy revealed an even distribution of the fibronectin-binding domain of Sfb protein on the surface of streptococcal cells. Analyses of 38 sfb genes originating from different S. pyogenes isolates revealed primary sequence variability in regions coding for the N-termini of mature Sfb proteins, whereas sequences coding for the central and C-terminal repeats were highly conserved. The repeat sequences are postulated to act as target sites for intragenic recombination events that result in variable numbers of repeats within the different sfb genes. A model of the Sfb protein is presented.

17.
The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STRs) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
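To make the notion of a "perfect simple tandem repeat" concrete, here is a small illustrative scanner. It is not the authors' exact-match method, whose details the abstract does not give; it is a naive left-to-right scan, and the function name and thresholds (`max_unit`, `min_copies`, `min_length`) are assumptions chosen for the example. Each hit is reported as (position, length, repeat subunit), mirroring the fields the abstract says were stored.

```python
def find_perfect_strs(seq, max_unit=6, min_copies=3, min_length=12):
    """Return (start, length, unit) for perfect tandem repeats (0-based start).

    Naive sketch: at each position, try every unit size up to max_unit,
    extend while whole copies match exactly, and keep the longest run.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # ran off the end of the sequence
            # Skip units that are repeats of a shorter unit (e.g. "AA", "CAGCAG").
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue
            j = i + k
            while seq[j:j + k] == unit:  # count whole, exact copies only
                j += k
            length = j - i
            if length // k >= min_copies and length >= min_length:
                if best is None or length > best[1]:
                    best = (i, length, unit)
        if best:
            hits.append(best)
            i += best[1]  # continue after the reported repeat
        else:
            i += 1
    return hits


# Toy sequence (made up for illustration): a (CAG)5 run and an A12 run.
toy = "ACGT" + "CAG" * 5 + "TC" + "A" * 12 + "G"
print(find_perfect_strs(toy))  # [(4, 15, 'CAG'), (21, 12, 'A')]
```

Because the scan is greedy from the left, the reported reading frame of a repeat depends on where the run begins; a genome-scale tool would need a faster algorithm and a convention for normalising equivalent units such as CAG, AGC and GCA.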

18.
19.
Amino acid sequence analysis corresponding to the PPE proteins in the H37Rv and CDC 1551 strains of the Mycobacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid-residue common region in 22 proteins. The pairwise sequence identities were as low as 18%. Conservation of amino acid residues was observed at fifteen positions that were distributed over the whole length of the region. The secondary structure corresponding to this region is predicted to be a mixture of α-helices and β-strands. Although the function is not known, proteins with this region specific to mycobacterial species may be associated with a common function. We further observed another group of 20 PPE proteins corresponding to the conserved C-terminal region comprising 44 amino acid residues with GFxGT and PxxPxxW sequence motifs. This region is preceded by a hydrophobic region, comprising 40–100 amino acid residues, that is flanked by charged amino acid residues. Identification of the conserved regions described above may be useful to detect related proteins from other genomes and assist the design of suitable experiments to test their corresponding functions. Amino acid sequence analysis corresponding to the PE proteins resulted in the identification of tandem repeats comprising 41-43 amino acid residues in the C-terminal variable regions in two PE proteins (Rv0978 and Rv0980). These correspond to the AB repeats that were first identified in some proteins of the Methanosarcina mazei genome, and were demonstrated as surface antigens. We observed the AB repeats also in several other proteins of hitherto uncharacterized function in Archaea and Bacteria genomes. Some of these proteins are also associated with another repeat called the C-repeat or the PKD-domain comprising 85 amino acid residues. The secondary structure corresponding to the AB repeat is predicted mainly as 4 β-strands.
We suggest that proteins with AB repeats in Mycobacterium tuberculosis and other genomes may be associated as surface antigens. The M. leprae genome, however, does not contain either the AB or C-repeats and different proteins may therefore be recruited as surface antigens in the M. leprae genome compared to the M. tuberculosis genome.
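Motifs written in the style of GFxGT and PxxPxxW, where a lowercase x stands for any residue, map directly onto regular expressions. The sketch below shows one way to scan a protein sequence for such motifs; the helper names are hypothetical and the test sequence is made up for illustration, not taken from any PPE protein.

```python
import re


def motif_to_regex(motif):
    """Compile a motif such as 'GFxGT'; lowercase 'x' matches any one residue."""
    parts = ("[A-Z]" if ch == "x" else re.escape(ch) for ch in motif)
    return re.compile("".join(parts))


def find_motif(protein, motif):
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(protein.upper())]


# Made-up toy sequence containing one instance of each motif.
toy = "MSGFAGTLLPAAPQRWK"
print(find_motif(toy, "GFxGT"))    # [2]
print(find_motif(toy, "PxxPxxW"))  # [9]
```

A plain regex only finds exact motif instances; detecting the weakly conserved regions described in the abstract (pairwise identities as low as 18%) would call for profile-based methods rather than pattern matching.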

20.
Rickettsia are best known as strictly intracellular vector‐borne bacteria that cause mild to severe diseases in humans and other animals. Recent advances in molecular tools and biological experiments have unveiled a wide diversity of Rickettsia spp. that include species with a broad host range and some species that act as endosymbiotic associates. Molecular phylogenies of Rickettsia spp. contain some ambiguities, such as the position of R. canadensis and relationships within the spotted fever group. In the modern era of genomics, with an ever‐increasing number of sequenced genomes, there is enhanced interest in the use of whole‐genome sequences to understand pathogenesis and assess evolutionary relationships among rickettsial species. Rickettsia have small genomes (1.1–1.5 Mb) as a result of reductive evolution. These genomes contain split genes, gene remnants and pseudogenes that, owing to the colinearity of some rickettsial genomes, may represent different steps of the genome degradation process. Genomics reveal extreme genome reduction and massive gene loss in highly vertebrate‐pathogenic Rickettsia compared to less virulent or endosymbiotic species. Information gleaned from rickettsial genomics challenges traditional concepts of pathogenesis that focused primarily on the acquisition of virulence factors. Another intriguing phenomenon about the reduced rickettsial genomes concerns the large fraction of non‐coding DNA and possible functionality of these “non‐coding” sequences, because of the high conservation of these regions. Despite genome streamlining, Rickettsia spp. contain gene families, selfish DNA, repeat palindromic elements and genes encoding eukaryotic‐like motifs. These features participate in sequence and functional diversity and may play a crucial role in adaptation to the host cell and pathogenesis. 
Genome analyses have identified a large fraction of mobile genetic elements, including plasmids, suggesting the possibility of lateral gene transfer in these intracellular bacteria. Phylogenetic analyses have identified several candidates for horizontal gene acquisition among Rickettsia spp., including tra, pat2, and genes encoding the type IV secretion system and ATP/ADP translocase that may have been acquired from bacteria living in amoebae. Gene loss, gene duplication, DNA repeats and lateral gene transfer have all shaped rickettsial genome evolution. A comprehensive analysis of the entire genome, including genes and non‐coding DNA, will help to unlock the mysteries of rickettsial evolution and pathogenesis.
