Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The complete sequences of four TBE1 transposons from Oxytricha fallax and O. trifallax are presented and analyzed. Although two TBE1s are 98% identical to each other at the nucleotide level, the remaining two TBE1s are only 90% identical both to each other and to the other two. This large evolutionary divergence allows us to identify conserved TBE1 features. TBE1 transposons are 4.1 kbp long and are flanked by 3 bp target-site repeats. The elements consist of 78 bp inverted terminal repeats, of which the 17 terminal base pairs are Oxytricha telomere repeats; a central conserved section of 550 bp that includes a set of nested direct and inverted sequence repeats; and three open reading frames conserved for encoded amino acid sequence. The three open reading frames encode a 22 kDa basic protein of unknown function, a 42 kDa ‘D,D35E’ transposase, and a 57 kDa chimeric C2H2 zinc finger/protein kinase. The protein kinase domain of the 57 kDa protein is unusual, lacking a conserved ATP-binding motif. This revised version was published online in July 2006 with corrections to the Cover Date.

2.
Telomeres of most insects are composed of simple (TTAGG)n repeats that are synthesized by telomerase. However, in some dipteran insects such as Drosophila melanogaster, neither (TTAGG)n repeats nor telomerase activity has been detected. Although telomere structure is well documented in Diptera and Lepidoptera, very limited information is available on lower insect groups. To understand general aspects of telomere function and evolution in insects, we characterized the structures of the telomeric and subtelomeric regions in a lower insect, the Taiwan cricket, Teleogryllus taiwanemma. FISH analysis of this insect's chromosomes demonstrated (TTAGG)n repeat elements at all distal ends. Just proximal to the telomeric repeats, the highly conserved 9-kb long terminal unit (LTU) sequences are tandemly repeated. These were observed in four of six chromosomes: three autosomal ends and one X-chromosomal end. LTU sequences represent about 0.2% of the T. taiwanemma genome. Each LTU contains a core (TTAGG)8-like sequence (TRLS) and five types of conserved sequences, namely ST (short telomere associated), J (joint), X, SR (satellite sequence rich), and Y, which vary in length from about 150 bp to 2.7 kb. The LTU sequence is defined as ST–J–TRLS–SR–X–Y–X–Y–X. Most LTU regions may be derived from the ancestral common sequence, which is observed in ST regions six times and at many other LTU sites. We could not find the LTU-like sequence in three other crickets, including the closest species, T. emma, suggesting that the LTU in T. taiwanemma has been rapidly amplified in subtelomeric regions through recent evolutionary events. It is also suggested that the highly conserved structure of the LTU is maintained by recombination and may contribute to telomere elongation, as seen in dipteran insects. Received: 6 August 2001/Accepted: 10 October 2001

3.
The rapid divergence of repetitive sequences makes them desirable markers for phylogenetic studies of closely related groups, provided that a high level of sequence homogeneity has been maintained within species. Intraspecific polymorphisms are now being found in an increasing number of studies, and this highlights the need to determine why they occur. In this study we examined intraindividual variation present in the first ribosomal internal transcribed spacer (ITS1) from a group of cryptic mosquito species. Individuals of the Anopheles punctulatus group contained multiple ITS1 length variants that ranged from 1.2 to 8.0 kb. Nucleotide and copy number variation for several homologous internal repeats is common, yet the intraspecific sequence divergence of cloned PCR isolates is comparable to that of other mosquito species (~0.2–1.5%). Most of the length variation consists of a 5′-ITS1 repeat that was identified as a duplication of a conserved ITS2 region. Secondary structure conservation for this repeat is pronounced, and several highly homogenized repeat types have formed. Significant interspecific divergence indicates a high rate of evolutionary change for this spacer. A maximum likelihood tree constructed here was congruent with previous phylogenetic hypotheses and suggests that concerted evolution is also accompanied by interpopulation divergence. The lack of interindividual differences and the presence of homogenized internal repeats suggest that a high rate of turnover has reduced the overall level of variation. However, the intraindividual variation also appears to be maintained by the absence of a single turnover rate and the complex dynamics of ongoing recombination within the spacer. Electronic supplementary material  The online version of this article (doi:) contains supplementary material, which is available to authorized users.

4.
A total of 48 full-length protein sequences of pectin lyases from different source organisms available in NCBI were subjected to multiple sequence alignment, domain analysis, and phylogenetic tree construction. A phylogenetic tree constructed on the basis of the protein sequences revealed two distinct clusters representing pectin lyases from bacterial and fungal sources. Similarly, the multiple accessions of different source organisms representing bacterial and fungal pectin lyases also formed distinct clusters, showing sequence-level homology. The sequence-level similarities among different groups of pectinase enzymes, viz. pectin lyase, pectate lyase, polygalacturonase, and pectin esterase, were also analyzed by subjecting a single protein sequence from each group with a common source organism to tree construction. Four distinct clusters representing different groups of pectinases with common source organisms were observed, indicating sequence-level similarity among them. Multiple sequence alignment of pectin lyase protein sequences from different source organisms, along with pectinases with common source organisms, revealed a conserved region, indicating homology at the sequence level. A conserved domain, Pec_Lyase_C, was frequently observed in the protein sequences of pectin lyases and pectate lyases, while Glyco_hydro_28 domains and the Pectate lyase-like β-helix clan domain are frequently observed in polygalacturonases and pectin esterases, respectively. The signature amino acid sequence of 41 amino acids, i.e. TYDNAGVLPITVN-SNKSLIGEGSKGVIKGKGLRIVSGAKNI, associated with Pec_Lyase_C is frequently observed in pectin lyase protein sequences and might be related to structure and enzymatic function.

5.
Evolution of N-terminal sequences of the vertebrate HOXA13 protein   Total citations: 8, self-citations: 0, citations by others: 8
While the role of the homeodomain in HOX function has been evaluated extensively, little attention has been given to the non-homeodomain portions of the HOX proteins. To investigate the evolution of the HOXA13 protein and to identify conserved residues in the N-terminal region of the protein with potential functional significance, N-terminal Hoxa13 coding sequences were PCR-amplified from fish, amphibian, reptile, chicken, and marsupial and eutherian mammal genomic DNA. Compared with fish HOXA13, the mammalian protein has increased in size by 35%, primarily owing to the accumulation of alanine repeats and flanking segments rich in proline, glycine, or serine within the first 215 amino acids. Certain residues and amino acid motifs were strongly conserved, and several HOXA13 N-terminal domains were also shared in the paralogous HOXB13 and HOXD13 genes; however, other conserved regions appear to be unique to HOXA13. Two domains highly conserved in HOXA13 orthologs are shared with Drosophila AbdB and other vertebrate AbdB-like proteins. Marsupial and eutherian mammalian HOXA13 proteins have three large homopolymeric alanine repeats of 14, 12, and 17–18 residues that are absent in reptiles, birds, and fish. Thus, the repeats arose after the divergence of reptiles from the lineage that would give rise to the mammals. In contrast, other short homopolymeric alanine repeats in mammalian HOXA13 have remained virtually the same length, suggesting that forces driving or limiting repeat expansion are context dependent. Consecutive stretches of identical third-base usage in alanine codons within the large repeats were found, supporting replication slippage as a mechanism for their generation. However, numerous species-specific base substitutions affecting third-base alanine repeat codon positions were observed, particularly in the largest repeat.
Therefore, if the large alanine repeats were present prior to eutherian mammal development, as is suggested by the opossum data, then a dynamic process of recurring replication slippage and point mutation within alanine repeat codons must be considered to reconcile these observations. This model might also explain why the alanine repeats are flanked by proline-, serine-, and glycine-rich sequences, and it reveals a biological mechanism that promotes increases in protein size and, potentially, acquisition of new functions. Received: 8 June 1999 / Accepted: 23 September 1999

6.
All striated muscles respond to stretch by a delayed increase in tension. This physiological response, known as stretch activation, is, however, predominantly found in vertebrate cardiac muscle and insect asynchronous flight muscles. Stretch activation relies on an elastic third filament system composed of giant proteins known as titin in vertebrates or kettin and projectin in insects. The insect protein projectin functions jointly as a “scaffold and ruler” system during myofibril assembly and as an elastic protein during stretch activation. An evolutionary analysis of the projectin molecule could potentially provide insight into how distinct protein regions may have evolved in response to different evolutionary constraints. We mined candidate genes in representative insect species from Hemiptera to Diptera, from published and novel genome sequence data, and carried out a detailed molecular and phylogenetic analysis. The general domain organization of projectin is highly conserved, as are the protein sequences of its two repeated regions: the immunoglobulin type C and fibronectin type III domains. The conservation in structure and sequence is consistent with the proposed function of projectin as a scaffold and ruler. In contrast, the amino acid sequences of the elastic PEVK domains are noticeably divergent, although their length and overall unusual amino acid makeup are conserved. These patterns suggest that the PEVK region, working as an unstructured domain, can still maintain its dynamic, and even its three-dimensional, properties without the need for strict amino acid conservation. Phylogenetic analysis of the projectin proteins also supports a reclassification of the Hymenoptera in relation to Diptera and Coleoptera. Electronic supplementary material  The online version of this article (doi:) contains supplementary material, which is available to authorized users.

7.
We have designed hidden Markov models (HMMs) of structurally conserved repeats that, based on pairwise comparisons, are unconserved at the sequence level. To model secondary structure features, these HMMs assign higher probabilities of transition to insert or delete states within sequence regions predicted to form loops. HMMs were optimized using a sampling procedure based on the degree of statistical uncertainty associated with parameter estimates. A PSI-BLAST search initialized using a checkpoint-recovered profile derived from simulated sequences emitted by such an HMM can reveal distant structural relationships with, in certain instances, substantially greater sensitivity than a normal PSI-BLAST search. This is illustrated using two examples involving DNA- and RNA-associated proteins with structurally conserved repeats. In the first example, a putative sliding DNA clamp protein was detected in the thermophilic bacterium Thermotoga maritima. This protein appears to have arisen by way of a duplicated β-clamp gene that then acquired features of a PCNA-like clamp, perhaps to perform a PCNA-related function in association with one or more of the many archaeal-like proteins present in this organism. In the second example, β-propeller domains were predicted in the large subunit of UV-damaged DNA-binding protein and in related proteins, including the large subunit of cleavage-polyadenylation specificity factor, the yeast Rse1p and human SAP130 pre-mRNA splicing factors, and the fission yeast Rik1p gene silencing protein.

8.
The Δ12 desaturase represents a diverse gene family in plants and is responsible for conversion of oleic acid (18:1) to linoleic acid (18:2). Several members of this family are known from plants such as Arabidopsis and soybean. Using primers from conserved C- and N-terminal regions, we have cloned a novel Δ12 desaturase gene amplified from flax genomic DNA, denoted as LuFAD2-2. This intron-less gene is 1,149 base pairs long and encodes 382 amino acids, a putative membrane-bound Δ12 desaturase protein. Sequence comparisons show that the novel sequence has 85% similarity with the previously reported flax Δ12 desaturase at the amino acid level and shows typical features of membrane-bound desaturases, such as three conserved histidine boxes along with four membrane-spanning regions that are universally present among plant desaturases. The signature amino acid sequence ‘YNNKL’ was also found to be present at the N terminus of the protein, which is necessary and sufficient for ER localization of the enzyme. A Neighbor-Joining tree generated from the sequence alignment grouped LuFAD2-2 among the other FAD2 sequences from Ricinus, Hevea, Jatropha, and Vernicia. When LuFAD2-2 and LuFAD2 were expressed in Saccharomyces cerevisiae, they could convert oleic acid to linoleic acid, with an average conversion rate of 5.25 and 8.85%, respectively. However, exogenously supplied linoleic acid was only weakly converted to linolenic acid, suggesting that LuFAD2-2 encodes a functional FAD2 enzyme and has substrate specificity similar to LuFAD2.
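The reported gene and protein lengths can be checked against each other with a little arithmetic. The sketch below is not from the study; it assumes the 1,149 bp figure counts the stop codon, and the function name `cds_length_bp` is hypothetical.

```python
def cds_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in base pairs of an intron-less coding sequence.

    Each amino acid is encoded by one 3-bp codon; the stop codon adds
    3 bp when it is counted as part of the gene (an assumption here).
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return codons * 3


# 382 codons + 1 stop codon = 383 codons = 1,149 bp
print(cds_length_bp(382))                      # 1149
print(cds_length_bp(382, include_stop=False))  # 1146
```

Under that assumption, 382 amino acids and 1,149 bp are mutually consistent.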

9.
Wada S  Watanabe T 《Genetica》2007,131(3):307-314
Mitogen-activated protein (MAP) kinases, a closely related family of protein kinases, are involved in cell cycle regulation and differentiation in yeast and human cells. They have not been documented in ciliates. We used PCR to amplify DNA sequences of the ciliated protozoan Paramecium caudatum using primers corresponding to amino acid sequences that are common to MAP kinases. We isolated and sequenced one putative MAP kinase-like serine/threonine kinase cDNA from P. caudatum. This cDNA, called pcstk1 (Paramecium caudatum Serine/Threonine Kinase 1), shared approximately 35% amino acid identity with MAP kinases from yeast. MAP kinases are activated by phosphorylation of specific threonine and tyrosine residues. These two amino acid residues are conserved in the PCSTK1 sequence at positions Thr 159 and Tyr 161. The PSTAIRE motif, which is characteristic of the CDK2 gene family, is not found in the ORF of PCSTK1. The highest homology score was to human STK9, which contains MAP-type kinase domains. Comparisons of expression level have shown that pcstk1 is expressed equally in cells at different stages (sexual and asexual). We discuss the possibility that, as in other organisms, a family of MAP kinase genes exists in P. caudatum.

10.
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5′TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5′ UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5′TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.

11.
The mate recognition protein (MRP) gene is a member of a family of extracellular matrix protein genes, called MRP Motif Repeat (MMR) genes, with no known homologs. Two sets of MMR genes, designated MMR-A and MMR-B, were found in Brachionus manjavacas. MMR-B has previously been shown to encode the MRP in the Brachionus plicatilis species complex. MMR family genes share the same basic structure: a signal peptide sequence, followed by nearly identical 276 bp (MMR-A) or 261 bp (MMR-B) repeats, with a truncated final repeat. Each repeat of the predicted MMR-A and -B proteins is expected to have a secondary structure of 5 α-helices, ranging in length from 11 to 20 amino acids, separated by coils of 1–3 amino acids. Hydrophobic and hydrophilic amino acids are predicted to be partitioned to opposite sides of each α-helix, suggesting that MMR proteins are globular with a hydrophobic core. MMR-A and MMR-B proteins vary in their post-translational modifications, resulting in differences in size and charge, and likely causing differences in the physical properties of the proteins on the surface of the female, and their ability to be recognized by a receptor on a male rotifer. The identity of MMR gene repeats is theorized to be maintained by concerted evolution, through a process of unequal crossing over and/or gene conversion, with new mutations likely to be lost. Rarely, however, the same process of concerted evolution can rapidly spread a mutation across all of the repeats. When a mutation results in conformational changes in the protein detectable by males, it could lead to reproductive isolation and thereby to speciation. Thus, changes in MRP could be a driving force in the high degree of species diversity seen within the B. plicatilis cryptic species complex.

12.
In a previous paper we obtained ten (orthogonal) factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the most important properties (linear combinations of these ten factors) that determine the three-dimensional structure of a protein are conserved properties, i.e., those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as that linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as that linear combination of the ten factors that is common among a set of amino acids at a given locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family; we find that bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, while the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
The amino acid sequences of numerous proteins are searched to find those that are similar, in terms of the conserved properties (definition 2), to sequences of the same size from one of the homologous families (cytochrome c and globin, respectively) for whose loci the conserved properties were defined. Many similar sequences are found, the number of similarities decreasing with increasing size of the segment. However, the segments must be rather long (15 residues) before the comparisons become meaningful. As an example, one sufficiently large sequence (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase, which is not a member of either family) is found to be similar in the conserved properties to a particular sequence of a member of the family of human hemoglobin chains, and the two sequences have similar structures. This means that, since conserved properties are expected to be structure determinants, we can use the conserved properties to predict an initial protein structure for subsequent energy minimization for a protein for which the conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.

13.
Expressed sequence tags (ESTs) from Coffea canephora leaves and fruits were used to search for types and frequencies of simple sequence repeats (EST–SSRs) with a motif length of 1–6 bp. From a non-redundant (NR) EST set of 5,534 potential unigenes, 6.8% SSR-containing sequences were identified, with an average density of one SSR every 7.73 kb of EST sequences. Trinucleotide repeats were found to be the most abundant (34.34%), followed by di- (25.75%) and hexa-nucleotide (22.04%) motifs. The development of unique genic SSR markers was optimized by a computational approach that allowed us to eliminate redundancy in the original EST set and also to test the specificity of each pair of designed primers. Twenty-five EST–SSRs were developed and used to evaluate cross-species transferability in the Coffea genus. The orthology was supported by the amplicon sequence similarity and the amplification patterns. The >94% identity of flanking sequences revealed high sequence conservation across the Coffea genus. A high level of polymorphic loci was obtained regardless of the species considered (from 75% for C. liberica to 86% for C. canephora). Moreover, the polymorphism revealed by EST–SSRs was similar to that revealed by genomic SSRs. It is concluded that Coffea ESTs are a valuable resource for microsatellite mining. EST–SSR markers developed from C. canephora sequences can be easily transferred to other Coffea species for which very little molecular information is available. They constitute a set of conserved orthologous markers, which would be ideal for assessing genetic diversity in coffee trees as well as for cross-referencing transcribed sequences in comparative genomics studies.
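To make the SSR search concrete, the following is a minimal sketch of how perfect repeats with 1–6 bp motifs could be located in a sequence using regular expressions. It is not the pipeline used in the study: the function name `find_ssrs` and the minimum repeat counts per motif length are hypothetical choices made purely for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp);
# the abstract does not state the thresholds actually used.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "ATAT").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


example = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
print(find_ssrs(example))  # [(3, 'AT', 7), (20, 'AAG', 5)]
```

Dividing the total sequence length by the number of hits gives a density comparable in form to the "one SSR every 7.73 kb" figure, although the value obtained depends on the thresholds chosen.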

14.
15.
A universal scale of sequence conservation has recently been introduced, based on the omnipresence of protein sequence motifs across species. A large spectrum of short sequences, up to eight residues long, has been found to reside in all or almost all prokaryotic organisms. This discovery introduces a fundamentally new quantitative approach to the problem of reconstructing the last universal common ancestor (LUCA). The most conserved elements (protein modules) with defined structures and sequences harboring the omnipresent motifs are outlined in this work by combining sequence and protein crystal structure data. The structurally conserved modules involve 25–30 amino acid residues and have the appearance of closed loops (loop-n-lock structures). This confirms earlier conclusions on the loop-fold structure of globular proteins. Many of the most conserved modules represent the primary closed-loop prototypes, which have been derived by whole-genome sequence searches. The data presented thus provide a basis for further work on the earliest stages of protein evolution. [Reviewing Editor: Dr. Martin Kreitman]
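The idea of an "omnipresent" motif can be illustrated with a small sketch: collect every k-residue window in each species' proteome and intersect the sets across species. This is only an illustration under simplifying assumptions (exact matches, presence in all species rather than "almost all"); the function names and the toy sequences are hypothetical and are not taken from the paper.

```python
def kmers(seq, k):
    """All overlapping windows of length k in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def omnipresent_motifs(proteomes, k):
    """Motifs of length k present in at least one protein of every species.

    proteomes maps a species name to a list of protein sequences.
    """
    per_species = []
    for proteins in proteomes.values():
        found = set()
        for protein in proteins:
            found |= kmers(protein, k)
        per_species.append(found)
    if not per_species:
        return set()
    return set.intersection(*per_species)


# Toy data, invented for illustration only.
toy = {
    "species_1": ["MAGSGKTTLA", "QQWERTY"],
    "species_2": ["XXGSGKTYY"],
    "species_3": ["GSGKTAA", "PPP"],
}
print(omnipresent_motifs(toy, 5))          # {'GSGKT'}
print(sorted(omnipresent_motifs(toy, 4)))  # ['GSGK', 'SGKT']
```

Relaxing the intersection to "present in at least a given fraction of species" would correspond more closely to motifs found in "all or almost all" organisms.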

16.
This paper presents the first report on the structure of a 14-kb centromere sequence in a cereal genome that includes 1.9-kb direct repeats. The cereal centromeric sequence (CCS1) conserved in some Gramineae species contains a 17-bp motif similar to the CENP-B box, which serves as the binding site for the centromere-specific protein CENP-B in humans. To isolate centromeric units from rice (Oryza sativa L.), we performed PCR using the CENP-B box-like sequences (CBLS) as primers. A 264-bp clone was amplified by this method and called RCS1516. It appeared to be a novel member of the CCS1 family, sharing about 60% identity with the CCS1 sequences of other cereals. Then, a 14-kb genomic clone, λRCB11, carrying the RCS1516 sequence was isolated and sequenced. It was found to contain three copies of a 1.9-kb direct repeat, RCE1, separated by 5.1 and 1.7 kb. A 300-bp sequence at the 3′ end of RCE1 is highly conserved in all three copies (>90%) and is almost identical to the RCS1516 sequence, including the CBLS motif. The copy number of RCE1 was estimated to range from 10^2 to 10^3 in the haploid genome of rice. Cloned RCE1 units were used for fluorescent in situ hybridization (FISH) analysis, and signals were observed on almost every primary constriction of rice chromosomes. Thus it was concluded that RCE1 is a significant component of the rice centromere. The λRCB11 clone contained at least four A/T-rich regions, which are candidates for matrix attachment regions (MARs), in the sequences between the RCE1 repeats. Other elements that are homologous to the short centromeric repetitive sequences pSau3A9 and pRG5, detected in both sorghum and rice, were also found in the clone. Received: 9 June 1998 / Accepted: 16 September 1998

17.
Intrinsically unstructured proteins and their functions   Total citations: 3, self-citations: 0, citations by others: 3
Many gene sequences in eukaryotic genomes encode entire proteins or large segments of proteins that lack a well-structured three-dimensional fold. Disordered regions can be highly conserved between species in both composition and sequence and, contrary to the traditional view that protein function equates with a stable three-dimensional structure, disordered regions are often functional, in ways that we are only beginning to discover. Many disordered segments fold on binding to their biological targets (coupled folding and binding), whereas others constitute flexible linkers that have a role in the assembly of macromolecular arrays.

18.
Subtilases are members of the family of subtilisin-like serine proteases. Presently, more than 50 subtilases are known, more than 40 of them with complete amino acid sequences. We have compared these sequences and the available three-dimensional structures (subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K). The mature enzymes contain up to 1775 residues, with N-terminal catalytic domains ranging from 268 to 511 residues, and signal and/or activation peptides ranging from 27 to 280 residues. Several members contain C-terminal extensions, relative to the subtilisins, which display additional properties such as sequence repeats, processing sites and membrane anchor segments. Multiple sequence alignment of the N-terminal catalytic domains allows the definition of two main classes of subtilases. A structurally conserved framework of 191 core residues has been defined from a comparison of the four known three-dimensional structures. Eighteen of these core residues are highly conserved, nine of which are glycines. While the alpha-helix and beta-sheet secondary structure elements show considerable sequence homology, this is less so for the peptide loops that connect the core secondary structure elements. These loops can vary in length by more than 150 residues. While the core three-dimensional structure is conserved, insertions and deletions are preferentially confined to surface loops. From the known three-dimensional structures, various predictions are made for the other subtilases concerning essential conserved residues, allowable amino acid substitutions, disulphide bonds, Ca2+-binding sites, substrate-binding site residues, ionic and aromatic interactions, proteolytically susceptible surface loops, etc. These predictions form a basis for protein engineering of members of the subtilase family for which no three-dimensional structure is known.

19.
Comparison of ARM and HEAT protein repeats   Total citations: 18, self-citations: 0, citations by others: 18
ARM and HEAT motifs are tandemly repeated sequences of approximately 50 amino acid residues that occur in a wide variety of eukaryotic proteins. An exhaustive search of sequence databases detected new family members and revealed that at least 1 in 500 eukaryotic protein sequences contains such repeats. It also rendered the similarity between ARM and HEAT repeats, believed to be evolutionarily related, readily apparent. All the proteins identified in the database searches could be clustered by sequence similarity into four groups: canonical ARM-repeat proteins and three groups of the more divergent HEAT-repeat proteins. This allowed us to build improved sequence profiles for the automatic detection of repeat motifs. Inspection of these profiles indicated that the individual repeat motifs of all four classes share a common set of seven highly conserved hydrophobic residues, which in proteins of known three-dimensional structure are buried within or between repeats. However, the motifs differ at several specific residue positions, suggesting important structural or functional differences among the classes. Our results illustrate that ARM and HEAT-repeat proteins, while having a common phylogenetic origin, have since diverged significantly. We discuss evolutionary scenarios that could account for the great diversity of repeats observed.

20.
Protein sequences are normally the most conserved elements of genomes owing to purifying selection to maintain their functions. We document an extraordinary amount of within-species protein sequence variation in the model eukaryote Dictyostelium discoideum stemming from triplet DNA repeats coding for long strings of single amino acids. D. discoideum has a very large number of such strings, many of which are polyglutamine repeats, the same sequence that causes various neurological disorders in humans, such as Huntington’s disease. We show here that D. discoideum coding repeat loci are highly variable among individuals, making D. discoideum a candidate for the most variable proteome. The coding repeat loci are not significantly less variable than similar non-coding triplet repeats. This pattern is consistent with these amino-acid repeats being largely non-functional sequences evolving primarily by mutation and drift.
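As an illustration of what "long strings of single amino acids" means in practice, the sketch below scans a protein sequence for homopolymer runs such as polyglutamine tracts. It is an assumption-laden example, not the authors' method: the function name `homopolymer_runs`, the default minimum run length of 10, and the toy sequence are all hypothetical.

```python
import re


def homopolymer_runs(protein, min_len=10):
    """Return (start, residue, length) for each run of one amino acid
    repeated at least min_len times (one-letter codes)."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(protein.upper())
    ]


# Toy sequence with a 12-residue polyglutamine tract and a short Asn run.
toy_protein = "MSK" + "Q" * 12 + "LNP" + "N" * 4 + "A"
print(homopolymer_runs(toy_protein))             # [(3, 'Q', 12)]
print(homopolymer_runs(toy_protein, min_len=4))  # [(3, 'Q', 12), (18, 'N', 4)]
```

Comparing run lengths at the same locus across individuals would then give a simple measure of the repeat-length variability the abstract describes.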
