Similar literature
1.
Members of three repetitive sequence families were isolated from recombinant λ-genome libraries, and were used to investigate sequence relationships within these families. Studies presented elsewhere show that members of all three repeat sequence families are transcribed tissue-specifically. The thermal stability of intrafamilial heteroduplexes was measured, and the extent of colinearity between related sequences was determined by restriction mapping, heteroduplex visualization, gel blot hybridization, and direct sequencing. One large and very divergent family, named 2108, was shown to consist of an assemblage of many small repeat sequence subfamilies. Each subfamily includes <40 members which are not contiguous in the genome but are very closely related colinear sequence elements several thousand nucleotides in length. The different 2108 subfamilies share only small sequence subelements, which in each subfamily occur in a different linear order and are surrounded by different sequences. A second divergent family consisting of short repetitive sequences, the 2109 family, includes many small internally homologous subfamilies as well. A third family, 2034, displays little internal sequence divergence and no apparent subfamily structure. The repeat sequence subfamilies may be biologically significant units of repetition. Thus specific 2108 subfamilies were shown to be evolutionarily conserved to a remarkable degree. Highly homologous 2108 sequences were found shared among sea urchin species which diverged almost 200 million years ago, although only about 10% of the single copy DNA sequences of these species are now homologous enough to crossreact.

2.
Intra- and intergeneric distances derived from maximum-likelihood phylogenetic trees inferred from 254 nuclear ITS rDNA sequences were examined for seven families of euascomycetes, representing five classes. The intra- and intergeneric distances were well separated in most cases, but the distances varied between families. The analysis of the distance distributions provides a powerful tool for identifying certain taxa with highly deviating distances and thus cases of excessive lumping or splitting. Some cases of lumping and splitting found in different families are briefly discussed. The results of the analysis show that the generic concepts differ between the families. The consequences for nomenclature are discussed and a method abandoning binomial nomenclature while keeping the style of species names is recommended to ensure nomenclatural stability.

3.
Moderately repetitive DNA sequences in Lilium (cv Enchantment) which undergo a meiotic-specific repair synthesis during pachytene (P-DNA) were previously shown to exist as families of very low internal sequence divergence. The present study concerns P-DNA sequence preservation among higher plants. The relative abundance of these sequences in a variety of plant species and their divergence relative to Enchantment P-DNA was determined through C0t analysis and thermal denaturation of hybrid duplexes. Nearly all of the P-DNA sequence families of Enchantment were found to be present in the genomes of a number of monocot species and the dicot Vicia faba. Sequence content is highly conserved, with less than 6% divergence between Lilium and distantly related species such as Zea mays and Secale cereale. However, the number of repeats per P-DNA family varies considerably in different species, being particularly low among the Poales. P-DNA differs from most high thermal stability (HTS) sequence families of Enchantment which, although exhibiting a high degree of internal homology, are not present as repetitive DNA in the genomes of the other species examined. For most HTS families, the lack of internal divergence probably reflects their fairly recent introduction into the moderately repetitive DNA class, while P-DNA sequences represent evolutionarily ancient families which are the products of strong selective pressure for an indispensable meiotic function.

4.
Yves Quentin. Genetica, 1994, 93(1-3): 203-215
The past few years have brought new insight into the evolution of families of retroposons. These are composed of a very small number of master sequences able to duplicate, and a large majority of copies that are inactive for retroposition. During the course of time, successive replacements of master sequences have produced waves of amplification that are recognizable as subfamilies. In the Alu and the B1 families, one can distinguish two evolutionary periods. The first involves only monomeric elements that are now extinct (fossil elements) and is characterized by deep remodeling of the sequences. This period ends, in primates, with the fusion of a free left and a free right Alu monomer, producing the first modern Alu dimeric element; in rodents it ends with a tandem duplication of 29 bp to create the first modern B1 element. The second period is characterized by a great stability of the master sequences. The observed turn-over of master sequences is still an enigma. However, analysis of the contemporary master sequences and of the oldest master sequences provides some clues. Here, we review the very first stages of the appearance of the Alu and the B1 families in mammalian genomes.

5.
The stability of elements of three different dispersed repeated gene families in the genome of Drosophila tissue culture cells has been examined. Different amounts of sequences homologous to elements of 412, copia and 297 dispersed repeated gene families are found in the genomes of D. melanogaster embryonic and tissue culture cells. In general the amount of these sequences is increased in the cell lines. The additional sequences homologous to 412, copia and 297 occur as intact elements and are dispersed to new sites in the cell culture genome. It appears that these elements can insert at many alternative sites. We also describe a DNA sequence arrangement found in the D. melanogaster embryo genome which appears to result from a transposition of an element of the copia dispersed repeated gene family into a new chromosomal site. The mechanism of insertion of this copia element is precise to within 90 bp and may involve a region of weak sequence homology between the site of insertion and the direct terminal repeats of the copia element.

6.
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

9.
Repeated sequences cloned from the DNA of the sea urchin S. purpuratus were used as probes to measure the lengths of individual families of repeats. Some probes reassociated much more rapidly with preparations of long repeats than with short repeats while others reassociated more rapidly with short repeats than with long repeats. In this way two of five cloned repeats were shown to represent families with a great majority of sequences in the long class. One represented a family with similar numbers of long and short class members. Two were members of predominantly short class families. The cloned repeats representing long class families formed more precise duplexes than those representing short class families. Thermal stability measurements using S. purpuratus or S. franciscanus driver DNA showed that precise repetitive sequences have as great an interspecies sequence difference as the less precise repeats. Thus the precision of many families may result from recent multiplication rather than from selective pressure on the DNA sequences. Measurements of evolutionary frequency change show a clear correlation between the frequency change and the size of families of repeats in S. purpuratus. Comparison with S. franciscanus indicates that many of the large size families in S. purpuratus are those that have grown in size since these two species diverged.

10.
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.

11.
Cloned repetitive sequences from the S. purpuratus genome a few hundred to approximately 1000 nucleotides long were used to investigate the characteristics of individual repetitive sequence families. They were terminally labeled by the kinase procedure and reacted with sheared S. purpuratus DNA. Repetition frequencies were measured for 26 individual families and were found to vary from a few to several thousand copies per genome. Estimates of sequence divergence were made for 18 cloned repeat families by measuring thermal stability of the heteroduplexes formed between the genomic DNA and the cloned fragments, compared with that of the renatured cloned fragments. The difference was <4°C for three of the 18 families, and <10°C for 13 of the 18 families. These 13 repetitive sequence families lack any detectable highly divergent sequence relatives, and the results reported are shown not to change when the renaturation criterion is lowered below 55°C in 0.18 M Na+. Five of the 18 cloned families displayed greater sequence divergence. The average sequence divergence of the total short repetitive sequence fraction of S. purpuratus DNA was found to match closely the average of the divergences of the cloned repeat sequences.

12.
Many bacteria are naturally competent, able to actively transport environmental DNA fragments across their cell envelope and into their cytoplasm. Because incoming DNA fragments can recombine with and replace homologous segments of the chromosome, competence provides cells with a potent mechanism of horizontal gene transfer as well as access to the nutrients in extracellular DNA. This review starts with an introductory overview of competence and continues with a detailed consideration of the DNA uptake specificity of competent proteobacteria in the Pasteurellaceae and Neisseriaceae. Species in these distantly related families exhibit strong preferences for genomic DNA from close relatives, a self-specificity arising from the combined effects of biases in the uptake machinery and genomic overrepresentation of the sequences this machinery prefers. Other competent species tested lack obvious uptake bias or uptake sequences, suggesting that strong convergent evolutionary forces have acted on these two families. Recent results show that uptake sequences have multiple “dialects,” with clades within each family preferring distinct sequence variants and having corresponding variants enriched in their genomes. Although the genomic consensus uptake sequences are 12 and 29 to 34 bp, uptake assays have found that only central cores of 3 to 4 bp, conserved across dialects, are crucial for uptake. The other bases, which differ between dialects, make weaker individual contributions but have important cooperative interactions. Together, these results make predictions about the mechanism of DNA uptake across the outer membrane, supporting a model for the evolutionary accumulation and stability of uptake sequences and suggesting that uptake biases may be more widespread than currently thought.

13.
14.
During macronuclear development in the ciliated protozoan Tetrahymena thermophila, sequence reorganization including sequence loss occurs. Addressing questions about the organization and nucleotide sequence of micronucleus-limited regions can lead to insights about mechanisms of DNA rearrangements during macronuclear development as well as mechanisms for the maintenance of the stability of micronucleus-limited sequence families. We have previously identified a moderately repetitive micronucleus-limited sequence family called X-H (family members hybridize to an approximately 450 bp XbaI-HindIII restriction fragment), completely absent from macronuclear DNA. The first member of this family which we isolated is associated with terminal sequences characteristic of a Tel-1 element, a putative micronuclear transposable element. Two additional family members have been isolated which are not closely associated with Tel-1 terminal sequences. We have nucleotide sequence data for three cloned members of the X-H family. This analysis has demonstrated that the longest cloned members of the X-H family share a region of homology of approximately 2,400 bp and are highly conserved, differing only by small insertions or deletions of 100 bp or less. The sequences from one of the sequenced family members flanking the region of homology are themselves mostly micronucleus-limited.

15.
16.
Sequence conservation in Alu evolution
A statistical analysis of a set of genomic human Alu elements is based on a published alignment and a recent classification of these sequences. After separation of the Alu sequences into families, the consensus sequences of these families are determined, using the correct weighting of the unidirectional decay of CG-dinucleotides. Because the mutation rate at CG dinucleotides is tenfold greater, they must be treated as an independent clock at every stage of the analysis. The distributions of the substitutions with respect to the new consensus sequences, taking the CG and the non-CG-nucleotide positions separately, lie far closer to the expected distributions than the total diversity. Computer analysis of the folding of RNAs derived from these sequences indicates that RNA secondary structure is conserved among Alu families, suggesting its importance for Alu proliferation and/or function. The folding pattern, further substantiated by a number of compensatory mutations, includes secondary structure domains which are homologous to those observed in 7SL RNA and a defined region of interaction between the two Alu subunits. These results are consistent with a model in which a small number of conserved Alu master genes give rise via retroposition to the numerous copies of Alu pseudogenes, which then diversify by random substitution. The master genes appeared at different periods during evolution, giving rise to different families of Alu sequences.
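
The idea of counting substitutions at CG and non-CG positions separately can be illustrated with a minimal sketch. This is not the authors' procedure: the function names and the toy sequences below are hypothetical, and the code assumes the copy is already aligned to the consensus without gaps.

```python
def cpg_positions(consensus):
    """Return the set of indices that belong to a CG dinucleotide in the consensus."""
    positions = set()
    for i in range(len(consensus) - 1):
        if consensus[i:i + 2] == "CG":
            positions.update((i, i + 1))
    return positions


def split_substitutions(consensus, copy):
    """Count mismatches against the consensus, separately for CG and non-CG positions.

    Assumes both strings have equal length and are aligned without gaps.
    """
    cpg = cpg_positions(consensus)
    cg_subs = 0
    non_cg_subs = 0
    for i, (ref_base, obs_base) in enumerate(zip(consensus, copy)):
        if ref_base != obs_base:
            if i in cpg:
                cg_subs += 1
            else:
                non_cg_subs += 1
    return cg_subs, non_cg_subs


# Toy example (made-up sequences, for illustration only).
toy_consensus = "ACGTACGT"
toy_copy = "ATGTACGA"
counts = split_substitutions(toy_consensus, toy_copy)
print(counts)
```

Keeping the two counts apart lets each class of position be compared with its own expected substitution rate rather than with a single pooled rate.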

17.
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.

18.
Plasmid metagenome nucleotide sequence data were recently obtained from wastewater treatment plant (WWTP) bacteria with reduced susceptibility to selected antimicrobial drugs by applying the ultrafast 454-sequencing technology. The sequence dataset comprising 36,071,493 bases (346,427 reads with an average read length of 104 bases) was analysed for genetic diversity and composition by using a newly developed bioinformatic pipeline based on assignment of environmental gene tags (EGTs) to protein families stored in the Pfam database. Short amino acid sequences deduced from the plasmid metagenome sequence reads were compared to profile hidden Markov models underlying Pfam. The matches obtained showed that many reads represent genes having predicted functions in plasmid replication, stability and plasmid mobility, which indicates that WWTP bacteria harbour genetically stabilised and mobile plasmids. Moreover, the data confirm a high diversity of plasmids residing in WWTP bacteria. The mobile organic peroxide resistance plasmid pMAC from Acinetobacter baumannii was identified as reference plasmid for the most abundant replication module type in the sequenced sample. Accessory plasmid modules encode different transposons, insertion sequences, integrons, resistance and virulence determinants. Most of the matches to Transposase protein families were identified for transposases similar to the one of the chromate resistance transposon Tn5719. Noticeable are hits to beta-lactamase protein families, which suggests that plasmids from WWTP bacteria encode different enzymes possessing beta-lactam-hydrolysing activity. Some of the sequence reads correspond to antibiotic resistance genes that were only recently identified in clinical isolates of human pathogens. EGT analysis thus proved to be a very valuable method to explore genetic diversity and composition of the present plasmid metagenome dataset.
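
The dataset figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below only divides the stated total base count by the stated read count; the variable names are illustrative and not taken from the paper.

```python
# Figures as stated in the abstract.
total_bases = 36_071_493
read_count = 346_427

# Mean read length implied by the two totals.
mean_read_length = total_bases / read_count
print(f"mean read length: {mean_read_length:.2f} bases")
```

The quotient is a little over 104, which agrees with the reported average read length of 104 bases.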

19.
We propose a model that explains the hierarchical organization of proteins in fold families. The model, which is based on the evolutionary selection of proteins by their native state stability, reproduces patterns of amino acids conserved across protein families. Due to its dynamic nature, the model sheds light on the evolutionary time-scales. By studying the relaxation of the correlation function between consecutive mutations at a given position in proteins, we observe separation of the evolutionary time-scales: at short time intervals families of proteins with similar sequences and structures are formed, while at long time intervals the families of structurally similar proteins that have low sequence similarity are formed. We discuss the evolutionary implications of our model. We provide a "profile" solution to our model and find agreement between predicted patterns of conserved amino acids and those actually observed in nature.

20.
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
