Similar articles
1.
The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport. They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria, eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles) tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids (aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4 aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize the alignments better than do the individual sequence comparisons. The global individual consensus "matches" 87% with the consensus of consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used). The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment do not have a charge cluster (even at the 5% significance level).
All eukaryotic HSP70 sequences have the analogous charge cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed, suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse origins. Received: 29 January 1998 / Accepted: 14 May 1998

2.
A global alignment of EF-G(2) sequences was corrected by reference to protein structure. The selection of characters eligible for construction of phylogenetic trees was optimized by searching for regions arising from the artifactual matching of sequence segments unique to different phylogenetic domains. The spurious matchings were identified by comparing all sections of the global alignment with a comprehensive inventory of significant binary alignments obtained by BLAST probing of the DNA and protein databases with representative EF-G(2) sequences. In three discrete alignment blocks (one in domain II and two in domain IV), the alignment of the bacterial sequences with those of Archaea–Eucarya was not retrieved by database probing with EF-G(2) sequences, and no EF-G homologue of the EF-2 sequence segments was detected by using partial EF-G(2) sequences as probes in BLAST/FASTA searches. The two domain IV regions (one of which comprises the ADP-ribosylatable site of EF-2) are almost certainly due to the artifactual alignment of insertion segments that are unique to Bacteria and to Archaea–Eucarya. Phylogenetic trees have been constructed from the global alignment after deselecting positions encompassing the unretrieved, spuriously aligned regions, as well as positions arising from misalignment of the G′ and G″ subdomain insertion segments flanking the "fifth" consensus motif of the G domain (Ævarsson, 1995). The results show inconsistencies between trees inferred by alternative methods and alternative (DNA and protein) data sets with regard to Archaea being a monophyletic or paraphyletic grouping. Neither maximum-likelihood nor maximum-parsimony methods allow discrimination (by log-likelihood difference and difference in number of inferred substitutions) between the conflicting (monophyletic vs. paraphyletic Archaea) topologies.
No specific EF-2 insertions (or terminal accretions) supporting a crenarchaeal–eucaryal clade are detectable in the new EF-G(2) sequence alignment.

3.
Carrying out simultaneous tree-building and alignment of sequence data is a difficult computational task, and the methods currently available are either limited to a few sequences or restricted to highly simplified models of alignment and phylogeny. A method is given here for overcoming these limitations by Bayesian sampling of trees and alignments simultaneously. The method uses a standard substitution matrix model for residues together with a hidden Markov model structure that allows affine gap penalties. It escapes the heavy computational burdens of other models by using an approximation called the "*" rule, which replaces missing data by a sum over all possible values of variables. The behavior of the model is demonstrated on test sets of globins. Received: 25 May 1998 / Accepted: 8 December 1998

4.
5.
Protein sequences with similarities to Escherichia coli RecA were compared across the major kingdoms of eubacteria, archaebacteria, and eukaryotes. The archaeal sequences branch monophyletically and are most closely related to the eukaryotic paralogous Rad51 and Dmc1 groups. A multiple alignment of the sequences suggests a modular structure of RecA-like proteins consisting of distinct segments, some of which are conserved only within subgroups of sequences. The eukaryotic and archaeal sequences share an N-terminal domain which may play a role in interactions with other factors and nucleic acids. Several positions in the alignment blocks are highly conserved within the eubacteria as one group and within the eukaryotes and archaebacteria as a second group, but compared between the groups these positions display nonconservative amino acid substitutions. Conservation within the RecA-like core domain identifies possible key residues involved in ATP-induced conformational changes. We propose that RecA-like proteins derive evolutionarily from an assortment of independent domains and that the functional homologs of RecA in noneubacteria comprise an array of RecA-like proteins acting in series or cooperatively. Received: 25 October 1996 / Accepted: 31 December 1996

6.
The members of the PKA regulatory subunit family (PKA-R family) were analyzed by multiple sequence alignment and clustering based on phylogenetic tree construction. According to the phylogenetic trees generated from multiple sequence alignment of the complete sequences, the PKA-R family was divided into four subfamilies (types I to IV). Members of each subfamily were exclusively from animals (types I and II), fungi (type III), and alveolates (type IV). Application of the same methodology to the cAMP-binding domains, and subsequently to the region delimited by β-strands 6 and 7 of the crystal structures of bovine RIα and rat RIIβ (the phosphate-binding cassette; PBC), proved that this highly conserved region was enough to classify unequivocally the members of the PKA-R family. A single signature sequence, F–G–E–[LIV]–A–L–[LIMV]–x(3)–[PV]–R–[ANQV]–A, corresponding to the PBC was identified which is characteristic of the PKA-R family and is sufficient to distinguish it from other members of the cyclic nucleotide-binding protein superfamily. Specific determinants for the A and B domains of each R-subunit type were also identified. Conserved residues defining the signature motif are important for interaction with cAMP or for positioning the residues that directly interact with cAMP. Conversely, residues that define subfamilies or domain types are not conserved and are mostly located on the loop that connects α-helix B′ and β strand 7. Received: 2 November 2000 / Accepted: 14 June 2001
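
The signature quoted in this abstract is written in PROSITE-style pattern notation, which maps directly onto a regular expression. The following is a minimal sketch of that mapping, not the authors' own tooling: the helper names `prosite_to_regex` and `find_pbc` and the test peptide are hypothetical, the pattern is retyped with ASCII hyphens in place of the en-dashes printed above, and only the notation elements that occur in this one pattern (literal residues, bracketed alternatives, and `x(n)`) are handled.

```python
import re

# Signature from the abstract, retyped with ASCII hyphens (assumption:
# the en-dashes in the printed pattern are ordinary PROSITE separators).
PROSITE_PBC = "F-G-E-[LIV]-A-L-[LIMV]-x(3)-[PV]-R-[ANQV]-A"


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a regular expression.

    Handles only literal residues, [..] alternatives, and x / x(n).
    """
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if wildcard:
            count = wildcard.group(1)
            # x means any residue; x(n) means any n residues
            parts.append("." if count is None else ".{%s}" % count)
        else:
            # literal residues and bracketed classes are already valid regex
            parts.append(element)
    return "".join(parts)


PBC_REGEX = re.compile(prosite_to_regex(PROSITE_PBC))


def find_pbc(sequence: str):
    """Return (start, matched_segment) for each signature hit in a protein sequence."""
    return [(m.start(), m.group()) for m in PBC_REGEX.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide built to contain one signature match; not real data.
    print(prosite_to_regex(PROSITE_PBC))
    print(find_pbc("MKFGELALIYGTPRAATVK"))
```

A hit reported by such a scan only says that a sequence fits the pattern; whether it belongs to the PKA-R family is the claim made in the abstract, not something the regular expression establishes.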

7.
RNA secondary-structure folding algorithms predict the existence of connected networks of RNA sequences with identical secondary structures. Fitness landscapes that are based on the mapping between RNA sequence and RNA secondary structure hence have many neutral paths. A neutral walk on these fitness landscapes gives access to a virtually unlimited number of secondary structures that are a single point mutation from the neutral path. This shows that neutral evolution explores phenotype space and can play a role in adaptation. Received: 23 December 1995 / Accepted: 17 March 1996

8.
Mitochondrial DNA (mtDNA) sequences are widely used for inferring the phylogenetic relationships among species. Clearly, the assumed model of nucleotide or amino acid substitution used should be as realistic as possible. Dependence among neighboring nucleotides in a codon complicates modeling of nucleotide substitutions in protein-encoding genes. It seems preferable to model amino acid substitution rather than nucleotide substitution. Therefore, we present a transition probability matrix of the general reversible Markov model of amino acid substitution for mtDNA-encoded proteins. The matrix is estimated by the maximum likelihood (ML) method from the complete sequence data of mtDNA from 20 vertebrate species. This matrix represents the substitution pattern of the mtDNA-encoded proteins and shows some differences from the matrix estimated from the nuclear-encoded proteins. The use of this matrix is recommended for inferring trees from mtDNA-encoded protein sequences by the ML method. Received: 3 May 1995 / Accepted: 31 October 1995

9.
There is an apparent paradox in our understanding of molecular evolution. Current biochemically based models predict that evolutionary trees should not be recoverable for divergences beyond a few hundred million years. In practice, however, trees often appear to be recovered from much older times. Mathematical models, such as those assuming that sites evolve at different rates [including a Γ distribution of rates across sites (RAS)] may in theory allow the recovery of some ancient divergences. However, such models require that each site maintain its characteristic rate over the whole evolutionary period. This assumption, however, contradicts the knowledge that tertiary structures diverge with time, invalidating the rate-constancy assumption of purely mathematical models. We report here that a hidden Markov version of the covarion model can meet both biochemical and statistical requirements for the analysis of sequence data. The model was proposed on biochemical grounds and can be implemented with only two additional parameters. The two hidden parts of this model are the proportion of sites free to vary (covarions) and the rate of interchange between fixed sites and these variable sites. Simulation results are consistent with this approach, providing a better framework for understanding anciently diverged sequences than the standard RAS models. However, a Γ distribution of rates may approximate a covarion model and may possibly be justified on these grounds. The accurate reconstruction of older divergences from sequence data is still a major problem, and molecular evolution still requires mathematical models that also have a sound biochemical basis. Received: 13 February 2001 / Accepted: 22 May 2001

10.
The molecular diversity of inhibitor-resistant TEM (IRT) enzymes was explored using a strategy which involved DNA amplification by polymerase chain reaction (PCR), analysis of restriction fragment length polymorphism (RFLP), and direct nucleotide sequencing. The study of plasmid-borne genes from 27 strains, resistant to amoxicillin and β-lactamase-inhibitor combinations, identified mutations resulting in amino acid change at positions 69, 244, 275, and 276 known to be associated with the IRT phenotype and a mutation at nucleotide position 162 in the promoter region. These mutations were found to lie on two different gene sequences, described here as "TEM-1B like" and "TEM-2 like" restriction linkage groups. Further analysis of nucleotide sequences of promoter and coding regions of the β-lactamases confirmed that a given mutation causing the IRT phenotype could be associated with two different gene sequence frameworks and that two different causal mutations could lie on an identical gene sequence framework. These data argue in favor of convergent phenotypic evolution of IRT enzymes under the selective pressure imposed by the intensive clinical use of β-lactam–β-lactamase inhibitor combinations. Received: 18 March 1996 / Accepted: 15 July 1996

11.
An evolutionary model for maximum likelihood alignment of DNA sequences
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.

12.
Phylogenetic analyses frequently rely on models of sequence evolution that detail nucleotide substitution rates, nucleotide frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy of phylogenetic inferences. Maximum likelihood methods of simultaneously constructing phylogenetic tree topologies and estimating model parameters are computationally intensive, and are not feasible for sample sizes of 25 or greater using personal computers. Techniques that initially construct a tree topology and then use this non-maximized topology to estimate ML substitution rates, however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution rates that were not statistically different from the true rates. Samples were isolated from 100 HIV-1 subtype B infected individuals from the United States and a 620 nt region of the env gene was sequenced for each sample. The sequence data were used to obtain a substitution model of sequence evolution specific for HIV-1 subtype B env by estimating nucleotide substitution rates and the site-to-site heterogeneity in 100 individuals from the United States. The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1 subtype B env sequences. Received: 4 October 2000 / Accepted: 1 March 2001

13.
An AluI satellite DNA family has been isolated in the genome of the root-knot nematode Meloidogyne chitwoodi. This repeated sequence was shown to be present at approximately 11,400 copies per haploid genome, and represents about 3.5% of the total genomic DNA. Nineteen monomers were cloned and sequenced. Their length ranged from 142 to 180 bp, and their A + T content was high (from 65.7 to 79.1%), with frequent runs of As and Ts. An unexpected heterogeneity in primary structure was observed between monomers, and multiple alignment analysis showed that the 19 repeats could be unambiguously clustered in six subfamilies. A consensus sequence has been deduced for each subfamily, within which the number of positions conserved is very high, ranging from 86.7% to 98.6%. Even though blocks of conserved regions could be observed, multiple alignment of the six consensus sequences did not enable the establishment of a general unambiguous consensus sequence. Screening of the six consensus sequences for evidence of internal repeated subunits revealed a 6-bp motif (AAATTT), present in both direct and inverted orientation. This motif was found up to nine times in the consensus sequences, also with the occurrence of degenerated subrepeats. Along with the meiotic parthenogenetic mode of reproduction of this nematode, such structural features may argue for the evolution of this satellite DNA family either (1) from a common ancestral sequence by amplification followed by mechanisms of sequence divergence, or (2) through independent mutations of the ancestral sequence in isolated amphimictic nematode populations and subsequent hybridization events. Overall, our results suggest the ancient origin of this satellite DNA family, and may reflect for M. chitwoodi a phylogenetic position close to the ancestral amphimictic forms of root-knot nematodes. Received: 23 April 1997 / Accepted: 9 July 1997
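
The two per-monomer statistics reported in this abstract, A + T content and the number of copies of a short motif, are simple to compute. The following is a rough sketch under stated assumptions rather than the authors' procedure: the function names and the toy sequence are hypothetical, and because AAATTT is its own reverse complement, "inverted orientation" is assumed here to mean the reversed string TTTAAA.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq, motif = seq.upper(), motif.upper()
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def motif_counts(seq: str, motif: str = "AAATTT") -> dict:
    """Count a motif in direct and reversed orientation.

    Assumption: the inverted orientation is the reversed string, since
    the reverse complement of AAATTT is AAATTT itself.
    """
    return {
        "direct": count_overlapping(seq, motif),
        "inverted": count_overlapping(seq, motif[::-1]),
    }


if __name__ == "__main__":
    toy = "AAATTTGCTTTAAA"  # hypothetical sequence, not from the paper
    print(round(at_content(toy), 3), motif_counts(toy))
```

Exact string matching like this would miss the degenerated subrepeats mentioned in the abstract, which need an approximate-matching method.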

14.
The photolyase–blue-light photoreceptor family is composed of cyclobutane pyrimidine dimer (CPD) photolyases, (6-4) photolyases, and blue-light photoreceptors. CPD photolyase and (6-4) photolyase are involved in photoreactivation for CPD and (6-4) photoproducts, respectively. CPD photolyase is classified into two subclasses, class I and II, based on amino acid sequence similarity. Blue-light photoreceptors are essential light detectors for the early development of plants. The amino acid sequence of the receptor is similar to those of the photolyases, although the receptor does not show the activity of photoreactivation. To investigate the functional divergence of the family, the amino acid sequences of the proteins were aligned. The alignment suggested that the recognition mechanisms of the cofactors and the substrate of class I CPD photolyases (class I photolyases) are different from those of class II CPD photolyases (class II photolyases). We reconstructed the phylogenetic trees based on the alignment by the NJ method and the ML method. The phylogenetic analysis suggested that the ancestral gene of the family had encoded CPD photolyase and that the gene duplication of the ancestral proteins had occurred at least eight times before the divergence between eubacteria and eukaryotes. Received: 23 October 1996 / Accepted: 1 April 1997

15.
We have reconstructed the evolution of the anciently derived kinesin superfamily using various alignment and tree-building methods. In addition to classifying previously described kinesins from protists, fungi, and animals, we analyzed a variety of kinesin sequences from the plant kingdom including 12 from Zea mays and 29 from Arabidopsis thaliana. Also included in our data set were four sequences from the anciently diverged amitochondriate protist Giardia lamblia. The overall topology of the best tree we found is more likely than previously reported topologies and allows us to make the following new observations: (1) kinesins involved in chromosome movement including MCAK, chromokinesin, and CENP-E may be descended from a single ancestor; (2) kinesins that form complex oligomers are limited to a monophyletic group of families; (3) kinesins that crosslink antiparallel microtubules at the spindle midzone including BIMC, MKLP, and CENP-E are closely related; (4) Drosophila NOD and human KID group with other characterized chromokinesins; and (5) Saccharomyces SMY1 groups with kinesin-I sequences, forming a family of kinesins capable of class V myosin interactions. In addition, we found that one monophyletic clade composed exclusively of sequences with a C-terminal motor domain contains all known minus end-directed kinesins. Received: 20 February 2001 / Accepted: 5 June 2001

16.
We have previously shown that several multicopy gene families within the major histocompatibility complex (MHC) arose from a process of segmental duplication. It has also been observed that retroelements play a role in generating diversity within these duplicated segments. The objective of this study was to compare the genomic organization of a gene duplication within another multicopy gene family outside the MHC. Using new continuous genomic sequence encompassing the APOE-CII gene cluster, we show that APOCI and its pseudogene, APOCI′, are contained within large duplicated segments which include sequences from the hepatic control region (HCR). Flanking Alu sequences are observed at both ends of the duplicated unit, suggesting a possible role in the integration of these segments. As observed previously within the MHC, the major differences between the segments are the insertion of sequences (approximately 200–1000 bp in length), consisting predominantly of Alu sequences. Ancestral retroelements also contribute to the generation of sequence diversity between the segments, especially within the 3′ poly(A) tract of Alu sequences. The exonic and regulatory sequences of the APOCI and HCR loci show limited sequence diversity, with exon 3 being an exception. Finally, the typing of pre- and postduplication Alus from both segments indicates an estimated time of duplication of approximately 37 million years ago (mya), some time prior to the separation of Old and New World monkeys. Received: 17 July 1999 / Accepted: 6 November 1999

17.
Complete sequences of seven protein coding genes from Penaeus notialis mitochondrial DNA were compared in base composition and codon usage with homologous genes from Artemia franciscana and four insects. The crustacean genes are significantly less A + T-rich than their counterparts in insects and the pattern of codon usage (ratio of G + C-rich versus A + T-rich codons) is less biased. A phylogenetic analysis using amino acid sequences of the seven corresponding polypeptides supports a sister-taxon status for mollusks–annelids and arthropods. Furthermore, a distance matrix-based tree and two most-parsimonious trees both suggest that crustaceans are paraphyletic with respect to insects. This is also supported by the inclusion of Panulirus argus COII (complete) and COI and COIII (partial) sequence data. From analysis of single and combined genes to infer phylogenies, it is observed that the topologies obtained from single genes are not well supported in most cases and differ notably from that of the tree based on all seven genes. Received: 25 August 1998 / Accepted: 8 March 1999

18.
Sequence analysis of a 237 kb genomic fragment from the central region of the MHC has revealed that the HLA-B and HLA-C genes are contained within duplicated segments peri-B (53 kb) and peri-C (48 kb), respectively, and separated by an intervening sequence (IF) of 30 kb. The peri-B and peri-C segments share at least 90% sequence homology except when interrupted by insertions/deletions including Alu, L1, an endogenous retrovirus, and pseudogenes. The sequences of peri-B, IF, and peri-C were searched for the presence of Alu elements to use as markers of evolution, chromosomal rearrangements, and polymorphism. Of 29 Alu elements, 14 were identified in peri-B, 11 in peri-C, and 4 in IF. The Alu elements in peri-B and peri-C clustered phylogenetically into two clades which were classified as "preduplication" and "postduplication" clades. Four Alu J elements that are shared by peri-B and peri-C and are flanked by homologous sequences in their paralogous locations, respectively, clustered into a "preduplication" clade. By contrast, the majority of Alu elements, which are unique to either peri-B or peri-C, clustered into a postduplication clade together with the Alu consensus subfamily members ranging from platyrrhine-specific (Spqxcg) to catarrhine-specific Alu sequences (Y). The insertion of platyrrhine-specific Alu elements in postduplication locations of peri-B and peri-C implies that these two segments are the products of a duplication which occurred in primates prior to the divergence of the New World primate from the human lineage (35–44 mya). Examination of the paralogous Alu integration sites revealed that 9 of 14 postduplication Alu sequences have produced microsatellites of different length and sequence within the Alu 3′-poly A tail.
The present analysis supports the hypothesis that HLA-B and HLA-C genes are products of an extended segmental duplication between 44 and 81 million years ago (mya), and that subsequent diversification of both genomic segments occurred because of the mobility and mutation of retroelements such as Alu repeats. Received: 21 May 1997 / Accepted: 9 July 1997

19.
Recent analyses of genes encoding proteins typical for multicellularity, especially adhesion molecules and receptors, favor the conclusion that all metazoan phyla, including the phylum Porifera (sponges), are of monophyletic origin. However, none of these data includes cDNA encoding a protein from the sponge class Hexactinellida. We have now isolated and characterized the cDNA encoding a protein kinase C, belonging to the C subfamily (cPKC), from the hexactinellid sponge Rhabdocalyptus dawsoni. The two conserved regions, the regulatory part with the pseudosubstrate site, the two zinc fingers, and the C2 domain, as well as the catalytic domain were used for phylogenetic analyses. Sequence alignment and construction of a phylogenetic tree from the catalytic domains revealed that the yeast Saccharomyces cerevisiae and the protozoan Trypanosoma brucei are at the base of the tree, while the hexactinellid R. dawsoni branches off first among the metazoan sequences; the other two classes of the Porifera, the Calcarea (the sequence from Sycon raphanus was used) and the Demospongiae (sequences from Geodia cydonium and Suberites domuncula were used), branch off later. The statistically robust tree also shows that the two cPKC sequences from the higher invertebrates Drosophila melanogaster and Lytechinus pictus are most closely related to the calcareous sponge. This finding was also confirmed by comparing the regulatory part of the kinase gene. We suggest that (i) within the phylum Porifera, the class Hexactinellida diverged first from a common ancestor to the Calcarea and the Demospongiae, which both appeared later, and (ii) the higher invertebrates are more closely related to the calcareous sponges. Received: 6 August 1997 / Accepted: 24 October 1997

20.
Intraindividual and Interspecies Variation in the 5S rDNA of Coregonid Fish
This study was designed to characterize further the nontranscribed intergenic spacers (NTSs) of the 5S rRNA genes of fish and evaluate this marker as a tool for comparative studies. Two members of the closely related North American Great Lakes cisco species complex (Coregonus artedi and C. zenithicus) were chosen for comparison. Fluorescence in situ hybridization found the ciscoes to have a single multicopy 5S locus located in a C band-positive region of the largest submetacentric chromosome. The entire NTS was amplified from the two species by polymerase chain reaction with oligonucleotide primers anchored in the conserved 5S coding region. Complete sequences were determined for 25 clones from four individuals representing two discrete NTS length variants. Sequence analysis found the length variants to result from presence of a 130-bp direct repeat. No two sequences from a single fish were identical. Examination of sequence from the coding region revealed two types of 5S genes in addition to pseudogenes. This suggests the presence of both somatic and germline (oocyte) forms of the 5S gene in the genome of Coregonus. The amount of variation present among NTS sequences indicates that accumulation of variation (mutation) is greater in this multicopy gene than is gene conversion (homogenization). The high level of sequence variation makes the 5S NTS an inappropriate DNA sequence for comparisons of closely related taxa. Received: 22 August 1997 / Accepted: 31 October 1997
