Similar literature
 20 similar documents found; search time 15 ms
1.
Designating amino-acid sequences that fold into a common main-chain structure as "neutral sequences" for the structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (alpha, beta, alpha/beta and alpha+beta types) with the same chain length of 108, we generated the respective neutral sequences by using the inverse folding technique with a knowledge-based potential function. We assumed that neutral sequences for a protein structure have Z scores higher than or equal to fixed thresholds, where thresholds are defined as the Z score for the corresponding native sequence (case 1) or a much greater Z score (case 2). An exploring walk simulation suggested that the neutral sequences mapped into the sequence space were connected with each other through straight neutral paths and formed an inherent neutral network over the sequence space. Through another exploring walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between the two neutral networks ranged from 5 to 29 on the Hamming distance scale, showing a linear increase against the threshold values. The sequences located at the "interchange" regions between the two neutral networks have intermediate sequence-profile-scores for both corresponding structures. Introducing a "ball" in the sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of the ball that is centered at an arbitrary position ranged from 35 to 50, while the minimal radius of the ball that is centered at a certain special position ranged from 20 to 30, on the Hamming distance scale.
The relatively small Hamming distances (5-30) may support an evolutionary mechanism in which sequences transfer from a network for one structure to a network for a more beneficial structure via the interchange regions.
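The "closest approach distance" on the Hamming scale mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe how the authors computed it; the brute-force pairwise search below, the function names, and the toy 8-residue sequences are all assumptions for illustration (the study used chains of length 108).

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_approach(net_a, net_b) -> int:
    """Smallest Hamming distance between any sequence in net_a and any in net_b."""
    return min(hamming(a, b) for a, b in product(net_a, net_b))


# Hypothetical toy "neutral networks" (not data from the study).
net_alpha = ["AAAALLLL", "AAAALLLK"]
net_beta = ["VVVVTTTT", "AVVVTLLK"]

print(closest_approach(net_alpha, net_beta))  # 4
```

The exhaustive search is quadratic in the number of sequences, which is fine for a toy example but would likely need a smarter strategy for large sampled networks.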

2.
Complementary strands of adeno-associated virus DNA labeled with 32P at the 5' ends were separated and then self-annealed to form single-stranded circles stabilized by hydrogen bonds between the complementary sequences in the inverted terminal repetitions. We have previously shown that there are two distinct sequences in the terminal repetition which represent an inversion of the first 125 nucleotides (E. Lusby et al., J. Virol. 34:402-409, 1980; I. S. Spear et al., Virology 24:627-634, 1977). Base pairing between terminal sequences of the same orientation leads to a normal double helical structure. If sequences of the opposite orientation pair, an aberrant secondary structure is formed. HpaII digestion of the self-annealed, single-stranded circles led to labeled terminal fragments that corresponded both to those generated from termini of a normal double helical structure and those generated from an aberrant terminal secondary structure. Thus, the orientation of the terminal repetition at one end of the genome is not influenced by the orientation at the other end.

3.
4.
5.
The structure of the endogenous murine leukemia virus (MuLV) sequences of NIH/Swiss mice was analyzed by restriction endonuclease digestion, gel electrophoresis, and hybridization to an MuLV nucleic acid probe. Digestion of mouse DNA with certain restriction endonucleases revealed two classes of fragments. A large number of fragments (about 30) were present at a relatively low concentration, indicating that each derived from a sequence present once in the mouse genome. A smaller number of fragments (one to five) were present at a much higher concentration and must have resulted from sequences present multiple times in the mouse genome. These results indicated that the endogenous MuLV sequences represent a family of dispersed repetitive sequences. Hybridization of these same digested mouse DNAs to nucleic acid probes representing different portions of the MuLV genome allowed construction of a map of the sites where restriction endonucleases cleave the endogenous MuLV sequences. Several independent recombinant DNA clones of endogenous MuLV sequences have been isolated from C3H mice (Roblin et al., J. Virol. 43:113-126, 1982). Analysis of these sequences shows that they have the structure of MuLV proviruses. The sites at which restriction endonucleases cleaved within these proviruses appeared to be similar or identical to the sites at which these nucleases cleaved within the MuLV sequences of NIH/Swiss mice. This identity was confirmed by parallel electrophoresis. We conclude that the apparently complex pattern of endogenous MuLV sequences of NIH/Swiss mice consists largely of only two kinds of provirus, each repeated multiple times at dispersed sites in the mouse genome.

6.
A secondary structure has been predicted for the C termini of the fibrinogen β and γ chains from an aligned set of homologous protein sequences using a transparent method that extracts conformational information from patterns of variation and conservation, parsing strings, and patterns of amphiphilicity. The structure is modeled to form two domains, the first having a core parallel sheet flanked on one side by at least two helices and on the other by an antiparallel amphiphilic sheet, with an additional helix connecting the two sheets. The second domain is built entirely from β strands. © 1997 Wiley-Liss, Inc.

7.
8.
9.
10.
The complete nucleotide sequences of the luxA to luxE genes, as well as the flanking regions, were determined for the lux operons of two Xenorhabdus luminescens strains isolated from insects and humans. The nucleotide sequences of the corresponding lux genes (luxCDABE) were 85 to 90% identical but completely diverged 350 bp upstream of the first lux gene (luxC) and immediately downstream of the last lux gene (luxE). These results show that the luxG gene found immediately downstream of luxE in luminescent marine bacteria is missing at this location in terrestrial bacteria and raise the possibility that the lux operons are at different positions in the genomes of the X. luminescens strains. Four enteric repetitive intergenic consensus (ERIC) or intergenic repetitive unit (IRU) sequences of 126 bp were identified in the 7.7-kbp DNA fragment from the X. luminescens strain isolated from humans, providing the first example of multiple ERIC structures in the same operon including two ERIC structures at the same site. Only a single ERIC structure between luxB and luxE is present in the 7-kbp lux DNA from insects. Analysis of the genomic DNAs from five X. luminescens strains or isolates by polymerase chain reaction has demonstrated that an ERIC structure is between luxB and luxE in all of the strains, whereas only the strains isolated from humans had an ERIC structure between luxD and luxA. The results indicate that there has been insertion and/or deletion of multiple 126-bp repetitive elements in the lux operons of X. luminescens during evolution.

11.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier.
We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.

12.
13.
Acid-soluble collagens were prepared from connective tissues in the abalone Haliotis discus foot and adductor muscles with limited proteolysis using pepsin. The collagen preparation solubilized with 1% pepsin contained two types of alpha-chains which were different in their N-terminal amino acid sequences. Accordingly, two types of full-length cDNAs coding for collagen proalpha-chains were isolated from the foot muscle of the same animal and these proteins were named Hdcols (Haliotis discus collagens) 1alpha and 2alpha. The two N-terminal amino acid sequences of the abalone pepsin-solubilized collagen preparation corresponded to either of the two sequences deduced from the cDNA clones. In addition, several tryptic peptides prepared from the pepsin-solubilized collagen and fractionated by HPLC showed N-terminal amino acid sequences identical to those deduced from the two cDNA clones. Hdcols 1alpha and 2alpha consisted of 1378 and 1439 amino acids, respectively, showing a primary structure typical of fibril-forming collagens. The N-terminal propeptides of the two collagen proalpha-chains contained cysteine-rich globular domains. It is of note that Hdcol 1alpha completely lacked a short Gly-X-Y triplet repeat sequence in its propeptide. An unusual structure such as this has never before been reported for any fibril-forming collagen. The main triple-helical domains for both chains consisted of 1014 amino acids, where a supposed glycine residue in the triplet at the 598th position from the N-terminus was replaced by alanine in Hdcol 1alpha and by serine in Hdcol 2alpha. Both proalpha-chains of abalone collagens contained six cysteine residues in the carboxyl-terminal propeptide, lacking two cysteine residues usually found in vertebrate collagens. Northern blot analysis demonstrated that the mRNA levels of Hdcols 1alpha and 2alpha in various tissues including muscles were similar to each other.

14.
Covalent ligation studies on the human telomere quadruplex   Cited by: 5 (self-citations: 4; citations by others: 1)
Qi J, Shafer RH. Nucleic Acids Research, 2005, 33(10): 3185-3192
Recent X-ray crystallographic studies on the human telomere sequence d[AGGG(TTAGGG)3] revealed a unimolecular, parallel quadruplex structure in the presence of potassium ions, while earlier NMR results in the presence of sodium ions indicated a unimolecular, antiparallel quadruplex. In an effort to identify and isolate the parallel form in solution, we have successfully ligated into circular products the single-stranded human telomere and several modified human telomere sequences in potassium-containing solutions. Using these sequences with one or two terminal phosphates, we have made chemically ligated products via creation of an additional loop. Circular products have been identified by polyacrylamide gel electrophoresis, enzymatic digestion with exonuclease VII and electrospray mass spectrometry in negative ion mode. Optimum pH for the ligation reaction of the human telomere sequence ranges from 4.5 to 6.0. Several buffers were also examined, with MES yielding the greatest ligation efficiency. Human telomere sequences with two phosphate groups, one each at the 3′ and 5′ ends, were more efficient at ligation, via pyrophosphate bond formation, than the corresponding sequences with only one phosphate group, at the 5′ end. Circular dichroism spectra showed that the ligation product was derived from an antiparallel, single-stranded guanine quadruplex rather than a parallel single-stranded guanine quadruplex structure.

15.
The amino-terminal amino acid sequences of the pili proteins from four antigenically dissimilar strains of Neisseria gonorrhoeae, from Neisseria meningitidis, and from Escherichia coli were determined. Although antibodies raised to the pili protein from a given strain of gonococcus cross-reacted poorly or not at all with each of the other strains tested, the amino-terminal sequences were all identical. The meningococcal protein sequence was also identical with the gonococcal sequence through 29 residues, and this sequence was highly homologous to the sequence of the pili protein of Moraxella nonliquefaciens determined by other workers. However, the sequence of the pili protein from E. coli showed no similarity to the other sequences. The gonococcal and meningococcal proteins have an unusual amino acid at the amino termini, N-methylphenylalanine. In addition, the first 24 residues of these proteins have only two hydrophilic residues (at positions 2 and 5) with the rest being predominantly aliphatic hydrophobic amino acids. The preservation of this highly unusual sequence among five antigenically dissimilar Neisseria pili proteins implies a role for the amino-terminal structure in pilus function. The amino terminus may be directly or indirectly (through preservation of tertiary structure) important for the pilus function of facilitating attachment of bacteria to human cells.

16.
We have asked whether coding segments of nucleic acids generate amino acid sequences which have an antisense relationship to other amino acid sequences in the same chain (i.e. "internal antisense"), and if so, could the internal antisense content be related to the structure of the encoded protein? Computer searches were conducted with the coding sequences for 132 proteins. The result for each search of a specific sequence was compared to the mean result obtained from 1000 randomly assembled nucleic acid chains whose length and base composition were identical to those of the native sequences. The study was conducted in all three reading frames. The normal reading frame (frame one) was found to contain lower amounts of internal antisense than the randomly assembled chains, whereas the frame two results were much higher. The internal antisense content in frame three was not significantly different from that in the random chains. The amount of internal antisense in frames two and three was correlated with the GC content at the center position of the codons in that frame, but this correlation was absent in frame one. No correlation with chain length was found. Qualitatively similar results were obtained when the random model was limited to retain the same purine/pyrimidine ratio as the native chains at each position in the codons, but in this case the internal antisense in frame three was also significantly greater than in the computer-generated sequences. The results suggest that the internal antisense content in the correct reading frame has a qualitatively different origin from that in the other two frames. The high amount in frames two and three is apparently an artifact resulting from the asymmetric distribution of G and C in the codons, while the low amount in frame one may suggest evolutionary selection against internal antisense. Thus, the results do not support a relationship between internal antisense and protein structure.
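The random model and the codon-center GC measure described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the abstract does not define how internal antisense was scored, so the `statistic` argument is a hypothetical placeholder, and the convention that frame n starts at offset n-1 is also assumed.

```python
import random


def shuffled_chain(seq: str, rng: random.Random) -> str:
    """Randomly reorder the bases; length and base composition are preserved."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def center_gc_fraction(seq: str, frame: int) -> float:
    """Fraction of G or C at the middle codon position.

    Assumption: frame 1, 2 or 3 starts reading at offset 0, 1 or 2.
    """
    start = frame - 1
    centers = [seq[i + 1] for i in range(start, len(seq) - 2, 3)]
    if not centers:
        return 0.0
    return sum(base in "GC" for base in centers) / len(centers)


def null_mean(seq: str, statistic, n: int = 1000, seed: int = 0) -> float:
    """Mean of `statistic` over n shuffled chains (a stand-in for the random model)."""
    rng = random.Random(seed)
    return sum(statistic(shuffled_chain(seq, rng)) for _ in range(n)) / n


coding = "ATGGCCAAATGA"  # hypothetical toy coding sequence
print(center_gc_fraction(coding, 1))  # 0.5
print(null_mean(coding, lambda s: center_gc_fraction(s, 1)))
```

A real comparison would pass an internal-antisense score as `statistic` and contrast the native value with the null mean for each frame.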

17.
Cellulomonas fimi produces an endoglucanase and an exoglucanase which bind strongly to cellulose. Each enzyme contains three distinct regions: a short sequence of about 20 amino acids containing only proline and threonine (the Pro-Thr box); an irregular region, rich in hydroxyamino acids, of low charge density, and which is predicted to have little secondary structure; and an ordered region of higher charge density which contains a potential active site, and which is predicted to have secondary structure. The Pro-Thr box is conserved almost perfectly in the two enzymes. The irregular regions are 50% conserved, and the conserved sequences include four Asn-Xaa-Ser/Thr sites. The ordered regions appear not to be conserved, but the potential active sites both have the sequence Glu-Xaa7-Asn-Xaa6-Thr; they occur at widely separated sites in the two regions. The order of the regions is reversed in the two enzymes: irregular-Pro-Thr box-ordered in the endoglucanase; ordered-Pro-Thr box-irregular in the exoglucanase. The genes for the two enzymes appear to have arisen by shuffling of two conserved sequences and either one or two other sequences.
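The two sequence patterns named in this abstract, Glu-Xaa7-Asn-Xaa6-Thr and Asn-Xaa-Ser/Thr, translate directly into regular expressions over one-letter amino acid codes. The sketch below is only an illustration: the function name and the toy peptide are hypothetical, and the abstract does not say how the authors located the sites.

```python
import re

# Lookaheads are used so that overlapping occurrences are all reported.
ACTIVE_SITE = re.compile(r"(?=E.{7}N.{6}T)")  # Glu-Xaa7-Asn-Xaa6-Thr
ASN_XAA_SER_THR = re.compile(r"(?=N.[ST])")   # Asn-Xaa-Ser/Thr


def find_sites(pattern: re.Pattern, protein: str) -> list[int]:
    """Return 1-based start positions of every match of pattern in protein."""
    return [m.start() + 1 for m in pattern.finditer(protein)]


# Hypothetical toy peptide built piece by piece (not a C. fimi sequence).
toy = "MK" + "E" + "A" * 7 + "N" + "G" * 6 + "T" + "NQS"

print(find_sites(ACTIVE_SITE, toy))      # [3]
print(find_sites(ASN_XAA_SER_THR, toy))  # [19]
```

Here `.` stands for any residue (Xaa); a stricter pattern could restrict it to the 20 standard one-letter codes.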

18.
We describe a novel chromosome structure in which telomeric sequences are present interstitially, at the apparent breakpoint junctions of structurally abnormal chromosomes. In the linear chromosomes with interstitial telomeric sequences, there were three sites of hybridization of the telomere consensus sequence within each derived chromosome: one at each terminus and one at the breakpoint junction. Telomeric sequences also were observed within a ring chromosome. The rearrangements examined were constitutional chromosome abnormalities with a breakpoint assigned to a terminal band. In each case (with the exception of the ring chromosome), an acentric segment of one chromosome was joined to the terminus of an apparently intact recipient chromosome. One case exhibited apparent instability of the chromosome rearrangement, resulting in somatic mosaicism. The rearrangements described here differ from the telomeric associations observed in certain tumors, which appear to represent end-to-end fusion of two or more intact chromosomes. The observed interstitial telomeric sequences appear to represent nonfunctional chromosomal elements, analogous to the inactivated centromeres observed in dicentric chromosomes.

19.
RNase MRP is a ribonucleoprotein endoribonuclease involved in eukaryotic pre-rRNA processing. The enzyme possesses an RNA subunit, structurally related to that of RNase P RNA, that is thought to be catalytic. RNase MRP RNA sequences from Saccharomycetaceae species are structurally well defined through detailed phylogenetic and structural analysis. In contrast, higher eukaryote MRP RNA structure models are based on comparative sequence analysis of only five sequences and limited probing data. Detailed structural analysis of the Homo sapiens MRP RNA, entailing enzymatic and chemical probing, is reported. The data are consistent with the phylogenetic secondary structure model and demonstrate unequivocally that higher eukaryote MRP RNA structure differs significantly from that reported for Saccharomycetaceae species. Neither model can account for all of the known MRP RNAs and we thus propose the evolution of at least two subsets of RNase MRP secondary structure, differing predominantly in the predicted specificity domain.

20.
In sheep's fescue, Festuca ovina, genes coding for the cytosolic enzyme phosphoglucose isomerase, PGIC, are not only found at the standard locus, PgiC1, but also at a segregating second locus, PgiC2. We have used PCR-based sequencing to characterize the molecular structure and evolution of five PgiC1 and three PgiC2 alleles in F. ovina. The three PgiC2 alleles were complex in that they carried two gene copies: either two active genes or one active and one pseudogene. All the PgiC2 sequences were very similar to each other but highly diverged from the five PgiC1 sequences. We also sequenced PgiC genes from several other grass species. Phylogenetic analysis of these sequences indicates that PgiC2 has introgressed into F. ovina from the distant genus Poa. Such an introgression may, for example, follow from a non-standard fertilization with more than one pollen grain, or a direct horizontal gene transfer mediated by a plant virus.
