Similar documents
1.
D J Hayzer, E Brinson, M S Runge. Gene, 1992, 117(2): 277-278.
Polymerase chain reaction amplification of a cDNA derived from rat aortic smooth muscle cells, using sequences from conserved regions of the intramembrane domains of adrenergic receptors as primers, yielded the clone rat8. This clone has a high degree of sequence similarity to a series of human interferon (IFN)-inducible genes. The rat8 sequence is 70% similar to that of the human alpha-IFN-induced gene 9-27, and the deduced amino acid sequences encoded by the rat and human genes are 66% similar. The rat homologue hybridizes with many bands in Southern analysis of rat DNA, suggesting that it is a member of a large multigene family.

2.
The suggestion that the ethanol regulatory protein from Aspergillus has its evolutionary origin in a gene fusion between aldehyde and alcohol dehydrogenase genes (Hawkins AR, Lamb HK, Radford A, Moore JD, 1994, Gene 146:145-158) has been tested by profile analysis with aldehyde and alcohol dehydrogenase family profiles. We show that the degree and kind of similarity observed between these profiles and the ethanol regulatory protein sequence is that expected from random sequences of the same composition. This level of similarity fails to support the suggested gene fusion.

3.
Multifunctional yeast high-copy-number shuttle vectors.

4.
MOTIVATION: The global alignment of protein sequence pairs is often used in the classification and analysis of full-length sequences. The calculation of a Z-score for the comparison gives a length- and composition-corrected measure of the similarity between the sequences. However, the Z-score alone does not indicate the likely biological significance of the similarity. In this paper, all pairs of domains from 250 sequences belonging to different SCOP folds were aligned and Z-scores calculated. The distribution of Z-scores was fitted with a peak distribution, from which the probability of obtaining a given Z-score from the global alignment of two protein sequences of unrelated fold was calculated. A similar analysis was applied to subsequence pairs found by the Smith-Waterman algorithm. These analyses allow the probability that two protein sequences share the same fold to be estimated by global sequence alignment. RESULTS: The relationship between Z-score and probability varied little over the matrix/gap penalty combinations examined. However, an average shift of +4.7 was observed for Z-scores derived from global alignment of locally aligned subsequences compared to global alignment of the full-length sequences. This shift was shown to be the result of pre-selection by local alignment rather than of any structural similarity in the subsequences. Benchmarking the search ability of both methods against the SCOP superfamily classification showed that global alignment Z-scores generated from the entire sequence are as effective as SSEARCH at low error rates and more effective at higher error rates. However, global alignment Z-scores generated from the best locally aligned subsequence were significantly less effective than SSEARCH. The method of estimating statistical significance described here was shown to give values similar to those of SSEARCH and BLAST, providing confidence in the significance estimation.
AVAILABILITY: Software to apply the statistics to global alignments is available from http://barton.ebi.ac.uk. CONTACT: geoff@ebi.ac.uk
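To make the Z-score idea above concrete, here is a minimal Python sketch; it is not the software distributed at barton.ebi.ac.uk. It assumes the common shuffle-based definition, Z = (S - mean of shuffled scores) / standard deviation of shuffled scores, where the second sequence is shuffled so that its length and composition are preserved. For brevity it assumes toy match/mismatch scores and a linear gap penalty instead of a substitution matrix and affine gaps; the names `nw_score` and `z_score` are hypothetical.

```python
import random
import statistics

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty (toy scoring, an assumption)."""
    prev = [j * gap for j in range(len(b) + 1)]  # DP row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

def z_score(a, b, n_shuffles=100, seed=0, **scoring):
    """Shuffle-based Z-score: (real score - mean shuffled score) / sd of shuffled scores.

    Shuffling b keeps its length and residue composition, so the Z-score
    corrects the raw alignment score for both.
    """
    rng = random.Random(seed)
    real = nw_score(a, b, **scoring)
    letters = list(b)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled.append(nw_score(a, "".join(letters), **scoring))
    mean = statistics.mean(shuffled)
    sd = statistics.stdev(shuffled)  # requires n_shuffles >= 2
    return (real - mean) / sd if sd > 0 else float("inf")
```

Because every shuffle keeps the residue composition of `b`, a high Z reflects residue order rather than composition alone. A realistic version would swap in a substitution matrix such as BLOSUM62 and affine gap penalties.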

5.
E M Amrhein. Cryobiology, 1975, 12(4): 340-352.
This paper gives a short introduction to the modes of crystallization of polymers and shows, by a series of micrographs, that the resulting morphology is very similar to that of ice crystals growing from aqueous solutions. The similarity is explained by similar conditions of nucleation and growth, leading in both cases to a hindered and partial crystallization. It is shown that the resulting crystal patterns can be qualitatively explained by estimating the relative growth rates in different directions on a developing crystal face as a function of supercooling and concentration.

6.
Hou Y, Hsu W, Lee ML, Bystroff C. Proteins, 2004, 57(3): 518-530.
Remote homology detection refers to the detection of structural homology in proteins when there is little or no sequence similarity. In this article, we present a remote homolog detection method called SVM-HMMSTR that overcomes the reliance on detectable sequence similarity by transforming the sequences into strings of hidden Markov states that represent local folding motif patterns. These state strings are transformed into fixed-dimension feature vectors for input to a support vector machine. Two sets of features are defined: an order-independent feature set that captures the amino acid and local structure composition; and an order-dependent feature set that captures the sequential ordering of the local structures. Tests using the Structural Classification of Proteins (SCOP) 1.53 data set show that the SVM-HMMSTR gives a significant improvement over several current methods.

7.
8.
We consider constructing 4^L-component vectors for a DNA primary sequence based on its L-tuples. For two DNA sequences, using the corresponding vectors, we construct a set of L × L matrices called related matrices. Mathematical characterizations of the constructed matrices are then used to quantify the degree of similarity between the two DNA sequences. A search for sequences similar to a query sequence in a database of 39 library sequences, and the construction of a phylogenetic tree of H5N1 avian influenza virus, illustrate the utility of the matrices for DNA sequences.
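The first step described above, turning a DNA sequence into an L-tuple vector, can be sketched in Python. This assumes that the vector holds one count per possible L-tuple over ACGT (4^L components), that L-tuples overlap, and that they are ordered lexicographically; the "related matrix" and its characterizations are not specified in the abstract and are not reproduced here. The function name `ltuple_vector` is hypothetical.

```python
from itertools import product

def ltuple_vector(seq, L):
    """4**L-component vector of overlapping L-tuple counts, words ordered AA..A to TT..T (assumed layout)."""
    # Map every possible L-tuple over ACGT to its position in the vector.
    index = {"".join(word): i for i, word in enumerate(product("ACGT", repeat=L))}
    vec = [0] * len(index)
    for i in range(len(seq) - L + 1):
        word = seq[i:i + L].upper()
        if word in index:  # skip words containing N or other ambiguity codes
            vec[index[word]] += 1
    return vec
```

For example, `ltuple_vector("ACGT", 1)` gives one count for each of A, C, G and T.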

9.
Scaling, as defined here, refers to the precise identification of those structural and functional aspects of selected systems that are size-independent over some specified size range. Small and large instances of such systems are said to be similar with respect to those aspects. Physicists and engineers have developed an elaborate methodology for identifying quantitative similarity criteria applicable to physical systems. These criteria are usually derived by dimensional analysis of the physical laws pertinent to a given system. Knowledge of similarity criteria allows one to predict quantitatively the behavior of a large-scale prototype from measurements made on a small-scale model (e.g., in a wind tunnel). Numerous workers have sought to apply this elegant methodology to scale-up in biology. After briefly reviewing dimensional analysis, scaling, and modeling as deployed in physics and engineering, this article discusses several well-known examples of their application to bioscaling problems (chiefly in mammals) and gives reasons for doubting their validity. It concludes that this methodology is unlikely to provide explanations applicable to scale-up in diverse species.

10.
Koike R, Kinoshita K, Kidera A. Proteins, 2007, 66(3): 655-663.
Dynamic programming (DP) and its heuristic algorithms are the most fundamental methods for similarity searches of amino acid sequences. Their detection power has been improved by including supplemental information, such as homologous sequences in the profile method. Here, we describe a method, probabilistic alignment (PA), that gives improved detection power but, like the original DP, uses only a pair of amino acid sequences. Receiver operating characteristic (ROC) analysis demonstrated that the PA method is far superior to BLAST and that its sensitivity and selectivity approach those of PSI-BLAST. Particularly for orphan proteins with few homologues in the database, PA performs much better than PSI-BLAST. On this basis, we applied the PA method to a homology search for two orphan proteins, Latexin and the Resuscitation-promoting factor domain. Their molecular functions have been described on the basis of structural similarities, but no sequence homologues had been identified by PSI-BLAST. PA successfully detected sequence homologues for the two proteins and confirmed that the observed structural similarities result from an evolutionary relationship.

11.
Aita T, Husimi Y, Nishigaki K. Bio Systems, 2011, 106(2-3): 67-75.
To measure the similarity or dissimilarity between two given biological sequences, several papers have proposed metrics based on the "word-composition vector". The essence of these metrics is as follows. First, we count the appearance frequencies of all K-tuple words throughout each of the two given sequences. Then, the two sequences are transformed into their respective word-composition vectors. Next, a distance metric, for example the angle between the two vectors, is calculated. A significant issue is determining the optimal word size K. Using a mathematical model of mutational events (including substitutions, insertions, deletions and duplications) that occur in sequences, we analyzed how the angle between the composition vectors depends on the mutational events. We also considered the optimal word size (i.e., resolution) using our original approach. Our results were verified by computational experiments using artificially generated sequences, amino acid sequences of hemoglobin and nucleotide sequences of 16S ribosomal RNA.
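The three steps listed above (count K-tuple words, form word-composition vectors, compute the angle between them) can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: words overlap, raw counts are used instead of frequencies (the angle does not change when either vector is scaled), and the function names are hypothetical.

```python
import math
from collections import Counter

def word_composition(seq, k):
    """Sparse word-composition vector: counts of every overlapping k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def composition_angle(a, b, k):
    """Angle (radians) between the word-composition vectors of sequences a and b."""
    va, vb = word_composition(a, k), word_composition(b, k)
    if not va or not vb:
        raise ValueError("both sequences must be at least k letters long")
    # Counter returns 0 for words absent from vb, so only shared words contribute.
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    cos_theta = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))
```

Sequences with no words in common give an angle of pi/2. Roughly speaking, a larger K makes the vectors sparser and more sensitive to point mutations, while a smaller K discards more ordering information, which is the word-size trade-off the abstract refers to.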

12.
Sequence similarity tools, such as BLAST, seek the sequences in a database that are most similar to a query. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, in which the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools in this discipline. Some redundancy can be avoided by enforcing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases, which yields non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences returned by a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods achieve more diverse yet significant result sets than static non-redundancy approaches. In both sequence-based and functional diversity evaluations, the proposed diversification methods significantly outperform the original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST

13.
The human alpha-fetoprotein (AFP) gene was isolated into three overlapping clones in bacteriophage lambda vectors and its sequence organization analyzed by restriction endonuclease mapping and nucleotide sequencing. The human AFP gene is about 20 kilobase pairs long and contains 15 exons and 14 introns. The overall organization of the human AFP gene is similar to that of the mouse AFP gene, with all but two exons showing identical sizes. Nucleotide sequences at all exon/intron junctions display similarity to the consensus boundary sequence (Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383), with the GT-AG rule applied to the splicing point. The cap site maps 44 nucleotides upstream from the translation initiation site. The "TATA box" is located 27 nucleotides upstream from the putative cap site and is flanked by sequences with dyad symmetry. The TATA box can thus be placed in the loop portion of a possible stem-loop structure formed by intrastrand base-pairing. Other characteristic nucleotide sequences in the 5' flanking region include a CCAAC pentamer, a 14-base pair (bp) enhancer-like sequence, and a 9-bp sequence homologous to the glucocorticoid responsive element. A long (90 bp) direct repeat and several alternating purine/pyrimidine sequences are also present in the 5' flanking region. A 736-bp sequence of the 5' flanking region adjacent to the cap site of the human AFP gene shows a 61% similarity with the corresponding region of the mouse AFP gene. There are two Alu family sequences and two poly(dT-dG) repeats in the human AFP gene that show different distribution patterns from those in the mouse AFP gene.

14.
Cyanobacterial tRNA(Leu) (UAA) intron sequences from natural populations of Nostoc and other cyanobacteria were compared. Variation between the different introns was not randomly distributed but strongly restricted by the secondary and tertiary structure of the intron. Although all Nostoc sequences examined shared high similarity, differences were observed in one stem-loop. This stem-loop could be divided into two classes, both built up from two base-pairing heptanucleotide repeats. Size variation was primarily caused by different numbers of repeats, but some strains also contained additional sequences in this stem-loop that did not follow the heptanucleotide repeat motif. Several sequences showing similarity to these additional sequences were identified in the Nostoc punctiforme genome. Furthermore, the regions flanking these sequences contained the same, or similar, heptanucleotide repeats as those flanking the corresponding sequences in the intron. It is proposed that both slipped-strand mispairing during replication and homologous recombination among different loci in the genome are important processes causing variation between introns.

15.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures, provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity in 12 protein structure families. We define the structural similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences that are both stable in and specific to the structure of that protein. Using these measures of sequence and structural similarity, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
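The cRMS distance used above as the structural similarity measure is usually computed after optimal rigid-body superposition of the two coordinate sets. A minimal NumPy sketch using the Kabsch algorithm follows. It assumes the two structures are already in one-to-one residue correspondence (for example, matched C-alpha atoms as two N x 3 arrays); the function name `crms` is hypothetical and this is not the authors' code.

```python
import numpy as np

def crms(P, Q):
    """cRMS between two matched (N, 3) coordinate arrays after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape[1] != 3 or P.shape != Q.shape:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation: center both coordinate sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate every row of P onto Q, then take the root-mean-square deviation.
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

The reflection check matters: without it, a mirror-image structure could be "superimposed" with zero deviation, which is not a physically meaningful fit.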

16.
We have shown in a previous paper that tandem repeating sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the following hypothesis: most genomic sequences evolved through continual tandem repeat expansions with subsequent accumulation of changes. To estimate how much of the observed sequence has a repeat origin, we describe the adaptation of a text segmentation algorithm, based on dynamic programming, to mapping these ancient expansion events. The algorithm maximizes the segmentation cost, calculated as the similarity of the obtained fragments to the putative repeat sequence. In the first application of the algorithm to the segmentation of genomic sequences, a significant difference between natural sequences and the corresponding shuffled sequences is detected: the natural fragments are longer and more similar to the putative repeat sequences. As our analysis shows, coding sequences allow repeats only when the size of the repeated words is divisible by three, whereas in non-coding sequences all repeated word sizes are present. It was estimated that about 35.5% of the Escherichia coli K12 genome sequence can be detectably traced to original simple repeat ancestors. The results shed light on genomic sequence organization and strongly confirm the hypothesis about the crucial role of triplet expansions in gene origin and evolution.

17.
Deep-level diagnostic value of the rDNA-ITS region
The similarity of certain reported angiosperm rDNA internal transcribed spacer (ITS) region sequences to those of green algae prompted our analysis of the deep-level phylogenetic signal in the highly conserved but short 5.8S and hypervariable ITS2 sequences. We found that 5.8S sequences yield phylogenetic trees similar to, but less well supported than, those generated by a ca. 10-fold longer alignment of rDNA-18S sequences, and consistent with independent evidence. We attribute this result to our finding that, compared to 18S, the 5.8S has a higher proportion of sites free to vary and greater among-site substitution rate homogeneity. We also determined that our phylogenetic results are not likely to be affected by intramolecular compensatory mutation to maintain RNA secondary structure or by evident systematic biases in base composition. Despite historical homology, there appears to be no ITS2 primary sequence similarity shared ... groups, however, share sufficient similarity to cluster correctly on the basis of alignability. Our results indicate that ITS region sequences can diagnose organismal origins and phylogenetic relationships at many phylogenetic levels and provide a useful paradigm for molecular evolutionary study.

18.
Yoo SY, Bomblies K, Yoo SK, Yang JW, Choi MS, Lee JS, Weigel D, Ahn JH. Planta, 2005, 221(4): 523-530.
Positive selection of transgenic plants is essential during plant transformation, so strong promoters are often used to drive selectable marker genes and ensure successful selection. Many plant transformation vectors, including the pPZP family, use the 35S promoter as the regulatory sequence for their selectable marker genes. We found that the 35S promoter used in a selectable marker gene affected the expression pattern of a transgene, possibly leading to misinterpretation of results obtained from transgenic plants. It is likely that the 35S enhancer sequence in the 35S promoter is responsible for this interference, as in activation tagging screens. The altered expression largely disappeared in transgenic plants generated using vectors without 35S sequences in their T-DNA region. We therefore suggest that caution be used in selecting a plant transformation vector and in interpreting results obtained from transgenic approaches using vectors that carry 35S promoter sequences within their T-DNA regions.

19.
Although DNA-recognition sequences are among the most important characteristics of restriction enzymes and their corresponding methylases, determining the recognition sequence of a Type-I restriction enzyme is a complicated procedure. To facilitate this process, we previously developed plasmid R-M tests and the computer program RM search. To specifically identify Type-I isoschizomers, we engineered a pUC19-derived plasmid, pTypeI, which contains all of the 27 Type-I recognition sequences in a 248-bp DNA fragment. In addition, a series of 27 plasmids (designated 'reference plasmids'), each containing a unique Type-I recognition sequence, was constructed using pMECA, a derivative of pUC vectors. In this study, we tested these vectors on 108 clinical E. coli strains and found that 48 strains produced isoschizomers of Type-I enzymes. A detailed study of 26 strains using the 'reference plasmids' revealed that they produce seven different isoschizomers of the prototypes: EcoAI, EcoBI, EcoKI, Eco377I, Eco646I, Eco777I and Eco826I. One strain, EC1344, produces two Type-I enzymes (EcoKI and Eco377I).
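Type-I recognition sequences are bipartite (two specific half-sites separated by a fixed number of nonspecific bases) and often asymmetric, so locating them in a DNA sequence means scanning both strands. A minimal Python sketch follows; it is not the RM search program. The example pattern AAC(N6)GTGC is the one commonly given for EcoKI, and the function names are hypothetical.

```python
import re

# Complement table for DNA letters; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_bipartite_sites(seq, left, gap, right):
    """0-based top-strand start positions of left + N*gap + right on either strand."""
    seq = seq.upper()
    # Lookahead so that overlapping sites are all reported.
    pattern = re.compile(f"(?=({left}[ACGT]{{{gap}}}{right}))")
    site_len = len(left) + gap + len(right)
    hits = {m.start() for m in pattern.finditer(seq)}
    # A hit at position p of the reverse complement covers top-strand
    # positions len(seq) - p - site_len through len(seq) - p - 1.
    rc = reverse_complement(seq)
    hits |= {len(seq) - m.start() - site_len for m in pattern.finditer(rc)}
    return sorted(hits)
```

For example, `find_bipartite_sites("TTAACAAAAAAGTGCTT", "AAC", 6, "GTGC")` reports the site starting at position 2; the same call also finds the site when only its reverse complement appears in the input.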

20.
The entire genome of single-component geminiviruses such as maize streak virus (MSV) consists of a single-stranded circular DNA of ~2.7 kb. Although this size is sufficient to encode only three average-sized proteins, the virus is capable of causing severe disease in many monocots, with symptoms of chlorosis and stunting. We have identified viral gene functions essential for systemic spread and symptom development during MSV infection. Deletion and gene replacement mutants were created by site-directed mutagenesis and by insertion between flanking MSV or reporter gene sequences contained in Agrobacterium T-DNA-derived vectors. Following Agrobacterium-mediated inoculation of maize seedlings, the mutated MSV DNAs were excised from these binary vectors by homologous recombination within the flanking sequences. Our analyses show that the capsid gene of MSV, while not required for replication, is essential for systemic spread and subsequent disease development. The '+' strand open reading frame (ORF) located immediately upstream from the capsid ORF and predicted to encode a 10.9 kDa protein was also found to be dispensable for replication but essential for systemic spread. By this analysis, MSV sequences that support autonomous replication were localized to a 1.7 kb segment containing the two viral intergenic regions and two overlapping complementary '-' strand ORFs. Despite the inability of the gene replacement mutants to spread systemically, both inoculated and newly developed leaves displayed chlorotic patterns similar to the phenotype observed in certain developmental mutants of maize. The similarity of the MSV mutant phenotype to these developmental mutants is discussed.

