Similar articles
1.
We consider the basic function which locates a specific string of symbols within a longer sequence. When one is expecting to do many substring searches it is worthwhile to build an auxiliary index to the sequence to aid in the search. We propose a method to generate a compact index that can be viewed as a small (partial) deterministic finite automaton recognizing the subword structure of a sequence. We present an algorithm for its construction on-line in linear time. Such a data structure permits the efficient localization of subwords in a sequence and can be used in the development of interactive sequence analysis software.
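
The abstract does not name the index it builds. As a rough illustration only, the sketch below uses a suffix automaton, a well-known partial deterministic automaton that accepts exactly the substrings of a text and can be built on-line, one symbol at a time, in linear time for a fixed alphabet. The class name `SuffixAutomaton`, the helper `build_index`, and the `find` method are hypothetical names chosen for this example, not the authors' implementation.

```python
class SuffixAutomaton:
    """Partial DFA that accepts exactly the substrings of the indexed text."""

    def __init__(self):
        self.next = [{}]        # outgoing transitions of each state
        self.link = [-1]        # suffix link of each state
        self.length = [0]       # length of the longest string in each state
        self.first_end = [-1]   # end index of the first occurrence
        self.last = 0           # state that represents the whole text so far

    def extend(self, ch):
        """Append one symbol to the indexed text (the on-line step)."""
        cur = len(self.next)
        self.next.append({})
        self.link.append(-1)
        self.length.append(self.length[self.last] + 1)
        self.first_end.append(self.length[cur] - 1)
        p = self.last
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split state q: the clone keeps only the shorter strings.
                clone = len(self.next)
                self.next.append(dict(self.next[q]))
                self.link.append(self.link[q])
                self.length.append(self.length[p] + 1)
                self.first_end.append(self.first_end[q])
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def find(self, pattern):
        """Return the start index of the first occurrence of pattern, or -1."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return -1
            state = self.next[state][ch]
        return self.first_end[state] - len(pattern) + 1


def build_index(text):
    """Build the automaton by feeding the text one symbol at a time."""
    automaton = SuffixAutomaton()
    for ch in text:
        automaton.extend(ch)
    return automaton
```

Once the index is built, each query costs time proportional to the pattern length rather than the sequence length, which is the trade-off the abstract describes for workloads with many substring searches.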

2.
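Traditional DNA sequence visualization models are limited to short DNA sequences and lack general methods for analyzing the resulting graphics. This article therefore proposes an image-based DNA sequence visualization model that converts a one-dimensional DNA sequence into a two-dimensional 256-level grayscale image, which makes long DNA sequences visualizable with high spatial compactness. Analyzing the DNA visualization images with mature image-processing methods yields important information about the original sequence, such as its size, the distribution of the four bases, and its degree of disorder. Comparing the visualization images of different DNA sequences yields information about the similarity of those sequences.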
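
The abstract does not say how bases are mapped to gray levels. One plausible encoding, shown here purely as an assumption, packs four bases into each pixel at two bits per base, which gives exactly 256 gray levels. The mapping `BASE_CODE`, the function name `dna_to_gray_image`, and the padding rules are illustrative choices, not the article's method.

```python
# Assumed 2-bit code per base; the article's actual mapping is not given.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def dna_to_gray_image(seq, width):
    """Pack 4 bases per pixel (values 0..255) and wrap pixels into rows."""
    if width <= 0:
        raise ValueError("width must be positive")
    seq = seq.upper()
    pixels = []
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].ljust(4, "A")  # pad a short tail with A (code 0)
        value = 0
        for base in chunk:
            value = value * 4 + BASE_CODE[base]  # KeyError on bases such as N
        pixels.append(value)
    while len(pixels) % width:
        pixels.append(0)  # pad the final row with black pixels
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]
```

The result is a list of rows that a standard image library could turn into a grayscale image. Under this encoding one pixel holds four bases, so a sequence of a million bases fits in a 500 × 500 image.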

3.
We present a correction of the previously reported nucleotide sequence of the Citrobacter freundii trp operon regulatory region. The original sequence analyses were performed with a plasmid designated pCF2. We repeated the cloning of the trp regulatory region of C. freundii and concluded from the determined sequence that a DNA rearrangement had occurred within the leader region of the cloned trp DNA of pCF2. The correct sequence is homologous to the Escherichia coli sequence.

4.
Sano K  Maeda K  Oki M  Maéda Y 《FEBS letters》2002,532(1-2):143-146
We describe a cis element that dramatically increases the expression levels of exogenous genes in baculovirus-infected insect cells. This 21 bp sequence element is derived from a 5' untranslated leader sequence of a lobster tropomyosin cDNA (L21). By using a transfer vector carrying L21, the expression levels of tropomyosin and luciferase were 20- and seven-fold higher with L21 than without L21, respectively. L21 has both the Kozak sequence and the A-rich sequence found in the polyhedrin leader sequence. We assume that both sequence elements are essential for the enhancement of protein expression in the baculovirus-based expression system.

5.
6.
7.
RNA sequence analysis using covariance models.   Cited by: 43 (self-citations: 8, citations by others: 35)
We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences.

8.
MOTIVATION: We describe APDB, a novel measure for evaluating the quality of a protein sequence alignment, given two or more PDB structures. This evaluation does not require a reference alignment or a structure superposition. APDB is designed to efficiently and objectively benchmark multiple sequence alignment methods. RESULTS: Using existing collections of reference multiple sequence alignments and existing alignment methods, we show that APDB gives results that are consistent with those obtained using conventional evaluations. We also show that APDB is suitable for evaluating sequence alignments that are structurally equivalent. We conclude that APDB provides an alternative to more conventional methods used for benchmarking sequence alignment packages.

9.
Nicholas HB  Ropelewski AJ  Deerfield DW 《BioTechniques》2002,32(3):572-4, 576, 578 passim
We present an overview of multiple sequence alignments to outline the practical consequences for the choices among different techniques and parameters. We begin with a discussion of the scoring methods for quantifying the quality of a multiple sequence alignment, followed by a discussion of the algorithms implemented within a variety of multiple sequence alignment programs. We also discuss additional alignment details such as gap penalty and distance metrics. The paper concludes with a discussion on how to improve alignment quality and the limitations of the techniques described in this paper.

10.
Compared to their eukaryotic counterparts, bacterial genomes are small and contain extremely tightly packed genes. Repetitive sequences are rare but not completely absent. One of the most common repeat families is REPINs. REPINs can replicate in the host genome and form populations that persist for millions of years. Here, we model the interactions of these intragenomic sequence populations with the bacterial host. We first confirm well-established results: in the presence and absence of horizontal gene transfer (hgt), sequence populations either expand until they drive the host to extinction or are purged from the genome. We then show that a sequence population can be stably maintained when each individual sequence provides a benefit that decreases with increasing sequence population size. Maintaining a sequence population of stable size also requires the replication of the sequence population to be costly to the host; otherwise the sequence population size will increase indefinitely. Surprisingly, in regimes with high hgt rates, the benefit conferred by the sequence population does not have to exceed the damage it causes to its host. Our analyses provide a plausible scenario for the persistence of sequence populations in bacterial genomes. We also hypothesize a limited biologically relevant parameter range for the provided benefit, which can be tested in future experiments.

11.
A unique repetitive DNA sequence in the Myxococcus xanthus genome.   Cited by: 7 (self-citations: 2, citations by others: 5)
We found a novel type of repetitive DNA sequence in the Myxococcus xanthus genome. The first repetitive sequence is located in the spacer region between the ops and tps genes. We cloned five other repetitive sequences using the first repetitive sequence as a probe and determined their nucleotide sequences. Comparison of these sequences revealed that the repetitive sequences consist of an 87-bp core sequence and that some clones share additional homology in their flanking regions.

12.
J R Smiley  C Lavery    M Howes 《Journal of virology》1992,66(12):7505-7510
We inserted the terminal repeat (a sequence) of herpes simplex virus type 1 (HSV-1) strain KOS into the tk gene of HSV-2 strain HG52 in order to assess the ability of the HSV-1 a sequence to provoke genome isomerization events in an HSV-2 background. We found that the HSV-1 a sequence was cleaved by the HSV-2 cleavage/packaging machinery to give rise to novel genomic termini. However, the HSV-1 a sequence did not detectably recombine with the HSV-2 a sequence. These results demonstrate that the viral DNA cleavage/packaging system contributes to a subset of genome isomerization events and indicate that the additional recombinational inversion events that occur during infection require sequence homology between the recombination partners.

13.
BC Faircloth  TC Glenn 《PloS one》2012,7(8):e42543
Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (maxcount = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags to polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
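
The kind of validation described above can be illustrated with a small check that every pair of tags in a set is at least a chosen distance apart. This sketch assumes the "edit metric" corresponds to Levenshtein distance (substitutions, insertions, and deletions each cost one). The function names are hypothetical and do not reproduce the interface of the authors' software package.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion from a
                           cur[j - 1] + 1,              # insertion into a
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def min_pairwise_distance(tags):
    """Smallest edit distance over all pairs of tags in the set."""
    return min(edit_distance(a, b) for a, b in combinations(tags, 2))


def violating_pairs(tags, min_distance):
    """Pairs of tags that are closer than the required minimum distance."""
    return [(a, b) for a, b in combinations(tags, 2)
            if edit_distance(a, b) < min_distance]
```

A tag set whose minimum pairwise distance is d can detect up to d - 1 errors per tag and, by the usual coding-theory argument, correct up to (d - 1) // 2 of them. This all-pairs check is quadratic in the number of tags, which is acceptable for sets of a few thousand short tags.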

14.
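A pair of primers was designed and synthesized based on the nucleotide sequence of the follistatin gene of the pathogenic Okayama strain of Haemaphysalis longicornis in GenBank (GenBank Accession No. DQ248886). Total RNA was rapidly extracted from unfed adult ticks of a clonal, clean H. longicornis colony maintained in our laboratory, and an 814 bp follistatin gene was amplified by RT-PCR. Sequence alignment showed 97.8% nucleotide and 99% amino acid sequence identity with the pathogenic Okayama strain. The gene was subcloned into the expression vector pGEX-4T-1 and expressed; the expected molecular weight of the GST fusion recombinant protein was 57 kD. The expressed recombinant protein was purified with the MagneGST™ protein purification system and used as the antigen in immunoblots, with polyclonal antibodies raised against different developmental stages of H. longicornis (egg, larva, nymph, adult) as the primary antibodies. The protein reacted strongly with the polyclonal antibody raised against H. longicornis eggs, but only weakly with the polyclonal antibodies raised against unfed ticks of the other developmental stages (larva, nymph, adult). These results indicate that the follistatin protein of H. longicornis is expressed at a higher level during oviposition and egg maturation than in the other developmental stages (larva, nymph, adult).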

15.
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of a homologous structure.

16.
We propose a feature vector approach to characterize the variation in large data sets of biological sequences. Each candidate sequence produces a single feature vector constructed with the number and location of amino acids or nucleic acids in the sequence. The feature vector characterizes the distance between the actual sequence and a model of a theoretical sequence based on the binomial and uniform distributions. This method is distinctive in that it does not rely on sequence alignment for determining protein relatedness, allowing the user to visualize the relationships within a set of proteins without making a priori assumptions about those proteins. We apply our method to two large families of proteins: protein kinase C, and globins, including hemoglobins and myoglobins. We interpret the high-dimensional feature vectors using principal components analysis and agglomerative hierarchical clustering. We find that the feature vector retains much of the information about the original sequence. By using principal component analysis to extract information from collections of feature vectors, we are able to quickly identify the nature of variation in a collection of proteins. Where collections are phylogenetically or functionally related, this is easily detected. Hierarchical agglomerative clustering provides a means of constructing cladograms from the feature vector output.
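
The abstract gives no formula for the feature vector, so the following is only a loose sketch of the general idea of an alignment-free vector built from the number and location of each symbol. It records, for each symbol, its relative frequency and its mean normalized position, where 0.5 is the value expected if occurrences were spread uniformly along the sequence. The function name `feature_vector` and both components are assumptions for illustration; the authors' binomial and uniform models are not reproduced here.

```python
def feature_vector(seq, alphabet="ACGT"):
    """Return [freq, mean_position] for each symbol of the alphabet.

    freq is the fraction of the sequence made up of the symbol; mean_position
    is the average of (index + 0.5) / length over its occurrences, or 0.5
    (the uniform expectation) when the symbol does not occur.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    vector = []
    for symbol in alphabet:
        positions = [i for i, ch in enumerate(seq) if ch == symbol]
        count = len(positions)
        if count:
            mean_position = (sum(positions) / count + 0.5) / n
        else:
            mean_position = 0.5
        vector.extend([count / n, mean_position])
    return vector
```

Because every sequence maps to a vector of the same length regardless of its own length, the vectors can be fed directly to principal component analysis or hierarchical clustering without aligning the sequences first.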

17.
Biotinylation of fusion proteins in E. coli was studied using a sequence of Propionibacterium freudenreichii transcarboxylase 1.3S biotin subunit. As the biotinylation sequence, we examined two sequences: one was of amino acid residues [84-123] of 1.3S, a partial sequence containing a region from a conserved tetrapeptide (Ala-Met-Bct-Met) around the biotinyl lysine (Bct) to the carboxyl terminal; the other was of an almost entire sequence [18-123]. We constructed recombinant plasmids for fusion proteins of beta-galactosidase, of chloramphenicol acetyltransferase, and of alkaline phosphatase. We found the biotinylation in the [18-123] sequence fused to alkaline phosphatase.

18.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.

19.
We have cloned a 12 kb DNA segment containing the human mu gene and its flanking sequence from a human fetal liver DNA library using the mouse mu gene as a probe. Partial nucleotide sequence determination shows that the cloned DNA contains the sequence encoding the human mu chain. This is the first constant region gene of the human heavy chain to be cloned. We have compared human and mouse mu genes by heteroduplex analysis and Southern blot hybridization. The results clearly show that not only the sequence encoding the CH4 domain but also the 5'-flanking (S mu) sequence is conserved between human and mouse mu genes, suggesting that the nucleotide sequence in the S mu region has an important biological function, presumably a recognition signal for the class switch recombinant as proposed previously.

20.
Comparative embryology of closely related species can shed light on the evolution of developmental processes. An important mechanism in the evolution of developmental processes, which can lead to significant changes in larval or adult form, is variation in the sequence and timing of developmental events. We compared the development of 12 species of anurans, including a wide taxonomic range as well as a number of congeneric species. The comparison consisted of monitoring a series of external morphological markers and histological markers. For each species we noted the timing of each of the markers, using a uniform parameter of normalized time. We compared the normalized time of each of these events among the species, as well as the sequence of the events. Our analysis revealed many differences in sequence and in timing of developmental events. We mapped these differences on a cladogram of the studied species, using sequence units as discrete characters. The differences do not seem to be connected to the phylogenetic relations between the species or to any obvious ecological factors. We suggest a hypothetical ancestral sequence of developmental events, and discuss the possible factors that could have caused the observed variations from the ancestral sequence.
