Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Li L, Ly M, Linhardt RJ. Molecular BioSystems 2012, 8(6):1613-1625
Proteoglycans (PGs) are among the most structurally complex biomacromolecules in nature. They are present in all animal cells and frequently exert their critical biological functions through interactions with protein ligands and receptors. PGs consist of a core protein to which one or multiple, heterogeneous, and polydisperse glycosaminoglycan (GAG) chains are attached. Proteins, including the protein core of PGs, are now routinely sequenced either directly using proteomics or indirectly using molecular biology through their encoding DNA. The sequencing of the GAG component of PGs poses a considerably more difficult challenge because of the relatively underdeveloped state of glycomics and because the control of their biosynthesis in the endoplasmic reticulum and the Golgi is poorly understood and not believed to be template driven. Recently, the GAG chain of the simplest PG has been suggested to have a defined sequence based on its top-down Fourier transform mass spectral sequencing. This review examines the advances made over the past decade in the sequencing of GAG chains and the challenges the field faces in sequencing complex PGs having critical biological functions in developmental biology and pathogenesis.

2.
3.
Motivation: The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses. Results: This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA. Availability: Software is available at http://malde.org/~ketil/qaa. Contact: ketil.malde{at}imr.no Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Limsoon Wong
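
One way to picture how quality values could enter a scoring scheme is the minimal Python sketch below. It is not the scheme from the article or its prototype: it assumes Phred-style quality values (error probability 10^(-Q/10)), a simple match/mismatch score, and that a miscalled base is equally likely to be any of the other three nucleotides. The function names and default scores are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred-style quality value to an error probability (assumed encoding)."""
    return 10 ** (-q / 10)


def quality_adjusted_score(called, q, ref, match=1.0, mismatch=-1.0):
    """Expected substitution score of a called base against a reference base.

    Assumption: with probability 1 - p the call is correct; with probability p
    the true base is one of the three other nucleotides, each equally likely.
    """
    p = phred_to_error(q)

    def s(x, y):
        return match if x == y else mismatch

    others = [c for c in "ACGT" if c != called]
    return (1 - p) * s(called, ref) + (p / 3) * sum(s(c, ref) for c in others)


if __name__ == "__main__":
    # A confident match scores close to the plain match score.
    print(quality_adjusted_score("A", 40, "A"))
    # A low-quality mismatch is penalized less than a confident one.
    print(quality_adjusted_score("A", 10, "C"), quality_adjusted_score("A", 40, "C"))
```

Under these assumptions the adjustment costs one extra weighted sum per cell, which is in the spirit of the small constant-factor penalty the abstract mentions.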

4.
We investigated protein sequence/structure correlation by constructing a space of protein sequences, based on methods developed previously for constructing a space of protein structures. The space is constructed by using a representation of the amino acids as vectors of 10 property factors that encode almost all of their physical properties. Each sequence is represented by a distribution of overlapping sequence fragments. A distance between any two sequences can be calculated. By attaching a weight to each factor, intersequence distances can be varied. We optimize the correlation between corresponding distances in the sequence and structure spaces. The optimal correlation between the sequence and structure spaces is significantly better than that which results from correlating randomly generated sequences, having the overall composition of the database, with the structure space. However, sets of randomly generated sequences, each of which approximates the composition of the real sequence it replaces, produce correlations with the structure space that are as good as that observed for the actual protein sequences. A connection is proposed with previous studies of the protein folding code. It is shown that the most important property factors for the correlation of the sequence and structure spaces are related to helix/bend preference, side chain bulk, and beta-structure preference.

5.
Publicly available cDNA sequence data of Citrullus lanatus were searched for simple sequence repeats (SSRs). Nineteen microsatellites were identified and primer pairs were designed to amplify those loci. Primers were evaluated for their ability to detect polymorphisms within a set of several watermelon varieties and local landraces, C. colocynthis, and interspecific hybrids. Eighteen polymorphic SSR loci were identified. These polymorphic loci can be used for varietal identification and other uses.
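
A search for simple sequence repeats of the kind described in entry 5 can be sketched with a regular expression. The abstract does not state the search criteria, so the motif lengths (2 to 6 bases) and the minimum of four tandem copies are assumptions, and `find_ssrs` is a hypothetical helper rather than the tool the authors used.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, copies) for each tandem repeat found in seq.

    Thresholds are illustrative assumptions, not values from the study.
    """
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-base motif followed by at least (min_repeats - 1) more copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are themselves repeats of a shorter motif, e.g. "ATAT".
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGC" + "AT" * 5 + "CCT" + "GAA" * 4 + "TC"
    print(find_ssrs(toy))
```

Primer design around each hit would then use the flanking sequence on either side of the reported start and length.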

6.
Contact-based sequence alignment   Total citations: 2 (self-citations: 1, citations by others: 1)
This paper introduces the novel method of contact-based protein sequence alignment, where structural information in the form of contact mutation probabilities is incorporated into an alignment routine using contact-mutation matrices (CAO: Contact Accepted mutatiOn). The contact-based alignment routine optimizes the score of matched contacts, which involves four (two per contact) instead of two residues per match in pairwise alignments. The first contact refers to a real side-chain contact in a template sequence with known structure, and the second contact is the equivalent putative contact of a homologous query sequence with unknown structure. An algorithm has been devised to perform a pairwise sequence alignment based on contact information. The contact scores were combined with PAM-type (Point Accepted Mutation) substitution scores after parameterization of gap penalties and score weights by means of a genetic algorithm. We show that owing to the structural information contained in the CAO matrices, significantly improved alignments of distantly related sequences can be obtained. This has allowed us to annotate eight putative Drosophila IGF sequences. Contact-based sequence alignment should therefore prove useful in comparative modelling and fold recognition.

7.
Multiple sequence alignments are an essential tool for protein structure and function prediction, phylogeny inference and other common tasks in sequence analysis. Recently developed systems have advanced the state of the art with respect to accuracy, ability to scale to thousands of proteins and flexibility in comparing proteins that do not share the same domain architecture. New multiple alignment benchmark databases include PREFAB, SABMARK, OXBENCH and IRMBASE. Although CLUSTALW is still the most popular alignment tool to date, recent methods offer significantly better alignment quality and, in some cases, reduced computational cost.

8.
Coding sequence evolution   Total citations: 8 (self-citations: 0, citations by others: 8)
Dramatic progress has been made in the past ten years in the development of statistical and experimental techniques for investigating features of molecular evolution. Applied to coding regions, these techniques have produced remarkable advances in our understanding of selection for codon usage but, ironically, have had little impact on our understanding of protein evolution. That may be about to change.

9.
A new approach to sequence comparison: normalized sequence alignment   Total citations: 3 (self-citations: 0, citations by others: 3)
The Smith-Waterman algorithm for local sequence alignment is one of the most important techniques in computational molecular biology. This ingenious dynamic programming approach was designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the existing notion of local similarity has a serious flaw: it does not discard poorly conserved intermediate segments. The Smith-Waterman algorithm finds the local alignment with maximal score but it is unable to find the local alignment with maximum degree of similarity (e.g. maximal percent of matches). Moreover, there is still no efficient algorithm that answers the following natural question: do two sequences share a (sufficiently long) fragment with more than 70% similarity? As a result, the local alignment sometimes produces a mosaic of well-conserved fragments artificially connected by poorly conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction as recently pointed out by Zhang et al. (Bioinformatics, 15, 1012-1019, 1999). In this paper we propose a new sequence comparison algorithm (normalized local alignment) that reports the regions with maximum degree of similarity. The algorithm is based on fractional programming and its running time is O(n^2 log n). In practice, normalized local alignment is only 3-5 times slower than the standard Smith-Waterman algorithm.
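
For reference, the standard Smith-Waterman recurrence that entry 9 takes as its baseline can be sketched as follows. This is the classical score-maximizing algorithm with a linear gap penalty, not the normalized variant proposed in the paper; the scoring parameters are arbitrary illustrative choices.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what lets an alignment start anywhere,
            # discarding poorly conserved initial segments.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Because the objective is the raw score, two strong fragments can absorb a weak stretch between them whenever the combined score still grows, which is the mosaic effect the abstract describes.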

10.
Multiple sequence alignments are very widely used in all areas of DNA and protein sequence analysis. The main methods that are still in use are based on 'progressive alignment' and date from the mid to late 1980s. Recently, some dramatic improvements have been made to the methodology with respect either to speed and capacity to deal with large numbers of sequences or to accuracy. There have also been some practical advances concerning how to combine three-dimensional structural information with primary sequences to give more accurate alignments, when structures are available.

11.
Homology-extended sequence alignment   Total citations: 5 (self-citations: 1, citations by others: 4)
We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.

12.
Plant Molecular Biology

13.
Protein sequence databases   Total citations: 2 (self-citations: 0, citations by others: 2)
A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium.

14.
This paper presents a dynamic programming algorithm for aligning two sequences when the alignment is constrained to lie between two arbitrary boundary lines in the dynamic programming matrix. For affine gap penalties, the algorithm requires only O(F) computation time and O(M+N) space, where F is the area of the feasible region and M and N are the sequence lengths. The result extends to concave gap penalties, with somewhat increased time and space bounds. K.-M. C. and W. M. were supported in part by grant R01 LM05110 from the National Library of Medicine. R. C. H. was supported by PHS grant R01 DK27635.
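
The idea of restricting the dynamic programming matrix to a feasible region can be illustrated with the simplified sketch below. It is not the algorithm from the paper: it assumes a linear (not affine) gap penalty, computes only the score, and describes the region by per-row column bounds `lo[i]` and `hi[i]`. The helper `diagonal_band` is a hypothetical convenience for the common special case of a band around the main diagonal. The sketch also allocates a full row per step, so it does not reproduce the O(F) time bound stated in the abstract.

```python
NEG = float("-inf")


def diagonal_band(m, n, w):
    """Column bounds for a band of half-width w around the main diagonal."""
    lo = [max(0, i - w) for i in range(m + 1)]
    hi = [min(n, i + w) for i in range(m + 1)]
    return lo, hi


def constrained_global_score(a, b, lo, hi, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells (i, j) with lo[i] <= j <= hi[i]."""
    m, n = len(a), len(b)
    prev = [NEG] * (n + 1)
    for i in range(m + 1):
        curr = [NEG] * (n + 1)
        for j in range(lo[i], hi[i] + 1):  # only feasible cells are evaluated
            if i == 0 and j == 0:
                curr[j] = 0
                continue
            best = NEG
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, prev[j - 1] + sub)
            if i > 0:
                best = max(best, prev[j] + gap)
            if j > 0:
                best = max(best, curr[j - 1] + gap)
            curr[j] = best
        prev = curr
    return prev[n]  # -inf if the end cell lies outside the feasible region


if __name__ == "__main__":
    lo, hi = diagonal_band(4, 3, 1)
    print(constrained_global_score("ACGT", "AGT", lo, hi))
```

Cells outside the region keep the value negative infinity, so any path that would leave the region is never selected.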

15.
Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence, by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. Thus, a protein sequence conservation profile can be plotted for every protein. The profile indicates where along the sequences the potential functional (conserved) sites are located. The corresponding oligopeptides belonging to the sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. Typical size of the modules (peak-to-peak distance) is 25-30 amino acid residues.
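
A conservation profile based on oligopeptide occurrence counts might be computed along the lines of the sketch below. The abstract does not give the oligopeptide length or any normalization, so the window size `k`, the raw-count scoring, and the toy database are all assumptions, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(database, k):
    """Count every overlapping k-residue oligopeptide in a list of sequences."""
    counts = Counter()
    for seq in database:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def conservation_profile(query, counts, k):
    """For each position in query, the database count of the k-mer starting there."""
    return [counts[query[i:i + k]] for i in range(len(query) - k + 1)]


if __name__ == "__main__":
    toy_db = ["MKTAYIAK", "GGKTAYQQ", "KTAYLL"]  # stand-in for a proteome database
    counts = kmer_counts(toy_db, k=4)
    print(conservation_profile("MKTAYI", counts, k=4))
```

Peaks in the resulting list would correspond to the frequently occurring, potentially functional segments the abstract refers to.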

16.
In this paper, we used correlation analysis to quantify the correlations between the hydrophobicity sequence and accessibility sequence of 26 alpha-helix bundle membrane proteins and 119 transmembrane helices. Statistical significances of these correlations were also assessed. A slightly positive correlation was found in the alpha-helix bundle membrane proteins due to the contribution of extra-membranous domains. No correlation was found in the transmembrane domains.

17.
Consensus sequences are widely used in molecular biology but they have many flaws. As a result, binding sites of proteins and other molecules are missed during studies of genetic sequences and important biological effects cannot be seen. Information theory provides a mathematically robust way to avoid consensus sequences. Instead of using consensus sequences, sequence conservation can be quantitatively presented in bits of information by using sequence logo graphics to represent the average of a set of sites, and sequence walker graphics to represent individual sites.
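
Measuring conservation in bits can be illustrated with a short sketch. For a set of aligned DNA sites, the information at each position is taken here as the maximum possible entropy (2 bits for four bases) minus the observed Shannon entropy of that column. This simplified version omits the small-sample correction that is normally applied when the number of sites is low, and `information_bits` is a hypothetical name.

```python
import math
from collections import Counter


def information_bits(sites, alphabet="ACGT"):
    """Per-position information content (bits) of equal-length aligned sites.

    Simplification: no small-sample correction is applied.
    """
    n = len(sites)
    max_bits = math.log2(len(alphabet))
    profile = []
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(max_bits - entropy)
    return profile


if __name__ == "__main__":
    # First column fully conserved, second column uniformly random.
    print(information_bits(["AA", "AC", "AG", "AT"]))
```

A consensus sequence would report only the most common base at each position, whereas the per-position bits retain how strongly each position is conserved.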

18.
Speed sequence     

19.
SUMMARY: The Kinase Sequence Database (KSD) located at http://kinase.ucsf.edu/ksd contains information on 290 protein kinase families derived by profile-based clustering of the non-redundant list of sequences obtained from a GenBank-wide search. Included in the database are a total of 5,041 protein kinases from over 100 organisms. Clustering into families is based on the extent of homology within the kinase catalytic domain (250-300 residues in length). Alignments of the families are viewed by interactive Excel-based sequence spreadsheets. In addition, KSD features evolutionary trees derived for each family and detailed information on each sequence as well as links to the corresponding GenBank entries. Sequence manipulation tools, such as evolutionary tree generation, novel sequence assignment, and statistical analysis, are also provided. AVAILABILITY: The kinase sequence database is a web-based service accessible at http://kinase.ucsf.edu/ksd CONTACT: buzko@cmp.ucsf.edu; shokat@cmp.ucsf.edu

20.
Multiple sequence alignment   Total citations: 13 (self-citations: 0, citations by others: 13)
A method has been developed for aligning segments of several sequences at once. The number of search steps depends only polynomially on the number of sequences, instead of exponentially, because most alignments are rejected without being evaluated explicitly. A data structure herein called the "heap" facilitates this process. For a set of n sequence segments, the overall similarity is taken to be the sum of all the constituent segment pair similarities, which are in turn sums of corresponding residue similarity scores from a table. The statistical models that test alignments for significance make it possible to group sequences objectively, even when most or all of the interrelationships are weak. These tests are very sensitive, while remaining quite conservative, and discourage the addition of "misfit" sequences to an existing set. The new techniques are applied to a set of five DNA-binding proteins, to a group of three enzymes that employ the coenzyme FAD, and to a control set. The alignment previously proposed for the DNA-binding proteins on the basis of structural comparisons and inspection of sequences is supported quite dramatically, and a highly significant alignment is found for the FAD-binding proteins.
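
The overall similarity defined in entry 20, a sum over all segment pairs of per-residue table scores, can be written down directly. The sketch covers only that scoring step, not the search procedure or the significance tests; the identity-style table built by `toy_table` is a hypothetical stand-in for the residue similarity table used in the paper.

```python
from itertools import combinations


def toy_table(alphabet, same=1, different=0):
    """Hypothetical residue similarity table: 1 for identical residues, 0 otherwise."""
    return {(x, y): (same if x == y else different) for x in alphabet for y in alphabet}


def segment_pair_similarity(s, t, table):
    """Sum of table scores over corresponding residues of two equal-length segments."""
    return sum(table[(x, y)] for x, y in zip(s, t))


def overall_similarity(segments, table):
    """Sum of pair similarities over all unordered pairs of segments."""
    return sum(segment_pair_similarity(s, t, table)
               for s, t in combinations(segments, 2))


if __name__ == "__main__":
    table = toy_table("ACDEG")
    print(overall_similarity(["ACD", "ACE", "GCD"], table))
```

For n segments this evaluates n(n-1)/2 pairs, so the cost of scoring one candidate alignment grows quadratically with the number of sequences.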
