Similar articles
1.
Bio3D is a family of R packages for the analysis of biomolecular sequence, structure, and dynamics. Major functionality includes biomolecular database searching and retrieval, sequence and structure conservation analysis, ensemble normal mode analysis, protein structure and correlation network analysis, principal component analysis, and related multivariate analysis methods. Here, we review recent package developments, including a new underlying segregation into separate packages for distinct analyses, and introduce a new method for structure analysis named ensemble difference distance matrix analysis (eDDM). The eDDM approach calculates and compares atomic distance matrices across large sets of homologous atomic structures to help identify the residue-wise determinants underlying specific functional processes. An eDDM workflow is detailed along with an example application to a large protein family. As a new member of the Bio3D family, the Bio3D-eddm package supports both experimental and theoretical simulation-generated structures, is integrated with other methods for dissecting sequence-structure-function relationships, and can be used in a highly automated and reproducible manner. Bio3D is distributed as an integrated set of platform-independent open source R packages available from: http://thegrantlab.org/bio3d/.
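The core idea behind a difference distance matrix can be illustrated with a small sketch. This is not the Bio3D-eddm API (which is an R package); it is a minimal Python illustration that assumes two structures with the same number of equivalent atoms, given as hypothetical (x, y, z) coordinates. It compares only one pair of structures, whereas the method described above works across large ensembles.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between atoms given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(coords_a, coords_b):
    """Element-wise difference between the distance matrices of two structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of equivalent atoms")
    dm_a = distance_matrix(coords_a)
    dm_b = distance_matrix(coords_b)
    n = len(coords_a)
    return [[dm_b[i][j] - dm_a[i][j] for j in range(n)] for i in range(n)]


# Hypothetical coordinates for three equivalent atoms in two states (illustrative only).
state_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
state_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 8.0, 0.0)]

ddm = difference_distance_matrix(state_a, state_b)
for row in ddm:
    print(["%.2f" % value for value in row])
```

Large entries in the resulting matrix mark atom pairs whose separation changes between the two states. Because only internal distances are compared, no prior superposition of the structures is needed.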

2.
Sequence divergence among orthologous proteins was characterized with 34 amino acid replacement matrices, sequence context analysis, and a phylogenetic tree. The model was trained on very large datasets of aligned protein sequences drawn from 15 organisms including protists, plants, Dictyostelium, fungi, and animals. Comparative tests with models currently used in phylogeny, i.e., with JTT+Γ±F and WAG+Γ±F, made on a test dataset of 380 multiple alignments containing protein sequences from all five of the major taxonomic groups mentioned, indicate that our model should be preferred over the JTT+Γ±F and WAG+Γ±F models on datasets similar to the test dataset. The strong performance of our model of orthologous protein sequence divergence can be attributed to its ability to better approximate amino acid equilibrium frequencies to compositions found in alignment columns. Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Dr. Martin Kreitman]

3.
This paper presents the first report on the structure of a 14-kb centromere sequence in a cereal genome that includes 1.9-kb direct repeats. The cereal centromeric sequence (CCS1) conserved in some Gramineae species contains a 17-bp motif similar to the CENP-B box, which serves as the binding site for the centromere-specific protein CENP-B in humans. To isolate centromeric units from rice (Oryza sativa L.), we performed PCR using the CENP-B box-like sequences (CBLS) as primers. A 264-bp clone was amplified by this method, and called RCS1516. It appeared to be a novel member of the CCS1 family, sharing about 60% identity with the CCS1 sequences of other cereals. Then, a 14-kb genomic clone, λRCB11, carrying the RCS1516 sequence was isolated and sequenced. It was found to contain three copies of a 1.9-kb direct repeat, RCE1, separated by 5.1 and 1.7 kb. A 300-bp sequence at the 3′ end of RCE1 is highly conserved in all three copies (>90%) and is almost identical to the RCS1516 sequence including the CBLS motif. The copy number of RCE1 was estimated to range from 10² to 10³ in the haploid genome of rice. Cloned RCE1 units were used for fluorescent in situ hybridization (FISH) analysis, and signals were observed on almost every primary constriction of rice chromosomes. Thus it was concluded that RCE1 is a significant component of the rice centromere. The λRCB11 clone contained at least four A/T-rich regions, which are candidates for matrix attachment regions (MARs), in the sequences between the RCE1 repeats. Other elements that are homologous to the short centromeric repetitive sequences pSau3A9 and pRG5, detected in both sorghum and rice, were also found in the clone. Received: 9 June 1998 / Accepted: 16 September 1998

4.
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
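The consensus-embedding idea can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it works column by column rather than on predefined conserved regions, and the function names, the toy alignment, and the 0.6 conservation threshold are all assumptions made for the example.

```python
from collections import Counter


def consensus_column(column, min_fraction=0.6):
    """Return the majority residue of a column, or None if it is not conserved enough."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    # Gaps count against conservation because the fraction uses the full column depth.
    return residue if count / len(column) >= min_fraction else None


def embed_consensus(alignment, representative_index=0, min_fraction=0.6):
    """Replace conserved positions of the representative sequence with consensus residues."""
    representative = alignment[representative_index]
    embedded = []
    for pos, rep_residue in enumerate(representative):
        if rep_residue == "-":
            continue  # skip alignment columns absent from the representative
        column = [seq[pos] for seq in alignment]
        consensus = consensus_column(column, min_fraction)
        # Keep the representative's own residue where the alignment is uncertain.
        embedded.append(consensus if consensus is not None else rep_residue)
    return "".join(embedded)


# Hypothetical toy alignment of four family members (equal-length rows).
alignment = [
    "MKTAYLA",
    "MRTGYIA",
    "MKSGY-A",
    "MKTGYVA",
]

query = embed_consensus(alignment)
print(query)
```

The resulting string is an ordinary single sequence, so it could be passed to any single-sequence search program. Embedding full PSSMs instead would require a search tool that accepts position-specific scores.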

5.
DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org.

6.
Light-harvesting proteins harness light energy for photosynthesis. Sequences of the Photosystem II (PS II) light harvesting proteins, Lhcb1–6, have been deduced from many plants. However, limited information is available for spinach Lhcb sequences, although a spinach PS II preparation (BBY) is commonly used as a model for plant photosynthetic oxygen evolution [DA Berthold, GT Babcock and CF Yocum (1981) FEBS Lett 134: 231–234]. In this work, we describe the use of tryptic digestion, liquid chromatography, tandem mass spectrometry, and database searching to identify light-harvesting proteins in the spinach BBY preparation. Using this approach, partial amino acid sequences were assigned to the PS II-associated light-harvesting proteins, Lhcb1–6. The identified stretches of sequence are predicted to contain intra-membranous chlorophyll ligands, extra-membranous loop regions, and lutein-binding sites. In addition, we find that at least two distinct Lhcb4 (CP29) polypeptides and two distinct Lhcb1 polypeptides are present in the BBY preparation. One of these Lhcb4 polypeptides has a subsequence that has not been reported for Lhcb4 in any other organism. This work demonstrates the utility of tandem mass spectrometry in the characterization of photosynthetic membrane proteins. This revised version was published online in June 2006 with corrections to the Cover Date.

7.
A novel interactive method for generating multiple protein sequence alignments is described. The program has no internal limit to the number or length of sequences it can handle and is designed for use with DEC VAX processors running the VMS operating system. The approach used is essentially one of manual sequence manipulation, aided by built-in symbolic displays of identities and similarities, and strict and 'fuzzy' (ambiguous) pattern-matching facilities. Additional flexibility is provided by means of an interface to a publicly available automatic alignment system and to a comprehensive sequence analysis package. Received on August 28, 1990; accepted on November 20, 1990

8.
In the program PCAP, we provide a methodology for choosing synthetic oligonucleotide probes to be used in contig mapping experiments. The package presents a series of short oligonucleotides (8–12mers) that are chosen based on constraints with respect to frequency of occurrence within a particular genome and the G+C content of the oligonucleotides. The four programs contained within the package: (i) convert GenBank files to a format usable by the package; (ii) calculate trinucleotide and tetranucleotide frequencies in available sequence data on a particular species; (iii) present the user with upper and lower bounds on the frequencies of hybridization sites for oligonucleotide probes of length 8–12; and (iv) allow the user to place constraints on site frequency and G+C content and provide a list of short probe sequences that fit these criteria. These sequences can then be synthetically produced and used in hybridization experiments to carry out contig mapping.
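The final filtering step described above can be sketched as follows. This is not PCAP itself: the abstract describes estimating bounds on site frequencies from tri- and tetranucleotide frequencies, whereas this simplified sketch counts each oligonucleotide directly in one strand of a toy sequence. All function names, thresholds, and the sequence are assumptions chosen for illustration.

```python
from collections import Counter


def gc_fraction(oligo):
    """Fraction of G and C bases in an oligonucleotide."""
    return sum(base in "GC" for base in oligo) / len(oligo)


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence (single strand only)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def candidate_probes(sequence, k, min_count, max_count, min_gc, max_gc):
    """List k-mers whose occurrence count and G+C content fall within the given bounds."""
    counts = kmer_counts(sequence.upper(), k)
    return sorted(
        oligo
        for oligo, n in counts.items()
        if set(oligo) <= set("ACGT")  # ignore k-mers containing ambiguous bases
        and min_count <= n <= max_count
        and min_gc <= gc_fraction(oligo) <= max_gc
    )


# Hypothetical toy sequence; real input would be genomic sequence data for one species.
toy_sequence = "ACGTACGTGGCCACGTACGT"

probes = candidate_probes(
    toy_sequence, k=8, min_count=2, max_count=5, min_gc=0.4, max_gc=0.6
)
print(probes)
```

Direct counting only works when the sequence is already known. Estimating expected site frequencies from short-oligomer statistics, as the abstract describes, lets probes be chosen for genomes that are only partially sequenced.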

9.
Promoter trapping involved screening uncharacterized fragments of C. elegans genomic DNA for C. elegans promoter activity. By sequencing the ends of these DNA fragments and locating their genomic origin using the available genome sequence data, promoter trapping has now been shown to identify real promoters of real genes, exactly as anticipated. Developmental expression patterns have thereby been linked to gene sequence, allowing further inferences on gene function to be drawn. Some expression patterns generated by promoter trapping include subcellular details. Localization to the surface of particular cells or even particular aspects of the cell surface was found to be consistent with the genes, now associated with these patterns, encoding membrane-spanning proteins. Data on gene expression patterns are easier to generate and characterize than mutant phenotypes and may provide the best means of interpreting the large quantity of sequence data currently being generated in genome projects. Received: 12 June 1998 / Accepted: 21 August 1998

10.
The protein information resource (PIR)   Total citations: 13 (self-citations: 0; citations by others: 13)
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
