20 similar documents found (search time: 0 ms)
1.
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to satisfy two conflicting demands at once: completeness on the one hand, and quality of alignment and domain definitions on the other. Quality is best achieved by manual approaches, whereas completeness is in practice only attainable by automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. Using Pfam, we classified a large number of previously unannotated proteins from the Caenorhabditis elegans genome project. We have also identified many novel family memberships in known proteins, including new Kazal, fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405–420, 1997. © 1997 Wiley-Liss, Inc.
2.
3.
Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. Domain boundaries, family members and alignments are defined semi-automatically, based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.0 of Pfam contains 527 manually verified families, which are available for browsing and on-line searching via the World Wide Web in the UK at http://www.sanger.ac.uk/Pfam/ and in the US at http://genome.wustl.edu/Pfam/. Pfam 2.0 matches one or more domains in 50% of Swissprot-34 sequences and in 25% of a large sample of predicted proteins from the Caenorhabditis elegans genome.
4.
A comprehensive comparison of multiple sequence alignment programs. Total citations: 31; self-citations: 4; citations by others: 31
In recent years, improvements to existing programs and the introduction of new iterative algorithms have changed the state of the art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs, using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20% residue identity, the best programs correctly aligned on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy, though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, which caused the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies depending on the nature of a particular set of sequences. Using more than one program, each based on a different alignment technique, should significantly improve the quality of automatic protein sequence alignment. The results also suggest guidelines for improving alignment algorithms.
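Accuracy "against a reference alignment", as in the 47% figure above, is commonly measured with a sum-of-pairs style score. That score is the fraction of residue pairs aligned in the reference that the test alignment also aligns. The abstract does not give the exact scoring function, so the following is only a minimal sketch under that assumption. It also assumes that both alignments list the same sequences in the same row order. The function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Return aligned residue pairs as (row_i, residue_i, row_j, residue_j).

    Residue indices are 0-based positions in the ungapped sequences;
    '-' marks a gap. All rows must have the same length.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    counters = [0] * len(alignment)  # next residue index for each row
    pairs = set()
    for col in range(n_cols):
        # Residue index at this column for each row, or None for a gap.
        residue_idx = []
        for r, row in enumerate(alignment):
            if row[col] == "-":
                residue_idx.append(None)
            else:
                residue_idx.append(counters[r])
                counters[r] += 1
        for i, j in combinations(range(len(alignment)), 2):
            if residue_idx[i] is not None and residue_idx[j] is not None:
                pairs.add((i, residue_idx[i], j, residue_idx[j]))
    return pairs


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Fraction of the reference's aligned residue pairs reproduced by the test."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
print(sum_of_pairs_score(test, reference))  # 3 of 4 reference pairs recovered: 0.75
```

Here the test alignment misplaces the G relative to the reference, so one of the four reference pairs is lost.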
5.
Background
Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory.
6.
A. L. Welden. Brittonia, 1967, 19(4): 328-332
Two species of Stereum are discussed. One, S. macrocystidiatum from Java, is described as new; the other, S. illudens Berk., is redescribed from Mexico-Guatemala collections. Study of this material leads to the conclusion that Xylobolus Karst. emend. Boidin cannot be maintained as distinct from Stereum Hill ex S. F. Gray. Subgeneric distinctions between these two groups of species are also rejected.
7.
Ribeiro Ede O, Zerlotini GG, Lopes IR, Ribeiro VB, Melo AC, Walter ME, Costa MM; BIOFOCO Network. Genetics and Molecular Research: GMR, 2005, 4(3): 590-598
InterPro is a widely used tool for protein annotation in genome sequencing projects, but it demands a large amount of computation and is a very time-consuming step. We present a strategy for running the Pfam, PROSITE and ProDom searches of InterPro in a distributed environment, using a Java-based messaging system. We developed a two-layer scheduling architecture for the distributed infrastructure, then ran experiments and analyzed the results. Our distributed system performed much better than InterPro's Pfam, PROSITE and ProDom searches running on a centralized platform. This approach seems appropriate and promising for computationally demanding tools used in biological applications.
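The abstract says only that a two-layer scheduler runs over a Java-based messaging system. The sketch below illustrates the general idea of splitting an annotation run into independent (database, sequence batch) jobs and handing them to a pool of workers. It is not the authors' design. Python's `concurrent.futures` thread pool stands in for the messaging system. `scan_chunk`, `distributed_scan` and the default batch size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scan_chunk(database, sequence_ids):
    """Hypothetical placeholder for one member-database search (for example
    a Pfam, PROSITE or ProDom scan) over a batch of sequences. It returns
    dummy (database, sequence) records so the scheduling logic can run."""
    return [(database, seq_id) for seq_id in sequence_ids]


def distributed_scan(sequence_ids, databases, chunk_size=2, workers=4):
    # Layer 1: the coordinator turns the run into (database, batch) jobs.
    jobs = [(db, batch)
            for db in databases
            for batch in chunk(sequence_ids, chunk_size)]
    # Layer 2: a worker pool executes the jobs concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_chunk, db, batch) for db, batch in jobs]
        results = []
        for future in futures:  # collect in submission order
            results.extend(future.result())
    return results


records = distributed_scan(["s1", "s2", "s3"], ["Pfam", "PROSITE", "ProDom"])
print(len(records))  # 3 databases x 3 sequences = 9 records
```

Because each job is independent, the pool could be replaced by remote workers consuming jobs from a message queue without changing how the coordinator splits the work.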
8.
Genome sequencing projects and new experimental methods produce enormous amounts of data. Of the genes identified in an organism, 30-40 per cent remain of unknown biological function. Because of this lack of information, the overall schema of all the biological functions occurring in a specific organism cannot be properly represented. To understand the functional properties of the genomic data, more experimental data must be collected. A pathway database is an effort to organise current knowledge of biochemical pathways, and it can also be used to interpret sequence data. Some existing pathway databases can be regarded as detailed functional annotations of genomes because they are tightly integrated with genomic information. However, experimental data are often lacking in these databases. This paper summarises a list of pathway databases and some of their corresponding biological databases. It focuses on the content and structure of these databases, the organisation of the data and, from a biological point of view, the reliability of the stored information. It also describes how the pathway data are represented and which tools are available to work with them. The advantages and disadvantages of the analysed databases are pointed out, and biological scientists are given an overview of how to use these pathway databases.
9.
10.
Biochimica et Biophysica Acta (BBA)/General Subjects, 2016, 1860(10): 2249-2254
Background
Trehalose is a non-reducing disaccharide that is highly conserved throughout evolution. In yeasts, trehalose hydrolysis is confined to the enzyme trehalase, an α-glucosidase specific for trehalose as its sole substrate. Two kinds of trehalase activity exist in yeasts: neutral and acid enzymes.
Scope of the review
This review makes a comparative survey of the main biochemical and genetic parameters, regulatory systems, three-dimensional structure and catalytic mechanism of the two yeast trehalases.
Major conclusions
The yeast neutral and acid trehalases differ sharply in their biochemical features (optimum pH, Mr or amino acid sequence), physiological roles, subcellular location (cytosol vs. vacuoles or cell wall) and regulatory control (phosphorylation vs. catabolite repression). However, their identical specificity for trehalose is based on the presence of an (α/α)6 toroid fold in the active centre and a catalytic mechanism of anomeric inversion.
General significance
This review expands our knowledge of the homology, functional features and catalytic mechanisms of α-glucosidases in yeasts. It also further analyses the correlation between the structures and predicted biological roles of macromolecules.
11.
Experiments with mice show that the pre-carcinogen vinyl chloride is metabolically converted to a short-lived alkylating intermediate which introduces the 2-oxoethyl group onto nucleophilic sites in DNA and proteins. The absolute and relative amounts of alkylated products support the hypothesis that the main reactive metabolite is chloroethylene oxide.
12.
We introduce a novel, linguistic-like method of genome analysis. We propose a natural approach to characterizing genomic sequences based on the occurrences of fixed-length words from a predefined, sufficiently large set of words (strings over the alphabet {A, C, G, T}). A measure based on this approach, called the compositional spectrum, is essentially a histogram of imperfect word occurrences. Our results indicate that the compositional spectrum is an overall characteristic of a long sequence, i.e., a complete genome or an uninterrupted part of a chromosome. This is manifested both in the similarity of spectra obtained from different stretches of the same genome and in a broad range of dissimilarities between the spectral representations of different genomes. Because matching is imperfect, the approach is highly flexible, and as a result sets of relatively long words can be considered. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.
13.
We compared total charges for obstetric care at a major teaching hospital and faculty group practice with those at 3 nonteaching centers in western Washington. The patients were all enrollees of an employee-based health maintenance organization. Charges were used as a proxy for costs and included all outpatient, inpatient, and physician charges. In the teaching system, patients were cared for by faculty and house staff; in the nonteaching settings, they received care from private physicians. No significant differences in total charges were found between the teaching and the nonteaching settings for all deliveries ($4,652 [N = 90] versus $4,530 [N = 335], P > .5). In the teaching setting, vaginal deliveries were slightly more expensive ($4,178 [n = 75] versus $3,768 [n = 250], P = .15), as were cesarean deliveries ($7,024 [n = 15] versus $6,771 [n = 85], P > .5). The rate of cesarean deliveries was lower in the teaching setting (17% versus 25%, P = .10), partially accounting for the similarity in total charges. Length of stay was similar in the teaching and nonteaching settings (3.29 versus 3.14 days, P > .5). We conclude that the academic medical center, as a total system of care, can provide obstetric care as cost-effectively as nonteaching systems under the constraints of prepaid care.
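The way the lower cesarean rate offsets the teaching setting's higher per-delivery charges can be checked with the figures given above. Each setting's overall mean charge is the delivery-weighted average of its vaginal and cesarean means:

$$
\bar{c}_{\text{teaching}} = \frac{75 \times 4178 + 15 \times 7024}{90} = \frac{418\,710}{90} \approx 4652,
\qquad
\bar{c}_{\text{nonteaching}} = \frac{250 \times 3768 + 85 \times 6771}{335} = \frac{1\,517\,535}{335} \approx 4530
$$

Both results reproduce the reported totals. As an illustrative calculation only, not a result of the study, apply the nonteaching cesarean share $85/335 \approx 0.254$ to the teaching setting's per-delivery means:

$$
\left(1 - \tfrac{85}{335}\right) \times 4178 + \tfrac{85}{335} \times 7024 = 4178 + \tfrac{85}{335} \times 2846 \approx 4900
$$

This is roughly $370 above the nonteaching mean. That gap is what the lower teaching cesarean rate (15/90, about 17%) absorbs.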
14.
L. J. Donaldson. BMJ (Clinical research ed.), 1992, 305(6864): 1280-1284
The advent of the Tomlinson inquiry draws attention to the need to strike a balance between market led and planned approaches to health care delivery. This is important not just for hospital rationalisation but also for the preservation and development of services which are provided in a smaller number of hospitals. Specialised services are often in the forefront of raising standards of care and introducing new developments and innovations. They are the only option for a small number of patients with serious illnesses. In the internal market for health care provision created by the 1990 NHS reforms more sophisticated and flexible mechanisms must be found to provide stability for specialised services while at the same time enabling the benefits of purchaser choice and provider competition to be realised.
15.
16.
17.
LISTA, LISTA-HOP and LISTA-HON: a comprehensive compilation of protein encoding sequences and its associated homology databases from the yeast Saccharomyces.
We have continued our effort to build a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. In this database each sequence is attributed a single genetic name. For duplicated sequences, a simple method has been applied to distinguish sequences of one and the same gene from non-allelic sequences of duplicated genes. Where necessary, synonyms are given for allelic duplicated sequences. Sequences can thus be found either by name or by the synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, the reference to the publication of the sequence, the chromosomal location as far as known, and the Swissprot and EMBL accession numbers. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections, resulting in LISTA-HON and LISTA-HOP. The LISTA database can be linked to the associated data sets or to nucleotide and protein banks through the Sequence Retrieval System (SRS).
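The entry layout and name-or-synonym lookup described above map naturally onto a record type plus a name index. The following is a minimal sketch only. The class, field and function names are hypothetical and mirror the fields listed in the abstract, not LISTA's actual file format. The example values are placeholders, not real yeast data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ListaEntry:
    """Hypothetical record mirroring the fields listed in the abstract."""
    genetic_name: str
    embl_mnemonic: str
    codon_bias: float
    reference: str
    chromosomal_location: Optional[str]  # None when not yet known
    swissprot_accession: Optional[str]
    embl_accession: Optional[str]
    synonyms: List[str] = field(default_factory=list)


def build_name_index(entries: List[ListaEntry]) -> Dict[str, ListaEntry]:
    """Map every genetic name and synonym (case-insensitive) to its entry."""
    index: Dict[str, ListaEntry] = {}
    for entry in entries:
        for name in [entry.genetic_name, *entry.synonyms]:
            index[name.upper()] = entry
    return index


def lookup(index: Dict[str, ListaEntry], name: str) -> Optional[ListaEntry]:
    """Find an entry by its genetic name or any of its synonyms."""
    return index.get(name.upper())


# Placeholder entry for illustration only.
entry = ListaEntry("GENE1", "MNEMONIC1", 0.5, "placeholder reference",
                   None, None, None, synonyms=["ALIAS1"])
index = build_name_index([entry])
print(lookup(index, "alias1") is entry)  # True
```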
18.
LISTA, LISTA-HOP and LISTA-HON: a comprehensive compilation of protein encoding sequences and its associated homology databases from the yeast Saccharomyces.
We have continued our effort to build a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. As in previous editions, genetic names are consistently associated with each sequence that has a known and confirmed ORF. Where necessary, synonyms are given for allelic duplicated sequences. According to our rules, the first publication of a sequence gives the genetic name of a gene. In some instances, however, more commonly used names are given to avoid nomenclature problems and the use of old designations that are no longer in use. In these cases the old designation is given as a synonym. Sequences can thus be found either by name or by the synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, the reference to the publication of the sequence, the chromosomal location as far as known, and the SWISSPROT and EMBL accession numbers. New entries will also contain the name from the systematic sequencing efforts. Since the release of LISTA4.1 the database has been updated continuously. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections, resulting in LISTA-HON and LISTA-HOP. This release includes reports from full Smith-Waterman peptide-level searches against a non-redundant protein sequence database. The LISTA database can be linked to the associated data sets or to nucleotide and protein banks through the Sequence Retrieval System (SRS). The database is available by FTP and on the World Wide Web.
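The Smith-Waterman searches mentioned above compute optimal local alignment scores. Below is a minimal score-only sketch of the recurrence. Its scoring scheme is a toy assumption for illustration: match +2, mismatch -1, linear gap -1. Real protein searches use substitution matrices such as BLOSUM62 and affine gap penalties, and this sketch does no traceback.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (Smith-Waterman recurrence,
    simple match/mismatch scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # score row for the previous character of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Flooring at zero lets a local alignment start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTYY", "ACGT"))  # 8: ACGT found inside flanks
print(smith_waterman_score("ACGT", "ACT"))       # 5: ACGT vs AC-T
```

The zero floor is what distinguishes local from global alignment: unrelated flanking residues never drag the score down, so a conserved region inside a longer sequence still scores fully.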
19.
20.
Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Total citations: 34; self-citations: 2; citations by others: 34
A Bateman, E Birney, R Durbin, S R Eddy, R D Finn, E L Sonnhammer. Nucleic Acids Research, 1999, 27(1): 260-262
Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families. Release 3.1 is a major update of the Pfam database and contains 1313 families, which are available on the World Wide Web in Europe at http://www.sanger.ac.uk/Software/Pfam/ and http://www.cgr.ki.se/Pfam/, and in the US at http://pfam.wustl.edu/. Over 54% of proteins in SWISS-PROT-35 and SP-TrEMBL-5 match a Pfam family. The primary changes to Pfam since release 2.1 are that we now use version 2 of the HMMER software, which is more sensitive and provides expectation values for matches, and that Pfam now includes proteins from both SP-TrEMBL and SWISS-PROT.
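The expectation values mentioned above convert a raw match score into the number of hits of at least that score expected by chance. The abstract gives no formula, so what follows is background under a stated assumption, not Pfam's documented procedure. As we understand HMMER 2, each model's score distribution is fitted to an extreme value (Gumbel) distribution with location $\mu$ and scale $\lambda$. The P-value and E-value of a score $x$ (in bits) in a search of $N$ sequences are then:

$$
P(S \ge x) = 1 - \exp\!\left(-e^{-\lambda (x - \mu)}\right),
\qquad
E(x) = N \cdot P(S \ge x)
$$

For example, with $N = 100\,000$ sequences, a hit with $P = 10^{-7}$ has $E = 0.01$. That is, one chance hit this good would be expected in about a hundred searches of a database that size.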