Similar articles
20 similar articles found
1.
2.
3.
4.
Protein sequence databases   Cited by: 2 (self-citations: 0, citations by others: 2)
A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium.

5.
The role of pattern databases in sequence analysis   Cited by: 2 (self-citations: 0, citations by others: 2)
In the wake of the numerous now-fruitful genome projects, we are entering an era rich in biological data. The field of bioinformatics is poised to exploit this information in increasingly powerful ways, but the abundance and growing complexity both of the data and of the tools and resources required to analyse them are threatening to overwhelm us. Databases and their search tools are now an essential part of the research environment. However, the rate of sequence generation and the haphazard proliferation of databases have made it difficult to keep pace with developments. In an age of information overload, researchers want rapid, easy-to-use, reliable tools for functional characterisation of newly determined sequences. But what are those tools? How do we access them? Which should we use? This review focuses on a particular type of database that is increasingly used in the task of routine sequence analysis: the so-called pattern database. The paper aims to provide an overview of the current status of pattern databases in common use, outlining the methods behind them and giving pointers on their diagnostic strengths and weaknesses.

6.
7.
Nicholas HB  Deerfield DW  Ropelewski AJ 《BioTechniques》2000,28(6):1174-8, 1180, 1182 passim
We provide a detailed overview of the choices inherent in performing a sequence database search, including the choice of algorithm, substitution matrix and gap model. Each of these choices has implications that can be described as restrictions on the underlying model of sequence evolution, the expected degree of divergence between the query sequence and the database sequences (if one uses an evolutionary based matrix), as well as the sensitivity and selectivity of the search. We conclude with a series of recommendations for researchers performing these searches based on our experience and literature studies.

8.
MOTIVATION: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values. SUMMARY: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases. The program, called MagicMatch, is able to cross-link any of the major sequence databases within a few seconds on a modest desktop computer.
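
The fingerprinting idea in this abstract can be illustrated with a minimal Python sketch. This is not the MagicMatch code: the function names, the normalisation step (stripping whitespace and upper-casing) and the dictionary-based toy "databases" are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def sequence_fingerprint(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Assumption: sequences are normalised by removing whitespace and
    upper-casing, so formatting differences do not change the fingerprint.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def map_identifiers(db_a: Dict[str, str], db_b: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each identifier in db_a to identifiers in db_b with an identical sequence.

    Both arguments are hypothetical {identifier: sequence} dictionaries.
    """
    # Index db_b by fingerprint so each lookup is a constant-time hash access.
    index: Dict[str, List[str]] = {}
    for ident, seq in db_b.items():
        index.setdefault(sequence_fingerprint(seq), []).append(ident)
    return {
        ident: index.get(sequence_fingerprint(seq), [])
        for ident, seq in db_a.items()
    }
```

Because only exact fingerprints are compared, this kind of lookup finds identical sequences only; near-identical entries would still need a similarity search.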

9.
Publicly available cDNA sequence data of Citrullus lanatus were searched for simple sequence repeats (SSRs). Nineteen microsatellites were identified and primer pairs were designed to amplify those loci. Primers were evaluated for their ability to detect polymorphisms within a set of several watermelon varieties and local landraces, C. colocynthis, and interspecific hybrids. Eighteen polymorphic SSR loci were identified. These polymorphic loci can be used for varietal identification and other uses.
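
As a rough illustration of what searching sequence data for SSRs involves, the sketch below scans a DNA string for tandemly repeated short motifs using a regular expression. The abstract does not say which tool or thresholds the authors used, so the function name, the motif length range (2 to 6 bases) and the minimum repeat count are assumptions.

```python
import re
from typing import List, Tuple


def find_ssrs(sequence: str, min_motif: int = 2, max_motif: int = 6,
              min_repeats: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each tandem repeat found.

    The lazy quantifier makes the regex prefer the shortest motif, so
    ATATATAT is reported as AT x 4 rather than ATAT x 2. Homopolymer runs
    are not filtered out and would be reported with a motif such as AA.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

Candidate loci found this way would still need flanking primers and laboratory testing, as the abstract describes, before they could be called polymorphic markers.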

10.
11.
12.
Nucleic acid and protein sequences contain a wealth of information of interest to molecular biologists. The advent of molecular sequence databases provides a unique opportunity for the computer analysis of all available sequences. Sequence databases serve two main functions: (i) to facilitate comparisons with newly determined sequences, and (ii) to act as a source of data for the generation and testing of hypotheses concerning molecular sequence organisation and evolution. The large amounts of sequence data now becoming available require that algorithms for database searching be fast and efficient and considerable progress is being made in this area.

13.
This paper aims to give an overview of current resources on human sequence variations and give an idea about the direction in which these services are moving.

14.

Background  

Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil.
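
One way to get a feel for the gzip baseline mentioned above is to measure how many bits per base it needs and compare that with the 2 bits per base of a plain four-letter encoding. The sketch below does this on synthetic, uniformly random DNA, which is an assumption made only to keep the example self-contained; real database flat files contain headers and repeated sequences and will behave differently. It says nothing about how coil itself works, and the function names are hypothetical.

```python
import gzip
import random


def gzip_bits_per_base(sequence: str) -> float:
    """Return the number of bits gzip needs per base of an ASCII DNA string."""
    raw = sequence.encode("ascii")
    compressed = gzip.compress(raw)
    return 8.0 * len(compressed) / len(raw)


def synthetic_dna(length: int, seed: int = 0) -> str:
    """Generate uniformly random DNA (an illustrative stand-in for real data)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    bits = gzip_bits_per_base(synthetic_dna(100_000))
    print(f"gzip: {bits:.2f} bits/base (plain 2-bit packing: 2.00)")
```

On random input gzip cannot go below 2 bits per base, so it does no better than fixed 2-bit packing; any real gain has to come from redundancy between sequences, which is what database-level compression targets.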

15.
16.
Databases for biologists are becoming increasingly important. Some of these can be regarded as ‘core’ resources, such as the bibliographic databases, whereas others are of greater interest to specialists. As comparative genomics develops, however, even databases limited in their scope (e.g. to a single organism) are of great interest to a wider community.

17.
Storing biological sequence databases in relational form   Cited by: 2 (self-citations: 0, citations by others: 2)
SUMMARY: We have created a set of applications using Perl and Java in combination with XML technology to install biological sequence databases into an Oracle RDBMS. An easy-to-use interface using Java has been created for database query and other tools developed to integrate with our in-house bioinformatics applications. AVAILABILITY: The database schema, DTD file, and source code are available from the authors via email. CONTACT: guochun_xie@merck.com

18.
Human immunodeficiency virus type 1 (HIV-1) sequences are accumulating in the literature at a rapid pace. For this ever-expanding resource to be maximally useful, it is critical that researchers strive to maintain a high level of quality assurance, both in experimental design and conduct and in analyses. Here we present detailed analyses of problematic sets of HIV-1 sequences in the database that include sequence anomalies suggestive of mislabeling or sample contamination problems. These data are examined in the context of currently available HIV-1 sequence information to provide an example of how to identify potentially flawed data. Indicators of potential problems with sequences are (i) sequences that are nearly identical that are supposed to be derived from unlinked individuals and that are markedly distinct from other sequences from the putative source or (ii) sequences that are nearly identical to those of laboratory strains. We provide an outline of methods that researchers can use to perform preliminary laboratory and computational analyses that could help identify problematic data and thus help ensure the integrity of sequence databases.

19.
The GenBank genetic sequence databank.   Cited by: 36 (self-citations: 6, citations by others: 30)
The GenBank Genetic Sequence Data Bank contains over 5700 entries for DNA and RNA sequences that have been reported since 1967. This paper briefly describes the contents of the database, the forms in which the database is distributed, and the services we offer to scientists who use the GenBank database.

20.
Nucleic acid-based biochemical assays are crucial to modern biology. Key applications, such as detection of bacterial, viral and fungal pathogens, require detailed knowledge of assay sensitivity and specificity to obtain reliable results. Improved methods to predict assay performance are needed for exploiting the exponentially growing amount of DNA sequence data and for reducing the experimental effort required to develop robust detection assays. Toward this goal, we present an algorithm for the calculation of sequence similarity based on DNA thermodynamics. In our approach, search queries consist of one to three oligonucleotide sequences representing either a hybridization probe, a pair of Padlock probes or a pair of PCR primers with an optional TaqMan™ probe (i.e. in silico or 'virtual' PCR). Matches are reported if the query and target satisfy both the thermodynamics of the assay (binding at a specified hybridization temperature and/or change in free energy) and the relevant biological constraints (assay sequences binding to the correct target duplex strands in the required orientations). The sensitivity and specificity of our method are evaluated by comparing predicted to known sequence tagged sites in the human genome. Free energy is shown to be a more sensitive and specific match criterion than hybridization temperature.
