Similar literature (20 results)
1.
Development of ALFRED (http://alfred.med.yale.edu) continues in two directions: developing tools for efficiently annotating entries and checking the integrity of data already in the database, and increasing the quantity and accessibility of the data. The information contained in ALFRED, such as the number of polymorphic sites, populations, and frequency tables (one sample typed for one site), has increased significantly.

2.
The deluge of data from the human genome project (HGP) presents new opportunities for molecular anthropologists to study human variation through the promise of vast numbers of new polymorphisms (e.g., single nucleotide polymorphisms, or SNPs). Collecting the resulting data into a single, easily accessible resource will be important to facilitate this research. We created a prototype Web-accessible database named ALFRED (ALlele FREquency Database, http://alfred.med.yale.edu/alfred/) to store and make publicly available allele frequency data on diverse polymorphic sites for many populations. In constructing this database, we considered the types of information needed for anthropology, population genetics, molecular genetics, and statistics, as well as issues of data integrity and ease of access to data. We also developed links to other Web-based databases as well as procedures for others to make links to the data in ALFRED. Here we present an overview of the issues considered and provisional solutions, as well as an example of data already available. It is our hope that this database will be useful for research and teaching in a wide range of fields, and that colleagues from various fields will contribute to making ALFRED an important resource for many studies as yet unforeseen.

3.
We have developed a publicly accessible database (ALFRED, the ALlele FREquency Database) that catalogues allele frequency data for a wide range of population samples and DNA polymorphisms. This database is web-accessible through our laboratory (Kidd Lab) Web site: http://info.med.yale.edu/genetics/kkidd. ALFRED currently contains data on 60 populations and 156 genetic systems including single nucleotide polymorphisms (SNPs), short tandem repeat polymorphisms (STRPs), variable number of tandem repeats (VNTRs) and insertion-deletion polymorphisms. While data are not available for all population-DNA polymorphism combinations, over 2000 allele frequency tables have been entered. Our database is designed (i) to address our specific research requirements as well as broader scientific objectives; (ii) to allow researchers and interested educators to easily navigate and retrieve data of interest to them; and (iii) to integrate links to other related public databases such as dbSNP, GenBank and PubMed.

4.
SUMMARY: The understanding of human variation relevant to association studies can benefit from population comparison, especially the comparison of populations in the same geographical region. Variations in linkage disequilibrium patterns, in tagSNP sets, and in SNP heterozygosities among populations can be used to infer the evolutionary pattern. We present here a Win32-based Perl/Tk application for visual comparison of these variations across populations. AVAILABILITY: The application package is available at http://info.med.yale.edu/genetics/kkidd/programs.html CONTACT: sheng.gu@yale.edu.

5.
MOTIVATION: Genomic DNA copy number alterations are characteristic of many human diseases including cancer. Various techniques and platforms have been proposed to allow researchers to partition the whole genome into segments where copy numbers change between contiguous segments, and subsequently to quantify DNA copy number alterations. In this paper, we incorporate the spatial dependence of DNA copy number data into a regression model and formalize the detection of DNA copy number alterations as a penalized least squares regression problem. In addition, we use a stationary bootstrap approach to estimate the statistical significance and false discovery rate. RESULTS: The proposed method is studied by simulations and illustrated by an application to an extensively analyzed dataset in the literature. The results show that the proposed method can correctly detect the numbers and locations of the true breakpoints while appropriately controlling the false positives. AVAILABILITY: http://bioinformatics.med.yale.edu/DNACopyNumber CONTACT: hongyu.zhao@yale.edu SUPPLEMENTARY INFORMATION: http://bioinformatics.med.yale.edu/DNACopyNumber.
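
The stationary bootstrap named in this abstract can be illustrated with a minimal sketch. It follows the general textbook scheme (blocks that start at a random position, have geometrically distributed length, and wrap around the end of the series), which preserves local dependence between neighbouring probes. The function name and parameters are assumptions made for illustration; the abstract does not say how the authors turn resamples into significance or false discovery estimates, so that step is not shown.

```python
import random


def stationary_bootstrap(series, mean_block_length, rng):
    """Return one stationary-bootstrap resample of `series`.

    Hypothetical helper, not the authors' code. Each step either
    continues the current block (moving to the next index, wrapping
    around the end) or starts a new block at a uniformly random index
    with probability 1 / mean_block_length.
    """
    n = len(series)
    restart_probability = 1.0 / mean_block_length
    resample = []
    index = rng.randrange(n)  # start of the first block
    while len(resample) < n:
        resample.append(series[index])
        if rng.random() < restart_probability:
            index = rng.randrange(n)   # begin a new block
        else:
            index = (index + 1) % n    # extend the current block
    return resample


# Usage: resample a toy log-ratio profile with blocks of about 5 probes.
profile = [0.0, 0.1, -0.1, 0.9, 1.1, 1.0, 0.0, -0.1, 0.1, 0.0]
one_resample = stationary_bootstrap(profile, 5, random.Random(42))
```

A larger mean block length keeps more of the spatial dependence in each resample; a mean block length of 1 reduces the scheme to ordinary resampling with replacement.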

6.
MOTIVATION: Packages that support the creation of pathway diagrams are limited by their inability to be readily extended to new classes of pathway-related data. RESULTS: VitaPad is a cross-platform application that enables users to create and modify biological pathway diagrams and incorporate microarray data with them. It improves on existing software in the following areas: (i) It can create diagrams dynamically through graph layout algorithms. (ii) It is open-source and uses an open XML format to store data, allowing for easy extension or integration with other tools. (iii) It features a cutting-edge user interface with intuitive controls, high-resolution graphics and fully customizable appearance. AVAILABILITY: http://bioinformatics.med.yale.edu CONTACTS: matthew.holford@yale.edu; hongyu.zhao@yale.edu.

9.
This database consists of over 24 000 mutations in 18 viral, bacterial, yeast or mammalian genes. The data are grouped as sets of DNA base sequence changes, or spectra, caused by a particular mutagen under defined conditions. The spectra are available on the World Wide Web at http://info.med.yale.edu/mutbase/ in two formats: a text format that can be browsed online or downloaded for use with a text editor, and dBASE III format for use, after downloading, by relational database programs or spreadsheets. Researchers are encouraged to submit DNA sequence changes to a suitable mutation database such as ours. A data entry program, MUTSIN, can be retrieved from this site. MUTSIN diagrams each mutation on the computer screen and alerts the user to any discrepancies.

11.
MOTIVATION: Identifying protein-protein interactions is critical for understanding cellular processes. Because protein domains represent binding modules and are responsible for the interactions between proteins, computational approaches have been proposed to predict protein interactions at the domain level. The fact that protein domains are likely evolutionarily conserved allows us to pool information from data across multiple organisms for the inference of domain-domain and protein-protein interaction probabilities. RESULTS: We use a likelihood approach to estimate domain-domain interaction probabilities by integrating large-scale protein interaction data from three organisms, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. The estimated domain-domain interaction probabilities are then used to predict protein-protein interactions in S. cerevisiae. Based on a thorough comparison of sensitivity and specificity, Gene Ontology term enrichment and gene expression profiles, we have demonstrated that it may be far more informative to predict protein-protein interactions from diverse organisms than from a single organism. AVAILABILITY: The program for computing the protein-protein interaction probabilities and supplementary material are available at http://bioinformatics.med.yale.edu/interaction.
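
As an illustration of how domain-level probabilities can yield a protein-level prediction, the sketch below uses a common modelling assumption that this abstract does not itself state: two proteins interact if at least one of their domain pairs interacts, and domain pairs act independently. Under that assumption the protein-level probability is one minus the product of (1 - domain-pair probability) over all domain pairs. The function name, the domain names and the probability values are hypothetical.

```python
from itertools import product


def protein_interaction_probability(domains_a, domains_b, domain_pair_prob):
    """Probability that two proteins interact, given their domain lists.

    Assumes (for illustration only) independent domain pairs and that
    any single interacting domain pair makes the proteins interact.
    `domain_pair_prob` maps frozenset({m, n}) to a probability; pairs
    missing from the map are treated as non-interacting.
    """
    prob_no_pair_interacts = 1.0
    for m, n in product(domains_a, domains_b):
        pair = frozenset((m, n))
        prob_no_pair_interacts *= 1.0 - domain_pair_prob.get(pair, 0.0)
    return 1.0 - prob_no_pair_interacts


# Made-up domain-pair probabilities for two hypothetical proteins.
example_probs = {
    frozenset(("SH3", "PRM")): 0.4,
    frozenset(("Kinase", "SH2")): 0.5,
}
p = protein_interaction_probability(["SH3", "Kinase"], ["PRM", "SH2"], example_probs)
# p = 1 - (1 - 0.4) * (1 - 0.5) = 0.7
```

Keying the map by an unordered pair keeps lookups symmetric, so the result does not depend on which protein is listed first.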

12.
A mutation spectra database for bacterial and mammalian genes.   Total citations: 1 (self-citations: 0; by others: 1)
Each mutation spectrum in this database is a dataset of changes in DNA base sequence in mutations induced in a gene by a particular mutagen (including spontaneous processes) under defined conditions. There are 240 datasets with 24 500 mutants in nine bacterial genes, two phage genes, five mammalian genes and one yeast gene. The database is available on the Web at http://info.med.yale.edu/mutbase/. The data tables can be viewed on the Web and downloaded in text form for local use. The data are also available in dBASE III, a format that can be read by essentially any desktop database program or spreadsheet and makes analyses of a large number of mutants feasible. Researchers are invited to submit additional data. A data entry program, MUTSIN, diagrams each mutation on the computer screen as the data are entered and alerts the user to any discrepancies between the entry and the gene sequence.

13.
Pathway analysis using random forests classification and regression   Total citations: 3 (self-citations: 0; by others: 3)
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers. RESULTS: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data. AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.

14.
Randomized libraries are increasingly popular in protein engineering and other biomedical research fields. Library statistics are useful for guiding and evaluating randomized library construction. Previous work gives only the mean number of unique sequences in the library, and can handle only equal molar ratios of the four nucleotides at a small number of mutation sites. We derive formulas to calculate the mean and variance of the number of unique sequences in libraries generated by cassette mutagenesis with mixtures of arbitrary nucleotide ratios. A computer program was developed that uses an arbitrary-precision numerical software package to calculate the statistics of large libraries. The statistics of a library with mutations in more than 20 amino acids can be calculated easily. Results show that the nucleotide ratios have significant effects on these statistics. The more skewed the ratio, the larger the library must be to obtain the same expected number of unique sequences. The program is freely available at http://graphics.med.yale.edu/cgi-bin/lib_comp.pl.
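
The quantities discussed in this abstract can be illustrated for a very small library by brute force. The sketch below is not the authors' formulas or program: it assumes each of the N clones is an independent draw, enumerates every possible sequence, and uses plain floating point rather than arbitrary precision, so it only suits a handful of sites. With p_i the probability of sequence i, the mean is the sum of 1 - (1 - p_i)^N, and the variance follows from the indicator variables for "sequence i is absent". All function and variable names are hypothetical.

```python
from itertools import product
from math import prod


def sequence_probabilities(site_ratios):
    """Probability of every possible sequence, given per-site ratios.

    `site_ratios` is a list with one dict per randomized position,
    mapping nucleotide -> probability (each dict sums to 1).
    """
    return [prod(combo) for combo in product(*(s.values() for s in site_ratios))]


def unique_sequence_stats(probs, library_size):
    """Mean and variance of the number of distinct sequences observed
    in `library_size` independent draws (illustrative brute force)."""
    n = library_size
    absent = [(1.0 - p) ** n for p in probs]        # P(sequence i never drawn)
    mean = sum(1.0 - q for q in absent)
    variance = sum(q * (1.0 - q) for q in absent)
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            both_absent = (1.0 - probs[i] - probs[j]) ** n
            variance += 2.0 * (both_absent - absent[i] * absent[j])
    return mean, variance


# Two randomized positions: equal ratios versus a skewed mixture.
equal = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}] * 2
skewed = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}] * 2
mean_equal, var_equal = unique_sequence_stats(sequence_probabilities(equal), 32)
mean_skewed, var_skewed = unique_sequence_stats(sequence_probabilities(skewed), 32)
```

At the same library size the skewed mixture gives a smaller expected number of unique sequences than the equal mixture, which matches the qualitative statement in the abstract.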

16.
The Olfactory Receptor Database (ORDB) is a WWW-accessible database that stores data on Olfactory Receptor-like molecules (ORs) and has been open to the public since June 1996. It contains a public and a private area. The public area includes published DNA and protein sequence data for ORs, links to OR models and data on their expression, chromosomal localization and source organism, as well as (i) links to bibliography through PubMed and (ii) interactive WWW-based tools, such as BLAST homology searching. The private area functions as a service to laboratories that are actively cloning receptors. Source laboratories enter the sequences of the receptor clones they have characterized into the private database and can search for identical or near-identical OR sequences in both the public and private databases. If another laboratory has cloned and deposited an identical or closely matching sequence, there are means for communication between the laboratories to help avoid duplication of work. ORDB is available via the WWW at http://crepe.med.yale.edu/ORDB/HTML

17.
The Synergizer is a database and web service that provides translations of biological database identifiers. It is accessible both programmatically and interactively. AVAILABILITY: The Synergizer is freely available to all users interactively via a web application (http://llama.med.harvard.edu/synergizer/translate) and programmatically via a web service. Clients implementing the Synergizer application programming interface (API) are also freely available. Please visit http://llama.med.harvard.edu/synergizer/doc for details.

18.
Kong Y. Genomics, 2011, 98(2): 152-153
Btrim is a fast, lightweight program for trimming adapters and low-quality regions from reads produced by ultra-high-throughput next-generation sequencing machines. It can also reliably identify barcodes and assign reads to their original samples. Based on a modified version of Myers's bit-vector dynamic programming algorithm, Btrim can handle indels in adapters and barcodes. It removes low-quality regions and trims adapters from either or both ends of the reads. A typical trimming run of 30 M reads with two sets of adapter pairs takes about a minute with a small memory footprint. Btrim is a versatile stand-alone tool that can be used as the first step in virtually all next-generation sequence analysis pipelines. The program is available at http://graphics.med.yale.edu/trim/.
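
To make the idea of adapter matching with indels concrete, here is a minimal sketch that uses ordinary dynamic programming over edit distance, where the adapter may start anywhere in the read. It is deliberately not the bit-vector algorithm named in the abstract, which computes the same kind of table far faster using machine-word operations, and it does not reflect Btrim's actual scoring or options. The function name, the `max_edits` parameter, and the example sequences are assumptions for illustration.

```python
def find_adapter(read, adapter, max_edits):
    """Locate `adapter` inside `read`, allowing substitutions and indels.

    Returns (edits, start, end) for the best match read[start:end], or
    None if no match needs at most `max_edits` edits. Plain O(len(read)
    * len(adapter)) dynamic programming; hypothetical helper.
    """
    m = len(adapter)
    # previous[i] = (edits, start) for adapter[:i] against a read
    # substring ending at the previous column.
    previous = [(i, 0) for i in range(m + 1)]
    best = None
    for j in range(1, len(read) + 1):
        current = [(0, j)]  # an empty adapter prefix can start anywhere for free
        for i in range(1, m + 1):
            mismatch = adapter[i - 1] != read[j - 1]
            substitute = (previous[i - 1][0] + mismatch, previous[i - 1][1])
            skip_adapter_base = (current[i - 1][0] + 1, current[i - 1][1])
            skip_read_base = (previous[i][0] + 1, previous[i][1])
            current.append(min(substitute, skip_adapter_base, skip_read_base))
        previous = current
        edits, start = current[m]
        if edits <= max_edits and (best is None or edits < best[0]):
            best = (edits, start, j)
    return best


# Usage: trim a read at the start of the matched adapter.
adapter = "AGATCGGAAGAGC"
read = "ACGTACGTTT" + "AGATCGAAGAGC"   # adapter copy with one base deleted
hit = find_adapter(read, adapter, max_edits=1)
trimmed = read if hit is None else read[:hit[1]]
```

Tracking the start position alongside the edit count lets the caller cut the read where the match begins, without a separate traceback step.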

19.
NMPP: a user-customized NimbleGen microarray data processing pipeline   Total citations: 1 (self-citations: 0; by others: 1)
The NMPP package is a bundle of user-customized tools, based on established algorithms and methods, for processing self-designed NimbleGen microarray data. It features a command-line-based integrative processing procedure that comprises five major functional components: the raw microarray data parsing and integrating module, the array spatial effect smoothing and visualization module, the probe-level multi-array normalization module, the gene expression intensity summarization module and the gene expression status inference module. AVAILABILITY: http://plantgenomics.biology.yale.edu/nmpp
