A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers.   
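
As a rough illustration of the r2 ≥ 0.3 cut-off mentioned above, the standard pairwise r2 statistic for two biallelic markers can be computed from allele and haplotype frequencies. The sketch below is not GLIDERS code; the function names, the record layout and the SNP identifiers are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


def pairs_above_threshold(pairs, threshold=0.3):
    """Keep SNP pairs whose r^2 meets the threshold, regardless of distance."""
    return [p for p in pairs if ld_r2(p["p_ab"], p["p_a"], p["p_b"]) >= threshold]


# Hypothetical records; the identifiers are placeholders, not real SNPs.
example_pairs = [
    {"snps": ("snp_1", "snp_2"), "p_ab": 0.5, "p_a": 0.5, "p_b": 0.5},
    {"snps": ("snp_1", "snp_3"), "p_ab": 0.25, "p_a": 0.5, "p_b": 0.5},
]
strong = pairs_above_threshold(example_pairs)
```

In this toy input the first pair is in complete LD and is retained, while the second pair is in linkage equilibrium and is dropped.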

Prediction of protein function is one of the most challenging problems in the post-genomic era. In this paper, we propose a novel algorithm, Improved ProteinRank (IPR), for protein function prediction, which is based on search engine technology and the preferential attachment criterion. In addition, an improved algorithm, IPRW, is developed from IPR for use in weighted protein-protein interaction (PPI) networks. The proposed algorithms IPR and IPRW are applied to the PPI network of S. cerevisiae. The experimental results show that both IPR and IPRW outperform previous methods for the prediction of protein functions.   

The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records, as well as the relationships between them, are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb, that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with the full power of SQL-based queries from within R. AVAILABILITY: The web interface and SQLite databases are available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website.   
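
To give a feel for the kind of relational query that a SQLite export of metadata makes possible, here is a minimal sketch using Python's built-in sqlite3 module. The schema is a toy one built in memory: the table names, column names and identifiers are hypothetical and are not the actual GEOmetadb schema.

```python
import sqlite3


def build_toy_db():
    """Create a small in-memory database with a hypothetical metadata schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE series (series_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE samples (sample_id TEXT PRIMARY KEY, organism TEXT);
        CREATE TABLE series_samples (series_id TEXT, sample_id TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO series VALUES (?, ?)",
        [("S1", "Drug response"), ("S2", "Knockout study")],
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("M1", "Homo sapiens"), ("M2", "Homo sapiens"), ("M3", "Mus musculus")],
    )
    conn.executemany(
        "INSERT INTO series_samples VALUES (?, ?)",
        [("S1", "M1"), ("S1", "M2"), ("S2", "M3")],
    )
    return conn


def series_for_organism(conn, organism):
    """Return (series_id, title, sample_count) for series with samples of an organism."""
    query = """
        SELECT s.series_id, s.title, COUNT(*) AS n_samples
        FROM series AS s
        JOIN series_samples AS ss ON ss.series_id = s.series_id
        JOIN samples AS m ON m.sample_id = ss.sample_id
        WHERE m.organism = ?
        GROUP BY s.series_id, s.title
        ORDER BY s.series_id
    """
    return conn.execute(query, (organism,)).fetchall()


conn = build_toy_db()
human_series = series_for_organism(conn, "Homo sapiens")
```

The join across the linking table is the point of the example: relationships between record types are stored explicitly, so one SQL statement can traverse them.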



Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information.   
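
One way to picture a PageRank-style gene prioritization is a personalized PageRank in which genes are nodes, edges encode background knowledge, and the restart weights come from expression changes. The sketch below makes exactly those assumptions; the use of absolute fold-change as the restart weight, the damping value and the gene names are illustrative choices and are not taken from the study.

```python
def personalized_pagerank(edges, weights, damping=0.85, iterations=100):
    """Power-iteration personalized PageRank on an undirected gene network.

    edges: iterable of (gene, gene) pairs (assumed background knowledge)
    weights: dict gene -> non-negative restart weight (assumed: |fold-change|)
    """
    nodes = sorted(set(weights) | {n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    total = sum(weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one gene needs a positive weight")
    restart = {n: weights.get(n, 0.0) / total for n in nodes}

    rank = dict(restart)
    for _ in range(iterations):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if neighbours[n]:
                share = damping * rank[n] / len(neighbours[n])
                for m in neighbours[n]:
                    new[m] += share
            else:
                # Isolated gene: hand its mass back through the restart vector.
                for m in nodes:
                    new[m] += damping * rank[n] * restart[m]
        rank = new
    return rank


# Hypothetical network: geneB is a hub connected to three other genes.
toy_edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneB", "geneD")]
toy_weights = {"geneA": 2.0, "geneB": 1.0, "geneC": 1.0, "geneD": 1.0}
scores = personalized_pagerank(toy_edges, toy_weights)
prioritized = sorted(scores, key=scores.get, reverse=True)
```

In this toy network the hub gene ranks first even though its own weight is modest, which is the intended effect of letting network context inform the gene list.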

MOTIVATION: Genome-wide high density SNP association studies are expected to identify various SNP alleles associated with different complex disorders. Understanding the biological significance of these SNP alleles in the context of existing literature is a major challenge since existing search engines are not designed to search literature for SNPs or other genetic markers. The literature mining of gene and protein functions has received significant attention and effort while similar work on genetic markers and their related diseases is still in its infancy. Our goal is to develop a web-based tool that facilitates the mining of Medline literature related to genetic studies and gene/protein function studies. Our solution consists of four main function modules: (1) identification of different types of genetic markers or genetic variations in Medline records; (2) distinguishing positive versus negative linkage or association between genetic markers and diseases; (3) integrating marker genomic location data from different databases to enable the retrieval of Medline records related to markers in the same linkage disequilibrium region; and (4) a web interface called MarkerInfoFinder to search, display, sort and download Medline citation results. Tests using published data suggest MarkerInfoFinder can significantly increase the efficiency of finding genetic disorders and their underlying molecular mechanisms. The functions we developed will also be used to build a knowledge base for genetic markers and diseases. AVAILABILITY: MarkerInfoFinder is publicly available at: http://brainarray.mbni.med.umich.edu/brainarray/datamining/MarkerInfoFinder.   



Evolutionary biologists want to explain the origin of novel features and functions. Two recent but separate lines of research address this question. The first describes one possible outcome of hybridization, called transgressive segregation, where hybrid offspring exhibit trait distributions outside of the parental range. The second considers the explicit mapping of form to function and illustrates manifold paths to similar function (called many to one mapping, MTOM) when the relationship between the two is complex. Under this scenario, functional novelty may be a product of the number of ways to elicit a functional outcome (i.e., the degree of MTOM). We fuse these research themes by considering the influence of MTOM on the production of transgressive jaw biomechanics in simulated hybrids between Lake Malawi cichlid species.


We characterized the component links and functional output (kinematic transmission, KT) of the 4-bar mechanism in the oral jaws of Lake Malawi cichlids. We demonstrated that the input and output links, the length of the lower jaw and the length of the maxilla respectively, have consistent but opposing relationships with KT. Based on these data, we predicted scenarios in which species with different morphologies but similar KT (MTOM species) would produce transgressive function in hybrids. We used a simple but realistic genetic model to show that transgressive function is a likely outcome of hybridization among Malawi species exhibiting MTOM. Notably, F2 hybrids are transgressive for function (KT), but not the component links that contribute to function. In our model, transgression is a consequence of recombination and assortment among alleles specifying the lengths of the lower jaw and maxilla.
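
The logic of transgressive function without transgressive morphology can be illustrated with a deliberately crude toy model. The sketch below is not the authors' genetic model and does not compute 4-bar kinematic transmission: it assumes one unlinked, additive locus per link, and it uses maxilla length divided by lower jaw length as a hypothetical stand-in for function, chosen only because the two links then have opposing effects.

```python
from itertools import product


def function_proxy(jaw, maxilla):
    """Hypothetical stand-in for function; NOT the real 4-bar KT calculation."""
    return maxilla / jaw


def f2_phenotypes(parent_a, parent_b):
    """Enumerate F2 (jaw, maxilla) lengths for two homozygous parental species.

    Each parent is a dict of per-allele contributions; a link length is the
    sum of the two alleles at its locus, and the two loci assort independently.
    """
    jaw_alleles = (parent_a["jaw"], parent_b["jaw"])
    max_alleles = (parent_a["maxilla"], parent_b["maxilla"])
    phenotypes = set()
    for j1, j2, m1, m2 in product((0, 1), repeat=4):
        jaw = jaw_alleles[j1] + jaw_alleles[j2]
        maxilla = max_alleles[m1] + max_alleles[m2]
        phenotypes.add((jaw, maxilla))
    return sorted(phenotypes)


# Two hypothetical species with different morphology but identical function.
species_a = {"jaw": 1.0, "maxilla": 1.0}  # link lengths 2 and 2, proxy 1.0
species_b = {"jaw": 2.0, "maxilla": 2.0}  # link lengths 4 and 4, proxy 1.0

f2 = f2_phenotypes(species_a, species_b)
f2_function = [function_proxy(jaw, maxilla) for jaw, maxilla in f2]
```

Every F2 link length stays within the parental range of 2 to 4, yet the function proxy spans 0.5 to 2.0 while both parents sit at 1.0. Recombination pairs a short jaw with a long maxilla, or the reverse, which neither parent does.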


We have described a general and likely pervasive mechanism that generates functional novelty. Simulations of hybrid offspring among Lake Malawi cichlids exhibiting MTOM produce transgressive function in the majority of cases, and at appreciable frequency. Functional transgression (i) is a product of recombination and assortment between alleles controlling the lengths of the lower jaw and the maxilla, (ii) occurs in the absence of transgressive morphology, and (iii) can be predicted from the morphology of parents. Our genetic model can be tested by breeding Malawi cichlid hybrids in the laboratory and examining the resulting range of forms and functions.   

A wealth of bioinformatics tools and databases has been created over the last decade and most are freely available to the general public. However, these valuable resources live a shadow existence compared to experimental results and methods that are widely published in journals and relatively easily found through publication databases such as PubMed. For the general scientist as well as bioinformaticists, these tools can deliver great value to the design and analysis of biological and medical experiments, but there is no inventory presenting an up-to-date and easily searchable index of all these resources. To remedy this, the BioWareDB search engine has been created. BioWareDB is an extensive and current catalog of software and databases of relevance to researchers in the fields of biology and medicine, and presently consists of 2800 validated entries. AVAILABILITY: BioWareDB is freely available over the Internet at http://www.biowaredb.org/   

SUMMARY: Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist. We have developed a prototype distributed scientific search engine technology, 'Sciencenet', which facilitates rapid searching over this large data space. By 'bringing the search engine to the data', we do not require server farms. This platform also allows users to contribute to the search index and publish their large-scale data to support e-Science. Furthermore, a community-driven method guarantees that only scientific content is crawled and presented. Our peer-to-peer approach is sufficiently scalable for the science web without performance or capacity tradeoff. AVAILABILITY AND IMPLEMENTATION: The free to use search portal web page and the downloadable client are accessible at: http://sciencenet.kit.edu. The web portal for index administration is implemented in ASP.NET, the 'AskMe' experiment publisher is written in Python 2.7, and the backend 'YaCy' search engine is based on Java 1.6.   

FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) appearing in the documents retrieved by the query. The concepts are presented to the user in a tabular format and ranked based on the co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents, which enables the user to issue a flexible query (e.g. free keywords or Boolean combinations of keywords/concepts) and receive the results immediately even when the number of the documents that match the query is very large. The user can also view snippets from MEDLINE to get textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms for building the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank. AVAILABILITY: The system is available at http://www.nactem.ac.uk/software/facta/   
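
The benefit of pre-indexing concepts as well as words can be sketched in a few lines. The example below is an illustration only, not FACTA's implementation: it assumes whitespace tokenization, AND semantics between query words, and raw co-occurrence counts as the ranking statistic, and the documents and concept labels are made up.

```python
from collections import Counter, defaultdict


def build_indexes(documents):
    """documents: dict doc_id -> (text, list of concept labels)."""
    word_index = defaultdict(set)  # word -> ids of documents containing it
    concept_index = {}             # doc_id -> concepts annotated in it
    for doc_id, (text, concepts) in documents.items():
        for word in text.lower().split():
            word_index[word].add(doc_id)
        concept_index[doc_id] = set(concepts)
    return word_index, concept_index


def rank_concepts(query, word_index, concept_index):
    """Rank concepts by how many matching documents mention them."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(word_index.get(t, set()) for t in terms))
    counts = Counter(c for doc_id in hits for c in concept_index[doc_id])
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up documents and concept annotations.
docs = {
    1: ("p53 mutation in lung cancer", ["TP53", "lung cancer"]),
    2: ("p53 and apoptosis", ["TP53", "apoptosis"]),
    3: ("insulin signalling", ["INS"]),
}
word_index, concept_index = build_indexes(docs)
ranking = rank_concepts("p53", word_index, concept_index)
```

Because both lookups are dictionary accesses prepared ahead of time, the work at query time is a set intersection and a count, which is why a pre-indexed design can respond quickly even for large result sets.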

MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Source code for the scoring functions is available from http://proteomics.fhcrc.org   
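
The idea of selecting a scoring function at run time from a single program can be sketched as a small registry. The original software is written in C++; the Python below is only an analogy, and none of the names (SCORERS, register, shared_peaks, matched_fraction, score_spectrum) correspond to the TANDEM API. Both scoring functions are hypothetical toy examples.

```python
SCORERS = {}  # registry: name -> scoring function


def register(name):
    """Decorator that adds a scoring function to the registry under a name."""
    def decorator(func):
        SCORERS[name] = func
        return func
    return decorator


@register("shared_peaks")
def shared_peaks(observed, theoretical, tolerance=0.5):
    """Count theoretical fragment m/z values matched by any observed peak."""
    return sum(
        1 for t in theoretical
        if any(abs(o - t) <= tolerance for o in observed)
    )


@register("matched_fraction")
def matched_fraction(observed, theoretical, tolerance=0.5):
    """Fraction of theoretical fragments that were matched."""
    if not theoretical:
        return 0.0
    return shared_peaks(observed, theoretical, tolerance) / len(theoretical)


def score_spectrum(observed, theoretical, scorer="shared_peaks"):
    """Look up the requested scorer at run time and apply it."""
    return SCORERS[scorer](observed, theoretical)


observed_mz = [100.0, 200.2, 300.9]
theoretical_mz = [100.1, 200.0, 400.0]
default_score = score_spectrum(observed_mz, theoretical_mz)
alt_score = score_spectrum(observed_mz, theoretical_mz, scorer="matched_fraction")
```

The rest of the search code only ever calls score_spectrum, so a new scoring function can be added by registering it, without touching the core.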

Peptide mass fingerprinting (PMF) is a valuable method for rapid and high-throughput protein identification using the proteomics approach. Automated search engines, such as MS-Fit, Mascot, ProFound, and PeptIdent, have facilitated protein identification through PMF. The potential to obtain a true MS protein identification result depends on the choice of algorithm as well as experimental factors that influence the information content in MS data. When mass spectral data are incomplete and/or have low mass accuracy, the "number of matches" approach may be inadequate for a useful identification. Several studies have evaluated factors influencing the quality of mass spectrometry (MS) experiments. Missed cleavages, posttranslational modifications of peptides and contaminants (e.g., keratin) are important factors that can affect the results of MS analyses by influencing the identification process as well as the quality of the MS spectra. We compared search engines frequently used to identify proteins from Homo sapiens and Halobacterium salinarum by evaluating factors, including data-based and mass tolerance, to develop an improved search engine for PMF. This study may provide information to help develop a more effective algorithm for protein identification in each species through PMF.   

From both within and without bioethics, growing criticism of the predominant methods and practices of the field can be heard. These critiques tend to lament an emphasis on logically derived rules and philosophical theories that inadequately capture how and why people have the moral attitudes they do, and they urge the use of more empirically grounded social sciences--history, sociology, and anthropology--to draw attention to the complex factors behind such attitudes. However, these critiques do not go far enough, as they do not question why debate over ethical categories should have such a central role in voicing concerns about medicine. The importance of using other forms of inquiry, especially that of history, to examine aspects of medical practice and the emergence of bioethics itself is not simply to refine bioethical moral analysis. Instead, history can be employed to counter the preoccupation with translating concerns about medicine into moral terms and to move towards what is more sorely needed: a true medical humanism.   

Andromeda: a peptide search engine integrated into the MaxQuant environment
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.   
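
Target-decoy searches, as mentioned above, are commonly summarized by a false discovery rate estimate at a given score threshold. The sketch below uses the simple decoys-over-targets estimator; it is not Andromeda's scoring or its exact FDR procedure, and the scores are invented.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR among target hits scoring at or above the threshold.

    psms: iterable of (score, is_decoy) peptide-spectrum matches
    Estimator (an assumption of this sketch): decoy hits / target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def lowest_threshold_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR stays within max_fdr."""
    passing = [s for s, _ in psms if fdr_at_threshold(psms, s) <= max_fdr]
    return min(passing) if passing else None


# Invented matches: (score, is_decoy)
toy_psms = [(90, False), (85, False), (80, True), (70, False), (60, True)]
fdr_65 = fdr_at_threshold(toy_psms, 65)
cutoff = lowest_threshold_for_fdr(toy_psms, 0.0)
```

Sweeping the threshold and recording the number of accepted targets at each FDR level gives the kind of sensitivity and specificity comparison used to judge one search engine against another.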

Large proteomic data sets identifying hundreds or thousands of modified peptides are becoming increasingly common in the literature. Several methods for assessing the reliability of peptide identifications at either the individual peptide or the data set level have become established. However, tools for measuring the confidence of modification site assignments are sparse and are not often employed. A few tools for estimating phosphorylation site assignment reliabilities have been developed, but these are not integral to a search engine and so require a particular search engine output for a second step of processing. They may also require use of a particular fragmentation method and are mostly applicable only to phosphorylation analysis, rather than to the analysis of post-translational modifications in general. In this study, we present the performance of site assignment scoring that is directly integrated into the search engine Protein Prospector, which allows site assignment reliability to be automatically reported for all modifications present in an identified peptide. It clearly indicates when a site assignment is ambiguous (and if so, between which residues), and reports an assignment score that can be translated into a reliability measure for individual site assignments.   

Gynecomastia is a benign, abnormal growth of the male breast gland which can occur unilaterally or bilaterally, resulting from a proliferation of glandular, fibrous and adipose tissue. Gynecomastia is characterised by the presence of a soft, usually discus-shaped enlargement of tissue, 2-4 cm in diameter, under the nipple. It is estimated that this pathology occurs in 32-65% of men over the age of 17. Gynecomastia is a psychosocial problem and may lead to a perceived lowering of quality of life. The main cause of gynecomastia is a loss of equilibrium between oestrogens and androgens. Increased sensitivity of the breast gland to oestrogens, or local factors (e.g. an excessive synthesis of oestrogens in breast tissues or changes in oestrogen and androgen receptors), may cause gynecomastia. Also, prolactin, thyroxine, cortisol, human chorionic gonadotropin, leptin and receptors for human chorionic gonadotropin, prolactin and luteinizing hormone localised in tissues of the male breast   

