Similar articles
20 similar articles found (search time: 15 ms)
1.
Summary: Traditional two-dimensional (2D) software programs for drawing pedigrees are limited when dealing with extended pedigrees. In successive generations, the number of individuals grows exponentially, so 2D displays require an unworkable amount of horizontal space. In addition, the lack of space in 2D means it is not always possible to place closely related individuals near each other. To address these issues, we have developed three-dimensional (3D) pedigree drawing techniques that enable clearer visualization of extended pedigrees. Currently, no other methods are available for displaying extended pedigrees in 3D. We have made freely available a software tool, ‘Celestial3D’, that implements these novel techniques.
Availability: Freely available to non-commercial users
Contact: celestial3d{at}genepi.org.au
Supplementary information: www.genepi.org.au/celestial3d
Associate Editor: Martin Bishop
(1) A more extensive list of software tools appears in the Supplementary Material.

2.
Segmentation of cDNA microarray spots using Markov random field modeling   Total citations: 3; self-citations: 3; other citations: 0
Motivation: Spot segmentation is a critical step in microarray gene expression data analysis, so the performance of segmentation can substantially affect the results of subsequent stages of the analysis, such as the detection of differentially expressed genes. Several methods have been developed to segment microarray spots from the surrounding background. In this study, we propose a new approach based on Markov random field (MRF) modeling and test its performance on simulated and real microarray images against a widely used segmentation method based on the Mann–Whitney test, as adopted by the QuantArray software (Boston, MA). Spot addressing was performed using QuantArray. We have also devised a simulation method to generate microarray images with realistic features. Such images can serve as gold standards for testing and comparing different segmentation methods and for optimizing segmentation parameters.
Results: Experiments on simulated images and on 14 real microarray image sets show that the proposed MRF-based segmentation method detects spot areas and estimates spot intensities with higher accuracy than the Mann–Whitney-based method.
Availability: The algorithms were implemented in the Matlab™ environment (The MathWorks, Inc., Natick, MA). The code for the MRF-based segmentation and image simulation methods is available upon request.
Contact: demirkaya{at}ieee.org
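The abstract does not spell out the MRF model. As a minimal sketch only, assuming a two-class (spot/background) Gaussian likelihood, a Potts smoothness prior with weight `beta`, and optimization by iterated conditional modes (ICM), an MRF segmentation of a single spot can look like this. The function name and parameters are illustrative, not the authors' implementation.

```python
def icm_segment(image, beta=1.0, sigma=1.0, n_iter=10):
    """Binary MRF segmentation of a spot image by iterated conditional modes.

    image: list of rows of pixel intensities. Returns labels (1 = spot,
    0 = background). Each pixel's energy is a Gaussian data term plus a
    Potts penalty of `beta` for every 4-neighbour carrying a different label.
    This model is an assumption for illustration, not the paper's method.
    """
    h, w = len(image), len(image[0])
    flat = [x for row in image for x in row]
    threshold = (min(flat) + max(flat)) / 2
    # Initialise with a simple midpoint threshold.
    labels = [[1 if x > threshold else 0 for x in row] for row in image]

    for _ in range(n_iter):
        # Re-estimate the two class means from the current labelling.
        mu = []
        for k in (0, 1):
            vals = [image[i][j] for i in range(h) for j in range(w) if labels[i][j] == k]
            mu.append(sum(vals) / len(vals) if vals else threshold)

        changed = False
        for i in range(h):
            for j in range(w):
                neighbours = [labels[a][b]
                              for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                              if 0 <= a < h and 0 <= b < w]

                def energy(k):
                    data = (image[i][j] - mu[k]) ** 2 / (2 * sigma ** 2)
                    smooth = beta * sum(n != k for n in neighbours)
                    return data + smooth

                best = min((0, 1), key=energy)
                if best != labels[i][j]:
                    labels[i][j] = best  # ICM updates labels in place
                    changed = True
        if not changed:
            break
    return labels


# A bright 3x3 spot whose centre pixel is dim (noise): thresholding alone
# would call it background, while the smoothness prior pulls it into the spot.
img = [[10, 10, 10, 10, 10],
       [10, 50, 50, 50, 10],
       [10, 50, 20, 50, 10],
       [10, 50, 50, 50, 10],
       [10, 10, 10, 10, 10]]
print(icm_segment(img, beta=5.0, sigma=5.0)[2][2])  # 1
```

Setting `beta=0` removes the spatial prior and reduces the method to per-pixel classification, which is one way to see what the MRF term contributes.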

3.
Summary: Literature databases can reveal not only known, true relations between processes but also less-studied, non-obvious associations. The main problem in discovering such relevant biological information is ‘selection’: distinguishing a true correlation (e.g. between different types of biological processes) from one that appears statistically significant purely by chance is crucial for any biomedical research, and literature mining is no exception. This problem is especially visible when searching for information that has not been studied and described in many publications. A novel bio-linguistic statistical method is therefore required that can ‘select’ true correlations even when they are low-frequency associations. In this article, we present such a statistical approach, based on the Z-score and implemented in a web-based application, ‘e-LiSe’.
Availability: The software is available at http://miron.ibb.waw.pl/elise/
Contact: piotr{at}ibb.waw.pl
Supplementary information: Supplementary materials are available at http://miron.ibb.waw.pl/elise/supplementary/
Associate Editor: Alfonso Valencia
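The abstract names the Z-score but not the null model behind it. As a hedged sketch, assuming that the documents mentioning term A and those mentioning term B are drawn independently (so the co-occurrence count is hypergeometric), a co-occurrence Z-score can be computed as below. The function name and null model are our assumptions, not e-LiSe's documented method.

```python
from math import sqrt


def cooccurrence_z(observed, n_docs, n_a, n_b):
    """Z-score for the number of documents mentioning both terms A and B.

    Assumed null model: the overlap between the n_a documents mentioning A
    and the n_b documents mentioning B (out of n_docs) is hypergeometric.
    """
    expected = n_a * n_b / n_docs
    variance = (n_a * n_b * (n_docs - n_a) * (n_docs - n_b)
                / (n_docs ** 2 * (n_docs - 1)))
    return (observed - expected) / sqrt(variance)


# 10 co-mentions where only 2 are expected by chance.
print(round(cooccurrence_z(10, 1000, 50, 40), 2))  # 5.92
```

For very low counts the normal approximation behind a Z-score is rough, which is exactly the low-frequency regime the abstract highlights; an exact hypergeometric tail probability would be a natural cross-check.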

4.
Motivation: In recent years, several methods have been proposed for determining metabolic pathways automatically on the basis of network topology. The aim of this work is to analyse these methods by tackling a concrete example relevant to biochemistry: whether even-chain fatty acids, the most important constituents of lipids, can be converted into sugars at steady state. It was proved five decades ago that this conversion via the Krebs cycle is impossible unless the enzymes of the glyoxylate shunt (or alternative bypasses) are present in the system. Using this example, we can compare the various methods of pathway analysis.
Results: Elementary modes analysis (EMA) of a set of enzymes corresponding to the Krebs cycle, glycolysis and gluconeogenesis supports the scientific evidence that there is no pathway capable of converting acetyl-CoA to glucose at steady state. This conversion becomes possible after adding isocitrate lyase and malate synthase (which form the glyoxylate shunt) to the system. Using the same example, we compare EMA with two graph-theory-based tools available online, PathFinding and Pathway Hunter Tool. These automated network-generating tools do not succeed in predicting the conversions known from experiment. They sometimes generate unbalanced paths and have problems identifying side metabolites that are not responsible for the net carbon flux. This shows that, for metabolic pathway analysis, it is important to consider both the topology (including bimolecular reactions) and the stoichiometry of metabolic systems, as is done in EMA.
Contact: ldpf{at}minet.uni-jena.de; schuster{at}minet.uni-jena.de
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alfonso Valencia
Received on July 24, 2008; revised on September 18, 2008; accepted on September 18, 2008
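The stoichiometric argument can be illustrated with a much simpler check than full EMA. The sketch below is a steady-state null-space test, not elementary modes analysis. It assumes a reduced carbon model (cofactors such as NAD+, CoA, ATP/GTP and water omitted; PEP used as the gluconeogenic exit, via PEP carboxykinase) and asks whether any flux vector with S·v = 0 over the internal metabolites can carry flux into PEP. Reaction abbreviations are our own labels. Because irreversibility is ignored, a positive answer does not by itself prove a feasible pathway; a negative answer does rule one out.

```python
from fractions import Fraction

# Reduced model (assumption): acetyl-CoA, PEP and CO2 are external.
KREBS = {
    "CS":    {"AcCoA": -1, "OAA": -1, "Cit": 1},   # citrate synthase
    "ACN":   {"Cit": -1, "Icit": 1},               # aconitase
    "IDH":   {"Icit": -1, "OG": 1, "CO2": 1},      # isocitrate dehydrogenase
    "OGDH":  {"OG": -1, "SucCoA": 1, "CO2": 1},    # 2-oxoglutarate dehydrogenase
    "SCS":   {"SucCoA": -1, "Suc": 1},             # succinyl-CoA synthetase
    "SDH":   {"Suc": -1, "Fum": 1},                # succinate dehydrogenase
    "FUM":   {"Fum": -1, "Mal": 1},                # fumarase
    "MDH":   {"Mal": -1, "OAA": 1},                # malate dehydrogenase
    "PEPCK": {"OAA": -1, "PEP": 1, "CO2": 1},      # exit towards gluconeogenesis
}
SHUNT = {
    "ICL": {"Icit": -1, "Suc": 1, "Glx": 1},       # isocitrate lyase
    "MS":  {"AcCoA": -1, "Glx": -1, "Mal": 1},     # malate synthase
}
EXTERNAL = {"AcCoA", "PEP", "CO2"}


def nullspace(matrix, n_cols):
    """Basis of {v : matrix @ v = 0} via exact rational row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -rows[i][f]
        basis.append(v)
    return basis


def can_make_pep(reactions):
    """True if some steady-state flux vector has non-zero PEPCK flux."""
    names = list(reactions)
    internal = sorted({m for st in reactions.values() for m in st} - EXTERNAL)
    S = [[reactions[rxn].get(m, 0) for rxn in names] for m in internal]
    j = names.index("PEPCK")
    return any(v[j] != 0 for v in nullspace(S, len(names)))


print(can_make_pep(KREBS))              # False
print(can_make_pep({**KREBS, **SHUNT}))  # True
```

With the Krebs cycle alone, every balance equation forces the cycle fluxes to be equal, so oxaloacetate cannot be drained into PEP; adding ICL and MS opens a second steady-state direction with net PEP output, mirroring the result reported above.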

5.
Practical FDR-based sample size calculations in microarray experiments   Total citations: 5; self-citations: 2; other citations: 3
Motivation: Owing to the experimental cost and the difficulty of obtaining biological materials, it is essential to consider appropriate sample sizes in microarray studies. With the growing use of the false discovery rate (FDR) in microarray analysis, an FDR-based sample size calculation is essential.
Method: We describe an approach that explicitly connects the sample size to the FDR and to the number of differentially expressed genes to be detected. The method fits parametric models for the degree of differential expression using the Expectation–Maximization algorithm.
Results: The applicability of the method is illustrated with simulations and with studies of a lung microarray dataset. We propose using a small training set, or published data from relevant biological settings, to calculate the sample size of an experiment.
Availability: Code implementing the method in the statistical package R is available from the authors.
Contact: jhu{at}mdanderson.org
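The link between sample size, FDR and the number of detected genes can be sketched in a few lines. This is a simplified illustration, not the authors' method: it assumes a single common standardized effect size (the paper instead fits a distribution of effect sizes by EM), a two-sided two-sample z-test at a fixed per-gene significance level, and the standard expected-FDR approximation π0·α / (π0·α + (1 − π0)·power). All names are hypothetical.

```python
from statistics import NormalDist


def power_two_sample(n_per_group, effect_size, alpha):
    """Approximate power of a two-sided two-sample z-test (far tail ignored)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * (n_per_group / 2) ** 0.5 - z_crit)


def expected_fdr(pi0, alpha, power):
    """Expected fraction of false discoveries among all discoveries."""
    false_pos = pi0 * alpha
    true_pos = (1 - pi0) * power
    return false_pos / (false_pos + true_pos)


def sample_size(m, pi0, effect_size, alpha, max_fdr, min_true_detections, n_max=1000):
    """Smallest per-group n meeting both the FDR and the detection targets."""
    for n in range(2, n_max + 1):
        pw = power_two_sample(n, effect_size, alpha)
        if (expected_fdr(pi0, alpha, pw) <= max_fdr
                and (1 - pi0) * m * pw >= min_true_detections):
            return n
    return None


# 10 000 genes, 5% truly changed by one standard deviation; we want at least
# 400 of the 500 changed genes detected with expected FDR <= 5%.
print(sample_size(10000, 0.95, 1.0, 0.001, 0.05, 400))  # 35
```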

6.
Motivation: The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements have been achieved in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), a.k.a. ‘shotgun’ proteomics, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is only possible when LC/MS is complemented by 2D gel electrophoresis (2-DE) studies. Moreover, both separation methods require automated alignment and differential analysis to relieve the bioinformatics bottleneck and so make high-throughput protein biomarker discovery a reality. The purpose of this article is to describe a fully automatic image alignment framework for integrating 2-DE into a high-throughput differential expression proteomics pipeline.
Results: The proposed method is based on robust automated image normalization (RAIN), which circumvents the drawbacks of traditional approaches. These use symbolic representations at the very early stages of the analysis, which introduces persistent errors due to inaccuracies in modelling and alignment. In RAIN, a third-order volume-invariant B-spline model is incorporated into a multi-resolution schema to correct for geometric and expression inhomogeneity at multiple scales. The normalized images can then be compared directly in the image domain for quantitative differential analysis. In an evaluation against an existing state-of-the-art method on real and synthetically warped 2D gels, the proposed analysis framework demonstrates substantial improvements in matching accuracy and differential sensitivity. High-throughput analysis is achieved through an accelerated GPGPU (general-purpose computation on graphics cards) implementation.
Availability: Supplementary material, software and the images used in the validation are available at http://www.proteomegrid.org/rain/
Contact: g.z.yang{at}imperial.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: David Rocke

7.
Motivation: As the Human Genome Project progresses, a large number of new DNA sequences with virtually unknown functions are being generated. It is therefore essential to develop computer algorithms that can predict the function of DNA segments from their primary sequences, including algorithms that can predict promoters. Although several promoter-prediction algorithms are available, they produce many false-positive detections, and the rate of promoter detection needs further improvement.
Results: In this research, we developed PromFD, a computer program that recognizes vertebrate RNA polymerase II promoters. Both vertebrate promoters and non-promoter sequences are used in the analysis. The promoters are obtained from the Eukaryotic Promoter Database and divided into a training set and a test set. Non-promoter sequences are obtained from the GenBank sequence databank and are likewise divided into a training set and a test set. The first step is to search, among all possible permutations, for strings 5–10 bp long that are significantly over-represented in the promoter set. The program also searches for IMD (Information Matrix Database) matrices that occur significantly more often in the promoter set. The results of these searches are stored in the PromFD database, and PromFD scores input DNA sequences according to their content of database entries. PromFD predicts promoters, giving their locations and the locations of potential TATA boxes, if found. The program detects 71% of promoters in the training set with a false-positive rate of under 1 in every 13 000 bp, and 47% of promoters in the test set with a false-positive rate of under 1 in every 9800 bp. PromFD uses a new approach, and its false-positive identification rate is better than that of other available promoter recognition algorithms. The source code for PromFD is written in C++.
Availability: PromFD is available for Unix platforms by anonymous FTP to beagle.colorado.edu (cd pub, get promFD.tar). A Java version of the program is also available for Netscape 2.0 at http://beagle.colorado.edu/chenq.
Contact: E-mail: chenq{at}beagle.colorado.edu
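The first PromFD step, finding short strings over-represented in promoters relative to non-promoter sequences, can be illustrated as follows. This is a hedged sketch: the enrichment score (log2 ratio of pseudocount-smoothed k-mer frequencies) and the thresholds are our assumptions, PromFD's actual significance test is not given in the abstract, and IMD matrix searching is omitted. The toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented(promoters, background, k, min_count=2, min_log2_ratio=1.0):
    """k-mers enriched in promoters, scored by a smoothed log2 frequency ratio."""
    fg, bg = kmer_counts(promoters, k), kmer_counts(background, k)
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    hits = {}
    for kmer, count in fg.items():
        if count < min_count:
            continue
        # Add-one pseudocounts keep k-mers absent from the background finite.
        ratio = ((count + 1) / (n_fg + 1)) / ((bg[kmer] + 1) / (n_bg + 1))
        score = log2(ratio)
        if score >= min_log2_ratio:
            hits[kmer] = score
    return dict(sorted(hits.items(), key=lambda kv: -kv[1]))


promoters = ["GCTATAAAGC", "ATTATAAAGG", "CCTATAAACT"]
background = ["GCGCGCGCGC", "ATGCATGCAT", "CCGGCCGGTT"]
hits = overrepresented(promoters, background, k=6)
print(next(iter(hits)))  # TATAAA
```

In a real setting one would scan k from 5 to 10, as the abstract describes, and replace the fixed ratio cutoff with a proper significance test.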

8.
Motivation: Mass spectrometry (MS), such as surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) MS, is a potentially promising proteomic technology for biomarker discovery. For such a technology to be used routinely, its reproducibility is an important concern. It is therefore of significant interest to develop quantitative measures for evaluating the quality and reliability of different experimental methods.
Results: We compare the quality of SELDI-TOF MS data obtained from unfractionated plasma samples, fractionated plasma samples and abundant-protein depletion methods, in terms of the number of detected peaks and their reliability. Several statistical quality-control and quality-assessment techniques are proposed: a Graeco–Latin square design for allocating samples on a Protein chip; the pairwise Pearson correlation coefficient as the similarity measure between spectra, used with multi-dimensional scaling (MDS) to evaluate the similarity of replicates graphically and to assess outlier samples; and the reliability ratio for evaluating reproducibility. Our results show that the number of peaks detected is similar among the three sample preparation technologies, and that the Sigma multi-removal kit does not improve peak detection. Fractionation of plasma samples introduces more experimental variability. The peaks detected using unfractionated plasma samples have the highest reproducibility, as determined by the reliability ratio.
Availability: Our algorithm for assessing SELDI-TOF experiment quality is available at http://www.biostat.harvard.edu/~xlin
Contact: harezlak{at}post.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Thomas Lengauer
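The pairwise Pearson correlation between spectra is straightforward to compute. As a simplified stand-in for the MDS-based graphical assessment described above (our substitution, not the authors' procedure), the sketch below flags a replicate whose median correlation with the other replicates falls below a chosen threshold. The threshold and toy intensity vectors are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_outliers(spectra, min_median_r=0.8):
    """Indices of spectra whose median correlation with the others is low."""
    flagged = []
    for i, s in enumerate(spectra):
        rs = [pearson(s, t) for j, t in enumerate(spectra) if j != i]
        if median(rs) < min_median_r:
            flagged.append(i)
    return flagged


replicates = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 2.9, 4.2, 5.0],
    [0.9, 2.1, 3.1, 3.9, 5.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],   # a replicate that does not track the others
]
print(flag_outliers(replicates))  # [3]
```

Using the median rather than the mean keeps one bad replicate from dragging down the scores of the good ones.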

9.
Summary: Cross-mapping gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries for five mammalian organisms as presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate. In total, up to 18 different identifier types can be used to identify cross-references. The quality of the mapping could be improved substantially by excluding ambiguous gene and protein names, which were validated manually. Organism-specific lists of ambiguous terms, which are valuable for a variety of bioinformatics applications such as text mining, are available for download.
Availability: CRONOS is freely available to non-commercial users at http://mips.gsf.de/genre/proj/cronos/index.html; web services are available at http://mips.gsf.de/CronosWSService/CronosWS?wsdl.
Contact: brigitte.waegele{at}helmholtz-muenchen.de
Supplementary information: Supplementary data are available at Bioinformatics online. The online Supplementary Material contains all figures and tables referenced by this article.
Associate Editor: Martin Bishop

10.
Motivation: Understanding the complexity of the gene–phenotype relationship is vital for revealing the genetic basis of common diseases. Recent studies based on the human interactome and phenome not only uncover prevalent phenotypic and genetic overlap between diseases, but also reveal a modular organization of the genetic landscape of human diseases, providing new opportunities to reduce the complexity of dissecting gene–phenotype associations.
Results: We provide systematic and quantitative evidence that phenotypic overlap implies genetic overlap. Building on these results, we perform the first heterogeneous alignment of the human interactome and phenome via a network alignment technique and identify 39 disease families with corresponding causative gene networks. Finally, we propose AlignPI, an alignment-based framework for predicting disease genes, and identify plausible candidates for 70 diseases. Our method scales well to the whole genome, as demonstrated by prioritizing 6154 genes across 37 chromosome regions for Crohn's disease (CD). The results are consistent with a recent meta-analysis of genome-wide association studies for CD.
Availability: Bi-modules and disease gene predictions are freely available at http://bioinfo.au.tsinghua.edu.cn/alignpi/
Contact: ruijiang{at}tsinghua.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Trey Ideker

11.
Motivation: After 10 years of investigation, the folding mechanisms of β-hairpins are still under debate. Experiments strongly support the zip-out pathway, while most simulations favour the hydrophobic collapse model (including the middle-out and zip-in pathways). In this article, we show that all of these pathways can occur during β-hairpin folding, but with different probabilities. The zip-out pathway is the most probable, in agreement with the experimental results. We reached these conclusions from 38 room-temperature all-atom molecular dynamics simulations of the β-hairpin trpzip2, each 100 ns long. Our results may help to clarify the inconsistencies in current pictures of β-hairpin folding mechanisms.
Contact: yxiao{at}mail.hust.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Anna Tramontano

12.
13.
14.
A multivariate test of association   Total citations: 1; self-citations: 1; other citations: 0
Summary: Although genetic association studies often test multiple related phenotypes, few formal multivariate tests of association are available. We describe a test of association that can be applied efficiently to large population-based designs.
Availability: A C++ implementation can be obtained from the authors.
Contact: manuel.ferreira{at}qimr.edu.au
Supplementary information: Supplementary figures are available at Bioinformatics online.
Associate Editor: Alex Bateman

15.
Motivation: Pairwise residue–residue contacts in proteins can be predicted both from threading templates and by sequence-based machine learning. However, most structure modeling approaches use only template-based contact predictions to guide their simulations, partly because sequence-based contact predictions are usually considered less accurate than those from threading. With the rapid progress in sequence databases and machine-learning techniques, a detailed and comprehensive assessment of contact-prediction methods under different template conditions is needed.
Results: We develop two methods for protein contact prediction: SVM-SEQ, a sequence-based machine-learning approach trained on a variety of sequence-derived features against contact maps; and SVM-LOMETS, which collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins, categorized as ‘Easy’, ‘Medium’, ‘Hard’ and ‘Very Hard’ targets according to the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS clearly outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of SVM-SEQ predictions is 12–25% higher than that of SVM-LOMETS. If the SVM-SEQ and SVM-LOMETS predictions are combined, the total number of correctly predicted contacts in the Hard proteins increases by more than 60% (or 70% for long-range contacts with a sequence separation ≥24) compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown on the CASP7 free-modeling targets, where SVM-SEQ is around four times more accurate than SVM-LOMETS in long-range contact prediction. These data demonstrate that state-of-the-art sequence-based contact prediction has reached a level that may help tertiary structure modeling for targets without close structural templates. The maximum yield should be obtained by combining sequence- and template-based predictions.
Contact: yzhang{at}ku.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Anna Tramontano
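The abstract does not say how the SVM-SEQ and SVM-LOMETS predictions are combined. Purely as a hedged sketch, the code below merges two hypothetical score dictionaries keyed by residue pairs (keeping the higher score per pair), restricts to long-range pairs with a sequence separation of at least 24 residues, and measures the accuracy of the top L/5 predictions, a common evaluation convention that the abstract itself does not state.

```python
def combine_contacts(pred_a, pred_b):
    """Merge two {(i, j): score} predictions, keeping the higher score per pair."""
    merged = dict(pred_a)
    for pair, score in pred_b.items():
        merged[pair] = max(score, merged.get(pair, 0.0))
    return merged


def top_long_range(pred, length, min_sep=24, fraction=5):
    """Top length/fraction pairs with |i - j| >= min_sep, highest score first."""
    long_range = [(p, s) for p, s in pred.items() if abs(p[0] - p[1]) >= min_sep]
    long_range.sort(key=lambda ps: ps[1], reverse=True)
    return [p for p, _ in long_range[: max(1, length // fraction)]]


def accuracy(predicted, true_contacts):
    """Fraction of predicted pairs that are true contacts."""
    if not predicted:
        return 0.0
    return sum(p in true_contacts for p in predicted) / len(predicted)


template_based = {(1, 40): 0.9, (2, 10): 0.95, (5, 35): 0.2}   # hypothetical
sequence_based = {(5, 35): 0.8, (3, 30): 0.7}                  # hypothetical
top = top_long_range(combine_contacts(template_based, sequence_based), length=50)
print(top)                                      # [(1, 40), (5, 35), (3, 30)]
print(accuracy(top, {(1, 40), (5, 35)}))
```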

16.
A fuzzy guided genetic algorithm for operon prediction   Total citations: 4; self-citations: 0; other citations: 4
Motivation: The operon structure of a prokaryotic genome is a critical input for reconstructing regulatory networks at the whole-genome level. As experimental methods for detecting operons are difficult and time-consuming, effort is being put into developing computational methods that use available biological information to predict operons.
Method: A genetic algorithm is developed to evolve a starting population of putative operon maps of the genome into progressively better predictions. Fuzzy scoring functions based on multiple criteria are used to assess the ‘fitness’ of the newly evolved operon maps and to guide their evolution.
Results: The algorithm organizes the whole genome into operons. The fuzzy-guided genetic algorithm approach makes it possible to use diverse biological information, such as genome sequence data, functional annotations and conservation across multiple genomes, to guide the organization process. This approach does not require any prior training with experimentally determined operons. The predictions of this algorithm for Escherichia coli K12 and Bacillus subtilis are evaluated against the experimentally discovered operons of these organisms. The accuracy of the method is evaluated using ROC (receiver operating characteristic) analysis. The area under the ROC curve is around 0.9, which indicates excellent accuracy.
Contact: roschen_csir{at}rediffmail.com
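The area under the ROC curve quoted above equals the probability that a randomly chosen true operon pair scores higher than a randomly chosen non-operon pair (ties counting one half), i.e. the normalized Mann–Whitney statistic. A minimal sketch, with hypothetical scores:

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for adjacent gene pairs inside vs. across operon boundaries.
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8/9, about 0.889
```

This pairwise form is O(P·N); for genome-scale evaluation a rank-based computation gives the same value in O((P + N) log(P + N)).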

17.
Summary: Taverna is an application that eases the integration of tools and databases for life science research through the construction of workflows. The Taverna Interaction Service extends the functionality of Taverna by defining human interaction within a workflow and acting as a mediation layer between the automated workflow engine and one or more users.
Availability: Taverna, the Interaction Service plug-in and the web application are available as open source and can be downloaded from http://taverna.sourceforge.net/
Contact: taverna-users{at}lists.sourceforge.net
Associate Editor: John Quackenbush

18.
Motivation: The nucleotide sequencing process produces not only the sequence of nucleotides but also associated quality values. Quality values provide valuable information, but they are primarily used only for trimming sequences and are generally ignored in subsequent analyses.
Results: This article describes how the scoring schemes of standard alignment algorithms can be modified to take quality values into account, producing improved alignments and statistically more accurate scores. A prototype implementation is also provided and is used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods and can be implemented with only a small constant-factor performance penalty. The method can also be applied to related methods, including heuristic search algorithms such as BLAST and FASTA.
Availability: Software is available at http://malde.org/~ketil/qaa.
Contact: ketil.malde{at}imr.no
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
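To make the idea of quality-adjusted scoring concrete, here is a hedged sketch that may differ from the paper's actual derivation. It converts a Phred quality Q into an error probability p = 10^(−Q/10) and replaces each substitution score by its expectation over the possible true bases. Simple assumptions: +1/−1 match/mismatch scores, an error-free reference, and a miscalled base equally likely to be any of the other three nucleotides. Low-quality mismatches are therefore penalized less than high-quality ones.

```python
def phred_error(q):
    """Error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def qa_score(read_base, ref_base, q, match=1.0, mismatch=-1.0):
    """Expected substitution score given the read base's quality."""
    p = phred_error(q)
    if read_base == ref_base:
        # Correct call -> match; any miscall means the true base differs -> mismatch.
        return (1 - p) * match + p * mismatch
    # Correct call -> mismatch; on a miscall the true base is one of the other
    # three nucleotides, exactly one of which equals the reference base.
    return (1 - p) * mismatch + p * (match + 2 * mismatch) / 3


def qa_ungapped(read, quals, ref):
    """Quality-adjusted score of an ungapped alignment of read against ref."""
    return sum(qa_score(a, b, q) for a, b, q in zip(read, ref, quals))


print(round(qa_score("A", "A", 10), 3))   # 0.8
print(round(qa_score("A", "C", 10), 3))   # -0.933
```

Because the adjustment is a per-position rescaling of the substitution score, it fits into dynamic-programming alignment at a constant-factor cost, consistent with the small performance penalty mentioned above.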

19.
20.
An olfactory receptor database (ORDB) is being developed to facilitate analysis of this large gene family. ORDB currently contains over 400 olfactory receptor sequences and related information, and is available via the World Wide Web. We plan to incorporate functional data, structural models, spatial localization and other categories of information, toward an integrated model of olfactory receptor function. Chem. Senses 22: 321–326, 1997.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417