Similar Articles
20 similar articles found.
1.
Summary: PCCA (phylogenetic canonical correlation analysis) is a new program for canonical correlation analysis of multivariate, continuously valued data from biological species. Canonical correlation analysis is a technique in which derived variables are obtained from two sets of original variables such that the correlations between corresponding derived variables are maximized. It is a very useful multivariate statistical method for calculating and analyzing correlations between character sets. The program controls for species non-independence due to phylogenetic history, computes canonical coefficients, correlations and scores, and conducts hypothesis tests on the canonical correlations. It can also compute a multivariate version of Pagel's λ, which can then be used in the phylogenetic transformation.
Availability: PCCA is distributed as DOS/Windows, Mac OS X and Linux/Unix executables with a detailed program manual and is freely available on the World Wide Web at: http://anolis.oeb.harvard.edu/~liam/programs/.
Contact: lrevell{at}fas.harvard.edu
Associate Editor: Keith Crandall
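As background on the phylogenetic transformation mentioned in this entry: in its usual univariate definition, Pagel's λ multiplies the off-diagonal (shared branch length) entries of the phylogenetic covariance matrix C by λ and leaves the diagonal unchanged, so λ = 0 corresponds to a star phylogeny and λ = 1 returns the original tree. The minimal Python sketch below shows only that rescaling; it is not PCCA's multivariate estimator, and the function name and the 3-taxon matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def pagel_lambda_transform(cov: Matrix, lam: float) -> Matrix:
    """Return a copy of the phylogenetic covariance matrix `cov` in which
    every off-diagonal element is multiplied by Pagel's lambda `lam`.
    Diagonal elements (root-to-tip path lengths) are left unchanged."""
    n = len(cov)
    return [
        [cov[i][j] if i == j else lam * cov[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3-taxon tree: taxa A and B share 0.6 units of branch length
# before splitting; every tip is 1.0 unit from the root.
C = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

print(pagel_lambda_transform(C, 0.5))
# [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

In a phylogenetic analysis the transformed matrix would then replace C wherever the tree enters the model, for example when decorrelating species data before a canonical correlation step.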

2.
Summary: Accurate estimation of DNA copy numbers from array comparative genomic hybridization (CGH) data is important for characterizing the cancer genome. An important part of this process is the segmentation of the log-ratios between the sample and control DNA along the chromosome into regions of different copy numbers. However, multiple algorithms are available in the literature for this procedure, and the results can vary substantially among them. Thus, a visualization tool that can display the segmented profiles from a number of methods can help the biologist or the clinician to ascertain that a feature of interest did not arise as an artifact of the algorithm. Such a tool also allows the methodologist to easily contrast their method against others. We developed a web-based tool that applies a number of popular algorithms to a single array CGH profile entered by the user. It generates a heatmap panel of the segmented profiles for each method as well as a consensus profile. The clickable heatmap can be moved along the chromosome and zoomed in or out. It also displays the time that each algorithm took and provides numerical values of the segmented profiles for download. The web interface calls algorithms written in the statistical language R. We encourage developers of new algorithms to submit their routines to be incorporated into the website.
Availability: http://compbio.med.harvard.edu/CGHweb
Contact: peter_park{at}harvard.edu
Associate Editor: Keith Crandall
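The abstract does not say how the consensus profile is built. One simple possibility, assumed here purely for illustration, is the per-probe median of the segmented log-ratios returned by the different methods. The method names and values below are hypothetical.

```python
from statistics import median
from typing import Dict, List


def consensus_profile(segmented: Dict[str, List[float]]) -> List[float]:
    """Per-probe median of the segmented log-ratios reported by several
    segmentation methods (an assumed consensus rule, for illustration)."""
    profiles = list(segmented.values())
    n_probes = len(profiles[0])
    if any(len(p) != n_probes for p in profiles):
        raise ValueError("all methods must report values for the same probes")
    return [median([p[i] for p in profiles]) for i in range(n_probes)]


# Hypothetical segmented profiles for four probes from three methods.
segments = {
    "method_a": [0.0, 0.0, 0.8, 0.8],
    "method_b": [0.0, 0.1, 0.7, 0.7],
    "method_c": [0.0, 0.0, 0.0, 0.9],
}
print(consensus_profile(segments))  # [0.0, 0.0, 0.7, 0.8]
```

A median has the design advantage that a single method calling a spurious segment (method_c at the third probe) does not move the consensus much.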

3.
Motivation: To enable a new way of submitting sequence information to the EMBL nucleotide database through the WWW. This process of data submission is long and complex, and calls for efficient and user-friendly mechanisms for the collection and validation of information.
Results: Described here is a generic, object-oriented data-submission system that is being used for the EMBL database but can easily be tailored to serve several data-submission schemes with a relatively short development and implementation time. The program provides the user with a friendly interface that breaks the complex task into smaller, more manageable tasks and, at the same time, acts as a pre-filter that scans for errors online.
Availability: The program is accessible through the EMBL-EBI WWW server at the URL: http://www.ebi.ac.uk/subs/emblsubs.html
Contact: bshomer{at}ebi.ac.uk
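To illustrate the kind of online pre-filtering described above (these are not the EMBL system's actual validation rules), a submission form might reject a nucleotide sequence containing characters outside the IUPAC nucleotide codes and report each offending position back to the user. The function name is hypothetical.

```python
from typing import List, Tuple

# IUPAC nucleotide codes (including ambiguity codes and U for RNA).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def find_invalid_bases(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for every symbol that is not an
    IUPAC nucleotide code. Whitespace is skipped and case is ignored."""
    errors = []
    position = 0
    for ch in sequence:
        if ch.isspace():
            continue
        position += 1
        if ch.upper() not in IUPAC_NUCLEOTIDES:
            errors.append((position, ch))
    return errors


print(find_invalid_bases("ACGT NNXA"))  # [(7, 'X')]
```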

4.
Motivation: Mass spectrometry (MS), such as surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) MS, provides a potentially promising proteomic technology for biomarker discovery. An important requirement for such a technology to be used routinely is its reproducibility. It is of significant interest to develop quantitative measures to evaluate the quality and reliability of different experimental methods.
Results: We compare the quality of SELDI-TOF MS data obtained with unfractionated plasma samples, fractionated plasma samples and abundant-protein depletion methods in terms of the number of detected peaks and reliability. Several statistical quality-control and quality-assessment techniques are proposed, including the Graeco–Latin square design for sample allocation on a Protein chip; the use of the pairwise Pearson correlation coefficient as the similarity measure between spectra, in conjunction with multi-dimensional scaling (MDS), for graphically evaluating the similarity of replicates and assessing outlier samples; and the use of the reliability ratio for evaluating reproducibility. Our results show that the number of peaks detected is similar among the three sample preparation technologies, and that the use of the Sigma multi-removal kit does not improve peak detection. Fractionation of plasma samples introduces more experimental variability. The peaks detected using the unfractionated plasma samples have the highest reproducibility as determined by the reliability ratio.
Availability: Our algorithm for assessment of SELDI-TOF experiment quality is available at http://www.biostat.harvard.edu/~xlin
Contact: harezlak{at}post.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Thomas Lengauer
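The pairwise Pearson similarity step can be sketched as follows, assuming each spectrum has already been reduced to an intensity vector on a common m/z grid (that preprocessing, and the toy intensities, are assumptions). Using 1 - r as a dissimilarity is one common choice of MDS input, not necessarily the authors' exact setup.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(spectra: List[List[float]]) -> List[List[float]]:
    """Pairwise Pearson correlations between all spectra."""
    k = len(spectra)
    return [[pearson(spectra[i], spectra[j]) for j in range(k)] for i in range(k)]


# Hypothetical intensities at five m/z positions.
spectra = [
    [1.0, 5.0, 2.0, 8.0, 3.0],  # replicate 1
    [1.1, 4.8, 2.2, 7.9, 3.1],  # replicate 2
    [8.0, 1.0, 7.0, 2.0, 6.0],  # dissimilar (possible outlier) spectrum
]
m = correlation_matrix(spectra)
dissimilarity = [[1.0 - r for r in row] for row in m]  # possible MDS input
print([[round(r, 3) for r in row] for row in m])
```

Replicates should show correlations near 1, while an outlying sample stands apart in the matrix and in the MDS plot built from it.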

5.
Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent in some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, in which nodes representing individual gene sets (such as KEGG pathways) are connected to indicate a functional dependency, is well suited to visualizing and analyzing global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of the approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are made available via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.
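The abstract does not give the edge criteria. As a hedged illustration of a co-membership edge only, the sketch below connects two gene sets when the Jaccard index of their member genes reaches a threshold; the Jaccard criterion, the threshold and the gene-set names are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def co_membership_edges(gene_sets: Dict[str, Set[str]],
                        min_jaccard: float = 0.2) -> List[Tuple[str, str, float]]:
    """Return (set_a, set_b, jaccard) for every pair of gene sets whose
    Jaccard index |A & B| / |A | B| is at least `min_jaccard`."""
    edges = []
    for (a, genes_a), (b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        union = genes_a | genes_b
        if not union:
            continue
        jaccard = len(genes_a & genes_b) / len(union)
        if jaccard >= min_jaccard:
            edges.append((a, b, jaccard))
    return edges


sets = {
    "pathway_X": {"G1", "G2", "G3", "G4"},
    "pathway_Y": {"G3", "G4", "G5"},
    "pathway_Z": {"G9"},
}
print(co_membership_edges(sets))  # [('pathway_X', 'pathway_Y', 0.4)]
```

A co-enrichment edge would instead be driven by gene sets that are enriched together across experimental gene lists; that requires an enrichment test and is not shown here.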

6.
Summary: We present a large-scale implementation of the RANKPROP protein homology ranking algorithm in the form of an openly accessible web server. We use the NRDB40 PSI-BLAST all-versus-all protein similarity network of 1.1 million proteins to construct the graph for the RANKPROP algorithm, whereas previously, results were only reported for a database of 108 000 proteins. We also describe two improvements to the original algorithm, namely propagation from multiple homologs of the query and better normalization of ranking scores, that lead to higher accuracy and to scores with a probabilistic interpretation.
Availability: The RANKPROP web server and source code are available at http://rankprop.gs.washington.edu
Contact: iain{at}nec-labs.com; noble{at}gs.washington.edu
Associate Editor: Burkhard Rost
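In the original RANKPROP formulation, as we understand it, ranking scores are obtained by diffusing activation from the query over the similarity graph. Each protein's score is its direct similarity to the query plus α times the weighted sum of its neighbours' scores, iterated until the scores stop changing. The sketch below implements that generic propagation on a tiny hypothetical graph. It omits the multiple-homolog propagation and score normalization described in the abstract, and the weights and α are illustrative only.

```python
from typing import Dict, List


def rank_propagation(weights: Dict[str, Dict[str, float]], query: str,
                     alpha: float = 0.5, n_iter: int = 100,
                     tol: float = 1e-9) -> List[str]:
    """Rank all non-query nodes by propagated activation.
    `weights[u][v]` is the (symmetric) similarity between u and v; alpha must
    be small enough that alpha * (max weighted degree) < 1 for convergence."""
    nodes = [n for n in weights if n != query]
    direct = {n: weights[query].get(n, 0.0) for n in nodes}
    scores = dict(direct)
    for _ in range(n_iter):
        new = {
            n: direct[n] + alpha * sum(w * scores[m]
                                       for m, w in weights[n].items() if m != query)
            for n in nodes
        }
        delta = max(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return sorted(nodes, key=scores.get, reverse=True)


# Hypothetical graph: B has no direct similarity to the query Q but is
# strongly linked to A, which is.
w = {
    "Q": {"A": 0.9, "C": 0.1},
    "A": {"Q": 0.9, "B": 0.8},
    "B": {"A": 0.8},
    "C": {"Q": 0.1},
}
print(rank_propagation(w, "Q"))  # ['A', 'B', 'C']
```

The example shows the point of propagation: B outranks C even though only C has a direct (weak) link to the query.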

7.
Motivation: As the use of microarrays in human studies continues to increase, stringent quality assurance is necessary to ensure accurate experimental interpretation. We present a formal approach for microarray quality assessment that is based on dimension reduction of established measures of signal and noise components of expression, followed by parametric multivariate outlier testing.
Results: We applied our approach to several data resources. First, as a negative control, we found that the Affymetrix and Illumina contributions to MAQC data were free from outliers at a nominal outlier flagging rate of α = 0.01. Second, we created a tunable framework for artificially corrupting intensity data from the Affymetrix Latin Square spike-in experiment to allow investigation of the sensitivity and specificity of quality assurance (QA) criteria. Third, we applied the procedure to 507 Affymetrix microarray GeneChips processed with RNA from human peripheral blood samples. We show that exclusion of arrays by this approach substantially increases inferential power, or the ability to detect differential expression, in large clinical studies.
Availability: http://bioconductor.org/packages/2.3/bioc/html/arrayMvout.html and http://bioconductor.org/packages/2.3/bioc/html/affyContam.html (affyContam credentials: readonly/readonly)
Contact: aasare{at}immunetolerance.org; stvjc{at}channing.harvard.edu
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Trey Ideker
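A minimal sketch of parametric multivariate outlier flagging, assuming each array has already been reduced to two quality summaries (two dimensions is an assumption made to keep the matrix algebra short). Each array's squared Mahalanobis distance from the sample mean is compared with the chi-square quantile with 2 degrees of freedom at level α, which for 2 df has the closed form -2 ln α. This plug-in rule is a simplification; arrayMvout's actual procedure may differ.

```python
from math import log
from typing import List, Tuple


def flag_outliers(points: List[Tuple[float, float]], alpha: float = 0.01) -> List[int]:
    """Indices of points whose squared Mahalanobis distance (sample mean and
    covariance) exceeds the chi-square(2 df) upper-alpha quantile."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    threshold = -2.0 * log(alpha)  # chi-square quantile for 2 df
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        d2 = ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy
        if d2 > threshold:
            flagged.append(i)
    return flagged


# Hypothetical quality summaries: a 5 x 5 grid of typical arrays plus one
# corrupted array far from the rest.
points = [(float(x), float(y)) for x in range(-2, 3) for y in range(-2, 3)]
points.append((20.0, 20.0))
print(flag_outliers(points))  # [25]
```

Note that with sample estimates the largest attainable squared distance is (n - 1)^2 / n, so very small batches cannot flag anything at α = 0.01. Robust covariance estimates are a common refinement.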

8.
Post-processing of BLAST results using databases of clustered sequences (times cited: 1; self-citations: 0; citations by others: 1)
Motivation: When evaluating the results of a sequence similarity search, there are many situations where it can be useful to determine whether sequences appearing in the results share some distinguishing characteristic. Such dependencies between database entries are often not readily identifiable, but can yield important new insights into the biological function of a gene or protein.
Results: We have developed a program called CBLAST that sorts the results of a BLAST sequence similarity search according to sequence membership in user-defined ‘clusters’ of sequences. To demonstrate the utility of this application, we have constructed two cluster databases. The first describes clusters of nucleotide sequences representing the same gene, as documented in the UNIGENE database, and the second describes clusters of protein sequences which are members of the protein families documented in the PROSITE database. Cluster databases and the CBLAST post-processor provide an efficient mechanism for identifying and exploring relationships and dependencies between new sequences and database entries.
Availability: The software described in this article is available free of charge from the EBI software archive at ftp://ftp.ebi.ac.uk/pub/software/unix.
Contact: rainer_fuchs@glaxowellcome.com
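A minimal sketch of cluster-based post-processing, assuming BLAST hits have already been parsed into (subject ID, score) pairs and that a cluster database is a simple mapping from sequence ID to cluster ID (both data structures, and all IDs, are hypothetical). Hits are grouped by cluster, and clusters are ordered by their best-scoring member.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, float]


def group_hits_by_cluster(hits: List[Hit],
                          cluster_of: Dict[str, str]) -> List[Tuple[str, List[Hit]]]:
    """Group hits by cluster; clusters are sorted by their best hit score.
    Sequences missing from the cluster database form singleton groups."""
    groups: Dict[str, List[Hit]] = defaultdict(list)
    for subject, score in hits:
        key = cluster_of.get(subject, f"unclustered:{subject}")
        groups[key].append((subject, score))
    for members in groups.values():
        members.sort(key=lambda h: h[1], reverse=True)
    return sorted(groups.items(), key=lambda kv: kv[1][0][1], reverse=True)


hits = [("seqA", 250.0), ("seqB", 90.0), ("seqC", 240.0), ("seqD", 60.0)]
clusters = {"seqA": "cluster1", "seqC": "cluster1", "seqB": "cluster2"}
result = group_hits_by_cluster(hits, clusters)
print([c for c, _ in result])  # ['cluster1', 'cluster2', 'unclustered:seqD']
```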

9.
Summary: ROBIN is a web server for analyzing genome rearrangements by block-interchanges between two chromosomal genomes. It takes two or more linear/circular chromosomes as its input, computes, for any two input chromosomes, the minimum number of block-interchange rearrangements needed to transform one chromosome into the other, and also determines an optimal scenario with this number of rearrangements. The input can be either bacterial-size sequence data or landmark-order data. If the input is sequence data, ROBIN will automatically search for the identical landmarks, that is, the homologous/conserved regions shared by all the input sequences.
Availability: ROBIN is freely accessible at http://genome.life.nctu.edu.tw/ROBIN
Contact: cllu{at}mail.nctu.edu.tw
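For landmark-order data without duplicated landmarks, one known way to compute the block-interchange distance, as we understand the permutation-group formulation from work on sorting by block-interchanges, represents each circular chromosome as a cyclic permutation. The distance is then (n - f(β·α⁻¹))/2, where n is the number of elements and f counts the cycles of β·α⁻¹, including fixed points. The sketch below assumes a linear chromosome over landmarks 1..n, closes it into a circle by adding a cap element 0 (an assumption of this sketch), and returns only the distance, not an optimal scenario. The function name is hypothetical.

```python
from typing import List


def block_interchange_distance(order: List[int]) -> int:
    """Block-interchange distance from the identity 1..n to `order`
    (a permutation of 1..n), treating the linear chromosome as circular
    by prepending a cap element 0."""
    n = len(order)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must be a permutation of 1..n")
    m = n + 1
    alpha = list(range(m))      # identity chromosome (0 1 2 ... n)
    beta = [0] + list(order)    # target chromosome (0 order...)
    # alpha^-1 maps each element to its predecessor in alpha
    alpha_inv = {alpha[(i + 1) % m]: alpha[i] for i in range(m)}
    # beta maps each element to its successor in beta
    beta_next = {beta[i]: beta[(i + 1) % m] for i in range(m)}
    composed = {x: beta_next[alpha_inv[x]] for x in range(m)}
    # Count the cycles (fixed points included) of beta * alpha^-1
    seen = set()
    cycles = 0
    for start in range(m):
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = composed[x]
    return (m - cycles) // 2


print(block_interchange_distance([1, 2, 3, 4]))  # 0
print(block_interchange_distance([3, 4, 1, 2]))  # 1 (swap blocks 1 2 and 3 4)
print(block_interchange_distance([1, 3, 2, 4]))  # 1 (swap 2 and 3)
```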

10.
11.
WorfDB (Worm ORFeome DataBase; http://worfdb.dfci.harvard.edu) was created to integrate and disseminate the data from the cloning of the complete set of approximately 19 000 predicted protein-encoding Open Reading Frames (ORFs) of Caenorhabditis elegans (also referred to as the 'worm ORFeome'). WorfDB serves as a central data repository enabling the scientific community to search for the availability and quality of cloned ORFs. So far, ORF sequence tags (OSTs) obtained for all individual clones have allowed exon structure corrections for approximately 3400 ORFs originally predicted by the C. elegans sequencing consortium. In addition, we now have OSTs for approximately 4300 predicted genes for which no ESTs were available. The database contains this OST information along with data pertinent to the cloning process. WorfDB could serve as a model database for other metazoan ORFeome cloning projects.

12.
The SGN comparative map viewer (times cited: 1; self-citations: 0; citations by others: 1)
Motivation: With the rapid accumulation of genetic data for a multitude of different species, the availability of intuitive comparative genomic tools becomes an important requirement for the research community. Here we describe a web-based comparative viewer for mapping data, including genetic, physical and cytological maps, that is part of the SGN website (http://sgn.cornell.edu/) but can also be installed and adapted for other websites. In addition to viewing and comparing different maps stored in the SGN database, the viewer allows users to upload their own maps and compare them to other maps in the system. The viewer is implemented in object-oriented Perl, with a simple extensible interface for writing data adapters for other relational database schemas and flat file formats.
Contact: lam87{at}cornell.edu
Associate Editor: Alex Bateman

13.
14.
Motivation: The key to MS-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether by library search or de novo, is to better infer statistical significance and to achieve better noise reduction. Since the noise in a spectrum depends on experimental conditions, the instrument used and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues.
Results: We designed RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential for de novo sequencing alone and the ease of incorporating post-translational modifications.
Availability: Programs implementing the methods described are available from the authors on request.
Contact: yyu{at}ncbi.nlm.nih.gov
Supplementary information: ftp://ftp.ncbi.nih.gov/pub/yyu/Proteomics/MSMS/RAId/MSMS_bioinfo_supp.pdf
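RAId's exact statistics are not described in this abstract. As a generic illustration of spectrum-specific background statistics, the sketch below assumes that a set of background scores has already been collected for one spectrum (for example, from de novo candidates) and converts a library-hit score into an empirical p-value with the usual add-one correction. The scores are made up.

```python
from typing import Sequence


def empirical_p_value(score: float, background: Sequence[float]) -> float:
    """Fraction of background scores at least as high as `score`, with an
    add-one correction so the estimate is never exactly zero."""
    exceed = sum(1 for b in background if b >= score)
    return (exceed + 1) / (len(background) + 1)


# Hypothetical background scores for a single spectrum.
background = [12.0, 15.5, 9.8, 20.1, 11.3, 14.0, 18.7, 10.2, 13.9]
print(empirical_p_value(25.0, background))  # 0.1
print(empirical_p_value(14.0, background))  # 0.5
```

Because the background is gathered per spectrum, the same raw score can be significant for a clean spectrum and insignificant for a noisy one, which is the motivation given in the abstract.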

15.
Motivation: Staining human metaphase chromosomes reveals characteristic banding patterns known as cytogenetic bands or cytobands. Using technologies based on metaphase chromosomes, researchers have accumulated much knowledge about the correlations between human diseases and specific cytoband aberrations, indicating the presence of disease-associated genes in those bands. With the progress of the Human Genome Project and techniques such as fluorescent in situ hybridization, many genes have been assigned to cytobands and annotated in public databases, making it possible to find all genes in disease-related cytobands through database queries. However, finding genes in cytobands remains an imprecise process, partly due to the insufficiency of current methods for cytoband queries, especially those based on cytogenetic annotations.
Results: By transforming the cytoband annotations into numerical segments, a new query method is developed that is able to accurately define any cytogenetic range in human chromosomes. A query system (designated Cytoband Query System, CQS) is implemented using cytogenetic annotations in the public domain. Judged by a performance test, CQS performed as accurately as expected using cytogenetic annotations from NCBI Map Viewer. The new method is scalable and can be applied to genomes of other species.
Availability: CQS is freely accessible over the Internet at http://moris.csie.ncku.edu.tw/cqs/
Contact: clh9{at}mail.ncku.edu.tw
Supplementary information: http://moris.csie.ncku.edu.tw/cqs/
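The abstract does not specify the numerical encoding. One way to turn band names into numerical segments, assumed here for illustration, is to read the digits after the arm letter as a decimal fraction, so that a band and all of its sub-bands map to nested intervals: q2 covers [0.2, 0.3), q21 covers [0.21, 0.22), and q21.3 covers [0.213, 0.214). Because p-arm bands are numbered outward from the centromere, p-arm intervals are negated so that coordinates increase from pter to qter. Band names are assumed to be well-formed, and the function names are hypothetical.

```python
import re
from fractions import Fraction
from typing import Tuple

BAND_RE = re.compile(r"^(\d+|X|Y)([pq])(\d+(?:\.\d+)?)$")


def band_interval(band: str) -> Tuple[str, Fraction, Fraction]:
    """Map a band such as '1p36.33' to (chromosome, start, end) on a
    pter-to-qter axis with the centromere at 0."""
    m = BAND_RE.match(band)
    if not m:
        raise ValueError(f"unrecognized band: {band}")
    chrom, arm, digits = m.group(1), m.group(2), m.group(3).replace(".", "")
    lo = Fraction(int(digits), 10 ** len(digits))
    hi = lo + Fraction(1, 10 ** len(digits))
    if arm == "p":  # p-arm numbering runs away from the centromere
        lo, hi = -hi, -lo
    return chrom, lo, hi


def band_in_range(band: str, first: str, last: str) -> bool:
    """True if `band` lies entirely within the cytogenetic range first..last."""
    c, lo, hi = band_interval(band)
    c1, lo1, hi1 = band_interval(first)
    c2, lo2, hi2 = band_interval(last)
    if not (c == c1 == c2):
        return False
    start, end = min(lo1, lo2), max(hi1, hi2)
    return start <= lo and hi <= end


print(band_in_range("1q21.3", "1q21", "1q22"))  # True
print(band_in_range("1q23", "1q21", "1q22"))    # False
```

Exact fractions avoid floating-point edge effects at band boundaries, and a range that crosses the centromere (for example 1p11 to 1q12) works without special cases.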

16.
17.
Motivation: Reliable structural modelling of protein–protein complexes has widespread application, from drug design to advancing our knowledge of protein interactions and function. This work addresses three important issues in protein–protein docking: implementing backbone flexibility, incorporating prior indications from experiment and bioinformatics, and providing public access via a server. 3D-Garden (Global And Restrained Docking Exploration Nexus), our benchmarked and server-ready flexible docking system, allows sophisticated programming of surface patches by the user via a facet representation of the interactors’ molecular surfaces (generated with the marching cubes algorithm). Flexibility is implemented as a weighted exhaustive conformer search for each clashing pair of molecular branches in a set of 5000 models filtered from around 340 000 initially.
Results: In a non-global assessment, carried out strictly according to the protocols of the Critical Assessment of Protein Interactions (CAPRI) experiment for the number of models considered and model quality, over the widely used Benchmark 2.0 of 84 complexes, 3D-Garden identifies a set of ten models containing an acceptable or better model in 29/45 test cases, including one with large conformational change. In 19/45 cases an acceptable or better model is ranked first or second out of 340 000 candidates.
Availability: http://www.sbg.bio.ic.ac.uk/3dgarden (server)
Contact: v.lesk{at}ic.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Burkhard Rost

18.
BACKGROUND: Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of these data can be affected by several factors, and it is important to have simple and reliable approaches for monitoring it at the level of individual experiments.
RESULTS: We developed a fast, scalable and accurate approach to estimating error rates in short reads, which has the added advantage of not requiring a reference genome. We build on the fundamental observation that there is a linear relationship between the copy number for a given read and the number of erroneous reads that differ from the read of interest by one or two bases. The slope of this relationship can be transformed to give an estimate of the error rate, both by read and by position. We present simulation studies as well as analyses of real data sets illustrating the precision and accuracy of this method, and we show that it is more accurate than alternatives that count the differences between the sample of interest and the reference genome. We show how this methodology led to the detection of mutations in the genome of the PhiX strain used for calibration of Illumina data. The proposed method is implemented in an R package, which can be downloaded from http://bcb.dfci.harvard.edu/~vwang/shadowRegression.html, and will be submitted to Bioconductor upon publication of this article.
CONCLUSIONS: The proposed method can be used to monitor the quality of sequencing pipelines at the level of individual experiments without the use of reference genomes. Furthermore, having an estimate of the error rates gives one the opportunity to improve analyses and inferences in many applications of next-generation sequencing data.
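The core relationship can be sketched in simplified form. For each sufficiently abundant distinct read, count the "shadow" reads that differ from it at exactly one position, regress shadow counts on read copy numbers through the origin, and convert the slope s into an error rate. Under the simplifying assumption that every erroneous copy of a sequence appears as a one-base shadow, shadows / exact copies ≈ e / (1 - e) = s, so e ≈ s / (1 + s). This is an illustration only: it ignores two-base shadows and the per-position estimates described in the abstract, it is not the code of the authors' R package, and the `min_count` cutoff is a hypothetical parameter.

```python
from collections import Counter
from typing import Iterable


def hamming1(a: str, b: str) -> bool:
    """True if equal-length reads a and b differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def shadow_error_rate(reads: Iterable[str], min_count: int = 5) -> float:
    """Simplified error-rate estimate from one-mismatch shadow reads."""
    counts = Counter(reads)
    parents = [r for r, c in counts.items() if c >= min_count]
    sxy = sxx = 0.0
    for parent in parents:
        c = counts[parent]
        # Shadows: less abundant reads exactly one mismatch from the parent
        # (quadratic scan; fine for a toy example, not for real data).
        shadows = sum(n for r, n in counts.items() if n < c and hamming1(r, parent))
        sxy += c * shadows
        sxx += c * c
    slope = sxy / sxx  # least-squares slope through the origin
    return slope / (1.0 + slope)


# Toy data with a true error rate of 1 in 10 for both sequences.
reads = ["AAAA"] * 9 + ["AAAT"] + ["CCCC"] * 18 + ["CCCA", "CCGC"]
print(round(shadow_error_rate(reads), 6))  # approximately 0.1
```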

19.
Motivation: A large number of new DNA sequences with virtually unknown functions are being generated as the Human Genome Project progresses. Therefore, it is essential to develop computer algorithms that can predict the functionality of DNA segments from their primary sequences, including algorithms that can predict promoters. Although several promoter-prediction algorithms are available, they produce many false-positive detections, and the rate of promoter detection needs to be improved further.
Results: In this research, PromFD, a computer program to recognize vertebrate RNA polymerase II promoters, has been developed. Both vertebrate promoters and non-promoter sequences are used in the analysis. The promoters are obtained from the Eukaryotic Promoter Database and are divided into a training set and a test set. Non-promoter sequences are obtained from the GenBank sequence databank and are also divided into a training set and a test set. The first step is to search out, among all possible permutations, patterns of strings 5–10 bp long that are significantly over-represented in the promoter set. The program also searches for IMD (Information Matrix Database) matrices that have a significantly higher presence in the promoter set. The results of the searches are stored in the PromFD database, and the program PromFD scores input DNA sequences according to their content of the database entries. PromFD predicts promoters (their locations, and the locations of potential TATA boxes if found). The program can detect 71% of promoters in the training set with a false-positive rate of under 1 in every 13 000 bp, and 47% of promoters in the test set with a false-positive rate of under 1 in every 9800 bp. PromFD uses a new approach, and its false-positive identification rate is better than that of other available promoter recognition algorithms. The source code for PromFD is written in C++.
Availability: PromFD is available for Unix platforms by anonymous ftp to beagle.colorado.edu (cd pub, get promFD.tar). A Java version of the program is also available for Netscape 2.0 at http://beagle.colorado.edu/chenq.
Contact: chenq{at}beagle.colorado.edu
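The first step, finding short strings over-represented in promoters, can be illustrated with a simple frequency-ratio sketch. Every k-mer is counted in the promoter and non-promoter training sets, and k-mers are ranked by the ratio of their frequencies, with Laplace smoothing of the background over all 4^k possible k-mers. PromFD's actual significance test is not given in the abstract, so the ratio score, the pseudocount and the toy sequences here are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def kmer_counts(seqs: List[str], k: int) -> Tuple[Counter, int]:
    """Count all overlapping k-mers; also return the total number of k-mers."""
    counts: Counter = Counter()
    total = 0
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
            total += 1
    return counts, total


def overrepresented_kmers(promoters: List[str], background: List[str], k: int = 5,
                          pseudocount: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank promoter k-mers by foreground/background frequency ratio."""
    fg, fg_total = kmer_counts(promoters, k)
    bg, bg_total = kmer_counts(background, k)
    scored = []
    for kmer, n in fg.items():
        fg_freq = n / fg_total
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * 4 ** k)
        scored.append((kmer, fg_freq / bg_freq))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top]


# Toy sequences: each "promoter" carries a TATA-like motif.
promoters = ["GGTATAAAAGG", "CCTATAAATCC", "ATTATAAAGCA"]
background = ["GCGCGCATGCGC", "ATGCCGTAGCTA", "CGATCGGCTAGC"]
for kmer, ratio in overrepresented_kmers(promoters, background, top=3):
    print(kmer, round(ratio, 1))
```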

20.
Summary: The development of robust high-performance liquid chromatography (HPLC) technologies continues to improve the detailed analysis and sequencing of glycan structures released from glycoproteins. Here, we present a database (GlycoBase) and an analytical tool (autoGU) to assist the interpretation and assignment of HPLC-glycan profiles. GlycoBase is a relational database that contains the HPLC elution positions for over 350 2-AB-labelled N-glycan structures, together with predicted products of exoglycosidase digestions. AutoGU assigns provisional structures to each integrated HPLC peak and, when used in combination with exoglycosidase digestions, progressively assigns each structure automatically based on the footprint data. These tools are potentially very promising and facilitate basic research as well as the quantitative high-throughput analysis of low concentrations of glycans released from glycoproteins.
Availability: http://glycobase.ucd.ie
Contact: matthew.campbell{at}nibrt.ie
Associate Editor: Limsoon Wong
Present address: Dublin-Oxford Glycobiology Laboratory, National Institute for Bioprocessing Research and Training, Conway Institute, University College Dublin, Dublin, Ireland.
Present address: Ludger Ltd, Culham Science Centre, Abingdon, Oxfordshire OX14 3EB, UK.
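HPLC elution positions of glycans are conventionally expressed in glucose units (GU) by calibrating retention times against a dextran ladder. The sketch below assumes simple piecewise-linear interpolation between ladder peaks (a simplification; polynomial or spline fits are also commonly used) and then lists database structures whose GU lies within a tolerance of a peak. The ladder times, structure names, GU values and tolerance are invented placeholders, not GlycoBase entries or autoGU's actual parameters.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def retention_to_gu(t: float, ladder: List[Tuple[float, float]]) -> float:
    """Piecewise-linear GU from retention time; `ladder` is a list of
    (retention_time, GU) pairs sorted by retention time."""
    times = [rt for rt, _ in ladder]
    if not times[0] <= t <= times[-1]:
        raise ValueError("retention time outside the calibrated range")
    i = bisect_left(times, t)
    if times[i] == t:
        return ladder[i][1]
    (t0, g0), (t1, g1) = ladder[i - 1], ladder[i]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)


def candidate_structures(gu: float, database: Dict[str, float],
                         tolerance: float = 0.3) -> List[str]:
    """Structures whose reference GU is within `tolerance`, closest first."""
    hits = [name for name, ref in database.items() if abs(ref - gu) <= tolerance]
    return sorted(hits, key=lambda name: abs(database[name] - gu))


# Hypothetical dextran ladder (minutes, GU) and reference GU values.
ladder = [(10.0, 4.0), (14.0, 5.0), (18.0, 6.0), (22.0, 7.0)]
database = {"structure_A": 5.45, "structure_B": 5.9, "structure_C": 7.2}

gu = retention_to_gu(16.0, ladder)
print(gu)                                  # 5.5
print(candidate_structures(gu, database))  # ['structure_A']
```

Exoglycosidase digestions would then narrow such provisional candidates further, by checking that the predicted digestion products shift to the observed GU values.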


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417