Similar Documents
20 similar documents found.
1.
Genome sequencing projects are based either on whole-genome shotgun (WGS) sequencing or on a BAC-by-BAC strategy. Although WGS is in most cases the preferred choice, the BAC-by-BAC approach is sometimes better because it requires a much simpler assembly process. Furthermore, when a study is limited to specific regions of the genome, WGS would require an unjustified effort, making BAC-by-BAC the only feasible strategy. In this paper we describe PABS (Platform Assisted BAC-by-BAC Sequencing), an informatics pipeline we developed to optimize the BAC-by-BAC sequencing strategy. PABS has two main functions: (i) PABS-Select, which chooses suitable overlapping clones; and (ii) PABS-Validate, which verifies whether a BAC under analysis actually overlaps the neighboring BAC.
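The abstract does not say how PABS-Validate decides that two BACs overlap. As a rough illustration of the underlying question only, the hypothetical helper below checks whether a suffix of one clone's sequence matches a prefix of its neighbor's, allowing a few substitutions. It is a naive quadratic scan, not the PABS algorithm; real pipelines typically rely on alignment tools.

```python
def longest_overlap(left: str, right: str, min_len: int = 20, max_mismatch: int = 0) -> int:
    """Hypothetical helper: length of the longest suffix of `left` that matches a
    prefix of `right` with at most `max_mismatch` substitutions.
    Returns 0 when no overlap of at least `min_len` bases exists."""
    left, right = left.upper(), right.upper()
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and stop at the first acceptable one.
    for length in range(max_len, min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch:
            return length
    return 0


if __name__ == "__main__":
    bac_a = "AAAA" + "ACGTACGTAC"
    bac_b = "ACGTACGTAC" + "TTTT"
    print(longest_overlap(bac_a, bac_b, min_len=5))
```

Allowing `max_mismatch > 0` tolerates sequencing errors in the overlap region at the cost of more spurious matches on short overlaps, which is why a minimum overlap length is also enforced.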

2.
Holen T. RNA (New York, N.Y.), 2006, 12(9): 1620-1625
RNA interference (RNAi) and siRNAs have become useful tools for investigating gene function. However, the discovery that not all siRNAs are equally efficient made screens or design algorithms necessary to obtain highly active siRNA candidates. Several algorithms have been published, but they remain inefficient, obscure, or commercially restricted. This article describes an open-source Java program that is surprisingly efficient at predicting active siRNAs (Pearson correlation coefficient r = 0.52, n = 526 siRNAs). Furthermore, this version 1.0 sets the stage for further improvement of the free code by the open-source community (http://sourceforge.net/).
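The reported performance (r = 0.52) is a Pearson correlation between predicted and measured siRNA activity. A minimal sketch of that statistic, assuming two equal-length lists of predicted scores and measured activities (the variable names are hypothetical), is:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    predicted = [0.2, 0.5, 0.9, 0.4]   # hypothetical predicted efficacy scores
    measured = [0.3, 0.4, 0.8, 0.6]    # hypothetical measured knockdown fractions
    print(round(pearson_r(predicted, measured), 3))
```

An r of 0.52 means the predictions capture a moderate share of the variation in measured activity (r squared is roughly 0.27), which is why such tools are used to rank candidates rather than to replace experimental validation.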

3.
pROC: an open-source package for R and S+ to analyze and compare ROC curves (total citations: 3; self-citations: 0; citations by others: 3)

Background  

Receiver operating characteristic (ROC) curves are useful tools for evaluating classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analyses, we developed pROC, a package for R and S+ that contains a set of tools for displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface.
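pROC itself is an R package; the Python sketch below only illustrates what an ROC curve and its area represent, and none of its function names belong to pROC. The curve is built by sweeping a threshold over the distinct classifier scores, and the AUC is computed with the Mann-Whitney formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(1 for l in labels if l)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        points.append((fp / neg, tp / pos))
    return points


def auc_mann_whitney(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(roc_points(scores, labels))
    print(auc_mann_whitney(scores, labels))
```

The pairwise form is O(n_pos x n_neg) and is meant for clarity; rank-based formulas give the same value more efficiently on large data sets.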

4.
A lack of pliant software tools supporting small- to medium-scale DNA sequencing efforts is a major hindrance to recording and using laboratory workflow information to monitor the overall quality of data production. Here we describe VSQual, a set of Perl programs intended to provide simple and powerful tools for checking several quality features of the sequencing data generated by automated DNA sequencing machines. The core program of VSQual is a flexible Perl-based pipeline designed to be accessible and useful to both programmers and non-programmers. This pipeline directs the processing steps and can be easily customized to laboratory needs. Briefly, the raw DNA sequencing trace files are processed by Phred and Cross_match; the outputs are then parsed, reformatted into Web-based graphical reports, and added to a Web site structure. The result is a set of real-time sequencing reports that are easily accessed and understood by ordinary laboratory staff. These reports facilitate the monitoring of DNA sequencing as well as the management of laboratory workflow, significantly reducing operational costs and ensuring high-quality and scientifically reliable results.
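Phred, which VSQual uses for base calling, assigns each base a quality value Q = -10 log10(P), where P is the estimated probability that the call is wrong. The hypothetical helper below shows the kind of per-read summary a quality report might contain, assuming the quality values have already been parsed from Phred output; the Q20 cutoff is a common convention, not something the abstract specifies, and this is not VSQual code.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality value Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_summary(quals, threshold=20):
    """Hypothetical per-read summary: length, bases at or above `threshold`,
    and the expected number of miscalled bases (sum of error probabilities)."""
    return {
        "length": len(quals),
        "high_quality_bases": sum(1 for q in quals if q >= threshold),
        "expected_errors": sum(phred_to_error_prob(q) for q in quals),
    }


if __name__ == "__main__":
    read_quals = [10, 20, 30, 40]  # hypothetical parsed Phred values for one read
    print(quality_summary(read_quals))
```

Summing error probabilities gives an expected error count per read, which is often more informative than a mean quality value because Phred scores are logarithmic.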

5.
X-windows based microscopy image processing package (Xmipp) is a specialized suite of image processing programs, primarily aimed at obtaining 3D reconstructions of biological specimens from large sets of projection images acquired by transmission electron microscopy. This public-domain software package was introduced to the electron microscopy field eight years ago and has changed drastically since then. New methods for the analysis of single-particle projection images have been added for classification, contrast transfer function correction, angular assignment, 3D reconstruction, reconstruction of crystals, etc. In addition, the package has been extended with functionalities for 2D crystal and electron tomography data. Furthermore, its current implementation in C++, with a highly modular design of well-documented data structures and functions, offers a convenient environment for developing novel algorithms. In this paper, we present a general overview of a new generation of Xmipp that has been re-engineered to maximize flexibility and modularity, potentially facilitating its integration into future standardization efforts in the field. Moreover, by focusing on the developments that distinguish Xmipp from other available packages, we illustrate its added value to the electron microscopy community.

6.
ESTWeb is an Internet-based software package designed for uniform data processing and storage in large-scale EST sequencing projects. The package provides for: (a) reception of sequencing chromatograms; (b) sequence processing such as base-calling, vector screening and comparison with public databases; (c) storage of data and analyses in a relational database; (d) generation of a graphical report of individual sequence quality; and (e) issuing of reports with productivity and redundancy statistics. The software facilitates real-time monitoring and evaluation of the progress of EST sequence acquisition over the course of an EST sequencing project.

7.
8.
A quality control algorithm for DNA sequencing projects. (total citations: 2; self-citations: 0; citations by others: 2)
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole-library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants be present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets, but it is apparently not associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
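The abstract does not give the exact form of the statistical test, so the sketch below swaps in a simpler, related idea: build smoothed hexamer frequency profiles for two candidate source organisms, then score a query sequence by the log-likelihood ratio of its hexamers under the two profiles. Positive scores favor the first profile. The function names and the pseudocount are hypothetical, and treating overlapping hexamers as independent is a simplification.

```python
import math
from collections import Counter
from itertools import product

HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_profile(seqs, pseudocount=1.0):
    """Smoothed relative frequencies of all 4096 hexamers in a set of reference sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - 5):
            k = s[i:i + 6]
            if set(k) <= set("ACGT"):  # skip hexamers containing N or other ambiguity codes
                counts[k] += 1
    total = sum(counts.values()) + pseudocount * len(HEXAMERS)
    return {k: (counts[k] + pseudocount) / total for k in HEXAMERS}


def log_likelihood_ratio(seq, profile_a, profile_b):
    """Sum of log(P_a / P_b) over the query's hexamers; > 0 favors profile A."""
    seq = seq.upper()
    score = 0.0
    for i in range(len(seq) - 5):
        k = seq[i:i + 6]
        if k in profile_a:
            score += math.log(profile_a[k] / profile_b[k])
    return score


if __name__ == "__main__":
    at_rich = hexamer_profile(["AT" * 50])   # toy stand-in for one organism
    gc_rich = hexamer_profile(["GC" * 50])   # toy stand-in for another
    print(log_likelihood_ratio("ATATATATAT", at_rich, gc_rich))
```

In practice the reference profiles would come from large, trusted sequence sets, and a significance threshold would be calibrated on sequences of known origin.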

9.
The emergence of next-generation sequencing (NGS) technologies has significantly increased sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make data processing much more difficult and complicated than with first-generation sequencing technology. Although some software packages have been developed to assess data quality, they are either not easily available to users or require bioinformatics skills and computing resources. Moreover, almost none of the currently available quality assessment software takes sequencing errors into account when assessing duplicates in NGS data. Here, we present BIGpre, a new user-friendly quality assessment software package that works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and per-base N content and quality. More importantly, BIGpre incorporates associated programs that detect and remove duplicate reads while taking sequencing errors into account, and that also trim low-quality reads from raw data. BIGpre is written primarily in Perl and integrates graphical capability from the statistics package R. It produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, the package provides immediate diagnostic information that users can apply when manipulating sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.
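To illustrate why sequencing errors matter for duplicate detection, the hypothetical function below flags a read as a duplicate when it differs from an earlier, retained read of the same length by at most `max_mismatch` substitutions. An exact-match check would miss duplicates that carry a single sequencing error. This greedy, pairwise version is for illustration only and is not BIGpre's algorithm; tools that handle millions of reads need hashing or k-mer indexing instead.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mark_duplicates(reads, max_mismatch=1):
    """Return indices of reads judged to be duplicates of an earlier read.
    Reads of different lengths are never compared."""
    representatives = {}  # read length -> list of retained (non-duplicate) reads
    duplicates = []
    for i, read in enumerate(reads):
        reps = representatives.setdefault(len(read), [])
        if any(hamming(read, r) <= max_mismatch for r in reps):
            duplicates.append(i)
        else:
            reps.append(read)
    return duplicates


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "ACGTACGT"]
    print(mark_duplicates(reads, max_mismatch=1))  # tolerant of one substitution
    print(mark_duplicates(reads, max_mismatch=0))  # exact matches only
```

Raising `max_mismatch` catches more error-containing duplicates but also risks merging genuinely distinct reads from similar regions, so the tolerance should reflect the platform's error rate.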

10.
Gene, 1998, 208(1): 31-35
We describe two Java applets that are useful for insightful presentation of intermediate experimental data in gene discovery projects involving large-scale sequencing. One of these applets provides a physical map of a genomic region and gives easy access to the second applet, which furnishes a detailed map of the sequence contigs associated with clones on the physical map. In particular, the second applet displays all known information about each contig, including the presence of exons, database homology 'hits', repetitive elements and other features; the graphics are linked to other World Wide Web pages that provide detailed information on each feature. These applets should be useful to other research groups working on large sequencing projects.

11.
12.
A frameshift error detection algorithm for DNA sequencing projects. (total citations: 3; self-citations: 1; citations by others: 2)
During the determination of DNA sequences, frameshift errors are not the most frequent errors, but they are the most bothersome because they corrupt the amino acid sequence over several residues. Detection of such errors by sequence alignment is only possible when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies on the result of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp with a low rate of false positives (no more than 1.0 per 1000 bp scanned). The proposed algorithm can be used to scan large collections of data, but it is mainly intended for laboratory practice as a tool for checking the quality of the sequences produced during a sequencing project.
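The input to the method is the distribution of non-overlapping 3-tuples (codons) in each of the three frames of an ORF. The sketch below computes those counts, overall and in sliding windows; the correspondence analysis the authors apply to such tables is not reproduced here. Window size, step and function names are hypothetical choices for illustration.

```python
from collections import Counter


def frame_codon_counts(orf):
    """Counts of non-overlapping 3-tuples in frames 0, 1 and 2 of a sequence."""
    orf = orf.upper()
    frames = []
    for offset in range(3):
        counts = Counter(orf[i:i + 3] for i in range(offset, len(orf) - 2, 3))
        frames.append(counts)
    return frames


def window_frame_tables(orf, window=60, step=30):
    """Per-frame codon counts in successive windows along the ORF.
    A frameshift error would tend to make the coding-like usage move from
    frame 0 to another frame in windows downstream of the error."""
    tables = []
    for start in range(0, max(len(orf) - window, 0) + 1, step):
        tables.append(frame_codon_counts(orf[start:start + window]))
    return tables


if __name__ == "__main__":
    for frame, counts in enumerate(frame_codon_counts("ATGAAATAG")):
        print(frame, dict(counts))
```

Working with non-overlapping tuples keeps each frame's counts independent of the other two frames, which is what makes a change of "coding-like" frame along the sequence detectable.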

13.
Population genetics simulation models are useful tools for studying the effects of demography and environmental factors on genetic variation and genetic differentiation. They allow the study of species and populations with complex life histories, spatial distributions and many other complicating factors that make analytical treatment impracticable. Most simulation models are individual-based, which limits the simulation of very large populations because of computer memory limits and long computation times. To overcome these limitations, we propose an intermediate approach that allows the modelling of very complex demographic scenarios, which would be intractable with analytical models, and removes the limitations imposed by large population size, which affect individual-based simulation models. We implement this approach in MetaPopGen, a software package for the R environment. The innovation of this approach with respect to other population genetic simulators is that it focuses on genotype numbers rather than on individuals. Genotype numbers are iterated through time by using random number generators for appropriate probability distributions to reproduce the stochasticity inherent in Mendelian segregation, survival, dispersal and reproduction. Features included in the model are age structure, monoecious and dioecious (separate-sex) life cycles, mutation, dispersal and selection. The model simulates only one locus at a time. All demographic parameters can be genotype-, sex-, age-, deme- and time-dependent. MetaPopGen is therefore suited to studying large populations and very complex demographic scenarios. We illustrate the capabilities of MetaPopGen by applying it to a marine fish metapopulation in the Mediterranean Sea.
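The core idea of iterating genotype numbers rather than individuals can be sketched for a single biallelic locus under deliberately simple assumptions: one panmictic deme, random mating with Hardy-Weinberg offspring proportions, constant population size, and no age structure, selection, mutation or dispersal. This toy model is not MetaPopGen's implementation, and all names are hypothetical. For brevity it draws offspring with `random.choices`, which samples each offspring; a genotype-number simulator for very large populations would instead draw the three counts directly from a multinomial distribution (for example with NumPy's `Generator.multinomial`).

```python
import random
from collections import Counter


def next_generation(n_AA, n_Aa, n_aa, n_offspring, rng):
    """One generation of random mating at a biallelic locus, tracked as genotype counts."""
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # frequency of allele A among parents
    q = 1 - p
    probs = [p * p, 2 * p * q, q * q]    # Hardy-Weinberg offspring proportions
    # Simplification: per-offspring sampling stands in for a single multinomial draw.
    draws = Counter(rng.choices(["AA", "Aa", "aa"], weights=probs, k=n_offspring))
    return draws["AA"], draws["Aa"], draws["aa"]


def simulate(n_AA, n_Aa, n_aa, n_generations, seed=None):
    """Iterate genotype counts for n_generations at constant population size."""
    rng = random.Random(seed)
    size = n_AA + n_Aa + n_aa
    history = [(n_AA, n_Aa, n_aa)]
    for _ in range(n_generations):
        n_AA, n_Aa, n_aa = next_generation(n_AA, n_Aa, n_aa, size, rng)
        history.append((n_AA, n_Aa, n_aa))
    return history


if __name__ == "__main__":
    for gen, counts in enumerate(simulate(30, 40, 30, 5, seed=1)):
        print(gen, counts)
```

Because the state is just three integers per deme (more with age and sex classes), memory use does not grow with population size, which is the property the abstract highlights.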

14.
Molecular tip dating of phylogenetic trees is a growing discipline that uses DNA sequences sampled at different points in time to co-estimate the timing of evolutionary events and rates of molecular evolution. In this context, BEAST, a program for Bayesian analysis of molecular sequences, is the most widely used phylogenetic tool. Here, we introduce TipDatingBeast, an R package built to assist the implementation of various phylogenetic tip-dating tests using BEAST. TipDatingBeast currently contains two main functions. The first prepares date-randomization analyses, which assess the temporal signal of a data set. The second performs leave-one-out analyses, which test the consistency between independent calibration sequences and make it possible to pinpoint those leading to potential bias. We apply these functions to an empirical data set and provide practical guidance for interpreting the results.
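A date-randomization test reassigns the observed sampling dates randomly among the tips and re-runs the analysis on each replicate; if the rate estimated from the real dates is clearly distinct from the rates obtained from randomized dates, the data set is usually considered to carry temporal signal. The sketch below only generates the randomized date assignments, assuming dates are stored in a dict keyed by tip name. Writing BEAST input files, which the R package automates, is not shown, and the function name is hypothetical.

```python
import random


def date_randomizations(tip_dates, n_replicates=20, seed=None):
    """Return n_replicates dicts in which the sampling dates are randomly
    permuted among the tips (the set of dates is unchanged)."""
    rng = random.Random(seed)
    names = list(tip_dates)
    dates = [tip_dates[n] for n in names]
    replicates = []
    for _ in range(n_replicates):
        shuffled = dates[:]
        rng.shuffle(shuffled)
        replicates.append(dict(zip(names, shuffled)))
    return replicates


if __name__ == "__main__":
    observed = {"seqA": 1995.0, "seqB": 2003.5, "seqC": 2010.2, "seqD": 2015.0}  # hypothetical tips
    for rep in date_randomizations(observed, n_replicates=3, seed=42):
        print(rep)
```

Permuting, rather than drawing new dates, keeps the temporal spread of the sample identical, so any difference in the estimated rate is attributable to the association between sequences and dates.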

15.
The Red List Categories and the accompanying five criteria developed by the International Union for Conservation of Nature (IUCN) provide an authoritative and comprehensive methodology for assessing the conservation status of organisms. Red List criterion B, which principally uses distribution data, is the most widely used for assessing conservation status, particularly of plant species. No software package has previously been available to perform large-scale multispecies calculations of the three main criterion B parameters [extent of occurrence (EOO), area of occupancy (AOO) and an estimate of the number of locations] and to provide preliminary conservation assessments using an automated batch process. We developed ConR, a dedicated R package, as a rapid and efficient tool for conducting large numbers of preliminary assessments, thereby facilitating complete Red List assessment. ConR (1) calculates key geographic range parameters (AOO and EOO) and estimates the number of locations sensu IUCN needed for an assessment under criterion B; (2) uses this information in a batch process to generate preliminary assessments of multiple species; (3) summarizes the parameters and preliminary assessments in a spreadsheet; and (4) visualizes the results by generating maps suitable for the submission of full assessments to the IUCN Red List. ConR can be used for any living organism for which reliable georeferenced distribution data are available. As distributional data for taxa become increasingly available via large open-access datasets, ConR provides a novel, timely tool to guide and accelerate the work of the conservation and taxonomic communities by enabling practitioners to conduct preliminary assessments for hundreds or even thousands of species simultaneously in an efficient and time-saving way.
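The two range parameters can be illustrated with a planar sketch: EOO as the area of the minimum convex polygon (convex hull) around the occurrence points, and AOO as the number of occupied grid cells times the cell area, using the 2 km x 2 km grid recommended by IUCN. The sketch assumes coordinates already projected to kilometres and a grid anchored at the origin; it does not reproduce ConR's handling of geographic coordinates, projections or grid placement, and the function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; 0 for degenerate polygons (fewer than 3 vertices)."""
    n = len(poly)
    if n < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(s) / 2


def eoo_km2(points_km):
    """Extent of occurrence: area of the convex hull of the occurrence points."""
    return polygon_area(convex_hull(points_km))


def aoo_km2(points_km, cell_km=2.0):
    """Area of occupancy: occupied grid cells times cell area."""
    cells = {(math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points_km}
    return len(cells) * cell_km ** 2


if __name__ == "__main__":
    occurrences = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]  # hypothetical projected points (km)
    print(eoo_km2(occurrences), aoo_km2(occurrences))
```

AOO depends on where the grid is anchored, so tools may try several grid positions; the fixed origin here is the simplest possible choice.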

16.

Background  

Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there has been increasing awareness of the importance of predicting and optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds throughout the search process rather than only at its final stages. Fast methods for evaluating the ADMET properties of small molecules often involve applying a set of simple empirical rules (educated guesses), so that property profiling of compound collections can be performed in silico. Clearly, these rules cannot capture the full complexity of the human body, but they can provide valuable information and assist decision-making.
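The abstract does not name the empirical rules it has in mind. The best-known example is Lipinski's rule of five, which flags compounds with molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors, and associates more than one violation with likely poor oral absorption. The sketch below applies those thresholds, assuming the descriptors have already been computed by some cheminformatics tool; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (assumed to come from an external tool)."""
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient (log scale)
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Number of rule-of-five criteria the compound violates."""
    rules = [d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10]
    return sum(rules)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Flag compounds with at most `max_violations` violations as acceptable."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    candidate = Descriptors(mol_weight=650.0, logp=6.2, h_donors=2, h_acceptors=8)  # hypothetical values
    print(lipinski_violations(candidate), passes_rule_of_five(candidate))
```

Rules like this are cheap filters for profiling whole compound collections; as the abstract notes, they cannot capture the full complexity of human physiology.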

17.
The colorless, large sulfur bacteria are well known because of their intriguing appearance, size and abundance in sulfidic settings. Since their discovery in 1803 these bacteria have been classified according to their conspicuous morphology. However, in microbiology the use of morphological criteria alone to predict phylogenetic relatedness has frequently proven to be misleading. Recent sequencing of a number of 16S rRNA genes of large sulfur bacteria revealed frequent inconsistencies between the morphologically determined taxonomy of genera and the genetically derived classification. Nevertheless, newly described bacteria were classified based on their morphological properties, leading to polyphyletic taxa. We performed sequencing of 16S rRNA genes and internal transcribed spacer (ITS) regions, together with detailed morphological analysis of hand-picked individuals of novel non-filamentous as well as known filamentous large sulfur bacteria, including the hitherto only partially sequenced species Thiomargarita namibiensis, Thioploca araucae and Thioploca chileae. Based on 128 nearly full-length 16S rRNA-ITS sequences, we propose the retention of the family Beggiatoaceae for the genera closely related to Beggiatoa, as opposed to the recently suggested fusion of all colorless sulfur bacteria into one family, the Thiotrichaceae. Furthermore, we propose the addition of nine Candidatus species along with seven new Candidatus genera to the family Beggiatoaceae. The extended family Beggiatoaceae thus remains monophyletic and is phylogenetically clearly separated from other related families.

18.

Background

Most ovarian cancer biomarker discovery efforts focus on identifying proteins that can improve the predictive power of currently available diagnostic tests. Here we show that metabolomics, the study of metabolic changes in biological systems, can also provide characteristic small-molecule fingerprints related to this disease.

Results

In this work, new approaches to the automatic classification of metabolomic data produced from sera of ovarian cancer patients and benign controls are investigated. We evaluated the performance of support vector machines (SVMs) in classifying liquid chromatography/time-of-flight mass spectrometry (LC/TOF MS) metabolomic data, focusing on recognizing combinations or "panels" of potential metabolic diagnostic biomarkers. Using LC/TOF MS, sera from 37 ovarian cancer patients and 35 benign controls were studied. Optimal panels of spectral features, observed in positive and/or negative ion mode electrospray ionization (ESI) MS, that can distinguish control from ovarian cancer samples were selected using state-of-the-art feature selection methods such as recursive feature elimination and L1-norm SVM.

Conclusion

Three evaluation procedures (leave-one-out cross-validation, 12-fold cross-validation and 52/20 split validation) were used to assess how well SVM models based on the selected panels differentiate control from disease serum samples. The statistical significance of these feature selection results was investigated comprehensively. Classification of the serum sample test set was over 90% accurate, indicating that this approach may lead to an accurate and reliable metabolomics-based method for detecting ovarian cancer.
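The three evaluation schemes differ only in how the samples are partitioned into training and test sets. Assuming all 72 sera (37 patients plus 35 controls) are used, 12-fold cross-validation gives test folds of 6 samples, leave-one-out is the special case k = n, and the 52/20 split holds out 20 samples once. The sketch below generates such index partitions; it is not the authors' code, the function names are hypothetical, and it does not stratify by class, which a real analysis with two groups would usually do.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Yield (train, test) index lists for k-fold cross-validation.
    k == n gives leave-one-out cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment keeps fold sizes balanced
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def split_indices(n, n_train, seed=None):
    """Single random train/test split, e.g. 52 training and 20 test samples out of 72."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


if __name__ == "__main__":
    folds = list(k_fold_indices(72, 12, seed=0))
    print(len(folds), [len(test) for _, test in folds])
    train, test = split_indices(72, 52, seed=0)
    print(len(train), len(test))
```

Feature selection must be repeated inside each training fold; selecting the panel on all samples before cross-validating would leak information and inflate the accuracy estimate.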

19.
20.

Background  

Data generated from liquid chromatography coupled to high-resolution mass spectrometry (LC-MS)-based studies of a biological sample can contain large amounts of biologically significant information in the form of proteins, peptides and metabolites. Interpreting these data involves inferring the masses and abundances of the biomolecules injected into the instrument. Because of the inherent complexity of the mass spectral patterns produced by these biomolecules, the analysis is significantly enhanced by visualization capabilities for inspecting and confirming results. In this paper we describe Decon2LS, an open-source software package for automated processing and visualization of high-resolution MS data. Drawing extensively on algorithms developed over the last ten years for ICR2LS, Decon2LS packages these algorithms as a rich set of modular, reusable processing classes that perform diverse functions such as reading raw data, routine peak finding, theoretical isotope distribution modelling and deisotoping. Because the source code is openly available, these functionalities can now be used to build derivative applications relatively quickly. In addition, Decon2LS provides an extensive set of visualization tools, such as high-performance chart controls.
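Theoretical isotope distribution modelling predicts the relative heights of the M, M+1, M+2, ... peaks that deisotoping then matches against observed spectra. The sketch below uses the simplest possible model, counting only carbon-13 with a binomial distribution and an assumed natural abundance of about 1.07%. Real tools, presumably including Decon2LS, account for the isotopes of H, N, O and S as well (for example via averaged elemental compositions), so this is an illustration of the principle, not the package's algorithm; the function name is hypothetical.

```python
from math import comb


def carbon_isotope_pattern(n_carbons, p13c=0.0107, n_peaks=5):
    """Relative intensities of the M, M+1, ... peaks from 13C alone,
    normalized to the most intense peak (binomial model)."""
    probs = [
        comb(n_carbons, k) * p13c ** k * (1 - p13c) ** (n_carbons - k)
        for k in range(min(n_peaks, n_carbons + 1))
    ]
    top = max(probs)
    return [p / top for p in probs]


if __name__ == "__main__":
    # A peptide-sized molecule with ~100 carbons: M+1 already exceeds the monoisotopic peak.
    print([round(x, 3) for x in carbon_isotope_pattern(100)])
```

The shift of the most intense peak away from the monoisotopic mass as molecules grow is one reason deisotoping needs a model of the full envelope rather than simply picking the tallest peak.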
