Similar documents
20 similar documents found
1.

Background  

Better automation, lower cost per reaction and heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activity. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories have difficulty processing and storing their sequencing output appropriately. The challenges include documenting clones, templates and sequencing reactions, and storing, annotating and analyzing the large number of sequences generated.

2.

Background  

The growing collection of genome sequences made available by technological advances has facilitated comparative genomics, in particular the identification of synteny among multiple genomes. However, the development of effective, easy-to-use methods for identifying such conserved gene clusters (synteny blocks) among multiple genomes has lagged behind, as has the development of databases that host synteny blocks from various groups of species (especially eukaryotes) and also allow users to run synteny-identification programs.

3.

Background  

Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, require sophisticated knowledge of mathematics and statistics, as well as programming skills, to use. Commercially available software can provide a user-friendly interface, but at considerable cost. To make these tools available for microarray data analysis on an open platform, we developed WebArray, an online microarray data analysis platform that lets bench biologists use them to explore data from single- and dual-color microarray experiments.

4.
A lack of flexible software tools that support small- to medium-scale DNA sequencing efforts is a major obstacle to recording and using laboratory workflow information to monitor the overall quality of data production. Here we describe VSQual, a set of Perl programs intended to provide simple yet powerful tools for checking several quality features of the data generated by automated DNA sequencing machines. The core of VSQual is a flexible Perl-based pipeline, designed to be accessible and useful to both programmers and non-programmers. This pipeline directs the processing steps and can easily be customized to laboratory needs. The raw DNA sequencing trace files are processed by Phred and Cross_match; the outputs are then parsed, reformatted into Web-based graphical reports, and added to a Web site structure. The result is a set of real-time sequencing reports that laboratory staff can easily access and understand. These reports facilitate the monitoring of DNA sequencing and the management of laboratory workflow, significantly reducing operational costs and ensuring high-quality, scientifically reliable results.
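Phred assigns each base call a quality value Q = -10·log10(P), where P is the estimated probability that the call is wrong. The sketch below shows the kind of per-read quality feature a VSQual-style report could tabulate, assuming the Phred values have already been parsed into a list of integers. The Q20 threshold, the report fields and the function names are illustrative assumptions, not VSQual's actual code.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality value Q to its error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def summarize_read(qualities: list[int], threshold: int = 20) -> dict:
    """Summarize one read's per-base Phred values (hypothetical report fields)."""
    # Number of bases at or above the quality threshold (e.g. "Q20 bases").
    high = sum(1 for q in qualities if q >= threshold)
    # Expected number of miscalled bases: sum of per-base error probabilities.
    expected_errors = sum(phred_to_error_prob(q) for q in qualities)
    return {
        "length": len(qualities),
        f"q{threshold}_bases": high,
        "expected_errors": expected_errors,
    }
```

For example, a five-base read with qualities [10, 20, 30, 40, 20] has four Q20 bases and about 0.12 expected errors, dominated by the single Q10 call.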

5.
ESTWeb is an Internet-based software package designed for uniform data processing and storage in large-scale EST sequencing projects. The package provides: (a) reception of sequencing chromatograms; (b) sequence processing, such as base calling, vector screening and comparison with public databases; (c) storage of data and analyses in a relational database; (d) generation of a graphical report of individual sequence quality; and (e) reports with statistics on productivity and redundancy. The software facilitates real-time monitoring and evaluation of EST sequence acquisition over the course of an EST sequencing project.
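The abstract does not say how ESTWeb computes redundancy. As a minimal sketch, assuming redundancy is reported as the fraction of reads that duplicate a sequence already seen, it could be tallied as below. Real EST pipelines cluster reads by overlap or similarity; treating only identical sequences as redundant is a simplifying assumption, and the function name is hypothetical.

```python
from collections import Counter


def redundancy_report(ests: dict[str, str]) -> dict:
    """Tally redundancy among ESTs, treating only identical sequences as redundant.

    `ests` maps a read ID to its processed (e.g. vector-trimmed) sequence.
    Exact matching stands in for real similarity-based clustering.
    """
    counts = Counter(seq.upper() for seq in ests.values())
    total = len(ests)
    unique = len(counts)
    return {
        "total_reads": total,
        "unique_sequences": unique,
        # Fraction of reads that repeat a sequence already present.
        "redundancy": (total - unique) / total if total else 0.0,
    }
```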

6.
7.

Background  

Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there has been increasing awareness of the importance of predicting and optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds throughout the search process rather than only at the final stages. Fast methods for evaluating the ADMET properties of small molecules often involve applying a set of simple empirical rules (educated guesses), so that property profiling of compound collections can be performed in silico. Clearly, these rules cannot capture the full complexity of the human body, but they can provide valuable information and assist decision-making.
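One widely used set of such empirical rules is Lipinski's rule of five: molecular weight ≤ 500 Da, calculated logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors, with more than one violation suggesting poor oral absorption. The abstract does not name specific rules, so this choice is an assumption. The sketch assumes the descriptors were already computed (for example by a cheminformatics toolkit) and uses a hypothetical input format.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical input format)."""
    name: str
    mol_weight: float  # Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the compound violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def flag_compounds(library: list[Descriptors], max_violations: int = 1) -> list[str]:
    """Return names of compounds with more violations than allowed."""
    return [d.name for d in library if rule_of_five_violations(d) > max_violations]
```

Because each rule is a cheap comparison, a whole compound collection can be profiled this way before any experimental ADMET work.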

8.
GEL: a computer tool for DNA sequencing projects.   Cited by: 1 (self-citations: 0; other citations: 1)
The GEL program for entry and analysis of DNA sequencing information is discussed, and examples of interaction with the program are presented. The current version of the program is the latest of several revisions of the original GEL program, reported previously in this journal (1). Improvements and additions have made the current GEL a particularly useful laboratory tool for molecular biologists engaged in DNA sequencing projects.

9.
Large-scale genomic sequencing projects generally rely on random sequencing of shotgun clones, followed by different gap closing strategies. To reduce the overall effort and cost of those projects and to accelerate the sequencing throughput, we have developed an efficient, high throughput oligonucleotide fingerprinting protocol to select optimal shotgun clone sets prior to sequencing. Both computer simulations and experimental results, obtained from five PAC-derived shotgun libraries spanning 535 kb of the 17p11.2 region of the human genome, demonstrate that at least a 2-fold reduction in the number of sequence reads required to sequence an individual genomic clone (cosmid, PAC, etc.) can be achieved. Treatment of clone contigs with significant clone overlaps will allow an even greater reduction.

10.
The rate-limiting step in a large-scale sequencing project is the generation of single-stranded DNA templates. We describe a fast, semiautomated procedure, using 96-well microtitre plates, in which 192 templates can readily be prepared in 1 day. The technique can be carried out manually or semiautomated using a robotic pipetting device. We also provide evidence for the reliability of this method and its applicability to a large-scale sequencing project.

11.
A quality control algorithm for DNA sequencing projects.   Cited by: 2 (self-citations: 0; other citations: 2)
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries derived from RNA or DNA. Hybridization methods can detect contaminants only from known or suspected heterologous sources, and whole-library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is possible only when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants be present in the database, and it can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets, but it does not appear to be associated with cross-species contamination. However, there is direct evidence of both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed by similarity searches using sequences from the relevant data sets.
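The abstract does not give the test statistic itself. A minimal sketch of the underlying idea, assuming a naive log-likelihood ratio over hexamer frequencies estimated from training sequences of two candidate source organisms, is shown below. It treats overlapping hexamers as independent, which they are not, so it illustrates composition-based discrimination rather than reproducing the authors' test; all names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 6
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 4096 hexamers


def hexamer_counts(seq: str) -> Counter:
    """Count overlapping hexamers, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    return Counter(
        seq[i:i + K] for i in range(len(seq) - K + 1)
        if set(seq[i:i + K]) <= set("ACGT")
    )


def hexamer_model(training_seqs: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Estimate hexamer frequencies for one organism with pseudocount smoothing."""
    counts = Counter()
    for s in training_seqs:
        counts.update(hexamer_counts(s))
    total = sum(counts.values()) + pseudocount * len(ALL_KMERS)
    return {k: (counts[k] + pseudocount) / total for k in ALL_KMERS}


def log_likelihood_ratio(seq: str, model_a: dict, model_b: dict) -> float:
    """Sum of log(P_a / P_b) over the query's hexamers; > 0 favours organism A."""
    return sum(
        n * math.log(model_a[k] / model_b[k])
        for k, n in hexamer_counts(seq).items()
    )
```

A clone whose score strongly favours a non-host model would be a candidate contaminant, which can then be checked by similarity search as the abstract describes.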

12.
The utility of using genomic DNA directly in agarose (i.e. cloneless libraries) in place of large clone libraries, radiation hybrid panels or chromosome dissection was demonstrated. The advantage of the cloneless library approach is that, in principle, a targeted genomic resource can be developed rapidly for any genomic region from any genomic DNA sample. Here, a human chromosome 20 NotI fragment library was generated by slicing into 2 mm pieces a pulsed-field gel lane containing size-fractionated, NotI-cleaved DNA from a monosomic hybrid cell line. A reliable PCR method using agarose-embedded DNA was developed. Inter-Alu PCR generated unique patterns of products from adjacent slices (i.e. fractions). Further, the specificity of the inter-Alu products was demonstrated by FISH analysis and by hybridization to arrayed inter-Alu products. STS content mapping was used to order the fractions and also to demonstrate the unique content of the library fractions.

13.
14.
MOTIVATION: Investigators rely on gap estimates in DNA sequencing projects. Standard theories assume that sequences are independently and identically distributed, which leads to appreciable under-prediction of gaps. RESULTS: Using a statistical scaling factor and data from 20 representative whole-genome shotgun projects, we construct regression equations that relate coverage to a normalized gap measure. For prokaryotic genomes this measure does not correlate with sequence coverage, whereas eukaryotic genomes show a strong correlation if the chaff is ignored. Gaps decrease exponentially, but at a rate only about one-third of that predicted by theory alone. Case studies suggest that the departure from theory can largely be attributed to assembly difficulties in repeat-rich genomes, but bias and coverage anomalies are also important when repeats are sparse. Such factors cannot readily be characterized a priori, which suggests upper limits on the accuracy of gap prediction. We also find that the diminishing coverage probability discussed in other studies is a theoretical artifact that does not arise in typical projects.
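The usual standard theory for shotgun gaps is the Lander-Waterman model: with N reads of length L on a genome of size G, coverage is c = NL/G and the expected number of gaps is approximately N·e^(-c(1-θ)), where θ is the fraction of a read length that must overlap for two reads to be joined. The sketch below computes that estimate. The `rate` parameter is an illustrative assumption: setting it near 1/3 crudely mimics the slower exponential decay the abstract reports, but it is not the authors' regression equations or normalized gap measure.

```python
import math


def lander_waterman_gaps(n_reads: int, read_len: int, genome_size: int,
                         theta: float = 0.0, rate: float = 1.0) -> float:
    """Expected number of gaps under the Lander-Waterman model.

    theta: fraction of a read length that must overlap to join two reads.
    rate:  multiplier on the exponential decay; 1.0 is pure theory. A value
           near 1/3 crudely mimics the slower decay reported empirically
           (illustrative assumption, not a fitted model).
    """
    coverage = n_reads * read_len / genome_size
    return n_reads * math.exp(-rate * coverage * (1 - theta))
```

With 10,000 reads of 500 bp on a 1 Mb genome (c = 5), pure theory predicts about 67 gaps, whereas `rate=1/3` gives about 1,889, which shows how strongly the decay rate drives gap predictions.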

15.
The sequence and genome annotations of Drosophila melanogaster were initially published in late 1999 and early 2000. Since then, the Berkeley Drosophila Genome Project (BDGP) has improved the quality of the sequence and FlyBase has reviewed the annotations by hand, producing an account of the fruit fly genome of the highest quality. This review discusses the main features of this process, covering both the biology revealed by the end result and the development of the software that has been central to this genome sequencing and annotation project.

16.
17.
18.
Setaria genome sequencing: an overview   Cited by: 1 (self-citations: 0; other citations: 1)
The genus Setaria includes two important C4 panicoid grass species, S. italica (foxtail millet, cultivated) and S. viridis (a weed and its wild ancestor), which together represent an appropriate model system for architectural, physiological, evolutionary and genomic studies in related grasses. Both are diploid, inbreeding, self-fertile annual grasses with a short life cycle and minimal growth requirements. Their close relationship to biofuel crops such as switchgrass and napier grass further underlines their importance. In addition, foxtail millet is an important food and fodder grain crop grown in arid and semi-arid regions in many parts of the world. Growing interest in these species has therefore led to a gradual accumulation of genomic data and genetic resources, and Setaria genome sequencing is one outcome of these efforts. The sequencing efforts uncovered several distinctive attributes of the Setaria genome that may help in understanding its physiology, evolution and adaptation. This will not only aid comparative genomics studies of Setaria and related crops, including bioenergy grasses, but also help rapidly advance the genomic information needed to develop varieties with superior traits, either through marker-assisted selection (MAS) or through transgenic approaches in these crops.

19.
A frameshift error detection algorithm for DNA sequencing projects.   Cited by: 2 (self-citations: 1; other citations: 2)
During the determination of DNA sequences, frameshift errors are not the most frequent errors, but they are the most troublesome, because they corrupt the amino acid sequence over several residues. Such errors can be detected by sequence alignment only when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies on the results of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp, with a low rate of false positives (no more than 1.0 per 1000 bp scanned). The proposed algorithm can be used to scan large collections of data, but it is mainly intended for laboratory use as a tool for checking the quality of the sequences produced during a sequencing project.
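The input to such a method is a table of non-overlapping 3-tuple counts in each of the three frames, computed along the ORF. The sketch below builds those per-window, per-frame tables, assuming a sliding window of 120 bp with a 30 bp step (both hypothetical parameters). It covers only 3-tuples, and the correspondence analysis that the abstract says the method relies on is not reproduced here.

```python
from collections import Counter


def frame_codon_counts(orf: str, start: int, end: int) -> list[Counter]:
    """Count non-overlapping 3-tuples in frames 0, 1 and 2 of orf[start:end]."""
    window = orf[start:end].upper()
    frames = []
    for offset in range(3):
        # Step by 3 from the frame offset, keeping only complete triplets.
        codons = (window[i:i + 3] for i in range(offset, len(window) - 2, 3))
        frames.append(Counter(codons))
    return frames


def windowed_frame_profiles(orf: str, window: int = 120, step: int = 30):
    """Yield (window_start, [frame0, frame1, frame2] counts) along the ORF.

    A frameshift tends to appear as a change in which frame carries the
    coding-like composition; detecting that change statistically is left
    to a downstream analysis.
    """
    for start in range(0, max(len(orf) - window, 0) + 1, step):
        yield start, frame_codon_counts(orf, start, start + window)
```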

20.
Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.
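Aligning a sequence contig to an optical map requires the contig's predicted restriction fragment sizes, which can then be compared with the measured fragment sizes in the map. A minimal sketch of that in silico digest follows. The enzyme (BamHI, which cuts G^GATCC) is an illustrative assumption, and the tolerant size matching used in real optical-map alignment is not shown.

```python
import re


def digest_fragment_lengths(seq: str, site: str = "GGATCC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths produced by cutting `seq` at every occurrence of `site`.

    cut_offset is the cut position within the site (BamHI cuts G^GATCC, so 1).
    Only the given strand is scanned; palindromic sites such as BamHI's read
    the same on both strands.
    """
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, a 20 bp contig "AAGGATCCTTTTGGATCCAA" yields fragments of 3, 10 and 7 bp; the ordered list of sizes is what gets matched against the optical map.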


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417