首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
2.
3.
4.
5.
6.
Optimized design and assessment of whole genome tiling arrays   总被引:1,自引:0,他引:1  
MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. RESULTS: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs. AVAILABILITY: Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.  相似文献   

7.
8.
9.
SUMMARY: AnnBuilder is an R package for assembling genomic annotation data. The system currently provides parsers to process annotation data from LocusLink, Gene Ontology Consortium, and Human Gene Project and can be extended to new data sources via user defined parsers. AnnBuilder differs from other existing systems in that it provides users with unlimited ability to assemble data from user selected sources. The products of AnnBuilder are files in XML format that can be easily used by different systems. AVAILABILITY: (http://www.bioconductor.org). Open source.  相似文献   

10.
11.
12.
MOTIVATION: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. RESULTS: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation)and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too. AVAILABILITY: A Fortran program blue called EMMIX-WIRE (EM-based MIXture analysis WIth Random Effects) is available on request from the corresponding author.  相似文献   

13.
14.
Single molecule studies have expanded rapidly over the past decade and have the ability to provide an unprecedented level of understanding of biological systems. A common challenge upon introduction of novel, data-rich approaches is the management, processing, and analysis of the complex data sets that are generated. We provide a standardized approach for analyzing these data in the freely available software package SMART: Single Molecule Analysis Research Tool. SMART provides a format for organizing and easily accessing single molecule data, a general hidden Markov modeling algorithm for fitting an array of possible models specified by the user, a standardized data structure and graphical user interfaces to streamline the analysis and visualization of data. This approach guides experimental design, facilitating acquisition of the maximal information from single molecule experiments. SMART also provides a standardized format to allow dissemination of single molecule data and transparency in the analysis of reported data.  相似文献   

15.
16.
17.
18.
19.
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 X 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号