1.

prot4EST: Translating Expressed Sequence Tags from neglected genomes

James D Wasmuth Mark L Blaxter 《BMC bioinformatics》2004,5(1):187

2.

ESTimating plant phylogeny: lessons from partitioning

Jose EB de la Torre Mary G Egan Manpreet S Katari Eric D Brenner Dennis W Stevenson Gloria M Coruzzi Rob DeSalle 《BMC evolutionary biology》2006,6(1):48-15

Background

While Expressed Sequence Tags (ESTs) have proven a viable and efficient way to sample genomes, particularly those for which whole-genome sequencing is impractical, phylogenetic analysis using ESTs remains difficult. Sequencing errors and orthology determination are the major problems when using ESTs as a source of characters for systematics. Here we develop methods to incorporate EST sequence information in a simultaneous analysis framework to address controversial phylogenetic questions regarding the relationships among the major groups of seed plants. We use an automated, phylogenetically derived approach to orthology determination called OrthologID to generate a phylogeny based on 43 process partitions, many of which are derived from ESTs, and examine several measures of support to assess the utility of EST data for phylogenies.

3.

QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species (total citations: 2; self-citations: 0; citations by others: 2)

Jifeng Tang Ben Vosman Roeland E Voorrips C Gerard van der Linden Jack AM Leunissen 《BMC bioinformatics》2006,7(1):438

Background

Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only.

4.

HaMStR: Profile hidden markov model based search for orthologs in ESTs

Ingo Ebersberger Sascha Strauss Arndt von Haeseler 《BMC evolutionary biology》2009,9(1):157-9

Background

EST sequencing is a versatile approach for rapidly gathering protein coding sequences. ESTs provide direct access to an organism's gene repertoire, bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source for biological sequence data from taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies and particularly in molecular systematics studies is still hindered by the lack of efficient and reliable approaches for automated ortholog predictions in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data.

5.

EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

Javier Forment Francisco Gilabert Antonio Robles Vicente Conejero Fernando Nuez Jose M Blanca 《BMC bioinformatics》2008,9(1):5

Background

Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation.

6.

Using ESTs to improve the accuracy of de novo gene prediction

Chaochun Wei Michael R Brent 《BMC bioinformatics》2006,7(1):327

Background

ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction.

7.

JANE: efficient mapping of prokaryotic ESTs and variable length sequence reads on related template genomes

Chunguang Liang Alexander Schmid María José López-Sánchez Andres Moya Roy Gross Jörg Bernhardt Thomas Dandekar 《BMC bioinformatics》2009,10(1):391

8.

galaxieEST: addressing EST identity through automated phylogenetic analysis

R Henrik Nilsson Balaji Rajashekar Karl-Henrik Larsson Björn M Ursing 《BMC bioinformatics》2004,5(1):87

Background

Research involving expressed sequence tags (ESTs) is intricately coupled to the existence of large, well-annotated sequence repositories. Comparatively complete and satisfactory annotated public sequence libraries are, however, available only for a limited range of organisms, rendering the absence of sequences and gene structure information a tangible problem for those working with taxa lacking an EST or genome sequencing project. Paralogous genes belonging to the same gene family but distinguished by derived characteristics are particularly prone to misidentification and erroneous annotation; high but incomplete levels of sequence similarity are typically difficult to interpret and have formed the basis of many unsubstantiated assumptions of orthology.

9.

A Bayesian nonparametric method for prediction in EST analysis

Antonio Lijoi Ramsés H Mena Igor Prünster 《BMC bioinformatics》2007,8(1):339

Background

Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library.
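To make the notion of a gene discovery rate concrete, the sketch below uses the classical Good-Turing estimator, which approximates the probability that the next read reveals a new gene by the fraction of reads whose gene has been seen exactly once. This is a swapped-in frequentist illustration and not the Bayesian nonparametric method of the paper; the function name and the toy gene labels are hypothetical.

```python
from collections import Counter


def discovery_probability(gene_labels):
    """Good-Turing estimate of the chance that the next read hits an unseen gene.

    gene_labels: one label per EST read (hypothetical cluster or gene IDs).
    Returns n1 / n, where n1 counts genes observed exactly once
    and n is the total number of reads.
    """
    if not gene_labels:
        raise ValueError("need at least one read")
    counts = Counter(gene_labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(gene_labels)


# Toy sample: 8 reads, of which g2, g4 and g5 were each seen once.
sample = ["g1", "g1", "g2", "g3", "g3", "g3", "g4", "g5"]
rate = discovery_probability(sample)
```

A value that stays high as more reads are added suggests the library is far from exhausted, while a value close to zero indicates high redundancy.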

10.

Local alignment of two-base encoded DNA sequence

Nils Homer Barry Merriman Stanley F Nelson 《BMC bioinformatics》2009,10(1):175-11

Background

DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity.
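The deterministic decoding step can be illustrated with a minimal sketch, assuming the common two-base (color-space) scheme in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3) and the first base is known. The function names are hypothetical, and this shows only encoding and decoding, not the alignment algorithm described in the abstract.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode(seq):
    """Return (first base, list of colors) for a base sequence."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence from the known first base and the colors."""
    bases = [first_base]
    for color in colors:
        # Each base follows from the previous base and the observed color.
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


first, colors = encode("ATGGC")       # colors == [3, 1, 0, 3]
clean = decode(first, colors)         # "ATGGC"
# A single miscalled color (second color 1 -> 2) changes every later base.
corrupted = decode(first, [3, 2, 0, 3])  # "ATCCG"
```

The last line shows why measurement errors cannot be ignored: one wrong color alters all downstream bases, so a naive decode followed by ordinary alignment would see a run of apparent mismatches.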

11.

Translational machinery of the chaetognath Spadella cephaloptera: a transcriptomic approach to the analysis of cytosolic ribosomal protein genes and their expression

Roxane M Barthélémy Anne Chenuil Samuel Blanquart Jean-Paul Casanova Eric Faure 《BMC evolutionary biology》2007,7(1):146

Background

Chaetognaths, or arrow worms, are small marine, bilaterally symmetrical metazoans. The objective of this study was to analyse ribosomal protein (RP) coding sequences from a published collection of expressed sequence tags (ESTs) from a chaetognath (Spadella cephaloptera) and to use them in phylogenetic studies.

12.

A high-density consensus map of barley linking DArT markers to SSR, RFLP and STS loci and agricultural traits

Peter Wenzl Haobing Li Jason Carling Meixue Zhou Harsh Raman Edie Paul Phillippa Hearnden Christina Maier Ling Xia Vanessa Caig Jaroslava Ovesná Mehmet Cakir David Poulsen Junping Wang Rosy Raman Kevin P Smith Gary J Muehlbauer Ken J Chalmers Andris Kleinhofs Eric Huttner Andrzej Kilian 《BMC genomics》2006,7(1):1-22

Background

Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project.

Results

We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology.

Conclusion

We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals.

13.

ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

Paola Bonizzoni Raffaella Rizzi Graziano Pesole 《BMC bioinformatics》2005,6(1):244

14.

Quantification and deconvolution of asymmetric LC-MS peaks using the bi-Gaussian mixture model and statistical model selection

Tianwei Yu Hesen Peng 《BMC bioinformatics》2010,11(1):1-10

Background

DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies however observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.

Results

Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.

Conclusions

The novel generalized k-base encoding scheme and resulting local alignment algorithm permit the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.

15.

cDNA2Genome: A tool for mapping and annotating cDNAs

Coral del Val Karl-Heinz Glatting Sandor Suhai 《BMC bioinformatics》2003,4(1):39

16.

Optimal cDNA microarray design using expressed sequence tags for organisms with limited genomic information

Yian A Chen David J Mckillen Shuyuan Wu Matthew J Jenny Robert Chapman Paul S Gross Gregory W Warr Jonas S Almeida 《BMC bioinformatics》2004,5(1):191

Background

Expression microarrays are increasingly used to characterize environmental responses and host-parasite interactions for many different organisms. Probe selection for cDNA microarrays using expressed sequence tags (ESTs) is challenging due to high sequence redundancy and potential cross-hybridization between paralogous genes. In organisms with limited genomic information, like marine organisms, this challenge is even greater due to annotation uncertainty. No general tool is available for cDNA microarray probe selection for these organisms. Therefore, the goal of the design procedure described here is to select a subset of ESTs that will minimize sequence redundancy and characterize potential cross-hybridization while providing functionally representative probes.

17.

Evaluation of next generation sequencing platforms for population targeted sequencing studies

Olivier Harismendy Pauline C Ng Robert L Strausberg Xiaoyun Wang Timothy B Stockwell Karen Y Beeson Nicholas J Schork Sarah S Murray Eric J Topol Samuel Levy Kelly A Frazer 《Genome biology》2009,10(3):R32-13

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.

Conclusions

Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.

18.

OligoSpawn: a software tool for the design of overgo probes from large unigene datasets

Jie Zheng Jan T Svensson Kavitha Madishetty Timothy J Close Tao Jiang Stefano Lonardi 《BMC bioinformatics》2006,7(1):7

Background

Expressed sequence tag (EST) datasets represent perhaps the largest collection of genetic information. ESTs can be exploited in a variety of biological experiments and analyses. Here we are interested in the design of overlapping oligonucleotide (overgo) probes from large unigene (EST-contigs) datasets.

19.

Removing Noise From Pyrosequenced Amplicons (total citations: 2; self-citations: 0; citations by others: 2)

Christopher Quince Anders Lanzen Russell J Davenport Peter J Turnbaugh 《BMC bioinformatics》2011,12(1):38

Background

In many environmental genomics applications a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next generation sequencing technology, 454 pyrosequencing, has allowed much larger read numbers from PCR amplicons than ever before. This has revolutionised the study of microbial diversity as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, there is a growing realisation that because of the large read numbers and the lack of consensus sequences it is vital to distinguish noise from true sequence diversity in this data. Otherwise this leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present. Three sources of error are important: sequencing error, PCR single base substitutions and PCR chimeras. We present AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single base errors. We also introduce a novel chimera removal program, Perseus, that exploits the sequence abundances associated with pyrosequencing data. We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each of the sources of error on OTU inflation and to validate these algorithms.

20.

An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags

Bo-Young Lee Aimee E Howe Matthew A Conte Helena D'Cotta Elodie Pepey Jean-Francois Baroiller Federica di Palma Karen L Carleton Thomas D Kocher 《BMC genomics》2010,11(1):278

Background

Large collections of expressed sequence tags (ESTs) are a fundamental resource for analysis of gene expression and annotation of genome sequences. We generated 116,899 ESTs from 17 normalized and two non-normalized cDNA libraries representing 16 tissues from tilapia, a cichlid fish widely used in aquaculture and biological research.

Results

The ESTs were assembled into 20,190 contigs and 36,028 singletons for a total of 56,218 unique sequences and a total assembled length of 35,168,415 bp. Over the whole project, a unique sequence was discovered for every 2.079 sequence reads. 17,722 (31.5%) of these unique sequences had significant BLAST hits (e-value < 10^-10) to the UniProt database.
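The summary figures in this paragraph follow directly from the reported counts, as the short check below shows. It uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Counts reported in the abstract.
reads = 116_899        # total ESTs sequenced
contigs = 20_190
singletons = 36_028
blast_hits = 17_722    # unique sequences with significant UniProt hits

unique_sequences = contigs + singletons           # 56,218
reads_per_unique = reads / unique_sequences       # about 2.079 reads per unique sequence
hit_percent = 100 * blast_hits / unique_sequences  # about 31.5 %

print(unique_sequences, round(reads_per_unique, 3), round(hit_percent, 1))
```

The reads-per-unique-sequence ratio is a simple redundancy measure: the closer it is to 1, the more efficiently the normalized libraries yielded new sequences.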

Conclusion

Normalization of the cDNA pools with double-stranded nuclease allowed us to efficiently sequence a large collection of ESTs. These sequences are an important resource for studies of gene expression, comparative mapping and annotation of the forthcoming tilapia genome sequence.
