1.
2.
3.
A computational approach to identify genes for functional RNAs in genomic sequences (total citations: 8; self-citations: 3; citations by others: 8)
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80–90% accurate in jackknife testing experiments for bacteria and 90–99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.
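The abstract above does not say exactly which nucleotide-composition features were fed to the networks. As a rough illustration only, the sketch below assumes k-mer frequencies (here dinucleotides by default); the function name `composition_features` and the choice of k are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product


def composition_features(seq, k=2):
    """Return the frequencies of all 4**k DNA k-mers in a fixed order.

    Illustrative assumption: the paper's actual feature set is not given
    in the abstract, so k-mer frequencies stand in for it here.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    features = composition_features("ACGTACGT", k=2)
    print(len(features), round(sum(features), 6))
```

A vector like this could serve as the input to any classifier; the mapping from features to an RNA-gene prediction is outside what the abstract describes.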
4.
Allet N Barrillat N Baussant T Boiteau C Botti P Bougueleret L Budin N Canet D Carraud S Chiappe D Christmann N Colinge J Cusin I Dafflon N Depresle B Fasso I Frauchiger P Gaertner H Gleizes A Gonzalez-Couto E Jeandenans C Karmime A Kowall T Lagache S Mahé E Masselot A Mattou H Moniatte M Niknejad A Paolini M Perret F Pinaud N Ranno F Raimondi S Reffas S Regamey PO Rey PA Rodriguez-Tomé P Rose K Rossellat G Saudrais C Schmidt C Villain M Zwahlen C 《Proteomics》2004,4(8):2333-2351
We present an integrated proteomics platform designed for performing differential analyses. Since reproducible results are essential for comparative studies, we explain how we improved reproducibility at every step of our laboratory processes, e.g. by taking advantage of the powerful laboratory information management system we developed. The differential capacity of our platform is validated by detecting known markers in a real sample and by a spiking experiment. We introduce an innovative two-dimensional (2-D) plot for displaying identification results combined with chromatographic data. This 2-D plot is very convenient for detecting differential proteins. We also adapt standard multivariate statistical techniques to show that peptide identification scores can be used for reliable and sensitive differential studies. The interest of the protein separation approach we generally apply is justified by numerous statistics, complemented by a comparison with a simple shotgun analysis performed on a small volume sample. By introducing an automatic integration step after mass spectrometry data identification, we are able to search numerous databases systematically, including the human genome and expressed sequence tags. Finally, we explain how rigorous data processing can be combined with the work of human experts to set high quality standards, and hence obtain reliable (false positive < 0.35%) and nonredundant protein identifications.
5.
O'Connell K Prencipe M O'Neill A Corcoran C Rani S Henry M Dowling P Meleady P O'Driscoll L Watson W O'Connor R 《Proteomics》2012,12(13):2115-2126
Docetaxel is a taxane-derived chemotherapy drug that has been approved for treatment of prostate cancer. While docetaxel is frequently used as a treatment for hormone-refractory prostate cancer, a subset of patients either do not respond to this treatment or respond initially but eventually become resistant to the drug. Resistance to docetaxel is complex and multifactorial, and further understanding of the cellular biochemistry underlying resistance is vital to improve treatment efficacy. To identify proteins altered in the resistant phenotype, three parental cell lines DU145, 22RV1 and PC-3, as well as their docetaxel resistant sub-lines, were subjected to quantitative label-free LC-MS proteomic profiling. A total of 189 significant (p < 0.05) protein abundance changes were identified in the DU145 resistant sub-lines, 254 in the 22RV1 sub-lines, and 51 and 72 in the 8 and 12 nM resistant PC-3 sub-lines, respectively. From these, 29 proteins demonstrated a significant (p < 0.05) fold change across two or more resistant variants. These included proteins indicative of an epithelial-to-mesenchymal transition as well as altered heat shock response elements.
6.
7.
8.
MOTIVATION: Microarray technology emerges as a powerful tool in life science. One major application of microarray technology is to identify differentially expressed genes under various conditions. Currently, the statistical methods to analyze microarray data are generally unsatisfactory, mainly due to the lack of understanding of the distribution and error structure of microarray data. RESULTS: We develop a generalized likelihood ratio (GLR) test based on the two-component model proposed by Rocke and Durbin to identify differentially expressed genes from microarray data. Simulation studies show that the GLR test is more powerful than commonly used methods, like the fold-change method and the two-sample t-test. When applied to microarray data, the GLR test identifies more differentially expressed genes than the t-test, has a lower false discovery rate and shows more consistency over independently repeated experiments. AVAILABILITY: The approach is implemented in software called GLR, which is freely available for downloading at http://www.cc.utah.edu/~jw27c60
9.
Shay Ben-Elazar Miriam Ragle Aure Kristin Jonsdottir Suvi-Katri Leivonen Vessela N. Kristensen Emiel A. M. Janssen Kristine Kleivi Sahlberg Ole Christian Lingjærde Zohar Yakhini 《PLoS computational biology》2021,17(2)
Different miRNA profiling protocols and technologies introduce differences in the resulting quantitative expression profiles. These include differences in the presence (and measurability) of certain miRNAs. We present and examine a method based on quantile normalization, Adjusted Quantile Normalization (AQuN), to combine miRNA expression data from multiple studies in breast cancer into a single joint dataset for integrative analysis. By pooling multiple datasets, we obtain increased statistical power, surfacing patterns that do not emerge as statistically significant when separately analyzing these datasets. To merge several datasets, as we do here, one needs to overcome both technical and batch differences between these datasets. We compare several approaches for merging and jointly analyzing miRNA datasets. We investigate the statistical confidence for known results and highlight potential new findings that resulted from the joint analysis using AQuN. In particular, we detect several miRNAs to be differentially expressed in estrogen receptor (ER) positive versus ER negative samples. In addition, we identify new potential biomarkers and therapeutic targets for both clinical groups. As a specific example, using the AQuN-derived dataset we detect hsa-miR-193b-5p to have a statistically significant over-expression in the ER positive group, a phenomenon that was not previously reported. Furthermore, as demonstrated by functional assays in breast cancer cell lines, overexpression of hsa-miR-193b-5p in breast cancer cell lines resulted in decreased cell viability in addition to inducing apoptosis. Together, these observations suggest a novel functional role for this miRNA in breast cancer. Packages implementing AQuN are provided for Python and Matlab: https://github.com/YakhiniGroup/PyAQN.
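For orientation, the following is a minimal sketch of plain quantile normalization, the standard technique that AQuN is described as building on. It is not the adjusted variant from the paper and does not use the PyAQN package; the function name `quantile_normalize` and the tie-handling rule are assumptions made for illustration.

```python
def quantile_normalize(samples):
    """Plain quantile normalization (not AQuN).

    samples: list of equal-length lists, one list of expression values per
    sample. Returns new lists in which every sample shares the same
    empirical distribution: the per-rank mean across samples.
    """
    n = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    # Reference distribution: average of the r-th smallest value per sample.
    reference = [sum(col[r] for col in sorted_cols) / len(samples)
                 for r in range(n)]
    normalized = []
    for s in samples:
        # Stable sort, so ties are broken by position (an assumption;
        # other implementations average tied ranks instead).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

This version requires every feature to be measured in every sample, which is exactly the situation the abstract says does not hold across miRNA platforms; handling missing features is the part the adjusted method addresses.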
10.
Zhou Y Cras-Méneur C Ohsugi M Stormo GD Permutt MA 《Bioinformatics (Oxford, England)》2007,23(16):2073-2079
MOTIVATION: Currently most of the methods for identifying differentially expressed genes fall into the category of so called single-gene-analysis, performing hypothesis testing on a gene-by-gene basis. In a single-gene-analysis approach, estimating the variability of each gene is required to determine whether a gene is differentially expressed or not. Poor accuracy of variability estimation makes it difficult to identify genes with small fold-changes unless a very large number of replicate experiments are performed. RESULTS: We propose a method that can avoid the difficult task of estimating variability for each gene, while reliably identifying a group of differentially expressed genes with low false discovery rates, even when the fold-changes are very small. In this article, a new characterization of differentially expressed genes is established based on a theorem about the distribution of ranks of genes sorted by (log) ratios within each array. This characterization of differentially expressed genes based on rank is an example of all-gene-analysis instead of single gene analysis. We apply the method to a cDNA microarray dataset and many low fold-changed genes (as low as 1.3 fold-changes) are reliably identified without carrying out hypothesis testing on a gene-by-gene basis. The false discovery rate is estimated in two different ways reflecting the variability from all the genes without the complications related to multiple hypothesis testing. We also provide some comparisons between our approach and single-gene-analysis based methods. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
11.
12.
13.
Fabrice Bertile Dr. Christine Schaeffer Yvon Le Maho Thierry Raclot Alain Van Dorsselaer 《Proteomics》2009,9(1):148-158
Prolonged fasting is characterized by consecutive phases, a short period of adaptation (phase 1), phase 2 (P2) characterized by fat oxidation, and phase 3 (P3) during which energy requirements are mostly derived from increased protein utilization. At this latter stage, food seeking behavior is induced. Very few circulating biomolecules have been identified that are involved in the response to prolonged fasting. To this end, rat plasma samples were compared by a proteomic approach, using 2‐DE. The results revealed a selective variation of the levels of apolipoprotein A‐IV, A‐I, and E, haptoglobin, transthyretin, plasma retinol binding‐protein, and vitamin D binding‐protein in P2 and P3. The variations in protein levels were confirmed by ELISA. Changes in mRNA levels encoding these proteins did not systematically correlate well with protein concentrations, and tissue‐specific regulation of mRNA expression was observed, underlining the complex metabolic regulation in response to food deprivation. In late fasting, the marked reduction of apolipoprotein A‐IV levels could contribute to the alarm signal that triggers refeeding. The variations of the other differentially expressed proteins are more likely related to lipid metabolism and insulin signaling alterations.
14.
Payseur BA 《Molecular ecology resources》2010,10(5):806-820
Hybrids between species provide information about the evolutionary processes involved in divergence. In addition to creating hybrids in the laboratory, biologists can take advantage of natural hybrid zones to understand the factors that shape gene flow between divergent lineages. In the early stages of speciation, most regions of the genome continue to flow freely between populations. Alternatively, the subset of the genome that confers reproductive barriers between nascent species is expected to reject introgression. Now enabled by advances in genomics, this perspective is motivating detailed comparisons of gene flow across genomic regions in hybrid zones. Here, I review methods for measuring and interpreting introgression at multiple loci in hybrid zones, focusing on the problem of identifying loci that contribute to reproductive isolation. Emerging patterns from multi-locus studies of hybrid zones are highlighted, including remarkable variance in introgression across the genome. Although existing methods have been useful, there is scope for development of new analytical approaches that better connect differential patterns of gene flow in hybrid zones with current knowledge of speciation mechanisms. I outline future prospects for differential introgression studies on a genomic scale.
15.
A modified cDNA subtraction to identify differentially expressed genes from plants with universal application to other eukaryotes (total citations: 5; self-citations: 0; citations by others: 5)
We have designed a simple and efficient polymerase chain reaction (PCR)-based cDNA subtraction protocol for high-throughput cloning of differentially expressed genes from plants that can be applied to any experimental system and as an alternative to DNA chip technology. Sequence-independent PCR-amplifiable first-strand cDNA population was synthesized by priming oligo-dT primer with a defined 5' heel sequence and ligating another specified single-stranded oligonucleotide primer on the 3' ends of first-strand cDNAs by T4 RNA ligase. A biotin label was introduced into the sense strands of cDNA that must be subtracted by using 5' biotinylated forward primer during PCR amplification to immobilize the sense strand onto the streptavidin-linked paramagnetic beads. The unamplified first strand (antisense) of the interrogating cDNA population was hybridized with a large excess of amplified sense strands of control cDNA. We used magnetic bead technology for the efficient removal of common cDNA population after hybridization to reduce the complexity of the cDNA prior to PCR amplification for the enrichment and sequence abundance normalization of differentially expressed genes. Construction of a subtracted and normalized cDNA library efficiently eliminates common abundant cDNA messages and also increases the probability of identifying clones differentially expressed in low-abundance cDNA messages. We used this method to successfully isolate differentially expressed genes from Pennisetum seedlings in response to salinity stress. Sequence analysis of the selected clones showed homologies to genes that were reported previously and shown to be involved in plant stress adaptation.
16.
Human microarrays are readily available, and it would be advantageous if they could be used to study gene expression in other species, such as pigs. The objectives of this research were to validate the use of human microarrays in the analysis of porcine gene expression, to assess the variability of the data generated, and to compare gene expression in boars with different levels of steroidogenesis. Cytochrome b5 (CYB5) expression was used to assess array detection sensitivity. Samples having high or low CYB5 RNA levels were hybridized to microarrays to determine if the known expression difference could be detected. Six hybridizations were conducted using human microarrays containing 3840 total spots representing 1718 characterized human ESTs. To analyze gene expression in boars with different levels of steroidogenesis, testis RNA from four boars with high levels of plasma estrone sulphate was hybridized to testis RNA from four boars with lower levels. Eight microarray hybridizations were conducted including fluor-flips. Self-self hybridizations were also conducted to assess the variability of array experiments. The Cy5 and Cy3 intensity values for each array were normalized using a locally weighted linear regression (LOESS). Statistical significance was assessed using a Student's t-test followed by the Benjamini and Hochberg multiple testing correction procedure. Quantitative real-time PCR (Q-RT-PCR) was used to verify select gene expression differences. The results show that CYB5 was significantly overexpressed in the high CYB5 sample by 1.8 fold (P < 0.05), verifying the known expression difference. The average log2 ratio of the majority of genes (1643) falls within one standard deviation of the mean, indicating the data were reproducible. In the high versus low steroidogenesis experiment, seven genes were significantly overexpressed in the high group (P < 0.05). 
Quantitative real-time PCR was used to validate five genes with the highest fold change, and the results corroborated those found by the microarray experiments. The results of the self-self hybridizations showed that no genes were significantly differentially expressed following the application of the Benjamini and Hochberg multiple testing correction procedure. The results presented in this report show that human arrays can be used for gene expression analysis in pigs.
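The Benjamini and Hochberg correction named in this abstract can be sketched as follows. This is a generic textbook implementation of the step-up adjustment, not the authors' code; the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value p with rank r (1 = smallest) among m tests becomes
    p * m / r, and the results are then made monotone so that a smaller
    raw p-value never gets a larger adjusted value.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the minimum seen so far.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene t-test p-values
    print(benjamini_hochberg(raw))
```

A gene would then be called significant when its adjusted value falls below the chosen false discovery rate, for example 0.05.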
17.
Background
Time-course microarray experiments are being increasingly used to characterize dynamic biological processes. In these experiments, the goal is to identify genes differentially expressed in time-course data, measured between different biological conditions. These differentially expressed genes can reveal the changes in biological process due to the change in condition which is essential to understand differences in dynamics.
18.
Rotter A Hren M Baebler S Blejec A Gruden K 《Omics : a journal of integrative biology》2008,12(3):171-182
Due to the great variety of preprocessing tools in two-channel expression microarray data analysis, it is difficult to choose the most appropriate one for a given experimental setup. In our study, two independent two-channel in-house microarray experiments as well as a publicly available dataset were used to investigate the influence of the selection of preprocessing methods (background correction, normalization, and duplicate spots correlation calculation) on the discovery of differentially expressed genes. Here we show that both the list of differentially expressed genes and the expression values of selected genes depend significantly on the preprocessing approach applied. The choice of normalization method had the highest impact on the results. We propose a simple but efficient approach to increase the reliability of obtained results, where two normalization methods which are theoretically distinct from one another are used on the same dataset. Then the intersection of results, that is, the lists of differentially expressed genes, is used in order to get a more accurate estimation of the genes that were de facto differentially expressed.
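The intersection step proposed in this abstract amounts to keeping only the genes reported under both normalization methods. A minimal sketch, assuming each analysis yields a list of gene identifiers (the function name `consensus_degs` and the gene labels are hypothetical):

```python
def consensus_degs(degs_method_a, degs_method_b):
    """Return genes called differentially expressed under both methods.

    Each argument is an iterable of gene identifiers produced by running
    the same dataset through a different normalization method.
    """
    return sorted(set(degs_method_a) & set(degs_method_b))


if __name__ == "__main__":
    # Hypothetical gene lists from two normalization pipelines.
    print(consensus_degs(["g1", "g2", "g3"], ["g3", "g1", "g4"]))
```

Taking the intersection trades sensitivity for reliability: a gene detected under only one normalization is dropped.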
19.
One of the essential issues in microarray data analysis is to identify differentially expressed genes (DEGs) under different experimental treatments. In this article, a statistical procedure was proposed to identify the DEGs for gene expression data with or without missing observations from microarray experiments with one or two treatment factors. An F statistic based on Henderson method III was constructed to test the significance of differential expression for each gene under different treatment levels. The cutoff P value was adjusted to control the experiment-wise false discovery rate. A human acute leukemia dataset collected from 38 leukemia patients was reanalyzed by the proposed method. In comparison to the results from significance analysis of microarrays (SAM) and microarray analysis of variance (MAANOVA), the proposed method performed similarly to MAANOVA for data with one treatment factor, but MAANOVA cannot directly handle missing data. In addition, a mouse brain dataset collected from six brain regions of two inbred strains (two treatment factors) was reanalyzed to identify genes with distinct region-specific expression patterns. The results showed that the proposed method could identify more distinct region-specific expression patterns than the previous analysis of the same dataset. Moreover, a computer program was developed and incorporated in the software QTModel, which is freely available at .