21.
Background
The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words.
22.
Background
The shape of phylogenetic trees has been used to make inferences about the evolutionary process by comparing the shapes of actual phylogenies with those expected under simple models of the speciation process. Previous studies have focused on speciation events, but gene duplication is another lineage splitting event, analogous to speciation, and gene loss or deletion is analogous to extinction. Measures of the shape of gene family phylogenies can thus be used to investigate the processes of gene duplication and loss. We make the first systematic attempt to use tree shape to study gene duplication using human gene phylogenies.
23.
Ostrowski LE Blackburn K Radde KM Moyer MB Schlatzer DM Moseley A Boucher RC 《Molecular & cellular proteomics : MCP》2002,1(6):451-465
Cilia play an essential role in protecting the respiratory tract by providing the force necessary for mucociliary clearance. Although the major structural components of human cilia have been described, a complete understanding of cilia function and regulation will require identification and characterization of all ciliary components. Estimates from studies of Chlamydomonas flagella predict that an axoneme contains ≥ 250 proteins. To identify all the components of human cilia, we have begun a comprehensive proteomic analysis of isolated ciliary axonemes. Analysis by two-dimensional (2-D) PAGE resulted in a highly reproducible 2-D map consisting of over 240 well resolved components. Individual protein spots were digested with trypsin and sequenced using liquid chromatography/tandem mass spectrometry (LC/MS/MS). Peptide matches were obtained to 38 potential ciliary proteins by this approach. To identify ciliary components not resolved by 2-D PAGE, axonemal proteins were separated on a one-dimensional gel. The gel lane was divided into 45 individual slices, each of which was analyzed by LC/MS/MS. This experiment resulted in peptide matches to an additional 110 proteins. In a third approach, preparations of isolated axonemes were digested with Lys-C, and the resulting peptides were analyzed directly by LC/MS/MS or by multidimensional LC/MS/MS, leading to the identification of a further 66 proteins. Each of the four approaches resulted in the identification of a subset of the proteins present. In total, sequence data were obtained on over 1400 peptides, and over 200 potential axonemal proteins were identified. Peptide matches were also obtained to over 200 human expressed sequence tags. As an approach to validate the mass spectrometry results, additional studies examined the expression of several identified proteins (annexin I, sperm protein Sp17, retinitis pigmentosa protein RP1) in cilia or ciliated cells.
These studies represent the first proteomic analysis of the human ciliary axoneme and have identified many potentially novel components of this complex organelle.
24.
Brian DM Tom Walter R Gilks Elizabeth T Brooke-Powell James W Ajioka 《BMC bioinformatics》2005,6(1):234
Background
A common feature of microarray experiments is the occurrence of missing gene expression data. These missing values occur for a variety of reasons, in particular, because of the filtering of poor quality spots and the removal of undefined values when a logarithmic transformation is applied to negative background-corrected intensities. The efficiency and power of an analysis performed can be substantially reduced by having an incomplete matrix of gene intensities. Additionally, most statistical methods require a complete intensity matrix. Furthermore, biases may be introduced into analyses through missing information on some genes. Thus methods for appropriately replacing (imputing) missing data and/or weighting poor quality spots are required.
25.
Background
The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism.
26.
SMM Verstappen AR Poole M Ionescu LE King M Abrahamowicz DM Hofman JWJ Bijlsma FPJG Lafeber the Utrecht Rheumatoid Arthritis Cohort Study group 《Arthritis research & therapy》2005,8(1):R31
Introduction
The objective of this study was to determine whether serum biomarkers for degradation and synthesis of the extracellular matrix of cartilage are associated with, and can predict, radiographic damage in patients with rheumatoid arthritis (RA).
27.
Koc EC Burkhart W Blackburn K Moyer MB Schlatzer DM Moseley A Spremulli LL 《The Journal of biological chemistry》2001,276(47):43958-43969
Identification of all the protein components of the large subunit (39 S) of the mammalian mitochondrial ribosome has been achieved by carrying out proteolytic digestions of whole 39 S subunits followed by analysis of the resultant peptides by liquid chromatography and mass spectrometry. Peptide sequence information was used to search the human EST data bases and complete coding sequences were assembled. The human mitochondrial 39 S subunit has 48 distinct proteins. Twenty-eight of these are homologs of the Escherichia coli 50 S ribosomal proteins L1, L2, L3, L4, L7/L12, L9, L10, L11, L13, L14, L15, L16, L17, L18, L19, L20, L21, L22, L23, L24, L27, L28, L30, L32, L33, L34, L35, and L36. Almost all of these proteins have homologs in Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae mitochondrial ribosomes. No mitochondrial homologs to prokaryotic ribosomal proteins L5, L6, L25, L29, and L31 could be found either in the peptides obtained or by analysis of the available data bases. The remaining 20 proteins present in the 39 S subunits are specific to mitochondrial ribosomes. Proteins in this group have no apparent homologs in bacterial, chloroplast, archaebacterial, or cytosolic ribosomes. All but two of the proteins have a clear homolog in D. melanogaster while all can be found in the genome of C. elegans. Ten of the 20 mitochondrial-specific 39 S proteins have homologs in S. cerevisiae. Homologs of 2 of these new classes of ribosomal proteins could be identified in the Arabidopsis thaliana genome.
28.
Viktorian Miok Saskia M Wilting Mark A van de Wiel Annelieke Jaspers Paula I van Noort Ruud H Brakenhoff Peter JF Snijders Renske DM Steenbergen Wessel N van Wieringen 《BMC bioinformatics》2014,15(1)
Background
To determine which changes in the host cell genome are crucial for cervical carcinogenesis, a longitudinal in vitro model system of HPV-transformed keratinocytes was profiled in a genome-wide manner. Four cell lines affected with either HPV16 or HPV18 were assayed at 8 sequential time points for gene expression (mRNA) and gene copy number (DNA) using high-resolution microarrays. Available methods for temporal differential expression analysis are not designed for integrative genomic studies.
Results
Here, we present a method that allows for the identification of differential gene expression associated with DNA copy number changes over time. The temporal variation in gene expression is described by a generalized linear mixed model employing low-rank thin-plate splines. Model parameters are estimated with an empirical Bayes procedure, which exploits integrated nested Laplace approximation for fast computation. Iteratively, posteriors of hyperparameters and model parameters are estimated. The empirical Bayes procedure shrinks multiple dispersion-related parameters. Shrinkage leads to more stable estimates of the model parameters, better control of false positives and improvement of reproducibility. In addition, to make estimates of the DNA copy number more stable, model parameters are also estimated in a multivariate way using triplets of features, imposing a spatial prior for the copy number effect.
Conclusion
With the proposed method for analysis of time-course multilevel molecular data, more profound insight may be gained through the identification of temporal differential expression induced by DNA copy number abnormalities. In particular, in the analysis of an integrative oncogenomics study with a time-course set-up our method finds genes previously reported to be involved in cervical carcinogenesis. Furthermore, the proposed method yields improvements in sensitivity, specificity and reproducibility compared to existing methods. Finally, the proposed method is able to handle count (RNAseq) data from time course experiments as is shown on a real data set.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-327) contains supplementary material, which is available to authorized users.
29.
30.
Ribosomal RNA secondary structure: compensatory mutations and implications for phylogenetic analysis (total citations: 6; self-citations: 0; citations by others: 6)
Using sequence data from the 28S ribosomal RNA (rRNA) genes of selected
vertebrates, we investigated the effects that constraints imposed by
secondary structure have on the phylogenetic analysis of rRNA sequence
data. Our analysis indicates that characters from both base-pairing regions
(stems) and non-base-pairing regions (loops) contain phylogenetic
information, as judged by the level of support of the phylogenetic results
compared with a well-established tree based on both morphological and
molecular data. The best results (the greatest level of support of
well-accepted nodes) were obtained when the complete data set was used.
However, some previously supported nodes were resolved using either the
stem or loop bases alone. Stem bases sustain a greater number of
compensatory mutations than would be expected at random, but the number is
< 40% of that expected under a hypothesis of perfect compensation to
maintain secondary structure. Therefore, we suggest that in phylogenetic
analyses, the weighting of stem characters be reduced by no more than 20%,
relative to that of loop characters. In contrast to previous suggestions,
we do not recommend weighting of stem positions by one-half, compared with
that of loop positions, because this overcompensates for the constraints
that selection imposes on the secondary structure of rRNA.
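
The weighting recommendation in this abstract can be illustrated with a small sketch. It is not the authors' procedure: the function name `weighted_distance`, the 'S'/'L' structure mask, and the default stem weight of 0.8 (the largest reduction the abstract allows, 20% relative to loops) are assumptions introduced here for illustration only.

```python
def weighted_distance(seq_a, seq_b, structure, stem_weight=0.8, loop_weight=1.0):
    """Weighted mismatch count between two aligned rRNA sequences.

    `structure` is a hypothetical mask with one character per alignment
    column: 'S' for a stem (base-pairing) column, anything else for a loop.
    A stem mismatch contributes `stem_weight`; a loop mismatch contributes
    `loop_weight`.
    """
    if not (len(seq_a) == len(seq_b) == len(structure)):
        raise ValueError("sequences and structure mask must have equal length")
    total = 0.0
    for a, b, s in zip(seq_a, seq_b, structure):
        if a != b:
            # Down-weight differences in stems, where compensatory
            # mutations make sites partly non-independent.
            total += stem_weight if s == "S" else loop_weight
    return total
```

For two aligned sequences that differ at one stem column and one loop column, the default weights give a distance of 0.8 + 1.0, whereas the half-weighting the abstract argues against would give 0.5 + 1.0.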