1.
2.
Relative apparent synapomorphy analysis (RASA). I: The statistical measurement of phylogenetic signal (cited 10 times: 9 self-citations, 1 by others)
We have developed a new approach to the measurement of phylogenetic signal
in character state matrices called relative apparent synapomorphy analysis
(RASA). RASA provides a deterministic, statistical measure of natural
cladistic hierarchy (phylogenetic signal) in character state matrices. The
method works by determining whether a measure of the rate of increase of
cladistic similarity among pairs of taxa as a function of phenetic
similarity is greater than a null equiprobable rate of increase. Our
investigation of the utility and limitations of RASA using simulated and
bacteriophage T7 data sets indicates that the method has numerous
advantages over existing measures of signal. A first advantage is
computational efficiency. A second advantage is that RASA employs known
methods of statistical inference, providing measurable sensitivity and
power. The performance of RASA is examined under various conditions of
branching evolution as the number of characters, character states per
character, and mutations per branch length are varied. RASA appears to
provide an unbiased and reliable measure of phylogenetic signal, and the
general approach promises to be useful in the development of new techniques
that should increase the rigor and reliability of phylogenetic estimates.
3.
Optimal outgroup analysis (cited 8 times: 0 self-citations, 8 by others)
James Lyons-Weiler Guy A. Hoelzer Robin J. Tausch 《Biological journal of the Linnean Society. Linnean Society of London》1998,64(4):493-511
We present and critically examine a statistical criterion for the selection of outgroup taxa for rooting evolutionary trees. The criterion is the amount of phylogenetic signal for the ingroup when the states of the candidate outgroup taxa are assumed to be plesiomorphic relative to the ingroup for the purpose of measuring plesiomorphy content of the outgroup taxon. A statistical measure of rooted, ingroup signal was subjected to a suite of critical tests which indicate that it provides a proxy measure of plesiomorphy content. As the evolutionary distance between the ingroup ancestral node and outgroup taxa increases, the tree-independent measure of signal decreases, tracking the decay in plesiomorphy content and the increase in convergence to the ingroup states. We show that a priori generalizations about optimal outgroup taxon sampling strategies are likely to be misleading, and that testing for the suitability of available outgroup taxon sampling in specific instances is warranted. Software for optimal outgroup analysis is available.
4.
Evolutionary origin, diversification and specialization of eukaryotic MutS homolog mismatch repair proteins (cited 11 times: 2 self-citations, 9 by others)
Most eubacteria, and all eukaryotes examined thus far, encode homologs of the DNA mismatch repair protein MutS. Although eubacteria encode only one or two MutS-like proteins, eukaryotes encode at least six distinct MutS homolog (MSH) proteins, corresponding to conserved (orthologous) gene families. This suggests evolution of individual gene family lines of descent by several duplication/specialization events. Using quantitative phylogenetic analyses (RASA, or relative apparent synapomorphy analysis), we demonstrate that comparison of complete MutS protein sequences, rather than highly conserved C-terminal domains only, maximizes information about evolutionary relationships. We identify a novel, highly conserved middle domain, and clearly delineate an N-terminal domain, previously implicated in mismatch recognition, that shows family-specific patterns of aromatic and charged amino acids. Our final analysis, in contrast to previous analyses of MutS-like sequences, yields a stable phylogenetic tree that is consistent with the known biochemical functions of MutS/MSH proteins and that assigns all known eukaryotic MSH proteins to a monophyletic group whose branches correspond to the respective specialized gene families. The rooted phylogenetic tree suggests their derivation from a mitochondrial MSH1-like protein, itself the descendant of the MutS of a symbiont in a primitive eukaryotic precursor.
5.
A potential limitation of data from microarray experiments arises when improper control samples are used. In cancer research, comparing tumour expression profiles with those from normal samples is challenging because of tissue heterogeneity (mixed cell populations). A specific example exists in a published colon cancer dataset, in which tissue heterogeneity was reported among the normal samples. In this paper, we show how to overcome or avoid the problem of using normal samples that do not derive from the same tissue of origin as the tumour. We advocate an exploratory unsupervised bootstrap analysis that can reveal unexpected and undesired, but strongly supported, clusters of samples that reflect tissue differences instead of tumour versus normal differences. All of the algorithms used in the analysis, including the maximum difference subset algorithm, unsupervised bootstrap analysis, pooled variance t-test for finding differentially expressed genes and the jackknife to reduce false positives, are incorporated into our online Gene Expression Data Analyzer (http://bioinformatics.upmc.edu/GE2/GEDA.html).
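As an illustration of the pooled variance t-test named in this abstract, the following is a minimal sketch of the standard equal-variance two-sample t statistic for a single gene. It is not the Gene Expression Data Analyzer implementation; the function name `pooled_variance_t` and the sample values are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test
    that assumes both groups share one common variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Weighted average of the two sample variances (the pooled variance).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both groups are constant; t is undefined")
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical expression values for one gene in normal and tumour samples.
t_stat, dof = pooled_variance_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In a differential-expression setting this statistic would be computed once per gene and the genes ranked by its absolute value; the jackknife step mentioned in the abstract is not shown here.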
6.
Pelikan R Bigbee WL Malehorn D Lyons-Weiler J Hauskrecht M 《Bioinformatics (Oxford, England)》2007,23(22):3065-3072
MOTIVATION: The 'reproducibility' of mass spectrometry proteomic profiling has become an intensely controversial topic. The mere mention of concern over the 'reproducibility' of data generated from any particular platform can lead to anxiety over the generalizability of its results and its role in the future of discovery proteomics. In this study, we examine the reproducibility of proteomic profiles generated by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) across multiple data-generation sessions. We analyze the problem in terms of the reproducibility of signals, reproducibility of discriminative features and reproducibility of multivariate classification models on profiles for serum samples from early lung cancer and healthy control subjects. RESULTS: Proteomic profiles in individual data-generation sessions experience within-session variability. We show that combining data from multiple sessions introduces additional (inter-session) noise. While additional noise can affect the discriminative analysis, we show that its average effect on profiles in our study is relatively small. Moreover, for the purposes of prediction on future (previously unseen) data, classifiers trained on multi-session data are able to adapt to inter-session noise and improve their classification accuracy.
7.
8.
Jon M. Davison Melissa Yee J. Michael Krill-Burger Maureen A. Lyons-Weiler Lori A. Kelly Christin M. Sciulli Katie S. Nason James D. Luketich George K. Michalopoulos William A. LaFramboise 《PloS one》2014,9(1)
Background
Prognostic biomarkers are needed for superficial gastroesophageal adenocarcinoma (EAC) to predict clinical outcomes and select therapy. Although recurrent mutations have been characterized in EAC, little is known about their clinical and prognostic significance. Aneuploidy is predictive of clinical outcome in many malignancies but has not been evaluated in superficial EAC.
Methods
We quantified copy number changes in 41 superficial EAC using Affymetrix SNP 6.0 arrays. We identified recurrent chromosomal gains and losses and calculated the total copy number abnormality (CNA) count for each tumor as a measure of aneuploidy. We correlated CNA count with overall survival and time to first recurrence in univariate and multivariate analyses.
Results
Recurrent segmental gains and losses involved multiple genes, including: HER2, EGFR, MET, CDK6, KRAS (recurrent gains); and FHIT, WWOX, CDKN2A/B, SMAD4, RUNX1 (recurrent losses). There was a 40-fold variation in CNA count across all cases. Tumors with the lowest and highest quartile CNA count had significantly better overall survival (p = 0.032) and time to first recurrence (p = 0.010) compared to those with intermediate CNA counts. These associations persisted when controlling for other prognostic variables.
Significance
SNP arrays facilitate the assessment of recurrent chromosomal gain and loss and allow high resolution, quantitative assessment of segmental aneuploidy (total CNA count). The non-monotonic association of segmental aneuploidy with survival has been described in other tumors. The degree of aneuploidy is a promising prognostic biomarker in a potentially curable form of EAC.
9.
Background
The earliest fossil evidence of terrestrial animal activity is from the Ordovician, ~450 million years ago (Ma). However, there are earlier animal fossils, and most molecular clocks suggest a deep origin of animal phyla in the Precambrian, leaving open the possibility that animals colonized land much earlier than the Ordovician. To further investigate the time of colonization of land by animals, we sequenced two nuclear genes, glyceraldehyde-3-phosphate dehydrogenase and enolase, in representative arthropods and conducted phylogenetic and molecular clock analyses of those and other available DNA and protein sequence data. To assess the robustness of animal molecular clocks, we estimated the deuterostome-arthropod divergence using the arthropod fossil record for calibration and tunicate instead of vertebrate sequences to represent Deuterostomia. Nine nuclear and 15 mitochondrial genes were used in phylogenetic analyses and 61 genes were used in molecular clock analyses.
10.
Hauskrecht M Pelikan R Malehorn DE Bigbee WL Lotze MT Zeh HJ Whitcomb DC Lyons-Weiler J 《Applied bioinformatics》2005,4(4):227-246
BACKGROUND: Proteomic peptide profiling is an emerging technology harbouring great expectations to enable early detection, enhance diagnosis and more clearly define prognosis of many diseases. Although previous research work has illustrated the ability of proteomic data to discriminate between cases and controls, significantly less attention has been paid to the analysis of feature selection strategies that enable learning of such predictive models. Feature selection, in addition to classification, plays an important role in successful identification of proteomic biomarker panels. METHODS: We present a new, efficient, multivariate feature selection strategy that extracts useful feature panels directly from the high-throughput spectra. The strategy takes advantage of the characteristics of surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF-MS) profiles and enhances widely used univariate feature selection strategies with a heuristic based on multivariate de-correlation filtering. We analyse and compare two versions of the method: one in which all feature pairs must adhere to a maximum allowed correlation (MAC) threshold, and another in which the feature panel is built greedily by deciding among best univariate features at different MAC levels. RESULTS: The analysis and comparison of feature selection strategies was carried out experimentally on the pancreatic cancer dataset with 57 cancers and 59 controls from the University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania, USA. The analysis was conducted in both the whole-profile and peak-only modes. The results clearly show the benefit of the new strategy over univariate feature selection methods in terms of improved classification performance. CONCLUSION: Understanding the characteristics of the spectra allows us to better assess the relative importance of potential features in the diagnosis of cancer. 
Incorporation of these characteristics into feature selection strategies often leads to a more efficient data analysis as well as improved classification performance.
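The first version of the method described in this abstract, in which every pair of selected features must stay within a maximum allowed correlation (MAC) threshold, might be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the names `pearson` and `select_with_mac`, the use of Pearson correlation, and the toy data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_with_mac(features, scores, mac, panel_size):
    """Greedily build a feature panel.

    features: dict mapping feature name -> values across samples
    scores:   dict mapping feature name -> univariate score (higher is better)
    mac:      maximum allowed absolute correlation between any two panel members
    """
    ranked = sorted(features, key=lambda name: scores[name], reverse=True)
    panel = []
    for name in ranked:
        if len(panel) == panel_size:
            break
        # Keep the candidate only if it is sufficiently de-correlated
        # from every feature already in the panel.
        if all(abs(pearson(features[name], features[kept])) <= mac
               for kept in panel):
            panel.append(name)
    return panel


# Toy data: "b" nearly duplicates "a", so it is skipped in favour of "c".
toy_features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [1.0, -1.0, 1.0, -1.0],
}
toy_scores = {"a": 3.0, "b": 2.0, "c": 1.0}
panel = select_with_mac(toy_features, toy_scores, mac=0.8, panel_size=2)
```

The second version mentioned in the abstract, which chooses among the best univariate features at several MAC levels, would need a further selection criterion that the abstract does not specify, so it is not sketched here.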