Similar Documents
20 similar documents found.
1.

Introduction

The state of the art for handling multiple levels of relationship among samples in genome-wide association studies (GWAS) is unified mixed model analysis (MMA). This approach is very flexible, can be applied to both family-based and population-based samples, and can be extended to incorporate other effects in a straightforward and rigorous fashion. Here, we present a complementary approach, called ‘GENMIX’ (genealogy-based mixed model), which combines the advantages of two powerful GWAS methods: genealogy-based haplotype grouping and MMA.

Subjects and Methods

We validated GENMIX using genotyping data from Danish Jersey cattle with simulated phenotypes, and compared it to MMA. We simulated scenarios for three levels of heritability (0.21, 0.34, and 0.64), seven levels of MAF (0.05, 0.10, 0.15, 0.20, 0.25, 0.35, and 0.45) and five levels of QTL effect (0.1, 0.2, 0.5, 0.7 and 1.0, in phenotypic standard deviation units). Each of the 105 possible combinations (3 h² × 7 MAF × 5 effects) was replicated 25 times.
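The 105 scenario combinations described above form a full factorial design; a minimal Python sketch (with the levels taken from the abstract) enumerates them:

```python
from itertools import product

heritabilities = [0.21, 0.34, 0.64]                # 3 levels of h2
mafs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.45]  # 7 levels of MAF
qtl_effects = [0.1, 0.2, 0.5, 0.7, 1.0]            # 5 effects, in phenotypic SD units

# Every (h2, MAF, effect) combination; each was replicated 25 times.
scenarios = list(product(heritabilities, mafs, qtl_effects))
print(len(scenarios))  # 3 x 7 x 5 = 105
```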

Results

GENMIX provides a better ranking of markers close to the causative locus's location. GENMIX outperformed MMA when the QTL effect was small and the MAF at the QTL was low. In scenarios where the MAF was high or the QTL had a large effect on the trait, GENMIX and MMA performed similarly.

Conclusion

In discovery studies, where high-ranking markers are identified and later examined in validation studies, we therefore expect GENMIX to enrich the set of candidates brought to follow-up with true positives over false positives more than MMA would.

2.

Background

Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data.

Results

We found that the "filtering" and "combined reference" strategies performed better than aligning reads directly to the human reference in terms of alignment and variant-calling accuracy. The combined reference strategy was particularly good at reducing false negative variant calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small to be cost-effective, but it proved crucial when false non-synonymous SNVs must be minimized, especially in exome sequencing.

Conclusions

Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1172) contains supplementary material, which is available to authorized users.

3.

Background

While the importance of record linkage is widely recognised, few studies have attempted to quantify how linkage errors may have impacted on their own findings and outcomes. Even where authors of linkage studies have attempted to estimate sensitivity and specificity based on subjects with known status, the effects of false negatives and positives on event rates and estimates of effect are not often described.

Methods

We quantify the effect of the sensitivity and specificity of the linkage process on event rates and incidence, as well as the resulting effect on relative risks. We then derive formulae to estimate the true number of events and the relative risk, adjusted for a given linkage sensitivity and specificity, and apply them to data from a prisoner mortality study. The implications of false positive and false negative matches are also discussed.
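The abstract does not reproduce the derived formulae, but the standard misclassification adjustment it describes can be sketched as follows. The example numbers are hypothetical, not taken from the prisoner mortality study:

```python
def true_events(observed, n, sensitivity, specificity):
    """Estimate the true number of events from the observed (linked) count.

    Observed links = true*Se + (n - true)*(1 - Sp); solving for `true`
    gives the adjustment below. Assumes Se + Sp > 1.
    """
    return (observed - n * (1 - specificity)) / (sensitivity + specificity - 1)

# Hypothetical numbers: 120 linked deaths among 10,000 subjects, with
# linkage sensitivity 0.95 and specificity 0.999.
est = true_events(120, 10_000, 0.95, 0.999)
print(round(est))  # ~116: the ~10 false-positive links inflate the crude count
```

Note how even a specificity of 0.999 produces a visible upward bias in the crude count when true incidence is low, consistent with the abstract's conclusion that high specificity matters most.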

Discussion

Comparisons of the effect of sensitivity and specificity on incidence and relative risks indicate that it is more important for linkages to be highly specific than highly sensitive, particularly if true incidence rates are low. We recommend that, where possible, quantitative estimates of the sensitivity and specificity of the linkage process be obtained, so that the effect of these quantities on observed results can be assessed.

4.

Background

The widespread reluctance to share published research data is often hypothesized to stem from the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.

Methods and Findings

We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.

Conclusions

Our findings, based on psychology papers, suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

5.

Background

Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles.

Results

We introduce seqCNA, a parallelized R package for integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology for (i) filtering, which reduces false positives, and (ii) GC content correction, which improves copy number profile quality, especially at high read coverage and high correlation between GC content and copy number. Appropriate analysis steps are chosen automatically based on the availability of paired-end mapping, matched normal samples and genome annotation.

Conclusions

seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-178) contains supplementary material, which is available to authorized users.

6.

Background

Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. One of the principal limitations in minimotif prediction is that false positives limit the usefulness of this approach. As a step toward resolving this problem we have built, implemented, and tested a new data-driven algorithm that reduces false-positive predictions.

Methodology/Principal Findings

Certain domains and minimotifs are known to be strongly associated with a known cellular process or molecular function. We therefore hypothesized that by restricting minimotif predictions to those where the minimotif-containing protein and the target protein share a related cellular or molecular function, the prediction is more likely to be accurate. This filter was implemented in Minimotif Miner using function annotations from the Gene Ontology. We also combined two filters based on entirely different principles; this combined filter has better predictability than either of its components.

Conclusions/Significance

Testing these functional filters on known and random minimotifs revealed that they are capable of separating true motifs from false positives. In particular, for the cellular function filter, the percentage of known minimotifs retained by the filter is ∼4.6 times that of random minimotifs; for the molecular function filter this ratio is ∼2.9. These results, together with the comparison with the published frequency score filter, strongly suggest that the new filters differentiate true motifs from random background with good confidence. A combination of the function filters and the frequency score filter performs better than either of these filters individually.

7.

Background

Array comparative genomic hybridization (aCGH) to detect copy number variants (CNVs) in mammalian genomes has led to a growing awareness of the potential importance of this category of sequence variation as a cause of phenotypic variation. Yet there are large discrepancies between studies, so that the extent of the genome affected by CNVs is unknown. We combined molecular and aCGH analyses of CNVs in inbred mouse strains to investigate this question.

Principal Findings

Using a 2.1 million probe array we identified 1,477 deletions and 499 gains in 7 inbred mouse strains. Molecular characterization indicated that approximately one third of the CNVs detected by the array were false positives and we estimate the false negative rate to be more than 50%. We show that low concordance between studies is largely due to the molecular nature of CNVs, many of which consist of a series of smaller deletions and gains interspersed by regions where the DNA copy number is normal.

Conclusions

Our results indicate that CNVs detected by arrays may be the coincidental co-localization of smaller CNVs, whose presence is more likely to perturb an aCGH hybridization profile than the effect of an isolated, small, copy number alteration. Our findings help explain the hitherto unexplored discrepancies between array-based studies of copy number variation in the mouse genome.

8.

Background

HIV-related outcomes may be affected by biological sex and by pregnancy. Including women in general and pregnant women in particular in HIV-related research is important for generalizability of findings.

Objective

To characterize representation of pregnant and non-pregnant women in HIV-related research conducted in general populations.

Data Sources

All HIV-related articles published in fifteen journals from January to March of 2011. We selected the top five journals by 2010 impact factor, in internal medicine, infectious diseases, and HIV/AIDS.

Study Eligibility Criteria

HIV-related studies reporting original research on questions applicable to both men and women of reproductive age were considered; studies were excluded if they did not include individual-level patient data.

Study Appraisal and Synthesis Methods

Articles were doubly reviewed and abstracted; discrepancies were resolved through consensus. We recorded proportion of female study participants, whether pregnant women were included or excluded, and other key factors.

Results

In total, 2014 articles were published during this period. After screening, 259 articles were included as original HIV-related research reporting individual-level data; of these, 226 were determined to be articles relevant to both men and women of reproductive age. In these articles, women were adequately represented within geographic region. The vast majority of published articles, 183/226 (81%), did not mention pregnancy (or related issues); still fewer included pregnant women (n=33), reported numbers of pregnant women (n=19), or analyzed using pregnancy status (n=9).

Limitations

Data were missing for some key variables, including pregnancy. The time period over which published works were evaluated was relatively short.

Conclusions and Implications of Key Findings

The under-reporting and inattention to pregnancy in the HIV literature may reduce policy-makers’ ability to set evidence-based policy around HIV/AIDS care for pregnant women and women of child-bearing age.

9.

Background

Multiple studies have shown that the exercise electrocardiogram (ECG) is less accurate for predicting ischemia, especially in women, and there is additional evidence to suggest that heart size may affect its diagnostic accuracy.

Hypothesis

The purpose of this investigation was to assess the diagnostic accuracy of the exercise ECG based on heart size.

Methods

We evaluated 1,011 consecutive patients who were referred for an exercise nuclear stress test. Patients were divided into two groups: small heart size, defined as left ventricular end-diastolic volume (LVEDV) <65 mL (Group A), and normal heart size, defined as LVEDV ≥65 mL (Group B). Associations between ECG outcome (false positive vs. no false positive) and heart size (small vs. normal) were analyzed using the chi-square test for independence with a Yates continuity correction. LVEDV calculations were performed via a computer-processing algorithm. SPECT myocardial perfusion imaging was used as the gold standard for the presence of coronary artery disease (CAD).
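As a rough illustration of the statistical test described above, a chi-square test of independence with a Yates continuity correction on a 2×2 table can be run as follows. The counts here are hypothetical, not the study's raw data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = heart size (small, normal),
# columns = exercise-ECG outcome (false positive, no false positive).
table = np.array([[30, 112],
                  [120, 749]])

# correction=True applies the Yates continuity correction (the default for 2x2).
chi2, p, dof, expected = chi2_contingency(table, correction=True)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
```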

Results

Small heart size was found in 142 patients, 123 female and 19 male patients. There was a significant association between ECG outcome and heart size (χ2 = 4.7, p = 0.03), where smaller hearts were associated with a significantly greater number of false positives.

Conclusions

This study suggests a possible explanation for the poor diagnostic accuracy of exercise stress testing, especially in women, as the overwhelming majority of patients with small heart size were women.

10.

Background

Determine HIV Combo (DHC) is the first point of care assay designed to increase sensitivity in early infection by detecting both HIV antibody and antigen. We conducted a large multi-centre evaluation of DHC performance in Sydney sexual health clinics.

Methods

We compared DHC performance (overall, by test component and in early infection) with conventional laboratory HIV serology (fourth generation screening immunoassay, supplementary HIV antibody, p24 antigen and Western blot tests) when testing gay and bisexual men attending four clinic sites. Early infection was defined as either acute or recent HIV infection acquired within the last six months.

Results

Of 3,190 evaluation specimens, 39 were confirmed as HIV-positive (12 with early infection) and 3,133 were HIV-negative by reference testing. DHC sensitivity was 87.2% overall and 94.4% and 0% for the antibody and antigen components, respectively. Sensitivity in early infection was 66.7% (all DHC antibody reactive) and the DHC antigen component detected none of nine HIV p24 antigen positive specimens. Median HIV RNA was higher in false negative than true positive cases (238,025 vs. 37,591 copies/ml; p = 0.022). Specificity overall was 99.4% with the antigen component contributing to 33% of false positives.

Conclusions

The DHC antibody component detected two thirds of those with early infection, while the DHC antigen component did not enhance performance during point of care HIV testing in a high risk clinic-based population.

11.
Phylogenetic analysis reveals a scattered distribution of autumn colours

Background and Aims

Leaf colour in autumn is rarely considered informative for taxonomy, but there is now growing interest in the evolution of autumn colours and different hypotheses are debated. Research efforts are hindered by the lack of basic information: the phylogenetic distribution of autumn colours. It is not known when and how autumn colours evolved.

Methods

Data are reported on the autumn colours of 2368 tree species belonging to 400 genera of the temperate regions of the world, and an analysis is made of their phylogenetic relationships in order to reconstruct the evolutionary origin of red and yellow in autumn leaves.

Key Results

Red autumn colours are present in at least 290 species (70 genera), and evolved independently at least 25 times. Yellow is present independently from red in at least 378 species (97 genera) and evolved at least 28 times.

Conclusions

The phylogenetic reconstruction suggests that autumn colours have been acquired and lost many times during evolution. This scattered distribution could be explained by hypotheses involving some kind of coevolutionary interaction or by hypotheses that rely on the need for photoprotection.

Key words: Autumn colour, leaf colour, comparative analysis, coevolution, photoprotection, phylogenetic analysis

12.

Background

A laboratory-free test for assessing recovery from pulmonary tuberculosis (TB) would be extremely beneficial in regions of the world where laboratory facilities are lacking. Our hypothesis is that analysis of cough sound recordings may provide such a test. In the current paper, we present validation of a cough analysis tool.

Methodology/Principal Findings

Cough data were collected from a cohort of TB patients in Lima, Peru, and 25.5 hours of recordings were manually annotated by clinical staff. Analysis software was developed and validated by comparison to manual scoring. Because many patients cough in bursts, coughing was characterized in terms of cough epochs. Our software correctly detects 75.5% of cough epochs with a specificity of 99.6% (comparable to past results using the same definition) and a median false positive rate of 4 false positives per hour, owing to the noisy, real-world nature of our dataset. We then manually review detected coughs to eliminate false positives, in effect using the algorithm as a pre-screening tool that reduces reviewing time to roughly 5% of the recording length. This cough analysis approach provides a foundation to support larger-scale studies of coughing rates over time for TB patients undergoing treatment.

13.

Introduction

We present results from a methodology for VMAT commissioning and quality assurance that utilizes both control point tests and dosimetric measurements.

Methods and Materials

A generalizable phantom measurement approach is used to characterize the accuracy of the measurement system. Correction for the angular response of the measurement system and inclusion of couch structures are used to characterize the full range of gantry angles desirable for clinical plans. A dose-based daily QA measurement approach is defined.

Results

Agreement in the static vs. VMAT picket fence control point test was better than 0.5 mm. Control point tests varying gantry rotation speed, leaf speed and dose rate demonstrated agreement with predicted values better than 1%. The angular dependence of the MatriXX array varied over a range of 0.94–1.06 with respect to the calibration condition. Phantom measurements demonstrated central axis dose accuracy for un-modulated four-field box plans of ≥2.5% vs. 1% with and without angular correction, respectively, with better results for VMAT (0.4%) than for IMRT (1.6%) plans. Daily QA results demonstrated average agreement for all three chambers within 0.4% over a 9-month period, with no false positives at a 3% threshold.

Discussion

The methodology described is simple in design and characterizes both the inherent limitations of the measurement system and the dose-based measurements that may be directly related to patient plan QA.

14.

Background

As schizophrenia is genetically and phenotypically heterogeneous, targeting genetically informative phenotypes may help identify greater linkage signals. The aim of the study is to evaluate the genetic linkage evidence for schizophrenia in subsets of families with earlier age at onset or greater neurocognitive deficits.

Methods

Patients with schizophrenia (n  =  1,207) and their first-degree relatives (n  =  1,035) from 557 families with schizophrenia were recruited from six data collection field research centers throughout Taiwan. Subjects completed a face-to-face semi-structured interview, the Continuous Performance Test (CPT), the Wisconsin Card Sorting Test, and were genotyped with 386 microsatellite markers across the genome.

Results

A maximum nonparametric logarithm of odds (LOD) score of 4.17 at 2q22.1 was found in 295 families ranked by increasing age at onset, a significant increase over the maximum LOD scores obtained in the initial linkage analyses using all available families. Based on this subset, further subsetting by false alarm rate on the undegraded and degraded CPT yielded further increases in the nested subset-based LOD score on 2q22.1: 7.36 in 228 families and 7.71 in 243 families, respectively.

Conclusion

We found possible evidence of linkage on chromosome 2q22.1 in families of schizophrenia patients with higher CPT false alarm rates, nested within the families with younger age at onset. These results highlight the importance of incorporating genetically informative phenotypes in unraveling the complex genetics of schizophrenia.

15.

Background

Clinicians are sometimes advised to make decisions using thresholds in measured variables, derived from prognostic studies.

Objectives

We studied why there are conflicting apparently-optimal prognostic thresholds, for example in exercise peak oxygen uptake (pVO2), ejection fraction (EF), and Brain Natriuretic Peptide (BNP) in heart failure (HF).

Data Sources and Eligibility Criteria

Studies testing pVO2, EF or BNP prognostic thresholds in heart failure, published between 1990 and 2010 and listed on PubMed.

Methods

First, we examined studies testing pVO2, EF or BNP prognostic thresholds. Second, we created repeated simulations of 1500 patients to identify whether an apparently-optimal prognostic threshold indicates a step change in risk.

Results

33 studies (8946 patients) tested a pVO2 threshold. 18 found it prognostically significant: the reported threshold ranged widely (10–18 ml/kg/min) but was overwhelmingly controlled by the individual study population's mean pVO2 (r = 0.86, p < 0.00001). In contrast, the 15 negative publications were testing thresholds 199% further from their means (p = 0.0001). Likewise, of 35 EF studies (10220 patients), the thresholds in the 22 positive reports were strongly determined by study means (r = 0.90, p < 0.0001). Similarly, in the 19 positives of 20 BNP studies (9725 patients): r = 0.86 (p < 0.0001). Second, survival simulations always discovered a “most significant” threshold, even when there was definitely no step change in mortality. With a linear increase in risk, the apparently-optimal threshold was always near the sample mean (r = 0.99, p < 0.001).
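The simulation result can be sketched in miniature: generate patients whose event risk varies smoothly (no step anywhere) with a marker, then scan candidate cut-points for the "most significant" split. A threshold always emerges. All parameters below are illustrative assumptions, not the study's:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1500 patients; risk *decreases linearly* with a pVO2-like marker (no step).
n = 1500
marker = rng.normal(15, 3, n)
risk = np.clip(0.5 - 0.02 * (marker - 15), 0.05, 0.95)
event = rng.random(n) < risk

# Scan candidate thresholds for the most significant two-group split.
best_z, best_thr = 0.0, None
for thr in np.quantile(marker, np.linspace(0.1, 0.9, 81)):
    lo, hi = event[marker < thr], event[marker >= thr]
    p_pool = event.mean()
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / lo.size + 1 / hi.size))
    z = abs(lo.mean() - hi.mean()) / se  # two-proportion z statistic
    if z > best_z:
        best_z, best_thr = z, thr

# An "optimal" threshold appears despite there being no step in risk,
# and it tends to fall near the sample mean.
print(f"apparent optimal threshold: {best_thr:.1f} (sample mean {marker.mean():.1f})")
```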

Limitations

This study cannot report the best threshold for any of these variables; instead it explains how common clinical research procedures routinely produce false thresholds.

Key Findings

First, the shifting (and/or disappearance) of an apparently-optimal prognostic threshold is strongly determined by the studies' average pVO2, EF or BNP. Second, apparently-optimal thresholds always appear, even with no step in prognosis.

Conclusions

Emphatic therapeutic guidance based on thresholds from observational studies may be ill-founded. We should not assume that optimal thresholds, or any thresholds, exist.

16.

Background

Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives.

Principal Findings

Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data.
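A drastically simplified stand-in for the 'progressive construction of contigs' idea (not the authors' implementation) is a greedy chain that grows outward from the most correlated pair of elements:

```python
import numpy as np

def greedy_chain(corr):
    """Greedy seriation sketch: start from the most correlated pair and
    repeatedly extend whichever end of the chain gains the most correlation.
    A simplified illustration only, not the paper's heuristic."""
    n = corr.shape[0]
    c = corr.astype(float).copy()
    np.fill_diagonal(c, -np.inf)
    i, j = np.unravel_index(np.argmax(c), c.shape)
    chain, used = [int(i), int(j)], {int(i), int(j)}
    while len(chain) < n:
        rest = [k for k in range(n) if k not in used]
        head = max(rest, key=lambda k: corr[chain[0], k])   # best extension at the front
        tail = max(rest, key=lambda k: corr[chain[-1], k])  # best extension at the back
        if corr[chain[0], head] >= corr[chain[-1], tail]:
            chain.insert(0, head); used.add(head)
        else:
            chain.append(tail); used.add(tail)
    return chain

# Toy check: a banded correlation matrix whose true order is 0..4.
toy = np.array([[1 - 0.2 * abs(i - j) for j in range(5)] for i in range(5)])
print(greedy_chain(toy))  # recovers the band order: [0, 1, 2, 3, 4]
```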

Conclusions

We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.

17.

Background

Non-coding RNAs (ncRNAs) are known to be involved in many critical biological processes, and identification of ncRNAs is an important task in biological research. A popular software package, Infernal, is the most successful prediction tool and exhibits high sensitivity. Application of Infernal has mainly focused on small suspected regions. We tried applying Infernal at the chromosome level; the results have high sensitivity, yet contain many false positives. Further enhancing Infernal for chromosome-level or genome-wide study is desirable.

Methodology

Based on the conjecture that adjacent nucleotide dependence affects the stability of the secondary structure of an ncRNA, we first conduct a systematic study on human ncRNAs and find that adjacent nucleotide dependence in human ncRNA should be useful for identifying ncRNAs. We then incorporate this dependence in the SCFG model and develop a new order-1 SCFG model for identifying ncRNAs.

Conclusions

With respect to our experiments on human chromosomes, the proposed new model can eliminate more than 50% of the false positives reported by Infernal while maintaining the same sensitivity. The executable and the source code of the programs are freely available at http://i.cs.hku.hk/~kfwong/order1scfg.

18.

Background

Can sequence segments coding for subcellular targeting or for posttranslational modifications occur in proteins that are not substrates in either of these processes? Although considerable effort has been invested in achieving low false-positive prediction rates, even accurate sequence-analysis tools for the recognition of these motifs generate a small but noticeable number of protein hits that lack the appropriate biological context but cannot be rationalized as false positives.

Results

We show that the carboxyl termini of a set of definitely non-peroxisomal proteins with predicted peroxisomal targeting signals interact with the peroxisomal matrix protein receptor peroxin 5 (PEX5) in a yeast two-hybrid test. Moreover, we show that examples of these proteins - chicken lysozyme, human tyrosinase and the yeast mitochondrial ribosomal protein L2 (encoded by MRP7) - are imported into peroxisomes in vivo if their original sorting signals are disguised. We also show that even prokaryotic proteins can contain peroxisomal targeting sequences.

Conclusions

Thus, functional localization signals can evolve in unrelated protein sequences as a result of neutral mutations, and subcellular targeting is hierarchically organized, with signal accessibility playing a decisive role. The occurrence of silent functional motifs in unrelated proteins is important for the development of sequence-based function prediction tools and the interpretation of their results. Silent functional signals have the potential to acquire importance in future evolutionary scenarios and in pathological conditions.

19.
20.

Objectives

We conducted a comparative review of clinical practice guideline development handbooks. We aimed to identify the main guideline development tasks, assign weights to the importance of each task using expert opinions and identify the handbooks that provided a comprehensive coverage of the tasks.

Methods

We systematically searched and included handbooks published (in English language) by national, international or professional bodies responsible for evidenced-based guideline development. We reviewed the handbooks to identify the main guideline development tasks and scored each handbook for each task from 0 (the handbook did not mention the task) to 2 (the task suitably addressed and explained), and calculated a weighted score for each handbook. The tasks included in over 75% of the handbooks were considered as ‘necessary’ tasks.

Results

Nineteen guideline development handbooks and twenty-seven main tasks were identified. The handbooks' weighted scores ranged from 100 to 220. Four handbooks scored over 80% of the maximum possible score: those developed by the National Institute for Health and Clinical Excellence, the Swiss Centre for International Health, the Scottish Intercollegiate Guidelines Network and the World Health Organization. The necessary tasks were: selecting the guideline topic, determining the guideline scope, identifying relevant existing guidelines, involving the consumers, forming the guideline development group, developing clinical questions, systematically searching for evidence, selecting relevant evidence, appraising identified research evidence, making group decisions, grading available evidence, creating recommendations, final stakeholder consultation, guideline implementation strategies, updating recommendations and correcting potential errors.

Discussion

Adequate details for evidence-based development of guidelines were still lacking from many handbooks. Tasks relevant to ethical issues and piloting were missing from most handbooks. The findings help decision makers identify the necessary tasks for guideline development, provide an updated comparative list of guideline development handbooks, and provide a checklist for assessing the comprehensiveness of guideline development processes.
