Similar documents
20 similar documents found.
1.
MOTIVATION: Sample mix-ups can arise during sample collection, handling, genotyping or data management. It is unclear how often sample mix-ups occur in genome-wide studies, as there are currently no post hoc methods that can identify such mix-ups in unrelated samples. We have therefore developed an algorithm (MixupMapper) that can both detect and correct sample mix-ups in genome-wide studies of gene expression levels. RESULTS: We applied MixupMapper to five publicly available human genetical genomics datasets. On average, 3% of all analyzed samples had been assigned incorrect expression phenotypes; in one of the datasets, 23% of the samples had incorrect expression phenotypes. The consequences of sample mix-ups are substantial: when we corrected these sample mix-ups, we identified on average 15% more significant cis-expression quantitative trait loci (cis-eQTLs). In one dataset, we identified three times as many significant cis-eQTLs after correction. Furthermore, we show through simulations that sample mix-ups can lead to an underestimation of the explained heritability of complex traits in genome-wide association datasets. AVAILABILITY AND IMPLEMENTATION: MixupMapper is freely available at http://www.genenetwork.nl/mixupmapper/
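To make the idea of detecting a mix-up concrete, here is a simplified, hypothetical sketch of matching expression samples to genotype samples through known cis-eQTLs. It is not MixupMapper's published scoring: the function names, the use of effect directions only, and the Pearson-correlation score are assumptions. Each expression sample is assigned to the genotype sample whose eQTL genotypes best predict its expression; a sample whose best match is not its recorded genotype is a candidate mix-up.

```python
import numpy as np

def _zscore_columns(M):
    """Standardize each column (eQTL) across samples; constant columns become zeros."""
    M = np.asarray(M, dtype=float)
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0
    return (M - M.mean(axis=0)) / sd

def match_expression_to_genotypes(G, E, effect_sign):
    """Hypothetical matcher (not MixupMapper's algorithm).
    G: genotype dosages, samples x eQTLs; E: expression of each eQTL gene,
    samples x eQTLs; effect_sign: +1/-1 direction of each cis-eQTL effect.
    Returns, for each expression sample, the index of the best-matching genotype sample."""
    expected = _zscore_columns(G) * effect_sign   # expression direction predicted by genotype
    observed = _zscore_columns(E)
    # Pearson correlation across eQTLs between every genotype row and every expression row
    a = expected - expected.mean(axis=1, keepdims=True)
    b = observed - observed.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    scores = a @ b.T                              # genotype samples x expression samples
    return scores.argmax(axis=0)
```

With noisy real data, one would also compare the best score with the score of the recorded pairing rather than rely on the arg-max alone.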

2.
3.
The increasingly large amount of proteomics data in the public domain enables, among other applications, combined analyses of datasets to create comparative protein expression maps covering different organisms and biological conditions. Here we reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively) to assess baseline protein abundance. Overall, the aggregated dataset comprised 23 individual datasets with a total of 211 samples from 34 different tissues across 14 organs, covering 9 mouse strains and 3 rat strains. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. We then compared protein abundances across datasets and organs for both species. As a key point, we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression between organs within the same species was generally slightly lower. Protein expression results have been integrated into the Expression Atlas resource for widespread dissemination.

4.
Significance analysis of groups of genes in expression profiling studies (Total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Gene class testing (GCT) is a statistical approach to determine whether predefined functional classes of genes are expressed differently under two experimental conditions. GCT computes the P-value of each gene class from a null distribution, and the gene classes are ranked by importance according to their P-values. Two null hypotheses have been considered so far: the Q1 hypothesis tests the relative strength of association with the phenotypes among the gene classes, and the Q2 hypothesis assesses statistical significance. These two hypotheses are related but not equivalent. METHOD: We investigate three one-sided and two two-sided test statistics under Q1 and Q2. The null distributions of gene classes under Q1 are generated by permuting gene labels, and those under Q2 are generated by permuting samples. RESULTS: We applied the five statistics to a diabetes dataset with 143 gene classes and to a breast cancer dataset with 508 GO (Gene Ontology) terms. For each statistic, the null distributions of the gene classes under Q1 differ from those under Q2 in both datasets, and the resulting rankings can differ too. We clarify the one-sided and two-sided hypotheses and discuss some issues regarding the Q1 and Q2 hypotheses for gene class ranking in GCT. Because Q1 does not account for correlations among genes, we prefer tests based on Q2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
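The difference between the two null distributions can be illustrated with a minimal permutation sketch. It assumes a Welch t statistic per gene and the class mean of t as a one-sided class statistic (one simple choice, not necessarily one of the five statistics studied); the function names are hypothetical. Q1 compares the class with random gene sets of the same size while the sample labels stay fixed; Q2 permutes the sample labels while the gene class stays fixed, so it preserves the correlation among the class's genes.

```python
import numpy as np

def welch_t(X, y):
    """Per-gene Welch t statistics. X: genes x samples; y: boolean group labels."""
    a, b = X[:, y], X[:, ~y]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def gene_class_pvalues(X, y, members, n_perm=1000, seed=0):
    """One-sided permutation p-values of the statistic mean(t) under Q1 and Q2."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=bool)
    t = welch_t(X, y)
    observed = t[members].mean()
    k = len(members)
    # Q1: permute gene labels, i.e. draw random gene sets of the same size
    q1 = np.array([t[rng.choice(t.size, k, replace=False)].mean() for _ in range(n_perm)])
    # Q2: permute sample labels, keeping the gene class fixed
    q2 = np.array([welch_t(X[members], rng.permutation(y)).mean() for _ in range(n_perm)])
    p_q1 = (1 + np.sum(q1 >= observed)) / (n_perm + 1)
    p_q2 = (1 + np.sum(q2 >= observed)) / (n_perm + 1)
    return p_q1, p_q2
```

On real data the two p-values for the same class can differ substantially, because only Q2 reflects the correlation structure within the class.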

5.
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides.
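A rough sketch of PCA followed by a closed-form, probabilistic LDA is shown below. It assumes two classes, Gaussian classes with a shared covariance in the reduced space, and posterior probabilities from the resulting logistic form; the function name and the choice of n_components are illustrative, and Q5's own equations and component selection may differ.

```python
import numpy as np

def fit_pca_lda(X, y, n_components):
    """Closed-form PCA then two-class LDA with a shared covariance (a sketch).
    X: samples x features (e.g. spectrum intensities); y: 0/1 labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:n_components].T                            # PCA projection, features x k
    Z = (X - mean) @ W
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    resid = np.vstack([Z[y == 0] - m0, Z[y == 1] - m1])
    S = np.atleast_2d(resid.T @ resid / (len(Z) - 2))  # pooled within-class covariance
    w = np.linalg.solve(S, m1 - m0)                    # Fisher discriminant direction
    prior1 = np.mean(y == 1)
    b = -0.5 * (m0 + m1) @ w + np.log(prior1 / (1 - prior1))

    def predict_proba(X_new):
        """Posterior P(class 1 | x) under the shared-covariance Gaussian model."""
        z = (np.asarray(X_new, dtype=float) - mean) @ W
        return 1.0 / (1.0 + np.exp(-(z @ w + b)))

    return predict_proba
```

No iteration is involved: one SVD and one linear solve give the discriminant, which matches the "noniterative, closed-form" property described above.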

6.
In‐depth proteomic analyses offer a systematic way to investigate protein alterations in disease and, as such, can be a powerful tool for identifying novel biomarkers. Here, we analyzed proteomic data from a transgenic mouse model with cardiac‐specific overexpression of activated calcineurin (CnA), which results in severe cardiac hypertrophy. We applied statistical filtering and false discovery rate correction to identify 52 proteins that were significantly different in CnA hearts compared with controls. Subsequent informatic analysis compared these 52 CnA proteins with another heart failure proteomic dataset and with three independent microarray datasets, and correlated their expression with the human plasma and urine proteomes. Four proteins passed these selection criteria: myosin heavy chain 7, insulin‐like growth factor‐binding protein 7, annexin A2, and desmin. We assessed the expression levels of these proteins in mouse plasma by immunoblotting and observed significantly different levels between healthy and failing mice for all four proteins. We verified antibody cross‐reactivity by immunoblotting human cardiac explant tissue. Finally, we measured protein levels in plasma samples from four unaffected individuals and four heart failure patients and found that all four proteins increased between twofold and 150‐fold in heart failure. We conclude that MYH7, IGFBP7, ANXA2, and DESM are all excellent candidate plasma biomarkers of heart failure in mouse and human.
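The abstract does not say which false discovery rate procedure was used. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, a common choice for per-protein p-values; the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.
    Returns a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Compare the i-th smallest p-value with alpha * i / m
    below = ranked <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank that passes
        reject[order[:k + 1]] = True     # reject it and every smaller p-value
    return reject
```

For example, with p-values [0.01, 0.04, 0.03, 0.5] at alpha = 0.05, only the first protein is declared significant, because 0.03 exceeds its threshold of 0.025 and 0.04 exceeds 0.0375.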

7.
Citalopram (CITA) is available as a racemic mixture and as a pure enantiomer. Its antidepressive action is related to (+)-(S)-CITA and to the metabolite (+)-(S)-demethylcitalopram (DCITA). In the present investigation, a method for the analysis of CITA and DCITA enantiomers in human and rat plasma was developed and applied to the study of pharmacokinetics. Plasma samples (1 ml) were extracted at pH 9.0 with toluene:isoamyl alcohol (9:1, v/v). The CITA and DCITA enantiomers were analyzed by LC-MS/MS on a Chiralcel OD-R column. Recovery was higher than 70% for both enantiomers. The quantification limit was 0.1 ng/ml, and linearity was observed up to 500 ng/ml plasma for each CITA and DCITA enantiomer. The method was applied to the study of the kinetic disposition of CITA administered as a single oral dose of 20 mg to a healthy volunteer and as a single dose of 20 mg/kg (by gavage) to Wistar rats (n = 6 per time point). The results showed a higher proportion of (-)-(R)-CITA in human and rat plasma, with S/R AUC ratios for CITA of 0.28 and 0.44, respectively. The S/R AUC ratios of DCITA were 0.48 for rats and 1.04 for the healthy volunteer.
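S/R AUC ratios such as those above are ratios of the areas under the concentration-time curves of the two enantiomers. A minimal sketch, assuming the linear trapezoidal rule from the first to the last sampling time with no extrapolation to infinity (the paper's exact AUC method is not stated); the function names are hypothetical.

```python
def auc_trapezoid(times, concs):
    """Linear trapezoidal AUC from the first to the last sampling time."""
    if len(times) != len(concs):
        raise ValueError("times and concs must have the same length")
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:]))

def enantiomer_auc_ratio(times, conc_s, conc_r):
    """S/R AUC ratio of two enantiomers measured at the same time points."""
    return auc_trapezoid(times, conc_s) / auc_trapezoid(times, conc_r)
```

A ratio below 1, as reported for CITA in both species, means the (-)-(R) enantiomer contributes the larger share of total exposure.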

8.
To identify potential biomarkers of lung cancer (LC), proteins in sera from healthy individuals and LC patients were profiled using an antibody microarray. Based on our previous study of mRNA expression profiles in patients with LC and healthy persons, 19 proteins of interest were selected as targets for fabricating the antibody microarray. An antibody to each protein, together with five nonspecific control antibodies, was spotted onto a hydrogel‐coated glass slide and used to profile proteins in the sera of LC patients in a two‐color fluorescence assay. Forty‐eight human serum samples were analyzed, and protein expression profiles were represented using the internally normalized ratio method. Six proteins were distinctly down‐regulated in the sera of LC patients; this observation was validated by the Wilcoxon test, false discovery rate analysis, and Western blotting. A blind test of another 32 human sera using the antibody microarray, followed by hierarchical clustering analysis, classified the sera with an approximate sensitivity of 88%, specificity of 80%, and accuracy of 84%, which supports the potential of the six identified proteins as biomarkers for the prognosis of lung cancer.
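Sensitivity, specificity, and accuracy in a blind test follow the usual confusion-matrix definitions, with LC sera treated as positives. The function name is hypothetical, and the counts in the example are illustrative only; they are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a two-class (LC vs healthy) test."""
    sensitivity = tp / (tp + fn)                  # fraction of LC sera classified as LC
    specificity = tn / (tn + fp)                  # fraction of healthy sera classified as healthy
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # fraction of all sera classified correctly
    return sensitivity, specificity, accuracy
```

For instance, hypothetical counts of 8 true positives, 2 false negatives, 6 true negatives and 4 false positives give 0.8, 0.6 and 0.7.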

9.
The fetal protein fetuin had previously been identified only in species belonging to the order Artiodactyla. Samples of fetal, newborn and adult human (Homo sapiens) and rat (Rattus norvegicus) plasma and tissues were studied using three techniques: (a) crossed immunoelectrophoresis of plasma against each of four different anti-fetuin antisera (two anti-cattle, one anti-pig and one anti-sheep); (b) the peroxidase-antiperoxidase technique applied to agarose gels containing plasma spots; (c) the indirect immunoperoxidase technique applied to human fetal tissue sections. In human fetal samples, all three methods gave evidence for the presence of fetuin, except late in gestation and in the newborn. Adult plasma was negative. In rats, only plasma was tested, by methods (a) and (b). Positive reactions were obtained for both fetal and adult samples; the fetal samples cross-reacted with several of the anti-fetuins, whereas adult samples reacted with only one. All the fetal and embryonal plasma samples tested with the peroxidase-antiperoxidase method were positive for fetuin except for the chicken. Thus, fetuin appears to be distributed in at least five mammalian orders (Artiodactyla, Primates, Rodentia, Carnivora and Perissodactyla).

10.
Yan SK, Wu DX, Singh A, Li YL, Wei WS, Cui Y, Wang SL, Xu GB. Chinese Journal of Applied Ecology, 2011, 22(4): 1067-1074
This paper presents a new and simple method for assessing the quality of ecological monitoring data. The method links data reliability, treated as an ordinal variable with a variable number of classes, to the sources of the data: main natural ecological processes, secondary ecological processes, and extraneous or exotic processes. It also offers a new data quality index that estimates the quality of a whole dataset from the reasonableness ratio of its observations. The assessment yields the reliability class of each dataset, a clear explanation for each outlier (or erroneous data) flagging decision, and a quality value for the whole dataset. The method was applied to two tree growth datasets from the Chinese Ecosystem Research Network (CERN), and the results demonstrated that the new data quality index can quantitatively evaluate the quality of tree growth datasets. The new method should facilitate the development of corresponding software.

11.
H. Pardoe, J. Dobson. Biometals, 1999, 12(1): 77-82
Isothermal remanent magnetization was measured in 14 Wistar and five Porton rat brains. The results indicate that magnetic iron biominerals are present in most of the samples and that their formation in the rat brain is influenced by transfusion and dietary iron loading when compared with control samples. The high consistency of the concentrations and the lack of magnetic material in several of the measured samples indicate that a genetic mechanism may be responsible for magnetic iron biomineralization in the rat brain. Comparison with human studies indicates that the results of rat studies of electromagnetic field bioeffects may not extrapolate accurately to humans in all cases.

12.
In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that the individual samples of a microarray dataset are related by an unknown biological process (e.g., differentiation, development, the cell cycle, or disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a way that reveals the underlying progression and, simultaneously, to identify the subsets of genes responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets generated by sampling a biological process at different points along its progression, without giving SPD any information about the underlying process. When applied to a cell cycle time series microarray dataset, SPD was given no prior knowledge of the samples' time order or of which genes are cell-cycle regulated, yet it recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of the stages of normal B-cell differentiation and the link between preB-ALL tumor cells and their cell of origin, preB. When applied to mouse embryonic stem cell (ESC) differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages.
SPD may be best viewed as a novel tool for synthesizing biological hypotheses, because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
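SPD's internals are not described in this abstract. One common way to expose an ordering among samples, assumed in this sketch and not taken from the abstract, is a minimum spanning tree (MST) over the samples, computed from the expression of a chosen gene subset; a long path through the tree then suggests a progression. The sketch uses Prim's algorithm on Euclidean distances, and the function name is hypothetical.

```python
import numpy as np

def minimum_spanning_tree(X):
    """Prim's algorithm on Euclidean distances between the rows (samples) of X.
    Returns the n - 1 tree edges as (i, j) index pairs."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                 # cheapest known connection of each sample to the tree
    parent = np.zeros(n, dtype=int)    # tree node providing that connection
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = D[j] < best           # samples now closer to the new node
        parent[closer] = j
        best = np.minimum(best, D[j])
    return edges
```

For samples that lie along a single trajectory, the tree reduces to a path that visits them in trajectory order, even when they are supplied in shuffled order.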

13.
Comparative studies of retinol transport in plasma (Total citations: 1; self-citations: 0; citations by others: 1)
The comparative immunology and biochemistry of plasma retinol transport were studied using radioimmunoassays previously developed for human and rat retinol-binding protein (RBP). Serum or plasma from 25 vertebrate species, from the mammalian orders Primates, Artiodactyla, Perissodactyla, Carnivora, and Rodentia and from the classes Aves, Amphibia, and Pisces, was assayed. There was a high degree of immunological specificity within a given mammalian order. Sera from the seven nonhuman primate species tested reacted in the human RBP immunoassay, and sera from four of five rodents reacted in the rat RBP immunoassay. Primate sera failed to react in the rat RBP immunoassay, and rodent sera failed to react in the human RBP immunoassay. Apart from slight reactivity of canine serum in the human RBP immunoassay, no other sera showed immunoreactivity. Using gel filtration, the apparent molecular weights of the retinol transport systems in whole serum from cow, swine, chicken, and dog were estimated at 60,000-80,000. Canine RBP was isolated and partially characterized. Purified canine RBP was generally similar to human and rat RBP in molecular weight (approximately 20,000) and other properties. In plasma, canine RBP circulates as a protein-protein complex of higher apparent molecular weight; this complex remains to be characterized. These data suggest that mammals in general have a retinol transport system similar to the human and rat systems, but that immunologically important differences in RBP occur among mammalian orders.

14.
15.
The structure and function of diverse microbial communities are underpinned by ecological interactions that remain uncharacterized. With the rapid adoption of next-generation sequencing for studying microbiomes, data-driven inference of microbial interactions based on abundance correlations is widely used, but with the drawback that ecological interpretations may not be possible. Leveraging cross-sectional microbiome datasets to unravel ecological structure in a scalable manner thus remains an open problem. We present an expectation-maximization algorithm (BEEM-Static) that can be applied to cross-sectional datasets to infer interaction networks based on an ecological model (generalized Lotka-Volterra). The method is robust to violations of model assumptions because it uses statistical filters to identify and remove the corresponding samples. Benchmarking against 10 state-of-the-art correlation-based methods showed that BEEM-Static can infer the presence and directionality of ecological interactions even with relative abundance data (AUC-ROC > 0.85), a task that other methods struggle with (AUC-ROC < 0.63). In addition, BEEM-Static can tolerate a high fraction of samples (up to 40%) that are not at steady state or that come from an alternate model. Applying BEEM-Static to a large public dataset of human gut microbiomes (n = 4,617) identified multiple stable equilibria that better reflect ecological enterotypes, with distinct carrying capacities and interactions for key species. CONCLUSION: BEEM-Static provides new opportunities for mining ecologically interpretable interactions and systems insights from the growing corpus of microbiome data.
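The generalized Lotka-Volterra model has dx_i/dt = x_i (mu_i + sum_j beta_ij x_j). At a steady state, every species that is present satisfies mu_i + sum_j beta_ij x_j = 0, which becomes an ordinary linear regression once each beta_ii is fixed to -1 (a scaling assumption made here). The sketch below fits that regression species by species on absolute abundances. It is not BEEM-Static: it omits the EM estimation of biomass from relative abundances and the filtering of samples that violate the model, and the function name is hypothetical.

```python
import numpy as np

def fit_glv_steady_state(X):
    """Least-squares gLV fit from steady-state samples (a simplified sketch).
    X: samples x species absolute abundances, 0 = species absent.
    Returns growth rates mu (species,) and interactions B (species x species)
    with the normalization B[i, i] = -1, so that for every present species i:
        x_i = mu_i + sum_{j != i} B[i, j] * x_j."""
    X = np.asarray(X, dtype=float)
    n_species = X.shape[1]
    mu = np.zeros(n_species)
    B = np.zeros((n_species, n_species))
    for i in range(n_species):
        present = X[:, i] > 0                      # the steady-state equation holds only here
        others = [j for j in range(n_species) if j != i]
        A = np.column_stack([np.ones(present.sum()), X[present][:, others]])
        coef, *_ = np.linalg.lstsq(A, X[present, i], rcond=None)
        mu[i] = coef[0]
        B[i, others] = coef[1:]
        B[i, i] = -1.0
    return mu, B
```

Samples with different subsets of species present supply the variation that makes each regression identifiable; with relative abundances the unknown total biomass per sample must also be estimated, which is the part BEEM-Static handles.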

16.
Camera traps are a method for monitoring wildlife, and they collect large numbers of pictures. The number of images collected for each species usually follows a long-tail distribution: a few classes have many instances, while many species account for only a small percentage. Although these rare species are in most cases the ones of interest to ecologists, they are often neglected by deep-learning models, which require many training images. In this work, we propose a simple and effective framework called Square-Root Sampling Branch (SSB), which combines two classification branches, trained with square-root sampling and instance sampling respectively, to improve long-tail visual recognition. We compare it with state-of-the-art methods for this task: square-root sampling, class-balanced focal loss, and balanced group softmax. To reach a more general conclusion, the methods were systematically evaluated on four families of computer vision models (ResNet, MobileNetV3, EfficientNetV2, and Swin Transformer) and four camera-trap datasets with different characteristics. We first prepared a robust baseline with the most recent training tricks and then applied the methods for improving long-tail recognition. Our experiments show that square-root sampling improved minority-class performance the most, by around 15%, but at the cost of reducing majority-class accuracy by at least 3%. Our proposed framework (SSB) proved competitive with the other methods and achieved the best or second-best results for the tail classes in most cases; unlike square-root sampling, however, its loss in head-class performance was minimal, so it achieved the best trade-off among all the evaluated methods.
Our experiments also show that Swin Transformer can achieve high performance on rare classes without any additional method for handling imbalance, attaining an overall accuracy of 88.76% on the WCS dataset and 94.97% on Snapshot Serengeti with a location-based training/test partition. Despite the improvement in tail-class performance, our experiments highlight the need for better methods for long-tail visual recognition in camera-trap images, since state-of-the-art approaches still perform poorly, especially on classes with only a few training instances.
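Square-root sampling draws each class with probability proportional to the square root of its image count, which sits between instance sampling (proportional to the count) and class-balanced sampling. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np

def class_sampling_probs(counts, q):
    """Probability of drawing each class when classes are sampled proportionally to n_c ** q.
    q = 1: instance sampling; q = 0.5: square-root sampling; q = 0: class-balanced."""
    w = np.asarray(counts, dtype=float) ** q
    return w / w.sum()

def per_instance_weights(labels, q):
    """Per-image weights whose per-class totals equal class_sampling_probs(counts, q)."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    p = class_sampling_probs(counts, q)
    return (p / counts)[inverse]
```

With 100 images of a common species and 1 of a rare one, square-root sampling picks the rare class with probability 1/11 instead of 1/101. The per-image weights could, for example, be passed to PyTorch's torch.utils.data.WeightedRandomSampler; SSB itself trains one branch with each sampling scheme, which this sketch does not show.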

17.
Accurate and robust brain extraction is a critical step in most neuroimaging analysis pipelines. In particular, large-scale multi-site neuroimaging studies involving many subjects from diverse age and diagnostic groups need brain extraction that is accurate, robust, automatic, and consistent. In this paper, we introduce population-specific probability maps to guide brain extraction in diverse subject groups, including healthy and diseased adult human populations, developing and aging human populations, and non-human primates. Specifically, the proposed method combines an atlas-based approach for coarse skull-stripping with a deformable-surface-based approach for more localized refinement, guided by local intensity information and by population-specific prior information learned from a set of real brain images. Comprehensive quantitative evaluations were performed on diverse large-scale populations: the ADNI dataset with over 800 subjects (55∼90 years of age, multi-site, various diagnostic groups), the OASIS dataset with over 400 subjects (18∼96 years of age, wide age range, various diagnostic groups), and the NIH pediatrics dataset with 150 subjects (5∼18 years of age, multi-site, wide age range, complementing the adult datasets). The results demonstrate that our method consistently yields the best overall results across almost the entire human life span with a single set of parameters. To demonstrate its ability to work on non-human primates, the proposed method was further evaluated on a rhesus macaque dataset with 20 subjects. Quantitative comparisons with popular state-of-the-art methods, including BET, Two-pass BET, BET-B, BSE, HWA, ROBEX and AFNI, show that the proposed method performs favorably, with superior performance on all test datasets, indicating its robustness and effectiveness.

18.
Plants, a major natural source of oxygen, are among the most important resources for life on Earth. Proper identification of plants is important in many fields. Observing leaf characteristics is a popular method because leaves are easily available for examination, and researchers increasingly apply image processing techniques to identify plants from leaf images. In this paper, we propose a leaf image classification model for plant identification, called BLeafNet, which combines deep learning with Bonferroni fusion learning. We first designed five classification models based on the ResNet-50 architecture, each using a different input: the grayscale leaf image, the RGB image, and the three individual RGB channels (red, green, and blue). To fuse the outputs of the five ResNet-50 models, we used the Bonferroni mean operator, because it better expresses the connectivity among the confidence scores and also obtains better results than the individual models. We also propose a two-tier training method for properly training the end-to-end model. To evaluate the proposed model, we used the MalayaKew dataset, collected at the Royal Botanic Gardens, Kew, England, which is very challenging because many leaves from different species have a very similar appearance. The proposed method was also evaluated on the Leafsnap and Flavia datasets. The results on both datasets confirm the superiority of the model, as it outperforms many state-of-the-art models.
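The Bonferroni mean of n scores a_1, ..., a_n with parameters p and q is (1/(n(n-1)) * sum over i ≠ j of a_i^p a_j^q)^(1/(p+q)). The sketch applies it per class across the models' confidence scores and then takes the arg-max. The values p = q = 1 and the function names are assumptions; the abstract does not state BLeafNet's parameters.

```python
import numpy as np

def bonferroni_mean(a, p=1.0, q=1.0):
    """Bonferroni mean along axis 0 (e.g. over models):
    BM(a) = (1 / (n (n - 1)) * sum_{i != j} a_i**p * a_j**q) ** (1 / (p + q))."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    ap, aq = a ** p, a ** q
    # sum_{i != j} a_i^p a_j^q = (sum_i a_i^p)(sum_j a_j^q) - sum_i a_i^(p+q)
    s = ap.sum(axis=0) * aq.sum(axis=0) - (ap * aq).sum(axis=0)
    return (s / (n * (n - 1))) ** (1.0 / (p + q))

def fuse_predictions(scores, p=1.0, q=1.0):
    """scores: models x samples x classes confidence scores -> fused class labels."""
    return bonferroni_mean(scores, p, q).argmax(axis=-1)
```

Because every term pairs the score of one model with that of a different model, a class is favoured only when several models agree on it, which is one reading of the "connectivity among the confidence scores" mentioned above.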

19.
Insulin-like growth factors (IGFs) together with their binding proteins (BPs) are potential regulators of folliculogenesis in mammalian ovary. To identify the various species of IGFBPs present in the ovary, we have undertaken a comprehensive purification scheme using gel filtration, ligand-affinity chromatography, and several steps of reverse phase HPLC to isolate all of the BPs in pig ovarian follicular fluid. Our effort yielded five distinct IGFBPs, and upon analysis, they were found to correspond to the previously identified human and rat IGFBP-2, -3, -4, -5, and -6. IGFBP-1 was not found in the pig ovarian follicular fluid under our experimental procedure. Of the six known classes of IGFBPs, the complete primary structures of the first five have been determined, but not IGFBP-6. Using amino acid sequence information from a tryptic fragment of pig IGFBP-6 to prepare a probe, cDNA clones encoding rat and human IGFBP-6 have been isolated and characterized. The deduced amino acid sequence revealed that rat IGFBP-6 contains 201 amino acids with a calculated mol wt of 21,461, while the human homolog contains 216 amino acids with a calculated mol wt of 22,847. In addition, a distinctive feature of human and rat IGFBP-6 is that they lack, respectively, two and four of the 18 homologous cysteines that are present in all other five IGFBPs. The missing cysteines in IGFBP-6 resulted in the absence of the invariant Gly-Cys-Gly-Cys-Cys sequence in the amino-terminal region of the molecule. Human IGFBP-6 possesses a single Asn-linked glycosylation site near the carboxyl-terminal, whereas no potential Asn-linked glycosylation sites are present in the rat sequence. A single 1.3-kilobase IGFBP-6 mRNA was detected by Northern analysis in all rat tissues examined, including testis, intestine, adrenal, kidney, stomach, spleen, heart, lung, brain, and liver, indicating that this BP is a ubiquitous protein. 
The chromosomal location of the human IGFBP-6 gene was determined by polymerase chain reaction on human-hamster somatic cell hybrid DNAs, which showed that it is located on chromosome 12.

20.
Deep learning-based retinopathy classification with optical coherence tomography (OCT) images has recently attracted great attention. However, existing deep learning methods fail to work well when the training and testing datasets differ, owing to the general issue of domain shift between datasets caused by different collection devices, subjects, imaging parameters, etc. To address this practical and challenging issue, we propose a novel deep domain adaptation (DDA) method that trains a model on a labeled dataset and adapts it to an unlabeled dataset collected under different conditions. It consists of two modules for domain alignment: adversarial learning and entropy minimization. We conducted extensive experiments on three public datasets to evaluate the performance of the proposed method. The results indicate that there are large domain shifts between datasets, resulting in poor performance for conventional deep learning methods. The proposed DDA method significantly outperforms existing methods for retinopathy classification with OCT images, achieving classification accuracies of 0.915, 0.959 and 0.990 in three cross-domain (cross-dataset) scenarios. Moreover, it achieves performance comparable to that of human experts on a dataset none of whose labels were used to train the proposed DDA method. We also visualized the learned features using t-distributed stochastic neighbor embedding (t-SNE). The results demonstrate that the proposed method can learn discriminative features for retinopathy classification.
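The entropy-minimization module can be pictured as an extra loss on the unlabeled target dataset that pushes predictions toward confident, low-entropy outputs. A minimal NumPy sketch of that loss, assuming softmax outputs; the function names are hypothetical, and the adversarial alignment module and any loss weighting are not shown.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy_loss(logits, eps=1e-12):
    """Mean prediction entropy over a batch of unlabeled target images."""
    p = softmax(np.asarray(logits, dtype=float))
    return float(-(p * np.log(p + eps)).sum(axis=1).mean())
```

A uniform prediction over three classes gives the maximum value, log 3, while a near one-hot prediction gives almost zero; minimizing the loss therefore favours confident decisions on the target domain.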


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417