Similar articles
20 similar articles found
1.
Haoyan Hu, Yumou Qiu. Biometrics, 2023, 79(2): 1173-1186
Partial correlation is a common tool in studying conditional dependence for Gaussian distributed data. However, partial correlation being zero may not be equivalent to conditional independence under non-Gaussian distributions. In this paper, we propose a statistical inference procedure for partial correlations under the high-dimensional nonparanormal (NPN) model where the observed data are normally distributed after certain monotone transformations. The NPN partial correlation is the partial correlation of the normal transformed data under the NPN model, which is a more general measure of conditional dependence. We estimate the NPN partial correlations by regularized nodewise regression based on the empirical ranks of the original data. A multiple testing procedure is proposed to identify the nonzero NPN partial correlations. The proposed method can be carried out by a simple coordinate descent algorithm for lasso optimization. It is easy to implement and more computationally efficient than the existing methods for estimating NPN graphical models. Theoretical results are developed to show the asymptotic normality of the proposed estimator and to justify the proposed multiple testing procedure. Numerical simulations and a case study on brain imaging data demonstrate the utility of the proposed procedure and evaluate its performance compared to the existing methods. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

2.
Pairwise correlations are currently a popular way to estimate a large-scale network (> 1000 nodes) from functional magnetic resonance imaging data. However, this approach generally results in a poor representation of the true underlying network. The reason is that pairwise correlations cannot distinguish between direct and indirect connectivity. As a result, pairwise correlation networks can lead to fallacious conclusions; for example, one may conclude that a network is a small-world when it is not. In a simulation study and an application to resting-state fMRI data, we compare the performance of pairwise correlations in large-scale networks (2000 nodes) against three other methods that are designed to filter out indirect connections. Recovery methods are evaluated in four simulated network topologies (small world or not, scale-free or not) in scenarios where the number of observations is very small compared to the number of nodes. Simulations clearly show that pairwise correlation networks are fragmented into separate unconnected components with excessive connectedness within components. This often leads to erroneous estimates of network metrics, like small-world structures or low betweenness centrality, and produces too many low-degree nodes. We conclude that using partial correlations, informed by a sparseness penalty, results in more accurate networks and corresponding metrics than pairwise correlation networks. However, even with these methods, the presence of hubs in the generating network can be problematic if the number of observations is too small. Additionally, we show for resting-state fMRI that partial correlations are more robust than correlations to different parcellation sets and to different lengths of time-series.
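
The difference between direct and indirect connectivity can be illustrated with three variables. The sketch below is not the method of the study above; it uses the textbook first-order partial correlation formula, and the function name `partial_corr` and the example values are assumptions chosen for illustration.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y given z.

    Textbook formula; illustrative only, not the estimator used in the study.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical chain x -> z -> y: x and y are linked only through z,
# so their pairwise correlation equals the product 0.5 * 0.5 = 0.25.
pairwise = 0.25
direct = partial_corr(pairwise, 0.5, 0.5)
print(f"pairwise = {pairwise:.2f}, partial given z = {direct:.2f}")
```

A thresholded pairwise network could draw an edge between x and y in this example, whereas the partial correlation given z is zero, so no direct edge would be inferred.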

3.
The generalized estimating equation (GEE) is widely adopted for regression modeling of longitudinal data, taking into account potential correlations within the same subjects. Although the standard GEE assumes common regression coefficients among all the subjects, such an assumption may not be realistic when there is potential heterogeneity in regression coefficients among subjects. In this paper, we develop a flexible and interpretable approach, called grouped GEE analysis, to modeling longitudinal data while allowing for heterogeneity in regression coefficients. The proposed method assumes that the subjects are divided into a finite number of groups and subjects within the same group share the same regression coefficient. We provide a simple algorithm for grouping subjects and estimating the regression coefficients simultaneously, and show the asymptotic properties of the proposed estimator. The number of groups can be determined by the cross validation with averaging method. We demonstrate the proposed method through simulation studies and an application to a real data set.

4.
Ultraviolet absorption provides the nearly universal basis for determining concentrations of nucleic acids. Values for the UV extinction coefficients of DNA and RNA rely on the mononucleotide values determined 30–50 years ago. We show that nearly all of the previously published extinction coefficients for the nucleoside-5′-monophosphates are too large, and in error by as much as 7%. Concentrations based on complete hydrolysis and the older set of values are too low by ~4% for typical RNA and 2–3% for typical DNA samples. We also analyzed data in the literature for the extinction coefficients of unpaired DNA oligomers. Robust prediction of concentrations can be made using 38 µg/A260 unit for single-stranded DNA (ssDNA) having non-repetitive sequences and 40–80% GC. This is superior to currently used predictions that account for nearest-neighbor frequency or base composition. The latter result in concentrations that are 10–30% too low for typical ssDNA used as primers for PCR and other similar techniques. Methods are described here to accurately measure concentrations of nucleotides by nuclear magnetic resonance (NMR). NMR can be used to accurately determine concentrations (and extinction coefficients) of biomolecules within 1%.
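
As a rough illustration of the single-stranded DNA rule quoted above, the following sketch converts an absorbance reading into a concentration. It assumes that "38 µg/A260 unit" means 38 µg/mL per absorbance unit at 260 nm with a 1 cm path length, which the abstract does not spell out; the function name and the dilution argument are hypothetical.

```python
def ssdna_concentration(a260: float, dilution_factor: float = 1.0,
                        ug_per_ml_per_a260: float = 38.0) -> float:
    """Estimate ssDNA concentration in ug/mL from an A260 reading.

    Assumes a 1 cm path length and a reading in the linear absorbance range.
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor must be > 0")
    return a260 * ug_per_ml_per_a260 * dilution_factor


# Hypothetical reading: A260 = 0.5 measured on a 10-fold diluted sample.
print(ssdna_concentration(0.5, dilution_factor=10.0))
```

Keeping the conversion factor as a parameter makes it easy to compare the 38 µg value with whatever factor a lab currently uses.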

5.
Short-term memory can be defined as the capacity for holding a small amount of information in mind in an active state for a short period of time. Although some instruments have been developed to study spatial short-term memory in real environments, there are no instruments that are specifically designed to assess visuospatial short-term memory in a way that is attractive to children. In this paper, we present the ARSM (Augmented Reality Spatial Memory) task, the first Augmented Reality task that involves a user's movement to assess spatial short-term memory in healthy children. The experimental procedure of the ARSM task was designed to assess the children's ability to retain visuospatial information. They were individually asked to remember the real place where augmented reality objects were located. The children (N = 76) were divided into two groups: preschool (5–6 year olds) and primary school (7–8 year olds). We found a significant improvement in ARSM task performance in the older group. The correlations between scores for the ARSM task and traditional procedures were significant. These traditional procedures were the Dot Matrix subtest for the assessment of visuospatial short-term memory of the computerized AWMA-2 battery and a parent's questionnaire about a child's everyday spatial memory. Hence, we suggest that the ARSM task has high verisimilitude with spatial short-term memory skills in real life. In addition, we evaluated the ARSM task's usability and perceived satisfaction. The study revealed that the younger children were more satisfied with the ARSM task. This novel instrument could be useful in detecting visuospatial short-term difficulties that affect specific developmental navigational disorders and/or school academic achievement.

6.
In this paper we review the methodological underpinnings of the general pharmacogenetic approach for uncovering genetically-driven treatment effect heterogeneity. This typically utilises only individuals who are treated and relies on fairly strong baseline assumptions to estimate what we term the ‘genetically moderated treatment effect’ (GMTE). When these assumptions are seriously violated, we show that a robust but less efficient estimate of the GMTE that incorporates information on the population of untreated individuals can instead be used. In cases of partial violation, we clarify when Mendelian randomization and a modified confounder adjustment method can also yield consistent estimates for the GMTE. A decision framework is then described to decide when a particular estimation strategy is most appropriate and how specific estimators can be combined to further improve efficiency. Triangulation of evidence from different data sources, each with their inherent biases and limitations, is becoming a well established principle for strengthening causal analysis. We call our framework ‘Triangulation WIthin a STudy’ (TWIST) in order to emphasise that an analysis in this spirit is also possible within a single data set, using causal estimates that are approximately uncorrelated, but reliant on different sets of assumptions. We illustrate these approaches by re-analysing primary-care-linked UK Biobank data relating to CYP2C19 genetic variants, Clopidogrel use and stroke risk, and data relating to APOE genetic variants, statin use and Coronary Artery Disease.

7.
Over repeated presentations of the same stimulus, sensory neurons show variable responses. This “noise” is typically correlated between pairs of cells, and a question with rich history in neuroscience is how these noise correlations impact the population's ability to encode the stimulus. Here, we consider a very general setting for population coding, investigating how information varies as a function of noise correlations, with all other aspects of the problem (neural tuning curves, etc.) held fixed. This work yields unifying insights into the role of noise correlations. These are summarized in the form of theorems, and illustrated with numerical examples involving neurons with diverse tuning curves. Our main contributions are as follows. (1) We generalize previous results to prove a sign rule (SR): if noise correlations between pairs of neurons have opposite signs vs. their signal correlations, then coding performance will improve compared to the independent case. This holds for three different metrics of coding performance, and for arbitrary tuning curves and levels of heterogeneity. This generality is true for our other results as well. (2) As also pointed out in the literature, the SR does not provide a necessary condition for good coding. We show that a diverse set of correlation structures can improve coding. Many of these violate the SR, as do experimentally observed correlations. There is structure to this diversity: we prove that the optimal correlation structures must lie on boundaries of the possible set of noise correlations. (3) We provide a novel set of necessary and sufficient conditions, under which the coding performance (in the presence of noise) will be as good as it would be if there were no noise present at all.

8.
This paper has two aims: (i) to introduce a novel method for measuring which part of overall citation inequality can be attributed to differences in citation practices across scientific fields, and (ii) to implement an empirical strategy for making meaningful comparisons between the number of citations received by articles in 22 broad fields. The number of citations received by any article is seen as a function of the article’s scientific influence, and the field to which it belongs. A key assumption is that articles in the same quantile of any field citation distribution have the same degree of citation impact in their respective field. Using a dataset of 4.4 million articles published in 1998–2003 with a five-year citation window, we estimate that differences in citation practices between the 22 fields account for 14% of overall citation inequality. Our empirical strategy is based on the strong similarities found in the behavior of citation distributions. We obtain three main results. Firstly, we estimate a set of average-based indicators, called exchange rates, to express the citations received by any article in a large interval in terms of the citations received in a reference situation. Secondly, using our exchange rates as normalization factors of the raw citation data reduces the effect of differences in citation practices to approximately 2% of overall citation inequality in the normalized citation distributions. Thirdly, we provide an empirical explanation of why the usual normalization procedure based on the fields’ mean citation rates is found to be equally successful.

9.
Quantification of molecular numbers and concentrations in living cells is critical for testing models of complex biological phenomena. Counting molecules in cells requires estimation of the fluorescence intensity of single molecules, which is generally limited to imaging near cell surfaces, in isolated cells, or where motions are diffusive. To circumvent this difficulty, we have devised a calibration technique for spinning-disk confocal (SDC) microscopy, commonly used for imaging in tissues, that uses single-step bleaching kinetics to estimate the single-fluorophore intensity. To cross-check our calibrations, we compared the brightness of fluorophores in the SDC microscope to those in the total internal reflection and epifluorescence microscopes. We applied this calibration method to quantify the number of end-binding protein 1 (EB1)-eGFP in the comets of growing microtubule ends and to measure the cytoplasmic concentration of EB1-eGFP in sensory neurons in fly larvae. These measurements allowed us to estimate the dissociation constant of EB1-eGFP from the microtubules as well as the GTP-tubulin cap size. Our results show the unexplored potential of single-molecule imaging using spinning-disk confocal microscopy and provide a straightforward method to count the absolute number of fluorophores in tissues that can be applied to a wide range of biological systems and imaging techniques.

10.
Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1–30 kb, making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1–30 kb CNV, 1–30 kb deletions, and 1–10 kb deletions in ASD. CNV in the 1–30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1–30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes.

11.
Population dynamic models combine density dependence and environmental effects. Ignoring sampling uncertainty might lead to biased estimation of the strength of density dependence. This is typically addressed using state‐space model approaches, which integrate sampling error and population process estimates. Such models seldom include an explicit link between the sampling procedures and the true abundance, which is common in capture–recapture settings. However, many of the models proposed to estimate abundance in the presence of capture heterogeneity lead to incomplete likelihood functions and cannot be straightforwardly included in state‐space models. We assessed the importance of estimating sampling error explicitly by taking an intermediate approach between ignoring uncertainty in abundance estimates and fully specified state‐space models for density‐dependence estimation based on autoregressive processes. First, we estimated individual capture probabilities based on a heterogeneity model for a closed population, using a conditional multinomial likelihood, followed by a Horvitz–Thompson estimate for abundance. Second, we estimated coefficients of autoregressive models for the log abundance. Inference was performed using the methodology of integrated nested Laplace approximation (INLA). We performed an extensive simulation study to compare our approach with estimates disregarding capture history information, and using R‐package VGAM, for different parameter specifications. The methods were then applied to a real data set of gray‐sided voles Myodes rufocanus from Northern Norway. We found that density‐dependence estimation was improved when explicitly modeling sampling error in scenarios with low process variances, in which differences in coverage reached up to 8% in estimating the coefficients of the autoregressive processes. In this case, the bias also increased assuming a Poisson distribution in the observational model. 
For high process variances, the differences between methods were small and it appeared less important to model heterogeneity.
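
The first step described above, a Horvitz–Thompson estimate of abundance from individual capture probabilities, can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the per-occasion capture probabilities are taken as given (in the study they are estimated from a heterogeneity model), and all names are hypothetical.

```python
def horvitz_thompson_abundance(capture_probs, occasions):
    """Horvitz-Thompson abundance estimate for a closed population.

    capture_probs: per-occasion capture probability of each individual
                   that was caught at least once (assumed known here).
    occasions:     number of capture occasions.
    """
    total = 0.0
    for q in capture_probs:
        if not 0.0 < q <= 1.0:
            raise ValueError("capture probabilities must lie in (0, 1]")
        # Probability that this individual is caught at least once.
        p_seen = 1.0 - (1.0 - q) ** occasions
        # Each observed individual stands in for 1 / p_seen individuals.
        total += 1.0 / p_seen
    return total


# Hypothetical data: four observed animals, three capture occasions.
print(horvitz_thompson_abundance([0.2, 0.3, 0.5, 0.5], occasions=3))
```

Individuals that are hard to catch receive larger weights, which is how the estimate accounts for animals that were never observed.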

12.
Functional connectivity concerns the correlated activity between neuronal populations in spatially segregated regions of the brain, which may be studied using functional magnetic resonance imaging (fMRI). This coupled activity is conveniently expressed using covariance, but this measure fails to distinguish between direct and indirect effects. A popular alternative that addresses this issue is partial correlation, which regresses out the signal of potentially confounding variables, resulting in a measure that reveals only direct connections. Importantly, provided the data are normally distributed, if two variables are conditionally independent given all other variables, their respective partial correlation is zero. In this paper, we propose a probabilistic generative model that allows us to estimate functional connectivity in terms of both partial correlations and a graph representing conditional independencies. Simulation results show that this methodology is able to outperform the graphical LASSO, which is the de facto standard for estimating partial correlations. Furthermore, we apply the model to estimate functional connectivity for twenty subjects using resting-state fMRI data. Results show that our model provides a richer representation of functional connectivity as compared to considering partial correlations alone. Finally, we demonstrate how our approach can be extended in several ways, for instance to achieve data fusion by informing the conditional independence graph with data from probabilistic tractography. As our Bayesian formulation of functional connectivity provides access to the posterior distribution instead of only to point estimates, we are able to quantify the uncertainty associated with our results. This reveals that while we are able to infer a clear backbone of connectivity in our empirical results, the data are not accurately described by simply looking at the mode of the distribution over connectivity. 
The implication of this is that deterministic alternatives may misjudge connectivity results by drawing conclusions from noisy and limited data.

13.
Measuring leaf area index (LAI) is essential for evaluating crop growth and estimating yield, thereby facilitating high-throughput phenotyping of maize (Zea mays). LAI estimation models use multi-source data from unmanned aerial vehicles (UAVs), but using multimodal data to estimate maize LAI, and the effect of tassels and soil background, remain understudied. Our research aims to (1) determine how multimodal data contribute to LAI and propose a framework for estimating LAI based on remote-sensing data, (2) evaluate the robustness and adaptability of an LAI estimation model that uses multimodal data fusion and deep neural networks (DNNs) in single and whole growth stages, and (3) explore how soil background and maize tasseling affect LAI estimation. To construct multimodal datasets, our UAV collected red–green–blue, multispectral, and thermal infrared images. We then developed partial least squares regression (PLSR), support vector regression, and random forest regression models to estimate LAI. We also developed a deep learning model with three hidden layers. This multimodal data structure accurately estimated maize LAI. The DNN model provided the best estimate (coefficient of determination [R²] = 0.89, relative root mean square error [rRMSE] = 12.92%) for a single growth period, and the PLSR model provided the best estimate (R² = 0.70, rRMSE = 12.78%) for a whole growth period. Tassels reduced the accuracy of LAI estimation, but the soil background provided additional image feature information, improving accuracy. These results indicate that multimodal data fusion using low-cost UAVs and DNNs can accurately and reliably estimate LAI for crops, which is valuable for high-throughput phenotyping and high-spatial precision farmland management.

Multimodal data fusion (red–green–blue, multispectral, and thermal infrared) using low-cost unmanned aerial vehicles in a deep neural network and machine learning framework estimates maize leaf area index.

14.
The Type I-F CRISPR-mediated (clustered regularly interspaced short palindromic repeats) adaptive immune system in Pseudomonas aeruginosa consists of two CRISPR loci and six CRISPR-associated (cas) genes. Foreign DNA surveillance is performed by a complex of Cas proteins (Csy1–4) that assemble with a CRISPR RNA (crRNA) into a 350-kDa ribonucleoprotein called the Csy complex. Here, we show that foreign nucleic acid recognition by the Csy complex proceeds through sequential steps, initiated by detection of two consecutive guanine–cytosine base pairs (G–C/G–C) located adjacent to the complementary DNA target. We show that this motif, called the PAM (protospacer adjacent motif), must be double-stranded and that single-stranded PAMs do not provide significant discriminating power. Binding assays performed with G–C/G–C-rich competitor sequences indicate that the Csy complex interacts directly with this dinucleotide motif, and kinetic analyses reveal that recognition of a G–C/G–C motif is a prerequisite for crRNA-guided binding to a target sequence. Together, these data indicate that the Csy complex first interacts with G–C/G–C base pairs and then samples adjacent target sequences for complementarity to the crRNA guide.

15.
We consider adaptive robust methods for lung cancer that are also dose-reactive, wherein the treatment is modified after each treatment session to account for the dose delivered in prior treatment sessions. Such methods are of interest because they potentially allow for errors in the delivered dose to be corrected as the treatment progresses, thereby ensuring that the tumor receives a sufficient dose at the end of the treatment. We show through a computational study with real lung cancer patient data that while dose reaction is beneficial with respect to the final dose distribution, it may lead to exaggerated daily underdose and overdose relative to non-reactive methods that grows as the treatment progresses. However, by combining dose reaction with a mechanism for updating an estimate of the uncertainty, the magnitude of this growth can be mitigated substantially. The key finding of this paper is that reacting to dose errors – an adaptation strategy that is both simple and intuitively appealing – may backfire and lead to treatments that are clinically unacceptable.

16.
To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al. (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor.
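
To make the p-value combination step more concrete, here is a sketch of the classical Simes procedure, which the extended Simes procedure builds on. It is not TATES itself: the correction for correlations between components mentioned above is omitted, and the function name and example p-values are hypothetical.

```python
def simes_pvalue(pvalues):
    """Combine m p-values into one with the classical Simes procedure.

    Returns the minimum over j of m * p_(j) / j, where
    p_(1) <= ... <= p_(m) are the sorted p-values.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    m = len(pvalues)
    ordered = sorted(pvalues)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


# Hypothetical univariate p-values for one variant across three phenotypes.
print(simes_pvalue([0.01, 0.04, 0.03]))
```

Because the minimum is taken over all ranks, a single very small p-value and several moderately small ones can both yield a small combined p-value, which fits the abstract's point about detecting both phenotype-specific and shared variants.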

17.
To assess whether there are universal rules that govern amino acid–base recognition, we investigate hydrogen bonds, van der Waals contacts and water-mediated bonds in 129 protein–DNA complex structures. DNA–backbone interactions are the most numerous, providing stability rather than specificity. For base interactions, there are significant base–amino acid type correlations, which can be rationalised by considering the stereochemistry of protein side chains and the base edges exposed in the DNA structure. Nearly two-thirds of the direct read-out of DNA sequences involves complex networks of hydrogen bonds, which enhance specificity. Two-thirds of all protein–DNA interactions comprise van der Waals contacts, compared to about one-sixth each of hydrogen and water-mediated bonds. This highlights the central importance of these contacts for complex formation, which have previously been relegated to a secondary role. Although common, water-mediated bonds are usually non-specific, acting as space-fillers at the protein–DNA interface. In conclusion, the majority of amino acid–base interactions observed follow general principles that apply across all protein–DNA complexes, although there are individual exceptions. Therefore, we distinguish between interactions whose specificities are ‘universal’ and ‘context-dependent’. An interactive Web-based atlas of side chain–base contacts provides access to the collected data, including analyses and visualisation of the three-dimensional geometry of the interactions.

18.
Biodiversity losses over the next century are predicted to result in alterations of ecosystem functions that are on par with other major drivers of global change. Given the seriousness of this issue, there is a need to effectively monitor global biodiversity. Because performing biodiversity censuses of all taxonomic groups is prohibitively costly, indicator groups have been studied to estimate the biodiversity of different taxonomic groups. Quantifying cross-taxon congruence is a method of evaluating the assumption that the diversity of one taxonomic group can be used to predict the diversity of another. To improve the predictive ability of cross-taxon congruence in aquatic ecosystems, we evaluated whether body size, measured as the ratio of average body length between organismal groups, is a significant predictor of their cross-taxon biodiversity congruence. To test this hypothesis, we searched the published literature and screened for studies that used species richness correlations as their metric of cross-taxon congruence. We extracted 96 correlation coefficients from 16 studies, which encompassed 784 inland water bodies. With these correlation coefficients, we conducted a categorical meta-analysis, grouping data based on the body size ratio of organisms. Our results showed that cross-taxon congruence is variable among sites and between different groups (r values ranging from −0.53 to 0.88). In addition, our quantitative meta-analysis demonstrated that organisms most similar in body size showed stronger species richness correlations than organisms which differed increasingly in size (adjusted r² = 0.94, p = 0.02). We propose that future studies applying biodiversity indicators in aquatic ecosystems consider functional traits such as body size, so as to increase their success at predicting the biodiversity of taxonomic groups where cost-effective conservation tools are needed.

19.

Background

EEG studies of working memory (WM) have demonstrated load dependent frequency band modulations. FMRI studies have localized load modulated activity to the dorsolateral prefrontal cortex (DLPFC), medial prefrontal cortex (MPFC), and posterior parietal cortex (PPC). Recently, an EEG-fMRI study found that low frequency band (theta and alpha) activity negatively correlated with the BOLD signal during the retention phase of a WM task. However, the coupling of higher (beta and gamma) frequencies with the BOLD signal during WM is unknown.

Methodology

In 16 healthy adult subjects, we first investigated EEG-BOLD signal correlations for theta (5–7 Hz), alpha1 (8–10 Hz), alpha2 (10–12 Hz), beta1 (13–20 Hz), beta2 (20–30 Hz), and gamma (30–40 Hz) during the retention period of a WM task with set sizes 2 and 5. Secondly, we investigated whether load sensitive brain regions are characterised by effects that relate frequency bands to BOLD signal effects.

Principal Findings

We found negative theta-BOLD signal correlations in the MPFC, PPC, and cingulate cortex (ACC and PCC). For alpha1, positive correlations with the BOLD signal were found in ACC, MPFC, and PCC; negative correlations were observed in DLPFC, PPC, and inferior frontal gyrus (IFG). Negative alpha2-BOLD signal correlations were observed in parieto-occipital regions. Beta1-BOLD signal correlations were positive in ACC and negative in precentral and superior temporal gyrus. Beta2 and gamma showed only positive correlations with BOLD, e.g., in DLPFC, MPFC (gamma) and IFG (beta2/gamma). The load analysis revealed that theta and, with one exception, beta and gamma demonstrated exclusively positive load effects, while alpha1 showed only negative effects.

Conclusions

We conclude that the directions of EEG-BOLD signal correlations vary across brain regions and EEG frequency bands. In addition, some brain regions show both load sensitive BOLD and frequency band effects. Our data indicate that lower as well as higher frequency brain oscillations are linked to neurovascular processes during WM.

20.
The recent advent of high-throughput sequencing and genotyping technologies makes it possible to produce, easily and cost effectively, large amounts of detailed data on the genotype composition of populations. Detecting locus-specific effects may help identify those genes that have been, or are currently, targeted by natural selection. How best to identify these selected regions, loci, or single nucleotides remains a challenging issue. Here, we introduce a new model-based method, called SelEstim, to distinguish putative selected polymorphisms from the background of neutral (or nearly neutral) ones and to estimate the intensity of selection at the former. The underlying population genetic model is a diffusion approximation for the distribution of allele frequency in a population subdivided into a number of demes that exchange migrants. We use a Markov chain Monte Carlo algorithm for sampling from the joint posterior distribution of the model parameters, in a hierarchical Bayesian framework. We present evidence from stochastic simulations, which demonstrates the good power of SelEstim to identify loci targeted by selection and to estimate the strength of selection acting on these loci, within each deme. We also reanalyze a subset of SNP data from the Stanford HGDP–CEPH Human Genome Diversity Cell Line Panel to illustrate the performance of SelEstim on real data. In agreement with previous studies, our analyses point to a very strong signal of positive selection upstream of the LCT gene, which encodes the enzyme lactase–phlorizin hydrolase and is associated with adult-type hypolactasia. The geographical distribution of the strength of positive selection across the Old World matches the interpolated map of lactase persistence phenotype frequencies, with the strongest selection coefficients in Europe and in the Indus Valley.
