Similar literature
 Found 20 similar articles (search time: 0 ms)
1.
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next-generation sequence data in family samples. Our framework is able to identify variant sites accurately and to assign individual genotypes, and can handle de novo mutation events, increasing the sensitivity and specificity of variant calling and de novo mutation detection. Through simulations we show that explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants, and have high specificity for detecting de novo mutation events. The improvement in accuracy using our methods over the standard approach is particularly pronounced for low-frequency variants. Furthermore, the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.
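As a rough illustration of why modeling relatedness helps, the sketch below computes a joint genotype posterior for a mother-father-child trio. It is not the authors' framework: the function names, the Hardy-Weinberg founder prior, and the per-individual genotype likelihood inputs are all assumptions made for this example, and de novo mutation is deliberately not modeled.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def hwe_prior(freq):
    """Hardy-Weinberg genotype prior for an unrelated founder (assumed prior)."""
    return ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)


def mendel(child, mother, father):
    """P(child genotype | parental genotypes) under Mendelian inheritance."""
    pm, pf = mother / 2, father / 2  # chance each parent transmits the alt allele
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }[child]


def trio_posterior(lik_mother, lik_father, lik_child, freq):
    """Joint posterior over (mother, father, child) genotypes.

    Each lik_* is a length-3 sequence of P(reads | genotype), hypothetical inputs
    that a real pipeline would derive from the sequence data.
    """
    prior = hwe_prior(freq)
    joint = {}
    for m, f, c in product(GENOTYPES, repeat=3):
        joint[(m, f, c)] = (prior[m] * prior[f] * mendel(c, m, f)
                            * lik_mother[m] * lik_father[f] * lik_child[c])
    total = sum(joint.values())
    return {key: value / total for key, value in joint.items()}


def child_marginal(posterior):
    """Marginal posterior of the child's genotype."""
    marginal = [0.0, 0.0, 0.0]
    for (_, _, c), p in posterior.items():
        marginal[c] += p
    return marginal
```

In this toy model, ambiguous reads in the child are resolved by the parents: if both parents are confidently homozygous reference, the child's posterior collapses onto homozygous reference as well. A model that allows de novo events would instead leave a small probability on the other genotypes.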

2.
3.
There are several lines of evidence supporting the role of de novo mutations as a mechanism for common disorders, such as autism and schizophrenia. First, the de novo mutation rate in humans is relatively high, so new mutations are generated at a high frequency in the population. However, de novo mutations have not been reported in most common diseases. Mutations in genes leading to severe diseases where there is strong negative selection against the phenotype, such as lethality in embryonic stages or reduced reproductive fitness, will not be transmitted to multiple family members, and therefore will not be detected by linkage gene mapping or association studies. The observation of very high concordance in monozygotic twins and very low concordance in dizygotic twins also strongly supports the hypothesis that a significant fraction of cases may result from new mutations. Such is the case for diseases such as autism and schizophrenia. Second, despite reduced reproductive fitness [1] and extremely variable environmental factors, the incidence of some diseases is maintained worldwide at a relatively high and constant rate. This is the case for autism and schizophrenia, with an incidence of approximately 1% worldwide. Mutational load can be thought of as a balance between selection for or against a deleterious mutation and its production by de novo mutation. Lower rates of reproduction constitute a negative selection factor that should reduce the number of mutant alleles in the population, ultimately leading to decreased disease prevalence. These selective pressures tend to be of different intensity in different environments. Nonetheless, these severe mental disorders have been maintained at a constant, relatively high prevalence in the worldwide population across a wide range of cultures and countries despite strong negative selection against them [2]. This is not what one would predict for diseases with reduced reproductive fitness, unless there was a high new mutation rate. Finally, the effects of paternal age: there is a significantly increased risk of the disease with increasing paternal age, which could result from the age-related increase in paternal de novo mutations. This is the case for autism and schizophrenia [3]. The male-to-female ratio of mutation rate is estimated at about 4–6:1, presumably due to a higher number of germ-cell divisions with age in males. Therefore, one would predict that de novo mutations would more frequently come from males, particularly older males [4]. A high rate of new mutations may in part explain why genetic studies have so far failed to identify many genes predisposing to complex diseases, such as autism and schizophrenia, and why diseases have been identified for a mere 3% of genes in the human genome. Identification of de novo mutations as a cause of a disease requires a targeted molecular approach, which includes studying parents and affected subjects. The process for determining whether the genetic basis of a disease may result in part from de novo mutations, and the molecular approach to establish this link, will be illustrated using autism and schizophrenia as examples.

4.
5.
6.
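Statistical analysis and experimental design for mapping genes of complex traits in domestic animals   Total citations: 2 (self-citations: 0; citations by others: 2)
YDa, Acta Genetica Sinica (《遗传学报》) 2003, 30(12): 1183-1192
Mapping genes for complex traits is a highly active research area in humans, animals, and plants. In livestock and poultry, the aim is to map genes associated with production traits, reproductive traits, and disease. In humans, mapping genes for complex traits is extremely challenging: although mapping results accumulate quickly, few of them can be confirmed. Gene mapping results for livestock and poultry are also growing rapidly, and several large-scale gene mapping projects are currently under way in species such as chicken, pig, and dairy cattle. Although new, reliable results can be expected in the near future, a precise understanding of the genes underlying these complex traits will still take time. Recently, methods for mapping complex-trait genes have been applied to gene expression data to map transcriptional regulators, which is a new area of gene mapping research. Statistical analysis and experimental design are key steps in gene mapping studies. The purpose of this study is to discuss research progress and future directions in the statistical analysis and experimental design of complex-trait gene mapping in livestock and poultry.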

7.
8.
While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have a longer theoretical lifespan than their current age, because their GC-rich sequences enable stable ORFs with a lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts.

9.
10.
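F1 hybrids between the indica and japonica subspecies of rice usually show a high degree of sterility. One genetic explanation for this sterility is the one-locus sporo-gametophytic interaction model. To study this sterility, a statistical method is proposed that can estimate the position and effect of the sterility gene locus under the one-locus sporo-gametophytic interaction model. The method uses marker loci that show distorted segregation in a backcross population and applies maximum likelihood to estimate the recombination fraction between the sterility gene and the marker locus, together with the survival rate of female gametes. Because the method relies on the segregation of genetic markers with discrete variation, rather than on continuously distributed measures of gamete fertility, it avoids the unstable recombination fraction estimates that result from estimating directly from fertility.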

11.
Annotation of protein functions plays an important role in understanding life at the molecular level. High‐throughput sequencing produces massive numbers of raw protein sequences, and only about 1% of them have been manually annotated with functions. Experimental annotations of functions are expensive and time‐consuming, and do not keep up with the rapid growth in the number of sequences. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence‐ and network‐derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs that are associated with the input protein sequence and collected from the InterPro tool. This vector is processed with two neural layers to obtain a low‐dimensional vector, which is combined with topological information extracted from protein–protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.

12.
Surface sediment samples were collected from 10 typical locations throughout the Dongting Lake, China, in January 2009. Samples were assayed for Pb, Cd, As, and Hg by atomic absorption spectrophotometry and cold atomic fluorescence spectrophotometry. To investigate the spatial distribution characteristics, sources, and potential ecological risks of the heavy metals, the geostatistics method, the potential ecological risk index, and multivariate statistical analysis were applied. The results showed that Pb, Cd, and As, but not Hg, had similar spatial distribution characteristics. The average contents of Cd and As exceeded the second-class values of the National Standard for Soil Environment Quality (GB15618-1995), especially that of Cd. Spatially, the potential ecological risk posed by these heavy metals decreased in the order: the outlet of the Dongting Lake > the East Dongting Lake > the South Dongting Lake > the West Dongting Lake. The multivariate statistical analysis indicated that Pb and Cd, forming the first group, originated from mining and smelting processes associated with the developed mining and heavy industry. Hg, forming the second group, was derived mainly from weathering of parent material, while As probably originated from both of these sources.

13.
There is a recognized need to design a new framework for sediment toxicity testing that meets current scientific standards and regulatory requirements, such as reliable assessment of toxicity, which prevents any harmful effects on biodiversity, a strong capability to predict population- and community-level effects, and applicability of the results to decision-making. We propose a new framework for prospective sediment toxicity testing, and suggest solutions to the key methodological challenges that hinder establishment of this framework (comparison of sensitivities, design of test batteries, consideration of different exposure routes, extrapolations to population and community levels, use of test results for decision-making). The proposed framework consists of the following three units: a test-battery system, higher-tier testing systems with additional ecological modeling, and a decision support system. The key methodologies proposed to establish this framework are a compound-tailored test-battery approach, relative sensitivity distribution analysis, toxicity tests that combine bacteria and arthropods, micro- and mesocosm studies, population and community models, and model-driven decision support systems. The proposed framework, as well as the key methods mentioned above, has the potential to improve not only prospective toxicity testing for sediments, but also ecological risk assessment in general.

14.
In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have focused either on detecting global signals of adaptive regimes (across the entire gene) or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.

15.

Background

Fetal conditions can in principle be affected by the mother's genotype working through the prenatal environment.

Methodology/Principal Findings

Genotypes for 1536 SNPs in 357 cleft candidate genes were available from a previous analysis in which we focused on fetal gene effects [1]. After data-cleaning, genotypes for 1315 SNPs in 334 autosomal genes were available for the current analysis of maternal gene effects. Two complementary statistical methods, TRIMM and HAPLIN, were used to detect multi-marker effects in population-based samples from Norway (562 case-parent and 592 control-parent triads) and Denmark (235 case-parent triads). We analyzed isolated cleft lip with or without cleft palate (iCL/P) and isolated cleft palate only (iCP) separately and assessed replication by looking for genes detected in both populations by both methods. In iCL/P, neither TRIMM nor HAPLIN detected more genes than expected by chance alone; furthermore, the selected genes were not replicated across the two methods. In iCP, however, FLNB was identified by both methods in both populations. Although HIC1 and ZNF189 did not fully satisfy our stringency criterion for replication, they were strongly associated with iCP in TRIMM analyses of the Norwegian triads.

Conclusion/Significance

Except for FLNB, HIC1 and ZNF189, maternal genes did not appear to influence the risk of clefting in our data. This is consistent with recent epidemiological findings showing no apparent difference between mother-to-offspring and father-to-offspring recurrence of clefts in these two populations. It is likely that fetal genes make the major genetic contribution to clefting risk in these populations, but we cannot rule out the possibility that maternal genes can affect risk through interactions with specific teratogens or fetal genes.

16.
We amplified resistance gene analogues (RGAs) from the genomic DNA of 10 rice lines having varying degrees of resistance to Magnaporthe grisea by using degenerate primers, and various RGAs were mapped in silico on different rice chromosomes. The amplified products were grouped into 3–8 restriction fragment length polymorphic classes by using the MboI and AluI restriction enzymes. Of 98 RGAs obtained in this study, 65 RGA clones showed more than 95% homology with various RGA sequences present in GenBank. Phylogenetic analysis placed these RGAs in 11 groups. Using a sequence homology approach, RGAs isolated in this study were physically mapped to 23 loci on chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 10, 11 and 12. Twenty RGAs were mapped near chromosomal regions containing known genes/QTLs for rice blast, bacterial leaf blight and sheath blight resistance. Thirty‐nine RGA sequences also contained open reading frames representing signatures of potential disease resistance genes.

17.
RNA-Seq technologies are quickly revolutionizing genomic studies, and statistical methods for RNA-Seq data are under continuous development. Timely review and comparison of the most recently proposed statistical methods will provide a useful guide for choosing among them for data analysis. Particular interest surrounds the ability to detect differential expression (DE) in genes. Here we compare four recently proposed statistical methods, edgeR, DESeq, baySeq, and a method with a two-stage Poisson model (TSPM), through a variety of simulations that were based on different distribution models or real data. We compared the ability of these methods to detect DE genes in terms of the significance ranking of genes and false discovery rate control. All methods compared are implemented in freely available software. We also discuss the availability and functions of the currently available versions of these software packages.

18.
Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non‐target organisms is compared. Statistical analysis of such trials comes in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time, possibly with some form of autocorrelation. The model also allows a set of reference varieties to be added to the GM plant and its comparator to assess the natural variation, which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.
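To make the idea of simulating count data with excess zeros in a randomized block layout concrete, here is a minimal sketch. It is not the simulation model distributed with the paper: the zero-inflated Poisson choice, the log-normal block effect, and all function and parameter names are assumptions made for this example.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_trial(rng, n_blocks, means, zero_prob, block_sd):
    """Simulate one randomized block trial with zero-inflated Poisson counts.

    means     -- dict mapping variety name to its expected count (hypothetical values)
    zero_prob -- probability that a plot yields a structural (excess) zero
    block_sd  -- standard deviation of the block effect on the log scale
    Returns a list of (block, variety, count) tuples, one per plot.
    """
    rows = []
    for block in range(n_blocks):
        block_effect = math.exp(rng.gauss(0.0, block_sd))  # multiplicative block effect
        for variety, mean in means.items():
            if rng.random() < zero_prob:
                count = 0  # excess zero, regardless of the Poisson mean
            else:
                count = poisson(rng, mean * block_effect)
            rows.append((block, variety, count))
    return rows


rng = random.Random(2024)
trial = simulate_trial(rng, n_blocks=4, means={"GM": 4.0, "comparator": 5.0},
                       zero_prob=0.2, block_sd=0.3)
```

For a prospective power analysis, one would repeat such a simulation many times, apply the planned difference or equivalence test to each simulated trial, and record how often the test reaches the intended conclusion.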

19.
Studies of the behavior of biological systems often require monitoring of the expression of many genes in a large number of samples. While whole-genome arrays provide high-quality gene-expression profiles, their high cost generally limits the number of samples that can be studied. Although inexpensive small-scale arrays representing genes of interest could be used for many applications, it is challenging to obtain accurate measurements with conventional small-scale microarrays. We have developed a small-scale microarray system that yields highly accurate and reproducible expression measurements. This was achieved by implementing a stable gene-based quantile normalization method for array-to-array normalization, and a probe-printing design that allows use of a statistical model to correct for effects of print tips and uneven hybridization. The array measures expression values in a single sample, rather than ratios between two samples. This allows accurate comparisons among many samples. The array typically yielded correlation coefficients higher than 0.99 between technically duplicated samples. Accuracy was demonstrated by a correlation coefficient of 0.88 between expression ratios determined from this array and an Affymetrix GeneChip, by quantitative RT-PCR, and by spiking known amounts of specific RNAs into the RNA samples used for profiling. The array was used to compare the responses of wild-type, rps2 and ndr1 mutant plants to infection by a Pseudomonas syringae strain expressing avrRpt2. The results suggest that ndr1 affects a defense-signaling pathway(s) in addition to the RPS2-dependent pathway, and indicate that the microarray is a powerful tool for systems analyses of the Arabidopsis disease-signaling network.
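For readers unfamiliar with array-to-array normalization, the sketch below shows textbook quantile normalization, which forces every array to share the same value distribution. It is offered only as background: the "stable gene-based" variant described in the abstract is not specified here, so this plain version and the function name are assumptions, not the authors' method.

```python
def quantile_normalize(arrays):
    """Quantile-normalize several arrays of equal length.

    arrays -- list of lists; each inner list holds the expression values of one array,
              with position i referring to the same gene in every array.
    Returns new lists in which the k-th smallest value of every array is replaced by
    the mean of the k-th smallest values across all arrays. Ties are broken by
    position, a simplification.
    """
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(values) for values in arrays]
    # Reference distribution: average of the k-th order statistic across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_genes)]

    normalized = []
    for values in arrays:
        order = sorted(range(n_genes), key=lambda i: values[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization, both toy arrays contain exactly the same set of values, and each gene keeps its rank within its own array.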

20.
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
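The ΔRMSD measure defined in the abstract can be written down directly: for each target, take the RMSD of the model that the score ranks best and subtract the lowest RMSD in the set, then average over targets. The sketch below assumes that lower scores are better (as for an energy-like score); the function names and the toy numbers are made up for illustration.

```python
def delta_rmsd(scores, rmsds, lower_is_better=True):
    """RMSD of the top-ranked model minus the lowest RMSD among all models of one target."""
    pick = min if lower_is_better else max
    best_index = pick(range(len(scores)), key=lambda i: scores[i])
    return rmsds[best_index] - min(rmsds)


def mean_delta_rmsd(targets, lower_is_better=True):
    """Average delta RMSD over targets; each target is a (scores, rmsds) pair."""
    deltas = [delta_rmsd(s, r, lower_is_better) for s, r in targets]
    return sum(deltas) / len(deltas)


# Toy data: three models for each of two hypothetical targets.
targets = [
    ([-10.0, -12.0, -8.0], [2.0, 3.5, 1.8]),  # score picks model 1; best RMSD is model 2
    ([-5.0, -4.0, -3.0], [1.0, 2.0, 3.0]),    # score picks the lowest-RMSD model
]
average = mean_delta_rmsd(targets)
```

A perfect score would give ΔRMSD = 0 for every target, since it would always select the model closest to the native structure.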
