Similar articles
 20 similar articles found (search time: 31 ms)
1.
Finding efficient analytical techniques is increasingly a bottleneck in making effective use of large biological datasets. Machine learning offers a novel and powerful tool for advancing classification and modeling solutions in molecular biology. However, these methods have been used less frequently with empirical population genetics data. In this study, we developed a new combined approach to data analysis, applying machine learning algorithms to microsatellite marker data from our previous studies of olive populations. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, ‘4-targeted’ and ‘16-targeted’. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles for representing the population and geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions according to their regions of origin. A highlight of this study was the discovery of the best combination of markers for differentiating populations via machine learning models, which can be exploited to distinguish among other biological populations.

2.
Gene expression heterogeneity is a key driver for microbial adaptation to fluctuating environmental conditions, cell differentiation and the evolution of species. This phenomenon therefore has enormous implications, not only for life in general, but also for biotechnological applications where unwanted subpopulations of non-producing cells can emerge in large-scale fermentations. Only time-lapse fluorescence microscopy allows real-time measurements of gene expression heterogeneity. A major limitation in the analysis of time-lapse microscopy data is the lack of fast, cost-effective, open, simple and adaptable protocols. Here we describe TLM-Quant, a semi-automatic pipeline for the analysis of time-lapse fluorescence microscopy data that enables the user to visualize and quantify gene expression heterogeneity. Importantly, our pipeline builds on the open-source packages ImageJ and R. To validate TLM-Quant, we selected three possible scenarios, namely homogeneous expression, highly ‘noisy’ heterogeneous expression, and bistable heterogeneous expression in the Gram-positive bacterium Bacillus subtilis. This bacterium is both a paradigm for systems-level studies on gene expression and a highly appreciated biotechnological ‘cell factory’. We conclude that the temporal resolution of such analyses with TLM-Quant is limited only by the number of recorded images.

3.
Genetic linkage maps, which permit the elucidation of genome structure, are one of the most powerful genomic tools for accelerating marker-assisted breeding. However, due to a lack of sufficient user-friendly molecular markers, no genetic linkage map has been developed for tree peonies (Paeonia Sect. Moutan), a group of important horticultural plants worldwide. Specific-locus amplified fragment sequencing (SLAF-seq) is a recent molecular marker development technology that enables the large-scale discovery and genotyping of sequence-based markers genome-wide. In this study, we performed SLAF sequencing of an F1 population, derived from the cross P. ostii ‘FenDanBai’ × P. × suffruticosa ‘HongQiao’, to identify sufficient high-quality markers for the construction of a high-density genetic linkage map in tree peonies. SLAF sequencing generated a total of 78 Gb of sequencing data and 285,403,225 paired-end reads. We detected 309,198 high-quality SLAFs from these data, of which 85,124 (27.5%) were polymorphic. Subsequently, 3518 of the polymorphic markers, which were successfully encoded into Mendelian segregation types and conformed to the criteria for high-quality markers, were defined as effective markers and used for genetic linkage mapping. Finally, we constructed an integrated genetic map, which comprised 1189 markers on the five linkage groups and spanned 920.699 centiMorgans (cM), with an average inter-marker distance of 0.774 cM. There were 1115 ‘SNP-only’ markers, 18 ‘InDel-only’ markers, and 56 ‘SNP&InDel’ markers on the map. Among these markers, 450 (37.85%) showed significant segregation distortion (P < 0.05). In conclusion, this investigation reports the first large-scale marker development and high-density linkage map construction for tree peony. The results of this study will serve as a solid foundation not only for marker-assisted breeding, but also for genome sequence assembly for tree peony.
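The summary statistics quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the abstract; the constant and function names are illustrative and do not come from the study. It assumes the reported average spacing is map length divided by marker count, which is what reproduces 0.774 cM.

```python
# Figures quoted in the abstract; all names here are illustrative only.
MAP_LENGTH_CM = 920.699
MARKERS_BY_TYPE = {"SNP-only": 1115, "InDel-only": 18, "SNP&InDel": 56}
DISTORTED_MARKERS = 450
TOTAL_SLAFS = 309_198
POLYMORPHIC_SLAFS = 85_124


def map_summary():
    """Recompute the abstract's derived statistics from its raw counts."""
    total_markers = sum(MARKERS_BY_TYPE.values())
    return {
        "total_markers": total_markers,
        # Assumption: average spacing = map length / number of markers.
        "avg_spacing_cm": MAP_LENGTH_CM / total_markers,
        "distorted_pct": 100 * DISTORTED_MARKERS / total_markers,
        "polymorphic_pct": 100 * POLYMORPHIC_SLAFS / TOTAL_SLAFS,
    }


if __name__ == "__main__":
    for name, value in map_summary().items():
        print(f"{name}: {value:.3f}")
```

Dividing by the number of intervals (markers minus linkage groups) instead would give a slightly larger spacing, so the choice of denominator is worth stating when comparing maps.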

4.
5.
Arthropod RNA viruses pose a serious threat to human health, yet many aspects of their replication cycle remain incompletely understood. Here we describe a versatile Drosophila toolkit of transgenic, self-replicating genomes (‘replicons’) from Sindbis virus that allow rapid visualization and quantification of viral replication in vivo. We generated replicons expressing Luciferase for the quantification of viral replication, serving as useful new tools for large-scale genetic screens for identifying cellular pathways that influence viral replication. We also present a new binary system in which replication-deficient viral genomes can be activated ‘in trans’, through co-expression of an intact replicon contributing an RNA-dependent RNA polymerase. The utility of this toolkit for studying virus biology is demonstrated by the observation of stochastic exclusion between replicons expressing different fluorescent proteins, when co-expressed under control of the same cellular promoter. This process is analogous to ‘superinfection exclusion’ between virus particles in cell culture, a process that is incompletely understood. We show that viral polymerases strongly prefer to replicate the genome that encoded them, and that almost invariably only a single virus genome is stochastically chosen for replication in each cell. Our in vivo system now makes this process amenable to detailed genetic dissection. Thus, this toolkit allows the cell-type specific, quantitative study of viral replication in a genetic model organism, opening new avenues for molecular, genetic and pharmacological dissection of virus biology and tool development.

6.

Background

Lotus is a diploid plant of agricultural, medicinal, and ecological significance. Genetic linkage maps are fundamental resources for genome and genetic studies, and also provide molecular markers for breeding in agriculturally important species. Genotyping by sequencing has revolutionized genetic mapping, and restriction-site associated DNA sequencing (RADseq) allows rapid discovery of thousands of SNP markers. A crucial aspect of the sequence-based mapping strategy is the reference sequence used for marker identification.

Results

We assessed the effectiveness of linkage mapping using three types of references for scoring markers: the unmasked genome, repeat masked genome, and gene models. Overall, the repeat masked genome produced the optimal genetic maps. A high-density genetic map of American lotus was constructed using an F1 population derived from a cross between Nelumbo nucifera ‘China Antique’ and N. lutea ‘AL1’. A total of 4,098 RADseq markers were used to construct the American lotus ‘AL1’ genetic map, and 147 markers were used to construct the Chinese lotus ‘China Antique’ genetic map. The American lotus map has 9 linkage groups, and spans 494.3 cM, with an average distance of 0.7 cM between adjacent markers. The American lotus map was used to anchor scaffold sequences in the N. nucifera ‘China Antique’ draft genome. 3,603 RADseq markers anchored 234 individual scaffold sequences into 9 megascaffolds spanning 67% of the 804 Mb draft genome.

Conclusions

Among the unmasked genome, repeat masked genome and gene models, the optimal reference sequence for calling RADseq markers for map construction is the repeat masked genome. This high-density genetic map is a valuable resource for genomic research and crop improvement in lotus.

7.
Linkage maps are valuable tools in genetic and genomic studies. For sweet cherry, linkage maps have been constructed mainly using microsatellite markers (SSRs) and, recently, using single nucleotide polymorphism markers (SNPs) from a cherry 6K SNP array. Genotyping-by-sequencing (GBS), a new methodology based on high-throughput sequencing, holds great promise for the identification of large numbers of SNPs and the construction of high-density linkage maps. In this study, GBS was used to identify SNPs from an intra-specific sweet cherry cross. A total of 8,476 high-quality SNPs were selected for mapping. The physical position of each SNP was determined using the peach genome, Peach v1.0, as reference, and a homogeneous distribution of markers along the eight peach scaffolds was obtained. On average, 65.6% of the SNPs were present in genic regions and 49.8% were located in exonic regions. In addition to the SNPs, a group of SSRs was also used for the construction of linkage maps. Parental and consensus high-density maps were constructed by genotyping 166 siblings from a ‘Rainier’ x ‘Rivedel’ (Ra x Ri) cross. Using the Ra x Ri population, 462, 489 and 985 markers were mapped into eight linkage groups in the ‘Rainier’, ‘Rivedel’ and Ra x Ri maps, respectively, with 80% of mapped SNPs located in genic regions. The resulting maps spanned 549.5, 582.6 and 731.3 cM for the ‘Rainier’, ‘Rivedel’ and consensus maps, respectively, with an average distance of 1.2 cM between adjacent markers for both the ‘Rainier’ and ‘Rivedel’ maps and of 0.7 cM for the Ra x Ri map. High synteny and co-linearity were observed among the resulting maps and with Peach v1.0. These new high-density linkage maps provide valuable information on the sweet cherry genome, and serve as the basis for the identification of QTLs and genes relevant for the breeding of the species.

8.
Simple cells in primary visual cortex were famously found to respond to low-level image components such as edges. Sparse coding and independent component analysis (ICA) emerged as the standard computational models for simple cell coding because they linked their receptive fields to the statistics of visual stimuli. However, a salient feature of image statistics, occlusions of image components, is not considered by these models. Here we ask if occlusions have an effect on the predicted shapes of simple cell receptive fields. We use a comparative approach to answer this question and investigate two models for simple cells: a standard linear model and an occlusive model. For both models we simultaneously estimate optimal receptive fields, sparsity and stimulus noise. The two models are identical except for their component superposition assumption. We find the image encoding and receptive fields predicted by the models to differ significantly. While both models predict many Gabor-like fields, the occlusive model predicts a much sparser encoding and high percentages of ‘globular’ receptive fields. This relatively new center-surround type of simple cell response has been observed since reverse correlation came into use in experimental studies. While high percentages of ‘globular’ fields can be obtained using specific choices of sparsity and overcompleteness in linear sparse coding, no or only low proportions are reported in the vast majority of studies on linear models (including all ICA models). Likewise, for the linear model investigated here with optimal sparsity, only low proportions of ‘globular’ fields are observed. In comparison, the occlusive model robustly infers high proportions and can match the experimentally observed high proportions of ‘globular’ fields well. Our computational study, therefore, suggests that ‘globular’ fields may be evidence for an optimal encoding of visual occlusions in primary visual cortex.

9.
Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package.
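The weight-combination step described in this abstract can be illustrated with a small sketch. The abstract says only that sample variance factors are "converted to weights and combined" with observation-level weights. The sketch assumes the conversion is an inverse (weight = 1 / variance factor) and the combination is a product. Both are assumptions made here for illustration. The function names are hypothetical and this is not the limma API.

```python
def sample_weights(variance_factors):
    """Convert per-sample variance factors to weights.

    Assumption: inverse-variance weighting, so a sample with twice the
    variance receives half the weight.
    """
    return [1.0 / v for v in variance_factors]


def combine_weights(obs_weights, variance_factors):
    """Scale each observation-level weight by its sample's weight.

    obs_weights: one row per gene, one column per sample.
    variance_factors: one estimated variance factor per sample.
    Assumption: the two weight sources are combined by multiplication.
    """
    per_sample = sample_weights(variance_factors)
    return [
        [w * sw for w, sw in zip(row, per_sample)]
        for row in obs_weights
    ]


if __name__ == "__main__":
    # Toy input: two genes, three samples; the third sample is noisier.
    obs = [[2.0, 2.0, 2.0], [1.0, 3.0, 4.0]]
    factors = [1.0, 1.0, 4.0]
    print(combine_weights(obs, factors))
```

Under these assumptions a noisy sample is down-weighted for every gene rather than discarded, which is the compromise the abstract describes.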

10.
Cortical networks show a large heterogeneity of neuronal properties. However, traditional coding models have focused on homogeneous populations of excitatory and inhibitory neurons. Here, we analytically derive a class of recurrent networks of spiking neurons that close to optimally track a continuously varying input online, based on two assumptions: 1) every spike is decoded linearly and 2) the network aims to reduce the mean-squared error between the input and the estimate. From this we derive a class of predictive coding networks that unifies encoding and decoding and in which we can investigate the difference between homogeneous networks and heterogeneous networks, in which each neuron represents different features and has different spike-generating properties. We find that in this framework, ‘type 1’ and ‘type 2’ neurons arise naturally and networks consisting of a heterogeneous population of different neuron types are both more efficient and more robust against correlated noise. We make two experimental predictions: 1) we predict that integrators show strong correlations with other integrators and resonators are correlated with resonators, whereas the correlations are much weaker between neurons with different coding properties and 2) that ‘type 2’ neurons are more coherent with the overall network activity than ‘type 1’ neurons.
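The two assumptions stated in this abstract (linear decoding of every spike, and spiking to reduce squared error) can be illustrated with a toy greedy rule. This is a simplified discrete-time sketch for a one-dimensional signal, not the network the authors derive. The leaky readout, the one-spike-per-step limit, and all names and parameter values are assumptions made for illustration.

```python
def track_signal(signal, decoders, dt=0.001, decay=10.0):
    """Greedy spike-based tracking of a 1-D signal.

    Each spike from neuron i adds decoders[i] to the estimate (linear
    decoding). A spike lowers the squared error exactly when
    d * err > d * d / 2, where err is the current tracking error.
    """
    x_hat = 0.0
    estimates, spikes = [], []
    for step, x in enumerate(signal):
        x_hat -= dt * decay * x_hat  # leaky decay of the readout
        err = x - x_hat
        # Half the squared-error reduction each neuron's spike would give.
        gains = [d * err - d * d / 2 for d in decoders]
        best = max(range(len(decoders)), key=lambda i: gains[i])
        if gains[best] > 0:  # fire only if the error actually decreases
            x_hat += decoders[best]
            spikes.append((step, best))
        estimates.append(x_hat)
    return estimates, spikes


if __name__ == "__main__":
    # Constant input tracked by one positive and one negative decoder.
    est, spk = track_signal([1.0] * 200, decoders=[0.1, -0.1])
    print(f"final estimate: {est[-1]:.3f}, spikes fired: {len(spk)}")
```

In this toy version the tracking error stays within about half a decoder weight once the estimate has caught up with the input.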

11.
In the last decade, many papers highlighted that the histone variant H2AX and its phosphorylation on Ser 139 (γH2AX) cannot be simply considered a specific DNA double-strand-break (DSB) marker with a role restricted to the DNA damage response, but rather as a ‘protagonist’ in different scenarios. This review will present and discuss an up-to-date view regarding the ‘non-canonical’ H2AX roles, focusing in particular on possible functional and structural parts in contexts different from the canonical DNA DSB response. We will present aspects concerning sex chromosome inactivation in male germ cells, X inactivation in female somatic cells and mitosis, but will also focus on the more recent studies regarding embryonic and neural stem cell development, asymmetric sister chromosome segregation in stem cells and cellular senescence maintenance. We will discuss whether in these new contexts there might be a relation with the canonical DNA DSB signalling function that could justify γH2AX formation. The authors will emphasize that, just as H2AX phosphorylation signals chromatin alteration and serves the canonical function of recruiting DSB repair factors, so the modification of H2AX in contexts other than the DNA damage response may contribute towards creating a specific chromatin structure frame allowing ‘non-canonical’ functions to be carried out in different cell types.

12.

Background

The HIV cascade of care (cascade) is a comprehensive tool which identifies attrition along the HIV care continuum. We executed analyses to explicate heterogeneity in the cascade across key strata, as well as identify predictors of attrition across stages of the cascade.

Methods

Using linked individual-level data for the population of HIV-positive individuals in BC, we considered the 2011 calendar year, including individuals diagnosed at least 6 months prior, and excluding individuals that died or were lost to follow-up before January 1st, 2011. We defined five stages in the cascade framework: HIV ‘diagnosed’, ‘linked’ to care, ‘retained’ in care, ‘on HAART’ and virologically ‘suppressed’. We stratified the cascade by sex, age, risk category, and regional health authority. Finally, multiple logistic regression models were built to predict attrition across each stage of the cascade, adjusting for stratification variables.
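The five-stage framework lends itself to a simple attrition calculation between consecutive stages. The sketch below shows that calculation on made-up counts chosen only to make the arithmetic easy to follow; they are not data from the study. The function name is hypothetical, and the sketch assumes each stage is a subset of the one before it.

```python
STAGES = ["diagnosed", "linked", "retained", "on HAART", "suppressed"]


def stage_attrition(counts):
    """Return the fraction of individuals lost between consecutive stages.

    counts: mapping from stage name to the number of individuals reaching it.
    Assumption: every stage is a subset of the previous stage.
    """
    losses = {}
    for prev, nxt in zip(STAGES, STAGES[1:]):
        losses[f"{prev} -> {nxt}"] = 1 - counts[nxt] / counts[prev]
    return losses


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    toy_counts = {
        "diagnosed": 100,
        "linked": 80,
        "retained": 60,
        "on HAART": 48,
        "suppressed": 36,
    }
    for transition, lost in stage_attrition(toy_counts).items():
        print(f"{transition}: {lost:.0%} lost")
```

Running the same function on counts stratified by sex, age, risk category or health authority would give the per-subgroup comparison that the Methods describe.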

Results

We identified 7621 HIV diagnosed individuals during the study period; 80% were male and 5% were <30, 17% 30–39, 37% 40–49 and 40% were ≥50 years. Of these, 32% were MSM, 28% IDU, 8% MSM/IDU, 12% heterosexual, and 20% other. Overall, 85% of individuals ‘on HAART’ were ‘suppressed’; however, this proportion ranged from 60%–93% in our various stratifications. Most individuals, in all subgroups, were lost between the stages: ‘linked’ to ‘retained’ and ‘on HAART’ to ‘suppressed’. Subgroups with the highest attrition between these stages included females and individuals <30 years (regardless of transmission risk group). IDUs experienced the greatest attrition of all subgroups. Logistic regression results found extensive statistically significant heterogeneity in attrition across the cascade between subgroups and regional health authorities.

Conclusions

We found that extensive heterogeneity in attrition existed across subgroups and regional health authorities along the HIV cascade of care in B.C., Canada. Our results provide critical information to optimize engagement in care and health service delivery.

13.
14.
Decoding the complexity of multicellular organisms requires analytical procedures to overcome the limitations of averaged measurements of cell populations, which obscure inherent cell-cell heterogeneity and restrict the ability to distinguish between the responses of individual cells within a sample. For example, defining the timing, magnitude and the coordination of cytokine responses in single cells is critical for understanding the development of effective immunity. While approaches to measure gene expression from single cells have been reported, the absolute performance of these techniques has been difficult to assess, which likely has limited their wider application. We describe a straightforward method for simultaneously measuring the expression of multiple genes in a multitude of single-cell samples using flow cytometry, parallel cDNA synthesis, and quantification by real-time PCR. We thoroughly assess the performance of the technique using mRNA and DNA standards and cell samples, and demonstrate a detection sensitivity of ∼30 mRNA molecules per cell, and a fractional error of 15%. Using this method, we expose unexpected heterogeneity in the expression of 5 immune-related genes in sets of single macrophages activated by different microbial stimuli. Further, our analyses reveal that the expression of one ‘pro-inflammatory’ cytokine is not predictive of the expression of another ‘pro-inflammatory’ cytokine within the same cell. These findings demonstrate that single-cell approaches are essential for studying coordinated gene expression in cell populations, and this generic and easy-to-use quantitative method is applicable in other areas in biology aimed at understanding the regulation of cellular responses.

15.
High-density Integrated Linkage Map Based on SSR Markers in Soybean   Total citations: 2, self-citations: 0, citations by others: 2
A well-saturated molecular linkage map is a prerequisite for modern plant breeding. Several genetic maps have been developed for soybean with various types of molecular markers. Simple sequence repeats (SSRs) are single-locus markers with high allelic variation and are widely applicable to different genotypes. We have now mapped 1810 SSR or sequence-tagged site markers in one or more of three recombinant inbred populations of soybean (the US cultivar ‘Jack’ × the Japanese cultivar ‘Fukuyutaka’, the Chinese cultivar ‘Peking’ × the Japanese cultivar ‘Akita’, and the Japanese cultivar ‘Misuzudaizu’ × the Chinese breeding line ‘Moshidou Gong 503’) and have aligned these markers with the 20 consensus linkage groups (LGs). The total length of the integrated linkage map was 2442.9 cM, and the average number of molecular markers was 90.5 (range of 70–114) for the 20 LGs. We examined allelic diversity for 1238 of the SSR markers among 23 soybean cultivars or lines and a wild accession. The number of alleles per locus ranged from 2 to 7, with an average of 2.8. Our high-density linkage map should facilitate ongoing and future genomic research such as analysis of quantitative trait loci and positional cloning in addition to marker-assisted selection in soybean breeding. Key words: EST-derived SSR marker, integrated linkage map, microsatellite marker, polymorphism information content

16.
Time course ‘omics’ experiments are becoming increasingly important to study system-wide dynamic regulation. Despite their high information content, analysis remains challenging. ‘Omics’ technologies capture quantitative measurements on tens of thousands of molecules. Therefore, in a time course ‘omics’ experiment molecules are measured for multiple subjects over multiple time points. This results in a large, high-dimensional dataset, which requires computationally efficient approaches for statistical analysis. Moreover, methods need to be able to handle missing values and various levels of noise. We present a novel, robust and powerful framework to analyze time course ‘omics’ data that consists of three stages: quality assessment and filtering, profile modelling, and analysis. The first step consists of removing molecules for which expression or abundance is highly variable over time. The second step models each molecular expression profile in a linear mixed model framework which takes into account subject-specific variability. The best model is selected through a serial model selection approach and results in dimension reduction of the time course data. The final step includes two types of analysis of the modelled trajectories, namely, clustering analysis to identify groups of correlated profiles over time, and differential expression analysis to identify profiles which differ over time and/or between treatment groups. Through simulation studies we demonstrate the high sensitivity and specificity of our approach for differential expression analysis. We then illustrate how our framework can bring novel insights on two time course ‘omics’ studies in breast cancer and kidney rejection. The methods are publicly available, implemented in the R CRAN package lmms.

17.
Envoplakin, periplakin and desmoplakin are cytoskeletal proteins that provide structural integrity within the skin and heart by resisting shear forces. Here we reveal the nature of unique hinges within their plakin domains that provide divergent degrees of flexibility between rigid long and short arms composed of spectrin repeats. The range of mobility of the two arms about the hinge is revealed by applying the ensemble optimization method to small-angle X-ray scattering data. Envoplakin and periplakin adopt ‘L’ shaped conformations exhibiting a ‘helicopter propeller’-like mobility about the hinge. By contrast, desmoplakin exhibits essentially unrestricted mobility by ‘jack-knifing’ about the hinge. Thus the diversity of molecular jointing that can occur about plakin hinges includes ‘L’ shaped bends, ‘U’ turns and fully extended ‘I’ orientations between rigid blocks of spectrin repeats. This establishes specialised hinges in plakin domains as a key source of flexibility that may allow sweeping of cellular spaces during assembly of cellular structures and could impart adaptability, so preventing irreversible damage to desmosomes and the cell cytoskeleton upon exposure to mechanical stress.

18.
Cellular asymmetry plays a major role in the ageing and evolution of multicellular organisms. However, it remains unknown how the cell distinguishes ‘old’ from ‘new’ and whether asymmetry is an attribute of highly specialized cells or a feature inherent in all cells. Here, we investigate the segregation of three asymmetric features: old and new DNA, the spindle pole body (SPB, the centrosome analogue) and the old and new cell ends, using a simple unicellular eukaryote, Schizosaccharomyces pombe. To our knowledge, this is the first study exploring three asymmetric features in the same cells. We show that, of the three chromosomes of S. pombe, chromosome I containing the new parental strand preferentially segregated to the cells inheriting the old cell end. Furthermore, the new SPB also preferentially segregated to the cells inheriting the old end. Our results suggest that the ability to distinguish ‘old’ from ‘new’ and to segregate DNA asymmetrically are inherent features even in simple unicellular eukaryotes.

19.
20.