1.
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
2.
Robert F. Woolson William R. Clarke 《Biometrical journal. Biometrische Zeitschrift》1987,29(8):937-952
Longitudinal studies are rarely complete due to attrition, mistimed visits and observations missing at random. When the data are missing at random it is possible to estimate the primary location parameters of interest by constructing a modification of Zellner's (1962) seemingly unrelated regression estimator. Such a procedure is developed in this paper and is applied to a longitudinal study of coronary risk factors in children. The method consists of two stages in which the covariance matrix is estimated at the first stage. Using the estimated covariance matrix a generalized least squares estimator of the regression parameter vector is then determined at the second stage. Limitations of the procedure are also discussed.
3.
4.
Hong Li Gustavo Glusman Hao Hu Shankaracharya Juan Caballero Robert Hubley David Witherspoon Stephen L. Guthery Denise E. Mauldin Lynn B. Jorde Leroy Hood Jared C. Roach Chad D. Huff 《PLoS genetics》2014,10(1)
The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
5.
《Somatosensory & motor research》2013,30(2):183-192
The evoked potential (EP) over primary somatosensory cortex (SI) was monitored before and after a complete lesion of the primate dorsal column (DC) pathway on one side. The EP was elicited by electrocutaneous or mechanical stimulation of either foot, and was recorded from the contralateral cortical surface for periods of up to 3 months after the lesion. The amplitudes of the three major peaks (P20, N50, and P90) of the cortical somatosensory EP were significantly reduced following interruption of the contralateral DC. Over weeks following the lesion, there was a significant increase in amplitude of the P90 component of the EP that was not evident in the other peaks. The postlesion increases in P90 amplitude were correlated with improved performance on a task that required grasping with either foot, suggesting that behavioral recovery from a DC lesion results in part from neural plasticity, as opposed to a simple relearning of the task.
6.
High-throughput shotgun sequence data make it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual’s genome, which is informative about inbreeding and effective population size. However, in many cases, the available sequence data of an individual are limited to low coverage, preventing the confident calling of genotypes necessary to directly count the proportion of heterozygous sites. Here, we present a method for estimating an individual’s genome-wide rate of heterozygosity from low-coverage sequence data, without an intermediate step that calls genotypes. Our method jointly learns the shared allele distribution between the individual and a panel of other individuals, together with the sequencing error distributions and the reference bias. We show our method works well, first, by its performance on simulated sequence data and, second, on real sequence data where we obtain estimates using low-coverage data consistent with those from higher coverage. We apply our method to obtain estimates of the rate of heterozygosity for 11 humans from diverse worldwide populations and through this analysis reveal the complex dependency of local sequencing coverage on the true underlying heterozygosity, which complicates the estimation of heterozygosity from sequence data. We show how we can use filters to correct for the confounding arising from sequencing depth. We find in practice that ratios of heterozygosity are more interpretable than absolute estimates and show that we obtain excellent conformity of ratios of heterozygosity with previous estimates from higher-coverage data.
7.
Endornaviruses have large double-stranded RNA (dsRNA) genomes that carry a single open reading frame (ORF). Here we report the complete genome of a novel endornavirus, assembled from next-generation sequence data generated from Vitis vinifera-extracted dsRNA. Two different fungal hosts have been identified for this virus, suggesting that horizontal transmission of the virus is possible.
8.
Simon P. Skinner Gael Radou Roman Tuma Jeanine J. Houwing-Duistermaat Emanuele Paci 《Biophysical journal》2019,116(7):1194-1203
Hydrogen/deuterium exchange monitored by mass spectrometry is a promising technique for rapidly fingerprinting structural and dynamical properties of proteins. The time-dependent change in the mass of any fragment of the polypeptide chain depends uniquely on the rate of exchange of its amide hydrogens, but determining the latter from the former is generally not possible. Here, we show that, if time-resolved measurements are available for a number of overlapping peptides that cover the whole sequence, rate constants for each amide hydrogen exchange (or equivalently, their protection factors) may be extracted and the uniqueness of the solutions obtained depending on the degree of peptide overlap. However, in most cases, the solution is not unique, and multiple alternatives must be considered. We provide a statistical method that clusters the solutions to further reduce their number. Such analysis always provides meaningful constraints on protection factors and can be used in situations in which obtaining more refined experimental data is impractical. It also provides a systematic way to improve data collection strategies to obtain unambiguous information at single-residue level (e.g., for assessing protein structure predictions at atomistic level).
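As a rough illustration of why peptide-level data alone underdetermine the per-residue rates, the uptake of a peptide is commonly written as a sum of single-exponential terms. The notation below is an assumption for illustration and is not taken from the abstract: $D_j(t)$ is the deuterium uptake of peptide $j$, $k_i$ the exchange rate of amide $i$, $k_{\mathrm{int},i}$ its intrinsic (unprotected) rate, and $P_i$ its protection factor. Back-exchange and incomplete deuteration of the buffer are ignored.

```latex
% Sketch only: symbols are illustrative assumptions, not the paper's notation.
\begin{align}
  D_j(t) &= \sum_{i \in \text{peptide } j} \left(1 - e^{-k_i t}\right), \\
  k_i    &= \frac{k_{\mathrm{int},i}}{P_i}.
\end{align}
```

Because the sum is unchanged when the rates $k_i$ within one peptide are permuted, a single peptide cannot assign rates to individual residues; overlapping peptides break this symmetry only partially, which is consistent with the non-uniqueness the abstract describes.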
9.
Hywel Bowden Jones 《Genomics》1997,43(3):258
Radiation hybrid mapping has become an established tool for building physical maps. It represents a powerful way of constructing YAC contigs and high-resolution maps for positional cloning experiments. Ideally, radiation hybrids should not only provide support for the true order of the markers, but also accurate estimates of the physical distances between them. Statistical analysis of radiation hybrids has proved difficult because of the number of parameters (representing the fragment retention probabilities) that must be estimated, and simplifying assumptions are needed to analyze large numbers of markers simultaneously. The ramifications of these assumptions for the calculation of physical distances are investigated. A simple two-locus model is presented to demonstrate that variation in marker retention can lead to distortions in the estimates of distance. Multilocus simulations show that, when marker retention is constant across the chromosome, good estimates of physical distance can be derived using simple models of retention. However, further simulations exploring variable retention schemes demonstrate that significant errors in the estimates of map distances can occur. Ways of minimizing these distortions are discussed.
10.
The paper is concerned with methods for the estimation of the coalescence time (time since the most recent common ancestor) of a sample of intraspecies DNA sequences. The methods take advantage of prior knowledge of population demography, in addition to the molecular data. While some theoretical results are presented, a central focus is on computational methods. These methods are easy to implement, and, since explicit formulae tend to be either unavailable or unilluminating, they are also more useful and more informative in most applications. Extensions are presented that allow for the effects of uncertainty in our knowledge of population size and mutation rates, for variability in population sizes, for regions of different mutation rate, and for inference concerning the coalescence time of the entire population. The methods are illustrated using recent data from the human Y chromosome.
11.
12.
Matched case-control studies often include pairs with incomplete exposure information. This work presents and compares two estimators for the odds ratio that can be used when the exposures of some of the cases and controls are missing. A simulation study shows that the estimator that uses the marginal exposure frequencies is usually more efficient than the estimator based on discordant pairs.
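The abstract does not give the formulas for either estimator, so the following is only a minimal sketch of the two ideas it names, using textbook forms. The function names, the `(case, control)` tuple layout, and the use of `None` for a missing exposure are assumptions made for illustration. The discordant-pair version uses complete pairs only; the marginal version uses every observed exposure, including those from incomplete pairs.

```python
from typing import List, Optional, Tuple

# One matched pair: (case exposure, control exposure); 1 = exposed,
# 0 = unexposed, None = exposure missing. This layout is hypothetical.
Pair = Tuple[Optional[int], Optional[int]]


def discordant_pair_or(pairs: List[Pair]) -> float:
    """Odds ratio from discordant complete pairs: b / c.

    b counts pairs with an exposed case and an unexposed control,
    c counts the reverse. Pairs with a missing exposure never match
    either pattern, so they are ignored.
    """
    b = sum(1 for case, ctrl in pairs if case == 1 and ctrl == 0)
    c = sum(1 for case, ctrl in pairs if case == 0 and ctrl == 1)
    if c == 0:
        raise ValueError("no (unexposed case, exposed control) pairs")
    return b / c


def marginal_or(pairs: List[Pair]) -> float:
    """Odds ratio from marginal exposure frequencies.

    Uses every observed exposure, so a pair with only one member
    observed still contributes to that member's margin.
    """
    cases = [case for case, _ in pairs if case is not None]
    ctrls = [ctrl for _, ctrl in pairs if ctrl is not None]
    case_exp, case_unexp = sum(cases), len(cases) - sum(cases)
    ctrl_exp, ctrl_unexp = sum(ctrls), len(ctrls) - sum(ctrls)
    if case_unexp == 0 or ctrl_exp == 0:
        raise ValueError("a margin is empty; odds ratio undefined")
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    toy = ([(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(0, 0)] * 4
           + [(1, None)] * 2 + [(None, 0)] * 3)
    print(discordant_pair_or(toy), marginal_or(toy))
```

Note that the two quantities are not the same estimand in general: the first is a conditional (within-pair) odds ratio and the second a marginal one, so the paper's actual estimators may differ from this sketch.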
13.
14.
The main condition of completing the process of adaptation of the body to the effect of an external factor is the return of the homeostatic system parameters to their initial levels or their stabilization at a new level. The article considers the state of incomplete adaptation (IA) based on the process of the stabilization of systemic reactions (respiration and blood circulation) on repeated exposure to extreme environmental factors (hypoxia and cold) associated with the excitation of the central regulatory mechanisms of the respiratory center system performing a compensatory–protective function. It is postulated that a change in the afferent information flows (the thresholds of excitation and reactivity of the peripheral receptor systems) forms the basis of IA. The IA state is supposed to persist for an indefinitely long period of time due to insufficient functional reserves and to be the cause of psychosomatic pathology.
15.
Application of the Character Compatibility Approach to Generalized Molecular Sequence Data: Branching Order of the Proteobacterial Subdivisions (cited 4 times in total: 0 self-citations, 4 citations by others)
The character compatibility approach, which removes all homoplasic characters and involves finding the largest clique of compatible
characters in a dataset, in principle, provides a powerful means for obtaining correct topology in difficult to resolve cases.
However, the usefulness of this approach to generalized molecular sequence data for phylogeny determination has not been studied
in the past. We have used this approach to determine the topology of 23 proteobacterial species (6 each of α-, β- and γ-,
3 δ-, and 2 ε-proteobacteria) using sequence data for 10 conserved proteins (Hsp60, Hsp70, EF-Tu, EF-G, alanyl-tRNA synthetase,
RecA, GyrA, GyrB, RpoB and RpoC). All sites in the sequence alignments of these proteins where only two amino acids were found,
with each amino acid present in at least two species, were selected. Mutual compatibility determination on these binary state
sites was carried out by two means. In one case, all of these sites were combined into a large dataset (Set A; 957 characters)
prior to compatibility analysis. In the second case, compatibility analysis was carried out on characters from individual
proteins and all compatible sites were combined into a large dataset (Set B; 398 characters) for further studies. Upon compatibility
analyses, the largest cliques that were obtained from Sets A and B consisted of 337 and 323 compatible characters, respectively.
In these cliques, all proteobacterial subgroups were clearly distinguished and branching orders of most of the species were
also resolved. The ε-proteobacteria exhibited the earliest branching, whereas the β- and γ-subgroups were found to have emerged
last. The relative placement of the α- and δ-subgroups, however, was not resolved. The topology of these species was also
determined based on 16S rRNA sequences and a concatenated dataset of sequences for all 10 proteins by means of neighbor-joining,
maximum likelihood, and maximum parsimony methods. In the protein trees, all proteobacterial groups were reliably resolved
and they branched in the following order: (ε(δ(α(β,γ)))). However, in the rRNA trees, the γ- and β-subgroups exhibited polyphyletic
branching and many internal nodes were not resolved. These results indicate that the character compatibility analysis using
generalized molecular sequence data provides a powerful means for evolutionary studies. Based on molecular sequences, it should
be possible to obtain very large datasets of compatible characters that should prove very helpful in clarifying difficult
to resolve phylogenetic relationships.
[Reviewing Editor: Dr. Yves Van de Peer]
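The pairwise compatibility step for two-state sites can be sketched as follows. This is not the authors' software: the function names and the toy alignment columns are hypothetical, and the clique search is a brute-force enumeration that is only practical for a handful of characters (datasets of hundreds of sites, as in the abstract, need a dedicated maximum-clique algorithm). The test used is the standard one for binary characters: two sites are compatible unless all four combinations of their states occur among the species.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def compatible(site_a: Sequence[str], site_b: Sequence[str]) -> bool:
    """Two binary characters are compatible unless all four state
    combinations are observed across the species."""
    return len(set(zip(site_a, site_b))) < 4


def largest_clique(sites: List[Sequence[str]]) -> List[int]:
    """Return indices of one largest set of mutually compatible sites.

    Brute force over subsets, largest first; exponential time, so this
    is for illustration on tiny inputs only.
    """
    n = len(sites)
    ok: Dict[Tuple[int, int], bool] = {
        (i, j): compatible(sites[i], sites[j])
        for i, j in combinations(range(n), 2)
    }
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(ok[pair] for pair in combinations(subset, 2)):
                return list(subset)
    return []


if __name__ == "__main__":
    # Hypothetical alignment columns for six species; each site has two
    # states, each present in at least two species.
    toy_sites = ["AAABBB", "AABBBB", "ABABAB", "AAAABB"]
    print(largest_clique(toy_sites))
```

In the toy input the third site conflicts with each of the others, so it is the one left out of the clique.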
16.
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual’s ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
17.
18.
19.
20.
A. N. Pettitt 《Biometrical journal. Biometrische Zeitschrift》1983,25(4):361-372
An approximation for the probability of the rank order of k independent random variables is used to analyze incomplete ranked data, consisting of rankings of objects by judges. The approximation allows linear models to be fitted to the data so that differences amongst groups of judges can be investigated, and factorial models applied to investigate differences amongst the objects.