Similar literature
1.
The emergence of large, fine-grained mobility datasets offers significant opportunities for the development and application of new methodologies for transportation analysis. In this paper, the link between routing behaviour and traffic patterns in urban areas is examined, introducing a method to derive estimates of traffic patterns from a large collection of fine-grained routing data. Using this dataset, the interconnectivity between road network junctions is extracted in the form of a Markov chain. This representation encodes the probability of the successive usage of adjacent road junctions, encoding routes as flows between decision points rather than flows along road segments. This network of functional interactions is then integrated within a modified Markov chain Monte Carlo (MCMC) framework, adapted for the estimation of urban traffic patterns. As part of this approach, the data-derived links between major junctions influence the movement of directed random walks executed across the network to model origin-destination journeys. The simulation process yields estimates of traffic distribution across the road network. The paper presents an implementation of the modified MCMC approach for London, United Kingdom, building an MCMC model based on a dataset of nearly 700,000 minicab routes. Validation of the approach clarifies how each element of the MCMC framework contributes to junction prediction performance, and finds promising results in relation to the estimation of junction choice and minicab traffic distribution. The paper concludes by summarising the potential for the development and extension of this approach to the wider urban modelling domain.
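
The junction-to-junction Markov chain described in this abstract can be illustrated with a minimal sketch. It is not the paper's modified MCMC framework: the route lists, junction labels, and the function names `transition_probabilities` and `random_walk` are all hypothetical, and the walk here is an ordinary first-order random walk with no origin-destination direction.

```python
import random
from collections import Counter, defaultdict

def transition_probabilities(routes):
    """Estimate P(next junction | current junction) from observed routes."""
    counts = defaultdict(Counter)
    for route in routes:
        # Count each pair of successive junctions along the route.
        for current, following in zip(route, route[1:]):
            counts[current][following] += 1
    return {
        junction: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for junction, c in counts.items()
    }

def random_walk(probs, start, steps, rng):
    """Walk the chain from `start`, stopping early at a junction with no exits."""
    path = [start]
    for _ in range(steps):
        options = probs.get(path[-1])
        if not options:
            break
        junctions, weights = zip(*options.items())
        path.append(rng.choices(junctions, weights=weights)[0])
    return path

# Hypothetical routes, each a sequence of junction identifiers.
routes = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
probs = transition_probabilities(routes)
walk = random_walk(probs, "A", steps=5, rng=random.Random(0))
print(probs["A"], walk)
```

Counting how often many such walks visit each junction would give a crude traffic distribution; the paper's approach additionally steers the walks between origins and destinations.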

2.
3.
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.

4.
Measuring gene expression over time can provide important insights into basic cellular processes. Identifying groups of genes with similar expression time-courses is a crucial first step in the analysis. As biologically relevant groups frequently overlap, due to genes having several distinct roles in those cellular processes, this is a difficult problem for classical clustering methods. We use a mixture model to circumvent this principal problem, with hidden Markov models (HMMs) as effective and flexible components. We show that the ensuing estimation problem can be addressed with additional labeled data (partially supervised learning of mixtures) through a modification of the expectation-maximization (EM) algorithm. Good starting points for the mixture estimation are obtained through a modification to Bayesian model merging, which allows us to learn a collection of initial HMMs. We infer groups from mixtures with a simple information-theoretic decoding heuristic, which quantifies the level of ambiguity in group assignment. The effectiveness is shown with high-quality annotation data. As the HMMs we propose capture asynchronous behavior by design, the groups we find are also asynchronous. Synchronous subgroups are obtained from a novel algorithm based on Viterbi paths. We show the suitability of our HMM mixture approach on biological and simulated data and through the favorable comparison with previous approaches. Software implementing the method is freely available under the GPL from http://ghmm.org/gql.

5.
Mitigating traffic congestion on urban roads, which is of paramount importance for urban development and for reducing energy consumption and air pollution, depends on our ability to foresee road usage and traffic conditions pertaining to the collective behavior of drivers, raising a significant question: to what degree is road traffic predictable in urban areas? Here we rely on precise records of daily vehicle mobility from GPS positioning devices installed in taxis to uncover the potential daily predictability of urban traffic patterns. By mapping the degree of congestion on roads into a time series of symbols and measuring its entropy, we find a relatively high daily predictability of traffic conditions despite the absence of any a priori knowledge of drivers' origins and destinations and despite quite different travel patterns between weekdays and weekends. Moreover, we find a counterintuitive dependence of the predictability on travel speed: road segments with intermediate average travel speed are the most difficult to predict. We also explore the possibility of recovering the traffic condition of an inaccessible segment from its adjacent segments under limited observability. The highly predictable traffic patterns, in spite of the heterogeneity of drivers' behaviors and the variability of their origins and destinations, enable the development of accurate predictive models for eventually devising practical strategies to mitigate urban road congestion.
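
The idea of turning congestion levels into a symbol series and measuring its entropy can be sketched as follows. This is only an illustration under stated assumptions: the three-symbol alphabet, the speed-ratio cutoffs, and the function names are hypothetical, and plain Shannon entropy is used here, which ignores the temporal order that a predictability analysis would normally account for.

```python
import math
from collections import Counter

def congestion_symbols(speeds, free_flow_speed, cutoffs=(0.5, 0.8)):
    """Map speeds to 'C' (congested), 'M' (moderate) or 'F' (free flow).

    The cutoffs are hypothetical fractions of the free-flow speed.
    """
    low, high = cutoffs
    symbols = []
    for speed in speeds:
        ratio = speed / free_flow_speed
        symbols.append("C" if ratio < low else "M" if ratio < high else "F")
    return "".join(symbols)

def shannon_entropy(symbols):
    """Entropy in bits of the symbol frequencies (order is ignored)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical speeds (km/h) on one road segment over successive time slots.
speeds = [12, 15, 40, 55, 58, 30, 14, 57]
series = congestion_symbols(speeds, free_flow_speed=60)
print(series, round(shannon_entropy(series), 3))
```

A segment that is always in the same state has zero entropy, while one that spreads evenly over all three states reaches the maximum of log2(3) bits.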

6.
The detection of genetic segments that are Identical by Descent (IBD) in Genome-Wide Association Studies has proven successful in pinpointing genetic relatedness between reportedly unrelated individuals and leveraging such regions to shortlist candidate genes. These techniques depend on high-density genotyping arrays, and their effectiveness in diverse sequence data is largely unknown. Due to decreasing costs and increasing effectiveness of high-throughput techniques for whole-exome sequencing, an influx of exome sequencing data has become available. Studies using exomes and IBD-detection methods within known pedigrees have shown that IBD can be useful in finding hidden genetic candidates where known relatives are available. We set out to examine the viability of using IBD detection in whole-exome sequencing data in population-wide studies. In doing so, we extend GERMLINE, a method to detect IBD from exome sequencing data by finding small slices of matching alleles between pairs of individuals and extending them into full IBD segments. This algorithm allows for efficient population-wide detection in dense data. We apply this algorithm to a cohort of Crohn's Disease cases where whole-exome and GWAS array data are available. We confirm that GWAS-based detected segments are highly accurate and predictive of underlying shared variation. Where segments inferred from GWAS are expected to be of high accuracy, we compare exome-based detection accuracy of multiple detection strategies. We find detection accuracy to be prohibitively low in all assessments, both in terms of segment sensitivity and specificity. Even after isolating relatively long segments beyond 10 cM, exome-based detection continued to offer poor specificity/sensitivity tradeoffs. We hypothesize that the variable coverage and platform biases of exome capture account for this decreased accuracy and look toward whole-genome sequencing data as a higher quality source for detecting population-wide IBD.

7.
Road traffic injuries are a major cause of preventable death in sub-Saharan Africa. Accurate epidemiologic data are scarce and under-reporting from primary data sources is common. Our objectives were to estimate the incidence of road traffic deaths in Malawi using capture-recapture statistical analysis and determine what future efforts will best improve upon this estimate. Our capture-recapture model combined primary data from both police and hospital-based registries over a one-year period (July 2008 to June 2009). The mortality incidences from the primary data sources were 0.075 and 0.051 deaths/1000 person-years, respectively. Using capture-recapture analysis, the combined incidence of road traffic deaths ranged from 0.192 to 0.209 deaths/1000 person-years. Additionally, police data were more likely to include victims who were male, drivers or pedestrians, and victims from incidents with more than one vehicle involved. We concluded that capture-recapture analysis is a good tool to estimate the incidence of road traffic deaths, and that capture-recapture analysis overcomes limitations of incomplete data sources. The World Health Organization estimated the incidence of road traffic deaths for Malawi using a binomial regression model and survey data and found a similar estimate despite strikingly different methods, suggesting both approaches are valid. Further research should seek to improve capture-recapture data by using more than two data sources and by improving the accuracy of matches through minimizing missing data, applying geographic information systems, and using names and civil registration numbers if available.

8.
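Urban vegetation cover extraction based on image fusion and mixed-pixel decomposition   Total citations: 1, self-citations: 0, citations by others: 1
Liu Yong (刘勇), Yue Wenze (岳文泽). Acta Ecologica Sinica (生态学报), 2010, 30(1): 93-99
Extracting urban vegetation cover is important for protecting urban green space and for urban planning. With the development of remote sensing, mixed-pixel decomposition (spectral mixture analysis) models have been widely used to extract urban vegetation cover from medium-resolution multispectral imagery, but the relatively low spatial resolution of such imagery limits the model's range of application. Taking Hangzhou as a case study, the Gram-Schmidt (GS) method is first used to fuse the multispectral and panchromatic bands of Landsat ETM+; urban vegetation cover is then extracted from the fused ETM+ image with a mixed-pixel decomposition model; finally, accuracy is assessed against SPOT imagery. After GS fusion, the standard deviation, information entropy, and average gradient of the image increased, and the relative bias was below 0.07, indicating that spatial resolution improved while the multispectral information was preserved. Compared with the SPOT image, more than 75% of samples in the fused image had similar vegetation cover values; the areas with larger errors were urban pixels with particularly sparse or particularly dense vegetation. Compared with the source image, the root mean square error and the systematic error of the vegetation cover extracted from the fused image decreased by 0.01. The method shows potential for reducing the cost and improving the accuracy of urban vegetation monitoring.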

9.
Recent advances in massively parallel sequencing technology have created new opportunities to probe the hidden world of microbes. Taxonomy-independent clustering of the 16S rRNA gene is usually the first step in analyzing microbial communities. Dozens of algorithms have been developed in the last decade, but a comprehensive benchmark study is lacking. Here, we survey algorithms currently used by microbiologists, and compare seven representative methods in a large-scale benchmark study that addresses several issues of concern. A new experimental protocol was developed that allows different algorithms to be compared using the same platform, and several criteria were introduced to facilitate a quantitative evaluation of the clustering performance of each algorithm. We found that existing methods vary widely in their outputs, and that inappropriate use of distance levels for taxonomic assignments likely resulted in substantial overestimates of biodiversity in many studies. The benchmark study identified our recently developed ESPRIT-Tree, a fast implementation of the average linkage-based hierarchical clustering algorithm, as one of the best algorithms available in terms of computational efficiency and clustering accuracy.

10.
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to simultaneously analyze a large number of genes across different samples. Clustering of microarray data can reveal hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications, but the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions that depend on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
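
The dependence of k-means on its initial centroids, which this abstract names as a drawback, is easier to see with the basic algorithm written out. The sketch below is plain Lloyd-style k-means, not the HSKH hybrid; the data points and starting centroids are hypothetical, and the centroids are passed in explicitly so that different starting choices can be tried.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Hypothetical two-gene expression profiles forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
print(centroids)
```

Each run converges only to a local optimum for the starting centroids it was given, which is the gap a global search heuristic such as harmony search is meant to address.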

11.
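Hu Shujing (胡姝婧), Hu Deyong (胡德勇), Zhao Wenji (赵文吉). Acta Ecologica Sinica (生态学报), 2010, 30(4): 1018-1024
Vegetation is an important component of urban ecosystems, and timely acquisition of vegetation cover information is important for monitoring the urban ecological environment. Using medium-resolution Landsat TM remote sensing data, urban vegetation cover was extracted with a linear spectral mixture model (LSMM). In addition, by improving how training samples are selected, endmember samples were obtained through minimum noise fraction (MNF) transformation, pixel purity index (PPI) analysis, and N-dimensional visualization, and vegetation cover was then derived with fuzzy C-means (FCM). Finally, the results of both approaches were validated against high-resolution SPOT5 remote sensing data. The correlation coefficients between the validation data and the urban vegetation cover extracted by LSMM and by the improved FCM were 0.8252 and 0.9381, respectively; the latter handles the nonlinear influence of other factors better and therefore achieves higher accuracy.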
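
A linear spectral mixture model treats each pixel's reflectance as a weighted sum of pure "endmember" spectra. The sketch below shows the simplest possible case, assuming only two endmembers (vegetation and a non-vegetation background) with fractions that sum to one; the band reflectances and the function name `vegetation_fraction` are hypothetical, and real LSMM workflows use more endmembers chosen from the image itself.

```python
def vegetation_fraction(pixel, vegetation, background):
    """Least-squares vegetation fraction for a two-endmember linear mixture.

    Model: pixel = f * vegetation + (1 - f) * background, with f in [0, 1].
    """
    diff = [v - b for v, b in zip(vegetation, background)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmember spectra must differ")
    # Project (pixel - background) onto the line joining the two endmembers.
    f = sum((p - b) * d for p, b, d in zip(pixel, background, diff)) / denom
    return min(1.0, max(0.0, f))

# Hypothetical reflectances in four bands (vegetation is bright in the last, near-infrared band).
vegetation = [0.05, 0.08, 0.04, 0.50]
background = [0.20, 0.22, 0.25, 0.30]
# A synthetic mixed pixel that is 30% vegetation.
pixel = [0.3 * v + 0.7 * b for v, b in zip(vegetation, background)]
fraction = vegetation_fraction(pixel, vegetation, background)
print(round(fraction, 3))
```

Clamping to the interval from 0 to 1 is a crude way to keep fractions physically meaningful; it is one reason purely linear models can struggle where other surface materials contribute nonlinearly.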

12.
13.
MOTIVATION: We address the problem of multi-way clustering of microarray data using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem. PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data and uncertainty as to the prototypes selected to explain each data vector. RESULTS: We present experimental results demonstrating that our method can better recover functionally-relevant clusterings in mRNA expression data than standard clustering techniques, including hierarchical agglomerative clustering, and we show that by computing probabilities instead of point estimates, our method avoids converging to poor solutions.

14.
Here, we propose BpMatch: an algorithm that, working on a suitably modified suffix-tree data structure, is able to compute, in a fast and efficient way, the coverage of a source sequence S on a target sequence T, taking into account direct and reverse segments, possibly overlapping. To use BpMatch, the operator defines a priori the minimum length l of a segment and the minimum number of occurrences minRep, so that only segments longer than l and occurring more than minRep times are considered significant. BpMatch outputs the significant segments found and the computed segment-based distance. In the worst case, assuming the alphabet dimension d is a constant, the time required by BpMatch to calculate the coverage is O(l^2 n). On average, by setting l ≥ 2 log_d(n), the time required to calculate the coverage is only O(n). Thanks to the minRep parameter, BpMatch can also be used to perform a self-covering: covering a sequence with segments coming from the sequence itself, while avoiding the trivial solution of a single segment coincident with the whole sequence. The result of the self-covering approach is a spectral representation of the repeats contained in the sequence. BpMatch is freely available at www.sourceforge.net/projects/bpmatch.

15.
Background: Data from the Chinese police service suggest substantial reductions in road traffic injuries since 2002, but critics have questioned the accuracy of those data, especially considering conflicting data reported by the health department. Methods: To address the gap between police and health department data and to determine which may be more accurate, we conducted a simulation study based on the modified Smeed equation, which delineates a non-linear relation between road traffic mortality and the level of motorization in a country or region. Our goal was to simulate trends in road traffic mortality in China and compare performances in road traffic safety management between China and 13 other countries. Results: Chinese police data indicate a peak in road traffic mortalities in 2002 and a significant, gradual decrease in population-based road traffic mortality since 2002. Health department data show that road traffic mortality peaked in 2012. In addition, police data suggest China's road traffic mortality peaked at a much lower motorization level (0.061 motor vehicles per person) in 2002, followed by a reduction in mortality to a level comparable to that of developed countries. Simulation results based on health department data suggest high road traffic mortality, with a mortality peak in 2012 at a moderate motorization level (0.174 motor vehicles per person). Comparisons to the other 13 countries suggest the health data from China may be more valid than the police data. Conclusion: Our simulation data indicate China is still at a stage of high road traffic mortality, as suggested by health data, rather than a stage of low road traffic mortality, as suggested by police data. More efforts are needed to integrate safety into road design, improve road traffic management, improve data quality, and alter unsafe behaviors of pedestrians, drivers and passengers in China.

16.
Road mortality of freshwater turtles can be high enough to imperil populations near roads, thus there is a need to efficiently and accurately locate regions of excessive road-kill along road networks for mitigation. Weekly over 2 years, we drove a 160 km highway circuit in northeastern New York State, USA and recorded the location of all detected road-kill of three freshwater turtle species (Chelydra serpentina, Chrysemys picta, Emydoidea blandingii). We then analyzed the spatial dispersion of road-kill and the road and landscape features associated with road-kill locations. Road-kill was most prevalent at a limited number of short road segments, termed ‘hotspots’. The locations of hotspots, as indicated by kernel density analysis, and the peak spatial extent of hotspots (250 m), as indicated by Ripley’s K, corresponded to the locations and average lengths of causeways (road segments with wetlands within 100 m on both sides). Hotspots were located at causeways that were longer than 200 m and characterized by high traffic volumes, close proximity to water, and high forest coverage. We conclude that freshwater turtle road mortality is spatially aggregated at short, severe hotspots, and hotspot locations can be predicted when the locations of wetlands, traffic volumes, and the land-uses bordering roads are known. Hotspot models using these predictors can locate sites along a road network that are the most promising for mitigation to reduce excessive road mortality and maintain connectivity.

17.
In low- and middle-income countries, road traffic injuries are commonly under-reported. This problem is significantly greater among less severely injured road users. The objective of this study was to determine the incidence and the level of ascertainment of road traffic injuries and deaths by traffic police and hospital registry. A two-sample capture-recapture method was applied using data from traffic police and hospital injury surveillance from June 2012 to May 2013. The study was conducted on one of the busiest highways in Ethiopia, the Addis Ababa – Hawassa highway. Primary data were collected by accident investigators and hospital emergency nurses using a structured checklist. Four matching variables (name of the victim, sex, place, and time of the accident) were used to identify the matched cases. During the study period, the police independently reported 224 deaths and 446 injuries per billion vehicle kilometers, while hospitals reported 123 deaths and 1,046 injuries per billion vehicle kilometers. Both sources in common captured 73 deaths and 248 injuries per billion vehicle kilometers. Taking the two data sources into consideration, the capture-recapture model estimated that the incidence of deaths and injuries ranged from 368 to 390 and from 1,869 to 1,895 per billion vehicle kilometers, respectively. The police source captured 57.4%–60.9% of deaths and 23.5%–23.9% of injuries, while the hospital sources captured 31.5%–33.4% of deaths and 55.2%–56% of injuries. Deaths and injuries among females, younger victims, cyclists/motorcyclists and pedestrians were under-reported by traffic police. In conclusion, neither of the two sources independently provided accurate coverage of road traffic incident related deaths and injuries. Strengthening both systems is necessary to obtain accurate information on road accidents and human casualties.
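
The two-sample capture-recapture arithmetic behind these figures can be checked with a short sketch. It assumes the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant, which the abstract does not name, and it treats the quoted rates per billion vehicle kilometers as if they were counts; the function and variable names are illustrative.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source estimate of the true total: N = n1 * n2 / m."""
    if m <= 0:
        raise ValueError("need at least one matched case")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Figures quoted in the abstract (police, hospital, captured by both).
police_deaths, hospital_deaths, matched_deaths = 224, 123, 73
police_injuries, hospital_injuries, matched_injuries = 446, 1046, 248

est_deaths = lincoln_petersen(police_deaths, hospital_deaths, matched_deaths)
est_injuries = lincoln_petersen(police_injuries, hospital_injuries, matched_injuries)
police_share = police_deaths / est_deaths

print(f"deaths ~ {est_deaths:.0f}, injuries ~ {est_injuries:.0f}")
print(f"police capture of deaths ~ {police_share:.1%}")
```

Under these assumptions the point estimates (about 377 deaths and 1,881 injuries) fall inside the ranges reported in the abstract, and the police share of deaths (about 59%) lies within the reported 57.4%–60.9%.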

18.
In smart-city construction, the Global Positioning System (GPS) produces large volumes of vehicle trajectory data for tracking real-time locations. Once these GPS data are processed by map matching, the results can support many ITS applications, such as real-time road condition calculation, traffic event detection, and emergency response. However, with the explosive growth in the number of monitored vehicles, massive GPS data pose overwhelming challenges for map matching, and traditional map-matching algorithms can hardly meet the demands for matching speed and accuracy. A real-time map-matching algorithm for massive GPS data is therefore proposed to achieve both high matching accuracy and high matching efficiency, meeting the GPS data processing demands of monitoring numerous vehicles within a city. The main contributions of the method are: (1) A Kalman-filter-based correction algorithm is proposed to improve the matching accuracy of the traditional topological algorithm on complicated road sections such as intersections and parallel roads. (2) Based on the Spark Streaming framework, the serial map-matching algorithm is converted into a parallelized map-matching algorithm, which significantly improves processing efficiency. (3) A gridding method suited to the parallelized algorithm is proposed: GPS data in the same grid cell are allocated to the same computing unit to improve the efficiency of the parallel computation. Experimental results show that the matching accuracy of the proposed algorithm is increased by 10%, and that its matching efficiency is 25% higher than that of the same number of stand-alone computers. A cluster of 15 computers running the proposed algorithm is capable of real-time map matching for GPS data produced by 800 thousand vehicles, and can effectively support the continually increasing demand for processing large volumes of GPS data.
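
The gridding idea in contribution (3), sending all GPS points from one grid cell to the same computing unit, can be sketched without any cluster framework. This is only an illustration: the cell size of 0.01 degrees, the record layout, and the function names are hypothetical, and in a Spark job the grid key would serve as the partitioning key rather than a dictionary key.

```python
import math
from collections import defaultdict

def grid_key(lat, lon, cell_deg=0.01):
    """Return the (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def partition_by_grid(points, cell_deg=0.01):
    """Group vehicle identifiers by the grid cell of their GPS fix."""
    cells = defaultdict(list)
    for vehicle_id, lat, lon in points:
        cells[grid_key(lat, lon, cell_deg)].append(vehicle_id)
    return dict(cells)

# Hypothetical GPS fixes: (vehicle id, latitude, longitude).
points = [
    ("taxi-1", 39.9042, 116.4074),
    ("taxi-2", 39.9049, 116.4079),
    ("taxi-3", 39.9151, 116.4074),
]
cells = partition_by_grid(points)
print(cells)
```

Because each cell only needs the road segments that intersect it, a worker handling one cell can match its points against a small local piece of the road network.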

19.
SUMMARY: SScore is an R package that facilitates the comparison of gene expression between Affymetrix GeneChips using the S-score algorithm. The S-score algorithm uses probe level data directly to assess differences in gene expression, without requiring a preliminary separate step of probe set expression summary estimation. Therefore, the algorithm avoids introduction of error associated with the expression summary estimation process and has been demonstrated to improve the accuracy of identifying differentially expressed genes. The S-score produces accurate results even when few or no replicates are available. AVAILABILITY: The R package SScore is available from Bioconductor at http://www.bioconductor.org

20.
Bondell HD, Reich BJ. Biometrics, 2008, 64(1): 115-123
Summary. Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover what contributes to the group having a similar behavior. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to the existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
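
For orientation, the penalized least squares problem behind OSCAR is usually written in the constrained form below. The abstract itself gives no formula, so the notation is an assumption: y is the response, X the predictor matrix with p columns, β the coefficient vector, and c ≥ 0 and t > 0 are tuning parameters.

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad
\sum_{j=1}^{p} \lvert \beta_j \rvert
+ c \sum_{j<k} \max\{\lvert \beta_j \rvert, \lvert \beta_k \rvert\} \le t
```

The first term of the constraint is the familiar L1 penalty that sets some coefficients to exactly zero; the pairwise maximum term is what pulls coefficients of similar magnitude to exact equality, and in two dimensions the resulting constraint region is an octagon, which gives the method its name.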
