Dissertation submitted to Tongji University for the degree of Master of Engineering


Research on Congestion Association Characteristics of Urban Expressways Based on Data Mining

 

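School: School of Transportation Engineering

First-level discipline: Transportation Engineering

Second-level discipline: Transport Planning and Management

Graduate student: Liang Linlin

Supervisor: Associate Prof. Shi Xiaofa

March 2012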

 

A dissertation submitted to

Tongji University in conformity with the requirements for

the degree of Master of Engineering

Congestion correlation analysis of

urban expressway based on data mining

 

School: School of Transportation Engineering

Discipline: Transportation Engineering

Major: Transport Planning and Management

Candidate: Liang Linlin

Supervisor: Associate Prof. Shi Xiaofa

March, 2012

 

Abstract

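This thesis uses loop detector data from the North-South Elevated Expressway and the Inner Ring Elevated Expressway in Shanghai. First, the quality of the raw data is analyzed in terms of data completeness and the proportion of abnormal data, and the raw data are preprocessed through data repair and data reduction. Second, based on an analysis of the completeness of traffic states at each detector section, occupancy, speed, and average occupancy per vehicle are selected as feature indices, and congestion is identified with the fuzzy c-means (FCM) clustering algorithm under traffic state classification theory. Third, based on association rule mining theory, two methods for mining traffic congestion association rules are designed, one inner-transaction and one inter-transaction; the candidate rules are then screened using section-to-section congestion correlation coefficients computed by statistical methods, which yields strong association rules between congestion at different sections. Finally, the temporal, spatial, and network characteristics of the mined rules are analyzed, and a short-term traffic prediction model based on the mined rules is designed to forecast travel speed and traffic state. The main conclusions are as follows: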
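The congestion identification step can be illustrated with a minimal fuzzy c-means sketch. It assumes only two traffic states (free-flow and congested), z-score normalization of the three feature indices, fuzzifier m = 2, and a handful of hypothetical detector records; the number of states, normalization, and parameters used in the thesis may differ. The congested cluster is taken to be the one whose center has the lowest speed.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means: returns cluster centers (c, d) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each sample sum to 1
    centers = None
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)  # weighted means
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero at a center
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def identify_congestion(records, speed_col=1, seed=0):
    """Label each record as congested (True) or not, using 2-cluster FCM."""
    # z-score so that occupancy, speed and occupancy per vehicle share one scale
    X = (records - records.mean(axis=0)) / records.std(axis=0)
    centers, U = fcm(X, c=2, seed=seed)
    labels = U.argmax(axis=0)
    congested_cluster = int(np.argmin(centers[:, speed_col]))  # slowest center
    return labels == congested_cluster, U


# Hypothetical records: [occupancy (%), speed (km/h), occupancy per vehicle]
records = np.array([
    [8.0, 75.0, 0.05],
    [10.0, 70.0, 0.06],
    [9.0, 72.0, 0.055],
    [7.0, 78.0, 0.045],
    [35.0, 20.0, 0.35],
    [40.0, 15.0, 0.45],
    [38.0, 18.0, 0.40],
    [32.0, 25.0, 0.30],
])
congested, U = identify_congestion(records)
print(congested)
```

Hard labels come from the largest membership; keeping U also lets borderline intervals be inspected rather than forced into one state.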

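(1) Congestion forms clear bands in both the temporal and spatial dimensions, reflecting its persistence over time and its spread through space. Each such band can be defined as a congestion segment; seven major congestion segments are identified within the study network.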

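(2) Statistical analysis of congestion correlation shows that congestion at expressway sections is slightly more correlated than independent: about 9% of section pairs show significant congestion correlation, and the correlation tends to be positive (congestion → congestion).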

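(3) A total of 1852 congestion association rules between expressway sections were mined: 340 simultaneous rules, accounting for 7.0% of the mining samples, and 1512 time-lagged rules (lags of 5, 10, 15, 20, and 30 min), accounting on average for 6.3% of the mining samples.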

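(4) For the mined rules, the distance between the antecedent and consequent sections is mostly 0 to 3 km (46.8% of rules). The antecedent or consequent sections mostly belong to the seven major congestion segments. Among the simultaneous rules, 80.2% have an antecedent or consequent within the seven major congestion segments, and 44.7% have both within them (19.4% linking sections inside the same segment and 25.3% linking sections in different segments).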

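(5) Using the congestion association rules to account for relationships between sections in short-term speed prediction consistently improves prediction accuracy during peak periods. A congestion association rule network enables short-term prediction of section congestion states with high accuracy: with a false alarm rate below 15%, 85% of congestion is predicted.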
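The correlation analysis behind the second conclusion can be sketched with the phi coefficient of two 0/1 congestion series, tested for significance with the 2x2 chi-square statistic chi2 = n * phi^2 (1 degree of freedom). Both the choice of phi and the 0.05 critical value 3.841 are assumptions for illustration, since the thesis says only that statistical methods were used; the series below are hypothetical. A positive phi corresponds to the congestion → congestion pattern.

```python
import math
from typing import Sequence, Tuple


def phi_coefficient(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi coefficient of two equal-length 0/1 congestion series (1 = congested)."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a and b)          # both congested
    n10 = sum(1 for a, b in pairs if a and not b)      # only x congested
    n01 = sum(1 for a, b in pairs if not a and b)      # only y congested
    n00 = sum(1 for a, b in pairs if not a and not b)  # neither congested
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def significantly_correlated(
    x: Sequence[int], y: Sequence[int], chi2_crit: float = 3.841
) -> Tuple[bool, float]:
    """2x2 chi-square test with 1 d.o.f.: chi2 = n * phi^2 (3.841 ~ alpha 0.05)."""
    phi = phi_coefficient(x, y)
    return len(x) * phi ** 2 > chi2_crit, phi


# Hypothetical 0/1 congestion series for two sections over 12 intervals.
section_a = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
section_b = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1]
significant, phi = significantly_correlated(section_a, section_b)
print(significant, round(phi, 3))
```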

Keywords: congestion identification, congestion association rules, congestion forecasting, data mining

 

ABSTRACT

Advances in information technology and traffic data collection systems now give transportation planners access to large volumes of continuous traffic data of many kinds, such as floating car data, smart card data, bus operation data, fixed detector data, license plate recognition data, and mobile phone data. These data make it possible to analyze the characteristics of urban road traffic states.

In this dissertation, loop detector data from Shanghai's North-South and Inner Ring elevated expressways were used to mine the congestion association characteristics of urban expressways. First, the raw data were preprocessed through data quality analysis, data repair, and data reduction. Second, the completeness of traffic states at each road section was checked to identify the target sections at which congestion occurred during the study period. Then, based on traffic state classification theory, congestion periods were identified for each section by applying the fuzzy c-means (FCM) clustering algorithm to three feature indices: occupancy, speed, and average occupancy per vehicle. Third, based on association rule mining theory, inner-transaction and inter-transaction methods for mining road traffic congestion association rules were designed. To obtain strong congestion association rules, the candidate rules were then screened using section-to-section congestion correlation coefficients calculated by statistical methods. Finally, the temporal-spatial and network characteristics of the mined rules were analyzed, and short-term traffic prediction models based on the congestion association rules were designed to forecast speeds and traffic states.
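A simplified, pairwise version of inner- and inter-transaction rule mining can be sketched as follows. A rule states that if section X is congested in interval t, then section Y is congested in interval t + lag. Lag 0 gives inner-transaction (simultaneous) rules and positive lags give inter-transaction rules; with 5-minute intervals (an assumption), lags 1, 2, 3, 4, and 6 would correspond to 5, 10, 15, 20, and 30 min. The section IDs, thresholds, and the support and confidence definitions over aligned windows are hypothetical choices, not the dissertation's exact formulation.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# (antecedent, consequent, lag, support, confidence)
Rule = Tuple[str, str, int, float, float]


def mine_lagged_rules(
    states: Dict[str, Sequence[int]],
    lags: Sequence[int],
    min_support: float,
    min_confidence: float,
) -> List[Rule]:
    """Mine rules 'antecedent congested at t => consequent congested at t + lag'.

    All series are equal-length 0/1 lists (1 = congested).
    """
    rules: List[Rule] = []
    for ante, cons in permutations(states, 2):
        x, y = states[ante], states[cons]
        for lag in lags:
            windows = len(x) - lag  # number of aligned (t, t + lag) pairs
            if windows <= 0:
                continue
            ante_count = sum(x[t] for t in range(windows))
            both = sum(1 for t in range(windows) if x[t] and y[t + lag])
            support = both / windows
            confidence = both / ante_count if ante_count else 0.0
            if support >= min_support and confidence >= min_confidence:
                rules.append((ante, cons, lag, support, confidence))
    return rules


# Hypothetical congestion states of two sections over 8 intervals.
states = {
    "S1": [0, 1, 1, 0, 0, 1, 1, 0],
    "S2": [0, 0, 1, 1, 0, 0, 1, 1],
}
rules = mine_lagged_rules(states, lags=[0, 1], min_support=0.3, min_confidence=0.8)
print(rules)
```

Here congestion at S2 tends to follow congestion at S1 one interval later, so the lagged rule passes both thresholds while the simultaneous rule does not.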

The main conclusions of this dissertation are as follows.

(1) Traffic congestion shows a clear banded distribution in time and space, reflecting its persistence over time and its spread through space.

(2) The road congestion correlation coefficients show that congestion at expressway sections is slightly more correlated than independent, and about nine percent of section pairs show a high degree of correlation. Positive correlation, in which congestion at one section is accompanied by congestion at the other, is more common than negative correlation.

(3) In total, 1852 rules were mined: 340 inner-transaction rules and 1512 inter-transaction rules, representing 7.0% and 6.2% of the mining samples, respectively.

(4) For the mined association rules, the average distance between antecedent and consequent sections increases markedly with the time lag between them. Most rules link upstream and downstream sections, although rules between distant sections also appear. The sections involved fall into several groups, which correspond to the sections upstream and downstream of the key bottlenecks in the network; most rules link sections within one group or between two groups.

(5) Taking the associations between road sections into account effectively improves the accuracy of short-term speed prediction in peak periods. Based on the congestion association rule network, short-term traffic states can be forecast with high precision: 85% of congestion is predicted at a false alarm rate below 15%.
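The rule-based state forecast and its evaluation can be sketched as below. The sketch assumes that a target section is predicted to be congested at interval t whenever any mined rule "antecedent → target" with lag L fired at t - L. Detection rate is computed as correctly predicted congested intervals over all congested intervals. False alarm rate is computed, as an assumed definition, as false alarms over all congestion alarms raised; the dissertation may define it differently (for example, relative to all uncongested intervals). The section IDs and series are hypothetical, and only rules with lag > 0 give a genuine forecast.

```python
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[str, str, int]  # (antecedent section, consequent section, lag in intervals)


def predict_congestion(
    states: Dict[str, Sequence[int]], rules: Sequence[Rule], target: str
) -> List[int]:
    """Flag target as congested at t if any rule ante -> target fired at t - lag."""
    n = len(states[target])  # the target's own states are used only for length
    pred = [0] * n
    for ante, cons, lag in rules:
        if cons != target:
            continue
        for t in range(lag, n):
            if states[ante][t - lag]:
                pred[t] = 1
    return pred


def detection_and_false_alarm(
    actual: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Detection rate = hits / congested; false alarm rate = false alarms / alarms."""
    hits = sum(1 for a, p in zip(actual, predicted) if a and p)
    congested = sum(actual)
    alarms = sum(predicted)
    detection_rate = hits / congested if congested else 0.0
    false_alarm_rate = (alarms - hits) / alarms if alarms else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical observed states and a single mined rule S1 -> S2 with lag 1.
states = {
    "S1": [0, 1, 1, 1, 0, 1, 1, 1, 0, 0],
    "S2": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
}
pred = predict_congestion(states, [("S1", "S2", 1)], target="S2")
dr, far = detection_and_false_alarm(states["S2"], pred)
print(pred, round(dr, 3), round(far, 3))
```

Adding more rules raises the detection rate but also tends to raise the false alarm rate, which is the trade-off summarized by the 85% / 15% figures.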

Key Words: automatic congestion identification, congestion association rules, congestion forecasting, data mining