[cs-talks] Today 11am: Vatche Ishakian

Natali Ruchansky natalir at bu.edu
Fri Dec 4 09:05:31 EST 2015

*Data Seminar*

*Title*: Process Trace Clustering: A Heterogeneous Information Network
*Speaker*: Vatche Ishakian (IBM)

*Friday, December 4, 2015 at 11am in MCS 148*

*Abstract*: Process mining is the task of extracting information from event
logs, such as ones generated from workflow management or enterprise
resource planning systems, in order to discover models of the underlying
processes, organizations, and products. As the event logs often contain a
variety of process executions, the discovered models can be complex and
difficult to comprehend. Trace clustering helps solve this problem by
splitting the event logs into smaller subsets and applying process
discovery algorithms on each subset, resulting in per-subset discovered
processes that are less complex and more accurate. However, state-of-the-art
trace clustering techniques are limited: their similarity measures are not
process-aware, and they do not scale well to high-dimensional event logs. In
this paper, we propose conceptualizing a process's event logs as a
heterogeneous information network in order to capture their rich semantics
and thereby derive better process-specific features. In
addition, we propose SeqPathSim, a meta path-based similarity measure that
considers node sequences in the heterogeneous graph and results in better
clustering. We also introduce a new dimension reduction method that
combines event similarity with regularization by process model structure to
deal with event logs of high dimensionality. The experimental results show
that our proposed approach outperforms state-of-the-art trace clustering
approaches in both accuracy and structural complexity metrics.
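The abstract does not define SeqPathSim. For readers unfamiliar with meta path-based similarity, here is a minimal sketch of the standard PathSim measure (Sun et al., VLDB 2011) on a toy trace-event network. It is not the speaker's SeqPathSim. The two-type network (traces linked to the event types they contain), the trace-event-trace meta path, and all function names are illustrative assumptions.

```python
# Sketch of standard PathSim (NOT SeqPathSim) on a hypothetical
# trace-event heterogeneous network, using the meta path Trace-Event-Trace.

def trace_event_matrix(traces, events):
    """Rows = traces, columns = event types; entry = occurrences of the event in the trace."""
    index = {e: j for j, e in enumerate(events)}
    rows = []
    for trace in traces:
        row = [0] * len(events)
        for e in trace:
            row[index[e]] += 1
        rows.append(row)
    return rows


def commuting_matrix(w):
    """M = W W^T: M[i][k] counts path instances trace_i -> event -> trace_k."""
    n = len(w)
    return [[sum(a * b for a, b in zip(w[i], w[k])) for k in range(n)]
            for i in range(n)]


def pathsim(m, x, y):
    """PathSim(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


# Hypothetical toy event log: three traces over six event types.
traces = [["a", "b", "c"], ["a", "b", "d"], ["e", "f"]]
events = ["a", "b", "c", "d", "e", "f"]
m = commuting_matrix(trace_event_matrix(traces, events))
print(round(pathsim(m, 0, 1), 3))  # 0.667: two shared events
print(pathsim(m, 0, 2))            # 0.0: no shared events
```

Note that this plain measure ignores event order: the traces `a b c` and `c b a` score 1.0. Per the abstract, SeqPathSim instead considers node sequences, presumably to address this kind of limitation.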
