[Dmbu-l] A Generic Framework for Efficient and Effective Subsequence Retrieval [Thursday 04/26 @ 12:00 pm in MCS 148]
cmav at bu.edu
Wed Apr 25 15:06:25 EDT 2012
Haohan Zhu will be our speaker this Thursday, 4/26, at 12 pm in MCS 148.
*Data Mining and Database Group Seminar*
*A Generic Framework for Efficient and Effective Subsequence Retrieval*
*Speaker:* Haohan Zhu, Boston University
This paper proposes a general framework for matching similar subsequences
in both time series and string databases. The matching results are pairs of
query subsequences and database subsequences. The framework finds all
possible pairs of similar subsequences if the distance measure satisfies
the "consistency" property, which is a property introduced in this paper.
We show that most popular distance functions, such as the Euclidean
distance, DTW, ERP, the Fréchet distance for time series, and the Hamming
distance and Levenshtein distance for strings, are all "consistent". We
also propose an index structure for metric spaces named "reference net".
The reference net is an unsupervised index which costs O(n) space, where n
is the size of the dataset. The experiments demonstrate the ability of our
method to improve retrieval performance when combined with diverse distance
measures. The experiments also illustrate that the reference net has a
better running time than cover trees and the maximum variance method, while
all three methods have similar costs in terms of space.
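
For readers new to the problem, here is a minimal Python sketch of the
setting, not the framework from the paper. It uses the Hamming distance, one
of the measures listed above. The announcement does not define "consistency",
so the check below is only one reading of the term, and an assumption: every
aligned piece of two equal-length strings is no farther apart than the whole
strings are. The function names (hamming, is_consistent_on, similar_pairs)
are hypothetical, and the brute-force search stands in for the exhaustive
baseline that an index such as the reference net is meant to speed up.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance: number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_consistent_on(q: str, x: str) -> bool:
    """Assumed reading of "consistency", checked for one pair of strings:
    every contiguous piece x[i:j] has a piece of q (here the aligned q[i:j])
    whose distance to it is at most hamming(q, x)."""
    total = hamming(q, x)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if hamming(q[i:j], x[i:j]) > total:
                return False
    return True


def similar_pairs(query: str, database: str, min_len: int, eps: int):
    """Brute-force baseline: every (query_start, db_start, length) whose
    two subsequences are within Hamming distance eps of each other."""
    pairs = []
    max_len = min(len(query), len(database))
    for length in range(min_len, max_len + 1):
        for qs in range(len(query) - length + 1):
            for ds in range(len(database) - length + 1):
                q_sub = query[qs:qs + length]
                d_sub = database[ds:ds + length]
                if hamming(q_sub, d_sub) <= eps:
                    pairs.append((qs, ds, length))
    return pairs


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))
    print(is_consistent_on("karolin", "kathrin"))
    print(similar_pairs("abc", "xabcx", min_len=3, eps=0))
```

The brute-force search compares every pair of equal-length subsequences, so
its cost grows quickly with sequence length; that cost is what motivates
pruning the candidate pairs with a property of the distance measure and an
index.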
Joint work with George Kollios and Vassilis Athitsos