[Dmbu-l] BUCS HIC-DMM Seminar: Julia Stoyanovich on Making Interval-Based Clustering Rank-Aware [Wednesday 10/26 @ 12:00 pm in MCS 180] (fwd)

Evimaria Terzi evimaria at cs.bu.edu
Tue Oct 25 10:04:59 EDT 2011

Hi all, 

instead of the regular meeting this Wednesday, we have a visitor who is giving
a talk at the Hariri Institute.

Please come and attend,

"Everything should be made as simple as possible, but no simpler"
(A. Einstein)

---------- Forwarded message ----------
Date: Tue, 25 Oct 2011 09:32:32 -0400
From: BU CS Colloquium <bucscolloquium at gmail.com>
To:  <colloq-l at cs.bu.edu>
Subject: BUCS HIC-DMM Seminar: Julia Stoyanovich on Making Interval-Based
    Clustering Rank-Aware [Wednesday 10/26 @ 12:00 pm in MCS 180]

Boston University -- Computer Science Department

*Rafik B. Hariri Institute for Computing and Computational Science & Engineering*
*Data Management and Mining Seminar Series*
Wednesday October 26, 2011
12:00 - 1:00 pm
Hariri Institute Seminar, MCS 180

*Making Interval-Based Clustering Rank-Aware*
*Julia Stoyanovich*
*University of Pennsylvania*


In online applications such as Yahoo! Personals and Trulia.com, users define
structured profiles in order to find potentially interesting matches.
Typically, profiles are evaluated against large datasets and produce
thousands of ranked matches. Highly ranked results tend to be homogeneous,
which hinders data exploration. For example, a dating website user who is
looking for a partner between 20 and 40 years old, and who sorts the matches
by income from highest to lowest, will see a large number of matches in their
late 30s who hold an MBA degree and work in the financial industry, before
seeing any matches in different age groups and walks of life.  An
alternative to presenting results in a ranked list is to find clusters in
the result space, identified by a combination of attributes that correlate
with rank. Such clusters may describe matches between 35 and 40 with an MBA,
matches between 25 and 30 who work in the software industry, etc., allowing
for data exploration of ranked results. We refer to the problem of finding
such clusters as rank-aware interval-based clustering and argue that it is
not addressed by standard clustering algorithms. We formally define the
problem and, to solve it, propose a novel measure of locality, together with
a family of clustering quality measures appropriate for this application
scenario. These ingredients may be used by a variety of clustering
algorithms, and we present BARAC, a particular subspace-clustering algorithm
that enables rank-aware interval-based clustering in domains with
heterogeneous attributes.  We validate the effectiveness of our approach
with a large-scale user study, and perform an extensive experimental
evaluation of efficiency, demonstrating that our methods are practical at
large scale. Our evaluation is performed on large datasets from Yahoo!
Personals, a leading online dating site, and on restaurant data from Yahoo!

Julia Stoyanovich is a Visiting Scholar at the University of Pennsylvania.
Julia holds M.S. and Ph.D. degrees in Computer Science from Columbia
University, and a B.S. in Computer Science and in Mathematics and Statistics
from the University of Massachusetts at Amherst.  After receiving her B.S.,
Julia went on to work for two start-ups and one real company in New York
City, where she interacted with, and was puzzled by, a variety of massive
datasets.  Julia's research focuses on modeling and exploring large datasets
in the presence of rich semantic and statistical structure.  She has recently
worked on personalized search and ranking in social content sites,
rank-aware clustering in large structured datasets that focus on dating and
restaurant reviews, data exploration in repositories of biological objects
as diverse as scientific publications, functional genomics experiments and
scientific workflows, and representation and inference in large datasets
with missing values.

*Host*: Evimaria Terzi
