# [Busec] Fwd: [Theory-reading-group] Theory Day

Ran Canetti canetti at bu.edu
Tue Jun 17 23:00:44 EDT 2014

FYI, some summer diversion from crypto...

-------- Original Message --------
Date: Mon, 16 Jun 2014 05:41:37 -0400
From: Ankur Moitra <moitra at MIT.EDU>
To: theory-reading-group at csail.MIT.EDU, toc-faculty at csail.MIT.EDU,
toc-students at csail.MIT.EDU, toc at csail.MIT.EDU,
focs2014-pc at lists.csail.mit.edu
CC: minilek at seas.harvard.edu

Hi All,

We happen to have a lot of theory visitors in town (for a PC meeting) so
we're
going to take advantage of it and have the first ever MSR/MIT Theory Day
this
coming friday! All are welcome, and I hope to see you there!

See below for the details.

Best,
Ankur

Date: Friday, June 20th

Location: Barton Room (first floor)
Microsoft Research, One Memorial Drive

10:30-11:15: Nikhil Bansal, "Minimizing Flow Time on Unrelated Machines"
11:15-12:00: Aaron Roth, "Correctness Protection via Differential Privacy"
12:00-1:00: Lunch Break
1:00-1:45: Ankur Moitra, "New Algorithms for Dictionary Learning"
1:45-2:30: David Woodruff, "Turnstile Streaming Algorithms Might as Well Be
Linear Sketches"
2:30-3:00: Coffee Break
3:00-3:45: James Lee, "Anti-concentration of Smoothed Functions and
Talagrand's
Convolution Conjecture"

----------------------------------------------------------------------------------

Minimizing Flow Time on Unrelated Machines

Nikhil Bansal, Eindhoven University

The flow time of a job is defined as the amount of time the job spends
in the
system, i.e. its completion time minus its arrival time, and is a widely
studied measure of quality in scheduling. I will describe the first
poly-logarithmic approximation algorithms for the problems of minimizing (i)
the total flow time and (ii) the maximum flow time on unrelated
machines. The
main idea is to consider a new LP relaxation for these problems that has
much
fewer constraints, and apply iterated rounding.

Joint work with Janardhan Kulkarni

----------------------------------------------------------------------------------

Correctness Protection via Differential Privacy

Aaron Roth, UPenn

False discovery is a growing problem in scientific research. Despite
sophisticated statistical techniques for controlling the false discovery
rate
and related statistics designed to protect against spurious discoveries,
there
is significant evidence that many
published scientific papers contain incorrect conclusions.

In this talk we consider the role that adaptivity has in this problem. A
fundamental disconnect between the theorems that control false discovery
rate
and the practice of science is that the theorems assume a fixed
collection of
hypotheses to be tested, selected non-adaptively before the data is
gathered,
whereas science is by definition an
adaptive process, in which data is shared and re-used, while hypotheses are
generated after seeing the results of previous tests.

We note that false discovery cannot be prevented when a substantial
number of
adaptive queries are made to the data, and data is used naively -- i.e. when
queries are answered exactly with their empirical estimates on a given
finite
data set. However we show that remarkably, there is a different way to
evaluate
statistical queries on a data set that allows even an adaptive analyst
to make
exponentially many queries to the data set, while guaranteeing that with
high
probability, all of the conclusions he draws generalize to the underlying
distribution. This technique counter-intuitively involves actively
perturbing
the answers given to the data analyst, using techniques developed for
privacy
preservation -- but in our application, the perturbations are added
entirely to
increase the utility of the data.

Joint work with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, and Omer
Reingold.

----------------------------------------------------------------------------------

New Algorithms for Dictionary Learning

Ankur Moitra, MIT

Sparse representations play an essential role in many fields including
statistics, signal processing and machine learning. But can we efficiently
learn a basis that enables a sparse representation, if one exists? This
problem
has many names, and is often called dictionary learning or sparse
coding. Here
we study a stochastic model for this problem where there is an unknown $n \times m$ dictionary $A$ and our goal is to learn $A$ from random
examples of
the form $Y = AX$ where $X$ is sparse and is drawn from an appropriate
distribution. Recently Spielman, Wang and Wright gave an algorithm that
works
when $A$ has full column rank.

Here we give the first algorithms for the overcomplete setting ($m > n$). See
also independent, concurrent work of Agarwal et al. Our algorithms work for
$\mu$-incoherent dictionaries, which have long been a central object of
study
in sparse recovery. We require $X$ to be $k$-sparse for $k$ approximately
$\sqrt{n}/C\mu \log n$. Notably, this bound is within a logarithmic
factor of
the information theoretic threshold for sparse recovery on incoherent
dictionaries. Our work is based on new connections between dictionary
learning
and overlapping clustering in random graphs, and also new insights into when
and why alternating minimization converges.

Joint work with Sanjeev Arora, Rong Ge and Tengyu Ma

----------------------------------------------------------------------------------

Turnstile Streaming Algorithms Might as Well Be Linear Sketches

In the turnstile model of data streams, an underlying vector x in {-m,
-m+1,...,
m-1, m} is presented as a long sequence of arbitrary positive and negative
integer updates to its coordinates. A randomized algorithm seeks to
approximate
a function f(x) with constant probability while only making a single
pass over
this sequence of updates and using a small amount of space. All known
algorithms in this model are linear sketches: they sample a matrix A from a
distribution on integer matrices in the preprocessing phase, and
maintain the
linear sketch Ax while processing the stream. At the end of the stream, they
output an arbitrary function of Ax. One cannot help but ask: are linear
sketches universal?

In this work we answer this question by showing that any 1-pass constant
probability streaming algorithm for approximating an arbitrary function
f of x
in the turnstile model can also be implemented by sampling a matrix A
from the
uniform distribution on O(n log m) integer matrices, with entries of
magnitude
poly(n), and maintaining the linear sketch Ax. Furthermore, the logarithm of
the number of possible states of Ax, as x ranges over {-m, -m+1, ..., m}^n,
plus the amount of randomness needed to store A, is at most a logarithmic
factor larger than the space required of the space-optimal algorithm. Our
result shows that to prove space lower bounds for 1-pass streaming
algorithms,
it suffices to prove lower bounds in the simultaneous model of communication
complexity, rather than the stronger 1-way model. Moreover, the fact that we
can assume we have a linear sketch with polynomially-bounded entries further
simplifies existing lower bounds, e.g., for frequency moments we present a
simpler proof of the tilde{Omega}(n^{1-2/k}) bit complexity lower bound
without
using communication complexity.

Joint work with Yi Li and Huy L. Nguyen

----------------------------------------------------------------------------------

Anti-concentration of Smoothed Functions and Talagrand's Convolution
Conjecture

James Lee, University of Washington

I will present a proof of the Gaussian case of Talagrand's convolution
conjecture which asserts that the noised version of an arbitrary
non-negative
function cannot be concentrated on any fixed scale.  In other words,
Markov's
inequality is not tight for such functions.  This follows from a more
general
logarithmic anti-concentration phenomenon for non-negative functions
that are
not too log-concave.  The proof proceeds by analyzing a stochastic
evolution of
the Gaussian measure to our target measure.  I will also discuss versions of
this process on the discrete cube and the relationship to Fourier
analysis and
information theory.

Joint work with Ronen Eldan.

_______________________________________________