Avrim Blum: Publications and Working papers
These publications and working papers are presented roughly in reverse
chronological order of their initial publication. Much
of this work was supported by grants
from
the National Science Foundation and other funding agencies, including DARPA and the Simons Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these funding agencies.
2024
 Winning Without Observing Payoffs: Exploiting Behavioral Biases to Win Nearly Every Round. With Melissa Dutz. ITCS 2024.
Consider repeatedly playing a zero-sum game with an unknown payoff matrix, where you cannot observe anyone's payoffs, not even your own. All you know is that the game is symmetric (like Rock-Paper-Scissors-Lizard-Spock) and that you can observe your opponent's actions. Feldman et al. (2010) proposed this setting and gave an algorithm that guarantees payoff approaching the minimax optimal value (i.e., zero), the best one can generically hope for. Here we consider whether one can actually win in this setting if the opponent is behaviorally biased. We model several deterministic, biased opponents and show that even without knowing the game matrix or observing any payoffs, one can take advantage of each bias to win nearly every round (so long as the game has the property that each action beats, and is beaten by, at least one other action). We also provide a partial characterization of the kinds of biased strategies that can be exploited to win nearly every round, and give algorithms for beating some kinds of biased strategies even when we don't know which strategy the opponent uses.
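As a toy illustration of exploiting a behavioral bias (the "copying" opponent here is a hypothetical example for exposition, not necessarily one of the biases modeled in the paper): in Rock-Paper-Scissors, an opponent who always copies our previous action can be beaten every round after the first, without ever observing a payoff, by playing the action that beats our own previous action.

```python
import random

# Rock-Paper-Scissors: beats[a] is the action that defeats a.
beats = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def copying_opponent(history):
    """Hypothetical biased opponent: copies our previous action."""
    return history[-1] if history else random.choice(list(beats))

def exploit(rounds=100):
    """Beat the copying opponent by countering our own last move."""
    our_history, wins = [], 0
    for _ in range(rounds):
        # The opponent will replay our last action, so beat that action.
        our_move = beats[our_history[-1]] if our_history else "rock"
        opp_move = copying_opponent(our_history)
        if beats[opp_move] == our_move:
            wins += 1
        our_history.append(our_move)
    return wins
```

Only the first round (where the opponent has no history to copy) can be lost or tied.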
 On the Vulnerability of Fairness Constrained Learning to Malicious Noise. With Princewill Okoroafor, Aadirupa Saha, and Kevin Stangl. AISTATS 2024.
We consider the vulnerability of fairness-constrained learning to malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and proved that any proper learner can exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we need only incur a Θ(α) loss in accuracy, where α is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an O(√α) loss, and give a matching Ω(√α) lower bound. For Equalized Odds and Predictive Parity, however, an adversary can indeed force an Ω(1) loss. The key technical novelty of our work is how randomization can bypass simple 'tricks' an adversary can use to amplify its power. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
 Agnostic Multi-Robust Learning using ERM. With Saba Ahmadi, Omar Montasser, and Kevin Stangl. AISTATS 2024.
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially many perturbations that an adversary might make to a test example, but the attacker only needs to find one successful perturbation. Xiang et al. [2022] proposed an algorithm for patch attacks that reduces the effective number of perturbations from exponential to polynomial, and learns using an ERM oracle. However, their guarantee requires the natural examples to be robustly realizable. In this work we consider the non-robustly-realizable case. Our first contribution is to give a guarantee for this setting by utilizing an approach of Feige, Mansour, and Schapire [2015]. Next, we extend our results to a multi-group setting and introduce a novel agnostic multi-robust learning problem where the goal is to learn a predictor that achieves low robust loss on a (potentially) rich collection of subgroups.
 Dueling Optimization with a Monotone Adversary. With Meghal Gupta, Gene Li, Naren Sarayu Manoj, Aadirupa Saha, and Yuanyuan Yang. ALT 2024.
We introduce and study the problem of dueling optimization with a monotone adversary, a generalization of (noiseless) dueling convex optimization. The goal is to design an online algorithm to find a minimizer x* of a function f: X → R, for X ⊆ R^d. In each round, the algorithm submits a pair of guesses x1 and x2, and the adversary responds with any point in the space that is at least as good as both guesses. The cost of each query is the suboptimality of the worse of the two guesses, i.e., max(f(x1) − f(x*), f(x2) − f(x*)). The goal is to minimize the number of iterations required to find an ε-optimal point and to minimize the total cost (regret) of the guesses over many rounds. Our main result is an efficient randomized algorithm for several natural choices of the function f and set X that incurs cost O(d) and iteration complexity O(d log²(1/ε)). Moreover, our dependence on d is asymptotically optimal, as we show examples in which any randomized algorithm for this problem must incur Ω(d) cost and iteration complexity.
2023
 Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback. With Han Shao, Lee Cohen, Yishay Mansour, Aadirupa Saha, and Matthew Walter. NeurIPS 2023.
In classic RL, policies are evaluated with respect to a scalar reward function. However, many real-world problems involve balancing multiple objectives whose relative priority varies from user to user, so an optimal policy for one user might be suboptimal for another. In this work, we consider a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to elicit enough information about a given user via comparison queries to efficiently compute a near-optimal policy for them. We consider two feedback models: (1) a model where a user is provided with two policies and returns their preferred policy, and (2) a model where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we give an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
 Strategic Classification under Unknown Personalized Manipulation. With Han Shao and Omar Montasser. NeurIPS 2023.
We consider strategic classification, where agents can strategically manipulate their feature vector to a limited extent in order to be classified as positive.
Unlike most prior work, we consider manipulations to be personalized, meaning that agents can have different levels of manipulation ability (e.g., varying radii for ball manipulations) that are unknown to the learner. We formalize the learning problem in an interaction model where the learner first deploys a classifier and the agent then manipulates the feature vector within their manipulation set to game the deployed classifier. We investigate various scenarios regarding the information available to the learner during the interaction, such as observing the original feature vector before or after deployment, observing the manipulated feature vector, or observing neither, and provide online mistake bounds and PAC sample complexity bounds for these scenarios under ball manipulations.
 Fundamental Bounds on Online Strategic Classification. With Saba Ahmadi and Kunhe Yang. ACM EC 2023.
We study the problem of online binary classification where strategic agents can manipulate their observable features in predefined ways, modeled by a manipulation graph, in order to receive a positive classification. We show this setting differs in fundamental ways from non-strategic online classification.
For instance, whereas in the non-strategic case a mistake bound of ln|H| is achievable via the halving algorithm when the target function belongs to a known class H, we show that no deterministic algorithm can achieve a mistake bound o(Δ) in the strategic setting, where Δ is the maximum degree of the manipulation graph (even when |H| = Δ). We obtain an algorithm achieving mistake bound O(Δ ln|H|). We also extend this to the agnostic setting, obtaining an algorithm with Δ multiplicative regret, and we show that no deterministic algorithm can achieve o(Δ) multiplicative regret. Finally, we study two randomized models based on whether the random choices are made before or after agents respond, and show they exhibit fundamental differences.
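For reference, the classic (non-strategic) halving algorithm achieves the logarithmic mistake bound mentioned above; a minimal sketch over a finite hypothesis class:

```python
# Halving algorithm: predict by majority vote over the version space.
# Each mistake eliminates at least half of the remaining hypotheses,
# so the number of mistakes is at most log2(|H|) when the target is in H.
def halving(hypotheses, examples):
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in examples:
        votes = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes >= len(version_space) else 0
        if prediction != y:
            mistakes += 1
        # Keep only hypotheses consistent with the revealed label.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes
```

For example, with the nine threshold functions x ≥ t for t in 0..8, at most log2(9) ≈ 3.17, i.e., 3 mistakes are possible on any sequence labeled by one of them.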
 An Analysis of Robustness of Non-Lipschitz Networks. With Maria-Florina Balcan, Dravyansh Sharma, and Hongyang Zhang. JMLR 24(98):1–43, 2023.
Despite significant advances, deep networks remain highly susceptible to adversarial attack. One fundamental challenge is that small input perturbations can often produce large movements in the network's final-layer feature space. In this paper, we define an attack model that abstracts this challenge, to help understand its intrinsic properties. In our model, the adversary may move data an arbitrary distance in feature space but only in random low-dimensional subspaces. We prove such adversaries can be quite powerful: they defeat any algorithm that must classify every input it is given. However, by allowing the algorithm to abstain on unusual inputs, we show such adversaries can be overcome when classes are reasonably well-separated in feature space. We further provide theoretical guarantees for setting algorithm parameters to optimize over accuracy-abstention tradeoffs using data-driven methods, and provide connections to strategic classification as well.
 Setting Fair Incentives to Maximize Improvement. With Saba Ahmadi, Hedyeh Beyhaghi, and Keziah Naggita. FORC 2023.
We consider the problem of helping agents improve by setting
short-term goals. Given a set of target skill levels, we assume each
agent will try to improve from their initial skill level to the
closest target level within reach or do nothing if no target level is
within reach. Our goal is to optimize the target levels for social
welfare and fairness objectives, where social welfare is defined as
the total amount of improvement, and fairness objectives are
considered when agents belong to different underlying
populations. A key technical challenge of this problem is the
non-monotonicity of social welfare in the set of target levels, i.e.,
adding a new target level may decrease the total amount of improvement
as it may get easier for some agents to improve. Considering these
properties, we provide algorithms for optimal and near-optimal
improvement for both social welfare and fairness objectives. Finally,
we extend our algorithms to learning settings where we have only
sample access to the initial skill levels of agents.
 Multiagent Value of Information for Components' Inspections. With Chaochao Lin, Maria-Florina Balcan, and Matteo Pozzi. ICASP 2023.
We consider a multi-agent component-inspection game. In this game, agents manage different components of an overall system. Each component may be either intact or damaged, and the overall system works or not according to some Boolean function of the status of its components. Agents incur costs for repairing their own components, but the penalty for system failure is shared among all agents. The twist is that agents do not know whether their components need repair (they have a Bayesian prior), and can only find out through (costless) inspection actions, which publicly reveal the status of a component to all. We assess the 'Value of Information' for this game, showing that interesting cases can trigger Information Avoidance, where rational agents prefer not to collect free information.
2022
 Boosting barely robust learners: A new perspective on adversarial robustness.
With Omar Montasser, Greg Shakhnarovich, and Hongyang Zhang. NeurIPS 2022.
In this work we show that having the ability to learn a predictor that is robust to adversarial perturbations of magnitude 2γ on a small fraction of any given data distribution is equivalent to having the ability to learn a predictor that is robust to adversarial perturbations of magnitude γ on nearly all of any given data distribution. Moreover, we present oracle-efficient reductions in both directions.
 A Theory of PAC Learnability under Transformation Invariances.
With Han Shao and Omar Montasser. NeurIPS 2022.
Transformation invariances are present in many real-world problems; e.g., a rotated car is still identified as a car. Data augmentation, which adds the transformed data into the training set and trains a model on the augmented data, is a common technique for building these invariances into the learning process. However, it is unclear how data augmentation performs theoretically and what the optimal algorithm is in the presence of transformation invariances. In this paper, we study PAC learnability under transformation invariances in three settings according to different levels of realizability: (i) a hypothesis in H fits the augmented data; (ii) a hypothesis in H fits only the original data and the transformed data lying in the support of the data distribution; (iii) the agnostic case. One interesting observation is that distinguishing between the original data and the transformed data is necessary to achieve optimal accuracy in settings (ii) and (iii), which implies that any algorithm not differentiating between the original and transformed data (including data augmentation) is not optimal. We propose two combinatorial measures characterizing the optimal sample complexity in these settings and provide sample-optimal algorithms.
 Robustly-Reliable Learners Under Poisoning Attacks.
With Nina Balcan, Steve Hanneke, and Dravyansh Sharma. COLT 2022.
We analyze robustness guarantees achievable in the face of targeted poisoning attacks. We define and show how to provide robustly-reliable predictions, in which the predicted label is guaranteed to be correct so long as the adversary has not exceeded a given corruption budget and the true target function belongs to the class in question. Prior approaches had considered only certificates that the algorithm's prediction does not change, rather than certifying correctness. We provide a complete characterization of learnability in this setting; in particular, nearly-tight matching upper and lower bounds on the region that can be certified, as well as efficient algorithms for computing this region given an ERM oracle. Moreover, for the case of linear separators over log-concave distributions, we provide efficient truly-polynomial-time algorithms (i.e., non-oracle algorithms) for such robustly-reliable predictions. We also extend these results to the active setting, where the algorithm adaptively asks for labels of specific informative examples (the difficulty being that the adversary might even be adaptive to this interaction), as well as to the agnostic learning setting, where there is no perfect classifier even over the uncorrupted data.
 On Classification of Strategic Agents Who Can Both Game and Improve.
With Saba Ahmadi, Hedyeh Beyhaghi, and Keziah Naggita. FORC 2022.
We consider classification of agents who can both game and improve, such as loan applicants who may be able to take some actions that increase their perceived creditworthiness and others that also increase their true creditworthiness. A decision-maker would like to define a classification rule with few false positives (not giving out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve and become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each.
For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partial-information learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finite-point version of our linear model. We further show that maximizing the number of true positives subject to no false positives is NP-hard in our full linear model, and give additional results for low-dimensional data.
 Multi Stage Screening: Enforcing Fairness and Maximizing Efficiency in a Pre-Existing Pipeline.
With Kevin Stangl and Ali Vakilian. FAccT 2022.
Consider an actor making selection decisions (e.g., hiring) using a series of classifiers. The early stages (e.g. resume screen, coding screen, phone interview) filter out some of the applicants, and then an expensive but accurate test (e.g. a full interview) is applied to those who make it to the final stage.
We examine the goal of maximizing quantities of interest to the decision maker subject to requiring Equality of Opportunity (qualified members of each group have the same chance of being hired), via modification of the probabilities of promotion through the screening process at each stage based on performance at the previous stage.
We exhibit algorithms for satisfying Equal Opportunity over the selection process and maximizing precision (the fraction of interviews that yield qualified candidates), as well as linear combinations of precision and recall, at the end of the final stage. We also present examples showing that the solution space is non-convex, which motivates our combinatorial exact and FPTAS approximation algorithms for maximizing the linear combination of precision and recall. Finally, we discuss the ‘price’ of additional restrictions, such as not allowing the decision-maker to use group membership in its decision process.
 Stochastic Vertex Cover with Few Queries.
With Soheil Behnezhad and Mahsa Derakhshan. SODA 2022.
We study the minimum vertex cover problem in the following stochastic setting. Let G be an arbitrary given graph and p<1 a parameter of the problem, and let G_p be a random subgraph of G that includes each edge independently with probability p. We do not know the realization G_p, but can learn whether an edge e exists in G_p by querying it. The goal is to find an approximate minimum vertex cover of G_p by querying few edges of G non-adaptively. We present a (2+ε)-approximation for general graphs which queries O(1/(ε³p)) edges per vertex, a 1.367-approximation for bipartite graphs which queries poly(1/p) edges per vertex, and a (1+ε)-approximation for bipartite graphs with triply-exponential in 1/p queries per vertex.
Our techniques also lead to improved bounds for bipartite stochastic matching. We obtain a 0.731-approximation with nearly-linear (in 1/p) per-vertex queries, breaking the prevalent (2/3)-approximation barrier in the poly(1/p) query regime and improving on algorithms of [Behnezhad et al., SODA'19] and [Assadi and Bernstein, SOSA'19].
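As context for the query model (this is only the classical 2-approximation baseline that queries every edge, not the paper's query-efficient nonadaptive algorithms): once the realized edges of G_p are known, taking both endpoints of a greedy maximal matching yields a vertex cover of size at most twice the minimum.

```python
import random

def realize(edges, p, rng):
    """Query every edge of G: each survives in G_p independently w.p. p.
    (The paper queries far fewer edges; this is the naive baseline.)"""
    return [e for e in edges if rng.random() < p]

def matching_vertex_cover(realized_edges):
    """Both endpoints of a greedy maximal matching form a vertex cover
    of size at most twice the minimum (classical 2-approximation)."""
    cover = set()
    for u, v in realized_edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # this edge joins the maximal matching
    return cover
```

Every realized edge has an endpoint in the cover, since otherwise it would have been added to the matching.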
2021
 Excess Capacity and Backdoor Poisoning. With Naren Manoj. NeurIPS 2021.
This work presents a theoretical framework for analyzing backdoor data
poisoning attacks, in which an attacker aims to embed a "trigger" in
training data that it can use later to produce specific desired
misclassifications. We identify a parameter we call the
memorization capacity that we show captures the vulnerability
of a learning problem to such an attack in this model. We then use
this to analyze several natural settings. We also show that under
certain assumptions, adversarial training can detect the presence of
backdoors in a training set, and also show a formal relationship
between the problems of backdoor filtering and robust generalization.
 Robust learning under clean-label attack. With Steve Hanneke, Jian Qian, and Han Shao. COLT 2021.
We study the problem of robust learning under clean-label
data-poisoning attacks, where the attacker injects an arbitrary set
of correctly-labeled examples into the training set with the goal of getting the algorithm to make mistakes on specific test instances.
The learning goal is to minimize the
attackable rate: the probability mass of test instances that the attacker can successfully attack. We show that any robust algorithm
with diminishing attackable rate can achieve an optimal O(1/ϵ) dependence on
ϵ in its PAC sample complexity. On the other hand, the
attackable rate might be large even for some optimal PAC learners,
e.g., SVM for linear classifiers. Furthermore, we show that the class
of linear hypotheses is not robustly learnable when the data
distribution has zero margin and is robustly learnable in the case of
positive margin but requires sample complexity exponential in the
dimension. For a general hypothesis class with bounded VC dimension,
if the attacker is limited to adding at most t>0 poison examples, the
optimal robust learning sample complexity grows almost linearly with
t.
 The Strategic Perceptron. With Saba Ahmadi, Hedyeh Beyhaghi, and Keziah Naggita. ACM EC 2021.
The classic Perceptron algorithm is a simple and elegant learning procedure
that makes a bounded number of mistakes
when data is linearly separable with a nonzero margin.
However, what if data points correspond to strategic agents
who want to be classified as positive, and who can
modify their positions by a limited amount without changing their true label? In this situation the observed position of data points will depend on the current classifier, and the Perceptron algorithm may fail to achieve its guarantees. Indeed, we
illustrate examples where the predictor may oscillate between two
solutions forever, making an unbounded number of mistakes even though
a perfect largemargin linear classifier exists. Our main contribution
is providing a modified Perceptronstyle algorithm which makes a
bounded number of mistakes in the presence of strategic agents with both
ℓ_2 and weighted ℓ_1 manipulation costs. We also present several open problems in this model.
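For contrast, here is the classic (non-strategic) Perceptron update that the paper modifies; on non-strategic data of norm at most R that is separable with margin γ, it makes at most (R/γ)² mistakes:

```python
def perceptron(stream):
    """Classic Perceptron: on a mistake, add y*x to the weight vector.
    With labels y in {-1,+1}, data of norm <= R, and a margin-gamma
    separator, it makes at most (R/gamma)^2 mistakes -- but only when
    the data points are not strategically manipulated."""
    w, mistakes = None, 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:          # mistake (or on the boundary)
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes
```

On a small separable dataset cycled several times, the mistake count stays within the (R/γ)² bound and the final weights classify all points correctly.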
 Incentive-Compatible Kidney Exchange in a Slightly Semi-Random Model. With Paul Gölz. ACM EC 2021.
Motivated by kidney exchange, we study the following mechanism-design
problem. Vertices of a directed graph are partitioned across multiple players (hospitals). Players report their vertices to the mechanism, which then aims to find a long path (a chain
of transplantations) starting at a distinguished vertex (an
altruistic donor). The challenge is that players want to maximize the number of their own vertices that lie on the path returned by the mechanism, and may strategically omit vertices to do so.
Unfortunately, in worstcase
instances, competing with the overall longest path is impossible while
incentivizing (approximate) truthfulness.
We therefore adopt a semi-random model where a small (o(n))
number of random edges are added to worstcase instances. We give
a truthful mechanism for this setting that competes with the longest path
whose subpaths within
each player have a minimum average length. In fact, our mechanism
satisfies even a stronger notion of truthfulness, which we
call matching-time incentive compatibility.
 One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning. With Nika Haghtalab, Richard Phillips, and Han Shao. ICML 2021.
This paper introduces a framework for incentive-aware learning and
data sharing in collaborative learning, with the goal of incentivizing agents to maintain their collaboration. Our stable and envy-free
equilibria capture notions of collaboration in the presence of agents
interested in meeting their learning objectives while keeping their
own sample collection burden low. For example, in an envy-free
equilibrium, no agent would wish to swap their sampling burden with
any other agent and in a stable equilibrium, no agent would wish to
unilaterally reduce their sampling burden. In addition to formalizing
this framework, our contributions include characterizing the
structural properties of such equilibria, proving when they exist, and
showing how they can be computed. Furthermore, we compare the sample
complexity of incentive-aware collaboration with that of optimal
collaboration when one ignores agents' incentives.
 Learning Complexity of Simulated Annealing. With Chen Dan and Saeed Seddighin. AISTATS 2021.
A key component that plays a crucial role in the performance of
simulated annealing is the cooling schedule. Motivated by this, we
study the following question: "Given samples of instances of an
optimization problem drawn from some distribution D, can we design
cooling schedules that minimize the runtime or maximize the success
rate of simulated annealing with respect to D?" We provide positive
results both in terms of sample complexity and simulation
complexity. For sample complexity, we show that Õ(m^{1/2}) samples
suffice to find an approximately optimal cooling schedule of length
m, and complement this with a sample complexity lower bound of Ω̃(m^{1/3}).
These results are general and rely on
no assumptions. For simulation complexity, however, we make additional
assumptions to measure the success rate of an algorithm. To this end,
we introduce the monotone stationary graph that models the performance
of simulated annealing. Based on this model, we present polynomial
time algorithms with provable guarantees for the learning problem.
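To fix terminology, a cooling schedule is the temperature sequence T_1 ≥ T_2 ≥ … that governs how willingly simulated annealing accepts uphill moves. A generic textbook sketch (the geometric schedule below is a standard choice for illustration, not one of the paper's learned schedules):

```python
import math, random

def simulated_annealing(f, x0, neighbors, schedule, rng):
    """Minimize f starting from x0. `schedule` is the cooling schedule
    (the sequence of temperatures) whose design the paper studies."""
    x = x0
    for T in schedule:
        y = rng.choice(neighbors(x))
        delta = f(y) - f(x)
        # Always accept improvements; accept uphill moves w.p. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            x = y
    return x

def geometric_schedule(T0, alpha, m):
    """A standard length-m cooling schedule: T_k = T0 * alpha^k."""
    return [T0 * alpha ** k for k in range(m)]
```

On a toy problem such as minimizing (x−3)² over the integers with neighbors x±1, a long geometric schedule typically lands at or very near the minimum.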
 Communication-Aware Collaborative Learning. With Shelby Heinecke and Lev Reyzin. AAAI 2021.
Algorithms for noiseless collaborative PAC learning have been analyzed and optimized in recent years with respect to sample complexity. In this paper, we study collaborative PAC learning with the goal of reducing communication cost at essentially no penalty to the sample complexity. We develop communication-efficient collaborative PAC learning algorithms using distributed boosting. We then consider the communication cost of collaborative learning in the presence of classification noise. As an intermediate step, we show how collaborative PAC learning algorithms can be adapted to handle classification noise. With this insight, we develop communication-efficient algorithms for collaborative PAC learning that are robust to classification noise.
2020
 Online Learning with Primary and Secondary Losses. With Han Shao. NeurIPS 2020.
We consider a form of "combining expert advice" where at each round there are two loss vectors: a primary loss and a secondary loss. Our goal is to perform nearly as well as the best expert with respect to the primary loss, while at the same time achieving the minimal guarantee of performing not too much worse than the worst expert with respect to the secondary loss. E.g., imagine a recruiter deciding which job applicants to hire, where the company gives higher weight to false positives than false negatives (the primary loss) but society might weigh these errors more equally (the secondary loss).
Unfortunately, we show that this goal (and more generally the goal of low regret with respect to the primary loss while bounding the secondary loss by a linear threshold) is unachievable without any bounded-variance assumption on the secondary loss. On the positive side, however, we show that running any switching-limited algorithm can achieve this goal if all experts satisfy the assumption that the secondary loss does not exceed the linear threshold by more than o(T) on any time interval.
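For background, the standard multiplicative-weights ("combining expert advice") guarantee for a single loss vector, which the two-loss setting above generalizes (a minimal Hedge sketch, not the paper's algorithm):

```python
import math

def hedge(loss_rounds, eta):
    """Multiplicative weights over n experts: each expert's weight decays
    exponentially in its cumulative loss; the forecaster's regret against
    the best expert is O(sqrt(T log n)) for a suitable eta."""
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        W = sum(weights)
        probs = [w / W for w in weights]
        # Expected loss of the randomized forecaster this round.
        total_loss += sum(p * l for p, l in zip(probs, losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return total_loss, best
```

With one perfect expert and one always-wrong expert, the forecaster's cumulative loss stays within a small constant of the best expert's.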
 Random Smoothing Might be Unable to Certify L_infinity Robustness for HighDimensional Images. With Travis Dick, Naren Manoj, and Hongyang Zhang. JMLR 21(211):1−21, 2020.
We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the L_p ball of radius epsilon when p>2. Although random smoothing has been well understood for the L_2 case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for p>2. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the L_infinity threat model. In this work, we show that any noise distribution D over R^d that provides L_p robustness for all base classifiers with p>2 must satisfy E[eta_i^2] = Omega(d^{1−2/p} epsilon^2 (1−delta)/delta^2) for 99% of the features (pixels) of vector eta ~ D, where epsilon is the robust radius and delta is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in [0,255], the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
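For reference, the Gaussian (L_2-case) smoothing procedure of Cohen et al. that the hardness result concerns, as a minimal majority-vote sketch (real certification pipelines also compute confidence intervals over the votes):

```python
import random
from collections import Counter

def smoothed_classify(base_classifier, x, sigma, n_samples, rng):
    """Randomized smoothing (Gaussian noise, i.e., the L_2 case):
    classify many noised copies of x and return the majority vote."""
    votes = Counter(
        base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]
```

For a base classifier with a comfortable margin at x, the majority vote is stable under the added noise.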
 Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? With Kevin Stangl. FORC 2020.
Multiple fairness constraints have been proposed in the literature, motivated by concerns about ways demographic groups might be treated unfairly by learned classifiers. In this work we consider a different motivation: learning from biased training data. We posit several ways training data may be biased, including having a noisier or negatively-biased labeling process for members of a disadvantaged group, or a decreased prevalence of positive or negative examples from the disadvantaged group, or both.
Given such biased training data, Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution. We examine the ability of fairness-constrained ERM to correct this problem. In particular, we find that the Equal Opportunity fairness constraint (Hardt, Price, and Srebro 2016) combined with ERM will provably recover the Bayes Optimal Classifier under a range of bias models. In contrast, requiring calibration will make ERM perform worse. We also consider other approaches, including reweighting the training data, Equalized Odds, and Demographic Parity. These theoretical results provide additional motivation for considering fairness interventions even if an actor cares primarily about accuracy.
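To make the Equal Opportunity condition concrete (a minimal sketch with hypothetical group labels; the constraint of Hardt, Price, and Srebro equalizes true-positive rates across groups):

```python
def true_positive_rates(records):
    """records: iterable of (group, true_label, predicted_label) triples.
    Equal Opportunity asks that the TPR be (approximately) equal
    across groups; this computes the per-group TPRs to compare."""
    pos = {}   # group -> true positives that were predicted positive
    tot = {}   # group -> total true positives
    for group, y, yhat in records:
        if y == 1:
            tot[group] = tot.get(group, 0) + 1
            if yhat == 1:
                pos[group] = pos.get(group, 0) + 1
    return {g: pos.get(g, 0) / tot[g] for g in tot}
```

A classifier satisfying the constraint would return the same rate for every group.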
 Active Local Learning. With Arturs Backurs and Neha Gupta. COLT 2020.
In this work we consider active local learning: given a query point x and active access to an unlabeled training set S, output the prediction h(x) of a near-optimal h in H using significantly fewer labels than would be needed to actually learn h fully. In particular, the number of label queries should be independent of the complexity of H, and the function h should be well-defined independent of x. This immediately also implies an algorithm for distance estimation: estimating the value opt(H) from many fewer labels than needed to actually learn a near-optimal h in H, by running local learning on a few random query points and computing the average error. We give algorithms for several types of functions, including Lipschitz-bounded functions, and for approximating the minimum error that can be achieved by the Nadaraya-Watson estimator under a linear diagonal transformation with eigenvalues coming from a small range.
 Advancing Subgroup Fairness via Sleeping Experts. With Thodoris Lykouris. ITCS 2020. We show how the sleeping experts framework can be used to provide meaningful fairness and quality guarantees when making decisions on individuals that belong to a variety of groups that may overlap. Our approach is based on a "best effort" notion of fairness, where for each group we wish to perform nearly as well as the best decision rule for that group from a given class. We also show that the ability to achieve such guarantees is sensitive to the precise notion of "performing well", showing that certain notions can be intrinsically unachievable.
 Foundations of Data Science. With John Hopcroft and Ravi Kannan. Cambridge University Press. Free pre-publication version (for personal use only; not for redistribution, resale, or use in derivative works).
Textbook on core mathematical concepts and algorithms for data science, including properties of highdimensional spaces, singular value decomposition, random walks and Markov chains, fundamentals of machine learning, streaming algorithms, clustering, properties of random graphs, and other topics.
 Approximation Stability and Proxy Objectives. Chapter 6 in Beyond the Worst-Case Analysis of Algorithms, T. Roughgarden, editor. Cambridge University Press.
Survey article on approximation stability.
 Technical perspective: Algorithm selection as a learning problem. Communications of the ACM 63(6):86, 2020. A brief discussion of and perspective on recent work analyzing algorithm selection and design as a learning problem.
2019
 Computing
Stackelberg Equilibria of Large General-Sum Games. With
Nika Haghtalab, MohammadTaghi Hajiaghayi, and Saeed Seddighin. SAGT
2019. In this work we study the computational
complexity of finding Stackelberg Equilibria in general-sum games,
where the sets of pure strategies of the leader and the followers are
exponentially large in a natural representation of the problem. In
zero-sum games, the notion of a Stackelberg equilibrium coincides with
the notion of a Nash Equilibrium. Finding these equilibria
in zero-sum games can be done efficiently when the players have
polynomially many pure strategies, and also when (in addition to some
structural properties) a best-response oracle is available. In this
work, we show that while there are natural large general-sum games
where the Stackelberg Equilibria can be computed efficiently if the
Nash equilibrium of the zero-sum form can be computed efficiently,
in general, structural properties that allow for efficient computation
of Nash equilibria in zero-sum games are not sufficient for computing
Stackelberg equilibria in general-sum games.
 Bilu-Linial
Stability, Certified Algorithms and the Independent Set
Problem. With Haris Angelidakis, Pranjal Awasthi, Vaggos
Chatziafratis, and Chen Dan. ESA 2019. We
consider the classic Max Weighted Independent Set problem on
perturbation-stable instances: instances such that the optimal
solution does not change even if node weights are perturbed by at
most some given factor gamma. We give algorithms to find the optimal
solution on graphs of degree at most Δ that are
Õ(Δ/sqrt(log Δ))-stable, k-colorable graphs that are (k
- 1)-stable, and planar graphs that are (1 + epsilon)-stable, using
both combinatorial techniques as well as LPs and the Sherali-Adams
hierarchy. For general graphs, we also present an O(n^(1/2 -
epsilon)) lower bound, assuming the planted clique conjecture. As a
byproduct of our techniques, we also give algorithms and lower bounds
for stable instances of Node Multiway Cut. We also initiate the
study of certified algorithms for Independent Set: these are
algorithms that on any instance I (stable or not) produce a bounded
perturbation I' of I along with a solution that is optimal for I' (so
if I was stable, then this will be optimal for I also). This notion
was recently introduced by Makarychev and Makarychev (2018). Here, we
obtain Δ-certified algorithms for Independent Set on graphs of
maximum degree Δ, and (1+epsilon)-certified algorithms on planar
graphs. Finally, we analyze the algorithm of Berman and Fürer (1994)
and prove that it is a ((Δ + 1)/3 + epsilon)-certified algorithm
for Independent Set on graphs of maximum degree Δ where all
weights are equal to 1.
 Optimal
Strategies of Blotto Games: Beyond Convexity. With Soheil
Behnezhad, Mahsa Derakhshan, Mohammad Taghi Hajiaghayi, Christos
H. Papadimitriou, and Saeed Seddighin. ACM-EC 2019. Colonel Blotto is a classic game first introduced by
Borel in 1921. Two colonels each have a pool of troops that they
divide simultaneously among a set of battlefields. The winner of each
battlefield is the colonel who puts more troops in it and the overall
utility of each colonel is the sum of weights of the battlefields that
s/he wins.
Two main objectives have been proposed for this game:
(i) maximizing expected payoff, and (ii) maximizing the
probability of obtaining a minimum payoff u (corresponding to settings
such as winning electoral votes). In this paper, we consider both
objectives and show how it is possible to obtain (almost) optimal
solutions that have few strategies in their support.
One of the main technical challenges with bounded support
strategies is that the solution space
becomes nonconvex. However, we show through a
set of structural results that the solution space can, interestingly,
be partitioned into polynomially many disjoint convex polytopes that
can be considered independently. Coupled with a number of other
combinatorial observations, this leads to polynomial time
approximation schemes for both of the aforementioned objectives.
We also provide the first complexity result for finding the maximin of
Blottolike games: we show that computing the maximin of a
generalization of the Colonel Blotto game that we call General Colonel
Blotto is exponential-time complete.
2018
 On preserving
non-discrimination when combining expert advice. With
Suriya Gunasekar, Thodoris Lykouris, and Nati Srebro. NeurIPS 2018.
This work considers the question: can we combine "fair" expert advice
fairly? Specifically, given a set of predictors that are individually
"fair" with respect to a particular fairness condition, can we combine
them to (a) perform nearly as well as the best of them in hindsight
(i.e., achieve low regret), while simultaneously (b) at least
approximately maintaining the given fairness condition? This would be
fairly trivial in the i.i.d. setting (just observe a small sample of data
and then select the predictor that performed best on the sample).
However, we show that this is not in general achievable in the non-i.i.d.
online setting for the fairness notions of "equalized odds" and "equality
of opportunity". On the positive side, we show that it can
be achieved for a fairness notion of "equalized error rates".
 Diversified Strategies for Mitigating Adversarial Attacks in Multiagent Systems. With Nina Balcan and ShangTse Chen. AAMAS 2018.
We consider online decision-making in settings where
players want to guard against possible adversarial attacks or other
catastrophic failures. To address this, we propose a solution concept
in which players have an additional constraint that at each time step
they must play a diversified mixed strategy: one that does not put
too much weight on any one action. We explore properties
of diversified strategies in both zero-sum and general-sum
games, and provide algorithms for minimizing regret within the
family of diversified strategies as well as methods for using taxes
or fees to guide standard regret-minimizing players towards diversified
strategies. In general-sum games, requiring
diversification can actually lead to higher-welfare equilibria,
and we give guarantees on both the price of anarchy and the social
welfare produced by regret-minimizing diversified agents.
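To make the diversification constraint concrete: one simple (illustrative) approach is to run a standard regret-minimizing algorithm such as Hedge and, each round, renormalize its mixed strategy so that no action receives probability above a cap. The sketch below follows that idea only; the capping scheme, function names, and parameters are ours, not the algorithms from the paper.

```python
import math

def project_capped(w, cap):
    """Turn nonnegative weights into a distribution with every entry <= cap:
    repeatedly cap overweight coordinates and renormalize the remainder."""
    n = len(w)
    assert cap * n >= 1.0, "cap must be large enough to admit a distribution"
    s = sum(w)
    p = [x / s for x in w]
    capped = [False] * n
    while True:
        over = [i for i in range(n) if not capped[i] and p[i] > cap]
        if not over:
            return p
        for i in over:
            capped[i] = True
        free = 1.0 - cap * sum(capped)   # probability mass left for uncapped entries
        rest = sum(p[i] for i in range(n) if not capped[i])
        if rest <= 0.0:                  # remaining mass must be spread evenly
            k = n - sum(capped)
            return [cap if capped[i] else free / k for i in range(n)]
        p = [cap if capped[i] else p[i] * free / rest for i in range(n)]

def diversified_hedge(loss_rounds, cap, eta=0.1):
    """Hedge (multiplicative weights) whose played mixed strategy is
    renormalized each round so no action gets probability above `cap`."""
    n = len(loss_rounds[0])
    w = [1.0] * n
    plays = []
    for losses in loss_rounds:
        plays.append(project_capped(w, cap))
        w = [w[i] * math.exp(-eta * losses[i]) for i in range(n)]
    return plays
```

For example, with three actions, a cap of 0.4, and losses that always favor action 0, the played strategy converges to (0.4, 0.3, 0.3) rather than concentrating all mass on action 0.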
 Active Tolerant
Testing. With Lunjia Hu. COLT 2018.
We show that for a nontrivial hypothesis class C, one can
estimate the error rate of the best h in C using substantially
fewer labeled examples than would be needed to actually
learn a good h in C. Specifically, we show that for unions of
d intervals on the line, in the agnostic active setting (the target is
arbitrary and we can query labels from a pool of examples drawn from
the underlying distribution), we can estimate the error rate of the
best h in C to +/- epsilon with a number of label requests that is
independent of d and depends only on epsilon. Our work extends that
of [BBBY12], who solved the non-tolerant testing problem for this class
(distinguishing the zero-error case from the case that the best
hypothesis in the class has error greater than epsilon).
We also consider the related problem of estimating the performance of
a given learning algorithm A in this setting. That is, given a
large pool of unlabeled examples drawn from distribution D, can
we, from only a few label queries, estimate how well A would
perform if the entire dataset were labeled and given as training data
to A? We focus on k-Nearest Neighbor style algorithms, and
also show how our results can be applied to the problem of
hyperparameter tuning (selecting the best value of k for the given
learning problem).
 Approximate
Convex Hull of Data Streams. With Vladimir Braverman,
Ananya Kumar, Harry Lang, and Lin F. Yang. ICALP 2018. Given a set of points P in R^d and a tolerance
parameter epsilon, let OPT denote the size of the smallest subset S of
P such that every point in P is either in or within distance epsilon
of the convex hull of S. We consider the problem of maintaining such
a small epsilon-hull in a streaming model in which the points of P
arrive one at a time and we would like to use space not much larger
than OPT. We begin with lower bounds showing, under a reasonable
model, that it is not possible to have a single-pass
streaming algorithm that computes an epsilon-hull with O(OPT)
space. We instead propose three relaxations of the problem for which
we can compute epsilon-hulls using space near-linear in the optimal
size. Our first algorithm, for points in R^2 that arrive in
random order, uses O(log n * OPT) space. Our second algorithm, for
points in R^2, makes O(log(epsilon^{-1})) passes before outputting the
epsilon-hull and requires O(OPT) space. Our third algorithm, for
points in R^d for any fixed dimension d, outputs, with high
probability, an epsilon-hull for all but a delta fraction of directions
and requires O(OPT * log OPT) space.
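For intuition about epsilon-hulls (though this is not one of the paper's streaming algorithms), a simple offline greedy baseline in R^2 keeps an arriving point only when it lies more than epsilon outside the convex hull of the points kept so far. Every input point then ends up within epsilon of the final hull, although the kept set need not be close to OPT in size. A sketch, with all function names ours:

```python
import math

def _cross(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def _seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    ax, ay = b[0]-a[0], b[1]-a[1]
    t = ((p[0]-a[0])*ax + (p[1]-a[1])*ay) / ((ax*ax + ay*ay) or 1.0)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0]-(a[0]+t*ax), p[1]-(a[1]+t*ay))

def hull_dist(p, hull):
    """Distance from p to the convex polygon `hull` (0 if inside)."""
    if len(hull) == 1:
        return math.hypot(p[0]-hull[0][0], p[1]-hull[0][1])
    n = len(hull)
    if n >= 3 and all(_cross(hull[i], hull[(i+1) % n], p) >= 0 for i in range(n)):
        return 0.0  # inside the CCW polygon
    return min(_seg_dist(p, hull[i], hull[(i+1) % n]) for i in range(n))

def greedy_eps_hull(points, eps):
    """Keep a point only if it is farther than eps from the hull of kept points."""
    kept = []
    for p in points:
        if not kept or hull_dist(p, convex_hull(kept)) > eps:
            kept.append(p)
    return kept
```

Since the hull of the kept set only grows, any skipped point stays within epsilon of the final hull.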
 Algorithms for Generalized Topic Modeling. With Nika Haghtalab. AAAI 2018.
In this
work we consider a generalization of the traditional
topic modeling framework, where we no longer assume that
words are drawn i.i.d. and instead view a topic as a complex
distribution over sequences of paragraphs. Since one could
not hope to even represent such a distribution in general (even
if paragraphs are given using some natural feature representation),
we aim instead to directly learn a predictor that given a
new document, accurately predicts its topic mixture, without
learning the distributions explicitly. We present several natural
conditions under which one can do this from unlabeled data
only, and give efficient algorithms to do so, also discussing
issues such as noise tolerance and sample complexity. More
generally, our model can be viewed as a generalization of co-training
and multi-view learning to settings where items have fractional labels.
 On Price versus Quality. With Yishay Mansour. ITCS 2018.
This work proposes a model where the value of a
buyer for some product (like a slice of pizza) is a combination of
their personal desire for the product (how hungry they are for pizza)
and the quality of the product (how good the pizza is). Sellers in
this setting have a two-dimensional optimization problem of
determining both the quality level at which to make their product (what
quality of ingredients to use) and the price at which to sell it. We
analyze optimal seller strategies as well as analogs of Walrasian
equilibria in this setting. A key question we are interested in is: to
what extent will the price of a good be a reliable indicator of the
good's quality? One result we show is that while the specific quality
and price that an optimal seller should choose in this model will
depend highly on the specific distribution of buyers,
price and quality will always be linearly related, regardless of
that distribution. We also show that for the case of multiple buyers
and sellers, an analog of Walrasian equilibrium exists in this
setting, and can be found via a natural tatonnement process. Finally,
we analyze markets with a combination of "locals" (who know the
quality of each good) and "tourists" (who do not).
 From Battlefields to Elections: Winning Strategies of Blotto and Auditing Games. With Soheil Behnezhad, Mahsa Derakhshan, Mohammad Taghi Hajiaghayi, Mohammad Mahdian, Christos H. Papadimitriou, Ronald L. Rivest, Saeed Seddighin, Philip B. Stark. SODA 2018.
Mixed strategies are typically evaluated based on the expected payoff
that they guarantee. In this paper, we consider settings where
instead the goal is to receive utility at least u with
probability at least p.
The first game that we consider is Colonel Blotto, a well-studied game
that was introduced in 1921. In the Colonel Blotto game, two colonels
divide their troops among a set of battlefields. Each battlefield is
won by the colonel that puts more troops in it. The payoff of each
colonel is the weighted number of battlefields that she wins, and we
consider a version where each player wants to maximize the probability
that they win at least half of the total weight. We give approximation
algorithms as well as exact algorithms for special cases of this
problem.
2017
 Collaborative PAC Learning. With Nika Haghtalab, Ariel Procaccia, and Mingda Qiao. NIPS 2017.
We consider a collaborative PAC learning model, in which k players attempt to
learn the same underlying concept. We ask how much more information is required
to learn an accurate classifier for all players simultaneously. We refer to
the ratio between the sample complexity of collaborative PAC learning and
the sample complexity of single-player PAC learning as the overhead. We
design learning algorithms with only O(ln k) and O((ln k)^2) overhead,
respectively, in the personalized and centralized
variants of our model. In contrast, not sharing information among players
would incur overhead O(k). We complement our
upper bounds with an Omega(ln k) lower bound, showing that our results are
tight up to a logarithmic factor.
 Lifelong Learning in Costly Feature Spaces. With Nina Balcan and Vaishnavh Nagarajan. ALT 2017. Journal version in Theoretical Computer Science 808:14-37, 2020 (Special Issue on Algorithmic Learning Theory).
In this work we study lifelong and representation learning for structured target functions including decision trees and polynomials. More specifically, we consider learning a series of target functions, where each can be represented as, say, a decision tree, and all these decision trees share pieces in common that we call "meta-features". As we learn target functions, our aim is to learn these meta-features in a way that allows us to learn new target functions more cheaply. We specifically focus on reducing the number of feature evaluations that need to be performed in the learning process, motivated by applications such as medical diagnosis, where decision trees are popular in part due to their ability to make predictions based on a small number of features of any given example.
 Efficient Co-Training of Linear Separators under Weak Dependence.
With Yishay Mansour. COLT 2017.
We develop the first polynomial-time algorithm for co-training of
homogeneous linear separators under weak dependence, a relaxation of
the condition of independence given the label. Our algorithm learns from
purely unlabeled data, except for a single labeled example to break
symmetry between the two classes, and works for any data distribution having an
inverse-polynomial margin and with center of mass at the origin.
 Efficient PAC Learning from the Crowd.
With Pranjal Awasthi, Nika Haghtalab, and Yishay Mansour.
COLT 2017.
Standard approaches to crowdsourcing view the process of acquiring
labeled data separately from the process of learning a classifier,
which can lead to computational and statistical inefficiencies. For
example, learning from poorly-labeled data can be computationally
hard, and efforts to eliminate noise through voting often require a
large number of queries per example. In this paper, we show how, by
interleaving the processes of labeling and learning, we can attain
computational efficiency with much less overhead in the labeling
cost. In particular, we consider a setting where a certain alpha
fraction of labelers actually know the true target function and label
data correctly, whereas the remaining 1 - alpha fraction have other
arbitrary functions in mind. When alpha is greater than 1/2, we show
that any class that can be efficiently PAC-learned from non-noisy data
can still be efficiently learned in this setting with only a
constant-factor blowup in the total number of labels requested. When
alpha is less than 1/2, we can do this with only an additional
constant number of queries to a known-correct labeling oracle.
All algorithms require asking any given labeler only O(1) labeling
questions.

Opting into Optimal Matchings.
With Ioannis Caragiannis, Nika Haghtalab, Ariel Procaccia, Eviatar
Procaccia, and Rohit Vaish. SODA 2017, pages 2351-2363.
We consider the design of optimal, individually rational matching
mechanisms (in a general sense, allowing for cycles in directed graphs).
In particular, each player (who is associated with a subset of
vertices) should be guaranteed to match at least
as many of his own vertices when he opts into the matching mechanism as
when he opts out. We offer a new perspective on this problem by considering
an arbitrary graph, but assuming that vertices are associated with players
at random. Our main result is that under certain conditions, any
fixed optimal matching is likely to be individually rational up to
lowerorder terms. We also show that a simple and practical mechanism is
(fully) individually rational, and likely to be optimal up to lowerorder
terms. We discuss the implications of our results for market design in
general, and kidney exchange in particular.
2016
 Sparse
Approximation via Generating Point Sets.
With Sariel Har-Peled and Benjamin Raichel. SODA 2016. ACM Transactions on Algorithms (TALG), Volume 15, Issue 3, July 2019.
We consider the following problem: given a collection P of n objects
(represented as points in the unit ball in R^d), find a
small subset T of P such that each object in P is
close to a sparse convex combination of objects in T. E.g.,
if we allow T=P then this is trivial with sparsity 1, and
for some sets P
(e.g., random points in the unit ball) there is no such
small set T. Let k_opt = k_opt(P,epsilon) be the size of
the smallest subset T_opt such that every point in P is within
distance epsilon of the convex hull of T_opt. Our goal is
to find a set T of k_alg points and distance epsilon_alg such that
every point in P is within distance epsilon_alg to a sparse
combination of points in T, where k_alg and epsilon_alg are
not too much larger than k_opt and epsilon, respectively. We give
several efficient algorithms with different guarantees of
this form.
 On the
Computational Hardness of Manipulating Pairwise Voting
Rules. With Rohit Vaish, Neeldhara Misra,
and Shivani Agarwal. AAMAS 2016, pages 358-367.
In this work we study the computational tractability
of manipulating voting rules when the input is a collection
of incomplete pairwise preferences. We show that in this
scenario, manipulation can be computationally hard even for
a single manipulator.
 Semi-Supervised Learning. Entry in the
Encyclopedia of Algorithms, pages 1936-1941, 2016.
2015

Efficient Representations for Lifelong Learning and
Autoencoding.
With Nina Balcan and Santosh Vempala. COLT 2015.
In this work we pose and provide efficient algorithms for several
natural theoretical formulations of lifelong learning.
Specifically, we consider the problem of learning many different
target functions over time, that share certain commonalities that are
initially unknown to the learning algorithm. Our aim is to learn new
internal representations as the algorithm learns new target functions,
that capture this commonality and allow subsequent learning tasks to
be solved more efficiently and from less data. We develop efficient
algorithms for two very different kinds of commonalities that target
functions might share: one based on learning common low-dimensional
subspaces (and unions of low-dimensional subspaces), and one based on learning
nonlinear Boolean combinations of features. Our algorithms for
learning Boolean feature combinations additionally have a dual
interpretation, and can be viewed as giving an efficient procedure for
constructing near-optimal sparse Boolean autoencoders under a natural
"anchor-set" assumption.

The Ladder: A Reliable Leaderboard for Machine Learning
Competitions.
With Moritz Hardt. ICML 2015.
We consider the problem of maintaining an accurate leaderboard for a
machine learning competition that faithfully represents the quality of
the best submission of each competing team. What makes this
challenging is that participants may repeatedly evaluate their
submissions on the leaderboard, and in the process overfit to the
holdout data that supports it. Moreover, we (the organizers) cannot
control the capacity or complexity of the rules participants are
using. In this work, we introduce a notion of leaderboard accuracy
tailored to the format of a competition. We introduce a natural
algorithm called the Ladder and demonstrate that it
simultaneously supports strong theoretical guarantees in a fully
adaptive model of estimation, withstands practical adversarial
attacks, and achieves high utility on real submission files from a
Kaggle competition. Notably, we are able to sidestep a powerful recent
hardness result for adaptive risk estimation that rules out algorithms
such as ours under a seemingly very similar notion of accuracy. On a
practical note, we provide a parameter-free variant of our algorithm
that can be easily deployed.
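The Ladder's core release rule is simple to state: report a new best score only when a submission improves on the previous best by more than a step size eta, rounding the released value to a multiple of eta. The following is an illustrative sketch of that rule (the function name and default eta are ours, and this is the basic version rather than the parameter-free variant):

```python
def ladder(losses, eta=0.01):
    """Release a score only when a submission beats the best released
    score so far by more than eta; otherwise repeat the previous score."""
    best = float("inf")
    released = []
    for loss in losses:  # holdout loss of each submission, in arrival order
        if loss < best - eta:
            # round the released value to the nearest multiple of eta
            best = round(loss / eta) * eta
        released.append(best)
    return released
```

Because a participant only learns something when they clear the eta threshold, repeated probing of the holdout set leaks far less information than naively reporting every score.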

Learning What's Going On: Reconstructing Preferences and Priorities
from Opaque Transactions.
With Yishay Mansour and Jamie Morgenstern. ACM-EC 2015.
Journal version in ACM Transactions on Economics and Computation (TEAC) 6(3-4): 13:1-13:20, 2018 (Special Issue for EC 2015).
We consider a setting where n buyers, with combinatorial preferences
over m items, and a seller, running a priority-based allocation
mechanism, repeatedly interact. Our goal, from observing limited
information about the results of these interactions, is to reconstruct
both the preferences of the buyers and the mechanism of the
seller. More specifically, we consider an online setting where at each
stage, a subset of the buyers arrive and are allocated items,
according to some unknown priority that the seller has among the
buyers. Our learning algorithm observes only which buyers arrive and
the allocation produced (or some function of the allocation, such as
just which buyers received positive utility and which did not), and
its goal is to predict the outcome for future subsets of buyers. We
derive mistake-bound algorithms for additive, unit-demand, and
single-minded buyers. We also consider the case where buyers' utilities for a
fixed bundle can change between stages due to different (observed)
prices.

Ignorance is Almost Bliss: Near-Optimal Stochastic Matching With Few
Queries.
With John P. Dickerson, Nika Haghtalab, Ariel D. Procaccia, Tuomas
Sandholm, and Ankit Sharma. ACM-EC 2015. Journal version in Operations Research 68(1):16-34, 2020.
We consider the problem of finding a maximum matching in a graph whose
edges are unknown but can be accessed via queries. More specifically,
we are given an initial graph G, where each edge may be "faulty"
(succeeding or failing independently with success probability p) and
we have the ability to query edges to determine which have succeeded
or failed. We give algorithms that from a limited number of queries,
and a limited number of rounds of queries, can find a matching nearly
as high as the true maximum matching in the graph of live edges. Our
motivation comes from the problem of kidney exchange, and we also
empirically explore the application of (adaptations of) these
algorithms to the kidney exchange problem, where patients with
end-stage renal failure swap willing but incompatible donors. We show
on both generated data and on real data from the first 169 match runs
of the UNOS nationwide kidney exchange that even a very small number
of non-adaptive edge queries per vertex results in large gains in
expected successful matches.

Commitment Without Regrets: Online Learning in Stackelberg Security
Games.
With Nina Balcan, Nika Haghtalab, and Ariel D. Procaccia. ACM-EC 2015.
In a Stackelberg Security Game, a defender commits to a randomized
deployment of security resources, and an attacker best-responds by
attacking a target that maximizes his utility. Here, we consider the
case that there are k different types of attackers (each type
has its own payoff matrix) and that a series of attackers of these
types are arriving over time. After each attacker arrives, the
defender receives some feedback (observing either the current attacker
type or merely which target was attacked). We design no-regret
algorithms whose regret (when compared to the best fixed strategy in
hindsight) is polynomial in the parameters of the game, and sublinear
in the number of time steps.
 Privacy-preserving
Public Information in
Sequential Games. With Jamie Morgenstern, Ankit
Sharma, and Adam Smith. ITCS 2015.
We consider settings where competitors for limited resources want to
maintain privacy of their actions and yet also coordinate so as to not
all chase the same resources and end up with low overall
social welfare. We consider a sequential-move setting and
explore whether "noisy" information about the current state can
be publicly announced in a manner that both (a) provably
maintains privacy and (b) suffices to keep play from
reaching bad game states. We show that in many
games of interest, this is indeed possible.
We model the behavior of players
in this imperfect-information setting in two ways (greedy
and undominated strategic behaviors), and we prove
guarantees on social welfare that certain kinds of
privacypreserving information can help attain. Furthermore,
we design a counter with improved privacy guarantees under
continual observation.
 Learning
Valuation Distributions from Partial
Observation. With Yishay Mansour and Jamie
Morgenstern. AAAI 2015.
Auction theory traditionally assumes that bidders' valuation
distributions are known to the auctioneer, such as in the
revenue-optimal Myerson auction. However, this theory does not
describe how the auctioneer comes to possess this information. In this
work, we consider the problem of learning bidders' valuation
distributions from much weaker forms of observations. Specifically, we
consider a setting where there is a repeated sealed-bid auction, where
all we can observe for each round is who won, but not how
much they bid or paid. We can also participate (i.e., submit a bid)
ourselves, and observe when we win. From this information, our goal is
to (approximately) recover the inherently recoverable part of the
underlying bid distributions for each bidder. We also consider extensions where
different subsets of bidders participate in each round, and where
bidders' valuations have a common-value component added to their
independent private values.
 Online Allocation and Pricing with Economies of Scale. With Yishay Mansour and Liu Yang. WINE 2015.
We consider the problem of online allocation of goods that
have a decreasing marginal cost per
item to the seller, when
customers are unit-demand and arrive one at a time, each with
a valuation function on items sampled i.i.d. from some unknown
distribution over valuation functions. Our strategy
operates by using an initial sample to learn enough about
the distribution to determine how best to allocate to future
customers, together with an analysis of structural
properties of optimal solutions that allow for uniform
convergence analysis. We show, for instance, that if customers
have {0,1} valuations over items, and the goal of the
allocator is to give each customer an item he or she values,
we can efficiently produce such an allocation with cost at
most a constant factor greater than the minimum over such
allocations in hindsight, so long as the marginal costs do
not decrease too rapidly. We also give a bicriteria
approximation to social welfare for the case of more general
valuation functions when the allocator is budget
constrained.
2014
 Active Learning and Best-Response Dynamics.
With
Nina Balcan, Chris Berlind, Emma Cohen, Kaushik Patnaik, and Le Song.
Proc. 27th Annual Conference on Neural Information Processing Systems
(NIPS) 2014.
We examine a setting where low-power distributed sensors are each
making highly noisy measurements of some unknown target
function. A center wants to accurately learn this function
by querying a small number of sensors, which ordinarily
would be impossible due to the high noise rate. The question
we address is whether local communication among sensors,
together with natural best-response dynamics in an
appropriately-defined game, can denoise the system without
destroying the true signal and allow the center to succeed
from only a small number of active queries. By using
techniques from game theory and empirical processes, we
prove positive (and negative) results on the denoising power
of several natural dynamics. We then show experimentally
that when combined with recent agnostic active learning
algorithms, this process can achieve low error from very few
queries, performing substantially better than active or
passive learning without these denoising dynamics as well as
passive learning with denoising.
 Learning Mixtures of Ranking Models. With
Pranjal Awasthi, Or Sheffet, and Aravindan Vijayaraghavan. Proc. 27th
Annual Conference on Neural Information Processing Systems (NIPS)
2014.
This work concerns the problem of learning probabilistic
models for ranking data in a heterogeneous population. The specific
problem we study is learning the parameters of a Mallows Mixture
Model. Despite being widely studied, current heuristics for this
problem do not have theoretical guarantees and can get stuck in bad
local optima. We present the first polynomial time algorithm which
provably learns the parameters of a mixture of two Mallows models. A
key component of our algorithm is a novel use of tensor decomposition
techniques to learn the top-k prefix of both rankings. Before this
work, even the question of identifiability in the case of a mixture of
two Mallows models was unresolved.

Learning Optimal Commitment to Overcome Insecurity.
With Nika Haghtalab and Ariel Procaccia.
Proc. 27th Annual Conference on Neural Information Processing Systems
(NIPS) 2014.
Algorithms for Stackelberg security games compute an optimal
strategy for the defender to commit to, under the assumption that the
attacker will best-respond. Doing so generally requires
knowledge of what the attacker's payoffs are. In this work,
we design an algorithm that
optimizes the defender's strategy with no prior information, by
observing the attacker's responses to randomized deployments of
resources and learning his priorities. In contrast to previous work,
our algorithm requires a number of queries that is polynomial in the
representation of the game.
 Lazy Defenders Are Almost Optimal Against Diligent Attackers. With Nika Haghtalab and Ariel Procaccia.
Proc. 28th AAAI Conference on Artificial Intelligence (AAAI),
2014.
Most work on Stackelberg security games
assumes that the attacker can perfectly observe
(and therefore will optimally respond to) the defender's randomized
assignment of resources to
targets. This assumption has been challenged by recent
papers, which designed tailor-made algorithms that
compute optimal defender strategies for security games
with limited surveillance. We analytically demonstrate
that in zero-sum security games, lazy defenders, who
simply keep optimizing against perfectly informed attackers,
are almost optimal against a wide range of attackers with more limited
information. This result suggests that in many cases
limited surveillance may not need to be explicitly addressed.
 Estimating Accuracy from Unlabeled Data. With Anthony Platanios (lead author) and Tom Mitchell. UAI 2014.
We propose an approach for using unlabeled data to estimate the true
accuracy of learned classifiers, given access to multiple classifiers
making different "kinds" of errors. We first show how to
estimate error rates exactly from unlabeled data when given
at least three classifiers that make independent errors,
based on their rates of agreement. We then show that even
when the competing classifiers do not make independent
errors, both their accuracies and error dependencies can be
estimated by making certain relaxed assumptions. Experiments
on two real-world data sets produce estimates within a few
percent of the true accuracy, using solely unlabeled
data.
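The three-classifier case can be made concrete. The following is an illustrative reconstruction (not the paper's code) of the agreement-rate idea for binary classifiers that err independently; `estimate_errors` and the simulated data are hypothetical:

```python
import math
import random

def estimate_errors(labels1, labels2, labels3):
    """Estimate the error rates of three binary classifiers from their
    pairwise agreement rates alone (no true labels), assuming the
    classifiers err independently and each is better than chance.
    With d_i = 1 - 2*e_i, the agreement rate of classifiers i and j is
    a_ij = (1-e_i)(1-e_j) + e_i*e_j = (1 + d_i*d_j)/2, so each product
    d_i*d_j = 2*a_ij - 1 is observable and each d_i can be solved for."""
    n = len(labels1)
    a12 = sum(x == y for x, y in zip(labels1, labels2)) / n
    a13 = sum(x == y for x, y in zip(labels1, labels3)) / n
    a23 = sum(x == y for x, y in zip(labels2, labels3)) / n
    c12, c13, c23 = 2 * a12 - 1, 2 * a13 - 1, 2 * a23 - 1
    d1 = math.sqrt(c12 * c13 / c23)
    d2 = math.sqrt(c12 * c23 / c13)
    d3 = math.sqrt(c13 * c23 / c12)
    return [(1 - d) / 2 for d in (d1, d2, d3)]

# Simulation: hidden true labels plus three independently noisy classifiers.
random.seed(0)
truth = [random.randint(0, 1) for _ in range(200000)]
def noisy(err):
    return [1 - t if random.random() < err else t for t in truth]
preds = [noisy(0.1), noisy(0.2), noisy(0.3)]
print([round(e, 3) for e in estimate_errors(*preds)])  # near [0.1, 0.2, 0.3]
```

The paper's contribution is handling the harder case where errors are not independent; this sketch only shows the exactly-solvable base case.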
2013

Fast Private Data Release Algorithms for Sparse Queries.
With Aaron Roth. RANDOM 2013.
We revisit the problem of accurately and efficiently answering large
classes of statistical queries while preserving
differential privacy. In this paper we consider the
class of sparse queries, which take nonzero
values on only polynomially many universe elements. We
give efficient query release algorithms for this class,
in both the interactive and the non-interactive
setting. Our algorithms also achieve better accuracy
bounds than existing general techniques do when applied
to sparse queries, in that our bounds are independent of
the universe size. In fact, even the runtime of our
interactive mechanism is independent of the universe size,
and so can be implemented in the "infinite universe" model
in which no finite universe need be specified by the data
curator.
 Exploiting Ontology Structures and
Unlabeled Data for Learning. With Nina
Balcan and Yishay Mansour. ICML 2013.
We present and analyze a theoretical model designed to understand and
explain the effectiveness of ontologies for learning multiple related
tasks from primarily unlabeled data, motivated by the success of the
CMU NELL (Never-Ending
Language Learning) system. We present both
information-theoretic results as well as efficient algorithms.
We show in this model that an ontology, which specifies the
relationships between multiple outputs, in some cases is sufficient
to completely learn a classification using a large unlabeled data
source. (The paper linked here is a longer version of what appears in ICML 2013.)

Harnessing the Power of Two Crossmatches. With Anupam
Gupta, Ariel Procaccia, and Ankit
Sharma. ACM-EC 2013.
Kidney exchanges allow incompatible donor-patient pairs to swap
kidneys, but each donation must pass three tests: blood,
tissue, and crossmatch. In practice a matching is computed
based on the first two tests, and then a single crossmatch
test is performed for each matched patient. In this paper,
we ask: if we were allowed to perform two crossmatches per
patient, how could we best do so to maximize the number of
matched patients? Our main result is a polynomial-time
algorithm for this problem that almost surely computes
optimal (up to lower-order terms) solutions on large
random kidney exchange instances.
 Differentially
Private Data Analysis of Social Networks via Restricted
Sensitivity. With Jeremiah Blocki, Anupam
Datta, and Or Sheffet. ITCS 2013.
We introduce the notion of restricted sensitivity as an alternative to
global and smooth sensitivity to improve accuracy in
differentially private data analysis.
Restricted sensitivity is similar to global
sensitivity except instead of quantifying over all
possible datasets, we take advantage of any beliefs about
the dataset that a querier may have, to quantify over only a
restricted class of datasets. Specifically, given a query f
and a hypothesis H about the structure of a dataset D, we
show generically how to transform f into a new query f_H
whose global sensitivity (over all datasets including those
that do not satisfy H) matches the sensitivity of f only
over deviations that remain within H. Moreover, if the
belief of the querier is
correct (i.e., D is in H) then f_H(D) = f(D).
Thus, we maintain privacy whether or not D is in H and (when
restricted sensitivity is low) provide
accurate results in the event that H holds true.
We then demonstrate the usefulness of this notion by applying it to
the task of answering queries regarding social networks, in
both edge-adjacency and vertex-adjacency models.
 Learnability of DNF with
Representation-Specific Queries. With Liu
Yang and Jaime Carbonell. ITCS 2013.
We study the problem of PAC learning the class of DNF formulas with
the aid of pairwise queries that, given two positive
examples, return whether or not the examples satisfy at
least one term in common in the target formula. We
also consider numerical queries that return the number of
terms in common satisfied by the two examples. We provide
both positive and negative results for learning with such
queries under both uniform and general distributions. For
example, for boolean queries, we show that learning an
arbitrary DNF target under an arbitrary distribution is no
easier than in the traditional PAC model. On the other
hand, for numerical queries, we show we can learn arbitrary
DNF formulas under the uniform distribution, and in the
process, we give an algorithm for learning a sum of monotone
terms from labeled data only. We also present a number of
results for various DNF subclasses.
2012
 Active Property
Testing. With Nina Balcan, Eric Blais, and
Liu Yang. FOCS 2012. [arXiv (full version)]
In this work, we define, analyze,
and develop algorithms for the problem of property
testing in a framework motivated by active
learning.
In this framework (as in most machine learning applications), one cannot
obtain labels for arbitrary points in the input space;
instead, one can only
request labels from points in a given (polynomially) large unlabeled
sample taken from the underlying distribution.
We present both general results for this model as well as testers for
various important classes.
For example, we show
that testing unions of d intervals can be done with O(1) label
requests in this setting, a result that also yields
improvements in both the full-query and passive testing
models.
For testing linear
separators in R^n over the Gaussian distribution, we show that both
active and passive testing can be done with O(sqrt(n)) queries,
substantially less than the Omega(n) needed for learning, with
near-matching lower bounds. We also present a method
for building testable properties out of
others in this model, which we then use to provide testers for a number of
assumptions used in semi-supervised learning.
Finally, we develop a general notion of
the testing dimension of a given property with respect to a
given distribution, that we show characterizes (up to constant
factors) the intrinsic number of label requests needed to test that
property. We then use these dimensions to prove a number of
lower bounds, including for linear separators and the class of
dictator functions. Our work
brings together tools from a range of areas including U-statistics,
noise-sensitivity, self-correction, and spectral analysis of random
matrices, and develops new tools that may be of independent interest.
 The Johnson-Lindenstrauss
transform itself preserves differential
privacy. With Jeremiah Blocki, Anupam Datta,
and Or Sheffet (lead author). FOCS 2012.
We show that the Johnson-Lindenstrauss transform
provides a novel way of preserving differential privacy. In
particular, if we take two databases, D and D', such that
(i) D' - D is a rank-1 matrix of bounded norm and (ii) all
singular values of D and D' are sufficiently large, then
multiplying either D or D' with a vector of i.i.d. normal
Gaussians yields two statistically close distributions in
the sense of differential privacy.
We apply the Johnson-Lindenstrauss transform to the task of
approximating cut queries: the number of edges crossing a
(S, V-S) cut in a graph. We show that the JL transform allows
us to publish a sanitized graph that preserves edge
differential privacy (where two graphs are neighbors if they
differ on a single edge) while adding only O(|S|/epsilon)
random noise to any given query w.h.p. Comparing the
additive noise of our algorithm to existing algorithms for
answering cut queries in a differentially private manner, we
outperform other methods on small cuts (|S| = o(n)).
We also apply our technique to the task of estimating the variance of
a given matrix in any given direction.
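The cut-query application rests on a linear-algebra fact that can be checked numerically: for the signed edge-incidence matrix E of a graph and the indicator vector x_S of a set S, cut(S) = ||E x_S||^2, and a Gaussian JL sketch of E approximately preserves these norms. The sketch below is purely illustrative (it omits the singular-value conditions and noise the actual private mechanism requires); the graph and parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 0.3
A = np.triu(rng.random((n, n)) < p, 1)      # random undirected graph (upper triangle)
edges = np.argwhere(A)
m = len(edges)

E = np.zeros((m, n))                        # signed edge-incidence matrix
E[np.arange(m), edges[:, 0]] = 1.0
E[np.arange(m), edges[:, 1]] = -1.0

r = 2000                                    # projection dimension
M = rng.standard_normal((r, m)) @ E / np.sqrt(r)   # Gaussian JL sketch of E

x = (rng.random(n) < 0.5).astype(float)     # indicator vector of a random cut S
true_cut = float((E @ x) @ (E @ x))         # exact number of crossing edges
est_cut = float((M @ x) @ (M @ x))          # estimate from the sketch alone
print(true_cut, est_cut)
```

With r = 2000 rows the estimate concentrates within a few percent of the true cut value for any fixed S.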
 Additive Approximation for
Near-perfect Phylogeny Construction. With Pranjal
Awasthi, Jamie Morgenstern, and Or Sheffet. APPROX 2012. [arXiv]
We study the problem of constructing phylogenetic trees for a given
set of species, formulated as that of finding a minimum Steiner tree
on n points over the Boolean hypercube of dimension d. It is known
that an optimal tree can be found in linear time if there is a
perfect phylogeny: i.e., the cost of the optimal phylogeny is
exactly d (deleting irrelevant coordinates). Moreover, if the data
is a near-perfect phylogeny (the cost of the optimal tree is d+q for
small q), it is known that an exact solution can be found in time
polynomial in n and d, but exponential in q [BDHRS06]. Here, we
give an algorithm with running time polynomial in n, d, and q that
finds a phylogenetic tree of cost d+O(q^2). We also discuss the
motivation and reasoning for studying such additive approximations.
 Distributed Learning, Communication
Complexity, and Privacy. With Nina
Balcan, Shai Fine, and Yishay Mansour. COLT 2012.
Suppose you have two databases: one with the positive examples and
another with the negative examples. How much communication between
them is needed to learn a good hypothesis? In this work we examine
this basic question and its generalizations, as well as related issues
such as privacy. Broadly, we consider a framework where data is
distributed among several locations, and our goal is to learn a
low-error hypothesis over the joint distribution using as little
communication, and as few rounds of communication, as possible. Our
general results show that in addition to VC-dimension and covering
number, quantities such as the teaching-dimension and mistake-bound of
a class play an important role in determining communication
requirements. Moreover, boosting can be performed in a generic manner
in the distributed setting to achieve communication with only
logarithmic dependence on 1/epsilon for any concept class. We also
present tight results for a number of common specific concept classes
including conjunctions, parity functions, and decision lists. For
linear separators, we show that for non-concentrated distributions, we
can use a version of the Perceptron algorithm to learn with much less
communication than the number of updates given by the usual margin
bounds. We additionally present an analysis of privacy, considering
both differential privacy and a notion of distributional privacy
that is especially appealing in this context.
2011
 Welfare and Profit
Maximization with Production Costs. With
Anupam Gupta, Yishay Mansour, and Ankit Sharma. FOCS,
2011.
Combinatorial Auctions are a central problem in Algorithmic
Mechanism Design: pricing and allocating goods to buyers
with complex preferences in order to maximize social welfare or profit. The
problem has been well-studied in the case of limited supply
(one copy of each item), and in the case of digital goods
(the seller can produce additional copies at no cost). Yet
in the case of resources (oil, labor, computing cycles,
etc.), neither of these abstractions is just right:
additional supplies of these resources can be found, but at
increasing difficulty (marginal cost) as resources are
depleted.
In this work, we initiate the study of combinatorial pricing under
increasing marginal cost. The goal is to sell these goods, using
posted prices, to buyers arriving online with unknown and arbitrary
combinatorial valuation functions to maximize either the social
welfare, or the seller's profit. We give algorithms that achieve
constant-factor approximations for a class of natural cost
functions (linear, low-degree polynomial, logarithmic), and that
give logarithmic approximations for more general increasing marginal
cost functions (along with a necessary additive loss). We show that
these bounds are essentially best possible for these settings.
 Center-based Clustering under Perturbation Stability. With Pranjal Awasthi and Or
Sheffet. Information Processing Letters, 112(1-2):49-54, Jan
2012. doi:10.1016/j.ipl.2011.10.002
In this paper we give algorithms for k-median, k-means, and other
center-based clustering objectives, for instances that are stable to
small constant-factor perturbations of the input. This notion of stability was
studied by Bilu and Linial
[BL10] in the context of the max-cut problem, where they showed that one
could optimally solve max-cut instances stable to
perturbations of size sqrt(n). In
this work we show that stability to factor-3
perturbations is sufficient to find optimal solutions for any
center-based clustering objective (such as k-median,
k-means, and k-center) in the case of finite metrics without
Steiner points, and that
stability to factor 2 + sqrt(3) perturbations is sufficient
for the case of general metrics. Specifically, we show that
for such instances, the popular
Single-Linkage algorithm combined with dynamic programming will find
the optimal clustering.
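To make the algorithmic ingredient concrete, here is a minimal Kruskal-style sketch of the Single-Linkage merging step on a hypothetical toy instance; the paper's full algorithm additionally runs dynamic programming over the linkage tree, which this sketch omits:

```python
import itertools

def single_linkage(points, k, dist):
    """Kruskal-style single linkage: repeatedly merge the two closest
    clusters (via a union-find over the sorted pairwise distances)
    until only k clusters remain."""
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    pairs = sorted(itertools.combinations(range(len(points)), 2),
                   key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    clusters = len(points)
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
            if clusters == k:
                break
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Toy instance: three well-separated groups of points on a line.
pts = [0.0, 0.5, 1.1, 10.0, 10.4, 20.0, 20.3, 20.9]
clusters = single_linkage(pts, 3, lambda a, b: abs(a - b))
print(sorted(sorted(c) for c in clusters))
# -> [[0, 1, 2], [3, 4], [5, 6, 7]]
```

On stable instances of the kind the paper studies, all within-cluster merges occur before any cross-cluster merge, which is what the toy example illustrates.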
2010
 A Discriminative
Model for Semi-Supervised Learning. With Nina
Balcan. JACM Vol 57, Issue 3, 2010. This
is an expanded and more in-depth version of our COLT'05 paper "A
PAC-style Model for Learning from Labeled and Unlabeled Data". See
details below.
 Trading off Mistakes and Don't-Know
Predictions. With Amin Sayedi
and Morteza Zadimoghaddam. NIPS 2010.
We consider an online learning framework in which the agent is allowed
to say "I don't know" and analyze the achievable
tradeoffs between saying "I don't know" and making
mistakes. If mistakes have the same cost as don't-knows, the
model reduces to the standard mistake-bound model, and if
mistakes have infinite cost, the model reduces to the KWIK framework
introduced by Li, Littman, and Walsh. We propose a general, though
inefficient, algorithm for general finite concept classes that
minimizes the number of don't-know predictions subject to a given
bound on the number of allowed mistakes. We then present specific
polynomialtime algorithms for the concept classes of monotone
disjunctions and linear separators with a margin.
 Stability yields a PTAS for
k-Median and k-Means
Clustering. With Pranjal Awasthi and Or
Sheffet. FOCS 2010. [longer version]
Ostrovsky et al. [ORSS06] show that given n points in
Euclidean space such that the optimal (k-1)-means
clustering is a factor 1/epsilon^2 more expensive than the
best k-means clustering, one can get a (1+f(epsilon))-approximation
to k-means in time poly(n,k) by using a variant of Lloyd's algorithm.
In this work we show we can replace the "1/epsilon^2" with
just "1+alpha" for any constant alpha>0 and obtain a PTAS.
In particular, under this assumption, for any epsilon>0 we can achieve a
(1+epsilon)-approximation for k-means in time polynomial
in n and k, and exponential in 1/epsilon and 1/alpha (our running time is
n^O(1) * (k log n)^poly(1/epsilon, 1/alpha)).
We thus decouple the strength of the assumption from the quality of
the approximation ratio. We give a PTAS for k-median
in finite metrics under the
analogous assumption.
We also show we can obtain a PTAS under the assumption
of Balcan-Blum-Gupta [BBG09] (see below) that all (1+alpha)-approximations
are delta-close to a desired target clustering, in the
case that all target clusters have size greater than 2 delta n
and alpha is
constant. Note that the point of BBG09 is that the true
goal in clustering is usually to get close to the target
rather than to achieve a good objective value. From this
perspective, our advance is that
for k-means in Euclidean spaces we reduce the
distance of the clustering found to the target from O(delta)
to delta when all target clusters are large, and for
k-median we improve the "largeness" condition in BBG09 needed to get
exactly delta-close from O(delta*n) to delta*n.
Our results are based on a new notion of clustering stability.
 On Nash Equilibria of
Approximation-Stable Games. With Pranjal
Awasthi, Nina Balcan, Or Sheffet and Santosh
Vempala. SAGT 2010. Journal version in Current Science, Vol 103, Issue
9, November 10, 2012.
In this paper, we define the notion of games that are
approximation stable, meaning that all epsilon-equilibria are
contained inside a small ball of radius Delta
around a true equilibrium, which is a natural condition if you want play to be
predictable even if players are only at approximate
equilibrium.
Many natural small games such as matching pennies and
rock-paper-scissors are indeed approximation stable. We show
both upper and lower bounds on the size of supports of approximate
equilibria in such games, yielding more efficient algorithms
for computing approximate equilibria as Delta gets close to epsilon.
We also consider an inverse condition, namely that all
nonapproximate equilibria are far from some true equilibrium,
and give an efficient algorithm for games satisfying that condition.
 Improved Guarantees for Agnostic Learning of Disjunctions. With Pranjal Awasthi and Or
Sheffet. COLT 2010.
Given some arbitrary distribution D over {0,1}^n and arbitrary
target function c, the goal in agnostic learning of
disjunctions is to achieve an error rate comparable to the error
OPT of the best disjunction with respect to (D,c).
In recent work, [Peleg07] shows how to
achieve a bound of O(sqrt(n)*OPT) + epsilon in
polynomial time. In this paper we improve on Peleg's bound, giving a
polynomial-time algorithm achieving a bound of O(n^{1/3 + alpha}*OPT) +
epsilon, for any constant alpha>0. The heart of
the algorithm is a method for weak learning when OPT =
O(1/n^{1/3+alpha}), which can then be fed into existing agnostic
boosting procedures to achieve the desired guarantee.
 Circumventing the Price of Anarchy:
Leading Dynamics to Good Behavior. With Nina
Balcan and Yishay Mansour. ICS 2010. Journal version
combining this work and "Improved equilibria via public service
advertising" (SODA 2009) appears in SIAM
J. Computing, 42(1), 230264, 2013.
We explore the problem of how
selfinterested agents with some
knowledge of the game might be able
to quickly find their
way to states of quality close to the best equilibrium
in games with high price of
anarchy but low price of stability. We
consider two natural learning models in which players
adaptively decide between greedy behavior and following a proposed good but
untrusted strategy and analyze two important classes of games in this
context, fair cost-sharing and consensus games. These games both have
very high price of anarchy, and yet we show that behavior in these
models can efficiently reach lowcost states.
2009
 Thoughts on Clustering.
Essay for the 2009 NIPS Workshop "Clustering:
Science or Art?"
 Tracking Dynamic Sources of Malicious Activity at
Internet-Scale. With Shobha Venkataraman, Dawn Song,
Subhabrata Sen, and Oliver Spatscheck. NIPS 2009.
We consider the problem of discovering dynamic malicious regions on
the Internet. We model this problem as one of adaptively pruning a
known decision tree (in particular, the IP address-space tree), but
with additional challenges: (1) severe space requirements, since the
underlying decision tree has over 4 billion leaves, and (2) a changing
target function, since malicious activity on the Internet is
dynamic. We present a novel algorithm that addresses this problem, by
combining "experts" and online paging algorithms. We prove guarantees
on our algorithm's performance as a function of the best possible
pruning of a similar size, and our experiments show that our
algorithm achieves high accuracy on large realworld data sets, improving
over existing approaches.
 The Price of
Uncertainty. With Nina Balcan and Yishay
Mansour. ACMEC 2009. [slides]
We study the degree to which small fluctuations in
costs in well-studied potential games can impact the
result of natural best-response and improved-response dynamics. We
consider a wide
variety of potential games including fair cost-sharing games,
set-cover games, routing games, and job-scheduling games.
We show that in certain
cases, even extremely small fluctuations can cause these dynamics to
spin out of control and move to states of much higher social cost,
whereas in other cases these dynamics are much more stable even to
large degrees of fluctuation.
We also consider the resilience of these dynamics to a small number of
Byzantine players about which no assumptions are made. We show that
in certain cases (e.g., fair cost-sharing, set-covering,
job-scheduling) even a single Byzantine player can cause best-response
dynamics to transition to states of substantially higher cost, whereas
in others (e.g., the class of beta-nice games, which includes
routing, market-sharing and many others) these dynamics are much more
resilient.
 Approximate Clustering
without the Approximation. With Nina Balcan and Anupam
Gupta. SODA 2009.
Journal version: Clustering Under
Approximation Stability, JACM, Volume 60, Issue 2,
April 2013. [unofficial
local copy]
For most clustering problems, our true goal is to classify the points
correctly, and commonly studied objectives such as k-median, k-means,
and min-sum are really only a proxy. That is, there is some unknown
correct clustering (grouping proteins by their function or grouping
images by who is in them) and the implicit hope is that
approximately optimizing these objectives will in fact
produce a clustering that is close pointwise to the
correct answer.
In this paper, we show that if we make this implicit assumption
explicit (that is, if we assume that any c-approximation to the
given clustering objective F is epsilon-close to the
target), then we can produce clusterings that are O(epsilon)-close
to the target, even for values c for which obtaining a
c-approximation is NP-hard. In particular, for the k-median and
k-means objectives, we show that we can achieve this guarantee for
any constant c > 1, and for the min-sum objective we can do this for any
constant c > 2.
Our results also highlight a difference
between assuming that the optimal solution to, say, the
k-median objective is epsilon-close to the target, and assuming
that any approximately optimal solution is epsilon-close to
the target. In the former
case, the problem of finding a solution that is O(epsilon)-close to
the target remains computationally hard, yet for the latter we
have an efficient algorithm.
 Improved Equilibria via Public
Service Advertising. With Nina Balcan and Yishay
Mansour. SODA 2009.
Many natural games have both good and bad Nash equilibria. In
such cases, one could hope to improve poor behavior
by a "public service advertising
campaign" encouraging players to follow a good equilibrium, and
if every player follows the advice then we are done. However, it is a
bit much to assume that everyone will follow along. In this
paper we consider the extent to which such an advertising
campaign can cause behavior to switch from a bad equilibrium
to a good one even if only a fraction of people actually follow the
given advice, and do so only temporarily. Unlike in the "value of
altruism" model, we assume everyone will ultimately act in their own
interest.
We analyze this question for several important and widely studied
classes of games including network design with fair cost sharing,
scheduling with unrelated machines, and party affiliation games (which
include consensus and cut games). We show that for some of these
games (such as fair cost sharing), a random alpha fraction of the
population following the given advice is sufficient to get a guarantee
within an O(1/alpha) factor of the price of stability for any alpha >
0. However, for some games (such as party affiliation games), there
is a strict threshold (in this case, alpha < 1/2 yields almost no
benefit, yet alpha > 1/2 is enough to reach nearoptimal behavior),
and for some games, such as scheduling, no value alpha < 1 is
sufficient.
2008
 Clustering
with Interactive Feedback. With Nina Balcan. ALT 2008.
We initiate a theoretical study of the
problem of clustering data under interactive feedback. We introduce a
query-based model in which users can provide feedback to a clustering
algorithm in a natural way via split and merge requests.
We then analyze the "clusterability" of different concept classes in
this framework (the ability to cluster correctly with a bounded
number of requests under only the assumption that each cluster can be
described by a concept in the class) and provide efficient
algorithms as well as information-theoretic upper and lower bounds.
 Improved Guarantees for
Learning via Similarity Functions. With Nina
Balcan and Nati Srebro. COLT 2008.
We provide a new broader notion of a "good similarity function" that
improves in two important ways upon the notion in [BB06].
First, as before, any large-margin kernel
is also a good similarity function
in our sense, but now with a much milder degradation of the parameters.
Second, we can show that for
distribution-specific PAC learning, the new notion is strictly more
powerful than the traditional notion of a large-margin kernel:
although any concept class that can be learned with some kernel
function can also be learned using our new similarity-based approach,
the reverse is not true. (In contrast, the [BB06] definition is no more
powerful than kernels for distribution-specific learning.)
Our new notion of similarity relies upon L_1-regularized learning,
and our separation result is related to a separation result between
what is learnable with L_1 vs. L_2 regularization.
 Item Pricing for Revenue
Maximization. With Nina Balcan and Yishay Mansour.
ACM-EC 2008.
This paper considers the problem of pricing items to maximize revenue
from buyers with unknown complex preferences over
bundles, and presents two main results. (1) For the case of
unlimited supply, a random single price achieves a
logarithmic approximation for buyers with general
valuation functions (not just single-minded or unit-demand
as was previously known). (2) For the case of limited
supply, a random single price (with buyers arriving in an arbitrary order)
achieves an exp(sqrt(log(n) loglog(n))) approximation,
with a near-matching lower bound. Also includes results for
multi-unit auctions and "simple submodular" valuations. An
earlier tech report
with just the first result appears here.
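For intuition about result (1), the standard argument behind a random single price can be simulated. This sketch (hypothetical valuations, not the paper's code) posts one price drawn uniformly from a geometric grid below the maximum valuation; in expectation it recovers a 1/O(log h) fraction of the full-information revenue for single-minded buyers under unlimited supply:

```python
import random

def random_single_price_revenue(values, trials=20000, seed=0):
    """Post a single price drawn uniformly at random from the geometric
    grid {h, h/2, h/4, ...} down to half the smallest valuation; each
    buyer purchases iff the price is at most their value.  Some grid
    price lies in (v/2, v] for every valuation v, so expected revenue
    is at least OPT / (2 * number of prices)."""
    rng = random.Random(seed)
    h = max(values)
    prices, p = [], float(h)
    while p > min(values) / 2:
        prices.append(p)
        p /= 2
    total = 0.0
    for _ in range(trials):
        price = rng.choice(prices)
        total += sum(price for v in values if v >= price)
    return total / trials, len(prices)

values = [1, 3, 7, 20, 64, 100]   # hypothetical buyer valuations
opt = sum(values)                 # revenue under perfect price discrimination
avg, num_prices = random_single_price_revenue(values)
print(avg, opt / (2 * num_prices))
```

The logarithmic gap is unavoidable in the worst case; the paper's contribution is extending this kind of guarantee to general valuation functions and to limited supply.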
 A Discriminative Framework
for Clustering via Similarity Functions. With Nina Balcan
and Santosh Vempala. STOC 2008. [full version (2009)]
Theoretical treatments of clustering
from pairwise similarity information typically view
the similarity information as ground truth and then design algorithms
to (approximately) optimize various graph-based objective functions.
However, in most applications, this similarity information is merely
based on some heuristic: the true goal is to cluster the points
correctly rather than to optimize any specific graph property. In
this work, we develop a theoretical framework for
clustering from this perspective. In particular,
motivated by work in learning theory that asks "what natural
properties of a similarity (or kernel) function are sufficient to be
able to learn
well?", we ask "what natural properties of a similarity function are
sufficient to be able to cluster well?" Our approach can be
viewed as developing a PAC model for clustering, where the natural
object of study, rather than being a concept class, is more like a
class of (concept, distribution) pairs.
 Regret Minimization and the
Price of Total Anarchy. With MohammadTaghi Hajiaghayi,
Katrina Ligett, and Aaron Roth. STOC 2008.
This paper proposes weakening the assumption made
when studying the price of anarchy: Rather than assume that
self-interested players will play according to a Nash equilibrium, we
assume only that selfish players play so as to minimize their own
regret. Regret minimization can be done via simple, efficient
algorithms even in many settings where the number of action choices
for each player is exponential in the natural parameters of the
problem. We prove that despite our weakened assumptions, in several
broad classes of games, this "price of total anarchy" matches the
Nash price of anarchy, even though play may never converge to Nash
equilibrium. We also show that the price of total anarchy is in many
cases resilient to the presence of Byzantine players, about whom we
make no assumptions.
 A Learning Theory Approach to
Non-Interactive Database Privacy. With
Katrina Ligett and Aaron Roth. STOC 2008. Journal version: JACM, Volume 60, Issue 2,
April 2013. [unofficial
local copy]
We demonstrate that, ignoring computational constraints, it is
possible to release databases preserving differential privacy that
are useful for all queries over a discretized domain from
any given concept class with polynomial VC-dimension. We
also present an efficient algorithm for "large-margin
halfspace" queries. In
addition, inspired by learning
theory, we introduce a new notion of data privacy, which we call
distributional privacy, and show that it is strictly stronger
than differential privacy.
 Veritas: Combining expert opinions without
labeled data. With Sharath Cholleti (lead
author), Sally Goldman, David Politte, and Steven Don.
International Journal on Artificial Intelligence Tools, 2009:
633-651. (Originally appeared in ICTAI 2008.)
Looks at a boosting-based method for combining expert opinions when
only unlabeled data is present, motivated by the problem of
segmenting lung nodules in CT scans.
 Limits of Learning-based Signature Generation
with Adversaries. With Shobha Venkataraman
(lead author) and Dawn Song. Network and Distributed Systems
Security Symposium (NDSS) 2008.
We give limits on the accuracy of pattern-extraction algorithms
for signature generation in an adversarial setting, by
adapting and extending lower bounds for online learning in
the mistake-bound model, when there are limits on the number
of allowed mistakes of different types.
2007
 Mechanism Design, Machine Learning,
and Pricing Problems. With Nina Balcan. SIGecom Exchanges 2007, special issue on Combinatorial Auctions.
Short survey article on machine learning techniques for mechanism design.
 A
Theory of Loss-Leaders: Making Money by Pricing Below
Cost.
With Nina Balcan, T-H. Hubert Chan and
MohammadTaghi Hajiaghayi. WINE 2007.
 Separating Populations with Wide
Data: a Spectral Analysis. With Amin
Coja-Oghlan, Alan Frieze, and Shuheng Zhou. 18th
International Symposium on Algorithms and Computation (ISAAC 2007).
LNCS 4835, pp. 439-451.
We consider the problem of partitioning a small data
sample drawn from a mixture of k product distributions. We are
interested in the case that individual features are of low average
quality gamma, and we want to use as few of them as
possible to correctly partition the sample. We analyze a spectral
technique that is able to approximately optimize the total data
size (the product of the number of data points n and the number of
features K) needed to correctly perform this partitioning as a
function of 1/gamma for K > n. Our goal is motivated by an
application in clustering individuals according to their population of
origin using markers, when the divergence between any two of
the populations is small.
 Clearing Algorithms for Barter
Exchange Markets: Enabling Nationwide Kidney
Exchanges.
With David Abraham and Tuomas Sandholm.
ACM-EC 2007.
Shows how MIP techniques can be used to get a
more scalable and robust algorithm for solving the
optimization problems involved in clearing paired-donation
kidney exchanges. The algorithm is currently in use by the Alliance for Paired Donation.
 Learning, Regret Minimization, and
Equilibria [ps].
With Yishay Mansour. Book chapter in Algorithmic Game
Theory, Noam Nisan, Tim Roughgarden, Eva
Tardos, and Vijay Vazirani, eds. [Slides for related talk]
Book chapter describing connections between online learning and game
theory. Includes description of algorithms for combining
expert advice (minimizing external regret), algorithms for
the stronger goal of minimizing internal regret, algorithms
for the limited-feedback (multi-armed bandit) setting, and
connections between these and minimax optimality (for
zero-sum games) and correlated equilibria (for general-sum
games). Also discusses how such algorithms will approach
Nash equilibrium in nonatomic routing games.
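As a concrete instance of the external-regret algorithms such a chapter covers, here is a minimal multiplicative-weights ("randomized weighted majority") sketch; the toy loss sequence is hypothetical:

```python
import random

def multiplicative_weights(loss_rounds, eta=0.1):
    """Hedge-style experts algorithm: keep one weight per expert, play
    an expert with probability proportional to its weight, and multiply
    each weight by (1 - eta * loss) after the round.  The expected total
    loss is at most (1 + eta) * (best expert's loss) + ln(n) / eta."""
    n = len(loss_rounds[0])
    w = [1.0] * n
    total = 0.0
    for losses in loss_rounds:                   # each loss lies in [0, 1]
        s = sum(w)
        total += sum(wi * li for wi, li in zip(w, losses)) / s  # expected loss
        w = [wi * (1 - eta * li) for wi, li in zip(w, losses)]
    return total

# Toy data: expert 0 errs 10% of the time, expert 1 errs 50% of the time.
rng = random.Random(1)
rounds = [[float(rng.random() < 0.1), float(rng.random() < 0.5)]
          for _ in range(2000)]
alg_loss = multiplicative_weights(rounds)
best = min(sum(r[i] for r in rounds) for i in range(2))
print(alg_loss, best)
```

The regret bound in the docstring is the standard Littlestone-Warmuth-style guarantee; the learner's cumulative loss tracks the best expert up to an additive ln(n)/eta term.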

FiG: Automatic Fingerprint Generation.
With Juan Caballero, Shobha Venkataraman, Pongsin Poosankam, Min Gyung
Kang, and Dawn Song. In NDSS 2007.
 Open Problems in Efficient
Semi-Supervised PAC Learning. With Nina Balcan.
COLT'07 Open Problems List.
Open problems about computationally-efficient semi-supervised learning
that I would love to see solved (small monetary rewards
offered).
2006
 On a Theory of Learning with
Similarity Functions. With Nina Balcan. International Conference on Machine Learning (ICML), pp. 73-80, 2006.
Journal version
combines this conference paper with subsequent paper
of Nati Srebro from COLT 2007: Machine Learning Journal
72(1-2):89-112, August 2008. DOI 10.1007/s10994-008-5059-5.
[NIPS'05 workshop talk]
[Cornell'07 colloquium talk (broader)]
Kernel functions have become an extremely popular tool in machine
learning. They have an attractive theory that describes a kernel function
as being good for a given learning problem if data is separable by a
large margin in a (possibly very high-dimensional) implicit space
defined by the kernel. This theory, however, has a bit of a
disconnect with the intuition of a good kernel as a good
similarity function. In this work we develop an alternative
theory of learning with similarity functions more generally (i.e., sufficient conditions for a
similarity function to allow one to learn well) that does not require
reference to implicit spaces, and does not require the function to
be positive semidefinite. Our results also
generalize the standard theory in the sense that any good kernel
function under the usual definition can be shown to also be a good
similarity function under our definition (though with some loss in the
parameters). In this way, we provide the first steps towards a theory
of kernels that describes the effectiveness of a given kernel function
in terms of natural similaritybased properties.
 Routing without Regret: On
Convergence to Nash Equilibria of RegretMinimizing Algorithms in Routing
Games. With Eyal EvenDar
and Katrina Ligett. PODC, pp. 45-52, 2006.
[Slides for related talk]
A number of no-regret algorithms have been
developed in the game-theory and online learning literature. This
paper considers the question: if each player in a routing game uses a
no-regret strategy to choose their route on day t+1 based on
their experience on days 1,...,t, what can we say about the
overall behavior of the system? The main result of this paper is that
in the Roughgarden-Tardos setting of multicommodity flow and
infinitesimal agents, if each player uses a no-regret strategy then a
1-epsilon fraction of the daily flows will be epsilon-Nash (almost all
users having only a small incentive to deviate), where epsilon
approaches 0 at a rate that depends polynomially on the players'
regret bounds and the maximum slope of any latency function.
 Approximation Algorithms and
Online Mechanisms for Item Pricing.
With Nina Balcan. Theory of Computing, 3(9):179-195, 2007. Originally appeared in
ACM Conference on Electronic Commerce, pp. 29-35, 2006.
[Slides for talk given at Spencer0660]
Presents approximation and online algorithms for a number of problems of
pricing items so as to maximize a seller's revenue in an
unlimited supply setting. Our main result is an O(k)-approximation
for pricing items to single-minded bidders who each want at most k
items. For the case k=2 (where we get a 4-approximation) this can
be viewed as the following graph vertex pricing problem: given a
graph G with valuations v_e on the edges, find prices p_i for
the vertices to maximize sum_{e=(i,j): v_e >= p_i+p_j} (p_i + p_j). We
also show how these algorithms can be applied to the online
setting, in which customers arrive over time and must be
presented with prices that depend only on information gained
from customers seen in the past.
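To make the vertex pricing objective concrete, here is a tiny brute-force sketch: it searches a finite price grid, which is feasible only for toy instances and is not the paper's O(k)-approximation algorithm (the grid and instance in the usage below are our own illustration):

```python
from itertools import product

def revenue(edges, prices):
    # An edge e = (i, j) with valuation v_e purchases, paying p_i + p_j,
    # exactly when v_e >= p_i + p_j.
    return sum(prices[i] + prices[j]
               for (i, j), v in edges.items()
               if v >= prices[i] + prices[j])

def best_pricing(edges, n, grid):
    # Exhaustive search over all price vectors drawn from the grid.
    best = max(product(grid, repeat=n), key=lambda p: revenue(edges, p))
    return list(best), revenue(edges, best)
```

For example, on a path with edge valuations 5 and 3, the best grid pricing sells both edges for total revenue 8.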
 A Random-Surfer Web-Graph Model.
With Hubert Chan and Mugizi Rwebangira. ANALCO '06.
This paper gives theoretical and experimental results on a
random-surfer model for construction of a random graph. In this
model, a new node connects to the existing graph by choosing a
start node at random and then performing a short random walk, flipping a
coin at each node visited to decide whether or not to stop and
connect there. Our understanding of this model is still
quite preliminary, and many open questions remain.
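A minimal simulation of the model as described above (the default stop probability, seed, and parameter names are ours):

```python
import random

def random_surfer_graph(n, stop_prob=0.5, seed=0):
    # Grow an undirected graph one node at a time: each new node picks a
    # uniformly random existing start node, random-walks from it, and at
    # each visited node flips a coin to decide whether to stop and
    # connect there.
    rng = random.Random(seed)
    adj = {0: []}  # start from a single node
    for v in range(1, n):
        u = rng.randrange(v)                       # random start node
        while adj[u] and rng.random() > stop_prob:
            u = rng.choice(adj[u])                 # continue the walk
        adj[u].append(v)                           # stop and connect here
        adj[v] = [u]
    return adj
```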
 Random Projection, Margins, Kernels, and Feature-Selection [ps].
LNCS 3940, pp. 52-68, 2006. Survey article based on an
invited talk given at the 2005 PASCAL Workshop on Subspace, Latent
Structure and Feature selection techniques.
Random projection is a simple technique that has had a number of
applications in algorithm design. In the context of machine learning,
it can provide insight into questions such as ``why is a learning
problem easier if data is separable by a large margin?'' and ``in what
sense is choosing a kernel much like choosing a set of features?''
This article is intended to provide an introduction to random
projection and to survey some simple learning algorithms and other
applications to learning based on it. Portions of this article are
based on work in [BB05,BBV04] joint with Nina Balcan and Santosh
Vempala.
2005
 Reducing Mechanism Design to
Algorithm Design via Machine Learning
[short version,
long version].
With Nina Balcan, Jason Hartline, and Yishay Mansour. JCSS
74:1245-1270, 2008 (JCSS special issue on Learning Theory).
Originally appeared as "Mechanism Design via Machine Learning", FOCS 2005.
[slides]
Examines how sample-complexity arguments in machine learning can be
used to reduce problems of incentive-compatible
mechanism design to standard algorithmic questions, for a wide
class of revenue-maximizing pricing problems.
 From External to Internal
Regret [local ps]
[local pdf].
With Yishay Mansour. JMLR 8(Jun):1307-1324, 2007.
Originally appeared in COLT 2005.
Gives a generic method for converting external-regret algorithms
to internal-regret algorithms, along with a specific
algorithm for the bandit setting. Also gives a new simple
method for a generalized "sleeping experts"
setting. If you are interested in this paper you should
definitely also check out Stoltz and
Lugosi, "Internal Regret in OnLine Portfolio Selection", MLJ
59 (1/2), 2005. Their paper gives a different conversion
procedure and has a number of other results. See also Gilles Stoltz's PhD thesis.
 A Discriminative Model for
Semi-Supervised Learning. With Nina
Balcan. JACM Vol 57, Issue 3, 2010.
(Original COLT '05 paper
titled "A PAC-style Model
for Learning from Labeled and Unlabeled Data")
This paper gives an extension to the PAC model that
allows one to discuss ways of using
unlabeled data to help with learning. The basic idea is
that rather than "learning a class C", one instead talks of
"learning a class C under compatibility notion χ", where
χ(h,D) tells how a priori compatible a proposed
hypothesis h is with respect to a given distribution D.
E.g., if you believe there should be a large-margin
separator then your χ would give a low score to h's with
large probability mass near the separating hyperplane. Or
in cotraining, χ would penalize hypotheses (h_1,h_2)
such that Pr_x(h_1(x) ≠ h_2(x)) is large. If χ is
"legal" (in a sense defined in the paper) then we can use this model
to give samplecomplexity bounds for both labeled and
unlabeled data, and talk about conditions under which
unlabeled data can significantly reduce the number of
labeled examples needed. We also talk about well-justified ways
of performing regularization in this setting and give a number
of algorithmic results as well.
 Practical Privacy: The SuLQ Framework.
With Cynthia Dwork, Frank McSherry, and Kobbi Nissim. PODS 2005.

New
Streaming Algorithms for Fast Detection of Superspreaders.
With Shobha Venkataraman, Dawn Song, and Phillip Gibbons. NDSS
2005.
Experimental and theoretical work on streaming algorithms (one pass,
logarithmic memory) for detecting sources that send to many
distinct destinations. That is, given a sequence of (x,y)
pairs, you want to identify those x's that have appeared paired with
many different y's.
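As a baseline that just pins down the problem statement (this is not the paper's algorithm, which achieves detection in one pass with small memory via sampling), an exact but memory-heavy version looks like:

```python
from collections import defaultdict

def superspreaders_exact(pairs, k):
    # Track, for each source x, the set of distinct destinations y it has
    # been paired with, and report sources exceeding k distinct
    # destinations. Memory grows with the stream, which is precisely
    # what the streaming algorithms avoid.
    dests = defaultdict(set)
    for x, y in pairs:
        dests[x].add(y)
    return {x for x, ys in dests.items() if len(ys) > k}
```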
 Near-Optimal Online
Auctions. With Jason Hartline. SODA 2005,
pages 1156-1163.
Uses an approach based on an online learning algorithm of Kalai to get
improved bounds for the problem of adaptive pricing of a
digital good. We consider both the online auction and
posted price settings.
2004
 Co-Training and Expansion: Towards
Bridging Theory and Practice. With Nina
Balcan and Ke Yang. NIPS 2004.
This paper looks at conditions needed for co-training to succeed in terms of
expansion properties of the underlying distribution. Proves
bounds for the case that we have base learning
algorithms able to make only one-sided error (i.e., learn
from positive data only). Expansion is a much weaker condition than
those considered previously, such as independence given the
label, and appears to be the "right" condition on the
distribution needed in order for co-training to work well
when the base algorithms have only one-sided error.
 Kernels as Features: On Kernels, Margins, and Low-dimensional Mappings
[local copy]. With Nina Balcan
and Santosh Vempala. Machine
Learning, 65:79-94, 2006, DOI: 10.1007/s10994-006-7550-1. Extended abstract in 15th
International Conference on Algorithmic Learning Theory (ALT
'04). Springer LNAI
3244, pp. 194-205, 2004.
[talk ppt] Here is a related
survey article.
Kernel functions are typically viewed as implicit mappings to a
high-dimensional space that allow one to "magically" get the
power of that space without having to pay for it, if data is
separable in that space by a large margin.
In this paper we show that in the presence of a large margin, a
kernel can instead be efficiently converted into a
mapping to a low dimensional space. In particular,
we give an efficient procedure that, given blackbox access to
the kernel and unlabeled data, generates a small number of
features that approximately preserve both separability and
margin.
 Detection of Interactive
Stepping Stones: Algorithms and Confidence Bounds.
With Dawn Song and Shobha Venkataraman. 7th International Symposium on
Recent Advances in Intrusion Detection (RAID '04). Springer
LNCS 3224,
pp. 258-277, 2004.
Use analysis of random walks to detect stepping-stone attacks under
the "maximum delay bound" assumption. Gives learning-style
bounds on number of packets that need to be observed to
perform detection at a desired confidence level.
 Online Geometric Optimization
in the Bandit Setting Against an Adaptive Adversary.
With Brendan McMahan. COLT '04, pages 109-123.
We show how the recent, elegant result of Kalai and Vempala for online
geometric optimization can be extended to the "bandit"
version of the problem, in which one is only told of the cost
incurred and not the full cost vector, even in the
repeated-game setting (an adaptive adversary).
 Semi-Supervised Learning Using
Randomized Mincuts. With John Lafferty,
Mugizi Robert Rwebangira, and Rajashekar Reddy. ICML '04.
We consider a randomized version of the mincut approach to learning
from labeled and unlabeled data (see paper with Shuchi
Chawla from 2001), and motivate it from both a
sample-complexity perspective and from the goal of
approximating Markov Random Field per-node probabilities.
 Approximation Algorithms for
Deadline-TSP and Vehicle Routing with
Time-Windows. With Nikhil Bansal, Shuchi
Chawla, and Adam Meyerson. STOC '04, pages 166-174.
Consider a version of the metric TSP problem in which
each node has a value and the goal is to collect as much
value as possible, *but* each node also has a deadline and only
counts if it is reached by its deadline. (More generally,
nodes might have release dates too.) We give an O(log n)
apx for deadlines, O(min[log^2 n, log D_max]) for time-windows,
and a bicriteria approximation with the
interesting property that it can achieve an
O(k)-approximation while violating the deadlines by only a
(1 + 1/2^k) factor. Big open question: can you get a
constant factor apx?
2003
 Approximation Algorithms for Orienteering and
Discounted-Reward TSP [local copy]. With Shuchi Chawla, David Karger,
Adam Meyerson, Maria Minkoff, and Terran Lane. SIAM
J. Computing 37(2):653-670, 2007. An earlier version appears in
FOCS'03, pages 46-55. Also available as Tech report CMU-CS-03-121.
We give a constant-factor approximation algorithm for the rooted
Orienteering problem on general graphs, and for a new problem that we
call the Discounted-Reward TSP. Given a weighted graph with rewards
on nodes, and a start node s, the goal in the Orienteering
Problem is to find a path that maximizes the reward collected,
subject to a hard limit on the total length of the path. In the
Discounted-Reward TSP, instead of a length limit we are given
a discount factor gamma, and the goal is to maximize total discounted
reward collected, where reward for a node reached at time t is
discounted by gamma^t. This is similar to the objective in MDPs
except we only receive a reward the first time a node is
visited.
 Scheduling for Flow-Time with
Admission Control. With Nikhil Bansal,
Shuchi Chawla, and Kedar Dhamdhere. ESA '03, pages 43-54.
Considers the problem of job scheduling to minimize flow time, when the server is allowed to reject jobs at some penalty.
This can be thought of as the problem of managing your todo list, when
your cost function is the total amount of time that jobs are
sitting on your stack plus a cost for each job that you say
"no" to. E.g., if
you initially agree to a task (like refereeing a paper) and
then six months later you realize you cannot do it and say
no, you pay both for the "no" and for the six months it has
been sitting on your desk.
We give 2-competitive online algorithms for the case of unweighted
flow time and uniform costs, and
extend some of our results to the case of weighted flow time and
machines with varying speeds. We also give a resource augmentation
result for the case of arbitrary penalties and present
a number of lower bounds.
 Preference Elicitation
and Query Learning. With Jeffrey Jackson,
Tuomas Sandholm, and Martin Zinkevich. Journal of Machine
Learning Research 5:649-667, 2004 (special issue on
Learning Theory). Extended abstract appears in COLT '03.
Explores the connection between "preference elicitation", a problem
that arises in combinatorial auctions, and the problem of
learning via queries. Preference elicitation can be thought of as a
kind of learning problem with multiple target concepts, but where
the goal is not to identify the concepts so much as it is
to produce an "optimal example".
 PAC-MDL Bounds.
With John Langford. COLT '03.
Attempts to unify a number of bounds (including VC-dimension and
PAC-Bayes) in a single MDL framework. In this setting, Alice has a
set of labeled examples, Bob has the same examples but without labels,
and Alice's job is to communicate the labels to Bob using only a small
number of bits. The standard Occam's razor results say that if Alice
can do this by sending a hypothesis (a function h(x) over
single examples that Bob would then run m times) then she can be confident
in the predictive ability of that hypothesis. But what about other
methods of communicating labels? Extending Occam's razor to generic
communication schemes requires a bit of "definition design" but can
then encompass the more powerful VC-dimension and PAC-Bayes bounds.
 Planning in the Presence of Cost
Functions Controlled by an Adversary. With
Brendan McMahan and Geoff Gordon. ICML '03.
Looks at fast algorithms for finding "minimax optimal plans" in a
certain adversarial MDP setting. Includes some experimental
results on a robot navigation domain.
 Open problem: Learning
a function of r relevant variables. COLT 2003
open problems session.
 On Statistical Query Sampling
and NMR Quantum Computing. With Ke
Yang. 18th IEEE Conference on Computational Complexity (CCC '03).
We introduce a problem called Statistical Query Sampling that models
an issue that arises in NMR quantum computing. We give a
number of lower bounds for this problem, and relate it to
the (more standard) problem of Statistical Query Learning.
 On Polynomial-Time
Preference Elicitation with Value Queries.
With Martin Zinkevich and Tuomas Sandholm. ACM Conference on
Electronic Commerce, 2003.
We consider the question of whether interesting classes of preferences
can be elicited in polynomial time using value queries. Building on
known results on Membership Query learning, we show that read-once
formulas over a set of gates motivated from a shopping-agent scenario
can be elicited in polynomial time, as well as a class of preferences
we call "toolbox DNF". We also give a number of (positive and
negative) results for the subsequent allocation problem. For
instance, we show how network flow can be used to do allocation
efficiently with two bidders with toolbox-DNF preferences.
 Combining Online Algorithms
for Rejection and Acceptance. With Yossi
Azar and Yishay Mansour. SPAA '03, pages 159-163. Combined
with a subsequent paper by David Bunde and Yishay Mansour in the journal version appearing in Theory of Computing, 1:105-117, 2005.
The call-control problem has the interesting property that in some
versions one can design online algorithms with good competitive ratio
in terms of fraction of calls accepted, in some versions one can
design algorithms with good C.R. in terms of the fraction rejected,
and in some versions one can do both, but in the last case, the
algorithms tend to be very different. We consider the problem: given
an algorithm A with competitive ratio c_A for fraction accepted, and
an algorithm R with ratio c_R in terms of fraction rejected, can we
combine them into a single algorithm that is good under both measures?
We do this, achieving ratio O(c_A^2) [improved in the
journal version to O(c_A)] for acceptance and O(c_A c_R) for
rejection.
 Online Oblivious
Routing. With Nikhil Bansal, Shuchi Chawla,
and Adam Meyerson. 15th ACM Symposium on Parallel Algorithms
and Architectures (SPAA '03), pages 44-49.
Uses online learning tools to develop a polynomialtime algorithm for
performing nearly as well as
the best fixed routing in hindsight, in a repeated "oblivious routing
game". In this setting the algorithm is allowed to choose a new
routing each night, but is still oblivious to the demands that will
occur the next day. Our result is a strengthening of a recent
result of Azar et al., who gave a polynomial time algorithm to find
the minimax optimal strategy in this game. It is a strengthening in
that it achieves a competitive ratio arbitrarily close to that of
Azar et al., while at the same time performing nearly as well as the
optimal static routing for the sequence of demands that actually
occurred.
 Online Learning in Online
Auctions. With Vijay Kumar, Atri Rudra, and Felix Wu.
SODA '03, pages 202-204. The link points to a somewhat longer version.
Describes how the Weighted-Majority algorithm can be used to get
improved bounds for online auctions of digital goods, as well as for
the posted price setting (that corresponds to a "bandit"
version of the problem: the auctioneer has to pick a price
first, and then only gets the single bit back indicating
whether or not the buyer purchased). Also gives some lower
bounds.
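A sketch of the full-information version of this idea, with candidate prices as "experts" rewarded by the revenue they would have earned (the price grid, learning rate, and update rule below are our illustration; the posted-price setting in the paper is harder, since only the purchase bit for the single posted price is observed):

```python
def hedge_pricing(valuations, price_grid, eta=0.2):
    # Multiplicative weights over candidate prices: each price p earns
    # reward p when the buyer's valuation v satisfies v >= p, else 0.
    h = max(price_grid)
    weights = [1.0] * len(price_grid)
    revenue = 0.0
    for v in valuations:
        total = sum(weights)
        # expected revenue when the posted price is drawn from the weights
        revenue += sum(w * (p if v >= p else 0)
                       for w, p in zip(weights, price_grid)) / total
        # reward-based update, rewards scaled into [0, 1] by h
        weights = [w * (1 + eta) ** ((p if v >= p else 0) / h)
                   for w, p in zip(weights, price_grid)]
    return revenue
```

Against a long sequence of identical buyers, the weights concentrate on the best fixed price in the grid, so per-round revenue approaches that benchmark.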
2002

Correlation
Clustering [local copy].
With Nikhil Bansal and Shuchi Chawla.
Machine Learning 56(1-3):89-113, 2004 (Special
Issue on Theoretical Advances in Data Clustering).
An earlier version appears in FOCS
'02, pages 238-247.
Considers a clustering problem motivated by machine-learning-style
applications, from the perspective of approximation algorithms. We
give a constant-factor approximation under a cost measure and a PTAS
under a benefit measure. A nice feature of this clustering
formulation is that one does not need to specify the desired number of
clusters in advance.
 Smoothed Analysis of the
Perceptron Algorithm for Linear Programming.
With John Dunagan. SODA
'02, pages 905-914.
This paper shows that the simple Perceptron
algorithm has good behavior for linear programming (polynomial-time
whp) in the smoothed analysis model of Spielman-Teng. Spielman-Teng
had shown this for a specific version of the Simplex algorithm. It is
interesting that the bounds for the Perceptron algorithm are better
than those known for Simplex in this model, as a function of most of
the parameters. The one exception is the "epsilon" term (e.g., if you
are interested in a bound that holds on 99% of the instances, then the
Perceptron bounds are better, but Perceptron has an epsilon chance of
taking time Omega(1/epsilon^2), so does badly in expectation).
However, I think the real difference is that Perceptron solves
only the feasibility problem, and not the optimization problem.
Normally, these are equivalent by simple reduction, but it is not
clear that reduction makes sense in the smoothedanalysis model,
because it involves a binary search that will surely create
ill-conditioned instances. So, it could well be that feasibility is a
strictly easier problem than optimization in this model.
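For reference, the Perceptron algorithm in its LP-feasibility form (find w with a·w > 0 for every constraint row a) takes only a few lines; the iteration cap is our addition, and the classical analysis bounds the number of updates by 1/gamma^2 when the unit-normalized constraints admit margin gamma:

```python
def perceptron_feasibility(constraints, max_iters=10000):
    # Start at the origin; whenever some constraint vector a has
    # a . w <= 0, add a to w. Terminates once all constraints are
    # strictly satisfied.
    d = len(constraints[0])
    w = [0.0] * d
    for _ in range(max_iters):
        violated = next((a for a in constraints
                         if sum(x * y for x, y in zip(a, w)) <= 0), None)
        if violated is None:
            return w
        w = [wi + ai for wi, ai in zip(w, violated)]
    return None  # no strictly feasible point found within the cap
```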
 Online Algorithms for Market
Clearing. With Tuomas Sandholm and Martin
Zinkevich. JACM 53(5): 845-879, 2006. Extended
abstract in SODA '02, pages 971-980.
We consider the problem of market clearing in a double
auction (exchange) where buyers and sellers arrive and depart online.
We give algorithms with optimal competitive ratios for several natural
objectives and also give a few results having to do with learning and
incentivecompatibility.
 Static Optimality and Dynamic
Search-Optimality in Lists and Trees. With
Shuchi Chawla and Adam Kalai. Algorithmica 36(3):249-260, 2003
(special issue on online algorithms).
Originally appeared in Proceedings of the
13th Annual Symposium on Discrete Algorithms (SODA),
pages 1-8, 2002.
This paper uses notions from online learning to attack several
problems in adaptive data structures.
2001
 Admission Control to Minimize
Rejections. With Adam Kalai and Jon
Kleinberg. Internet Mathematics 1(2):165-176, 2004.
Originally appeared in Proceedings of WADS'01 (LNCS 2125,
pp. 155-164, 2001).
Studies admission control from the perspective of approximately
minimizing rejections, getting a factor of 2 for a collection of
natural problems. This can make more sense than the usual
perspective (approximately maximizing the number of acceptances) if we are not
highly overloaded (e.g., if optimal can accept 99% of requests).
 Learning from Labeled and Unlabeled Data
using Graph Mincuts.
With Shuchi Chawla. ICML '01, pp. 19-26.
A natural extension of nearest-neighbor algorithms, when you add in
unlabeled data, leads to viewing learning as a mincut problem. This
paper explores this connection and gives some empirical results.
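A toy sketch of this mincut view (the unit-style capacities and tiny example are ours, and the solver is plain Edmonds-Karp rather than anything from the paper): connect labeled positives to a source, labeled negatives to a sink, similar examples to each other, compute a minimum s-t cut, and label the source side positive.

```python
from collections import defaultdict, deque

def min_cut_labels(edges, source, sink):
    # edges: (u, v, capacity) similarity edges, treated as undirected.
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v, c in edges:
        cap[(u, v)] += c
        cap[(v, u)] += c
        adj[u].add(v)
        adj[v].add(u)

    def augmenting_path():
        parent = {source: None}
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    q.append(v)
        return None

    # Edmonds-Karp max-flow: repeatedly saturate shortest augmenting paths.
    while (parent := augmenting_path()) is not None:
        path, v = [], sink
        while v != source:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck

    # Residual reachability from the source = predicted positives.
    positive, q = {source}, deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in positive and cap[(u, v)] > 0:
                positive.add(v)
                q.append(v)
    return positive
```

On a path s-a-b-t where the weak a-b similarity edge is the bottleneck, the cut severs a from b, so a is labeled with s and b with t.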
2000
 FeatureBoost: A Meta Learning Algorithm
that Improves Model Robustness.
With Joseph O'Sullivan, John Langford, and Rich Caruana. ICML '00,
pp. 703-710.
How can you make learning algorithms less "lazy", so that they search
for multiple "really-different" prediction rules, in case we are later
faced with data in which features are corrupted or obscured?
 Noise-tolerant Learning, the Parity
problem, and the Statistical Query model.
With Adam Kalai and Hal Wasserman. JACM 50(4): 506-519 (2003).
Extended abstract in STOC'00, pp. 435-440.
This paper gives a slightly subexponential algorithm for learning
parity in the presence of random noise. Scaling the problem down
gives the first known example of a problem that can be learned in
polynomial time from noisy data but cannot be learned in polynomial
time in the Statistical Query model of Kearns.
1999
 Finely-competitive Paging.
With Carl Burch and Adam Kalai. FOCS'99, pp. 450-458.
Using ideas from online learning, we give a paging algorithm with
especially good behavior under a fine-grained notion of competitive
ratio. For instance, the algorithm gives near-optimal performance when the
request stream can be partitioned into a small number of working
sets. Unfortunately, the algorithm itself is not computationally
efficient.
 Probabilistic Planning in the
Graphplan Framework. With John Langford. 5th
European Conference on Planning (ECP'99). See the PGP web page.
Approaches probabilistic planning from the Graphplan
perspective. The result ends up looking a lot like a
game-tree search, but using the planning graph to quickly prune states
that can be guaranteed not to reach the goals in time.
 Beating the Hold-Out: Bounds for K-fold
and Progressive Cross-Validation. With Adam Kalai and John
Langford. Proceedings of the 12th Annual Conference on
Computational Learning Theory (COLT '99), pp. 203-208.
We show that for k>2, k-fold CV is at least slightly better than
simply using a holdout set of 1/k of the examples, in terms of the
quality of the error estimate. We also analyze a "progressive
validation" approach (similar to a method used by Littlestone for
converting online algorithms to batch) that we show is in many ways as
good as the holdout, while using on average half as many examples for
testing.
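Progressive validation itself is simple enough to state in code (the learner interface here is our own framing):

```python
def progressive_validation_error(examples, train_and_predict):
    # For i = 1..m: train on the first i-1 examples, test on example i,
    # then average the m test errors. Every example is tested exactly
    # once, and on average half the data is used for training.
    errors = 0
    for i, (x, y) in enumerate(examples):
        errors += train_and_predict(examples[:i], x) != y
    return errors / len(examples)
```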
 Microchoice Bounds and Self-Bounding
Learning Algorithms. With John
Langford. Machine Learning 51(2): 165-179 (2003).
Originally appeared in Proceedings of the 12th Annual Conference on
Computational Learning Theory (COLT '99), pp. 209-214.
Gives adaptive sample-complexity bounds for learning
algorithms that work by making a sequence of small choices. These
allow for a computationally-efficient version of Freund's Query Trees.
 Online Algorithms for Combining Language Models.
With Stan Chen, Adam Kalai, and Roni Rosenfeld. In Proceedings of
the International Conference on Acoustics, Speech, and Signal
Processing (ICASSP '99). [postscript]
[gzipped postscript]
Uses online portfolio-selection algorithms in the
context of combining language models, with experimental comparisons.
1998
 On Learning Monotone Boolean
Functions. With Carl Burch and John Langford.
Proceedings of the 39th Annual Symposium on Foundations of
Computer Science (FOCS '98).
For learning an arbitrary monotone Boolean function over the uniform
distribution, we give a simple algorithm that achieves error rate at
most 1/2 - Omega(1/sqrt(n)), and show that no algorithm can do better than
1/2 - omega(log(n)/sqrt(n)) from a polynomial-size sample. These
improve over the previous best upper and lower bounds.
 Combining Labeled and Unlabeled Data
with Co-Training [pdf]. With Tom Mitchell.
Proceedings of the 11th Annual Conference on Computational
Learning Theory, pages 92-100, 1998. (The document linked to
here fixes some minor bugs in the COLT version).
Introduces and studies Co-Training, a natural approach to
using unlabeled data when we have two different sources of
information about each example. The idea is to train two classifiers,
one using each type of information. We can then search over the
unlabeled data to find examples where one classifier is confident and
the other is not, and then use the label given by the confident classifier
as training data for the other.
In the process of analyzing this setting, we also give new results
(see lemma 1)
on PAC learning with noise when the positive and negative noise rates
are different.

On a Theory of Computing Symposia. With
Prabhakar Raghavan. International Conference on Fun with
Algorithms (FUN '98).
Also appeared in SIGACT News, September 1998.
How can you get the advantages of parallel sessions while still
allowing attendees to see all the talks they want? The
answer is to have four sessions, with each talk given
twice! This paper explores a number of properties of this
approach, along with connections to flows and expanders.

Semi-Definite Relaxations for Minimum Bandwidth and other
Vertex-Ordering problems. With Goran Konjevod,
R. Ravi, and Santosh Vempala. Theoretical Computer
Science, 235(1):25-42, 2000. (Special issue in honor of
Manuel Blum's 60th Birthday!)
Extended abstract in
Proceedings of the 30th Annual Symposium on the Theory of
Computing (STOC '98).

A Note on Learning from Multiple-Instance Examples.
With Adam Kalai. Machine Learning, 30:23-29, 1998.
1997

Universal Portfolios With and Without Transaction Costs.
With Adam Kalai.
Machine Learning, 35:193-205, 1999 (special
issue for COLT '97). Originally appeared in Proceedings
of the 10th Annual Conference on Computational Learning Theory,
pages 309-313, July 1997.

Online Learning and the Metrical Task System Problem.
With Carl Burch. Machine Learning, 39:35-58,
2000. Originally appeared in Proceedings of the 10th Annual
Conference on Computational Learning Theory (COLT '97), pages 45-53.

A polylog(n)-competitive algorithm for metrical task
systems. With Yair Bartal, Carl Burch, and Andrew
Tomkins. Proceedings of the 29th Annual Symposium on the Theory of
Computing (STOC '97), pages 711-719.
 An O~(n^{3/14})-Coloring
Algorithm for 3-Colorable Graphs. With David Karger.
Information Processing Letters, 61(1):49-53, January 1997.
1996
 On-Line Algorithms in Machine
Learning (a survey). This is a survey paper for a talk
given at the Dagstuhl workshop on On-Line Algorithms (June '96).
Appears as Chapter 14 in "Online Algorithms: The State of the
Art", LNCS #1442, Fiat and Woeginger, eds., 1998.
 A Polynomial-time Algorithm for
Learning Noisy Linear Threshold Functions. With Alan
Frieze, Ravi Kannan, and Santosh Vempala. Algorithmica, 22:35-52,
1998. An extended abstract appears in Proceedings of the
37th Annual Symposium on Foundations of Computer Science (FOCS'96),
pages 330-338.
 A Constant-factor
Approximation Algorithm for the k-MST Problem.
With R. Ravi and Santosh Vempala. JCSS, 58:101-108 (1999).
An extended abstract appears in Proceedings of the 28th Annual
ACM Symposium on the Theory of Computing (STOC '96), pages 442-448.
 Randomized Robot Navigation
Algorithms. With Piotr Berman, Amos Fiat, Howard Karloff,
Adi Rosen, and Michael Saks. In Proceedings of the Seventh Annual
ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 75-84, January 1996.
1995
 Fast Planning Through Planning
Graph Analysis. With Merrick Furst.
Artificial Intelligence 90:281-300, 1997. An extended
abstract appears in Proceedings of
the 14th International Joint Conference on Artificial Intelligence
(IJCAI), pages 1636-1642, August 1995.
See also the Graphplan home page.
 Empirical support for Winnow and
Weighted-Majority based algorithms: results on a calendar scheduling
domain. Machine Learning 26:5-23, 1997. An
earlier version is in Proceedings of the Twelfth International
Conference on Machine Learning, pages 64-72, July 1995. Click here for
more information and source code.
 Learning with Unreliable Boundary
Queries. With Prasad Chalasani, Sally Goldman, and
Donna Slonim. Journal of Computer and System
Sciences, 56(2):209-222, 1998. Originally appeared in
Proceedings of the Eighth Annual Conference on Computational
Learning Theory (COLT), pages 98-107, July 1995.
 New Approximation Guarantees
for Minimum Weight k-Trees and Prize-Collecting Salesmen.
With Baruch Awerbuch, Yossi Azar, and Santosh Vempala. SIAM
J. Computing, 28(1):254-262, 1999. Originally published
in Proceedings of the 27th Annual
ACM Symposium on Theory of Computing, pages 277-283, 1995. A
tech report version appears as CMU-CS-94-173, August 1994.
 A Constant-Factor
Approximation Algorithm for the Geometric k-MST Problem in the
Plane. With J.S.B. Mitchell, Prasad
Chalasani, and
Santosh Vempala. SIAM J. Computing
28(3):771-781 (1998). This paper
combines two conference papers: J.S.B. Mitchell, "Guillotine
subdivisions approximate polygonal subdivisions: A simple new
method for the geometric k-MST problem", SODA '96,
pp. 402-408, and Blum, Chalasani, and Vempala, "A
constant-factor approximation for the k-MST problem in the
plane", STOC '95, pp. 294-302.
 Coloring Random and Semi-Random k-Colorable
Graphs. With Joel Spencer. Journal of Algorithms
19:204-234, 1995. This paper extends the semi-random model results
in "Some Tools for Approximate 3-Coloring", Proceedings of
the 31st Annual IEEE Symposium on Foundations of Computer
Science, pages 554-562, October 1990.
1994
 Relevant Examples and Relevant
Features: Thoughts from Computational Learning Theory.
This is a survey paper presented at the 1994 AAAI Fall Symposium.
Here is a longer article with a broader
perspective, joint with Pat Langley, that appears in
Artificial Intelligence, 97:245-272, 1997.
 On learning read-k satisfy-j DNF.
With Howard Aizenstein, Roni Khardon, Eyal Kushilevitz, Leonard Pitt, and Dan Roth.
SIAM J. Computing, 27(6):1515-1530, 1998. Originally
published in Proceedings of the Seventh Annual Conference
on Computational Learning Theory, pages 110-117, July 1994.
 The Minimum Latency Problem.
With Prasad Chalasani, Don Coppersmith, Bill Pulleyblank, Prabhakar
Raghavan, and Madhu Sudan. In Proceedings of the 26th Annual ACM
Symposium on Theory of Computing, pages 163-171, 1994.
 Weakly Learning DNF and
Characterizing Statistical Query Learning Using Fourier Analysis.
With Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay
Mansour, and Steven Rudich. In Proceedings of the 26th Annual ACM
Symposium on Theory of Computing, pages 253-262, 1994. [notes and clarifications]
 New Approximation
Algorithms for Graph Coloring.
JACM, 41(3):470-516, May 1994. This paper combines
the worst-case approximation results in "Some Tools for
Approximate 3-Coloring", FOCS 1990 (pp. 554-562), and those in
"An O(n^0.4)-Approximation Algorithm for 3-Coloring (and
Improved Approximation Algorithms for k-Coloring)",
STOC 1989 (pp. 535-542).
See the 1995 paper with Joel Spencer for results on the Semi-Random model.
1993
 Cryptographic Primitives Based on
Hard Learning Problems.
With Merrick Furst, Michael Kearns, and Richard Lipton.
In Advances in Cryptology - CRYPTO '93, Lecture Notes in
Computer Science #773, pages 278-291, Springer-Verlag, 1994.
 Learning an
Intersection of a Constant Number of Halfspaces over a Uniform
Distribution.
With Ravi Kannan. Journal of Computer and System Sciences,
54(2):371-380, 1997 (JCSS special issue for FOCS '93).
Originally appeared in Proceedings of the 34th Annual IEEE Symposium
on Foundations of Computer Science, pages 312-320,
November 1993. Also published as Chapter 9 in Theoretical
Advances in Neural Computation and Learning,
Roychowdhury, Siu, and Orlitsky, eds., Kluwer, 1994.
 An OnLine Algorithm for Improving
Performance in Navigation.
With Prasad Chalasani.
SIAM J. Computing, 29(6):1907-1938, 2000. Originally
appeared in Proceedings of the
34th Annual IEEE Symposium on Foundations of Computer Science,
pages 2-11, November 1993.
 On Learning Embedded Symmetric Concepts.
With Prasad Chalasani and Jeffrey Jackson. In Proceedings of
the Sixth Annual Conference on Computational Learning Theory,
pages 337-346, July 1993.
 Generalized Degree Sums and Hamiltonian Graphs.
With Ronald Gould. ARS Combinatoria, 35:35-54, 1993.
1992
 A Decomposition Theorem and Bounds for Randomized Server
Problems.
With Howard Karloff, Yuval Rabani, and Michael Saks. SIAM J.
Computing, 30(5):1624-1661, 2000. Originally appeared in
Proceedings of the 33rd Annual IEEE Symposium on Foundations of
Computer Science, pages 197-207, October 1992.
 Learning Switching Concepts.
With Prasad Chalasani. In Proceedings of the Fifth Annual
Workshop on Computational Learning Theory, pages 231-242, July 1992.
 Fast Learning of k-Term DNF Formulas with Queries.
With Steven Rudich. In Proceedings of the 24th Annual ACM
Symposium on Theory of Computing, pages 382-389, May 1992.
 Rank-r Decision Trees are a Subclass of r-Decision Lists.
Information Processing Letters, 42:183-185, 1992.
1991
 Learning in the Presence of Finitely or
Infinitely Many Irrelevant Attributes.
With Lisa Hellerstein and Nick Littlestone. JCSS,
50(1):32-40, February 1995. An earlier version appears in
Proceedings of the Fourth Annual Workshop on Computational
Learning Theory, pages 157-166, August 1991.
 Algorithms for Approximate Graph Coloring.
Ph.D. thesis, MIT Laboratory for Computer Science, MIT/LCS/TR-506,
May 1991.
 Linear Approximation of Shortest
Superstrings. [ps]
With Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis.
JACM, 41(4):630-647, 1994.
An earlier version appears in Proceedings of the 23rd Annual
ACM Symposium on Theory of Computing, pages 328-336, May 1991.
 Navigating in Unfamiliar Geometric
Terrain. [ps]
With Prabhakar Raghavan and Baruch Schieber. SIAM J.
Computing, 26(1):110-137, February 1997. An earlier version
appears in Proceedings
of the 23rd Annual ACM Symposium on Theory of Computing, pages
494-504, May 1991.
1990
 Learning Boolean Functions in an
Infinite Attribute Space.
Machine Learning, 9(4):373-386, 1992. Also in
Proceedings of the 22nd ACM Symposium on Theory of Computing,
pages 64-72, May 1990.
 Some Tools for
Approximate 3-Coloring.
Proceedings of the 31st Annual IEEE Symposium on Foundations of
Computer Science, pages 554-562, October 1990.
 Separating Distribution-Free
and Mistake-Bound Learning Models over the Boolean Domain.
[pdf]
SIAM J. Computing, 23(5), 1994.
Also in Proceedings of the 31st Annual IEEE Symposium on Foundations of
Computer Science, pages 211-218, October 1990.
 Learning Functions of k Terms.
With Mona Singh. In Proceedings of the Third Annual Workshop on
Computational Learning Theory, pages 144-153, August 1990.
1989
 On the Computational Complexity of Training Simple Neural
Networks.
Master's thesis, MIT Laboratory for Computer
Science, MIT/LCS/TR-445, May 1989.
 An Õ(n^0.4)-Approximation Algorithm for 3-Coloring (and
Improved Approximation Algorithms for k-Coloring).
In Proceedings of the 21st ACM Symposium on Theory of
Computing, pages 535-542, May 1989.
 Training a 3-Node Neural Network is NP-Complete.
With Ron Rivest. Neural Networks, 5(1):117-127, 1992. Also in
Advances in Neural Information Processing Systems 1 (proceedings of
the 1988 NIPS conference), pp. 494-501, 1989.