Avrim Blum's publications


Avrim Blum: Publications and Working papers

These publications and working papers are presented roughly in reverse chronological order of their initial publication. Much of this work was supported by grants from the National Science Foundation and other funding agencies including DARPA and the Simons Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect or represent the views of these funding agencies.

2025

On Learning Verifiers and Implications to Chain-of-Thought Reasoning. With Maria-Florina Balcan, Zhiyuan Li, and Dravyansh Sharma. NeurIPS 2025. Chain-of-Thought reasoning has emerged as a powerful approach for solving complex mathematical and logical problems. However, it can often veer off track through incorrect or unsubstantiated inferences. In this work we consider the problem of learning reliable verifiers for sequential reasoning, including natural-language Chain-of-Thought reasoning. That is, given a problem statement and a step-by-step solution in natural language, the verifier should output [Yes] if all the reasoning steps in the solution are valid, and [No] otherwise. We give a formal PAC-learning framework for studying this problem, and within it we propose and analyze several natural verification goals of differing strength. We provide sample complexity upper bounds for learning verifiers that satisfy these goals, as well as lower bounds and impossibility results for learning other natural verification objectives without additional assumptions.
Replicable Online Learning. With Saba Ahmadi and Siddharth Bhandari. NeurIPS 2025. We investigate the concept of algorithmic replicability introduced by Impagliazzo et al. (2022) in an online setting. In our model, the input sequence received by the learner is generated from an arbitrary sequence of distributions D_1,...,D_T. The learner's goal is to (a) achieve sublinear regret and (b) produce, with high probability, the exact same sequence of outputs when replayed with the same random seed on a fresh input sequence drawn from the same sequence of distributions D_1,...,D_T. (Note: the random seed is for the learner's randomized algorithm; the new draws from D_1,...,D_T are independent of the previous draws.) We give algorithms achieving these guarantees and leave open the question of determining the optimal regret bounds achievable in this setting.
PAC Learning with Improvements. With Idan Attias, Keziah Naggita, Donya Saless, Dravyansh Sharma, and Matthew R. Walter. ICML 2025. In this work we define a mathematical model for machine learning in settings where the agents being classified have the ability to strive and improve. For example, imagine a hiring test used to measure an agent's skill level such that, for some threshold theta, agents who score above theta will be successful and those who score below theta will not (i.e., learning a threshold on the line). Suppose also that by putting in effort, agents can improve their skill level by some amount b. In that case, if we learn an approximation theta-hat of theta such that theta <= theta-hat <= theta+b and use it for hiring, we can actually achieve zero error, in the sense that (1) any agent classified as positive is truly qualified, and (2) any agent who truly is qualified can be classified as positive by putting in effort. Thus, the ability of agents to improve can allow for a goal one could not hope to achieve in standard models, namely zero error. In this paper, we explore this phenomenon more broadly, examining under what conditions the ability of agents to improve can reduce the sample complexity of learning, or alternatively, can make learning harder.
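The zero-error claim in the threshold example can be checked directly. The sketch below is illustrative only and rests on simplifying assumptions that are not taken from the paper: skills are real numbers on a finite grid, an agent is qualified exactly when its skill is at least theta, and a qualified agent can raise its skill by up to b before being tested. The function names are hypothetical.

```python
def classified_positive(skill: float, theta_hat: float) -> bool:
    """Hiring rule built from the learned threshold theta_hat."""
    return skill >= theta_hat


def zero_error(skills, theta: float, theta_hat: float, b: float) -> bool:
    """Check the two conditions from the text for every agent on the grid:
    (1) anyone classified positive is truly qualified (skill >= theta);
    (2) any qualified agent can become positive by improving by at most b.
    """
    for s in skills:
        if classified_positive(s, theta_hat) and s < theta:
            return False  # condition (1) fails: an unqualified agent is hired
        if s >= theta and not classified_positive(s + b, theta_hat):
            return False  # condition (2) fails: a qualified agent cannot reach theta_hat
    return True


skills = [i / 100 for i in range(-200, 201)]  # hypothetical agents with skills in [-2, 2]
theta, b = 0.0, 0.5
print(zero_error(skills, theta, theta_hat=0.3, b=b))   # True: theta <= 0.3 <= theta + b
print(zero_error(skills, theta, theta_hat=-0.1, b=b))  # False: hires unqualified agents
print(zero_error(skills, theta, theta_hat=0.7, b=b))   # False: agents at theta cannot reach 0.7
```

Any theta-hat in [theta, theta+b] passes both checks; moving it below theta admits unqualified agents, and moving it above theta+b leaves some qualified agents unable to reach it.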
Proofs as Explanations: Short Certificates for Reliable Predictions. With Steve Hanneke, Chirag Pabbaraju, and Donya Saless. COLT 2025. We consider a model for explainable AI in which an explanation for a prediction h(x)=y consists of a subset S' (if it exists) of the full training set S such that all classifiers h' in family H that make at most b mistakes on S' predict h'(x)=y. Such a set S' serves as a proof that x indeed has label y under the assumptions that (1) the target function h* belongs to H, and (2) S has at most b corrupted points. For example, if b=0, H is the family of linear classifiers in R^d, and x lies inside the convex hull of the positive data points in S (so every consistent linear classifier labels x as positive), then Carathéodory's theorem states that x lies inside the convex hull of d+1 of those points. A set S' of size d+1 could therefore be released as an explanation for a positive prediction, and would serve as a short proof of correctness under the assumption that h* is in H. In this work, we consider this problem more generally (for general hypothesis families and general values of b) and give quantities that precisely characterize the worst-case size of the smallest achievable certificate. We also analyze distribution-dependent and distribution-independent bounds.
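In the plane (d = 2), the b = 0 linear example says that a point inside the convex hull of the positive examples lies in a triangle spanned by 3 of them, and those 3 points can serve as the certificate S'. The brute-force sketch below finds such a triple. It illustrates only the d+1 bound, not the paper's general construction, and it assumes the positive points are not all collinear (degenerate triples are skipped). The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def in_triangle(p: Point, a: Point, b: Point, c: Point, eps: float = 1e-12) -> bool:
    """True if p lies in the closed triangle abc (barycentric-coordinate test)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < eps:
        return False  # collinear triple: skipped in this sketch
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    l3 = 1.0 - l1 - l2
    return min(l1, l2, l3) >= -eps


def caratheodory_certificate(x: Point, positives: Sequence[Point]) -> Optional[List[Point]]:
    """Return 3 positive points whose triangle contains x, or None if x is
    outside the convex hull. Tries all O(n^3) triples."""
    for triple in combinations(positives, 3):
        if in_triangle(x, *triple):
            return list(triple)
    return None


positives = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 5)]
print(caratheodory_certificate((1, 1), positives))  # [(0, 0), (4, 0), (0, 4)]
```

A point outside the hull, such as (10, 10), yields None, matching the case where no consistent-classifier argument certifies a positive label.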
Pessimism Traps and Algorithmic Interventions. With Emily Diana, Kavya Ravichandran, and Alexander Tolbert. FORC 2025. Pessimism traps are settings where a community can get locked into a cycle of suboptimal behaviors due to pessimism, reinforced by the behavior of others, about its chances of success at more ambitious activities. In this work, we use the mathematics of information cascades to model this process as the result of rational behavior under uncertainty. We then examine how algorithmic interventions can be used to break such traps in both single-community and multi-community models, and in settings where the intervening entity does not even know which actions are optimal.
Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem. With Kavya Ravichandran. ALT 2025. We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has k arms, each of whose reward function is a concave and increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an Omega(sqrt(k)) approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an O(sqrt(k)) approximation factor, if it is told the maximum reward achievable by the optimal arm in advance. We then show how to remove this assumption at the cost of an extra O(log k) approximation factor.
A Model for Combinatorial Dictionary Learning and Inference. With Kavya Ravichandran. ALT 2025. We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is well-studied as dictionary learning and factor analysis. In this work, we propose a combinatorial model in which to study this question, motivated by the way objects occlude each other in a scene to form an image. First, we identify a property we call "well-structuredness" of a set of low-dimensional components which ensures that no two components in the set are too similar. We show how well-structuredness is sufficient for learning the set of latent components comprising a set of sample instances. We then consider the problem: given a set of components and an instance generated from some unknown subset of them, identify which parts of the instance arise from which components. We consider two variants: (1) determine the minimal number of components required to explain the instance; (2) determine the correct explanation for as many locations as possible. For the latter goal, we also devise a version that is robust to adversarial corruptions, with just a slightly stronger assumption on the components.
Learning Actionable Counterfactual Explanations in Large State Spaces. With Keziah Naggita and Matt Walter. TMLR 2025. We consider agents needing advice on achieving a goal, such as junior students asking a guidance counselor what they need to do in order to have a strong college application when they are seniors. The guidance counselor has data from prior students and wants to use it to provide good personalized policies to newly arriving students. We give a mathematical formulation for problems of this type and experimentally evaluate algorithms on publicly available datasets.
Distributional Adversarial Loss. With Saba Ahmadi, Siddharth Bhandari, Chen Dan, and Prabhav Jain. AISTATS 2025. We initiate the study of a notion of adversarial loss that we call distributional adversarial loss. In this notion, for each original example, the allowed adversarial perturbation set is a family of distributions, and the adversarial loss over each example is the maximum loss over all the associated distributions. This generalizes the usual notion that considers a set of points, not distributions, as the perturbation set of each clean example. As an application of our approach, we show how to unify the two lines of work on randomized smoothing and robust learning in the PAC-learning setting and derive sample complexity bounds for randomized smoothing methods. We also investigate the role of randomness in achieving robustness against adversarial attacks. We show a general derandomization technique that preserves the extent of a randomized classifier's robustness against adversarial attacks and show its effectiveness empirically.
Competitive Strategies to use "Warm Start" Algorithms with Predictions. With Vaidehi Srinivas. SODA 2025. Many algorithms have the property that they will run much faster on a given instance I if you initialize the algorithm with a solution that is close to the eventual solution to that instance. Suppose you need to solve a series of instances I_1, I_2, ...: past work showed how you can achieve performance on the sequence comparable to the best fixed initialization in hindsight. In this work we give competitive guarantees against stronger benchmarks. In particular, we show how to perform nearly as well as the best fixed set of k starting points simultaneously for all values of k, where the notion of "nearly as well" degrades gracefully with k, and even nearly as well as the best adaptive sequence of k starting points (where there is a natural cost for moving the k points). We also show connections to the classic k-server problem and analyze a contextual version of this problem.

2024

Winning Without Observing Payoffs: Exploiting Behavioral Biases to Win Nearly Every Round. With Melissa Dutz. ITCS 2024. Consider repeatedly playing a zero-sum game with unknown payoff matrix where you cannot observe anyone's payoffs, not even your own. All you know is that the game is symmetric (like Rock-Paper-Scissors-Lizard-Spock) and you can observe your opponent's actions. Feldman et al. (2010) proposed this setting and gave an algorithm that guarantees payoff approaching the minimax optimal value (i.e., zero), the best one can generically hope for. Here we consider whether one can actually win in this setting if the opponent is behaviorally biased. We model several deterministic, biased opponents and show that even without knowing the game matrix or observing any payoffs, one can take advantage of each bias to win nearly every round (so long as the game has the property that each action beats and is beaten by at least one other action). We also provide a partial characterization of the kinds of biased strategies that can be exploited to win nearly every round, and provide algorithms for beating some kinds of biased strategies even when we don't know which strategy the opponent uses.
On the Vulnerability of Fairness Constrained Learning to Malicious Noise. With Princewill Okoroafor, Aadirupa Saha, and Kevin Stangl. AISTATS 2024. We consider the vulnerability of fairness-constrained learning to malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and proved that any proper learner can exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we need only incur an O(alpha) loss in accuracy, where alpha is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an O(sqrt(alpha)) loss, and give a matching Omega(sqrt(alpha)) lower bound. For Equalized Odds and Predictive Parity, however, an adversary can indeed force an Omega(1) loss. The key technical novelty of our work is how randomization can bypass simple 'tricks' an adversary can use to amplify its power. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
Agnostic Multi-Robust Learning using ERM. With Saba Ahmadi, Omar Montasser, and Kevin Stangl. AISTATS 2024. A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially-many perturbations that an adversary might make to a test example, but the attacker only needs to find one successful perturbation. Xiang et al. [2022] proposed an algorithm for patch attacks that reduces the effective number of perturbations from an exponential to a polynomial, and learns using an ERM oracle. However, their guarantee requires the natural examples to be robustly realizable. In this work we consider the non-robustly-realizable case. Our first contribution is to give a guarantee for this setting by utilizing an approach of Feige, Mansour, and Schapire [2015]. Next, we extend our results to a multi-group setting and introduce a novel agnostic multi-robust learning problem where the goal is to learn a predictor that achieves low robust loss on a (potentially) rich collection of subgroups.
Dueling Optimization with a Monotone Adversary. With Meghal Gupta, Gene Li, Naren Sarayu Manoj, Aadirupa Saha, and Yuanyuan Yang. ALT 2024. We introduce and study the problem of dueling optimization with a monotone adversary, a generalization of (noiseless) dueling convex optimization. The goal is to design an online algorithm to find a minimizer x* of a function f: X -> R, where X is a convex set in R^d. In each round, the algorithm submits a pair of guesses x1 and x2, and the adversary responds with any point in the space that is at least as good as both guesses. The cost of each query is the suboptimality of the worse of the two guesses, i.e., max(f(x1) - f(x*), f(x2) - f(x*)). The goal is to minimize the number of iterations required to find an epsilon-optimal point and to minimize the total cost (regret) of the guesses over many rounds. Our main result is an efficient randomized algorithm for several natural choices of the function f and set X that incurs cost O(d) and iteration complexity O(d log(1/epsilon)^2). Moreover, our dependence on d is asymptotically optimal, as we show examples in which any randomized algorithm for this problem must incur Omega(d) cost and iteration complexity.

2023

Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback. With Han Shao, Lee Cohen, Yishay Mansour, Aadirupa Saha, and Matthew Walter. NeurIPS 2023. In classic RL, policies are evaluated with respect to a scalar reward function. However, many real-world problems involve balancing multiple objectives whose relative priority will vary from user to user. So, an optimal policy for one user might be sub-optimal for another. In this work, we consider a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to elicit enough information about a given user via comparison queries to efficiently compute a near-optimal policy for them. We consider two feedback models: (1) a model where a user is provided with two policies and returns their preferred policy, and (2) a model where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
Strategic Classification under Unknown Personalized Manipulation. With Han Shao and Omar Montasser. NeurIPS 2023. We consider strategic classification, where agents can strategically manipulate their feature vector to a limited extent in order to be classified as positive. Unlike most prior work, we consider manipulations to be personalized, meaning that agents can have different levels of manipulation ability (e.g., varying radii for ball manipulations) that are unknown to the learner. We formalize the learning problem in an interaction model where the learner first deploys a classifier and the agent then manipulates its feature vector within its manipulation set to game the deployed classifier. We investigate various scenarios in terms of the information available to the learner during the interaction, such as observing the original feature vector before or after deployment, observing the manipulated feature vector, or observing neither the original nor the manipulated feature vector, and we provide online mistake bounds and PAC sample complexity in these scenarios for ball manipulations.
Fundamental Bounds on Online Strategic Classification. With Saba Ahmadi and Kunhe Yang. ACM-EC 2023. We study the problem of online binary classification where strategic agents can manipulate their observable features in predefined ways, modeled by a manipulation graph, in order to receive a positive classification. We show this setting differs in fundamental ways from non-strategic online classification. For instance, whereas in the non-strategic case a mistake bound of log_2|H| is achievable via the halving algorithm when the target function belongs to a known class H, we show that no deterministic algorithm can achieve a mistake bound of o(Delta) in the strategic setting, where Delta is the maximum degree of the manipulation graph (even when |H|=Delta). We obtain an algorithm achieving mistake bound O(Delta*ln|H|). We also extend this to the agnostic setting, obtaining an algorithm with Delta-multiplicative regret, and we show that no deterministic algorithm can achieve o(Delta) multiplicative regret. We also study two randomized models based on whether the random choices are made before or after agents respond, and show that they exhibit fundamental differences.
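For reference, the non-strategic baseline mentioned above is the classic halving algorithm: predict with the majority vote of all hypotheses still consistent with the labels seen so far, then discard the inconsistent ones. Every mistake removes at least half of the surviving hypotheses, so when the target is in H it makes at most log_2|H| mistakes. The sketch below is a textbook version run on a toy class of thresholds chosen purely for illustration; it does not implement the paper's strategic algorithm.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[int], int]


def halving(hypotheses: List[Hypothesis], stream: Sequence[Tuple[int, int]]) -> int:
    """Run the halving algorithm on (x, label) pairs and return the number of
    mistakes. Assumes some hypothesis in the class labels the stream correctly."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes >= len(version_space) else 0  # majority vote, ties -> 1
        if prediction != y:
            mistakes += 1
        # Keep only the hypotheses consistent with the revealed label.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes


# Toy class: thresholds h_t(x) = 1 iff x >= t, for t = 0..15, so |H| = 16.
H = [lambda x, t=t: int(x >= t) for t in range(16)]
target = H[11]
stream = [(x, target(x)) for x in [8, 12, 10, 11, 0, 15, 3]]
m = halving(H, stream)
print(m, m <= math.log2(len(H)))  # 2 True
```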
An Analysis of Robustness of Non-Lipschitz Networks. With Maria-Florina Balcan, Dravyansh Sharma, and Hongyang Zhang. JMLR 24(98):1-43, 2023. Despite significant advances, deep networks remain highly susceptible to adversarial attack. One fundamental challenge is that small input perturbations can often produce large movements in the network's final-layer feature space. In this paper, we define an attack model that abstracts this challenge, to help understand its intrinsic properties. In our model, the adversary may move data an arbitrary distance in feature space but only in random low-dimensional subspaces. We prove such adversaries can be quite powerful: defeating any algorithm that must classify any input it is given. However, by allowing the algorithm to abstain on unusual inputs, we show such adversaries can be overcome when classes are reasonably well-separated in feature space. We further provide theoretical guarantees for setting algorithm parameters to optimize over accuracy-abstention trade-offs using data-driven methods, and provide connections to strategic classification as well.
Setting Fair Incentives to Maximize Improvement. With Saba Ahmadi, Hedyeh Beyhaghi, and Keziah Naggita. FORC 2023. We consider the problem of helping agents improve by setting short-term goals. Given a set of target skill levels, we assume each agent will try to improve from their initial skill level to the closest target level within reach, or do nothing if no target level is within reach. Our goal is to optimize the target levels for social welfare and fairness objectives, where social welfare is defined as the total amount of improvement, and fairness objectives are considered when agents belong to different underlying populations. A key technical challenge of this problem is the non-monotonicity of social welfare in the set of target levels: adding a new target level may decrease the total amount of improvement, since some agents may then settle for the new, closer target instead of improving further. Considering these properties, we provide algorithms for optimal and near-optimal improvement for both social welfare and fairness objectives. Finally, we extend our algorithms to learning settings where we have only sample access to the initial skill levels of agents.
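The non-monotonicity of social welfare can be seen in a one-agent example. The sketch below assumes, for illustration only, that every agent shares a common reach parameter and that "closest target within reach" means the lowest target strictly above the agent's current skill and at most `reach` above it; both the parameter and the function names are hypothetical.

```python
from typing import Iterable, List


def improvement(skill: float, targets: List[float], reach: float) -> float:
    """Improvement made by one agent: move to the closest target level strictly
    above its current skill and within `reach` (assumed common), else do nothing."""
    reachable = [t for t in targets if skill < t <= skill + reach]
    return min(reachable) - skill if reachable else 0.0


def social_welfare(skills: Iterable[float], targets: Iterable[float], reach: float) -> float:
    """Social welfare as defined in the text: total improvement over all agents."""
    target_list = list(targets)
    return sum(improvement(s, target_list, reach) for s in skills)


skills = [0.0]
print(social_welfare(skills, [1.0], reach=1.0))       # 1.0
print(social_welfare(skills, [0.2, 1.0], reach=1.0))  # 0.2
```

Adding the target 0.2 gives the agent an easier goal, so it stops there and total improvement drops from 1.0 to 0.2.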
Multi-agent Value of Information for Components' Inspections. With Chaochao Lin, Maria-Florina Balcan, and Matteo Pozzi. ICASP 2023. We consider a multi-agent component-inspection game. In this game, agents manage different components of an overall system. Each component may be either intact or damaged, and whether the overall system works is determined by a Boolean function of the status of its components. Agents incur costs for repairing their own components, but the penalty for system failure is shared among all agents. The twist is that agents do not know whether their components need repair (they have a Bayesian prior) and can only find out through (costless) inspection actions, each of which publicly reveals the status of the component to all. We assess the 'Value of Information' for this game, showing that in some interesting cases it can trigger Information Avoidance, where rational agents prefer not to collect free information.

2022

Boosting barely robust learners: A new perspective on adversarial robustness. With Omar Montasser, Greg Shakhnarovich, and Hongyang Zhang. NeurIPS 2022. In this work we show that having the ability to learn a predictor that is robust to adversarial perturbations of magnitude 2*gamma on a small fraction of any given data distribution is equivalent to having the ability to learn a predictor that is robust to adversarial perturbations of magnitude gamma on nearly all of any given data distribution. Moreover, we present oracle-efficient reductions in both directions.
A Theory of PAC Learnability under Transformation Invariances. With Han Shao and Omar Montasser. NeurIPS 2022. Transformation invariances are present in many real-world problems, e.g., a rotated car is still identified as a car. Data augmentation, which adds the transformed data into the training set and trains a model on the augmented data, is a common technique to build these invariances into the learning process. However, it is unclear how data augmentation performs theoretically and what the optimal algorithm is in the presence of transformation invariances. In this paper, we study PAC learnability under transformation invariances in three settings according to different levels of realizability: (i) a hypothesis in H fits the augmented data; (ii) a hypothesis in H fits only the original data and the transformed data lying in the support of the data distribution; (iii) the agnostic case. One interesting observation is that distinguishing between the original data and the transformed data is necessary to achieve optimal accuracy in settings (ii) and (iii), which implies that any algorithm not differentiating between the original and transformed data (including data augmentation) is not optimal. We propose two combinatorial measures characterizing the optimal sample complexity in these settings and provide sample-optimal algorithms.
Robustly-Reliable Learners Under Poisoning Attacks. With Nina Balcan, Steve Hanneke, and Dravyansh Sharma. COLT 2022. We analyze robustness guarantees achievable in the face of targeted poisoning attacks. We define and show how to provide robustly-reliable predictions, in which the predicted label is guaranteed to be correct so long as the adversary has not exceeded a given corruption budget and the true target function belongs to the class in question. Prior approaches had considered only certificates that the algorithm's prediction does not change, rather than certifying correctness. We provide a complete characterization of learnability in this setting, in particular, nearly-tight matching upper and lower bounds on the region that can be certified, as well as efficient algorithms for computing this region given an ERM oracle. Moreover, for the case of linear separators over logconcave distributions, we provide efficient truly polynomial time algorithms (i.e., non-oracle algorithms) for such robustly-reliable predictions. We also extend these results to the active setting where the algorithm adaptively asks for labels of specific informative examples, and the difficulty is that the adversary might even be adaptive to this interaction, as well as to the agnostic learning setting where there is no perfect classifier even over the uncorrupted data.
On Classification of Strategic Agents Who Can Both Game and Improve. With Saba Ahmadi, Hedyeh Beyhaghi, and Keziah Naggita. FORC 2022. We consider classification of agents who can both game and improve, such as loan applicants who may be able to take some actions that increase their perceived credit-worthiness and others that also increase their true credit-worthiness. A decision-maker would like to define a classification rule with few false positives (does not give out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve to become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each. For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partial-information learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finite-point version of our linear model. We also show that maximizing the number of true positives subject to no false positives is NP-hard in our full linear model and give additional results for low-dimensional data.
Multi Stage Screening: Enforcing Fairness and Maximizing Efficiency in a Pre-Existing Pipeline. With Kevin Stangl and Ali Vakilian. FAccT 2022. Consider an actor making selection decisions (e.g., hiring) using a series of classifiers. The early stages (e.g., resume screen, coding screen, phone interview) filter out some of the applicants, and then an expensive but accurate test (e.g., a full interview) is applied to those who make it to the final stage. We examine the goal of maximizing quantities of interest to the decision maker subject to requiring Equality of Opportunity (qualified members of each group have the same chance of being hired), via modification of the probabilities of promotion through the screening process at each stage based on performance at the previous stage. We exhibit algorithms for satisfying Equal Opportunity over the selection process and maximizing precision (the fraction of interviews that yield qualified candidates) as well as linear combinations of precision and recall at the end of the final stage. We also present examples showing that the solution space is non-convex, which motivate our exact combinatorial algorithms and FPTAS approximation algorithms for maximizing the linear combination of precision and recall. Finally, we discuss the "price of" additional restrictions, such as not allowing the decision-maker to use group membership in its decision process.
Stochastic Vertex Cover with Few Queries. With Soheil Behnezhad and Mahsa Derakhshan. SODA 2022. We study the minimum vertex cover problem in the following stochastic setting. Let G be an arbitrary given graph and p<1 a parameter of the problem, and let G_p be a random subgraph that includes each edge of G independently with probability p. We do not know the realization G_p, but can learn whether an edge e exists in G_p by querying it. The goal is to find an approximate minimum vertex cover of G_p by querying few edges of G non-adaptively. We present a (2+epsilon)-approximation for general graphs that queries O(1/(epsilon^3 p)) edges per vertex, a 1.367-approximation for bipartite graphs that queries poly(1/p) edges per vertex, and a (1+epsilon)-approximation for bipartite graphs with triple-exp(1/p) queries per vertex. Our techniques also lead to improved bounds for bipartite stochastic matching. We obtain a 0.731-approximation with a number of per-vertex queries nearly linear in 1/p, breaking the prevalent 2/3-approximation barrier in the poly(1/p) query regime and improving the algorithms of [Behnezhad et al., SODA'19] and [Assadi and Bernstein, SOSA'19].

2021

Excess Capacity and Backdoor Poisoning. With Naren Manoj. NeurIPS 2021. This work presents a theoretical framework for analyzing backdoor data poisoning attacks, in which an attacker aims to embed a "trigger" in training data that it can use later to produce specific desired misclassifications. We identify a parameter we call the memorization capacity that we show captures the vulnerability of a learning problem to such an attack in this model. We then use this to analyze several natural settings. We also show that under certain assumptions, adversarial training can detect the presence of backdoors in a training set, and also show a formal relationship between the problems of backdoor filtering and robust generalization.
Robust learning under clean-label attack. With Steve Hanneke, Jian Qian, and Han Shao. COLT 2021. We study the problem of robust learning under clean-label data-poisoning attacks, where the attacker injects an arbitrary set of correctly-labeled examples into the training set with the goal of getting the algorithm to make mistakes on specific test instances at test time. The learning goal is to minimize the attackable rate: the probability mass of test instances that the attacker can successfully attack. We show that any robust algorithm with diminishing attackable rate can achieve an optimal O(1/epsilon) dependence on epsilon in its PAC sample complexity. On the other hand, the attackable rate might be large even for some optimal PAC learners, e.g., SVM for linear classifiers. Furthermore, we show that the class of linear hypotheses is not robustly learnable when the data distribution has zero margin, and is robustly learnable in the case of positive margin but requires sample complexity exponential in the dimension. For a general hypothesis class with bounded VC dimension, if the attacker is limited to adding at most t>0 poison examples, the optimal robust learning sample complexity grows almost linearly with t.
The Strategic Perceptron. With Saba Ahmadi, Hedyeh Beyhaghi, and Keziah Naggita. ACM-EC 2021. The classic Perceptron algorithm is a simple and elegant learning procedure that makes a bounded number of mistakes when data is linearly separable with a nonzero margin. However, what if data points correspond to strategic agents who want to be classified as positive, and who can modify their positions by a limited amount without changing their true label? In this situation the observed position of data points will depend on the current classifier, and the Perceptron algorithm may fail to achieve its guarantees. Indeed, we illustrate examples where the predictor may oscillate between two solutions forever, making an unbounded number of mistakes even though a perfect large-margin linear classifier exists. Our main contribution is providing a modified Perceptron-style algorithm which makes a bounded number of mistakes in the presence of strategic agents with both ℓ_2 and weighted ℓ_1 manipulation costs. We also present several open problems in this model.
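For reference, here is a minimal sketch of the classic, non-strategic Perceptron that the strategic algorithm starts from; it is not the paper's modified algorithm. It learns a separator through the origin (no bias term, an assumption of this sketch) and, when the data are separated with margin gamma by a unit vector and every point has norm at most R, it makes at most (R/gamma)^2 mistakes. It does not model agents moving their points in response to the current classifier, which is exactly where the text notes the guarantee can break down.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def perceptron(data: Sequence[Tuple[Vector, int]], max_epochs: int = 100) -> Tuple[Vector, int]:
    """Classic Perceptron with labels in {-1, +1} and no bias term.

    Cycles over `data`; whenever y * (w . x) <= 0 it counts a mistake and
    updates w <- w + y * x. Stops after a full pass with no mistakes.
    Returns the final weight vector and the total number of mistakes.
    """
    w = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, y in data:
            if y * dot(w, x) <= 0:
                mistakes += 1
                clean_pass = False
                w = [wi + y * xi for wi, xi in zip(w, x)]
        if clean_pass:
            break
    return w, mistakes


data = [([2.0, 1.0], 1), ([1.0, 3.0], -1), ([3.0, 0.5], 1), ([0.5, 2.0], -1)]
print(perceptron(data))  # ([4.0, -3.0], 5)
```

On this toy data the unit vector (1, -1)/sqrt(2) separates the points with margin gamma = 1/sqrt(2) and R^2 = 10, so the mistake bound is 20; the run above makes 5 mistakes.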
Incentive-Compatible Kidney Exchange in a Slightly Semi-Random Model. With Paul Gölz. ACM-EC 2021. Motivated by kidney exchange, we study the following mechanism-design problem. Vertices of a directed graph are partitioned across multiple players (hospitals). Players report their vertices to the mechanism, which then aims to find a long path (a chain of transplantations) starting at a distinguished vertex (an altruistic donor). The challenge is that players want to maximize the number of their own vertices that lie on the path returned by the mechanism, and may strategically omit vertices to do so. Unfortunately, in worst-case instances, competing with the overall longest path is impossible while incentivizing (approximate) truthfulness. We therefore adopt a semi-random model where a small (o(n)) number of random edges are added to worst-case instances. We give a truthful mechanism for this setting that competes with the longest path whose subpaths within each player have a minimum average length. In fact, our mechanism satisfies an even stronger notion of truthfulness, which we call matching-time incentive compatibility.
One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning. With Nika Haghtalab, Richard Phillips, and Han Shao. ICML 2021. This paper introduces a framework for incentive-aware learning and data sharing in collaborative learning, with the goal of incentivizing agents to maintain their collaboration. Our stable and envy-free equilibria capture notions of collaboration in the presence of agents interested in meeting their learning objectives while keeping their own sample collection burden low. For example, in an envy-free equilibrium, no agent would wish to swap their sampling burden with any other agent and in a stable equilibrium, no agent would wish to unilaterally reduce their sampling burden. In addition to formalizing this framework, our contributions include characterizing the structural properties of such equilibria, proving when they exist, and showing how they can be computed. Furthermore, we compare the sample complexity of incentive-aware collaboration with that of optimal collaboration when one ignores agents' incentives.
Learning Complexity of Simulated Annealing. With Chen Dan and Saeed Seddighin. AISTATS 2021. The cooling schedule plays a crucial role in the performance of simulated annealing. Motivated by this, we study the following question: "Given samples of instances of an optimization problem drawn from some distribution D, can we design cooling schedules that minimize the runtime or maximize the success rate of simulated annealing with respect to D?" We provide positive results both in terms of sample complexity and simulation complexity. For sample complexity, we show that Õ(m^{1/2}) samples suffice to find an approximately optimal cooling schedule of length m, and complement this with a sample complexity lower bound of Ω̃(m^{1/3}). These results are general and rely on no assumptions. For simulation complexity, however, we make additional assumptions to measure the success rate of an algorithm. To this end, we introduce the monotone stationary graph that models the performance of simulated annealing. Based on this model, we present polynomial time algorithms with provable guarantees for the learning problem.
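To make "a cooling schedule of length m" concrete, here is a minimal sketch of a standard simulated-annealing run driven by a fixed list of m temperatures, one step per temperature. It does not learn the schedule; the cost function, the ±1 neighbor move, and the geometric schedule in the usage lines are illustrative assumptions, not taken from the paper.

```python
import math
import random


def simulated_annealing(cost, neighbor, start, schedule, rng):
    """Run one annealing step per temperature in `schedule` (a list of m values).

    Returns the best state seen and its cost.
    """
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for temperature in schedule:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept a worse move with prob. exp(-delta / T).
        if delta <= 0 or (temperature > 0 and rng.random() < math.exp(-delta / temperature)):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical instance: minimize (x - 3)^2 over the integers with +/-1 moves.
cost = lambda x: (x - 3) ** 2
step = lambda x, r: x + r.choice([-1, 1])
geometric_schedule = [10.0 * 0.9 ** k for k in range(200)]  # m = 200 temperatures
best, best_cost = simulated_annealing(cost, step, -20, geometric_schedule, random.Random(0))
```

In the learning question posed above, `geometric_schedule` is exactly the object one would tune from sampled instances; with an all-zero schedule the run degenerates to greedy local search.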
Communication-Aware Collaborative Learning. With Shelby Heinecke and Lev Reyzin. AAAI 2021. Algorithms for noiseless collaborative PAC learning have been analyzed and optimized in recent years with respect to sample complexity. In this paper, we study collaborative PAC learning with the goal of reducing communication cost at essentially no penalty to the sample complexity. We develop communication-efficient collaborative PAC learning algorithms using distributed boosting. We then consider the communication cost of collaborative learning in the presence of classification noise. As an intermediate step, we show how collaborative PAC learning algorithms can be adapted to handle classification noise. With this insight, we develop communication-efficient algorithms for collaborative PAC learning robust to classification noise.

2020

Online Learning with Primary and Secondary Losses. With Han Shao. NeurIPS 2020. We consider a form of "combining expert advice" where at each round there are two loss vectors: a primary loss and a secondary loss. Our goal is to perform nearly as well as the best expert with respect to the primary loss, while at the same time achieving the minimal guarantee of performing not too much worse than the worst expert with respect to the secondary loss. E.g., imagine a recruiter deciding which job applicants to hire, where the company gives higher weight to false-positives than false-negatives (the primary loss) but society might weight these errors more equally (the secondary loss). Unfortunately, we show that this goal (and more generally the goal of low regret with respect to the primary loss while bounding the secondary loss by a linear threshold) is unachievable without any bounded-variance assumption on the secondary loss. However, on the positive side, we show that running any switching-limited algorithm can achieve this goal if all experts satisfy the assumption that the secondary loss does not exceed the linear threshold by o(T) for any time interval.
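As background on the single-loss "combining expert advice" setting, a minimal sketch of the standard multiplicative-weights (Hedge) learner follows. It tracks only one loss vector per round and does not address the secondary-loss constraint studied in the paper; the learning rate `eta=0.5` and the toy loss sequence are assumptions for illustration.

```python
import math


def hedge(loss_rounds, eta=0.5):
    """Standard Hedge over n experts.

    loss_rounds: list of per-round loss vectors with entries in [0, 1].
    Returns the learner's total expected loss and the final (unnormalized) weights.
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]  # play experts in proportion to weight
        total += sum(p * l for p, l in zip(probs, losses))  # expected loss this round
        # Exponentially down-weight each expert by its loss this round.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total, weights


# Toy run: expert 0 never errs, expert 1 always does, for T = 50 rounds.
total, final_weights = hedge([[0.0, 1.0] for _ in range(50)])
```

Here the best expert has total loss 0, and the learner's total stays small (well under 2) because weight shifts quickly to expert 0; the paper asks what happens when a second loss vector must also be kept in check.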
Random Smoothing Might be Unable to Certify L_infinity Robustness for High-Dimensional Images. With Travis Dick, Naren Manoj, and Hongyang Zhang. JMLR 21(211):1-21, 2020. We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the L_p ball of radius epsilon when p>2. Although random smoothing has been well understood for the L_2 case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for p>2. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the L_infinity threat model. In this work, we show that any noise distribution D over R^d that provides L_p robustness for all base classifiers with p>2 must satisfy E[eta_i^2] = Omega(d^{1-2/p} epsilon^2 (1-delta)/delta^2) for 99% of the features (pixels) of vector eta ~ D, where epsilon is the robust radius and delta is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in [0,255], the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
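To illustrate what a randomly smoothed classifier and its score gap delta are, here is a minimal Monte Carlo sketch using isotropic Gaussian noise (the well-understood L_2 case mentioned above). It only estimates the smoothed prediction and the empirical gap; it does not compute any certificate. The base classifier, noise level `sigma`, and sample count are hypothetical choices.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Monte Carlo estimate of a Gaussian-smoothed classifier at point x.

    Returns the top class and the empirical score gap between the top class
    and the runner-up (the fraction of noisy samples each receives).
    """
    counts = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]  # x + eta, eta ~ N(0, sigma^2 I)
        counts[base_classifier(noisy)] += 1
    ranked = counts.most_common(2)
    top_class, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    return top_class, (top_count - runner_up_count) / n_samples


# Hypothetical base classifier: label 1 iff the first coordinate is positive.
base = lambda v: 1 if v[0] > 0 else 0
cls, gap = smoothed_predict(base, [5.0, 0.0], 0.5, 1000, random.Random(0))
```

The result above says that for p>2, any noise distribution giving such guarantees must have per-coordinate variance growing with the dimension d, so in high dimensions the noise added here would swamp the signal in x.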
Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? With Kevin Stangl. FORC 2020. Multiple fairness constraints have been proposed in the literature, motivated by concerns about ways demographic groups might be treated unfairly by learned classifiers. In this work we consider a different motivation: learning from biased training data. We posit several ways training data may be biased, including having a more noisy or negatively-biased labeling process for members of a disadvantaged group, or a decreased prevalence of positive or negative examples from the disadvantaged group, or both. Given such biased training data, Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution. We examine the ability of fairness-constrained ERM to correct this problem. In particular, we find that the Equal Opportunity fairness constraint (Hardt, Price, and Srebro 2016) combined with ERM will provably recover the Bayes Optimal Classifier under a range of bias models. In contrast, requiring calibration will make ERM perform worse. We also consider other approaches including reweighting training data, Equalized Odds, and Demographic Parity. These theoretical results provide additional motivation for considering fairness interventions even if an actor cares primarily about accuracy.
Active Local Learning. With Arturs Backurs and Neha Gupta. COLT 2020. In this work we consider active local learning: given a query point x, and active access to an unlabeled training set S, output the prediction h(x) of a near-optimal h in H using significantly fewer labels than would be needed to actually learn h fully. In particular, the number of label queries should be independent of the complexity of H, and the function h should be well-defined, independent of x. This immediately also implies an algorithm for distance estimation: estimating the value opt(H) from many fewer labels than needed to actually learn a near-optimal h in H, by running local learning on a few random query points and computing the average error. We give algorithms for several types of functions including Lipschitz-bounded functions and approximating the minimum error that can be achieved by the Nadaraya-Watson estimator under a linear diagonal transformation with eigenvalues coming from a small range.
Advancing Subgroup Fairness via Sleeping Experts. With Thodoris Lykouris. ITCS 2020. We show how the sleeping experts framework can be used to provide meaningful fairness and quality guarantees when making decisions on individuals that belong to a variety of groups that may overlap. Our approach is based on a "best effort" notion of fairness, where for each group we wish to perform nearly as well as the best decision rule for that group from a given class. We also show that the ability to achieve such guarantees is sensitive to the precise notion of "performing well", showing that certain notions can be intrinsically unachievable.
Foundations of Data Science. With John Hopcroft and Ravi Kannan. Cambridge University Press. Free pre-publication version (for personal use only; not for re-distribution, resale, or use in derivative works). Textbook on core mathematical concepts and algorithms for data science, including properties of high-dimensional spaces, singular value decomposition, random walks and Markov chains, fundamentals of machine learning, streaming algorithms, clustering, properties of random graphs, and other topics.
Approximation Stability and Proxy Objectives. Chapter 6 in Beyond the Worst-Case Analysis of Algorithms, T. Roughgarden, editor. Cambridge University Press. Survey article on approximation stability.
Technical perspective: Algorithm selection as a learning problem. Communications of the ACM 63(6):86, 2020. A brief discussion of and perspective on recent work analyzing algorithm selection and design as a learning problem.

2019

Computing Stackelberg Equilibria of Large General-Sum Games. With Nika Haghtalab, MohammadTaghi Hajiaghayi, and Saeed Seddighin. SAGT 2019. In this work we study the computational complexity of finding Stackelberg Equilibria in general-sum games, where the sets of pure strategies of the leader and the followers are exponentially large in a natural representation of the problem. In zero-sum games, the notion of a Stackelberg equilibrium coincides with the notion of a Nash Equilibrium. Finding these equilibrium concepts in zero-sum games can be done efficiently not only when the players have polynomially many pure strategies but also when (in addition to some structural properties) a best-response oracle is available. In this work, we show that while there are natural large general-sum games where the Stackelberg Equilibria can be computed efficiently if the Nash equilibrium in its zero-sum form could be computed efficiently, in general, structural properties that allow for efficient computation of Nash equilibrium in zero-sum games are not sufficient for computing Stackelberg equilibria in general-sum games.
Bilu-Linial Stability, Certified Algorithms and the Independent Set Problem. With Haris Angelidakis, Pranjal Awasthi, Vaggos Chatziafratis, and Chen Dan. ESA 2019. We consider the classic Max Weighted Independent Set problem on perturbation-stable instances: instances such that the optimal solution does not change even if node weights are perturbed by at most some given factor gamma. We give algorithms to find the optimal solution on graphs of degree at most Δ that are Õ(Δ/sqrt(log Δ))-stable, k-colorable graphs that are (k - 1)-stable, and planar graphs that are (1 + epsilon)-stable, using both combinatorial techniques as well as LPs and the Sherali-Adams hierarchy. For general graphs, we also present an O(n^(1/2 - epsilon)) lower bound, assuming the planted clique conjecture. As a by-product of our techniques, we also give algorithms and lower bounds for stable instances of Node Multiway Cut. We also initiate the study of certified algorithms for Independent Set: these are algorithms that on any instance I (stable or not) produce a bounded perturbation I' of I along with a solution that is optimal for I' (so if I was stable, then this will be optimal for I also). This notion was recently introduced by Makarychev and Makarychev (2018). Here, we obtain Δ-certified algorithms for Independent Set on graphs of maximum degree Δ, and (1+epsilon)-certified algorithms on planar graphs. Finally, we analyze the algorithm of Berman and Fürer (1994) and prove that it is a ((Δ + 1)/3 + epsilon)-certified algorithm for Independent Set on graphs of maximum degree Δ where all weights are equal to 1.
Optimal Strategies of Blotto Games: Beyond Convexity. With Soheil Behnezhad, Mahsa Derakhshan, Mohammad Taghi Hajiaghayi, Christos H. Papadimitriou, and Saeed Seddighin. ACM-EC 2019. Colonel Blotto is a classic game first introduced by Borel in 1921. Two colonels each have a pool of troops that they divide simultaneously among a set of battlefields. The winner of each battlefield is the colonel who puts more troops in it and the overall utility of each colonel is the sum of weights of the battlefields that s/he wins. The two main objectives that have been proposed for this game are (i) maximizing expected payoff, and (ii) maximizing the probability of obtaining a minimum payoff u (corresponding to settings such as winning electoral votes). In this paper, we consider both objectives and show how it is possible to obtain (almost) optimal solutions that have few strategies in their support. One of the main technical challenges with bounded support strategies is that the solution space becomes non-convex. However, we show through a set of structural results that the solution space can, interestingly, be partitioned into polynomially many disjoint convex polytopes that can be considered independently. Coupled with a number of other combinatorial observations, this leads to polynomial time approximation schemes for both of the aforementioned objectives. We also provide the first complexity result for finding the maximin of Blotto-like games: we show that computing the maximin of a generalization of the Colonel Blotto game that we call General Colonel Blotto is exponential time-complete.

2018

On preserving non-discrimination when combining expert advice. With Suriya Gunasekar, Thodoris Lykouris, and Nati Srebro. NeurIPS 2018. This work considers the question: Can we combine "fair" expert advice fairly? Specifically, given a set of predictors that are individually "fair" with respect to a particular fairness condition, can we combine them to (a) perform nearly as well as the best of them in hindsight (i.e., achieve low regret), while simultaneously (b) at least approximately maintaining the given fairness condition? This would be fairly trivial in the iid setting (just observe a small sample of data and then select the predictor that performed best on the sample). However, we show that this is not in general achievable in the non-iid online setting for fairness notions of "equalized odds" and "equality of opportunity". On the positive side, we show that this can be achieved for a fairness notion of "equalized error rates".
Diversified Strategies for Mitigating Adversarial Attacks in Multiagent Systems. With Nina Balcan and Shang-Tse Chen. AAMAS 2018. We consider online decision-making in settings where players want to guard against possible adversarial attacks or other catastrophic failures. To address this, we propose a solution concept in which players have an additional constraint that at each time step they must play a diversified mixed strategy: one that does not put too much weight on any one action. We explore properties of diversified strategies in both zero-sum and general-sum games, and provide algorithms for minimizing regret within the family of diversified strategies as well as methods for using taxes or fees to guide standard regret-minimizing players towards diversified strategies. In general-sum games, requiring diversification can actually lead to higher-welfare equilibria, and we give guarantees on both price of anarchy and the social welfare produced by regret-minimizing diversified agents.
Active Tolerant Testing. With Lunjia Hu. COLT 2018. We show that for a nontrivial hypothesis class C, one can estimate the error rate of the best h in C using substantially fewer labeled examples than would be needed to actually learn a good h in C. Specifically, we show that for unions of d intervals on the line, in the agnostic active setting (the target is arbitrary and we can query labels from a pool of examples drawn from the underlying distribution), we can estimate the error rate of the best h in C to +/- epsilon with a number of label requests that is independent of d and depends only on epsilon. Our work extends that of [BBBY12] who solved the non-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than epsilon). We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled and given as training data to A? We focus on k-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem).
Approximate Convex Hull of Data Streams. With Vladimir Braverman, Ananya Kumar, Harry Lang, and Lin F. Yang. ICALP 2018. Given a set of points P in R^d and a tolerance parameter epsilon, let OPT denote the size of the smallest subset S of P such that every point in P is either in or within distance epsilon of the convex hull of S. We consider the problem of maintaining such a small epsilon-hull in a streaming model in which the points of P arrive one at a time and we would like to use space not much larger than OPT. We begin with lower bounds that show, under a reasonable model, that it is not possible to have a single-pass streaming algorithm that computes an epsilon-hull with O(OPT) space. We instead propose three relaxations of the problem for which we can compute epsilon-hulls using space near-linear in the optimal size. Our first algorithm, for points in R^2 that arrive in random order, uses O(log n * OPT) space. Our second algorithm, for points in R^2, makes O(log(epsilon^{-1})) passes before outputting the epsilon-hull and requires O(OPT) space. Our third algorithm, for points in R^d for any fixed dimension d, outputs, with high probability, an epsilon-hull for all but a delta-fraction of directions and requires O(OPT * log OPT) space.
Algorithms for Generalized Topic Modeling. With Nika Haghtalab. AAAI 2018. In this work we consider a generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. More generally, our model can be viewed as a generalization of co-training and multi-view learning to settings where items have fractional labels.
On Price versus Quality. With Yishay Mansour. ITCS 2018. This work proposes a model where the value of a buyer for some product (like a slice of pizza) is a combination of their personal desire for the product (how hungry they are for pizza) and the quality of the product (how good the pizza is). Sellers in this setting have a two-dimensional optimization problem of determining both the quality level at which to make their product (how expensive ingredients to use) and the price at which to sell it. We analyze optimal seller strategies as well as analogs of Walrasian equilibria in this setting. A key question we are interested in is: to what extent will the price of a good be a reliable indicator of the good's quality? One result we show is that while the specific quality and price that an optimal seller should choose in this model will depend highly on the specific distribution of buyers, price and quality will always be linearly related, regardless of that distribution. We also show that for the case of multiple buyers and sellers, an analog of Walrasian equilibrium exists in this setting, and can be found via a natural tatonnement process. Finally, we analyze markets with a combination of "locals" (who know the quality of each good) and "tourists" (who do not).
From Battlefields to Elections: Winning Strategies of Blotto and Auditing Games. With Soheil Behnezhad, Mahsa Derakhshan, Mohammad Taghi Hajiaghayi, Mohammad Mahdian, Christos H. Papadimitriou, Ronald L. Rivest, Saeed Seddighin, and Philip B. Stark. SODA 2018. Mixed strategies are typically evaluated based on the expected payoff that they guarantee. In this paper, we consider settings where instead the goal is to receive utility at least u with probability at least p. The first game that we consider is Colonel Blotto, a well-studied game that was introduced in 1921. In the Colonel Blotto game, two colonels divide their troops among a set of battlefields. Each battlefield is won by the colonel that puts more troops in it. The payoff of each colonel is the weighted number of battlefields that she wins, and we consider a version where each player wants to maximize the probability that they win at least half of the total weight. We give approximation algorithms as well as exact algorithms for special cases of this problem.
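The Blotto payoff rule described above (each battlefield goes to the colonel who puts more troops in it, and a colonel's payoff is the total weight of the battlefields won) can be computed directly for one pair of pure strategies. The abstract does not specify tie-breaking, so this sketch assumes a tied battlefield goes to neither colonel; the function names and the example allocation are hypothetical.

```python
def blotto_payoffs(troops_a, troops_b, weights):
    """Return (payoff_a, payoff_b): the total weight of battlefields each colonel wins.

    Assumption: a battlefield with equal troops is awarded to neither colonel.
    """
    payoff_a = sum(w for a, b, w in zip(troops_a, troops_b, weights) if a > b)
    payoff_b = sum(w for a, b, w in zip(troops_a, troops_b, weights) if b > a)
    return payoff_a, payoff_b


def wins_majority(payoff, weights):
    """Threshold objective from the abstract: win at least half of the total weight."""
    return 2 * payoff >= sum(weights)


weights = [3, 2, 1]
pa, pb = blotto_payoffs([5, 0, 1], [2, 3, 1], weights)
```

Here colonel A takes the weight-3 battlefield, B takes the weight-2 one, and the third is tied, so A meets the "at least half" threshold (3 of 6) while B does not. For mixed strategies, the paper's objective is the probability of meeting this threshold rather than the expected payoff.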

2017

Collaborative PAC Learning. With Nika Haghtalab, Ariel Procaccia, and Mingda Qiao. NIPS 2017. We consider a collaborative PAC learning model, in which k players attempt to learn the same underlying concept. We ask how much more information is required to learn an accurate classifier for all players simultaneously. We refer to the ratio between the sample-complexity of collaborative PAC learning and the sample-complexity of single-player PAC learning as the overhead. We design learning algorithms with only O(ln k) and O((ln k)^2) overhead respectively in the personalized and centralized variants of our model. In contrast, not sharing information among players would incur overhead O(k). We complement our upper bounds with an Omega(ln k) lower bound, showing that our results are tight up to a logarithmic factor.
Lifelong Learning in Costly Feature Spaces. With Nina Balcan and Vaishnavh Nagarajan. ALT 2017. Journal version in Theoretical Computer Science 808:14-37, 2020 (Special Issue on Algorithmic Learning Theory). In this work we study lifelong and representation learning for structured target functions including decision trees and polynomials. More specifically, we consider learning a series of target functions, where each can be represented as, say, a decision tree, and all these decision trees share pieces in common that we call "meta-features". As we learn target functions, our aim is to learn these meta-features in a way that allows us to learn new target functions more cheaply. We specifically focus on reducing the number of feature evaluations that need to be performed in the learning process, motivated by applications such as medical diagnosis where decision trees are popular in part due to their ability to make predictions based on a small number of features of any given example.
Efficient Co-Training of Linear Separators under Weak Dependence. With Yishay Mansour. COLT 2017. We develop the first polynomial-time algorithm for co-training of homogeneous linear separators under weak dependence, a relaxation of the condition of independence given the label. Our algorithm learns from purely unlabeled data, except for a single labeled example to break symmetry of the two classes, and works for any data distribution having an inverse-polynomial margin and with center of mass at the origin.
Efficient PAC Learning from the Crowd. With Pranjal Awasthi, Nika Haghtalab, and Yishay Mansour. COLT 2017. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier, which can lead to computational and statistical inefficiencies. For example, learning from poorly-labeled data can be computationally hard, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider a setting where a certain alpha fraction of labelers actually know the true target function and label data correctly, whereas the remaining 1-alpha fraction have other arbitrary functions in mind. When alpha is greater than 1/2, we show that any class that can be efficiently PAC-learned from non-noisy data can still be efficiently learned in this setting with only a constant-factor blowup in total number of labels requested. When alpha is less than 1/2, we can do this with only an additional constant number of queries to a known-correct labeling oracle. All algorithms require asking any given labeler only O(1) labeling questions.
Opting into Optimal Matchings. With Ioannis Caragiannis, Nika Haghtalab, Ariel Procaccia, Eviatar Procaccia, and Rohit Vaish. SODA 2017, pages 2351-2363. We consider the design of optimal, individually rational matching mechanisms (in a general sense, allowing for cycles in directed graphs). In particular, each player, who is associated with a subset of vertices, should be guaranteed to match at least as many of his own vertices when he opts into the matching mechanism as when he opts out. We offer a new perspective on this problem by considering an arbitrary graph, but assuming that vertices are associated with players at random. Our main result is that under certain conditions, any fixed optimal matching is likely to be individually rational up to lower-order terms. We also show that a simple and practical mechanism is (fully) individually rational, and likely to be optimal up to lower-order terms. We discuss the implications of our results for market design in general, and kidney exchange in particular.

2016

Sparse Approximation via Generating Point Sets. With Sariel Har-Peled and Benjamin Raichel. SODA 2016. ACM Transactions on Algorithms (TALG), Volume 15 Issue 3, July 2019. We consider the following problem: given a collection P of n objects (represented as points in the unit ball in R^d), find a small subset T of P such that each object in P is close to a sparse convex combination of objects in T. E.g., if we allow T=P then this is trivial with sparsity 1, and for some sets P (e.g., random points in the unit ball) there is no such small set T. Let k_opt = k_opt(P,epsilon) be the size of the smallest subset T_opt such that every point in P is within distance epsilon of the convex hull of T_opt. Our goal is to find a set T of k_alg points and distance epsilon_alg such that every point in P is within distance epsilon_alg to a sparse combination of points in T, where k_alg and epsilon_alg are not too much larger than k_opt and epsilon_opt. We give several efficient algorithms with different guarantees of this form.
On the Computational Hardness of Manipulating Pairwise Voting Rules. With Rohit Vaish, Neeldhara Misra, and Shivani Agarwal. AAMAS 2016, pages 358-367. In this work we study the computational tractability of manipulating voting rules when the input is a collection of incomplete pairwise preferences. We show that in this scenario, manipulation can be computationally hard even for a single manipulator.
Semi-Supervised Learning. Entry in the Encyclopedia of Algorithms, pages 1936-1941, 2016.

2015

Efficient Representations for Lifelong Learning and Autoencoding. With Nina Balcan and Santosh Vempala. COLT 2015. In this work we pose and provide efficient algorithms for several natural theoretical formulations of lifelong learning. Specifically, we consider the problem of learning many different target functions over time that share certain commonalities that are initially unknown to the learning algorithm. Our aim is to learn, as the algorithm learns new target functions, new internal representations that capture this commonality and allow subsequent learning tasks to be solved more efficiently and from less data. We develop efficient algorithms for two very different kinds of commonalities that target functions might share: one based on learning common low-dimensional subspaces and unions of low-dimensional subspaces, and one based on learning nonlinear Boolean combinations of features. Our algorithms for learning Boolean feature combinations additionally have a dual interpretation, and can be viewed as giving an efficient procedure for constructing near-optimal sparse Boolean autoencoders under a natural "anchor-set" assumption.
The Ladder: A Reliable Leaderboard for Machine Learning Competitions. With Moritz Hardt. ICML 2015. We consider the problem of maintaining an accurate leaderboard for a machine learning competition that faithfully represents the quality of the best submission of each competing team. What makes this challenging is that participants may repeatedly evaluate their submissions on the leaderboard, and in the process overfit to the holdout data that supports it. Moreover, we (the organizers) cannot control the capacity or complexity of the rules participants are using. In this work, we introduce a notion of leaderboard accuracy tailored to the format of a competition. We introduce a natural algorithm called the Ladder and demonstrate that it simultaneously supports strong theoretical guarantees in a fully adaptive model of estimation, withstands practical adversarial attacks, and achieves high utility on real submission files from a Kaggle competition. Notably, we are able to sidestep a powerful recent hardness result for adaptive risk estimation that rules out algorithms such as ours under a seemingly very similar notion of accuracy. On a practical note, we provide a parameter-free variant of our algorithm that can be easily deployed.
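The abstract does not state the Ladder's update rule. A minimal sketch of a fixed-step leaderboard rule in the same spirit follows: publish a new score only when a submission's holdout loss beats the currently published score by more than a step size `eta`, and round the published value to a multiple of `eta`. This rule is an assumption made for illustration, not a verbatim statement of the paper's algorithm, and it omits the parameter-free variant mentioned above; `ladder_leaderboard` and the loss values are hypothetical.

```python
import math


def ladder_leaderboard(holdout_losses, eta):
    """Fixed-step leaderboard sketch (assumed rule, for illustration only).

    holdout_losses: holdout loss of each submission, in arrival order.
    Returns the published leaderboard score after each submission.
    """
    published = math.inf
    history = []
    for loss in holdout_losses:
        if loss < published - eta:
            # Only a clear improvement changes the board; rounding to a multiple
            # of eta limits how much the board reveals about the holdout set.
            published = round(loss / eta) * eta
        history.append(published)
    return history


board = ladder_leaderboard([0.40, 0.395, 0.30, 0.31, 0.25], eta=0.01)
```

Small fluctuations such as 0.395 after 0.40 leave the board unchanged, which is the intuition for why repeatedly probing the leaderboard yields little information to overfit to.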
Learning What's Going On: Reconstructing Preferences and Priorities from Opaque Transactions. With Yishay Mansour and Jamie Morgenstern. ACM-EC 2015. Journal version in ACM Transactions on Economics and Computation (TEAC) 6(3-4): 13:1-13:20, 2018 (Special Issue for EC 2015). We consider a setting where n buyers, with combinatorial preferences over m items, and a seller, running a priority-based allocation mechanism, repeatedly interact. Our goal, from observing limited information about the results of these interactions, is to reconstruct both the preferences of the buyers and the mechanism of the seller. More specifically, we consider an online setting where at each stage, a subset of the buyers arrive and are allocated items, according to some unknown priority that the seller has among the buyers. Our learning algorithm observes only which buyers arrive and the allocation produced (or some function of the allocation, such as just which buyers received positive utility and which did not), and its goal is to predict the outcome for future subsets of buyers. We derive mistake bound algorithms for additive, unit-demand and single-minded buyers. We also consider the case where buyers' utilities for a fixed bundle can change between stages due to different (observed) prices.
Ignorance is Almost Bliss: Near-Optimal Stochastic Matching With Few Queries. With John P. Dickerson, Nika Haghtalab, Ariel D. Procaccia, Tuomas Sandholm, and Ankit Sharma. ACM-EC 2015. Journal version in Operations Research 68(1):16-34, 2020. We consider the problem of finding a maximum matching in a graph whose edges are unknown but can be accessed via queries. More specifically, we are given an initial graph G, where each edge may be "faulty" (succeeding or failing independently with success probability p) and we have the ability to query edges to determine which have succeeded or failed. We give algorithms that, from a limited number of queries and a limited number of rounds of queries, can find a matching nearly as large as the true maximum matching in the graph of live edges. Our motivation comes from the problem of kidney exchange, and we also empirically explore the application of (adaptations of) these algorithms to the kidney exchange problem, where patients with end-stage renal failure swap willing but incompatible donors. We show on both generated data and on real data from the first 169 match runs of the UNOS nationwide kidney exchange that even a very small number of non-adaptive edge queries per vertex results in large gains in expected successful matches.
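To make the non-adaptive query idea concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: the graph is bipartite (real kidney-exchange graphs are general), the query set is the union of a few successive maximum matchings (so each vertex is queried at most `rounds` times), and each queried edge is live independently with probability `p`. All function names are hypothetical.

```python
import random


def max_bipartite_matching(left, edges):
    """Kuhn's augmenting-path algorithm; edges maps each left vertex to its right neighbours."""
    match_right = {}  # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in edges.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return {(u, v) for v, u in match_right.items()}


def nonadaptive_queries(left, edges, rounds):
    """Union of `rounds` maximum matchings, each computed on edges not chosen so far."""
    remaining = {u: set(vs) for u, vs in edges.items()}
    queried = set()
    for _ in range(rounds):
        m = max_bipartite_matching(left, remaining)
        if not m:
            break
        queried |= m
        for u, v in m:
            remaining[u].discard(v)
    return queried


def realized_matching_size(left, edges, rounds, p, rng):
    """Query the chosen edges, keep the live ones, and match on those only."""
    live = {}
    for u, v in nonadaptive_queries(left, edges, rounds):
        if rng.random() < p:
            live.setdefault(u, set()).add(v)
    return len(max_bipartite_matching(left, live))
```

Averaging `realized_matching_size` over many seeds and comparing it with the maximum matching on all live edges gives a rough empirical feel for how few queries per vertex are needed.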
Commitment Without Regrets: Online Learning in Stackelberg Security Games. With Nina Balcan, Nika Haghtalab, and Ariel D. Procaccia. ACM-EC 2015. In a Stackelberg Security Game, a defender commits to a randomized deployment of security resources, and an attacker best-responds by attacking a target that maximizes his utility. Here, we consider the case that there are k different types of attackers (each type has its own payoff matrix) and that a series of attackers of these types are arriving over time. After each attacker arrives, the defender receives some feedback (observing either the current attacker type or merely which target was attacked). We design no-regret algorithms whose regret (when compared to the best fixed strategy in hindsight) is polynomial in the parameters of the game, and sublinear in the number of time steps.
Privacy-preserving Public Information in Sequential Games. With Jamie Morgenstern, Ankit Sharma, and Adam Smith. ITCS 2015. We consider settings where competitors for limited resources want to maintain privacy of their actions and yet also coordinate so as to not all chase the same resources and end up with low overall social welfare. We consider a sequential-move setting and explore whether "noisy" information about the current state can be publicly announced in a manner that both (a) provably maintains privacy and (b) suffices to keep play from reaching bad game-states. We show that in many games of interest, this is indeed possible. We model behavior of players in this imperfect information setting in two ways, greedy and undominated strategic behavior, and we prove guarantees on social welfare that certain kinds of privacy-preserving information can help attain. Furthermore, we design a counter with improved privacy guarantees under continual observation.
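For background on counting under continual observation, the sketch below implements the standard tree-based "binary mechanism" (Chan, Shi and Song; Dwork et al.), a common baseline for this task. It is not the improved counter from this paper. The function names are hypothetical, and Laplace noise is drawn as the difference of two exponentials because Python's `random` module has no Laplace sampler.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def binary_mechanism(stream, epsilon, rng=None):
    """Release a noisy running count after each element of a 0/1 stream."""
    rng = rng or random.Random()
    T = len(stream)
    levels = max(1, math.ceil(math.log2(T + 1)))  # number of bit positions needed for t <= T
    scale = levels / epsilon      # each item enters at most `levels` partial sums
    alpha = [0.0] * levels        # exact partial sums ("p-sums"), one per bit position
    noisy = [0.0] * levels        # their noisy counterparts
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1          # position of the lowest set bit of t
        alpha[i] = x + sum(alpha[:i])          # merge the lower p-sums with the new item
        for j in range(i):
            alpha[j] = noisy[j] = 0.0          # lower p-sums are no longer needed
        noisy[i] = alpha[i] + laplace(scale, rng)
        # The count up to time t is the sum of the noisy p-sums for the set bits of t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs
```

Each released count adds up at most about log2(T) noisy terms, so the error grows only polylogarithmically in the stream length rather than linearly.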
Learning Valuation Distributions from Partial Observation. With Yishay Mansour and Jamie Morgenstern. AAAI 2015. Auction theory traditionally assumes that bidders' valuation distributions are known to the auctioneer, such as in the revenue-optimal Myerson auction. However, this theory does not describe how the auctioneer comes to possess this information. In this work, we consider the problem of learning bidders' valuation distributions from much weaker forms of observations. Specifically, we consider a setting where there is a repeated sealed-bid auction, where all we can observe for each round is who won, but not how much they bid or paid. We can also participate (i.e., submit a bid) ourselves, and observe when we win. From this information, our goal is to (approximately) recover the inherently recoverable part of the underlying bid distributions for each bidder. We also consider extensions where different subsets of bidders participate in each round, and where bidders' valuations have a common-value component added to their independent private values.
Online Allocation and Pricing with Economies of Scale. With Yishay Mansour and Liu Yang. WINE 2015. We consider the problem of online allocation of goods that have a decreasing marginal cost per item to the seller, when customers are unit-demand and arrive one at a time, each with a valuation function on items sampled iid from some unknown distribution over valuation functions. Our strategy operates by using an initial sample to learn enough about the distribution to determine how best to allocate to future customers, together with an analysis of structural properties of optimal solutions that allow for uniform convergence analysis. We show, for instance, if customers have {0,1} valuations over items, and the goal of the allocator is to give each customer an item he or she values, we can efficiently produce such an allocation with cost at most a constant factor greater than the minimum over such allocations in hindsight, so long as the marginal costs do not decrease too rapidly. We also give a bicriteria approximation to social welfare for the case of more general valuation functions when the allocator is budget constrained.

2014

Active Learning and Best-Response Dynamics. With Nina Balcan, Chris Berlind, Emma Cohen, Kaushik Patnaik, and Le Song. Proc. 27th Annual Conference on Neural Information Processing Systems (NIPS) 2014. We examine a setting where low-power distributed sensors are each making highly noisy measurements of some unknown target function. A center wants to accurately learn this function by querying a small number of sensors, which ordinarily would be impossible due to the high noise rate. The question we address is whether local communication among sensors, together with natural best-response dynamics in an appropriately-defined game, can denoise the system without destroying the true signal and allow the center to succeed from only a small number of active queries. By using techniques from game theory and empirical processes, we prove positive (and negative) results on the denoising power of several natural dynamics. We then show experimentally that when combined with recent agnostic active learning algorithms, this process can achieve low error from very few queries, performing substantially better than active or passive learning without these denoising dynamics as well as passive learning with denoising.
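One natural denoising dynamic of the kind described above is synchronous local majority: each sensor adopts the majority label among sensors within a fixed radius. The sketch below is only illustrative. The one-dimensional positions, the radius, counting a sensor as its own neighbour, and keeping the current label on ties are all our assumptions, and the paper analyzes several dynamics under its own conditions.

```python
def synchronous_majority(positions, labels, radius, rounds=1):
    """Every sensor simultaneously adopts the majority (+1/-1) label within `radius`."""
    labels = list(labels)
    n = len(positions)
    for _ in range(rounds):
        new = []
        for i in range(n):
            # Sum of labels of all sensors (including i) within the radius.
            total = sum(labels[j] for j in range(n)
                        if abs(positions[j] - positions[i]) <= radius)
            # Keep the current label on a tie; otherwise follow the majority.
            new.append(labels[i] if total == 0 else (1 if total > 0 else -1))
        labels = new
    return labels
```

For example, with ten sensors on a line whose true labels switch from -1 to +1 halfway, isolated flipped labels are corrected after one round while the true boundary is preserved.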
Learning Mixtures of Ranking Models. With Pranjal Awasthi, Or Sheffet, and Aravindan Vijayaraghavan. Proc. 27th Annual Conference on Neural Information Processing Systems (NIPS) 2014. This work concerns the problem of learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a Mallows Mixture Model. Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima. We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-k prefix of both rankings. Before this work, even the question of identifiability in the case of a mixture of two Mallows models was unresolved.
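For readers unfamiliar with the Mallows model, the sketch below draws a ranking from a single Mallows model using the well-known repeated insertion method. It only illustrates the generative model being learned and is not the paper's learning algorithm. A sample from a mixture of two models is obtained by first choosing a component at random. The function name and the parameterization by a dispersion `phi` in [0, 1] are our choices.

```python
import random


def sample_mallows(center, phi, rng=None):
    """Sample a ranking from a Mallows model with central ranking `center` and dispersion `phi`.

    The probability of a ranking is proportional to phi ** (number of pairwise inversions
    relative to `center`); phi = 0 always returns `center`, phi = 1 is uniform.
    """
    rng = rng or random.Random()
    ranking = []
    for i, item in enumerate(center, start=1):
        # Inserting the i-th item at position j places it below i-1-j earlier items,
        # creating exactly i-1-j inversions, so its weight is phi ** (i-1-j).
        weights = [phi ** (i - 1 - j) for j in range(i)]
        j = rng.choices(range(i), weights=weights)[0]
        ranking.insert(j, item)
    return ranking
```

Smaller values of `phi` concentrate samples near the central ranking, which is why the top of each component's ranking carries most of the signal.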
Learning Optimal Commitment to Overcome Insecurity. With Nika Haghtalab and Ariel Procaccia. Proc. 27th Annual Conference on Neural Information Processing Systems (NIPS) 2014. Algorithms for Stackelberg security games compute an optimal strategy for the defender to commit to under the assumption that the attacker will best-respond. Doing so generally requires knowledge of what the attacker's payoffs are. In this work, we design an algorithm that optimizes the defender's strategy with no prior information, by observing the attacker's responses to randomized deployments of resources and learning his priorities. In contrast to previous work, our algorithm requires a number of queries that is polynomial in the representation of the game.
Lazy Defenders Are Almost Optimal Against Diligent Attackers. With Nika Haghtalab and Ariel Procaccia. Proc. 28th AAAI Conference on Artificial Intelligence (AAAI), 2014.
Most work on Stackelberg security games assumes that the attacker can perfectly observe (and therefore will optimally respond to) the defender's randomized assignment of resources to targets. This assumption has been challenged by recent papers, which designed tailor-made algorithms that compute optimal defender strategies for security games with limited surveillance. We analytically demonstrate that in zero-sum security games, lazy defenders, who simply keep optimizing against perfectly informed attackers, are almost optimal against a wide range of attackers with more limited information. This result suggests that in many cases limited surveillance may not need to be explicitly addressed.
Estimating Accuracy from Unlabeled Data. With Anthony Platanios (lead author) and Tom Mitchell. UAI 2014.
We propose an approach for using unlabeled data to estimate the true accuracy of learned classifiers, given access to multiple classifiers making different "kinds" of errors. We first show how to estimate error rates exactly from unlabeled data when given at least three classifiers that make independent errors, based on their rates of agreement. We then show that even when the competing classifiers do not make independent errors, both their accuracies and error dependencies can be estimated by making certain relaxed assumptions. Experiments on two real-world data sets produce estimates within a few percent of the true accuracy, using solely unlabeled data.
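The independent-errors case admits a short closed form, sketched below for three binary classifiers. Assumptions (ours): labels are binary, errors are independent, and every error rate is below 1/2. Writing d_i = 1 - 2e_i, two classifiers agree when both are right or both are wrong, so their agreement rate is a_ij = (1 + d_i d_j)/2, and the three pairwise products determine each d_i. This illustrates estimating accuracy from agreement rates; it is not the paper's full method, which also handles dependent errors.

```python
import math


def errors_from_agreement(preds):
    """Estimate the error rates of three binary classifiers from unlabeled predictions only."""
    p1, p2, p3 = preds
    n = len(p1)

    def agree(a, b):
        return sum(x == y for x, y in zip(a, b)) / n

    # With independent errors, 2 * a_ij - 1 = d_i * d_j where d_i = 1 - 2 * e_i.
    c12 = 2 * agree(p1, p2) - 1
    c13 = 2 * agree(p1, p3) - 1
    c23 = 2 * agree(p2, p3) - 1   # assumed nonzero
    d1 = math.sqrt(c12 * c13 / c23)
    d2, d3 = c12 / d1, c13 / d1
    return [(1 - d) / 2 for d in (d1, d2, d3)]
```

On real data the agreement rates are empirical, so the estimates are only approximately correct even when the independence assumption holds.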

2013

Fast Private Data Release Algorithms for Sparse Queries. With Aaron Roth. RANDOM 2013.
We revisit the problem of accurately and efficiently answering large classes of statistical queries while preserving differential privacy. In this paper we consider the class of sparse queries, which take non-zero values on only polynomially many universe elements. We give efficient query release algorithms for this class, in both the interactive and the non-interactive setting. Our algorithms also achieve better accuracy bounds than existing general techniques do when applied to sparse queries in that our bounds are independent of the universe size. In fact, even the runtime of our interactive mechanism is independent of the universe size, and so can be implemented in the "infinite universe" model in which no finite universe need be specified by the data curator.
Exploiting Ontology Structures and Unlabeled Data for Learning. With Nina Balcan and Yishay Mansour. ICML 2013.
We present and analyze a theoretical model designed to understand and explain the effectiveness of ontologies for learning multiple related tasks from primarily unlabeled data, motivated by the success of the CMU NELL (Never-Ending Language Learning) system. We present both information-theoretic results as well as efficient algorithms. We show in this model that an ontology, which specifies the relationships between multiple outputs, in some cases is sufficient to completely learn a classification using a large unlabeled data source. (The paper linked here is a longer version of the one that appears in ICML 2013.)
Harnessing the Power of Two Crossmatches. With Anupam Gupta, Ariel Procaccia, and Ankit Sharma. ACM-EC 2013.
Kidney exchanges allow incompatible donor-patient pairs to swap kidneys, but each donation must pass three tests: blood, tissue, and crossmatch. In practice a matching is computed based on the first two tests, and then a single crossmatch test is performed for each matched patient. In this paper, we ask: if we were allowed to perform two crossmatches per patient, how could we best do so to maximize the number of matched patients? Our main result is a polynomial time algorithm for this problem that almost surely computes optimal solutions (up to lower-order terms) on random large kidney exchange instances.
Differentially Private Data Analysis of Social Networks via Restricted Sensitivity. With Jeremiah Blocki, Anupam Datta, and Or Sheffet. ITCS 2013.
We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. Restricted sensitivity is similar to global sensitivity except instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over only a restricted class of datasets. Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets including those that do not satisfy H) matches the sensitivity of f only over deviations that remain within H. Moreover, if the belief of the querier is correct (i.e., D is in H) then f_H(D) = f(D). Thus, we maintain privacy whether or not D is in H and (when restricted sensitivity is low) provide accurate results in the event that H holds true. We then demonstrate the usefulness of this notion by applying it to the task of answering queries regarding social networks, in both edge-adjacency and vertex-adjacency models.
Learnability of DNF with Representation-Specific Queries. With Liu Yang and Jaime Carbonell. ITCS 2013.
We study the problem of PAC learning the class of DNF formulas with the aid of pairwise queries that, given two positive examples, return whether or not the examples satisfy at least one term in common in the target formula. We also consider numerical queries that return the number of terms in common satisfied by the two examples. We provide both positive and negative results for learning with such queries under both uniform and general distributions. For example, for boolean queries, we show that learning an arbitrary DNF target under an arbitrary distribution is no easier than in the traditional PAC model. On the other hand, for numerical queries, we show we can learn arbitrary DNF formulas under the uniform distribution, and in the process, we give an algorithm for learning a sum of monotone terms from labeled data only. We also present a number of results for various DNF subclasses.

2012

Active Property Testing. With Nina Balcan, Eric Blais, and Liu Yang. FOCS 2012. [arXiv (full version)]
In this work, we define, analyze, and develop algorithms for the problem of property testing in a framework motivated by active learning. In this framework (as in most machine learning applications), one cannot obtain labels for arbitrary points in the input space; instead, one can only request labels from points in a given (polynomially) large unlabeled sample taken from the underlying distribution. We present both general results for this model as well as testers for various important classes. For example, we show that testing unions of d intervals can be done with O(1) label requests in this setting, a result that also yields improvements in both the full query and passive testing models as well. For testing linear separators in R^n over the Gaussian distribution, we show that both active and passive testing can be done with O(sqrt(n)) queries, substantially less than the Omega(n) needed for learning, with near-matching lower bounds. We also present a method for building testable properties out of others in this model, which we then use to provide testers for a number of assumptions used in semi-supervised learning. Finally, we develop a general notion of the testing dimension of a given property with respect to a given distribution, that we show characterizes (up to constant factors) the intrinsic number of label requests needed to test that property. We then use these dimensions to prove a number of lower bounds, including for linear separators and the class of dictator functions. Our work brings together tools from a range of areas including U-statistics, noise-sensitivity, self-correction, and spectral analysis of random matrices, and develops new tools that may be of independent interest.
The Johnson-Lindenstrauss transform itself preserves differential privacy. With Jeremiah Blocki, Anupam Datta, and Or Sheffet (lead author). FOCS 2012.
We show that the Johnson-Lindenstrauss transform provides a novel way of preserving differential privacy. In particular, if we take two databases, D and D', such that (i) D'-D is a rank-1 matrix of bounded norm and (ii) all singular values of D and D' are sufficiently large, then multiplying either D or D' with a vector of iid normal Gaussians yields two statistically close distributions in the sense of differential privacy. We apply the Johnson-Lindenstrauss transform to the task of approximating cut-queries: the number of edges crossing a (S,V-S)-cut in a graph. We show that the JL transform allows us to publish a sanitized graph that preserves edge differential privacy (where two graphs are neighbors if they differ on a single edge) while adding only O(|S|/epsilon) random noise to any given query w.h.p. Comparing the additive noise of our algorithm to existing algorithms for answering cut-queries in a differentially private manner, we outperform other methods on small cuts (|S| = o(n)). We also apply our technique to the task of estimating the variance of a given matrix in any given direction.
Additive Approximation for Near-perfect Phylogeny Construction. With Pranjal Awasthi, Jamie Morgenstern, and Or Sheffet. APPROX 2012. [arXiv]
We study the problem of constructing phylogenetic trees for a given set of species, formulated as that of finding a minimum Steiner tree on n points over the Boolean hypercube of dimension d. It is known that an optimal tree can be found in linear time if there is a perfect phylogeny: i.e., the cost of the optimal phylogeny is exactly d (deleting irrelevant coordinates). Moreover, if the data is a near-perfect phylogeny (the cost of the optimal tree is d+q for small q), it is known that an exact solution can be found in time polynomial in n and d, but exponential in q [BDHRS06]. Here, we give an algorithm running in time polynomial in n, d, and q that finds a phylogenetic tree of cost d+O(q^2). We also discuss the motivation and reasoning for studying such additive approximations.
Distributed Learning, Communication Complexity, and Privacy. With Nina Balcan, Shai Fine, and Yishay Mansour. COLT 2012.
Suppose you have two databases: one with the positive examples and another with the negative examples. How much communication between them is needed to learn a good hypothesis? In this work we examine this basic question and its generalizations, as well as related issues such as privacy. Broadly, we consider a framework where data is distributed among several locations, and our goal is to learn a low-error hypothesis over the joint distribution using as little communication, and as few rounds of communication, as possible. Our general results show that in addition to VC-dimension and covering number, quantities such as the teaching-dimension and mistake-bound of a class play an important role in determining communication requirements. Moreover, boosting can be performed in a generic manner in the distributed setting to achieve communication with only logarithmic dependence on 1/epsilon for any concept class. We also present tight results for a number of common specific concept classes including conjunctions, parity functions, and decision lists. For linear separators, we show that for non-concentrated distributions, we can use a version of the Perceptron algorithm to learn with much less communication than the number of updates given by the usual margin bounds. We additionally present an analysis of privacy, considering both differential privacy and a notion of distributional privacy that is especially appealing in this context.
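As a concrete instance of the opening two-database question, consider conjunctions over n Boolean variables. A simple one-round protocol (our illustration, not necessarily the protocol analyzed in the paper) has the party holding the positive examples send the most specific conjunction consistent with them, which takes O(n) bits. If the target is a conjunction, every target literal is satisfied by all positives, so the sent hypothesis implies the target and is therefore correct on every negative example. The function names are hypothetical, and at least one positive example is assumed.

```python
def most_specific_conjunction(positives):
    """Literals (index, value) satisfied by every positive example."""
    first = positives[0]
    return {(i, first[i]) for i in range(len(first))
            if all(x[i] == first[i] for x in positives)}


def evaluate(conjunction, x):
    # A conjunction is satisfied when every literal matches the example.
    return all(x[i] == v for i, v in conjunction)


# The negatives' holder only needs the message itself to check consistency locally.
```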

2011

Welfare and Profit Maximization with Production Costs. With Anupam Gupta, Yishay Mansour, and Ankit Sharma. FOCS, 2011.
Combinatorial Auctions are a central problem in Algorithmic Mechanism Design: pricing and allocating goods to buyers with complex preferences in order to maximize social welfare or profit. The problem has been well-studied in the case of limited supply (one copy of each item), and in the case of digital goods (the seller can produce additional copies at no cost). Yet in the case of resources (oil, labor, computing cycles, etc.), neither of these abstractions is just right: additional supplies of these resources can be found, but at increasing difficulty (marginal cost) as resources are depleted. In this work, we initiate the study of combinatorial pricing under increasing marginal cost. The goal is to sell these goods, using posted prices, to buyers arriving online with unknown and arbitrary combinatorial valuation functions to maximize either the social welfare, or the seller's profit. We give algorithms that achieve constant factor approximations for a class of natural cost functions (linear, low-degree polynomial, logarithmic) and that give logarithmic approximations for more general increasing marginal cost functions (along with a necessary additive loss). We show that these bounds are essentially best possible for these settings.
Center-based Clustering under Perturbation Stability. With Pranjal Awasthi and Or Sheffet. Information Processing Letters, 112(1-2):49-54, Jan 2012. doi:10.1016/j.ipl.2011.10.002
In this paper we give algorithms for k-median, k-means, and other center-based clustering objectives, for instances that are stable to small constant-factor perturbations of the input. This notion of stability was studied by Bilu and Linial [BL10] in the context of the max-cut problem, where they showed that one could optimally solve max-cut instances stable to perturbations of size sqrt(n). In this work we show that stability to factor-3 perturbations is sufficient to find optimal solutions for any center-based clustering objective (such as k-median, k-means, and k-center) in the case of finite metrics without Steiner points, and that stability to factor 2 + sqrt(3) perturbations is sufficient for the case of general metrics. Specifically, we show that for such instances, the popular Single-Linkage algorithm combined with dynamic programming will find the optimal clustering.
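The "Single-Linkage plus dynamic programming" recipe can be sketched as follows: build the single-linkage merge tree, then find the cheapest way to cut it into k subtrees under a center-based objective (here k-median with centers chosen among the data points, matching the "no Steiner points" case). This is a minimal, unoptimized sketch with hypothetical names; on arbitrary inputs it returns only the best tree pruning, and it is the perturbation-stability assumption that makes that pruning optimal.

```python
from itertools import combinations


def single_linkage_tree(n, dist):
    """Kruskal-style single linkage. Nodes are (members, left, right); the root is last."""
    nodes = [(frozenset([i]), None, None) for i in range(n)]
    owner = list(range(n))  # owner[p] = index of the top-level node containing point p
    for _, i, j in sorted((dist(i, j), i, j) for i, j in combinations(range(n), 2)):
        a, b = owner[i], owner[j]
        if a == b:
            continue
        members = nodes[a][0] | nodes[b][0]
        nodes.append((members, a, b))
        for p in members:
            owner[p] = len(nodes) - 1
    return nodes


def best_pruning(nodes, k, cluster_cost):
    """Dynamic program: cheapest split of the tree into k subtrees, returned as (cost, clusters)."""
    memo = {}

    def solve(v, j):
        if (v, j) not in memo:
            members, left, right = nodes[v]
            if j == 1:
                memo[v, j] = (cluster_cost(members), [members])
            else:
                best = (float("inf"), [])   # leaves cannot be split further
                if left is not None:
                    for j1 in range(1, j):
                        c1, p1 = solve(left, j1)
                        c2, p2 = solve(right, j - j1)
                        if c1 + c2 < best[0]:
                            best = (c1 + c2, p1 + p2)
                memo[v, j] = best
        return memo[v, j]

    return solve(len(nodes) - 1, k)


def kmedian_cost(members, dist):
    # Best center chosen among the cluster's own points.
    return min(sum(dist(x, c) for x in members) for c in members)
```

Other center-based objectives (k-means, k-center) only require swapping `kmedian_cost`.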

2010

A Discriminative Model for Semi-Supervised Learning. With Nina Balcan. JACM Vol 57, Issue 3, 2010. This is an expanded and more in-depth version of our COLT'05 paper "A PAC-style Model for Learning from Labeled and Unlabeled Data". See details below.
Trading off Mistakes and Don't-Know Predictions. With Amin Sayedi and Morteza Zadimoghaddam. NIPS 2010.
We consider an online learning framework in which the agent is allowed to say "I don't know" and analyze the achievable tradeoffs between saying "I don't know" and making mistakes. If mistakes have the same cost as don't-knows, the model reduces to the standard mistake-bound model, and if mistakes have infinite cost, the model reduces to the KWIK framework introduced by Li, Littman, and Walsh. We propose a general, though inefficient, algorithm for general finite concept classes that minimizes the number of don't-know predictions subject to a given bound on the number of allowed mistakes. We then present specific polynomial-time algorithms for the concept classes of monotone disjunctions and linear separators with a margin.
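The infinite-mistake-cost endpoint described above can be sketched for a finite concept class with the textbook version-space rule: predict only when every concept still consistent with the data agrees, otherwise say "I don't know", and prune the class whenever a label is revealed. This never errs when the target is in the class, but it is only the KWIK endpoint, not the paper's algorithm for trading don't-knows against a mistake budget. Function names are hypothetical.

```python
def kwik_predict(version_space, x):
    """Common prediction of all consistent concepts, or None for "I don't know"."""
    outputs = {h(x) for h in version_space}
    return outputs.pop() if len(outputs) == 1 else None


def kwik_update(version_space, x, y):
    # After a don't-know, the true label is revealed; drop inconsistent concepts.
    return [h for h in version_space if h(x) == y]
```

For example, with threshold functions `x >= t` for t = 0..5 and target t = 3, the learner abstains on x = 4 at first, and after seeing the label it predicts True on x = 4 from then on.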
Stability yields a PTAS for k-Median and k-Means Clustering. With Pranjal Awasthi and Or Sheffet. FOCS 2010. [longer version]
Ostrovsky et al. [ORSS06] show that given n points in Euclidean space such that the optimal (k-1)-means clustering is a factor 1/epsilon^2 more expensive than the best k-means clustering, one can get a (1+f(epsilon))-approximation to k-means in time poly(n,k) by using a variant of Lloyd's algorithm. In this work we show we can replace the "1/epsilon^2" with just "1+alpha" for any constant alpha>0 and obtain a PTAS. In particular, under this assumption, for any epsilon>0 we can achieve a 1+epsilon approximation for k-means in time polynomial in n and k, and exponential in 1/epsilon and 1/alpha (our running time is n^O(1) * (k log n)^poly(1/epsilon,1/alpha)). We thus decouple the strength of the assumption from the quality of the approximation ratio. We give a PTAS for k-median in finite metrics under the analogous assumption. We also show we can obtain a PTAS under the assumption of Balcan-Blum-Gupta09 (see below) that all 1+alpha approximations are delta-close to a desired target clustering, in the case that all target clusters have size greater than 2*delta*n and alpha is constant. Note that the point of BBG09 is that the true goal in clustering is usually to get close to the target rather than to achieve a good objective value. From this perspective, our advance is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(delta) to delta when all target clusters are large, and for k-median we improve the "largeness" condition in BBG09 needed to get exactly delta-close from O(delta*n) to delta*n. Our results are based on a new notion of clustering stability.
On Nash-Equilibria of Approximation-Stable Games. With Pranjal Awasthi, Nina Balcan, Or Sheffet and Santosh Vempala. SAGT 2010. Journal version in Current Science, Vol 103, Issue 9, November 10, 2012.
In this paper, we define the notion of games that are approximation stable, meaning that all epsilon-equilibria are contained inside a small ball of radius Delta around a true equilibrium, which is a natural condition if you want play to be predictable even if players are only at approximate equilibrium. Many natural small games such as matching pennies and rock-paper-scissors are indeed approximation stable. We show both upper and lower bounds on the size of supports of approximate equilibria in such games, yielding more efficient algorithms for computing approximate equilibria as Delta gets close to epsilon. We also consider an inverse condition, namely that all non-approximate equilibria are far from some true equilibrium, and give an efficient algorithm for games satisfying that condition.
Improved Guarantees for Agnostic Learning of Disjunctions. With Pranjal Awasthi and Or Sheffet. COLT 2010.
Given some arbitrary distribution D over {0,1}^n and arbitrary target function c, the goal in agnostic learning of disjunctions is to achieve an error rate comparable to the error OPT of the best disjunction with respect to (D,c). In recent work, [Peleg07] shows how to achieve a bound of O(sqrt(n)*OPT) + epsilon in polynomial time. In this paper we improve on Peleg's bound, giving a polynomial-time algorithm achieving a bound of O(n^{1/3 + alpha}*OPT) + epsilon, for any constant alpha>0. The heart of the algorithm is a method for weak-learning when OPT = O(1/n^{1/3+alpha}), which can then be fed into existing agnostic boosting procedures to achieve the desired guarantee.
Circumventing the Price of Anarchy: Leading Dynamics to Good behavior. With Nina Balcan and Yishay Mansour. ICS 2010. Journal version combining this work and "Improved equilibria via public service advertising" (SODA 2009) appears in SIAM J. Computing, 42(1), 230-264, 2013.
We explore the problem of how self-interested agents with some knowledge of the game might be able to quickly find their way to states of quality close to the best equilibrium in games with high price of anarchy but low price of stability. We consider two natural learning models in which players adaptively decide between greedy behavior and following a proposed good but untrusted strategy and analyze two important classes of games in this context, fair cost-sharing and consensus games. These games both have very high Price of Anarchy and yet we show that behavior in these models can efficiently reach low-cost states.

2009

Thoughts on Clustering. Essay for the 2009 NIPS Workshop "Clustering: Science or Art?"
Tracking Dynamic Sources of Malicious Activity at Internet-Scale. With Shobha Venkataraman, Dawn Song, Subhabrata Sen, and Oliver Spatscheck. NIPS 2009.
We consider the problem of discovering dynamic malicious regions on the Internet. We model this problem as one of adaptively pruning a known decision tree (in particular, the IP address-space tree), but with additional challenges: (1) severe space requirements, since the underlying decision tree has over 4 billion leaves, and (2) a changing target function, since malicious activity on the Internet is dynamic. We present a novel algorithm that addresses this problem, by combining "experts" and online paging algorithms. We prove guarantees on our algorithm's performance as a function of the best possible pruning of a similar size, and our experiments show that our algorithm achieves high accuracy on large real-world data sets, improving over existing approaches.
The Price of Uncertainty. With Nina Balcan and Yishay Mansour. ACM-EC 2009. [slides]
We study the degree to which small fluctuations in costs in well-studied potential games can impact the result of natural best-response and improved-response dynamics. We consider a wide variety of potential games including fair cost-sharing games, set-cover games, routing games, and job-scheduling games. We show that in certain cases, even extremely small fluctuations can cause these dynamics to spin out of control and move to states of much higher social cost, whereas in other cases these dynamics are much more stable even to large degrees of fluctuation. We also consider the resilience of these dynamics to a small number of Byzantine players about which no assumptions are made. We show that in certain cases (e.g., fair cost-sharing, set-covering, job-scheduling) even a single Byzantine player can cause best-response dynamics to transition to states of substantially higher cost, whereas in others (e.g., the class of beta-nice games which includes routing, market-sharing and many others) these dynamics are much more resilient.
Approximate Clustering without the Approximation. With Nina Balcan and Anupam Gupta. SODA 2009. Journal version: Clustering Under Approximation Stability, JACM, Volume 60, Issue 2, April 2013. [unofficial local copy]
For most clustering problems, our true goal is to classify the points correctly, and commonly studied objectives such as k-median, k-means, and min-sum are really only a proxy. That is, there is some unknown correct clustering (grouping proteins by their function or grouping images by who is in them) and the implicit hope is that approximately optimizing these objectives will in fact produce a clustering that is close pointwise to the correct answer. In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective F is epsilon-close to the target), then we can produce clusterings that are O(epsilon)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for min-sum objective we can do this for any constant c > 2. Our results also highlight a difference between assuming that the optimal solution to, say, the k-median objective is epsilon-close to the target, and assuming that any approximately optimal solution is epsilon-close to the target. In the former case, the problem of finding a solution that is O(epsilon)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
Improved Equilibria via Public Service Advertising. With Nina Balcan and Yishay Mansour. SODA 2009.
Many natural games have both good and bad Nash equilibria. In such cases, one could hope to improve poor behavior by a "public service advertising campaign" encouraging players to follow a good equilibrium, and if every player follows the advice then we are done. However, it is a bit much to assume that everyone will follow along. In this paper we consider the question of to what extent such an advertising campaign can cause behavior to switch from a bad equilibrium to a good one even if only a fraction of people actually follow the given advice, and do so only temporarily. Unlike in the "value of altruism" model, we assume everyone will ultimately act in their own interest. We analyze this question for several important and widely studied classes of games including network design with fair cost sharing, scheduling with unrelated machines, and party affiliation games (which include consensus and cut games). We show that for some of these games (such as fair cost sharing), a random alpha fraction of the population following the given advice is sufficient to get a guarantee within an O(1/alpha) factor of the price of stability for any alpha > 0. However, for some games (such as party affiliation games), there is a strict threshold (in this case, alpha < 1/2 yields almost no benefit, yet alpha > 1/2 is enough to reach near-optimal behavior), and for some games, such as scheduling, no value alpha < 1 is sufficient.

2008

Clustering with Interactive Feedback. With Nina Balcan. ALT 2008.
We initiate a theoretical study of the problem of clustering data under interactive feedback. We introduce a query-based model in which users can provide feedback to a clustering algorithm in a natural way via split and merge requests. We then analyze the "clusterability" of different concept classes in this framework (the ability to cluster correctly with a bounded number of requests under only the assumption that each cluster can be described by a concept in the class) and provide efficient algorithms as well as information-theoretic upper and lower bounds.
Improved Guarantees for Learning via Similarity Functions. With Nina Balcan and Nati Srebro. COLT 2008.
We provide a new, broader notion of a "good similarity function" that improves in two important ways upon the notion in [BB06]. First, as before, any large-margin kernel is also a good similarity function in our sense, but now with a much milder degradation of the parameters. Second, we can show that for distribution-specific PAC learning, the new notion is strictly more powerful than the traditional notion of a large-margin kernel: although any concept class that can be learned with some kernel function can also be learned using our new similarity based approach, the reverse is not true. (In contrast, the [BB06] definition is no more powerful than kernels for distribution-specific learning.) Our new notion of similarity relies upon L_1 regularized learning, and our separation result is related to a separation result between what is learnable with L_1 vs. L_2 regularization.
Item Pricing for Revenue Maximization. With Nina Balcan and Yishay Mansour. ACM-EC 2008.
This paper considers the problem of pricing items to maximize revenue from buyers with unknown complex preferences over bundles, and presents two main results: (1) for the case of unlimited supply, a random single price achieves a logarithmic approximation for buyers with general valuation functions (not just single-minded or unit-demand as was previously known); (2) for the case of limited supply, a random single price (with buyers arriving in an arbitrary order) achieves an exp(sqrt(log(n)loglog(n))) approximation, with a near-matching lower bound. The paper also includes results for multi-unit auctions and "simple submodular" valuations. An earlier tech report with just the first result appears here.
A Discriminative Framework for Clustering via Similarity Functions. With Nina Balcan and Santosh Vempala. STOC 2008. [full version (2009)]
Theoretical treatments of clustering from pairwise similarity information typically view the similarity information as ground truth and then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic: the true goal is to cluster the points correctly rather than to optimize any specific graph property. In this work, we develop a theoretical framework for clustering from this perspective. In particular, motivated by work in learning theory that asks "what natural properties of a similarity (or kernel) function are sufficient to be able to learn well?" we ask "what natural properties of a similarity function are sufficient to be able to cluster well?" Our approach can be viewed as developing a PAC model for clustering, where the natural object of study, rather than being a concept class, is more like a class of (concept, distribution) pairs.
Regret Minimization and the Price of Total Anarchy. With MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. STOC 2008.
This paper proposes weakening the assumption made when studying the price of anarchy: Rather than assume that self-interested players will play according to a Nash equilibrium, we assume only that selfish players play so as to minimize their own regret. Regret minimization can be done via simple, efficient algorithms even in many settings where the number of action choices for each player is exponential in the natural parameters of the problem. We prove that despite our weakened assumptions, in several broad classes of games, this "price of total anarchy" matches the Nash price of anarchy, even though play may never converge to Nash equilibrium. We also show that the price of total anarchy is in many cases resilient to the presence of Byzantine players, about whom we make no assumptions.
A Learning Theory Approach to Non-Interactive Database Privacy. With Katrina Ligett and Aaron Roth. STOC 2008. Journal version: JACM, Volume 60, Issue 2, April 2013. [unofficial local copy]
We demonstrate that, ignoring computational constraints, it is possible to release databases preserving differential privacy that are useful for all queries over a discretized domain from any given concept class with polynomial VC-dimension. We also present an efficient algorithm for "large margin halfspace" queries. In addition, inspired by learning theory, we introduce a new notion of data privacy, which we call distributional privacy, and show that it is strictly stronger than differential privacy.
Veritas: Combining expert opinions without labeled data. With Sharath Cholleti (lead author), Sally Goldman, David Politte, and Steven Don. International Journal on Artificial Intelligence Tools, 2009: 633-651. (originally appeared in ICTAI, 2008).
Looks at a boosting-based method for combining expert opinions when only unlabeled data is present, motivated by the problem of segmenting lung nodules in CT scans.
Limits of Learning-based Signature Generation with Adversaries. With Shobha Venkataraman (lead author) and Dawn Song. Network and Distributed Systems Security Symposium (NDSS) 2008.
We give limits on the accuracy of pattern-extraction algorithms for signature generation in an adversarial setting, by adapting and extending lower bounds for online learning in the mistake-bound model, when there are limits on the number of allowed mistakes of different types.

2007

Mechanism Design, Machine Learning, and Pricing Problems. With Nina Balcan. SIGecom Exchanges 2007, special issue on Combinatorial Auctions.
Short survey article on machine learning techniques for mechanism design.
A Theory of Loss-Leaders: Making Money by Pricing Below Cost. With Nina Balcan, T-H. Hubert Chan and MohammadTaghi Hajiaghayi. WINE 2007.
Separating Populations with Wide Data: a Spectral Analysis. With Amin Coja-Oghlan, Alan Frieze, and Shuheng Zhou. 18th International Symposium on Algorithms and Computation (ISAAC 2007). LNCS 4835, pp. 439-451.
We consider the problem of partitioning a small data sample drawn from a mixture of k product distributions. We are interested in the case that individual features are of low average quality gamma, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size---the product of number of data points n and the number of features K---needed to correctly perform this partitioning as a function of 1/gamma for K>n. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
Clearing Algorithms for Barter Exchange Markets: Enabling Nationwide Kidney Exchanges. With David Abraham and Tuomas Sandholm. ACM-EC 2007.
Shows how MIP techniques can be used to get a more scalable and robust algorithm for solving optimization problems involved in clearing paired-donation kidney exchanges. Algorithm is currently in use by the Alliance for Paired Donation.
Learning, Regret Minimization, and Equilibria [ps]. With Yishay Mansour. Book chapter in Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani, eds. [Slides for related talk]
Book chapter describing connections between online learning and game theory. Includes description of algorithms for combining expert advice (minimizing external regret), algorithms for the stronger goal of minimizing internal regret, algorithms for the limited-feedback (multi-arm bandit) setting, and connections between these and minimax optimality (for zero-sum games) and correlated equilibria (for general-sum games). Also discusses how such algorithms will approach Nash equilibrium in non-atomic routing games.
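As a concrete reference point for the external-regret part of the chapter, here is a minimal Python sketch of the standard multiplicative-weights ("Hedge", or randomized weighted majority) update. It assumes full-information feedback, losses in [0, 1], and a fixed learning rate `eta`; the function name and interface are illustrative and not taken from the chapter.

```python
import math

def hedge(loss_rows, eta=0.1):
    """Run the Hedge (multiplicative weights) update over N experts.

    loss_rows: one list of N losses in [0, 1] per round (full information).
    Returns (expected total loss of the algorithm, total loss of best expert).
    """
    n = len(loss_rows[0])
    weights = [1.0] * n
    expert_totals = [0.0] * n
    alg_loss = 0.0
    for losses in loss_rows:
        w_sum = sum(weights)
        probs = [w / w_sum for w in weights]
        # Expected loss when an expert is drawn according to probs.
        alg_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            expert_totals[i] += l
            # Shrink each expert's weight exponentially in its loss.
            weights[i] *= math.exp(-eta * l)
    return alg_loss, min(expert_totals)
```

With N experts and T rounds, the expected loss of this update is at most the best expert's loss plus ln(N)/eta + eta*T/8, so choosing eta on the order of sqrt(ln(N)/T) gives regret O(sqrt(T log N)).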
FiG: Automatic Fingerprint Generation. With Juan Caballero, Shobha Venkataraman, Pongsin Poosankam, Min Gyung Kang, and Dawn Song. In NDSS 2007.
Open Problems in Efficient Semi-Supervised PAC Learning. With Nina Balcan. COLT'07 Open Problems List.
Open problems about computationally-efficient semi-supervised learning that I would love to see solved (small monetary rewards offered).

2006

On a Theory of Learning with Similarity Functions. With Nina Balcan. International Conference on Machine Learning (ICML), pp. 73-80, 2006. Journal version combines this conference paper with a subsequent paper by Nati Srebro from COLT 2007: Machine Learning Journal 72(1-2):89-112, August, 2008. DOI 10.1007/s10994-008-5059-5. [NIPS'05 workshop talk] [Cornell'07 colloquium talk (broader)]
Kernel functions have become an extremely popular tool in machine learning. They have an attractive theory that describes a kernel function as being good for a given learning problem if data is separable by a large margin in a (possibly very high-dimensional) implicit space defined by the kernel. This theory, however, has a bit of a disconnect with the intuition of a good kernel as a good similarity function. In this work we develop an alternative theory of learning with similarity functions more generally (i.e., sufficient conditions for a similarity function to allow one to learn well) that does not require reference to implicit spaces, and does not require the function to be positive semi-definite. Our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be a good similarity function under our definition (though with some loss in the parameters). In this way, we provide the first steps towards a theory of kernels that describes the effectiveness of a given kernel function in terms of natural similarity-based properties.
Routing without Regret: On Convergence to Nash Equilibria of Regret-Minimizing Algorithms in Routing Games. With Eyal Even-Dar and Katrina Ligett. PODC, pp. 45-52, 2006. [Slides for related talk]
A number of no-regret algorithms have been developed in the game-theory and online learning literature. This paper considers the question: if each player in a routing game uses a no-regret strategy to choose their route on day t+1 based on their experience on days 1,...,t, what can we say about the overall behavior of the system? The main result of this paper is that in the Roughgarden-Tardos setting of multicommodity flow and infinitesimal agents, if each player uses a no-regret strategy then a 1-epsilon fraction of the daily flows will be epsilon-Nash (almost all users having only a small incentive to deviate) where epsilon approaches 0 at a rate that depends polynomially on the players' regret bounds and the maximum slope of any latency function.
Approximation Algorithms and Online Mechanisms for Item Pricing. With Nina Balcan. Theory of Computing, 3/9:179-195, 2007. Originally appeared in ACM Conference on Electronic Commerce, pp. 29-35, 2006. [Slides for talk given at Spencer06-60]
Presents approximation and online algorithms for a number of problems of pricing items so as to maximize a seller's revenue in an unlimited supply setting. Our main result is an O(k)-approximation for pricing items to single-minded bidders who each want at most k items. For the case k=2 (where we get a 4-approximation) this can be viewed as the following graph vertex pricing problem: given a graph G with valuations v_e on the edges, find prices p_i for the vertices to maximize sum_{e=(i,j): v_e >= p_i+p_j} (p_i + p_j). We also show how these algorithms can be applied to the online setting, in which customers arrive over time and must be presented with prices that depend only on information gained from customers seen in the past.
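The k=2 objective above is easy to state in code. The Python sketch below (illustrative, not from the paper) evaluates the revenue of a given vertex pricing and, for tiny instances only, brute-forces the best prices over a hypothetical finite price grid. The brute force is exponential in the number of vertices and is not the paper's approximation algorithm.

```python
from itertools import product

def vertex_pricing_revenue(edges, prices):
    """Revenue of vertex prices in the graph vertex pricing problem.

    edges: list of (i, j, v_e); each edge is a single-minded buyer who wants
           both endpoints and values the pair at v_e.
    prices: dict mapping each vertex to its price p_i.
    A buyer purchases iff v_e >= p_i + p_j, and then pays p_i + p_j.
    """
    revenue = 0.0
    for i, j, v in edges:
        bundle_price = prices[i] + prices[j]
        if v >= bundle_price:
            revenue += bundle_price
    return revenue

def best_grid_prices(edges, grid):
    """Exhaustive search over a finite price grid (tiny instances only)."""
    vertices = sorted({u for i, j, _ in edges for u in (i, j)})
    best_revenue, best_prices = -1.0, None
    for combo in product(grid, repeat=len(vertices)):
        prices = dict(zip(vertices, combo))
        revenue = vertex_pricing_revenue(edges, prices)
        if revenue > best_revenue:
            best_revenue, best_prices = revenue, prices
    return best_revenue, best_prices
```

For example, with edges a-b valued 4 and b-c valued 3, the prices a=2, b=2, c=1 extract the full 7, while raising any price on a shared vertex can lose both buyers at once.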
A Random-Surfer Web-Graph Model. With Hubert Chan and Mugizi Rwebangira. ANALCO '06.
This paper gives theoretical and experimental results on a random-surfer model for construction of a random graph. In this model, a new node connects to the existing graph by choosing a start node at random and then performing a short random walk, flipping a coin at each node visited to decide whether or not to stop and connect there. Our understanding of this model is still quite preliminary, and many open questions remain.
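A minimal Python sketch of the growth process described above, assuming an undirected graph, one new edge per arriving node, and the same stopping probability `stop_prob` at every visited node. The paper's exact variant (e.g., the number of edges per node, or directedness) may differ, and all names here are hypothetical.

```python
import random

def random_surfer_graph(n, stop_prob, seed=None):
    """Grow an undirected graph on nodes 0..n-1 by the random-surfer rule.

    Each new node starts at a uniformly random existing node and walks:
    at every visited node it stops and attaches there with probability
    stop_prob, otherwise it moves to a uniformly random neighbor.
    Returns the list of (new_node, attached_to) edges.
    """
    if not 0 < stop_prob <= 1:
        raise ValueError("stop_prob must be in (0, 1]")
    rng = random.Random(seed)
    neighbors = {0: []}
    edges = []
    for new in range(1, n):
        current = rng.randrange(new)  # random start among existing nodes
        # Keep walking while the coin says "don't stop" (and a move exists).
        while rng.random() >= stop_prob and neighbors[current]:
            current = rng.choice(neighbors[current])
        neighbors[new] = [current]
        neighbors[current].append(new)
        edges.append((new, current))
    return edges
```

Because the walk tends to end at well-connected nodes, smaller values of `stop_prob` bias attachment toward high-degree nodes; with one edge per node the result is always a tree.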
Random Projection, Margins, Kernels, and Feature-Selection [ps]. LNCS 3940, pp. 52-68, 2006. Survey article based on an invited talk given at the 2005 PASCAL Workshop on Subspace, Latent Structure and Feature selection techniques.
Random projection is a simple technique that has had a number of applications in algorithm design. In the context of machine learning, it can provide insight into questions such as "why is a learning problem easier if data is separable by a large margin?" and "in what sense is choosing a kernel much like choosing a set of features?" This article is intended to provide an introduction to random projection and to survey some simple learning algorithms and other applications to learning based on it. Portions of this article are based on work in [BB05,BBV04] joint with Nina Balcan and Santosh Vempala.
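For readers new to random projection, here is a minimal pure-Python sketch of one standard construction: multiply each point by a d-by-k matrix with i.i.d. Gaussian N(0, 1/k) entries. The function name and parameters are illustrative, not taken from the article.

```python
import math
import random

def random_projection(points, k, seed=0):
    """Map d-dimensional points to k dimensions via x -> x R.

    R is a d x k matrix with i.i.d. N(0, 1/k) entries, so each squared
    length is preserved in expectation.
    """
    d = len(points[0])
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(k)]
            for x in points]
```

Pairwise distances (and hence, roughly, large margins) are approximately preserved with high probability once k is large compared to the log of the number of points, which is the intuition behind the margin question above.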

2005

Reducing Mechanism Design to Algorithm Design via Machine Learning [short version, long version]. With Nina Balcan, Jason Hartline, and Yishay Mansour. JCSS 74:1245-1270, 2008 (JCSS special issue on Learning Theory). Originally appeared as "Mechanism Design via Machine Learning", FOCS 2005. [slides]
Examines how sample-complexity arguments in machine learning can be used to reduce problems of incentive-compatible mechanism design to standard algorithmic questions, for a wide class of revenue-maximizing pricing problems.
From External to Internal Regret [local ps] [local pdf]. With Yishay Mansour. JMLR 8(Jun):1307--1324, 2007. Originally appeared in COLT 2005.
Gives a generic method for converting external-regret algorithms to internal-regret algorithms, along with a specific algorithm for the bandit setting. Also gives a new simple method for a generalized "sleeping experts" setting. If you are interested in this paper you should definitely also check out Stoltz and Lugosi, "Internal Regret in On-Line Portfolio Selection", MLJ 59 (1/2), 2005. Their paper gives a different conversion procedure and has a number of other results. See also Gilles Stoltz's PhD thesis.
A Discriminative Model for Semi-Supervised Learning. With Nina Balcan. JACM Vol 57, Issue 3, 2010. (Original COLT '05 paper titled "A PAC-style Model for Learning from Labeled and Unlabeled Data")
This paper gives an extension to the PAC model that allows one to discuss ways of using unlabeled data to help with learning. The basic idea is that rather than "learning a class C", one instead talks of "learning a class C under compatibility notion χ", where χ(h,D) tells how a priori compatible a proposed hypothesis h is with respect to a given distribution D. E.g., if you believe there should be a large-margin separator then your χ would give a low score to h's with large probability mass near the separating hyperplane. Or in co-training, χ would penalize hypotheses (h_1,h_2) such that Pr_x(h_1(x) ≠ h_2(x)) is large. If χ is "legal" (in a sense defined in the paper) then we can use this model to give sample-complexity bounds for both labeled and unlabeled data, and talk about conditions under which unlabeled data can significantly reduce the number of labeled examples needed. We also talk about well-justified ways of performing regularization in this setting and give a number of algorithmic results as well.
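To make the co-training example concrete: the compatibility of a pair (h_1, h_2) can be estimated from unlabeled data alone as the empirical agreement rate. In the Python sketch below, `select_pairs` and its disagreement threshold `tau` are a hypothetical illustration of "prefer hypotheses that are consistent with the labels and highly compatible," not the paper's algorithm or its definition of a legal χ.

```python
def empirical_compatibility(h1, h2, unlabeled):
    """Fraction of unlabeled examples on which h1 and h2 agree.

    This estimates 1 - Pr_x(h1(x) != h2(x)); low values mark the pair
    (h1, h2) as incompatible with the data distribution.
    """
    agree = sum(1 for x in unlabeled if h1(x) == h2(x))
    return agree / len(unlabeled)

def select_pairs(pairs, labeled, unlabeled, tau):
    """Hypothetical filter: keep pairs that fit every labeled example and
    disagree with each other on at most a tau fraction of unlabeled data."""
    kept = []
    for h1, h2 in pairs:
        consistent = all(h1(x) == y and h2(x) == y for x, y in labeled)
        if consistent and 1 - empirical_compatibility(h1, h2, unlabeled) <= tau:
            kept.append((h1, h2))
    return kept
```

Here each example x can be a pair of views, with h1 reading one view and h2 the other; a pair that fits the few labeled points but disagrees often on unlabeled data is filtered out.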
Practical Privacy: The SuLQ Framework. With Cynthia Dwork, Frank McSherry, and Kobbi Nissim. PODS 2005.
New Streaming Algorithms for Fast Detection of Superspreaders. With Shobha Venkataraman, Dawn Song, and Phillip Gibbons. NDSS 2005.
Experimental and theoretical work on streaming algorithms (one pass, logarithmic memory) for detecting sources that send to many distinct destinations. That is, given a sequence of (x,y) pairs, you want to identify those x's that have appeared paired with many different y's.
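The exact (non-streaming) version of this task takes a few lines of Python, but it needs memory proportional to the number of distinct (x, y) pairs, which is what the streaming algorithms avoid. The second function is a simplified illustration of one memory-saving idea, sampling distinct pairs by hashing; it is not the paper's algorithm, and the sampling `rate` and the k/2 reporting threshold are arbitrary choices for the example.

```python
import hashlib
from collections import defaultdict

def exact_superspreaders(stream, k):
    """Sources x seen with at least k distinct destinations y (exact)."""
    dests = defaultdict(set)
    for x, y in stream:
        dests[x].add(y)
    return {x for x, ys in dests.items() if len(ys) >= k}

def _pair_hash(x, y):
    """Deterministic hash of the pair (x, y) into [0, 1)."""
    digest = hashlib.sha256(f"{x}|{y}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def sampled_superspreaders(stream, k, rate):
    """Keep only pairs whose hash falls below `rate`. Repeated copies of a
    pair are kept or dropped together, so each source's sampled count
    estimates rate * (its number of distinct destinations)."""
    sampled = defaultdict(set)
    for x, y in stream:
        if _pair_hash(x, y) < rate:
            sampled[x].add(y)
    # Illustrative threshold: report x if its estimate reaches k / 2.
    return {x for x, ys in sampled.items() if len(ys) / rate >= k / 2}
```

Hashing the pair (rather than flipping a fresh coin per arrival) is what makes duplicates harmless: a source that sends many packets to a few destinations does not look like a superspreader.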
Near-Optimal Online Auctions. With Jason Hartline. SODA 2005, pages 1156--1163.
Uses an approach based on an online learning algorithm of Kalai to get improved bounds for the problem of adaptive pricing of a digital good. We consider both the online auction and posted price settings.

2004

Co-Training and Expansion: Towards Bridging Theory and Practice. With Nina Balcan and Ke Yang. NIPS 2004.
This paper looks at conditions needed for co-training to succeed in terms of expansion properties of the underlying distribution. Proves bounds for the case that we have base learning algorithms able to make only one-sided error (i.e., learn from positive data only). Expansion is a much weaker condition than those considered previously, such as independence given the label, and appears to be the "right" condition on the distribution needed in order for co-training to work well when the base algorithms have only 1-sided error.
Kernels as Features: On Kernels, Margins, and Low-dimensional Mappings [local copy]. With Nina Balcan and Santosh Vempala. Machine Learning, 65:79-94, 2006, DOI: 10.1007/s10994-006-7550-1. Extended abstract in 15th International Conference on Algorithmic Learning Theory (ALT '04). Springer LNAI 3244, pp. 194-205, 2004. [talk ppt] Here is a related survey article.
Kernel functions are typically viewed as implicit mappings to a high-dimensional space that allow one to "magically" get the power of that space without having to pay for it, if data is separable in that space by a large margin. In this paper we show that in the presence of a large margin, a kernel can instead be efficiently converted into a mapping to a low dimensional space. In particular, we give an efficient procedure that, given black-box access to the kernel and unlabeled data, generates a small number of features that approximately preserve both separability and margin.
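A mapping of this kind can send x to its vector of kernel values against a sample of unlabeled points z_1, ..., z_d. A minimal Python sketch, with a Gaussian (RBF) kernel chosen purely for illustration; how many landmarks are needed for the margin guarantees is given in the paper and is not reproduced here.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel, used here only as an example kernel."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def kernel_features(x, landmarks, kernel):
    """Explicit features F(x) = (K(x, z_1), ..., K(x, z_d)), where the
    landmarks z_1..z_d are unlabeled examples drawn from the data."""
    return [kernel(x, z) for z in landmarks]
```

A standard linear learner (e.g., the Perceptron) can then be run on these d-dimensional vectors, using only black-box kernel evaluations.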
Detection of Interactive Stepping Stones: Algorithms and Confidence Bounds. With Dawn Song and Shobha Venkataraman. 7th International Symposium on Recent Advances in Intrusion Detection (RAID '04). Springer LNCS 3224, pp. 258-277, 2004.
Uses analysis of random walks to detect stepping-stone attacks under the "maximum delay bound" assumption. Gives learning-style bounds on the number of packets that need to be observed to perform detection at a desired confidence level.
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary. With Brendan McMahan. COLT '04, pages 109-123.
We show how the recent, elegant result of Kalai and Vempala for online geometric optimization can be extended to the "bandit" version of the problem, in which one is only told of the cost incurred and not the full cost vector, even in the repeated-game setting (an adaptive adversary).
Semi-Supervised Learning Using Randomized Mincuts. With John Lafferty, Mugizi Robert Rwebangira, and Rajashekar Reddy. ICML '04.
We consider a randomized version of the mincut approach to learning from labeled and unlabeled data (see paper with Shuchi Chawla from 2001), and motivate it from both a sample-complexity perspective and from the goal of approximating Markov Random Field per-node probabilities.
Approximation Algorithms for Deadline-TSP and Vehicle Routing with Time-Windows. With Nikhil Bansal, Shuchi Chawla, and Adam Meyerson. STOC '04, pages 166-174.
Consider a version of the metric TSP problem in which each node has a value and the goal is to collect as much value as possible, *but* each node also has a deadline and only counts if it is reached by its deadline. (More generally, nodes might have release dates too.) We give an O(log n) apx for deadlines, O(min[log^2 n, log D_max]) for time-windows, and a bicriteria approximation with the interesting property that it can achieve an O(k)-approximation while violating the deadlines by only a (1 + 1/2^k) factor. Big open question: can you get a constant factor apx?

2003

Approximation Algorithms for Orienteering and Discounted-Reward TSP [local copy]. With Shuchi Chawla, David Karger, Adam Meyerson, Maria Minkoff, and Terran Lane. SIAM J. Computing 37(2):653-670, 2007. An earlier version appears in FOCS'03, pages 46-55. Also available as Tech report CMU-CS-03-121.
We give a constant-factor approximation algorithm for the rooted Orienteering problem on general graphs, and for a new problem that we call the Discounted-Reward TSP. Given a weighted graph with rewards on nodes, and a start node s, the goal in the Orienteering Problem is to find a path that maximizes the reward collected, subject to a hard limit on the total length of the path. In the Discounted-Reward TSP, instead of a length limit we are given a discount factor gamma, and the goal is to maximize total discounted reward collected, where reward for a node reached at time t is discounted by gamma^t. This is similar to the objective in MDPs except we only receive a reward the first time a node is visited.
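The two objectives can be written down directly. The Python sketch below evaluates a given path under each one; it assumes the path starts at the root s and that arrival time equals distance travelled, and the function names are illustrative. It does not implement the paper's approximation algorithms.

```python
def discounted_reward(path, dist, reward, gamma):
    """Discounted-Reward TSP objective for a path starting at the root.

    Arrival time at a node is the total length travelled so far; each node's
    reward counts only on its first visit and is discounted by gamma ** t.
    """
    t, total, seen = 0.0, 0.0, set()
    for idx, node in enumerate(path):
        if idx > 0:
            t += dist(path[idx - 1], node)
        if node not in seen:
            seen.add(node)
            total += reward.get(node, 0.0) * gamma ** t
    return total

def orienteering_reward(path, dist, reward, length_limit):
    """Orienteering objective: reward of distinct nodes visited, or None if
    the path exceeds the hard length limit."""
    length = sum(dist(u, v) for u, v in zip(path, path[1:]))
    if length > length_limit:
        return None
    return sum(reward.get(v, 0.0) for v in set(path))
```

For instance, with s, a, b at positions 0, 1, 3 on a line, unit rewards at a and b, and gamma = 0.5, visiting a first gives 0.5 + 0.125 = 0.625, while visiting b first gives only 0.125 + 0.03125.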
Scheduling for Flow-Time with Admission Control. With Nikhil Bansal, Shuchi Chawla, and Kedar Dhamdhere. ESA '03, pages 43-54.
Considers the problem of job scheduling to minimize flow time, when the server is allowed to reject jobs at some penalty. This can be thought of as the problem of managing your to-do list, when your cost function is the total amount of time that jobs are sitting on your stack plus a cost for each job that you say "no" to. E.g., if you initially agree to a task (like refereeing a paper) and then six months later you realize you cannot do it and say no, you pay both for the "no" and for the six months it has been sitting on your desk. We give 2-competitive online algorithms for the case of unweighted flow time and uniform costs, and extend some of our results to the case of weighted flow time and machines with varying speeds. We also give a resource augmentation result for the case of arbitrary penalties and present a number of lower bounds.
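The cost function in the to-do-list analogy can be made explicit. This minimal Python sketch evaluates a finished schedule under unweighted flow time with a uniform rejection penalty; the input format is a hypothetical choice, and the code is only an evaluator, not the paper's online algorithm.

```python
def flow_time_with_rejections(jobs, penalty):
    """Total cost of a schedule under flow time plus rejection penalties.

    jobs: list of (release_time, end_time, rejected) tuples, where end_time
    is the completion time for accepted jobs and the time the job was turned
    down for rejected ones. Every job pays end_time - release_time; rejected
    jobs additionally pay `penalty`.
    """
    cost = 0.0
    for release, end, rejected in jobs:
        if end < release:
            raise ValueError("a job cannot end before it is released")
        cost += end - release
        if rejected:
            cost += penalty
    return cost
```

For example, a job released at 0 and finished at 2, plus a job released at 1 and rejected at 4 with penalty 3, costs 2 + (3 + 3) = 8; rejecting the second job on arrival would have cost only 2 + 3 = 5.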
Preference Elicitation and Query Learning. With Jeffrey Jackson, Tuomas Sandholm, and Martin Zinkevich. Journal of Machine Learning Research 5:649--667, 2004 (special issue on Learning Theory). Extended abstract appears in COLT '03.
Explores the connection between "preference elicitation", a problem that arises in combinatorial auctions, and the problem of learning via queries. Preference elicitation can be thought of as a kind of learning problem with multiple target concepts, but where the goal is not to identify the concepts so much as it is to produce an "optimal example".
PAC-MDL Bounds. With John Langford. COLT '03.
Attempts to unify a number of bounds (including VC-dimension and PAC-Bayes) in a single MDL framework. In this setting, Alice has a set of labeled examples, Bob has the same examples but without labels, and Alice's job is to communicate the labels to Bob using only a small number of bits. The standard Occam's razor results say that if Alice can do this by sending a hypothesis (a function h(x) over single examples that Bob would then run m times) then she can be confident in the predictive ability of that hypothesis. But what about other methods of communicating labels? Extending Occam's razor to generic communication schemes requires a bit of "definition design" but can then encompass the more powerful VC-dimension and PAC-Bayes bounds.
Planning in the Presence of Cost Functions Controlled by an Adversary. With Brendan McMahan and Geoff Gordon. ICML '03.
Looks at fast algorithms for finding "minimax optimal plans" in a certain adversarial MDP setting. Includes some experimental results on a robot navigation domain.
Open problem: Learning a function of r relevant variables. COLT 2003 open problems session.
On Statistical Query Sampling and NMR Quantum Computing. With Ke Yang. 18th IEEE Conference on Computational Complexity (CCC '03).
We introduce a problem called Statistical Query Sampling that models an issue that arises in NMR quantum computing. We give a number of lower bounds for this problem, and relate it to the (more standard) problem of Statistical Query Learning.
On Polynomial-Time Preference Elicitation with Value Queries. With Martin Zinkevich and Tuomas Sandholm. ACM Conference on Electronic Commerce, 2003.
We consider the question of whether interesting classes of preferences can be elicited in polynomial time using value queries. Building on known results on Membership Query learning, we show that read-once formulas over a set of gates motivated from a shopping-agent scenario can be elicited in polynomial time, as well as a class of preferences we call "toolbox DNF". We also give a number of (positive and negative) results for the subsequent allocation problem. For instance, we show how network flow can be used to do allocation efficiently with two bidders with toolbox-DNF preferences.
Combining Online Algorithms for Rejection and Acceptance. With Yossi Azar and Yishay Mansour. SPAA '03, pages 159-163. Combined with subsequent paper by David Bunde and Yishay Mansour in journal version appearing in Theory of Computing, 1:105-117, 2005.
The call-control problem has the interesting property that in some versions one can design online algorithms with good competitive ratio in terms of fraction of calls accepted, in some versions one can design algorithms with good C.R. in terms of the fraction rejected, and in some versions one can do both, but in the last case, the algorithms tend to be very different. We consider the problem: given an algorithm A with competitive ratio c_A for fraction accepted, and an algorithm R with ratio c_R in terms of fraction rejected, can we combine them into a single algorithm that is good under both measures? We do this achieving ratio O(c_A^2) [improved in journal version to O(c_A)] for acceptance and O(c_A c_R) for rejection.
Online Oblivious Routing. With Nikhil Bansal, Shuchi Chawla, and Adam Meyerson. 15th ACM Symposium on Parallel Algorithms and Architectures (SPAA '03), pages 44-49.
Uses online learning tools to develop a polynomial-time algorithm for performing nearly as well as the best fixed routing in hindsight, in a repeated "oblivious routing game". In this setting the algorithm is allowed to choose a new routing each night, but is still oblivious to the demands that will occur the next day. Our result is a strengthening of a recent result of Azar et al., who gave a polynomial time algorithm to find the minimax optimal strategy in this game. It is a strengthening in that it achieves a competitive ratio arbitrarily close to that of Azar et al., while at the same time performing nearly as well as the optimal static routing for the sequence of demands that actually occurred.
Online Learning in Online Auctions. With Vijay Kumar, Atri Rudra, and Felix Wu. SODA '03, pages 202-204. The link points to a somewhat longer version.
Describes how the Weighted-Majority algorithm can be used to get improved bounds for online auctions of digital goods, as well as for the posted price setting (that corresponds to a "bandit" version of the problem: the auctioneer has to pick a price first, and then only gets the single bit back indicating whether or not the buyer purchased). Also gives some lower bounds.

2002

Correlation Clustering [local copy]. With Nikhil Bansal and Shuchi Chawla. Machine Learning 56(1-3):89-113, 2004 (Special Issue on Theoretical Advances in Data Clustering). An earlier version appears in FOCS '02, pages 238--247.
Considers a clustering problem motivated by machine-learning style applications, from the perspective of approximation algorithms. We give a constant-factor approximation under a cost measure and a PTAS under a benefit measure. A nice feature of this clustering formulation is that one does not need to specify the desired number of clusters in advance.
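A minimal Python sketch of the disagreement cost in the complete-graph version, where every pair is labeled "+" (similar) or "-" (different). The second function is a swapped-in illustration, the later randomized pivot algorithm of Ailon, Charikar, and Newman, not the algorithm of this paper; it is included only to show that the number of clusters falls out of the data rather than being specified.

```python
import random
from itertools import combinations

def disagreements(labels, sign):
    """Disagreement cost of a clustering in the complete-graph setting.

    labels: dict node -> cluster id; sign(u, v) is +1 ("similar") or -1
    ("different") for every pair. A "+" pair split across clusters or a "-"
    pair placed together counts as one disagreement.
    """
    cost = 0
    for u, v in combinations(sorted(labels), 2):
        together = labels[u] == labels[v]
        if (sign(u, v) > 0) != together:
            cost += 1
    return cost

def pivot_cluster(nodes, sign, seed=0):
    """Randomized pivot clustering (Ailon, Charikar, and Newman; not the
    algorithm of this paper). No number of clusters is specified."""
    rng = random.Random(seed)
    remaining, labels, cid = list(nodes), {}, 0
    while remaining:
        pivot = remaining[rng.randrange(len(remaining))]
        for v in remaining:
            if v == pivot or sign(pivot, v) > 0:
                labels[v] = cid
        remaining = [v for v in remaining if v not in labels]
        cid += 1
    return labels
```

The benefit (agreement) measure is simply the number of pairs minus the disagreement count.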
Smoothed Analysis of the Perceptron Algorithm for Linear Programming. With John Dunagan. SODA '02, pages 905--914.
This paper shows that the simple Perceptron algorithm has good behavior for linear programming (polynomial-time whp) in the smoothed analysis model of Spielman-Teng. Spielman-Teng had shown this for a specific version of the Simplex algorithm. It is interesting that the bounds for the Perceptron algorithm are better than those known for Simplex in this model, as a function of most of the parameters. The one exception is the "epsilon" term (e.g., if you are interested in a bound that holds on 99% of the instances, then the Perceptron bounds are better, but Perceptron has an epsilon chance of taking time Omega(1/epsilon^2), and so does badly in expectation). However, I think the real difference is that Perceptron solves only the feasibility problem, and not the optimization problem. Normally, these are equivalent by simple reduction, but it is not clear that reduction makes sense in the smoothed-analysis model, because it involves a binary search that will surely create ill-conditioned instances. So, it could well be that feasibility is a strictly easier problem than optimization in this model.
Online Algorithms for Market Clearing. With Tuomas Sandholm and Martin Zinkevich. JACM 53(5): 845-879, 2006. Extended abstract in SODA '02, pages 971-980.
We consider the problem of market clearing in a double auction (exchange) where buyers and sellers arrive and depart online. We give algorithms with optimal competitive ratios for several natural objectives and also give a few results having to do with learning and incentive-compatibility.
Static Optimality and Dynamic Search-Optimality in Lists and Trees. With Shuchi Chawla and Adam Kalai. Algorithmica 36(3):249-260, 2003 (special issue on online algorithms). Originally appeared in Proceedings of the 13th Annual Symposium on Discrete Algorithms (SODA), pages 1--8, 2002.
This paper uses notions from online learning to attack several problems in adaptive data-structures.

2001

Admission Control to Minimize Rejections. With Adam Kalai and Jon Kleinberg. Internet Mathematics 1(2):165--176, 2004. Originally appeared in Proceedings of WADS'01 (LNCS 2125, pp.155-164, 2001).
Studies admission control from the perspective of approximately minimizing rejections, getting a factor of 2 for a collection of natural problems. This can make more sense than the usual perspective (apx maximizing the number of acceptances) if we are not highly overloaded (e.g., if optimal can accept 99% of requests).
Learning from Labeled and Unlabeled Data using Graph Mincuts. With Shuchi Chawla. ICML '01, pp. 19-26.
A natural extension of nearest-neighbor algorithms, when you add in unlabeled data, leads to viewing learning as a mincut problem. This paper explores this connection and gives some empirical results.
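A minimal Python sketch of the mincut view: tie labeled positives to a source and labeled negatives to a sink with infinite capacity, add weighted similarity edges between examples, and label each example by its side of a minimum s-t cut. Edmonds-Karp is used here as a generic max-flow routine; how edges and weights are chosen (e.g., nearest-neighbor graphs) is left to the caller and is not a reproduction of the paper's experimental choices.

```python
from collections import defaultdict, deque

INF = float("inf")

def mincut_labels(n, edges, positives, negatives):
    """Label examples 0..n-1 with a minimum s-t cut (Edmonds-Karp max flow).

    edges: (u, v, w) undirected similarity edges between examples.
    positives / negatives: disjoint lists of labeled example indices, tied to
    the source s and sink t with infinite capacity. Returns 0/1 labels: 1 for
    examples left on the source side of the cut (unreachable ones get 0).
    """
    s, t = n, n + 1
    cap = defaultdict(float)  # residual capacities
    adj = defaultdict(set)

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    for u, v, w in edges:
        add_arc(u, v, w)
        add_arc(v, u, w)
    for p in positives:
        add_arc(s, p, INF)
    for q in negatives:
        add_arc(q, t, INF)

    def reachable_from_source():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if t not in parent:
            break
        # Walk back from t to s along the BFS tree to get the augmenting path.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[arc] for arc in path)
        for u, v in path:
            cap[(u, v)] -= flow
            cap[(v, u)] += flow

    source_side = reachable_from_source()
    return [1 if i in source_side else 0 for i in range(n)]
```

On a chain 0-1-2-3 with example 0 positive and 3 negative, the cut falls on the weakest link, so the unlabeled examples 1 and 2 take the label of whichever end they are more strongly connected to.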

2000

FeatureBoost: A Meta Learning Algorithm that Improves Model Robustness. With Joseph O'Sullivan, John Langford, and Rich Caruana. ICML '00, pp. 703--710.
How can you make learning algorithms less "lazy", so that they search for multiple "really-different" prediction rules, in case we are later faced with data in which features are corrupted or obscured?
Noise-tolerant Learning, the Parity problem, and the Statistical Query model. With Adam Kalai and Hal Wasserman. JACM 50(4): 506-519 (2003). Extended abstract in STOC'00, pp. 435--440.
This paper gives a slightly sub-exponential algorithm for learning parity in the presence of random noise. Scaling the problem down gives the first known example of a problem that can be learned in polynomial time from noisy data but cannot be learned in polynomial time in the Statistical Query model of Kearns.

1999

Finely-competitive Paging. With Carl Burch and Adam Kalai. FOCS'99, pp. 450--458.
Using ideas from online learning, we give a paging algorithm with especially good behavior under a fine-grained notion of competitive ratio. For instance, the algorithm gives near-optimal performance when the request stream can be partitioned into a small number of working sets. Unfortunately, the algorithm itself is not computationally efficient.
Probabilistic Planning in the Graphplan Framework. With John Langford. 5th European Conference on Planning (ECP'99). See the PGP web page.
Approaches probabilistic planning from the Graphplan perspective. The result ends up looking a lot like a game-tree search, but using the planning graph to quickly prune states that can be guaranteed not to reach the goals in time.
Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation. With Adam Kalai and John Langford. Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT '99), pp. 203--208.
We show that for k>2, k-fold CV is at least slightly better than simply using a hold-out set of 1/k of the examples, in terms of the quality of the error estimate. We also analyze a "progressive validation" approach (similar to a method used by Littlestone for converting online algorithms to batch) that we show is in many ways as good as the hold-out, while using on average half as many examples for testing.
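A minimal sketch of the two estimators being compared, under our own reading of the summary above: the hold-out trains once on the first n-k examples and tests on the last k, while progressive validation tests each of the last k examples on a hypothesis trained on all examples before it, then adds that example to the training set. The learner interface and the toy `fit_majority` learner are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]
Learner = Callable[[List[Example]], Callable[[float], int]]


def holdout_error(fit: Learner, data: Sequence[Example], k: int) -> float:
    """Train once on the first n-k examples; test on the last k."""
    h = fit(list(data[: len(data) - k]))
    return sum(h(x) != y for x, y in data[len(data) - k:]) / k


def progressive_error(fit: Learner, data: Sequence[Example], k: int) -> float:
    """Test each of the last k examples on a hypothesis trained on everything
    before it, then add that example to the training set."""
    train = list(data[: len(data) - k])
    mistakes = 0
    for x, y in data[len(data) - k:]:
        h = fit(train)
        mistakes += h(x) != y
        train.append((x, y))
    return mistakes / k


def fit_majority(examples: List[Example]) -> Callable[[float], int]:
    # Hypothetical toy learner: ignores x and predicts the majority label (ties -> 1).
    ones = sum(y for _, y in examples)
    label = 1 if 2 * ones >= len(examples) else 0
    return lambda x: label
```

On the data `[(0, 1), (1, 0), (2, 0), (3, 0)]` with k = 2, the hold-out estimate is 1.0 while the progressive estimate is 0.5, because the second test point is predicted by a hypothesis that has already been trained on the first one.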
Microchoice Bounds and Self Bounding Learning Algorithms. With John Langford. Machine Learning 51(2): 165-179 (2003). Originally appeared in Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT '99), pp. 209--214.
Gives adaptive sample-complexity bounds for learning algorithms that work by making a sequence of small choices. These allow for a computationally-efficient version of Freund's Query-Trees.
On-line Algorithms for Combining Language Models. With Stan Chen, Adam Kalai, and Roni Rosenfeld. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99).
Uses online portfolio-selection algorithms in the context of combining language models, with experimental comparisons.

1998

On Learning Monotone Boolean Functions. With Carl Burch and John Langford. Proceedings of the 39th Annual Symposium on Foundations of Computer Science (FOCS '98).
For learning an arbitrary monotone Boolean function over the uniform distribution, we give a simple algorithm that achieves error rate at most 1/2 - Ω(1/√n), and show that no algorithm can do better than 1/2 - ω(log(n)/√n) from a polynomial-size sample. These improve over the previous best upper and lower bounds.
Combining Labeled and Unlabeled Data with Co-Training [pdf]. With Tom Mitchell. Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92--100, 1998. (The document linked to here fixes some minor bugs in the COLT version).
Introduces and studies Co-Training, a natural approach to using unlabeled data when we have two different sources of information about each example. The idea is to train two classifiers, one using each type of information. We can then search over the unlabeled data to find examples where one classifier is confident and the other is not, and then use the label given by the confident classifier as training data for the other. In the process of analyzing this setting, we also give new results (see lemma 1) on PAC learning with noise when the positive and negative noise rates are different.
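The loop described above can be sketched as follows, assuming two real-valued views per example, a hypothetical learner interface `fit(examples)` that returns a predictor giving a (label, confidence) pair, and an arbitrary confidence `threshold`. The toy `fit_threshold` learner (a cut halfway between the class means, with confidence equal to the capped distance from the cut) is purely illustrative and is not the classifier used in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], Tuple[int, float]]


def fit_threshold(examples: List[Tuple[float, int]]) -> Predictor:
    # Hypothetical toy learner for one view: threshold halfway between class means.
    # Assumes both labels 0 and 1 occur in `examples`.
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    t = (mean_pos + mean_neg) / 2
    sign = 1.0 if mean_pos > mean_neg else -1.0

    def predict(x: float) -> Tuple[int, float]:
        margin = sign * (x - t)
        # Confidence: distance from the threshold, capped at 1 (an arbitrary choice).
        return (1 if margin > 0 else 0), min(1.0, abs(margin))

    return predict


def co_train(fit: Callable[[List[Tuple[float, int]]], Predictor],
             labeled: Sequence[Tuple[Tuple[float, float], int]],
             unlabeled: Sequence[Tuple[float, float]],
             rounds: int = 5, threshold: float = 0.8) -> Tuple[Predictor, Predictor]:
    # Separate training sets for view 1 and view 2, seeded with the labeled data.
    train1 = [(x1, y) for (x1, _), y in labeled]
    train2 = [(x2, y) for (_, x2), y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        h1, h2 = fit(train1), fit(train2)
        remaining = []
        for x1, x2 in pool:
            y1, c1 = h1(x1)
            y2, c2 = h2(x2)
            if c1 >= threshold > c2:
                train2.append((x2, y1))  # confident view 1 teaches view 2
            elif c2 >= threshold > c1:
                train1.append((x1, y2))  # confident view 2 teaches view 1
            else:
                remaining.append((x1, x2))  # leave for a later round
        pool = remaining
    return fit(train1), fit(train2)
```

For example, starting from labeled examples `((0.0, 0.0), 0)` and `((10.0, 10.0), 1)` and one unlabeled example `(9.0, 5.5)`, view 1 is confident (label 1) while view 2 is not, so `(5.5, 1)` joins view 2's training set and its threshold drops from 5.0 to 3.875. This sketch hands labels from the confident view to the other view, as in the summary above; other variants, such as adding confidently labeled examples to a shared pool, are also possible.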
On a Theory of Computing Symposia. With Prabhakar Raghavan. International Conference on Fun with Algorithms (FUN '98). Also appeared in SIGACT News, September 1998.
How can you get the advantages of parallel sessions while still allowing attendees to see all the talks they want? The answer is to have four sessions, with each talk given twice! This paper explores a number of properties of this approach, along with connections to flows and expanders.
Semi-Definite Relaxations for Minimum Bandwidth and other Vertex-Ordering problems. With Goran Konjevod, R. Ravi, and Santosh Vempala. Theoretical Computer Science, 235(1):25--42, 2000. (Special issue in honor of Manuel Blum's 60th Birthday!) Extended abstract in Proceedings of the 30th Annual Symposium on the Theory of Computing (STOC '98).
A Note on Learning from Multiple-Instance Examples. With Adam Kalai. Machine Learning, 30:23--29, 1998.

1997

Universal Portfolios With and Without Transaction Costs. With Adam Kalai. Machine Learning, 35: 193--205, 1999 (special issue for COLT '97). Originally appeared in Proceedings of the 10th Annual Conference on Computational Learning Theory, pages 309--313, July 1997.
On-line Learning and the Metrical Task System Problem. With Carl Burch. Machine Learning, 39: 35--58, 2000. Originally appeared in Proceedings of the 10th Annual Conference on Computational Learning Theory (COLT '97), pages 45--53.
A polylog(n)-competitive algorithm for metrical task systems. With Yair Bartal, Carl Burch, and Andrew Tomkins. Proceedings of the 29th Annual Symposium on the Theory of Computing (STOC '97), pages 711--719.
An Õ(n^{3/14})-Coloring Algorithm for 3-Colorable Graphs. With David Karger. Information Processing Letters, 61(1):49--53, January 1997.

1996

On-Line Algorithms in Machine Learning (a survey). This is a survey paper for a talk given at the Dagstuhl workshop on On-Line algorithms (June '96). Appears as Chapter 14 in "Online Algorithms: The State of the Art", LNCS # 1442, Fiat and Woeginger eds., 1998.
A Polynomial-time Algorithm for Learning Noisy Linear Threshold Functions. With Alan Frieze, Ravi Kannan, and Santosh Vempala. Algorithmica, 22:35--52, 1998. An extended abstract appears in Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS'96), pages 330--338.
A Constant-factor Approximation Algorithm for the k-MST Problem. With R. Ravi and Santosh Vempala. JCSS, 58:101--108 (1999). An extended abstract appears in Proceedings of the 28th Annual ACM Symposium on the Theory of Computing (STOC '96), pages 442--448.
Randomized Robot Navigation Algorithms. With Piotr Berman, Amos Fiat, Howard Karloff, Adi Rosen, and Michael Saks. In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 75--84, January 1996.

1995

Fast Planning Through Planning Graph Analysis. With Merrick Furst. Artificial Intelligence 90:281--300, 1997. An extended abstract appears in Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), pages 1636--1642, August 1995. See also the Graphplan home page.
Empirical support for Winnow and Weighted-Majority based algorithms: results on a calendar scheduling domain. Machine Learning 26:5--23, 1997. An earlier version is in Proceedings of the Twelfth International Conference on Machine Learning, pages 64--72, July 1995.
Learning with Unreliable Boundary Queries. With Prasad Chalasani, Sally Goldman, and Donna Slonim. Journal of Computer and System Sciences, 56(2):209-222, 1998. Originally appeared in Proceedings of the Eighth Annual Conference on Computational Learning Theory (COLT), pages 98--107, July 1995.
New Approximation Guarantees for Minimum Weight k-Trees and Prize-Collecting Salesmen. With Baruch Awerbuch, Yossi Azar, and Santosh Vempala. SIAM J. Computing, 28(1):254--262, 1999. Originally published in Proceedings of the 27th Annual ACM Symposium on Theory of Computing, pages 277--283, 1995. A tech report version appears as CMU-CS-94-173, August, 1994.
A Constant-Factor Approximation Algorithm for the Geometric k-MST Problem in the Plane. With J.S.B. Mitchell, Prasad Chalasani, and Santosh Vempala. SIAM J. Computing 28(3): 771-781 (1998). This paper combines two conference papers: J.S.B. Mitchell, "Guillotine subdivisions approximate polygonal subdivisions: A simple new method for the geometric k-MST problem", SODA '96, pp. 402--408, and Blum, Chalasani, and Vempala, "A constant-factor approximation for the k-MST problem in the plane", STOC '95, pp. 294--302.
Coloring Random and Semi-Random k-Colorable Graphs. With Joel Spencer. Journal of Algorithms 19:204--234, 1995. This paper extends the semi-random model results in "Some Tools for Approximate 3-Coloring", Proceedings of the 31st Annual IEEE Symposium on Foundations of Computer Science, pages 554-562, October 1990.

1994

Relevant Examples and Relevant Features: Thoughts from Computational Learning Theory. This is a survey paper presented at the 1994 AAAI Fall Symposium. Here is a longer article with a broader perspective, joint with Pat Langley, that appears in Artificial Intelligence, 97:245--272, 1997.
On learning read-k-satisfy-j DNF. With Howard Aizenstein, Roni Khardon, Eyal Kushilevitz, Leonard Pitt, and Dan Roth. SIAM J. Computing, 27(6):1515--1530, 1998. Originally published in Proceedings of the Seventh Annual Conference on Computational Learning Theory, pages 110--117, July 1994.
The Minimum Latency Problem. With Prasad Chalasani, Don Coppersmith, Bill Pulleyblank, Prabhakar Raghavan, and Madhu Sudan. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, pages 163--171, 1994.
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis. With Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, pages 253--262, 1994. [notes and clarifications]
New Approximation Algorithms for Graph Coloring. JACM 41(3):470--516, May 1994. This paper combines the worst-case approximation results in "Some Tools for Approximate 3-Coloring", FOCS 1990 (pp 554-562), and those in "An O(n^0.4)-Approximation Algorithm for 3-Coloring (and Improved Approximation Algorithms for k-Coloring)", STOC 1989 (pp 535-542). See 1995 paper with Joel Spencer for results on Semi-Random model.

1993

Cryptographic Primitives Based on Hard Learning Problems. With Merrick Furst, Michael Kearns, and Richard Lipton. In Advances in Cryptology --- CRYPTO 93, Lecture Notes in Computer Science #773, pages 278-291, Springer-Verlag, 1994.
Learning an Intersection of a Constant Number of Halfspaces over a Uniform Distribution. With Ravi Kannan. Journal of Computer and System Sciences 54(2):371--380, 1997 (JCSS special issue for FOCS '93). Originally appeared in Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science, pages 312--320, November 1993. Also published as Chapter 9 in Theoretical Advances in Neural Computation and Learning, Roychowdhury, Siu and Orlitsky, eds. Kluwer, 1994.
An On-Line Algorithm for Improving Performance in Navigation. With Prasad Chalasani. SIAM J. Comput. 29(6): 1907-1938 (2000). Originally appeared in Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science, pages 2--11, November 1993.
On Learning Embedded Symmetric Concepts. With Prasad Chalasani and Jeffrey Jackson. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, pages 337--346, July 1993.
Generalized Degree Sums and Hamiltonian Graphs. With Ronald Gould. Ars Combinatoria, 35:35--54, 1993.

1992

A Decomposition Theorem and Bounds for Randomized Server Problems. With Howard Karloff, Yuval Rabani, and Michael Saks. SIAM J. Computing, 30(5): 1624--1661, 2000. Originally appeared in Proceedings of the 33rd Annual IEEE Symposium on Foundations of Computer Science, pages 197--207, October 1992.
Learning Switching Concepts. With Prasad Chalasani. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 231--242, July 1992.
Fast Learning of k-Term DNF Formulas with Queries. With Steven Rudich. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pages 382-389, May 1992.
Rank-r Decision Trees are a Subclass of r-Decision Lists. Information Processing Letters, 42:183--185, 1992.

1991

Learning in the Presence of Finitely or Infinitely Many Irrelevant Attributes. With Lisa Hellerstein and Nick Littlestone. JCSS 50(1):32--40, February 1995. An earlier version appears in Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 157-166, August 1991.
Algorithms for Approximate Graph Coloring. Ph.D. thesis, MIT Laboratory for Computer Science MIT/LCS/TR-506, May 1991.
Linear Approximation of Shortest Superstrings. With Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis. JACM 41(4):630--647, 1994. An earlier version appears in Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pages 328-336, May 1991.
Navigating in Unfamiliar Geometric Terrain. With Prabhakar Raghavan and Baruch Schieber. SIAM J. Computing 26(1):110-137, February 1997. An earlier version appears in Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pages 494-504, May 1991.

1990

Learning Boolean Functions in an Infinite Attribute Space. Machine Learning, 9(4):373--386, 1992. Also in Proceedings of the 22nd ACM Symposium on Theory of Computing, pages 64-72, May 1990.
Some Tools for Approximate 3-Coloring. Proceedings of the 31st Annual IEEE Symposium on Foundations of Computer Science, pages 554-562, October 1990.
Separating Distribution-Free and Mistake-Bound Learning Models over the Boolean Domain. SIAM J. Computing, Vol 23, No. 5, 1994. Also in Proceedings of the 31st Annual IEEE Symposium on Foundations of Computer Science, pages 211-218, October 1990.
Learning Functions of k Terms. With Mona Singh. In Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 144-153, August 1990.

1989

On the Computational Complexity of Training Simple Neural Networks. Master's thesis, MIT Laboratory for Computer Science, MIT/LCS/TR-445, May 1989.
An Õ(n^0.4)-Approximation Algorithm for 3-Coloring (and Improved Approximation Algorithms for k-Coloring). In Proceedings of the 21st ACM Symposium on Theory of Computing, pages 535-542, May 1989.
Training a 3-Node Neural Network is NP-Complete. With Ron Rivest. Neural Networks, 5(1):117-127, 1992. Also in Advances in Neural Information Processing Systems 1 (proceedings of the 1988 NIPS conference), pp. 494-501, 1989.