Stochastic MIrror Descent Algorithm made Sparse (SMIDAS)

Overview

SMIDAS is a C++ implementation of the stochastic mirror descent algorithm proposed in

Shai Shalev-Shwartz and Ambuj Tewari, Stochastic methods for l₁ regularized loss minimization. Proceedings of the 26th International Conference on Machine Learning, pages 929-936, 2009.

It can be used for l₁-regularized loss minimization for both classification and regression problems. This means that it (approximately) solves the following minimization problem:

min_w (1/m) ∑_{i=1, … , m} L(w^T x_i, y_i) + λ|w|₁

Currently supported loss functions are the logistic loss [L(a,b) = log(1+exp(-ab))], the hinge loss [L(a,b) = max(0,1-ab)], and the squared loss [L(a,b) = (a-b)²]. The x_i's are example vectors with d coordinates x_i,1, … x_i,d. The y_i's are labels that are either ±1 (classification) or arbitrary floats (regression). The logistic and hinge losses only make sense in a classification context but the squared loss can be used in either. λ is the regularization parameter. SMIDAS is designed to run fast even for large values of m and d and can exploit the sparsity in the examples (i.e. if a lot of coordinates in the example vectors are zero).

Source Code

The source code is released under the GNU General Public License. The authors are not responsible for any consequences arising out of the use of the code, so please use it at your own risk. We do, however, appreciate receiving bug reports from you. Please direct them to Ambuj Tewari via email.

If you use SMIDAS in your research, please cite it as

Shai Shalev-Shwartz and Ambuj Tewari, Stochastic methods for l₁ regularized loss minimization. Proceedings of the 26th International Conference on Machine Learning, pages 929-936, 2009.

The code was developed under Linux using g++ (GCC) version 3.4.6. The source code is available at the following location:

http://ttic.uchicago.edu/~tewari/code/smidas/smidas.tar.gz

Installation

After downloading the file scd.tar.gz into a suitable directory, unpack it by typing

 tar zxvf smidas.tar.gz

This will create a directory called SMIDAS with the following 9 files in it:

cmd_line.cc
cmd_line.h
cmd_option.h
Losses.h
gpl.txt
Makefile
smidas.cc
test_examples
test_labels

Now, type

 make

This will create an executable named smidas (plus a couple of object files).

Usage

After installing, simply type

 ./smidas

and the following usage message is displayed:


USAGE: ./smidas [options]  

SMIDAS -- L_1 regularized loss minimization

OPTIONS:
 -lambda regularization parameter (default = 0.001)
 -iters number of iterations (default = 1000)
 -eta learning rate (default = 0.001)
 -loss 0 for logistic, 1 for hinge, 2 for squared loss (default = 0)
 -printout after how many iterations to print w (default 1000)

All 5 command line options are optional and receive their default values indicated above if they are not supplied by the user. The options lambda, loss and iters set the regularization parameter, the loss function to use, and the number of iterations for which the SMIDAS algorithm will be run, respectively. SMIDAS requires a learning rate parameter η. The option eta is meant to supply this. The printout option selects the interval after which a summary of current state of the optimization will be printed on the standard output. The following information about the current weight vector w is included in this summary:

time elapsed (in milliseconds)
number of times the data matrix was accessed
|w|₁, the l₁-norm of w
|w|₀, the l₀-norm (no. of non-zero coordinates) of w
(1/m) ∑_{i=1, … , m} L(w^T x_i, y_i), the average loss of w
(1/m) ∑_{i=1, … , m} L(w^T x_i, y_i) + λ|w|₁, the objective function
∑_{i=1, … , m} 1[w^T x_i y_i ≤ 0], the number of mistakes on the examples (ignore this field if doing regression)

Data File Format

Conceptually, we often think of the data as a huge m x d matrix with the examples sitting as rows of this matrix. Viewed this way, smidas expects its input data file to be in a sparse row major format as described below:

Line 1: m d
Line 2: N i1 v1 i2 v2 ...
⋮
Line m+1: N i1 v1 i2 v2 ...

The first line tells smidas that there are m examples each with d features or coordinates. The second line gives non-zero features for example 1. The number N is the total number of non-zero features for that examples. This is followed by N pairs of the form i1 v1. The pair i1 v1 means that feature (i1+1) has value v1. Note that, following C/C++ convention, the indices i1, i2, ... start from zero. For example, if features 1, 3 and 10 are the only non-zero features for example 1, and these have values 0.2, 0.9 and -0.7 respectively, then line 2 would read:

3 0 0.2 2 0.9 9 -0.7

The example indices need not appear in increasing order.

Line 3 then gives values for example 2, Line 4 for example 3, and so on till Line m+1.

Label File Format

The label file format is pretty simple: Line j gives the label of example j. So, there should be a total of m lines in the label file and each line should have a single number (±1 in the classification case, any floating point number in the regression case).

Test Run

To make sure everything runs okay on your machine, use the files test_examples and test_labels to do a test run. The file test_examples encodes the following 3 x 5 data matrix (3 examples, 5 features):

0    0    0    0.2  0.2
0    0    0.3  0   -0.5
0.4  0.6  0    0    0

The labels for the 3 examples are 1,1 and -1 respectively. Now, run smidas with logistic loss, the default value (0.001) of the regularization parameter, and learning rate 0.01, by typing

 ./smidas -eta 0.01 -iters 500000 -printout 50000 test_examples test_labels

This runs smidas for 500,000 iterations printing a summary line every 50,000 iterations. You should see an output that looks something like

110 200000 21.8226 5 0.110917 0.13274 0
250 400000 28.8088 5 0.0572377 0.0860465 0
380 600000 32.9352 5 0.0384127 0.0713479 0
500 800000 35.8222 5 0.0290536 0.0648759 0
610 1000000 37.9964 5 0.0234737 0.0614701 0
730 1200000 39.7255 5 0.0197936 0.0595191 0
850 1400000 41.1593 5 0.0171696 0.0583289 0
960 1600000 42.37 5 0.0152157 0.0575857 0
1080 1800000 43.405 5 0.0137119 0.057117 0
1210 2000000 44.3091 5 0.0125135 0.0568226 0

The actual output might be slightly different due to the randomness in the algorithm but the objective function should converge to 0.0551 for this toy example.