**Course: Math 547, Statistical Learning Theory, Fall 2019**

**Prerequisite:** Working knowledge (graduate or advanced undergraduate level) of: Probability Theory (Math 407/505a, 507a recommended) Real Analysis (Math 425a/b, Math 525a recommended) and Linear Algebra (Math 471).

**Course Content:** Binary classification, empirical risk minimization, support vector machines, voting algorithms and AdaBoost, Vapnik-Chervonenkis combinatorics, concentration-of-measure inequalities, sparse recovery problems, high-dimensional convex geometry.

*Last update:* 25 June 2019

**Instructor:** Steven Heilman, stevenmheilman(@-symbol)gmail.com

**Office Hours:** Mondays and Wednesdays, 9AM-10AM, KAP 406G

**Lecture Meeting Time/Location:** Mondays, Wednesdays, and Fridays, 10AM-1050AM, KAP 140

**TA: ** ..., ...(@-symbol)usc.edu

**TA Office Hours:** ...

**Textbook: **There is no official textbook, but the following might be useful references.

Kearns and Vazirani, __An Introduction to Computational Learning Theory__. I will be following this book the most in the first half of the course.

Shalev-Schwarz and Ben-David, __Understanding Machine Learning: From Theory to Algorithms__. (A link is available here.)

Gine and Nickl, __Mathematical Foundations of Infinite-Dimensional Statistical Models__. This encyclopedic book has a lot of nice mathematical tools in it. Chapters 2 and 3 are most relevant.

Vershynin, __High-Dimensional Probability: An Introduction with Applications to Data Science__. (A link is available here.) Tools and applications.

Ledoux, __Concentration of Measure__. Another book with useful mathematical tools. Most chapters here are relevant.

Boucheron, Lugosi, Massart, __Concentration Inequalities__. An encyclopedic book with some overlapping content with Ledoux's book; chapters 12 and 13 are most relevant.

Mossel's Course Webpage on Deep Learning, available here, based on the book __Deep Learning__, by Goodfellow, Bengio and Courville, available here.

**Project Abstract Due**: Friday, September 27, 5PM PST (via email)

**Progress Report Due**: Friday, November 1, 5PM PST (via email)

**Final Project Presentations**: Last two weeks of class, in class, schedule TBD

**Final Report Due**: Saturday, December 7, 5PM PST (via email)

**Email Policy:**

- My email address for this course is stevenmheilman@gmail.com
- It is your responsibility to make sure you are receiving emails from stevenmheilman@gmail.com , and they are not being sent to your spam folder.
- Do NOT email me with questions that can be answered from this syllabus.

**Exam Procedures**: **This course has no midterm exams and no final exam. **The final project is a replacement for other exams.

Disability Services: If you are registered with disability services, I would be happy to discuss this at the beginning of the course. Any student requesting accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me as early in the semester as possible. DSP is located in 301 STU and is open 8:30am-5:00pm, Monday through Friday.

https://dsp.usc.edu/

213-740-0776 (phone)

213-740-6948 (TDD only)

213-740-8216 (fax)

ability@usc.edu

Discrimination, sexual assault, and harassment are not tolerated by the university. You are encouraged to report any incidents to the *Office of Equity and Diversity* http://equity.usc.edu/ or to the *Department of Public Safety* http://capsnet.usc.edu/department/department-public-safety/online-forms/contact-us. This is important for the safety whole USC community. Another member of the university community - such as a friend, classmate, advisor, or faculty member - can help initiate the report, or can initiate the report on behalf of another person. *The Center for Women and Men* http://www.usc.edu/student-affairs/cwm/ provides 24/7 confidential support, and the sexual assault resource center webpage sarc@usc.edu describes reporting options and other resources.

**Final Project Guidelines: **

The final project is an opportunity to do a bit of research on a subject of your choice related to statistical learning theory. A project could begin with an interesting question or a well-known problem, and perhaps lead to investigating or implementing various algorithms, conducting an empirical analysis, etc.

Along the way, you will review relevant literature, identify appropriate data sources, select appropriate means of evaluation, and either develop novel methodology for your problem or deploy and comprehensively evaluate existing methodology for your new application.

The goal is to say something interesting about a problem in statistical learning theory. You could perhaps develop new methodology for an existing problem or application that has no fully satisfactory solution. You could alternatively tackle a new problem or application with existing methodology; in this case, you should identify one or more questions without satisfactory answers in your chosen domain and explore how the methodology can help you answer those questions. You may draw inspiration from particular datasets, but your focus should rest not on the data itself but rather on the questions about the world that you can answer with that data.

While a substantial theoretical component is not required for this project, (i.e. you do not necessarily have to prove new theorems), you should be able to prove a few things or at very least review or streamline some existing proofs in your investigation.

You may work alone or in a group of two; the standards for a group project will be twice as high. In certain cases I might approve a group of three, but this is unlikely.

We strongly encourage you to come to office hours to discuss your project ideas, progress, and difficulties.

**Final Project Milestones:** In all cases, please use LaTeX.

I: Project Proposal. By this first milestone, you should have selected a question or problem of interest, identified relevant data sources, begun exploring the literature surrounding the question, and discussed your ideas with the course staff. Your project proposal deliverable is a 1/2 - 1 page report describing the question or problem you intend to tackle, why this question is important or interesting, prior work on this problem, what data you intend to use in your analyses, and the principal challenges that you anticipate.

If you would like to receive feedback about particular aspects of your proposal, please indicate this in your submission.

I can try to help in problem selection. Ideally, the problem should be something you are very interested in. As such, it might be helpful to first tell me about your interests (maybe after class or in office hours), and we can try to think of something to work on. Selecting problems to work on is a difficult skill that takes years to develop, so it would be nice if you find a project idea on your own, but I expect everyone will need at least a little help in their choice. I know some things about some fields but I don't know everything about every field, so I might not be so helpful with certain projects outside my own background, but I can learn a bit myself to help you along if your interests are outside my knowledge.

II: Progress Report. By this second milestone, you should have some initial results to share; for example, you may have implemented and evaluated the performance of existing algorithms on your dataset and task of interest, or you may have conducted an initial study with simulated data to better understand the properties of certain methods, or you were able to prove some preliminary result about some question of interest, etc.

Your progress report deliverable is a write-up of no more than 2 pages (not including references) describing what you have accomplished so far and, briefly, what you intend to do in the remainder of the term. You should be able to reuse at least part of the text of this milestone in your final report.

III: In class presentation. You will present your work in class using your choice of (a combination of) the blackboard or a computer presentation. Paper handouts are optional. The length of the presentation will vary according to course enrollment, but each person should expect to speak for about 10-20 minutes. Since the talk will be short, you should consider practicing (and timing) your talk in advance. You can practice part of it with me in office hours if you want. Expect around 3-5 minutes of questions from myself and your fellow students. Unlike other times in the class, attendance is mandatory during the presentations, and I will be taking attendance during them. Once I set the time limits they will be strict. I will indicate when you go over time. Going over time will result in penalties.

You will be graded on your presentation skills, e.g. voice volume, board usage, pacing of material, choice of material, etc. Minor technical problems will not be penalized, but major technical problems will be penalized. If you use a computer, make sure to have a backup plan in case of a technical problem to avoid such a penalty. If you want to give a short version of your presentation in office hours before the actual presentation, and then have me give feedback that might be a good idea.

IV: Final Report. Your final project report (not including acknowledgements and references) should be around 5-8 pages in length (using at most 12 point font and maximum 1 inch margins) and should follow a typical scientific style (with abstract, introduction, etc.). The write-up should clearly define your problem or question of interest, review relevant past work, and introduce and detail your approach. A comprehensive empirical evaluation could follow, or some proofs of some results, along with an interpretation of your results. Any elucidation of the theoretical properties of an empirical method under consideration is also welcome.

If this work was done in collaboration with someone outside of the class (e.g., a professor), please describe their contributions in an acknowledgements section.

The final report PDF file should be emailed. No hardcopy is needed.

**Some Project Ideas:** (to browse your own ideas, you could e.g. look through the proceedings of recent COLT conference such as COLT 2019, COLT 2018, etc.)

Reinforcement Learning

- Mastering the game of Go without human knowledge. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis, here

Streaming Algorithms and Online Learning

- The space complexity of approximating the frequency moments. Alon, Matias, Szegedy, here
- Space lower bounds for linear prediction in the streaming model. Yuval Dagan, Gil Kur, Ohad Shamir here
- Polynomial Pass Lower Bounds for Graph Streaming Algorithms. Sepehr Assadi, Yu Chen, Sanjeev Khanna here
- On Fully Dynamic Graph Sparsifiers. Ittai Abraham, David Durfee, Ioannis Koutis, Sebastian Krinninger, Richard Peng here
- Faster Spectral Sparsification in Dynamic Streams. Michael Kapralov, Aida Mousavifar, Cameron Musco, Christopher Musco, Navid Nouri here
- Dynamic Streaming Spectral Sparsification in Nearly Linear Time and Space. Michael Kapralov, Navid Nouri, Aaron Sidford, Jakab Tardos here

Robust Algorithmic Statistics

- Robustly Learning a Gaussian: Getting Optimal Error, Efficiently. Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart here
- Degree-d Chow Parameters Robustly Determine Degree-d PTFs (and Algorithmic Applications). Ilias Diakonikolas, Daniel M. Kane here
- Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin. Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi here
- Faster Algorithms for High-Dimensional Robust Covariance Estimation. Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff here
- VC Classes are Adversarially Robustly Learnable, but Only Improperly. Omar Montasser, Steve Hanneke, Nathan Srebro here
- Privately Learning High-Dimensional Distributions. Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman here
- Efficient Algorithms for Outlier-Robust Regression. Adam Klivans, Pravesh K. Kothari, Raghu Meka here

Embeddings and the "kernel trick"

- Multi-way spectral partitioning and higher-order Cheeger inequalities. James R. Lee, Shayan Oveis Gharan, Luca Trevisan here
- Oblivious Sketching of High-Degree Polynomial Kernels. Michael Kapralov, Rasmus Pagh, Ameya Velingker, David Woodruff, Amir Zandieh here

Deep Learning

- Deep Compressed Sensing. Yan Wu, Mihaela Rosca, Timothy Lillicrap here
- The Power of Depth for Feedforward Neural Networks. Ronen Eldan, Ohad Shamir here
- Deep Learning and Hierarchal Generative Models. Elchanan Mossel here
- The capacity of feedforward neural networks. Pierre Baldi, Roman Vershynin here

Miscellaneous

- Is your function low-dimensional? Anindya De, Elchanan Mossel, Joe Neeman here
- On Communication Complexity of Classification Problems. Daniel M. Kane, Roi Livni, Shay Moran, Amir Yehudayoff here
- Optimal terminal dimensionality reduction in Euclidean space. Shyam Narayanan, Jelani Nelson here
- Solving Empirical Risk Minimization in the Current Matrix Multiplication Time. Yin Tat Lee, Zhao Song, Qiuyi Zhang here
- Proof of the satisfiability conjecture for large k. Jian Ding, Allan Sly, Nike Sun here
- Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility. Mark Braverman, Gillat Kol, Shay Moran, Raghuvansh R. Saxena here

**Homework Policy**:

- Homeworks are due in discussion session Thursdays, at the
**beginning**of class. - Late homework is
**not accepted**. - If you still want to turn in your homework late, then the number of minutes late divided by ten will be deducted from the homework score. The exact deduction and rounding procedure is not guaranteed to be accurate.
- You may use whatever resources you want to do the homework, including computers, textbooks, friends, the TA, etc. However, I would discourage any over-reliance on search technology such as Google, since its overuse could degrade your learning experience. By the end of the semester, you should be able to do the entire homework on your own, without any external help.
- Do not submit homework via email.
- Collaboration on homeworks is allowed and encouraged.
- All homework assignments must be written by you, i.e. you cannot copy someone else`s solution verbatim.

**Grading Policy:**

- The final course grade is weighted in the following way. Homework (45\%), class participation (5\%), project abstract (3\%), project progress report (7\%), final presentation (20\%), final project report (30\%).
- Class participation is not the same as attendance. Especially for graduate students, you have many things to do, and life happens, so I will never explicitly take attendance, but I will notice if someone is frequently absent. Things that increase your class participation grade include: asking good questions, paying attention in class, showing up on time or early to class, etc. Things that decrease your class participation grade include: excessive talking or disruptions during class, frequent absences, excessive texting/smartphone usage in class, frequent tardiness, etc.

Tentative Schedule: (This schedule may change slightly during the course.)

Week | Monday | Tuesday | Wednesday | Thursday | Friday |

1 | Aug 26: Introduction | Aug 27 | Aug 28: Introduction, Vertex Cover, SVD, PCA | Aug 29 | Aug 30: k-means, power method |

2 | Sep 2: No class | Sep 3 | Sep 4: Learning linear threshold functions | Sep 5 | Sep 6: Perceptron Algorithm |

3 | Sep 9: Perceptron Algorithm | Sep 10 | Sep 11: Embeddings and the "kernel trick" | Sep 12 | Sep 13: Homework 1 due. Mercer's Theorem |

4 | Sep 16: Concentration of Measure, Hoeffding's Inequality | Sep 17 | Sep 18: Gaussian Concentration, Johnson-Lindenstrauss | Sep 19 | Sep 20: Bennett's Inequality |

5 | Sep 23: Empirical Risk Minimization | Sep 24 | Sep 25: Gaussian Processes | Sep 26 | Sep 27: Abstract due. Gaussian Processes |

6 | Sep 30: Sub-Gaussian Processes | Oct 1 | Oct 2: General Empirical Processes | Oct 3 | Oct 4: General Empirical Processes |

7 | Oct 7: Probably Approximately Correct (PAC) Learning | Oct 8 | Oct 9: PAC Learning | Oct 10 | Oct 11: Homework 2 due. Vapnis-Chernovenkis (VC) Theory |

8 | Oct 14: VC Theory | Oct 15 | Oct 16: VC Theory | Oct 17 | Oct 18: No class |

9 | Oct 21: VC Theory | Oct 22 | Oct 23: Sparse recovery | Oct 24 | Oct 25: high-dimensional convex geometry |

10 | Oct 28: Compressed sensing and LASSO | Oct 29 | Oct 30: Low rank matrix recovery | Oct 31 | Nov 1: Progress report due. Low rank matrix recovery |

11 | Nov 4: Boosting | Nov 5 | Nov 6: Boosting | Nov 7 | Nov 8: Homework 3 due. Deep Learning |

12 | Nov 11: Deep Learning | Nov 12 | Nov 13: Deep Learning | Nov 14 | Nov 15: Deep Learning |

13 | Nov 18: Deep Learning | Nov 19 | Nov 20: Leeway | Nov 21 | Nov 22: Leeway |

14 | Nov 25: Review of course | Nov 26 | Nov 27: No class | Nov 28 | Nov 29: No class |

15 | Dec 2: Student Presentations | Dec 3 | Dec 4: Student Presentations | Dec 5 | Dec 6: Student Presentations. [Dec 7: Final Report Due] |

**Advice on succeeding in a math class:**

- Review the relevant course material before you come to lecture. Consider reviewing course material a week or two before the semester starts.
- When reading mathematics, use a pencil and paper to sketch the calculations that are performed by the author.
- Come to class with questions, so you can get more out of the lecture. Also, finish your homework at least two days before it is due, to alleviate deadline stress.
- Write a rough draft and a separate final draft for your homework. This procedure will help you catch mistakes.
- If you are having difficulty with the material or a particular homework problem, review Polya's Problem Solving Strategies, and come to office hours.

**Homework**

**Homework .tex files**

**Supplementary Notes**