CS6025 Sublinear Algorithms
Jan - May 2022
Coordinates
- Where: Online
- When: J slot; Mon(5-5.50pm), Wed(2-3.15pm), Thur(3.30-4.45pm)
Objectives
The aim of the course is to understand algorithmic techniques to handle large amounts of data that cannot be stored and accessed as a whole. To process such large amounts of data, classical algorithms, even those that are linear-time, are inefficient. Through this course, we will look at techniques to design and analyze algorithms of sublinear time and space complexity. The main focus of the course would streaming, sketching, sampling and property testing algorithms. To complement this, we will also study techniques developed to prove lower bounds against such algorithms.
Contents
The tentative outline of the topics that will be covered in the course is as follows. The order of the topics might vary.
- Preliminaries - Tail-bounds (Markov, Chebyshev, Chernoff), Basic linear algebra, motivation and introduction to ideas of streaming, sketching and property testing with simple examples.
- Streaming, sketching and sampling - Estimating distinct elements, Estimating frequency moments – the Alon-Mathias-Szegedy sketch, Johnson-Lindenstrauss lemma based \(\ell_2\) norm approximation, Heavy-hitters – Count-Min sketch, Reservoir sampling & \(\ell_p\) sampling, Graph streams – connectivity, spanners, sparsifiers.
- Property testing - Testing graph properties - Comparison of graph property testing models, Szemeredi regularity lemma and testing properties of dense graphs, random walk based testers for sparse graphs, Testing Boolean functions - Fourier analytic methods for testing Boolean functions – linearity, monotonicity, Testing properties of probability distributions – uniformity, monotonicity.
- Lower bound techniques - Communication complexity based lower bounds for streaming, Lower bounds for property testing algorithms, connections between communication complexity lower bounds and property testing.
References
- [DP] Concentration of Measure for the Analysis of Randomized Algorithms – Devdatt Dubashi and Alessandro Panconesi.
- [M] Data Streams: Algorithms and Applications – S. Muthukrishnan.
- [IN] Sketching Algorithms for Big Data – Lecture notes from a course by Jelani Nelson and Piotr Indyk (Harvard University).
- [B] Sublinear (and Streaming) Algorithms – Lecture notes from a course by Paul Beame (University of Washington).
- [G] Introduction to Property Testing – Oded Goldreich.
- [RY] Communication Complexity – Anup Rao and Amir Yehudayaoff.
Grading policy
- Problem sets: 30% (best 3/4)
- Course project: 20%
- Mid-sem: 20%
- End-sem + Viva: 30%
Important dates
- Problem sets (tentative): Jan 28, Feb 25, Apr 1, Apr 15
- Mid-sem: Mar 10
- End-sem: May 4
Course Materials
- Moodle
- Lectures and References (with links to the videos and notes)
- Course project
- Problem sets: 1, 2, 3