Coordinates | Thursdays from 10 AM to noon in room 021 of Ludwigstrasse 31. |
Lecturer | Tom Sterkenburg. Contact me at tom.sterkenburg@lmu.de; visit me in room 126 of Ludwigstrasse 31. |
Course description | Machine learning is about generalizing from data. As such, machine learning is directly confronted with one of the most famous puzzles in philosophy, Hume's problem of induction. This seminar considers the philosophical problems of inductive inference from the perspective of the mathematical theory of machine learning. We will focus on the classical framework of statistical learning theory, but also explore the current debate about the need for a different kind of theory to explain the generalization of modern algorithms like deep neural nets. |
Contents and material | The first part of the seminar (until the Christmas break) centers on the problem of induction and statistical learning theory. We will go through the philosophical booklet of Harman & Kulkarni (2007) and part I (Foundations) of the machine learning textbook by Shalev-Shwartz & Ben-David (2014). This part of the course will be more lecture-based, but each lecture we will start with a discussion question. To get some practice with statistical learning theory, there are weekly exercises from the textbook that you have the option (not the obligation) to hand in for feedback. The second part of the seminar (after the Christmas break) is devoted to the modern debate about the shortcomings of statistical learning theory and the apparent need for a new kind of theory to explain generalization behavior of contemporary learning algorithms. Here each meeting is devoted to one particular paper, which we read in advance, and the focus is on discussion. See the below schedule and material for the details. The material is not yet set in stone (especially for the second part), and may be adjusted in light of the interests of the participants. |
Prerequisites | This is a philosophy seminar, and our focus will be on conceptual issues. It will be helpful to already have some knowledge of machine learning, but none is required. We will go through the basics of statistical learning theory in the first half of the course, so no prior knowledge of this theory is presupposed; but this will require some mathematical maturity and, in particular, some familiarity with probability theory (see the sketch below for a flavor of the formalism involved). |
Assessment | Term paper. The course is worth 9 ECTS. Your grade will be determined by a term paper at the end of the course. The term paper treats a theme we have discussed in the course and has a length of about 6000 words. Optional exercises. As mentioned, in the first half of the course there will be weekly exercises on statistical learning theory. You are encouraged to do these exercises and hand in your solutions for feedback, but this is strictly optional. |
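For a sense of the level of formalism in the first part: the central notion of the opening textbook chapters is the empirical risk minimization (ERM) rule. The sketch below states it in notation along the lines of Shalev-Shwartz & Ben-David; it is meant only as an illustration of the kind of mathematics involved, not as assigned material.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Sketch: the empirical risk minimization (ERM) rule for binary
% classification, in notation along the lines of Shalev-Shwartz &
% Ben-David (2014). Given a training sample S = ((x_1,y_1),...,(x_m,y_m))
% and a hypothesis class H, the learner outputs any hypothesis in H
% that minimizes the empirical risk, i.e. the fraction of training errors:
\[
  \mathrm{ERM}_{\mathcal{H}}(S) \;\in\; \operatorname*{arg\,min}_{h \in \mathcal{H}} L_S(h),
  \qquad
  L_S(h) \;=\; \frac{1}{m}\,\bigl|\{\, i \in [m] : h(x_i) \neq y_i \,\}\bigr| .
\]
% PAC learnability (chapter 3) then asks when such a rule is guaranteed,
% with high probability over the random sample, to return a hypothesis
% whose true risk under the data-generating distribution is close to the
% best achievable within H.
\end{document}
```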
Schedule
Date | Topic | Material | Assignment |
---|---|---|---|
Thu 17 October | Intro. The problem of induction. | Lipton (2004); Harman & Kulkarni, ch. 1 (except sects. 1.2-1.4). | [discussion] |
Thu 24 October | The statistical learning theory framework and empirical risk minimization. | Shalev-Shwartz & Ben-David, chs. 1-2. | [exercises] [discussion] |
Thu 31 October | PAC learnability. | Shalev-Shwartz & Ben-David, ch. 3. Harman & Kulkarni, ch. 2 up to p. 46. | [exercises] [discussion] |
Thu 7 November | Uniform convergence and no-free-lunch. | Shalev-Shwartz & Ben-David, chs. 4-5 up to sect. 5.1. | [exercises] [discussion] |
Thu 14 November | The bias-complexity trade-off and the VC dimension. | Shalev-Shwartz & Ben-David, sects. 5.2-6.3; Harman & Kulkarni, ch. 2 from sect. 2.5. | [exercises] [discussion] |
Thu 21 November | The fundamental theorem. | Shalev-Shwartz & Ben-David, ch. 6. Corfield, Schölkopf, & Vapnik (2009). | [exercises] [discussion] |
Thu 28 November | Structural risk minimization. | Shalev-Shwartz & Ben-David, ch. 7 up to sect. 7.3; Harman & Kulkarni, ch. 3. | [exercises] [discussion] |
Thu 5 December | Further approaches in machine learning theory. | | |
Thu 12 December | Rehearsal and evaluation. | Harman & Kulkarni, ch. 4. Strevens (2009). Thagard (2009). | |
Thu 19 December | NO CLASS. | | |
CHRISTMAS BREAK. | | | |
Thu 9 January | The generalization paradox. | Zhang et al. (2021). | |
Thu 16 January | Double descent. | Belkin et al. (2019). | |
Thu 23 January | Bayesian deep learning. | Wilson & Izmailov (2020). | |
Thu 30 January | t.b.d. | t.b.d. | |
Thu 6 February | t.b.d. | t.b.d. | |
Fri 28 March | Deadline term paper. | | |
Material, primary.
Books.
- Shalev-Shwartz & Ben-David (2014). Understanding Machine Learning. [link]
- Harman & Kulkarni (2007). Reliable Reasoning: Induction and Statistical Learning Theory. [link]
Papers, first part.
- Corfield, Schölkopf, & Vapnik (2009). Falsificationism and statistical learning theory: Comparing the Popper and Vapnik-Chervonenkis dimensions. J. Gen. Philos. Sci. [doi]
- Lipton (2004). Induction. Chapter 1 of Inference to the Best Explanation.
- Schurz (2008). The meta-inductivist's winning strategy in the prediction game: A new approach to Hume's problem. Philos. Sci. [doi]
- Strevens (2009). Remarks on Harman and Kulkarni's "Reliable Reasoning". Abstracta. [doi]
- Thagard (2009). Inference to the best inductive practices. Abstracta. [doi]
Papers, second part.
- Belkin et al. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl. Acad. Sci. [doi]
- Wilson & Izmailov (2020). Bayesian deep learning and a probabilistic perspective of generalization. NeurIPS 2020. [link]
- Zhang et al. (2021). Understanding deep learning (still) requires rethinking generalization. Commun. ACM. [doi]
Material, background.
- Henderson (2022). The problem of induction. Stanford Encyclopedia of Philosophy. [link]
- Wheeler (2017). Machine epistemology and big data. The Routledge Companion to Philosophy of Social Science.