Epistemology and Theory of Machine Learning

Coordinates	Thursdays from 10 AM to noon in room 021 of Ludwigstrasse 31.
Lecturer	Tom Sterkenburg. Contact me at tom.sterkenburg@lmu.de; visit me in room 126 of Ludwigstrasse 31.
Teaching assistant	Katia Parshina.
Course description	Machine learning is about generalizing from data. As such, machine learning is directly confronted with one of the most famous puzzles in philosophy, Hume's problem of induction. This seminar considers the philosophical problems of inductive inference from the perspective of the mathematical theory of machine learning. We will focus on the classical framework of statistical learning theory, but also explore the current debate about the need for a different kind of theory to explain the generalization of modern algorithms like deep neural nets.
Contents and material	The first part of the seminar (until the Christmas break) centers on the problem of induction and statistical learning theory. We will go through the philosophical booklet of Harman & Kulkarni (2007) and part I (Foundations) of the machine learning textbook by Shalev-Shwartz & Ben-David (2014). This part of the course will be more lecture-based, but we will start each lecture with a discussion question. To get some practice with statistical learning theory, there are weekly exercises from the textbook that you have the option (not the obligation) to hand in for feedback. The second part of the seminar (after the Christmas break) is devoted to the modern debate about the shortcomings of statistical learning theory and the apparent need for a new kind of theory to explain the generalization behavior of contemporary learning algorithms. Here each meeting is devoted to one particular paper, which we read in advance, and the focus is on discussion. See the schedule and material below for details. The material is not yet set in stone (especially for the second part), and may be adjusted in light of the interests of the participants.
Prerequisites	This is a philosophy seminar, and our focus will be on conceptual issues. It will be helpful to already have some knowledge of machine learning, but none is required. We will go through the basics of statistical learning theory in the first half of the course, so no prior knowledge of this theory is presupposed; but this will require some mathematical maturity and in particular some familiarity with probability theory.
Assessment	Term paper. The course is worth 9 ECTS. Your grade will be determined by a term paper at the end of the course. The term paper should address a theme we have discussed in the course and be about 6000 words long. For some topic suggestions, see here. Optional exercises. As mentioned, in the first half of the course there will be weekly exercises on statistical learning theory. You are encouraged to do these exercises and hand in your solutions for feedback, but this is strictly optional.

Schedule

Date	Topic	Material	Assignment
Thu 17 October	Intro. The problem of induction.	Lipton (2004); Harman & Kulkarni, ch. 1 (except sects. 1.2-1.4).	[discussion]
Thu 24 October	The statistical learning theory framework and empirical risk minimization.	Shalev-Shwartz & Ben-David, chs. 1-2.	[exercises] [discussion]
Thu 31 October	PAC learnability.	Shalev-Shwartz & Ben-David, ch. 3. Harman & Kulkarni, ch. 2 up to p. 46.	[exercises] [discussion]
Thu 7 November	Uniform convergence and no-free-lunch.	Shalev-Shwartz & Ben-David, chs. 4-5 up to sect. 5.1.	[exercises] [discussion]
Thu 14 November	The bias-complexity trade-off and the VC dimension.	Shalev-Shwartz & Ben-David, sects. 5.2-6.3; Harman & Kulkarni, ch. 2 from sect. 2.5.	[exercises] [discussion]
Thu 21 November	The fundamental theorem.	Shalev-Shwartz & Ben-David, ch. 6. Corfield, Schölkopf, & Vapnik (2009).	[exercises] [discussion]
Thu 28 November	Structural risk minimization.	Shalev-Shwartz & Ben-David, ch. 7 up to sect. 7.3; Harman & Kulkarni, ch. 3.	[exercises] [discussion]
Thu 5 December	Further approaches in machine learning theory.
Thu 12 December	Rehearsal and evaluation.	Harman & Kulkarni, ch. 4. Strevens (2009). Thagard (2009).
Thu 19 December	NO CLASS.
	CHRISTMAS BREAK.
Thu 9 January	The generalization paradox.	Zhang et al. (2021).
Thu 16 January	Double descent.	Belkin et al. (2019).
Thu 23 January	Simplicity and SRM.	Bargagli Stoffi et al. (2022).
Thu 30 January	Simplicity and deep learning.	Buchholz (in process).
Thu 6 February	Explaining deep learning.	Räz (2022).
Fri 28 March			Deadline term paper.

Material, primary.

Books.

Shalev-Shwartz & Ben-David (2014). Understanding Machine Learning. [link]
Harman & Kulkarni (2007). Reliable Reasoning: Induction and Statistical Learning Theory. [link]

Papers, first part.

Corfield, Schölkopf, & Vapnik (2009). Falsificationism and statistical learning theory: Comparing the Popper and Vapnik-Chervonenkis dimensions. J. Gen. Philos. Sci. [doi]
Lipton (2004). Induction. Chapter 1 of Inference to the Best Explanation.
Strevens (2009). Remarks on Harman and Kulkarni's "Reliable Reasoning". Abstracta. [doi]
Thagard (2009). Inference to the best inductive practices. Abstracta. [doi]

Papers, second part.

Bargagli Stoffi, Cevolani, & Gnecco (2022). Simple models in complex worlds: Occam's razor and statistical learning theory. Minds Mach. [doi]
Belkin et al. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl. Acad. Sci. [doi]
Buchholz (in process). The curve-fitting problem revisited.
Räz (2022). Understanding deep learning with statistical relevance. Phil. Sci. [doi]
Zhang et al. (2021). Understanding deep learning (still) requires rethinking generalization. Commun. ACM. [doi]

Further material.

Dotan (2021). Theory choice, non-epistemic values, and machine learning. [doi]
Forster & Sober (1994). How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. [doi]
Lin (in process). Unified inductive logic: from formal learning to statistical inference to supervised learning. [link]
Rushing (2022). No free theory choice from machine learning. [doi]
Schurz (2008). The meta-inductivist's winning strategy in the prediction game: A new approach to Hume's problem. [doi]

WINTER SEMESTER 2024/25
