Upcoming Talks

Ryan Rossi, Adobe Research
Abstract TBD
Zachary Lipton, Abridge / Carnegie Mellon University
Abstract TBD
Jörn-Henrik Jacobsen, Apple AI/ML Research
Abstract TBD

Previous Talks

Frank Tong, Vanderbilt University
Understanding the Computational Bases of Robust Object Recognition in Humans and Deep Neural Networks
Abstract: Deep neural networks (DNNs) trained on object classification provide the best current models of human vision, with accompanying claims that they have attained or even surpassed human-level performance. However, DNNs tend to fail catastrophically in situations where humans do not, especially when faced with noisy, degraded, or ambiguous visual inputs. Such findings imply that the computations performed by DNNs do not adequately match those performed by the human brain. In this talk, I will discuss whether the brittleness of current DNN models is caused by flaws in their architectural design, imperfections in their learning protocols, or inadequacies in their training experiences. Here, we evaluate the hypothesis that everyday encounters with visual blur may be a critical feature for conferring robustness to biological and artificial visual systems. Our studies show how learning plays a critical role in the acquisition of robust object representations, such that appropriately trained DNN models can better predict human behavioral and neural responses across a range of challenging viewing conditions.
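Mixing blurred and clear images during training is one simple way to give a network the kind of experience with blur the abstract describes. The sketch below is a minimal, pure-Python illustration under stated assumptions: grayscale images are nested lists, the blur is a separable Gaussian with clamped borders, and `augment_with_blur` is a hypothetical helper whose blur probability and sigma range are illustrative choices, not the training protocol used in the talk.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur with clamped borders; sigma <= 0 returns a copy."""
    if sigma <= 0:
        return [row[:] for row in image]
    radius = max(1, math.ceil(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    h, w = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then a vertical pass over the intermediate result.
    tmp = [[sum(kernel[k + radius] * image[y][clamp(x + k, w)] for k in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + radius] * tmp[clamp(y + k, h)][x] for k in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


def augment_with_blur(image, rng, p_blur=0.5, max_sigma=3.0):
    """Hypothetical augmentation: blur a random fraction of training images by a random sigma."""
    if rng.random() < p_blur:
        return blur(image, rng.uniform(0.0, max_sigma))
    return [row[:] for row in image]
```

In a real pipeline the same idea would usually be expressed with an image library's blur transform applied on the fly; the point here is only that each training image has some chance of being seen at reduced sharpness.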

Bio: Dr. Frank Tong studies the neurocomputational bases of human vision using behavioral psychophysics, functional MRI, computational modeling and deep learning techniques. A major focus of his lab is developing more robust and human-aligned DNN models of visual processing. He received his BS in Psychology from Queen’s University, Canada, and his PhD from Harvard University. He worked as an Assistant Professor at Princeton University from 2000 to 2004, and moved to Vanderbilt University thereafter, where he is now a Centennial Professor of Psychology. For his research contributions, he has received awards from the Cognitive Neuroscience Society, the Vision Sciences Society, and the National Academy of Sciences.
Nicolas Papernot, Google DeepMind / U of Toronto
Characterizing Machine Unlearning through Definitions and Implementations
Abstract: The talk presents open problems in the study of machine unlearning. The need for machine unlearning, i.e., obtaining a model one would get without training on a subset of data, arises from privacy legislation and as a potential solution to data poisoning or copyright claims. The first part of the talk discusses approaches that provide exact unlearning: these approaches output the same distribution of models as would have been obtained by training without the subset of data to be unlearned in the first place. While such approaches can be computationally expensive, we discuss why it is difficult to relax the guarantee they provide to pave the way for more efficient approaches. The second part of the talk asks if we can verify unlearning. Here we show how an entity can claim plausible deniability when challenged about an unlearning request that was claimed to be processed, and conclude that at the level of model weights, being unlearnt is not always a well-defined property. Instead, unlearning is an algorithmic property.
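Retraining from scratch without the removed data always gives exact unlearning, but it is expensive. One well-known way to reduce that cost is SISA-style sharded training: split the data into disjoint shards, train one model per shard, aggregate their outputs, and on a deletion request retrain only the affected shard. The toy sketch below assumes a deterministic "model" (a per-shard mean) and a hypothetical `ShardedMeanEnsemble` class; it illustrates the bookkeeping only, not any specific method from the talk.

```python
from statistics import fmean


class ShardedMeanEnsemble:
    """Toy SISA-style ensemble: one 'model' (here just a mean) per disjoint data shard."""

    def __init__(self, data, num_shards):
        # data: list of (point_id, value); point_id % num_shards fixes each point's shard.
        self.shards = [[] for _ in range(num_shards)]
        for point_id, value in data:
            self.shards[point_id % num_shards].append((point_id, value))
        self.models = [self._train(shard) for shard in self.shards]
        self.retrain_count = 0

    @staticmethod
    def _train(shard):
        # Stand-in for real training: deterministic, so retraining is reproducible.
        return fmean(value for _, value in shard) if shard else None

    def predict(self):
        # Aggregate the constituent models by averaging the non-empty ones.
        return fmean(m for m in self.models if m is not None)

    def unlearn(self, point_id):
        # Exact unlearning: retrain only the affected shard, from scratch, without the point.
        s = point_id % len(self.shards)
        remaining = [(pid, v) for pid, v in self.shards[s] if pid != point_id]
        if len(remaining) == len(self.shards[s]):
            raise KeyError(point_id)
        self.shards[s] = remaining
        self.models[s] = self._train(remaining)
        self.retrain_count += 1
```

Because `_train` is deterministic here, the unlearned ensemble is identical to one trained from scratch without the point. With stochastic training, the matching guarantee is about the distribution of models, which is the definition of exact unlearning used in the abstract.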

Bio: Nicolas Papernot is an Assistant Professor of Computer Engineering and Computer Science at the University of Toronto. He also holds a Canada CIFAR AI Chair at the Vector Institute, and is a faculty affiliate at the Schwartz Reisman Institute. His research interests span the security and privacy of machine learning. Some of his group’s recent projects include generative model collapse, cryptographic auditing of ML, private learning, proof-of-learning, and machine unlearning. Nicolas is an Alfred P. Sloan Research Fellow in Computer Science and a Member of the Royal Society of Canada’s College of New Scholars. His work on differentially private machine learning received an outstanding paper award at ICLR 2022 and a best paper award at ICLR 2017. He co-created the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) and is co-chairing its first two editions in 2023 and 2024. He previously served as an associate chair of the IEEE Symposium on Security and Privacy (Oakland), and an area chair of NeurIPS. Nicolas earned his Ph.D. at the Pennsylvania State University, working with Prof. Patrick McDaniel and supported by a Google PhD Fellowship. Upon graduating, he spent a year at Google Brain, where he still spends some of his time.
Max Welling, University of Amsterdam
The Synergy between Machine Learning and the Natural Sciences
Abstract: Traditionally, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g., MCMC, Belief Propagation, and diffusion-based generative AI). We have recently witnessed that the flow of information has also reversed, with new tools developed in the ML community impacting physics, chemistry, and biology. Examples include faster Density Functional Theory (DFT), force-field-accelerated MD simulations, neural surrogate models for PDEs, generating drug-like molecules, and many more. In this talk I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks and Neural Quantum Error Correction codes.

Bio: Prof. Dr. Max Welling is a full professor and research chair in Machine Learning at the University of Amsterdam and a Merkin Distinguished Visiting Professor at Caltech. He is a Fellow at the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS), where he served on the founding board. His previous appointments include Partner and VP at Microsoft Research, VP at Qualcomm Technologies, professor at UC Irvine, postdoc at UCL and the University of Toronto under the supervision of Prof. Geoffrey Hinton, and postdoc at Caltech under the supervision of Prof. Pietro Perona. He finished his PhD in theoretical high energy physics under the supervision of Nobel laureate Prof. Gerard ’t Hooft.
Ricky Chen, Meta AI (FAIR) in New York
Discovering Latent Dynamics of the World: A Simulation-Free Perspective
Abstract: Latent dynamics pervade the world and hence our observations of it, a.k.a. data. However, we never fully observe the data generation process, so how should we go about filling in the blanks in our observations? In this talk, I will discuss my perspective on the field of generative modeling as that of learning dynamical systems of the world. In particular, I will motivate and discuss general recipes for constructing and training generative models, with the central theme of simulation-free training paradigms. This simulation-free perspective allows us to decouple the algorithmic cost of training from the complexity of the data generation process. However, simple methods within this family, such as diffusion models, are not readily amenable to additional constraints or regularizations that we wish to impose on the generation process. I will first introduce the Flow Matching approach for learning generative models where the generation process is directly prescribed. I will then discuss generalizations of this approach to setups where the generation process must lie on a manifold, and where the generation process is only implicitly defined as the solution to some task-specific objective function, connecting to problems appearing in stochastic optimal control and optimal transport.
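The simulation-free idea can be made concrete with a conditional flow matching loss on a straight-line path between a source sample x0 and a data sample x1: at a random time t, the model's velocity at x_t = (1 - t) x0 + t x1 is regressed onto x1 - x0, so training never integrates an ODE. The 1-D sketch below assumes Gaussian source and target distributions, and a closed-form velocity field (`exact_velocity`, valid only for this Gaussian pair) stands in for a trained network; the constructions in the talk may use other paths.

```python
import random

MU = 3.0  # hypothetical target mean: source is N(0, 1), target is N(MU, 1)


def sample_source(rng):
    return rng.gauss(0.0, 1.0)


def sample_target(rng):
    return rng.gauss(MU, 1.0)


def cfm_loss(velocity, n=2000, seed=0):
    """Monte Carlo conditional flow matching loss on the straight-line path.

    Each term needs only (x0, x1, t): no ODE is simulated during training.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x0, x1, t = sample_source(rng), sample_target(rng), rng.random()
        xt = (1.0 - t) * x0 + t * x1                   # point on the path at time t
        total += (velocity(xt, t) - (x1 - x0)) ** 2    # regress onto d/dt of the path
    return total / n


def exact_velocity(x, t):
    """Closed-form E[x1 - x0 | x_t = x] for this Gaussian pair (stands in for a trained net)."""
    var_t = (1.0 - t) ** 2 + t ** 2
    return MU + (2.0 * t - 1.0) / var_t * (x - t * MU)


def euler_sample(velocity, x0, steps=100):
    """Generation does integrate dx/dt = v(x, t) from t = 0 to t = 1 (forward Euler)."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt)
    return x
```

Only sampling (`euler_sample`) integrates the ODE; the training objective needs nothing beyond pairs of samples and a random time.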

Bio: Ricky is a Research Scientist at FAIR, Meta, based in New York. His research is on building simplified abstractions of the world through the lens of dynamical systems and flows. He generally works on integrating structured transformations into probabilistic modeling, with the goal of improved interpretability, tractable optimization, or extending into novel areas of application.
Christopher Rackauckas, MIT
SciML: Adding Scientific Models as Structure to Improve Machine Learning
Abstract: Scientific machine learning (SciML) is the practice of adding scientific structure to improve the predictions from machine learning. In this talk we will showcase and explain how SciML techniques such as universal differential equations (UDEs) make it possible to improve the prediction and extrapolation capabilities of machine learning on small data. We will show various ways that physical laws, prior chemical knowledge, and conservation laws can be incorporated into a general learning process in order to give better predictions from the same data. We will end by discussing some of the ways SciML techniques can improve general machine learning through methods that automatically optimize hyperparameters, showing how solvers for ordinary differential equations can be used to obtain neural architectures with optimal depth as well as fast infinite-layer architectures.

Bio: Dr. Chris Rackauckas is the VP of Modeling and Simulation at JuliaHub, the Director of Scientific Research at Pumas-AI, Co-PI of the Julia Lab at MIT, and the lead developer of the SciML Open Source Software Organization. His work in mechanistic machine learning is credited with the 15,000x acceleration of NASA Launch Services simulations and recently demonstrated a 60x-570x acceleration over Modelica tools in HVAC simulation, earning Chris the US Air Force Artificial Intelligence Accelerator Scientific Excellence Award. He is the lead developer of the Pumas project and has received a top presentation award at every ACoP in the last 3 years for improving methods for uncertainty quantification, automated GPU acceleration of nonlinear mixed effects modeling (NLME), and machine-learning-assisted construction of NLME models with DeepNLME. For these achievements, Chris received the Emerging Scientist award from ISoP.
Aapo Hyvärinen, University of Helsinki
Painful Intelligence: What AI Can Tell Us About Human Suffering
Abstract: This talk discusses Aapo’s new book, which is freely available on his website. The book uses the modern theory of artificial intelligence (AI) to understand human suffering or mental pain. Both humans and sophisticated AI agents process information about the world in order to achieve goals and obtain rewards, which is why AI can be used as a model of the human brain and mind. The book starts with the assumption that suffering is mainly caused by frustration. Frustration means the failure of an agent (whether AI or human) to achieve a goal or a reward it wanted or expected. Frustration is inevitable because of the overwhelming complexity of the world, limited computational resources, and scarcity of good data. In particular, such limitations imply that an agent acting in the real world must cope with uncontrollability, unpredictability, and uncertainty, which all lead to frustration. Finally, this computational theory is used to derive various interventions or training methods that will reduce suffering in humans. The ensuing interventions are very similar to those proposed by Buddhist and Stoic philosophy, and include mindfulness meditation.

Bio: Aapo Hyvärinen studied undergraduate mathematics at the Universities of Helsinki (Finland), Vienna (Austria), and Paris (France), and obtained a Ph.D. degree in Information Science at the Helsinki University of Technology in 1997. After post-doctoral work at the Helsinki University of Technology, he moved to the University of Helsinki in 2003, where he was appointed Professor in 2008, at the Department of Computer Science. From 2016 to 2019, he was Professor of Machine Learning at the Gatsby Computational Neuroscience Unit, University College London, UK. Aapo Hyvärinen is the main author of the books Independent Component Analysis (2001), Natural Image Statistics (2009), and Painful Intelligence (2022). He is Action Editor at the Journal of Machine Learning Research and Neural Computation, and has worked as Area Chair at ICML, ICLR, AISTATS, UAI, ACML and NeurIPS.
Peyman Milanfar, Google Research
Denoising as a Building Block for Imaging, Inverse Problems, and Machine Learning
Abstract: Denoising is one of the oldest problems in imaging. There are thousands of papers on this topic, and their scope is so vast and the approaches so diverse that putting them in some order (as I will do) is both useful and challenging. In the last decade, the quality of denoising algorithms has reached phenomenal levels, almost as good as we can ever hope. But besides this, we've found completely unexpected, brand new uses for denoising. I will describe what we can say about this general class of operators, and what makes them so special. I will argue that denoising is more important than ever: not simply as a process for removing noise, but especially now as a core engine and building block for much more complex tasks in imaging, inverse problems, and machine learning.

Bio: Peyman is a Distinguished Scientist / Senior Director at Google Research, where he leads the Computational Imaging team. Prior to this, he was a Professor of Electrical Engineering at UC Santa Cruz from 1999 to 2014. He was Associate Dean for Research at the School of Engineering from 2010 to 2012. From 2012 to 2014 he was on leave at Google-x, where he helped develop the imaging pipeline for Google Glass. Over the last several years, Peyman's team at Google has developed several core technologies, including the digital zoom pipeline for the Pixel phones, which includes the multi-frame super-resolution (Super Res Zoom) pipeline, and the RAISR upscaling algorithm. Most recently, his team led the development of the Unblur feature launched with the Pixel 7 and 7 Pro. Peyman received his undergraduate education in electrical engineering and mathematics from the University of California, Berkeley, and the MS and PhD degrees in electrical engineering from the Massachusetts Institute of Technology. He holds numerous patents, several of which are commercially licensed. He founded MotionDSP, which was acquired by Cubic Inc. Peyman has been a keynote speaker at numerous technical conferences, including the Picture Coding Symposium (PCS), SIAM Imaging Sciences, SPIE, and the International Conference on Multimedia and Expo (ICME). Along with his students, he has won several best paper awards from the IEEE Signal Processing Society. He was a Distinguished Lecturer of the IEEE Signal Processing Society, and is a Fellow of the IEEE for contributions to inverse problems and super-resolution in imaging.
Graham Neubig, LTI @ Carnegie Mellon University
Towards Automating Machine Learning Engineering
Abstract: When a skilled machine learning engineer is tasked with building a system for a specific application, they take several steps. Some of these include doing a literature review of the most appropriate models and datasets, choosing which ones to use based on accuracy and other constraints such as efficiency or latency, creating or curating training and testing data, training and comparing models, identifying weak points of the current modeling paradigm, and iteratively improving. In this talk, I will discuss two projects that take steps towards automating this entire process. The first, prompt2model, is a method that takes in a natural language task description (similar to a prompt that is provided to a system like ChatGPT) and uses the entire open-source model training ecosystem to train a small, easily deployable model that nonetheless has competitive accuracy with large language models. The second, Zeno, is an intelligent model comparison and error analysis tool that makes it possible for machine learning engineers to quickly uncover errors and weak spots, including methods for automatic blind-spot discovery.

Bio: Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on natural language processing, with a particular interest in fundamentals, applications, and understanding of large language models for tasks such as question answering, code generation, and multilingual applications. His ultimate goal is that every person in the world should be able to communicate with each other, and with computers, in their own language. He also contributes to making NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his website.
Dongwon Lee, Penn State
Deepfakes, Language Models, and The Age of Synthetic Truth
Abstract: The recent explosive advancements in both deepfake-enabling methods in Computer Vision and generative language models in NLP have enabled the generation of human-quality artifacts in various modalities. However, at the same time, these new AI technologies can be used by adversaries for malicious purposes, opening a window of opportunity for disinformation purveyors and state-sponsored hackers. In this talk, I’ll showcase some examples of deepfake artifacts and their underlying AI technologies, especially reviewing the current landscape of large language models. Then, I’ll discuss how adversaries may use such recent developments to create the so-called “Fake News 2.0,” which can erode the public’s confidence in democracy. Finally, I will conclude the talk by sharing the important implications of deepfakes within the information ecosystem as well as in society at large.

Bio: Dongwon Lee is a full professor and the director of the Ph.D. program in the Information School (also known as iSchool) at Penn State University, USA. He is also an ACM Distinguished Scientist (2019) and a Fulbright Cyber Security Scholar (2022). Before joining Penn State, he worked at AT&T Bell Labs in New Jersey and earned his Ph.D. in Computer Science from UCLA. From 2015 to 2017, he served as a Program Director at the National Science Foundation (NSF), co-managing cybersecurity education and research programs and contributing to the development of national research priorities. In general, his research focuses on problems at the intersection of data science, machine learning, and cybersecurity.
Stella Yu, University of Michigan
Unsupervised Learning of Segmentation by Recognition and for Recognition
Abstract: Image segmentation in computer vision has evolved such that it is routinely treated as an end task. For example, in autonomous driving, we are interested in segmenting a road scene into (cars, bikes, motorcycles, persons, trees, lamp-posts, traffic signs, curbs, etc.). To differentiate a person in different contexts, we label (a person on a bike) a (bike-rider), (a person on a curb) a (pedestrian), and (a person on a horse) a (horse-rider). To understand the intent and action of a person, we want to segment a person into (head, torso, arms, legs). The Segment Anything Model (SAM) takes supervised segmentation to a large scale, giving the false impression that segmentation is now solved. My view is that segmentation underlies the generalization capability of visual intelligence and that supervised segmentation is simply the wrong approach. Segmentation should be treated not as an end goal in itself, but as an internal mid-level representation that serves visual recognition. I will present our recent works in this direction, including unsupervised learning of objectness and visual context, and unsupervised discovery of visual semantic hierarchies and part-whole hierarchies.

Bio: Stella Yu received her Ph.D. from Carnegie Mellon University, where she studied robotics at the Robotics Institute and vision science at the Center for the Neural Basis of Cognition. Before joining the University of Michigan faculty in Fall 2022, she was the Director of the Vision Group at the International Computer Science Institute, a Senior Fellow at the Berkeley Institute for Data Science, and on the faculty of Computer Science, Vision Science, Cognitive and Brain Sciences at UC Berkeley. Dr. Yu is interested not only in understanding visual perception from multiple perspectives, but also in using computer vision and machine learning to automate and exceed human expertise in practical applications.
David Stutz, Google DeepMind
Conformal prediction under ambiguous ground truth
Abstract: In safety-critical classification tasks, conformal prediction enables rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and are usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have “crisp”, definitive ground truth labels, and this uncertainty should be taken into account during calibration. In this work, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
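For reference, the standard split conformal procedure that the abstract builds on looks roughly like this when crisp labels are available: score each calibration example by 1 minus the model's probability for its true label, take a finite-sample-corrected quantile of those scores, and include every class whose score falls at or below it. This is a minimal sketch of the textbook baseline, not the ambiguous-ground-truth framework from the talk; the function names are illustrative.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal calibration with nonconformity score 1 - p(true label).

    cal_probs: per-example class probabilities; cal_labels: crisp ground-truth labels.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # rank of the finite-sample-corrected quantile
    if k > n:
        return math.inf  # too few calibration points: every class must be included
    return scores[k - 1]


def prediction_set(probs, threshold):
    """All classes whose score 1 - p(class) does not exceed the calibrated threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]
```

The ambiguity discussed above enters through `cal_labels`: when only aggregated expert opinions exist, there is no single true label per calibration example, which is exactly the assumption this baseline relies on.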

Bio: David is a research scientist at Google DeepMind interested in robust and safe deep learning. Previously, he completed his PhD at the Max Planck Institute for Informatics, which included an internship at Google DeepMind and a collaboration with IBM Research. His PhD was supported by a Qualcomm Innovation Fellowship 2019 and received the DAGM MVTec Dissertation Award 2023. Other notable honors include an outstanding paper award at the CVPR 2021 CV-AML workshop, participation in the 7th and 10th Heidelberg Laureate Forums, the RWTH Aachen University Springorum Denkmünze as well as the STEM-Award IT 2018 for his master's thesis, and several national scholarships. He was repeatedly recognized as an outstanding/top reviewer for CVPR, ICML and NeurIPS.
Atlas Wang, Picsart / UT Austin
Whispers in the Weight: Unraveling the Mysteries of LLM Compression
Abstract: Modern Large Language Models (LLMs) have revolutionized Natural Language Processing, yet their computational demands require compression. Through a series of studies, we delve into the intricacies of LLM compression and explore potential remedies. First, we challenge conventional compression evaluation metrics by introducing the Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK). This curated task collection provides nuanced insights into compression methods beyond perplexity. We illuminate pitfalls in existing pruning and quantization techniques, uncovering, for instance, the robustness of pruned LLMs in contextually demanding tasks. Next, we navigate the trade-offs of post-compression re-training and explore the promise of prompt-driven recovery. Through Inference-time Dynamic Prompting (IDP), prompts are autonomously selected based on context, resulting in a notable performance boost across a diverse range of tasks. Further, drawing inspiration from genomics, we conduct a holistic scientific study to examine weight redundancy in LLMs, articulating our findings as the Junk DNA Hypothesis for LLMs. This challenges common assumptions about low-magnitude weights, revealing their pivotal role in complex tasks and showing that removing them risks irreversible knowledge loss.
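The "common assumption" the Junk DNA Hypothesis challenges is the one behind unstructured magnitude pruning: weights with the smallest absolute values are treated as redundant and zeroed out. A minimal sketch on a flat list of weights, assuming a single global sparsity level (in practice pruning is usually applied per layer to weight tensors, and `magnitude_prune` is an illustrative name):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = round(sparsity * len(weights))  # number of weights to remove
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in smallest_first[:k]:
        pruned[i] = 0.0  # the low-magnitude weights the hypothesis says may still matter
    return pruned
```

Under the hypothesis in the abstract, the entries zeroed here are precisely the ones that may still carry knowledge needed for complex tasks.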

Bio: Professor Zhangyang “Atlas” Wang is a tenured Associate Professor and holds the Temple Foundation Endowed Faculty Fellowship #7 in the Chandra Family Department of Electrical and Computer Engineering at The University of Texas at Austin. He is also a faculty member of UT Computer Science and the Oden Institute CSEM program. In addition, in a part-time role, he serves as the Director of AI Research & Technology for Picsart, where he leads the development of cutting-edge, GenAI-powered tools for creative visual editing. Prof. Wang has broad research interests spanning from the theory to the application aspects of machine learning (ML). At present, his core research mission is to leverage, understand and expand the role of low-dimensionality, from classical optimization to modern neural networks, whose impact spans many important topics such as: efficient scaling, training and inference of large language models (LLMs); robustness and trustworthiness; learning to optimize (L2O); generative AI; and graph learning. Prof. Wang has received many research awards and is fortunate enough to work with a sizable group of accomplished students.
Tri Dao, Together.AI / Princeton University
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Abstract: Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware: accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. FlashAttention trains Transformers faster than existing baselines, with 2-4x speedup on the attention kernel. FlashAttention enables longer context in Transformers (4-16x longer than previous), yielding higher quality models. We will also describe recent improvements of FlashAttention: making use of new hardware features on A100 and H100 GPUs (another 2x speedup), optimizations for long-context LLM inference (2-4x faster end-to-end inference time), as well as how these ideas transfer to other model architectures.
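Tiling can keep attention exact because softmax can be computed incrementally: keep a running maximum and running normalizer per query, and rescale earlier partial sums whenever the maximum grows. The pure-Python sketch below shows that online-softmax arithmetic for a single query row and checks it against a direct computation. It does not model GPU memory, HBM/SRAM traffic, or the actual FlashAttention kernel, and `block_size` is an arbitrary illustrative parameter.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) weighted sum of values, all scores at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, visiting keys/values one block at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    running_max, running_sum = -math.inf, 0.0
    acc = [0.0] * len(values[0])  # unnormalized output accumulator
    for start in range(0, len(keys), block_size):
        k_blk, v_blk = keys[start:start + block_size], values[start:start + block_size]
        scores = [scale * dot(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new running maximum (exp(-inf) = 0 at first).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Each block of keys and values is read once and never revisited, which is the property that lets a real kernel keep a tile in fast on-chip memory while still producing exact attention.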

Bio: Tri Dao is currently chief scientist of Together.AI and is an incoming Assistant Professor at Princeton University. He completed his PhD in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the interface of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding Paper runner-up award.
Sharon Yixuan Li, UW Madison
How to Detect Out-of-Distribution Data in the Wild? Challenges, Research Progress, and Path Forward
Abstract: When deploying machine learning models in the open and non-stationary world, their reliability is often challenged by the presence of out-of-distribution (OOD) samples. Since data shifts happen prevalently in the real world, identifying OOD inputs has become an important problem in machine learning. In this talk, I will discuss challenges, research progress, and opportunities in OOD detection. Our work is motivated by the insufficiency of existing learning objectives such as ERM, which focus on minimizing error only on the in-distribution (ID) data but do not explicitly account for the uncertainty that arises outside ID data. To mitigate this fundamental limitation, I will introduce a new algorithmic framework, which jointly optimizes for both accurate classification of ID samples and reliable detection of OOD data. The learning framework integrates distributional uncertainty as a first-class construct in the learning process, thus enabling both accuracy and safety guarantees.
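Two widely used post-hoc scores for identifying OOD inputs from a classifier trained with ERM are the maximum softmax probability and the energy score (a temperature-scaled log-sum-exp of the logits). They are shown here only as reference baselines for the detection problem, not as the joint training framework the talk introduces; the threshold passed to the hypothetical `flag_ood` helper would normally be tuned on held-out in-distribution data.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def energy_score(logits, temperature=1.0):
    """Negative free energy T * logsumexp(logits / T); higher suggests in-distribution."""
    return temperature * logsumexp([z / temperature for z in logits])


def max_softmax_prob(logits):
    """Maximum softmax probability, the simplest post-hoc confidence score."""
    return math.exp(max(logits) - logsumexp(logits))


def flag_ood(logits, threshold, temperature=1.0):
    """Flag an input as OOD when its energy score falls below a threshold tuned on ID data."""
    return energy_score(logits, temperature) < threshold
```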

Bio: Sharon Yixuan Li is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. She received a Ph.D. from Cornell University in 2017, advised by John E. Hopcroft. Subsequently, she was a postdoctoral scholar in the Computer Science department at Stanford University. Her research focuses on the algorithmic and theoretical foundations of learning in open-world environments. She has served as Area Chair for ICLR, NeurIPS, and ICML, and as Program Chair for the Workshop on Uncertainty and Robustness in Deep Learning. Her work is recognized by the AFOSR Young Investigator Program (YIP) award, NSF CAREER award, MIT Technology Review TR-35 Award, Forbes 30 Under 30 in Science, and multiple faculty research awards from Google, Meta, and Amazon. Her work also received a NeurIPS Outstanding Paper Award, and an ICLR Outstanding Paper Award Honorable Mention in 2022.
Hyung Won Chung, OpenAI
Large Language Models (in 2023)
Abstract: There is one unique aspect of large language models (LLMs): larger models exhibit abilities that were not present in the smaller models. These emergent abilities have far-reaching consequences for how we should work in the field of AI. I will share some of my observations on the implications of scaling and emergent abilities. After that, I will introduce the multiple stages involved in training the current generation of LLMs: pre-training and post-training (including instruction fine-tuning and RLHF). While a huge volume of research exists for each stage, the core aspects can be expressed relatively simply. I will introduce the fundamental aspects of each stage and discuss the unique challenges they pose.

Bio: Hyung Won is a research scientist on the OpenAI ChatGPT team. He has worked on various aspects of Large Language Models: pre-training, instruction fine-tuning, reinforcement learning with human feedback, reasoning, multilinguality, parallelism strategies, etc. Notable work includes the Scaling Flan paper (Flan-T5, Flan-PaLM) and T5X, the training framework used to train the PaLM language model. Before OpenAI, he was at Google Brain, and before that he received a PhD from MIT.
Micah Goldblum, NYU
Bridging the gap between deep learning theory and practice
Abstract: Despite the widespread proliferation of neural networks, the mechanisms through which they operate so successfully are not well understood. In this talk, we will first explore empirical and theoretical investigations into neural network training and generalization and what they can tell us about why deep learning works. Then, we will examine a recent line of work on algorithm learning. While typical neural networks are designed for pattern matching tasks, we consider whether neural networks can learn algorithms that scale to problem instances orders of magnitude larger than those seen during training.

Bio: Micah is a postdoctoral researcher at New York University working with Yann LeCun and Andrew Gordon Wilson. His research portfolio includes award-winning work in Bayesian inference, generalization theory, and AI security. Before his current position, Micah received a Ph.D. in mathematics from the University of Maryland.
Guido Montúfar, UCLA
FoSR: First-order spectral rewiring for addressing oversquashing in GNNs
Abstract: Graph neural networks (GNNs) are able to leverage the structure of graph data by passing messages along the edges of the graph. While this allows GNNs to learn features depending on the graph structure, for certain graph topologies it leads to inefficient information propagation and a problem known as oversquashing. This has recently been linked with the curvature and spectral gap of the graph. On the other hand, adding edges to the message-passing graph can lead to increasingly similar node representations and a problem known as oversmoothing. We propose a computationally efficient algorithm that prevents oversquashing by systematically adding edges to the graph based on spectral expansion. We combine this with a relational architecture, which lets the GNN preserve the original graph structure and provably prevents oversmoothing. We find experimentally that our algorithm outperforms existing graph rewiring methods in several graph classification tasks. This is work with Kedar Karhadkar and Pradeep Kr. Banerjee.
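The abstract does not spell out FoSR's update rule, so the sketch below only illustrates the general first-order idea, with a simplification of its own: it uses the combinatorial Laplacian L = D - A and standard eigenvalue perturbation, under which adding a non-edge (u, v) increases the spectral gap λ2 by approximately (x_u - x_v)^2, where x is the unit-norm Fiedler vector. FoSR's actual criterion and its efficient approximate updates may differ. The graph is assumed connected with at least one edge and stored as a list of neighbor sets; all function names are hypothetical.

```python
import random


def laplacian_matvec(adj, x):
    """Return L x for the combinatorial Laplacian L = D - A (adj: list of neighbor sets)."""
    return [len(adj[u]) * x[u] - sum(x[v] for v in adj[u]) for u in range(len(adj))]


def _center_and_normalize(x):
    # Project out the all-ones vector (eigenvalue 0 of L), then scale to unit norm.
    mean = sum(x) / len(x)
    x = [xi - mean for xi in x]
    norm = sum(xi * xi for xi in x) ** 0.5
    return [xi / norm for xi in x]


def fiedler_vector(adj, iters=2000, seed=0):
    """Power iteration on cI - L, which turns the smallest nonzero eigenvalue into the largest."""
    c = 2 * max(len(nbrs) for nbrs in adj)  # Gershgorin bound: c >= largest eigenvalue of L
    rng = random.Random(seed)
    x = _center_and_normalize([rng.uniform(-1.0, 1.0) for _ in adj])
    for _ in range(iters):
        lx = laplacian_matvec(adj, x)
        x = _center_and_normalize([c * xi - li for xi, li in zip(x, lx)])
    lam2 = sum(xi * li for xi, li in zip(x, laplacian_matvec(adj, x)))  # Rayleigh quotient
    return x, lam2


def add_best_edge(adj):
    """Add the non-edge (u, v) with the largest first-order gain (x_u - x_v)^2 in lambda_2."""
    x, _ = fiedler_vector(adj)
    n = len(adj)
    u, v = max(
        ((a, b) for a in range(n) for b in range(a + 1, n) if b not in adj[a]),
        key=lambda e: (x[e[0]] - x[e[1]]) ** 2,
    )
    adj[u].add(v)
    adj[v].add(u)
    return u, v
```

On a 6-node path, the greedy choice closes the path into a cycle, and λ2 rises from 2 - √3 ≈ 0.268 to 1. The relational architecture mentioned in the abstract, which keeps original and added edges distinct, is not modeled here.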

Bio: Dr. Guido Montúfar is an Associate Professor at UCLA in Mathematics and Statistics & Data Science, and Research Group Leader at the Max Planck Institute for Mathematics in the Sciences. His research interests include Deep Learning Theory, Graphical Models, and Mathematical Machine Learning. He is a recipient of many prestigious awards including the ERC Starting Grant for Deep Learning Theory, the NSF CAREER award, and the 2022 Sloan Research Fellowship. Dr. Montúfar's work bridges the theoretical foundations of mathematics and machine learning, making significant contributions to both fields.
Nicholas Carlini, Google DeepMind
Are aligned language models adversarially aligned?
Abstract An aligned model is helpful and harmless. In this talk I will show that while language models may be aligned under typical situations, they are not adversarially aligned. Using standard techniques from adversarial examples, we can construct inputs that coerce otherwise-aligned language models into emitting harmful text and engaging in harmful behavior.

Bio: Nicholas Carlini is a research scientist at Google DeepMind working at the intersection of machine learning and computer security. His most recent line of work studies properties of neural networks from an adversarial perspective, for which he received best paper awards at ICML, USENIX, and IEEE S&P.
Taco Cohen, Qualcomm AI Research
Geometric Algebra Transformers: A Universal Architecture for Geometric Data
Abstract Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this work, we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16-dimensional vector space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to Pin(3,0,1), the double cover of E(3), the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In various geometric problems, GATr shows strong improvements over non-geometric baselines.
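The 16 dimensions mentioned above come from counting basis blades: the projective geometric algebra G(3,0,1) has four generators (e0, which squares to 0, and e1, e2, e3, which square to +1), so a multivector has 2^4 = 16 components, split by grade as 1 + 4 + 6 + 4 + 1. The short sketch below is our own illustration, not code from GATr; it simply enumerates those blades. In the usual PGA convention, grade-1 blades encode planes, grade-2 blades lines, and grade-3 blades points.

```python
from itertools import combinations

def pga_basis_blades(n_generators=4):
    """List the basis blades of a Clifford algebra with n generators.

    For G(3,0,1) the generators are e0 (squares to 0) and e1, e2, e3
    (square to +1). The metric changes how blades multiply, not how many
    there are, so the count is 2**n_generators.
    """
    blades = []
    for grade in range(n_generators + 1):
        # A grade-k blade is a product of k distinct generators.
        for idx in combinations(range(n_generators), grade):
            name = "e" + "".join(str(i) for i in idx) if idx else "1"
            blades.append((grade, name))
    return blades

blades = pga_basis_blades()
grade_counts = [sum(1 for g, _ in blades if g == k) for k in range(5)]
print(len(blades))    # 16
print(grade_counts)   # [1, 4, 6, 4, 1]
```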

Bio: Taco Cohen is a machine learning researcher (Principal Engineer) at Qualcomm AI Research in Amsterdam. He received a BSc in theoretical computer science from Utrecht University, and an MSc in artificial intelligence and a PhD in machine learning (with Prof. Max Welling) from the University of Amsterdam (all three cum laude). He was a co-founder of Scyfer, a company focused on deep active learning, which was acquired by Qualcomm in 2017. His research is focused on geometric deep learning and reinforcement learning. During his studies he interned at Google DeepMind (working with Geoff Hinton) and OpenAI. He received the 2014 University of Amsterdam MSc thesis prize, a Google PhD Fellowship, and the ICLR 2018 best paper award for “Spherical CNNs”, was named one of 35 innovators under 35 by MIT Tech Review, and won the 2022 ELLIS PhD Award and the 2022 Kees Schouhamer Immink prize for his PhD research.
Johannes Brandstetter, Microsoft Research
Is it the network, or is it the data? Towards large-scale PDE surrogates
Abstract Partial differential equations (PDEs) are widely used in science and engineering to describe physical processes that interact and coevolve over time. Because their standard solution methods are computationally expensive, neural PDE surrogates have become an active research topic for accelerating these simulations. In this talk, we approach such surrogates from two different angles. First, we take a closer look at how best to integrate physics into neural PDE surrogates. Second, we let well-known tricks from computer vision and the sheer power of the data speak, assuming that all the physics is in the data. Especially in the second case, the model needs to be designed to leverage all of that data. Finally, we compare these paradigms against each other and give an outlook.

Bio: Johannes Brandstetter did his PhD studying Higgs boson decays at the CMS experiment at the Large Hadron Collider at CERN. In 2018, he joined Sepp Hochreiter’s group in Linz, Austria. In 2021, he became an ELLIS PostDoc in Max Welling’s lab at the University of Amsterdam. Since 2022, he has been a Senior Researcher at the newly founded Microsoft lab in Amsterdam. His current research interests comprise Geometric Deep Learning, neural PDE solving, and large-scale scientific simulations.
Jason Wei, OpenAI
Scaling Unlocks Emergent Abilities In Language Models
Abstract Scaling up language models has been shown to predictably improve performance on a wide range of downstream tasks. In this talk, we will instead discuss an unpredictable phenomenon that we refer to as emergent abilities of large language models. An ability is considered emergent if it is not present in smaller models but is present in larger models, which means that the ability cannot be predicted simply by extrapolating the performance of smaller models. With the popularization of large language models such as GPT-3, Chinchilla, and PaLM, dozens of emergent abilities have been discovered, including chain-of-thought prompting, which enables state-of-the-art mathematical reasoning, and instruction finetuning, which enables large language models to be usable by the broader population. The existence of such emergent phenomena raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.

Bio: Jason Wei is an AI researcher working on ChatGPT at OpenAI in San Francisco. He was previously a senior research scientist at Google Brain, where he popularized chain-of-thought prompting, co-led the first efforts on instruction tuning, and wrote about emergence in large language models. Chain-of-thought prompting was presented by Sundar Pichai at the Google I/O press event in 2022.
Diederik P. (Durk) Kingma, Google Research
Infinitely Deep Learning
Abstract Diffusion models have demonstrated amazing abilities for image and video generation. In this talk, we explain some recent breakthroughs in understanding state-of-the-art diffusion models as infinitely deep variational autoencoders (VAEs). We start by introducing VAEs. We then introduce continuous-time diffusion models as infinitely deep VAEs and show how to optimize their evidence lower bound (ELBO). Finally, we present a new result that explains the objective functions used in state-of-the-art (SOTA) diffusion models as the ELBO with simple data augmentation. This opens up new avenues for optimizing other model families with the same objective as successful diffusion models. We will also list some interesting open research questions in the diffusion model space.

Bio: Diederik P. (Durk) Kingma is a machine learning researcher at Google, with a focus on generative models. His contributions include the Variational Autoencoder (VAE), the Adam optimizer, Glow, and Variational Diffusion Models. He obtained a PhD (cum laude) from University of Amsterdam in 2017, and was part of the founding team of OpenAI in 2015.
Yang Song, OpenAI / Caltech (*)
Breaking the Curse of Dimensionality in Generative Modeling: A Homotopic Approach
Abstract Generative modeling for high-dimensional data, such as images and audio, is extremely challenging due to the curse of dimensionality. To overcome this difficulty, I introduce a homotopic approach inspired by numerical equation solving, which involves designing a homotopy of probability distributions that smoothly progresses from a simple noise distribution to a complex data distribution. I will present two families of approaches that rely on such homotopies: score-based diffusion models and consistency models. Both approaches use a differential equation to convert data to noise and learn to estimate the time reversal with deep neural networks. These models allow for flexible neural network architectures, enable zero-shot image editing, and generate high-quality samples that achieve state-of-the-art performance in many generative modeling benchmarks.

Bio: Yang Song is a research scientist at OpenAI and an incoming Assistant Professor at Caltech. His research interests are in deep generative models, inverse problem solving, and AI safety. His research has been recognized with an Outstanding Paper Award at ICLR 2021, an Apple PhD Fellowship in AI/ML, a J.P. Morgan PhD Fellowship, and a WAIC rising star award.
Petar Veličković, DeepMind / University of Cambridge
Reasoning Algorithmically: from Toy Experiments to AGI Modules
Abstract Neural networks that are able to reliably execute algorithmic computation may hold transformative potential for both machine learning and theoretical computer science. On the one hand, they could enable the kind of extrapolative generalisation scarcely seen with deep learning models. On the other, they may allow for running classical algorithms on inputs previously considered inaccessible to them. Over the past few years, the pace of development in this area has steadily intensified. As someone who has been very active in its latest incarnation, I have witnessed these concepts grow from isolated 'toy experiments', through NeurIPS spotlights, all the way to helping detect patterns in complicated mathematical objects (published on the cover of Nature) and supporting the development of generalist reasoning agents. In this talk, I will give my personal account of this journey, and especially of how our own interpretation of this methodology, and our understanding of its potential, changed over time. It should be of interest to a general audience interested in graphs, (classical) algorithms, reasoning, and building intelligent systems.

Bio: Petar is a Staff Research Scientist at DeepMind, an Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. He holds a PhD in Computer Science from the University of Cambridge, where he worked with Pietro Liò. His research concerns Geometric Deep Learning and has been featured in various top-tier conferences and news outlets. Currently, Petar is focusing on Graph Representation Learning and its applications in Algorithmic Reasoning. He is also recognized as an ELLIS Scholar.
Hieu Pham, Google Research
Deep Learning After the Transformer
Abstract The field of machine learning has been through several exciting moments: kernel methods, Bayesian inference, and non-parametric methods, to name a few. Every time a new approach pushed an existing limit, people wondered whether the approach was “the best”. As of 2023, the Transformer is prevalent. It is hard to find a research paper that does not mention this immensely successful model. But is the Transformer the best neural architecture? If so, can we explain why? If not, how can we improve it, or, more ambitiously, how can we make something better than it? In this talk, I invite you to contemplate these questions. I share my insights on the properties of the Transformer that make it favorable or unfavorable for certain domains and tasks. Based on these insights, I discuss potential directions for subsequent developments. I will also discuss some recent work from my group that makes learning algorithms more efficient, with or without the Transformer.

Bio: Hieu Pham is a Research Scientist at Google Brain. He is currently focusing on improving the efficiency of large vision and language models. Before joining Google, Hieu received his Ph.D. from Carnegie Mellon University (CMU), where he worked on various AutoML projects. His work provided the foundation for one-shot neural architecture search, which reduced the cost of AutoML algorithms by several orders of magnitude.
Thomas Beckers, Vanderbilt University
Safe Learning-based Control of Mechanical Systems
Abstract In modern technologies such as autonomous vehicles and service robots, control engineering plays a crucial role in the overall performance and safety of the system. However, control design often becomes very time-consuming or even infeasible due to the increasing complexity of mechanical systems. Classical control approaches, which are based on first-principles models of the system, are not satisfactory in the presence of complex dynamics, e.g., for highly nonlinear systems or interaction with previously unknown environments. Recent findings in computational intelligence and machine learning have shown that data-driven approaches lead to very promising results in a wide range of application domains, including the modeling of complex dynamics. However, a major drawback of data-driven approaches is that their outcomes are often unpredictable. Therefore, the current application of machine learning in control is typically limited to non-critical and low-performance systems. In this talk, I will present our results on safe learning-based control of partially unknown mechanical systems. In the first part of the seminar, I will show how we leverage Gaussian processes for learning the unknown dynamics of the system. Gaussian process (GP) models are of high interest due to many beneficial properties, such as the bias-variance trade-off and their strong connection to Bayesian mathematics. We exploit the Bayesian structure to include prior knowledge about the system in the learning process. In the second part, I will present a learning-enhanced model-based control law that guarantees safe control of mechanical systems with partially unknown dynamics. This control law combines the strength of model-based control with the flexibility of machine learning techniques. I demonstrate how we actively exploit the uncertainty of the GP model to guarantee high performance and stability of the closed loop.
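The GP learning step described above can be illustrated with plain GP regression. The snippet below is a generic sketch of our own, not the speaker's implementation: it computes the posterior mean and variance of a zero-mean GP for a one-dimensional "unknown dynamics" function. The squared-exponential kernel, its hyperparameters, the noise level, and the toy target sin(x) are all assumptions chosen for illustration. The posterior variance is the kind of uncertainty a safe controller can exploit: it is small near the training data and returns to the prior variance far from it.

```python
import math

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on scalar inputs (assumed kernel choice)."""
    return variance * math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_star, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at the query point x_star."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # (K + noise*I)^-1 y
    k_star = [rbf(x_star, xi) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)                    # L^-1 k_star
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var

# Hypothetical "unknown dynamics": a few noise-free samples of sin(x).
X_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in X_train]

for x in (1.0, 0.5, 10.0):
    m, v = gp_posterior(X_train, y_train, x)
    print(f"x={x:5.1f}  mean={m:+.3f}  std={math.sqrt(max(v, 0.0)):.3f}")
```

Solving through a Cholesky factor rather than forming an explicit inverse is the standard, numerically safer way to evaluate these formulas.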

Bio: Thomas Beckers is an Assistant Professor of Computer Science and the Institute for Software Integrated Systems at Vanderbilt University. Before joining Vanderbilt, he was a postdoctoral researcher at the Department of Electrical and Systems Engineering, University of Pennsylvania, where he was a member of the GRASP Lab, PRECISE Center and ASSET Center. In 2020, he earned his doctorate in Electrical Engineering at the Technical University of Munich (TUM), Germany. He received the B.Sc. and M.Sc. degrees in Electrical Engineering in 2010 and 2013, respectively, from the Technical University of Braunschweig, Germany. In 2018, he was a visiting researcher at the University of California, Berkeley. He is a DAAD AInet fellow and received the Rohde & Schwarz Outstanding Dissertation Prize. His research interests include physics-enhanced learning, nonparametric models, and safe learning-based control.
Guanya Shi, University of Washington / CMU (*)
Neural-Control Family: Safe Agile Deep-learning-based Robotic Control in Dynamic Environments
Abstract Recent breathtaking advances in machine learning invite their application to a wide range of autonomous systems. However, for safety-critical settings such as agile robotic control in hazardous environments, we must confront several key challenges before widespread deployment. Most importantly, the learning system must interact with the rest of the autonomous system (e.g., highly nonlinear and non-stationary dynamics) in a way that safeguards against catastrophic failures with formal guarantees. In addition, from both computational and statistical standpoints, the learning system must incorporate prior knowledge for efficiency and generalizability. In this talk, I will present progress toward establishing a unified framework that fundamentally connects learning and control. In particular, I will introduce a concrete example of such a unified framework called Neural-Control Family, a family of deep-learning-based nonlinear control methods with not only stability and robustness guarantees but also new capabilities in agile robotic control. For example, Neural-Swarm enables close-proximity flight of a drone swarm, and Neural-Fly enables precise drone control in strong time-varying wind conditions.

Bio: Guanya Shi is an incoming (Fall 2023) Assistant Professor at the Robotics Institute and the School of Computer Science at Carnegie Mellon University (CMU). He is currently a postdoctoral scholar at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. He received his Ph.D. from Caltech in 2022 and a B.E. from Tsinghua University in 2017. He is broadly interested in the intersection of machine learning and control theory, spanning the entire spectrum from theory to real-world agile robotics. Guanya has received several awards, including the Simoudis Discovery Prize and the Ben P.C. Chou Doctoral Prize from Caltech, and the Rising Star in Data Science from the University of Chicago.
Thomas Kipf, Google Research
Structured Scene Understanding: Objects, Dynamics, 3D
Abstract The world around us — and our understanding of it — is rich in compositional structure: from atoms and their interactions to objects and agents in our environments. How can we learn scalable models of the physical world that capture this structure from raw, unstructured observations? In this talk, I will cover our team’s recent work on structured scene understanding: I will introduce an emergent class of slot-centric neural architectures that use a set of latent variables (“slots”) grounded in the physical scene. Slots are decoupled from the image grid and can learn to capture objects or more fine-grained scene components, model their dynamics, and learn 3D-consistent representations when a scene is observed from multiple viewpoints. I will briefly introduce the Slot Attention mechanism as a core representative for this class of models and cover recent extensions to video (SAVi, SAVi++), 3D (OSRT), and visual dynamics simulation (SlotFormer).

Bio: Thomas Kipf is a Senior Research Scientist at Google Brain in Amsterdam. His research focuses on developing machine learning models that can reason about the rich structure of the physical world. He obtained his PhD from the University of Amsterdam with a thesis on “Deep Learning with Graph-Structured Representations”, advised by Max Welling. He was recently elected as an ELLIS Scholar and received the ELLIS PhD Award.
Brandon Amos, Meta AI
Learning with differentiable and amortized optimization
Abstract Optimization has been a transformative modeling and decision-making paradigm over the past century that computationally encodes non-trivial reasoning operations. Advances in optimization foundations, developed alongside domain experts, have resulted in breakthroughs for 1) controlling robotic, autonomous, mechanical, and multi-agent systems, 2) making operational decisions based on future predictions, 3) efficiently transporting or matching resources, information, and measures, 4) allocating budgets and portfolios, 5) designing materials, molecules, and other structures, 6) solving inverse problems to infer underlying hidden costs, incentives, geometries, terrains, and other structures, and 7) learning and meta-learning the parameters of predictive and statistical models. These settings often analytically specify the relevant models of the world along with an explicit objective to optimize. Once these are specified, computational optimization solvers can search over the space of possible solutions or configurations and return the best one. The magic of optimization breaks down when 1) the relevant models of the world are too difficult or impossible to specify, leading to inaccurate or incomplete representations of the true setting, or 2) solving the optimization problem is computationally challenging and takes too long to return a solution on today's hardware. Machine learning methods help overcome both of these by providing fast predictive models and powerful latent abstractions of the world. In this talk, I will cover two ways of tightly integrating optimization and machine learning methods: 1. *Differentiable optimization* characterizes how the solution to an optimization problem changes as the inputs change. In machine learning settings, differentiable optimization provides an implicit layer that integrates optimization-based domain knowledge into the model and enables unknown parts of the optimization problem to be learned. I will cover the foundations of learning these layers with implicit differentiation and highlight applications in robotics and control settings. 2. *Amortized optimization* rapidly predicts approximate solutions to optimization problems and is useful when repeatedly solving optimization problems. Traditional optimization methods typically solve every new problem instance from scratch, ignoring the structure and information shared across instances. In contrast, a solver augmented with amortized optimization learns the shared structure present in the solution mappings and searches the domain more effectively. I will cover the foundations of amortized optimization and highlight new applications in control and optimal transport.
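The implicit-differentiation idea behind differentiable optimization layers can be seen on a toy problem. Assume (hypothetically) an inner problem y*(x) = argmin_y f(y, x) with f(y, x) = y^4/4 + y^2/2 - x*y, which is strictly convex in y. At the optimum, df/dy = y^3 + y - x = 0, and the implicit function theorem gives dy*/dx = -(d2f/dy dx) / (d2f/dy2) = 1 / (3*y*^2 + 1), so the gradient is obtained without differentiating through the solver's iterations. This pure-Python sketch is only an illustration of the principle, not the speaker's code or library.

```python
def solve_inner(x, y0=0.0, tol=1e-12, max_iter=100):
    """y*(x) = argmin_y  y**4/4 + y**2/2 - x*y  (toy problem, assumed).

    Uses Newton's method on the optimality condition g(y) = y**3 + y - x = 0;
    g is strictly increasing, so the root is unique.
    """
    y = y0
    for _ in range(max_iter):
        step = (y ** 3 + y - x) / (3 * y ** 2 + 1)
        y -= step
        if abs(step) < tol:
            break
    return y

def implicit_grad(x):
    """dy*/dx via the implicit function theorem: -f_yx / f_yy.

    Here f_yx = -1 and f_yy = 3*y**2 + 1, evaluated at the solution only.
    """
    y = solve_inner(x)
    return 1.0 / (3 * y ** 2 + 1)

x = 2.0
h = 1e-5
finite_diff = (solve_inner(x + h) - solve_inner(x - h)) / (2 * h)
print(solve_inner(x))    # approximately 1.0, since 1**3 + 1 = 2
print(implicit_grad(x))  # approximately 0.25
print(finite_diff)       # agrees with the implicit gradient to about 1e-6
```

The finite-difference check needs two extra solves per input dimension, whereas the implicit gradient reuses the single solution, which is what makes this approach attractive inside a learned layer.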

Bio: Brandon Amos is a Research Scientist in Meta AI’s Fundamental AI Research group in NYC. He holds a PhD in Computer Science from Carnegie Mellon University and was supported by the USA National Science Foundation Graduate Research Fellowship (NSF GRFP). Prior to joining Meta, he worked at Adobe Research, DeepMind, and Intel Labs. His research interests are in machine learning and optimization with a recent focus on reinforcement learning, control, optimal transport, and geometry.
Nhat Ho, UT-Austin
Hierarchical and Sequential Perspectives on Sliced Wasserstein Distance
Abstract From its origins in the work of Monge and Kantorovich, the Wasserstein distance has played an important role in mathematics. In the current era, the strong and increasing connection between optimization and machine learning has brought new applications of the Wasserstein distance to the fore. In these applications, the focus is on learning the probability distributions underlying the Wasserstein distance formulation. However, the Wasserstein distance is known to suffer from expensive computation and the curse of dimensionality, which creates several hurdles for using it in statistical machine-learning applications. A well-known approach to overcoming the statistical and computational limits of the Wasserstein distance is to project the probability distributions onto one-dimensional subspaces, an approach referred to as the sliced Wasserstein distance. The sliced Wasserstein distance leverages the closed-form expression of the Wasserstein distance in one dimension; therefore, its computational complexity is only near-linear, O(n log n), in the number of supports n of the probability distributions, while its statistical rate is parametric for learning probability distributions. Despite these advantages, the sliced Wasserstein distance still faces two fundamental challenges in large-scale, high-dimensional statistical machine learning settings: (1) high projection complexity, namely, the number of projections needed to approximate the value of the sliced Wasserstein distance is huge and scales with the dimension of the problem; (2) uninformative projecting directions, namely, many of the projections used to approximate the value of the sliced Wasserstein distance are redundant. In this talk, we propose two fundamental approaches to tackle these challenges. Our first approach hierarchically projects probability measures into low-dimensional spaces before projecting them into one-dimensional space. The hierarchical projections reduce projection complexity and enhance the expressiveness of the projections used in the sliced Wasserstein distance. Our second approach samples projecting directions sequentially, so that each new direction can draw on information from the previous ones. This increases the quality of the projections in terms of highlighting the difference between the probability measures and reduces the number of projections needed, which improves the computational complexity of the sliced Wasserstein distance.
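For reference, the vanilla sliced Wasserstein distance that both approaches build on can be estimated by Monte Carlo: draw random unit directions, project both point clouds, and apply the sorted-sample closed form of the 1D Wasserstein distance. The sketch below is our own illustration of that baseline only; it implements neither the hierarchical nor the sequential variant from the talk, it assumes equal-size, uniformly weighted samples, and the number of projections and the seed are arbitrary choices.

```python
import math
import random

def w_pp_1d(u, v, p=2):
    """W_p(u, v)**p for two equal-size empirical 1D measures with uniform weights.

    In 1D the optimal coupling matches sorted samples, so this costs O(n log n).
    """
    u, v = sorted(u), sorted(v)
    return sum(abs(a - b) ** p for a, b in zip(u, v)) / len(u)

def random_direction(d, rng):
    """Uniform direction on the unit sphere S^(d-1), via a normalized Gaussian."""
    while True:
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        if norm > 0.0:
            return [t / norm for t in theta]

def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between point clouds X and Y (lists of d-vectors)."""
    if len(X) != len(Y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    d = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(d, rng)
        proj_x = [sum(t * xi for t, xi in zip(theta, x)) for x in X]
        proj_y = [sum(t * yi for t, yi in zip(theta, y)) for y in Y]
        total += w_pp_1d(proj_x, proj_y, p)
    return (total / n_projections) ** (1.0 / p)

rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
Y = [[x0 + 1.0, x1] for x0, x1 in X]  # X translated by (1, 0)
print(sliced_wasserstein(X, X))  # 0.0
print(sliced_wasserstein(X, Y))  # roughly sqrt(1/2), about 0.71
```

For a pure translation by a vector c, every projection is shifted by theta·c, so SW_2^2 = E[(theta·c)^2] = |c|^2 / d; in two dimensions with a unit shift this is 1/2. The Monte Carlo error of this estimate shrinks only as 1/sqrt(number of projections), which is the projection-complexity issue the talk addresses.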

Bio: Nhat Ho is currently an Assistant Professor of Data Science, Machine Learning, and Statistics at the University of Texas at Austin. He is a core member of the University of Texas at Austin Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning. His research focuses on four important aspects of complex and large-scale models and data: (1) Interpretability, efficiency, and robustness of deep learning and complex machine learning models, including Transformer architectures, Deep Generative Models, Convolutional Neural Networks, etc.; (2) Scalability of Optimal Transport for machine learning and deep learning applications; (3) Stability and optimality of optimization and sampling algorithms for solving complex statistical machine learning models; (4) Heterogeneity of complex data, including mixture models, hierarchical models, and Bayesian nonparametrics.
Animesh Garg, NVIDIA / UofT / Georgia Tech (*)
Building Blocks of Generalizable Autonomy: Duality of Discovery & Bias
Abstract Generalization in embodied intelligence, such as in robotics, requires interactive learning across families of tasks, which is essential for discovering efficient representation and inference mechanisms. Current systems need a lot of hand-holding to learn even a single cognitive concept or a dexterous skill, say “open a door”, let alone generalize to new windows and cupboards! This is far from our vision of everyday robots, which would require a broader concept of generalization and continual updating of representations. This study of the science of embodied AI raises three key questions: (a) representational biases and causal inference for interactive decision-making, (b) perceptual representations learned by and for interaction, and (c) systems and abstractions for scalable learning.

Bio: Animesh Garg is a Stephen Fleming Early Career Professor at the School of Interactive Computing at Georgia Tech. He leads the People, AI, and Robotics (PAIR) research group. He is on the core faculty in the Robotics and Machine Learning programs. Animesh is also a Senior Researcher at NVIDIA Research. Animesh earned a Ph.D. from UC Berkeley and was a postdoc at the Stanford AI Lab. He is on leave from the Department of Computer Science at the University of Toronto and from his CIFAR Chair position at the Vector Institute. His work aims to build Generalizable Autonomy, which involves a confluence of representations and algorithms for reinforcement learning, control, and perception. He currently studies three aspects: learning structured inductive biases in sequential decision-making, using data-driven causal discovery, and transfer to real robots, all in the purview of embodied systems.
Parinaz Naghizadeh, OSU
Social Bias Meets Data Bias: Biased Training Data and Fair AI
Abstract Biases in existing training datasets used in algorithmic decision making, which can arise due to, e.g., prior labeling or feature measurement errors, raise ethical and economic concerns due to the resulting disparate treatment of different groups. In this talk, we will first investigate the robustness of a few existing (demographic) fairness criteria when the algorithm is trained on biased data. We show, both analytically and numerically, that some constraints can remain robust when facing certain forms of statistical bias in the training data. I will then briefly talk about an algorithm for sequential debiasing of such datasets through adaptive and bounded exploration. This is joint work with Yiqiao Liao, Yifan Yang, and Yang Liu.
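To make "(demographic) fairness criteria" concrete, the sketch below computes two common group-fairness measures for a binary group attribute: the demographic parity gap and the equal opportunity gap (difference in true positive rates). These particular criteria, the function names, and the toy data are our own assumptions for illustration; the abstract does not say which criteria are studied. Note that equal opportunity conditions on the observed labels, so if those labels are biased, the measured gap inherits that bias, which is one reason robustness to biased training data is a non-trivial question.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the examples selected by mask."""
    chosen = [p for p, m in zip(preds, mask) if m]
    if not chosen:
        raise ValueError("empty group")
    return sum(chosen) / len(chosen)

def demographic_parity_gap(preds, groups):
    """|P(Yhat=1 | A=0) - P(Yhat=1 | A=1)| for a binary group attribute A."""
    return abs(positive_rate(preds, [a == 0 for a in groups])
               - positive_rate(preds, [a == 1 for a in groups]))

def equal_opportunity_gap(preds, labels, groups):
    """|P(Yhat=1 | Y=1, A=0) - P(Yhat=1 | Y=1, A=1)|, i.e. the gap in true positive rates."""
    return abs(positive_rate(preds, [a == 0 and y == 1 for a, y in zip(groups, labels)])
               - positive_rate(preds, [a == 1 and y == 1 for a, y in zip(groups, labels)]))

# Hypothetical toy data: binary predictions, observed labels, group membership.
preds  = [1, 0, 1, 1, 0, 0]
labels = [1, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 1, 1, 1]
print(demographic_parity_gap(preds, groups))         # about 0.333 (2/3 vs 1/3)
print(equal_opportunity_gap(preds, labels, groups))  # 0.5 (1.0 vs 0.5)
```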

Bio: Parinaz Naghizadeh is an assistant professor in the Integrated Systems Engineering and Electrical and Computer Engineering departments at The Ohio State University. Prior to joining OSU in 2019, she was a postdoctoral researcher at Purdue University and Princeton University. She received her PhD in electrical engineering from the University of Michigan in 2016. Her research interests are in network economics, game theory, algorithmic economics, and reinforcement learning. She received the NSF CAREER award in 2022, was named a Rising Star in EECS in 2017, and received a Barbour Scholarship in 2014.
Hua Wei, New Jersey Institute of Technology
Towards Actionable Decision-Making in the Real World
Abstract This talk presents how to use data and advanced learning methods for actionable decision-making in the real world, using decision-making in the city as a running example. It first examines why we now have the opportunity for a potential breakthrough in actionable decision-making. Second, it presents our research results on reinforcement learning for traffic signal control, published at the KDD, AAAI, and CIKM conferences. Finally, I would like to discuss the open challenges in this research topic, their implications for actionable decision-making, and our preliminary efforts in addressing these challenges.

Bio: Hua Wei is an assistant professor in the Department of Informatics at the New Jersey Institute of Technology (NJIT). He obtained his Ph.D. from the Pennsylvania State University. His research interests include reinforcement learning, data mining, and urban computing. His papers have been published at high-impact venues (e.g., NeurIPS, KDD, AAAI, IJCAI, CIKM, and ECML-PKDD). His research received the Best Applied Data Science Paper Award at ECML-PKDD 2020 and has been funded by NSF and the Department of Energy.
Ziv Goldfeld, Cornell University
Statistical and Computational Aspects of Sliced Optimal Transport
Abstract As machine learning/inference tasks boil down to comparing or transforming complicated probability distributions, optimal transport (OT) theory, which provides a potent framework for doing so, has emerged as a tool of choice for design and analysis. Its adoption was driven by an array of favorable properties, including robustness to support mismatch, a powerful duality theory, and the Wasserstein metric it defines on the space of probability measures, which endows it with a rich geometry. Alas, statistical OT is bottlenecked by the curse of dimensionality, whereby quantitative results either deteriorate exponentially with dimension or are largely unavailable (e.g., limit theorems, resampling, efficiency). In turn, resulting performance bounds for OT-based learning methods are often vacuous or, worse yet, missing. Slicing is a modern regularization technique by which one computes the average/maximized OT distance between different low-dimensional projections of the high-dimensional distributions. This framework inherits many structural properties of classical OT but alleviates the empirical curse of dimensionality. This talk will present recent advancements in the statistical and computational analysis of sliced OT methods. We will cover fast empirical convergence rates, high-dimensional limit distribution theorems, and formal guarantees for computational methods such as Monte Carlo integration (for average-slicing) and projected subgradient methods (for max-slicing). Applications to implicit generative modeling will be discussed and will serve to motivate the statistical exploration.

Bio: Ziv Goldfeld is an assistant professor in the School of Electrical and Computer Engineering, and a graduate field member in Computer Science, Statistics, Data Science, and the Center for Applied Mathematics, at Cornell University. Before joining Cornell, he was a postdoctoral research fellow in LIDS at MIT. Ziv graduated with a B.Sc., M.Sc., and Ph.D. (all summa cum laude) in Electrical and Computer Engineering from Ben Gurion University, Israel. Ziv’s research interests include optimal transport theory, statistical learning theory, information theory, and mathematical statistics. He seeks to understand the theoretical foundations of modern inference and information processing systems by formulating and solving mathematical models. Honors include the NSF CAREER Award, the IBM University Award, and the Rothschild Postdoctoral Fellowship.

Video Link
Baharan Mirzasoleiman, UCLA
Coresets for Efficient and Robust Learning from Massive Datasets
Abstract Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it requires exceptionally large and expensive computational resources and incurs substantial costs due to high energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data often contains highly imbalanced classes, noisy labels, and malicious data points. In such cases, training on the entire dataset does not yield a high-quality model. In this talk, I will argue that we can address these limitations by developing techniques that identify and extract the most informative subsets of massive datasets for learning. Training on such subsets not only reduces the substantial costs of learning from big data, but also improves the accuracy of the resulting models and their robustness against noisy labels and data poisoning attacks. I will discuss how we can develop effective and theoretically rigorous techniques that provide strong guarantees for the learned models’ quality and robustness against noisy labels.

Bio: Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at the University of California, Los Angeles. Baharan’s research focuses on developing new methods that enable efficient and robust learning from massive datasets. She received her PhD from ETH Zurich, and was a Postdoc at Stanford University. She was awarded an ETH medal for Outstanding Doctoral Dissertation and a Google Anita Borg Memorial Scholarship. She was also selected as a Rising Star in EECS by MIT, and received an NSF CAREER Award.

Video Link
Chen Feng, NYU
3D Deep Learning for Soft Robotics and Self-Driving
Abstract Deep learning on 3D data such as point clouds offers many new possibilities for robotics and self-driving. It provides efficient tools for representing the complex objects and scenes in the 3D world with which robots and autonomous vehicles need to interact. In this talk, I will discuss my group's work on both object-level and scene-level 3D deep learning. At the object level, I will explain FoldingNet (CVPR'18), a 3D point cloud autoencoder whose lightweight decoder essentially mimics paper-folding operations while achieving better shape reconstruction performance. This new decoder can address a challenging robotics task: soft robot proprioception. At the scene level, I will explain DiscoNet (NeurIPS'21), an efficient collaborative perception method that uses a dynamic directed graph with matrix-valued edge weights, allowing an ego-vehicle to adaptively retrieve the most important complementary information from its neighboring vehicles. This could improve the performance and robustness of LiDAR-based perception in self-driving against challenges such as data sparsity and occlusion. Finally, I will briefly introduce our new public dataset V2X-Sim (RA-L'22), which aims to facilitate research in 3D (and 2D) deep learning for collaborative perception.

Bio: Dr. Chen Feng is an assistant professor at NYU, appointed across departments including civil and mechanical engineering and computer science. His lab, AI4CE (pronounced "A-I-force"), aims to advance robot vision and machine learning through multidisciplinary use-inspired research that originates from engineering domains. Before NYU, Chen was a research scientist in the computer vision group at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA, focusing on localization, mapping, and deep learning for self-driving cars and robotics. Chen holds a Bachelor's degree in geospatial engineering from Wuhan University in China, and a master’s degree in electrical engineering and a Ph.D. in civil engineering, both from the University of Michigan at Ann Arbor. While publishing in and reviewing for prestigious AI/Robotics venues like CVPR/ICCV/ICRA/IROS, Chen also serves as an associate editor for IEEE Robotics and Automation Letters (RA-L). More information on his research can be found at

Video Link
Daniel Moyer, Vanderbilt University
Invariant Representations
Abstract The removal of unwanted information is a surprisingly common task. Removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation all require the selective removal of information. In this talk, I will describe a method for constructing such invariant representations by minimizing mutual information in a variational setting. This approach also provides insight into adversarial methods and their training schemes. We will then discuss applications and implications in multi-site MRI, style transfer, and fair representation.

Bio: Daniel Moyer will join the Computer Science Department at Vanderbilt University for the Fall 2022 semester as an Assistant Professor. Previously, he was a post-doc in CSAIL at MIT, working with Prof. Polina Golland on fetal MRI. He received his doctorate in 2019 from the University of Southern California under Paul Thompson and Greg Ver Steeg, where he worked on representation learning problems in diffusion MRI and neuroimaging.

Video Link
Kayhan Batmanghelich, University of Pittsburgh
Bridging between AI Models & Medical Insights: Learning, Inference, & Model Explanation Applications
Abstract The healthcare industry is entering a new era in which medical communities increasingly employ computational medicine and machine learning. Despite significant progress in the modern machine learning literature, the biomedical and clinical research communities have been slow to adopt the new approaches due to their lack of explainability and limited data. These challenges present new opportunities to develop novel methods that address AI's unique challenges in medicine. This talk has three parts. In the first part, I show examples of model explainability (XAI) tailored toward AI in radiology applications. More specifically, I integrate ideas from causal inference (e.g., counterfactuals, mediation analysis) into XAI. The second part presents examples of incorporating medical insight into self-supervised learning of imaging phenotypes. Finally, I address the issue of partial missingness (a common problem with clinical data) in statistical independence tests for imaging genetics.

Bio: Kayhan Batmanghelich is an Assistant Professor in the Department of Biomedical Informatics and the Intelligent Systems Program, with secondary appointments in the Electrical and Computer Engineering and Computer Science Departments at the University of Pittsburgh. He received his Ph.D. from the University of Pennsylvania (UPenn) under the supervision of Prof. Ben Taskar and Prof. Christos Davatzikos. He spent three years as a postdoc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. Polina Golland. His research is at the intersection of medical vision, machine learning, and bioinformatics. His group develops machine learning methods that address the interesting challenges of AI in medicine, such as explainability, learning with limited and weak data, and integrating medical image data with other biomedical data modalities. His research is supported by awards from NIH and NSF and by industry-sponsored projects.

Video Link
Nick Cheney, University of Vermont
A Case for an Embodied Intelligence Perspective on Neural Architecture Search
Abstract Neural Architecture Search (NAS) aims to find the optimal structure of a deep neural network. Various approaches to the design of network architectures have been proposed in recent years. In this talk, I'll discuss how we might draw inspiration from the design of shape and form in biological systems to find complex and adaptable neural network designs. Specifically, I'll speculate on how recent methods and principles from embodied cognition and evolutionary robotics may be translated into an embodied perspective on NAS.

Bio: Nick Cheney is an Assistant Professor of Computer Science at the University of Vermont, where he directs the UVM Neurobotics Lab and is a core member of the Complex Systems and Data Science program. Before coming to Vermont, Nick received a Ph.D. in Computational Biology from Cornell, co-advised by Hod Lipson and Steve Strogatz, and was a postdoctoral researcher at the University of Wyoming working with Jeff Clune (now at OpenAI and the University of British Columbia). He has also served as a visiting researcher at the Santa Fe Institute, NASA Ames, and Columbia University. Nick's research aims to lower the barrier to machine learning by producing more robust, scalable, and self-configurable neural network algorithms and architectures, with a specific focus on meta-learning methods.

Video Link
Suraj Srinivas, Harvard University
Pitfalls of Saliency Map Interpretation in Deep Neural Networks
Abstract A popular method of interpreting neural networks is to use saliency map representations, which assign importance scores to each input feature of the model. In this talk, I will discuss two of our works that expose pitfalls in these methods. First, we will discuss how existing saliency maps cannot satisfy two desirable properties simultaneously, and we propose the “full-gradient representation”, which avoids these problems. Based on this representation, we propose an approximate saliency method called FullGrad, which we find explains model behavior better than competing methods in the literature. Second, we find that a popular saliency map method, input-gradients, can be arbitrarily structured due to the shift-invariance of softmax. We investigate why standard neural network models have input-gradients with interpretable structure even when this is unnecessary, and we find that standard models have an implicit generative modeling component, which is responsible for this behavior. Overall, our works show that interpreting black-box models with off-the-shelf interpretability methods can be risky, and that such methods must be used with caution.
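The shift-invariance argument can be illustrated with a toy example (our own construction, not the speakers'): adding the same input-dependent function g(x) to every logit leaves the softmax probabilities, and hence the model's predictions, unchanged, while the gradients of the logits with respect to the input change arbitrarily. The scalar-input model, the weights, and the choice g(x) = 5 sin(3x) are all hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax (subtracting the max is itself a use of shift-invariance)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def logits(x, weights=(1.0, -2.0, 0.5)):
    """Hypothetical toy model: scalar input x, one linear logit per class."""
    return [w * x for w in weights]


def shifted_logits(x, weights=(1.0, -2.0, 0.5)):
    """Same model with an arbitrary input-dependent shift g(x) added to every logit."""
    g = 5.0 * math.sin(3.0 * x)
    return [w * x + g for w in weights]


def logit_grad(f, x, k, h=1e-6):
    """Central finite-difference derivative of logit k with respect to the input x."""
    return (f(x + h)[k] - f(x - h)[k]) / (2 * h)


x = 0.7
print(softmax(logits(x)), softmax(shifted_logits(x)))  # identical up to rounding
print(logit_grad(logits, x, 0), logit_grad(shifted_logits, x, 0))  # very different
```

Because g(x) cancels inside the softmax, nothing in the training objective constrains it, so logit input-gradients alone need not reflect what the classifier actually uses.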

Bio: Suraj Srinivas is a postdoctoral research fellow at Harvard University where he works with Prof. Hima Lakkaraju on the foundations of interpretable deep learning. He completed his Ph.D. at Idiap Research Institute & EPFL in Switzerland, advised by Prof. François Fleuret. His Ph.D. thesis on the pitfalls of gradient-based explanation methods in deep learning received the EPFL thesis distinction award in electrical engineering. His research interests are interpretability, robustness, and compression of deep neural networks.

Video Link
Hossein Mobahi, Google Research
Sharpness-Aware Minimization (SAM): Current Method and Future Directions
Abstract In today's heavily overparameterized models, the value of the training loss provides few guarantees on a model's ability to generalize. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a new and effective procedure that instead simultaneously minimizes loss value and loss sharpness. Our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels. Finally, we will discuss possible directions for further research around SAM.
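The min-max procedure can be sketched as a two-step update: first move to an approximately worst-case point within an L2 ball of radius rho around the current weights, then descend using the gradient taken at that perturbed point. The sketch below uses the single normalized-gradient-step approximation of the inner maximization, as in the published SAM method; the toy quadratic loss, the hyperparameter values, and the function names are illustrative assumptions.

```python
import math


def toy_loss(w, curv=(10.0, 1.0)):
    """Hypothetical ill-conditioned quadratic: L(w) = 0.5 * sum_i curv_i * w_i^2."""
    return 0.5 * sum(c * x * x for c, x in zip(curv, w))


def toy_grad(w, curv=(10.0, 1.0)):
    """Analytic gradient of toy_loss."""
    return [c * x for c, x in zip(curv, w)]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM step with a first-order approximation of the inner maximization.
    1. Ascend: eps = rho * g / ||g|| approximates the loss-maximizing perturbation
       inside an L2 ball of radius rho.
    2. Descend: update w using the gradient evaluated at w + eps."""
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g))
    eps = [rho * x / (norm + 1e-12) for x in g]  # tiny constant avoids 0/0 at a stationary point
    g_adv = grad_fn([x + e for x, e in zip(w, eps)])
    return [x - lr * ga for x, ga in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(200):
    w = sam_step(w, toy_grad)
print(toy_loss(w))
```

Each SAM step costs two gradient evaluations instead of one. With a fixed rho, the iterates settle into a small neighborhood of the minimum rather than converging exactly, which is the expected behavior of this update on a toy quadratic.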

Bio: Hossein Mobahi is a senior research scientist at Google Research. His current interests revolve around the interplay between optimization and generalization in deep neural networks. Prior to joining Google in 2016, he was a postdoctoral researcher at CSAIL of MIT. He obtained his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC).

Video Link
Xiaorui Liu, North Carolina State University
Communication-Efficient Distributed Machine Learning
Abstract The success of modern AI systems relies on large-scale machine learning on big data. Distributed machine learning systems provide the computational infrastructure for this success by harnessing the parallel computing power of large numbers of devices. However, the scalability and efficiency of these systems are greatly limited by the high cost of communication between devices. In this talk, I will discuss how to design communication-efficient distributed ML algorithms. Specifically, I will introduce novel decentralized algorithms with communication compression that reduce the number of communicated bits by 95% without sacrificing convergence complexity. These algorithms fundamentally improve the efficiency of large-scale ML, both theoretically and numerically.

Bio: Xiaorui Liu is an incoming assistant professor in the Computer Science Department at North Carolina State University, starting in Fall 2022. He will receive his Ph.D. degree from Michigan State University, advised by Prof. Jiliang Tang. His research interests include distributed and trustworthy machine learning, with a focus on big data and graph data. He was awarded the Best Paper Honorable Mention Award at ICHI 2019, the MSU Engineering Distinguished Fellowship, and the Cloud Computing Fellowship. He organized and co-presented five tutorials at KDD 2021, IJCAI 2021, ICAPS 2021, and WWW 2022, and he has published innovative works in top-tier conferences such as NeurIPS, ICML, ICLR, KDD, AISTATS, and SIGIR.

Video Link
Dongkuan Xu, North Carolina State University
Resource-efficient Deep Learning: Democratizing AI at Scale
Abstract The phenomenal success of deep learning in the past decade has been driven mostly by the construction of increasingly large deep neural network models. These models usually rest on the idealized assumption that sufficient resources are available for optimization, including large numbers of parameters, ample data, and massive computation. However, this assumption usually fails in real-world scenarios. For example, memory may be limited, as on edge devices; large-scale data are difficult to obtain due to high costs and privacy constraints; and computational power is constrained, as in most university labs. As a result, these resource discrepancies have hindered the democratization of deep learning techniques in many AI applications, and developing efficient deep learning methods that can adapt to different resource constraints is of great importance. In this talk, I will present my recent research contributions centered on resource-efficient deep learning, which aim to free AI from its hunger for parameters, data, and computation. First, I will introduce my work on neural network pruning under the pretrain-then-finetune paradigm, which improves the parameter efficiency of large-scale language models in the inference phase, resulting in pruned models with an order of magnitude fewer parameters than the original model while achieving the same or better prediction accuracy. Then, I will talk about my task-agnostic neural architecture search framework, which reduces the computational cost of finding the best pruned models in the training phase and is complementary to improving parameter efficiency in the inference phase. Finally, I will conclude with a brief overview of my ongoing and future work as part of a broader research agenda, including new and related problems and potential collaborations in the next few years.

Bio: Dongkuan (DK) Xu is an incoming Assistant Professor in the CS Department at NC State. DK will receive his Ph.D. from Penn State in June 2022 under the supervision of Dr. Xiang Zhang. His research interest is resource-efficient deep learning for AI at scale. DK has published more than 25 papers in top conferences and journals, including NeurIPS, AAAI, ACL, NAACL, and IJCAI. He has served as a PC member for over 28 major conferences and 14 journals. DK also has extensive research experience in industry. He has interned at Microsoft Research Redmond, Moffett AI, and NEC Labs America, and holds 8 US patents/applications.

Video Link
Soheil Kolouri, Vanderbilt University
Brain-Inspired Lifelong Learning Machines
Abstract The next wave of AI demands a new type of machine learning framework that can continually learn from and adapt to a stream of nonstationary multimodal information. This challenge is referred to as continual, lifelong, or incremental learning in the ML community. Since humans and other primates are our best examples of lifelong learners, we believe that a better understanding of the biological underpinnings that support continual learning could be instrumental in advancing continual machine learning. In this talk, we first characterize continual learning as a multi-faceted problem and enumerate some of the known biological mechanisms in the brain that contribute to these characteristics. We then draw connections between existing AI/ML solutions for continual learning and known biological mechanisms and lay out a road map for next-generation lifelong machine learners. Finally, we present some of our recent work toward advancing the field of continual learning, with a focus on meta-plasticity and neuromodulation.

Bio: Soheil Kolouri is an Assistant Professor of Computer Science at Vanderbilt University, Nashville, TN, and the director of the Machine Intelligence and Neural Technologies (MINT) lab. His research interests include continual learning, bio-inspired machine learning, geometric deep learning, and computational optimal transport. Before joining Vanderbilt University, he was a research scientist and principal investigator at HRL Laboratories, Malibu, CA, where he was the PI or Co-PI on multiple DARPA programs involving next-generation machine learning. Soheil obtained his Ph.D. in Biomedical Engineering from Carnegie Mellon University, where he received the Bertucci Fellowship Award for outstanding graduate students from the College of Engineering in 2014 and the Outstanding Dissertation Award from the Biomedical Engineering Department in 2015.

Video Link
Matthias Fey, TU Dortmund University
Auto-Scaling GNNs
Abstract In this talk, we will take a theoretical and practical look at scaling Graph Neural Networks (GNNs) up to massive graphs, based on our GNNAutoScale (GAS) framework. GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to the number of input nodes without dropping any data. While existing solutions weaken the expressive power of message passing due to sub-sampling of edges or non-trainable propagations, our approach is provably able to maintain the expressive power of the original GNN. We further discuss challenges regarding its implementation within our PyTorch Geometric (PyG) library and verify its practical benefits on a variety of large graph benchmark datasets.

Bio: Matthias Fey is a fourth-year Ph.D. student in the computer graphics lab at TU Dortmund University, Germany, and a co-founder of a company that aims to make state-of-the-art GNN solutions readily available to large-scale data warehouses. His main area of research lies in the development of new deep learning methods that can be directly applied to unstructured data such as graphs, point clouds, and manifolds. Furthermore, he is the creator of the PyTorch Geometric (PyG) library, which aims to bundle many of the proposed methods in this area to make research more accessible, comparable, and reproducible, and he is a core member of the Open Graph Benchmark (OGB) team. Matthias studied Computer Science at TU Dortmund, where he received his B.Sc. in 2013 and his Master’s degree in 2017.

Video Link
Philipp Petersen, University of Vienna
Optimal Representation and Learning of Classifier Functions
Abstract Deep learning has established itself as by far the most successful machine learning approach for sufficiently complex tasks. Nowadays, it is used in a wide range of highly complex applications, such as natural language processing and even scientific applications. Its first major breakthrough, however, came when it shattered the state of the art in image classification. We revisit the problem of classification by deep neural networks and attempt to explain why deep networks are remarkably effective in this regime. We will interpret the learning of classifiers as finding piecewise constant functions from labeled samples. We then precisely link the hardness of the learning problem to the complexity of the regions on which these functions are constant. Concretely, we will establish fundamental lower bounds on the learnability of certain regions. Finally, we will show that in many cases these optimal bounds can be achieved by deep-neural-network-based learning. In quite realistic settings, we will observe that deep neural networks can learn high-dimensional classifiers without the learning rates depending strongly on the dimension.

Bio: Philipp Petersen is a tenure-track assistant professor for machine learning at the mathematical institute of the University of Vienna. Before that, he was a postdoc at the University of Oxford, and he did his PhD at the Technical University of Berlin. His research focuses on the interplay of deep neural networks and numerical analysis. Particular foci are the expressivity of various deep neural network architectures, structural challenges for the optimization or training of deep neural networks, and the applicability of deep learning in numerical algorithms for solving partial differential equations or inverse problems.

Video Link
Lingfei Wu, JD.COM
Graph Neural Networks: Foundations, Frontiers, and Applications
Abstract The field of graph neural networks (GNNs) has made rapid strides in recent years. Graph neural networks, also known as deep learning on graphs, graph representation learning, or geometric deep learning, have become one of the fastest-growing research topics in machine learning, especially deep learning. This wave of research at the intersection of graph theory and deep learning has also influenced other fields, including recommendation systems, natural language processing, program synthesis, software mining, cybersecurity, and intelligent transportation. However, as the field grows rapidly, it has become extremely challenging to gain a global perspective on the development of GNNs. We therefore see an urgent need to bridge this gap with a comprehensive tutorial on this fast-growing yet challenging topic. In this talk, we will discuss our recent book, Graph Neural Networks: Foundations, Frontiers, and Applications, one of the most comprehensive books on GNNs for researchers and practitioners. It covers a broad range of topics in graph neural networks, reviewing and introducing the fundamental concepts and algorithms, new research frontiers, and broad and emerging applications of GNNs.

Bio: Dr. Lingfei Wu is a Principal Scientist at JD.COM Silicon Valley Research Center, leading a team of 30+ ML/NLP scientists and software engineers to build intelligent e-commerce personalization systems. He earned his Ph.D. degree in computer science from the College of William and Mary in 2016. Previously, he was a research staff member at IBM Thomas J. Watson Research Center, where he led a team of 10+ research scientists developing novel Graph Neural Network methods and systems, which led to the #1 AI Challenge Project in IBM Research and multiple IBM awards, including three Outstanding Technical Achievement Awards. He has received Best Paper and Best Student Paper Awards at several conferences and workshops, such as IEEE ICC’19, the AAAI workshop on DLGMA’20, and the KDD workshop on DLG’19. His research has been featured in numerous media outlets, including Nature News, Yahoo News, VentureBeat, TechTalks, SyncedReview, Leiphone, QbitAI, MIT News, IBM Research News, and SIAM News.

Video Link
Hamed Pirsiavash, UC Davis
Self-Supervised Learning for Visual Recognition
Abstract We are interested in learning visual representations that are discriminative for semantic image understanding tasks such as object classification, detection, and segmentation in images and videos. A common approach to obtaining such features is supervised learning. However, this requires manual annotation of images, which is costly, ambiguous, and prone to errors. In contrast, self-supervised feature learning methods that exploit unlabeled data can be more scalable and flexible. I will present some of our recent efforts in this direction. More specifically, I will talk about our recent work on using similarity to a random set of images to learn better visual representations, and on compressing self-supervised features from deeper models into smaller ones.

Bio: Hamed Pirsiavash is an associate professor at the University of California, Davis. Prior to this, he was an associate professor at the University of Maryland, Baltimore County and a postdoctoral research associate at MIT. He obtained his Ph.D. at the University of California, Irvine. He does research at the intersection of computer vision and machine learning. More specifically, he is interested in self-supervised representation learning and the adversarial robustness of deep models.

Video Link
Evangelos Papalexakis, UC Riverside
Tensor Decompositions for Multi-Aspect Graph Analytics and Beyond
Abstract Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. In this talk, we will demonstrate the effectiveness of tensor decompositions in modeling and mining multi-aspect graphs. Finally, we conclude with very recent results that demonstrate the effectiveness of tensor methods in mitigating state-of-the-art adversarial attacks on deep neural networks.

Bio: Evangelos (Vagelis) Papalexakis is an Associate Professor in the CSE Department at the University of California, Riverside. He received his Ph.D. degree from the School of Computer Science at Carnegie Mellon University (CMU). Prior to CMU, he obtained his Diploma and MSc in Electronic & Computer Engineering at the Technical University of Crete, in Greece. Broadly, his research interests span the fields of Data Science, Machine Learning, Artificial Intelligence, and Signal Processing. His research involves designing interpretable models and scalable algorithms for extracting knowledge from large multi-aspect datasets, with specific emphasis on tensor factorization models, and applying those algorithms to a variety of real-world problems, including detection of misinformation on the Web, explainable AI, and gravitational wave detection. His work has appeared in top-tier conferences and journals, and has attracted a number of distinctions, including the 2017 SIGKDD Dissertation Award (runner-up), several paper awards, the NSF CAREER award, and the 2021 IEEE DSAA Next Generation Data Scientist Award.

Video Link
Zsolt Kira, Georgia Tech
Handling Distribution Shift in Visual Learning
Abstract While deep learning has achieved remarkable successes in computer vision, both the theory and the practice behind these successes have fundamentally relied on vanilla supervised learning, in which the training and test datasets are sampled from the same distribution. In reality, there is likely to be a significant distribution shift once models are deployed, including noise/weather/illumination/modality changes (covariate shift), new categories (semantic shift), or different label distributions. In this talk, I will present our recent work on handling several of these shifts in a principled way. For label distribution shifts, we propose a posterior recalibration of classifiers that can be applied without re-training to handle imbalanced datasets. For covariate and semantic shifts, we propose a geometric decoupling of classifiers into feature norms and angles, showing that it can be used to learn more sensitive feature spaces for better calibration and out-of-distribution detection. We demonstrate state-of-the-art results across multiple benchmark datasets and metrics. Finally, I will present connections to a wider set of problems, including continual/lifelong learning, open-set discovery, and semi-supervised learning.

Bio: Zsolt Kira is an Assistant Professor at the Georgia Institute of Technology and Associate Director of Georgia Tech’s Machine Learning Center. His work lies at the intersection of machine learning and artificial intelligence for sensor processing, perception, and robotics. Current projects and interests relate to moving beyond the current limitations of supervised machine learning to tackle un/self-/semi-supervised methods, out-of-distribution detection, model calibration, learning under imbalance, continual/lifelong learning, and adaptation. Prof. Kira has grown a portfolio of projects funded by NSF, ONR, DARPA, and the IC community, has over 45 publications in top venues, and has received several best paper/student paper awards.

Video Link
Umut Şimşekli, INRIA
Towards Building a Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Abstract In this talk, I will focus on the 'tail behavior' of SGD in deep learning. I will first empirically illustrate that heavy tails arise in the gradient noise (i.e., the difference between the stochastic gradient and the true gradient). Accordingly, I will propose to model the gradient noise as a heavy-tailed α-stable random vector and to analyze SGD as a discretization of a stochastic differential equation (SDE) driven by a stable process. As opposed to classical SDEs, which are driven by a Brownian motion, SDEs driven by stable processes can incur 'jumps', which force the SDE (and its discretization) to transition from 'narrow minima' to 'wider minima', as shown by existing metastability theory and the extensions of it that we recently proved. These results open up a different perspective and shed more light on the view that SGD 'prefers' wide minima. In the second part of the talk, I will focus on the generalization properties of such heavy-tailed SDEs and show that the generalization error can be controlled by the Hausdorff dimension of the trajectories of the SDE, which is closely linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail index of the process can be used as a capacity metric. Finally, if time permits, I will talk about the 'originating cause' of such heavy-tailed behavior and present theoretical results showing that heavy tails can emerge even in very sterile settings such as linear regression with i.i.d. Gaussian data.
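The modeling step described above, viewing SGD as a discretized SDE driven by α-stable noise, can be sketched for a scalar parameter. This is a toy illustration under our own assumptions, not the speaker's analysis: symmetric α-stable variates are drawn with the Chambers-Mallows-Stuck method, and the noise increment over a step of length lr is scaled by lr ** (1 / alpha), reflecting the self-similarity of α-stable Lévy processes. The function names and the sigma parameter are hypothetical.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """Draw one standard symmetric alpha-stable variate (characteristic function
    exp(-|t|^alpha)) with the Chambers-Mallows-Stuck method.
    Valid for 0 < alpha <= 2, alpha != 1; alpha = 2 gives a Gaussian with variance 2."""
    if not (0.0 < alpha <= 2.0) or alpha == 1.0:
        raise ValueError("this sketch supports 0 < alpha <= 2, alpha != 1")
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = 0.0
    while w == 0.0:  # guard against a (vanishingly rare) zero exponential draw
        w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def stable_noisy_gd_step(x, grad_fn, lr, alpha, sigma, rng):
    """One Euler step of dX = -f'(X) dt + sigma dL^alpha for a scalar parameter x.
    The alpha-stable increment over a time step of length lr scales as lr ** (1 / alpha)."""
    noise = sample_symmetric_stable(alpha, rng)
    return x - lr * grad_fn(x) + sigma * lr ** (1.0 / alpha) * noise


rng = random.Random(0)
x = 1.0
for _ in range(1000):
    x = stable_noisy_gd_step(x, lambda v: v, lr=0.01, alpha=1.5, sigma=0.1, rng=rng)
print(x)
```

Setting alpha below 2 makes occasional large jumps much more likely than with Gaussian noise (alpha = 2), which is the 'jump' behavior the abstract links to escaping narrow minima.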

Bio: Umut Şimşekli is a tenured research faculty member at Inria Paris and École Normale Supérieure de Paris. He received his Ph.D. degree in 2015 from Boğaziçi University, İstanbul. From 2016 to 2020, he was affiliated with the Signals, Statistics, and Machine Learning Group at Télécom Paris as an associate professor, and he visited the Department of Statistics at the University of Oxford during the 2019-2020 academic year. He is a laureate of the European Research Council (ERC) Starting Grant 2021, and his current research interests are in the theory of deep learning.

Video Link

Older talks can be found in the Archives.