Happening Today

Petar Veličković, DeepMind / University of Cambridge
Reasoning Algorithmically: from Toy Experiments to AGI Modules
Abstract Neural networks that are able to reliably execute algorithmic computation may hold transformative potential to both machine learning and theoretical computer science. On one hand, they could enable the kind of extrapolative generalisation scarcely seen with deep learning models. On another, they may allow for running classical algorithms on inputs previously considered inaccessible to them. Over the past few years, the pace of development in this area has gradually become intense. As someone who has been very active in its latest incarnation, I have witnessed these concepts grow from isolated 'toy experiments', through NeurIPS spotlights, all the way to helping detect patterns in complicated mathematical objects (published on the cover of Nature) and supporting the development of generalist reasoning agents. In this talk, I will give my personal account of this journey, and especially how our own interpretation of this methodology, and understanding of its potential, changed with time. It should be of interest to a general audience interested in graphs, (classical) algorithms, reasoning, and building intelligent systems.

Bio: Petar is a Staff Research Scientist at DeepMind, an Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. He holds a PhD in Computer Science from the University of Cambridge, where he worked with Pietro Liò. His research concerns Geometric Deep Learning and has been featured in various top-tier conferences and news outlets. Currently, Petar is focusing on Graph Representation Learning and its applications in Algorithmic Reasoning. He is also recognized as an ELLIS Scholar.
(*): incoming

Upcoming Talks

Yang Song, OpenAI / Caltech (*)
Abstract TBD
Durk Kingma, Google Research
Abstract TBD
Jason Wei, OpenAI
Abstract TBD
Johannes Brandstetter, Microsoft Research
Abstract TBD
Taco Cohen, Qualcomm AI Research
Abstract TBD
Jian Tang, HEC Montreal / Mila
Abstract TBD
(*): incoming

Previous Talks

Hieu Pham, Google Research
Deep Learning After the Transformer
Abstract The field of machine learning has been through several exciting moments – kernel methods, Bayesian inference, non-parametric methods, to name a few. Every time a new approach pushed an existing limit, people wondered if the approach was “the best”. In our time of 2023, the Transformer is prevalent. Hardly can one find a research paper that does not mention this immensely successful model. But is the Transformer the best neural architecture? If so, can we explain why? If not, how can we improve it; more ambitiously, how can we make something better than it? In this talk, I invite you to contemplate these questions. I share my insights on the properties of the Transformer that make it favorable or not favorable for certain domains and tasks. Based on these insights, I discuss the potential directions for subsequent developments. I will discuss some recent work from my group that makes learning algorithms more efficient, with or without the Transformer.

Bio: Hieu Pham is a Research Scientist at Google Brain. He is currently focusing on improving the efficiency of large vision and language models. Before joining Google, Hieu received his Ph.D. from Carnegie Mellon University (CMU), where he worked on various AutoML projects. His work provided the foundation for one-shot neural architecture search, which reduced the cost of AutoML algorithms by several orders of magnitude.
Thomas Beckers, Vanderbilt University
Safe Learning-based Control of Mechanical Systems
Abstract In modern technologies such as autonomous vehicles and service robots, control engineering plays a crucial role for the overall performance and safety of the system. However, the control design often becomes very time-consuming or even infeasible due to the increasing complexity of mechanical systems. The classical control approaches, which are based on models of the systems using first principles, are not satisfactory in the presence of complex dynamics, e.g., for highly nonlinear systems or interaction with a priori unknown environments. Recent findings in computational intelligence and machine learning have shown that data-driven approaches lead to very promising results in a wide application domain including the modeling of complex dynamics. However, the major drawback of data-driven approaches frequently manifests as unpredictable outcomes. Therefore, the current application of machine learning in control is typically limited to non-critical and low-performance systems. In this talk, I will present our results on safe learning-based control of partially unknown mechanical systems. In the first part of the seminar, I will show how we leverage Gaussian processes for the learning of unknown dynamics in the system. Gaussian process (GP) models are of high interest due to many beneficial properties such as the bias-variance trade-off and the strong connection to Bayesian mathematics. We exploit the Bayesian structure to include prior knowledge about the system into the learning process. In the second part, I will present a learning-enhanced model-based control law which guarantees safe control of mechanical systems with partially unknown dynamics. This control law combines the strength of model-based control with the flexibility of machine learning techniques. I demonstrate how we actively exploit the uncertainty of the GP model to guarantee high performance and stability of the closed loop.

Bio: Thomas Beckers is an Assistant Professor of Computer Science and the Institute for Software Integrated Systems at Vanderbilt University. Before joining Vanderbilt, he was a postdoctoral researcher at the Department of Electrical and Systems Engineering, University of Pennsylvania, where he was a member of the GRASP Lab, PRECISE Center and ASSET Center. In 2020, he earned his doctorate in Electrical Engineering at the Technical University of Munich (TUM), Germany. He received the B.Sc. and M.Sc. degrees in Electrical Engineering in 2010 and 2013, respectively, from the Technical University of Braunschweig, Germany. In 2018, he was a visiting researcher at the University of California, Berkeley. He is a DAAD AInet fellow and was awarded the Rohde & Schwarz Outstanding Dissertation Prize. His research interests include physics-enhanced learning, nonparametric models, and safe learning-based control.
Guanya Shi, University of Washington / CMU (*)
Neural-Control Family: Safe Agile Deep-learning-based Robotic Control in Dynamic Environments
Abstract Recent breathtaking advances in machine learning beckon to their applications in a wide range of autonomous systems. However, for safety-critical settings such as agile robotic control in hazardous environments, we must confront several key challenges before widespread deployment. Most importantly, the learning system must interact with the rest of the autonomous system (e.g., highly nonlinear and non-stationary dynamics) in a way that safeguards against catastrophic failures with formal guarantees. In addition, from both computational and statistical standpoints, the learning system must incorporate prior knowledge for efficiency and generalizability. In this talk, I will present progress toward establishing a unified framework that fundamentally connects learning and control. In particular, I will introduce a concrete example in such a unified framework called Neural-Control Family, a family of deep-learning-based nonlinear control methods with not only stability and robustness guarantees but also new capabilities in agile robotic control. For example, Neural-Swarm enables close-proximity flight of a drone swarm and Neural-Fly enables precise drone control in strong time-variant wind conditions.

Bio: Guanya Shi is an incoming (Fall 2023) Assistant Professor at the Robotics Institute and the School of Computer Science at Carnegie Mellon University (CMU). He is currently a postdoctoral scholar at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. He completed his Ph.D. in 2022 from Caltech and received a B.E. from Tsinghua University in 2017. He is broadly interested in the intersection of machine learning and control theory, spanning the entire spectrum from theory to real-world agile robotics. Guanya was the recipient of several awards, including the Simoudis Discovery Prize and the Ben P.C. Chou Doctoral Prize from Caltech, and the Rising Star in Data Science from the University of Chicago.
Thomas Kipf, Google Research
Structured Scene Understanding: Objects, Dynamics, 3D
Abstract The world around us — and our understanding of it — is rich in compositional structure: from atoms and their interactions to objects and agents in our environments. How can we learn scalable models of the physical world that capture this structure from raw, unstructured observations? In this talk, I will cover our team’s recent work on structured scene understanding: I will introduce an emergent class of slot-centric neural architectures that use a set of latent variables (“slots”) grounded in the physical scene. Slots are decoupled from the image grid and can learn to capture objects or more fine-grained scene components, model their dynamics, and learn 3D-consistent representations when a scene is observed from multiple viewpoints. I will briefly introduce the Slot Attention mechanism as a core representative for this class of models and cover recent extensions to video (SAVi, SAVi++), 3D (OSRT), and visual dynamics simulation (SlotFormer).

Bio: Thomas Kipf is a Senior Research Scientist at Google Brain in Amsterdam. His research focuses on developing machine learning models that can reason about the rich structure of the physical world. He obtained his PhD from the University of Amsterdam with a thesis on “Deep Learning with Graph-Structured Representations”, advised by Max Welling. He was recently elected as an ELLIS Scholar and received the ELLIS PhD Award.
Brandon Amos, Meta AI
Learning with differentiable and amortized optimization
Abstract Optimization has been a transformative modeling and decision-making paradigm over the past century that computationally encodes non-trivial reasoning operations. Developments in optimization foundations alongside domain experts have resulted in breakthroughs for 1) controlling robotic, autonomous, mechanical, and multi-agent systems, 2) making operational decisions based on future predictions, 3) efficiently transporting or matching resources, information, and measures, 4) allocating budgets and portfolios, 5) designing materials, molecules, and other structures, 6) solving inverse problems to infer underlying hidden costs, incentives, geometries, terrains, and other structures, and 7) learning and meta-learning the parameters of predictive and statistical models. These settings often analytically specify the relevant models of the world along with an explicit objective to optimize for. Once these are specified, computational optimization solvers are able to search over the space of possible solutions or configurations and return the best one. The magic of optimization stops when 1) the relevant models of the world are too difficult or impossible to specify, leading to inaccurate or incomplete representations of the true setting, and 2) solving the optimization problem is computationally challenging and takes too long to return a solution on today's hardware. Machine learning methods help overcome both of these by providing fast predictive models and powerful latent abstractions of the world. In this talk, I will cover two ways of tightly integrating optimization and machine learning methods: 1. *Differentiable optimization* characterizes how the solution to an optimization problem changes as the inputs change. In machine learning settings, differentiable optimization provides an implicit layer that integrates optimization-based domain knowledge into the model and enables unknown parts of the optimization problem to be learned. 
I will cover the foundations of learning these layers with implicit differentiation and highlight applications in robotics and control settings. 2. *Amortized optimization* rapidly predicts approximate solutions to optimization problems and is useful when repeatedly solving optimization problems. Traditional optimization methods typically solve every new problem instance from scratch, ignoring shared structures and information when solving a new instance. In contrast, a solver augmented with amortized optimization learns the shared structure present in the solution mappings and better-searches the domain. I will cover the foundations of amortized optimization and highlight new applications in control and optimal transport.

Bio: Brandon Amos is a Research Scientist in Meta AI’s Fundamental AI Research group in NYC. He holds a PhD in Computer Science from Carnegie Mellon University and was supported by the USA National Science Foundation Graduate Research Fellowship (NSF GRFP). Prior to joining Meta, he worked at Adobe Research, DeepMind, and Intel Labs. His research interests are in machine learning and optimization with a recent focus on reinforcement learning, control, optimal transport, and geometry.
Nhat Ho, UT-Austin
Hierarchical and Sequential Perspectives on Sliced Wasserstein Distance
Abstract From its origins in work by Monge and Kantorovich, the Wasserstein distance has played an important role in the theory of mathematics. In the current era, the strong and increasing connection between optimization and machine learning has brought new applications of the Wasserstein distance to the fore. In these applications, the focus is on learning the probability distributions underlying the Wasserstein distance formulation. However, the Wasserstein distance is known to suffer from expensive computation and the curse of dimensionality, which creates several hurdles to using it in statistical machine-learning applications. A well-known approach to overcome the statistical and computational limits of the Wasserstein distance is to project the probability distributions onto a one-dimensional manifold, which is referred to as the sliced Wasserstein distance. The sliced Wasserstein distance leverages the closed-form expression of the Wasserstein distance in one dimension; therefore, its computational complexity is only linear in the number of supports of the probability distributions while the statistical rate is parametric for learning probability distributions. Despite these advantages, the sliced Wasserstein distance still suffers from two fundamental challenges in large-scale high-dimensional statistical machine learning settings: (1) High projection complexity, namely, the number of projections needed to approximate the value of the sliced Wasserstein distance is huge and scales with the dimension of the problem; (2) Uninformative projecting directions, namely, many of the projections used to approximate the value of the sliced Wasserstein distance are redundant. In this talk, we propose two fundamental approaches to tackle the above challenges of the sliced Wasserstein distance. Our first approach hierarchically projects probability measures into low-dimensional spaces before projecting them into one-dimensional space. 
The hierarchical projections lead to an improvement in projection complexity and enhance the expressiveness of the projection of the sliced Wasserstein distance. Our second approach considers sequential sampling for projecting directions to allow the sharing of information on new projecting directions based on the previous directions. It increases the quality of projections in terms of highlighting the difference between the probability measures and leads to a smaller number of projections, which improves the computational complexity of the sliced Wasserstein distance.

Bio: Nhat Ho is currently an Assistant Professor of Data Science, Machine Learning, and Statistics at the University of Texas at Austin. He is a core member of the University of Texas Austin Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning. A central theme of his research focuses on four important aspects of complex and large-scale models and data: (1) Interpretability, efficiency, and robustness of deep learning and complex machine learning models, including Transformer architectures, Deep Generative Models, Convolutional Neural Networks, etc.; (2) Scalability of Optimal Transport for machine learning and deep learning applications; (3) Stability and optimality of optimization and sampling algorithms for solving complex statistical machine learning models; (4) Heterogeneity of complex data, including mixture and hierarchical models, Bayesian nonparametrics.
Animesh Garg, NVIDIA / UofT / Georgia Tech (*)
Building Blocks of Generalizable Autonomy: Duality of Discovery & Bias
Abstract Generalization in embodied intelligence, such as in robotics, requires interactive learning across families of tasks, which is essential for discovering efficient representation and inference mechanisms. Current systems need a lot of hand-holding to even learn a single cognitive concept or a dexterous skill, say “open a door”, let alone generalizing to new windows and cupboards! This is far from our vision of everyday robots, which would require a broader concept of generalization and continual update of representations. This study of the science of embodied AI opens three key questions: (a) Representational biases & Causal inference for interactive decision-making, (b) Perceptual representations learned by and for interaction, and (c) Systems and abstractions for scalable learning.

Bio: Animesh Garg is a Stephen Fleming Early Career Professor at the School of Interactive Computing at Georgia Tech. He leads the People, AI, and Robotics (PAIR) research group. He is on the core faculty in the Robotics and Machine Learning programs. Animesh is also a Senior Researcher at Nvidia Research. Animesh earned a Ph.D. from UC Berkeley and was a postdoc at the Stanford AI Lab. He is on leave from the department of Computer Science at the University of Toronto and the CIFAR Chair position at the Vector Institute. His work aims to build Generalizable Autonomy which involves a confluence of representations and algorithms for reinforcement learning, control, and perception. He currently studies three aspects: learning structured inductive biases in sequential decision-making, using data-driven causal discovery, and transfer to real robots — all in the purview of embodied systems.
Parinaz Naghizadeh, OSU
Social Bias Meets Data Bias: Biased Training Data and Fair AI
Abstract Biases in existing training datasets used in algorithmic decision making, which can arise due to, e.g., prior labeling or feature measurement errors, raise ethical and economic concerns due to the resulting disparate treatment of different groups. In this talk, we will first investigate the robustness of a few existing (demographic) fairness criteria when the algorithm is trained on biased data. We show, both analytically and numerically, that some constraints can remain robust when facing certain forms of statistical bias in the training data. I will then briefly talk about an algorithm for sequential debiasing of such datasets through adaptive and bounded exploration. This is joint work with Yiqiao Liao, Yifan Yang, and Yang Liu.

Bio: Parinaz Naghizadeh is an assistant professor in the Integrated Systems Engineering and Electrical and Computer Engineering departments at The Ohio State University. Prior to joining OSU in 2019, she was a postdoctoral researcher at Purdue University and Princeton University. She received her PhD in electrical engineering from the University of Michigan in 2016. Her research interests are in network economics, game theory, algorithmic economics, and reinforcement learning. She is a recipient of the NSF CAREER award in 2022, a Rising Stars in EECS in 2017, and a Barbour Scholarship in 2014.
Hua Wei, New Jersey Institute of Technology
Towards Actionable Decision-Making in the Real World
Abstract This talk presents how to utilize data and advanced learning methods for actionable decision-making in the real world. This talk will use the decision-making in the city as a running example, firstly examining why today we have the opportunity for a potential breakthrough in actionable decision-making. Second, the talk presents our research results in reinforcement learning for traffic signal control which are published in KDD, AAAI, and CIKM conferences. Finally, I would like to discuss the open challenges in this research topic, its implications for actionable decision-making, and our preliminary efforts in addressing these challenges.

Bio: Hua Wei is an assistant professor in the Department of Informatics at the New Jersey Institute of Technology (NJIT). He obtained his Ph.D. from the Pennsylvania State University. His research interests include reinforcement learning, data mining, and urban computing. His papers have been published at high-impact venues (e.g., NeurIPS, KDD, AAAI, IJCAI, CIKM, ECML-PKDD, etc.). His research has been awarded the Best Applied Data Science Paper Award at ECML-PKDD 2020 and funded by NSF and the Department of Energy.

Video Link
Ziv Goldfeld, Cornell University
Statistical and Computational Aspect of Sliced Optimal Transport
Abstract As machine learning/inference tasks boil down to comparing or transforming complicated probability distributions, optimal transport (OT) theory---which provides a potent framework for doing so---has emerged as a tool of choice for design and analysis. Its adoption was driven by an array of favorable properties, including robustness to support mismatch, a powerful duality theory, and the Wasserstein metric it defines on the space of probability measures, which endows it with a rich geometry. Alas, statistical OT is bottlenecked by the curse of dimensionality, whereby quantitative results either deteriorate exponentially with dimension or are largely unavailable (e.g., limit theorems, resampling, efficiency). In turn, resulting performance bounds for OT-based learning methods are often vacuous or, worse yet, missing. Slicing is a modern regularization technique by which one computes the average/maximized OT distance between different low-dimensional projections of the high-dimensional distributions. This framework inherits many structural properties of classical OT but alleviates the empirical curse of dimensionality. This talk will present recent advancements in the statistical and computational analysis of sliced OT methods. We will cover fast empirical convergence rates, high-dimensional limit distribution theorems, as well as formal guarantees for computational methods such as Monte Carlo integration (for average-slicing) and projected subgradient methods (for max-slicing). Applications to implicit generative modeling will be discussed and serve to motivate the statistical exploration.

Bio: Ziv Goldfeld is an assistant professor in the School of Electrical and Computer Engineering, and a graduate field member in Computer Science, Statistics, Data Science, and the Center of Applied Mathematics, at Cornell University. Before joining Cornell, he was a postdoctoral research fellow in LIDS at MIT. Ziv graduated with a B.Sc., M.Sc., and Ph.D. (all summa cum laude) in Electrical and Computer Engineering from Ben Gurion University, Israel. Ziv’s research interests include optimal transport theory, statistical learning theory, information theory, and mathematical statistics. He seeks to understand the theoretical foundations of modern inference and information processing systems by formulating and solving mathematical models. Honors include the NSF CAREER Award, the IBM University Award, and the Rothschild Postdoctoral Fellowship.

Video Link
Baharan Mirzasoleiman, UCLA
Coresets for Efficient and Robust Learning from Massive Datasets
Abstract Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it is contingent on exceptionally large and expensive computational resources, and incurs a substantial cost due to the significant energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data contains highly imbalanced classes, noisy labels, and malicious data points. In such cases, training on the entire data does not result in a high-quality model. In this talk, I will argue that we can address the above limitations by developing techniques that can identify and extract the most informative subsets for learning from massive datasets. Training on such subsets not only reduces the substantial costs of learning from big data, but also improves their accuracy and robustness against noisy labels and data poisoning attacks. I will discuss how we can develop effective and theoretically rigorous techniques that provide strong guarantees for the learned models’ quality and robustness against noisy labels.

Bio: Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at University of California Los Angeles. Baharan’s research focuses on developing new methods that enable efficient and robust learning from massive datasets. She received her PhD from ETH Zurich, and was a Postdoc at Stanford University. She was awarded an ETH medal for Outstanding Doctoral Dissertation, and a Google Anita Borg Memorial Scholarship. She was also selected as a Rising Star in EECS from MIT, and received an NSF Career Award.

Video Link
Chen Feng, NYU
3D Deep Learning for Soft Robotics and Self-Driving
Abstract Deep learning on 3D data like point clouds offers many new possibilities for robotics and self-driving. It leads to efficient tools to represent complex objects and scenes in the 3D world which robots and autonomous vehicles need to interact with. In this talk, I will discuss my group's work on both object-level and scene-level 3D deep learning. At the object level, I will explain FoldingNet (CVPR'18), a 3D point cloud auto-encoder that essentially resembles the paper-folding operations in its lightweight decoder with better shape reconstruction performance. This new decoder can address a challenging robotics task: soft robot proprioception. At the scene level, I will explain DiscoNet (NeurIPS'21), an efficient collaborative perception method using a dynamic directed graph with matrix-valued edge weights for an ego-vehicle to adaptively retrieve the most important complementary information from its neighboring vehicles. This could improve LiDAR-based perception's performance and robustness in self-driving against challenges such as data sparsity and occlusions. At last, I will briefly introduce our new public dataset V2X-Sim (RA-L'22), to facilitate research in 3D (and 2D) deep learning for collaborative perception.

Bio: Dr. Chen Feng is an assistant professor at NYU, appointed across departments including civil and mechanical engineering and computer science. His lab AI4CE (pronounced as A-I-force) aims to advance robot vision and machine learning through multidisciplinary use-inspired research that originates from engineering domains. Before NYU, Chen was a research scientist in the computer vision group at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA, focusing on localization, mapping, and deep learning for self-driving cars and robotics. Chen holds a Bachelor's degree in geospatial engineering from Wuhan University in China, and a master’s degree in electrical engineering and a Ph.D. in civil engineering, both from the University of Michigan at Ann Arbor. While publishing in and reviewing for prestigious AI/Robotics venues like CVPR/ICCV/ICRA/IROS, Chen also serves as an associate editor for IEEE Robotics and Automation Letters (RA-L). More information on his research can be found at

Video Link
Daniel Moyer, Vanderbilt University
Invariant Representations
Abstract The removal of unwanted information is a surprisingly common task. Removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation all require the selective removal of information. In this talk, I will describe a method for constructing such representations by minimizing mutual information in a variational setting. This path also provides insight into adversarial methods and their training schema. We will then discuss applications and implications in multi-site MRI, style transfer, and fair representation.

Bio: Daniel Moyer will join the Computer Science Department at Vanderbilt University for the Fall 2022 semester as an Assistant Professor. Previously, he was a post-doc in CSAIL at MIT, working with Prof. Polina Golland on fetal MRI. He received his doctorate in 2019 from the University of Southern California under Paul Thompson and Greg Ver Steeg, where he worked on representation learning problems in diffusion MRI and neuroimaging.

Video Link
Kayhan Batmanghelich, University of Pittsburgh
Bridging between AI Models & Medical Insights: Learning, Inference, & Model Explanation Applications
Abstract The healthcare industry is arriving at a new era where the medical communities increasingly employ computational medicine and machine learning. Despite significant progress in the modern machine learning literature, adopting the new approaches has been slow in the biomedical and clinical research communities due to the lack of explainability and limited data. Such challenges present new opportunities to develop novel methods that address AI's unique challenges in medicine. This talk has three parts. In the first part of the talk, I show examples of model explainability (XAI) tailored toward AI in Radiology applications. More specifically, I integrate ideas from causal inference for XAI (e.g., counterfactual, mediation analysis). The second part presents examples of incorporating medical insight for self-supervised learning of imaging phenotype. Finally, I address the issue of partial missingness (a common problem using clinical data) in imaging genetics for statistical independence tests.

Bio: Kayhan Batmanghelich is an Assistant Professor of the Department of Biomedical Informatics and Intelligent Systems Program with secondary appointments in the Electrical and Computer Engineering and the Computer Science Department at the University of Pittsburgh. He received his Ph.D. from the University of Pennsylvania (UPenn) under the supervision of Prof. Ben Taskar and Prof. Christos Davatzikos. He spent three years as a postdoc in Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. Polina Golland. His research is at the intersection of medical vision, machine learning, and bioinformatics. His group develops machine learning methods that address the interesting challenges of AI in medicine, such as explainability, learning with limited and weak data, and integrating medical image data with other biomedical data modalities. His research is supported by awards from NIH and NSF and industry-sponsored projects.

Video Link
Nick Cheney, University of Vermont
A Case for an Embodied Intelligence Perspective on Neural Architecture Search
Abstract Neural Architecture Search (NAS) aims to find the optimal structure of a deep neural network. Various approaches to the design of network architectures have been proposed in recent years. In this talk, I'll discuss how we might draw inspiration from the design of shape and form in biological systems to find complex and adaptable neural network designs. Specifically, I'll conjecture about how recent methods and principles from embodied cognition and evolutionary robotics may be translated into an embodied perspective on NAS.

Bio: Nick Cheney is an Assistant Professor of Computer Science at the University of Vermont, where he directs the UVM Neurobotics Lab and is a core member of the Complex Systems and Data Science program. Prior to Vermont, Nick received a Ph.D. in Computational Biology from Cornell, co-advised by Hod Lipson and Steve Strogatz, and was a postdoctoral researcher at the University of Wyoming working with Jeff Clune (now at OpenAI and the University of British Columbia). He has also served as a visiting researcher at the Santa Fe Institute, NASA Ames, and Columbia University. Nick's research aims to lower the barrier to machine learning by producing more robust, scalable, and self-configurable neural network algorithms and architectures -- with a specific focus on meta-learning methods.

Video Link
Suraj Srinivas, Harvard University
Pitfalls of Saliency Map Interpretation in Deep Neural Networks
Abstract A popular method of interpreting neural networks is to use saliency map representations, which assign importance scores to each input feature of the model. In this talk, I will discuss two of our works that expose pitfalls in these methods. First, we will discuss how existing saliency maps cannot satisfy two desirable properties simultaneously and propose the “full-gradient representation” which avoids these problems. Based on this representation, we propose an approximate saliency method called FullGrad which we find explains model behavior better than competing methods in the literature. Second, we find that a popular saliency map method, the input-gradients, can be arbitrarily structured due to the shift-invariance of SoftMax. We investigate why standard neural network models have input-gradients with interpretable structure even when this is unnecessary, and we find that standard models have an implicit generative modeling component, which is responsible for this behavior. Overall, our works show that interpreting black-box models using off-the-shelf interpretability methods can be risky, and that such methods must be used with caution.

Bio: Suraj Srinivas is a postdoctoral research fellow at Harvard University where he works with Prof. Hima Lakkaraju on the foundations of interpretable deep learning. He completed his Ph.D. at Idiap Research Institute & EPFL in Switzerland, advised by Prof. François Fleuret. His Ph.D. thesis on the pitfalls of gradient-based explanation methods in deep learning received the EPFL thesis distinction award in electrical engineering. His research interests are interpretability, robustness, and compression of deep neural networks.

Video Link
Hossein Mobahi, Google Research
Sharpness-Aware Minimization (SAM): Current Method and Future Directions
Abstract In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a new and effective procedure for instead simultaneously minimizing loss value and loss sharpness. Our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels. Finally, we will discuss possible directions for further research around SAM.

Bio: Hossein Mobahi is a senior research scientist at Google Research. His current interests revolve around the interplay between optimization and generalization in deep neural networks. Prior to joining Google in 2016, he was a postdoctoral researcher at CSAIL of MIT. He obtained his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC).

Video Link
Xiaorui Liu, North Carolina State University
Communication-Efficient Distributed Machine Learning
Abstract The success of modern AI systems relies on large-scale machine learning on big data. Distributed machine learning systems provide the computational infrastructure for such success by utilizing the parallel computation power of massive computation devices. However, the scalability and efficiency of these systems are greatly limited by the high communication cost between the devices. In this talk, I will discuss how to design communication-efficient distributed ML algorithms. Specifically, I will introduce novel decentralized algorithms with communication compression that reduce the number of communication bits by 95% without sacrificing the convergence complexities. These algorithms fundamentally improve the efficiency of large-scale ML both theoretically and numerically.

Bio: Xiaorui Liu is an incoming assistant professor in the Computer Science Department at North Carolina State University, starting in Fall 2022. He will receive his Ph.D. degree from Michigan State University, advised by Prof. Jiliang Tang. His research interests include distributed and trustworthy machine learning, with a focus on big data and graph data. He was awarded the Best Paper Honorable Mention Award at ICHI 2019, MSU Engineering Distinguished Fellowship, and Cloud Computing Fellowship. He organized and co-presented five tutorials in KDD 2021, IJCAI 2021, ICAPS 2021, and WWW 2022, and he has published innovative works in top-tier conferences such as NeurIPS, ICML, ICLR, KDD, AISTATS, and SIGIR.

Video Link
Dongkuan Xu, North Carolina State University
Resource-efficient Deep Learning: Democratizing AI at Scale
Abstract The phenomenal success of deep learning in the past decade has been mostly driven by the construction of increasingly large deep neural network models. These models usually impose an ideal assumption that there are sufficient resources, including large-scale parameters, sufficient data, and massive computation, for the optimization. However, this assumption usually fails in real-world scenarios. For example, computer memory may be limited as in edge devices, large-scale data are difficult to obtain due to expensive costs and privacy constraints, and computational power is constrained as in most university labs. As a result, these resource discrepancy issues have hindered the democratization of deep learning techniques in many AI applications, and the development of efficient deep learning methods that can adapt to different resource constraints is of great importance. In this talk, I will present my recent research contributions centered around resource-efficient deep learning to free AI from the parameter-data-computation hungry beast. First, I will introduce my contribution on neural network pruning under the pretrain-then-finetune paradigm, which improves the parameter efficiency of large-scale language models in the inference phase, resulting in pruned models with an order-of-magnitude fewer parameters than the original model while achieving the same or better prediction accuracy. Then, I will talk about my task-agnostic neural architecture search framework to reduce the computational cost in the training phase for finding the best-pruned models, which is complementary to improving the parameter efficiency in the inference phase. Finally, I will conclude my presentation with a brief overview of my ongoing and future work as part of a broader research agenda of new and related problems and potential collaborations in the next few years.

Bio: Dongkuan (DK) Xu is an incoming Assistant Professor in the CS Department at NC State. DK will get his Ph.D. at Penn State in June 2022 under the supervision of Dr. Xiang Zhang. His research interest is resource-efficient deep learning for AI at scale. DK has published more than 25 papers in top conferences and journals, including NeurIPS, AAAI, ACL, NAACL, and IJCAI. He has served as a PC member for over 28 major conferences and 14 journals. DK also has extensive research experience in the industry. He has interned at Microsoft Research Redmond, Moffett AI, and NEC Labs America, and holds 8 US patents/applications.

Video Link
Soheil Kolouri, Vanderbilt University
Brain-Inspired Lifelong Learning Machines
Abstract The next wave of AI demands a new type of machine learning framework that can continually learn and adapt to the stream of nonstationary multimodal information. This challenge is referred to as continual, lifelong, or incremental learning in the ML community. Since humans and primates are our best examples of lifelong learners, we believe that a better understanding of the biological underpinnings that support continual learning could be instrumental in advancing continual machine learning. In this talk, we first characterize continual learning as a multi-faceted problem and enumerate some of the known biological mechanisms in the brain that contribute to these characteristics. We then draw connections between existing AI/ML solutions for continual learning and known biological mechanisms and lay a road map for next-generation lifelong machine learners. Finally, we present some of our recent work toward advancing the field of continual learning with a focus on meta-plasticity and neuromodulation.

Bio: Soheil Kolouri is an Assistant Professor of Computer Science at Vanderbilt University, Nashville, TN, and the director of Machine Intelligence and Neural Technologies (MINT) lab. His research interests include continual learning, bio-inspired machine learning, geometric deep learning, and computational optimal transport. Before joining Vanderbilt University, he was a research scientist and principal investigator at HRL Laboratories, Malibu, CA, where he was the PI and the Co-PI on multiple DARPA programs involving next-generation machine learning. Soheil obtained his Ph.D. in Biomedical Engineering from Carnegie Mellon University where he received the Bertucci Fellowship Award for outstanding graduate students from the College of Engineering in 2014 and the Outstanding Dissertation Award from the Biomedical Engineering Department in 2015.

Video Link
Matthias Fey, TU Dortmund University
Auto-Scaling GNNs
Abstract In this talk, we will take a theoretical and practical look at scaling Graph Neural Networks (GNNs) up to massive graphs, based on our GNNAutoScale (GAS) framework. GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to input node size without dropping any data. While existing solutions weaken the expressive power of message passing due to sub-sampling of edges or non-trainable propagations, our approach is provably able to maintain the expressive power of the original GNN. We further discuss challenges regarding its implementation within our PyTorch Geometric (PyG) library and verify its practical benefits on a variety of large graph benchmark datasets.

Bio: Matthias Fey is a fourth-year Ph.D. student at the computer graphics lab at the TU Dortmund University, Germany, and a co-founder of which aims to make state-of-the-art GNN solutions readily available to large-scale data warehouses. His main area of research lies in the development of new deep learning methods that can be directly applied to unstructured data such as graphs, point clouds, and manifolds. Furthermore, he is the creator of the PyTorch Geometric (PyG) library, which aims to bundle many of the proposed methods in this area to make research more accessible, comparable, and reproducible, and is a core member of the Open Graph Benchmark (OGB) team. Matthias studied Computer Science at the TU Dortmund where he received his B.Sc. in 2013 and his Master’s degree in 2017.

Video Link
Philipp Petersen, University of Vienna
Optimal Representation and Learning of Classifier Functions
Abstract Deep learning has established itself as, by far, the most successful machine learning approach in sufficiently complex tasks. Nowadays, it is used in a wide range of highly complex applications such as natural language processing or even scientific applications. Its first major breakthrough, however, was achieved by shattering the state-of-the-art in image classification. We revisit the problem of classification by deep neural networks and attempt to find an answer to why deep networks are remarkably effective in this regime. We will interpret the learning of classifiers as finding piecewise constant functions from labeled samples. We then precisely link the hardness of the learning problem to the complexity of the regions. Concretely, we will establish fundamental lower bounds on the learnability of certain regions. Finally, we will show that in many cases, these optimal bounds can be achieved by deep-neural-network-based learning. In quite realistic settings, we will observe that deep neural networks can learn high-dimensional classifiers without a strong dependence of the learning rates on the dimension.

Bio: Philipp Petersen is a tenure-track assistant professor for machine learning at the mathematical institute of the University of Vienna. Before that, he completed a post-doc position at the University of Oxford and did his PhD at the Technical University of Berlin. His research focuses on the interplay of deep neural networks and numerical analysis. Particular foci are the expressivity of various architectures of deep neural networks, structural challenges for the optimization or training of deep neural networks, and the applicability of deep learning in numerical algorithms to solve partial differential equations or inverse problems.

Video Link
Lingfei Wu, JD.COM
Graph Neural Networks: Foundations, Frontiers, and Applications
Abstract The field of graph neural networks (GNNs) has seen rapid and incredible strides over recent years. Graph neural networks, also known as deep learning on graphs, graph representation learning, or geometric deep learning, have become one of the fastest-growing research topics in machine learning, especially deep learning. This wave of research at the intersection of graph theory and deep learning has also influenced other fields of science, including recommendation systems, natural language processing, program synthesis, software mining, cybersecurity, and intelligent transportation. However, as the field rapidly grows, it has been extremely challenging to gain a global perspective of the developments of GNNs. Therefore, we feel the urgency to bridge the above gap and have a comprehensive tutorial on this fast-growing yet challenging topic. In this talk, we will talk about our recent book titled Graph Neural Networks: Foundations, Frontiers, and Applications, one of the most comprehensive books on GNNs for researchers and practitioners. It covers a broad range of topics in graph neural networks, by reviewing and introducing the fundamental concepts and algorithms, new research frontiers, and broad and emerging applications of GNNs.

Bio: Dr. Lingfei Wu is a Principal Scientist at JD.COM Silicon Valley Research Center, leading a team of 30+ ML/NLP scientists and software engineers to build intelligent e-commerce personalization systems. He earned his Ph.D. degree in computer science from the College of William and Mary in 2016. Previously, he was a research staff member at IBM Thomas J. Watson Research Center and led a team of 10+ research scientists developing novel Graph Neural Network methods and systems, which led to the #1 AI Challenge Project in IBM Research and multiple IBM Awards including three-time Outstanding Technical Achievement. He was the recipient of the Best Paper Award and Best Student Paper Award of several conferences such as IEEE ICC’19, AAAI workshop on DLGMA’20, and KDD workshop on DLG’19. His research has been featured in numerous media outlets, including NatureNews, YahooNews, Venturebeat, TechTalks, SyncedReview, Leiphone, QbitAI, MIT News, IBM Research News, and SIAM News.

Video Link
Hamed Pirsiavash, UC Davis
Self-Supervised Learning for Visual Recognition
Abstract We are interested in learning visual representations that are discriminative for semantic image understanding tasks such as object classification, detection, and segmentation in images/videos. A common approach to obtain such features is to use supervised learning. However, this requires manual annotation of images, which is costly, ambiguous, and prone to errors. In contrast, self-supervised feature learning methods exploiting unlabeled data can be more scalable and flexible. I will present some of our recent efforts in this direction. More specifically, I will talk about our recent work on using similarity between a random set of images to learn better visual representations and to compress self-supervised features from deeper models to smaller ones.

Bio: Hamed Pirsiavash is an associate professor at the University of California, Davis. Prior to this, he was an associate professor at the University of Maryland Baltimore County and a postdoctoral research associate at MIT. He obtained his Ph.D. at the University of California, Irvine. He does research at the intersection of computer vision and machine learning. More specifically, he is interested in self-supervised representation learning and the adversarial robustness of deep models.

Video Link
Evangelos Papalexakis, UC Riverside
Tensor Decompositions for Multi-Aspect Graph Analytics and Beyond
Abstract Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. In this talk, we will demonstrate the effectiveness of tensor decompositions in modeling and mining multi-aspect graphs. Finally, we conclude with very recent results that demonstrate the effectiveness of tensor methods in alleviating state-of-the-art adversarial attacks in Deep Neural Networks.

Bio: Evangelos (Vagelis) Papalexakis is an Associate Professor of the CSE Dept. at the University of California, Riverside. He received his Ph.D. degree at the School of Computer Science at Carnegie Mellon University (CMU). Prior to CMU, he obtained his Diploma and MSc in Electronic & Computer Engineering at the Technical University of Crete, in Greece. Broadly, his research interests span the fields of Data Science, Machine Learning, Artificial Intelligence, and Signal Processing. His research involves designing interpretable models and scalable algorithms for extracting knowledge from large multi-aspect datasets, with specific emphasis on tensor factorization models, and applying those algorithms to a variety of real-world problems, including detection of misinformation on the Web, explainable AI, and gravitational wave detection. His work has appeared in top-tier conferences and journals, and has attracted a number of distinctions, including the 2017 SIGKDD Dissertation Award (runner-up), several paper awards, the NSF CAREER award, and the 2021 IEEE DSAA Next Generation Data Scientist Award.

Video Link
Zsolt Kira, Georgia Tech
Handling Distribution Shift in Visual Learning
Abstract While deep learning has achieved remarkable computer vision successes, fundamentally both the theory and practice for these successes have relied on vanilla supervised learning where the training and testing datasets are both sampled from the same distribution. In reality, there is likely to be a significant distribution shift once models are deployed, including noise/weather/illumination/modality changes (covariate shift), new categories (semantic shift), or different label distributions. In this talk, I will present our recent work focusing on the fundamental handling of several of these shifts. For label distribution shifts, we propose a posterior-recalibration of classifiers that can be applied without re-training to handle imbalanced datasets. For covariate and semantic shift, we propose a geometric decoupling of classifiers into feature norms and angles, showing that it can be used to learn more sensitive feature spaces for better calibration and out-of-distribution detection. We demonstrate state-of-the-art results across multiple benchmark datasets and metrics. In the end, I will present connections to a wider set of problems including continual/lifelong learning, open-set discovery, and semi-supervised learning.

Bio: Zsolt Kira is an Assistant Professor at the Georgia Institute of Technology and Associate Director of Georgia Tech’s Machine Learning Center. His work lies at the intersection of machine learning and artificial intelligence for sensor processing, perception, and robotics. Current projects and interests relate to moving beyond the current limitations of supervised machine learning to tackle un/self-/semi-supervised methods, out-of-distribution detection, model calibration, learning under imbalance, continual/lifelong learning, and adaptation. Prof. Kira has grown a portfolio of projects funded by NSF, ONR, DARPA, and the IC community, has over 45 publications in top venues, and has received several best paper/student paper awards.

Video Link
Umut Şimşekli, INRIA
Towards Building a Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Abstract In this talk, I will focus on the 'tail behavior' of SGD in deep learning. I will first empirically illustrate that heavy tails arise in the gradient noise (i.e., the difference between the stochastic gradient and the true gradient). Accordingly, I will propose to model the gradient noise as a heavy-tailed α-stable random vector and to analyze SGD as a discretization of a stochastic differential equation (SDE) driven by a stable process. As opposed to classical SDEs that are driven by a Brownian motion, SDEs driven by stable processes can incur 'jumps', which force the SDE (and its discretization) to transition from 'narrow minima' to 'wider minima', as proven by existing metastability theory and the extensions that we proved recently. These results open up a different perspective and shed more light on the view that SGD 'prefers' wide minima. In the second part of the talk, I will focus on the generalization properties of such heavy-tailed SDEs and show that the generalization error can be controlled by the Hausdorff dimension of the trajectories of the SDE, which is closely linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail-index of the process can be used as a notion of capacity metric. Finally, if time permits, I will talk about the 'originating cause' of such heavy-tailed behavior and present theoretical results which show that heavy tails can even emerge in very sterile settings such as linear regression with i.i.d. Gaussian data.

Bio: Umut Şimşekli is a tenured Research Faculty at Inria Paris and Ecole Normale Superieure de Paris. He received his Ph.D. degree in 2015 from Boğaziçi University, İstanbul. During 2016-2020, he was affiliated with the Signals, Statistics, and Machine Learning Group at Telecom Paris as an associate professor and he visited the University of Oxford, Department of Statistics during the 2019-2020 academic year. He is a laureate of the European Research Council (ERC) Starting Grant 2021 and his current research interests are in the theory of deep learning.

Video Link
(*): incoming

Older talks can be found in Archives.