Vanderbilt Machine Learning Seminar Series

Announcements

Our talks are open to the public. No registration is required.
Our virtual (Zoom) talks are on Mondays at 12:15 PM CT and typically last for 1 hour (approximately 45 to 50-minute talk plus Q&A).
Join our Google group for discussions and notifications of upcoming talks.

Upcoming Talks

TBD

Jörn-Henrik Jacobsen, Isomorphic Labs

Abstract

TBD

Lifang He, Lehigh University

Abstract

TBD

Previous Talks

03/31/2025

Nuno Moniz, University of Notre Dame

Responsible AI Beyond the Assumption of Normality

Abstract

For over three decades, imbalanced learning has risen to be one of the most challenging issues to cope with in machine learning. Not only is it a typical feature in real-world scenarios, but it has also been demonstrated how it impacts many of the efforts under the umbrella of Responsible AI, particularly in classification settings. In this talk, we'll dive into the more recent concept of imbalanced regression, explore how it can significantly move forward research in several pressing challenges associated with Responsible AI, such as fairness and selective regression, and demonstrate how it can unlock new possibilities for critical real-world applications, such as drug discovery.

Bio: Nuno Moniz is an Associate Research Professor at the Lucy Family Institute for Data and Society. He is also the Director of the Notre Dame-IBM Technology Ethics Lab and the Associate Director of the Data, Inference, Analytics, and Learning Lab. Moniz, who joined the University of Notre Dame in 2022, is an expert on machine learning, investigating challenges such as imbalanced learning, model interpretability, and data privacy, for which he has won multiple awards internationally. He is particularly interested in interdisciplinary efforts to understand the real-world impact of automated systems.

03/17/2025

Eric Nalisnick, Johns Hopkins University

Learning to defer to one, multiple, or a population of expert(s)

Abstract

Artificial intelligence is being deployed in ever more consequential settings such as healthcare and autonomous driving. Thus, we must ensure that these systems are safe and trustworthy. One near-term solution is to ensure that a human is involved in the decision-making process and that the system can ask for help in difficult or high-risk scenarios. I will present recent advances in the “learning to defer” paradigm: decision-making responsibility is allocated to either a human or model, depending on who is more likely to take the correct action. In particular, I will present novel formulations that better model the human collaborator’s expertise and that can support multiple human decision makers.

Bio: Eric Nalisnick is an assistant professor at Johns Hopkins University. His research interests span statistical machine learning and probabilistic modeling, with an emphasis on quantifying uncertainty in deep learning, human-AI collaboration, specifying prior knowledge, and detecting distribution shift. He previously was an assistant professor at the University of Amsterdam, a postdoctoral researcher at the University of Cambridge and a PhD student at the University of California, Irvine. Eric has also held research positions at DeepMind, Microsoft, Twitter, and Amazon. His work has received funding from both industrial (Google, Amazon, Bosch) and government (Dutch Research Council) entities, and his papers have been recognized with selective oral presentations (ECCV 2024) and awards (AIStats 2023, AIStats 2024).

03/03/2025

Maximilian Nickel, Meta AI (FAIR) in New York

Epistemic Limits of Model Validation in Complex Systems

Abstract

AI has undergone a dramatic paradigm shift, not only in terms of the impressive capabilities of state-of-the-art models, but also in terms of how they are trained, deployed, and evaluated. Most importantly, AI systems do not exist in a controlled environment anymore (e.g., meticulously collected i.i.d. samples), but interact continuously with social systems, e.g., through training and evaluation data as well as their direct influence on social processes. Crucially, the validity of even our most basic machine learning methods is not guaranteed in this new context. Yet, without valid methodology we cannot ensure the intended outcomes of deployed AI systems nor continue to advance AI research in a scientifically sound way. In this talk, I will therefore argue that we need new theoretical foundations for machine learning and AI that explicitly account for the complex social system with which an AI system interacts or in which it is situated. I will illustrate this using the example of the ubiquitous train-test paradigm. While this form of model validation has arguably been one of the single most important contributors to the breathtaking progress in AI, I will show via rigorous impossibility results that it is no longer valid for key tasks in modern AI under current data collection practices. Based on these insights, I will also introduce a novel cooperative approach to data collection with strong game-theoretical guarantees that can alleviate these issues. I will conclude this talk with a call for increased interdisciplinary work at the intersection of AI theory, methods, and society.

Bio: Max Nickel is a research scientist manager at FAIR, Meta AI, where he leads the AI & Society team and has also acted as a research area lead for Machine Learning and Society & Responsible AI. Before joining FAIR, Max was a postdoctoral fellow at MIT, where he was with the Laboratory for Computational and Statistical Learning and the Center for Brains, Minds and Machines. He received his PhD summa cum laude from the Ludwig Maximilian University Munich as a research assistant at Siemens Corporate Technology. Recently, Max has also acted as Program Chair for ICLR 2023. Max’s research is focused on understanding the interplay of AI and social systems. For this purpose, he is combining machine learning theory and methods with complex systems theory including networks, dynamics, and emergence. Max aims to establish the necessary theoretical and methodological foundations for AI to safely interact with society and to obtain results that have a direct impact on AI practice, methods, and governance.

02/24/2025

Ishan Misra, Meta AI (FAIR)

Movie Gen: A Cast of Media Foundation Models

Abstract

Movie Gen is a cast of media-generation foundation models that enables users to use simple text inputs to generate high-quality videos, personalize or edit them, and add audio. When the generations are evaluated by humans, Movie Gen establishes new state-of-the-art performance on all of these tasks compared to existing solutions. I'll cover the basic innovations in Movie Gen around simplified architecture, training objective, scaling data, and other design choices that enabled this step change in media generation.

Bio: Ishan Misra is a Research Scientist in Meta's GenAI org. He leads the research efforts at Meta in video generation and was the tech lead for Meta's Movie Gen and Emu Video foundational video models. In the past, he worked on self-supervised learning methods such as Barlow Twins and DINO. Ishan was featured in MIT Tech Review's 35 innovators under 35 and is the recipient of Carnegie Mellon's Recent Alumni Achievement Award.

02/10/2025

Pin-Yu Chen, IBM Research

Exploring and Mitigating Safety Risks in Large Language Models and Generative AI

Abstract

Large language models (LLMs) and Generative AI (GenAI) are at the forefront of current AI research and technology. With their rapidly increasing popularity and availability, challenges and concerns about their misuse and safety risks are becoming more prominent than ever. In this talk, I will provide new tools and insights to explore and mitigate the safety and robustness risks associated with state-of-the-art LLMs and GenAI models. In particular, I will cover (i) safety risks in fine-tuning LLMs, (ii) LLM jailbreak mitigation, (iii) prompt engineering for safety debugging, and (iv) robust detection of AI-generated content.

Bio: Dr. Pin-Yu Chen is a principal research scientist at IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA. He is also the chief scientist of RPI-IBM AI Research Collaboration and PI of ongoing MIT-IBM Watson AI Lab projects. Dr. Chen received his Ph.D. in electrical engineering and computer science from the University of Michigan, Ann Arbor, USA, in 2016. Dr. Chen’s recent research focuses on AI safety and robustness. His long-term research vision is to build trustworthy machine learning systems. He received the IJCAI Computers and Thought Award in 2023. He is a co-author of the book “Adversarial Robustness for Machine Learning”. At IBM Research, he received several research accomplishment awards, including IBM Master Inventor, IBM Corporate Technical Award, and IBM Pat Goldberg Memorial Best Paper. His research contributes to IBM open-source libraries including Adversarial Robustness Toolbox (ART 360) and AI Explainability 360 (AIX 360). He has published more than 50 papers related to trustworthy machine learning at major AI and machine learning conferences, given tutorials at NeurIPS’22, AAAI(’22,’23,’24), IJCAI’21, CVPR(’20,’21,’23), ECCV’20, ICASSP(’20,’22,’23,’24), KDD’19, and Big Data’18, and organized several workshops for adversarial machine learning. He has been an IEEE Fellow since 2025. He is currently on the editorial board of Transactions on Machine Learning Research and IEEE Transactions on Signal Processing. He is also an Area Chair or Senior Program Committee member for NeurIPS, ICLR, ICML, AAAI, IJCAI, and PAKDD, and a Distinguished Lecturer of ACM. He received the IEEE GLOBECOM 2010 GOLD Best Paper Award and UAI 2022 Best Paper Runner-Up Award. In 2025, he received the IEEE SPS Industry Young Professional Leadership Award.

02/03/2025

Alexander Korotin, Skoltech

Building Light Schrödinger Bridges

Abstract

Schrödinger Bridges (SB) have recently gained the attention of the ML community as a promising extension of classic diffusion models, and they are also closely connected to Entropic Optimal Transport (EOT). Despite recent advances in the field of computational SB, most existing SB solvers are still heavyweight and require complex optimization of several neural networks. We address this issue and propose two novel light solvers for this problem: LightSB and LightSBM. Both utilize the optimal structure of Schrödinger Bridges but in different ways. The LightSB solver directly minimizes the KL divergence with the ground-truth solution, knowing only the start and end marginals. The LightSBM solver is based on bridge matching and introduces the new concept of optimal projection, allowing the Schrödinger Bridge to be solved in one bridge-matching iteration.

Bio: Prof. Korotin is an Assistant Professor and leads the Generative AI research group at Skoltech while serving as a senior research scientist at AIRI. He earned his PhD in Math & Physics in 2023 under the supervision of Prof. E. Burnaev. His research mostly revolves around generative modeling with a particular focus on developing novel algorithms based on Optimal Transport and Schrödinger Bridges.

01/20/2025

Qiang Liu, UT Austin

Rectified flow: A straight approach to generative modeling.

Abstract

Rectified Flow (RF) is a simple yet general approach to generative modeling, widely applied in state-of-the-art AI tasks such as image and video generation. It provides a straightforward method for learning transport mappings between two distributions (observed through either unpaired or paired data points) by learning neural ordinary differential equation (ODE) models that prioritize path straightness. Straight paths are naturally preferred and allow for fast simulation with large discretization step sizes, enabling efficient one-step or few-step models. Although based solely on ODEs, RF can be extended to offer simplified perspectives on existing diffusion models. Furthermore, it draws close connections to mass transport theory, which we will briefly explore.
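
As a rough illustration of why straight paths permit large step sizes, the sketch below uses the commonly described rectified-flow recipe: interpolate linearly between a source point and a target point, regress a velocity model onto their difference, and sample by Euler integration. It is a minimal sketch and not the speaker's implementation. The function names (`rf_training_pair`, `euler_sample`, `constant_velocity`) and the toy shift are hypothetical, and a hand-written velocity field stands in for a trained neural network.

```python
def rf_training_pair(x0, x1, t):
    """Build one regression example: the point on the straight line between
    x0 and x1 at time t, and the velocity target (x1 - x0)."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a learned model (assumption): when the target distribution is
# the source shifted by a fixed vector, the ideal velocity field is constant.
SHIFT = [3.0, -1.0]


def constant_velocity(x, t):
    return SHIFT


one_step = euler_sample(constant_velocity, [0.0, 0.0], num_steps=1)
four_steps = euler_sample(constant_velocity, [0.0, 0.0], num_steps=4)
```

In this toy case the trajectory is exactly straight, so one Euler step and four Euler steps reach the same endpoint; with a curved trajectory, the coarse step would incur discretization error.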

Bio: Qiang Liu is an associate professor of computer science at UT Austin. His research advances fundamental machine learning algorithms through deep mathematical insights.

12/09/2024

Jeff Clune, University of British Columbia / Google DeepMind

Open-Ended and AI-Generating Algorithms in the Era of Foundation Models

Abstract

Foundation models (e.g. large language models) create exciting new opportunities in our longstanding quests to produce open-ended and AI-generating algorithms, wherein agents can truly keep innovating and learning forever. In this talk I will share some of our recent work harnessing the power of foundation models to make progress in these areas. I will cover our recent work on OMNI (Open-endedness via Models of human Notions of Interestingness), Video Pre-Training (VPT), Thought Cloning, Automatically Designing Agentic Systems, and The AI Scientist.

Bio: Jeff Clune is a Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair at the Vector Institute, and a Senior Research Advisor at DeepMind. Jeff focuses on deep learning, including deep reinforcement learning. Previously he was a research manager at OpenAI, a Senior Research Manager and founding member of Uber AI Labs (formed after Uber acquired a startup he helped lead), the Harris Associate Professor in Computer Science at the University of Wyoming, and a Research Scientist at Cornell University. He received degrees from Michigan State University (PhD, master’s) and the University of Michigan (bachelor’s). More on Jeff’s research can be found at JeffClune.com or on Twitter (@jeffclune). Since 2015, he won the Presidential Early Career Award for Scientists and Engineers from the White House, had two papers in Nature, one in Science, and one in PNAS, won an NSF CAREER award, received Outstanding Paper of the Decade and Distinguished Young Investigator awards, received two Test of Time awards, and had best paper awards, oral presentations, and invited talks at the top machine learning conferences (NeurIPS, CVPR, ICLR, and ICML). His research is regularly covered in the press, including the New York Times, NPR, the New Yorker, CNN, NBC, Wired, the BBC, the Economist, Science, Nature, National Geographic, the Atlantic, and the New Scientist.

11/18/2024

Krishnaram Kenthapadi, Oracle Health

Deploying Trustworthy Generative AI

Abstract

While generative AI models and applications have huge potential across different industries, their successful commercial deployment requires addressing several ethical, trustworthiness, and safety considerations. These concerns include domain-specific evaluation, hallucinations, truthfulness and grounding, safety and alignment, bias and fairness, robustness and security, privacy, unlearning, copyright implications, calibration and confidence, and transparency. In this talk, we first motivate the need for adopting responsible AI principles when developing and deploying large language models (LLMs), text-to-image models, and other generative AI models, and provide a roadmap for thinking about responsible AI and AI observability for generative AI in practice. Focusing on real-world generative AI use cases (e.g., evaluating LLMs for robustness, security, and bias, especially in health AI applications and in user-facing and enterprise-internal chatbot settings), we present practical solution approaches and guidelines for applying responsible AI techniques effectively and discuss lessons learned from deploying responsible AI approaches for generative AI applications in practice. This talk will be based on our KDD 2024 LLM grounding and evaluation tutorial and ICML/KDD/FAccT 2023 trustworthy generative AI tutorial.

Bio: Krishnaram Kenthapadi is the Chief Scientist, Clinical AI at Oracle Health, where he leads the AI initiatives for Clinical Digital Assistant and other Oracle Health products. Previously, as the Chief AI Officer & Chief Scientist of Fiddler AI, he led initiatives on generative AI (e.g., Fiddler Auditor, an open-source library for evaluating & red-teaming LLMs before deployment; AI safety, observability & feedback mechanisms for LLMs in production), and on AI safety, alignment, observability, and trustworthiness, as well as the technical strategy, innovation, and thought leadership for Fiddler. Prior to that, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives in the Amazon AI platform, and shaped new initiatives such as Amazon SageMaker Clarify from inception to launch. Prior to joining Amazon, he led similar efforts at the LinkedIn AI team, and served as LinkedIn’s representative in Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Previously, he was a Researcher at Microsoft Research Silicon Valley Lab. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006. He serves regularly on the senior program committees of FAccT, KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 60+ papers, with 7000+ citations and filed 150+ patents (72 granted). He has presented tutorials on trustworthy generative AI, privacy, fairness, explainable AI, model monitoring, and responsible AI at forums such as ICML, KDD, WSDM, WWW, FAccT, and AAAI, given several invited industry talks, and instructed a course on responsible AI at Stanford.

11/11/2024

Jingrui He, University of Illinois Urbana-Champaign

Multifaceted Robustness in Transfer Learning

Abstract

Transfer learning aims to build predictive models for target domains with limited label information by leveraging the relevant knowledge from one or more source domains with abundant data. It finds successful applications across multiple domains, such as agriculture and natural language processing. However, existing transfer learning techniques are often vulnerable to adversarial attacks and/or complex distribution shifts in rich data. In this talk, I will introduce some of our recent works addressing these limitations with multifaceted robustness, including poisoning attacks exploiting such vulnerabilities, a Byzantine-robust method for federated learning, as well as novel techniques based on the Gaussian process for modeling the distribution shifts in rich data. Towards the end, I will share my thoughts regarding future directions of multifaceted robustness in transfer learning.

Bio: Dr. Jingrui He is a Professor at the School of Information Sciences, University of Illinois at Urbana-Champaign. She received her PhD from Carnegie Mellon University in 2010. Her research focuses on heterogeneous machine learning, active learning, neural bandits, and self-supervised learning, with applications in security, agriculture, social network analysis, healthcare, and finance. Dr. He is the recipient of the 2016 NSF CAREER Award and the 2020 OAT Award, a three-time recipient of the IBM Faculty Award (2018, 2015, and 2014), and was selected as an IJCAI 2017 Early Career Spotlight. Dr. He has more than 180 publications at major conferences (e.g., ICML, NeurIPS, ICLR, KDD) and journals (e.g., TKDE, TKDD, JMLR), and is the author of two books. Her papers have received the Distinguished Paper Award at FAccT 2022, as well as Bests of the Conference at ICDM 2016, ICDM 2010, and SDM 2010. Dr. He is a Distinguished Member of ACM and a Senior Member of AAAI and IEEE. She is also the Program Co-chair of IEEE BigData 2023.

11/04/2024

Jiajun Wu, Stanford University

Concept Learning Across Domains and Modalities

Abstract

I will discuss a concept-centric paradigm for building agents that can learn continually and reason flexibly across multiple domains and input modalities. The concept-centric agent utilizes a vocabulary of neuro-symbolic concepts. These concepts, including object, relation, and action concepts, are grounded on sensory inputs and actuation outputs. They are also compositional, allowing for the creation of novel concepts through their structural combination. To facilitate learning and reasoning, the concepts are typed and represented using a combination of symbolic programs and neural network representations. Leveraging such neuro-symbolic concepts, the agent can efficiently learn and recombine them to solve various tasks across different domains and data modalities, including 2D images, videos, 3D scenes, temporal data, and robotic manipulation data.

Bio: Jiajun Wu is an Assistant Professor of Computer Science and, by courtesy, of Psychology at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the Young Investigator Programs (YIP) by ONR and by AFOSR, the NSF CAREER award, the Okawa research grant, paper awards and finalists at ICCV, CVPR, SIGGRAPH Asia, CoRL, and IROS, dissertation awards from ACM, AAAI, and MIT, the 2020 Samsung AI Researcher of the Year, and faculty research awards from J.P. Morgan, Samsung, Amazon, and Meta.

10/28/2024

Huy Vo, Meta AI

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Abstract

Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the dataset size. In this work, we consider the problem of automatic curation of high-quality datasets for self-supervised pre-training. We posit that such datasets should be large, diverse and balanced, and propose a clustering-based approach for building ones satisfying all these criteria. Our method involves successive and hierarchical applications of k-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. Extensive experiments on three different data domains including web-based images, satellite images and text show that features trained on our automatically curated datasets outperform those trained on uncurated data while being on par with or better than ones trained on manually curated data.

Bio: Huy V. Vo is currently a Research Scientist at Meta Fundamental AI Research (FAIR). He obtained his PhD in Computer Science from Ecole Normale Superieure in November 2022. His thesis was prepared in the WILLOW team at INRIA and the Valeo.ai team under the supervision of Prof. Jean Ponce and Prof. Patrick Pérez. Prior to his PhD, he obtained his Master Mathématique-Vision-Apprentissage (MVA) from Ecole Normale Supérieure de Paris Saclay and his engineering diploma from Ecole Polytechnique de Paris. His research focuses on learning problems in images that require less supervision including object discovery, self-supervised feature learning, weakly supervised object detection/segmentation and active learning.

10/21/2024

Youssef Mroueh, IBM Research/MIT-IBM Watson AI Lab

Distributional Preference Alignment of Large Language Models via Optimal Transport

Abstract

Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order on the distribution of negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing the violation of the stochastic dominance of the reward distribution of the positive samples on the reward distribution of the negative samples. We analyze the sample complexity of AOT by considering the dual of the OT problem and show that it converges at the parametric rate. Empirically, we show on a diverse set of alignment datasets and LLMs that AOT leads to state-of-the-art models in the 7B family of models when evaluated with Open LLM Benchmarks and AlpacaEval.
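
The "closed-form solution via sorting" mentioned above can be illustrated with a small sketch. In one dimension, optimal transport between two equal-sized empirical measures under a convex cost pairs the sorted values, so a dominance penalty can be computed by sorting both reward samples and penalizing quantile pairs where the negative reward exceeds the positive one. This is a hedged illustration, not the authors' implementation: the function names, the equal-sample-size restriction, and the specific choice of softplus or hinge as the convex cost are all assumptions made here.

```python
import math


def softplus(x):
    """Smooth convex surrogate log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def hinge(x):
    """Non-smooth convex surrogate, handy for checking the zero-violation case."""
    return max(x, 0.0)


def sorted_dominance_penalty(pos_rewards, neg_rewards, cost=softplus):
    """Average cost of quantile-wise violations of first-order stochastic dominance.

    Sorting both samples pairs the i-th smallest positive reward with the i-th
    smallest negative reward; a violation occurs where negative > positive.
    Assumes (for simplicity) that both samples have the same size.
    """
    if len(pos_rewards) != len(neg_rewards) or not pos_rewards:
        raise ValueError("expected two non-empty samples of equal size")
    pos = sorted(pos_rewards)
    neg = sorted(neg_rewards)
    return sum(cost(n - p) for p, n in zip(pos, neg)) / len(pos)


# Unpaired rewards: the order within each list carries no meaning.
dominant = sorted_dominance_penalty([3.0, 1.0, 2.0], [0.0, 2.0, 1.0], cost=hinge)
violated = sorted_dominance_penalty([0.0, 2.0, 1.0], [3.0, 1.0, 2.0], cost=hinge)
```

With the hinge cost, the penalty is zero exactly when every sorted positive reward is at least the matching sorted negative reward; a smooth cost such as softplus would instead give a differentiable objective suitable for fine-tuning.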

Bio: Youssef Mroueh is a Principal Research Scientist in IBM Research with the Human Centered Trustworthy AI department. He received his PhD in computer science in February 2015 from MIT, CSAIL, where he was advised by Professor Tomaso Poggio. In 2011, he obtained his engineering diploma from Ecole Polytechnique Paris France, and a Master of Science in Applied Mathematics from Ecole des Mines de Paris. He is interested in optimal transport, deep multimodal learning, large language models, trustworthy ML, statistical learning theory, scientific ML, and AI for social good.

10/14/2024

Yu Wang, University of Oregon

Data-Aware Graph Machine Learning

Abstract

Graph-structured data is ubiquitous in real-world applications (e.g., social networks, infrastructure, biomedical, etc.) and Graph Machine Learning (GML) has become a prominent method for handling graph-based data. Despite GML’s remarkable achievements, its reliance on node features and graph topology makes it susceptible to data quality challenges. This talk will focus on data-quality-aware graph machine learning, overcoming issues related to topology, imbalance, and bias in graph data through model/data-centric solutions. Specifically, I will first introduce several data/model-centric solutions to handle topology issues in link prediction, imbalance issues in graph classification and bias issues in node classification. Furthermore, I will introduce two applications of the developed data-quality-aware graph machine learning, including boosting generative performance with large graph generative models and overcoming hallucinations with knowledge graph prompting. In conclusion, I will highlight multiple future directions in graph machine learning.

Bio: Yu Wang is an Assistant Professor in the Dept of Computer and Information Science at the University of Oregon. Before that, he received his Ph.D. in the Computer Science Dept at Vanderbilt University under the supervision of Dr. Tyler Derr. His research mainly focuses on network analysis, data-centric graph machine learning, and responsible AI for social good with applications in biochemistry, information retrieval and infrastructure. His honors and awards include being the sole recipient of Vanderbilt's Graduate Leadership Anchor Award for Research in 2023, the 2023-2024 Vanderbilt Outstanding Doctoral Student Award, the Best Paper Award in the 2020 Smokey Mountain Data Challenge Competition by ORNL, Vanderbilt’s C.F. Chen Best Paper Award in 2022 (as first author), the Best Paper Award at the GLFrontiers Workshop at NeurIPS'23 (as first author), and Best Doctoral Forum Poster Runner-up at SDM'24. He has actively contributed to top venues in data mining and machine learning, both by publishing (e.g., ICLR, AAAI, KDD, WWW, CIKM, WSDM, TKDD, TIST) and by serving as a PC member, reviewer, or organizer (e.g., KDD, ICML, AAAI (ICWSM), WWW, WSDM, CIKM, TKDD, and TNNLS).

10/07/2024

Saining Xie, NYU

Grounding (and Evaluating) Visual Intelligence in Real Life

Abstract

This talk provides an overview of our recent work in multimodal foundation models. We start by exploring the visual shortcomings of multimodal large language models, followed by a discussion on how to enhance LLMs with better and more precise visual grounding. Our approach incorporates mechanisms such as visual self-supervised learning, human-like visual search and system II reasoning into multimodal LLMs. By integrating an informed visual search algorithm, we enable LLMs to identify relevant information within a multitude of stimuli and interact more effectively with real-world data. We also ground LLMs in real-life experiences using actionable environments like street view imagery, enriching their sensory grounding and resonating with urban life nuances. This line of research aims to empower LLMs to interact with and understand the sensory-rich world in a more realistic and meaningful way.

Bio: Saining Xie is an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with NYU Center for Data Science. He is also a visiting faculty researcher at Google Research. Before joining NYU in 2023, he was a research scientist at FAIR, Meta. In 2018, he received his Ph.D. degree in computer science from the University of California San Diego. Prior to that, he received his Bachelor’s degree from Shanghai Jiao Tong University. Saining works in computer vision and machine learning, with a particular interest in scalable visual representation learning. His work has been recognized with the Marr Prize honorable mention, CVPR best paper finalists and an Amazon research award.

09/30/2024

Furong Huang, University of Maryland

Integrity in AI: Multi-Modality Approaches to Combat Misinformation for Content Authenticity

Abstract

As artificial intelligence technologies become increasingly sophisticated, the emergence of multi-modal deepfakes, which combine text, images, and videos, presents new challenges in the realms of misinformation and digital content authenticity. This presentation delves into cutting-edge research aimed at fortifying AI against these challenges, focusing on the detection of AI-generated text and the robustness of watermarking across different media. Our discussion begins with "Towards Robust AI-Generated Text Detection" (arXiv:2304.04736, ICML 2024), which introduces advanced detection algorithms that discern synthetic text, critical for preventing the spread of AI-facilitated misinformation. We extend this discussion to the visual domain, examining strategies for enhancing the security of digital watermarking in AI models, as explored in "Advancing Watermark Robustness in AI Systems" (arXiv:2401.08573, ICML 2024). This work is pivotal for asserting content authenticity and ownership, particularly against the backdrop of easily manipulated digital media. By integrating insights from both textual and visual data, this presentation not only addresses technical solutions but also encourages a broader dialogue on the societal and economic implications of multi-modal deepfakes, aiming to align AI advancements with ethical standards and regulatory frameworks.

Bio: Furong Huang is an Associate Professor in the Department of Computer Science at the University of Maryland. She received her Ph.D. in electrical engineering and computer science from UC Irvine in 2016, after which she spent one year as a postdoctoral researcher at Microsoft Research NYC. She works on statistical and trustworthy machine learning, foundation models and reinforcement learning, with specialization in domain adaptation, algorithmic robustness and fairness. With a focus on high-dimensional statistics and sequential decision-making, she develops efficient, robust, scalable, sustainable, ethical and responsible machine learning algorithms. Her contributions have been recognized with awards including best paper awards, the MIT Technology Review Innovators Under 35 Asia Pacific, the MLconf Industry Impact Research Award, the NSF CRII Award, the Microsoft Accelerate Foundation Models Research award, the Adobe Faculty Research Award, three JP Morgan Faculty Research Awards, and selection as a Finalist for AI in Research - AI Researcher of the Year at the Women in AI Awards North America.

09/23/2024

Yilun Du, Google DeepMind / Harvard University (*)

Generalizing Outside the Training Distribution Through Compositional Generation

Abstract

Generative AI has led to stunning successes in recent years but is fundamentally limited by the amount of data available. This is especially limiting in the embodied setting – where an agent must solve new tasks in new environments. In this talk, I’ll introduce the idea of compositional generative modeling, which enables generalization beyond the training data by building complex generative models from smaller constituents. I’ll first introduce the idea of energy-based models and illustrate how they enable compositional generative modeling. I’ll then illustrate how such compositional models enable us to synthesize complex plans for unseen tasks at inference time. Finally, I'll show how such compositionality can be applied to multiple foundation models trained on various forms of Internet data, enabling us to construct decision-making systems that can hierarchically plan and solve long-horizon problems in a zero-shot manner.

Bio: Yilun Du is a senior research scientist at Google DeepMind and an incoming Assistant Professor in the Harvard Kempner Institute and Computer Science. He received his PhD at MIT, advised by Leslie Kaelbling, Tomas Lozano-Perez, and Joshua Tenenbaum. His research spans the fields of machine learning and robotics, with a focus on generative models. Yilun was a recipient of the NSF Graduate Fellowship and a finalist for Qualcomm and Open Philanthropy fellowships. Previously, he held research fellowships/internships at OpenAI, FAIR, and DeepMind. His work has received best paper awards at ICLR and at NeurIPS and ICRA workshops.

08/26/2024

Leon Bottou, Meta AI (FAIR) in New York

Memory Mosaics

Abstract

Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in comparatively transparent ways, and we can start to understand why and how transformer-like architectures can learn compositional structures. We demonstrate these capabilities on toy examples and we also show that memory mosaics perform as well as or better than transformers on medium-scale language modeling tasks.

Bio: Léon Bottou received the Diplôme d’Ingénieur de l’École Polytechnique (X84) in 1987, the Magistère de Mathématiques Fondamentales et Appliquées et d’Informatique from École normale supérieure in 1988, and a Ph.D. in Computer Science from Université de Paris-Sud in 1991. His research career took him to AT&T Bell Laboratories, AT&T Labs Research, NEC Labs America and Microsoft. He joined Meta AI (formerly Facebook AI Research) in 2015. Léon’s research has followed many practical and theoretical turns: neural network applications in the late 1980s, stochastic gradient learning algorithms and statistical properties of learning systems in the early 1990s, computer vision applications with structured outputs in the late 1990s, theory of large-scale learning in the 2000s. During the last few years, Léon Bottou’s research has aimed to clarify the relation between learning and reasoning, with more and more focus on the many aspects of causation (inference, invariance, reasoning, affordance, and intuition).

03/25/2024

Frank Tong, Vanderbilt University

Understanding The Computational Bases of Robust Object Recognition In Humans and Deep Neural Networks

Abstract

Deep neural networks (DNNs) trained on object classification provide the best current models of human vision, with accompanying claims that they have attained or even surpassed human-level performance. However, DNNs tend to fail catastrophically in situations where humans do not, especially when faced with noisy, degraded, or ambiguous visual inputs. Such findings imply that the computations performed by DNNs do not adequately match those performed by the human brain. In this talk, I will discuss whether the brittleness of current DNN models is caused by flaws in their architectural design, imperfections in their learning protocols, or inadequacies in their training experiences. Here, we evaluated the hypothesis that everyday encounters with visual blur may be a critical feature for conferring robustness to biological and artificial visual systems. Our studies show how learning has a critical role in the acquisition of robust object representations, such that appropriately trained DNN models can better predict human behavioral and neural responses across a range of challenging viewing conditions.

Bio: Dr. Frank Tong studies the neurocomputational bases of human vision using behavioral psychophysics, functional MRI, computational modeling and deep learning techniques. A major focus of his lab is developing more robust and human-aligned DNN models of visual processing. He received his BS in Psychology from Queen’s University, Canada and PhD from Harvard University. He worked as an Assistant Professor at Princeton University from 2000-2004, and moved to Vanderbilt University thereafter, where he is now a Centennial Professor of Psychology. For his research contributions, he has received awards from the Cognitive Neuroscience Society, the Vision Sciences Society, and the National Academy of Sciences.

03/18/2024

Nicolas Papernot, Google DeepMind / U of Toronto

Characterizing Machine Unlearning through Definitions and Implementations

Abstract

The talk presents open problems in the study of machine unlearning. The need for machine unlearning, i.e., obtaining a model one would get without training on a subset of data, arises from privacy legislation and as a potential solution to data poisoning or copyright claims. The first part of the talk discusses approaches that provide exact unlearning: these approaches output the same distribution of models as would have been obtained by training without the subset of data to be unlearned in the first place. While such approaches can be computationally expensive, we discuss why it is difficult to relax the guarantee they provide to pave the way for more efficient approaches. The second part of the talk asks if we can verify unlearning. Here we show how an entity can claim plausible deniability when challenged about an unlearning request that was claimed to be processed, and conclude that at the level of model weights, being unlearnt is not always a well-defined property. Instead, unlearning is an algorithmic property.

Bio: Nicolas Papernot is an Assistant Professor of Computer Engineering and Computer Science at the University of Toronto. He also holds a Canada CIFAR AI Chair at the Vector Institute, and is a faculty affiliate at the Schwartz Reisman Institute. His research interests span the security and privacy of machine learning. Some of his group’s recent projects include generative model collapse, cryptographic auditing of ML, private learning, proof-of-learning, and machine unlearning. Nicolas is an Alfred P. Sloan Research Fellow in Computer Science and a Member of the Royal Society of Canada’s College of New Scholars. His work on differentially private machine learning was awarded an outstanding paper at ICLR 2022 and a best paper at ICLR 2017. He co-created the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) and is co-chairing its first two editions in 2023 and 2024. He previously served as an associate chair of the IEEE Symposium on Security and Privacy (Oakland), and an area chair of NeurIPS. Nicolas earned his Ph.D. at the Pennsylvania State University, working with Prof. Patrick McDaniel and supported by a Google PhD Fellowship. Upon graduating, he spent a year at Google Brain where he still spends some of his time.

03/11/2024

Max Welling, University of Amsterdam

The Synergy between Machine Learning and the Natural Sciences

Abstract

Traditionally, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g. MCMC, Belief Propagation, and Diffusion-based Generative AI). We have recently witnessed that the flow of information has also reversed, with new tools developed in the ML community impacting physics, chemistry and biology. Examples include faster Density Functional Theory, Force-Field accelerated MD simulations, PDE Neural Surrogate models, generating druglike molecules, and many more. In this talk I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks and Neural Quantum Error Correction codes.

Bio: Prof. Dr. Max Welling is a full professor and research chair in Machine Learning at the University of Amsterdam and a Merkin Distinguished Visiting Professor at Caltech. He is a Fellow at the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS) where he served on the founding board. His previous appointments include Partner and VP at Microsoft Research, VP at Qualcomm Technologies, professor at UC Irvine, postdoc at UCL & U. Toronto under the supervision of Prof. Geoffrey Hinton, and postdoc at Caltech under the supervision of Prof. Pietro Perona. He finished his PhD in theoretical high energy physics under the supervision of Nobel laureate Prof. Gerard ’t Hooft.

03/04/2024

Ricky Chen, Meta AI (FAIR) in New York

Discovering Latent Dynamics of the World: A Simulation-Free Perspective

Abstract

Latent dynamics pervade the world and hence our observations of it, a.k.a. data. However, we never fully observe the data generation process, so how should we go about filling in the blanks in our observations? In this talk, I will discuss my perspective on the field of generative modeling as that of learning dynamical systems of the world. In particular, I will motivate and discuss general recipes for constructing and training generative models, with the central theme of simulation-free training paradigms. This simulation-free perspective allows us to decouple the algorithmic cost of training from the complexity of the data generation process. However, simple methods within this family, such as diffusion models, are not readily amenable to additional constraints or regularizations that we wish to impose on the generation process. I will first introduce the Flow Matching approach for learning generative models where the generation process is directly prescribed. I will then discuss generalizations of this approach to setups where the generation process must lie on a manifold, and where the generation process is only implicitly defined as the solution to some task-specific objective function, connecting to problems appearing in stochastic optimal control and optimal transport.

Bio: Ricky is a Research Scientist at FAIR, Meta, based in New York. His research is on building simplified abstractions of the world through the lens of dynamical systems and flows. He generally works on integrating structured transformations into probabilistic modeling, with the goal of improved interpretability, tractable optimization, or extending into novel areas of application.

02/19/2024

Christopher Rackauckas, MIT

SciML: Adding Scientific Models as Structure to Improve Machine Learning

Abstract

Scientific machine learning (SciML) is the practice of adding scientific structure to improve the predictions from machine learning. In this talk we will showcase and explain how SciML techniques such as universal differential equations (UDEs) make it possible to improve the prediction and extrapolation capabilities of machine learning on small data. We will show various ways that physical laws, prior chemical knowledge, and conservation laws can be incorporated into a general learning process in order to give better predictions out of the same data. We will end by discussing some of the ways the SciML techniques can improve general machine learning with methods that automatically optimize hyperparameters, showing how solvers for ordinary differential equations can be used to give neural architectures with optimal depth and fast infinite layer architectures.

Bio: Dr. Chris Rackauckas is the VP of Modeling and Simulation at JuliaHub, the Director of Scientific Research at Pumas-AI, Co-PI of the Julia Lab at MIT, and the lead developer of the SciML Open Source Software Organization. His work in mechanistic machine learning is credited with the 15,000x acceleration of NASA Launch Services simulations and recently demonstrated a 60x-570x acceleration over Modelica tools in HVAC simulation, earning Chris the US Air Force Artificial Intelligence Accelerator Scientific Excellence Award. See more at https://chrisrackauckas.com/. He is the lead developer of the Pumas project and has received a top presentation award at every ACoP in the last 3 years for improving methods for uncertainty quantification, automated GPU acceleration of nonlinear mixed effects modeling (NLME), and machine learning assisted construction of NLME models with DeepNLME. For these achievements, Chris received the Emerging Scientist award from ISoP.

02/12/2024

Aapo Hyvärinen, University of Helsinki

Painful Intelligence: What AI Can Tell Us About Human Suffering

Abstract

This talk discusses Aapo’s new book, which is freely available on his website (https://www.cs.helsinki.fi/u/ahyvarin/). The book uses the modern theory of artificial intelligence (AI) to understand human suffering or mental pain. Both humans and sophisticated AI agents process information about the world in order to achieve goals and obtain rewards, which is why AI can be used as a model of the human brain and mind. The book starts with the assumption that suffering is mainly caused by frustration. Frustration means the failure of an agent (whether AI or human) to achieve a goal or a reward it wanted or expected. Frustration is inevitable because of the overwhelming complexity of the world, limited computational resources, and scarcity of good data. In particular, such limitations imply that an agent acting in the real world must cope with uncontrollability, unpredictability, and uncertainty, which all lead to frustration. Such computational theory is finally used to derive various interventions or training methods that will reduce suffering in humans. The ensuing interventions are very similar to those proposed by Buddhist and Stoic philosophy, and include mindfulness meditation.

Bio: Aapo Hyvärinen studied undergraduate mathematics at the Universities of Helsinki (Finland), Vienna (Austria), and Paris (France), and obtained a Ph.D. degree in Information Science at the Helsinki University of Technology in 1997. After post-doctoral work at the Helsinki University of Technology, he moved to the University of Helsinki in 2003, where he was appointed Professor in 2008, at the Department of Computer Science. From 2016 to 2019, he was Professor of Machine Learning at the Gatsby Computational Neuroscience Unit, University College London, UK. Aapo Hyvärinen is the main author of the books Independent Component Analysis (2001), Natural Image Statistics (2009), and Painful Intelligence (2022). He is Action Editor at the Journal of Machine Learning Research and Neural Computation, and has worked as Area Chair at ICML, ICLR, AISTATS, UAI, ACML and NeurIPS.

02/05/2024

Peyman Milanfar, Google Research

Denoising as a Building Block for Imaging, Inverse Problems, and Machine Learning

Abstract

Denoising is one of the oldest problems in imaging. There are thousands of papers on this topic, and their scope is so vast and the approaches so diverse that putting them in some order (as I will do) is both useful and challenging. In the last decade, the quality of denoising algorithms has reached phenomenal levels – almost as good as we can ever hope. But besides this, we've found completely unexpected, brand new uses for denoising. I will describe what we can say about this general class of operators, and what makes them so special. I will argue that denoising is more important than ever; not simply as a process for removing noise, but especially now as a core engine and building block for much more complex tasks in imaging, inverse problems, and machine learning.

Bio: Peyman is a Distinguished Scientist / Senior Director at Google Research, where he leads the Computational Imaging team. Prior to this, he was a Professor of Electrical Engineering at UC Santa Cruz from 1999-2014. He was Associate Dean for Research at the School of Engineering from 2010-12. From 2012-2014 he was on leave at Google-x, where he helped develop the imaging pipeline for Google Glass. Over the last several years, Peyman's team at Google has developed several core technologies including the digital zoom pipeline for the Pixel phones, which includes the multi-frame super-resolution (Super Res Zoom) pipeline, and the RAISR upscaling algorithm. Most recently, his team led the development of the Unblur feature launched with Pixel 7/Pro. Peyman received his undergraduate education in electrical engineering and mathematics from the University of California, Berkeley, and the MS and PhD degrees in electrical engineering from the Massachusetts Institute of Technology. He holds numerous patents, several of which are commercially licensed. He founded MotionDSP, which was acquired by Cubic Inc. Peyman has been a keynote speaker at numerous technical conferences including Picture Coding Symposium (PCS), SIAM Imaging Sciences, SPIE, and the International Conference on Multimedia (ICME). Along with his students, he has won several best paper awards from the IEEE Signal Processing Society. He was a Distinguished Lecturer of the IEEE Signal Processing Society, and is a Fellow of the IEEE for contributions to inverse problems and super-resolution in imaging.

01/22/2024

Graham Neubig, LTI @ Carnegie Mellon University

Towards Automating Machine Learning Engineering

Abstract

When a skilled machine learning engineer is tasked with building a system for a specific application, they take several steps. Some of these include doing a literature review of the most appropriate models and datasets, choosing which ones to utilize based on accuracy and other constraints such as efficiency or latency, creating or curating training and testing data, training and comparing models, identifying weak points of the current modeling paradigm and iteratively improving. In this talk, I will discuss two projects that take steps towards automation of this entire process. The first, prompt2model, is a method to solve the task of taking in a natural language task description (similar to a prompt that is provided to a system like ChatGPT) and utilizing the entire open source model training ecosystem to train a small, easily deployable model that nonetheless has competitive accuracy with large language models. The second, Zeno, is an intelligent model comparison and error analysis tool that makes it possible for machine learning engineers to quickly uncover errors and weak spots, including methods for automatic blind-spot discovery.

Bio: Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on natural language processing, with a particular interest in fundamentals, applications, and understanding of large language models for tasks such as question answering, code generation, and multilingual applications. His final goal is that every person in the world should be able to communicate with each other, and with computers, in their own language. He also contributes to making NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his web site.

11/27/2023

Dongwon Lee, Penn State

Deepfakes, Language Models, and The Age of Synthetic Truth

Abstract

The recent explosive advancements in both deepfake-enabling methods in Computer Vision and generative language models in NLP have enabled the generation of human-quality artifacts in various modalities. However, at the same time, these new AI technologies can be used by adversaries for malicious purposes, opening a window of opportunity for disinformation purveyors and state-sponsored hackers. In this talk, I’ll showcase some examples of deepfake artifacts and their underlying AI technologies, especially reviewing the current landscape of large language models. Then, I’ll discuss how adversaries may use such recent developments to create the so-called “Fake News 2.0,” which can erode the public’s confidence in democracy. Finally, I will conclude the talk by sharing the important implications of deepfakes within the information ecosystem as well as in society at large.

Bio: Dongwon Lee is a full professor and the director of the Ph.D. program in the Information School (also known as iSchool) at Penn State University, USA. He is also an ACM Distinguished Scientist (2019) and a Fulbright Cyber Security Scholar (2022). Before joining Penn State, he worked at AT&T Bell Labs in New Jersey and earned his Ph.D. in Computer Science from UCLA. From 2015 to 2017, he served as a Program Director at the National Science Foundation (NSF), co-managing cybersecurity education and research programs and contributing to the development of national research priorities. In general, his research focuses on problems at the intersection of data science, machine learning, and cybersecurity. For more details about his research, you can visit: https://pike.psu.edu/.

11/13/2023

Stella Yu, University of Michigan

Unsupervised Learning Of Segmentation By Recognition and For Recognition

Abstract

Image segmentation in computer vision has evolved such that it is routinely treated as an end task. For example, for autonomous driving, we are interested in segmenting a road scene into cars, bikes, motorcycles, persons, trees, lamp-posts, traffic signs, curbs, etc. To differentiate a person in different contexts, we label a person on a bike a bike-rider, a person on a curb a pedestrian, and a person on a horse a horse-rider. To understand the intent and action of a person, we want to segment a person into head, torso, arms, and legs. Segment-Anything-Model (SAM) takes supervised segmentation to a large scale, giving a false impression that segmentation is now solved. My view is that segmentation underlies the generalization capability of visual intelligence and supervised segmentation is simply the wrong approach. Segmentation should be treated not as an end-goal itself, but as an internal mid-level representation that serves visual recognition. I will present our recent works in this direction, including unsupervised learning of objectness and visual context, unsupervised discovery of visual semantic hierarchies and part-whole hierarchies.

Bio: Stella Yu received her Ph.D. from Carnegie Mellon University, where she studied robotics at the Robotics Institute and vision science at the Center for the Neural Basis of Cognition. Before she joined the University of Michigan faculty in Fall 2022, she was the Director of Vision Group at the International Computer Science Institute, a Senior Fellow at the Berkeley Institute for Data Science, and on the faculty of Computer Science, Vision Science, Cognitive and Brain Sciences at UC Berkeley. Dr. Yu is interested not only in understanding visual perception from multiple perspectives, but also in using computer vision and machine learning to automate and exceed human expertise in practical applications.

11/06/2023

David Stutz, Google DeepMind

Conformal prediction under ambiguous ground truth

Abstract

In safety-critical classification tasks, conformal prediction allows rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have "crisp", definitive ground truth labels and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
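
For readers new to the topic, the standard setting that the abstract starts from (a held-out calibration set with a single "crisp" label per example) can be sketched as follows. This is a minimal illustration of ordinary split conformal prediction, not the ambiguous-label framework presented in the talk; the function names, the choice of nonconformity score (one minus the probability of the true class), and the toy numbers are all assumptions made for this example.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold so that sets cover the true class
    with probability at least 1 - alpha (assuming exchangeable data)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected quantile.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return float(np.sort(scores)[k - 1])

def prediction_sets(test_probs, q_hat):
    """Return, for each test row, the classes whose score is within q_hat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [np.flatnonzero(1.0 - p <= q_hat).tolist() for p in test_probs]

if __name__ == "__main__":
    # Hypothetical calibration data: 4 examples, 2 classes.
    cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
    cal_labels = [0, 1, 0, 1]
    q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
    print(q_hat, prediction_sets([[0.75, 0.25]], q_hat))
```

The calibration step uses exactly one label per example, which is the assumption the abstract argues breaks down when labels are aggregated from disagreeing experts.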

Bio: David is a research scientist at Google DeepMind interested in robust and safe deep learning. Before, he completed his PhD at the Max Planck Institute for Informatics which included an internship at Google DeepMind and a collaboration with IBM Research. His PhD was supported by a Qualcomm Innovation Fellowship 2019 and received the DAGM MVTec Dissertation Award 2023. Other notable honors include an outstanding paper award at the CVPR 2021 CV-AML workshop, participation in the 7th and 10th Heidelberg Laureate Forum, the RWTH Aachen University Springorum Denkmünze as well as the STEM-Award IT 2018 for his master's thesis, and several national scholarships. He was repeatedly recognized as an outstanding/top reviewer for CVPR, ICML and NeurIPS. More details can be found on his blog at davidstutz.de.

10/30/2023

Atlas Wang, Picsart / UT Austin

Whispers in the Weight: Unraveling the Mysteries of LLM Compression

Abstract

Modern Large Language Models (LLMs) have revolutionized Natural Language Processing, yet their computational demands require compression. Through a series of studies, we delve into the intricacies of LLM compression and explore potential remedies. First, we challenge conventional compression evaluation metrics by introducing the Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK). This curated task collection provides nuanced insights into compression methods beyond perplexity. We illuminate pitfalls in existing pruning and quantization techniques, uncovering, for instance, the robustness of pruned LLMs in contextually demanding tasks. Next, we navigate the trade-offs of post-compression re-training and explore the promise of prompt-driven recovery. Through Inference-time Dynamic Prompting (IDP), prompts are autonomously selected based on context, resulting in a notable performance boost across a diverse range of tasks. Further, drawing inspiration from genomics, we conduct a holistic scientific study to examine weight redundancy in LLMs, articulating our findings as the Junk DNA Hypothesis for LLMs. This challenges common assumptions about low-magnitude weights, revealing their pivotal role in complex tasks, and that removing them risks irreversible knowledge loss.

Bio: Professor Zhangyang “Atlas” Wang is a tenured Associate Professor and holds the Temple Foundation Endowed Faculty Fellowship #7, in the Chandra Family Department of Electrical and Computer Engineering at The University of Texas at Austin. He is also a faculty member of UT Computer Science and the Oden Institute CSEM program. Meanwhile, in a part-time role, he serves as the Director of AI Research & Technology for Picsart, where he leads the development of cutting-edge, GenAI-powered tools for creative visual editing. Prof. Wang has broad research interests spanning from the theory to the application aspects of machine learning (ML). At present, his core research mission is to leverage, understand and expand the role of low-dimensionality, from classical optimization to modern neural networks, whose impacts span over many important topics such as: efficient scaling, training and inference of large language models (LLMs); robustness and trustworthiness; learning to optimize (L2O); generative AI; and graph learning. Prof. Wang has received many research awards and is fortunate enough to work with a sizable group of accomplished students. His group: https://vita-group.github.io/

10/16/2023

Tri Dao, Together.AI / Princeton University

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Abstract

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. FlashAttention trains Transformers faster than existing baselines, with 2-4x speedup on the attention kernel. FlashAttention enables longer context in Transformers (4-16x longer than previous), yielding higher quality models. We will also describe recent improvements of FlashAttention: making use of new hardware features on A100 and H100 GPUs (another 2x speedup), optimizations for long-context LLM inference (2-4x faster end-to-end inference time), as well as how these ideas transfer to other model architectures.
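
The tiling idea in the abstract can be illustrated with a small NumPy sketch: keys and values are processed one block at a time, and a running row-wise maximum and normalizer are kept so that the final result matches ordinary softmax attention without ever forming the full attention matrix. This is only a CPU illustration of why tiling can remain exact, not the FlashAttention GPU kernel (which is fused and also tiles the queries to fit in on-chip SRAM); the function names and block size are assumptions for this example.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference softmax attention that materializes the full score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    """Exact attention computed over key/value blocks with a running softmax."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[1]))      # unnormalized output accumulator
    row_max = np.full((n, 1), -np.inf)   # running maximum score per query
    row_sum = np.zeros((n, 1))           # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        S = (Q @ Kb.T) * scale           # scores for this block only
        new_max = np.maximum(row_max, S.max(axis=-1, keepdims=True))
        # Rescale earlier accumulations to the new maximum (zero on first block).
        correction = np.exp(row_max - new_max)
        P = np.exp(S - new_max)
        row_sum = row_sum * correction + P.sum(axis=-1, keepdims=True)
        out = out * correction + P @ Vb
        row_max = new_max
    return out / row_sum

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((10, 8)) for _ in range(3))
    print(np.allclose(tiled_attention(Q, K, V), naive_attention(Q, K, V)))
```

The sketch shows exactness only; the speedups quoted in the abstract come from reducing reads and writes between GPU memory levels, which NumPy on a CPU does not model.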

Bio: Tri Dao is currently chief scientist of Together.AI and is an incoming Assistant Professor at Princeton University. He completed his PhD in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the interface of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding paper runner-up award.

10/09/2023

Sharon Yixuan Li, UW Madison

How to Detect Out-of-Distribution Data in the Wild? Challenges, Research Progress, and Path Forward

Abstract

When deploying machine learning models in the open and non-stationary world, their reliability is often challenged by the presence of out-of-distribution (OOD) samples. Since data shifts happen prevalently in the real world, identifying OOD inputs has become an important problem in machine learning. In this talk, I will discuss challenges, research progress, and opportunities in OOD detection. Our work is motivated by the insufficiency of existing learning objectives such as ERM, which focus on minimizing error only on the in-distribution (ID) data but do not explicitly account for the uncertainty that arises outside ID data. To mitigate this fundamental limitation, I will introduce a new algorithmic framework, which jointly optimizes for both accurate classification of ID samples, and reliable detection of OOD data. The learning framework integrates distributional uncertainty as a first-class construct in the learning process, thus enabling both accuracy and safety guarantees.

Bio: Sharon Yixuan Li is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. She received a Ph.D. from Cornell University in 2017, advised by John E. Hopcroft. Subsequently, she was a postdoctoral scholar in the Computer Science department at Stanford University. Her research focuses on the algorithmic and theoretical foundations of learning in open-world environments. She has served as Area Chair for ICLR, NeurIPS, ICML, and Program Chair for Workshop on Uncertainty and Robustness in Deep Learning. Her work is recognized by the AFOSR Young Investigator Program (YIP) award, NSF CAREER award, MIT Technology Review TR-35 Award, Forbes 30Under30 in Science, and multiple faculty research awards from Google, Meta, and Amazon. Her work also received a NeurIPS Outstanding Paper Award, and an ICLR Outstanding Paper Award Honorable Mention in 2022.

09/25/2023

Hyung Won Chung, OpenAI

Large Language Models (in 2023)

Abstract

There is one unique aspect of large language models (LLMs): larger models exhibit abilities that were not present in the smaller models. These emergent abilities have far-reaching consequences in how we should work in the field of AI. I will share some of my observations on the implications of scaling and emergent abilities. After that, I will introduce multiple stages involved in the current generations of LLM training: pre-training and post-training (including instruction fine-tuning and RLHF). While a huge volume of research exists for each stage, the core aspects can be expressed relatively simply. I will introduce the fundamental aspects of each stage and discuss the unique challenges they pose.

Bio: Hyung Won is a research scientist on the ChatGPT team at OpenAI. He has worked on various aspects of Large Language Models: pre-training, instruction fine-tuning, reinforcement learning with human feedback, reasoning, multilinguality, parallelism strategies, etc. Some of his notable work includes the scaling Flan paper (Flan-T5, Flan-PaLM) and T5X, the training framework used to train the PaLM language model. Before OpenAI, he was at Google Brain, and before that he received a PhD from MIT.

09/11/2023

Micah Goldblum, NYU

Bridging the gap between deep learning theory and practice

Abstract

Despite the widespread proliferation of neural networks, the mechanisms through which they operate so successfully are not well understood. In this talk, we will first explore empirical and theoretical investigations into neural network training and generalization and what they can tell us about why deep learning works. Then, we will examine a recent line of work on algorithm learning. While typical neural networks are designed for pattern matching tasks, we consider whether neural networks can learn algorithms that scale to problem instances orders of magnitude larger than those seen during training.

Bio: Micah is a postdoctoral researcher at New York University working with Yann LeCun and Andrew Gordon Wilson. His research portfolio includes award-winning work in Bayesian inference, generalization theory, and AI security. Before his current position, Micah received a Ph.D. in mathematics at the University of Maryland.

08/28/2023

Guido Montúfar, UCLA

FoSR: First-order spectral rewiring for addressing oversquashing in GNNs

Abstract

Graph neural networks (GNNs) are able to leverage the structure of graph data by passing messages along the edges of the graph. While this allows GNNs to learn features depending on the graph structure, for certain graph topologies it leads to inefficient information propagation and a problem known as oversquashing. This has recently been linked with the curvature and spectral gap of the graph. On the other hand, adding edges to the message-passing graph can lead to increasingly similar node representations and a problem known as oversmoothing. We propose a computationally efficient algorithm that prevents oversquashing by systematically adding edges to the graph based on spectral expansion. We combine this with a relational architecture, which lets the GNN preserve the original graph structure and provably prevents oversmoothing. We find experimentally that our algorithm outperforms existing graph rewiring methods in several graph classification tasks. This is work with Kedar Karhadkar and Pradeep Kr. Banerjee.

Bio: Dr. Guido Montúfar is an Associate Professor at UCLA in Mathematics and Statistics & Data Science, and Research Group Leader at the Max Planck Institute for Mathematics in the Sciences. His research interests include Deep Learning Theory, Graphical Models, and Mathematical Machine Learning. He is a recipient of many prestigious awards including the ERC Starting Grant for Deep Learning Theory, the NSF CAREER award, and the 2022 Sloan Research Fellowship. Dr. Montúfar's work bridges the theoretical foundations of mathematics and machine learning, making significant contributions to both fields.

08/21/2023

Nicholas Carlini, Google DeepMind

Are aligned language models adversarially aligned?

Abstract

An aligned model is helpful and harmless. In this talk I will show that while language models may be aligned under typical situations, they are not adversarially aligned. Using standard techniques from adversarial examples, we can construct inputs to otherwise-aligned language models to coerce them into emitting harmful text and performing harmful behavior.

Bio: Nicholas Carlini is a research scientist at Google DeepMind working at the intersection of machine learning and computer security. His most recent line of work studies properties of neural networks from an adversarial perspective, for which he received best paper awards at ICML, USENIX, and IEEE S&P.

08/14/2023

Taco Cohen, Qualcomm AI Research

Geometric Algebra Transformers: A Universal Architecture for Geometric Data

Abstract

Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16-dimensional vector space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to Pin(3,0,1), the double cover of E(3): the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In various geometric problems, GATr shows strong improvements over non-geometric baselines.

Bio: Taco Cohen is a machine learning researcher (Principal Engineer) at Qualcomm AI Research in Amsterdam. He received a BSc in theoretical computer science from Utrecht University, and an MSc in artificial intelligence and PhD in machine learning (with prof. Max Welling) from the University of Amsterdam (all three cum laude). He was a co-founder of Scyfer, a company focussed on deep active learning, acquired by Qualcomm in 2017. His research is focused on geometric deep learning and reinforcement learning. During his studies he interned at Google DeepMind (working with Geoff Hinton) and OpenAI. He received the 2014 University of Amsterdam MSc thesis prize, a Google PhD Fellowship, and the ICLR 2018 best paper award for “Spherical CNNs”, was named one of 35 innovators under 35 by MIT Tech Review, and won the 2022 ELLIS PhD Award and 2022 Kees Schouhamer Immink prize for his PhD research.

07/10/2023

Johannes Brandstetter, Microsoft Research

Is it the network, or is it the data? Towards large-scale PDE surrogates

Abstract

Partial differential equations (PDEs) see widespread use in sciences and engineering to describe simulation of physical processes interacting and coevolving over time. Due to the computationally expensive nature of their standard solution methods, neural PDE surrogates have become an active research topic to accelerate these simulations. In this talk, we approach such surrogates from two different angles. First, we have a closer look into possible ideas to best integrate physics into neural PDE surrogates. Second, we let known tricks from computer vision and the pure power of the data speak and assume that all the physics is in the data. Especially for the second, the model needs to be designed in a way to leverage it all. Finally, we compare these paradigms against each other and give an outlook.

Bio: Johannes Brandstetter did his PhD studying Higgs boson decays at the CMS experiment at the Large Hadron Collider at CERN. In 2018, he joined Sepp Hochreiter’s group in Linz, Austria. In 2021, he became an ELLIS PostDoc at Max Welling’s lab at the University of Amsterdam. Since 2022, he has been a Senior Researcher at the newly founded Microsoft Lab in Amsterdam. His current research interests comprise Geometric Deep Learning, neural PDE solving, and large-scale scientific simulations.

06/05/2023

Jason Wei, OpenAI

Scaling Unlocks Emergent Abilities In Language Models

Abstract

Scaling up language models has been shown to predictably improve performance on a wide range of downstream tasks. In this talk, we will instead discuss an unpredictable phenomenon that we refer to as emergent abilities of large language models. An ability is considered emergent if it is not present in smaller models but is present in larger models, which means that the ability cannot be predicted simply by extrapolating the performance of smaller models. With the popularization of large language models such as GPT-3, Chinchilla, and PaLM, dozens of emergent abilities have been discovered, including chain-of-thought prompting, which enables state-of-the-art mathematical reasoning, and instruction finetuning, which enables large language models to be usable by the broader population. The existence of such emergent phenomena raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.

Bio: Jason Wei is an AI researcher working on ChatGPT at OpenAI in San Francisco. He was previously a senior research scientist at Google Brain, where he popularized chain-of-thought prompting, co-led the first efforts on instruction tuning, and wrote about emergence in large language models. Chain-of-thought prompting was presented by Sundar Pichai at the Google I/O press event in 2022.

04/17/2023

Diederik P. (Durk) Kingma, Google Research

Infinitely Deep Learning

Abstract

Diffusion models have demonstrated amazing abilities for image and video generation. In this talk we explain some recent breakthroughs in understanding state-of-the-art diffusion models as infinitely deep variational autoencoders (VAEs). We start by introducing VAEs. We then introduce continuous-time diffusion models as infinitely deep VAEs, and show how to optimize their evidence lower bound (ELBO). Finally, we present a new result that explains the objective functions used in state-of-the-art (SOTA) diffusion models as the ELBO with simple data augmentation. This opens up new avenues for optimizing other model families with the same objective as successful diffusion models. We will list some interesting open research questions in the diffusion model space.

Bio: Diederik P. (Durk) Kingma is a machine learning researcher at Google, with a focus on generative models. His contributions include the Variational Autoencoder (VAE), the Adam optimizer, Glow, and Variational Diffusion Models. He obtained a PhD (cum laude) from the University of Amsterdam in 2017, and was part of the founding team of OpenAI in 2015.

04/10/2023

Yang Song, OpenAI / Caltech (*)

Breaking the Curse of Dimensionality in Generative Modeling: A Homotopic Approach

Abstract

Generative modeling for high-dimensional data, such as images and audio, is extremely challenging due to the curse of dimensionality. To overcome this difficulty, I introduce a homotopic approach inspired by numerical equation solving, which involves designing a homotopy of probability distributions that smoothly progresses from a simple noise distribution to a complex data distribution. I will present two families of approaches that rely on such homotopies: score-based diffusion models and consistency models. Both approaches use a differential equation to convert data to noise and learn to estimate the time reversal with deep neural networks. These models allow for flexible neural networks, enable zero-shot image editing, and generate high-quality samples that achieve state-of-the-art performance in many generative modeling benchmarks.

Bio: Yang Song is a research scientist at OpenAI and an incoming Assistant Professor at Caltech. His research interest is in deep generative models, inverse problem solving and AI safety. His research has been recognized with an Outstanding Paper Award at ICLR-2021, an Apple PhD Fellowship in AI/ML, a J.P. Morgan PhD Fellowship, and a WAIC rising star award.

03/27/2023

Petar Veličković, DeepMind / University of Cambridge

Reasoning Algorithmically: from Toy Experiments to AGI Modules

Abstract

Neural networks that are able to reliably execute algorithmic computation may hold transformative potential to both machine learning and theoretical computer science. On one hand, they could enable the kind of extrapolative generalisation scarcely seen with deep learning models. On another, they may allow for running classical algorithms on inputs previously considered inaccessible to them. Over the past few years, the pace of development in this area has gradually become intense. As someone who has been very active in its latest incarnation, I have witnessed these concepts grow from isolated 'toy experiments', through NeurIPS spotlights, all the way to helping detect patterns in complicated mathematical objects (published on the cover of Nature) and supporting the development of generalist reasoning agents. In this talk, I will give my personal account of this journey, and especially how our own interpretation of this methodology, and understanding of its potential, changed with time. It should be of interest to a general audience interested in graphs, (classical) algorithms, reasoning, and building intelligent systems.

Bio: Petar is a Staff Research Scientist at DeepMind, an Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. He holds a PhD in computer science from the University of Cambridge, where he worked with Pietro Liò. His research concerns Geometric Deep Learning and has been featured in various top-tier conferences and news outlets. Currently, Petar is focusing on Graph Representation Learning and its applications in Algorithmic Reasoning. He is also recognized as an ELLIS Scholar.

03/20/2023

Hieu Pham, Google Research

Deep Learning After the Transformer

Abstract

The field of machine learning has been through several exciting moments – kernel methods, Bayesian inference, non-parametric methods, to name a few. Every time a new approach pushed an existing limit, people wondered if the approach was “the best”. In our time of 2023, the Transformer is prevalent. Hardly can one find a research paper that does not mention this immensely successful model. But is the Transformer the best neural architecture? If so, can we explain why? If not, how can we improve it; more ambitiously, how can we make something better than it? In this talk, I invite you to contemplate these questions. I share my insights on the properties of the Transformer that make it favorable or not favorable for certain domains and tasks. Based on these insights, I discuss the potential directions for subsequent developments. I will discuss some recent work from my group that makes learning algorithms more efficient, with or without the Transformer.

Bio: Hieu Pham is a Research Scientist at Google Brain. He is currently focusing on improving the efficiency of large vision and language models. Before joining Google, Hieu received his Ph.D. from Carnegie Mellon University (CMU), where he worked on various AutoML projects. His work provided the foundation for one-shot neural architecture search, which reduced the cost of AutoML algorithms by several orders of magnitude.

03/06/2023

Thomas Beckers, Vanderbilt University

Safe Learning-based Control of Mechanical Systems

Abstract

In modern technologies such as autonomous vehicles and service robots, control engineering plays a crucial role in the overall performance and safety of the system. However, control design often becomes very time-consuming or even infeasible due to the increasing complexity of mechanical systems. The classical control approaches, which are based on models of the systems using first principles, are not satisfactory in the presence of complex dynamics, e.g., for highly nonlinear systems or interaction with a priori unknown environments. Recent findings in computational intelligence and machine learning have shown that data-driven approaches lead to very promising results in a wide application domain including the modeling of complex dynamics. However, the major drawback of data-driven approaches frequently manifests as unpredictable outcomes. Therefore, the current application of machine learning in control is typically limited to non-critical and low-performance systems. In this talk, I will present our results on safe learning-based control of partially unknown mechanical systems. In the first part of the seminar, I will show how we leverage Gaussian processes for the learning of unknown dynamics in the system. Gaussian process (GP) models are of high interest due to many beneficial properties such as the bias-variance trade-off and the strong connection to Bayesian mathematics. We exploit the Bayesian structure to include prior knowledge about the system in the learning process. In the second part, I will present a learning-enhanced model-based control law which guarantees safe control of mechanical systems with partially unknown dynamics. This control law combines the strength of model-based control with the flexibility of machine learning techniques. I demonstrate how we actively exploit the uncertainty of the GP model to guarantee high performance and stability of the closed loop.

Bio: Thomas Beckers is an Assistant Professor of Computer Science and the Institute for Software Integrated Systems at Vanderbilt University. Before joining Vanderbilt, he was a postdoctoral researcher at the Department of Electrical and Systems Engineering, University of Pennsylvania, where he was a member of the GRASP Lab, PRECISE Center and ASSET Center. In 2020, he earned his doctorate in Electrical Engineering at the Technical University of Munich (TUM), Germany. He received the B.Sc. and M.Sc. degrees in Electrical Engineering in 2010 and 2013, respectively, from the Technical University of Braunschweig, Germany. In 2018, he was a visiting researcher at the University of California, Berkeley. He is a DAAD AInet fellow and was awarded the Rohde & Schwarz Outstanding Dissertation Prize. His research interests include physics-enhanced learning, nonparametric models, and safe learning-based control.

02/27/2023

Guanya Shi, University of Washington / CMU (*)

Neural-Control Family: Safe Agile Deep-learning-based Robotic Control in Dynamic Environments

Abstract

Recent breathtaking advances in machine learning beckon to their applications in a wide range of autonomous systems. However, for safety-critical settings such as agile robotic control in hazardous environments, we must confront several key challenges before widespread deployment. Most importantly, the learning system must interact with the rest of the autonomous system (e.g., highly nonlinear and non-stationary dynamics) in a way that safeguards against catastrophic failures with formal guarantees. In addition, from both computational and statistical standpoints, the learning system must incorporate prior knowledge for efficiency and generalizability. In this talk, I will present progress toward establishing a unified framework that fundamentally connects learning and control. In particular, I will introduce a concrete example in such a unified framework called Neural-Control Family, a family of deep-learning-based nonlinear control methods with not only stability and robustness guarantees but also new capabilities in agile robotic control. For example, Neural-Swarm enables close-proximity flight of a drone swarm and Neural-Fly enables precise drone control in strong time-variant wind conditions.

Bio: Guanya Shi is an incoming (Fall 2023) Assistant Professor at the Robotics Institute and the School of Computer Science at Carnegie Mellon University (CMU). He is currently a postdoctoral scholar at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. He completed his Ph.D. in 2022 from Caltech and received a B.E. from Tsinghua University in 2017. He is broadly interested in the intersection of machine learning and control theory, spanning the entire spectrum from theory to real-world agile robotics. Guanya was the recipient of several awards, including the Simoudis Discovery Prize and the Ben P.C. Chou Doctoral Prize from Caltech, and the Rising Star in Data Science from the University of Chicago.

02/20/2023

Thomas Kipf, Google Research

Structured Scene Understanding: Objects, Dynamics, 3D

Abstract

The world around us — and our understanding of it — is rich in compositional structure: from atoms and their interactions to objects and agents in our environments. How can we learn scalable models of the physical world that capture this structure from raw, unstructured observations? In this talk, I will cover our team’s recent work on structured scene understanding: I will introduce an emergent class of slot-centric neural architectures that use a set of latent variables (“slots”) grounded in the physical scene. Slots are decoupled from the image grid and can learn to capture objects or more fine-grained scene components, model their dynamics, and learn 3D-consistent representations when a scene is observed from multiple viewpoints. I will briefly introduce the Slot Attention mechanism as a core representative for this class of models and cover recent extensions to video (SAVi, SAVi++), 3D (OSRT), and visual dynamics simulation (SlotFormer).

Bio: Thomas Kipf is a Senior Research Scientist at Google Brain in Amsterdam. His research focuses on developing machine learning models that can reason about the rich structure of the physical world. He obtained his PhD from the University of Amsterdam with a thesis on “Deep Learning with Graph-Structured Representations”, advised by Max Welling. He was recently elected as an ELLIS Scholar and received the ELLIS PhD Award.

02/13/2023

Brandon Amos, Meta AI

Learning with differentiable and amortized optimization

Abstract

Optimization has been a transformative modeling and decision-making paradigm over the past century that computationally encodes non-trivial reasoning operations. Developments in optimization foundations alongside domain experts have resulted in breakthroughs for 1) controlling robotic, autonomous, mechanical, and multi-agent systems, 2) making operational decisions based on future predictions, 3) efficiently transporting or matching resources, information, and measures, 4) allocating budgets and portfolios, 5) designing materials, molecules, and other structures, 6) solving inverse problems to infer underlying hidden costs, incentives, geometries, terrains, and other structures, and 7) learning and meta-learning the parameters of predictive and statistical models. These settings often analytically specify the relevant models of the world along with an explicit objective to optimize for. Once these are specified, computational optimization solvers are able to search over the space of possible solutions or configurations and return the best one. The magic of optimization stops when 1) the relevant models of the world are too difficult or impossible to specify, leading to inaccurate or incomplete representations of the true setting, and 2) solving the optimization problem is computationally challenging and takes too long to return a solution on today's hardware. Machine learning methods help overcome both of these by providing fast predictive models and powerful latent abstractions of the world. In this talk, I will cover two ways of tightly integrating optimization and machine learning methods:

1. *Differentiable optimization* characterizes how the solution to an optimization problem changes as the inputs change. In machine learning settings, differentiable optimization provides an implicit layer that integrates optimization-based domain knowledge into the model and enables unknown parts of the optimization problem to be learned. I will cover the foundations of learning these layers with implicit differentiation and highlight applications in robotics and control settings.

2. *Amortized optimization* rapidly predicts approximate solutions to optimization problems and is useful when repeatedly solving optimization problems. Traditional optimization methods typically solve every new problem instance from scratch, ignoring shared structures and information when solving a new instance. In contrast, a solver augmented with amortized optimization learns the shared structure present in the solution mappings and better-searches the domain. I will cover the foundations of amortized optimization and highlight new applications in control and optimal transport.

Bio: Brandon Amos is a Research Scientist in Meta AI’s Fundamental AI Research group in NYC. He holds a PhD in Computer Science from Carnegie Mellon University and was supported by the USA National Science Foundation Graduate Research Fellowship (NSF GRFP). Prior to joining Meta, he worked at Adobe Research, DeepMind, and Intel Labs. His research interests are in machine learning and optimization with a recent focus on reinforcement learning, control, optimal transport, and geometry.

02/06/2023

Nhat Ho, UT-Austin

Hierarchical and Sequential Perspectives on Sliced Wasserstein Distance

Abstract

From its origins in work by Monge and Kantorovich, the Wasserstein distance has played an important role in the theory of mathematics. In the current era, the strong and increasing connection between optimization and machine learning has brought new applications of the Wasserstein distance to the fore. In these applications, the focus is on learning the probability distributions underlying the Wasserstein distance formulation. However, the Wasserstein distance is known to suffer from expensive computation and the curse of dimensionality, which creates several hurdles to using it in statistical machine-learning applications. A well-known approach to overcoming the statistical and computational limits of the Wasserstein distance is to project the probability distributions onto a one-dimensional manifold, which is referred to as the sliced Wasserstein distance. The sliced Wasserstein distance leverages the closed-form expression of the Wasserstein distance in one dimension; therefore, its computational complexity is only linear in the number of supports of the probability distributions while the statistical rate is parametric for learning probability distributions. Despite these advantages, the sliced Wasserstein distance still suffers from two fundamental challenges in large-scale high-dimensional statistical machine learning settings: (1) High projection complexity, namely, the number of projections needed to approximate the value of the sliced Wasserstein distance is huge and scales with the dimension of the problem; (2) Uninformative projecting directions, namely, there are several redundant projections to approximate the value of the sliced Wasserstein distance.

In this talk, we propose two fundamental approaches to tackle the above challenges of the sliced Wasserstein distance. Our first approach hierarchically projects probability measures into low-dimensional spaces before projecting them into one-dimensional space. The hierarchical projections lead to an improvement in projection complexity and enhance the expressiveness of the projection of the sliced Wasserstein distance. Our second approach considers sequential sampling for projecting directions to allow the sharing of information on new projecting directions based on the previous directions. It increases the quality of projections in terms of highlighting the difference between the probability measures and leads to a smaller number of projections, which improves the computational complexity of the sliced Wasserstein distance.
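For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of the standard sliced Wasserstein distance. It is not the speaker's hierarchical or sequential method. It assumes two equal-size samples with uniform weights and estimates the distance by Monte Carlo over uniformly random directions. All function names and parameters (`wasserstein_1d`, `random_direction`, `sliced_wasserstein`, `n_projections`, `seed`) are hypothetical and chosen for illustration.

```python
import math
import random


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches sorted values, so the
    distance has a closed form once both samples are sorted.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return cost ** (1.0 / p)


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere via a normalized Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    X and Y are lists of points (each a list of floats) of equal length.
    Each random direction yields one 1-D problem; the p-th powers of the
    1-D distances are averaged over all directions.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        proj_x = [sum(t * c for t, c in zip(theta, x)) for x in X]
        proj_y = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d(proj_x, proj_y, p) ** p
    return (total / n_projections) ** (1.0 / p)
```

The `n_projections` argument is the quantity the abstract calls projection complexity: every direction is drawn independently, so many of them may carry little information about how the two samples differ.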

Bio: Nhat Ho is currently an Assistant Professor of Data Science, Machine Learning, and Statistics at the University of Texas at Austin. He is a core member of the University of Texas Austin Machine Learning Laboratory and senior personnel of the Institute for Foundations of Machine Learning. A central theme of his research focuses on four important aspects of complex and large-scale models and data: (1) Interpretability, efficiency, and robustness of deep learning and complex machine learning models, including Transformer architectures, Deep Generative Models, Convolutional Neural Networks, etc.; (2) Scalability of Optimal Transport for machine learning and deep learning applications; (3) Stability and optimality of optimization and sampling algorithms for solving complex statistical machine learning models; (4) Heterogeneity of complex data, including mixture and hierarchical models, Bayesian nonparametrics.

01/30/2023

Animesh Garg, NVIDIA / UofT / Georgia Tech (*)

Building Blocks of Generalizable Autonomy: Duality of Discovery & Bias

Abstract

Generalization in embodied intelligence, such as in robotics, requires interactive learning across families of tasks, which is essential for discovering efficient representation and inference mechanisms. Current systems need a lot of hand-holding to even learn a single cognitive concept or a dexterous skill, say “open a door”, let alone generalizing to new windows and cupboards! This is far from our vision of everyday robots, which would require a broader concept of generalization and continual update of representations. This study of the science of embodied AI opens three key questions: (a) Representational biases & Causal inference for interactive decision-making, (b) Perceptual representations learned by and for interaction, and (c) Systems and abstractions for scalable learning.

Bio: Animesh Garg is a Stephen Fleming Early Career Professor at the School of Interactive Computing at Georgia Tech. He leads the People, AI, and Robotics (PAIR) research group. He is on the core faculty in the Robotics and Machine Learning programs. Animesh is also a Senior Researcher at Nvidia Research. Animesh earned a Ph.D. from UC Berkeley and was a postdoc at the Stanford AI Lab. He is on leave from the department of Computer Science at the University of Toronto and the CIFAR Chair position at the Vector Institute. His work aims to build Generalizable Autonomy which involves a confluence of representations and algorithms for reinforcement learning, control, and perception. He currently studies three aspects: learning structured inductive biases in sequential decision-making, using data-driven causal discovery, and transfer to real robots — all in the purview of embodied systems.

01/23/2023

Parinaz Naghizadeh, OSU

Social Bias Meets Data Bias: Biased Training Data and Fair AI

Abstract

Biases in existing training datasets used in algorithmic decision making, which can arise due to, e.g., prior labeling or feature measurement errors, raise ethical and economic concerns due to the resulting disparate treatment of different groups. In this talk, we will first investigate the robustness of a few existing (demographic) fairness criteria when the algorithm is trained on biased data. We show, both analytically and numerically, that some constraints can remain robust when facing certain forms of statistical bias in the training data. I will then briefly talk about an algorithm for sequential debiasing of such datasets through adaptive and bounded exploration. This is joint work with Yiqiao Liao, Yifan Yang, and Yang Liu.

Bio: Parinaz Naghizadeh is an assistant professor in the Integrated Systems Engineering and Electrical and Computer Engineering departments at The Ohio State University. Prior to joining OSU in 2019, she was a postdoctoral researcher at Purdue University and Princeton University. She received her PhD in electrical engineering from the University of Michigan in 2016. Her research interests are in network economics, game theory, algorithmic economics, and reinforcement learning. She is a recipient of the NSF CAREER award in 2022, a Rising Stars in EECS in 2017, and a Barbour Scholarship in 2014.

10/18/2022

Hua Wei, New Jersey Institute of Technology

Towards Actionable Decision-Making in the Real World

Abstract

This talk presents how to utilize data and advanced learning methods for actionable decision-making in the real world. Using decision-making in the city as a running example, the talk first examines why we now have the opportunity for a potential breakthrough in actionable decision-making. Second, it presents our research results in reinforcement learning for traffic signal control, which have been published at the KDD, AAAI, and CIKM conferences. Finally, I will discuss the open challenges in this research topic, its implications for actionable decision-making, and our preliminary efforts in addressing these challenges.

Bio: Hua Wei is an assistant professor in the Department of Informatics at the New Jersey Institute of Technology (NJIT). He obtained his Ph.D. from the Pennsylvania State University. His research interests include reinforcement learning, data mining, and urban computing. His papers have been published at high-impact venues (e.g., NeurIPS, KDD, AAAI, IJCAI, CIKM, ECML-PKDD, etc.). His research has been awarded the Best Applied Data Science Paper Award at ECML-PKDD 2020 and funded by NSF and the Department of Energy.

Video Link

10/18/2022

Ziv Goldfeld, Cornell University

Statistical and Computational Aspects of Sliced Optimal Transport

Abstract

As machine learning/inference tasks boil down to comparing or transforming complicated probability distributions, optimal transport (OT) theory, which provides a potent framework for doing so, has emerged as a tool of choice for design and analysis. Its adoption was driven by an array of favorable properties, including robustness to support mismatch, a powerful duality theory, and the Wasserstein metric it defines on the space of probability measures, which endows it with a rich geometry. Alas, statistical OT is bottlenecked by the curse of dimensionality, whereby quantitative results either deteriorate exponentially with dimension or are largely unavailable (e.g., limit theorems, resampling, efficiency). In turn, resulting performance bounds for OT-based learning methods are often vacuous or, worse yet, missing. Slicing is a modern regularization technique by which one computes the average/maximized OT distance between different low-dimensional projections of the high-dimensional distributions. This framework inherits many structural properties of classical OT but alleviates the empirical curse of dimensionality. This talk will present recent advancements in the statistical and computational analysis of sliced OT methods. We will cover fast empirical convergence rates, high-dimensional limit distribution theorems, as well as formal guarantees for computational methods such as Monte Carlo integration (for average-slicing) and projected subgradient methods (for max-slicing). Applications to implicit generative modeling will be discussed and serve to motivate the statistical exploration.
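As a rough illustration of the average-slicing idea mentioned in the abstract, the sketch below estimates a sliced 1-Wasserstein distance between two samples by Monte Carlo integration over random one-dimensional projections. It is not taken from the talk: the function name `sliced_wasserstein_1`, the restriction to equal sample sizes, and the default parameter values are all assumptions made for this example.

```python
import math
import random


def sliced_wasserstein_1(xs, ys, num_projections=200, seed=0):
    """Monte Carlo estimate of the average-sliced 1-Wasserstein distance.

    xs, ys: equal-length lists of d-dimensional points (lists of floats).
    This is an illustrative sketch, not the speaker's implementation.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a direction uniformly on the unit sphere by
        # normalizing a standard Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both samples onto the direction and sort the 1-D values.
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in ys)
        # In one dimension, optimal transport between equal-size samples
        # pairs the sorted values.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / num_projections
```

Each projection reduces the problem to a one-dimensional transport problem that is solved by sorting, which is why the per-slice cost stays low even when the ambient dimension is large.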

Bio: Ziv Goldfeld is an assistant professor in the School of Electrical and Computer Engineering, and a graduate field member in Computer Science, Statistics, Data Science, and the Center of Applied Mathematics, at Cornell University. Before joining Cornell, he was a postdoctoral research fellow in LIDS at MIT. Ziv graduated with a B.Sc., M.Sc., and Ph.D. (all summa cum laude) in Electrical and Computer Engineering from Ben Gurion University, Israel. Ziv’s research interests include optimal transport theory, statistical learning theory, information theory, and mathematical statistics. He seeks to understand the theoretical foundations of modern inference and information processing systems by formulating and solving mathematical models. Honors include the NSF CAREER Award, the IBM University Award, and the Rothschild Postdoctoral Fellowship.

Video Link

09/20/2022

Baharan Mirzasoleiman, UCLA

Coresets for Efficient and Robust Learning from Massive Datasets

Abstract

Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it is contingent on exceptionally large and expensive computational resources, and incurs a substantial cost due to the significant energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data contains highly imbalanced classes, noisy labels, and malicious data points. In such cases, training on the entire data does not result in a high-quality model. In this talk, I will argue that we can address the above limitations by developing techniques that can identify and extract the most informative subsets for learning from massive datasets. Training on such subsets not only reduces the substantial costs of learning from big data, but also improves their accuracy and robustness against noisy labels and data poisoning attacks. I will discuss how we can develop effective and theoretically rigorous techniques that provide strong guarantees for the learned models’ quality and robustness against noisy labels.

Bio: Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at the University of California, Los Angeles. Baharan’s research focuses on developing new methods that enable efficient and robust learning from massive datasets. She received her PhD from ETH Zurich, and was a Postdoc at Stanford University. She was awarded an ETH medal for Outstanding Doctoral Dissertation, and a Google Anita Borg Memorial Scholarship. She was also selected as a Rising Star in EECS from MIT, and received an NSF CAREER Award.

Video Link

08/22/2022

Chen Feng, NYU

3D Deep Learning for Soft Robotics and Self-Driving

Abstract

Deep learning on 3D data like point clouds offers many new possibilities for robotics and self-driving. It leads to efficient tools to represent complex objects and scenes in the 3D world which robots and autonomous vehicles need to interact with. In this talk, I will discuss my group's work on both object-level and scene-level 3D deep learning. At the object level, I will explain FoldingNet (CVPR'18), a 3D point cloud auto-encoder that essentially resembles the paper-folding operations in its lightweight decoder with better shape reconstruction performance. This new decoder can address a challenging robotics task: soft robot proprioception. At the scene level, I will explain DiscoNet (NeurIPS'21), an efficient collaborative perception method using a dynamic directed graph with matrix-valued edge weights for an ego-vehicle to adaptively retrieve the most important complementary information from its neighboring vehicles. This could improve LiDAR-based perception's performance and robustness in self-driving against challenges such as data sparsity and occlusions. Finally, I will briefly introduce our new public dataset, V2X-Sim (RA-L'22), which is intended to facilitate research in 3D (and 2D) deep learning for collaborative perception.

Bio: Dr. Chen Feng is an assistant professor at NYU, appointed across departments including civil and mechanical engineering and computer science. His lab AI4CE (pronounced as A-I-force) aims to advance robot vision and machine learning through multidisciplinary use-inspired research that originates from engineering domains. Before NYU, Chen was a research scientist in the computer vision group at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA, focusing on localization, mapping, and deep learning for self-driving cars and robotics. Chen holds a Bachelor's degree in geospatial engineering from Wuhan University in China, and a master’s degree in electrical engineering and a Ph.D. in civil engineering, both from the University of Michigan at Ann Arbor. While publishing in and reviewing for prestigious AI/Robotics venues like CVPR/ICCV/ICRA/IROS, Chen also serves as an associate editor for IEEE Robotics and Automation Letters (RA-L). More information on his research can be found at https://ai4ce.github.io/.

Video Link

08/01/2022

Daniel Moyer, Vanderbilt University

Invariant Representations

Abstract

The removal of unwanted information is a surprisingly common task. Removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation all require the selective removal of information. In this talk, I will describe a method for constructing such representations by minimizing mutual information in a variational setting. This path also provides insight into adversarial methods and their training schema. We will then discuss applications and implications in multi-site MRI, style transfer, and fair representation.

Bio: Daniel Moyer will join the Computer Science Department at Vanderbilt University for the Fall 2022 semester as an Assistant Professor. Previously, he was a post-doc in CSAIL at MIT, working with Prof. Polina Golland on fetal MRI. He received his doctorate in 2019 from the University of Southern California under Paul Thompson and Greg Ver Steeg, where he worked on representation learning problems in diffusion MRI and neuroimaging.

Video Link

07/19/2022

Kayhan Batmanghelich, University of Pittsburgh

Bridging between AI Models & Medical Insights: Learning, Inference, & Model Explanation Applications

Abstract

The healthcare industry is arriving at a new era where the medical communities increasingly employ computational medicine and machine learning. Despite significant progress in the modern machine learning literature, adopting the new approaches has been slow in the biomedical and clinical research communities due to the lack of explainability and limited data. Such challenges present new opportunities to develop novel methods that address AI's unique challenges in medicine. This talk has three parts. In the first part of the talk, I show examples of model explainability (XAI) tailored toward AI in Radiology applications. More specifically, I integrate ideas from causal inference for XAI (e.g., counterfactual, mediation analysis). The second part presents examples of incorporating medical insight for self-supervised learning of imaging phenotype. Finally, I address the issue of partial missingness (a common problem using clinical data) in imaging genetics for statistical independence tests.

Bio: Kayhan Batmanghelich is an Assistant Professor in the Department of Biomedical Informatics and Intelligent Systems Program with secondary appointments in the Electrical and Computer Engineering and the Computer Science Department at the University of Pittsburgh. He received his Ph.D. from the University of Pennsylvania (UPenn) under the supervision of Prof. Ben Taskar and Prof. Christos Davatzikos. He spent three years as a postdoc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. Polina Golland. His research is at the intersection of medical vision, machine learning, and bioinformatics. His group develops machine learning methods that address the interesting challenges of AI in medicine, such as explainability, learning with limited and weak data, and integrating medical image data with other biomedical data modalities. His research is supported by awards from NIH and NSF and industry-sponsored projects.

Video Link

06/27/2022

Nick Cheney, University of Vermont

A Case for an Embodied Intelligence Perspective on Neural Architecture Search

Abstract

Neural Architecture Search (NAS) aims to find the optimal structure of a deep neural network. Various approaches to the design of network architectures have been proposed in recent years. In this talk, I'll discuss how we might draw inspiration from the design of shape and form in biological systems to find complex and adaptable neural network designs. Specifically, I'll conjecture about how recent methods and principles from embodied cognition and evolutionary robotics may be translated into an embodied perspective on NAS.

Bio: Nick Cheney is an Assistant Professor of Computer Science at the University of Vermont, where he directs the UVM Neurobotics Lab and is a core member of the Complex Systems and Data Science program. Prior to Vermont, Nick received a Ph.D. in Computational Biology from Cornell, co-advised by Hod Lipson and Steve Strogatz, and was a postdoctoral researcher at the University of Wyoming working with Jeff Clune (now at OpenAI and the University of British Columbia). He has also served as a visiting researcher at the Santa Fe Institute, NASA Ames, and Columbia University. Nick's research aims to lower the barrier to machine learning by producing more robust, scalable, and self-configurable neural network algorithms and architectures -- with a specific focus on meta-learning methods.

Video Link

06/07/2022

Suraj Srinivas, Harvard University

Pitfalls of Saliency Map Interpretation in Deep Neural Networks

Abstract

A popular method of interpreting neural networks is to use saliency map representations, which assign importance scores to each input feature of the model. In this talk, I will discuss two of our works that expose pitfalls in these methods. First, we will discuss how existing saliency maps cannot satisfy two desirable properties simultaneously and propose the “full-gradient representation” which avoids these problems. Based on this representation, we propose an approximate saliency method called FullGrad which we find explains model behavior better than competing methods in the literature. Second, we find that a popular saliency map method, the input-gradients, can be arbitrarily structured due to the shift-invariance of SoftMax. We investigate why standard neural network models have input-gradients with interpretable structure even when this is unnecessary, and we find that standard models have an implicit generative modeling component, which is responsible for this behavior. Overall, our works show that interpreting black-box models using off-the-shelf interpretability methods can be risky, and that such methods must be used with caution.
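The shift-invariance of SoftMax referred to in the abstract can be checked numerically with a few lines of code. The sketch below is only an illustration and is not from the talk; the helper `softmax`, the example logits, and the constant `shift` are hypothetical choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


logits = [2.0, -1.0, 0.5]
shift = 7.3  # hypothetical constant added to every logit
shifted = [z + shift for z in logits]

p = softmax(logits)
q = softmax(shifted)

# Adding the same value to every logit leaves the class probabilities
# unchanged (up to floating-point rounding).
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(p, q))
```

Because the same holds when the added value is a function of the input, the logits (and hence their input-gradients) can be changed without altering the model's predictions, which is the property the abstract points to.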

Bio: Suraj Srinivas is a postdoctoral research fellow at Harvard University where he works with Prof. Hima Lakkaraju on the foundations of interpretable deep learning. He completed his Ph.D. at Idiap Research Institute & EPFL in Switzerland, advised by Prof. François Fleuret. His Ph.D. thesis on the pitfalls of gradient-based explanation methods in deep learning received the EPFL thesis distinction award in electrical engineering. His research interests are interpretability, robustness, and compression of deep neural networks.

Video Link

05/25/2022

Hossein Mobahi, Google Research

Sharpness-Aware Minimization (SAM): Current Method and Future Directions

Abstract

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a new and effective procedure for instead simultaneously minimizing loss value and loss sharpness. Our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels. Finally, we will discuss possible directions for further research around SAM.
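The min-max formulation described in the abstract is commonly approximated with a two-gradient update: take a small ascent step of radius rho along the normalized gradient, then descend using the gradient evaluated at that perturbed point. The sketch below applies this to a toy quadratic loss. The names `sam_step` and `toy_grad`, the toy loss, and the values of `lr` and `rho` are assumptions made for illustration, not the speaker's code.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One approximate SAM update on a list of parameters (illustrative)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no perturbation direction, so leave w unchanged.
        return list(w)
    # Inner maximization (first-order approximation): move a distance rho
    # along the normalized gradient direction.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Outer minimization: descend using the gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w):
    # Gradient of the toy loss L(w) = 0.5 * sum(w_i ** 2).
    return list(w)


w0 = [1.0, -2.0]
w1 = sam_step(w0, toy_grad)
```

On this convex toy loss the step simply shrinks the parameters toward the minimum; the difference from plain gradient descent matters on losses whose curvature varies across the landscape.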

Bio: Hossein Mobahi is a senior research scientist at Google Research. His current interests revolve around the interplay between optimization and generalization in deep neural networks. Prior to joining Google in 2016, he was a postdoctoral researcher at CSAIL of MIT. He obtained his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC).

Video Link

05/25/2022

Xiaorui Liu, North Carolina State University

Communication-Efficient Distributed Machine Learning

Abstract

The success of modern AI systems relies on large-scale machine learning on big data. Distributed machine learning systems provide the computational infrastructure for such success by utilizing the parallel computation power of massive computation devices. However, the scalability and efficiency of these systems are greatly limited by the high communication cost between the devices. In this talk, I will discuss how to design communication-efficient distributed ML algorithms. Specifically, I will introduce novel decentralized algorithms with communication compression that reduce 95% of the communication bits without sacrificing the convergence complexities. These algorithms fundamentally improve the efficiency of large-scale ML both theoretically and numerically.

Bio: Xiaorui Liu is an incoming assistant professor in the Computer Science Department at North Carolina State University, starting in Fall 2022. He will receive his Ph.D. degree from Michigan State University, advised by Prof. Jiliang Tang. His research interests include distributed and trustworthy machine learning, with a focus on big data and graph data. He was awarded the Best Paper Honorable Mention Award at ICHI 2019, MSU Engineering Distinguished Fellowship, and Cloud Computing Fellowship. He organized and co-presented five tutorials in KDD 2021, IJCAI 2021, ICAPS 2021, and WWW 2022, and he has published innovative works in top-tier conferences such as NeurIPS, ICML, ICLR, KDD, AISTATS, and SIGIR.

Video Link

05/10/2022

Dongkuan Xu, North Carolina State University

Resource-efficient Deep Learning: Democratizing AI at Scale

Abstract

The phenomenal success of deep learning in the past decade has been mostly driven by the construction of increasingly large deep neural network models. These models usually impose an ideal assumption that there are sufficient resources, including large-scale parameters, sufficient data, and massive computation, for the optimization. However, this assumption usually fails in real-world scenarios. For example, computer memory may be limited as in edge devices, large-scale data are difficult to obtain due to expensive costs and privacy constraints, and computational power is constrained as in most university labs. As a result, these resource discrepancy issues have hindered the democratization of deep learning techniques in many AI applications, and the development of efficient deep learning methods that can adapt to different resource constraints is of great importance. In this talk, I will present my recent research contributions centered around resource-efficient deep learning to free AI from the parameter-data-computation hungry beast. First, I will introduce my contribution on neural network pruning under the pretrain-then-finetune paradigm, which improves the parameter efficiency of large-scale language models in the inference phase, resulting in pruned models with an order-of-magnitude fewer parameters than the original model while achieving the same or better prediction accuracy. Then, I will talk about my task-agnostic neural architecture search framework to reduce the computational cost in the training phase for finding the best-pruned models, which is complementary to improving the parameter efficiency in the inference phase. Finally, I will conclude my presentation with a brief overview of my ongoing and future work as part of a broader research agenda of new and related problems and potential collaborations in the next few years.

Bio: Dongkuan (DK) Xu is an incoming Assistant Professor in the CS Department at NC State. DK will get his Ph.D. at Penn State in June 2022 under the supervision of Dr. Xiang Zhang. His research interest is resource-efficient deep learning for AI at scale. DK has published more than 25 papers in top conferences and journals, including NeurIPS, AAAI, ACL, NAACL, and IJCAI. He has served as a PC member for over 28 major conferences and 14 journals. DK also has extensive research experience in the industry. He has interned at Microsoft Research Redmond, Moffett AI, and NEC Labs America, and holds 8 US patents/applications.

Video Link

02/24/2022

Soheil Kolouri, Vanderbilt University

Brain-Inspired Lifelong Learning Machines

Abstract

The next wave of AI demands a new type of machine learning framework that can continually learn and adapt to the stream of nonstationary multimodal information. This challenge is referred to as continual, lifelong, or incremental learning in the ML community. Since humans and primates are our best examples of lifelong learners, we believe that a better understanding of the biological underpinnings that support continual learning could be instrumental in advancing continual machine learning. In this talk, we first characterize continual learning as a multi-faceted problem and enumerate some of the known biological mechanisms in the brain that contribute to these characteristics. We then draw connections between existing AI/ML solutions for continual learning and known biological mechanisms and lay a road map for next-generation lifelong machine learners. Finally, we present some of our recent work toward advancing the field of continual learning with a focus on meta-plasticity and neuromodulation.

Bio: Soheil Kolouri is an Assistant Professor of Computer Science at Vanderbilt University, Nashville, TN, and the director of Machine Intelligence and Neural Technologies (MINT) lab. His research interests include continual learning, bio-inspired machine learning, geometric deep learning, and computational optimal transport. Before joining Vanderbilt University, he was a research scientist and principal investigator at HRL Laboratories, Malibu, CA, where he was the PI and the Co-PI on multiple DARPA programs involving next-generation machine learning. Soheil obtained his Ph.D. in Biomedical Engineering from Carnegie Mellon University where he received the Bertucci Fellowship Award for outstanding graduate students from the College of Engineering in 2014 and the Outstanding Dissertation Award from the Biomedical Engineering Department in 2015.

Video Link

02/17/2022

Matthias Fey, TU Dortmund University

Auto-Scaling GNNs

Abstract

In this talk, we will take a theoretical and practical look at scaling Graph Neural Networks (GNNs) up to massive graphs, based on our GNNAutoScale (GAS) framework. GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to input node size without dropping any data. While existing solutions weaken the expressive power of message passing due to sub-sampling of edges or non-trainable propagations, our approach is provably able to maintain the expressive power of the original GNN. We further discuss challenges regarding its implementation within our PyTorch Geometric (PyG) library and verify its practical benefits on a variety of large graph benchmark datasets.

Bio: Matthias Fey is a fourth-year Ph.D. student at the computer graphics lab at the TU Dortmund University, Germany, and a co-founder of kumo.ai which aims to make state-of-the-art GNN solutions readily available to large-scale data warehouses. His main area of research lies in the development of new deep learning methods that can be directly applied to unstructured data such as graphs, point clouds, and manifolds. Furthermore, he is the creator of the PyTorch Geometric (PyG) library, which aims to bundle many of the proposed methods in this area to make research more accessible, comparable, and reproducible, and is a core member of the Open Graph Benchmark (OGB) team. Matthias studied Computer Science at the TU Dortmund where he received his B.Sc. in 2013 and his Master’s degree in 2017.

Video Link

02/10/2022

Philipp Petersen, University of Vienna

Optimal Representation and Learning of Classifier Functions

Abstract

Deep learning has established itself as, by far, the most successful machine learning approach in sufficiently complex tasks. Nowadays, it is used in a wide range of highly complex applications such as natural language processing or even scientific applications. Its first major breakthrough, however, was achieved by shattering the state-of-the-art in image classification. We revisit the problem of classification by deep neural networks and attempt to find an answer to why deep networks are remarkably effective in this regime. We will interpret the learning of classifiers as finding piecewise constant functions from labeled samples. We then precisely link the hardness of the learning problem to the complexity of the regions. Concretely, we will establish fundamental lower bounds on the learnability of certain regions. Finally, we will show that in many cases, these optimal bounds can be achieved by deep-neural-network-based learning. In quite realistic settings, we will observe that deep neural networks can learn high-dimensional classifiers without a strong dependence of the learning rates on the dimension.

Bio: Philipp Petersen is a tenure-track assistant professor for machine learning at the mathematical institute of the University of Vienna. Before that, he completed a post-doc position at the University of Oxford and did his PhD at the Technical University of Berlin. His research focuses on the interplay of deep neural networks and numerical analysis. Particular foci are the expressivity of various architectures of deep neural networks, structural challenges for the optimization or training of deep neural networks, and the applicability of deep learning in numerical algorithms to solve partial differential equations or inverse problems.

Video Link

02/03/2022

Lingfei Wu, JD.COM

Graph Neural Networks: Foundations, Frontiers, and Applications

Abstract

The field of graph neural networks (GNNs) has seen rapid and incredible strides over recent years. Graph neural networks, also known as deep learning on graphs, graph representation learning, or geometric deep learning, have become one of the fastest-growing research topics in machine learning, especially deep learning. This wave of research at the intersection of graph theory and deep learning has also influenced other fields of science, including recommendation systems, natural language processing, program synthesis, software mining, cybersecurity, and intelligent transportation. However, as the field rapidly grows, it has been extremely challenging to gain a global perspective of the developments of GNNs. Therefore, we feel the urgency to bridge the above gap and have a comprehensive tutorial on this fast-growing yet challenging topic. In this talk, we will talk about our recent book titled Graph Neural Networks: Foundations, Frontiers, and Applications, one of the most comprehensive books for researchers and practitioners for reading and studying in GNNs. It covers a broad range of topics in graph neural networks, by reviewing and introducing the fundamental concepts and algorithms, new research frontiers, and broad and emerging applications of GNNs.

Bio: Dr. Lingfei Wu is a Principal Scientist at JD.COM Silicon Valley Research Center, leading a team of 30+ ML/NLP scientists and software engineers to build intelligent e-commerce personalization systems. He earned his Ph.D. degree in computer science from the College of William and Mary in 2016. Previously, he was a research staff member at IBM Thomas J. Watson Research Center and led a 10+ research scientist team for developing novel Graph Neural Networks methods and systems, which led to the #1 AI Challenge Project in IBM Research and multiple IBM Awards including three-time Outstanding Technical Achievement. He was the recipient of the Best Paper Award and Best Student Paper Award of several conferences such as IEEE ICC’19, AAAI workshop on DLGMA’20, and KDD workshop on DLG’19. His research has been featured in numerous media outlets, including NatureNews, YahooNews, Venturebeat, TechTalks, SyncedReview, Leiphone, QbitAI, MIT News, IBM Research News, and SIAM News.

Video Link

01/27/2022

Hamed Pirsiavash, UC Davis

Self-Supervised Learning for Visual Recognition

Abstract

We are interested in learning visual representations that are discriminative for semantic image understanding tasks such as object classification, detection, and segmentation in images/videos. A common approach to obtain such features is to use supervised learning. However, this requires manual annotation of images, which is costly, ambiguous, and prone to errors. In contrast, self-supervised feature learning methods exploiting unlabeled data can be more scalable and flexible. I will present some of our recent efforts in this direction. More specifically, I will talk about our recent work on using similarity between a random set of images to learn better visual representations and to compress self-supervised features from deeper models to smaller ones.

Bio: Hamed Pirsiavash is an associate professor at the University of California, Davis. Prior to this, he was an associate professor at the University of Maryland Baltimore County and a postdoctoral research associate at MIT. He obtained his Ph.D. at the University of California, Irvine. He does research at the intersection of computer vision and machine learning. More specifically, he is interested in self-supervised representation learning and the adversarial robustness of deep models.

Video Link

01/20/2022

Evangelos Papalexakis, UC Riverside

Tensor Decompositions for Multi-Aspect Graph Analytics and Beyond

Abstract

Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. In this talk, we will demonstrate the effectiveness of tensor decompositions in modeling and mining multi-aspect graphs. Finally, we conclude with very recent results that demonstrate the effectiveness of tensor methods in alleviating state-of-the-art adversarial attacks in Deep Neural Networks.

Bio: Evangelos (Vagelis) Papalexakis is an Associate Professor of the CSE Dept. at the University of California, Riverside. He received his Ph.D. degree at the School of Computer Science at Carnegie Mellon University (CMU). Prior to CMU, he obtained his Diploma and MSc in Electronic & Computer Engineering at the Technical University of Crete, in Greece. Broadly, his research interests span the fields of Data Science, Machine Learning, Artificial Intelligence, and Signal Processing. His research involves designing interpretable models and scalable algorithms for extracting knowledge from large multi-aspect datasets, with specific emphasis on tensor factorization models, and applying those algorithms to a variety of real-world problems, including detection of misinformation on the Web, explainable AI, and gravitational wave detection. His work has appeared in top-tier conferences and journals, and has attracted a number of distinctions, including the 2017 SIGKDD Dissertation Award (runner-up), several paper awards, the NSF CAREER award, and the 2021 IEEE DSAA Next Generation Data Scientist Award.

Video Link

01/13/2022

Zsolt Kira, Georgia Tech

Handling Distribution Shift in Visual Learning

Abstract

While deep learning has achieved remarkable computer vision successes, fundamentally both the theory and practice for these successes have relied on vanilla supervised learning where the training and testing datasets are both sampled from the same distribution. In reality, there is likely to be a significant distribution shift once models are deployed, including noise/weather/illumination/modality changes (covariate shift), new categories (semantic shift), or different label distributions. In this talk, I will present our recent work focusing on the fundamental handling of several of these shifts. For label distribution shifts, we propose a posterior-recalibration of classifiers that can be applied without re-training to handle imbalanced datasets. For covariate and semantic shift, we propose a geometric decoupling of classifiers into feature norms and angles, showing that it can be used to learn more sensitive feature spaces for better calibration and out-of-distribution detection. We demonstrate state-of-the-art results across multiple benchmark datasets and metrics. In the end, I will present connections to a wider set of problems including continual/lifelong learning, open-set discovery, and semi-supervised learning.

Bio: Zsolt Kira is an Assistant Professor at the Georgia Institute of Technology and Associate Director of Georgia Tech’s Machine Learning Center. His work lies at the intersection of machine learning and artificial intelligence for sensor processing, perception, and robotics. Current projects and interests relate to moving beyond the current limitations of supervised machine learning to tackle un/self-/semi-supervised methods, out-of-distribution detection, model calibration, learning under imbalance, continual/lifelong learning, and adaptation. Prof. Kira has grown a portfolio of projects funded by NSF, ONR, DARPA, and the IC community, has over 45 publications in top venues, and has received several best paper/student paper awards.

Video Link

01/06/2022

Umut Şimşekli, INRIA

Towards Building a Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

Abstract

In this talk, I will focus on the 'tail behavior' of SGD in deep learning. I will first empirically illustrate that heavy tails arise in the gradient noise (i.e., the difference between the stochastic gradient and the true gradient). Accordingly, I will propose to model the gradient noise as a heavy-tailed α-stable random vector and to analyze SGD as a discretization of a stochastic differential equation (SDE) driven by a stable process. As opposed to classical SDEs that are driven by a Brownian motion, SDEs driven by stable processes can incur 'jumps', which force the SDE (and its discretization) to transition from 'narrow minima' to 'wider minima', as proven by existing metastability theory and the extensions that we proved recently. These results open up a different perspective and shed more light on the view that SGD 'prefers' wide minima. In the second part of the talk, I will focus on the generalization properties of such heavy-tailed SDEs and show that the generalization error can be controlled by the Hausdorff dimension of the trajectories of the SDE, which is closely linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail index of the process can be used as a capacity metric. Finally, if time permits, I will talk about the 'originating cause' of such heavy-tailed behavior and present theoretical results which show that heavy tails can emerge even in very sterile settings such as linear regression with i.i.d. Gaussian data.

Bio: Umut Şimşekli is a tenured research faculty member at Inria Paris and École Normale Supérieure de Paris. He received his Ph.D. degree in 2015 from Boğaziçi University, İstanbul. From 2016 to 2020, he was affiliated with the Signals, Statistics, and Machine Learning Group at Télécom Paris as an associate professor, and he visited the Department of Statistics at the University of Oxford during the 2019-2020 academic year. He is a laureate of the European Research Council (ERC) Starting Grant 2021, and his current research interests are in the theory of deep learning.

Video Link

Older talks can be found in Archives.