Saining Xie

Assistant Professor of Computer Science

Courant Institute of Mathematical Sciences

New York University

Email: saining.xie [at] nyu [dot] edu

Google Scholar / DBLP / Twitter

About Me

Hello! I am an Assistant Professor of Computer Science at NYU Courant and part of the CILVR group. I am also affiliated with the NYU Center for Data Science. Before that, I was a research scientist at Facebook AI Research (FAIR), Menlo Park. I received my Ph.D. and M.S. degrees from the CSE Department at UC San Diego, advised by Zhuowen Tu. During my Ph.D. studies, I also interned at NEC Labs, Adobe, Facebook, Google, and DeepMind. Prior to that, I obtained my bachelor's degree from Shanghai Jiao Tong University. My primary research interests are computer vision and machine learning.

Most of what humans know, and nearly all of what animals know, is acquired through sensory experiences, with vision playing a particularly crucial role. My research focuses on advancing robust visual intelligence - the creation of scalable and reliable intelligent systems that can interpret visual events, answer questions about them on demand, and develop a common sense understanding of the world.

Prospective PhD Students

I'm always looking for highly motivated PhD candidates to join my group. If you're interested in collaborating with me for your doctoral studies, please submit your application to the Courant CS Ph.D. program and be sure to include my name in your application materials.

Internship opportunity

From time to time, our group at NYU offers visiting researcher positions for individuals from different backgrounds. If you have a strong passion for applying representation learning to tackle complex challenges in machine learning and computer vision (among many other exciting domains!), please feel free to send me an email with your CV.

Research Group

PhD Students

Ellis Brown (w/ Rob Fergus), Fred Lu (w/ Andrew Gordon Wilson), Xichen Pan, Peter Tong (w/ Yann LeCun),
Anjali Gupta, Willis Ma, Oscar Michel, Shusheng Yang

Postdoc/Faculty Fellows

Jihan Yang

Alumni

I will update this soon. (I promise)

Teaching

Spring 2025
CSCI-GA.3033: Learning with Large Language and Vision Models
Fall 2024
CSCI-GA.2271: Computer Vision
Spring 2024
CSCI-GA.2565: Machine Learning
Fall 2023
CSCI-GA.3033: Learning with Large Language and Vision Models

Talks

(I will try to upload videos/slides here when possible.)

Invited Talk @ Urban AI

Grounding Virtual Intelligence in Real Life

Invited Talk @ MIDAS mini-symposium: Generative AI: From Theory to Scientific Applications

Scalable Visual Intelligence in the Era of Generative AI

Invited Talk @ TTIC Summer Workshop on Multimodal Artificial Intelligence

Language Models Need Better Visual Grounding For Meaning And Understanding

Invited Talk @ Generative Models for Computer Vision workshop at CVPR 2024

Diffusion Transformers and Beyond (and why you should stop worrying and love DiT)

Invited Talk @ T4V: Transformers for Vision workshop at CVPR 2023

ConvNet vs. Transformer ROUND 2: Self-Supervised Learning and Diffusion Models

Invited Talk @ T4V: Transformers for Vision workshop at CVPR 2022

Everything is All You Need: Vision Architectures for the 2020s

Invited Talk @ VisDA-2021 NeurIPS Workshop: Universal Visual Adaptation Challenge

Model Robustness: Corruptions, Augmentations, and Representations

Invited Talk @ 3rd ScanNet Indoor Scene Understanding Challenge, CVPR'21

Transfer3D: Learning Transferrable Representations of 3D Scenes

Invited Talk @ Tutorial on Learning Representations via Graph-structured Networks, CVPR'20

Graph Structure of Neural Networks

Organizing/Invited Talk @ Tutorials on Visual Recognition for Images, Video, and 3D

ICCV'19, CVPR'20, ECCV'20

Invited Talk @ AI2, 2018

Deep Representation Learning with Induced Structural Priors

Selected Publications

(* indicates equal contribution)
For the full publication list, please refer to my Google Scholar page.
(Actually, the best way to stay updated on my latest research is to check there, as I may not update this website regularly.)

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

NeurIPS 2024

Shengbang Tong*, Ellis Brown*, Penghao Wu*, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

[Blog] [Paper] [Code]

Oral Presentation

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

NeurIPS 2024

Yuexiang Zhai*, Hao Bai†, Zipeng Lin†, Jiayi Pan†, Shengbang Tong†, Yifei Zhou†, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

[Blog] [Paper] [Code]

On Scaling Up 3D Gaussian Splatting Training

ArXiv 2024

Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie

[Blog] [Paper] [Code]

V-IRL: Grounding Virtual Intelligence in Real Life

ECCV 2024

Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie

[Blog] [Paper] [Code] [Video]

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

ECCV 2024

Nanye Ma, Mark Goldstein, Michael S Albergo, Nicholas M Boffi, Eric Vanden-Eijnden, Saining Xie

[Blog] [Paper] [Code]

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

ArXiv 2024

Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He

[Paper]

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

CVPR 2024

Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie

[Blog] [Paper] [Code]

Oral Presentation

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

CVPR 2024

Penghao Wu, Saining Xie

[Blog] [Paper] [Code]

Image Sculpting: Precise Object Editing with 3D Geometry Control

CVPR 2024

Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie

[Blog] [Paper] [Video] [Code]

Demystifying CLIP Data

ICLR 2024

Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

[Paper] [Code]

Spotlight Presentation

Scalable Diffusion Models with Transformers

ICCV 2023

William Peebles, Saining Xie

[Paper] [Project] [Code]

Oral Presentation

CiT: Curation in Training for Effective Vision-Language Data

ICCV 2023

Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

[Paper] [Code]

Going Denser with Open-Vocabulary Part Segmentation

ICCV 2023

Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan

[Paper] [Code]

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

CVPR 2023

Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

[Paper] [Code]

A ConvNet for the 2020s

CVPR 2022

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

[Paper] [Code]

SLIP: Self-supervision meets Language-Image Pre-training

ECCV 2022

Norman Mu, Alexander Kirillov, David Wagner, Saining Xie

[Paper] [Code]

Masked Feature Prediction for Self-Supervised Visual Pre-Training

CVPR 2022

Chen Wei, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, Christoph Feichtenhofer

[Paper]

Benchmarking Detection Transfer Learning with Vision Transformers

arXiv 2021

Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick

[Paper]

Masked Autoencoders are Scalable Vision Learners

CVPR 2022

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

[Paper] [Code]

Oral Presentation

Pri3D: Can 3D Priors Help 2D Representation Learning?

ICCV 2021

Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, Matthias Nießner

[Paper] [Video] [Code]

An Empirical Study of Training Self-supervised Vision Transformers

ICCV 2021

Xinlei Chen*, Saining Xie*, Kaiming He

[Paper] [Code]

Oral Presentation

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness

NeurIPS 2021

Eric Mintun, Alexander Kirillov, Saining Xie

[Paper] [Code]

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

CVPR 2021

Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

[Data-Efficient ScanNet Challenge] [Project] [Paper] [Code]

Oral Presentation

Sample-Efficient Neural Architecture Search by Learning Action Space

TPAMI 2021

Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

[Paper] [Code]

2020

Graph Structure of Neural Networks

ICML 2020

Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie

[Paper] [Code]

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

ECCV 2020

Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, Or Litany

[Paper] [Code]

Spotlight Presentation

Are Labels Necessary for Neural Architecture Search?

ECCV 2020

Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

[Paper] [Code] [Talk]

Spotlight Presentation

Momentum Contrast for Unsupervised Visual Representation Learning

CVPR 2020

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick

[Paper] [Code] [Slides]

Best Paper Nomination (top 30)

Decoupling Representation and Classifier for Long-Tailed Recognition

ICLR 2020

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis

[Paper] [Code] [Slides]

2019

On Network Design Spaces for Visual Recognition

ICCV 2019

Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

[Paper] [Code]

Exploring Randomly Wired Neural Networks for Image Recognition

ICCV 2019

Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He

[Paper] [Talk]

Oral Presentation

Deep Representation Learning with Induced Structural Priors

Ph.D. Thesis, UC San Diego 2018

Saining Xie

[Thesis] [Talk @AI2]

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

ECCV 2018

Saining Xie, Chen Sun, Jonathan Huang, Kevin Murphy

[Paper] [Code]

Attentional ShapeContextNet for Point Cloud Recognition

CVPR 2018

Saining Xie*, Sainan Liu*, Zeyu Chen, Zhuowen Tu

[Paper] [Code]

Aggregated Residual Transformations for Deep Neural Networks

CVPR 2017

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

[Paper] [Code]

Top-down Learning for Structured Labeling with Convolutional Pseudoprior

ECCV 2016

Saining Xie*, Xun Huang*, Zhuowen Tu

[Paper]

Holistically-Nested Edge Detection

ICCV 2015

Saining Xie, Zhuowen Tu

[Paper] [Code] [IJCV version]

Marr Prize Honorable Mention

Deeply-Supervised Nets

AISTATS 2015

Chen-Yu Lee*, Saining Xie*, Patrick Gallagher*, Zhengyou Zhang, Zhuowen Tu

[Paper] [Code]

Oral Presentation at the NeurIPS'14 Deep Learning Workshop

Hyper-class Augmented and Regularized Deep Learning for Fine-grained Image Classification

CVPR 2015

Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin

[Paper]