About Me

Hello! I am an Assistant Professor of Computer Science at NYU Courant and part of the CILVR group. I am also affiliated with the NYU Center for Data Science. Before that, I was a research scientist at Facebook AI Research (FAIR), Menlo Park. I received my Ph.D. and M.S. degrees from the CSE Department at UC San Diego, advised by Zhuowen Tu. During my Ph.D. studies, I also interned at NEC Labs, Adobe, Facebook, Google, and DeepMind. Prior to that, I obtained my bachelor's degree from Shanghai Jiao Tong University. My primary research interests are computer vision and machine learning.

Most of what humans know, and nearly all of what animals know, is acquired through sensory experiences, with vision playing a particularly crucial role. My research focuses on advancing robust visual intelligence - the creation of scalable and reliable intelligent systems that can interpret visual events, answer questions about them on demand, and develop a common sense understanding of the world.

Prospective PhD Students
I'm always looking for highly motivated PhD candidates to join my group. If you're interested in collaborating with me for your doctoral studies, please submit your application to the Courant CS Ph.D. program and be sure to include my name in your application materials.
Internship Opportunity
From time to time, our group at NYU offers visiting researcher positions for individuals from different backgrounds. If you have a strong passion for applying representation learning to tackle complex challenges in machine learning and computer vision (among many other exciting domains!), please feel free to send me an email with your CV.

Research Group


PhD Students
Postdoc/Faculty Fellows

Teaching

Fall 2024
CSCI-GA.2271: Computer Vision
Spring 2024
CSCI-GA.2565: Machine Learning
Fall 2023
CSCI-GA.3033: Learning with Large Language and Vision Models

Selected Publications

(* indicates equal contribution)
For the full publication list, please refer to my Google Scholar page.
(Actually, the best way to stay updated on my latest research is to check there, as I may not update this website regularly.)
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
NeurIPS 2024
Shengbang Tong*, Ellis Brown*, Penghao Wu*, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie
Oral Presentation
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
NeurIPS 2024
Yuexiang Zhai*, Hao Bai†, Zipeng Lin†, Jiayi Pan†, Shengbang Tong†, Yifei Zhou†, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine
On Scaling Up 3D Gaussian Splatting Training
arXiv 2024
Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie
V-IRL: Grounding Virtual Intelligence in Real Life
ECCV 2024
Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
ECCV 2024
Nanye Ma, Mark Goldstein, Michael S Albergo, Nicholas M Boffi, Eric Vanden-Eijnden, Saining Xie
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
arXiv 2024
Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
CVPR 2024
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
Oral Presentation
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
CVPR 2024
Penghao Wu, Saining Xie
Image Sculpting: Precise Object Editing with 3D Geometry Control
CVPR 2024
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie
Demystifying CLIP Data
ICLR 2024
Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
Spotlight Presentation
Scalable Diffusion Models with Transformers
ICCV 2023
William Peebles, Saining Xie
Oral Presentation
CiT: Curation in Training for Effective Vision-Language Data
ICCV 2023
Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
Going Denser with Open-Vocabulary Part Segmentation
ICCV 2023
Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
CVPR 2023
A ConvNet for the 2020s
CVPR 2022
SLIP: Self-supervision meets Language-Image Pre-training
ECCV 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
CVPR 2022
Benchmarking Detection Transfer Learning with Vision Transformers
arXiv 2021
Masked Autoencoders are Scalable Vision Learners
CVPR 2022
Oral Presentation
Pri3D: Can 3D Priors Help 2D Representation Learning?
ICCV 2021
An Empirical Study of Training Self-supervised Vision Transformers
ICCV 2021
Xinlei Chen*, Saining Xie*, Kaiming He
Oral Presentation
On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness
NeurIPS 2021
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
CVPR 2021
Oral Presentation
Sample-Efficient Neural Architecture Search by Learning Action Space
TPAMI 2021

2020
Graph Structure of Neural Networks
ICML 2020
PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding
ECCV 2020
Spotlight Presentation
Are Labels Necessary for Neural Architecture Search?
ECCV 2020
Spotlight Presentation
Momentum Contrast for Unsupervised Visual Representation Learning
CVPR 2020
Best Paper Nomination (top 30)
Decoupling Representation and Classifier for Long-Tailed Recognition
ICLR 2020

2019
On Network Design Spaces for Visual Recognition
ICCV 2019
Exploring Randomly Wired Neural Networks for Image Recognition
ICCV 2019
Oral Presentation

Previous
Deep Representation Learning with Induced Structural Priors
Ph.D. Thesis, UC San Diego 2018
Saining Xie
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
ECCV 2018
Attentional ShapeContextNet for Point Cloud Recognition
CVPR 2018
Saining Xie*, Sainan Liu*, Zeyu Chen, Zhuowen Tu
Aggregated Residual Transformations for Deep Neural Networks
CVPR 2017
Top-down Learning for Structured Labeling with Convolutional Pseudoprior
ECCV 2016
Saining Xie*, Xun Huang*, Zhuowen Tu
Holistically-Nested Edge Detection
ICCV 2015
Saining Xie, Zhuowen Tu
Marr Prize Honorable Mention
Deeply-Supervised Nets
AISTATS 2015
Chen-Yu Lee*, Saining Xie*, Patrick Gallagher*, Zhengyou Zhang, Zhuowen Tu
Oral Presentation at the NeurIPS'14 Deep Learning Workshop
Hyper-class Augmented and Regularized Deep Learning for Fine-grained Image Classification
CVPR 2015