
CSCI-GA.2271-001 (Advanced) Computer Vision

Overview

Welcome to CSCI-GA.2271-001, (Advanced) Computer Vision. This graduate-level course focuses on modern, deep learning-based computer vision research and applications. While we will cover traditional vision techniques, our primary emphasis will be on contextualizing these methods within the current landscape of computer vision. The course aims to provide students with both historical perspectives and a deep understanding of current applications powered by large-scale vision foundation models. We will explore cutting-edge topics such as deep learning architectures for various vision tasks, object detection and segmentation, image and video generation, 3D vision and scene understanding, and vision transformers and self-supervised learning. Additionally, we will delve into multimodal learning, examining how computer vision serves as a connective layer to many other domains. This interdisciplinary approach will highlight the broader impact and applications of computer vision in fields such as natural language processing, robotics, and augmented reality. Throughout the course, students will gain hands-on experience with state-of-the-art algorithms and frameworks, preparing them for both research and practical applications in the rapidly evolving field of computer vision.


Teaching Team

Saining Xie

saining.xie@nyu.edu

Office Hours: Monday 3:00 - 4:00 pm

Calendly for Booking

Vishnu Beji

vishnu.b@nyu.edu

Office Hours: Wed 11:00 AM - 12:00 PM

Srivats Poddar

sp7811@nyu.edu

Office Hours: Mon 12:00 PM - 1:00 PM

Kushagra Khatwani

kk5395@nyu.edu

Office Hours: Mon 12:00 PM - 1:00 PM


Prerequisite

Students are expected to have a solid mathematics background and strong programming skills. Students are expected to have completed at least one of these undergraduate courses: 1) Deep Learning, 2) Machine Learning, or 3) Undergraduate Computer Vision. Other requirements include: Python programming; algorithms and data structures (CSCI-UA.102); deep learning programming with PyTorch or JAX; foundations of machine learning; foundations of deep learning; linear algebra; and probability and statistics (DS-GA.1002, MATH-UA.140, MATH-UA.235).


Logistics

When: Thursdays, 7:10 PM - 9:10 PM

Where: 31 Washington Pl (Silver Ctr) Room 405

Format: Lectures and Discussions.

Discord Group: We will use Discord to facilitate discussion and host course materials. You can find the Discord link on Brightspace.

Students auditing the course should email the instructor or any of the TAs to get access to the Discord server.


Class Schedule

An updated schedule of individual classes and topics can be found on the Calendar page.


Coursework

Grading will be based on the following activities:

  1. Early assignment (10%)
  2. Graded Homework Assignments (15% * 3)
  3. Semester-long project including report and presentation (50%)
  4. Class participation and online discussions (additional bonus)

1. Early assignment

This small warm-up exercise aims to give you hands-on experience and prepare you for the class. More information is available here. Please follow the instructions to submit your assignment.

2. Semester-long project

See the information below.

Project

The main deliverable of the course is a semester-long project, designed to give you the open-ended opportunity to either:

  1. Build a computer vision-powered application or demo. Computer vision models are powerful tools to solve exciting real-world problems. Utilizing various image processing techniques, deep learning architectures, and computer vision APIs, these models can function as ready-to-use tools. They can be employed to automate vision-based interactions with the environment, perform image-based data analytics, generate or manipulate visual content, enhance real-time video streams, reconstruct 3D scenes, or simply build something cool.

  2. Conduct a research project. Should you wish to explore the research aspects of computer vision more thoroughly, we invite you to undertake a research project tailored to your interests. Your focus could be on identifying a specific research topic within the realms of computer vision. You can conduct comparative studies to uncover the limitations of current vision models, or enhance the overall design—be it through optimizing data pipelines, training objectives, or architectures.

Be aware that the line separating a demo from a research project can be somewhat indistinct; the instructor will assist you in appropriately categorizing your project idea. Additionally, there is no grading preference for either application/demo or research projects, so feel free to select the option that most excites you!

Project logistics

Both project formats may be done in teams of 2-5 students (individual projects are not allowed as teamwork is an essential learning objective). We expect every team member to contribute to the project (and individual contributions should be clearly listed).

We will organize your project progress into two key milestones: (1) a preliminary proposal, and (2) a final submission/presentation. The dates for these milestones will be disclosed soon.

The preliminary proposal should sketch out the research question or application you’re keen to explore, along with the methodology you intend to employ. This should feature a concise overview of the CV methods you aim to utilize, as well as a list of potential metrics for evaluating success.

For the final submission, both write-up and code repo will be required, regardless of the project format. We will schedule a presentation/poster session for each team to present their work during the final week of the semester. Additional specifics will be provided soon, but anticipate the following:

  • Application/demo submissions to include a functional demo of your application, possibly through platforms like Gradio or Streamlit. This should be accompanied by a brief written explanation that outlines the problem you’re addressing, the CV model(s) you’ve employed, and your implementation and evaluation process.

  • Research project submissions to include a final report resembling a research paper (ranging from 4 to 9 pages, excluding references) and a code repository to replicate your findings. Clear and succinct writing is crucial; any lack of clarity or unnecessary complexity may lead to point deductions. For projects involving multiple contributors, a delineation of each participant’s role is mandatory. All submissions must be LaTeX-formatted and provided in PDF format (exceptions must be approved by the instructor). Utilizing user-friendly web platforms like Overleaf is strongly encouraged.

Project Proposal Submission Instructions

Please submit your project proposal by October 3 at 4:00 PM ET via Gradescope. Your submission should be in PDF format and include detailed information about your project idea, significance, expected deliverables, potential risks, and a preliminary timeline to monitor progress. We recommend using the following draft template (Overleaf) for your project proposal and report to ensure all required sections are included.

Proposal Feedback Sessions

Take advantage of the opportunity to receive feedback from the instructors and enhance your proposal. We’ve allocated times on [TBD] for proposal review sessions. Each group will be allotted a 15-minute session to discuss its proposal with the instructor. To book your slot, please use the Calendly link provided in the email sent to you.

Access to Computing Facilities

After we review your proposal, you might be eligible to get access to both NYU HPC and Google Cloud credits. You are free to use your own computing resources. As you draft your proposal, consider which of these resources would be best suited for your project and mention it in your submission. You are encouraged to build upon current open-source codebases such as Diffuser, LLaVA, etc.

3. Class attendance and participation

Daily class attendance will be recorded.

4. Textbooks

The course does not closely follow a particular text; the lectures are meant to be self-contained. Nevertheless, the following texts (though not required) may be useful as general references:

Late Submission Policy

  • Each student will be provided 3 grace days to submit their assignment without any penalty. They will be free to use these grace days at their convenience. Some examples of how a student could use these grace days are:
    • Student_1 submits 3 assignments each of which is one day late. They will not be penalized for any assignment.
    • Student_2 submits 1 assignment which is 1 day late and another assignment which is 2 days late. They will not be penalized for these two assignments.
    • Student_3 submits 1 assignment 3 days late. They will not be penalized for this single assignment.
  • Once a student exhausts their 3 grace days, they will receive:
    • 75% grade if their assignment is late by one day
    • 50% grade if their assignment is late by two days
    • 0% grade if their assignment is late by more than two days
  • Note:
    • If assignment 1 is 20 hours late, it is counted as 1 day late.
    • If assignment 2 is 28 hours late, it is counted as 2 days late.
    • So the total will be 3 days late (and not 20 + 28 = 48 hours = 2 days), because lateness is rounded up to whole days for each assignment separately.
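
The rounding and grace-day accounting above can be sketched in a few lines of Python. This is only an illustrative reading of the policy, not an official grading script: the function names are hypothetical, and the policy does not say what happens when an assignment is later than the grace days a student has left. The sketch assumes the remaining grace days are applied first and the penalty is based on the days that are left over.

```python
import math

GRACE_DAYS = 3
# Fraction of the grade kept after N penalized late days (more than 2 -> 0).
GRADE_FRACTION = {0: 1.0, 1: 0.75, 2: 0.5}


def days_late(hours_late: float) -> int:
    """Round lateness up to whole days, per assignment."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(hours_late_per_assignment, grace_days=GRACE_DAYS):
    """Return (grade fraction per assignment, grace days remaining).

    ASSUMPTION: grace days are consumed in submission order, and any
    late days not covered by grace days are the ones that get penalized.
    """
    remaining = grace_days
    fractions = []
    for hours in hours_late_per_assignment:
        late = days_late(hours)
        used = min(late, remaining)   # cover as much as possible with grace
        remaining -= used
        penalized = late - used       # days not covered by grace
        fractions.append(GRADE_FRACTION.get(penalized, 0.0))
    return fractions, remaining


# The example from the note: 20 hours and 28 hours late use all 3 grace days.
print(apply_late_policy([20, 28]))
```

Under these assumptions, a student who submits one assignment 3 days late and a later one 20 hours late would keep full credit on the first and 75% on the second.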