Class Topics (Spring 2025)

LLVM Reading List

1. Vision & Language Models | Feb 10

CLIP: Learning Transferable Visual Models From Natural Language Supervision
SigLIP: Sigmoid Loss for Language Image Pre-Training

2. Large Language Models | Feb 18 (Tue)

GPT-3: Language Models are Few-Shot Learners
DeepSeek-V3: DeepSeek-V3 Technical Report

3. Reasoning | Feb 24

Chain-of-Thought: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

4. Multimodal LLMs | Mar 3

LLaVA (Large Language and Vision Assistant): Visual Instruction Tuning
Molmo: Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

5. Generative AI – Image | Mar 10

Latent Diffusion Models (LDM): High-Resolution Image Synthesis with Latent Diffusion Models
Stable Diffusion 3 (SD3): Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

6. Generative AI – Video and 3D | Mar 17

Meta Movie Gen: Movie Gen: A Cast of Media Foundation Models
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

(Spring Break - Mar 24 Skipped)

7. World Models | Mar 31

UniSim: Learning Interactive Real-World Simulators
Genie 2: A Large-Scale Foundation World Model

8. Open-World Perception | Apr 7

Segment Anything Model (SAM): Segment Anything
DINOv2: Learning Robust Visual Features without Supervision

9. LLVM Agents | Apr 14

Generative Agents: Interactive Simulacra of Human Behavior
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

10. LLVM Data | Apr 21

LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models
Extracting Training Data from Diffusion Models

11. LLM Infrastructure | Apr 28

FlashAttention V1+V2: Fast and Memory-Efficient Exact Attention with IO-Awareness
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models