Class Topics (Spring 2025)
LLVM Reading List
1. Vision & Language Models | Feb 10
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
- SigLIP: Sigmoid Loss for Language Image Pre-Training
2. Large Language Models | Feb 18 (Tue)
- GPT-3: Language Models are Few-Shot Learners
- DeepSeek-V3: DeepSeek-V3 Technical Report
3. Reasoning | Feb 24
- Chain of Thought: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
4. Multimodal LLMs | Mar 3
- LLaVA: Large Language and Vision Assistant
- Molmo: Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
5. Generative AI – Image | Mar 10
- Latent Diffusion Models (LDM): High-Resolution Image Synthesis with Latent Diffusion Models
- Stable Diffusion 3 (SD3): Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
6. Generative AI – Video and 3D | Mar 17
- Meta Movie Gen: Movie Gen: A Cast of Media Foundation Models
- CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
(Spring Break - Mar 24 Skipped)
7. World Models | Mar 31
8. Open-World Perception | Apr 7
- Segment Anything Model (SAM): Segment Anything
- DINOv2: Learning Robust Visual Features without Supervision
9. LLVM Agents | Apr 14
- Generative Agents: Interactive Simulacra of Human Behavior
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
10. LLVM Data | Apr 21
- LAION-5B: An Open Large-Scale Dataset for Training Next-Generation Image-Text Models
- Extracting Training Data from Diffusion Models