LLM background literature
CSCE 689 LLMs: Course Readings, as shared by Maria Teleki
Parameter-Efficient Tuning, Compression
Efficient inference
- Readings:
- Optional:
- Flash-Decoding for long-context inference
- Some of Andrej Karpathy’s github repos
- https://github.com/karpathy/nanoGPT
- https://github.com/karpathy/llm.c
- Some of Georgi Gerganov’s github repos
Model distillation
Data Efficiency in the Age of LLMs
- References:
- Sorscher, Ben, et al. "Beyond neural scaling laws: beating power law scaling via data pruning." NeurIPS (2022).
- Abbas, Amro, et al. "SemDeDup: Data-efficient learning at web-scale through semantic deduplication." arXiv preprint arXiv:2303.09540 (2023).
- Sachdeva, Noveen, et al. "How to Train Data-Efficient LLMs." arXiv preprint arXiv:2402.09668 (2024).
- Marion, Max, et al. "When less is more: Investigating data pruning for pretraining LLMs at scale." arXiv preprint arXiv:2309.04564 (2023).
- Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
- Xie, Sang Michael, et al. "Data selection for language models via importance resampling." NeurIPS (2023).
- Engstrom, Logan, Axel Feldmann, and Aleksander Madry. "DsDm: Model-aware dataset selection with datamodels." arXiv preprint arXiv:2401.12926 (2024).
- Ayed, Fadhel, and Soufiane Hayou. "Data pruning and neural scaling laws: fundamental limitations of score-based algorithms." TMLR (2023).
- Guo, Chengcheng, et al. "DeepCore: A comprehensive library for coreset selection in deep learning." arXiv preprint arXiv:2204.08499 (2022).
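As a rough illustration of the semantic-deduplication idea in the Abbas et al. (SemDeDup) reading, here is a minimal greedy sketch. The random embeddings and the 0.9 similarity threshold are placeholder assumptions; the actual method first clusters real encoder embeddings with k-means and deduplicates within clusters.

```python
# A minimal sketch of SemDeDup-style semantic deduplication: embed documents,
# then greedily drop any document whose embedding is too close to one already kept.
# Random embeddings and the 0.9 threshold are placeholder assumptions.
import numpy as np

def semantic_dedup(embeddings: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Return indices of documents to keep, dropping near-duplicates."""
    # Normalize so that dot products are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for i in range(len(normed)):
        if all(normed[i] @ normed[j] < threshold for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
embs = rng.normal(size=(100, 64))
embs[1] = embs[0] + 0.01 * rng.normal(size=64)   # plant a near-duplicate
kept = semantic_dedup(embs)
print(len(kept), 1 in kept)   # the near-duplicate of doc 0 should be dropped
```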
Tools, Agents, and MoE
- Readings:
- What Are Tools Anyway? A Survey from the Language Model Perspective
- Toolformer: Language Models Can Teach Themselves to Use Tools
- ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629
- Mixture-of-Agents Enhances Large Language Model Capabilities
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Optional:
- Lilian Weng's blog post on LLM Powered Autonomous Agents
- Visual Programming: Compositional Visual Reasoning Without Training
- Great collection of papers on tools: https://github.com/zorazrw/awesome-tool-llm
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
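To make the MoE reading concrete, here is a minimal NumPy sketch of Switch-Transformer-style top-1 routing. The toy sizes and weights are assumptions for illustration; the real layer also adds a load-balancing auxiliary loss and per-expert capacity limits.

```python
# A minimal sketch of Switch-Transformer-style top-1 routing (NumPy).
# Toy sizes and weight initializations are assumptions for illustration.
import numpy as np

d_model, n_experts, n_tokens = 8, 4, 10
rng = np.random.default_rng(0)
W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # toy expert FFNs

def switch_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to exactly one expert and scale by the router probability."""
    logits = x @ W_router                                     # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    choice = probs.argmax(axis=1)                             # top-1 expert per token
    out = np.zeros_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            out[mask] = (x[mask] @ experts[e]) * probs[mask, e:e + 1]
    return out

x = rng.normal(size=(n_tokens, d_model))
print(switch_layer(x).shape)   # (10, 8)
```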
Long context, extending context
- Readings:
- Optional:
Scaling Laws
- Readings:
- Optional:
Self-play
LLM Applications: Text mining, user modeling, …
VLM Part 2
Model development
Transformers and New Directions (Linear Attention, Linear RNNs, State Space Models)
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Longformer: The Long-Document Transformer
- Generating Long Sequences with Sparse Transformers
- Linformer: Self-Attention with Linear Complexity
- Efficiently Modeling Long Sequences with Structured State Spaces
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- RWKV: Reinventing RNNs for the Transformer Era
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
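For a concrete view of the linear-attention idea in the "Transformers are RNNs" reading, here is a minimal sketch of its causal recurrence with the elu+1 feature map; the dimensions and naming are illustrative assumptions, not the paper's reference code.

```python
# A minimal sketch of the linear-attention recurrence from "Transformers are RNNs":
# with a feature map phi, causal attention becomes a running state
# S_t = S_{t-1} + phi(k_t) v_t^T and z_t = z_{t-1} + phi(k_t), giving O(1) per step.
import numpy as np

def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1, keeps features positive

def linear_attention(Q, K, V):
    """Causal linear attention computed as an RNN over time steps."""
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of phi(k) v^T
    z = np.zeros(d)                 # running sum of phi(k)
    out = np.zeros_like(V)
    for t in range(T):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        q = phi(Q[t])
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))
print(linear_attention(Q, K, V).shape)   # (6, 4)
```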
Bias
- Language (Technology) is Power: A Critical Survey of “Bias” in NLP
- Semantics derived automatically from language corpora contain human-like biases
- StereoSet: Measuring stereotypical bias in pretrained language models
- CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
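The Caliskan et al. reading measures bias with the Word Embedding Association Test (WEAT); a minimal sketch of its association score and test statistic is below. Random vectors stand in for real word embeddings (the paper uses GloVe/word2vec), so the printed number is meaningless; only the computation is illustrated.

```python
# A minimal sketch of the WEAT score from Caliskan et al.:
# s(w, A, B) = mean_a cos(w, a) - mean_b cos(w, b), summed over two target sets.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """How much more strongly w associates with attribute set A than with B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_statistic(X, Y, A, B):
    """WEAT test statistic: total association of targets X minus targets Y."""
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)

rng = np.random.default_rng(0)
dim = 50
X = rng.normal(size=(4, dim))   # target words, e.g. flowers
Y = rng.normal(size=(4, dim))   # target words, e.g. insects
A = rng.normal(size=(4, dim))   # attribute words, e.g. pleasant
B = rng.normal(size=(4, dim))   # attribute words, e.g. unpleasant
print(weat_statistic(X, Y, A, B))
```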
Diffusion Models
Miscellaneous
- Chess as a Testbed for Language Model State Tracking (Proceedings of the AAAI Conference on Artificial Intelligence)
- Topic: How well can an LLM learn Conway's Game of Life and be prompted to solve different tasks? For example, given some NxN grid, how well can it maximize still life, or, when allowed to modify the state of the system, minimize entropy while maximizing stable life? This would test how well LLMs can reason from very simple rules and how far ahead they can predict a highly chaotic system. Questions of this kind are usually tackled by mathematicians and very fast computers with lots of RAM. A minimal simulation sketch follows below.
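As a reference point for the topic above, here is a minimal NumPy sketch of the Game of Life update and a still-life check; the grid size, seed, and density are arbitrary assumptions.

```python
# A minimal NumPy sketch of Conway's Game of Life on a toroidal grid,
# usable as a reference implementation when prompting an LLM on the tasks above.
import numpy as np

def step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life update on a wrap-around NxN grid."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell lives next step if it has 3 neighbors, or 2 neighbors and is alive now.
    return ((neighbors == 3) | ((neighbors == 2) & (grid == 1))).astype(int)

def is_still_life(grid: np.ndarray) -> bool:
    """A still life is a configuration that maps to itself under one step."""
    return np.array_equal(step(grid), grid)

rng = np.random.default_rng(0)
grid = (rng.random((8, 8)) < 0.3).astype(int)
for _ in range(20):
    grid = step(grid)
print(grid.sum(), is_still_life(grid))   # live-cell count and stability after 20 steps
```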