LLM background literature

CSCE 689 LLMs: Course Readings as shared by Maria Teleki

Parameter-Efficient Tuning, Compression

Efficient inference

Model distillation

Data Efficiency in the Age of LLMs

  • References:
    • Sorscher, Ben, et al. "Beyond neural scaling laws: beating power law scaling via data pruning." NeurIPS (2022).
    • Abbas, Amro, et al. "Semdedup: Data-efficient learning at web-scale through semantic deduplication." arXiv preprint arXiv:2303.09540 (2023).
    • Sachdeva, Noveen, et al. "How to Train Data-Efficient LLMs." arXiv preprint arXiv:2402.09668 (2024).
    • Marion, Max, et al. "When less is more: Investigating data pruning for pretraining LLMs at scale." arXiv preprint arXiv:2309.04564 (2023).
    • Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
    • Xie, Sang Michael, et al. "Data selection for language models via importance resampling." NeurIPS (2023).
    • Engstrom, Logan, Axel Feldmann, and Aleksander Madry. "Dsdm: Model-aware dataset selection with datamodels." arXiv preprint arXiv:2401.12926 (2024).
    • Ayed, Fadhel, and Soufiane Hayou. "Data pruning and neural scaling laws: fundamental limitations of score-based algorithms." TMLR (2023).
    • Guo, et al. "Deepcore: A comprehensive library for coreset selection in deep learning." arXiv preprint arXiv:2204.08499 (2022).

Tools, Agents, and MoE

Long context, extending context

Scaling Laws

Self-play

LLM Applications: Text mining, user modeling, …

VLM Part 2

Model development

Transformers and New Directions (Linear Attention, Linear RNNs, State Space Models)

Bias

Diffusion Models

Miscellaneous

  • Toshniwal, Shubham, et al. "Chess as a testbed for language model state tracking." Proceedings of the AAAI Conference on Artificial Intelligence (2022).
  • Topic: How well can an LLM learn Conway's Game of Life and be prompted to solve different tasks in it? For example: given an NxN grid, how well can it maximize still life; or, given the opportunity to modify the state of the system, how well can it minimize entropy while maximizing stable life? This could test how well LLMs reason with very simple rules and how far ahead they can predict a highly chaotic system. These kinds of questions are typically left to mathematicians and very fast computers with lots of RAM.
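
To make the Game of Life topic concrete, here is a minimal sketch of a reference simulator that LLM answers could be checked against. It rests on assumptions not stated in the note: the grid is finite with cells outside it treated as dead (standard Life uses an unbounded plane), and the helper names step, simulate, is_still_life, and cell_accuracy are hypothetical, chosen only for illustration.

```python
from typing import List

Grid = List[List[int]]  # 1 = live cell, 0 = dead cell


def step(grid: Grid) -> Grid:
    """Advance one generation. Assumption: cells outside the grid are dead."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Count live cells among the (up to) eight neighbors.
            neighbors = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if grid[r][c] == 1 and neighbors in (2, 3):
                new[r][c] = 1  # survival
            elif grid[r][c] == 0 and neighbors == 3:
                new[r][c] = 1  # birth
    return new


def simulate(grid: Grid, generations: int) -> Grid:
    """Apply step() the given number of times."""
    for _ in range(generations):
        grid = step(grid)
    return grid


def is_still_life(grid: Grid) -> bool:
    """True if the grid has at least one live cell and is unchanged by step()."""
    return any(map(any, grid)) and step(grid) == grid


def cell_accuracy(predicted: Grid, reference: Grid) -> float:
    """Fraction of cells where a predicted grid matches the reference grid."""
    total = len(reference) * len(reference[0])
    matches = sum(
        p == q
        for p_row, q_row in zip(predicted, reference)
        for p, q in zip(p_row, q_row)
    )
    return matches / total


# A 2x2 block is a still life; a blinker oscillates with period 2.
block = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
blinker = [[0] * 5, [0] * 5, [0, 1, 1, 1, 0], [0] * 5, [0] * 5]
```

One possible use: ask the model for the grid after k generations, parse its answer into a Grid, and score it with cell_accuracy(answer, simulate(start, k)) for increasing k to see how far ahead its predictions hold up. Likewise, is_still_life could verify a configuration the model proposes for the "maximize still life" task.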