Seminal Papers about Large Language Models

- “Improving Language Understanding by Generative Pre-Training” by Radford et al. (2018): This is the paper that introduced the first GPT model. It established the recipe of generative pre-training on unlabeled text followed by task-specific fine-tuning, laying the foundation for transformer-based language models in NLP.
- “Language Models are Unsupervised Multitask Learners” by Radford et al. (2019): This paper presents GPT-2, a scaled-up successor to the original GPT with significantly more parameters (up to 1.5 billion), trained on a larger web-text corpus, and shows that such models can perform many tasks in a zero-shot setting.
- “Language Models are Few-Shot Learners” by Brown et al. (2020): This paper introduces GPT-3, the third iteration in the GPT series. It highlights the model’s few-shot learning capabilities: the model performs tasks from just a handful of examples supplied in the prompt, without any gradient updates or task-specific fine-tuning (see the prompt-construction sketch after this list).
- “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” by Devlin et al. (2018): While not a GPT paper, this work by researchers at Google is a seminal paper in the field of LLMs. BERT introduced masked-language-model pre-training of deep bidirectional transformers, which quickly became a standard approach to transfer learning across a wide range of NLP tasks.
- “Attention Is All You Need” by Vaswani et al. (2017): This paper, although not a GPT paper itself, is crucial because it introduced the transformer architecture, the backbone of GPT, BERT, and essentially every other model on this list (a minimal sketch of its core attention mechanism follows the list).
- “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” by Raffel et al. (2019): This paper from Google researchers presents the T5 model, which casts every language task as a text-to-text problem, providing a unified framework for a wide range of NLP tasks (a small illustration of this framing appears after the list).
- “XLNet: Generalized Autoregressive Pretraining for Language Understanding” by Yang et al. (2019): XLNet is another important model in the LLM domain; it outperformed BERT on several benchmarks by using a generalized autoregressive pretraining method based on permutation language modeling.
- “ERNIE: Enhanced Representation through Knowledge Integration” by Sun et al. (2019): Developed by Baidu, ERNIE is an LLM that integrates lexical, syntactic, and semantic knowledge into pre-training through entity- and phrase-level masking, showing significant improvements over BERT on various NLP tasks, particularly Chinese-language benchmarks.
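To make the few-shot idea from the GPT-3 paper concrete, here is a minimal sketch. The helper name and the Input/Output prompt format are illustrative choices, not taken from the paper; the point is that the “training data” for a task is nothing more than a few worked examples placed directly in the model’s prompt.

```python
# Minimal sketch of few-shot prompting in the style popularized by GPT-3.
# The helper name and Input/Output format are illustrative, not from the paper.
def build_few_shot_prompt(task_description, examples, query):
    lines = [task_description, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    [("I loved this film.", "positive"), ("The plot was painfully dull.", "negative")],
    "The acting was superb.",
)
print(prompt)  # this text would be sent to the model, which simply continues it
```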
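As a rough illustration of the scaled dot-product attention at the heart of the transformer architecture from “Attention Is All You Need”, here is a self-contained NumPy sketch. The shapes and random inputs are arbitrary, and a real transformer adds multi-head projections, masking, and feed-forward layers on top.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: the core operation of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of the values

# Toy example: 4 tokens, one 8-dimensional attention head.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (4, 8)
```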
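Finally, the text-to-text framing used by T5 can be pictured as plain input/target string pairs. The task prefixes below follow the style described in the paper, though the exact examples are illustrative.

```python
# Illustrative text-to-text pairs in the spirit of T5: every task, whether
# translation, classification, or summarization, becomes "string in, string out".
text_to_text_examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("cola sentence: The course is jumping well.", "unacceptable"),
    ("summarize: state authorities dispatched emergency crews after the storm ...",
     "emergency crews sent after storm"),
]

for source, target in text_to_text_examples:
    print(f"{source!r} -> {target!r}")
```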