- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- Evaluating Large Language Models Trained on Code
  Paper • 2107.03374 • Published • 10
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7
Collections including paper arxiv:2302.13971
- Neural Machine Translation by Jointly Learning to Align and Translate
  Paper • 1409.0473 • Published • 7
- Attention Is All You Need
  Paper • 1706.03762 • Published • 120
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 50
- Qwen Technical Report
  Paper • 2309.16609 • Published • 38
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 9
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 170
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 64
- Attention Is All You Need
  Paper • 1706.03762 • Published • 120
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 23
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 251
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  Paper • 1406.1078 • Published • 1
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 23
- deepseek-ai/DeepSeek-Math-V2
  Text Generation • Updated • 2.96k • 687
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 265
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 94
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 18
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 2
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2