Bidirectional transformers for language understanding (2018), introducing the pre-training and fine-tuning paradigm.