Notes on NLP

Literature overview on NLP

These tables give an overview of recent and influential literature in the field of Natural Language Processing (NLP) from the past few years.

General overview

NLP, transfer learning, language models.

Author | Title | Link to code | Abstract (short)
Vaswani et al. (2017) | Attention Is All You Need | https://github.com/tensorflow/tensor2tensor | Introduction of a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. (A minimal attention sketch follows this table.)
Kim et al. (2017) | Structured Attention Networks | https://github.com/harvardnlp/struct-attn | In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees.
Radford et al. (2018) | Improving Language Understanding by Generative Pre-Training | https://github.com/openai/finetune-transformer-lm | Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. (GPT-1)
Devlin et al. (2018) | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://github.com/google-research/bert | Introduction of a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
Radford et al. (2019) | Language Models are Unsupervised Multitask Learners | https://github.com/openai/gpt-2 | Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. (…) Our largest model, GPT-2, is a 1.5B-parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText.
Ruder (2019) | Neural Transfer Learning for Natural Language Processing | https://github.com/sebastianruder | Multiple novel methods for different transfer learning scenarios were presented and evaluated across a diversity of settings where they outperformed single-task learning as well as competing transfer learning methods.
Kovaleva et al. (2019) | Revealing the Dark Secrets of BERT | - | BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT.
Rogers et al. (2020) | A Primer in BERTology: What We Know About How BERT Works | - | This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.
Brown et al. (2020) | Language Models are Few-Shot Learners | https://github.com/openai/gpt-3 | Demonstration that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.
Schick and Schütze (2020) | It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners | https://github.com/timoschick/pet | We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements.
Jaegle et al. (2021) | Perceiver IO: A General Architecture for Structured Inputs & Outputs | https://github.com/deepmind/deepmind-research/tree/master/perceiver | The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without sacrificing the original’s appealing properties by learning to flexibly query the model’s latent space to produce outputs of arbitrary size and semantics.
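
Since most of the models above build on the Transformer (Vaswani et al., 2017), here is a minimal sketch of its core operation, scaled dot-product attention, written in NumPy purely for illustration of the formula Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. It is not the authors' implementation; the linked tensor2tensor repository contains the reference code.

<code python>
# Minimal sketch of scaled dot-product attention (Vaswani et al., 2017),
# written with NumPy for illustration only.
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
    mask: optional boolean array of shape (seq_q, seq_k); True = keep.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of queries and keys
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block masked positions
    weights = softmax(scores, axis=-1)         # attention distribution per query
    return weights @ V, weights                # weighted sum of values

# Tiny usage example: 3 query tokens attending over 4 key/value tokens.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (3, 16) (3, 4)
</code>

In the full Transformer this operation is applied in parallel over several heads and stacked with feed-forward layers; the sketch only shows the attention step itself.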

Specific overview

Speech recognition

Author | Title | Link to code | Abstract (short)
Amodei et al. (2015) | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | - | We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech—two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages.
Agarwal and Zesch (2019) | German End-to-end Speech Recognition based on DeepSpeech | https://github.com/AASHISHAG/deepspeech-german | Description of the process of training German models based on the Mozilla DeepSpeech architecture using publicly available data. (An inference sketch follows this table.)
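
As a usage illustration for the DeepSpeech-based work above, the sketch below transcribes a 16 kHz mono WAV file with the Mozilla DeepSpeech 0.9.x Python package. This is a hedged sketch, not code from the cited papers: the file names are placeholders, and a German acoustic model and scorer produced with the linked deepspeech-german recipe would be plugged in the same way.

<code python>
# Hedged sketch: transcription with the Mozilla DeepSpeech 0.9.x Python API
# (`pip install deepspeech`). All file paths below are placeholders.
import wave
import numpy as np
import deepspeech

MODEL_PATH = "output_graph.pbmm"   # placeholder: trained acoustic model
SCORER_PATH = "kenlm.scorer"       # placeholder: optional language-model scorer

model = deepspeech.Model(MODEL_PATH)
model.enableExternalScorer(SCORER_PATH)  # LM rescoring improves decoding

with wave.open("audio_16khz_mono.wav", "rb") as wav:    # placeholder audio file
    assert wav.getframerate() == model.sampleRate()     # DeepSpeech expects 16 kHz
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # prints the recognized transcript
</code>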

Information Extraction

Named Entity Recognition

Author | Title | Link to code | Abstract (short)
Anthofer (2017) | A Neural Network for Open Information Extraction from German Text | https://github.com/danielanthofer/nnoiegt | Systems that extract information from natural language texts usually need to consider language-dependent aspects like vocabulary and grammar. Compared to the development of individual systems for different languages, development of multilingual information extraction (IE) systems has the potential to reduce cost and effort. One path towards IE from different languages is to port an IE system from one language to another. PropsDE is an open IE (OIE) system that has been ported from the English system PropS to the German language.
Riedl and Padó (2018) | A Named Entity Recognition Shootout for German | https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/german-ner/ | We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario.
Torge et al. (2021) | Transfer Learning for Domain-Specific Named Entity Recognition in German | - | Investigation of different transfer learning approaches to recognize unknown domain-specific entities, including the influence of varying training data size. (A tagging sketch follows this table.)
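
The NER papers above do not ship an off-the-shelf tagger, but a pretrained German token-classification model can be applied in a few lines with the Hugging Face transformers pipeline. The sketch below is an assumption-laden illustration: the model identifier is a placeholder for whichever German NER checkpoint is at hand, not a model released by the cited authors.

<code python>
# Hedged sketch: German NER with a pretrained token-classification model
# via Hugging Face transformers (`pip install transformers`).
# The model id below is a placeholder, not a checkpoint from the cited papers.
from transformers import pipeline

MODEL_ID = "some-org/german-ner-model"  # placeholder German NER checkpoint

ner = pipeline(
    "token-classification",
    model=MODEL_ID,
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

text = "Die Technische Universität Dresden liegt in Sachsen."
for entity in ner(text):
    # Each result carries the entity group (e.g. ORG, LOC), the surface
    # string, and a confidence score.
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
</code>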