Category: Natural Language Tech

‘Ask Me Anything’: Stanford U, Numbers Station & UW Madison’s Novel Prompting Strategy Enables LLMs With 30x Fewer Parameters to Outperform Few-Shot GPT3-175B

In the new paper Ask Me Anything: A Simple Strategy for Prompting Language Models, a research team from Stanford University, Numbers Station, and the University of Wisconsin-Madison presents Ask Me Anything Prompting (AMA), a simple large language model prompting strategy that enables a 30x smaller language model to outperform few-shot GPT3-175B.
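
As a rough illustration (not code from the paper), AMA reformats a task input into several open-ended question-answering prompt chains and aggregates the resulting noisy predictions; the paper aggregates with weak supervision, while the sketch below uses a plain majority vote. The `call_llm` function and the chain templates are hypothetical stand-ins.

```python
# Minimal sketch of the AMA idea: turn one input into several question-answer
# prompt chains, collect one noisy prediction per chain, and aggregate.
from collections import Counter

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an actual LLM API call")

QUESTION_CHAINS = [
    # each chain reformats the task into an open-ended question, then answers it
    "Rewrite the claim as a yes/no question:\n{input}\nQuestion:",
    "Based on the passage, ask whether the statement is true:\n{input}\nQuestion:",
]

def ama_predict(task_input: str) -> str:
    votes = []
    for template in QUESTION_CHAINS:
        question = call_llm(template.format(input=task_input))
        answer = call_llm(f"{question}\nAnswer yes or no:")
        votes.append(answer.strip().lower())
    # simple majority vote over the per-chain predictions (the paper uses
    # weak-supervision aggregation instead)
    return Counter(votes).most_common(1)[0][0]
```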

Google Brain’s Vec2Text Models for Sentence Generation Excel in Universality, Diversity, Fluency & Semantic Structure

In the new paper Vec2text With Round-Trip Translations, Google Brain researchers explore large language models’ capabilities for generating arbitrary natural language text from inputs of fixed-size vectors — a vec2text setting — and propose a simple data augmentation approach based on round-trip translations to improve vec2text model performance.
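
As a rough sketch of the augmentation idea (the `translate` interface and pivot languages are hypothetical, not the paper's setup), round-trip translation sends a sentence into a pivot language and back to produce a fluent paraphrase that can be added to the vec2text training data.

```python
# Round-trip translation as data augmentation: en -> pivot -> en paraphrases.

def translate(text: str, src: str, tgt: str) -> str:
    raise NotImplementedError("stand-in for a machine translation system")

def round_trip(text: str, pivot: str = "de") -> str:
    return translate(translate(text, src="en", tgt=pivot), src=pivot, tgt="en")

def augment(sentences, pivots=("de", "fr")):
    for s in sentences:
        yield s                        # original sentence
        for p in pivots:
            yield round_trip(s, p)     # round-trip paraphrases of the sentence
```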

Peking U & Microsoft’s Knowledge Attribution Method Enables Editing Factual Knowledge in Pretrained Transformers Without Fine-Tuning

In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.

Plan, Edit, Explain and Repeat: The PEER Collaborative Language Model Brings a Humanlike Process to Text Generation

In the new paper PEER: A Collaborative Language Model, a research team from Meta AI, Carnegie Mellon University, PSL University, and University College London presents PEER, a collaborative language model that performs a humanlike writing process — composing drafts, adding suggestions, proposing edits and providing explanations for its actions.

CMU Details 6 Years of Contributions to the National Science Foundation-Funded DialPort Project for Dialog Research

In their new paper The DialPort Tools, Carnegie Mellon University researchers provide background on the DialPort project and detail their contributions over the last six years. These tools, such as the DialPort Portal and DialCrowd, will be demoed at the SIGDIAL 2022 conference next month in Edinburgh.

Microsoft’s Parameter-Efficient Z-Code++ Language Model Beats the 200x Larger GPT3-175B on Abstractive Text Summarization

In the new paper Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization, a research team from Microsoft Azure AI and Microsoft Research presents Z-Code++, a novel encoder-decoder pretrained language model optimized for abstractive summarization that significantly improves performance on low-resource summarization tasks.

Meet Atlas: A Pretrained Retrieval Augmented Language Model That Outperforms a 540B Parameter Model But Requires 50x Fewer Parameters

In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a pretrained retrieval augmented language model that effectively learns new knowledge-intensive tasks under few-shot settings. Atlas outperforms the 540B parameter PaLM model on QA tasks while using 50x fewer parameters.
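
For intuition only, the core retrieve-then-read loop behind a retrieval-augmented model looks roughly like the sketch below: fetch the top-k passages for a query and let a sequence-to-sequence reader condition on them. Atlas additionally trains the retriever and reader jointly, which this sketch omits; `retriever` and `reader` are hypothetical components, not the paper's code.

```python
# Bare-bones retrieve-then-read loop in the spirit of a retrieval-augmented LM.

def answer(query: str, retriever, reader, k: int = 5) -> str:
    passages = retriever.search(query, top_k=k)            # dense retrieval step
    context = "\n\n".join(f"passage: {p}" for p in passages)
    # the reader generates an answer conditioned on the query and passages
    return reader.generate(f"question: {query}\n{context}")
```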

Fancy a Friendly Chat? Stanford NLP’s Chirpy Cardinal Enables Open-Domain and Humanlike Conversations

In the new paper Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent, a Stanford NLP research team presents Chirpy Cardinal, an open-domain social chatbot with emotional and social intelligence that enables authentic and engaging interactions with real people.

CMU’s Novel ‘ReStructured Pre-training’ NLP Approach Scores 40 Points Above Student Average on a Standard English Exam

In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.

Google’s Imagen Text-to-Image Diffusion Model With Deep Language Understanding Defeats DALL-E 2

In the new paper Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, a Google Brain research team presents Imagen, a text-to-image diffusion model that combines deep language understanding and photorealistic image generation capabilities to achieve a new state-of-the-art FID score of 7.27 on the COCO dataset.

Fact Tracing in LMs: MIT & Google Dataset and Benchmark Track Learned Knowledge Back to the Training Data

In the new paper Tracing Knowledge in Language Models Back to the Training Data, a team from MIT CSAIL and Google Research proposes a benchmark for tracing language models’ assertions back to the training examples they were learned from, aiming to establish a principled ground truth for fact tracing without the prohibitive compute cost of retraining large neural language models.

Tokyo U & Google Brain Train Large Language Models as Zero-Shot Reasoners

In the new paper Large Language Models are Zero-Shot Reasoners, a research team from the University of Tokyo and Google Brain demonstrates that large language models (LLMs) can become good zero-shot reasoners through the addition of a simple prompt, “Let’s think step by step,” that elicits a step-by-step thinking process before each question is answered. Their Zero-shot-CoT method achieves significant performance gains over the zero-shot baseline.
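
The method itself is a two-stage prompting recipe, sketched below under an assumed `call_llm` completion interface (hypothetical); the exact answer-extraction wording varies by task in the paper.

```python
# Two-stage Zero-shot-CoT prompting: first elicit reasoning with
# "Let's think step by step", then append an answer-extraction cue.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an actual LLM API call")

def zero_shot_cot(question: str) -> str:
    # Stage 1: reasoning extraction
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = call_llm(reasoning_prompt)
    # Stage 2: answer extraction, conditioned on the generated reasoning
    answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer is"
    return call_llm(answer_prompt)
```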

Adobe’s UDoc Captures Cross-Modal Correlations in a Unified Pretraining Framework to Improve Document Understanding

In the new paper Unified Pretraining Framework for Document Understanding, an Adobe Research and Adobe Document Cloud team presents UDoc, a unified pretraining framework for document understanding that establishes cross-modal connections and highlights relevant information in both the visual and textual modalities. UDoc achieves impressive performance on various downstream tasks.

Training Compute-Optimal Large Language Models: DeepMind’s 70B Parameter Chinchilla Outperforms 530B Parameter Megatron-Turing

In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.
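
As a back-of-the-envelope illustration of the result (approximate constants, not the paper’s fitted values): training compute is roughly C ≈ 6·N·D for N parameters and D tokens, and the compute-optimal recipe grows model size and data in roughly equal proportion, which works out to on the order of 20 training tokens per parameter.

```python
# Rough compute-optimal split under C ~ 6*N*D and D ~ 20*N (approximations).

def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    # C = 6*N*D and D = tokens_per_param*N  =>  N = sqrt(C / (6*tokens_per_param))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla itself: ~70B parameters trained on ~1.4T tokens.
params, tokens = compute_optimal_split(6 * 70e9 * 1.4e12)
print(f"{params / 1e9:.0f}B params, {tokens / 1e12:.1f}T tokens")  # ~70B, ~1.4T
```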

Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25%

In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
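
The sketch below shows the rough shape of the idea, not the paper’s exact configuration: tokens judged unimportant (the paper derives the signal from a running masked-LM loss) skip the middle transformer layers and are merged back before the final layers, so the most expensive layers run on a shorter sequence. Layer counts, the keep ratio, and the importance scores here are illustrative assumptions.

```python
import torch

def forward_with_token_dropping(layers, hidden, importance, keep_ratio=0.5,
                                n_full_lower=2, n_full_upper=1):
    lower = layers[:n_full_lower]
    middle = layers[n_full_lower:len(layers) - n_full_upper]
    upper = layers[len(layers) - n_full_upper:]
    for layer in lower:                      # all tokens see the lower layers
        hidden = layer(hidden)
    k = max(1, int(keep_ratio * hidden.size(1)))
    keep = importance.topk(k, dim=1).indices.sort(dim=1).values
    idx = keep.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    kept = hidden.gather(1, idx)
    for layer in middle:                     # only important tokens pay for these
        kept = layer(kept)
    hidden = hidden.scatter(1, idx, kept)    # merge updated tokens back
    for layer in upper:                      # full sequence again for the output
        hidden = layer(hidden)
    return hidden
```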

Google & IDSIA’s Block-Recurrent Transformer Dramatically Outperforms Transformers Over Very Long Sequences

A team from Google Research and the Swiss AI Lab IDSIA proposes the Block-Recurrent Transformer, a novel long-sequence processing approach that has the same computation time and parameter count costs as a conventional transformer layer but achieves significant perplexity improvements in language modelling tasks over very long sequences.

Microsoft & NVIDIA Leverage DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest Monolithic Language Model

A research team from Microsoft and NVIDIA leverages NVIDIA’s Megatron-LM and Microsoft’s DeepSpeed to create an efficient and scalable 3D parallel system that combines data, pipeline, and tensor-slicing-based parallelism, achieving superior zero-, one-, and few-shot learning accuracies and new state-of-the-art results on NLP benchmarks.

Sapienza U & OpenAI Propose Explanatory Learning to Enable Machines to Understand and Create Explanations

A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.

Peng Cheng Laboratory & Baidu Release PCL-BAIDU Wenxin: The World’s First Knowledge-Enhanced 100-Billion-Scale Pretrained Language Model

Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly advances more than 30 benchmarks for zero-shot and few-shot learning.

Introducing MetaICL: A Language Model Meta-Training Framework for Few-Shot In-Context Learning

A research team from the University of Washington, Facebook AI Research, and the Allen Institute for AI introduces Meta-training for In-Context Learning (MetaICL), a new meta-training framework for few-shot learning where an LM is meta-trained to learn in-context: conditioning on training examples to recover the task and make predictions.
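
Concretely, a single meta-training instance is built by concatenating k input-output pairs from one task with a (k+1)-th input and training the model to generate the (k+1)-th output. The sketch below follows that recipe; the plain-text formatting and sampling are illustrative, not the paper’s exact preprocessing.

```python
import random

def build_metaicl_instance(task_examples, k=16):
    # task_examples: list of (input_text, output_text) pairs from one task
    sampled = random.sample(task_examples, k + 1)
    demos, (query_x, query_y) = sampled[:k], sampled[k]
    context = "\n".join(f"{x} {y}" for x, y in demos)
    source = f"{context}\n{query_x}"   # k demonstrations + the query input
    target = query_y                   # loss is applied only to this output
    return source, target
```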

Mention Memory: Incorporating Factual Knowledge From Various Sources Into Transformers Without Supervision

A research team from the University of Southern California and Google proposes TOME, a “mention memory” approach to factual knowledge extraction for NLU tasks. A transformer model with attention over a semi-parametric representation of the entire Wikipedia text corpus, TOME can extract information without supervision and achieves strong performance on multiple open-domain question answering benchmarks.

NYU & UNC Reveal How Transformers’ Learned Representations Change After Fine-Tuning

In the paper Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers, a research team from New York University and the University of North Carolina at Chapel Hill uses centered kernel alignment (CKA) to measure the similarity of representations across layers and explore how fine-tuning changes transformers’ learned representations.
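
For reference, linear CKA (Kornblith et al., 2019) compares two representation matrices of the same n examples; the basic definition is sketched below (the study may use a minibatch or unbiased variant).

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between representations X (n x d1) and Y (n x d2)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```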

MIT’s Automatic Data-Driven Media Bias Measurement Method Achieves Human-Level Results

MIT researchers present an automated, objective and transparent data-driven method for measuring media bias. The study analyses roughly a million articles from about a hundred newspapers for bias on various news topics, maps the newspapers into a two-dimensional media bias landscape, and shows that the data-driven results agree well with human-judgement classifications.

Apple Neural TTS System Study: Combining Speakers of Multiple Languages to Improve Synthetic Voice Quality

An Apple research team explores multiple architectures and training procedures to develop a novel multi-speaker and multi-lingual neural TTS system. The study combines speech from 30 speakers from 15 locales in 8 languages, and demonstrates that for the vast majority of voices, such multi-lingual and multi-speaker models can yield better quality than single-speaker models.

Google Researchers Enable Transformers to Solve Compositional NLP Tasks

A Google Research team explores the design space of Transformer models in an effort to enable deep learning architectures to solve compositional tasks. The proposed approach provides models with inductive biases via design decisions that significantly impact compositional generalization, and achieves state-of-the-art results on the COGS and PCFG composition benchmarks.

Google’s H-Transformer-1D: Fast One-Dimensional Hierarchical Attention With Linear Complexity for Long Sequence Processing

A Google Research team draws inspiration from two numerical analysis methods — Hierarchical Matrix (H-Matrix) and Multigrid — to address the quadratic complexity problem of attention mechanisms in transformer architectures, proposing a hierarchical attention scheme that has linear complexity in run time and memory.

Melbourne U, Facebook & Twitter Expose Novel Numerical Errors in NMT Systems

A research team from the University of Melbourne, Facebook AI, and Twitter Cortex proposes a black-box test method for assessing and debugging the numerical translation of neural machine translation systems in a systematic manner. The approach reveals novel types of errors that are general across multiple state-of-the-art translation systems.
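
One way to realize such a black-box test, sketched below with illustrative templates and an assumed `translate` interface (not the paper’s actual test suite), is to substitute varied numerals into carrier sentences, translate them, and check that the same quantity survives in the output.

```python
import re

def translate(text: str, src: str = "en", tgt: str = "de") -> str:
    raise NotImplementedError("stand-in for the NMT system under test")

TEMPLATES = ["The bridge is {n} metres long.", "Profits rose by {n} percent."]
NUMBERS = ["7", "0.072", "1,000,000", "4294967295"]

def run_numerical_tests():
    failures = []
    for template in TEMPLATES:
        for n in NUMBERS:
            src = template.format(n=n)
            out = translate(src)
            digits = re.sub(r"\D", "", n)           # compare digit strings only
            if digits not in re.sub(r"\D", "", out):
                failures.append((src, out))         # number mistranslated or lost
    return failures
```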