In the new paper Wide Attention Is The Way Forward For Transformers, a research team from the University of Cambridge, Imperial College London, and the University of Oxford challenges the commonly held belief that deeper is better for transformer architectures, demonstrating that wider layers result in superior performance on natural language processing tasks.
Colossal-AI has successfully used a heterogeneous training strategy to increase the number of NLP model training parameters capacity by hundreds of times at the same hardware. And experiment results show that it only needs to keep 1~5% of the embedding parameters in the GPU, and is still able to maintain excellent end-to-end training speed.
In the new paper Foundation Transformers, a Microsoft team proposes a method for true general-purpose modelling. Their Foundation Transformer is a single unified transformer that provides guaranteed training stability and can handle diverse tasks and modalities without performance degradation.
In the new paper On Distillation of Guided Diffusion Models, researchers from Google Brain and Stanford University propose a novel approach for distilling classifier-free guided diffusion models with high sampling efficiency. The resulting models achieve performance comparable to the original model but with sampling steps reduced by up to 256 times.
In the new paper Ask Me Anything: A Simple Strategy for Prompting Language Models, a research team from Stanford University, Numbers Station, and the University of Wisconsin-Madison presents Ask Me Anything Prompting (AMA), a simple large language model prompting strategy that enables a 30x smaller language model to outperform few-shot GPT3-175B.
In the new paper Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods, DeepMind and NYU Center for Neural Systems researchers introduce computational efficiency evaluation approaches designed to aid in the selection of optimal methods, datasets and models for pretraining visual tasks on a fixed FLOP budget.
In the new paper TVLT: Textless Vision-Language Transformer, researchers from UNC Chapel Hill present the Textless Vision-Language Transformer (TVLT) for vision-and-language representation learning. TVLT uses only raw visual and audio inputs and performs comparably to its text-based counterparts but requires only 1/3 the parameters and achieves 28x faster inference speeds.
In the new paper Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity, a research team from Google and DeepMind proposes Geometric Complexity (GC), a measure of deep neural network model complexity that serves as a useful tool for understanding the underlying mechanisms of complexity control.
In the new paper Progressive Distillation for Dense Retrieval, a research team from Xiamen U and Microsoft Research presents PROD, a progressive distillation method for dense retrieval that achieves state-of-the-art performance on five widely used benchmarks.
In the new paper A Generalist Neural Algorithmic Learner, a research team from DeepMind, University of Oxford, IDSIA, Mila, and Purdue University presents a novel generalist neural algorithmic learner — a single graph neural network (GNN) capable of solving various classical algorithms at single-task expert level.
In the new paper Promptagator: Few-shot Dense Retrieval From 8 Examples, a Google Research team proposes Prompt-based Query Generation for Retriever (Promptagator), a novel and simple approach for few-shot retrieval that leverages large language model (LLM) prompting to generate synthetic task-specific training data.
In the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, a Monash University research team presents EcoFormer, an attention mechanism with linear complexity that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.
In the new paper Human-level Atari 200x Faster, a DeepMind research team applies a set of diverse strategies to Agent57, with their resulting MEME (Efficient Memory-based Exploration) agent surpassing the human baseline on all 57 Atari games in just 390 million frames — two orders of magnitude faster than Agent57.
In the new paper Vec2text With Round-Trip Translations, Google Brain researchers explore large language models’ capabilities for generating arbitrary natural language text from inputs of fixed-size vectors — a vec2text setting — and propose a simple data augmentation approach based on round-trip translations to improve vec2text model performance.
The new DeepMind paper Data Augmentation for Efficient Learning from Parametric Experts proposes Augmented Policy Cloning (APC), a simple yet effective data-augmentation approach designed to support data-efficient learning from parametric experts. The method significantly improves data efficiency across various control and reinforcement learning settings.
In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.
In the new paper MO2: Model-Based Offline Options, a DeepMind research team introduces Model-Based Offline Options (MO2), an offline hindsight bottleneck options framework that supports sample-efficient option discovery over continuous state-action spaces for efficient skill transfer to new tasks.
A research team from Microsoft and Harvard University demonstrates that neural networks can discover succinct learning algorithms on their own in polynomial time and presents an architecture that combines recurrent weight-sharing between layers and convolutional weight-sharing to reduce parameter size from even trillions of nodes down to a constant.
In the new paper Decoding Speech From Non-Invasive Brain Recordings, a research team from Meta AI and the Inria Saclay Centre presents a single end-to-end architecture for decoding natural speech processing from non-invasive magnetoencephalography (MEG) or electroencephalography (EEG) brain recordings that can detect macroscopic brain signals in real-time.
In the new paper Faithful Reasoning Using Large Language Models, a DeepMind research team proposes a forward-chaining selection-inference model that performs faithful reasoning and provides a valid reasoning trace to improve reasoning quality and help users validate the model’s final answers.
In the new paper PEER: A Collaborative Language Model, a research team from Meta AI, Carnegie Mellon University, PSL University, and University College London presents PEER, a collaborative language model that performs a humanlike writing process — composing drafts, adding suggestions, proposing edits and providing explanations for its actions.
In the new paper 3D-FM GAN: Towards 3D-Controllable Face Manipulation, a team from Princeton University and Adobe Research presents 3D-FM GAN, a novel conditional GAN framework that enables precise 3D-controllable face manipulation with high photorealism and strong identity preservation without requiring any manual tuning or optimizations.
In the new paper Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, a Microsoft research team presents BEiT-3, a general-purpose state-of-the-art multimodal foundation model for both vision and vision-language tasks that advances the big convergence of backbone architectures, pretraining tasks, and model scaling.
Carnegie Mellon University researchers provide background information and details on contributions to the DialPort project over the last six years in their new paper The DialPort Tools. These tools — such as the DialPort Portal and DialCrowd — will be demoed at the SIGDIAL 2022 conference next month in Edinburgh.
In the new paper Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization, a research team from Microsoft Azure AI and Microsoft Research presents Z-Code++, a novel encoder-decoder pretrained language model optimized for abstractive summarization that significantly improves performance on low-resource summarization tasks.
In the new paper Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing, a research team from Adobe Research and Australian National University presents paint2pix, a novel model that learns to predict users’ intentions and produce photorealistic images from primitive and coarse human brushstroke inputs.
Colossal-AI team and BioMap open-source their latest solution – xTrimo Multimer for protein monomer and multimer structure prediction. This new solution can predict both monomer and multimer structure simultaneously accelerating the process by up to 11 times!
Google Research and Carnegie Mellon University have open-sourced a library for constructing Python program graph representations used in machine learning for code research. Details are presented in the report A Library for Representing Python Programs as Graphs for Machine Learning.
In the new paper Interactive Code Generation via Test-Driven User-Intent Formalization, a team from Microsoft Research, the University of Pennsylvania, and the University of California, San Diego proposes a workflow for test-driven user-intent formalization that leverages user feedback to generate code that is 90.40 percent consistent with user intent.
In the new paper Semi-supervised Vision Transformers at Scale, a research team from AWS AI Labs proposes a semi-supervised learning pipeline for vision transformers that is stable, reduces hyperparameter tuning sensitivity, and outperforms conventional convolutional neural networks.
In the new paper Learning to Improve Code Efficiency, a research team from the Georgia Institute of Technology and Google Research presents a novel discrete generative latent-variable model designed to help programmers identify more computationally efficient code variants, taking a step toward automating the process of code performance optimization.
In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a pretrained retrieval augmented language model that effectively learns new knowledge-intensive tasks under few-shot settings. Atlas outperforms the 540B parameter PaLM model on QA tasks while using 50x fewer parameters.
In the new paper BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage, researchers from Meta AI and Mila/McGill University release BlenderBot 3, a 175B parameter state-of-the-art open-domain dialogue model deployed on a public website. BlenderBot 3 is designed for continual learning via its user interactions.
A Tencent AI Lab research team introduces Efficient and Intelligent Editing (Effidit), a digital writing assistant that leverages large-scale neural language models to provide high-quality assistance in text completion, error checking, text polishing, keywords to sentences (K2S) and cloud input methods (cloud IME).