Meet Magneto: Microsoft’s Foundation Transformer for General-Purpose Modelling Across Tasks and Modalities

In the new paper Foundation Transformers, a Microsoft team proposes a method for true general-purpose modelling. Their Foundation Transformer is a single unified transformer that provides guaranteed training stability and can handle diverse tasks and modalities without performance degradation.

In recent years, the machine learning community has been converging on shared model architectures across language, vision, speech, and multimodal tasks. While transformers have become the de facto standard for building such general-purpose foundation models, the best-performing transformer variants still differ from one input modality to another.

The team first identifies two properties a foundation model should possess for true general-purpose modelling: 1) the model should serve as a go-to architecture for various tasks and modalities, so that the same backbone can be used without trial and error, and 2) the architecture should provide guaranteed training stability.

The proposed Magneto is a Foundation Transformer implementation designed to achieve these goals. Magneto uses Sub-LayerNorm (Sub-LN), which adds another LayerNorm inside each sublayer (the multi-head self-attention and the feed-forward network). The team also introduces a novel initialization method that is theoretically proven to guarantee training stability, enabling the model to be scaled up relatively easily.
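To make the Sub-LN idea concrete, here is a minimal sketch of a feed-forward sublayer with a LayerNorm before each of its two projections. It is an illustration under stated assumptions, not the authors' implementation: it works on a single vector in plain Python, omits the learned LayerNorm scale and shift, assumes a GELU activation, and uses a placeholder gain of 1.0 where the paper's initialization would supply a depth-dependent value. All function and variable names are hypothetical.

```python
import math
import random

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(w, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def xavier_normal(rows, cols, gain, rng):
    """Xavier-style normal init; `gain` is a placeholder for a depth-dependent scale."""
    std = gain * math.sqrt(2.0 / (rows + cols))
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def gelu(v):
    """Exact GELU activation (an assumption; the article does not name one)."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def sub_ln_ffn(x, w_in, w_out):
    """Feed-forward sublayer with Sub-LN: a LayerNorm before each projection."""
    h = linear(w_in, layer_norm(x))            # LayerNorm on the sublayer input
    h = [gelu(v) for v in h]
    h = linear(w_out, layer_norm(h))           # the additional LayerNorm inside the sublayer
    return [xi + hi for xi, hi in zip(x, h)]   # residual connection

rng = random.Random(0)
d_model, d_hidden = 8, 32
w_in = xavier_normal(d_hidden, d_model, gain=1.0, rng=rng)
w_out = xavier_normal(d_model, d_hidden, gain=1.0, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
y = sub_ln_ffn(x, w_in, w_out)
```

The same pattern would apply to the attention sublayer, with the second LayerNorm placed before the output projection. In a real model the gain passed to the initializer would come from the paper's derivation and would differ by architecture and depth.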

In their empirical studies, the team compared Magneto with popular transformer variants such as BERT, GPT, and BEiT-3 on a wide range of tasks and modalities, including natural language processing, speech recognition, and vision tasks.

Magneto significantly surpassed its baseline counterparts in the experiments. Moreover, it was shown to be more stable in terms of optimization, indicating its potential for effectively scaling up all manner of transformer models.

The paper Foundation Transformers is on arXiv.


Author: Hecate He | Editor: Michael Sarazen

