Category: Computer Vision & Graphics

AI Computer Vision & Graphics Machine Learning & Data Science Research

Maximizing FLOPS Utilization: DeepMind & NYU Propose Efficiency Evaluations for Visual Pretraining Methods

In the new paper Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods, DeepMind and NYU Center for Neural Systems researchers introduce computational efficiency evaluation approaches designed to aid in the selection of optimal methods, datasets and models for visual pretraining under a fixed FLOP budget.

Princeton U & Adobe’s 3D-FM GAN Enables Precise 3D-Controllable Face Manipulation

In the new paper 3D-FM GAN: Towards 3D-Controllable Face Manipulation, a team from Princeton University and Adobe Research presents 3D-FM GAN, a novel conditional GAN framework that enables precise 3D-controllable face manipulation with high photorealism and strong identity preservation without requiring any manual tuning or optimizations.

Microsoft’s BEiT-3 Foundation Model: A ‘Big Convergence of Language, Vision, and Multimodal Pretraining’ That Achieves SOTA Results on Popular Benchmarks

In the new paper Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, a Microsoft research team presents BEiT-3, a general-purpose state-of-the-art multimodal foundation model for both vision and vision-language tasks that advances the big convergence of backbone architectures, pretraining tasks, and model scaling.

Adobe and ANU’s Paint2Pix: Intent-Accurate Image Synthesis from Simple Brushstroke Inputs

In the new paper Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing, a research team from Adobe Research and Australian National University presents paint2pix, a novel model that learns to predict users’ intentions and produce photorealistic images from primitive and coarse human brushstroke inputs.

IITM & UT Austin’s Generalizable NeRF Transformer Demonstrates Transformers’ Capabilities for Graphical Rendering

In the new paper Is Attention All NeRF Needs?, a research team from the Indian Institute of Technology Madras and the University of Texas at Austin proposes Generalizable NeRF Transformer (GNT), a pure and universal transformer-based architecture for efficient on-the-fly reconstruction of NeRFs. The work demonstrates that a pure attention mechanism suffices for learning a physically-grounded rendering process.

Academia Sinica’s YOLOv7 Outperforms All Known Real-Time Object Detectors, Reduces Costs by 50%

In the new paper YOLOv7: Trainable Bag-Of-Freebies Sets New State-Of-The-Art for Real-Time Object Detectors, an Academia Sinica research team releases YOLOv7. This latest YOLO version introduces novel “extend” and “compound scaling” methods that use parameters and computation effectively, and it surpasses all known real-time object detectors in speed and accuracy.

NVIDIA’s Global Context ViT Achieves SOTA Performance on CV Tasks Without Expensive Computation

In the new paper Global Context Vision Transformers, an NVIDIA research team proposes the Global Context Vision Transformer, a novel yet simple hierarchical ViT architecture comprising global self-attention and token generation modules. The design efficiently models both short- and long-range dependencies without costly compute operations while achieving SOTA results across various computer vision tasks.

Google Brain’s UViM: A Unified Approach for Modelling Diverse Vision Tasks Without Modifications

In the new paper UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes, a Google Brain research team proposes UViM, a unified approach that leverages language modelling and discrete representation learning to enable the modelling of a wide range of computer vision tasks without task-specific modifications.

Microsoft Azure Introduces i-Code: A General Framework That Enables Flexible Multimodal Representation Learning

In the new paper i-Code: An Integrative and Composable Multimodal Learning Framework, a Microsoft Azure Cognitive Services Research team presents i-Code, a self-supervised pretraining framework that enables the flexible integration of vision, speech, and language modalities and learns their vector representations in a unified manner.

LSTM Is Back! A Deep Implementation of the Decades-old Architecture Challenges ViTs on Long Sequence Modelling

A research team from Rikkyo University and AnyTech Co., Ltd. examines the suitability of different inductive biases for computer vision and proposes Sequencer, an architectural alternative to ViTs that leverages long short-term memory (LSTM) rather than self-attention layers to achieve ViT-competitive performance on long sequence modelling.

UC Berkeley & Intel’s Photorealistic Denoising Method Boosts Video Quality on Moonless Nights

In the new paper Dancing Under the Stars: Video Denoising in Starlight, a research team from UC Berkeley and Intel Labs leverages a GAN-tuned, physics-based noise model to represent camera noise under low light conditions and trains a novel denoiser that, for the first time, achieves photorealistic video denoising in starlight.

DeepMind’s Upgraded Hierarchical Perceiver Is Faster, Scales to Larger Data Without Preprocessing, and Delivers Higher Resolution and Accuracy

DeepMind researchers propose Hierarchical Perceiver (HiP), a model that retains the original Perceiver’s ability to process arbitrary modalities but is faster, can scale up to even more inputs/outputs, reduces the need for input engineering, and improves both efficiency and accuracy on classical computer vision benchmarks.

Tsinghua & NKU’s Visual Attention Network Combines the Advantages of Convolution and Self-Attention, Achieves SOTA Performance on CV Tasks

In the new paper Visual Attention Network, a research team from Tsinghua University and Nankai University introduces a novel large kernel attention (LKA) mechanism for an extremely simple and efficient Visual Attention Network (VAN) that significantly outperforms state-of-the-art vision transformers and convolutional neural networks on various computer vision tasks.

Google’s MaskGIT Outperforms SOTA Transformer Models on Conditional Image Generation and Accelerates Autoregressive Decoding by up to 64x

A Google Research team proposes Masked Generative Image Transformer (MaskGIT), a novel image synthesis paradigm that uses a bidirectional transformer decoder. MaskGIT significantly outperforms state-of-the-art transformer models on the ImageNet dataset and accelerates autoregressive decoding by up to 64x.
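The decoding idea can be sketched as follows. This is an illustrative toy, not the paper's implementation: `predict` stands in for the bidirectional transformer, the mask-schedule details are simplified, and all names are hypothetical.

```python
import math

def maskgit_decode(seq_len, predict, steps=8, mask_id=-1):
    """MaskGIT-style iterative parallel decoding (sketch).

    `predict(tokens)` is a stand-in for a bidirectional transformer: it
    returns a (token, confidence) pair for every position. At each step,
    all masked positions are filled in parallel, then the least-confident
    predictions are re-masked according to a cosine schedule.
    """
    tokens = [mask_id] * seq_len  # start from a fully masked sequence
    for t in range(1, steps + 1):
        masked = [i for i, tok in enumerate(tokens) if tok == mask_id]
        if not masked:
            break
        preds = predict(tokens)  # one (token, confidence) per position
        for i in masked:
            tokens[i] = preds[i][0]  # fill all masked slots in parallel
        # Cosine schedule: fraction of the sequence still masked after step t.
        n_masked = int(seq_len * math.cos(math.pi / 2 * t / steps))
        # Re-mask the least-confident of this step's new predictions.
        masked.sort(key=lambda i: preds[i][1])
        for i in masked[:n_masked]:
            tokens[i] = mask_id
    return tokens
```

Because every position is predicted in parallel rather than one token per forward pass, the whole sequence is produced in a small, fixed number of steps, which is the source of the decoding speedup over autoregressive generation.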

Pushing the Limits of Self-Supervised ResNets: DeepMind’s ReLICv2 Beats Strong Supervised Baselines on ImageNet

A DeepMind research team proposes ReLICv2, which demonstrates for the first time that representations learned without labels can consistently outperform a strong, supervised baseline on ImageNet and even achieve comparable results to state-of-the-art self-supervised vision transformers (ViTs).

Facebook AI & JHU’s MaskFeat Method Surpasses Kaiming He’s MAE, Sets New SOTA in Video Action Recognition

In the new paper Masked Feature Prediction for Self-Supervised Visual Pre-Training, a Facebook AI Research and Johns Hopkins University team presents a novel Masked Feature Prediction (MaskFeat) approach for the self-supervised pretraining of video models that achieves SOTA results on video benchmarks.

Microsoft’s ‘Florence’ General-Purpose Foundation Model Achieves SOTA Results on Dozens of CV Benchmarks

In the new paper Florence: A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new SOTA results across a wide range of visual and visual-linguistic benchmarks.

Softmax-free Vision Transformer With Linear Complexity: Achieving a Superior Accuracy/Complexity Trade-off

Researchers from Fudan University, the University of Surrey and Huawei Noah’s Ark Lab identify the retention of softmax self-attention during approximation as the root of vision transformers’ (ViTs) quadratic complexity. The team proposes the first softmax-free transformer (SOFT), which reduces the self-attention computation to linear complexity, achieving a superior trade-off between accuracy and complexity.
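Why softmax is the obstacle can be seen from matrix associativity. The row-wise softmax forces the full n × n score matrix to be materialized; drop it, and (QKᵀ)V can be re-associated as Q(KᵀV), whose intermediate is only d × d. The sketch below illustrates only this general point, not SOFT's actual method, which replaces softmax with a Gaussian kernel and a low-rank approximation.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Q, K, V are n x d. With a row-wise softmax over Q @ K^T, the n x n
# score matrix must be built explicitly: O(n^2 * d) time, O(n^2) memory.
def attention_quadratic(Q, K, V):
    return matmul(matmul(Q, transpose(K)), V)   # n x n intermediate

# Without the softmax, multiplication is associative, so the same result
# is reachable via a d x d intermediate in O(n * d^2) -- linear in n.
def attention_linear(Q, K, V):
    return matmul(Q, matmul(transpose(K), V))   # d x d intermediate
```

For long token sequences (large n, modest d), the re-associated form is dramatically cheaper, which is why linear-attention methods all remove or replace the softmax in some way.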

Google Open-Sources SCENIC: A JAX Library for Rapid Computer Vision Model Prototyping and Cutting-Edge Research

A research team from Google Brain and Google Research introduces SCENIC, an open-source JAX library for fast and extensible computer vision research and beyond. SCENIC currently provides implementations of state-of-the-art vision models such as ViT, DETR and MLP-Mixer, and more open-sourced cutting-edge projects will be added in the near future.

Are Patches All You Need? New Study Proposes Patches Are Behind Vision Transformers’ Strong Performance

A research team proposes ConvMixer, an extremely simple model designed to support the argument that the impressive performance of vision transformers (ViTs) is mainly attributable to their use of patches as the input representation. The study shows that ConvMixer can outperform ViTs, MLP-Mixers and classical vision models.
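The "patches as input representation" that the study credits can be sketched as the patchify step shared by ViT, MLP-Mixer and ConvMixer: the image is cut into non-overlapping p × p tiles, each flattened into a vector. This is an illustrative sketch only; the actual models apply a learned linear projection (equivalently, a stride-p convolution) on top of this step.

```python
def to_patches(img, p):
    """Split an H x W image (list of rows) into non-overlapping p x p
    patches, each flattened row-major into a vector. Real patch-embedding
    layers follow this with a learned projection to the model dimension."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image must tile evenly into patches"
    patches = []
    for r in range(0, h, p):          # top-left corner of each patch
        for c in range(0, w, p):
            patch = [img[r + i][c + j] for i in range(p) for j in range(p)]
            patches.append(patch)
    return patches
```

A 4 × 4 image with p = 2 yields four 4-dimensional patch vectors; the downstream network then only mixes information between these patch tokens, which is the design choice the paper isolates.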

Apple Study Reveals the Learned Visual Representation Similarities and Dissimilarities Between Self-Supervised and Supervised Methods

An Apple research team performs a comparative analysis on a contrastive self-supervised learning (SSL) algorithm (SimCLR) and a supervised learning (SL) approach for simple image data in a common architecture, shedding light on the similarities and dissimilarities in their learned visual representation patterns.