Texture mapping with variational auto-encoders

Overview

vae-textures

This is an experiment with using variational autoencoders (VAEs) to perform mesh parameterization. This was also my first project using JAX and Flax, and I found them both quite intuitive and easy to use.

To get straight to the results, check out the Results section. The Background section describes the goals of this project in a bit more detail.

Background

In geometry processing, mesh parameterization allows high-resolution details of a 3D object, such as color and material variations, to be stored in a highly-optimized 2D image format. The strategy is to map each vertex of the 3D model's mesh to a unique 2D location in the plane, with the constraint that nearby points in 3D are also nearby in 2D. In general, we want this mapping to distort the geometry of the surface as little as possible, so for example large features on the 3D surface get a lot of pixels in the 2D image.

This might ring a bell to those familiar with machine learning. In ML, mapping a higher-dimensional space to a lower-dimensional space is called "embedding" and is often performed to aid in visualization or to remove extraneous information. VAEs are one technique in ML for mapping a high-dimensional space to a well-behaved latent space, and have the desirable property that probability densities are (approximately) preserved between the two spaces.

Given the above observations, here is how we can use VAEs for mesh parameterization:

  1. For a given 3D model, create a "surface dataset" with random points on the surface and their respective normals.
  2. Train a VAE to generate points on the surface using a 2D Gaussian latent space.
  3. Use the gaussian CDF to convert the above latents to the uniform distribution, so that "probability preservation" becomes "area preservation".
  4. Apply the 3D -> 2D mapping from the VAE encoder + gaussian CDF to map the vertices of the original mesh to the unit square.
  5. Render the resulting model with some test 2D texture image acting as the unit square.

The above process sounds pretty solid, but there are some quirks to getting it to work. Coming into this project, I predicted two possible reasons it would fail. It turns out that number 2 isn't that big of an issue (an extra orthogonality loss helps a lot), and there was a third issue I didn't think of (described in the Results section).

  1. Some triangles will be messed up because of cuts/seams. In particular, the VAE will have to "cut up" the surface to place it into the latent space, and we won't know exactly where these cuts are when mapping texture coordinates to triangle vertices. As a result, a few triangles must have points which are very far away in latent space.
  2. It will be difficult to force the mapping to be conformal. The VAE objective will mostly attempt to preserve areas (i.e. density), and ideally we care about conformality as well.

Results

This was my first time using JAX. Nevertheless, I was able to get interesting results right out of the gate. I ran most of my experiments on a torus 3D model, but I have since verified that it works for more complex models as well.

Initially, I trained VAEs with a Gaussian decoder loss. I also played around with an orthogonality bonus based on the eigenvalues of the Jacobian of the encoder. This resulted in texture mappings like this one:

Torus with orthogonality bonus and Gaussian loss

The above picture looks like a clean mapping, but it isn't actually bijective. To see why, let's sample from this VAE. If everything works as expected, we should get points on the surface of the torus. For this "sampling", I'll use the mean prediction from the decoder (even though its output is a Gaussian distribution) since we really just want a deterministic mapping:

A flat disk with a hole in the middle

It might be hard to tell from a single rendering, but this is just a flat disk with a low-density hole in the middle. In particular, the VAE isn't encoding the z axis at all, but rather just the x and y axes. The resulting texture map looks smooth, but every point in the texture is reused on each side of the torus, so the mapping is not bijective.

I discovered that this caused by the Gaussian likelihood loss on the decoder. It is possible for the model to reduce this loss arbitrarily by shrinking the standard deviations of the x and y axes, so there is little incentive to actually capture every axis accurately.

To achieve better results, we can drop the Gaussian likelihood loss and instead use pure MSE for the decoder. This isn't very well-principled, and we now have to select a reasonable coefficient for the KL term of the VAE to balance the reconstruction accuracy with the quality of the latent distribution. I found good hyperparameters for the torus, but these will likely require tuning for other models.

With the better reconstruction loss function, sampling the VAE gives the expected point cloud:

The surface of a torus, point cloud

The mappings we get don't necessarily seem angle-preserving, though:

A tiled grid mapped onto a torus

To preserve angles, we can add an orthogonality bonus to the loss. When we try to make the map preserve angles, we might make it less area preserving, as can be seen here:

A tiled grid mapped onto a torus which attempts to preserve angles

Also note from the last two images that there are seams along which the texture looks totally messed up. This is because the surface cannot be flattened to a plane without some cuts, along which the VAE encoder has to "jump" from one point on the 2D plane to another. This was one of my predicted shortcomings of the method.

Running

First, install the package with

pip install -e .

Training

My initial VAE experiments were run like so, via scripts/train_vae.py:

python scripts/train_vae.py --ortho-coeff 0.002 --num-iters 20000 models/torus.stl

This will save a model checkpoint to vae.pkl after 20000 iterations, which only takes a minute or two on a laptop CPU.

The above will train a VAE with Gaussian reconstruction loss, which may not learn a good bijective map (as shown above). To instead use the MSE decoder loss, try:

python scripts/train_vae.py --recon-loss-fn mse --kl-coeff 0.001 --batch-size 1024 --num-iters 20000 models/torus.stl

I also found a better orthogonality loss function. To get reasonable mappings that attempt to preserve angles, add --ortho-coeff 0.01 --ortho-loss-fn rel.

Using the VAE

Once you have trained a VAE, you can export a 3D model with the resulting texture mapping like so:

python scripts/map_vae.py models/torus.stl outputs/mapped_output.obj

Note that the resulting .obj file references a material.mtl file which should be in the same directory. I already include such a file with a checkerboard texture in outputs/material.mtl.

You can also sample a point cloud from the VAE using point_cloud_gen.py:

python scripts/point_cloud_gen.py outputs/point_cloud.obj

Finally, you can produce a texture image such that the pixel at point (x, y) is an RGB-encoded, normalized (x, y, z) coordinate from decoder(x, y).

python scripts/inv_map_vae.py models/torus.stl outputs/rgb_texture.png
Owner
Alex Nichol
Web developer, math geek, and AI enthusiast.
Alex Nichol
ICCV2021 Expert-Goal Trajectory Prediction

ICCV 2021: Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples This repository contains the code for the paper Where are yo

hz 21 Dec 12, 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai

Amazon Web Services - Labs 123 Dec 23, 2022
PyTorch implementation of paper A Fast Knowledge Distillation Framework for Visual Recognition.

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

CoGAIL Table of Content Overview Installation Dataset Training Evaluation Trained Checkpoints Acknowledgement Citations License Overview This reposito

Jeremy Wang 29 Dec 24, 2022
Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

Cheng Zhang 149 Jan 08, 2023
PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

1-bit Wide ResNet PyTorch implementation of training 1-bit Wide ResNets from this paper: Training wide residual networks for deployment using a single

Sergey Zagoruyko 122 Dec 07, 2022
Scalable Graph Neural Networks for Heterogeneous Graphs

Neighbor Averaging over Relation Subgraphs (NARS) NARS is an algorithm for node classification on heterogeneous graphs, based on scalable neighbor ave

Facebook Research 67 Dec 03, 2022
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"

GINC small-scale in-context learning dataset GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context

P-Lambda 29 Dec 19, 2022
UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down. UpChecker - just run file and use project easy

UpChecker UpChecker is a simple opensource project to host it fast on your server and check is server up, view statistic, get messages if it is down.

Yan 4 Apr 07, 2022
Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (CVAMD)

Is it Time to Replace CNNs with Transformers for Medical Images? Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (C

Christos Matsoukas 80 Dec 27, 2022
[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

256 Dec 28, 2022
Embodied Intelligence via Learning and Evolution

Embodied Intelligence via Learning and Evolution This is the code for the paper Embodied Intelligence via Learning and Evolution Agrim Gupta, Silvio S

Agrim Gupta 111 Dec 13, 2022
Simple renderer for use with MuJoCo (>=2.1.2) Python Bindings.

Viewer for MuJoCo in Python Interactive renderer to use with the official Python bindings for MuJoCo. Starting with version 2.1.2, MuJoCo comes with n

Rohan P. Singh 62 Dec 30, 2022
tensorflow code for inverse face rendering

InverseFaceRender This is tensorflow code for our project: Learning Inverse Rendering of Faces from Real-world Videos. (https://arxiv.org/abs/2003.120

Yuda Qiu 18 Nov 16, 2022
VQMIVC - Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion (Interspeech

Disong Wang 262 Dec 31, 2022
MiraiML: asynchronous, autonomous and continuous Machine Learning in Python

MiraiML Mirai: future in japanese. MiraiML is an asynchronous engine for continuous & autonomous machine learning, built for real-time usage. Usage In

Arthur Paulino 25 Jul 27, 2022
Learning Tracking Representations via Dual-Branch Fully Transformer Networks

Learning Tracking Representations via Dual-Branch Fully Transformer Networks DualTFR ⭐ We achieves the runner-ups for both VOT2021ST (short-term) and

phiphi 19 May 04, 2022
On the model-based stochastic value gradient for continuous reinforcement learning

On the model-based stochastic value gradient for continuous reinforcement learning This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, a

Facebook Research 46 Dec 15, 2022
Implementation of Pix2Seq in PyTorch

pix2seq-pytorch Implementation of Pix2Seq paper Different from the paper image input size 1280 bin size 1280 LambdaLR scheduler used instead of Linear

Tony Shin 9 Dec 15, 2022
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

Google Research 2.1k Dec 28, 2022