Unofficial implementation (replicates paper results!) of MINER: Multiscale Implicit Neural Representations in pytorch-lightning

Overview

MINER_pl

Unofficial implementation of MINER: Multiscale Implicit Neural Representations in pytorch-lightning.

image

📖 Ref readings

⚠️ Main differences w.r.t. the original paper before continue:

  • In the pseudo code on page 8, where the author states Weight sharing for images, it means finer level networks are initialized with coarser level network weights. However, I did not find the correct way to implement this. Therefore, I initialize the network weights from scratch for all levels.
  • The paper says it uses sinusoidal activation (does he mean SIREN? I don't know), but I use gaussian activation (in hidden layers) with trainable parameters (per block) like my experiments in the other repo. In finer levels where the model predicts laplacian pyramids, I use sinusoidal activation x |-> sin(ax) with trainable parameters a (per block) as output layer (btw, this performs significantly better than simple tanh). Moreover, I precompute the maximum amplitude for laplacian residuals, and use it to scale the output, and I find it to be better than without scaling.
  • I experimented with a common trick in coordinate mlp: positional encoding and find that using it can increase training time/accuracy with the same number of parameters (by reducing 1 layer). This can be turned on/off by specifying the argument --use_pe. The optimal number of frequencies depends on the patch size, the larger patch sizes, the more number of frequencies you need and vice versa.
  • Some difference in the hyperparameters: the default learning rate is 3e-2 instead of 5e-4. Optimizer is RAdam instead of Adam. Block pruning happens when the loss is lower than 1e-4 (i.e. when PSNR>=40) for image and 5e-3 for occupancy rather than 2e-7.

💻 Installation

  • Run pip install -r requirements.txt.
  • Download the images from Acknowledgement or prepare your own images into a folder called images.
  • Download the meshes from Acknowledgement or prepare your own meshes into a folder called meshes.

🔑 Training

image

Pluto example:

python train.py \
    --task image --path images/pluto.png \
    --input_size 4096 4096 --patch_size 32 32 --batch_size 256 --n_scales 4 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 200 \
    --exp_name pluto4k_4scale

Tokyo station example:

python train.py \
    --task image --path images/tokyo-station.jpg \
    --input_size 6000 4000 --patch_size 25 25 --batch_size 192 --n_scales 5 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 50 150 \
    --exp_name tokyo6k_5scale
Image (size) Train time (s) GPU mem (MiB) #Params (M) PSNR
Pluto (4096x4096) 53 3171 9.16 42.14
Pluto (8192x8192) 106 6099 28.05 45.09
Tokyo station (6000x4000) 68 6819 35.4 42.48
Shibuya (7168x2560) 101 8967 17.73 37.78
Shibuya (14336x5120) 372 8847 75.42 39.32
Shibuya (28672x10240) 890 10255 277.37 41.93
Shibuya (28672x10240)* 1244 6277 98.7 37.59

*paper settings (6 scales, each network has 4 layer with 9 hidden units)

The original image will be resized to img_wh for reconstruction. You need to make sure img_wh divided by 2^(n_scales-1) (the resolution at the coarsest level) is still a multiple of patch_wh.


mesh

First, convert the mesh to N^3 occupancy grid by

python preprocess_mesh.py --N 512 --M 1 --T 1 --path <path/to/mesh> 

This will create N^3 occupancy to be regressed by the neural network. For detailed options, please see preprocess_mesh.py. Typically, increase M or T if you find the resulting occupancy bad.

Next, start training (bunny example):

python train.py \
    --task mesh --path occupancy/bunny_512.npy \
    --input_size 512 --patch_size 16 --batch_size 512 --n_scales 4 \
    --use_pe --n_freq 5 --n_layers 2 --n_hidden 8 \
    --loss_thr 5e-3 --b_chunks 512 \
    --num_epochs 50 50 50 150 \
    --exp_name bunny512_4scale

For full options, please see here. Some important options:

  • If your GPU memory is not enough, try reducing batch_size.
  • By default it will not log intermediate images to tensorboard to save time. To visualize image reconstruction and active blocks, add --log_image argument.

You are recommended to monitor the training progress by

tensorboard --logdir logs

where you can see training curves and images.

🟥 🟩 🟦 Block decomposition

To reconstruct the image using trained model and to visualize block decomposition per scale like Fig. 4 in the paper, see image_test.ipynb or mesh_test.ipynb

Examples:

💡 Implementation tricks

  • Setting num_workers=0 in dataloader increased the speed a lot.
  • As suggested in training details on page 4, I implement parallel block inference by defining parameters of shape (n_blocks, n_in, n_out) and use @ operator (same as torch.bmm) for faster inference.
  • To perform block pruning efficiently, I create two copies of the same network, and continually train and prune one of them while copying the trained parameters to the target network (somehow like in reinforcement learning, e.g. DDPG). This allows the network as well as the optimizer to shrink, therefore achieve higher memory and speed performance.
  • In validation, I perform inference in chunks like NeRF, and pass each chunk to cpu to reduce GPU memory usage.

💝 Acknowledgement

Further readings

During a stream, my audience suggested me to test on this image with random pixels:

random

The default 32x32 patch size doesn't work well, since the texture varies too quickly inside a patch. Decreasing to 16x16 and increasing network hidden units make the network converge right away to 43.91 dB under a minute. Surprisingly, with the other image reconstruction SOTA instant-ngp, the network is stuck at 17 dB no matter how long I train.

ngp-random

Is this a possible weakness of instant-ngp? What effect could it bring to real application? You are welcome to test other methods to reconstruct this image!

Owner
AI葵
AI R&D in computer vision. Doing VTuber about DL algorithms. Check my channel! If you find my works helpful, please consider sponsoring! 我有在做VTuber,歡迎訂閱我的頻道!
AI葵
CausaLM: Causal Model Explanation Through Counterfactual Language Models

CausaLM: Causal Model Explanation Through Counterfactual Language Models Authors: Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart Abstract: Understan

Amir Feder 39 Jul 10, 2022
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,

GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan 70 Dec 18, 2022
A copy of Ares that costs 30 fucking dollars.

Finalement, j'ai décidé d'abandonner cette idée, je me suis comporté comme un enfant qui été en colère. Comme m'ont dit certaines personnes j'ai des c

Bleu 24 Apr 14, 2022
Code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction

Official PyTorch code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction. Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe,

stanley 152 Dec 16, 2022
🕺Full body detection and tracking

Pose-Detection 🤔 Overview Human pose estimation from video plays a critical role in various applications such as quantifying physical exercises, sign

Abbas Ataei 20 Nov 21, 2022
Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

CoProtector Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

Zhensu Sun 1 Oct 26, 2021
Generates all variables from your .tf files into a variables.tf file.

tfvg Generates all variables from your .tf files into a variables.tf file. It searches for every var.variable_name in your .tf files and generates a v

1 Dec 01, 2022
Convenient tool for speeding up the intern/officer review process.

icpc-app-screen Convenient tool for speeding up the intern/officer applicant review process. Eliminates the pain from reading application responses of

1 Oct 30, 2021
EdMIPS: Rethinking Differentiable Search for Mixed-Precision Neural Networks

EdMIPS is an efficient algorithm to search the optimal mixed-precision neural network directly without proxy task on ImageNet given computation budgets. It can be applied to many popular network arch

Zhaowei Cai 47 Dec 30, 2022
Code for the paper Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations (AKBC 2021).

Relation Prediction as an Auxiliary Training Objective for Knowledge Base Completion This repo provides the code for the paper Relation Prediction as

Facebook Research 85 Jan 02, 2023
DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGÉ: Answering Questions in Por

0 Dec 10, 2021
Continuous Diffusion Graph Neural Network

We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE.

Twitter Research 227 Jan 05, 2023
Semantic Segmentation with Pytorch-Lightning

This is a simple demo for performing semantic segmentation on the Kitti dataset using Pytorch-Lightning and optimizing the neural network by monitoring and comparing runs with Weights & Biases.

Boris Dayma 58 Nov 18, 2022
1st ranked 'driver careless behavior detection' for AI Online Competition 2021, hosted by MSIT Korea.

2021AICompetition-03 본 repo 는 mAy-I Inc. 팀으로 참가한 2021 인공지능 온라인 경진대회 중 [이미지] 운전 사고 예방을 위한 운전자 부주의 행동 검출 모델] 태스크 수행을 위한 레포지토리입니다. mAy-I 는 과학기술정보통신부가 주최하

Junhyuk Park 9 Dec 01, 2022
Greedy Gaussian Segmentation

GGS Greedy Gaussian Segmentation (GGS) is a Python solver for efficiently segmenting multivariate time series data. For implementation details, please

Stanford University Convex Optimization Group 72 Dec 07, 2022
Generative code template for PixelBeasts 10k NFT project.

generator-template Generative code template for combining transparent png attributes into 10,000 unique images. Used for the PixelBeasts 10k NFT proje

Yohei Nakajima 9 Aug 24, 2022
HybridNets: End-to-End Perception Network

HybridNets: End2End Perception Network HybridNets Network Architecture. HybridNets: End-to-End Perception Network by Dat Vu, Bao Ngo, Hung Phan 📧 FPT

Thanh Dat Vu 370 Dec 29, 2022
deep learning for image processing including classification and object-detection etc.

深度学习在图像处理中的应用教程 前言 本教程是对本人研究生期间的研究内容进行整理总结,总结的同时也希望能够帮助更多的小伙伴。后期如果有学习到新的知识也会与大家一起分享。 本教程会以视频的方式进行分享,教学流程如下: 1)介绍网络的结构与创新点 2)使用Pytorch进行网络的搭建与训练 3)使用Te

WuZhe 13.6k Jan 04, 2023
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID,

Intermediate Domain Module (IDM) This repository is the official implementation for IDM: An Intermediate Domain Module for Domain Adaptive Person Re-I

Yongxing Dai 87 Nov 22, 2022
This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation) Usage example python dynamic_inverted_softmax.py --sims_train

36 Dec 29, 2022