Rendering Point Clouds with Compute Shaders

Overview

Compute Shader Based Point Cloud Rendering

This repository contains the source code to our techreport:
Rendering Point Clouds with Compute Shaders and Vertex Order Optimization
Markus Schütz, Bernhard Kerbl, Michael Wimmer. (not peer-reviewed, currently in submission)

  • Compute shaders can render point clouds up to an order of magnitude faster than GL_POINTS.
  • With a combination of warp-wide deduplication and early-z, compute shaders able to render 796 million points (12.7GB) at stable 62 to 64 frames per second in various different viewpoints on an RTX 3090. This corresponds to a memory bandwidth utilization of about 802GB/s, or a throughput of about 50 billion points per second.
  • The vertex order also strongly affects the performance. Some locality of points that are consecutive in memory is beneficial, but excessive locality can result in drastic slowdowns if it leads to thousands of GPU threads attempting to update a single pixel. As such, neither Morton ordered nor shuffled buffers are optimal. However combining both by first sorting by Morton code, and then shuffling batches of 128 points but leaving points within a batch in order, results in an improved ordering that ensures high performance with our compute approaches, and it also increases the performance of GL_POINTS by up to 5 times.

About the Framework

This framework is written in C++ and JavaScript (using V8). Most of the rendering is done in JavaScript with bindings to OpenGL 4.5 functions. It is written with live-coding in mind, so many javascript files are immediately executed at runtime as soon as they are saved by any text editor. As such, code has to be written with repeated execution in mind.

Getting Started

  • Compile Skye.sln project with Visual Studio.
  • Open the workspace in vscode.
  • Open "load_pointcloud.js" (quick search files via ctrl + e).
    • Adapt the path to the correct location of the las file.
    • Adapt position and lookAt to a viewpoint that fits your point cloud.
    • Change window.x to something that fits your monitor setup, e.g., 0 if you've got a single monitor, or 2540 if you've got two monitors and your first one has a with of 2540 pixels.
  • Press "Ctrl + Shift + B" to start the app. You should be seing an empty green window. (window.x is not yet applied)
  • Once you save "load_pointcloud.js" via ctrl+s, it will be executed, the window will be repositioned, and the point cloud will be loaded.
  • You can change position and lookAt at runtime and apply them by simply saving load_pointcloud.js again. The pointcloud will not be loaded again - to do so, you'll need to restart first.

After loading the point cloud, you should be seeing something like the screenshot below. The framework includes an IMGUI window with frame times, and another window that lets you switch between various rendering methods. Best try with data sets with tens of millions or hundreds of millions of points!

sd

Code Sections

Code for the individual rendering methods is primarily found in the modules/compute_<methods> folders.

Method Location
atomicMin ./modules/compute
reduce ./modules/compute_ballot
early-z ./modules/compute_earlyDepth
reduce & early-z ./modules/compute_ballot_earlyDepth
dedup ./modules/compute_ballot_earlyDepth_dedup
HQS ./modules/compute_hqs
HQS1R ./modules/compute_hqs_1x64bit_fast
busy-loop ./modules/compute_guenther
just-set ./modules/compute_just_set

Results

Frame times when rendering 796 million points on an RTX 3090 in a close-up viewpoint. Times in milliseconds, lower is better. The compute methods reduce (with early-z) and dedup (with early-z) yield the best results with Morton order (<16.6ms, >60fps). The shuffled Morton order greatly improves performance of GL_POINTS and some compute methods, and it is usually either the fastest or within close margins of the fastest combinations of rendering method and ordering.

Not depicted is that the dedup method is the most stable approach that continuously maintains >60fps in all viewpoints, while the performance of the reduce method varies and may drop to 50fps in some viewpoints. As such, we would recomend to use dedup in conjunction with Morton order if the necessary compute features are available, and reduce (with early-z) for wider support.

Comments
  • Can provide some dataset to test the demo (like default Candi Banyunibo data set)?

    Can provide some dataset to test the demo (like default Candi Banyunibo data set)?

    Hi, just found this paper & project via graphics weekly news.. just compiled the demo, and seems uses that dataset by default (banyunibo_inside_morton).. anyway to obtain that dataset? if not can you provide some download link to some huge & equivalent data set used by the demo like retz,eclepens,etc.. are this datasets under non "open" licenses? just wanted to test performance on my Titan V compared to a 3090.. thanks..

    opened by oscarbg 11
  • invisible window

    invisible window

    If I compile in DEBUG (VS2019) it crashes in void V8Helper::setupGL in this line: setupV8GLExtBindings(tpl);

    If I compile in RELEASE it compiles and the console shows no error but the windows is invisible, I can only see the console (and if I click keys I see them in the log).

    I have a 1070

    opened by jagenjo 3
  • A question of render.cs

    A question of render.cs

    image in render.cs there's vec2 variable called imgPos, i don't know why it times 0.5 and plus 0.5, what does it mean? Thank you very much if you can answer my questions. : )

    opened by UMR19 2
  • Questions about point cloud display when zoom in

    Questions about point cloud display when zoom in

    Hi, Thanks for your excellent job, just i compiled the demo and modified the setting to load my point cloud, it loaded successfully and have a better performance. but when i roll the mouse wheel to zoom in to look at the detail, lots of point missed, but when i use potree to display my point cloud, it display perfect. I am a beginner in computer graphics,may be it's point size is too small? image In the potree image

    Thanks for you reply:)

    opened by UMR19 0
  • ssRG and ssBA not bound to resolve?

    ssRG and ssBA not bound to resolve?

    In https://github.com/m-schuetz/compute_rasterizer/blob/master/compute_hqs/render.js

    Both buffers are bound in the render attribute pass, but neither is explicitly bound in the resolve pass. Always assumed that bindBufferBase affects the currently bound shader, but apparently the binding caries over to the next shader? Just a reminder to look this up to clarify my understanding of how they work.

    gl.bindBufferBase(gl.SHADER_STORAGE_BUFFER, 3, ssRG);
    gl.bindBufferBase(gl.SHADER_STORAGE_BUFFER, 4, ssBA);
    
    opened by m-schuetz 0
Releases(build_laz_crf)
Owner
Markus Schütz
Markus Schütz
Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data

federated is the source code for the Bachelor's Thesis Privacy-Preserving Federated Learning Applied to Decentralized Data (Spring 2021, NTNU) Federat

Dilawar Mahmood 25 Nov 30, 2022
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Transparency-by-Design networks (TbD-nets) This repository contains code for replicating the experiments and visualizations from the paper Transparenc

David Mascharka 351 Nov 18, 2022
Official code of Team Yao at Multi-Modal-Fact-Verification-2022

Official code of Team Yao at Multi-Modal-Fact-Verification-2022 A Multi-Modal Fact Verification dataset released as part of the De-Factify workshop in

Wei-Yao Wang 11 Nov 15, 2022
Speed-Test - You can check your intenet speed using this tool

Speed-Test Tool By Hez_X AVAILABLE ON : Termux & Kali linux & Ubuntu (Linux E

Hez-X 3 Feb 17, 2022
An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.

customer_segmentation_with_rfm Business Problem : An e-commerce company wants to

Buse Yıldırım 3 Jan 06, 2022
Portfolio Optimization and Quantitative Strategic Asset Allocation in Python

Riskfolio-Lib Quantitative Strategic Asset Allocation, Easy for Everyone. Description Riskfolio-Lib is a library for making quantitative strategic ass

Riskfolio 1.7k Jan 07, 2023
Code for the paper Hybrid Spectrogram and Waveform Source Separation

Demucs Music Source Separation This is the 3rd release of Demucs (v3), featuring hybrid source separation. For the waveform only Demucs (v2): Go this

Meta Research 4.8k Jan 04, 2023
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems

Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems This is our experimental code for RecSys 2021 paper "Learning

11 Jul 28, 2022
Unity Propagation in Bayesian Networks Handling Inconsistency via Unity Smoothing

This repository contains the scripts needed to generate the results from the paper Unity Propagation in Bayesian Networks Handling Inconsistency via U

0 Jan 19, 2022
This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks

NNProject - DeepMask This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks. Th

189 Nov 16, 2022
Official tensorflow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”

Tensorflow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”.

3.7k Dec 31, 2022
The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'

Deep Residual Fourier Transformation for Single Image Deblurring Xintian Mao, Yiming Liu, Wei Shen, Qingli Li and Yan Wang code will be released soon

145 Dec 13, 2022
Official implementation of "Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation" (RSS 2022)

Intro Official implementation of "Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation" Robotics:Science and

Yunho Kim 21 Dec 07, 2022
A novel pipeline framework for multi-hop complex KGQA task. About the paper title: Improving Multi-hop Embedded Knowledge Graph Question Answering by Introducing Relational Chain Reasoning

Rce-KGQA A novel pipeline framework for multi-hop complex KGQA task. This framework mainly contains two modules, answering_filtering_module and relati

金伟强 -上海大学人工智能小渣渣~ 16 Nov 18, 2022
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022
A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising (CVPR 2020 Oral & TPAMI 2021)

ELD The implementation of CVPR 2020 (Oral) paper "A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising" and its journal (TPAMI) v

Kaixuan Wei 359 Jan 01, 2023
Codes for "Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier"

Deep-RTC [project page] This repository contains the source code accompanying our ECCV 2020 paper. Solving Long-tailed Recognition with Deep Realistic

Gina Wu 16 May 26, 2022
Western-3DSlicer-Modules - Point-Set Registrations for Ultrasound Probe Calibrations

Point-Set Registrations for Ultrasound Probe Calibrations -Undergraduate Thesis-

Matteo Tanzi 0 May 04, 2022
Official PyTorch implementation of RIO

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection Figure 1: Our proposed Resampling at image-level and obect-

NVIDIA Research Projects 17 May 20, 2022
[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

256 Dec 28, 2022