🧮 Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All

Overview

LDA4Rec

Project generated with PyScaffold

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model After All" by Florian Wilhelm. The preprint can be found here along with the following statement:

"© Florian Wilhelm 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in RecSys '21: Fifteenth ACM Conference on Recommender Systems Proceedings, https://doi.org/10.1145/3460231.3474266."

Installation

In order to set up the necessary environment:

  1. review and uncomment what you need in environment.yml and create an environment lda4rec with the help of conda:
    conda env create -f environment.yml
    
  2. activate the new environment with:
    conda activate lda4rec
    
  3. (optionally) get a free neptune.ai account for experiment tracking and save the API token under ~/.neptune_api_token (default).
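Before launching a long run, it can help to confirm that the token file from step 3 is in place. The following is a minimal sketch, not the way lda4rec itself loads the token; the helper name `read_neptune_token` is hypothetical, and only the default path `~/.neptune_api_token` comes from the step above.

```python
from pathlib import Path
from typing import Optional

# Default location named in the installation steps above.
DEFAULT_TOKEN_PATH = Path("~/.neptune_api_token")


def read_neptune_token(path: Path = DEFAULT_TOKEN_PATH) -> Optional[str]:
    """Return the stored token, or None if the file is missing or empty.

    Hypothetical helper for illustration; lda4rec may read the token differently.
    """
    try:
        token = path.expanduser().read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return token or None


if __name__ == "__main__":
    # Report only whether a token exists; never print the token itself.
    found = read_neptune_token() is not None
    print("neptune token found" if found else "no neptune token configured")
```

Stripping whitespace guards against a trailing newline that editors often add when the token is pasted into the file.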

Running Experiments

First check out and adapt the default experiment config configs/default.yaml and run it with:

lda4rec -c configs/default.yaml run

A config like configs/default.yaml can also be used as a template to create an experiment set with:

lda4rec -c configs/default.yaml create -ds movielens-100k

using the Movielens-100k dataset. Check out cli.py for more details.
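The general idea of turning one template config into an experiment set can be sketched as a grid expansion. This is only an illustration of the concept and not the logic in cli.py: the function `expand_template` and the keys `dataset`, `seed`, `model` and `embedding_dim` are hypothetical and may not match the fields in configs/default.yaml.

```python
import copy
import itertools
from typing import Any, Dict, Iterator, List


def expand_template(
    template: Dict[str, Any], grid: Dict[str, List[Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield one config per combination of the values in `grid`.

    Each config is a deep copy of `template` with the grid values filled in,
    so the template itself is never modified.
    """
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        config = copy.deepcopy(template)
        config.update(zip(keys, values))
        yield config


# Hypothetical template and parameter grid, for illustration only.
template = {"dataset": "movielens-100k", "seed": 0}
grid = {"model": ["mf", "lda"], "embedding_dim": [4, 8]}

experiments = list(expand_template(template, grid))
print(len(experiments))  # 2 models x 2 embedding sizes = 4 configs
```

Each resulting dictionary would correspond to one experiment config file; how lda4rec names and serializes those files is defined in cli.py.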

Cloud Setup

Commands for setting up an Ubuntu 20.10 VM with at least 20 GiB of disk space, e.g. on a GCP c2-standard-30 instance:

tmux
sudo apt-get install -y build-essential
curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
cargo install pueue
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
git clone https://github.com/FlorianWilhelm/lda4rec.git
cd lda4rec
conda env create -f environment.yml
conda activate lda4rec
vim ~/.neptune_api_token # and copy it over

Then create all experiments and run them with pueue, which gives full control over parallelism:

pueued -d # only once to start the daemon
pueue parallel 10
export OMP_NUM_THREADS=4  # to limit the number of threads per model
lda4rec -c configs/default.yaml create # to create the config files
find ./configs -maxdepth 1 -name "exp_*.yaml" -exec pueue add "lda4rec -c {} run" \; -exec sleep 30 \;

Remark: -exec sleep 30 avoids a race condition when reading datasets if parallelism is too high.
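The same staggered submission can also be scripted in Python, which makes it easier to preview the queue before anything is submitted. This is a sketch under the assumption that the generated configs are named `exp_*.yaml` and sit directly in `./configs`, as in the `find` command above; the function names and the `dry_run` switch are hypothetical and not part of lda4rec or pueue.

```python
import shlex
import subprocess
import time
from pathlib import Path
from typing import List


def build_queue_commands(config_dir: Path) -> List[List[str]]:
    """Build one `pueue add` call per exp_*.yaml file directly in config_dir."""
    configs = sorted(p for p in config_dir.glob("exp_*.yaml") if p.is_file())
    return [["pueue", "add", f"lda4rec -c {cfg} run"] for cfg in configs]


def enqueue(
    config_dir: Path, delay_s: float = 30.0, dry_run: bool = True
) -> List[List[str]]:
    """Submit the commands one by one, pausing delay_s seconds after each.

    With dry_run=True the commands are only printed, nothing is submitted.
    """
    commands = build_queue_commands(config_dir)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
            time.sleep(delay_s)  # same purpose as `-exec sleep 30`
    return commands


if __name__ == "__main__":
    # Preview only; pass dry_run=False to actually add tasks to pueue.
    enqueue(Path("./configs"), dry_run=True)
```

Keeping the dry run as the default means a mistyped directory or an unexpected set of config files shows up in the printed list before any task reaches the queue.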

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:
    conda env export -n lda4rec -f environment.lock.yml
    For multi-OS development, consider using --no-builds during the export.
  3. Update your current environment with respect to a new environment.lock.yml using:
    conda env update -f environment.lock.yml --prune

Project Organization

├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data                    <- Downloaded datasets will be stored here.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── logs                    <- Generated logs are collected here.
├── results                 <- Results as exported from neptune.ai.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- Use `python setup.py develop` to install for development or
│                              create a distribution with `python setup.py bdist_wheel`.
├── src
│   └── lda4rec             <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.

How to Cite

Please cite LDA4Rec if it helps your research. You can use the following BibTeX entry:

@inproceedings{wilhelm2021lda4rec,
author = {Wilhelm, Florian},
title = {Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All},
year = {2021},
month = sep,
isbn = {978-1-4503-8458-2/21/09},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3460231.3474266},
doi = {10.1145/3460231.3474266},
booktitle = {Fifteenth ACM Conference on Recommender Systems},
numpages = {8},
location = {Amsterdam, Netherlands},
series = {RecSys '21}
}

License

This source code is licensed under AGPL-3-only. If you require a more permissive license, e.g. for commercial reasons, contact me to obtain a license for your business.

Acknowledgement

Special thanks go to Du Phan and Fritz Obermeyer from the (Num)Pyro project for their kind help and helpful comments on my code.

Note

This project has been set up using PyScaffold 4.0 and the dsproject extension 0.6. Some source code was taken from Spotlight (MIT-licensed) by Maciej Kula as well as lrann (MIT-licensed) by Florian Wilhelm and Marcel Kurovski.
