Finding project directories in Python (data science) projects, just like there R rprojroot and here packages

Overview

Find relative paths from a project root directory

Finding project directories in Python (data science) projects, just like there R here and rprojroot packages.

Problem: I have a project that has a specific folder structure, for example, one mentioned in Noble 2009 or something similar to this project template, and I want to be able to:

  1. Run my python scripts without having to specify a series of ../ to get to the data folder.
  2. cd into the directory of my python script instead of calling it from the root project directory and specify all the folders to the script.
  3. Reference datasets from a root directory when using a jupyter notebook because everytime I use a jupyter notebook, the working directory changes to the location of the notebook, not where I launched the notebook server.

Solution: pyprojroot finds the root working directory for your project as a pathlib object. You can now use the here function to pass in a relative path from the project root directory (no matter what working directory you are in the project), and you will get a full path to the specified file. That is, in a jupyter notebook, you can write something like pandas.read_csv(here('./data/my_data.csv')) instead of pandas.read_csv('../data/my_data.csv'). This allows you to restructure the files in your project without having to worry about changing file paths.

Great for reading and writing datasets!

Installation

pip

pip install pyprojroot

conda

https://anaconda.org/conda-forge/pyprojroot

conda install -c conda-forge pyprojroot 

Usage

from pyprojroot import here

here()

Example

Load the packages

In [1]: from pyprojroot import here
In [2]: import pandas as pd

The current working directory is the "notebooks" folder

In [3]: !pwd
/home/dchen/git/hub/scipy-2019-pandas/notebooks

In the notebooks folder, I have all my notebooks

In [4]: !ls
01-intro.ipynb  02-tidy.ipynb  03-apply.ipynb  04-plots.ipynb  05-model.ipynb  Untitled.ipynb

If I wanted to access data in my notebooks I'd have to use ../data

In [5]: !ls ../data
billboard.csv  country_timeseries.csv  gapminder.tsv  pew.csv  table1.csv  table2.csv  table3.csv  table4a.csv  table4b.csv  weather.csv

However, with there here function, I can access my data all from the project root. This means if I move the notebook to another folder or subfolder I don't have to change the path to my data. Only if I move the data to another folder would I need to change the path in my notebook (or script)

In [6]: pd.read_csv(here('./data/gapminder.tsv'), sep='\t').head()
Out[6]:
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106

By the way, you get a pathlib object path back!

In [7]: here('./data/gapminder.tsv')
Out[7]: PosixPath('/home/dchen/git/hub/scipy-2019-pandas/data/gapminder.tsv')
Owner
Daniel Chen
bow ties are cool
Daniel Chen
Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Table of contents Introduction Dataset Model & Metrics How to Run Quickstart Install Training Evaluation Detection DATA COMPETITION The COVID-19 pande

Thanh Dat Vu 1 Feb 27, 2022
sportsdataverse python package

sportsdataverse-py See CHANGELOG.md for details. The goal of sportsdataverse-py is to provide the community with a python package for working with spo

Saiem Gilani 37 Dec 27, 2022
Analyze the Gravitational wave data stored at LIGO/VIRGO observatories

Gravitational-Wave-Analysis This project showcases how to analyze the Gravitational wave data stored at LIGO/VIRGO observatories, using Python program

1 Jan 23, 2022
4CAT: Capture and Analysis Toolkit

4CAT: Capture and Analysis Toolkit 4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to m

Digital Methods Initiative 147 Dec 20, 2022
Using Data Science with Machine Learning techniques (ETL pipeline and ML pipeline) to classify received messages after disasters.

Using Data Science with Machine Learning techniques (ETL pipeline and ML pipeline) to classify received messages after disasters.

1 Feb 11, 2022
Extract data from a wide range of Internet sources into a pandas DataFrame.

pandas-datareader Up to date remote data access for pandas, works for multiple versions of pandas. Installation Install using pip pip install pandas-d

Python for Data 2.5k Jan 09, 2023
This repo contains a simple but effective tool made using python which can be used for quality control in statistical approach.

📈 Statistical Quality Control 📉 This repo contains a simple but effective tool made using python which can be used for quality control in statistica

SasiVatsal 8 Oct 18, 2022
vartests is a Python library to perform some statistic tests to evaluate Value at Risk (VaR) Models

gg I wasn't satisfied with any of the other available Gemini clients, so I wrote my own. Requires Python 3.9 (maybe older, I haven't checked) and opti

RAFAEL RODRIGUES 5 Jan 03, 2023
A collection of learning outcomes data analysis using Python and SQL, from DQLab.

Data Analyst with PYTHON Data Analyst berperan dalam menghasilkan analisa data serta mempresentasikan insight untuk membantu proses pengambilan keputu

6 Oct 11, 2022
Analyse the limit order book in seconds. Zoom to tick level or get yourself an overview of the trading day.

Analyse the limit order book in seconds. Zoom to tick level or get yourself an overview of the trading day. Correlate the market activity with the Apple Keynote presentations.

2 Jan 04, 2022
Udacity-api-reporting-pipeline - Udacity api reporting pipeline

udacity-api-reporting-pipeline In this exercise, you'll use portions of each of

Fabio Barbazza 1 Feb 15, 2022
Includes all files needed to satisfy hw02 requirements

HW 02 Data Sets Mean Scale Score for Asian and Hispanic Students, Grades 3 - 8 This dataset provides insights into the New York City education system

7 Oct 28, 2021
Hg002-qc-snakemake - HG002 QC Snakemake

HG002 QC Snakemake To Run Resources and data specified within snakefile (hg002QC

Juniper A. Lake 2 Feb 16, 2022
BasstatPL is a package for performing different tabulations and calculations for descriptive statistics.

BasstatPL is a package for performing different tabulations and calculations for descriptive statistics. It provides: Frequency table constr

Angel Chavez 1 Oct 31, 2021
[CVPR2022] This repository contains code for the paper "Nested Collaborative Learning for Long-Tailed Visual Recognition", published at CVPR 2022

Nested Collaborative Learning for Long-Tailed Visual Recognition This repository is the official PyTorch implementation of the paper in CVPR 2022: Nes

Jun Li 65 Dec 09, 2022
InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family.

CRISPRanalysis InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family. In this work, we present a workflow

2 Jan 31, 2022
PyPDC is a Python package for calculating asymptotic Partial Directed Coherence estimations for brain connectivity analysis.

Python asymptotic Partial Directed Coherence and Directed Coherence estimation package for brain connectivity analysis. Free software: MIT license Doc

Heitor Baldo 3 Nov 26, 2022
Pip install minimal-pandas-api-for-polars

Minimal Pandas API for Polars Install From PyPI: pip install minimal-pandas-api-for-polars Example Usage (see tests/test_minimal_pandas_api_for_polars

Austin Ray 6 Oct 16, 2022
TE-dependent analysis (tedana) is a Python library for denoising multi-echo functional magnetic resonance imaging (fMRI) data

tedana: TE Dependent ANAlysis TE-dependent analysis (tedana) is a Python library for denoising multi-echo functional magnetic resonance imaging (fMRI)

136 Dec 22, 2022
BErt-like Neurophysiological Data Representation

BENDR BErt-like Neurophysiological Data Representation This repository contains the source code for reproducing, or extending the BERT-like self-super

114 Dec 23, 2022