Toolchest provides APIs for scientific and bioinformatic data analysis.

Overview

Toolchest Python Client

Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of running tools on your own resources by running the same jobs on secure, powerful remote servers.

This package contains the Python client for using Toolchest. For the R client, see here.

Installation

The Toolchest client is available on PyPI:

pip install toolchest-client

Usage

Using a tool in Toolchest is as simple as:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY")
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)

For a list of available tools, see the documentation.

Configuration

To use Toolchest, you must have an authentication key stored in the TOOLCHEST_KEY environment variable.

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY") # or a file path containing the key

Contact Toolchest if:

  • you need a key
  • you’ve forgotten your key
  • the key is producing authentication errors.

Documentation & User Guide available at Read the Docs

Comments
  • Enable paired reads for `kraken2`

    Enable paired reads for `kraken2`

    Adds the option to use paired-read inputs for kraken2, via the read_one and read_two arguments (or a list of two paths via inputs).

    Adds/removes --paired to tool_args as necessary.

    opened by bcai2 3
  • v0.4.0

    v0.4.0

    • Add Poetry, remove Twine

    • Add CircleCI automatic deploy to PyPI (untested for prod PyPI)

    Note: CircleCI will be failing because v0.4.0 already exists on test PyPI. That is to be expected, because I already bumped it to v0.4.0 when testing.

    opened by lebovic 3
  • S3 chaining

    S3 chaining

    Adds:

    • Output class returned by all toolchest.tool() calls, which contains s3_uri, presigned_s3_url, and (local) output_path variables
    • S3 chaining, via supplying output.s3_uri from a previous tool as the inputs parameter for a following tool
    • the ability to skip download of any tool's output, by setting output_path=None (set to None by default)
    opened by lebovic 2
  • Polish tool_arg handling, add more STAR args

    Polish tool_arg handling, add more STAR args

    Adds:

    • More STAR args
    • Add multiple levels of tool_arg handling (whitelist, dangerlist, blacklist)
    • Error on unknown or blacklisted args
    • Reduce complexity (validation and parallelization for now) if a dangerous argument is passed

    Requires:

    • https://github.com/trytoolchest/toolchest-worker-node/pull/24
    • https://github.com/trytoolchest/toolchest-api/pull/22

    This does not fix:

    • Bigger disk/memory/etc requirements for larger files where args trigger reduced complexity / no parallelization
    opened by lebovic 2
  • STAR whitelist options

    STAR whitelist options

    • Adds basic whitelist options for STAR.

    • Adds support for tags with variable amounts of arguments. Adds the --quantMode tag for STAR.

    (This should be merged in after the kraken2 paired read commit.)

    opened by bcai2 2
  • feat: centrifuge base

    feat: centrifuge base

    • Adds the centrifuge tool.
    • Adds docs.
    • Refactors how prefix_mapping is generated for megahit with a new module (input_util.py) and function (convert_input_params_to_prefix_mapping). Adds a unit test for the function.
    opened by bcai2 1
  • fix: upload/download tracker bugfixes

    fix: upload/download tracker bugfixes

    • Refactors the tracking printed statements into a pythonic print call with string formatting.
    • Fixes status update logic in uploading. (This was causing the terminal output to stall at the "uploading" stage.)
    • Adds integration test dirs to .gitignore.
    opened by bcai2 1
  • fix: remove pysam due to multiple issues

    fix: remove pysam due to multiple issues

    Pysam has caused multiple issues as a package and STAR parallelization is not currently used so this pr fully removes pysam as a dependency. Either a different library or custom sam file merging code is planned to be implemented later so parallelization framework is remaining in the code for now.

    opened by jherr-dev 1
  • feat: add preliminary alphafold support

    feat: add preliminary alphafold support

    Adds basic support for running AlphaFold via Toolchest. Code needs to be cleaned up and better documented. Currently limited to 1 input fasta.

    use_reduced_dbs and is_prokaryote_list are currently disabled until further implementation and testing is done. Integration will come with reduced dbs since full dbs take 45 minutes to an hour to run even on simple input.

    opened by jherr-dev 1
  • feat: support async execution

    feat: support async execution

    Adds:

    • Support for async execution

    See https://gist.github.com/lebovic/72fbb857119f1667c7959a4d7e28cd50 (or the integration test) for a hacky example on how to run Toolchest with async execution.

    opened by lebovic 1
  • fix: set default version number

    fix: set default version number

    Sets the version number to a default instead of erroring if the client is run from source (i.e., without the toolchest-client package being installed via pip).

    Open question: the version number defaults to 0.0.0, which can be confusing -- are there any other labels that might be better (e.g., dev or just the empty string)?

    opened by bcai2 1
Releases(v0.11.3)
Owner
Toolchest
Toolchest
Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods

Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods Introduction Graph Neural Networks (GNNs) have demonstrated

37 Dec 15, 2022
MoRecon - A tool for reconstructing missing frames in motion capture data.

MoRecon - A tool for reconstructing missing frames in motion capture data.

Yuki Nishidate 38 Dec 03, 2022
PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j.

PostQF Copyright © 2022 Ralph Seichter PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j. See the ma

Ralph Seichter 11 Nov 24, 2022
Exploring the Top ML and DL GitHub Repositories

This repository contains my work related to my project where I scraped data on the most popular machine learning and deep learning GitHub repositories in order to further visualize and analyze it.

Nico Van den Hooff 17 Aug 21, 2022
Basis Set Format Converter

Basis Set Format Converter Repository for the online tool that allows you to enter a basis set in the form of text input for a variety of Quantum Chem

Manas Sharma 3 Jun 27, 2022
An ETL framework + Monitoring UI/API (experimental project for learning purposes)

Fastlane An ETL framework for building pipelines, and Flask based web API/UI for monitoring pipelines. Project structure fastlane |- fastlane: (ETL fr

Dan Katz 2 Jan 06, 2022
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets

HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays o

HyperSpy 411 Dec 27, 2022
A Numba-based two-point correlation function calculator using a grid decomposition

A Numba-based two-point correlation function (2PCF) calculator using a grid decomposition. Like Corrfunc, but written in Numba, with simplicity and hackability in mind.

Lehman Garrison 3 Aug 24, 2022
Jupyter notebooks for the book "The Elements of Statistical Learning".

This repository contains Jupyter notebooks implementing the algorithms found in the book and summary of the textbook.

Madiyar 369 Dec 30, 2022
InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family.

CRISPRanalysis InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family. In this work, we present a workflow

2 Jan 31, 2022
Analysiscsv.py for extracting analysis and exporting as CSV

wcc_analysis Lichess page documentation: https://lichess.org/page/world-championships Each WCC has a study, studies are fetched using: https://lichess

32 Apr 25, 2022
ICLR 2022 Paper submission trend analysis

Visualize ICLR 2022 OpenReview Data

Jintang Li 75 Dec 06, 2022
PyPSA: Python for Power System Analysis

1 Python for Power System Analysis Contents 1 Python for Power System Analysis 1.1 About 1.2 Documentation 1.3 Functionality 1.4 Example scripts as Ju

758 Dec 30, 2022
Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.

Elicited Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations. Credit to Brett Hoove

Ryan McGeehan 3 Nov 04, 2022
Tools for analyzing data collected with a custom unity-based VR for insects.

unityvr Tools for analyzing data collected with a custom unity-based VR for insects. Organization: The unityvr package contains the following submodul

Hannah Haberkern 1 Dec 14, 2022
An easy-to-use feature store

A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

ByteHub AI 48 Dec 09, 2022
Analyzing Covid-19 Outbreaks in Ontario

My group and I took Covid-19 outbreak statistics from ontario, and analyzed them to find different patterns and future predictions for the virus

Vishwaajeeth Kamalakkannan 0 Jan 20, 2022
NumPy aware dynamic Python compiler using LLVM

Numba A Just-In-Time Compiler for Numerical Functions in Python Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaco

Numba 8.2k Jan 07, 2023
A tax calculator for stocks and dividends activities.

Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession

Doino Gretchenliev 200 Oct 25, 2022
CS50 pset9: Using flask API to create a web application to exchange stocks' shares.

C$50 Finance In this guide we want to implement a website via which users can “register”, “login” “buy” and “sell” stocks, like below: Background If y

1 Jan 24, 2022