A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python

Last update: Jan 04, 2023

Overview

LineFlow: Framework-Agnostic NLP Data Loader in Python

LineFlow is a simple text dataset loader for NLP deep learning tasks.

LineFlow is designed to be used with any deep learning framework.
LineFlow enables you to build pipelines via functional APIs (.map, .filter, .flat_map).
LineFlow provides common NLP datasets.

LineFlow is heavily inspired by tensorflow.data.Dataset and chainer.dataset.
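To make the functional style concrete, here is a minimal sketch of how .filter, .map, and .flat_map compose. ToyDataset is a hypothetical, eager stand-in written only for illustration; it is not LineFlow's implementation and it assumes nothing about LineFlow's internals.

```python
from itertools import chain


class ToyDataset:
    """Hypothetical stand-in for a line-oriented dataset (not LineFlow's code)."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, func):
        # Apply func to every item.
        return ToyDataset(func(x) for x in self._items)

    def filter(self, pred):
        # Keep only the items for which pred returns a truthy value.
        return ToyDataset(x for x in self._items if pred(x))

    def flat_map(self, func):
        # Apply func to every item and concatenate the resulting iterables.
        return ToyDataset(chain.from_iterable(func(x) for x in self._items))

    def all(self):
        return list(self._items)


lines = ToyDataset(["i 'm a line 1 .", "", "i 'm a line 2 ."])
tokens = (
    lines.filter(lambda x: x)   # drop empty lines
    .map(str.split)             # tokenize on whitespace
    .flat_map(lambda x: x)      # flatten into a single stream of tokens
)
print(tokens.all())
```

Each call returns a new dataset object, which is what allows the steps to be chained.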

Basic Usage

lineflow.TextDataset expects line-oriented text files:

import lineflow as lf


'''/path/to/text is expected to look like this:
i 'm a line 1 .
i 'm a line 2 .
i 'm a line 3 .
'''
ds = lf.TextDataset('/path/to/text')

ds.first()  # "i 'm a line 1 ."
ds.all() # ["i 'm a line 1 .", "i 'm a line 2 .", "i 'm a line 3 ."]
len(ds)  # 3
ds.map(lambda x: x.split()).first()  # ["i", "'m", "a", "line", "1", "."]
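To try the snippet above you need a file in that line-oriented format. The following sketch uses only the standard library to write such a file to a temporary location and read it back, one item per line; the variable names (lines, path, loaded) are assumptions made for illustration.

```python
import os
import tempfile

lines = ["i 'm a line 1 .", "i 'm a line 2 .", "i 'm a line 3 ."]

# Write one sentence per line, which is the layout a line-oriented loader expects.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    f.write("\n".join(lines) + "\n")
    path = f.name

# Read the file back: each line (without its trailing newline) is one item.
with open(path, encoding="utf-8") as f:
    loaded = [line.rstrip("\n") for line in f]

print(loaded[0])
print(len(loaded))

os.remove(path)  # clean up the temporary file
```

If LineFlow is installed, path can be passed to lf.TextDataset in place of '/path/to/text' before the cleanup line runs.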

Example

Please check out the examples to see how to use LineFlow, especially for tokenization, building vocabulary, and indexing.

Loads Penn Treebank:

>>> import lineflow.datasets as lfds
>>> train = lfds.PennTreebank('train')
>>> train.first()
' aer banknote berlitz calloway centrust cluett fromstein gitano guterman hydro-quebec ipo kia memotec mlx nahb punts rake regatta rubens sim snack-food ssangyong swapo wachter '

Splits each sentence into words:

>>> # continuing from above
>>> train = train.map(str.split)
>>> train.first()
['aer', 'banknote', 'berlitz', 'calloway', 'centrust', 'cluett', 'fromstein', 'gitano', 'guterman', 'hydro-quebec', 'ipo', 'kia', 'memotec', 'mlx', 'nahb', 'punts', 'rake', 'regatta', 'rubens', 'sim', 'snack-food', 'ssangyong', 'swapo', 'wachter']

Obtains the words in the dataset:

>>> # continuing from above
>>> words = train.flat_map(lambda x: x)
>>> words.take(5) # This is useful to build vocabulary.
['aer', 'banknote', 'berlitz', 'calloway', 'centrust']
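As a sketch of the vocabulary-building step mentioned above, the following uses collections.Counter on a plain list of tokens that stands in for the flattened dataset. The helper name build_vocab, the special tokens, and the min_freq parameter are assumptions for illustration, not part of LineFlow.

```python
from collections import Counter

# A plain list standing in for the flattened word stream.
words = ['aer', 'banknote', 'berlitz', 'aer', 'calloway', 'aer', 'banknote']


def build_vocab(tokens, min_freq=1, specials=('<pad>', '<unk>')):
    """Map each token to an integer id, most frequent tokens first."""
    counter = Counter(tokens)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, freq in counter.most_common():
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


vocab = build_vocab(words)

# Indexing: unknown words fall back to the '<unk>' id.
ids = [vocab.get(tok, vocab['<unk>']) for tok in ['aer', 'kia']]
print(vocab)
print(ids)
```

With a real dataset, the list passed to build_vocab would be the tokens obtained from the flattened dataset instead of the hard-coded list.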

Furthermore:

How to fine-tune BERT with pytorch-lightning by @sobamchan

Requirements

Python 3.6+

Installation

To install LineFlow:

pip install lineflow

Datasets

Is the dataset you want to use not supported? Suggest a new dataset 🎉

Commonsense Reasoning
Language Modeling
Machine Translation
Paraphrase
Question Answering
Sentiment Analysis
Sequence Tagging
Text Summarization

Commonsense Reasoning

CommonsenseQA

Loads the CommonsenseQA dataset:

>>> import lineflow.datasets as lfds

>>> train = lfds.CommonsenseQA("train")
>>> dev = lfds.CommonsenseQA("dev")
>>> test = lfds.CommonsenseQA("test")

The items in this dataset look as follows:

>>> import lineflow.datasets as lfds

>>> train = lfds.CommonsenseQA("train")
>>> train.first()
{"id": "075e483d21c29a511267ef62bedc0461",
 "answer_key": "A",
 "options": {"A": "ignore",
             "B": "enforce",
             "C": "authoritarian",
             "D": "yell at",
             "E": "avoid"},
 "stem": "The sanctions against the school were a punishing blow, and they seemed to what the efforts the school had made to change?"}

Language Modeling

Penn Treebank

Loads the Penn Treebank dataset:

import lineflow.datasets as lfds

train = lfds.PennTreebank('train')
dev = lfds.PennTreebank('dev')
test = lfds.PennTreebank('test')

WikiText-103

Loads the WikiText-103 dataset:

import lineflow.datasets as lfds

train = lfds.WikiText103('train')
dev = lfds.WikiText103('dev')
test = lfds.WikiText103('test')

This dataset is preprocessed, so you can tokenize each line with str.split:

>>> import lineflow.datasets as lfds
>>> train = lfds.WikiText103('train').flat_map(lambda x: x.split() + ['<eos>'])
>>> train.take(5)
['<eos>', '=', 'Valkyria', 'Chronicles', 'III']
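The reason for appending a marker to every line before flattening is that line boundaries would otherwise be lost in the token stream. The sketch below shows the idea with plain lists and itertools; the marker string '<eos>', the sample lines, and the helper split_on_eos are assumptions for illustration.

```python
from itertools import chain

EOS = '<eos>'  # assumed end-of-line marker token

# Sample lines standing in for dataset rows; the first one is empty.
lines = ['', ' = Valkyria Chronicles III = ', ' a second line ']

# Same shape as flat_map(lambda x: x.split() + [EOS]) on a dataset.
stream = list(chain.from_iterable(line.split() + [EOS] for line in lines))


def split_on_eos(tokens, eos=EOS):
    """Recover the per-line token lists from a flattened stream."""
    sentence, sentences = [], []
    for tok in tokens:
        if tok == eos:
            sentences.append(sentence)
            sentence = []
        else:
            sentence.append(tok)
    return sentences


restored = split_on_eos(stream)
print(stream[:5])
print(restored)
```

An empty line contributes only the marker, which is why the stream starts with it.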

WikiText-2 (Added by @sobamchan, thanks.)

Loads the WikiText-2 dataset:

import lineflow.datasets as lfds

train = lfds.WikiText2('train')
dev = lfds.WikiText2('dev')
test = lfds.WikiText2('test')

This dataset is preprocessed, so you can tokenize each line with str.split:

>>> import lineflow.datasets as lfds
>>> train = lfds.WikiText2('train').flat_map(lambda x: x.split() + ['<eos>'])
>>> train.take(5)
['<eos>', '=', 'Valkyria', 'Chronicles', 'III']

Machine Translation

small_parallel_enja:

Loads the small_parallel_enja dataset, a small English-Japanese parallel corpus:

import lineflow.datasets as lfds

train = lfds.SmallParallelEnJa('train')
dev = lfds.SmallParallelEnJa('dev')
test = lfds.SmallParallelEnJa('test')

This dataset is preprocessed, so you can tokenize each line with str.split:

>>> import lineflow.datasets as lfds
>>> train = lfds.SmallParallelEnJa('train').map(lambda x: (x[0].split(), x[1].split()))
>>> train.first()
(['i', 'can', "'t", 'tell', 'who', 'will', 'arrive', 'first', '.'], ['誰', 'が', '一番', 'に', '着', 'く', 'か', '私', 'に', 'は', '分か', 'り', 'ま', 'せ', 'ん', '。'])

Paraphrase

Microsoft Research Paraphrase Corpus:

Loads the Microsoft Research Paraphrase Corpus:

import lineflow.datasets as lfds

train = lfds.MsrParaphrase('train')
test = lfds.MsrParaphrase('test')

The items in this dataset look as follows:

>>> import lineflow.datasets as lfds
>>> train = lfds.MsrParaphrase('train')
>>> train.first()
{'quality': '1',
 'id1': '702876',
 'id2': '702977',
 'string1': 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.',
 'string2': 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.'
}
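Note that every field in the item shown above is a string, including the label. A minimal sketch of turning such an item into a (sentence, sentence, label) triple follows; the helper name to_example is an assumption for illustration.

```python
# The item shown above, copied as a plain dict.
item = {
    'quality': '1',
    'id1': '702876',
    'id2': '702977',
    'string1': 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.',
    'string2': 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.',
}


def to_example(item):
    """Return (sentence1, sentence2, label) with the label cast to int."""
    return item['string1'], item['string2'], int(item['quality'])


sentence1, sentence2, label = to_example(item)
print(label)
```

A function of this shape could be passed to .map so that each dataset item becomes a training triple.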

Question Answering

SQuAD:

Loads the SQuAD dataset:

import lineflow.datasets as lfds

train = lfds.Squad('train')
dev = lfds.Squad('dev')

The items in this dataset look as follows:

>>> import lineflow.datasets as lfds
>>> train = lfds.Squad('train')
>>> train.first()
{'answers': [{'answer_start': 515, 'text': 'Saint Bernadette Soubirous'}],
 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
 'id': '5733be284776f41900661182',
 'title': 'University_of_Notre_Dame',
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'}
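In SQuAD, answer_start is a character offset into context, so the answer text can be recovered by slicing. The sketch below checks that on a shortened toy item (not a real dataset row); the helper name answer_spans is an assumption for illustration.

```python
def answer_spans(item):
    """Return (start, end) character spans for every answer in the item."""
    spans = []
    for answer in item['answers']:
        start = answer['answer_start']
        end = start + len(answer['text'])
        # Guard against offsets that do not line up with the context.
        if item['context'][start:end] != answer['text']:
            raise ValueError('answer_start does not match the context')
        spans.append((start, end))
    return spans


# A shortened toy item with the same keys as the example above.
toy = {
    'context': 'It is a replica of the grotto at Lourdes, France.',
    'answers': [{'answer_start': 33, 'text': 'Lourdes'}],
}

spans = answer_spans(toy)
print(spans)
```

Span checks like this are useful before converting character offsets to token positions, because tokenization can shift the boundaries.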

Sentiment Analysis

IMDB:

Loads the IMDB dataset:

import lineflow.datasets as lfds

train = lfds.Imdb('train')
test = lfds.Imdb('test')

The items in this dataset look as follows:

>>> import lineflow.datasets as lfds
>>> train = lfds.Imdb('train')
>>> train.first()
('For a movie that gets no respect there sure are a lot of memorable quotes listed for this gem. Imagine a movie where Joe Piscopo is actually funny! Maureen Stapleton is a scene stealer. The Moroni character is an absolute scream. Watch for Alan "The Skipper" Hale jr. as a police Sgt.', 0)

Sequence Tagging

CoNLL2000

Loads the CoNLL2000 dataset:

import lineflow.datasets as lfds

train = lfds.Conll2000('train')
test = lfds.Conll2000('test')

Text Summarization

CNN / Daily Mail:

Loads the CNN / Daily Mail dataset:

import lineflow.datasets as lfds

train = lfds.CnnDailymail('train')
dev = lfds.CnnDailymail('dev')
test = lfds.CnnDailymail('test')

This dataset is preprocessed, so you can tokenize each line with str.split:

>>> import lineflow.datasets as lfds
>>> train = lfds.CnnDailymail('train').map(lambda x: (x[0].split(), x[1].split()))
>>> train.first()
... # the output is omitted because it's too long to display here.

SciTLDR

Loads the SciTLDR dataset:

import lineflow.datasets as lfds

train = lfds.SciTLDR('train')
dev = lfds.SciTLDR('dev')
test = lfds.SciTLDR('test')

Comments

Revert "Added CommonsenseQA dataset."
I'm sorry to mention after merging but I'd like you to fix these below:

Add CommonsenseQA to README.md

Add lineflow.commonsenseqa.get_commonsenseqa to lineflow/datasets/__init__.py
opened by yasufumy 3
Should slice of IterableDataset return IterableDataset not List?
Is your feature request related to a problem? Please describe.

train = lfds.SciTLDR(split="train")  # IterableDataset
train_mini = train[:10]  # Now this is just a Python list (List[Any])

If I make a subset of a dataset, it loses all the features such as .map.

Describe the solution you'd like

Return an IterableDataset instead of a List.
opened by sobamchan 2
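One way the requested behaviour could look is sketched below: __getitem__ returns a new dataset object for slices and a single item for integer indices. SliceableDataset is a hypothetical toy class written for illustration; it is not LineFlow's IterableDataset and says nothing about how the library resolved this request.

```python
class SliceableDataset:
    """Hypothetical toy dataset whose slices stay datasets."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Wrap the sliced items so .map and friends remain available.
            return SliceableDataset(self._items[index])
        return self._items[index]

    def map(self, func):
        return SliceableDataset(func(x) for x in self._items)

    def all(self):
        return list(self._items)


train = SliceableDataset(range(100))
train_mini = train[:10]                 # still a SliceableDataset
doubled = train_mini.map(lambda x: x * 2).all()
print(type(train_mini).__name__, len(train_mini))
print(doubled[:3])
```

The trade-off is that code relying on a slice being a plain list would need an explicit .all() call.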
Bump pytest-cov from 2.10.1 to 2.11.0
Bumps pytest-cov from 2.10.1 to 2.11.0.

Changelog

Sourced from pytest-cov's changelog.

2.11.0 (2021-01-18)

Bumped minimum coverage requirement to 5.2.1. This prevents reporting issues. Contributed by Mateus Berardo de Souza Terra in #433.

Improved sample projects (from the examples directory) to support running tox -e pyXY. Now the example configures a suffixed coverage data file, and that makes the cleanup environment unnecessary. Contributed by Ganden Schaffner in #435.

Removed the empty console_scripts entrypoint that confused some Gentoo build script. I didn't ask why it was so broken cause I didn't want to ruin my day. Contributed by Michał Górny in #434.

Fixed the missing coverage context when using subprocesses. Contributed by Bernát Gábor in #443.

Updated the config section in the docs. Contributed by Pamela McA'Nulty in #429.

Migrated CI to travis-ci.com (from .org).

Commits

b45388d Bump version: 2.10.1 → 2.11.0

bd7e850 Fix link name.

5f935d5 Update changelog.

d3c382a Skip this on 3.8+

a2493d5 Skip this on 3.8+

25eed21 Turns out there were some internal changes in the pytester plugin.

42d7705 Oops.

4ce7ac3 Update test deps.

0eada98 Update skel; migrate to travis-ci.com.

6592810 Update setup.py

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 2
Bump flake8 from 3.7.9 to 3.8.1
Bumps flake8 from 3.7.9 to 3.8.1.

Commits

f94e009 Release 3.8.1

00985a6 Merge branch 'issue638-ouput-file' into 'master'

e6d8a90 options: Forward --output-file to be reparsed for BaseFormatter

b4d2850 Release 3.8.0

03c7dd3 Merge branch 'exclude_dotfiles' into 'master'

9e67511 Fix using --exclude=.* to not match . and ..

6c4b5c8 Merge branch 'linters_py3' into 'master'

309db63 switch dogfood to use python3

8905a7a Merge branch 'logical_position_out_of_bounds' into 'master'

609010c Fix logical checks which report position out of bounds

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 2
Bump ipython from 7.22.0 to 7.23.0
Bumps ipython from 7.22.0 to 7.23.0.

Commits

a0c0411 release 7.23.0

d1b43f2 Merge pull request #12936 from meeseeksmachine/auto-backport-of-pr-12934-on-7.x

5fee80a Merge pull request #12935 from Carreau/auto-backport-of-pr-12932-on-7.x

9f04101 Backport PR #12934: 7.23 release notes

994fcbe Backport PR #12932: remove use of deprecated pipes module

a8955db Merge pull request #12925 from Carreau/auto-backport-of-pr-12817-on-7.x

feeb4ea Backport PR #12817: Use matplotlib-inline instead of ipykernel.pylab

288ca33 Merge pull request #12919 from Carreau/auto-backport-of-pr-12823-on-7.x

0f52b53 Backport PR #12823: Added clear kwarg to display()

197b993 Merge pull request #12911 from meeseeksmachine/auto-backport-of-pr-12758-on-7.x

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 1
Bump autopep8 from 1.5.4 to 1.5.5
Bumps autopep8 from 1.5.4 to 1.5.5.

Release notes

Sourced from autopep8's releases.

v1.5.5

bug fix and minor improvements

improvement

hhatto/autopep8#566: lazy load toml package

fix bug

hhatto/autopep8#580: not convert raw string for w605 fixed method

Commits

ae7b7de version 1.5.5

afe49a4 fix

1598ca1 drop py35

b3b5d9a change test python version

aa45072 add unit test that multilines w605

831fbe5 Merge pull request #580 from hhatto/fix-w605

91de31e remove unnecessary function

b99a6b4 change: not convert raw string for w605 fixed method

55887dc Merge pull request #578 from timgates42/bugfix_typo_required

bf01df2 docs: fix simple typo, requred -> required

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 1
Bump ipython from 7.19.0 to 7.20.0
Bumps ipython from 7.19.0 to 7.20.0.

Commits

ebfd01d release 7.20.0

5f2788e Merge pull request #12800 from meeseeksmachine/auto-backport-of-pr-12799-on-7.x

9407cc3 Merge pull request #12790 from Jongy/remove-old-ultratb-getargs

f4ee99f Backport PR #12799: Whats new 7.20

6009e67 Merge pull request #12798 from Carreau/auto-backport-of-pr-12255-on-7.x

8abc94f Backport PR #12255: fix getting signatures in jedi 0.17

09698b5 Merge pull request #12793 from Carreau/fix-jedi-compat

7f7493e Merge pull request #12769 from meeseeksmachine/auto-backport-of-pr-12765-on-7.x

21d3379 Backport PR #12227 on brnach 7.x

477d9a6 forgotten imports

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 1
Bump pytest from 6.2.1 to 6.2.2
Bumps pytest from 6.2.1 to 6.2.2.

Release notes

Sourced from pytest's releases.

6.2.2

pytest 6.2.2 (2021-01-25)

Bug Fixes

#8152: Fixed "(<Skipped instance>)" being shown as a skip reason in the verbose test summary line when the reason is empty.

#8249: Fix the faulthandler plugin for occasions when running with twisted.logger and using pytest --capture=no.

Changelog

Sourced from pytest's changelog.

Commits

b9c9876 Prepare release version 6.2.2

8003fd2 Merge pull request #8259 from nicoddemus/backport-8250

8d605b9 Merge pull request #8250 from daq-tools/fix-twisted-capture

14e0c3e Merge pull request #8225 from The-Compiler/training-update (#8226)

45facc1 Merge pull request #8224 from nicoddemus/backport-8220

99fe887 Merge pull request #8220 from xuhdev/module-doc

8dbf9dc Merge pull request #8167 from nicoddemus/backport-8166

baaee21 Add Changelog to setup.cfg (#8166)

f7d1ab8 Merge pull request #8163 from bluetech/backport-8152

b8201c2 Merge pull request #8152 from bluetech/empty-skip

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 1
Bump pytest-cov from 2.10.1 to 2.11.1
Bumps pytest-cov from 2.10.1 to 2.11.1.

Changelog

Sourced from pytest-cov's changelog.

2.11.1 (2021-01-20)

Fixed support for newer setuptools (v42+). Contributed by Michał Górny in #451.

2.11.0 (2021-01-18)

Bumped minimum coverage requirement to 5.2.1. This prevents reporting issues. Contributed by Mateus Berardo de Souza Terra in #433.

Improved sample projects (from the examples directory) to support running tox -e pyXY. Now the example configures a suffixed coverage data file, and that makes the cleanup environment unnecessary. Contributed by Ganden Schaffner in #435.

Removed the empty console_scripts entrypoint that confused some Gentoo build script. I didn't ask why it was so broken cause I didn't want to ruin my day. Contributed by Michał Górny in #434.

Fixed the missing coverage context when using subprocesses. Contributed by Bernát Gábor in #443.

Updated the config section in the docs. Contributed by Pamela McA'Nulty in #429.

Migrated CI to travis-ci.com (from .org).

Commits

5e1913e Bump version: 2.11.0 → 2.11.1

51f1f2b Update changelog.

d39de1e Fix custom commands with newer setuptools

b45388d Bump version: 2.10.1 → 2.11.0

bd7e850 Fix link name.

5f935d5 Update changelog.

d3c382a Skip this on 3.8+

a2493d5 Skip this on 3.8+

25eed21 Turns out there were some internal changes in the pytester plugin.

42d7705 Oops.

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 1
Bump isort from 5.6.4 to 5.7.0
Bumps isort from 5.6.4 to 5.7.0.

Release notes

Sourced from isort's releases.

5.7.0 December 30th 2020

Fixed #1612: In rare circumstances an extra comma is added after import and before comment.

Fixed #1593: isort encounters bug in Python 3.6.0.

Implemented #1596: Provide ways for extension formatting and file paths to be specified when using streaming input from CLI.

Implemented #1583: Ability to output and diff within a single API call to isort.file.

Implemented #1562, #1592 & #1593: Better more useful fatal error messages.

Implemented #1575: Support for automatically fixing mixed indentation of import sections.

Implemented #1582: Added a CLI option for skipping symlinks.

Implemented #1603: Support for disabling float_to_top from the command line.

Implemented #1604: Allow toggling section comments on and off for indented import sections.

Changelog

Sourced from isort's changelog.

Commits

473d150 Merge in lastest from develop for 5.7.0

a8f4ff3 Regenerate config option docs

681b26c Add @dwanderson-intel, Quentin Santos (@qsantos), and @gofr to acknowledgements

8372d71 Bump to version 5.7.0

3eb14eb 100% test coverage

8e70db8 100% test coverage for new identify module

0383c36 Fix indented identification isort

15502c8 Expose ImportKey from main isort import

69a89c0 Add additional identification test cases

8b83d56 Fix handling of yield and raise statements in import identification

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

dependencies
opened by dependabot-preview[bot] 1
Bump pytest from 6.1.2 to 6.2.0
Bumps pytest from 6.1.2 to 6.2.0.

Release notes

Sourced from pytest's releases.

6.2.0

pytest 6.2.0 (2020-12-12)

Breaking Changes

#7808: pytest now supports python3.6+ only.

Deprecations

#7469: Directly constructing/calling the following classes/functions is now deprecated:

_pytest.cacheprovider.Cache

_pytest.cacheprovider.Cache.for_config()

_pytest.cacheprovider.Cache.clear_cache()

_pytest.cacheprovider.Cache.cache_dir_from_config()

_pytest.capture.CaptureFixture

_pytest.fixtures.FixtureRequest

_pytest.fixtures.SubRequest

_pytest.logging.LogCaptureFixture

_pytest.pytester.Pytester

_pytest.pytester.Testdir

_pytest.recwarn.WarningsRecorder

_pytest.recwarn.WarningsChecker

_pytest.tmpdir.TempPathFactory

_pytest.tmpdir.TempdirFactory

These have always been considered private, but now issue a deprecation warning, which may become a hard error in pytest 7.0.0.

#7530: The --strict command-line option has been deprecated, use --strict-markers instead.

We have plans to maybe in the future to reintroduce --strict and make it an encompassing flag for all strictness related options (--strict-markers and --strict-config at the moment, more might be introduced in the future).

#7988: The @pytest.yield_fixture decorator/function is now deprecated. Use pytest.fixture instead.

yield_fixture has been an alias for fixture for a very long time, so can be search/replaced safely.

Features

#5299: pytest now warns about unraisable exceptions and unhandled thread exceptions that occur in tests on Python>=3.8. See unraisable for more information.

#7425: New pytester fixture, which is identical to testdir but its methods return pathlib.Path when appropriate instead of py.path.local.

This is part of the movement to use pathlib.Path objects internally, in order to remove the dependency to py in the future.

Internally, the old Testdir <_pytest.pytester.Testdir> is now a thin wrapper around Pytester <_pytest.pytester.Pytester>, preserving the old interface.

Changelog

Sourced from pytest's changelog.

Commits

e7073af Prepare release version 6.2.0

683f29f Merge pull request #8129 from bluetech/docs-pygments-workaround

0feeddf doc: temporary workaround for pytest-pygments lexing error

b478275 Merge pull request #8128 from bluetech/skip-reason-empty

3302ff9 terminal: when the skip/xfail is empty, don't show it as "()"

59bd0f6 Merge pull request #8126 from bluetech/tox-regen-pretend-scm2

6298ff1 tox: use pip legacy resolver for regen job

d51ecbd Merge pull request #8125 from bluetech/tox-rm-pip-req

f237b07 tox: remove requires: pip>=20.3.1

95e0e19 Merge pull request #8124 from bluetech/s0undt3ch-feature/skip-context-hook

Additional commits viewable in compare view

dependencies
opened by dependabot-preview[bot] 1
CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
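The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch from the pull request: the helper names is_within_directory, safe_extractall, and make_tar are hypothetical, and the sketch only guards against path traversal through member names (it does not inspect symlink targets).

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract every member, refusing archives that would escape path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Attempted path traversal in tar file: {member.name}")
    tar.extractall(path)


def make_tar(member_name: str, payload: bytes = b"data") -> tarfile.TarFile:
    """Build a one-file archive in memory (hypothetical helper for the demo)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return tarfile.open(fileobj=buffer, mode="r")


with tempfile.TemporaryDirectory() as workdir:
    # A well-behaved archive is extracted normally.
    safe_extractall(make_tar("ok.txt"), workdir)
    # An archive whose member points outside the target directory is rejected.
    try:
        safe_extractall(make_tar("../evil.txt"), workdir)
    except ValueError as error:
        print(error)
```

On Python 3.12 and later, tarfile also offers extraction filters (for example extractall(path, filter="data")), which may be a simpler option where that interpreter version can be assumed.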

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

opened by TrellixVulnTeam 0
WMT14 Google Drive link is dead now.
Describe the bug: The Google Drive link for downloading the WMT14 dataset is no longer available.

To Reproduce

import lineflow.datasets as lfds
train_dataset = lfds.Wmt14("train")

Additional context: I can try finding a working URL when I have some time.
opened by sobamchan 0
Add support for

Datasets

SNLI dataset: https://nlp.stanford.edu/projects/snli/
MLNLI dataset: https://www.nyu.edu/projects/bowman/multinli/

Please provide each dataset individually, as well as the combined dataset as ALLNLI.
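Until such loaders exist, the request could be approximated by hand. The sketch below assumes the JSONL distributions of both corpora, where each line is a JSON object with gold_label, sentence1, and sentence2 fields. The function names and the premise/hypothesis/label output keys are hypothetical, and the sample lines are made up for illustration.

```python
import json
from typing import Dict, Iterable, List, Optional

# Examples without annotator consensus carry the label "-" and are skipped.
LABELS = ("entailment", "neutral", "contradiction")


def parse_nli_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one JSONL line into a small example dict, or None if unlabeled."""
    record = json.loads(line)
    label = record["gold_label"]
    if label not in LABELS:
        return None
    return {
        "premise": record["sentence1"],
        "hypothesis": record["sentence2"],
        "label": label,
    }


def load_nli(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse all non-empty lines and drop examples without a usable label."""
    parsed = (parse_nli_line(line) for line in lines if line.strip())
    return [example for example in parsed if example is not None]


# Made-up sample lines standing in for the two downloaded files.
snli_lines = [
    '{"gold_label": "entailment", "sentence1": "A man sleeps.", "sentence2": "A person rests."}',
    '{"gold_label": "-", "sentence1": "A dog runs.", "sentence2": "A cat sits."}',
]
mnli_lines = [
    '{"gold_label": "neutral", "sentence1": "She reads.", "sentence2": "She reads a novel."}',
]

# "ALLNLI" here is simply the concatenation of both parsed corpora.
all_nli = load_nli(snli_lines) + load_nli(mnli_lines)
print(len(all_nli))
```

With LineFlow itself, the same parse_nli_line function could presumably be passed to .map on a lf.TextDataset opened on each downloaded file, followed by a .filter that drops the None entries.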

opened by ashutosh-dwivedi-e3502 0