Finally, decent dictionaries based on Wiktionary for your beloved eBook reader.

Last update: Dec 31, 2022

Overview

eBook Reader Dictionaries


Dictionaries

Catalan
🚧 Ελληνικά (help welcome)
English
Español
Français (what's new)
Italiano
🚧 norsk (help welcome)
Português
Svenska

Requirements

Kobo

Kobo firmware >= 4.24. For older firmware versions, you can find outdated dictionaries here.

Updating Dictionaries

All dictionaries are automatically regenerated every day at midnight, using the latest Wiktionary dump available at that time. Note that download links never change.
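Because the links are stable, they can be derived from the locale code alone. The following is a minimal sketch. It assumes GitHub's standard `releases/download/<tag>/<asset>` URL layout, one release tag per locale (as in `releases/tag/el`), and the asset names shown in the releases. `download_urls` is a hypothetical helper, not part of the project.

```python
from typing import Dict

# Assumed base URL: GitHub's standard layout for release assets.
RELEASES = "https://github.com/BoboTiG/ebook-reader-dict/releases/download"


def download_urls(locale: str) -> Dict[str, str]:
    """Return the stable download URL of each format for a locale.

    Hypothetical helper: the release tag is assumed to be the locale code,
    and asset names follow the "<format>-<locale>-<locale>" pattern.
    """
    return {
        "kobo": f"{RELEASES}/{locale}/dicthtml-{locale}-{locale}.zip",
        "stardict": f"{RELEASES}/{locale}/dict-{locale}-{locale}.zip",
        "dictfile": f"{RELEASES}/{locale}/dict-{locale}-{locale}.df.bz2",
    }


print(download_urls("fr")["kobo"])
# https://github.com/BoboTiG/ebook-reader-dict/releases/download/fr/dicthtml-fr-fr.zip
```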

You should open an issue if:
- you cannot find a word;
- a definition differs from the one on Wiktionary;
- a definition is missing.

If you disagree with a definition, the change must be made on Wiktionary directly. Your edit will likely be included in the next Wiktionary dump, and once that dump is available, the new dictionary will contain your change within 24 hours :)

Adding a new Dictionary

Pull requests are very welcome. Adding a new locale is quite straightforward: see HOWTO Add a New Locale.

Contributors ✨

Thanks go to these wonderful people (emoji key):

Nicolas Froment

Attilio

Saeed Rasooli

This project follows the all-contributors specification. Contributions of any kind are welcome!

Comments

Generate SVG rather than GIF for embedded pictures

A successful experiment was done in https://github.com/BoboTiG/ebook-reader-dict/issues/1182#issuecomment-1027245425 on moving embedded pictures from GIF to SVG. The results are much better, so let's make the move.

We first need to ensure that this works with PyGlossary and with the StarDict display.

Note: PyGlossary 4.4.2 or newer is required.

opened by BoboTiG 60
PyGlossary conversion errors (missing images)

Note from @BoboTiG: issue tightly coupled to #1182, interesting details can be found there too.

I just downloaded, parsed and rendered the EN Wiktionary, and it apparently has some problems with erroneous and/or missing GIFs:

output.txt

All of the .gif files in data/en/res appear to be very ugly renderings of formulae (?).

opened by Moonbase59 51
New locale: DE
My goal is to have (and share) a good German Wiktionary-based dictionary that displays well on small e-reader screens and is a little more informative (i.e., has word form, gender, hyphenation, IPA pronunciation, meaning, abbreviations, synonyms and examples). My main target format would be StarDict, with possible spinoff formats for Kobo (dicthtml?), PocketBook (?) and Tolino (quickdic).

Too bad pyglossary doesn’t support R. Döffinger’s quickdic format, because Tolino devices use that, and we do have a rather large Tolino user base in Germany. Not everybody wants to jailbreak their device…

I currently use DE Wiktionary dumps and a rather brute-force Rexx script to generate a Tabfile, which I then convert to StarDict and dicthtml formats. (See attached screenshots for how it looks in GoldenDict on Linux.)

This is of course a flaky way to do it. I’d prefer to collaborate on a sounder foundation like yours and integrate it there, also because yours gets auto-updated.

Unfortunately, the HOWTO Add a New Locale section in the wiki here isn’t too detailed, and I’d probably need quite a bit of help to get started. I’m especially unsure about the first two steps and the "Remove all data from the old lang."

So my questions are:

Would you be interested in a German dictionary that should look approximately like the screenshots show?

Is it possible to do, without investing too much time? (There’s a lot of other things I have to spend my time on, but I’d be willing to invest a substantial amount of time to get it started and polished a little.)

Is there any assistance possible in getting me set up to get the first steps done? I reckon that’d be to set up a working environment on my Linux Mint 20.3 machine, do a fork, and start adding a language "de".

Since I know almost nothing about Wiktionary’s internal structures, I fear the templates most. But having had a glance at your code, I think there is some expertise here…

Screenshots: This is how I envision it. Users on MobileRead and the German E-Reader Forum have been quite enthusiastic about the first version. The screenshots show the StarDict version as displayed by GoldenDict on a Linux desktop.

Links to what exists already:

Rexx script to convert wiktionary dump to tab file

Resulting StarDict DE dictionary

Resulting dicthtml DE dictionary

locale:German
opened by Moonbase59 49

[EL] Add EL locale

I am trying to add Greek and would appreciate some feedback on the regexes. Below are some examples and what I have come up with so far (I started from the IT file). The pronunciation template appears in several variants, and I am not sure how to accommodate that.

# Regex to find the pronunciation
# {{ΔΦΑ|tɾeˈlos|γλ=el}}
# {{ΔΦΑ|γλ=el|ˈni.xta}}
pronunciation = r"{ΔΦΑ\|γλ=el\|/([^/]+)/"
# Regex to find the gender
# '''{{PAGENAME}}''' {{θ}}
# '''{{PAGENAME}}''' {{ο}}
# '''{{PAGENAME}}''' {{α}}
gender = r"'''{{PAGENAME}}''' ([θαο])"
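For comparison, here is a minimal sketch of patterns that accept both argument orders. The original pronunciation pattern expects the IPA value between slashes, and the gender pattern expects a bare letter. The examples, however, use no slashes and wrap the gender in a template. The names and the `[αθο]+` class (which also admits combined values such as `{{αθ}}`) are assumptions, not the project's code.

```python
import re

# Assumed patterns, not the project's code.
# "γλ=el" may come before or after the IPA value, so both positions are optional.
PRONUNCIATION = re.compile(r"\{\{ΔΦΑ\|(?:γλ=el\|)?([^|}]+)(?:\|γλ=el)?\}\}")
# Gender is a template right after the bold page name; "+" also accepts
# combined values such as {{αθ}} (masculine or feminine).
GENDER = re.compile(r"'''\{\{PAGENAME\}\}''' \{\{([αθο]+)\}\}")

for code in ("{{ΔΦΑ|tɾeˈlos|γλ=el}}", "{{ΔΦΑ|γλ=el|ˈni.xta}}"):
    print(PRONUNCIATION.search(code).group(1))  # tɾeˈlos, then ˈni.xta

print(GENDER.search("'''{{PAGENAME}}''' {{θ}}").group(1))  # θ
```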

I tried running it and got:

>> Processing data\el\pages-20210620.xml ...
Traceback (most recent call last):
  File "C:\Users\spiros\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\spiros\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\path1\wikidict\wikidict\__main__.py", line 118, in <module>
    sys.exit(main())
  File "C:\path1\wikidict\wikidict\__main__.py", line 110, in main
    parse.main(args["LOCALE"])
  File "C:\path1\wikidict\wikidict\parse.py", line 103, in main
    words = process(file, locale)
  File "C:\path1\wikidict\wikidict\parse.py", line 70, in process
    word, code = xml_parse_element(element, locale)
  File "C:\path1\wikidict\wikidict\parse.py", line 57, in xml_parse_element
    if all(section not in code for section in head_sections[locale]):
KeyError: 'el'

This is the whole file:

"""Greek language."""
from typing import Dict, Tuple

# Regex to find the pronunciation
# {{ΔΦΑ|tɾeˈlos|γλ=el}}
# {{ΔΦΑ|γλ=el|ˈni.xta}}
pronunciation = r"{ΔΦΑ\|γλ=el\|/([^/]+)/"
# Regex to find the gender
# '''{{PAGENAME}}''' {{θ}}
# '''{{PAGENAME}}''' {{ο}}
# '''{{PAGENAME}}''' {{α}}
gender = r"'''{{PAGENAME}}''' ([θαο])"

# Float number separator
float_separator = ","

# Thousands separator
thousands_separator = " "

# Markers for sections that contain interesting text to analyse.
head_sections = ("{{-el-}}",)
etyl_section = ["{{ετυμολογία}}"]
sections = (
    *head_sections,
    *etyl_section,
    "{{ουσιαστικό}}",
    "{{ρήμα}}",
    "{{επίθετο}}",
    "{{επίρρημα}}",
    "{{σύνδεσμος}}",
    "{{συντομομορφή}}",
    "{{κύριο όνομα}}",
    "{{αριθμητικό}}",
    "{{άρθρο}}",
    "{{μετοχή}}",
    "{{μόριο}}",
    "{{αντωνυμία}}",
    "{{επιφώνημα}}",
    "{{ρηματική έκφραση}}",
    "{{επιρρηματική έκφραση}}",
)

# Some definitions are not good to keep (plural, gender, ... )
definitions_to_ignore = (
    "{{μορφή ουσιαστικού",
    "{{μορφή ρήματος",
    "{{μορφή επιθέτου",
    "{{εκφράσεις",
)

# Templates to ignore: the text will be deleted.
templates_ignored: Tuple[str, ...] = tuple()

# Templates that will be completed/replaced using italic style.
templates_italic: Dict[str, str] = {}

# Templates more complex to manage.
templates_multi: Dict[str, str] = {
    # {{Term|statistica|it}}   
    # "term": "small(term(parts[1]))",
}

# Release content on GitHub
# https://github.com/BoboTiG/ebook-reader-dict/releases/tag/el
release_description = """\
Αριθμός λέξεων: {words_count}
Εξαγωγή Wiktionary: {dump_date}

Διαθέσιμα αρχεία:

- [Kobo]({url_kobo}) (dicthtml-{locale}.zip)
- [StarDict]({url_stardict}) (dict-{locale}.zip)
- [DictFile]({url_dictfile}) (dict-{locale}.df)

<sub>Ημερομηνία δημιουργίας: {creation_date}</sub>
"""  # noqa

# Dictionary name that will be printed below each definition
wiktionary = "Βικιλεξικό (ɔ) {year}"

locale:Greek

opened by chopinesque 47

[FR] Redirect conjugated verbs to their infinitive form

As requested, it would be nice to have conjugated verbs redirect to their infinitive form instead of returning nothing.

I already tried some things, but without success. I think we could make use of variants, but it is not clear yet how to do that.
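If the output formats can attach extra lookup forms ("variants") to an entry, as suggested above, a first step could be to invert a form-to-infinitive mapping so that each infinitive knows its conjugated forms. The following is a minimal sketch. The input mapping and the function name are hypothetical, and extracting that mapping from Wiktionary is the open question and is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def group_variants(infinitive_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group conjugated forms under their infinitive.

    infinitive_of is a hypothetical mapping {conjugated form: infinitive}.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for form, infinitive in sorted(infinitive_of.items()):
        if form != infinitive:  # an infinitive is not its own variant
            groups[infinitive].append(form)
    return dict(groups)


print(group_variants({"mange": "manger", "mangeons": "manger", "finis": "finir"}))
# {'finir': ['finis'], 'manger': ['mange', 'mangeons']}
```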
locale:French

opened by BoboTiG 31
Support mediawiki extension
Wiktionary page: https://fr.wiktionary.org/wiki/djed

Wikicode:

<hiero>R11</hiero>

Output:

R11

Expected:

Model link, if any: https://www.mediawiki.org/wiki/Extension:WikiHiero https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:WikiHiero/Syntax https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/WikiHiero.php
opened by lasconic 26
[Meta] Project refactoring
Note: the description is updated with the changes requested in the comments.

The goal is to rework the script module to allow more flexibility and clearly separate concerns.

First, about the module name: script. It has been decided to rename it to wikidict.

Overview

I would like to see the module split into 4 parts (each part will be independent of the others and can be replayed and extended easily). This will also help leverage multithreading to speed up the whole process.

[x] Download the data (#466)

[x] Parse and store raw data (#469)

[x] Render templates and store results (#469)

[ ] Output to the proper eBook reader format

I have in mind a SQLite database where raw data will be stored and updated when needed. The parts will then only use data from the database. This should speed up regenerating a whole dictionary when we update a template.

Then, each and every part will have its own CLI:

$ python -m wikidict --download ...
$ python -m wikidict --parse ...
$ python -m wikidict --render ...
$ python -m wikidict --output ...

And the all-in-one operation would be:

$ python -m wikidict --run ...

Side note: we could use an entry point so that typing wikidict is enough, instead of python -m wikidict.

Splitting get.py

Here we are talking about parts 1 and 2.

Part 1 is already almost fine as-is; we just need to move the code into its own submodule. We could improve the CLI by allowing the Wiktionary dump date to be passed as an argument, instead of relying on an environment variable.

Part 2 is only a matter of parsing the big XML file and storing raw data in a SQLite database. I am thinking of using this schema:

table: Word
  fields:
    - word: varchar(256)
    - code: text
  index on: word

table: Render
  fields:
    - word_id: int
    - nature: varchar(16)
    - text: text
  foreign key: word_id (Word._rowid_)

The Word table will contain raw data from the Wiktionary.

The Render table will store the transformed text for a given word (after it has been cleaned up and its templates processed). It will allow multiple texts per word (noun 1, noun 2, verb, adjective, ...).
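Here is a minimal sketch of the schema above, using Python's built-in `sqlite3` module. It adds an explicit `id INTEGER PRIMARY KEY` column to stand in for `Word._rowid_`. SQLite treats such a column as an alias of the rowid, and it is a valid target for the `Render.word_id` foreign key. The helper names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Sketch of the proposed schema; "id INTEGER PRIMARY KEY" aliases the rowid.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Word (
    id   INTEGER PRIMARY KEY,
    word VARCHAR(256) NOT NULL,
    code TEXT
);
CREATE INDEX IF NOT EXISTS word_idx ON Word (word);
CREATE TABLE IF NOT EXISTS Render (
    word_id INTEGER NOT NULL REFERENCES Word (id),
    nature  VARCHAR(16),
    text    TEXT
);
"""


def add_word(conn: sqlite3.Connection, word: str, code: str) -> int:
    """Store the raw wikicode of a word and return its id."""
    cur = conn.execute("INSERT INTO Word (word, code) VALUES (?, ?)", (word, code))
    return cur.lastrowid


def add_render(conn: sqlite3.Connection, word_id: int, nature: str, text: str) -> None:
    """Store one rendered text (noun 1, noun 2, verb, ...) for a word."""
    conn.execute(
        "INSERT INTO Render (word_id, nature, text) VALUES (?, ?, ?)",
        (word_id, nature, text),
    )


def renders_for(conn: sqlite3.Connection, word: str) -> List[Tuple[str, str]]:
    """Return (nature, text) pairs for a word, in insertion order."""
    return conn.execute(
        "SELECT r.nature, r.text FROM Render AS r"
        " JOIN Word AS w ON w.id = r.word_id"
        " WHERE w.word = ? ORDER BY r.rowid",
        (word,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
word_id = add_word(conn, "example", "raw wikicode")
add_render(conn, word_id, "noun", "first noun sense")
add_render(conn, word_id, "verb", "verb sense")
print(renders_for(conn, "example"))
# [('noun', 'first noun sense'), ('verb', 'verb sense')]
```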

We will have one database per locale, located at data/$LOCALE/$WIKIDUMP_DATE.db.

At the download step, if no local database exists, it will be retrieved from the GitHub releases, where databases will be saved alongside the dictionaries. This is a cool thing IMO: everyone will have a correct, up-to-date local database. Of course, we will have options to skip the download if the local file already exists, or to force it.

At the parse step, we will need a way to avoid parsing again if the command is run twice on the same Wiktionary dump. I was thinking of using PRAGMA user_version, which would contain the Wiktionary dump date as an integer. It would be set only after the full parsing has completed successfully.
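Here is a sketch of that guard. It assumes dump dates in the `YYYYMMDD` form used in the data file names (e.g. `20220120`), which fits in SQLite's 32-bit `user_version` header field. The function names are hypothetical. The path helper follows the `data/$LOCALE/$WIKIDUMP_DATE.db` layout described above.

```python
import sqlite3
from pathlib import Path


def db_path(root: Path, locale: str, dump_date: str) -> Path:
    """One database per locale and dump: data/$LOCALE/$WIKIDUMP_DATE.db."""
    return root / locale / f"{dump_date}.db"


def already_parsed(conn: sqlite3.Connection, dump_date: str) -> bool:
    """True if a previous parse of this dump completed successfully."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    return version == int(dump_date)


def mark_parsed(conn: sqlite3.Connection, dump_date: str) -> None:
    """Record the dump date; call only after the full parse succeeded."""
    # PRAGMA statements do not accept bound parameters, so the value is
    # formatted in; int() ensures only a number reaches the SQL.
    conn.execute(f"PRAGMA user_version = {int(dump_date)}")
    conn.commit()


conn = sqlite3.connect(":memory:")
print(already_parsed(conn, "20220120"))  # False: user_version defaults to 0
# ... parse the XML dump and fill the tables here ...
mark_parsed(conn, "20220120")
print(already_parsed(conn, "20220120"))  # True: a second run can be skipped
```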

Splitting convert.py

Here we are talking about parts 3 and 4.

Part 3 will call clean() and process_templates() on the wikicode, and store the result in the rendered field. This is the most time- and CPU-consuming part, so it will be parallelized.

Part 4 will rethink how we handle dictionary output, to make adding more formats easy.
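Since each word is rendered independently, the work maps naturally onto a process pool. The following is a minimal sketch. `render_wikicode` is a hypothetical stand-in for the real `clean()` and `process_templates()` calls; here it only strips bold and italic quote markup so the example runs on its own. The chunk size is an arbitrary assumption.

```python
from multiprocessing import Pool
from typing import Iterable, List, Tuple


def render_wikicode(item: Tuple[str, str]) -> Tuple[str, str]:
    """Hypothetical stand-in for clean() + process_templates()."""
    word, code = item
    # Strip bold (''') then italic ('') markup as a placeholder transformation.
    return word, code.replace("'''", "").replace("''", "").strip()


def render_all(items: Iterable[Tuple[str, str]], processes: int = 4) -> List[Tuple[str, str]]:
    """Render all (word, wikicode) pairs in parallel, preserving input order."""
    with Pool(processes) as pool:
        # imap keeps the input order; a large chunksize reduces IPC overhead.
        return list(pool.imap(render_wikicode, items, chunksize=1000))


if __name__ == "__main__":
    rows = [("chat", "'''chat''' animal"), ("chien", "''chien''")]
    print(render_all(rows, processes=2))
    # [('chat', 'chat animal'), ('chien', 'chien')]
```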

I was thinking of using a class with these methods (I have not really thought it through; I am just proposing the idea):

from pathlib import Path


class BaseFormat:
    __slots__ = {"locale", "output_dir"}

    def __init__(self, locale: str, output_dir: Path) -> None:
        self.locale = locale
        self.output_dir = output_dir

    def process(self, words) -> None:
        raise NotImplementedError()

    def save(self, *args) -> None:
        raise NotImplementedError()


class KoboFormat(BaseFormat):
    def process(self, words) -> None:
        groups = self.make_groups(words)
        variants = self.make_variants(words)
        wordlist = [self.process_word(word) for word in words]
        self.save(wordlist, groups, variants)

    def save(self, wordlist, groups, variants) -> None:
        ...

That part is far from finished, but once we have a fully working format, we will use this kind of code to generate the dict files:

from multiprocessing import Pool


def run_formatter(cls) -> None:
    # Each worker instantiates one format and renders all words with it.
    formater = cls(locale, output_dir)
    formater.process(words)


# Get all registered formats
formaters = get_formaters()
# Get all words from the database
words = get_words()

# And distribute the workload: one worker process per format
with Pool(len(formaters)) as pool:
    pool.map(run_formatter, formaters)
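The proposal leaves `get_formaters()` undefined. One hypothetical way to implement it is to have `BaseFormat` register every subclass automatically through `__init_subclass__`, so that adding a format only requires defining a class. The registry, the `StarDictFormat` class and the comments below are illustrative assumptions.

```python
from pathlib import Path
from typing import List, Type

# Hypothetical registry, filled automatically by subclassing.
_FORMATS: List[Type["BaseFormat"]] = []


class BaseFormat:
    __slots__ = ("locale", "output_dir")

    def __init_subclass__(cls, **kwargs) -> None:
        # Called once for every subclass definition: register it.
        super().__init_subclass__(**kwargs)
        _FORMATS.append(cls)

    def __init__(self, locale: str, output_dir: Path) -> None:
        self.locale = locale
        self.output_dir = output_dir

    def process(self, words) -> None:
        raise NotImplementedError()


def get_formaters() -> List[Type[BaseFormat]]:
    """Return every format class defined so far, in definition order."""
    return list(_FORMATS)


class KoboFormat(BaseFormat):
    def process(self, words) -> None:
        ...  # build and save the dicthtml files


class StarDictFormat(BaseFormat):
    def process(self, words) -> None:
        ...  # build and save the StarDict files


print([cls.__name__ for cls in get_formaters()])  # ['KoboFormat', 'StarDictFormat']
```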
opened by BoboTiG 26
Use a custom docker image for tests

For each PR test job, most of the time is spent installing LaTeX: about 2m40s to install it, versus 30s to run the tests.

Maybe we should investigate creating a custom Docker image with LaTeX preinstalled. If so, I would be in favor of a lightweight Debian-based distribution, but I am open to any distribution as long as the tests pass as-is (i.e., without any modifications to the source code).
QA/CI

opened by BoboTiG 22
[EN] Discover unhandled templates

I added some code at the end of the English last_template_handler to log the templates that are rendered by default. To limit the number of templates, I print only templates with more than 2 parts and with data, especially when nocat is not the only data.

The code and the results are available here: https://gist.github.com/lasconic/139942e3761200eaa62e0a3a9be3d4f6
- the first file is the code;
- the second file lists each template name with its number of hits, which gives a sense of the impact of supporting that template;
- the third file is the full list, convenient for finding one or more examples of the template's usage on Wiktionary.
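Here is a minimal sketch of that filtering and counting. It assumes the handler receives the template name plus positional arguments as `parts` and the named arguments as a `data` dict. `log_unhandled` and the counter are hypothetical; this is not the code from the gist.

```python
from collections import Counter
from typing import Dict, List

# Hit count per template name that fell through to the default rendering.
unhandled: Counter = Counter()


def log_unhandled(parts: List[str], data: Dict[str, str]) -> None:
    """Count a template that was rendered by default (hypothetical hook).

    parts[0] is the template name, the rest are positional arguments;
    data holds the named arguments. Only templates with more than 2 parts
    and with named data other than just "nocat" are kept.
    """
    if len(parts) > 2 and data and set(data) != {"nocat"}:
        unhandled[parts[0]] += 1


log_unhandled(["tpl-a", "x", "y"], {"lang": "en"})  # counted
log_unhandled(["tpl-a", "x", "y"], {"nocat": "1"})  # skipped: only nocat
log_unhandled(["tpl-b", "x"], {"lang": "en"})  # skipped: only 2 parts
print(unhandled.most_common())  # [('tpl-a', 1)]
```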

I discovered a couple of templates that should be ignored (https://github.com/BoboTiG/ebook-reader-dict/issues/395) and many others that need to be implemented...

I was not sure where to put this, so I opened an issue. Please let me know if it's not the right place.
locale:English

opened by lasconic 22
utils: formulas rendered to SVGs without using LaTeX tools
Fixes #1427. Fixes #1198. Closes #1209.

Tests to pass before merging (the rendering is good, but not the display):

[x] $ python -m wikidict fr --gen-dict "cercle unité" --output issue-1427

[x] $ python -m wikidict en --gen-dict "Wallis product,primitive recursion,Horner's rule" --output issue-1427
opened by BoboTiG 21

Rendering errors (<chem> and <math>)

Note from @BoboTiG: issue tightly coupled to #1183, interesting details can be found there too.

I did a fresh download and render of the EN wiktionary today, and got the following errors:

>>> Loading data/en/data_wikicode-20220120.json ...
>>> Loaded 1,038,672 words from data/en/data_wikicode-20220120.json
<chem> ERROR with ^-N=\overset{+}N=N^- in [azide]
<math> ERROR with \begin{align}\frac{\pi}{2} & = \prod_{n=1}^{\infty} \frac{ 4n^2 }{ 4n^2 - 1 } = \prod_{n=1}^{\infty} \left(\frac{2n}{2n-1} \cdot \frac{2n}{2n+1}\right) \\[6pt]& = \Big(\frac{2}{1} \cdot \frac{2}{3}\Big) \cdot \Big(\frac{4}{3} \cdot \frac{4}{5}\Big) \cdot \Big(\frac{6}{5} \cdot \frac{6}{7}\Big) \cdot \Big(\frac{8}{7} \cdot \frac{8}{9}\Big) \cdot \; \cdots \\\end{align} in [Wallis product]
<math> ERROR with \begin{align}a_0 &+ a_1x + a_2x^2 + a_3x^3 + \cdots + a_nx^n \\ &= a_0 + x \bigg(a_1 + x \Big(a_2 + x \big(a_3 + \cdots + x(a_{n-1} + x \, a_n) \cdots \big) \Big) \bigg).\end{align} in [Horner's rule]
<math> ERROR with \frac = \frac in [circle of Apollonius]
<math> ERROR with \begin{align}\rho(g, h) (0,x_1,\ldots,x_k) &= g(x_1,\ldots,x_k) \\\rho(g, h) (y+1,x_1,\ldots,x_k) &= h(y,\rho(g, h) (y,x_1,\ldots,x_k),x_1,\ldots,x_k)\,\end{align} in [primitive recursion]
>>> Saved 697,169 words into data/en/data-20220120.json
>>> Render done!

bug

opened by Moonbase59 19

[FR] Handle "équiv-pour" additional arguments

Wiktionary page: https://fr.wiktionary.org/wiki/chercheureuse

Wikicode:

{{équiv-pour|une femme|chercheuse|chercheure|langue=fr|2egenre=un homme|2egenre1=chercheur}}

Output:

<i>(pour une femme, on peut dire</i>&nbsp: chercheuse, chercheure<i>)</i>

Expected:

<i>(pour une femme, on peut dire</i>&nbsp;: chercheuse, chercheure<i>&nbsp;; pour un homme, on dit</i>&nbsp;: chercheur<i>)</i>

Model link, if any: https://fr.wiktionary.org/wiki/Mod%C3%A8le:%C3%A9quiv-pour
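Here is a hypothetical sketch of how the extra arguments could be rendered. It is modeled only on a well-formed version of the expected HTML (balanced `<i>` tags, `&nbsp;` entities). The wording ("on peut dire" for the first gender, "on dit" for the second) is copied from that expected output, not from the template's documentation.

```python
from typing import Dict, List


def render_equiv_pour(parts: List[str], data: Dict[str, str]) -> str:
    """Hypothetical renderer for {{équiv-pour}}, modeled on the expected output.

    parts: [who, word1, word2, ...]; data may hold "2egenre" (the other
    gender) and "2egenre1", "2egenre2", ... (its words).
    """
    who, *words = parts
    text = f"<i>(pour {who}, on peut dire</i>&nbsp;: {', '.join(words)}"
    other = data.get("2egenre")
    if other:
        # Collect 2egenre1, 2egenre2, ... in numeric order until one is missing.
        others = []
        i = 1
        while f"2egenre{i}" in data:
            others.append(data[f"2egenre{i}"])
            i += 1
        text += f"<i>&nbsp;; pour {other}, on dit</i>&nbsp;: {', '.join(others)}"
    return text + "<i>)</i>"


print(render_equiv_pour(
    ["une femme", "chercheuse", "chercheure"],
    {"langue": "fr", "2egenre": "un homme", "2egenre1": "chercheur"},
))
```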

locale:French

opened by BoboTiG 0

[FR] Add "siècle2" HTML filter
Wiktionary page: https://fr.wiktionary.org/wiki/t%C5%8D-on

Model link, if any: https://fr.wiktionary.org/wiki/Mod%C3%A8le:si%C3%A8cle2

$ python -m wikidict fr --check-word "tō-on"
locale:French
opened by BoboTiG 0
[FR] Adapt "composé de" output
Wiktionary page: https://fr.wiktionary.org/wiki/hexavalent

Wikicode:

{{composé de|lang=fr|hexa-|-valent|m=1}}

Output:

Composé de hexa- et de -valent

Expected:

Dérivé du préfixe hexa-, avec le suffixe -valent

Model link, if any: https://fr.wiktionary.org/wiki/Mod%C3%A8le:compos%C3%A9_de
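Here is a hypothetical rule derived from this single example: a prefix followed by a suffix becomes a derivation, and everything else keeps the current output. The real template may handle more cases, so check the model link before relying on it.

```python
from typing import List


def render_compose_de(parts: List[str]) -> str:
    """Hypothetical rule for {{composé de}}, derived from a single example.

    A prefix (ending with "-") followed by a suffix (starting with "-")
    is rendered as a derivation; anything else keeps the current output.
    """
    if len(parts) == 2 and parts[0].endswith("-") and parts[1].startswith("-"):
        return f"Dérivé du préfixe {parts[0]}, avec le suffixe {parts[1]}"
    return "Composé de " + " et de ".join(parts)


print(render_compose_de(["hexa-", "-valent"]))
# Dérivé du préfixe hexa-, avec le suffixe -valent
```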
locale:French
opened by BoboTiG 2
[CA] Improve 'etim-lang' support
Wiktionary page: https://ca.wiktionary.org/wiki/feocromocitoma

Wikicode:

{{etim-lang|grc|ca|φαιός|trad=gris}}

Output:

Del grec antic φαιός («gris»)

Expected:

Del grec antic φαιός (phaiós, «gris»)

Model link, if any: https://ca.wiktionary.org/wiki/Plantilla:etim-lang
locale:Catalan
opened by BoboTiG 1

[EL] missing αγγειοχειρουργός

Wiktionary page: https://el.wiktionary.org/w/index.php?title=%CE%B1%CE%B3%CE%B3%CE%B5%CE%B9%CE%BF%CF%87%CE%B5%CE%B9%CF%81%CE%BF%CF%85%CF%81%CE%B3%CF%8C%CF%82&action=edit

Wikicode:

'''{{PAGENAME}}''' {{αθ}}
* {{ετ|ιατρική}} ο [[χειρουργός]] που ειδικεύεται στην αποκατάσταση βλαβών στα αιμοφόρα [[αγγείο|αγγεία]]
*: {{μορφ}} [[αγγειοχειρούργος]]

Output:

αγγειοχειρουργός el '<i>αρσενικό ή θηλυκό</i>.'

'<b>αγγειοχειρουργός</b> < <i>(Π)</i> + χειρουργός'

Expected:

αγγειοχειρουργός αρσενικό ή θηλυκό
(ιατρική) ο χειρουργός που ειδικεύεται στην αποκατάσταση βλαβών στα αιμοφόρα αγγεία
άλλες μορφές: αγγειοχειρούργος

Model link, if any:

I guess the {{μορφ}} template can be resolved via

    if tpl == "μορφ":
        phrase = "άλλες μορφές"
        if not data["0"]:
            phrase += ":"
        return phrase

I am not sure how to resolve the other issues, or whether one should expect pronunciation data to be included too.

locale:Greek

opened by chopinesque 24

Releases(sv)

sv(Jun 2, 2020)
Word count: 321 071
Wiktionary dump: 2022-12-20

Available files:

Kobo (dicthtml-sv-sv.zip)

StarDict (dict-sv-sv.zip)

DictFile (dict-sv-sv.df.bz2)

Updated on 2023-01-02T01:35:14.741434+00:00
Source code(tar.gz)
Source code(zip)
dict-sv-sv-noetym.df.bz2(3.12 MB)
dict-sv-sv-noetym.zip(5.09 MB)
dict-sv-sv.df.bz2(3.12 MB)
dict-sv-sv.zip(5.09 MB)
dicthtml-sv-sv-noetym.zip(5.43 MB)
dicthtml-sv-sv.zip(5.43 MB)
pt(Jun 3, 2020)
Word count: 60 835
Wiktionary export: 2022-12-20

Available files:

Kobo (dicthtml-pt-pt.zip)

StarDict (dict-pt-pt.zip)

DictFile (dict-pt-pt.df.bz2)

Updated on 2023-01-02T01:32:08.735074+00:00
Source code(tar.gz)
Source code(zip)
dict-pt-pt-noetym.df.bz2(1.71 MB)
dict-pt-pt-noetym.zip(2.79 MB)
dict-pt-pt.df.bz2(2.05 MB)
dict-pt-pt.zip(3.28 MB)
dicthtml-pt-pt-noetym.zip(3.09 MB)
dicthtml-pt-pt.zip(3.56 MB)
no(May 13, 2021)
Word count: 2 491
Wiktionary dump: 2022-12-20

Available files:

Kobo (dicthtml-no-no.zip)

StarDict (dict-no-no.zip)

DictFile (dict-no-no.df.bz2)

Updated on 2023-01-02T01:30:54.721852+00:00
Source code(tar.gz)
Source code(zip)
dict-no-no-noetym.df.bz2(81.52 KB)
dict-no-no-noetym.zip(121.70 KB)
dict-no-no.df.bz2(89.96 KB)
dict-no-no.zip(135.34 KB)
dicthtml-no-no-noetym.zip(200.49 KB)
dicthtml-no-no.zip(218.81 KB)
it(May 12, 2021)
Word count: 53 304
Wiktionary export: 2022-12-20

Available files:

Kobo (dicthtml-it-it.zip)

StarDict (dict-it-it.zip)

DictFile (dict-it-it.df.bz2)

Updated on 2023-01-02T01:33:03.258395+00:00
Source code(tar.gz)
Source code(zip)
dict-it-it-noetym.df.bz2(1.86 MB)
dict-it-it-noetym.zip(3.03 MB)
dict-it-it.df.bz2(2.41 MB)
dict-it-it.zip(3.80 MB)
dicthtml-it-it-noetym.zip(3.20 MB)
dicthtml-it-it.zip(3.94 MB)
fr(Apr 16, 2020)
Word count: 1 823 284
Wiktionary export: 2022-12-20

Available files:

Kobo (dicthtml-fr-fr.zip)

StarDict (dict-fr-fr.zip)

DictFile (dict-fr-fr.df.bz2)

Updated on 2023-01-02T01:53:20.860628+00:00
Source code(tar.gz)
Source code(zip)
dict-fr-fr-noetym.df.bz2(16.20 MB)
dict-fr-fr-noetym.zip(24.25 MB)
dict-fr-fr.df.bz2(20.82 MB)
dict-fr-fr.zip(31.09 MB)
dicthtml-fr-fr-noetym.zip(28.61 MB)
dicthtml-fr-fr.zip(35.48 MB)
es(Jun 11, 2020)
Word count: 745 765
Wiktionary export: 2022-12-20

Available files:

Kobo (dicthtml-es-es.zip)

StarDict (dict-es-es.zip)

DictFile (dict-es-es.df.bz2)

Updated on 2023-01-02T01:38:33.610430+00:00
Source code(tar.gz)
Source code(zip)
dict-es-es-noetym.df.bz2(3.60 MB)
dict-es-es-noetym.zip(4.87 MB)
dict-es-es.df.bz2(4.12 MB)
dict-es-es.zip(5.63 MB)
dicthtml-es-es-noetym.zip(6.24 MB)
dicthtml-es-es.zip(7.00 MB)
en(Jun 15, 2020)
Word count: 1,066,270
Wiktionary dump: 2022-12-20

Available files:

Kobo (dicthtml-en-en.zip)

StarDict (dict-en-en.zip)

DictFile (dict-en-en.df.bz2)

Updated on 2023-01-02T01:55:29.392275+00:00
Source code(tar.gz)
Source code(zip)
dict-en-en-noetym.df.bz2(19.99 MB)
dict-en-en-noetym.zip(32.57 MB)
dict-en-en.df.bz2(25.58 MB)
dict-en-en.zip(40.69 MB)
dicthtml-en-en-noetym.zip(36.42 MB)
dicthtml-en-en.zip(44.69 MB)
el(Jul 3, 2021)
Word count: 165.303
Wiktionary export: 2022-12-20

Available files:

Kobo (dicthtml-el-el.zip)

StarDict (dict-el-el.zip)

DictFile (dict-el-el.df.bz2)

Creation date: 2023-01-02T01:35:23.523931+00:00
Source code(tar.gz)
Source code(zip)
dict-el-el-noetym.df.bz2(2.45 MB)
dict-el-el-noetym.zip(4.83 MB)
dict-el-el.df.bz2(3.70 MB)
dict-el-el.zip(6.92 MB)
dicthtml-el-el-noetym.zip(5.46 MB)
dicthtml-el-el.zip(7.28 MB)
de(Jan 28, 2022)
Word count: 811.909
Wiktionary dump from: 2022-12-20

Available dictionary formats:

Kobo (dicthtml-de-de.zip)

StarDict (dict-de-de.zip)

DictFile (dict-de-de.df.bz2)

Last update: 2023-01-02T01:40:09.028796+00:00.
Source code(tar.gz)
Source code(zip)
dict-de-de-noetym.df.bz2(6.61 MB)
dict-de-de-noetym.zip(9.68 MB)
dict-de-de.df.bz2(8.72 MB)
dict-de-de.zip(12.88 MB)
dicthtml-de-de-noetym.zip(12.91 MB)
dicthtml-de-de.zip(16.51 MB)
ca(Jun 2, 2020)
Word count: 42.686
Wiktionary dump: 2022-12-20

Available files:

Kobo (dicthtml-ca-ca.zip)

StarDict (dict-ca-ca.zip)

DictFile (dict-ca-ca.df.bz2)

Updated on 2023-01-02T01:32:36.185777+00:00
Source code(tar.gz)
Source code(zip)
dict-ca-ca-noetym.df.bz2(1.09 MB)
dict-ca-ca-noetym.zip(1.83 MB)
dict-ca-ca.df.bz2(1.38 MB)
dict-ca-ca.zip(2.26 MB)
dicthtml-ca-ca-noetym.zip(2.10 MB)
dicthtml-ca-ca.zip(2.53 MB)
ru(Jul 2, 2022)

Source code(tar.gz)
Source code(zip)

Owner

Mickaël Schoentgen

Software Engineer. Creator of the Python module MSS, FOSS contributor. Maintainer of watchdog and MARISA Trie.

GitHub Repository http://www.tiger-222.fr/?d=2020/04/17/22/14/21-un-dictionnaire-alternatif-et-complet-pour-votre-liseuse
