trstop

Turkish Stop Words (Türkçe Dolgu Sözcükleri)

This repository contains the Turkish stop words that appear among the 10,000 most frequent Turkish words. To make it possible to test new candidate words in the future, I have also added a small Python script and a list of the 10,000 most frequent words.

Some Turkish stop words are also listed at https://github.com/sgsinclair/trombone/blob/master/src/main/resources/org/voyanttools/trombone/keywords/stop.tr.turkish-lucene.txt. However, some of the stop words in that list are not among the 10,000 most frequent words.
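The frequency check described above can be sketched roughly as follows. This is not the script shipped in the repository; the function name `split_by_frequency_rank`, its parameters, and the tiny sample ranking are all hypothetical and only illustrate the idea of testing candidate words against the most frequent N words.

```python
from typing import Iterable, List, Tuple


def split_by_frequency_rank(
    candidates: Iterable[str],
    ranked_words: List[str],
    top_n: int = 10_000,
) -> Tuple[List[str], List[str]]:
    """Split candidates into (inside, outside) the top_n most frequent words.

    ranked_words is assumed to be sorted from most to least frequent.
    """
    top_words = set(ranked_words[:top_n])
    inside: List[str] = []
    outside: List[str] = []
    for word in candidates:
        if word in top_words:
            inside.append(word)
        else:
            outside.append(word)
    return inside, outside


# Tiny made-up frequency list, most frequent first (illustration only).
sample_ranking = ["ve", "bir", "bu", "da", "için", "kitap"]
inside, outside = split_by_frequency_rank(
    ["ve", "için", "kitap"], sample_ranking, top_n=5
)
print(inside)   # ['ve', 'için']
print(outside)  # ['kitap']
```

With the real 10,000-word list loaded into `ranked_words`, the `outside` result would correspond to candidates that are too rare to qualify, such as some entries of the Lucene list mentioned above.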

To use the module:

import trstop

print(trstop.is_stop_word(parameter))
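A common use of such a check is to filter stop words out of a token list. The sketch below assumes that `trstop.is_stop_word` takes a single word and returns a truthy value for stop words; the README does not state the return type, so treat that as an assumption. The helper `remove_stop_words` and the stand-in `demo_is_stop_word` are hypothetical and exist only so the example runs without the module installed.

```python
from typing import Callable, Iterable, List


def remove_stop_words(
    tokens: Iterable[str], is_stop_word: Callable[[str], bool]
) -> List[str]:
    """Return the tokens for which is_stop_word(token) is falsy."""
    return [token for token in tokens if not is_stop_word(token)]


# Hypothetical stand-in so the sketch runs on its own.
# With the module available, pass trstop.is_stop_word instead.
_demo_stop_words = {"ve", "bir", "bu"}


def demo_is_stop_word(word: str) -> bool:
    return word in _demo_stop_words


tokens = "bu kitap ve bir kalem".split()
filtered = remove_stop_words(tokens, demo_is_stop_word)
print(filtered)  # ['kitap', 'kalem']
```

Passing the predicate as an argument keeps the filtering logic independent of where the stop word list comes from.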

Contributors:

Ahmet Aksoy
Toprak Öztürk

Stop words ("dolgu sözcükleri" in Turkish) are words that are used frequently but whose removal does not significantly change the meaning of the sentence they are taken from.

I use "dolgu sözcükleri" as the Turkish equivalent of "stop words". If there is a better option, I am open to changing it. The script "turkce-stop-words-dict.py" in this repository can be used to check how frequently a word is used when new words are proposed for the list.

Last updated: 29.06.2018
