Generating Korean Slogans with phonetic and structural repetition

Overview

LexPOS_ko

Generating Korean Slogans with phonetic and structural repetition

Generating Slogans with Linguistic Features

LexPOS is a sequence-to-sequence transformer model that generates slogans with phonetic and structural repetition. For phonetic repetition, it searches for phonetically similar words with user keywords. Both the sound-alike words and user keywords become the lexical constraints while generating slogans. It also adjusts the logits distribution to implement further phonetic constraints. For structural repetition, LexPOS uses POS constraints. Users can specify any repeated phrase structure by POS tags.

Generating slogans with lexical, POS constraints

1. Code

  • Need to download pretrained Korean word2vec model from here and put it below phonetic_similarity/KoG2P
# clone this repo
git clone https://github.com/yeounyi/LexPOS_ko
cd LexPOS
# generate slogans 
python3 generate_slogans.py --keywords 카드,혜택 --num_beams 3 --temperature 1.2
  • -keywords: Keywords that you want to be included in slogans. You can enter multiple keywords, delimited by comma
  • -pos_inputs: You can either specify the particular list of POS tags delimited by comma, or the model will generate slogans with the most frequent syntax used in corpus. POS tags generally follow the format of Konlpy Mecab POS tags.
  • -num_beams: Number of beams for beam search. Default to 1, meaning no beam search.
  • -temperature: The value used to module the next token probabilities. Default to 1.0.
  • -model_path: Path to the pretrained model

2. Examples

Keyword: 카드, 혜택
POS: [NNG, JK, VV, EC, SF, NNG, JK, VV, EF]
Output: 카드를 택하면, 혜택이 바뀐다

Keyword: 안전, 항공
POS: [MM, NNG, SF, MM, NNG, SF]
Output: 새로운 공항, 안전한 항공

Keywords: 추석, 선물
POS: [NNG, JK, MM, NNG, SF, NNG, JK, MM, NNG]
Output: 추석을 앞둔 추억, 당신을 위한 선물

Model Architecture


Pretrained Model

https://drive.google.com/drive/folders/1opkhDApURnjibVYmmhj5bqLTWy4miNe4?usp=sharing

References

https://github.com/scarletcho/KoG2P

Citation

@misc{yi2021lexpos,
  author = {Yi, Yeoun},
  title = {Generating Korean Slogans with Linguistic Constraints using Sequence-to-Sequence Transformer},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yeounyi/LexPOS_ko}}
}
Owner
Yeoun Yi
Studying Computational Linguistics | Interested in Advertising & Marketing
Yeoun Yi
APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

APEACH - Korean Hate Speech Evaluation Datasets APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of

Kevin-Yang 70 Dec 06, 2022
Natural language Understanding Toolkit

Natural language Understanding Toolkit TOC Requirements Installation Documentation CLSCL NER References Requirements To install nut you need: Python 2

Peter Prettenhofer 119 Oct 08, 2022
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre

THUNLP 2.3k Jan 08, 2023
BiNE: Bipartite Network Embedding

BiNE: Bipartite Network Embedding This repository contains the demo code of the paper: BiNE: Bipartite Network Embedding. Ming Gao, Leihui Chen, Xiang

leihuichen 214 Nov 24, 2022
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Weitang Liu 1.6k Jan 03, 2023
Code for EMNLP'21 paper "Types of Out-of-Distribution Texts and How to Detect Them"

Code for EMNLP'21 paper "Types of Out-of-Distribution Texts and How to Detect Them"

Udit Arora 19 Oct 28, 2022
✨Fast Coreference Resolution in spaCy with Neural Networks

✨ NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks. NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolv

Hugging Face 2.6k Jan 04, 2023
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It

Steven Loria 8.4k Dec 26, 2022
A sample project that exists for PyPUG's "Tutorial on Packaging and Distributing Projects"

A sample Python project A sample project that exists as an aid to the Python Packaging User Guide's Tutorial on Packaging and Distributing Projects. T

Python Packaging Authority 4.5k Dec 30, 2022
Making text a first-class citizen in TensorFlow.

TensorFlow Text - Text processing in Tensorflow IMPORTANT: When installing TF Text with pip install, please note the version of TensorFlow you are run

1k Dec 26, 2022
Installation, test and evaluation of Scribosermo speech-to-text engine

Scribosermo STT Setup Scribosermo is a LGPL licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different langua

Florian Quirin 3 Jun 20, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

背景 安装教程 快速上手 (一)预训练模型 (二)机器翻译 (三)文本分类 TenTrans 进阶 1. 多语言机器翻译 2. 跨语言预训练 背景 TrenTrans是一个统一的端到端的多语言多任务预训练平台,支持多种预训练方式,以及序列生成和自然语言理解任务。 安装教程 git clone git

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
This is a really simple text-to-speech app made with python and tkinter.

Tkinter Text-to-Speech App by Souvik Roy This is a really simple tkinter app which converts the text you have entered into a speech. It is created wit

Souvik Roy 1 Dec 21, 2021
Deep learning for NLP crash course at ABBYY.

Deep NLP Course at ABBYY Deep learning for NLP crash course at ABBYY. Suggested textbook: Neural Network Methods in Natural Language Processing by Yoa

Dan Anastasyev 597 Dec 18, 2022
An evaluation toolkit for voice conversion models.

Voice-conversion-evaluation An evaluation toolkit for voice conversion models. Sample test pair Generate the metadata for evaluating models. The direc

30 Aug 29, 2022
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini

4 Nov 11, 2021
PIZZA - a task-oriented semantic parsing dataset

The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.

17 Dec 14, 2022
A Transformer Implementation that is easy to understand and customizable.

Simple Transformer I've written a series of articles on the transformer architecture and language models on Medium. This repository contains an implem

Naoki Shibuya 4 Jan 20, 2022
Common Voice Dataset explorer

Common Voice Dataset Explorer Common Voice Dataset is by Mozilla Made during huggingface finetuning week Usage pip install -r requirements.txt streaml

Ceyda Cinarel 22 Nov 16, 2022
FewCLUE: 为中文NLP定制的小样本学习测评基准

FewCLUE: 为中文NLP定制的小样本学习测评基准

CLUE benchmark 387 Jan 04, 2023