Random Directed Acyclic Graph Generator

Overview

DAG_Generator

Random Directed Acyclic Graph Generator

verison1.0

简介

工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权的有向边$E_{ij}$表示。每个有向边$E_{ij}$表示$T_i$是$T_j$的父任务,$T_j$只能在其所有父任务完成后执行。为了方便操作和展示,一般在所有任务之前设立一个Start虚拟节点,作为所有没有父任务节点的父节点;同理,在所有任务之后设立一个Exit虚拟节点,作为所有没有子任务节点的子节点,这两个虚拟节点都没有计算资源需求。

此程序用于随机生成有向无环图(DAG)。本来的想法是按照文献[1]的方法来生成DAG,但是原文没有给出代码,难以实现,所以就仿照文章的设定来生成DAG。

确定表示一个DAG需要三个数据,分别是是节点连接信息,各节点的父节点数,各节点的子节点数。由这三个元素可以确定一个独立的DAG。例如一个10个节点的DAG:

Edges: [(1, 5), (1, 6), (2, 4), (2, 6), (3, 6), (4, 7), (4, 9), (5, 9), (5, 7), (6, 7), ('Start', 1), ('Start', 2), ('Start', 3), ('Start', 8), ('Start', 10), (7, 'Exit'), (8, 'Exit'), (9, 'Exit'), (10, 'Exit')]

In_degree: [1, 1, 1, 1, 1, 3, 3, 1, 2, 1] #不包括虚拟节点

out_degree: [2, 2, 1, 2, 2, 1, 1, 1, 1, 1] #不包括虚拟节点

DAG

参数设定
set_dag_size = [20,30,40,50,60,70,80,90]             #random number of DAG  nodes       
set_max_out = [1,2,3,4,5]                            #max out_degree of one node
set_alpha = [0.5,1.0,1.5]                            #DAG shape
set_beta = [0.0,0.5,1.0,2.0]                         #DAG regularity
  1. set_dag_size : 设定的DAG任务大小,有[20,30,40,50,60,70,80,90],默认为20。
  2. set_max_out: 设定的一个节点最大出度,有[1,2,3,4,5],默认为2。
  3. set_alpha : 设定DAG控制形状的参数,有[0.5,1.0,1.5],默认为1.0。
  4. set_beta :设定DAG每层宽度的规则度参数,有[0.0,0.5,1.0,2.0] ,默认为1.0。

DAG的层数(深度)被设置为$\sqrt{set_dag_size}/set_alpha$, $\alpha$控制着DAG的Fat,$\alpha$越小,DAG越瘦长,$\alpha$越大,DAG越肥密。

DAGS

每层的宽度被随机设置为均值为$set_dag_size/length$,标准差为$\beta$的正态分布。$\beta$越大,DAG越不规则。

绘制

使用networkx库绘制DAG图。

def plot_DAG(edges,postion):
    g1 = nx.DiGraph()
    g1.add_edges_from(edges)
    nx.draw_networkx(g1, arrows=True, pos=postion)
    plt.savefig("DAG.png", format="PNG")
    return plt.clf

n = 30,max_out = 3, $\alpha$ = 1, $\beta$ = 1.0

DAG

n = 30,max_out = 3, $\alpha$ = 0.5, $\beta$ = 1.0

DAG

n = 30,max_out = 3, $\alpha$ = 1.5, $\beta$ = 1.0

DAG

代码
import random,math,argparse
import numpy as np
from numpy.random.mtrand import sample
from matplotlib import pyplot as plt
import networkx as nx

parser = argparse.ArgumentParser()
parser.add_argument('--mode', default='default', type=str)#parameters setting
parser.add_argument('--n', default=10, type=int)          #number of DAG  nodes
parser.add_argument('--max_out', default=2, type=float)   #max out_degree of one node
parser.add_argument('--alpha',default=1,type=float)       #shape 
parser.add_argument('--beta',default=1.0,type=float)      #regularity
args = parser.parse_args()

set_dag_size = [20,30,40,50,60,70,80,90]             #random number of DAG  nodes       
set_max_out = [1,2,3,4,5]                              #max out_degree of one node
set_alpha = [0.5,1.0,2.0]                            #DAG shape
set_beta = [0.0,0.5,1.0,2.0]                         #DAG regularity

def DAGs_generate(mode = 'default',n = 10,max_out = 2,alpha = 1,beta = 1.0):
    ##############################################initialize###########################################
    args.mode = mode
    if args.mode != 'default':
        args.n = random.sample(set_dag_size,1)[0]
        args.max_out = random.sample(set_max_out,1)[0]
        args.alpha = random.sample(set_alpha,1)[0]
        args.beta = random.sample(set_alpha,1)[0]
    else: 
        args.n = n
        args.max_out = max_out
        args.alpha = alpha
        args.beta = beta

    length = math.floor(math.sqrt(args.n)/args.alpha)
    mean_value = args.n/length
    random_num = np.random.normal(loc = mean_value, scale = args.beta,  size = (length,1))    
    ###############################################division############################################
    position = {'Start':(0,4),'Exit':(10,4)}
    generate_num = 0
    dag_num = 1
    dag_list = [] 
    for i in range(len(random_num)):
        dag_list.append([]) 
        for j in range(math.ceil(random_num[i])):
            dag_list[i].append(j)
        generate_num += math.ceil(random_num[i])

    if generate_num != args.n:
        if generate_num<args.n:
            for i in range(args.n-generate_num):
                index = random.randrange(0,length,1)
                dag_list[index].append(len(dag_list[index]))
        if generate_num>args.n:
            i = 0
            while i < generate_num-args.n:
                index = random.randrange(0,length,1)
                if len(dag_list[index])==1:
                    i = i-1 if i!=0 else 0
                else:
                    del dag_list[index][-1]
                i += 1

    dag_list_update = []
    pos = 1
    max_pos = 0
    for i in range(length):
        dag_list_update.append(list(range(dag_num,dag_num+len(dag_list[i]))))
        dag_num += len(dag_list_update[i])
        pos = 1
        for j in dag_list_update[i]:
            position[j] = (3*(i+1),pos)
            pos += 5
        max_pos = pos if pos > max_pos else max_pos
        position['Start']=(0,max_pos/2)
        position['Exit']=(3*(length+1),max_pos/2)

    ############################################link###################################################
    into_degree = [0]*args.n            
    out_degree = [0]*args.n             
    edges = []                          
    pred = 0

    for i in range(length-1):
        sample_list = list(range(len(dag_list_update[i+1])))
        for j in range(len(dag_list_update[i])):
            od = random.randrange(1,args.max_out+1,1)
            od = len(dag_list_update[i+1]) if len(dag_list_update[i+1])<od else od
            bridge = random.sample(sample_list,od)
            for k in bridge:
                edges.append((dag_list_update[i][j],dag_list_update[i+1][k]))
                into_degree[pred+len(dag_list_update[i])+k]+=1
                out_degree[pred+j]+=1 
        pred += len(dag_list_update[i])


    ######################################create start node and exit node################################
    for node,id in enumerate(into_degree):#给所有没有入边的节点添加入口节点作父亲
        if id ==0:
            edges.append(('Start',node+1))
            into_degree[node]+=1

    for node,od in enumerate(out_degree):#给所有没有出边的节点添加出口节点作儿子
        if od ==0:
            edges.append((node+1,'Exit'))
            out_degree[node]+=1

    #############################################plot##################################################
    return edges,into_degree,out_degree,position

参考

[1] [List Scheduling Algorithm for Heterogeneous Systems by an Optimistic Cost Table](https://ieeexplore.ieee.org/abstract/document/6471969/)

[2] Building DAGs / Directed Acyclic Graphs with Python

[3] DAG Dependencies

[4] Networkx Lirbrary

[5] Python生成依赖性应用的DAG(有向无环图)拓扑

Owner
Livion
無限進步 Email: [email protected] Wechat: Livion2018
Livion
Partially offline multi-language translator built upon Huggingface transformers.

Translate Command-line interface to translation pipelines, powered by Huggingface transformers. This tool can download translation models, and then us

Richard Jarry 8 Oct 25, 2022
Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers.

Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers. Cherche is meant to be used with small to medium sized corpora. C

Raphael Sourty 224 Nov 29, 2022
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Official code for our Interspeech 2021 - Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset [1]*. Visually-grounded spoken language datasets c

Ian Palmer 3 Jan 26, 2022
Tools to download and cleanup Common Crawl data

cc_net Tools to download and clean Common Crawl as introduced in our paper CCNet. If you found these resources useful, please consider citing: @inproc

Meta Research 483 Jan 02, 2023
Beyond Paragraphs: NLP for Long Sequences

Beyond Paragraphs: NLP for Long Sequences

AI2 338 Dec 02, 2022
official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

Plugin 3 Jan 12, 2022
Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

Michael Petrochuk 2.1k Jan 01, 2023
Pipeline for training LSA models using Scikit-Learn.

Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j

Dani El-Ayyass 23 Sep 05, 2022
Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module.

Import Subtitles for Blender VSE Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module. Supported formats by py

4 Feb 27, 2022
Constituency Tree Labeling Tool

Constituency Tree Labeling Tool The purpose of this package is to solve the constituency tree labeling problem. Look from the dataset labeled by NLTK,

张宇 6 Dec 20, 2022
Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"

GDAP The code of paper "Code for "Generating Disentangled Arguments with Prompts: a Simple Event Extraction Framework that Works"" Event Datasets Prep

45 Oct 29, 2022
Pangu-Alpha for Transformers

Pangu-Alpha for Transformers Usage Download MindSpore FP32 weights for GPU from here to data/Pangu-alpha_2.6B.ckpt Activate MindSpore environment and

One 5 Oct 01, 2022
Code-autocomplete, a code completion plugin for Python

Code AutoComplete code-autocomplete, a code completion plugin for Python.

xuming 13 Jan 07, 2023
DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches the answers out of 60 billion phrases in Wikipedia, it is also v

Jinhyuk Lee 543 Jan 08, 2023
Final Project for the Intel AI Readiness Boot Camp NLP (Jan)

NLP Boot Camp (Jan) Synopsis Full Name: Prameya Mohanty Name of your School: Delhi Public School, Rourkela Class: VIII Title of the Project: iTransect

TheCodingHub 1 Feb 01, 2022
Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

indico 665 Dec 17, 2022
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Simplemma: a simple multilingual lemmatizer for Python Purpose Lemmatization is the process of grouping together the inflected forms of a word so they

Adrien Barbaresi 70 Dec 29, 2022
This repository contains examples of Task-Informed Meta-Learning

Task-Informed Meta-Learning This repository contains examples of Task-Informed Meta-Learning (paper). We consider two tasks: Crop Type Classification

10 Dec 19, 2022
Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

TRICE: a task-agnostic transferring framework for multi-source sequence generation This is the source code of our work Transfer Learning for Sequence

THUNLP-MT 9 Jun 27, 2022
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

Explosion 1.2k Jan 08, 2023