Automatically download and crop key information from the arxiv daily paper. (cpu version)

Related tags

DownloaderFocusAX
Overview

FocusAX

按关键词筛选arxiv每日最新paper或从arxiv搜索。

  • 自动下载、获取摘要、自动截取文中表格和图片。

安装必要的环境

  • 安装 paddle
# GPU安装
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU安装
 python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple
  • 安装 Layout-Parser
=2.2"">
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install "paddleocr>=2.2"
  • 按照其他必要的包
pip3 install -r requirements.txt
  • 下载模型权重
  • PubLayNet 下载解压后放置在paperparse目录下。目录结构如下
FocusAX
    - paperparse
        - ppyolov2_r50vd_dcn_365e_publaynet
            - inference.pdiparams
            - inference.pdiparams.info
            - inference.pdmodel
        - ...
    - downloader
        - ...
    - utils
        - ...
    - configs.py
    - focus_daily.py
    - focus_search.py
    - README.py
    - ...

使用教程

  • configs.py :程序参数配置文件
# =============== 网络代理 ================
# proxy = None # 不使用代理
proxy = {"http": "socks5://127.0.0.1:8080", "https": "socks5://127.0.0.1:8080"}
# =============== 保存文件根目录 ================
root_path = "./arxiv"
# =============== DNN模型推理配置信息 ================
threshold = 0.5
enable_mkldnn = True
enforce_cpu = True
thread_num = 4
  • focus_daily.py :按关键字过滤arxiv daily上的文章(仅当日)
if __name__ == '__main__':
    key_words = ['GAN'] # 要包含的关键词
    subject_words = ['ML', 'CV', 'AI']  # 要包含的类别
    start_parse(key_words, subject_words, needPDF=True, needZip=False)
  • focus_search.py :按关键字在arxiv检索
start_parse('Keyword')
  • root_path 目录中将创建新的文件夹保存结果

效果图

每个文件夹中的abs.md文件保留的是当前pdf的介绍,使用Typora等markdown编辑器打开。 image

image

ps:论文排版不规范会导致截图混乱。

其他

Owner
HeoLis
Interesting in generate methods.
HeoLis
Make YouTube videos tasks in Todoist faster and time efficient!

Youtubist Basically fork of yt-dlp python module to my needs. You can paste playlist or channel link on the YouTube. It will automatically format to s

Konrad Konieczny 1 Dec 04, 2022
Making the process of downloading youtube videos faster and more convinient.

Easy-YT Making the process of downloading youtube videos faster and more convinient. What can it do? This python script can be used to download youtub

Meynam 39 Nov 15, 2021
Google Art Image Downloader Tkinter

Google-Art-Image-Downloader-Tkinter 由 google-art-downloader 整改的批量 Google 艺术展平台高清图片下载 ⭐ It works perfectly from 2018 year till today, thanks for stars!

PY-GZKY 1 Jan 05, 2022
YoutubeDownloader - Repo for downloading YT audio and videos

YoutubeDownloader Downloads video/playlist/audio from youtube url. install all t

Anuj SP 2 Feb 17, 2022
A small distributed download manager to help bypass device-specific bandwidth limitations.

Distributed Download Manager A small distributed download manager to help bypass device-specific bandwidth limitations. Architecture The download mana

Anand Balaji 3 Sep 23, 2022
Libretrofuzz - Fuzzy Retroarch thumbnail downloader

Fuzzy Retroarch thumbnail downloader In Retroarch, when you use the manual scann

8 Nov 26, 2022
YouTube Downloader Bot With Python

TG YᴏᴜTᴜʙᴇ Uᴘʟᴏᴀᴅᴇʀ * Commands YouTube for Audio & Video and sends it to telegram after receiving valid URL [Do not forwarded any just copy and paste

Pʀᴇᴅᴀᴛᴏʀ 5 Oct 21, 2022
YouPlay is a python based tool for downloading YouTube videos through its URL

YouPlay is a python based tool for downloading YouTube videos through its URL. It is capable to download videos from YouTube playlists too and can extract the audio file only from the video. It can r

Nitin Choudhury 10 Sep 15, 2022
Web Downloader With Python

Web Downloader Introduction This module will provide API to download the webpage components : html file, image file, css fil, javascript file, href li

3 Dec 28, 2022
Downloads files and folders

PyDownloader Downloads files and folders at high speed (based on your interent speed). This is very useful to transfer big files from one computer to

ArmenG 4 Feb 24, 2022
Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery.

ksy-dl Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery. This tool will: Fetch any of the official

3 Jun 20, 2022
Command-line program to download videos from YouTube.com and other video sites

youtube-dl - download videos from youtube.com or other video platforms

youtube-dl 116.4k Jan 07, 2023
Downloads photos you saved from a specific profile.

instagram-download-saved A python script that downloads photos you saved from a specific profile. The only dependency is instaloader, an open-source p

Aviv 1 Dec 19, 2021
Newsemble is an API that provides easy access to the current news for programmatic analysis

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.

Rishabh 43 Dec 16, 2022
Youtube video downloader and info extractor for python.

tube_dl Tube_dl is a Simple Youtube video downloader for Python. A Modular approach to bypass and download Youtube Videos and Playlist from Youtube us

Shekhar Chander 16 Jul 09, 2022
YouTube to MP3 or 4, you get to choose...

UTubeToMP YouTube to MP3 or 4, you get to choose... If you don't wanna git clone andor dont wanna install python. Here: Repl.it Instructions: Pretty s

1 Jan 29, 2022
Youtube videos and channels scraper python wrapper!

YouTubeCrawle Wrapper for python Why This wrapper? This is wrapper is not limited to videos only it can scrape both channel and videos seperately ;D

Kei 16 Aug 08, 2022
This Program helps you download songs from the Spotify track's link you give in.

Spotify-Downloader-GUI This Program helps you download songs from the Spotify track's link you give in. It uses yt-dlp to download songs from Youtube.

Harish 12 Jun 14, 2022
A manga download script written in python.

manga-dlp python script to download mangas Description A manga download script written in python. It only supports mangadex.org for now. But support f

Ivan Schaller 15 Nov 28, 2022