LightCSV - This CSV reader is implemented in just pure Python.

Overview

LightCSV

Python 3.8 Python 3.9 Code style: black

Simple light CSV reader

This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles (or get the first row as titles). Nothing more, nothing else.

Usage

Usage is pretty straightforward:

from lightcsv import LightCSV

for row in LightCSV().read_file("myfile.csv"):
    print(row)

This will open a file named myfile.csv and iterate over the CSV file returning each row as a key-value dictionary. Line endings can be either \n or \r\n. The file will be opened in text-mode with utf-8 encoding.

You can supply your own stream (i.e. an open file instead of a filename). You can use this, for example, to open a file with a different encoding, etc.:

from lightcsv import LightCSV

with open("myfile.csv") as f:
    for row in LightCSV().read(f):
        print(row)
NOTE: Blank lines at any point in the file will be ignored

Parameters

LightCSV can be parametrized during initialization to fine-tune its behaviour.

The following example shows initialization with default parameters:

from lightcsv import LightCSV

myCSV_reader = LightCSV(
    separator=",",
    quote_char='"',
    field_names = None,
    strict=True,
    has_headers=False
)

Available settings:

  • separator: character used as separator (defaults to ,)
  • quote_char: character used to quote strings (defaults to ").
    This char can be escaped by duplicating it.
  • field_names: can be any iterable or sequence of str (i.e. a list of strings).
    If set, these will be used as column titles (dictionary keys), and also sets the expected number of columns.
  • strict: Sets whether the parser runs in strict mode or not.
    In strict mode the parser will raise a ValueError exception if a cell cannot be decoded or column numbers don't match. In non-strict mode non-recognized cells will be returned as strings. If there are more columns than expected they will be ignored. If there are less, the dictionary will contain also fewer values.
  • has_headers: whether the first row should be taken as column titles or not.
    If set, field_names cannot be specified. If not set, and no field names are specified, dictionary keys will be just the column positions of the cells.

Data types recognized

The parser will try to match the following types are recognized in this order:

  • None (empty values). Unlike CSV reader, it will return None (null) for empty values.
    Empty strings ("") are recognized correctly.
  • str (strings): Anything that is quoted with the quotechar. Default quotechar is ".
    If the string contains a quote, it must be escaped duplicating it. i.e. "HELLO ""WORLD""" decodes to HELLO "WORLD" string.
  • int (integers): an integer with a preceding optional sign.
  • float: any float recognized by Python
  • datetime: a datetime in ISO format (with 'T' or whitespace in the middle), like 2022-02-02 22:02:02
  • date: a date in ISO format, like 2022-02-02
  • time: a time in ISO format, like 22:02:02

If all this parsing attempts fails, a string will be returned, unless strict_mode is set to True. In the latter case, a ValueError exception will be raised.

Implementing your own type recognizer

You can implement your own deserialization by subclassing LightCSV and override the method parse_obj().

For example, suppose we want to recognize hexadecimal integers in the format 0xNNN.... We can implement it this way:

import re
from lightcsv import LightCSV

RE_HEXA = re.compile('0[xX][A-Za-z0-9]+$')  # matches 0xNNNN (hexadecimals)


class CSVHexRecognizer(LightCSV):
    def parse_obj(self, lineno: int, chunk: str):
        if RE_HEXA.match(chunk):
            return int(chunk[2:], 16)
        
        return super().parse_obj(lineno, chunk)

As you can see, you have to override parse_obj(). If your match fails, you have to invoke super() (overridden) parse_obj() method and return its result.


Why

Python built-in CSV module is a bit over-engineered for simple tasks, and one normally doesn't need all bells and whistles. With LightCSV you just open a filename and iterate over its rows.

Decoding None for empty cells is needed very often and can be really cumbersome as the standard csv tries hard to cover many corner-cases (if that's your case, this tool might not be suitable for you).

Owner
Jose Rodriguez
Computer Scientist. Software Engineer. Opinions expressed here are solely my own and not necessarily those of my employer.
Jose Rodriguez
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem

RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem. These files are exposed either in their original format, or as PDF files that contain your annotations. This le

Robert Schroll 82 Nov 24, 2022
LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

Jose Rodriguez 6 Mar 05, 2022
CleverCSV is a Python package for handling messy CSV files.

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line applicati

The Alan Turing Institute 1k Dec 19, 2022
This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it

This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it. In the current state, it outputs a PrettyTable to txt file as

Joshua Wren 1 Nov 09, 2021
FileGenerator - File Generator for sites that accepts documents

File Generator for sites that accepts documents This code generates files as per

Shaunak 2 Mar 19, 2022
OnedataFS is a PyFilesystem interface to Onedata virtual file system

OnedataFS OnedataFS is a PyFilesystem interface to Onedata virtual file system. As a PyFilesystem concrete class, OnedataFS allows you to work with On

onedata 0 Jan 10, 2022
Python's Filesystem abstraction layer

PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's

pyFilesystem 1.8k Jan 02, 2023
Better directory iterator and faster os.walk(), now in the Python 3.5 stdlib

scandir, a better directory iterator and faster os.walk() scandir() is a directory iteration function like os.listdir(), except that instead of return

Ben Hoyt 506 Dec 29, 2022
A simple library for temporary storage of small files

TemporaryStorage An simple library for temporary storage of small files. Navigation Install Usage In Python console As a standalone application List o

2 Apr 17, 2022
A wrapper for DVD file structure and ISO files.

vs-parsedvd DVDs were an error. A wrapper for DVD file structure and ISO files. You can find me in the IEW Discord server

7 Nov 17, 2022
Simple addon to create folder structures in blender.

BlenderCreateFolderStructure Simple Add-on to create a folder structure in Blender. Installation Download BlenderCreateFolderStructure.py Open Blender

Dominik Strasser 2 Feb 21, 2022
Remove [x]_ from StudIP zip Archives and archive_filelist.csv completely

This tool removes the "[x]_" at the beginning of StudIP zip Archives. It also deletes the "archive_filelist.csv" file

Kelke vl 1 Jan 19, 2022
Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

3 Feb 09, 2022
Simple Python File Manager

This script lets you automatically relocate files based on their extensions. Very useful from the downloads folder !

Aimé Risson 22 Dec 27, 2022
Organize the files into the relevant sub-folders

This program can be used to organize files in a directory by their file extension. And move duplicate files to a duplicates folder.

Thushara Thiwanka 2 Dec 15, 2021
Python interface for reading and appending tar files

Python interface for reading and appending tar files, while keeping a fast index for finding and reading files in the archive. This interface has been

Lawrence Livermore National Laboratory 1 Nov 12, 2021
Various converters to convert value sets from CSV to JSON, etc.

ValueSet Converters Tools for converting value sets in different formats. Such as converting extensional value sets in CSV format to JSON format able

Health Open Terminology Ecosystem 4 Sep 08, 2022
A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

1 Feb 13, 2022
pydicom - Read, modify and write DICOM files with python code

pydicom is a pure Python package for working with DICOM files. It lets you read, modify and write DICOM data in an easy "pythonic" way.

DICOM in Python 1.5k Jan 04, 2023
Python package to read and display segregated file names present in a directory based on type of the file

tpyfilestructure Python package to read and display segregated file names present in a directory based on type of the file. Installation You can insta

Tharun Kumar T 2 Nov 28, 2021