[Python-checkins] cpython: Mention RFC 4180. Based on input by Tony Wallace in issue 11456.

skip.montanaro python-checkins at python.org
Sat Mar 19 19:08:02 CET 2011


http://hg.python.org/cpython/rev/c63d7374b89a
changeset: 68682:c63d7374b89a
parent: 68584:b153c341e6ef
user: Skip Montanaro <skip at pobox.com>
date: Sat Mar 19 09:09:30 2011 -0500
summary: Mention RFC 4180. Based on input by Tony Wallace in issue 11456.

files: Doc/library/csv.rst

diff --git a/Doc/library/csv.rst b/Doc/library/csv.rst
--- a/Doc/library/csv.rst
+++ b/Doc/library/csv.rst
@@ -11,15 +11,15 @@
pair: data; tabular

The so-called CSV (Comma Separated Values) format is the most common import and
-export format for spreadsheets and databases. There is no "CSV standard", so
-the format is operationally defined by the many applications which read and
-write it. The lack of a standard means that subtle differences often exist in
-the data produced and consumed by different applications. These differences can
-make it annoying to process CSV files from multiple sources. Still, while the
-delimiters and quoting characters vary, the overall format is similar enough
-that it is possible to write a single module which can efficiently manipulate
-such data, hiding the details of reading and writing the data from the
-programmer.
+export format for spreadsheets and databases. CSV format was used for many
+years prior to attempts to describe the format in a standardized way in
+:rfc:`4180`. The lack of a well-defined standard means that subtle differences
+often exist in the data produced and consumed by different applications. These
+differences can make it annoying to process CSV files from multiple sources.
+Still, while the delimiters and quoting characters vary, the overall format is
+similar enough that it is possible to write a single module which can
+efficiently manipulate such data, hiding the details of reading and writing the
+data from the programmer.
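A minimal sketch of the quoting conventions RFC 4180 describes (fields that contain the delimiter or a double quote are enclosed in double quotes, embedded double quotes are doubled, and records end in CRLF). It uses the module's default excel dialect, which closely follows those conventions, and an in-memory io.StringIO buffer; the sample values are invented for illustration.

```python
import csv
import io

# Illustrative values: one contains the delimiter, one contains double quotes.
row = ['a,b', 'say "hi"', 'plain']

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect is 'excel'
writer.writerow(row)

output = buf.getvalue()
print(repr(output))  # '"a,b","say ""hi""",plain\r\n'

# Reading the text back recovers the original values.
parsed = next(csv.reader(io.StringIO(output)))
print(parsed == row)  # True
```

Only the field that needs quoting is quoted, because the excel dialect uses csv.QUOTE_MINIMAL.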

The :mod:`csv` module implements classes to read and write tabular data in CSV format. It allows programmers to say, "write this data in the format preferred
@@ -418,50 +418,101 @@

The simplest example of reading a CSV file::

+<<<<<<< local

+=======
    import csv
    with open('some.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)
+>>>>>>> other

Reading a file with an alternate format::

+<<<<<<< local

+=======
    import csv
    with open('passwd') as f:
        reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
        for row in reader:
            print(row)
+>>>>>>> other

The corresponding simplest possible writing example is::

+<<<<<<< local

+>>>>>>> other
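A minimal sketch of a corresponding writing example, assuming a hypothetical output file out.csv and made-up sample rows. It passes newline='' to open(), following the same convention as the reading examples, and then reads the file back to show the round trip.

```python
import csv

# Hypothetical sample data: any iterable of sequences works.
rows = [
    ['name', 'count'],
    ['spam', 3],
    ['eggs', 12],
]

# newline='' lets the csv writer control line endings itself.
with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read the file back; every value comes back as a string.
with open('out.csv', newline='') as f:
    read_back = list(csv.reader(f))

print(read_back)  # [['name', 'count'], ['spam', '3'], ['eggs', '12']]
```

Non-string values such as 3 are converted with str() on output, so they are not restored to integers when read back.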

Since :func:`open` is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see :func:`locale.getpreferredencoding`). To decode a file using a different encoding, use the encoding argument of open::

+<<<<<<< local

+=======
    import csv
    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)
+>>>>>>> other

The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file.
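A sketch of that advice, assuming a hypothetical file name names.csv and using UTF-8 as the example encoding. The same encoding is passed to open() for writing and again for reading, so the non-ASCII sample text round-trips regardless of the system default.

```python
import csv

# Non-ASCII sample values (illustrative only).
rows = [['city', 'country'], ['Zürich', 'Schweiz'], ['Kraków', 'Polska']]

# Write with an explicit encoding instead of the system default.
with open('names.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)

# Read back with the same explicit encoding.
with open('names.csv', newline='', encoding='utf-8') as f:
    decoded = list(csv.reader(f))

print(decoded == rows)  # True
```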

Registering a new dialect::

+<<<<<<< local

+=======
    import csv
    csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
    with open('passwd') as f:
        reader = csv.reader(f, 'unixpwd')
+>>>>>>> other
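To try the registered dialect without a real passwd file, the sketch below feeds it a single made-up line in passwd layout through io.StringIO, then looks the dialect up by name with csv.get_dialect(). The sample line is invented for illustration.

```python
import csv
import io

# Same registration as in the example: colon-separated, no quote processing.
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

# A made-up line in /etc/passwd layout (illustrative data only).
sample = 'root:x:0:0:root:/root:/bin/bash\n'

reader = csv.reader(io.StringIO(sample), 'unixpwd')
entries = list(reader)
print(entries)  # [['root', 'x', '0', '0', 'root', '/root', '/bin/bash']]

# Registered dialects can be looked up by name.
print(csv.get_dialect('unixpwd').delimiter)  # :
```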

A slightly more advanced use of the reader --- catching and reporting errors::

+<<<<<<< local

+=======
    import csv, sys
    filename = 'some.csv'
    with open(filename, newline='') as f:
@@ -471,13 +522,14 @@
                print(row)
        except csv.Error as e:
            sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
+>>>>>>> other
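A self-contained sketch of the same error-reporting pattern, reading from an in-memory string instead of a file. It assumes strict=True so that the reader rejects a malformed quoted field; the helper name report_errors and the sample input are hypothetical.

```python
import csv
import io

def report_errors(text, filename='example.csv'):
    """Hypothetical helper: parse CSV text, return (rows, error message or None)."""
    rows = []
    # strict=True makes the reader raise csv.Error on bad CSV input.
    reader = csv.reader(io.StringIO(text), strict=True)
    try:
        for row in reader:
            rows.append(row)
    except csv.Error as e:
        # line_num is the number of lines read so far, so here it is the bad line.
        return rows, 'file {}, line {}: {}'.format(filename, reader.line_num, e)
    return rows, None

# The second line has text right after a closing quote, which strict mode rejects.
rows, error = report_errors('a,b\n"x"y,z\n')
print(rows)  # [['a', 'b']]
print(error)
```

Returning the message instead of calling sys.exit() keeps the sketch usable inside a larger program; the original example exits instead.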

And while the module doesn't directly support parsing strings, it can easily be done::
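One way to do it, sketched here as an assumption rather than the documentation's own example: csv.reader() accepts any iterable that yields strings, so a list of lines or an io.StringIO wrapper around a multi-line string both work. The sample data is made up.

```python
import csv
import io

# csv.reader accepts any iterable of strings, such as a list of lines.
print(list(csv.reader(['one,two,three'])))  # [['one', 'two', 'three']]

# Wrapping a multi-line string in io.StringIO keeps the newline characters,
# so a quoted field that spans lines is parsed correctly.
text = 'id,note\n1,"first line\nsecond line"\n'
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)  # [['id', 'note'], ['1', 'first line\nsecond line']]
```

Splitting the string with str.splitlines() instead would drop the newline inside the quoted field, which is why the sketch uses io.StringIO for multi-line input.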

--
Repository URL: http://hg.python.org/cpython