Wednesday, December 14, 2005

CSV File Format

CSV stands for Comma-Separated Values. It is a common file format for storing tabular data, e.g. spreadsheet data. The format is simple: a text file in which values are separated by commas (,) and rows are separated by newlines. However, the format still has some tricky points.

The tricky point is how to escape commas inside values. If a value contains a comma, the value should be enclosed in double quotes ("), e.g.:

value one,"value two with ',' inside",value three

If a value contains a double quote, the value should be enclosed in double quotes and each double quote inside the value should be escaped with another double quote, e.g.:

value one,"value two with '""' inside",value three

The CSV format is specified in RFC 4180.
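
To see the quoting rules above in practice, here is a minimal sketch using Python's standard csv module, whose default dialect quotes a field only when needed and doubles embedded double quotes. The helper names to_csv and from_csv are my own, made up for illustration.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows to CSV text (hypothetical helper, not a standard API)."""
    buf = io.StringIO()
    # Default dialect: fields containing a comma or a double quote are quoted,
    # and each embedded double quote is written as two double quotes.
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text)))


rows = [
    ["value one", "value two with ',' inside", "value three"],
    ["value one", "value two with '\"' inside", "value three"],
]

text = to_csv(rows)
print(text)

# Reading the text back should give the original values, with the
# quoting and the doubled double quotes removed again.
print(from_csv(text) == rows)
```

The two lines it prints should match the two examples above, and parsing them returns the original values.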

Even though RFC 4180 describes the CSV format, variants that deviate from it exist. In some applications, commas and double quotes are escaped with a backslash (\) instead. Moreover, the character encoding is not recorded in the file, which causes ambiguity. So using this format is not trivial.
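
Both problems can be shown with a short sketch, again assuming Python's csv module. The function name parse_backslash_csv and the sample data are made up for illustration; real files using the backslash variant may differ in other details too.

```python
import csv
import io


def parse_backslash_csv(text):
    """Parse CSV text where ',' and '"' are escaped with a backslash.

    Hypothetical helper: it only configures the standard csv.reader.
    """
    reader = csv.reader(
        io.StringIO(text),
        escapechar="\\",    # treat backslash as the escape character
        doublequote=False,  # do not treat "" as an escaped double quote
    )
    return list(reader)


# The backslash variant: the comma and the double quote are escaped.
sample = 'value one,value two with \\, inside,"a \\" quote"\n'
print(parse_backslash_csv(sample))

# The encoding is not stored in the file, so the reader has to guess.
# The same bytes give different text under different encodings.
data = "café,1\n".encode("utf-8")
print(data.decode("utf-8"))    # the intended text
print(data.decode("latin-1"))  # a wrong guess gives garbled text
```

A reader that expects RFC 4180 quoting would split the first sample in the wrong place, so the dialect and the encoding both have to be agreed on outside the file.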
