.csv

CSV File Format

Plain-text format where each row is a line and each value is separated by a comma. The lowest-common-denominator export from every spreadsheet, database, and analytics tool.

What a Comma-Separated Values file is

A CSV file is plain text. Each line is one record, and fields within a record are separated by commas. The first line is usually, but not always, a header row naming the columns.

A value that contains a comma, double-quote, or line break must be wrapped in double-quotes, and an embedded double-quote is escaped by doubling it. Everything else is raw text, which is why CSV has no concept of a number, date, or boolean — only strings that something downstream chooses to interpret.

A short history

Comma-separated lists predate the personal computer — they were used in FORTRAN programs in the early 1970s, and the format was a standard export in spreadsheet software like VisiCalc and Lotus 1-2-3 by the early 1980s.

Despite its age, CSV was only loosely specified until RFC 4180 in 2005, which wrote down the de facto rules: comma delimiter, double-quote escaping, CRLF line endings. RFC 4180 describes common practice rather than mandating it, which is exactly why real-world CSV files still vary so much.

Strengths

  • Opens in any text editor, spreadsheet, or programming language
  • Streamable — readable line-by-line, no full-file load
  • Tiny on disk; no markup overhead
  • Universal export target for databases and BI tools

Weaknesses

  • No types — every cell is a string until interpreted
  • No formulas, formatting, multiple sheets, or merged cells
  • Quoting and delimiter rules vary; commas inside values break naive parsers
  • No native date/number locale — "3,14" vs "3.14" causes silent corruption

When CSV is the right choice

  • Moving data between systems that don't share a native format
  • Loading into a database, warehouse, or analytics tool
  • Keeping tabular data in version control where line-by-line diffs matter
  • Handing a dataset to someone who might open it in anything

When to reach for something else

  • You need formulas, formatting, charts, or multiple sheets (use XLSX)
  • Your values frequently contain commas and you can't control the quoting (consider TSV)
  • Type fidelity matters and downstream tools will guess wrong on leading zeros, long IDs, or dates

CSV pitfalls that cause silent data corruption

These are the traps that turn a clean file into wrong data, usually without an error message.

Leading zeros vanish

Open a CSV in Excel and a ZIP code like 02134 becomes 2134, because Excel infers a number. The data on disk is fine; the viewer corrupts the display. When you compare files, compare the raw values, not what a spreadsheet app renders.

Long numbers turn into scientific notation

Account numbers and IDs longer than 15 digits get rounded or shown as 1.23E+15 once a tool treats them as numbers. Keep identifier columns as text from end to end.

Locale flips decimals and delimiters

Many European locales use a comma as the decimal separator and switch CSV exports to semicolons. The same file opened in a different locale silently reinterprets every number.

Encoding mismatches mangle characters

A UTF-8 file opened as Latin-1 turns accented characters into garbage, and a stray byte-order mark (BOM) can show up as extra characters in the first header cell.

Frequently asked questions

Is there an official CSV standard?

RFC 4180 (2005) documents the common rules, but it describes existing practice rather than enforcing it. Tools still differ on delimiters, quoting, and line endings, so "valid CSV" is best understood as "whatever the receiving parser accepts."

Why do my numbers change when I open a CSV in Excel?

Excel infers a type for every cell on open: leading zeros are dropped, long digit strings become scientific notation, and date-like text is reformatted. The file on disk is unchanged; Excel's display is the problem. To preserve values, import the CSV as text or convert it to .xlsx with the types you want.

Comma or semicolon delimiter?

US and UK locales use commas; many European locales default to semicolons because the comma is their decimal separator. If a CSV opens as a single column, the delimiter probably doesn't match your locale. A good parser detects the delimiter instead of assuming a comma.

What encoding should a CSV use?

UTF-8 without a BOM is the safest modern default. A BOM can appear as stray characters in the first header for some parsers, and non-UTF-8 encodings risk mangling accented or non-Latin characters.

Compare two CSV files

Drop two versions into SheetCompare and see every changed cell. Free, private, and runs in your browser.