Glossary

UTF-8

The dominant character encoding for text and data files. It can represent every Unicode character and is the safe default for CSV, JSON, and TSV.

UTF-8 is a variable-width Unicode encoding: ASCII characters take one byte, and all other characters take two to four. It is backward-compatible with ASCII, so any pure-ASCII file is already valid UTF-8, and it can represent every script and symbol in Unicode. That is why it has become the default for the web, JSON (whose specification requires UTF-8 for data exchanged between systems), and most modern data tooling.
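As a rough illustration of the variable width, the Python sketch below encodes four sample characters and counts the bytes each one takes. The sample characters and variable names are chosen for this example only.

```python
# Sample characters, written as escapes so the source file's own encoding does not matter:
# "A" (ASCII), e-acute (U+00E9), euro sign (U+20AC), grinning-face emoji (U+1F600).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each character to the number of bytes it occupies in UTF-8.
widths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in widths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s)")

# Pure-ASCII text produces identical bytes under ASCII and UTF-8.
ascii_compatible = "id,name".encode("utf-8") == "id,name".encode("ascii")
```

The four samples come out at one, two, three, and four bytes respectively, and the final comparison shows why an ASCII-only CSV header reads the same under either encoding.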

For spreadsheet and CSV work, UTF-8 without a byte-order mark (BOM) is the safe choice. It preserves accented and non-Latin characters across systems and avoids the silent corruption that comes from reading a file in the wrong encoding. The one caveat is a leading BOM (the bytes EF BB BF): tools that do not strip it can show it as stray characters, such as ï»¿, at the start of the first header cell.
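A minimal Python sketch of both failure modes follows, assuming a small made-up CSV held in memory; the `read_rows` helper and the sample data are hypothetical. It decodes the same bytes three ways: as plain UTF-8, as UTF-8 with BOM stripping (Python's `utf-8-sig` codec), and as Windows-1252 to stand in for "the wrong encoding".

```python
import csv
import io

# Made-up file contents: a UTF-8 CSV that starts with a BOM (bytes EF BB BF).
raw = b"\xef\xbb\xbfname,city\r\nZo\xc3\xab,K\xc3\xb6ln\r\n"

def read_rows(data: bytes, encoding: str) -> list[list[str]]:
    """Decode the bytes with the given encoding and parse them as CSV."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))

plain = read_rows(raw, "utf-8")         # BOM survives as U+FEFF in the first header cell
stripped = read_rows(raw, "utf-8-sig")  # BOM is dropped if present
wrong = read_rows(raw, "cp1252")        # wrong encoding: BOM and accents turn into mojibake

print(repr(plain[0][0]))
print(stripped)
print(wrong)
```

With plain `utf-8` the first header cell carries an invisible U+FEFF, which is enough to break a lookup by column name. `utf-8-sig` also reads files that have no BOM, so it is a reasonable default when the source of a CSV is unknown.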
