Glossary

Character encoding

The scheme that maps the bytes in a file to readable characters. A mismatch between the file's real encoding and the reader's assumed encoding garbles text.

Character encoding is how text is stored as bytes. UTF-8 is the modern default and can represent every character; older encodings like Latin-1 (ISO-8859-1) and Windows-1252 cover smaller sets. The trouble starts when a file written in one encoding is read as another: accented letters, currency symbols, and non-Latin scripts turn into garbage like Ã© or .

For spreadsheet data this matters most on import and export. A CSV saved as UTF-8 but opened as Windows-1252 will mangle names and addresses, and the corruption is silent: no error, just wrong characters. When you compare two exports of the same data, an encoding difference can make identical rows look changed. Standardizing on UTF-8 end to end avoids most of these problems.
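One way to keep an encoding difference from showing up as a false change is to decode both exports to text before comparing them. The sketch below assumes the files are either UTF-8 or Windows-1252; the helper names `decode_export` and `parse_rows` and the sample row are hypothetical. Trying UTF-8 first and falling back is a heuristic, not a reliable detector, since single-byte legacy encodings cannot be told apart from the bytes alone.

```python
import csv
import io


def decode_export(raw, fallbacks=("cp1252", "latin-1")):
    """Decode raw CSV bytes, preferring UTF-8 and falling back to legacy encodings."""
    try:
        # "utf-8-sig" also strips a leading byte order mark if one is present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode with any candidate encoding")


def parse_rows(raw):
    """Parse raw CSV bytes into a list of rows of decoded strings."""
    return list(csv.reader(io.StringIO(decode_export(raw), newline="")))


# The same data exported twice, once per encoding (illustrative sample).
data = "id,name,city\r\n1,José Müller,Zürich\r\n"
export_a = data.encode("utf-8")
export_b = data.encode("cp1252")

bytes_differ = export_a != export_b                         # raw bytes differ
rows_match = parse_rows(export_a) == parse_rows(export_b)   # decoded rows agree
print(bytes_differ, rows_match)
```

A byte-level comparison of these two files would flag the data row as changed, while the decoded rows compare equal. When the true encoding of a file is known, passing it explicitly is safer than guessing.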

Related tools & guides

CSV format reference
