.xlsx

XLSX File Format

Modern Excel format. A ZIP archive of XML parts that stores values, formulas, formatting, multiple sheets, and metadata. The default save format in Excel 2007 and later.

What an Office Open XML Spreadsheet file is

An .xlsx file is actually a ZIP archive. Rename it to .zip, unzip it, and you'll find a tree of XML files: one per worksheet, plus shared strings, styles, and workbook metadata.
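A minimal sketch of that inspection using only Python's standard zipfile module. The helper names (list_xlsx_parts, build_demo_xlsx) are made up for illustration. The demo archive uses the conventional part names but is not a complete workbook that Excel would open.

```python
import io
import zipfile


def list_xlsx_parts(source):
    """Return the sorted names of the parts stored inside an .xlsx archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(archive.namelist())


def build_demo_xlsx():
    """Build a tiny stand-in archive in memory (hypothetical, illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("xl/sharedStrings.xml", "<sst/>")
        archive.writestr("xl/styles.xml", "<styleSheet/>")
        archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    buffer.seek(0)
    return buffer


# Print every part name, one per line.
for part_name in list_xlsx_parts(build_demo_xlsx()):
    print(part_name)
```

Passing the path of a real workbook to list_xlsx_parts should show the same kind of tree, usually with more parts such as relationship files and document properties.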

Values, formulas, formatting, named ranges, and comments are all stored as XML. That structure is what lets XLSX preserve everything a spreadsheet contains, at the cost of not being readable as a flat text file.

A short history

XLSX arrived with Excel 2007 as part of the Office Open XML effort to replace Microsoft's old binary formats with an open, XML-based standard.

It was standardized as ECMA-376 in 2006 and as ISO/IEC 29500 in 2008. Because the spec is open, LibreOffice, Google Sheets, Apple Numbers, and dozens of libraries can read and write it without reverse-engineering.

Strengths

  • Preserves formulas, formatting, types, and multiple sheets
  • Compressed — smaller than legacy .xls for the same data
  • Open standard (ECMA-376) — readable by LibreOffice, Google Sheets, Numbers
  • Supports cell comments, named ranges, conditional formatting

Weaknesses

  • Not human-readable; requires a parser
  • Formula behavior can drift between Excel and other engines
  • Slow to load for very large workbooks (100k+ rows × many sheets)
  • Not diff-friendly in git — binary changes obscure row-level edits

When XLSX is the right choice

  • You need formulas, formatting, or charts preserved
  • The workbook has multiple related sheets
  • You're sharing with stakeholders who expect Excel
  • You want types carried explicitly rather than guessed

When to reach for something else

  • You're feeding a system that only reads plain text (use CSV or TSV)
  • You want clean line-by-line diffs in git (the ZIP/XML is opaque)
  • The file is enormous and you only need raw rows: CSV can be read line by line, while XLSX has to be unzipped and its XML parsed first

XLSX pitfalls that cause silent data corruption

These are the traps that turn a clean file into wrong data, usually without an error message.

Formula results can drift between engines

A formula that returns one value in Excel may return another in LibreOffice or a parsing library if a function is implemented slightly differently. Before comparing workbooks, decide whether you're diffing formulas or computed values.

Stored values can be stale

XLSX caches the last-computed result of each formula. If a file was saved without recalculating, the cached value and the formula can disagree, which surfaces as phantom differences in a diff.
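To make the cached-value idea concrete, here is a small sketch that reads a worksheet part with Python's standard XML parser and pairs each formula with the value stored next to it. In the sheet XML, a formula cell carries an f element (the formula) and a v element (the last computed result). The function name and the sample fragment are hypothetical. Shared formulas, where dependent cells hold an empty f element, are not resolved here.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def formulas_and_cached_values(sheet_xml):
    """Map cell reference -> (formula text, cached value text) for formula cells."""
    root = ET.fromstring(sheet_xml)
    found = {}
    for cell in root.iterfind(".//m:c", NS):
        formula = cell.find("m:f", NS)
        if formula is None:
            continue  # plain value cell, no formula involved
        cached = cell.find("m:v", NS)
        found[cell.get("r")] = (
            formula.text,
            cached.text if cached is not None else None,
        )
    return found


# Hypothetical sheet fragment: A3 holds =A1+A2 but still caches 3,
# even though A1 and A2 now sum to 30.
SAMPLE_SHEET = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1"><c r="A1"><v>10</v></c></row>
    <row r="2"><c r="A2"><v>20</v></c></row>
    <row r="3"><c r="A3"><f>A1+A2</f><v>3</v></c></row>
  </sheetData>
</worksheet>"""

print(formulas_and_cached_values(SAMPLE_SHEET))
```

A tool that reads only the cached value would report 3 for A3. One that recalculates would report 30. Neither is wrong about the file, which is why the two can disagree.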

Dates are numbers with a format

Excel stores dates as serial numbers and applies a display format. Two cells showing the same date can hold different underlying values, and the same serial number read under the 1900 and 1904 date systems lands 1,462 days apart (four years and one day).
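The arithmetic can be sketched in a few lines of Python. The function name serial_to_date is made up for illustration. The sketch handles whole-day serials only and refuses the earliest 1900-system serials, where Excel's nonexistent 29 February 1900 complicates the mapping. Which system a workbook uses is recorded in its workbook.xml part, in the date1904 attribute of the workbookPr element.

```python
from datetime import date, timedelta


def serial_to_date(serial, date1904=False):
    """Convert a whole-number Excel date serial to a calendar date.

    Any time-of-day fraction is ignored in this sketch.
    """
    days = int(serial)
    if date1904:
        # 1904 system: serial 0 is 1904-01-01.
        return date(1904, 1, 1) + timedelta(days=days)
    if days < 61:
        # The 1900 system counts a nonexistent 1900-02-29 as serial 60,
        # so the simple epoch below is only valid from 1900-03-01 (serial 61).
        raise ValueError("serials below 61 need special handling")
    return date(1899, 12, 30) + timedelta(days=days)


same_serial = 45292
print(serial_to_date(same_serial))                 # 2024-01-01
print(serial_to_date(same_serial, date1904=True))  # 2028-01-02
```

The same stored number produces two dates 1,462 days apart, so a comparison has to check each file's date system before treating the serials as equal.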

Multiple sheets hide changes

A change on a sheet you're not looking at is easy to miss. A real comparison walks every sheet, not just the active one.
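As a starting point for walking every sheet, the sketch below lists the sheets declared in a workbook.xml part, including hidden ones, which are marked with a state attribute. The function name and the sample XML are hypothetical. In a real file this part sits at xl/workbook.xml inside the archive.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by the workbook part.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheets(workbook_xml):
    """Return (name, state) for every sheet, in workbook order.

    Sheets without a state attribute are visible.
    """
    root = ET.fromstring(workbook_xml)
    return [
        (sheet.get("name"), sheet.get("state", "visible"))
        for sheet in root.iterfind("m:sheets/m:sheet", NS)
    ]


# Hypothetical workbook part with one visible and one hidden sheet.
SAMPLE_WORKBOOK = """<workbook
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Summary" sheetId="1" r:id="rId1"/>
    <sheet name="Raw data" sheetId="2" state="hidden" r:id="rId2"/>
  </sheets>
</workbook>"""

for name, state in list_sheets(SAMPLE_WORKBOOK):
    print(name, state)
```

Comparing the two lists first also catches sheets that were added, removed, renamed, or hidden before any cell is inspected.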

Frequently asked questions

Is XLSX really a ZIP file?

Yes. Rename any .xlsx to .zip and unzip it to see the XML parts inside — one per sheet, plus shared strings and styles. That's why it preserves far more than CSV, and why it isn't human-readable on its own.

XLSX vs XLS: which should I use?

XLSX is the modern, open, compressed standard and the right default. XLS is the pre-2007 binary format, capped at 65,536 rows, and effectively deprecated. Keep XLS only for compatibility with old systems that can't read XLSX.

Why do two copies of the same workbook show differences?

Common causes are cached formula results that weren't recalculated before saving, the 1900 vs 1904 date system, and floating-point precision in computed cells. Comparing values rather than formulas, with a small numeric tolerance, removes most of these.

Can I diff XLSX files in git?

Not usefully out of the box — git sees the ZIP as a binary blob. To track real row-level changes, compare the parsed contents with a spreadsheet diff rather than relying on git's text diff.
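As a rough illustration of what comparing parsed contents means, the sketch below diffs two workbooks that have already been loaded into plain dictionaries keyed by sheet name and cell reference. The data layout and the function name are assumptions, not how any particular tool works.

```python
def diff_cells(old, new):
    """Return (key, old_value, new_value) for every cell that differs.

    `old` and `new` map (sheet_name, cell_ref) -> value. A cell that is
    missing from one side is reported with None on that side.
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key)
        after = new.get(key)
        if before != after:
            changes.append((key, before, after))
    return changes


# Hypothetical parsed contents of two versions of a workbook.
version_1 = {("Summary", "A1"): "Total", ("Summary", "B1"): 100}
version_2 = {("Summary", "A1"): "Total", ("Summary", "B1"): 120,
             ("Raw data", "A1"): "new row"}

for change in diff_cells(version_1, version_2):
    print(change)
```

The output names each changed cell with its old and new value, which is the row-level detail git's binary view of the ZIP cannot show.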

Compare two XLSX files

Drop two versions into SheetCompare and see every changed cell. Free, private, and runs in your browser.