JSON isn't JSON
JSON (JavaScript Object Notation) was supposed to be the universal data interchange format that would solve the compatibility nightmares of XML. What looks like a straightforward data format becomes a minefield of subtle incompatibilities, edge cases, and implementation quirks that can break your applications in unexpected ways. [1]
Common pitfalls (my own selection from the article):
- The Number Nightmare
  - not only precision loss for large integers
  - but also NaN and Inf, see also Daniel Lemire's tweet [2]
- String Encoding Chaos
  - The character é can be represented as:
    - A single codepoint: U+00E9 (é)
    - Decomposed form (base letter plus combining accent): U+0065 U+0301 (e + ́)
- Object Key Ordering
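  - It's fun that this may now influence LLM KV cache performance: prefix caching only reuses an identical prefix, so the same object serialized with a different key order can miss the cache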
- Null vs. Undefined vs. Missing
- Date and Time Fun
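
Several of these pitfalls can be reproduced with nothing but Python's standard library. The sketch below is my own illustration, not code from the article; the variable names are made up, and `parse_int=float` is only a stand-in for parsers that store every number as a double (as JavaScript's `JSON.parse` does).

```python
import json
import unicodedata

# Numbers: 2**53 + 1 has no exact IEEE 754 double representation.
big = 2**53 + 1
exact = json.loads(json.dumps(big))                   # Python ints are arbitrary precision
lossy = json.loads(json.dumps(big), parse_int=float)  # imitate a double-only parser
print(exact == big, int(lossy) == big)                # True False

# NaN and Infinity are not valid JSON, yet json.dumps emits them by default.
print(json.dumps([float("nan"), float("inf")]))       # [NaN, Infinity]
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_rejects_nan = False
except ValueError:
    strict_rejects_nan = True

# Strings: the two spellings of "é" are distinct keys until normalized.
composed, decomposed = "\u00e9", "e\u0301"
round_tripped = json.loads(json.dumps({composed: 1, decomposed: 2}))
print(len(round_tripped))                             # 2
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Key ordering: equal objects, different text, unless keys are sorted.
a, b = {"x": 1, "y": 2}, {"y": 2, "x": 1}
text_differs = json.dumps(a) != json.dumps(b)
text_matches_sorted = json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

# Null vs. missing: dict.get() hides the difference, a membership test does not.
with_null, without_key = json.loads('{"k": null}'), json.loads("{}")
get_conflates = with_null.get("k") is without_key.get("k")
membership_differs = ("k" in with_null) != ("k" in without_key)
```

Python itself round-trips the large integer exactly, which is part of the problem: the same document silently changes value once a double-based consumer reads it.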
Authors' Recommendations:
- Use Schema Validation: The First Law of JSON Robotics
- Normalize Data Types
- Library Selection Matters
- Test Cross-Language Compatibility
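
One possible reading of "Normalize Data Types" is a single serialization helper that every producer goes through. The sketch below is an assumption of mine, not the article's method: `normalize`, `to_canonical_json`, and `MAX_SAFE_INT` are hypothetical names, and sending oversized integers as strings is just one convention among several. It uses only the standard library, so it does not replace real schema validation.

```python
import json
import unicodedata
from datetime import datetime, timezone

MAX_SAFE_INT = 2**53 - 1  # largest integer a double-based parser keeps exact


def normalize(value):
    """Recursively coerce a Python value into an interop-friendly shape."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, int):
        # Assumption: integers outside the safe range travel as strings.
        return value if abs(value) <= MAX_SAFE_INT else str(value)
    if isinstance(value, float):
        return value  # NaN/Infinity are rejected by allow_nan=False below
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime: time zone is ambiguous")
        return value.astimezone(timezone.utc).isoformat()
    if isinstance(value, dict):
        # Note: keys that collide after NFC normalization overwrite each other.
        return {normalize(str(k)): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_canonical_json(value):
    """Serialize with sorted keys, no NaN/Infinity, and compact separators."""
    return json.dumps(normalize(value), sort_keys=True, allow_nan=False,
                      ensure_ascii=False, separators=(",", ":"))


print(to_canonical_json({"b": 1, "a": "e\u0301", "id": 2**53 + 1}))
```

Sorting keys and fixing the separators makes the output byte-stable for equal inputs, which also helps when the text is hashed, diffed, or used as a cache key. For the cross-language recommendation, the same fixtures would be fed to each language's parser and the decoded values compared.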
May the Parse Be With You