In Defense of YAML

Every programmer has opinions about configuration files. These opinions tend to be strongly held and inversely proportional to the stakes involved. In the last few years, the consensus view has shifted: YAML is bad, TOML is good, and anyone who still reaches for YAML must simply be uninformed. This post takes a different view. It makes the case for YAML, grounded in the format's history, its specification, and the state of tooling in 2026.

The case against YAML was, for a long time, a reasonable one. The format earned its critics honestly, through years of surprising behavior that burned even careful users. But the specification evolved, and the tooling is finally catching up. To understand why the current consensus is outdated, we need to trace the lineage of configuration formats themselves, because this argument has played out before.

A brief history of configuration formats

The INI file emerged in the early 1980s alongside MS-DOS and the first versions of Windows.¹ It was the simplest thing that could possibly work: key-value pairs grouped into sections denoted by square brackets, with semicolons for comments. It was flat, readable, and easy to edit by hand. For the needs of that era (configuring device drivers, specifying font paths, setting application preferences) it was entirely adequate. Its limitations were twofold: you could not nest deeper than one level, and there was no formal specification, so every parser implemented its own dialect. But for two decades, this was fine.

[boot]
shell=COMMAND.COM
device=HIMEM.SYS

[display]
resolution=640x480
colors=256
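Python's standard-library configparser module reads one dialect of this format (itself an illustration of the missing-specification problem, since it adds features such as value interpolation that other INI readers lack). A minimal sketch using the sample above shows both limits: sections are the only level of grouping, and every value arrives as a string that the caller must convert.

```python
import configparser

INI_TEXT = """
[boot]
shell=COMMAND.COM
device=HIMEM.SYS

[display]
resolution=640x480
colors=256
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Sections are the only grouping level; [display] cannot be nested inside [boot].
print(parser.sections())  # ['boot', 'display']

# Values are untyped strings; conversion is the reader's job.
print(repr(parser["display"]["colors"]))   # '256'
print(parser.getint("display", "colors"))  # 256
```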

Then came XML. In the late 1990s, the enterprise software world embraced angle brackets wholesale. XML could represent arbitrary hierarchy. It had schemas, namespaces, and transformations. It was self-describing. For a while it seemed as though the debate was settled. But XML configuration files grew large in practice. Anyone who maintained a Java web.xml or an Ant build file in 2003 knows what it was like to wade through dozens of nested elements just to change a database connection string. The verbosity made the files hard to edit by hand, and editing by hand is precisely what configuration files demand.

<web-app>
  <servlet>
    <servlet-name>myServlet</servlet-name>
    <servlet-class>com.example.MyServlet</servlet-class>
    <init-param>
      <param-name>database.url</param-name>
      <param-value>jdbc:postgresql://localhost/mydb</param-value>
    </init-param>
  </servlet>
</web-app>

JSON appeared as the lightweight reaction. Douglas Crockford, who claims to have discovered rather than invented the format,² offered the simplicity of the JavaScript object literal: curly braces, square brackets, quoted strings, and a tiny set of types. JSON displaced XML in web APIs through the late 2000s and early 2010s. But as people began using it for configuration (rather than machine-to-machine data exchange), its limitations became apparent. JSON has no comments. It has no multiline strings. Trailing commas are illegal. These are reasonable constraints for a serialization format, but they make JSON miserable for files that humans must author and maintain. The removal of comments from JSON’s spec was, according to Crockford himself, motivated by people abusing them for parsing directives.³ It was the right call for data interchange, but it left a gap.
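Python's json module follows the strict JSON grammar, which makes the authoring pain concrete. The helper below (parses_as_json is a hypothetical name, for illustration only) reports whether a text parses: a comment, a trailing comma, or a raw line break inside a string each causes a rejection, while the same data without them parses fine.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if the standard-library JSON parser accepts `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


with_comment = '{\n  // database settings\n  "host": "localhost"\n}'
with_trailing_comma = '{"host": "localhost", "port": 5432,}'
# A literal line break inside a string; JSON permits only the escaped form.
with_raw_newline = '{"motd": "line one\nline two"}'
strict = '{"host": "localhost", "port": 5432}'

for label, text in [
    ("comment", with_comment),
    ("trailing comma", with_trailing_comma),
    ("raw newline in string", with_raw_newline),
    ("strict", strict),
]:
    print(f"{label}: {parses_as_json(text)}")
# comment: False
# trailing comma: False
# raw newline in string: False
# strict: True
```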

YAML (2001) and TOML (2013) are the two formats that came to fill that gap, and both positioned themselves explicitly against what came before. YAML offered the full expressive power of a serialization language (including arbitrary nesting, multiple documents, references, and custom types) with a syntax built on indentation rather than brackets. TOML, created by Tom Preston-Werner a dozen years later, was a reaction to YAML’s complexity: it aimed to be a “standardized INI” with explicit typing, obvious semantics, and a formal specification.⁴ The pattern repeats in each generation: the previous format’s excess becomes the new format’s founding grievance. What is interesting about the current moment is that YAML’s problems were not inherent to the format’s design. They were artifacts of a particular specification version and the parsers frozen on it.

The case against YAML (as it was)

The criticisms of YAML are not fabricated. They reflect real experiences that real programmers had over many years.

The most infamous problem is the Norway incident, which has become shorthand for YAML’s implicit typing behavior. In YAML 1.1, the unquoted scalar no (along with No and NO) was interpreted as the boolean value false. This meant that a list of country codes would silently turn Norway into a falsehood:

# What you wrote:
countries:
  - dk
  - fi
  - is
  - no
  - se

# What YAML 1.1 parsed:
["dk", "fi", "is", false, "se"]

The same applied to yes, on, off, y, n, and various capitalizations thereof. Ruud van Asseldonk’s widely circulated “YAML Document from Hell”⁵ catalogued these and other problems: port mappings like 22:22 parsed as sexagesimal (base-60) integers, version numbers like 10.23 parsed as floats rather than strings, date-like values parsed as timestamps,⁶ and tags beginning with ! that could trigger arbitrary code execution in some parsers.
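The sexagesimal rule is the least intuitive of these, so here is a minimal sketch of it, following the base-60 form of the int type in the YAML 1.1 type repository. The function name yaml11_sexagesimal is hypothetical, and real parsers apply this rule alongside the other implicit ones. Each colon-separated field is a base-60 digit, which is why the port mapping 22:22 becomes 22 × 60 + 22 = 1342, while 8080:80 survives as a string only because its last field exceeds 59.

```python
import re

# Base-60 form of YAML 1.1's int type: a leading field, then one or more
# ":NN" fields whose values must be 0-59.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def yaml11_sexagesimal(scalar: str):
    """Return the int YAML 1.1 would resolve an unquoted scalar to, else the scalar."""
    if not SEXAGESIMAL_INT.fullmatch(scalar):
        return scalar
    sign = -1 if scalar.startswith("-") else 1
    value = 0
    for field in scalar.lstrip("+-").split(":"):
        value = value * 60 + int(field.replace("_", ""))
    return sign * value


print(yaml11_sexagesimal("22:22"))      # 1342
print(yaml11_sexagesimal("190:20:30"))  # 685230
print(yaml11_sexagesimal("8080:80"))    # 8080:80 (a field above 59, so no match)
```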

This is not only a country-code problem. In data science and machine learning code, n and y are natural variable names:

variables:
  x: features
  n: sample_size
  y: target

Under YAML 1.1’s implicit boolean rules, a parser can resolve those keys as booleans instead of strings:

{"variables": {"x": "features", False: "sample_size", True: "target"}}
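To make the rule concrete, here is a minimal sketch of the boolean resolution described in the YAML 1.1 type repository, applied to the two examples above. The function name yaml11_bool is hypothetical and it models only the boolean rule, not a full parser. Individual parsers also differ in detail: PyYAML, for instance, leaves the single letters y and n as strings while still converting yes, no, on, and off.

```python
# Unquoted scalars that YAML 1.1's bool type treats as true or false.
# Matching is exact: "no", "No", and "NO" all count, but "nO" does not.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def yaml11_bool(scalar: str):
    """Apply only YAML 1.1's implicit boolean rule to an unquoted scalar."""
    if scalar in YAML11_TRUE:
        return True
    if scalar in YAML11_FALSE:
        return False
    return scalar


countries = ["dk", "fi", "is", "no", "se"]
print([yaml11_bool(code) for code in countries])
# ['dk', 'fi', 'is', False, 'se']

variables = {"x": "features", "n": "sample_size", "y": "target"}
print({yaml11_bool(key): value for key, value in variables.items()})
# {'x': 'features', False: 'sample_size', True: 'target'}
```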

These were not edge cases encountered only by the reckless. They emerged from the YAML 1.1 specification’s design philosophy of aggressive implicit typing, where the parser attempted to be “helpful” by guessing the intended type of unquoted values. The intention was readability (you could write true without quotes and get a boolean), but the result was unpredictable behavior in practice. Configuration files are precisely the domain where surprising behavior is least tolerable. They are edited infrequently, often by people who did not write the original file, and a silent misparse can propagate through a system undetected for months.

The complexity of the full specification compounded the problem. The YAML 1.2.2 spec runs to ten chapters with sections numbered four levels deep.⁷ There are dozens of ways to express multiline strings. Anchors and aliases create a reference system that, while powerful, adds conceptual weight far beyond what most configuration tasks require. And the security implications of tag-based object deserialization (the unsafe yaml.load() in Python’s PyYAML) became a well-known attack vector.⁸ All of these criticisms were valid, and they were valid specifically of YAML 1.1 and the tooling ecosystem built around it.

What TOML gets right

TOML deserves some credit. For flat or shallow configuration structures, it is clean, readable, and unambiguous. The syntax is familiar to anyone who has seen an INI file, but with the addition of explicit types, a formal specification, and support for nested tables via dot-separated keys.

Consider a pyproject.toml or a Cargo.toml. These files are typically one or two levels deep, with well-defined sections and predictable content. TOML handles them well. Strings are always quoted, so there is no ambiguity about whether no is a boolean or the word “no”. Integers are integers, floats are floats, and dates are first-class types. Comments work exactly as you would expect. For this class of problem, TOML works well, and its adoption by the Python packaging ecosystem (PEP 518)⁹ and the Rust community (Cargo) makes sense.

TOML also benefits from simplicity of implementation. The specification is short enough that a competent programmer can write a compliant parser in a weekend. This means that the ecosystem of parsers is large, well-tested, and consistent. There is no equivalent of the YAML 1.1/1.2 version split. TOML 1.0 is TOML 1.0, everywhere. These are real advantages.

Where TOML strains

The trouble begins when configuration needs to express depth. TOML’s handling of nested structures relies on either dot-separated section headers ([servers.alpha]) or arrays of tables ([[products]]), both of which become difficult to read as the nesting increases. This is not a theoretical concern: it is the reason that Martin Vejnár, the author of the PyTOML parser, eventually abandoned his own project. When asked whether his library should become a dependency for pip, he declined and explained: “TOML is a bad file format. It looks good at first glance, and for really really trivial things it is probably good. But once I started using it and the configuration schema became more complex, I found the syntax ugly and hard to read”.¹⁰

Consider a moderately complex configuration. In YAML, the indentation communicates the hierarchy at a glance:

services:
  web:
    image: nginx:latest
    environment:
      DB_HOST: postgres
      DB_PORT: 5432
    resources:
      limits:
        memory: 512M
        cpu: "0.5"

The equivalent in TOML requires repeating the full path in each section header:

[services.web]
image = "nginx:latest"

[services.web.environment]
DB_HOST = "postgres"
DB_PORT = 5432

[services.web.resources.limits]
memory = "512M"
cpu = "0.5"

The reader must mentally reconstruct the tree from a flat sequence of qualified names. The StrictYAML documentation measured this concretely: equivalent TOML files use approximately 50% more characters to represent the same data, largely because of the repeated path prefixes.¹¹

There is also the matter of meaningful indentation itself. Python demonstrated decades ago that indentation as structure is not a weakness but a strength: it eliminates the class of bugs where visual structure disagrees with syntactic structure. YAML inherits this property. TOML does not require indentation (though many authors add it voluntarily, as a non-parsed visual aid), which means that the relationship between a key and its containing table exists only in the section header, not in the physical layout of the file. For deeply nested configurations, this makes TOML files harder to scan and harder to edit confidently.

What YAML 1.2 changed

The YAML 1.2 specification was published in 2009, with a clarifying revision (1.2.2) completed in October 2021.¹² Its changes address the complaints described above directly.

The implicit typing that created the Norway problem is gone. In the YAML 1.2 Core Schema, only true and false (and their capitalizations True, False, TRUE, FALSE) are recognized as booleans. The words yes, no, on, off, y, and n are plain strings. Sexagesimal number literals (the 22:22 problem) are removed entirely. Timestamp is no longer a core type, so an unquoted 2026-05-05 is a string under the core schema rather than an automatically detected date. JSON is now a proper subset of YAML 1.2, which means any valid JSON document parses to the same data when read as YAML. The tag resolution rules are tightened and clarified. The specification itself, while still substantial, is written more clearly and maintained openly on GitHub.
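The Core Schema's tag-resolution rules are short enough to sketch directly. The function below is a hypothetical, simplified model (plain scalars only, ignoring explicit tags and quoting) built from the regular expressions in the YAML 1.2.2 Core Schema. Anything that matches none of them stays a string, which is exactly why no, 22:22, and 2026-05-05 come through untouched.

```python
import math
import re

# Regular expressions from the YAML 1.2.2 Core Schema tag-resolution table.
# They apply to plain (unquoted) scalars; everything else stays a string.
NULL = re.compile(r"null|Null|NULL|~|")  # the empty alternative: an empty scalar is null
BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
INT_DEC = re.compile(r"[-+]?[0-9]+")
INT_OCT = re.compile(r"0o[0-7]+")
INT_HEX = re.compile(r"0x[0-9a-fA-F]+")
FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")
INF = re.compile(r"[-+]?(\.inf|\.Inf|\.INF)")
NAN = re.compile(r"\.nan|\.NaN|\.NAN")


def resolve_core(scalar: str):
    """Resolve a plain scalar the way the YAML 1.2 Core Schema describes."""
    if NULL.fullmatch(scalar):
        return None
    if BOOL.fullmatch(scalar):
        return scalar.lower() == "true"
    if INT_DEC.fullmatch(scalar):
        return int(scalar)
    if INT_OCT.fullmatch(scalar):
        return int(scalar[2:], 8)
    if INT_HEX.fullmatch(scalar):
        return int(scalar[2:], 16)
    if FLOAT.fullmatch(scalar):
        return float(scalar)
    if INF.fullmatch(scalar):
        return -math.inf if scalar.startswith("-") else math.inf
    if NAN.fullmatch(scalar):
        return math.nan
    return scalar


samples = ["no", "yes", "22:22", "2026-05-05", "10.23", "0x1F", "~", "TRUE"]
print([resolve_core(s) for s in samples])
# ['no', 'yes', '22:22', '2026-05-05', 10.23, 31, None, True]
```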

In short, the YAML that people complain about is YAML 1.1. The specification that actually governs the language today is a different, safer, more predictable document. The problem is that most people’s experience of YAML is mediated not by the specification but by their parser, and for most Python users, that parser has been PyYAML, which implements YAML 1.1 and has not changed its core semantics since 2006.

The Python YAML parser landscape

PyYAML, written by Kirill Simonov in 2006, is the de facto standard YAML library in Python. It wraps LibYAML (a C library) for performance and provides a pure-Python fallback. It is downloaded millions of times per week, it is a dependency of countless packages, and it implements YAML 1.1. This last fact is the root of most YAML complaints in the Python ecosystem. When someone says “YAML parsed my country code as a boolean”, they are describing PyYAML’s behavior, not YAML’s specification. PyYAML’s GitHub repository shows over 200 open issues and 100 open pull requests.¹³ The project is maintained but moves slowly, and a major version bump to YAML 1.2 semantics has not materialized.

The ruamel.yaml library, maintained by Anthon van der Neut, offers YAML 1.2 support with round-trip preservation of comments, flow style, and key order.¹⁴ It is widely used and is significantly more capable than PyYAML for tasks requiring comment preservation or format-aware editing. However, it is primarily a pure-Python implementation in its default round-trip mode, which makes it considerably slower than PyYAML’s C-backed fast path. Its packaging history has also been complicated, with namespace-package issues and a dependency chain that has occasionally confused deployment pipelines.

StrictYAML takes a different approach entirely, implementing a deliberate subset of YAML with all implicit typing removed, no tags, no anchors, and no flow style.¹⁵ Philosophically it is closer to TOML than to full YAML: a safe, simple format that happens to use YAML’s indentation syntax. It exists only for Python, with no implementations in other languages, and does not aim for spec compliance.

What has been missing from this landscape is a library that is fast, fully 1.2-compliant, and simple enough to use as a drop-in replacement for PyYAML’s basic interface.

Introducing py-yaml12

The py-yaml12 library is a YAML 1.2 parser and formatter for Python, implemented in Rust for speed and correctness. It is built on the saphyr crate¹⁶ (a Rust YAML library) and exposes a minimal, focused API: parse_yaml() and read_yaml() for loading, format_yaml() and write_yaml() for serialization.

Simple

The design philosophy is straightforward. For the vast majority of use cases, you work with plain Python builtins end to end: dict, list, int, float, bool, str, and None. There is no special document class and there are no custom node types in the common path. Because YAML 1.2 is a superset of JSON, all valid JSON parses identically. The library achieves 100% compliance with the yaml-test-suite,¹⁷ the community-maintained corpus of edge cases and conformance tests.

from yaml12 import parse_yaml, format_yaml
from rich.pretty import pprint

config = """
server:
  host: 0.0.0.0
  port: 8080
  debug: false

database:
  url: postgres://localhost/mydb
  pool_size: 5

regions:
  - us-east-1
  - eu-west-1
  - no         # Norway, not false
"""

data = parse_yaml(config)
pprint(data)

{
│   'server': {'host': '0.0.0.0', 'port': 8080, 'debug': False},
│   'database': {'url': 'postgres://localhost/mydb', 'pool_size': 5},
│   'regions': ['us-east-1', 'eu-west-1', 'no']
}

Notice the no in the regions list. Under PyYAML (YAML 1.1), this would silently become False. Under py-yaml12 (YAML 1.2), it is the string "no", as the specification requires. This single behavioral difference encapsulates the entire argument: the format is not broken; the old tooling was.

The file API is similarly direct:

from yaml12 import write_yaml, read_yaml

path = "config.yaml"
write_yaml(data, path)

The round-trip is lossless. Writing a Python dictionary to disk and reading it back produces an equal object:

round_tripped = read_yaml(path)
assert round_tripped == data

print(f"Round-trip matches: {round_tripped == data}")

Round-trip matches: True

For advanced YAML features like tagged values, py-yaml12 provides the Yaml wrapper type. It is opt-in and unnecessary for typical configuration work.

Safe

The defaults in py-yaml12 are not just about ergonomics and simplicity; they also improve safety. PyYAML shows the risk of the opposite approach: treating tags as instructions can execute arbitrary Python code simply by reading a YAML file.¹⁸

For example, an attacker can craft a YAML file whose %TAG directive aliases PyYAML’s Python object-apply tag namespace:

dangerous_yaml = """\
%TAG !py! tag:yaml.org,2002:python/object/apply:
--- !py!builtins.eval
- "(__import__('os').environ.__setitem__('YAML_PAYLOAD_RAN', '1'), {'debug': False, 'retries': 3})[1]"
"""

with open("dangerous.yaml", "w") as f:
    f.write(dangerous_yaml)

Then a user expecting only to load a config file runs that code during parsing:

import yaml

with open("dangerous.yaml") as f:
    data = yaml.load(f, Loader=yaml.Loader)

print(data)

{'debug': False, 'retries': 3}

The yaml.load() call looks like ordinary data loading: it returns an ordinary dictionary. But producing that dictionary executed Python code first. Unless you inspect the YAML itself, nothing in the result tells you that happened.

import os

os.environ["YAML_PAYLOAD_RAN"]

'1'

In contrast, py-yaml12 keeps an unhandled tag as data unless you explicitly opt in:

from yaml12 import read_yaml

read_yaml("dangerous.yaml")

Yaml(value=["(__import__('os').environ.__setitem__('YAML_PAYLOAD_RAN', '1'), {'debug': False, 'retries': 3})[1]"], tag='tag:yaml.org,2002:python/object/apply:builtins.eval')
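The principle behind that behavior can be sketched in a few lines: a tag runs code only if the caller has registered a handler for it, and any other tag comes back as inert data. The Tagged class and construct function below are hypothetical illustrations of that allow-list idea, not a description of py-yaml12's API or of how it implements this internally.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Tagged:
    """Hypothetical container for a tagged value that no handler claimed."""
    value: Any
    tag: str


def construct(value: Any, tag: Optional[str], handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Build Python data from a parsed node, calling code only for allow-listed tags."""
    if tag is None:
        return value  # untagged: plain data, nothing to interpret
    handler = handlers.get(tag)
    if handler is None:
        return Tagged(value, tag)  # unknown tag: keep it as data, never execute it
    return handler(value)  # the caller explicitly opted in to this tag


evil_tag = "tag:yaml.org,2002:python/object/apply:builtins.eval"
payload = ["print('payload ran')"]

# With no handlers registered, the payload is returned untouched and never evaluated.
print(construct(payload, evil_tag, handlers={}))

# Opting in to one specific, harmless tag.
print(construct("1.10", "!decimal", handlers={"!decimal": Decimal}))  # 1.10
```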

Fast

Performance is a practical concern for any library that might be called in startup paths or CI pipelines. The py-yaml12 benchmarks¹⁹ compare read and write performance against PyYAML (both its default pure-Python path and the fast CSafeLoader/CSafeDumper backed by LibYAML) and ruamel.yaml, across file sizes ranging from kilobytes to megabytes. Because the core parsing and formatting logic is implemented in compiled Rust rather than interpreted Python, py-yaml12 is competitive with PyYAML’s C extension while maintaining full 1.2 compliance. As of this writing, few other Python libraries offer both.
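To check such numbers on your own data, a small best-of-N timing harness built on the standard-library timeit module is a reasonable starting point. The sketch below is hypothetical and is not the benchmark suite cited above: it times json.loads as a runnable baseline, and the commented lines show where yaml12.parse_yaml or PyYAML's CSafeLoader (present only when PyYAML is built against LibYAML) would slot in. Because JSON is valid YAML 1.2, the same text can be fed to each loader, though flow-style JSON exercises a different code path than typical block-style configuration, so treat the result as a rough sanity check.

```python
import json
import timeit
from typing import Any, Callable, Dict


def best_time(load: Callable[[str], Any], text: str, number: int = 20, repeat: int = 5) -> float:
    """Return the best per-call time in seconds across `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: load(text))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Synthetic input: a list of small records, serialized as JSON text.
records = [{"id": i, "name": f"item-{i}", "tags": ["a", "b"]} for i in range(1_000)]
text = json.dumps({"items": records})

loaders: Dict[str, Callable[[str], Any]] = {"json.loads (baseline)": json.loads}
# Hypothetical additions, if the packages are installed:
#   import yaml, yaml12
#   loaders["yaml12.parse_yaml"] = yaml12.parse_yaml
#   loaders["PyYAML CSafeLoader"] = lambda s: yaml.load(s, Loader=yaml.CSafeLoader)

for name, load in loaders.items():
    print(f"{name}: {best_time(load, text) * 1e3:.3f} ms per load")
```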

Conclusion

The YAML-versus-TOML debate, as typically conducted, is an argument against a format that no longer exists in its problematic form. The complaints are real, but they are historical. They describe YAML 1.1 as mediated by PyYAML, not YAML 1.2 as specified and now properly implemented. TOML remains a good choice for shallow, flat configurations, and pyproject.toml is well-suited to its role. But the claim that YAML is inherently unsafe or unpredictable does not hold against a compliant 1.2 parser.

This is, in the end, a familiar pattern in computing. Every generation of configuration format is a correction of the previous generation’s excesses: INI was too flat, so XML added hierarchy; XML was too verbose, so JSON stripped it bare; JSON was too austere for humans, so YAML and TOML each offered different compromises. The interesting question is never “which format is best in the abstract” but “which format, with which tooling, serves this particular task well”. For complex, nested, human-authored configuration, YAML 1.2 with a modern parser is a strong answer. Perhaps in another decade, something new will arise to correct YAML’s remaining rough edges, and the cycle will continue. That is how formats improve.

In the meantime, you can pip install py-yaml12 and see what a modern, spec-compliant YAML experience looks like in Python.