Processing Streaming Multi-Document YAML Inputs


When working with YAML in real-world systems, you will often encounter files that contain multiple YAML documents separated by the --- delimiter. This is common in configuration management tools, CI/CD pipelines, and Kubernetes manifests where a single file defines multiple resources. Processing these streaming multi-document inputs requires a different approach than reading a single YAML document.


Context: What Are Multi-Document YAML Files?

A multi-document YAML file contains several independent YAML documents within a single file, each separated by a line containing three dashes (---). An optional trailing ... can mark the end of a document.

Example of a multi-document YAML structure:

  • Document 1: A server configuration
  • Document 2: A database configuration
  • Document 3: A load balancer configuration

Each document is processed independently, and they do not share any context or references unless explicitly linked through external logic.
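
To make the structure concrete, here is a minimal sketch of a three-document stream parsed with PyYAML. The keys (role, host, backends) and their values are illustrative assumptions, not taken from any real configuration.

```python
import yaml

# Three independent documents in one stream. The first one is closed with
# the optional "..." end marker; the others simply end at the next "---".
stream = """\
---
role: server
host: web01
...
---
role: database
host: db01
---
role: load_balancer
backends: [web01]
"""

docs = list(yaml.safe_load_all(stream))
roles = [doc["role"] for doc in docs]
print(roles)
```

Each entry in docs is a separate Python object; nothing defined in one document is visible from another.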


The Core Challenge: Streaming vs. Loading

The standard yaml.safe_load() function expects exactly one document. If you call it on a stream that contains more than one document, PyYAML raises a ComposerError ("expected a single document in the stream") instead of returning a partial result.

What happens with each loader:

  • yaml.safe_load(file) → Returns the document as a Python object when the file holds one document; raises an error when it holds several
  • yaml.safe_load_all(file) → Returns a generator that yields each document one at a time

The safe_load_all() function is the key tool for processing streaming multi-document inputs. It treats the file as a stream of documents rather than a single block of data.
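
The difference between the two loaders can be checked directly. The sketch below assumes PyYAML and a made-up two-document string; in current PyYAML releases, safe_load() raises a ComposerError on such input, while safe_load_all() yields both documents.

```python
import yaml

stream = "name: first\n---\nname: second\n"

# safe_load() expects exactly one document, so a second one is an error.
try:
    yaml.safe_load(stream)
    single_result = "loaded"
except yaml.YAMLError as exc:
    single_result = type(exc).__name__

# safe_load_all() yields each document in turn.
names = [doc["name"] for doc in yaml.safe_load_all(stream)]

print(single_result)
print(names)
```

Catching the base class yaml.YAMLError keeps the example independent of the exact exception subclass.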


How to Process Multi-Document YAML Inputs

Step 1: Open the YAML file normally

Open the file using Python's built-in open() function in read mode. This gives you a file object that can be passed to the YAML parser.

Step 2: Use safe_load_all() instead of safe_load()

Pass the file object to yaml.safe_load_all(). This returns a generator object, not a list. Each iteration of the generator yields one complete YAML document as a Python object: usually a dictionary, but a document can also be a list, a scalar, or None if it is empty.

Step 3: Iterate through the documents

Use a for loop to process each document individually. Inside the loop, you can inspect, transform, or validate each document as needed.

Step 4: Handle the end of stream

The generator will automatically stop when there are no more documents. You do not need to check for a termination marker.
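
The four steps above can be sketched end to end. To keep the example self-contained it first writes a temporary file; the file name services.yaml and the keys service and replicas are assumptions made for illustration.

```python
import os
import tempfile

import yaml

content = "service: web\nreplicas: 2\n---\nservice: worker\nreplicas: 4\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "services.yaml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)

    seen = []
    # Step 1: open the file in read mode.
    with open(path, "r", encoding="utf-8") as f:
        # Steps 2 and 3: stream the documents and handle them one by one.
        for doc in yaml.safe_load_all(f):
            seen.append((doc["service"], doc["replicas"]))
        # Step 4: the loop ends on its own when the stream is exhausted.

print(seen)
```

The loop must run while the file is still open, because the generator reads from the file lazily.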


Comparison: Single Document vs. Multi-Document Processing

| Feature        | Single Document                   | Multi-Document                                                                     |
| Function used  | yaml.safe_load()                  | yaml.safe_load_all()                                                               |
| Return type    | Single Python object              | Generator of Python objects                                                        |
| Memory usage   | Loads everything at once          | Processes one document at a time                                                   |
| Use case       | Simple config files               | Kubernetes manifests, CI pipelines                                                 |
| Error handling | One error fails the whole load    | Processing errors can be handled per document; a parse error still ends the stream |

Common Patterns for Streaming Multi-Document YAML

Pattern 1: Counting documents

You can count how many documents are in a file by iterating through the generator and incrementing a counter. This is useful for validation or reporting.
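
A minimal sketch of the counting pattern, using an invented three-document string. The generator expression consumes the stream without keeping the documents in memory.

```python
import yaml

stream = "id: 1\n---\nid: 2\n---\nid: 3\n"

# Count documents by exhausting the generator; each document is discarded
# as soon as it has been counted.
count = sum(1 for _ in yaml.safe_load_all(stream))
print(count)
```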

Pattern 2: Filtering documents by type

Each document often contains a key like kind or type that identifies what it represents. You can check this key inside the loop and only process documents that match your criteria.

Pattern 3: Collecting all documents into a list

If you need to access all documents later, you can convert the generator to a list using list(yaml.safe_load_all(file)). Be cautious with large files as this loads everything into memory.

Pattern 4: Processing documents with error isolation

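Since each document is independent, you can wrap the processing of each document in a try-except block. This way, a document that fails validation or processing does not crash the entire pipeline. Note that this isolates errors raised by your own code: a YAML syntax error is raised by the parser during iteration and ends the stream, so later documents in that stream are not read.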
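
One way to sketch this pattern is shown below. The describe() function is a hypothetical processing step that requires a port key, and the sample documents are invented; the second one is deliberately incomplete.

```python
import yaml

stream = """\
name: alpha
port: 8080
---
name: beta
---
name: gamma
port: 9090
"""

def describe(doc):
    # Hypothetical per-document step; raises KeyError if "port" is missing.
    return f"{doc['name']}:{doc['port']}"

results, failures = [], []
for index, doc in enumerate(yaml.safe_load_all(stream)):
    try:
        results.append(describe(doc))
    except (KeyError, TypeError) as exc:
        # Record the failure and move on to the next document.
        failures.append((index, type(exc).__name__))

print(results)
print(failures)
```

The try-except wraps only the processing call. Errors raised by the parser itself surface in the for statement, outside this block.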


Practical Tips for Engineers

Always use safe_load_all() for multi-document files

Never assume a YAML file contains only one document. Using safe_load_all() as a default practice ensures your code works with both single and multi-document inputs.

Be aware of the generator behavior

The generator from safe_load_all() can only be iterated once. If you need to process the documents multiple times, convert to a list first or restructure your logic.
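
The single-pass behavior is easy to demonstrate with a small, made-up stream:

```python
import yaml

docs = yaml.safe_load_all("a: 1\n---\nb: 2\n")

first_pass = list(docs)   # consumes the generator
second_pass = list(docs)  # nothing left to yield

print(first_pass)
print(second_pass)
```

Keeping first_pass around is the simplest way to get repeatable access, at the cost of holding every document in memory.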

Handle empty files gracefully

If the YAML file is empty or contains no documents, the generator yields nothing, so the loop body never runs and the gap can go unnoticed. A bare --- with no content after it yields None rather than a dictionary. Add an explicit check for both cases.
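
A possible guard is sketched below. The helper name load_documents and the choice to raise ValueError are assumptions; a real pipeline might log a warning or return an empty list instead.

```python
import yaml

def load_documents(text):
    """Return all non-empty documents, or raise if there are none."""
    # Empty documents (for example a bare "---") are parsed as None; skip them.
    docs = [doc for doc in yaml.safe_load_all(text) if doc is not None]
    if not docs:
        raise ValueError("no YAML documents found")
    return docs

empty_stream = list(yaml.safe_load_all(""))
kept = load_documents("---\n---\nname: only\n")

try:
    load_documents("")
    outcome = "loaded"
except ValueError:
    outcome = "rejected"

print(empty_stream, kept, outcome)
```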

Use context managers for file handling

Always use the with open() statement to ensure the file is properly closed after reading, even if an error occurs during processing.


When to Use Multi-Document YAML Processing

Use this approach when:

  • Reading Kubernetes manifest files that define multiple resources
  • Processing CI/CD pipeline definitions with multiple stages
  • Handling configuration bundles where each document is a separate component
  • Working with Ansible playbooks that contain multiple plays
  • Parsing log or event streams formatted as YAML documents

Avoid this approach when:

  • The YAML file is extremely large and you need random access to documents
  • Documents are interdependent and must be validated together
  • You need to preserve comments or formatting from the original file
  • The input is guaranteed to be a single document with no --- separator, where yaml.safe_load() is simpler

Summary Checklist

  • Use yaml.safe_load_all() for any file that may contain multiple YAML documents
  • Iterate through the generator with a for loop to process each document
  • Handle each document independently to isolate errors
  • Convert to a list only when you need multiple passes over the data
  • Always use with open() for safe file handling
  • Check for empty files or zero-document scenarios

Multi-document YAML processing is a fundamental skill for working with modern infrastructure tooling. By mastering the streaming approach, you can handle complex configuration files with confidence and build robust data pipelines that gracefully handle any number of documents.


This technique reads multiple YAML documents separated by --- from a single stream or file.

Example 1: Reading two simple YAML documents from a string

This shows how to load multiple YAML documents from a single string using yaml.safe_load_all.

import yaml

data = """
name: Alice
age: 30
---
name: Bob
age: 25
"""

documents = list(yaml.safe_load_all(data))
print(documents)

Output: [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]


Example 2: Iterating through documents without loading all into memory

This demonstrates processing each document one at a time using a generator.

import yaml

data = """
item: apple
quantity: 5
---
item: banana
quantity: 3
---
item: cherry
quantity: 8
"""

for doc in yaml.safe_load_all(data):
    print(doc["item"], "->", doc["quantity"])

Output:
apple -> 5
banana -> 3
cherry -> 8


Example 3: Handling documents with different structures

This shows that each document in a stream can have a completely different schema.

import yaml

data = """
server: web01
port: 8080
---
database:
  host: localhost
  name: prod
---
enabled: true
"""

docs = list(yaml.safe_load_all(data))
print(docs[0]["server"])
print(docs[1]["database"]["host"])
print(docs[2]["enabled"])

Output:
web01
localhost
True


Example 4: Reading multi-document YAML from a file

This demonstrates loading documents from an external .yaml file using a context manager.

import yaml

with open("configs.yaml", "r") as f:
    for doc in yaml.safe_load_all(f):
        print(doc)

Output:
{'env': 'staging', 'debug': False}
{'env': 'production', 'debug': False}


Example 5: Filtering documents during streaming

This shows how to process only documents that match a condition while streaming.

import yaml

data = """
name: alpha
status: active
---
name: beta
status: inactive
---
name: gamma
status: active
"""

for doc in yaml.safe_load_all(data):
    if doc["status"] == "active":
        print(f"Active: {doc['name']}")

Output:
Active: alpha
Active: gamma


Comparison: Loading Single vs. Multi-Document YAML

| Feature      | Single Document (yaml.safe_load) | Multi-Document (yaml.safe_load_all)   |
| Input format | One YAML document                | Multiple documents separated by ---   |
| Return type  | Single Python object             | Generator yielding multiple objects   |
| Memory usage | Loads entire document at once    | Processes one document at a time      |
| Use case     | Simple config files              | Log files, batch records, stream data |
