Skip to main content

streaming_yaml_parser

📊 Event-Driven Streaming Parsing for Large YAML Files​

When dealing with multi-gigabyte YAML documents, loading the entire file into memory is impractical. Use Psych::Parser with a custom event handler to process tokens (scalars, mappings, sequences) on‑the‑fly, achieving constant memory usage. This pattern is ideal for ETL pipelines, log processing, or any service that must handle YAML streams in real time.

require 'psych'

class EventHandler < Psych::Handler
def initialize
@path = []
end

def start_mapping(anchor, tag, implicit, style)
@path.push({})
end

def scalar(value, anchor, tag, plain, quoted, style)
# Handle each scalar event as it arrives
puts "Encountered: #{value} at depth #{@path.size}"
end

def end_mapping
@path.pop
end
end

handler = EventHandler.new
parser = Psych::Parser.new(handler)
File.open('large_data.yaml', 'r') do |f|
f.each_line { |line| parser.parse(line) }
end