🔄 Stream Large YAML with Psych::Parser and Handler

When working with huge or streaming YAML documents, loading everything at once can exhaust memory. By subclassing Psych::Handler and passing the input (a String or an IO) to Psych::Parser#parse, you can react to each event (mapping, sequence, scalar) as the parser emits it and discard data you don’t need.

require 'psych'

class MyHandler < Psych::Handler
  def initialize
    @path = [] # one entry per mapping that is currently open
  end

  # Fired when the parser enters a mapping.
  def start_mapping(anchor, tag, implicit, style)
    @path.push({})
  end

  def scalar(value, anchor, tag, plain, quoted, style)
    # Called for every scalar (keys and values alike); filter by @path state
    puts "Scalar at depth #{@path.size}: #{value.inspect}"
  end

  # Fired when the parser leaves a mapping.
  def end_mapping
    finished = @path.pop
    # Process or save the 'finished' mapping here; this skeleton never fills it in
  end
end

handler = MyHandler.new
parser = Psych::Parser.new(handler)

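# Pass the IO itself: each parse call handles one complete YAML stream,
# so parsing line by line would break mappings that span several lines.
File.open('huge.yml', 'r') do |f|
  parser.parse(f)
end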
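
To make the "discard data you don't need" idea concrete, here is a minimal sketch of a handler that rebuilds one record at a time and hands it to a block. It assumes the input is a top-level sequence of flat mappings (no nested mappings or sequences inside a record); the RecordHandler class and the sample data are hypothetical, chosen only for illustration. Scalar events carry raw strings, so no type conversion happens here.

```ruby
require 'psych'
require 'stringio'

# Hypothetical handler: assumes a top-level sequence of flat mappings.
# Each finished mapping is passed to the block and then dropped.
class RecordHandler < Psych::Handler
  def initialize(&on_record)
    super()
    @on_record = on_record
    @current = nil # Hash being filled, or nil when outside a mapping
    @key = nil     # key waiting for its value
  end

  def start_mapping(anchor, tag, implicit, style)
    @current = {}
    @key = nil
  end

  def scalar(value, anchor, tag, plain, quoted, style)
    return unless @current # ignore scalars outside a mapping

    if @key.nil?
      @key = value           # first scalar of a pair is the key
    else
      @current[@key] = value # second scalar is its value (a raw String)
      @key = nil
    end
  end

  def end_mapping
    @on_record.call(@current)
    @current = nil # release the record once it has been handled
  end
end

# Illustrative input; a File opened for reading works the same way.
yaml = <<~YAML
  - name: alpha
    size: 1
  - name: beta
    size: 2
YAML

records = []
handler = RecordHandler.new { |record| records << record }
Psych::Parser.new(handler).parse(StringIO.new(yaml))
```

Collecting into the records array is only for demonstration. In a real streaming job the block would typically write each record somewhere and keep nothing, so memory use stays bounded by the size of a single record.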