Ruby ERB templating and AST magic

Validating invalid dynamic ERB blocks

Feb 05, 2024

Ruby and I are new acquaintances. I’ve used it on and off for random scripts, essentially as bash++, but haven’t built anything real in it in almost 20 years. I’d shied away due to the lack of static typing, but since Sorbet it has gotten a lot more palatable to me.

Recently I was working on building a templating engine [1] that allows external stakeholders to create display templates based on predicate matching: if a template’s predicate matches some values, then apply it and use its override result. In this case a template is an ERB block to evaluate. ERB is a templating syntax that comes out of the box in Ruby. For example:

when:
  some_field: <matches some value>
override:
  some_other_field: "This is an erb block <%= value %>."

A simple example using ERB to render this would look like

require "erb"

ERB.new("This is an erb block <%= value %>.").result_with_hash({
  value: 123
})

Which generates

This is an erb block 123.

Cool. But what happens if value is nil or an empty string? ERB is going to be totally fine with that and just render

This is an erb block .

Notice the missing ending value.
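
As a quick sanity check, here is a minimal sketch using only the standard library. It assumes the value is present in the hash but is nil or an empty string; the variable names are just for illustration.

```ruby
require "erb"

template = ERB.new("This is an erb block <%= value %>.")

# A nil value is converted with to_s, so it renders as an empty string.
nil_output = template.result_with_hash(value: nil)

# An empty string produces exactly the same output.
empty_output = template.result_with_hash(value: "")

puts nil_output.inspect   # prints "This is an erb block ."
puts empty_output.inspect # prints "This is an erb block ."
```

Neither call raises or warns, which is exactly why the blank value is easy to miss.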

I wanted to make sure a template could never silently render an empty value and produce a nonsensical result.

To do that I opted to break the ERB template string down into its component parts, effectively an AST of the ERB syntax. With access to an AST we can render each segment and validate that its result is non-empty. ASTs are fun, and decomposing something into its tree-based metadata is a great way to think about a problem. ASTs let you lean into your favorite LISPism: “code is data”.

Thankfully I didn’t have to write an ERB parser, and was able to find one in Temple. Using Temple we can break the ERB template down into its AST by simply doing:

require "temple"

template = "This is an erb block <%= value %>."

erb_parser = Temple::ERB::Parser.new

erb_parser.call(template)

Which will give us a loosely structured, array based syntax tree of the form (pretty-printed here with each element’s index)

[
  [0] :multi,
  [1] [
    [0] :static,
    [1] "This is an erb block "
  ],
  [2] [
    [0] :escape,
    [1] false,
    [2] [
      [0] :dynamic,
      [1] " value "
    ]
  ],
  [3] [
    [0] :static,
    [1] "."
  ]
]

Kind of weird, but I can work with this. At this point we just need to find every node whose first array element is the symbol :dynamic (I wish this was structured data and not just positional array fields [2]). In my research I had already stumbled on a nice gist from someone gracious enough to share just such a function, saving me a little bit of time. Thank you stranger!

def extract_dynamics(ast)
  dynamics = []
  ast.each do |node|
    next unless node.is_a?(Array)
    if node[0] == :dynamic
      dynamics << node
    else
      dynamics += extract_dynamics(node)
    end
  end
  dynamics
end
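
If you only care about the Ruby source inside each dynamic node, a slightly more compact variant is possible. This is a sketch rather than the gist’s code: dynamic_sources is a hypothetical name, and the sample AST is hand-written in the shape shown above instead of coming from Temple.

```ruby
# Hypothetical helper: returns the Ruby source of every :dynamic node,
# walking nested arrays recursively.
def dynamic_sources(ast)
  return [] unless ast.is_a?(Array)
  return [ast[1]] if ast[0] == :dynamic

  ast.flat_map { |child| dynamic_sources(child) }
end

# Hand-written AST mimicking the parser output for
# "Hello <%= first_name %> <%= last_name %>"
sample_ast = [
  :multi,
  [:static, "Hello "],
  [:escape, false, [:dynamic, " first_name "]],
  [:static, " "],
  [:escape, false, [:dynamic, " last_name "]]
]

p dynamic_sources(sample_ast) # prints [" first_name ", " last_name "]
```

Returning the code strings directly means nothing downstream has to remember which array position holds the source.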

Once we have all the dynamic blocks we can then map the values and re-render them against the context:

require "erb"
require "temple"
require "active_support/core_ext/object/blank" # provides #present?

template = "This is an erb block <%= value %>."

erb_parser = Temple::ERB::Parser.new

ast = erb_parser.call(template)

# each node is [:dynamic, code], so keep only the code string
dynamic_data = extract_dynamics(ast).map(&:last)

all_present =
  dynamic_data.all? do |code|
    # individually render each dynamic component of the erb
    # template and validate that they are all
    # non blank. This ensures we don't accidentally
    # have a "nil" or "" value rendered in the final resulting string

    ERB.new("<%= #{code} %>").result(render_api.bind).present?
  end

From here we can decide what to do and prevent accidental errors during dynamic template invocation.

Because there is no way to catch this at template write time (the templates are dynamic YAML files provided by external parties), we can make the right thing safe by validating at template render time.

Anytime I work on some sort of platform, SDK, or library, I like to think about how I can minimize consumer errors and maximize the feedback they get. Since we know the ERB sub-block we are evaluating, we can even tell the consumer which particular block is empty, giving them some nice discoverability instead of leaving them wondering why something is missing.
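
A rough sketch of that feedback loop, using only the standard library: blank_blocks is a hypothetical helper, the context is a plain hash rather than the real render binding, and the message wording is made up for illustration.

```ruby
require "erb"

# Hypothetical helper: returns the source of every dynamic block that
# renders to a blank string for the given context hash.
def blank_blocks(sources, context)
  sources.select do |source|
    ERB.new("<%= #{source} %>").result_with_hash(context).strip.empty?
  end
end

# Code strings as they would come out of the :dynamic nodes.
sources = [" first_name ", " last_name "]
context = { first_name: "Ada", last_name: nil }

blank = blank_blocks(sources, context)
messages = blank.map { |source| "ERB block <%=#{source}%> rendered an empty value" }

puts messages # prints: ERB block <%= last_name %> rendered an empty value
```

Because each block is rendered on its own, the error can point at the exact expression instead of the whole template.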

I’ve come around, slowly, to the power that dynamic Ruby has. I still much prefer to add Sorbet typing to everything as much as possible and to avoid working with untyped hashes, but sometimes dropping into the magic of meta-programming can be really impressive.

[1] More on template engine first principles can be found at this earlier blog post.

[2] I find that dynamic language enthusiasts shirk structured data (a struct, interface, whatever) and tend to use positional arguments to imply meaning. In this example it’s an array of tuples where the format is [node type, node value]. I find this problematic and would almost always advocate against it, because the meaning behind the data is implicit. Without an explanation it isn’t self-discoverable, and any refactoring is now tied to the array shape. In TypeScript this would probably be represented as a discriminated union of types. Not only can you see what each field means, but the tree and inter-relationships are clear.

interface DynamicNode {
   type: 'dynamic',
   value: string
}

interface StaticNode {
   type: 'static',
   value: string
}

interface Escaped {
   type: 'escape',
   isEscaped: boolean,
   node: DynamicNode
}

type Node = DynamicNode | StaticNode | Escaped

Going back to positional arrays: I often ask an interview question (more about that in my upcoming book!) that requires people to pair three pieces of data (key, value, time). Juniors who rely too heavily on primitive data structures like maps and arrays tend to create an unintelligible mess solving this problem. That’s not because these structures don’t work; it’s because they make things confusing and hard to keep track of. People get lost under the cognitive load of remembering that array[0] is “type” and array[1] is “value”.

Do yourself a favor and just make it an actual data object and things will get a lot simpler.
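
In Ruby, one way to do that for the AST above is to wrap each positional node in a Struct. This is only a sketch: the StaticNode, DynamicNode, and EscapeNode names and the to_node helper are hypothetical and not part of Temple.

```ruby
# Hypothetical node types; Temple itself does not define these.
StaticNode  = Struct.new(:value, keyword_init: true)
DynamicNode = Struct.new(:code, keyword_init: true)
EscapeNode  = Struct.new(:escaped, :node, keyword_init: true)

# Converts one positional node ([type, *fields]) into a named struct.
# A :multi node becomes a plain array of its converted children.
def to_node(tuple)
  type, *rest = tuple
  case type
  when :static  then StaticNode.new(value: rest[0])
  when :dynamic then DynamicNode.new(code: rest[0])
  when :escape  then EscapeNode.new(escaped: rest[0], node: to_node(rest[1]))
  when :multi   then rest.map { |child| to_node(child) }
  else raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

ast = [
  :multi,
  [:static, "This is an erb block "],
  [:escape, false, [:dynamic, " value "]],
  [:static, "."]
]

nodes = to_node(ast)
puts nodes[1].node.code.strip # prints "value"
```

Reading nodes[1].node.code says what it means, where ast[2][2][1] makes you go count brackets.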

onoffswitch.net
