Handling Nested CDATA With Builder
Tuesday, September 21st, 2010

As noted by our associates at Atomic Object, XML doesn’t allow nested <![CDATA[…]]> sections: the first ]]> in the text closes the section, whatever came before it. While rewriting some pieces of code, I developed the following Builder workaround, which lets our application export valid XML by splitting every ]]> in the text across two adjacent CDATA sections. When the XML is read back in via our Nokogiri-based parser, the parser concatenates the adjacent sections automagically, so the original text comes back intact and the exported XML is clean and valid.
Fix code:
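    # alias_method_chain is provided by ActiveSupport.
    module Builder
      class XmlMarkup < XmlBase
        # Split each "]]>" across two CDATA sections so the output stays valid.
        def cdata_with_escaping!(text)
          # gsub (not gsub!) leaves the caller's string untouched.
          cdata_without_escaping!(text.to_s.gsub("]]>", "]]]]><![CDATA[>"))
        end
        alias_method_chain 'cdata!', 'escaping'
      end
    end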
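For readers without ActiveSupport's alias_method_chain (it was later deprecated in favor of Module#prepend), the same wrapping idea can be sketched with prepend. This is only a sketch under stated assumptions: StandInMarkup is a hypothetical stand-in for Builder::XmlMarkup so that the example runs without the gem, CdataEscaping is a name invented for illustration, and whether the module can be prepended to the real Builder class unchanged is untested here.

```ruby
# Hypothetical stand-in for Builder::XmlMarkup so the sketch runs without the gem.
class StandInMarkup
  def initialize
    @target = String.new
  end

  # Mimics the unpatched behaviour from the post: wrap the text verbatim.
  def cdata!(text)
    @target << "<![CDATA[" << text << "]]>"
  end

  def target!
    @target
  end
end

# Hypothetical module name. It splits every "]]>" so that one CDATA section
# ends after "]]" and the next one starts with ">".
module CdataEscaping
  def cdata!(text)
    super(text.to_s.gsub("]]>", "]]]]><![CDATA[>"))
  end
end

StandInMarkup.prepend(CdataEscaping)

xml = StandInMarkup.new
xml.cdata!("<![CDATA[Foo bar sna]]>")
puts xml.target!
# prints <![CDATA[<![CDATA[Foo bar sna]]]]><![CDATA[>]]>
```

With prepend, the original method stays reachable through super, so no cdata_without_escaping! alias is created.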
Sample output:
    >> xml = Builder::XmlMarkup.new
    >> xml.cdata!("<![CDATA[Foo bar sna]]>")
    >> xml.target!
    => "<![CDATA[<![CDATA[Foo bar sna]]]]><![CDATA[>]]>" # valid XML!
    >> xml = Builder::XmlMarkup.new # fresh builder, so target! shows only the next call
    >> xml.cdata_without_escaping!("<![CDATA[Foo bar sna]]>")
    >> xml.target!
    => "<![CDATA[<![CDATA[Foo bar sna]]>]]>" # invalid XML!
Sample parsing with Nokogiri:
    >> doc = Nokogiri::XML("<baz><![CDATA[<![CDATA[Foo bar sna]]]]><![CDATA[>]]></baz>")
    => #<Nokogiri::XML::Document:0x825aff3c name="document" children=[#<Nokogiri::XML::Element:0x825afc1c name="baz" children=[#<Nokogiri::XML::CDATA:0x825af99c "<![CDATA[Foo bar sna]]>">]>]>
    >> doc.css('baz').first.content
    => "<![CDATA[Foo bar sna]]>"
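To see why the split is lossless, the round trip can be checked in plain Ruby without Builder or Nokogiri. Both helpers below are hypothetical names introduced for illustration: wrap_cdata applies the same substitution as the patch, and join_cdata is a naive regular-expression stand-in for what a parser does with adjacent CDATA sections, not a real XML parser.

```ruby
# Hypothetical helper: the same substitution the patch applies before wrapping.
def wrap_cdata(text)
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

# Naive stand-in for a parser: collect the body of each CDATA section
# and join the bodies in document order.
def join_cdata(xml)
  xml.scan(/<!\[CDATA\[(.*?)\]\]>/m).flatten.join
end

original = "<![CDATA[Foo bar sna]]>"
wrapped  = wrap_cdata(original)

puts wrapped
# prints <![CDATA[<![CDATA[Foo bar sna]]]]><![CDATA[>]]>
puts join_cdata(wrapped) == original
# prints true
```

Each ]]> in the input ends up with its ]] at the tail of one section and its > at the head of the next, so joining the section bodies restores the original text.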