Convert XML to JSON: Stop Writing Parsers for Every API That Still Uses XML

A payment processor I integrated with last month sent their webhook callbacks in XML. The rest of our stack was JSON — TypeScript frontend, Node backend, MongoDB. Suddenly I had to write an XML parser for one endpoint. I went looking for a converter that wouldn't send financial transaction data to a third-party server, couldn't find one, and built it directly into Formly instead.

XML isn't dead. It just smells that way. SOAP APIs, RSS feeds, SAML assertions, government data portals, enterprise ERPs — they all still speak XML. And if you work across enough systems, you'll eventually need to bridge XML and JSON. Here's what I've learned about doing it without losing data along the way.

The Conversion Is Trickier Than It Looks

On the surface, converting XML to JSON sounds like a straightforward parsing problem. In practice, XML has features that JSON has no equivalent for — and sloppy converters silently drop them.

Take a <customer> element that carries id and type attributes and wraps a <name> child. A naive converter produces {"customer":{"name":"Acme Corp"}} and the id and type are gone. The W3C XML specification defines attributes as name-value pairs associated with elements, so they're not "extra," they're part of the data model. I've seen this bite teams when converting SAML responses, where security-relevant values live entirely in XML attributes.

I prefer converters that prefix attributes with @ or nest them under a dedicated key, so you can spot-check the output and immediately see whether attributes survived.
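Here is a minimal sketch of the @-prefix convention in Python, using only the standard library. The id and type values in the sample are invented for illustration, and element_to_dict is a hypothetical helper, not the way any particular converter works.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: the id and type values are invented for illustration.
XML = '<customer id="C-1001" type="business"><name>Acme Corp</name></customer>'


def element_to_dict(elem):
    """Convert an element to JSON-ready data, keeping attributes under '@' keys."""
    # Attributes come first so they are easy to spot in the output.
    node = {f"@{key}": value for key, value in elem.attrib.items()}
    # Simplification: repeated sibling tags would overwrite each other here;
    # a fuller converter would collect them into a list.
    for child in elem:
        node[child.tag] = element_to_dict(child)
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # text-only element: collapse to a plain string
        node["#text"] = text
    return node


root = ET.fromstring(XML)
converted = {root.tag: element_to_dict(root)}
print(json.dumps(converted))
# {"customer": {"@id": "C-1001", "@type": "business", "name": "Acme Corp"}}
```

With the prefix in place, a quick scan for @ keys tells you whether the attributes made it through.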

Namespaces: The Part Everyone Forgets

XML namespaces are like the semicolons in JavaScript: optional until they aren't, and then everything breaks. A SOAP message uses at least two namespaces: one for the envelope itself and one for the payload it carries.

If you strip the namespace prefixes from an order wrapped in a SOAP envelope, you get {"Envelope":{"Body":{"Order":{"Number":"4821"}}}}. Looks clean. But what happens when you have <po:Number> and <inv:Number> in the same document? They collide. Your converter just merged two different fields into one.

I learned this the hard way debugging an invoice feed where line-item numbers kept getting overwritten by header numbers. The converter dropped namespaces and the data collision was silent — no error, just wrong totals.
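The collision is easy to reproduce. The sketch below uses Python's standard xml.etree.ElementTree, which exposes each tag as {namespace-uri}localname. The document, its namespace URIs, and the invoice value are all invented for illustration, and local_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: namespace URIs and values are invented for illustration.
XML = """<doc xmlns:po="urn:example:purchase-order" xmlns:inv="urn:example:invoice">
  <po:Number>4821</po:Number>
  <inv:Number>INV-77</inv:Number>
</doc>"""


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)

# Namespaces stripped: both elements map to the key "Number", so the later one wins.
stripped = {local_name(child.tag): child.text for child in root}

# Qualified names kept: each element gets its own key.
qualified = {child.tag: child.text for child in root}

print(stripped)
# {'Number': 'INV-77'}
print(qualified)
# {'{urn:example:purchase-order}Number': '4821', '{urn:example:invoice}Number': 'INV-77'}
```

Nothing raises an error in the stripped version. The purchase-order number simply disappears, which is exactly the kind of silent overwrite described above.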

Text Nodes Mixed with Child Elements

JSON has no concept of mixed content, where text and child elements interleave inside one parent, as in <p>Pay by <b>Friday</b> please</p>. A competent converter wraps bare text in a #text key, giving you an array of text and elements. A bad one keeps whichever text node it saw last. I've tested a half-dozen converters and about half get this wrong.
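One way to keep mixed content intact is to emit an ordered array of #text entries and child elements. This is a sketch in standard-library Python under my own assumptions: the sample note is invented, mixed_content is a hypothetical helper, and it trims whitespace around each text run for readability, which a stricter converter might not do.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mixed-content sample, invented for illustration.
XML = "<note>Ship by <date>Friday</date> or call <phone>555-0100</phone> first.</note>"


def mixed_content(elem):
    """Return an element's content as an ordered list of text runs and children."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append({"#text": elem.text.strip()})
    for child in elem:
        parts.append({child.tag: mixed_content(child)})
        # ElementTree stores the text that follows a child in that child's .tail
        if child.tail and child.tail.strip():
            parts.append({"#text": child.tail.strip()})
    return parts


root = ET.fromstring(XML)
result = mixed_content(root)
print(json.dumps({root.tag: result}, indent=2))
```

The note comes out as five ordered entries: three text runs with the date and phone elements between them, so no text node is lost.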

RFC 8259 defines JSON as a text format for serializing structured data, whereas XML was designed to mark up documents. This isn't a flaw in either format, but it means every XML-to-JSON converter is making opinionated decisions about what to keep and what to flatten. There is no "standard" conversion. XSLT 3.0 is the closest thing, and it's a transformation language, not a converter.

When I Use a Browser Converter vs. Writing Code

If this is a one-time data migration — pulling a client's old XML product catalog into a new system — I reach for the browser converter. It's faster than writing a Python script, and I can visually verify the JSON looks right before importing. If this is a production API integration that needs to run every 5 minutes, I write a proper Node.js parser with fast-xml-parser and explicit attribute handling.

The browser converter fills the gap between "I need this once" and "I need to maintain this forever." It's the same reason I use a browser image converter instead of opening Photoshop for a quick resize. Speed matters when you're iterating.

What I check before trusting a conversion:
1. Are XML attributes present in the JSON output?
2. Did namespace prefixes get preserved or flattened?
3. How does it handle mixed content (text + children)?
4. Are empty elements such as <tag/> converted to null or to an empty string?
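The first check is the easiest to automate. As a rough sketch, assuming the converter marks attributes with an @ prefix, you can count attributes in the source and compare against the @ keys in the output. The function names and sample strings below are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def count_xml_attributes(xml_text):
    """Count every attribute on every element in the source XML."""
    root = ET.fromstring(xml_text)
    return sum(len(elem.attrib) for elem in root.iter())


def count_json_attribute_keys(value, prefix="@"):
    """Count keys starting with the attribute prefix anywhere in the JSON value."""
    if isinstance(value, dict):
        own = sum(1 for key in value if key.startswith(prefix))
        return own + sum(count_json_attribute_keys(v, prefix) for v in value.values())
    if isinstance(value, list):
        return sum(count_json_attribute_keys(item, prefix) for item in value)
    return 0


# Hypothetical input and two hypothetical converter outputs.
xml_in = '<customer id="C-1001" type="business"><name>Acme Corp</name></customer>'
good_json = '{"customer": {"@id": "C-1001", "@type": "business", "name": "Acme Corp"}}'
naive_json = '{"customer": {"name": "Acme Corp"}}'

expected = count_xml_attributes(xml_in)
for label, output in [("good", good_json), ("naive", naive_json)]:
    kept = count_json_attribute_keys(json.loads(output))
    print(label, "attributes preserved:", kept == expected)
```

A matching count does not prove the values are right, but a mismatch is a fast signal that attributes were dropped.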

Browser vs. Server: The Data Privacy Angle

XML data tends to be enterprise data — customer records, financial transactions, identity assertions. Sending that to a server-side converter means your XML sits in someone else's infrastructure. A browser-based converter processes everything locally: the XML parsing engine runs in your tab, and the data never leaves your machine.

I'm not being paranoid. I'm being pragmatic. When a vendor sends me a sample XML feed with real (or "anonymized" but still identifiable) customer data, I don't want that data on a random SaaS server. The browser keeps it local.

Written by Sam Taylor, Full-Stack Developer. I build tools that process data in the browser so you don't have to trust a server.