I have talked about this on here before, but the utter fervor around XML in the late 90s/early aughts is somewhat forgotten and unlike anything else I have seen in my career. Honestly, the current AI boom comes the closest.
It’s hard to explain and harder to justify if you weren’t there to see it with your own eyes; “it does XML” was literally a major selling point on so many products.
If you weren’t working with XML, what were you even doing with your career? You’re a dinosaur. Every single day someone was inventing new uses for XML. It was talked about like some sort of savior of the software industry. It’s really bizarre looking back.
There was something in the water. It led to a lot of XML-centric products like “XML Appliances”.
I joined a startup whose mission statement was to "do something with XML." We grabbed an embedded board, connected it to a printer port and enabled legacy systems to "print" to an XML service. It sold for lots of money and was never really integrated into anything.
I Love XML. Well-designed XML is better than well-designed JSON IMHO for human readability.
However, I can live with its successor Lord JSON. But the ArchDemon YAML - my lord in heaven - who summoned that monster? I get strong urges to quit software and go into a cave crying any time I deal with it.
I love it also.
The XML ecosystem is truly staggering. I can't say if any of it is "because" of XML, or simply XML was the thing these different projects hitched their wagons to, just because of ubiquity and momentum.
It is imperfect, but still extraordinary.
One of my favorites is XSL/XSLT. Yes, it's a programming language in XML that's apt at working with XML. But that's the thing, you have an XML programming language that can easily manipulate XML. What other language does that remind folks of? It's also functional. Hard to hear yourself think around XSL because of all the buzzwords flying around.
An early idea, which never caught on but was still clever, was to download XSL files to the client and then simply serve up XML files to be rendered by the local XSL. We actually did this with CDAs in healthcare (CDAs are essentially XML health care records). That was a driving thing with CDAs. There were actually contests for student groups to come up with stylings of CDAs.
The idea of stylesheets and XML is really powerful. HTML and CSS are joined at the hip, but that's happenstance. CSS can be applied to raw XML, but that aspect of it never matured like HTML and CSS did.
Then you have things like XSL-FO, giving a straight pipeline from raw XML to a formatted output.
I love XML namespaces. The idea of "hiding" metadata in XML files is quite cool. It was straightforward to consider XML elements in other namespaces as no more interesting than comments. This was used to great effect in SOAP and web services. A nice use case was bundling XML-Sig payloads. This was used in SOAP, in SAML. That was the beauty of it. In theory, you could plonk it most anywhere.
Yes, it had its problems regarding normalization and such. And, yes, XML is complicated. There were certainly some sharp corners in dark rooms to bang your head on with XML. It suffered interoperability issues, but then everything did. Everything still does. That's not XML's fault, that's just humanity.
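A minimal sketch of the "foreign namespaces are no more interesting than comments" idea, using Python's standard xml.etree.ElementTree. The books namespace URI is the one used elsewhere in this thread; the `audit` namespace, its elements, and the `own_elements` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

BOOKS_NS = "http://www.example.com/books"

# Hypothetical document: book data with extra metadata from a second namespace.
DOC = """
<books xmlns="http://www.example.com/books" xmlns:audit="urn:example:audit">
  <book title="Mythical Man Month">
    <audit:checked-by>alice</audit:checked-by>
  </book>
  <audit:batch id="42"/>
  <book title="Programmer's Introduction to C#"/>
</books>
""".strip()


def own_elements(root, namespace):
    """Return only the elements in `namespace`, skipping everything else."""
    prefix = "{" + namespace + "}"  # ElementTree spells tags as {uri}localname
    return [el for el in root.iter() if el.tag.startswith(prefix)]


root = ET.fromstring(DOC)
own = own_elements(root, BOOKS_NS)
book_tag = "{" + BOOKS_NS + "}book"
titles = [el.get("title") for el in own if el.tag == book_tag]
print(titles)
```

A consumer written this way keeps working when another party adds elements in a namespace it has never heard of, which is roughly how signature or audit payloads could ride along in SOAP and SAML messages.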
It's a powerful lingua franca. Once you get something into an XML format, there's so much that can be done with it, stuff we've been doing for 20 years. JSON et al are still trying to catch up, and suffering similar problems that XML has suffered through.
> I Love XML. Well-designed XML is better than well-designed JSON IMHO for human readability.
But both are absolutely inferior to S-expressions. Compare this (from https://www.xml.com/pub/a/2004/07/21/design.html):
    <books xmlns='http://www.example.com/books'>
      <book publisher="Addison Wesley">
        <title>Mythical Man Month</title>
        <author>Frederick Brooks</author>
        <publication-date>1995-06-30</publication-date>
      </book>
      <book publisher="Apress">
        <title>Programmer’s Introduction to C#</title>
        <author>Eric Gunnerson</author>
        <publication-date>2001-06-30</publication-date>
      </book>
    </books>
with this:
    (books (ns "http://www.example.com/books")
      (book (publisher "Addison Wesley")
            (title "Mythical Man Month")
            (author "Frederick Brooks")
            (publication-date "1995-06-30"))
      (book (publisher "Apress")
            (title "Programmer’s Introduction to C#")
            (author "Eric Gunnerson")
            (publication-date "2001-06-30")))
The S-expression is trivially parseable; the XML, OTOH, requires a fairly complex (hence potentially buggy and insecure) piece of software, and if you’re not careful, you’re susceptible to a billion-laughs attack. The XML will be littered with extraneous text elements between the tags; the S-expression is not. The S-expression just has atoms and lists; the XML has both tags and attributes. Why is publisher an attribute while title is a tag? Who knows?
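To put the "trivially parseable" claim in concrete terms, here is a rough sketch of an S-expression reader in Python. It is an illustration only: the name `parse_sexpr` is made up, atoms and quoted strings both come back as plain Python strings, and backslash escapes inside strings are accepted but not decoded.

```python
import re

# One token per match: "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


def parse_sexpr(text):
    """Parse S-expressions into nested Python lists of strings."""
    text = text.strip()
    stack = [[]]  # stack[0] collects the top-level expressions
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        if match.group(1):                    # "(" opens a new list
            stack.append([])
        elif match.group(2):                  # ")" closes the current list
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif match.group(3) is not None:      # quoted string (may be empty)
            stack[-1].append(match.group(3))
        else:                                 # bare atom
            stack[-1].append(match.group(4))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


tree = parse_sexpr('(book (title "Mythical Man Month") (publication-date "1995-06-30"))')
print(tree)
```

The whole reader is one regular expression and a stack, which is the point being made; a production reader would still need decisions about escapes, comments, and numeric atoms that this sketch leaves out.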
It’s probably not a coincidence that a Google search for "well-designed XML" yields very few results, with your comment one of those few.
Sorry, that's not a good example: it deliberately bloats the XML and adds double-line spacing. title, author and publication-date should be XML attributes, as they don't have children and are singular data of a singular data type.
    <books xmlns='http://www.example.com/books'>
      <book author="Frederick Brooks" publication-date="1995-06-30" title="Mythical Man Month" />
      <book author="Eric Gunnerson" publication-date="2001-06-30" title="Programmer’s Introduction to C#" />
    </books>
Sweet, succinct, elegant, and it can be validated with an XSD and given auto-complete and tab-placeholder support in your editor/IDE.
The above can be parsed without bugginess or insecurity - those feel like opinions to me. We manage to parse and render trillions of HTML markup following the same style without issues.
> Sorry not a good example
I just pulled the example from the article ‘Designing Extensible, Versionable XML Formats’ hosted at xml.com. Presumably xml.com has decent examples of XML?
I am pretty sure that author is not a singular data type, since it is possible for a book to have multiple authors.
> Sweet, succinct, elegant
I think it’s still inelegant in comparison to the S-expression version.
> The above can be parsed without bugginess or insecurity - those feel like opinions to me.
A quick Googling for ‘xml parsing cve’ shows plenty of hits. Are you familiar with billion-laughs exploits? XML is well-known for them: https://en.wikipedia.org/wiki/Billion_laughs_attack
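For anyone who hasn't seen the mechanism, a deliberately tiny version of the nested-entity trick, sketched with Python's standard xml.etree.ElementTree. The helper name `laughs_document` and the parameters are made up for illustration, and the sizes are kept small on purpose; newer libexpat releases also add amplification limits, so behaviour at large sizes depends on your environment.

```python
import xml.etree.ElementTree as ET


def laughs_document(levels, fanout):
    """Build a small document whose entities each expand to `fanout` copies of the previous one."""
    entities = ['<!ENTITY lol0 "lol">']
    for i in range(1, levels + 1):
        refs = f"&lol{i - 1};" * fanout
        entities.append(f'<!ENTITY lol{i} "{refs}">')
    return f'<!DOCTYPE lolz [{"".join(entities)}]><lolz>&lol{levels};</lolz>'


doc = laughs_document(levels=3, fanout=10)
expanded = ET.fromstring(doc).text  # the parser expands the internal entities

# Output size grows as fanout ** levels while the input grows only linearly.
print(len(doc), "characters in,", len(expanded), "characters out")
```

With ten levels instead of three, the same few hundred bytes of input would ask for billions of characters of output, which is why untrusted XML is usually parsed with DTD and entity processing disabled or limited.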
Could you share some examples? Genuinely interested as I really only know those god awful and huge Maven configs as things one would mostly edit by hand
Maven is an excellent example where the original author forgot that XML attributes exist. Or rather he chose to use a terrible parser that didn't support them. And overly verbose names.
With a nice XSD, the below could be typed with full autocomplete and tab-next-placeholder support for element and attribute names in neovim, vscode and intellij.
    <proj name="My Biz CLI" url="http://github.com/mybiz/mybizctl" modelVersion="4.0.0">
      <coord groupId="com.mybiz" artifactId="mybizctl" version="1.0.0" packaging="jar" />
      OR
      <coord>com.mybiz:mybizctl:jar:1.0.0</coord> <!-- alt form -->
      <deps>
        <dep groupId="junit" artifactId="junit" version="5.0.1" scope="test" />
        <dep>info.picocli:picocli:4.7.1</dep> <!-- alt form -->
      </deps>
    </proj>
The above declares a project CLI tool with proj metadata and a defined coordinate producing a jar file with 1 compile time dependency and 1 test dependency.
The S-expression version looks so much more pleasant:
    (proj (name "My Biz CLI")
          (url "http://github.com/mybiz/mybizctl")
          (model-version "4.0.0")
          (group-id com.mybiz)
          (artifact-id mybizctl)
          (version "1.0.0")
          (packaging jar)
          (deps (dep (group-id junit)
                     (artifact-id junit)
                     (version "5.0.1")
                     (scope test))))
I probably should have omitted alternatives and comments in my example - it would have come to the same length then and been just as easy to read. I did so below, and it is even more pleasant to read - esp for folks already used to HTML.
    <proj name="My Biz CLI" url="http://github.com/mybiz/mybizctl" groupId="com.mybiz" artifactId="mybizctl" version="1.0.0" modelVersion="4.0.0">
      <deps>
        <dep groupId="junit" artifactId="junit" version="5.0.1" scope="test" />
      </deps>
    </proj>
Do S-expressions have a schema standard?
Maven is not a great example of the strengths of XML, as another comment describes.
I use XML a lot in favor of JSON because I can pass around a machine readable schema (yes, I know JSON now has some schema support) and because I can represent things like cycles and comments and numbers that are something other than 64-bit floating point values.
A simpler version of xml would have ruled the world. Name spaces and xpath killed it.
XML was the perfect storm of dot-com cool (it’s somehow part of the web!), serious enterprise (there’s money in it!), and simply being reasonably easy to implement: most software products work with some kind of data, so you can always come up with an XML angle.
The crypto/web3 hype shared the first two aspects but not the third (it’s almost impossible to do something meaningful with crypto in an existing product, as companies like Meta with their now-dead NFT integrations found out).
Generative AI may be crossing the threshold of ticking all three boxes again, as APIs become easier to use and enterprise interest ramps up (because companies don’t want their employees copy-pasting trade secrets into ChatGPT).
Is the reason because it made data exchange basically universal between languages?
Before XML you had random binary or text formats, so APIs were a lot of work?
Everything is Json now, but XML->Json is not that remarkable.
Why was it such a big deal?
Prior to XML the way a lot of data got transferred was via comma-separated-value files. They have a number of limitations (to put it mildly), like how to store strings (single quotes? double quotes? no quotes at all?), what character to use to separate values (it wasn't always a comma - sometimes semicolons and pipes were used), and how to escape the in-band control characters (commas, quotes if used, newlines). If you had to share a file with another party there was a negotiation that happened ahead of time on how all that was to be done.
Example: You have to send the name of customer Finley O'Conner. Do you send it as:
    ,Finley O'Conner,
    ,"Finley O'Conner",
    ,'Finley O''Conner',
    ,'Finley O\'Conner',
And so on. But with XML the process of escaping control characters was well defined, and the receiver would commonly publish a schema definition that detailed what all the values were like (types, sizes, etc.). You could be up and running fairly quickly and with some confidence that the integration would work.
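The quoting variants above map directly onto writer settings, which is a fair picture of what had to be negotiated. A small sketch using Python's standard csv and xml.etree.ElementTree modules; the `csv_field` helper, the `customer`/`name` element names, and the "& Sons" value are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_field(value, **fmt):
    """Write a single-field CSV row with the given dialect options."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow([value])
    return buf.getvalue().rstrip("\r\n")


name = "Finley O'Conner"

# Four senders, four conventions; the receiver has to know which one was used.
minimal = csv_field(name)
double_quoted = csv_field(name, quoting=csv.QUOTE_ALL)
single_doubled = csv_field(name, quoting=csv.QUOTE_ALL, quotechar="'")
single_escaped = csv_field(name, quoting=csv.QUOTE_ALL, quotechar="'",
                           doublequote=False, escapechar="\\")
for line in (minimal, double_quoted, single_doubled, single_escaped):
    print(line)

# XML has exactly one escaping rule set, so any conforming parser reads it back.
tricky = "O'Conner & Sons <Dublin>"  # hypothetical value with markup characters
customer = ET.Element("customer")
ET.SubElement(customer, "name").text = tricky
xml_text = ET.tostring(customer, encoding="unicode")
print(xml_text)
round_tripped = ET.fromstring(xml_text).find("name").text
```

The CSV variants are all defensible; the problem was that nothing in the file said which one was in use, whereas the XML escaping needs no side agreement.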
And XML XSDs were fertile, fertile ground for architecture astronauts building Ivory Towers and Pristine Jewels. Oh, almost all the fields in the xsd are optional? No bother!
Also the acronyms oh god the acronyms. Xsd; xml to define the xml schema! Xpath; xml to query xml! Xslt; xml to transform xml! Excuse me while I go puke.
> Xpath; xml to query xml
Actually, xpath looks like this:
That is very much not XML syntax.
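For a sense of the syntax, a small sketch using the limited XPath subset supported by Python's standard xml.etree.ElementTree. The document and the expression here are illustrative and are not taken from the original comment.

```python
import xml.etree.ElementTree as ET

# Illustrative document, loosely based on the books example used in this thread.
DOC = """<books>
  <book publisher="Addison Wesley"><title>Mythical Man Month</title></book>
  <book publisher="Apress"><title>Programmer's Introduction to C#</title></book>
</books>"""

# A path with a predicate: slashes, brackets and @attributes, no angle brackets.
XPATH = ".//book[@publisher='Apress']/title"

root = ET.fromstring(DOC)
apress_titles = [title.text for title in root.findall(XPATH)]
print(apress_titles)
```

Full XPath engines support far more (axes, functions, numeric predicates) than ElementTree does, but the path-like shape of the expressions is the same.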
> fertile ground for architecture astronauts building Ivory Towers and Pristine Jewels
I have run into that. :)
I was on the HR-XML standards committee for a brief period, and we had an XSD submission that was truly amazing. The author had a solid knowledge of XML, it was fully annotated and documented, and it would fulfill a need for the community. But it was absolutely huge - XML Spy would crash if you expanded too many nodes. I couldn't see any existing DOM style parser being able to validate a document against it.
I get what you're saying about the X-alphabet soup. And you're right - see the other comment from jerf about the hype cycle. JSON is going through its own hype cycle at the moment and there are active projects to do many of the things you dislike about the XML ecosystem .. against JSON.
> Before XML you had random binary or text formats, so APIs were a lot of work?
Before XML there was, and still is, SGML. From W3C's XML spec:
> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.
So XML was intended to simplify HTML/SGML syntax, and in the form of SVG, MathML, and XHTML (XForms etc.) to add new vocabularies and vocabulary evolution facilities to the web.
There was this idea that your backend service produces XML payloads, and you transform that to HTML via XML transformation languages such as XSLT. Well, it's fair to say web frontend development took a different direction ;) but the initial idea was inspired by the web being a text/markup based environment rather than becoming a desktop replacement, which it eventually did for economic rather than technical reasons ie. nobody buying software hence only "service" lock-in was left to make a living from development (F/OSS played a role in this, too).
From there, XML was then suddenly used for config files and everything else; not so much because it was a good fit, but because it sold well and was ending tool discussions in projects.
There are numerous uses where XML wouldn't be today's choice for data exchange, but where XML is extremely useful and prevailing, even unlikely ones such as for exchanging 3D CAE models with Blender, but mostly enterprise and gov document and data exchange, because in 2000s and early 2010s the expectation towards "open" systems was higher, or just simply because the segment saw enormous growth during that time.
I think you’ve hit the nail on the head.
It was the new hot universal structured way to transfer data over this newfangled internet and interop between applications and even operating systems. It even uses Unicode(!) which was a very big deal at the time.
It's not like ASN1 didn't exist (or still exist). It's hard to say what'll catch on, I guess.
Isn’t ASN.1 a binary encoding without a universal human-readable representation?
I would not call machine-produced XML at all human-readable.
Of course, I suppose some amount of human-readability is pretty important. It's why textproto became a de-facto standard for protobufs.
They are quite often not, but then XML does have style sheets, which make things a bit easier since every browser can render XML into a human-readable format that way.
"Human readable" in XML lingo means, actually, "Human readable after transformation". Or: add some CSS to it and it's easily consumable.
There's always some sort of hype thing going on. If you made a chart of all the industry hype going back to at least the 1990s, there's always something. If I were older I could probably push that date back further, I'm just sticking with what I know.
I think there were a lot of factors. One is that while it's not really true there were no interchange formats, they were all either bad, locked away behind huge paywalls, or unknown to the common programmer. It is hard to conceptualize how the industry worked before the Internet meant ten seconds could lead you to anything you like.
Another is that XML was the evolution of some pre-existing standards and the people who were behind those standards, which includes some companies who made lots of money on those standards, were very excited about an evolution and saw an influence and profit opportunity in hyping it up.
Money in general feeds a lot of the hype cycles. Even before we account for people deliberately pumping them, something that becomes slightly popular attracts money, which attracts marketing, which attracts further popularity. You can see that right now in the AI craze. (Note that being a hype cycle doesn't have to mean it's all hot air, there can be something useful in the middle of it.) Then we account for people pumping it and marketing it on purpose and the hype cycles only get bigger.
XML also got a sort of halo effect from the sudden popularity of HTML and the web. It claimed to be the natural next evolution of the web technologies, even though that never really manifested. (It didn't completely fail, either, like with SVG being XML, but it didn't take the world over either.)
It's a lot of things. It's even that it is a good solution to certain problems that didn't have a good interoperable solution at the time; even today a significant portion of XML's bad reputation isn't that it was actually a bad technology but that it was applied to tasks it very much shouldn't have been. (See "What XML is good at": https://news.ycombinator.com/item?id=11446984 )
But if I had to pick the one thing that HN might be underestimating, it was the amount of money being poured into marketing in order to drive hype cycles so that you had to pick up consulting services from Sun or IBM in order to stay current. These marketing teams bought editorials, magazine covers, would create entire conferences from whole cloth for these things, and a lot of companies would dutifully send their employees off to them in order to stay "current".
There's a lot of money to be made in these hype cycles in consulting companies and companies selling shovels. A more recent one is the whole "data lake" thing; if you've not been part of it you may not realize how much money there is in selling to organizations the idea that if you just shove lots and lots of data into a pile and let Data Experts poke through it they will inevitably discover amazing facts about your company that you had no idea about, like, "if customers have been using your product a lot they're likely to continue to" and "if customers are toning down their usage of your product, they're probably about to leave". (I sarcasm here a bit, but it's just a myth that lots of data inevitably has amazing and subtle things to learn about it.) It isn't that hard for the companies making this money to pour just a bit of that money they're making into generalized fanning of the hype cycle du jour... and rolling back around to the topic, XML got a lot of it.
The most amusing thing about these hype cycles is that they generally precede people understanding the tech and generally the maturity of the tech. Java is #1 today because Sun poured immense amounts of money into a hype cycle... yet in my highly opinionated opinion, Java wasn't actually good for much of anything until after the main cycle. XML was the same thing... people generally thought of it then just as many do now, as just some angle brackets and some vague rules, but it's actually a particular thing with particular uses (https://news.ycombinator.com/item?id=11446984) and particular tendencies, which, looking back, were not well understood by much of anybody until after the hype cycle died down. The XML-based technologies that date from the height of the hype cycle are not just verbose and unwieldy, they're also poor users of XML! One of the clearest examples is Atom versus RSS; while I'm actually generally on team RSS Is Good Enough, Atom comes from an era where people generally understood XML and it correctly specifies things and uses the features correctly, whereas RSS generally dates from the "XML is angle brackets and some attributes" era of XML, and as they solve essentially the same problem provide a really clear contrast.
In the early 00s, I regularly used to drive between the south-east and south-west of England, and one of the cut-throughs to avoid the worst motorway is a town named Bracknell.
For years there was a prominent building on that route for a company whose name I cannot remember, but the tagline on their sign was "The XML Company". If that was happening in Bracknell, I can only imagine how crazy the valley was...
Worrying thing is 20+ years later, I'd take XML over YAML without hesitation...
Its name is/was Software AG.[^1][^2] What an impressively generic name. Looks like they still have an office in Bracknell.
Yes, that’s the one, thank you!
I did notice in ~2019 that they were no longer in the same building (which appeared empty), guess they just moved somewhere less prominent.
This! In 2001, I spent six months working at a major VC and my task was "figure out the XML market". It was wild! XML will solve all our integration problems! Then I went to Scriptics and Scriptics veered into doing an XML product attaching Tcl to XPath to do little bits of processing. We got bought out in part because of that product.
Yup, it makes you wonder what is the XML of right now. Humans like to participate in social manias, and our time is no exception.
I conjectured that it's Kubernetes. Just like XML, Kubernetes solves a problem, but it doesn't solve EVERY problem.
A sibling comment wrote that there was a startup whose mission was "do something with XML", and then they got acquired and didn't amount to much.
There's lots of that around Kubernetes now. Kubernetes companies get acquired and don't end up doing very much.
There's also another comment on the front page now (don't want to draw attention to it) where a person feels he is incompetent because he can't understand Kubernetes. It might be that he's competent, but an industry mania selected an inappropriate solution for his problem.
> I conjectured that it's Kubernetes. Just like XML, Kubernetes solves a problem, but it doesn't solve EVERY problem.
Like XML, Kubernetes does a lot of cool things which were not common before it came around (unlike XML, I think K8s actually manages to improve on the state of the art a bit, too). But just as XML was largely replaced by JSON, I am really excited to see what replaces Kubernetes.
>unlike anything else I have seen in my career.
It seemed quite a lot like the microservices, nosql and ML hypes to me.
I learned XSLT.
It left scars...
Why? It's a perfectly good functional programming language... https://gist.github.com/pjlsergeant/50a3d086d9513612cb397a13...
To think that the previous incarnation DSSSL actually used Scheme and then this replaced it...
XML, OOP, CASE tools and offshoring were huge. Programmers were going to be out of a job soon.
And yet here we are.
Oh god this is a flashback. One of my earlier jobs (circa 2010 or so) was working as a defence contractor doing operations for a system using an IBM/Weblogic/J2EE SOA stack. Part of this stack involved the XI50/XI52 and eventually an XG45. I inherited these devices from someone leaving the program and it became my niche.
Administering the thing was a giant PITA. Lots of GUIs, clicking, and (shocker) more XML.
I recall some sort of pipeline interface where you could manage all the various steps a request could pass through, with various components for doing SAML auth, XSLTs and similar operations.
Since leaving that role, I've seen absolutely nothing about these devices or anybody else who'd ever used them.
AFRL couldn't get enough XML in the 2000s.
Were they doing a lot of CORBA before that? If so, I might have run towards XML/SOAP too.
They did whatever could drive more dollars to deliver nothing contracts, but there was a particular hard on for getting XML in to anything. It was the solution to everything for a decade at least.
Back in the 2010s these things were all the rage in systems where you used XML in combination with technologies like WS-Security (i.e. signed SOAP requests). I suspect these things are still all over the DoD. I remember all sorts of fun issues with different vendor stacks not interoperating well together, so it was useful to have a standard layer to validate messages before they got to the backend.
The reason all these XML based protocols failed is that
1. XML is extremely space inefficient to begin with so all messages are bloated
2. The promise of interoperability between systems never really materialized. Finding a common set of configurations that would work across the multiple Java stacks, .NET and whatever else was out there at the time was a nearly impossible task.
It turns out simple JSON and API specs were more than enough.
There's still a lot of SAML and SOAP being spoken on the Internet, so I wouldn't call it a failed protocol. Interoperability between Java and .NET was a solved problem somewhere around 2008.
Most important reason XML was replaced by JSON for the majority of use cases was the parser had a lot of complexity and wasn't implemented as well for other languages that were considered 'less serious' at the time. If you built with Rails or PHP you probably had to deal with badly implemented XML and SOAP libraries.
> Most important reason XML was replaced by JSON for the majority of use cases was the parser had a lot of complexity and wasn't implemented as well for other languages...
I think it had more to do with:
1) An advantage of XML was supposed to be that it was human-readable, which meant it would be easier for developers to work with and debug. The problem was that many XML documents were so bloated with tags within tags within tags and redundant attributes all over the place, that in practice it was extremely painful for a human to read. JSON is better, in practice.
Do you think there's any of these still in use?
Absolutely. Putting an intelligent intermediary in front of an API is an age-old pattern, and moving that logic into the backing system itself may have its own logistical challenges.
Working on a product in an adjacent space, we translated WS-Security-protected SOAP interactions to OAuth JSON request/responses. The responses had omitted information based on the authorizations given in the OAuth access token scope.
In some cases you would have to replace the XML gateway with an alternative system, for instance when it is combining information from multiple sources which aren't allowed access to one another (say, for PII separation/auditing reasons).
I was asking specifically about if this "XML hardware" was still in use
And he specifically gave examples of where it is used... Search for IBM Datapowers and the like
Probably. I bet maintaining it is someone's personal hell.
I will take XML over YAML any day.
It's not. It's much simpler than anything else I know. XPath and its higher-up, XQuery, make it a breeze! XSD, while not perfect, can be easily displayed as a graphical diagram. What you do need, however, and there is no way around it, is a specialized XML IDE, or an IDE that has "understood" XML.
Hell is others’ XML.
I think you can still buy them. DataPower is still operating at IBM, but they put less emphasis on XML as a selling point and more on it as a general WAF.
I actually came across this wiki page when searching for the backstory behind JSONx, IBM's standard for representing JSON as XML.[^1]