It’s been two decades since Extensible Markup Language (XML), a markup language for encoding documents, made its debut. While XML has enjoyed considerable success during that time, many programmers (including myself) dislike it. It’s useful to keep in mind the differences between a programming language (like Python, for instance) and markup languages such as XML, which are used to describe data and manage data structures and static user interfaces. XML is similar in some ways to HTML, although it’s more powerful because it can add context to data. If you’re building a user interface (an Android layout, for example) and you need to define elements such as buttons and images, you’re going to need to know your way around XML.
XML was intended to be backwards compatible with the Standard Generalized Markup Language (SGML); this makes it a lot heavier than necessary for just exchanging structured data between programs. Although designed to be both human- and machine-readable, XML ends up more readable to software than to humans; even worse, it bulks up data to an excessive degree. (It’s also fiddly to edit.)
Despite its flaws, XML remains commonplace in many businesses. In my last job maintaining oil-trading software, traders entered trades into a client application. The trade data was sent to the server as XML, where it was converted into SQL and run against a database. Some trades carried a large amount of data, so sending them as XML made for a lot of traffic. It worked, but I always felt it could have been quicker had a more compact transmission format been used instead.
Given its pervasiveness, XML will probably be around for some time to come. For instance, it's used a lot in .NET, in config and other files. The problem of handling structured data is not a new one; this wheel has been reinvented many times. The most obvious and well-known XML alternative is probably JSON; here are some others you might not have heard of:
YAML (YAML Ain't Markup Language, originally Yet Another Markup Language)
YAML is a data serialization language with an emphasis on human readability; it's better at that than JSON, although parsing still requires some effort. In any case, YAML is simpler than XML. Here's part of an example:
--- !<tag:clarkevans.com,2002:invoice>
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
There are YAML libraries for C/C++, Ruby, Python, Java, Perl, C#, Golang, PHP, OCaml, JavaScript, and others.
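
To see what the &id001 anchor and *id001 alias in the invoice buy you, here is a minimal Python sketch that uses only the standard library's json module, so it does not parse YAML itself. The dictionary is my own transcription of the invoice above, and the variable names are purely illustrative. In YAML, ship-to simply points back at the bill-to block; JSON has no such reference, so the same address ends up written out twice.

```python
import json

# Hand transcription of the bill-to block from the YAML invoice above.
bill_to = {
    "given": "Chris",
    "family": "Dumars",
    "address": {
        # A YAML literal block ("|") keeps line breaks and one trailing newline.
        "lines": "458 Walkman Dr.\nSuite #292\n",
        "city": "Royal Oak",
        "state": "MI",
        "postal": 48046,
    },
}

invoice = {
    "invoice": 34843,
    "date": "2001-01-23",
    "bill-to": bill_to,
    "ship-to": bill_to,  # same object in memory, much like the *id001 alias
}

# JSON cannot express the shared reference, so the block is serialized twice.
as_json = json.dumps(invoice, indent=2)
restored = json.loads(as_json)

print(as_json.count('"given": "Chris"'))           # how many copies were written
print(restored["bill-to"] is restored["ship-to"])  # the link is gone after a round trip
```

A YAML library such as PyYAML should be able to load the original document directly; this sketch only illustrates the duplication that anchors and aliases avoid.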
Protocol Buffers
Created by Google, protocol buffers are (in the company’s words) “XML, but smaller, faster, and simpler.” Protocol buffers take a different approach from hand-edited data files: you write a specification for the data to be serialized in a .proto file. On the overview page, there’s a simple example of XML:
<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>
And then the protocol buffer equivalent, shown in its human-readable text format:
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
When you compile the .proto file with Google’s protocol buffer compiler, it generates code for your language. Protocol buffers currently support generated code in Java, Python, Objective-C, and C++; with the new proto3 language version, they also work with Go, JavaNano, Ruby, and C#. Encoded in the binary wire format, the protocol buffer message would probably be 28 bytes long and take around 100-200 nanoseconds to parse, compared to 69 bytes for the XML version (minus whitespace) and 5,000-10,000 nanoseconds to parse.
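
The byte counts are easy to check by hand. The following is a rough Python sketch, not generated protocol buffer code: it builds the XML with the standard library and hand-encodes the two string fields using the protocol buffer wire format (a tag byte, a length, then the UTF-8 payload). The field numbers 1 and 3 are my assumption, since the example above does not show the .proto file, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# The XML version, serialized without any whitespace between elements.
person = ET.Element("person")
ET.SubElement(person, "name").text = "John Doe"
ET.SubElement(person, "email").text = "jdoe@example.com"
xml_bytes = ET.tostring(person)

# The hand-encoded binary version. Field numbers are assumed, not from the source.
NAME_FIELD = 1
EMAIL_FIELD = 3
proto_bytes = (
    encode_string_field(NAME_FIELD, "John Doe")
    + encode_string_field(EMAIL_FIELD, "jdoe@example.com")
)

print(len(xml_bytes), len(proto_bytes))  # prints "69 28"
```

Under these assumptions the sizes line up with the 69 and 28 bytes quoted above. In real code you would let the compiler-generated classes do the encoding rather than writing bytes by hand.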
AXON
More akin to JSON, AXON aims to combine the best of JSON, XML and YAML. The code is for Python, with source and examples of use on GitHub. Like YAML, it uses indentation to distinguish hierarchic levels:
Statement form:
axon
    name: "AXON is eXtended Object Notation"
    short_name: "AXON"
    python_library: "pyaxon"
    atomic_values
        int: [0 -1 17]
        float: [3.1428 1.5e-17]
        decimal: [10D 1000.35D -1.25E+6D]
        bool: [true false]
        string: "abc ??? ???"
        multiline_string: "one
ConfigObj
Though not updated since 2014, the Python library ConfigObj is handy for creating and reading configuration files. There's in-depth documentation on Read the Docs. It produces files like the following; this is a key-value system in which nested square brackets mark the hierarchy levels (indentation is optional and only aids readability):
# initial comment
keyword1 = value1
keyword2 = value2
[section 1]
keyword1 = value1
keyword2 = value2
    [[sub-section]]
    # this is in section 1
    keyword1 = value1
    keyword2 = value2
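
To make the nesting rule concrete, here is a small hand-rolled Python sketch that reads a file like the one above into nested dictionaries. It is not ConfigObj and does not use its API; the function name is hypothetical, and the parser is deliberately naive (no quoting, no list values, no inline comments). It assumes that the number of opening brackets gives the section depth.

```python
SAMPLE = """\
# initial comment
keyword1 = value1
keyword2 = value2
[section 1]
keyword1 = value1
keyword2 = value2
    [[sub-section]]
    # this is in section 1
    keyword1 = value1
    keyword2 = value2
"""


def parse_nested_config(text: str) -> dict:
    """Parse 'key = value' lines grouped under [section] / [[sub-section]] headers."""
    root: dict = {}
    stack = [root]  # stack[d] is the dict currently open at depth d
    for raw_line in text.splitlines():
        line = raw_line.strip()  # indentation carries no meaning here
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))
            if depth > len(stack):
                raise ValueError(f"section nested too deeply: {line}")
            name = line.strip("[]").strip()
            del stack[depth:]  # close sections at this depth or deeper
            section: dict = {}
            stack[-1][name] = section
            stack.append(section)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


config = parse_nested_config(SAMPLE)
print(config["section 1"]["sub-section"]["keyword1"])
```

With ConfigObj itself, the same value should be reachable with dictionary-style access such as config['section 1']['sub-section']['keyword1'] after loading the file.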
OGDL
Short for Ordered Graph Data Language, OGDL is another format that represents trees or graphs of text, using indentation to express nesting. OGDL is simple and clean. Here’s an example:
eth0
  ip
    192.168.1.1
  gateway
    192.168.1.10
  mask
    255.255.255.0
  timeout
    20
There are implementations for C, Go, Java and Perl. In Go, you would read it with code like this:
g := ogdl.FromFile("config.g")
ip := g.Get("eth0.ip").String()
to := g.Get("eth0.timeout").Int64(60)
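
For readers who don't use Go, the same idea can be sketched in plain Python. This is a simplified illustration and not an OGDL implementation: it only handles space-indented lines with one node per line, and both function names are my own. The default argument mirrors what the 60 passed to Int64 in the Go snippet appears to do.

```python
SAMPLE = """\
eth0
  ip
    192.168.1.1
  gateway
    192.168.1.10
  mask
    255.255.255.0
  timeout
    20
"""


def parse_indented_tree(text: str) -> dict:
    """Build nested dicts from lines where deeper indentation means 'child of'."""
    root: dict = {}
    stack = [(-1, root)]  # (indent width, node) for each open ancestor
    for raw_line in text.splitlines():
        if not raw_line.strip():
            continue
        indent = len(raw_line) - len(raw_line.lstrip(" "))
        # Pop back to the nearest ancestor that is less indented than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node: dict = {}
        stack[-1][1][raw_line.strip()] = node
        stack.append((indent, node))
    return root


def lookup(tree: dict, path: str, default=None):
    """Follow a dotted path, then return the first child's name as the value."""
    node = tree
    for part in path.split("."):
        if part not in node:
            return default
        node = node[part]
    return next(iter(node), default)


tree = parse_indented_tree(SAMPLE)
ip = lookup(tree, "eth0.ip")
timeout = int(lookup(tree, "eth0.timeout", "60"))
print(ip, timeout)
```

Values are stored as child nodes rather than as a separate type, which is why lookup returns the name of the first child.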
Further XML Exploration
During the research for this article, I found a link to a now-defunct webpage (still available in the Wayback Machine) that links to a page with 26 XML alternatives, including some I've covered here. A few links will have rotted, but it might still be worth a look. If you’re an Android developer, check out this extensive Android Authority walkthrough of how XML can work for you; it covers everything from syntax to the language’s use outside of layout files. It’s also worth noting how Google regards sitemap-related XML. Although alternatives exist, it’s always good to know the fundamentals of XML itself, given its pervasive use. (Also, make sure to check out our XML vs. JSON Comparison.)