
Can You Spot The Duplications?

<foo>
  <bar>baz</bar>
</foo>

Now the quiz:

  1. Is this a good thing?
  2. Why was it allowed in the first place?
  3. Are the reasons still valid today?

Details at 11:00.



Re: Can You Spot The Duplications?

Answers:

1. The 'information' is not duplicated; that would be a bigger problem, since duplicated information introduces complexity and bugs into a system. The 'syntax', however, is indeed verbose and repetitive.

2. Because XML was based on SGML, most likely so that it could leverage existing tools and parsers.

3. Yes, because (almost) every mainstream language, operating system, text editor, and IDE supports this syntax. Using anything else forces you back into "writing your own parser" mode, diverting attention away from solving your customers' problems.
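To illustrate point 3, here is a minimal sketch that reads the quiz document with Python's standard-library `xml.etree.ElementTree` module. Python is just one choice; the point is that no hand-written parser is needed. The function name `read_bar` is hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# The document from the quiz at the top of this post.
DOC = """<foo>
  <bar>baz</bar>
</foo>"""

def read_bar(xml_text: str) -> str:
    """Return the text content of the <bar> child of the root element."""
    root = ET.fromstring(xml_text)   # parse the string into an element tree
    bar = root.find("bar")           # first direct child named 'bar'
    if bar is None or bar.text is None:
        raise ValueError("no <bar> text found")
    return bar.text

if __name__ == "__main__":
    print(ET.fromstring(DOC).tag, read_bar(DOC))  # prints: foo baz
```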

Re: Can You Spot The Duplications?

If the verbosity of XML bothers you, take a peek at http://www.yaml.org

Re: Can You Spot The Duplications?

The duplication in XML syntax is a necessary evil for enforcing well-formedness. As someone said, XML leverages existing technology for parsing and transfer. With effective compression techniques, you can reduce the size of the serialized form considerably. Given that machines generate these tags, they are no inconvenience to humans, and they make debugging easier. If you look at the syntax of similar technologies (Windows INI files, Java properties files, even YAML), they may be concise, but enforcing identity constraints and referential integrity constraints is not possible with them (or, to give them the benefit of the doubt, not as effective as with XML and XSD).

I am not saying that developers are unaffected by this duplication; I am frustrated myself at having to create 14 XML files to display three JSP files! But that is the necessary evil if you want your application to be easily testable and configurable; about maintainability, I have conflicting thoughts. It's good to search for silver bullets, but while you are at it, keep in mind that aluminum also looks very similar to silver.
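The compression point can be checked with a quick experiment. This sketch assumes DEFLATE via Python's standard `zlib` module (the comment does not name a particular compression scheme) and uses a made-up document of 1,000 repeated records. Because the same tag names recur on every line, they compress to a small fraction of their raw size. The exact ratio depends on the data, so no particular figure is claimed here.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size for a UTF-8 encoded string."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

# Hypothetical payload: 1,000 <item> records that repeat the same tag names.
items = "".join(
    f"<item><name>n{i}</name><value>{i}</value></item>\n" for i in range(1000)
)
doc = f"<items>\n{items}</items>"

if __name__ == "__main__":
    print(f"raw bytes: {len(doc.encode('utf-8'))}")
    print(f"compressed/raw: {compression_ratio(doc):.2%}")
```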

Re: Can You Spot The Duplications?

Answers:

1. It's not a good thing.

2. SGML allows minimization:

<foo>
  <bar>baz</>
</>

The feature was left out of XML to make it easy to write XML parsers.

3. The reason is no longer valid. Modern validating XML parsers are monsters (think XML Schema support):

[weiqi@gao] $ ls -l xercesImpl.jar
-rw-r--r--  1 weiqi weiqi 1010675 Feb 20  2004 xercesImpl.jar
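One way to confirm that XML dropped the minimization shown in answer 2 is to feed that exact text to an XML parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`, which sits on top of the Expat parser. The fully tagged version parses, but the SGML-style empty end tag `</>` is rejected as not well-formed. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

FULL = "<foo>\n  <bar>baz</bar>\n</foo>"
MINIMIZED = "<foo>\n  <bar>baz</>\n</>"   # SGML empty end tags, not allowed in XML

def is_well_formed(text: str) -> bool:
    """True if the XML parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    print(is_well_formed(FULL))       # True
    print(is_well_formed(MINIMIZED))  # False
```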

Re: Can You Spot The Duplications?

Sorry, but you are wrong. Closing tags without the tag name make XML harder for human beings to read. Imagine a single long line of tags and try to figure out which closing tag goes with which opening tag.

Re: Can You Spot The Duplications?

The same way you figure out which closing brace goes with which opening brace in C: with the help of your editor.

In SGML, you can choose not to use minimization for a particular closing tag. There is also a normalization program that fills in the missing tag names for you.
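Filling in the missing names only requires remembering which elements are open. The sketch below is a hypothetical and much simplified illustration, not the actual SGML normalizer. It handles plain start tags, self-closing tags, named end tags, and the empty end tag `</>`, but not comments, processing instructions, or DTD-driven tag omission. It keeps a stack of open element names, the same bookkeeping an editor uses to match C braces, and replaces each `</>` with the name on top of the stack.

```python
import re

# Hypothetical simplified tokenizer: a start tag "<name ...>", a named end tag
# "</name>", or an SGML empty end tag "</>". Comments, processing instructions,
# and DTD-driven tag omission are deliberately not handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)?([^>]*)>")

def normalize_end_tags(text: str) -> str:
    """Replace every '</>' with '</name>' for the innermost open element."""
    stack: list[str] = []          # names of currently open elements
    out: list[str] = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])   # copy text between tags unchanged
        closing, name, rest = m.group(1), m.group(2), m.group(3)
        if not closing:
            if name is None:
                raise ValueError("empty start tag '<>' is not supported")
            if not rest.rstrip().endswith("/"):   # '<x/>' opens nothing
                stack.append(name)
            out.append(m.group(0))
        else:
            if not stack:
                raise ValueError(f"unmatched end tag at offset {m.start()}")
            open_name = stack.pop()
            if name is not None and name != open_name:
                raise ValueError(f"</{name}> closes <{open_name}>")
            out.append(f"</{open_name}>")  # fill in (or keep) the tag name
        pos = m.end()
    out.append(text[pos:])
    if stack:
        raise ValueError(f"unclosed elements: {stack}")
    return "".join(out)

if __name__ == "__main__":
    print(normalize_end_tags("<foo>\n  <bar>baz</>\n</>"))
```

Run on the minimized example from earlier in the thread, it prints the fully tagged XML version. A named end tag that does not match the innermost open element raises an error, which is the extra check that XML's repeated names provide.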

The mandatory quotes around attribute values are another irritation.

The "my site is W3C-validated XHTML 1.1 strict" fanatics probably don't realize that those untidy HTML pages were valid SGML back in 1994.

