Extremely XML

XML is one of the more exciting file formats of the past few decades. Rather than just being a convenient way to store information, it has opened information up to more software than any other format in history.

Sunday, October 22, 2006

Mozilla XForms Extension Preview Version 7 - October, 2006

I was very pleased this afternoon to notice that Preview Version 7 of the IBM/Mozilla XForms extension for Firefox was released this month.

Mozilla XForms Extension Preview Version 7 - October, 2006:
Release Notes for Preview Version 7 - October, 2006


If you use the Eclipse IDE or work with web standards in your software and/or web designs, you might be very interested in this article IBM published at its developerWorks site several months ago - Apply Schematron constraints to XForms documents automatically:

The World Wide Web Consortium (W3C) developed the XForms standard for the presentation and collection of form data. As stated in the W3C Recommendation, XForms is intended to be "the next generation of forms for the Web." As the Recommendation itself declares, "By splitting traditional XHTML forms into three parts -- XForms model, instance data, and user interface -- it separates presentation from content, allows reuse, gives strong typing -- reducing the number of round-trips to the server, as well as offering device independence and a reduced need for scripting."

XForms documents feature a data model that contains one or more XML instance documents. The form manipulates the instance documents and provides for the submission of that XML to a back-end system. Since Schematron is itself XML, XForms can easily treat it as part of the data model for a form.

XForms achieved a significant milestone with the release of the second edition of the XForms 1.0 specification on 14 Mar 2006. The update to the XML Forms Generator containing Schematron support became available on alphaWorks shortly thereafter.
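The three-part split the Recommendation describes is easiest to see in a fragment. This is a minimal, hypothetical sketch (the "person" instance vocabulary is made up for illustration):

```xml
<!-- A minimal XForms sketch: model + instance data, separate from the UI.
     The "person" payload is a made-up example vocabulary. -->
<xf:model xmlns:xf="http://www.w3.org/2002/xforms">
  <xf:instance>
    <person xmlns="">
      <name/>
      <email/>
    </person>
  </xf:instance>
  <xf:bind nodeset="/person/email" required="true()"/>
  <xf:submission id="save" method="post" action="http://example.com/submit"/>
</xf:model>

<!-- The user interface binds controls to the instance by XPath,
     rather than by HTML field name. -->
<xf:input ref="/person/name" xmlns:xf="http://www.w3.org/2002/xforms">
  <xf:label>Name</xf:label>
</xf:input>
```

Because the submission sends the instance document itself, the back end receives clean XML rather than a flat bag of form fields.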



The IBM Eclipse extension described in the article makes it easy to create XML-based forms that comply with the W3C XForms standard, and to harness the Schematron standard for attaching validation rules to them as well.

XForms is pretty real at this point.

It will get a whole lot more real when the XForms extension gets bundled into Firefox instead of being available only as a user-installable option.

In the meantime, any developer or user can get it and install it in a minute or two. In my opinion, for some of them - that is time well-spent.



Sunday, September 24, 2006

Xforms.org website

A pretty cool site called XForms.org came up this summer.

The person responsible for the site seems to get XForms and understand its significance.

A lot of the complexity and tedium that arises in modern web UI programming frameworks evaporates when XForms replaces the rather dumb HTTP GET/POST choices of regular, old HTML.

XForms has scored some big customers and major backers during the past year or two, so it really is a viable technology.

It would be nice if XForms.org would post some more entries to its blog about XForms.


Sunday, September 10, 2006

El Defensor Chieftain: Community Calendar

What is it with New Mexico? Are they some kind of high tech hotbed?

This is the latest of a number of times that I have spotted a note in their calendar about an upcoming or recently passed XML related presentation, like this one on DocBook.

It is seriously cool. I wonder why I do not see stuff like this in other community calendars?

El Defensor Chieftain: Community Calendar:
Wed/Sep. 6

Intro to XML, 4 p.m. - MSEC 187. Today: "DocBook: An XML-based documentation system for print and Web."

Programming with Python, 5:30 p.m. - Weir 128. Today: "Numeric and string operations."

Sunday, August 27, 2006

Apply Schematron constraints to XForms documents automatically

It has been obvious for half a decade or more that much of the GUI programming effort we take for granted in the software industry is just busy work.

As always happens with busy work, sooner or later it gets whacked. Some programmer or corporation locks it in its sights, types in a program to automate the job, and blows it away.

That is just what has happened, or is in the final stages of happening, with GUI forms production.

IBM has released software that generates data-entry validation for state-of-the-art web forms for you.

Apply Schematron constraints to XForms documents automatically:
IBM alphaWorks has released a new round of free tools, including the XML Forms Generator, to accelerate the development of forms that comply to this standard. The recent update lets you apply constraints defined in a Schematron 1.5 document to the generated form. Itself an XML markup, Schematron provides for the specification of business rules and data relationships that XML Schema cannot. While XForms natively provides for validation against XML Schema, any use of Schematron constraints must be built into the form itself.


It is hard to imagine anyone going back to the archaic Resource file format of the Macintosh circa the 1980s, or the Resource Compiler of MS-Windows back in the 1990s.

These days, we pretty much all use XML- or HTML-based file formats to define our GUIs. The exception being forms that are hand-programmed in Java using the Swing framework.

XForms is an XML-based format, and it is much higher-level - and at the same time simpler to use - than the ancient HTML file format. XForms itself was invented over half a decade ago and has been an official web standard for years.

Many companies like IBM are already using it in commercial applications. It helps head off problems which have recently made the national news, where people desperately needed to access a website but did not have a specific version of a specific brand of a specific web browser running on a specific version of a specific operating system.

XML Schemas: the better way

Lots of people use XSD as their sole solution to XML needs because it is so powerful and because it is backed by Microsoft.

Microsoft employees have posted blog entries deploring the unnecessary complexity of XSD. For some of its newest XML file formats, the W3C has switched to using other schema languages, such as RELAX NG, to define the syntax of their files. These other schemas are shorter and more readable too.

What is refreshing about the non-XSD schema community is that they use more modular means of defining/validating their XML file formats than XSD. The schemas are very easy to read/write. They are way simpler for a computer program to process and understand than XSD or even DTD.

Here is a nice set of Schema Design Guidelines I stumbled across on the web this afternoon. The goal is to do the very things I just outlined above as a practice, when defining XML file formats.

XML Schemas: Best Practices

It is a great idea, huh?

Schematron is one of the simplest computer languages I have ever come across. It is an XML-based format for defining validation rules for checking the syntax of a given XML file format.
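A Schematron schema really is just a list of rules, each with XPath tests that assert or report conditions. Here is a minimal sketch in the Schematron 1.5 namespace; the "order" vocabulary is hypothetical:

```xml
<!-- A tiny Schematron 1.5 schema asserting business rules that a
     grammar-based schema cannot express. The "order" format is made up. -->
<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="Order sanity checks">
    <rule context="order">
      <assert test="number(shipped) &lt;= number(ordered)">
        An order may not ship more items than were ordered.
      </assert>
      <report test="@discount &gt; 50">
        Discounts over 50 percent are flagged for review.
      </report>
    </rule>
  </pattern>
</schema>
```

That cross-field comparison (shipped versus ordered) is exactly the kind of constraint XSD has no way to state.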

RELAX NG is easier for people to learn than XSD. Anyone who knows the basics of regular expressions and BNF will have little or no trouble learning it very quickly, especially its Compact form (.RNC file format).
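To give a feel for the compact syntax, here is a small, hypothetical address-book grammar in .RNC form - the regular-expression and BNF flavor shows through in the quantifiers:

```
# RELAX NG Compact syntax for a hypothetical address book format.
# "*" and "?" work like regular-expression quantifiers.
start = addressBook
addressBook = element addressBook { card* }
card = element card {
  element name { text },
  element email { text }?
}
```

The same grammar in RELAX NG's XML syntax, or in XSD, would run several times as long.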

You can probably learn this stuff quicker than you can hand-code your next XSD file from scratch.

Once you have the .RNG grammar file, it is more useful than the .XSD file. Throw in a .SCH (Schematron) rule file - and you are far ahead of that XSD file you are picking away at.


a more elegant approach to validating XML, by using RELAX NG, Schematron, and XVIF

I have had this book about RELAX NG for a few years now.

I just came across a neat description of how to write more efficient, elegant RELAX NG schemas - thus, saving lots of time.

Annotation for Applications

Tuesday, June 27, 2006

Visualizing Social Networks by Harnessing: SIOC and FOAF documents, RDF tools - and producing SVG diagrams

Fred Glasson wrote a very interesting blog entry entitled Implementing and visualizing relationships between Talk Digger's SIOC and FOAF documents.

Researchers are frequently publishing papers about analyzing social networks using semantic web technology now.

The reasons for that are pretty obvious:
  1. There are networks of bad people doing bad things in the world, including the US, right now - so social networks are topical.
  2. Social networks are already documented in semantic web syntax whenever FOAF documents are involved.
  3. Page scraping software can get information published on the Web as HTML text into semantic web compatible XML very quickly.
  4. Semantic web analysis tools are already very powerful and widely available for free.
  5. SVG is widely supported (including by Firefox 1.5) and, being another XML file format - it is easy to convert semantic web data (RDF and OWL based XML data) into SVG using tools like IsaViz.
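Point 2 above is easy to see in a FOAF file itself. In this minimal, made-up example, the foaf:knows links are exactly the edges a social-network analysis tool walks:

```xml
<!-- A minimal, made-up FOAF document. Each foaf:knows element is an
     edge in the social graph, already in semantic web syntax. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Alice Example</foaf:name>
    <foaf:mbox rdf:resource="mailto:alice@example.org"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Bob Example</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
```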


So, in other words, analyzing social networks provides low-hanging fruit for semantic web researchers, commercial companies, and other organizations.

The toolchain he describes in his post is simple, completely based on free software, and generally useful to a lot of projects involving data mining, information analysis, and graphical visualization of the results.

PC Pro: News: paper at w3 conference uses semantic web to turn social web into information goldmine

Data mining seeks to bring out valuable nuggets of knowledge buried deep in a morass of data. Not only can it do that, it often does.

Really poor data mining jumps to conclusions about people based on false premises. An example would be assuming that matching first and last names proves matching identity. It does not. Everyone knows it does not.

Good information is qualified, by carefully matching up multiple pieces of information - and taking context into account.

Semantic web researchers recently used their skills to piece together a puzzle involving a huge number of people.

They analyzed a bunch of FOAF files, figured out who-knew-who - and compared that with a list of C.S. researchers.

They took that a step further, and tried to determine how prevalent the possibility of conflict of interest (COI) issues were.

PC Pro:
The plan was to map a simple social network, Friend of a Friend, where individuals listed their immediate friends, against a commercial bibliographic database of authors of computer science papers.

The latter was Semantically tagged, whereby records are attributed additional data describing each record, for example subject, date, author and so on. This means that online information can be meaningful not just to people viewing it, but also to computers accessing that data.

The goal of the research project was to discover whether there were any conflicts of interest between those authors putting forward papers and those chosen to review them. The researchers claimed the project brought out inferences that a simple topographical view would have missed.


The result is that the researchers gleaned some interesting facts while preparing their paper, Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection. Facts they would not have stumbled over or inferred any other way.

What makes it interesting is how many people in the US have dumped their information into MySpace and other social websites. Even more interesting is that they identify, on that same site, who their friends are. Sadly, I doubt most of the friends listed there really are friends in any conventional meaning of the word.

That is where context actually comes in.

However, the mechanism is still valid. The source of the input data just has to be of adequate quality. The FOAF data the researchers culled was closer to that than MySpace. So their conclusions, if identities were matched by more than first and last name, are probably interesting.

That... makes the Semantic Web a whole lot more interesting.

Wednesday, June 21, 2006

Java Platform, Standard Edition (Java SE) 6 Beta 2

Java SE 6 (JDK 1.6) is on beta 2 now.

There are some interesting improvements being made that will make Java nicer for people trafficking/trading in XML - and persisting information with it.

Java Platform, Standard Edition (Java SE) 6 Beta 2:
New Client and Core Java API for XML Web Services (JAX-WS) 2.0 APIs
New support for Java Architecture for XML Binding (JAXB) 2.0


Seeing how they have hit beta 2 at mid-year, it seems very likely this JDK will see its official release before the end of the year. That is just going by past history, especially recent history.

Sounds like Java will get even better for XML-oriented developers very soon.

Monday, June 19, 2006

DocBook 5 beta 6 has added support for SVG+MathML

Great news this month for those doing software documentation using DocBook!

Not only does the latest beta of DocBook 5.0 support Schematron schema rules and a RELAX NG schema grammar for validating DocBook documents - it also adds support to DocBook itself for MathML and SVG.

It does not take a genius to figure out why DocBook upgraded its validation scheme when it upgraded itself to support these two new XML modules.

The XHTML 2.0 standard is another popular XML standard that is eschewing the older DTD/XSD schema technology in order to get the flexibility - not to mention ease of writing/reading/understanding that RELAX NG proffers.

My guess about the motivation for adding SVG+MathML support is that Firefox 1.5 supports them, as well as XHTML. And not just individually - Firefox supports having all three of them combined in a single document!

My guess is that the world of software documentation authoring/reading/distribution is about to take a turn into W3C nirvana. It is nice to see that after so long existing as separate - and often ignored - technologies, they are all finally getting integrated at last.

Cafe con Leche XML News and Resources:
Norm Walsh has published the sixth beta of DocBook 5.0.

DocBook 5 is "a significant redesign that attempts to remain true to the spirit of DocBook."

The schema is written in RELAX NG. A DTD and W3C XML Schema generated from the RELAX NG schema are also available.

There's also a Schematron schema "that validates some extra-grammatical DocBook constraints. These patterns are also present directly in the RELAX NG Grammar and some validators, for example MSV, can perform both kinds of validation at the same time."

This beta allows MathML and SVG in imagedata and improves support for aspect-oriented programming source code in DocBook documents.

Schematron, You Are On - paper, that is

At long last, the ISO Schematron standard spec is available as printed hardcopy from the ISO.

Not only that, it is available from ANSI too! Most Americans will probably get it from ANSI, since its price is in dollars - not Swiss francs. Not to mention most Americans are better at reading English than French and do not live particularly close to central Europe.

O'Reilly XML Blog:
The paper and online versions of the ISO Schematron standard are now available from ISO for CHF120 and from ANSI for US$98.


This is one of the simplest ways to validate an XML document that there is. It is based on rules, not grammar. So it works better for XML document formats that are, as should be obvious, specified according to rules - not a grammar per se.

Combined with the RELAX NG schema format, which is a grammar based schema language (and a simple one at that) - Schematron rules can provide a lot of flexibility in determining if (or why not!) an XML document is valid.


Bio of an XML 'Borg

Back in 1997 or 1998, I flicked a URL to a friend and former coworker of mine by email.

That link was to a new World Wide Web Consortium standard called XML.


I did not think much of it at the time. I remember I just thought it was interesting as a better way to represent data files than tab-separated values and a lot better than the crufty comma separated value (CSV) file format.

I also remember thinking that it was rather verbose, and not all that inspired really - to simply take the HTML format and essentially say, "There, you can make up your own tag names now, and use it for data!"

I mean that was okay as ideas went, but hardly revolutionary.

I was wrong.


My friend went silent for days after I sent that URL. And I think days turned into a couple weeks.

I thought he was mad at me over some slight I was unaware of but nevertheless responsible for, so I did not bother him. I figured if he wanted to get back to me, he would.

It turns out he was not mad at me at all. He was very, very deeply absorbed in learning everything he could about how XML was used.

That was something I had not bothered to do. I saw it as a data file format, and that was that.

Two weeks after I sent him my message, my friend broke the utter silence that had suddenly opened up between us.


He informed me that there was rapid progress being made in XML parsers, which was being spurred by the fact that there was a standardized parser API called SAX. So applications were written to use parsers via that API, and parsers basically functioned as plugins to them.
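That plugin-style arrangement survives today in most languages' standard libraries. As a small sketch - in Python rather than the Java of the era, using only the standard library - an application supplies a handler and lets any conforming SAX parser drive it:

```python
# Sketch of SAX-style parsing: the application writes a handler against
# the standard ContentHandler API, and the parser acts as a "plugin"
# that fires callbacks as it streams through the document.
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts how many times each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams the document.
        self.counts[name] = self.counts.get(name, 0) + 1


doc = b"<library><tape id='1'/><tape id='2'/></library>"
handler = TagCounter()
xml.sax.parse(io.BytesIO(doc), handler)
print(handler.counts)  # {'library': 1, 'tape': 2}
```

Because the handler never sees the parser's internals, swapping in a different conforming parser requires no application changes - which is exactly what made SAX progress so rapid.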

He went on to tell me that there was work underway on a new standard stylesheet language called XSLT. He detailed some of the arguments between one of its inventors and a community that wanted it to be not merely something to style documents with - but a programming language for manipulating them as data.

He told me about the API wars being waged over the choice of functions that would be available in a standard programming library for manipulating XML documents from applications.


He and I subsequently did a presentation about XML to a company in Beltsville, Maryland in mid-1998. He and I introduced XML technology to an R&D office of a blue chip company in the suburbs of Washington D.C.

In late 1998 or early 1999, right around the turn of the year, I wrote my first Java code to generate XML. It was fun and it was easy. It generated a dump of the in-memory data model of an application I was debugging.

In 2000, I wrote some Java code that exported the definitions of the tapes in a tape library system for which I was one of the developers. I also wrote the Java code that would read that same document, verify that it was well-formed, and import the data if it was.

That pair of commands proved a very handy way to allow our system operators to do some basic maintenance of the system - without our team having to write a whole data entry/edit module for those entities.
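The verify-before-import gate described above (the original was Java) can be sketched in a few lines of Python, using only the standard library:

```python
# Sketch of an import-side well-formedness check: attempt a full parse
# first, and only proceed with the import if parsing succeeds.
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the document parses as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<tapes><tape id='T001'/></tapes>"))  # True
print(is_well_formed("<tapes><tape></tapes>"))  # False: mismatched tag
```

A well-formedness check is the cheapest gate there is - no schema required - which is why it makes a good first line of defense for operator-edited files.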


After that, I got into using XML for documents. Generating web pages, word processor files, spreadsheets, PDF reports - that sort of stuff. I did a lot of that, and I enjoyed it.

If anything was particularly tedious about that, I would have to say it was the incomplete support for certain things in the standards.

While support for reading/writing XML itself has never really been lacking, what has been kind of incomplete is the software support for the most complex XML-related (or XML-compatible) technologies. In my experience, these things include:
  1. CSS
  2. XSL-FO

The things that generally did not seem to fall short of implementing the standards were: XML parsers, DTD validators, and XSLT transformers. Those things worked pretty well. If you wrote your programs for them based on what the spec said, your programs would generally work.

Things have gotten more complicated but the tools someone can bring to bear when working with data in XML format are awesome today.

That is why I named this blog Extremely XML. That, and because XML is not just a data file format. It is all the processing that can be brought to bear on information when it is stored in that format. That is the whole point of using the XML format to house it.