Stuff related to XML, XSLT, XPath, etc.

Xembly, an Assembly for XML

I use XML in almost every one of my projects. And, despite all the fuss about JSON/YAML, I honestly believe that XML is one of the greatest languages ever invented. Also, I believe that the beauty of XML reveals itself when used in combination with related technologies.


Introducing JSON++, with powerful new features for modern development needs.

Problem 1: JSON doesn’t support the kinds of data which developers need to send over the wire:

1a) JSON++ supports many new kinds of fields, including date/time, boolean, binary data and different sizes of integer.

1b) JSON++ allows you to embed arbitrary text blocks without needing to escape everything in the block.

1c) JSON++ has comments, finally!

1d) JSON++ supports metadata for fields, used to specify e.g. the language used for a string, units of measure for numbers, etc.

1e) JSON++ enforces a standard serialization format so there is never miscommunication between client and server.

Problem 2: Validation needs to be done the same way on the client and on the server

2a) JSON++ has an associated schema definition language. The schema supports all the new types above, PLUS constraints such as regex patterns for strings and min/max for integers.

2b) JSON++ schema lets you define more complex arrangements than just a simple collection of fields. Improvements include required fields, choices, and lists with min/max length.

Problem 3: JSON documents are being stored in databases and need to be queried in a consistent manner

3a) JSON++ has a standard syntax for identifying and linking to particular sub-objects and sub-fields in large JSON documents.

3b) JSON++ has an associated SQL-like query language for querying large JSON documents.

3c) JSON++ has a standard streaming parser protocol so that large documents can be parsed without needing to load the whole thing into memory.

3d) JSON++ provides out-of-the-box support for most major relational databases, so that you can use JSON++ queries from within the database, and mix-and-match relation data and JSON++ data.

Problem 4: There are so many similar-but-different JSON schemas around. Developers need to convert between them without having to write special code.

4a) JSON++ has a declarative (code-free) tool for converting JSON data from one schema to another.

Sound awesome? Want to try it out?

JSON++ is available right now! For more information, see bit.ly/JsonPlusPlus

XPath is actually pretty useful once it stops being confusing
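A minimal sketch of what "useful" can look like in practice, using the limited XPath subset that Python's standard `xml.etree.ElementTree` supports. The catalog document, the element and attribute names, and the helper functions are all made up for illustration; none of them come from the linked article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
CATALOG = """
<catalog>
  <book id="b1" lang="en"><title>XML Basics</title><price>10</price></book>
  <book id="b2" lang="de"><title>XSLT Kochbuch</title><price>25</price></book>
  <book id="b3" lang="en"><title>XPath in Practice</title><price>18</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)


def english_titles(root):
    """Select titles of books whose lang attribute equals 'en'."""
    return [t.text for t in root.findall("./book[@lang='en']/title")]


def book_title_by_id(root, book_id):
    """Select the title of the book with the given id, or None if absent."""
    node = root.find(f".//book[@id='{book_id}']/title")
    return node.text if node is not None else None


def last_book_id(root):
    """Select the last <book> child using a positional predicate."""
    return root.find("./book[last()]").get("id")


if __name__ == "__main__":
    print(english_titles(root))
    print(book_title_by_id(root, "b2"))
    print(last_book_id(root))
```

Each path reads left to right as "start here, step into these children, keep only the ones matching the bracketed condition", which is most of what there is to learn before XPath stops being confusing. ElementTree covers only a subset of XPath 1.0; full expressions need a dedicated XPath library.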

HTML-XML-utils can be a nice set of command-line utilities for parsing content out of HTML files. Much better than writing a custom BeautifulSoup script every time.

XML coreutils

xmlto is a shell-script tool for converting XML files to various formats. At present it supports conversion from the docbook, xhtml1 and fo formats to various output formats (awt, fo, htmlhelp, javahelp, mif, pdf, svg, xhtml, dvi, html, html-nochunks, man, pcl, ps, txt, xhtml-nochunks, epub).

The essence of XML [PDF]

I found these the other day and I wonder how they have largely slipped under the radar. Of particular interest are hxpipe and hxunpipe, which make “scraping” tasks absurdly easy by converting HTML to a form easily manipulated by sed, grep and other fun Unix utilities.

Update: tracking the score of this post on the front page using this:

curl -s https://news.ycombinator.com/news | hxnormalize | hxpipe | grep -C 20 "TheZenPsycho" | grep points
-9\n                    points
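For readers without html-xml-utils installed, the flattening idea behind that pipeline can be approximated in Python. This is only a sketch, not a reimplementation of hxpipe: the line prefixes (`(` for a start tag, `)` for an end tag, `A` for an attribute, `-` for text with newlines escaped as `\n`) are an assumption loosely modeled on the output shown above, and the `LineEvents` class, `flatten` function and sample markup are hypothetical names invented here.

```python
from html.parser import HTMLParser


class LineEvents(HTMLParser):
    """Turn markup into one event per line so line-oriented tools can filter it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Emit attributes first, then the start-tag line (assumed ordering).
        for name, value in attrs:
            self.lines.append(f"A{name} CDATA {value if value is not None else ''}")
        self.lines.append(f"({tag}")

    def handle_endtag(self, tag):
        self.lines.append(f"){tag}")

    def handle_data(self, data):
        # Escape backslashes and newlines so each text node stays on one line.
        escaped = data.replace("\\", "\\\\").replace("\n", "\\n")
        self.lines.append("-" + escaped)


def flatten(html):
    """Return the list of event lines for an HTML fragment."""
    parser = LineEvents()
    parser.feed(html)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    # Hypothetical fragment standing in for a fetched page.
    sample = '<td class="subtext"><span class="score">9\n points</span></td>'
    for line in flatten(sample):
        if "points" in line:  # the equivalent of `grep points`
            print(line)
```

Because every tag, attribute and text node ends up on its own line, a plain substring filter does the job that would otherwise need a tree walk, which is the same trade the shell pipeline makes.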