HEML - Human Editable Markup Language (yet another)


1  Presentation

1.1  Why HEML

XML files, while growing, can quickly become very difficult to edit using an average text editor or even a more advanced software development advanced editor. Graphical XML editors are very efficient to update a few attributes or elements but are not comfortable for long writing sessions.

Rich editors and modern word processing applications are very efficient to edit some well defined XML formats such as docbook and xhtml but are quite difficult to configure to any user format. Wiki formats (wkiktext, confluence, and others) are by designed efficient for on line ASCII editing then easy to handle with any text editor. But those formats are not as generic than XML and are not very helpful for anything else than document editing.

HEML design is inspired from wikitext but targets to be functionnaly equivalent to XML and allow editing of any kind of data, not only documents for human readers.

More than just a document format, combination of the include and table feature will help to migrate legacy data format (for now limited to ASCII format) to XML. Then the HEML processor performs as a revers XLS processor able to support easy translation from ASCII more or less delimited files to XML.

Because readable, it is as easy to track change, merge, than programming language source code, and software development tools can support HEML management, in particular build tools (make) and SCM (subversion, git, ...).

1.2  Yet another markup language (that is not YAML)

Main features are:

  • More concise, then less characters to type
  • As generic as XML: can express rich structured (hierarchical) information in a same manner.
  • Handles simple layouts (paragraph, indents and bullets) as wiki text.
  • Big documents are still readable directly from ASCII tools. Consider using your preferred syntax highlighter to gain more readability (a vim syntax file example is include in source HEML distribution).
  • 1.3  Drawbacks

    While processing, some layout information may be lost. Then, after automatic processing, it is not always possible to restore a document that is still optimised for human edition.

    In a same way, some layout and special features are lost when translating document to XML format and the exact reverse process can't be realized.

    Those two drawbacks make HEML a human editable Only markup language.

    2  Java implentation

  • Download.
  • API documentation.
  • Browse source code.
  • Issue tracking.
  • 2.1  Implementation status

  • HEML to XML command line filter.
  • SAX like callback API: use HEML as your native data format.
  • Full changelog is available from EGPI tasks and issues management.

    2.2  TODO

  • Improve error handling and reporting.
  • Special characters handling
  • Improve HEML writer (try to get closer to some reversible HEML/XML translation)
  • Android port including XSL processor to turn any android device to a rich notepad producing publishable structured data.
  • Advanced tables: add fixed size field without delimiter and binary file support to the table feature. This will turn HEML to a generic legacy data format to XML converter.
  • 3  More features, XML mapping and examples

    3.1  Elements and attributes

    HEML XML
    {root
    	{elem %attr=value}
    	{elem %attr=value %attr2=value2
    		{subelem text}
    	}
    }
    		
    <root>
    	<elem attr="value"/>
    	<elem attr="value" attr2="value2">
    		<subelem>text</subelem>
    	</elem>
    </root>
    		

    3.2  Paragraphs, indentations and bullets

    HEML XML
    {section %title=paragraph layout example
    paragraph
    paragraph
    - bullet
    - bullet
    	- sub bullet
    }
    		
    <section title="paragraph layout example">
    	<p>paragraph</p>
    	<p>paragraph</p>
    	<li>bullet</li>
    	<li>bullet</li>
    	<ul>
    		<li>bullet</li>
    	</ul>
    </section>
    		

    3.3  Parser commands

    Element's names starting with '?' character are handled as parser command. Currently supported commands are:

  • ?set : changes parser settings, possible related attributes are:
    • %encoding: specify which character encoding to use to parse the file.
    • %tab: defines the character count for one tab character while text computing indentation levels (see paragraph layout examples above).
  • ?include: includes (and parse) another HEML file in place of the include command. File to include is specified using the %src attribute with relative path or absolute URL.
  • ?table: includes or defines in-line tabular data (that is ASCII delimited data such as CSV format). More table examples are provided below.
  • 3.4  Includes

    HEML XML

    FileA.heml:

    {section %title=include example
    paragraph
    {?include %src=FileB.heml}
    paragraph
    }
    		

    FileB.heml:

    included paragraph
    included paragraph
    		
    <?xml version="1.0"?>		
    <section title="include example">
    	<p>paragraph</p>
    	<p>included paragraph</p>
    	<p>included paragraph</p>
    	<p>paragraph</p>
    </section>
    		

    3.5  Tables

    The HEML table feature let concise tabular data writing and also import legacy CSV files. Some attributes are available to configure the parser behavior to handle the table content:

  • %encoding: name of character encoding to use when reading imported table. This parameter must be be use in combination with %src attribute.
  • %fields: name of the fields container elements or attributes. When only one name is provided, all fields have the same name unless %style is set to "attr". Default field name is "td".
  • %fieldSep: list of characters to consider as field separator in the following table definition. Default separator character is "%".
  • %record: name of the record container elements. Default record name is "tr".
  • %recordSep: list of characters to consider as record separator in the following table definition. Default separator character is "\n" (carriage return).
  • %src: if set, its value is used as path or URL to specify file to import as table content. Default is undefined, then table content is expected.
  • %style: if value is "attr", fields are expanded as record element's attribute, other wise as sub elements. Default behaviour is to create sub elements.
  • %token: when set to "true", fields are extracted like StringTokenizer java class does. E.g. Adjacent separator characters are considered as a single one. Default value of this attribute is "false".
  • %trim: when "true", leading and ending blank characters (space, tabs, ...) are removed from extracted fields. Default value of this attribute is "true".
  • HEML XML
    {?table %record=tableRowName %fields=field1,field2,field3
    un		% un1		% un2
    deux	% deux1	%	% deux3
    trois	% trois1	% trois2 % trois3
    }		
    		
    <tableRowName>
    	<field1>un</field1>
    	<field2>un1</field2>
    	<field3>un2</field3>
    </tableRowName>
    <tableRowName>
    	<field1>deux</field1>
    	<field2>deux1</field2>
    	<field3/>
    	<f1>deux3</f1>
    </tableRowName>
    <tableRowName>
    	<field1>trois</field1>
    	<field2>trois1</field2>
    	<field3>trois2</field3>
    	<f1>trois3</f1>
    </tableRowName>
    		
    {?table %record=tableRowName %style=attr %fields=field1,field2,field3
    un		% un1		% un2
    deux	%% deux2		% deux3
    trois	% trois1	% trois2 % trois3
    quatre	% quatre1
    }
    		
    <tableRowName field1="un" field2="un1" field3="un2"/>
    <tableRowName field1="deux" field2="" field3="deux2">deux3</tableRowName>
    <tableRowName field1="trois" field2="trois1" field3="trois2">trois3</tableRowName>
    <tableRowName field1="quatre" field2="quatre1"/>		
    		
    {?table %fieldSep=" \t" %token=true
    a	b	c	d		e
    1	        2	3	4
    }
    		
    <tr>
    	<td>a</td>
    	<td>b</td>
    	<td>c</td>
    	<td>d</td>
    	<td>e</td>
    </tr>
    <tr>
    	<td>1</td>
    	<td>2</td>
    	<td>3</td>
    	<td>4</td>
    </tr>		
    		

    From one HEML file:

    {?table %src=test.csv %fieldSep=;}
    		

    Included csv file:

    1;2;3;4;5;
    ;;;;;
    ;;III;;V;
    a;b;c;d;e;f
    		
    <tr>
    	<td>1</td>
    	<td>2</td>
    	<td>3</td>
    	<td>4</td>
    	<td>5</td>
    </tr>
    <tr/>
    <tr>
    	<td/>
    	<td/>
    	<td>III</td>
    	<td/>
    	<td>V</td>
    </tr>
    <tr>
    	<td>a</td>
    	<td>b</td>
    	<td>c</td>
    	<td>d</td>
    	<td>e</td>
    	<td>f</td>
    </tr>		
    		

    3.6  Miscellaneous features

  • comments: text between {# and #} is ignored by the parser.
  • as-is copy: text between {! and !} is not interpreted and forwarded unchanged to the output inside au <pre> element.
  • escape character: any character following \ is forwarded to output as-is. Useful to include some of the HEML control char such as { into one element's text.
  • SAX-like callback API: the current java implementation can be used as a library and programmer can define its own HEML handler by implementing the ParserCallback interface.
  • See this page's HEML source file to see a complete use example.
  • syntax highlight: markup is simple then syntax highlighting configuration should be easy for common editors. A syntax highlight vim script is included in HEML parser source distribution as an example.
  • 4  Similar formats discussion

    4.1  Wiki markups

    Wiki markups are designed to allow easy in line edition from a simple text field. But the purpose is first to support documents writing. Trying to edit any kind of data like with XML is probably possible but leads to some kind of brainfuck like naming and coding convention above the standard markups to make any data processable.

    See Wiki markup wikipedia page to know more about the many wiki markup languages.

    4.2  YAML

    YAML's basic purpose is very similar to HEML's one: enabling easy manual data edition. But it's primary focus is data, and promote lists to handle data. May be I didn't realized what YAML is before I started coding HEML. I just found YAML was not what I searched and hope HEML is more convenient particularly for documents writing. At least it is the case for me!

    See also YAML home page.

    4.3  txt2tags

    Txt2tags can be view as wiki markup precursor and suffers the same limitation when used to edit data that are not readable documents.

    See also txt2tags home page.

    4.4  JSON

    JSON features are very close to YAML and like YAML seems more focused to complex data serialization support and not best suited to direct and manual document edition. It is directly related to javascript, even it is usable with any programming language thanks to the many libraries available.

    See also JSON home page.

    4.5  XML

    Of course, you can edit directly XML, HTML, XHTML... But it's quite boring because of its heavy markup (opening and ending tags, < and > for each, quotes for attributes). HEML is the same but requires fewer markups then less characters to type. Moreover, paragraph layout feature inspired from wiki markups and txt2tags keeps HEML document accessible to the human eye.

    Mon, 08 May 2017 13:01:12 +0200
    Donate: 1HjZSXcwk6eHfJsvUA2hd6nVowimJrjdwp