HTMLDOC


HTMLDOC is a previously commercially developed open-source program that converts HTML and Markdown web pages and files to EPUB, indexed HTML, PostScript, and PDF files, complete with a table of contents. HTMLDOC can be used from the command line, a simple GUI, or from a web server. Development originally occurred through the author's now-defunct business, Easy Software Products, and now continues on the author's personal web site.

Features and limitations

HTMLDOC 1.9 supports most of HTML 3.2 with some elements of HTML 4.01, it has limited support for Unicode and no support for CSS and PDF forms.
HTMLDOC 1.9 supports the following character sets: Windows-874, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-14, ISO-8859-15, KOI8-R; you cannot mix characters from different code pages. There is no support for CJK and Arabic characters, and support for ISO-8859-13 is missing. Support for UTF-8 is limited mainly to Western, Latin-alphabet-based, left-to-right-written languages. HTMLDOC 1.9 uses several proprietary processing instructions for formatting the pdf output, these use the syntax of the HTML comments.
There are no plans for introducing the CSS support or broader Unicode support.

License and availability

Licensed under the terms of the GNU General Public License version 2. It is legal to compile the sources and distribute the program, and various versions can be found on the Internet. For example, HTMLDOC is included as part of the Debian operating systems.