Docbook

Docbook is a typesetting and layout tool for authors. Specifically, it is an XML schema intended to produce both print and electronic copies of text. Combined with any one of the many powerful XML processors available, it achieves the goal of letting an author write once and publish to anything. It is widely used in technical documentation (such as the first few editions of Slackermedia itself), but also for works of fiction, academia, and more.

Strengths [Weaknesses]

Familiar

XML is vastly different from HTML, but the concept is very similar. If you are good with HTML, then XML will feel like the “pro” version of what you already know.

Strict

XML is rigid in what its processors accept, so there is an absolutism to how you structure your documents. This may help you organise information better, and it guarantees predictable output in the end. You won't spend time shifting indents and special meta-characters around in your text editor; you will spend time writing content within a well-structured framework.

Documented

XML is a long-standing format, and Docbook is a well-respected schema. Docbook is well documented at http://docbook.org, and XML is so well-known that you can take classes on the subject.

[Ex]Portable

Once your text is in XML format, it is structured and predictable. This probably means that if there is another format (html, epub, pdf, ps, plain text, rtf, odt, and so on) that you want to output to, you can convert to it from XML. There just isn't any ambiguity about XML, and there are heaps of post-processors.

Weaknesses [Strengths]

Complex

The process of creating well-formed XML is not simple. It is a very verbose format, it will fail at the smallest error, it enforces inheritance, and it requires some number of post-processors in order to get it out of the XML format.

Strict

Unlike markdown or HTML, XML is intolerant of any deviation from its defined schema. Something as simple as a missing closing tag will break the processor. There are tools, such as xmllint, to help ensure well-formed XML, but it is not uncommon to attempt at least three builds before a successful one.
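
As a minimal sketch of that checking step, the commands below write a deliberately broken file and a corrected one to a temporary directory, then run xmllint on each. The file names and sample content are invented for illustration. The --noout option suppresses the normal output so that only errors are printed, and the exit status reports the result.

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)

# broken.xml: the <para> element is never closed.
printf '%s\n' '<book><title>Test</title><chapter><title>One</title><para>Text</chapter></book>' > "$workdir/broken.xml"

# good.xml: the same content with the closing tag restored.
printf '%s\n' '<book><title>Test</title><chapter><title>One</title><para>Text</para></chapter></book>' > "$workdir/good.xml"

check() {
  # xmllint exits non-zero when the file is not well-formed.
  if xmllint --noout "$1"; then
    echo "$1: well-formed"
  else
    echo "$1: NOT well-formed"
  fi
}

if command -v xmllint >/dev/null 2>&1; then
  check "$workdir/broken.xml"
  check "$workdir/good.xml"
else
  echo "xmllint not found; it is part of libxml2"
fi
```

Adding --valid to the same command also checks the document against the DTD named in its DOCTYPE line, which catches tags that are well-formed but in the wrong place.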

Style

The look of documents output from Docbook is clean and professional, but to change the look and feel of your output, you probably need to learn XSL. XSL can be complex, especially if you have only just learnt XML and how to process it.

Install

Docbook is not an application, but a schema, meaning that it is nothing more than a set of rules that you follow whilst writing text in any plain text editor of your choice. If you have ever used HTML, it's a little like that; you don't install HTML, you just write it, and other programmes bear the burden of interpreting it and processing it into a form for public consumption.

The docbook schema, along with a number of XML tools, comes pre-installed on Slackware.

To find where your schemas are located, use the find command:

find / -iname "*docbook*dtd*"

This reveals that the schemas are located in /usr/share/xml/docbook/xml-dtd-X.Y (where X.Y is a version number).

Quickstart

The best quickstart guide to Docbook is a short work by David Rugge, Mark Galassi, and Eric Bischoff, located at http://xml.web.cern.ch/XML/goossens/dbatcern/dbatcern.html.

Here is a basic summary, featuring a severely limited set of functionality:

Docbook Header

The Docbook header is a line of text at the top of a Docbook file which identifies the file as being an XML document following the Docbook schema, and points to where the schema's rules are located on your computer (or a networked location, if you have confidence in your network environment).

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

Docbook can format articles or books; so your header should match what you intend to write:

  • If writing a book, the header should include: DOCTYPE book PUBLIC
  • If you are writing an article, then: DOCTYPE article PUBLIC

You may never commit the header to memory unless you type it daily, so keep it someplace handy.
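
One way to keep the header handy is to store it in a small template file and start each new document from a copy. The sketch below uses a temporary directory and made-up file names purely for illustration; in practice you would pick a permanent location. The DTD path shown is an assumption, so substitute whatever path the find command reported on your system.

```shell
# Hypothetical locations; in practice, keep the template somewhere permanent.
template_dir=$(mktemp -d)
header="$template_dir/docbook-header.xml"

# Save the two header lines once. The quoted 'EOF' keeps the text literal.
cat > "$header" <<'EOF'
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">
EOF

# Start a new book from the stored header.
cp "$header" "$template_dir/mybook.xml"

# For an article, swap the document type named in the DOCTYPE line.
sed 's/DOCTYPE book/DOCTYPE article/' "$header" > "$template_dir/myarticle.xml"
```

The content of each new file (the book or article tags and everything inside them) is then appended below the copied header.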

The structure of a Docbook file is inflexible. If you are writing an article, then the order of tags is defined by the article schema, and if you are writing a book then the order of tags is defined by the book schema. Any deviation from the schema rules will result in invalid XML and usually will be rejected by an XML processor.

The easiest way to learn what a particular schema demands is straight from the Docbook documentation, available online at http://www.docbook.org/tdgXY/en/html/docbook.html (where tdgXY is the version of Docbook that you are using).

A simple example of a Docbook file using the book schema:

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

<book>
  <title>
    Texto Ekzempla
  </title>

  <chapter>
    <title>
      Kiel Libera Programmaro Intigas Arton
    </title>

    <para>
      Elsxuti <ulink url="http://slackware.com">Linukso</ulink>, kaj provu gxin.
    </para>
  </chapter>
</book>

To some degree, it is intuitive as long as you know the tags that you have to work with, and the order of the basic skeleton structure. In the case of a book, the basic structure is:

  • book
  • book title
  • chapter
  • chapter title
  • paragraphs
  • (close chapter tag)
  • (close book tag)

This file, saved, is a valid Docbook file, but it is essentially source code.

Processing XML

The easiest XML processor to use is xmlto, an application run in the shell with the sole purpose of translating XML into any number of other formats.

For example, to convert the book into an html file:

$ mkdir html
$ xmlto html book.xml -o ./html 

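Or into plain text (the -o option names an output directory, not a file, so without it xmlto writes book.txt to the current directory):

$ xmlto txt book.xml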
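
The same invocation pattern covers other xmlto output formats. As a sketch, the loop below builds a single-page HTML file (the html-nochunks format) and a plain-text file from a tiny sample document, each into its own directory. The sample file, the directory names, and the use of --skip-validation (so that the sample needs no DOCTYPE line) are choices made for this illustration. The txt format also needs a text-mode browser such as lynx or w3m to be installed.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# A minimal stand-in for book.xml (no DOCTYPE, hence --skip-validation below).
cat > book.xml <<'EOF'
<book>
  <title>Sample</title>
  <chapter>
    <title>One</title>
    <para>Hello.</para>
  </chapter>
</book>
EOF

for fmt in html-nochunks txt; do
  # One output directory per format keeps the results apart.
  mkdir -p "out/$fmt"
  if command -v xmlto >/dev/null 2>&1; then
    xmlto -o "out/$fmt" --skip-validation "$fmt" book.xml \
      || echo "xmlto could not build $fmt"
  else
    echo "xmlto not found; skipping $fmt"
  fi
done
```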

There are several ways to generate a PDF, but the most reliable tends to be the Apache Foundation's fop tool. Fop is written in Java, so it does require that the Java runtime is installed.

Apache fop

Install Java (called jdk; the Java Development Kit) from either the /extra directory on your install media, or from http://slackbuilds.org/repository/X.Y/development/jdk/ (where X.Y is the version of Slackware that you are running). As Java is currently owned and maintained by Oracle, it requires a EULA agreement that can only be satisfied with a GUI, so you must manually download the Java installer, but you can then run the SlackBuild installer separately so that you have Java properly logged in /var/log/packages.

Once JDK is installed, install fop from http://slackbuilds.org/repository/14.1/office/fop/?search=fop.

To create a PDF, first translate your Docbook file to fo with xmlto, then render the fo file as a PDF with fop:

$ mkdir ./pdf
$ xmlto fo book.xml -o ./pdf
$ fop -fo ./pdf/book.fo -pdf ./pdf/book.pdf

Wait for the document to process, and at the end you have, in the pdf directory that you created, a PDF file. Even the hyperlink is fully-functional, just like in a “real” PDF (because it is a real PDF, fully compliant with the spec).

Advanced Techniques

In large works, there is an advantage to keeping your writing modular. Structuring your work such that each chapter is an individual file allows your text editors to load and work on them faster, and makes rearranging the order of the chapters trivial.

If you choose to work modularly, you only need your Docbook header in the first file, and you should only close your <book> tag in the final file. For example:

  • 00.xml: docbook header + <book><title></title>
  • 01.xml: <chapter><title></title><para></para><para>..</para></chapter> (and so on)
  • 02.xml: <chapter><title></title><para></para><para>..</para></chapter> (and so on)
  • end.xml: </book>

The file end.xml can literally have nothing but one tag in it: </book>. If you have a colophon or appendix, you can place it in your end matter, before that closing tag. The point is to wrap the modular files in two extremities (00.xml and end.xml, for example) to ensure that the outermost Docbook tags are included once and only once.
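
The layout described above can be sketched as a handful of shell commands. Everything here is illustrative: the temporary project directory, the placeholder titles, and the DTD path in the header are assumptions, not fixed names.

```shell
# Create a hypothetical project layout in a temporary directory.
proj=$(mktemp -d)
mkdir -p "$proj/xml"
cd "$proj/xml" || exit 1

# 00.xml: the header, the opening <book> tag, and the book title.
cat > 00.xml <<'EOF'
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">
<book>
  <title>Example Book</title>
EOF

# 01.xml and 02.xml: one complete chapter each, with no header.
for n in 01 02; do
  cat > "$n.xml" <<EOF
  <chapter>
    <title>Chapter $n</title>
    <para>Text of chapter $n.</para>
  </chapter>
EOF
done

# end.xml: nothing but the closing tag.
echo '</book>' > end.xml

ls -1
```

None of these files is a complete XML document on its own, so xmllint and xmlto will only accept them once they have been joined together.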

Concatenating and Building

Once you have all of your files ready, there is just one additional step to what you already know: you must concatenate all of your files into a temporary master file.

Assuming you have all of your files in a directory called xml:

$ cd ~/mybook/xml
$ cat 00.xml 01.xml 02.xml end.xml > tmp.xml
$ mkdir pdf
$ xmlto fo tmp.xml -o ./pdf
$ fop -fo ./pdf/tmp.fo -pdf ./pdf/mybook.pdf
$ trash tmp.xml
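
Listing every file by hand becomes error-prone as the number of chapters grows. A shell glob can select the numbered files instead. The sketch below builds a few throwaway sample files (their names and contents are invented for the example) and joins them with a glob, then runs an optional well-formedness check if xmllint is present.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-ins for the real opening file, chapter files, and closing file.
printf '<book><title>T</title>\n' > 00.xml
printf '<chapter><title>A</title><para>a</para></chapter>\n' > 01.xml
printf '<chapter><title>B</title><para>b</para></chapter>\n' > 02.xml
printf '</book>\n' > end.xml

# ??.xml matches exactly two characters before .xml, so it picks up
# 00.xml, 01.xml and 02.xml in sorted order, but not end.xml or tmp.xml.
cat ??.xml end.xml > tmp.xml

# Optional sanity check before building, if xmllint is available.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout tmp.xml && echo "tmp.xml is well-formed"
fi
```

This relies on the two-digit naming scheme: a file called 10.xml sorts correctly after 09.xml, whereas a file called 3.xml would not be matched at all.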

Makefiles

The entire build process can be automated with a Makefile, a kind of script for GNU make.

A Makefile has a specific syntax: a keyword, a colon, any required files, and then an instruction block.

This becomes executable by typing make keyword.

A specific example:

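# Makefile by myName

# html and pdf are command names, not files to build. Declaring them
# phony makes the recipes run even when directories of the same name exist.
.PHONY: html pdf

html:
	mkdir -p html
	cat xml/??.xml xml/end.xml > tmp.xml
	xmlto html tmp.xml -o ./html

pdf:
	mkdir -p pdf
	cat xml/??.xml xml/end.xml > tmp.xml
	xmlto fo tmp.xml -o ./pdf
	fop -fo ./pdf/tmp.fo -pdf ./pdf/mybook.pdf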

Each block, somewhat translated:

[keyword]: [files that must exist or be up to date before proceeding; may be empty]
[tab] The command to run.
[tab] Another command to run.

To use it, run make from the directory where your makefile exists:

$ cd mybook
$ ls -1 
xml
Makefile
$ mkdir html
$ make html

By scripting with make, the build process is simpler than building manually, and does not require you to remember commands or syntax.
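
Before running a target for real, it can be useful to see what make intends to do. GNU make's -n option prints the commands of a target without executing them. The sketch below writes a one-target Makefile into a temporary directory (the directory and the target contents are made up for the example) and asks make for a dry run.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# A one-target Makefile; printf emits the literal tab that make requires
# at the start of every command line.
printf 'html:\n\tcat ??.xml end.xml > tmp.xml\n\txmlto html tmp.xml -o ./html\n' > Makefile

# -n (dry run) prints the command lines instead of running them.
if command -v make >/dev/null 2>&1; then
  plan=$(make -n html)
  echo "$plan"
else
  echo "make not found"
fi
```

Because nothing is executed, this works even when the XML files or xmlto itself are missing, which makes it a cheap way to check a Makefile for typos.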
