
Docbook

Docbook is a typesetting and layout tool for authors. Specifically, it is an XML schema intended to produce both print and electronic copies of text. Combined with any one of the many powerful XML processors available, it achieves the goal of letting an author write once and publish to anything. It is widely used in technical documentation (such as the first few editions of Slackermedia itself), but also for works of fiction, academia, and more.

Strengths [Weaknesses]

Familiar

XML is vastly different from HTML, but the concept is very similar. If you are good with HTML, then XML will feel like the “pro” version of what you already know.

Strict

XML is rigid in what its processors accept, so there is an absolutism to how you structure your documents. This may help you organise information better, and it guarantees predictable output in the end. You won't spend time shifting indents and special meta-characters around in your text editor; you will spend time writing content within a well-structured framework.

Documented

XML is a long-standing format, and Docbook is a well-respected schema. Docbook is well documented on http://docbook.org and XML is so well-known that you can take classes on the subject.

[Ex]Portable

Once your text is in XML format, it is structured and predictable. This probably means that if there is another format (html, epub, pdf, ps, plain text, rtf, odt, and so on) that you want to output to, you can convert to it from XML. There just isn't any ambiguity in XML, and there are heaps of post-processors for it.

Weaknesses [Strengths]

Complex

The process of creating well-formed XML is not simple. It is a very verbose format, it will fail at the smallest error, it enforces inheritance, and it requires some number of post-processors in order to get it out of the XML format.

Strict

Unlike markdown or HTML, XML is intolerant of any deviation from its defined schema. Something as simple as a missing closing tag will break the processor. There are tools, such as xmllint, to help ensure well-formed XML, but it is not uncommon to attempt at least three builds before a successful one.
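
One way to catch errors before a full build is to run xmllint on the file first. The following is a minimal sketch, not a canonical workflow: check_docbook is a hypothetical helper name, and the return codes are a convention invented for this example. It relies only on xmllint's --noout and --valid options.

```shell
# check_docbook FILE  (hypothetical helper, not part of any package)
# Returns 0 if FILE passes (or if xmllint is not installed),
# 1 if FILE is not well-formed,
# 2 if FILE is well-formed but fails validation against its DTD.
check_docbook() {
    file=$1
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found; skipping check of $file" >&2
        return 0
    fi
    # --noout suppresses the parsed document, so only errors are printed.
    xmllint --noout "$file" || return 1
    # --valid also validates against the DTD named in the DOCTYPE line.
    xmllint --noout --valid "$file" || return 2
    return 0
}
```

A missing closing tag shows up at the first step; a tag in the wrong place (for example, a paragraph directly inside a book) shows up at the second.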

Style

The look of documents output from Docbook is clean and professional, but to change the look and feel of your output, you probably need to learn XSL. XSL can be complex, especially if you have only just learnt XML and how to process it.

Install

Docbook is not an application, but a schema, meaning that it is nothing more than a set of rules that you follow whilst writing text in any plain text editor of your choice. If you have ever used HTML, it's a little like that; you don't install HTML, you just write it, and other programmes bear the burden of interpreting it and processing it into a form for public consumption.

The docbook schema, along with a number of XML tools, comes pre-installed on Slackware.

To find where your schemas are located, use the find command:

find / -iname "*docbook*dtd*"

This reveals that the schemas are located in /usr/share/xml/docbook/xml-dtd-X.Y (where X.Y is a version number).

Quickstart

The best quickstart guide to Docbook is a short work by David Rugge, Mark Galassi, and Eric Bischoff and located at http://xml.web.cern.ch/XML/goossens/dbatcern/dbatcern.html.

Here is a basic summary, featuring a severely limited set of functionality:

Docbook Header

The Docbook header is a line of text at the top of a Docbook file which identifies the file as being an XML document following the Docbook schema, and points to where the schema's rules are located on your computer (or a networked location, if you have confidence in your network environment).

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

Docbook can format articles or books; so your header should match what you intend to write:

  • If writing a book, the header should include: DOCTYPE book PUBLIC
  • If you are writing an article, then: DOCTYPE article PUBLIC

You may never commit the header to memory unless you type it daily, so keep it someplace handy.
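
One way to keep the header handy is a small shell function that stamps it into a new file. This is only a sketch: docbook_skeleton is a hypothetical name, the placeholder titles are invented, and the DTD path assumes the 4.5 schema lives in the location that the find command reported above.

```shell
# docbook_skeleton KIND FILE  (hypothetical helper)
# Writes a minimal Docbook 4.5 skeleton of KIND (book or article) to FILE.
docbook_skeleton() {
    kind=$1
    out=$2
    case $kind in
        book|article) ;;
        *) echo "usage: docbook_skeleton book|article FILE" >&2; return 1 ;;
    esac
    # Assumed DTD location; change it to whatever find reports on your system.
    dtd=/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd
    {
        printf "<?xml version='1.0' encoding='utf-8' ?>\n"
        printf '<!DOCTYPE %s PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "%s">\n' "$kind" "$dtd"
        printf '<%s>\n  <title>Untitled</title>\n' "$kind"
        if [ "$kind" = book ]; then
            # A book holds chapters; paragraphs go inside a chapter.
            printf '  <chapter>\n    <title>Chapter One</title>\n    <para>Text.</para>\n  </chapter>\n'
        else
            # An article can hold paragraphs directly.
            printf '  <para>Text.</para>\n'
        fi
        printf '</%s>\n' "$kind"
    } > "$out"
}
```

For example, docbook_skeleton article notes.xml starts a new article, and docbook_skeleton book book.xml starts a new book.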

The structure of a Docbook file is inflexible. If you are writing an article, then the order of tags is defined by the article schema, and if you are writing a book then the order of tags is defined by the book schema. Any deviation from the schema rules will result in invalid XML and usually will be rejected by an XML processor.

The easiest way to learn what a particular schema demands is straight from the Docbook documentation, available online at http://www.docbook.org/tdgXY/en/html/docbook.html (where tdgXY is the version of Docbook that you are using).

A simple example of a Docbook file using the book schema:

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

<book>
  <title>
    Texto Ekzempla
  </title>

  <chapter>
    <title>
      Kiel Libera Programmaro Intigas Arton
    </title>

    <para>
      Elsxuti <ulink url="http://slackware.com">Linukso</ulink>, kaj provu gxin.
    </para>
  </chapter>
</book>

To some degree, it is intuitive as long as you know the tags that you have to work with, and the order of the basic skeleton structure. In the case of a book, the basic structure is:

  • book
  • book title
  • chapter
  • chapter title
  • paragraphs
  • (close chapter tag)
  • (close book tag)

This file, saved, is a valid Docbook file, but it is essentially source code.

Processing XML

The easiest XML processor to use is xmlto, an application run in the shell with the sole purpose of translating XML into any number of other formats.

For example, to convert the book into an html file:

$ mkdir html
$ xmlto html book.xml -o ./html 

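Or into plain text (xmlto names the output after the input file, so this writes book.txt to the current directory; the -o option expects a directory, not a file name):

$ xmlto txt book.xml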

There are several ways to generate a PDF, but the most reliable tends to be the Apache Foundation's fop tool. Fop is written in Java, so it does require that the Java runtime is installed.

Apache fop

Install Java (packaged as jdk, the Java Development Kit) from either the /extra directory on your install media or from http://slackbuilds.org/repository/X.Y/development/jdk/ (where X.Y is the version of Slackware that you are running). Java is currently owned and maintained by Oracle, and its EULA can only be accepted through a GUI, so you must download the Java installer manually. You can then run the SlackBuild separately so that Java is properly logged in /var/log/packages.

Once JDK is installed, install fop from http://slackbuilds.org/repository/14.1/office/fop/?search=fop.
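
Before building, it can save a failed run to confirm that everything is actually on your PATH. A minimal sketch, where missing_tools is a hypothetical helper name:

```shell
# missing_tools NAME...  (hypothetical helper)
# Prints each NAME that is not found on PATH; returns 1 if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}
```

Running missing_tools xmlto java fop prints nothing when all three are installed, and otherwise lists whichever ones still need installing.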

To create a PDF, first translate your Docbook file to fo with xmlto:

$ mkdir ./pdf
$ xmlto fo book.xml -o ./pdf
$ fop ./pdf/book.fo ./pdf/book.pdf

Wait for the document to process, and at the end you have, in the pdf directory that you created, a PDF file. Even the hyperlink is fully-functional, just like in a “real” PDF (because it is a real PDF, fully compliant with the spec).

Advanced Techniques

In large works, there is an advantage to keeping your writing modular. Structuring your work such that each chapter is an individual file allows your text editors to load and work on them faster, and makes rearranging the order of the chapters trivial.

If you choose to work modularly, you only need your Docbook header in the first file, and you should only close your <book> tag in the final file.
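
To make the split concrete, here is a sketch of one possible layout, built in a scratch directory. The file names follow the pattern used by the Makefile on this page (docbook.header, *.docbook.xml, credits), but what each file contains is an assumption made for illustration, as are the numbered chapter names and placeholder titles.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First file: XML declaration, DOCTYPE, and the opening <book> tag.
cat > docbook.header <<'EOF'
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">
<book>
  <title>Example Book</title>
EOF

# Middle files: one complete <chapter> each, with no header of their own.
cat > 01.docbook.xml <<'EOF'
  <chapter>
    <title>First Chapter</title>
    <para>Text of the first chapter.</para>
  </chapter>
EOF

cat > 02.docbook.xml <<'EOF'
  <chapter>
    <title>Second Chapter</title>
    <para>Text of the second chapter.</para>
  </chapter>
EOF

# Final file: a closing appendix, then the tag that closes the book.
cat > credits <<'EOF'
  <appendix>
    <title>Credits</title>
    <para>Thanks to everyone.</para>
  </appendix>
</book>
EOF

# The shell expands *.docbook.xml in sorted order, so the numeric
# prefixes decide the chapter order in the assembled file.
cat docbook.header *.docbook.xml credits > tmp.xml
```

None of the pieces is well-formed XML on its own; only the assembled tmp.xml is. Rearranging chapters is then just a matter of renaming files.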

The Makefile

This section is going to assume that you have compiled code from source before. If you have never done this, you should go learn how to do that and then return to this section. I think I have an episode of my podcast the “GNU World Order” on the subject. So, do what you have to do; learn it, install GCC or binutils or whatever it's called on your distro. If you're using Slackware or FreeBSD, you already have that stuff installed.

So, Makefiles are basically little scripts for GNU Make. They have a specific syntax, and are infinitely flexible, but we're going to keep it simple here because, well, that's all we need, plus I'm a Makefile noob.

The Makefile syntax is: some term → colon → instruction set

…which then becomes executable by typing make (term).

So create a text file in your editor and call it Makefile (capitalization counts) and try this (each indented line must start with a tab character, not spaces):

p------------------------------------------------------------q
| # Makefile by myName                                       |
|                                                            |
| html: docbook.header *.docbook.xml                         |
|       cat docbook.header *.docbook.xml credits > tmp.xml   |
|       xmlto html tmp.xml -o ./html                         |
|                                                            |
b----------------------------/fig. 11. your very own Makefile!

So the line that starts with html is the target line, meaning that when you type make html, GNU Make looks for the files listed after the colon; if they are not present, it stops with an error (that is, it gets borked). Assuming everything's good, GNU Make continues and runs the indented lines in order: first the cat line that generates tmp.xml, and then the xmlto command.

Try it:

p-----------------------q
|  bash$ make html      |
|                       |
b---------------/fig. 12!

And watch in amazement as your html files are generated with that one simple step. This is helpful largely because in real life you'll be making your html files a lot, as you find little layout errors here, or you update your book there, and so on.

You can do the same for your pdf generations:

p-------------------------------------------------------------q
| # pretend like the rest of the Makefile is right here       |
|                                                             |
| pdf: docbook.header *.docbook.xml                           |
|      cat docbook.header *.docbook.xml credits > tmp.xml     |
|      xmlto fo tmp.xml -o ./pdf                              |
|      fop ./pdf/tmp.fo ./pdf/docbook.pdf                     |
|                                                             |
b-------------------------------/fig. 13. More of the Makefile!

Same deal.

So, since it is kind of probable that you'll be running make a lot, the chances of you generating lots of little tmp.xml files and html files and stuff like that are high. It's quite helpful, and a feature of GNU Make, to be able to clean all that cruft out. This way you can always get back to your base state and feel confident that your make isn't failing because of some old file lying around.

This is usually done with “make clean” but I also like to implement a “make tidy”, where “tidy” will remove the little intermediary files like the tmp.fo and tmp.xml, and “clean” removes those PLUS the big main files like the html and pdf files and my custom .header file. I don't know that there is a canonical way to do this but here's what I do:

p-------------------------------------------------------------q
| # pretend like the rest of the Makefile is right here       |
|                                                             |
| tidy:                                                       |
|      -rm -f ./pdf/*.fo tmp.xml *.out                        |
|                                                             |
| clean:                                                      |
|      -rm -f ./pdf/*.fo ./pdf/*.pdf tmp.xml *.out            |
|      -rm -f *.header                                        |
|      -rm -f html/*.html                                     |
|                                                             |
b----------------------------------/fig. 14 Makefile additions!

Closing Thoughts