Docbook

Docbook is a typesetting and layout tool for authors. Specifically, it is an XML schema intended to produce both print and electronic copies of text. Combined with any one of the many powerful XML processors available, it achieves the goal of letting an author write once and publish to anything. It is widely used in technical documentation (such as the first few editions of Slackermedia itself), but also for works of fiction, academia, and more.

Strengths [Weaknesses]

Familiar

XML is vastly different from HTML, but the concept is very similar. If you are good with HTML, then XML will feel like the “pro” version of what you already know.

Strict

XML is rigid in what its processors accept, so there is an absolutism to how you structure your documents. This may help you organise information better, and it guarantees predictable output in the end. You won't spend time shifting indents and special meta-characters around in your text editor; you will spend time writing content within a well-structured framework.

Documented

XML is a long-standing format, and Docbook is a well-respected schema. Docbook is well documented at http://docbook.org, and XML is so well-known that you can take classes on the subject.

[Ex]Portable

Once your text is in XML format, it is structured and predictable. This probably means that if there is another format (html, epub, pdf, ps, plain text, rtf, odt, and so on) that you want to output to, you can convert to it from XML. There just isn't any ambiguity about XML, and there are heaps of post-processors.

Weaknesses [Strengths]

Complex

The process of creating well-formed XML is not simple. It is a very verbose format, it will fail at the smallest error, it enforces inheritance, and it requires some number of post-processors in order to get it out of the XML format.

Strict

Unlike markdown or HTML, XML is intolerant of any deviation from its defined schema. Something as simple as a missing closing tag will break the processor. There are tools, such as xmllint, to help ensure well-formed XML, but it is not uncommon to attempt at least three builds before a successful one.
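A quick well-formedness check before each build can catch the missing-tag case early. The following is a minimal sketch, assuming xmllint (part of libxml2) is installed; the scratch directory and the file name broken.xml are hypothetical.

```shell
# Hypothetical scratch directory for the demonstration.
lintdir=$(mktemp -d)

# Write a deliberately broken file: the <para> element is never closed.
cat > "$lintdir/broken.xml" <<'EOF'
<?xml version='1.0' encoding='utf-8' ?>
<article>
  <title>Example</title>
  <para>This paragraph is missing its closing tag.
</article>
EOF

# --noout suppresses the parsed document, so only error messages are printed.
if command -v xmllint >/dev/null 2>&1; then
  if xmllint --noout "$lintdir/broken.xml"; then
    lint_result="well-formed"
  else
    lint_result="not well-formed"
  fi
else
  lint_result="xmllint not found"
fi
echo "$lint_result"
```

The error message names the line where the parser gave up, which is usually close to the real mistake.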

Style

The look of documents output from Docbook is clean and professional, but to change the look and feel of your output, you probably need to learn XSL. XSL can be complex, especially if you have only just learnt XML and how to process it.

Install

Docbook is not an application, but a schema, meaning that it is nothing more than a set of rules that you follow whilst writing text in any plain text editor of your choice. If you have ever used HTML, it's a little like that; you don't install HTML, you just write it, and other programmes bear the burden of interpreting it and processing it into a form for public consumption.

The docbook schema, along with a number of XML tools, comes pre-installed on Slackware.

To find where your schemas are located, use the find command:

find / -iname "*docbook*dtd*"

This reveals that the schemas are located in /usr/share/xml/docbook/xml-dtd-X.Y (where X.Y is a version number).

Quickstart

The best quickstart guide to Docbook is a short work by David Rugge, Mark Galassi, and Eric Bischoff, located at http://xml.web.cern.ch/XML/goossens/dbatcern/dbatcern.html.

Here is a basic summary, featuring a severely limited set of functionality:

Docbook Header

The Docbook header is a line of text at the top of a Docbook file which identifies the file as being an XML document following the Docbook schema, and points to where the schema's rules are located on your computer (or a networked location, if you have confidence in your network environment).

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

Docbook can format articles or books, so your header should match what you intend to write:

  • If writing a book, the header should include: DOCTYPE book PUBLIC
  • If you are writing an article, then: DOCTYPE article PUBLIC
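For comparison with the book header, here is a minimal sketch of the article variant, written out from the shell so it can be pasted and tried. The scratch directory, the file name article.xml, and the DTD path are assumptions; point the path at wherever find located your schema.

```shell
# Hypothetical scratch location; any directory works.
artdir=$(mktemp -d)

# Same header as for a book, but with "article" as the root element.
# The public identifier is spelled "DocBook" (capital B), as registered by OASIS.
# The DTD path is an assumption based on the Slackware layout.
cat > "$artdir/article.xml" <<'EOF'
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

<article>
  <title>Restricting Less</title>

  <para>
    Download it and try it
  </para>
</article>
EOF

# Optional well-formedness check; skipped if xmllint is not installed.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout "$artdir/article.xml" && echo "article.xml is well-formed"
fi
```

Note that the root element (article) has to match the name given in the DOCTYPE line.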

You may never commit the header to memory unless you type it daily, so keep it someplace handy.

The structure of a Docbook file is inflexible. If you are writing an article, then the order of tags is defined by the article schema, and if you are writing a book then the order of tags is defined by the book schema. Any deviation from the schema rules will result in invalid XML and usually will be rejected by an XML processor.

The easiest way to learn what a particular schema demands is straight from the Docbook documentation, available online at http://www.docbook.org/tdgXY/en/html/docbook.html (where tdgXY is the version of Docbook that you are using).

A simple example of a Docbook file using the book schema:

<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">

<book>
  <title>
    Restricting Less
  </title>

  <chapter id="one">
    <title>How Open Source Excludes No One</title>

    <para>
      Download it and try it
    </para>
  </chapter>
</book>

And now to create a directory for the files, and process tmp.xml with xmlto:

p---------------------------------------------q 
|  bash$ mkdir ./html                         |
|  bash$ xmlto html tmp.xml -o ./html         |
|                                             |
b----------------------/fig 8. xmlto in action!

Obviously the syntax of xmlto is…

  • xmlto - the command
  • html - the type of output we want
  • -o - the flag to tell xmlto where to dump the output files

And now if you navigate into the html folder you'll find a BUNCH of html files, and if you launch konqueror or some other web browser to that folder, then you'll see it lookin' all pretty and really nicely laid out and stuff.

For a pdf, the first and second steps are essentially the same; if you already have a concatenated tmp.xml then you can skip that step, and the second is similar:

p---------------------------------------------q 
|  bash$ mkdir ./pdf                          |
|  bash$ xmlto fo tmp.xml -o ./pdf            |
|                                             |
b----------------/fig 9. xmlto in action again!

WTF is an fo file? It's XSL-FO (XSL Formatting Objects), the intermediate step between raw unadulterated XML and a fancy hot-link-clickable PDF. The xmlto command dumps out a tmp.fo in your ./pdf directory.

To get the tmp.fo into pdf, we use Apache's fop:

p-----------------------------------------------------q 
|  bash$ fop ./pdf/tmp.fo ./pdf/myBook_by_myName.pdf  |
|                                                     |
b-----------------------------------------/fig 10. fop!

And now in your ./pdf directory you have a really really cool pdf with a table of contents that is clickable, and text that can be copied and pasted, and all that good stuff, just like the pros. Except, in our case, we didn't have to sell our souls to the evil that is Ad0be :^)
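The two PDF steps can be chained so that fop never runs on a stale or missing fo file. This is only a sketch: build_pdf is a hypothetical helper name, and it assumes xmlto and fop are both on your PATH. The -o option is placed before the format here, following the synopsis in the xmlto man page.

```shell
# build_pdf SOURCE.xml OUTPUT.pdf
# Hypothetical helper: runs the xmlto and fop steps, stopping at the first failure.
build_pdf() {
  mkdir -p ./pdf || return 1
  # Step 1: XML to fo. The result is named after the source file,
  # so tmp.xml becomes ./pdf/tmp.fo.
  xmlto -o ./pdf fo "$1" || return 1
  # Step 2: fo to PDF.
  fop "./pdf/$(basename "$1" .xml).fo" "$2"
}

# Only run when both tools and the source file are actually present.
if command -v xmlto >/dev/null 2>&1 && command -v fop >/dev/null 2>&1 && [ -f tmp.xml ]; then
  build_pdf tmp.xml ./pdf/myBook_by_myName.pdf
fi
```

If xmlto fails, the function returns straight away and you get one error to read instead of two.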

So, that's it, you're done. Oh, well, unless you want to take it to the next level. I mean, if you think you can handle it. Well, take a moment, think it over, and if you want this to be a really lean-and-mean docbook-wielding machine, gather your party and venture forth:

> The Makefile <--

This section is going to assume that you have compiled code from source before. If you have never done this, you should go learn how to do that and then return to this section. I think I have an episode of my podcast the “GNU World Order” on the subject. So, do what you have to do; learn it, install GCC or binutils or whatever it's called on your distro. If you're using Slackware or FreeBSD, you already have that stuff installed.

So, Makefiles are basically little scripts for GNU Make. They have a specific syntax, and are infinitely flexible, but we're going to keep it simple here because, well, that's all we need, plus I'm a Makefile noob.

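The Makefile syntax is: some term (the target) → colon → the files it depends on, then on the following lines an instruction set, each line indented with a tab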

…which then becomes executable by typing make (term).
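To see that syntax in isolation, here is a throwaway sketch, assuming make is installed. The target name hello and the scratch directory are made up for the demonstration; printf is used so that the tab at the start of the instruction line is unmistakable.

```shell
# Hypothetical scratch directory holding a one-rule Makefile.
rule_demo=$(mktemp -d)

# Target "hello", no prerequisites, one instruction line.
# \t is the literal tab that every instruction line must start with.
printf 'hello:\n\t@echo "Hello from make" > hello.txt\n' > "$rule_demo/Makefile"

if command -v make >/dev/null 2>&1; then
  # -C runs make inside the demo directory; "hello" is the term after make.
  make -C "$rule_demo" hello
  cat "$rule_demo/hello.txt"
else
  echo "make not found"
fi
```

If you indent the instruction line with spaces instead of a tab, GNU Make refuses the file with a "missing separator" error.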

So create a text file in your editor and call it Makefile (capitalization counts) and try this:

p------------------------------------------------------------q
| # Makefile by myName                                       |
|                                                            |
| html: docbook.header *.docbook.xml                         |
|       cat docbook.header *.docbook.xml credits > tmp.xml   |
|       xmlto html tmp.xml -o ./html                         |
|                                                            |
b----------------------------/fig. 11. your very own Makefile!

So the line that starts with html is the target line, meaning that when you type make html, GNU Make looks for the files listed after the colon; if they are not present, it stops with an error and exit status 2 (that is, it gets borked). Assuming everything's good, GNU Make continues and processes the next lines: the cat line that generates tmp.xml, and then the xmlto command.
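The missing-file behaviour is easy to try for yourself. This is a sketch under assumptions: it uses a cut-down, one-line version of the html rule in a scratch directory, and the exit status quoted in the comment is what GNU Make reports (other make implementations may differ).

```shell
# Hypothetical scratch directory with a cut-down html rule.
prereq_demo=$(mktemp -d)
printf 'html: docbook.header\n\tcat docbook.header > tmp.xml\n' > "$prereq_demo/Makefile"

missing_status="make not found"
present_status="make not found"
if command -v make >/dev/null 2>&1; then
  # First run: docbook.header does not exist and no rule can create it,
  # so make stops before running any instructions (GNU Make exits with 2).
  if make -C "$prereq_demo" html; then missing_status=0; else missing_status=$?; fi

  # Second run: create the file, then try again.
  echo "<?xml version='1.0' encoding='utf-8' ?>" > "$prereq_demo/docbook.header"
  if make -C "$prereq_demo" html; then present_status=0; else present_status=$?; fi
fi
echo "without docbook.header: $missing_status"
echo "with docbook.header: $present_status"
```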

Try it:

p-----------------------q
|  bash$ make html      |
|                       |
b---------------/fig. 12!

And watch in amazement as your html files are generated with that one simple step. This is helpful largely because in real life you'll be making your html files a lot, as you find little layout errors here, or you update your book there, and so on.

You can do the same for your pdf generations:

p-------------------------------------------------------------q
| # pretend like the rest of the Makefile is right here       |
|                                                             |
| pdf: docbook.header *.docbook.xml                           |
|      cat docbook.header *.docbook.xml > tmp.xml             |
|      xmlto fo tmp.xml -o ./pdf                              |
|      fop ./pdf/tmp.fo ./pdf/docbook.pdf                     |
|                                                             |
b-------------------------------/fig. 13. More of the Makefile!

Same deal.

So, since it is kind of probable that you'll be running make a lot, the chances of you generating lots of little tmp.xml files and html files and stuff like that are great. It's quite helpful, and a feature of GNU Make, to be able to clean all that cruft out. This way you can always get back to your base state and feel confident that your make isn't failing because of some old file lying around.

This is usually done with “make clean” but I also like to implement a “make tidy”, where “tidy” will remove the little intermediary files like the tmp.fo and tmp.xml, and “clean” removes those PLUS the big main files like the html and pdf files and my custom .header file. I don't know that there is a canonical way to do this but here's what I do:

p-------------------------------------------------------------q
| # pretend like the rest of the Makefile is right here       |
|                                                             |
| tidy:                                                       |
|      -rm -f ./pdf/*.fo tmp.xml *.out                        |
|                                                             |
| clean:                                                      |
|      -rm -f ./pdf/*.fo ./pdf/*.pdf tmp.xml *.out            |
|      -rm -f *.header                                        |
|      -rm -f html/*.html                                     |
|                                                             |
b----------------------------------/fig. 14 Makefile additions!
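One common convention, offered here as a sketch rather than as the one true way, is to declare such targets with .PHONY so that a stray file called clean or tidy can never convince make there is nothing to do. The scratch directory and dummy files are made up for the demonstration, and having clean depend on tidy is my own shortcut to avoid repeating the rm line.

```shell
# Hypothetical scratch project with some generated files to remove.
clean_demo=$(mktemp -d)
mkdir -p "$clean_demo/pdf" "$clean_demo/html"
touch "$clean_demo/tmp.xml" "$clean_demo/pdf/tmp.fo" \
      "$clean_demo/pdf/docbook.pdf" "$clean_demo/html/index.html"

# A stray file named "clean", which .PHONY tells make to ignore.
touch "$clean_demo/clean"

{
  printf '.PHONY: tidy clean\n\n'
  printf 'tidy:\n'
  printf '\t-rm -f ./pdf/*.fo tmp.xml\n\n'
  # clean runs tidy first, then removes the big output files.
  printf 'clean: tidy\n'
  printf '\t-rm -f ./pdf/*.pdf html/*.html\n'
} > "$clean_demo/Makefile"

if command -v make >/dev/null 2>&1; then
  make -C "$clean_demo" clean
else
  echo "make not found"
fi
ls -R "$clean_demo"
```

The same declaration is worth considering for html and pdf, since those targets share their names with the output directories.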

> Closing Thoughts <--