Wednesday, July 23, 2014

How to Export a Github Wiki to PDFs and/or a Website

I recently needed to forward some documentation for work. The documentation had wisely been written in a GitHub wiki. Unfortunately it was on an enterprise account, so we couldn't just give the recipients a link. I just needed to put the important sections into a PDF or website and send it along.

Unfortunately I couldn't figure out how to do it.

After some pain and suffering I got it the way I wanted, and I'm sharing it here so I don't forget.

First clone the wiki by adding .wiki to the repository url:
git clone

I had to remove a bunch of files so that only one particular section was left, so I first made a branch and did the removal there.
git checkout -b one_section_only

rm unimportant_sections*.md

Next up you'll need a recent version of pandoc. Pandoc converts markdown files (like GitHub's wiki pages) to other markup formats, or LaTeX, or HTML, or some other things. I love LaTeX. That's another story. I needed something more easily viewable for the other company I was sending it to. My Ubuntu 12.04 LTS didn't have a recent enough version, so it didn't understand the GitHub-style markdown, but 14.04 worked OOTB.
sudo apt-get install pandoc

Now pandoc will conglomerate multiple files into a single file, but it just does it in the order received (alphabetical if you just give *.md as the input). This yielded a document that was impossible to follow as the pages are in a non-logical order. If it did a pdf with internal links and you could specify the order more easily that would have worked well for me. I dream of a recursive version that just finds local linked files and concatenates them with links between articles/pages. But there isn't anything like that that I could find. And I tried several different converters. Pandoc is good enough (and the best given my requirements).
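If you do want a single combined file, one workaround is to hand pandoc the pages in an explicit order instead of *.md. Here's a minimal sketch of that idea. The order.txt file, the combined.pdf output name and both function names are my own assumptions for illustration, not anything pandoc provides. It also assumes the page file names contain no spaces, and it does nothing about the links between pages.

```shell
#!/bin/sh
# Hypothetical helper: print the page files named in an order file, one per line.
# The order file (assumed name: order.txt) lists page names without the .md
# extension, in the order they should appear in the combined document.
ordered_pages() {
    while IFS= read -r page || [ -n "$page" ]; do
        # Skip blank lines and pages that do not exist in this directory.
        [ -n "$page" ] && [ -f "$page.md" ] && printf '%s\n' "$page.md"
    done < "$1"
    return 0
}

# Hypothetical wrapper: concatenate the pages, in that order, into one PDF.
# Relies on word splitting, so page file names must not contain whitespace.
build_combined() {
    pandoc -o combined.pdf $(ordered_pages "$1")
}
```

With an order.txt containing one page name per line, running build_combined order.txt from the wiki directory should produce combined.pdf with the pages in the listed order.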

You can convert each page to pdf separately and then the links work. You just need to tell the reader which file to start with. One problem with the pdf output is that it converts the markdown to latex then runs pdflatex. This results in some rather fine looking documents, but I ran into trouble with tables: they're put in as floats and the wiki depends on the tables appearing in place.

You can also convert to HTML5, which yields a local website that is a little nicer to navigate because you don't end up with 30 PDF reader instances running. The trouble there is that the links don't append .html to the target, so you have to leave the files named without a file extension, which means you can't just double click to open them in a browser. Also they're encoded in UTF-8, so you might have to set the browser to view in Unicode if something looks wrong (ours did).

I just made both PDF and HTML versions and sent them along. They haven't complained so it must have been usable.

Here's how to do the conversions:
find . -name "*.md" -exec pandoc -o {}.pdf {} \;
mkdir -p doc
for file in *.md.pdf; do mv "$file" "doc/${file/.md/}"; done

or for html:
find . -name "*.md" -exec pandoc -o {}.html {} \;
mkdir -p doc
for file in *.md.html; do mv "$file" "doc/${file/.md.html/}"; done

As I mentioned, the html extension has to be stripped for the links to work. I actually had to use sed -i to fix a lot of the links, as people had put "../data/page" as the link when they could have just put "page". But it worked in the end. It took me a whole day too, ironing out the process. But we ended up with an easy to share file that has the same structure and organization as we'd been taking care to write into the wiki.
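For the record, the link cleanup looked roughly like the sketch below. This is a reconstruction, not the exact command I ran. The fix_links name is made up for illustration, and the pattern assumes the links are ordinary markdown links of the form [text](../data/page). I've written it with -i.bak rather than plain -i so that sed leaves a backup copy next to each file it touches.

```shell
#!/bin/sh
# Hypothetical helper: rewrite markdown links like [text](../data/page)
# to [text](page) in the given files, editing them in place.
# A backup of each original file is kept with a .bak suffix.
fix_links() {
    sed -i.bak 's|](\.\./data/|](|g' "$@"
}
```

Run it on the markdown sources before converting, for example fix_links *.md, and delete the .bak files once the links check out.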

If you have a better way let me know!


Terry Burton said...

The following method uses filters to preserve links:
