Offline extraction of a WordPress site

I want an offline, browsable static version of my WordPress website so that I can put it on a USB stick or upload it to a static backup location. I looked at several WordPress plugins for this, and wp2static seemed very promising. It turned out to be disappointing (version 6.1) because of flaws in the crawler (many URLs were missed) and in the way URLs are rendered: it is mainly intended to generate output for a full target URL, and relative URLs did not work at all for me. I tried to patch the plugin, but the code was too difficult to understand and modify. So I decided to use a tool outside WordPress: the well-known httrack, which I used years ago.

Offline CSS

Some features of the site are not available or not relevant in an offline version, such as comments, the search box, Google Translate, Google gallery… So I hide them with custom CSS added to my theme.

That CSS only takes effect once the 'offline' class is added to the main <body> tag. httrack cannot do this out of the box, and that is the purpose of the additions below.

Method 1: a postprocessing plugin

httrack lets you add plugins to extend its default behaviour. That is exactly what we want!

Here is the code to add the offline class:

Be careful to include the directory containing your plugin file in your LD_LIBRARY_PATH (or launch httrack as written in the header of the C file).

Note that hts_free(old) crashes, and after struggling a little I had to comment it out. This leaves a memory leak, but it is not a real problem for my use.

 

Method 2: a simple sed script

The method above ended up being a little tiresome, so I decided to use sed to add the offline class ('s/<body class="/\0offline /') together with a simple shell script, which is more convenient to modify and deploy. That script does all the work of producing a zipped offline version.
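As an illustration of this approach, here is a minimal sketch, not the original script. The function names, the output directory and the archive name are hypothetical. It assumes that the theme emits a <body class="..."> attribute on every page, and it uses & in place of \0 because & is understood by every sed, while \0 is a GNU extension.

```shell
#!/bin/sh
# Sketch only: add_offline_class and build_offline_zip are hypothetical names.

# Add the "offline" class to the <body> tag of every HTML file under a directory.
# Only <body> tags that already carry a class attribute are modified.
add_offline_class() {
    mirror_dir=$1
    find "$mirror_dir" -type f -name '*.html' | while IFS= read -r file; do
        # Skip files that were already processed, so the function can be re-run.
        if grep -q '<body class="offline ' "$file"; then
            continue
        fi
        # "&" stands for the matched text, i.e. <body class="
        sed 's/<body class="/&offline /' "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done
}

# Mirror a site with httrack, tag the pages as offline, then zip the result.
# $1 = site URL, $2 = output directory, $3 = zip archive to create.
build_offline_zip() {
    site_url=$1
    out_dir=$2
    archive=$3
    httrack "$site_url" -O "$out_dir" || return 1
    add_offline_class "$out_dir"
    zip -qr "$archive" "$out_dir"
}
```

It could be called as build_offline_zip "https://example.org/" ./mirror offline-site.zip, where the URL and paths are placeholders. Writing to a temporary file instead of using sed -i keeps the sketch working with both GNU and BSD sed.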

 
