Hebbut.Net Public Download Area

Public Offerings by Ger Hobbelt - available for download

PAVUK web spider

pavuk ( http://www.pavuk.org/ ) is a very sophisticated web spider with some very nice configuration options. pavuk carries on where wget et al. throw in the towel. ;-)

The Goals and Problems

Some web site owners will force you to always visit their sites, and to visit the pages in a particular order at that. This is not done to rub you the wrong way, but to allow them to maximize their ad revenue (this site costs money; theirs does too. They have faster connections, which cost more...).

There are also a lot of sites which are not so much in it for the ad revenue (though they might like that benefit as well) but rather want you to visit them whenever you look for information. This is intended partly as a service (you will always have access to the most up-to-date information and documentation, e.g. in the case of wiki-based sites) and partly as customer tracking: your visits are logged, so the site owners can see what their customers visit most. This can drive both sales and development efforts, which, ultimately, should benefit you, as a (potential) customer, as well.

Nevertheless, you may find yourself in a situation where you do not have Internet access, yet need access to the 'on line' information anyhow. One prime example is when you work at a tightly secured site, e.g. at many DoD contractors (or your local military equivalent). It has happened to me and it will probably happen again: a workplace where a laptop is allowed, but definitely no (LAN) network access for such rogue equipment, plus no Internet access (or possibly severely restricted access) from your on-site equipment at all.

Then having only on-Net documentation for your software packages is a serious setback. pavuk can help alleviate this issue: grab a (partial) site copy at home for off-line use, carry it along with you and now benefit from having off-Net access to the same information while you work.

For these and a myriad of other web site mirroring and off-line storing tasks, pavuk is a prime solution: it can handle complex, dynamic web sites with ease. And when you run into extremely complex issues, pavuk very often has a fitting answer to those.


The on-line HTML manual for the latest CVS head version of pavuk is available here as well.



Each release comes with an up-to-date NEWS and ChangeLog text document in each 'src' archive. Some of these are also provided on-line for immediate perusal; see the 'Notes' column in the download table.

Test Pages

These URLs can be used to test specific pavuk features.

This includes some of the tests that come with pavuk itself and are run by the make check command, which you run once the build has completed.

These test pages include (yes, there's sure to be more than just these!):


Note: the 7z (7zip) downloads are strongly advised as those will be faster to download (and smaller too). For anyone who does not want to use 7zip for compelling religious reasons (like alarmingly restrictive enforced company policies), we provide a .tar.bz2 file as an alternative for some files. Click on the "bz2" links instead to download these - when available.
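If you take the .tar.bz2 route, the archive can be unpacked without any extra tooling. The following is only a minimal sketch using Python's standard tarfile module; the function name unpack_tar_bz2 and the archive name used in the usage note are made up for illustration and do not refer to actual files on this page.

```python
import tarfile
from pathlib import Path


def unpack_tar_bz2(archive_path, target_dir):
    """Extract a .tar.bz2 archive into target_dir and return the member names.

    Hypothetical helper for illustration; not part of pavuk.
    """
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            # Refuse entries that would land outside the target directory.
            destination = (target / member.name).resolve()
            if destination != target and target not in destination.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        names = archive.getnames()
        archive.extractall(target)
    return names
```

Calling unpack_tar_bz2("some-download.tar.bz2", "unpacked") (both names are placeholders) would extract the archive into the "unpacked" directory and return the list of contained paths. The path check is a precaution that is sensible for any archive fetched from the Net.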


Downloadable archives / files

Compiles on: Microsoft Windows
Files / Archives: pavuk-0.9.36cvs-20071108-win32-bin.7z
Date/Time: 2007-11-08
Quality: Production
Notes: The console application only (GTK is still b0rked anyway).

Compiles on: Microsoft Windows
Files / Archives: Microsoft Visual Studio 2005 SP1 Redistributable Package. You'll need to install this if you cannot run the compiled executables on your Windows platform:
  Win32 (32-bit Windows): Microsoft.Visual.Studio.2005.SP1.Redistributable.Package.(x86).exe
  IA64 / Itanium (64-bit Windows): Microsoft.Visual.Studio.2005.SP1.Redistributable.Package.(IA64).exe
  x64 / AMD64 (64-bit Windows): Microsoft.Visual.Studio.2005.SP1.Redistributable.Package.(x64).exe
Date/Time: 2008-05-24
Quality: Production
Notes: Mirror from Microsoft.