PAVUK web spider: grabbing JavaScript-generated content with embedded URLs - test pages

pavuk ( http://www.pavuk.org/ ) is a very sophisticated web spider with some very nice configuration options. PAVUK carries on where wget et al. throw in the towel. ;-)

Test Pages

These URLs can be used to test specific pavuk features.

This includes some of the tests that ship with pavuk itself and are run by the make check command, which you run once

./configure
make

have completed.

Note

If you want to run all tests at once without letting pavuk spider the complete site (which I'd rather frown upon, if you get my drift), note that these tests all live inside this directory. Specify the URL of this page as the starting URL for grabbing/testing the test cases, and add the '-dont_leave_dir' command line option to keep pavuk from traveling outside this directory.
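As a rough sketch of that invocation: the START_URL below is a hypothetical placeholder (this page's real URL is not spelled out here), and the only pavuk option used is the '-dont_leave_dir' one named above. The snippet just prints the command so you can check it before running anything.

```shell
#!/bin/sh
# Hypothetical placeholder: replace with the real URL of this test page.
START_URL="http://www.example.com/pavuk-tests/"

# -dont_leave_dir is the option named in the note above; no other
# pavuk options are assumed here.
pavuk_cmd="pavuk -dont_leave_dir $START_URL"

# Dry run: print the command for review.
echo "$pavuk_cmd"

# To actually start the grab, uncomment the next line.
# $pavuk_cmd
```

Keeping the URL in a variable makes it easy to point the same command at a local mirror of the test directory instead of the live site.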

Test case #1: a Chinese web forum

Test URL: a Chinese web forum (chinese_bbs_test1.html)

 

Other test cases

We offer additional test cases for use with pavuk ('make check') on this site. These include: