This test page contains Chinese text and a lot of JavaScript-generated HTML, which should be processed properly by pavuk.
Because each JavaScript call generates a whole section of HTML containing multiple hyperlinks, this is not a simple 'pavuk -js_pattern' job. The hyperlinks all point to dynamic pages (the original had an ASP backend; I use some crude PHP here), and the content depends heavily on the query parameters passed along in the URL. That makes an off-line copy of such a site quite hard to produce: the JavaScript approach prevents pavuk from easily rewriting the grabbed page(s) and filling in URLs that would actually work in an off-line, static environment. For the rewriting to succeed, we need the advanced features available through 'pavuk -fnrules', plus copying and pasting large sections of the JavaScript.
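To make the problem concrete, here is a minimal Python sketch of the two steps described above: pulling hyperlinks out of HTML that only exists inside JavaScript string literals, and mapping each query-parameter URL to a file name that works in a static copy. This is not how pavuk does it internally, and it does not use pavuk's '-fnrules' syntax; the function names (extract_links, static_name), the sample URLs and the file naming scheme are all assumptions made up for illustration. HTML entity decoding and string concatenation inside the JavaScript are deliberately ignored.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Matches href=... / src=... attributes, tolerating the backslash-escaped
# quotes that appear when HTML is embedded in a JavaScript string literal.
LINK_RE = re.compile(r'''(?:href|src)\s*=\s*\\?(["'])(.*?)\\?\1''', re.IGNORECASE)


def extract_links(js_source: str) -> list[str]:
    """Return every href/src value found in the given JavaScript text."""
    return [m.group(2) for m in LINK_RE.finditer(js_source)]


def static_name(url: str) -> str:
    """Map a dynamic URL to a hypothetical static file name.

    The query parameters are folded into the name, so that two URLs
    differing only in their parameters end up in different local files.
    """
    parts = urlsplit(url)
    stem = parts.path.rsplit("/", 1)[-1] or "index"
    base = stem.partition(".")[0]
    query = "_".join(
        f"{key}-{value}"
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    )
    # Replace anything that is awkward in a file name.
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", query)
    return f"{base}_{safe}.html" if safe else f"{base}.html"


if __name__ == "__main__":
    # Hypothetical snippet in the style of a page that writes its own links.
    sample = (
        'document.write("<a href=\\"page.php?id=1&lang=zh\\">x</a>'
        "<img src='img.php?n=2'>\");"
    )
    for link in extract_links(sample):
        print(link, "->", static_name(link))
```

With the hypothetical sample above, 'page.php?id=1&lang=zh' maps to 'page_id-1_lang-zh.html'. A real site will need more than this: once the JavaScript assembles URLs from variables rather than literal strings, a regular expression no longer sees the complete link, which is exactly where the copy and paste work comes in.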
Really, we are balancing on the current bleeding edge of pavuk's web grabbing technology, and one should seriously wonder whether this kind of site isn't better (and more easily) fetched by a grabber based on an HTML rendering engine. That also points in the direction pavuk might go, now that more and more AJAX-like websites are popping up on the Net.
Each link points to a separately generated tiny page, which loads a small image. In the end, all those image files should be available in your local store, as they should be fetched by pavuk given the following command line:
pavuk -dont_leave_dir
pages, etc. index:
We offer additional test cases for use with pavuk ('make check') on this site. These include:
(c) Copyright 2001-2009, Gerrit E.G. Hobbelt (Ger Hobbelt a.k.a. [i_a] ) - Hebbut.Net