One of the significant features of Wacs is its ability to download sets automatically from a number of subscription web sites. In a commercial context, the same mechanisms can also serve to update internet-facing production web sites automatically from in-house content preparation systems.
The download system works by taking the template details held in the vendor database (see the section called “Vendor Manager” in Chapter 11, Other Web Based Tools), filling them in with a model's details from the Identity Map (IDMap), and firing off a web request for her model page on the remote web site. Once the model page has been retrieved, the HTML is parsed looking for links that match the site's template for either video or image set links. Any matching links found are added to the list of known sets for that model (download records). In due course we hope to promote the exchange of download records between people on the internet, so that they can find more sets by their favourite models. From a commercial web site owner's viewpoint, the IDMap and vendor templating system can be used to link to other web sites, earning a commission for referrals to those sites.
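The real templates and field names live in the vendor database and are managed through the Vendor Manager, but a minimal sketch of the idea in Python may help; the site URL, placeholder syntax, and link pattern below are invented purely for illustration:

import re
import urllib.request

# Hypothetical vendor template; the real placeholder syntax and fields
# are defined by the Vendor Manager, not by this sketch.
MODEL_PAGE_TEMPLATE = "https://www.example-site.com/models/{slug}/"
SET_LINK_PATTERN = re.compile(r'href="(/sets/[^"]+\.zip)"')

def find_set_links(model_slug):
    """Fetch a model's page and return links matching the set template."""
    url = MODEL_PAGE_TEMPLATE.format(slug=model_slug)
    with urllib.request.urlopen(url) as response:
        page = response.read().decode("utf-8", errors="replace")
    # Each match becomes a candidate download record for this model.
    return SET_LINK_PATTERN.findall(page)

for link in find_set_links("jane-doe"):
    print("candidate download record:", link)

In the real system the matched links are stored as download records against the model rather than printed, but the fetch-fill-parse cycle is the same.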
Once a download record has been created for a specific set on a specific site, Wacs includes an automatic download tool that can use a quiet time of night to fetch those sets from the remote server. Once they have been collected, the Wacs model page and download list will show their presence and allow you to unpack them using the unpack and placement manager, much as described earlier for normal sets. The integration of the download mechanism allows additional clues, such as set type and photographer, to be extracted from the information given by the upstream site.
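The scheduling logic of such a tool is roughly of the following shape. This is only a sketch, assuming a simple list of pending records with url and saveas fields (both invented names); the real tool takes its records and its schedule from the Wacs database and configuration:

import datetime
import time
import urllib.request

# Hypothetical quiet window of 02:00 to 06:00 local time.
QUIET_START, QUIET_END = 2, 6

def in_quiet_window(now=None):
    hour = (now or datetime.datetime.now()).hour
    return QUIET_START <= hour < QUIET_END

def fetch_pending(records):
    """Download each pending record, pausing politely between requests."""
    for record in records:
        if not in_quiet_window():
            break                        # stop when the quiet window ends
        with urllib.request.urlopen(record["url"]) as src, \
                open(record["saveas"], "wb") as dst:
            dst.write(src.read())
        time.sleep(30)                   # be gentle with the remote server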
Over the time we've been developing Wacs, things have moved on and web site owners have become very concerned about the issue of site scraping - the sucking of all the content off a site by automated downloaders. This not only causes huge additional bandwidth usage and server load, but in some cases costs them real money if they have to pay for the bandwidth used by their servers. As content providers ourselves, we understand their pain, but having started out as merely avid collectors, we really do want to know that we have everything we can get by the models we're interested in.
After initial private experiments with fully automatic downloading, we fairly rapidly moved to providing tools that focus only on pre-defined models, both because fully automatic tools were against the terms and conditions of most sites and because the quantity of material was simply too much to cope with. The public versions of Wacs have always been based on the idea of creating a model definition for those models you like and working from there.
Some recent developments have made automatic downloading even harder, with techniques including extensive use of short-lived cookies and image capture (CAPTCHA) challenges. Additionally, our usual web browser of choice, Firefox, has given up on storing its current cookie collection in a readable text file - it now keeps them in an SQLite database - making it even harder to recover those vital cookies. As a result, starting with Wacs version 0.8.4, we've adopted a new policy of using web pages with links on them as a way of guiding you through the download process. These pages give you links to whatever needs downloading next, but include the human element to ensure that challenges and other tricks are seen and answered. The older command line based tools are still there and will remain usable for those sites that do not impose restrictions beyond username and password requests.
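Those cookies are not lost entirely, as Firefox's cookies.sqlite database can still be read with standard tools. Here is a minimal sketch in Python, assuming you work on a copy of the profile's cookies.sqlite (the moz_cookies table and its column names are Firefox's own; the file path is invented for illustration):

import sqlite3

# Work on a copy of the Firefox profile's cookie database; Firefox keeps
# the live file locked while it is running.
COOKIE_DB = "/home/user/cookies.sqlite"   # invented path for illustration

def cookies_for(host_suffix):
    """Return name=value cookie pairs for hosts matching the suffix."""
    conn = sqlite3.connect(COOKIE_DB)
    try:
        rows = conn.execute(
            "SELECT name, value FROM moz_cookies WHERE host LIKE ?",
            ("%" + host_suffix,),
        )
        return ["%s=%s" % (name, value) for name, value in rows]
    finally:
        conn.close()

# The resulting header line can be handed to a command line download tool.
print("Cookie: " + "; ".join(cookies_for("example-site.com")))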
The topic of how the download mechanism is initially set up is rather too complex to discuss here; we would direct you to the Wacs developers list at our SourceForge site for more information and discussion. For now we will just give a brief overview of how the mechanism can be used.