Board Thread:Policies/@comment-18172179-20170103063927/@comment-3020860-20170114001651

FeRDNYC wrote: Lawrence King wrote: If someone has the software skills, I would suggest that they write a script that follows all the APP/APN links like a spider-bot, and saves the results. I can actually offer up a little algorithmic trickery that'd save you (meaning "whoever") the page crawl [...]

We can easily get a list of all pages which contain APNs: that's just the list of template transclusions, Special:WhatLinksHere/Template:Apn?hidelinks=1&hideredirs=1. Scanning all of those pages and pulling out all of the {{Apn}} strings would give you... let's call it the "Master APN Node List" for the entire wiki, a massive list of four-tuples of the form (Character, Prev, Appearance, Next), each one representing a call to {{Apn}} found on the individual issue page Appearance.
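To make the "pulling out" step concrete, here's a rough sketch of how one page's source could be turned into those four-tuples. It assumes the calls look like {{Apn|Character|Prev|Next}} with plain positional parameters, which I haven't verified against the template's actual signature; the names ApnNode and extract_apn_nodes are made up for illustration.

```python
import re
from typing import NamedTuple


class ApnNode(NamedTuple):
    character: str
    prev: str
    appearance: str
    next: str


# ASSUMPTION: calls have the form {{Apn|Character|Prev|Next}} with positional
# parameters, plain page titles, and no nested templates or piped links inside.
APN_CALL = re.compile(r"\{\{\s*[Aa]pn\s*\|([^{}]*)\}\}")


def extract_apn_nodes(page_title, wikitext):
    """Return one ApnNode per Apn call found in a single page's wikitext."""
    nodes = []
    for match in APN_CALL.finditer(wikitext):
        params = [p.strip() for p in match.group(1).split("|")]
        params += [""] * (3 - len(params))  # pad missing Prev/Next with ""
        character, prev, nxt = params[:3]
        # The Appearance is simply the page the call was found on.
        nodes.append(ApnNode(character, prev, page_title, nxt))
    return nodes
```

Running this over every transcluding page and concatenating the results would give the master list. If the real template uses named parameters or a different order, only the unpacking line should need to change.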

Filter that master list down to any one unique Character, and you have that Character's APN Node List. A quick parsing function that knows how to connect up Appearances via their corresponding Next and Prev entries can at least make a first attempt at sorting the list into a coherent timeline.
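A first attempt at that parsing function might look something like the sketch below. It takes plain (Character, Prev, Appearance, Next) tuples, starts from every node whose Prev isn't in the character's own list, and follows Next links from there. The function name and the "leftovers" return value are my own invention, and it assumes a character has at most one node per issue page.

```python
def build_timeline(nodes, character):
    """Order one character's appearances by following Prev/Next links.

    nodes: iterable of (character, prev, appearance, next) tuples.
    Returns (timeline, leftovers); leftovers are appearances that could not
    be reached from any starting point (for example, a loop of links).
    """
    own = [n for n in nodes if n[0] == character]
    # ASSUMPTION: at most one node per appearance for a given character.
    by_appearance = {n[2]: n for n in own}

    # A chain starts wherever Prev is empty or points outside the list.
    starts = [n for n in own if n[1] not in by_appearance]

    timeline, seen = [], set()
    for start in starts:
        current = start
        while current is not None and current[2] not in seen:
            seen.add(current[2])
            timeline.append(current[2])
            current = by_appearance.get(current[3])

    leftovers = [n[2] for n in own if n[2] not in seen]
    return timeline, leftovers
```

Broken or inconsistent links show up either as several separate chains in the timeline or as leftovers, which is probably the most useful output for finding APNs that need fixing by hand.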

So, I started playing around with how to actually do this, and I appear to have hit a snag at the first step.

I know that Special:Export can be used to dump the source for a list of pages, so that the code could be parsed for instances of {{Apn}}. And Export even includes a handy tool to automatically dump the list of pages in a Category, though that list can also be pulled (and presumably is pulled, by Export) using an API call to Articles/List.
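For reference, once an Export dump is in hand, getting from the XML to per-page source is simple. This sketch assumes the usual dump layout of page elements containing a title and one or more revision/text elements, and it ignores the XML namespace so that it doesn't depend on a particular export schema version. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def pages_from_export(xml_text):
    """Map page title -> wikitext for every <page> in an export dump."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for elem in page.iter():
            name = local_name(elem.tag)
            if name == "title" and title is None:
                title = elem.text or ""
            elif name == "text":
                # If several revisions are present, the last one listed wins.
                text = elem.text or ""
        if title is not None:
            pages[title] = text
    return pages
```

Each title and its source text can then be handed to whatever does the actual APN parsing.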

But I'm not seeing any (good) way to pull the equivalent of Special:WhatLinksHere/Template:Apn via either the API or Export. Attempting to Export that page results in an empty export, containing no list items. And there doesn't appear to be any way, using the API, to get the equivalent of Articles/List for the transclusions of a Template, rather than a Category.

It does seem to be possible to web-scrape the list; retrieving http://dc.wikia.com/wiki/Special:WhatLinksHere/Template:Apn?limit=1000&hideredirs=1&hidelinks=1 returns what appears to be the first 1000 pages. (It throws up a nice warning about adblocking when the output file is loaded locally, but the content is still there and can be parsed out, or the adblock-warning JavaScript can be disabled.) So, that might be an option, though since it involves parsing the HTML output for the page list, it's not an ideal one.
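If scraping does turn out to be the only route, the parsing side could be as small as the sketch below. It assumes the page list is rendered as a ul element with the id mw-whatlinkshere-list and that the first link in each list item carries the page title in its title attribute; that matches what I'd expect from MediaWiki's markup, but I haven't checked it against this wiki's skin, so treat the id as a guess. The class name is made up.

```python
from html.parser import HTMLParser


class WhatLinksHereParser(HTMLParser):
    """Collect page titles from a saved Special:WhatLinksHere page."""

    def __init__(self, list_id="mw-whatlinkshere-list"):  # ASSUMED element id
        super().__init__()
        self.list_id = list_id
        self.ul_depth = 0          # nesting depth inside the target <ul>
        self.li_has_title = False  # already took a link from this <li>?
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            # Enter the target list, or track lists nested inside it.
            if self.ul_depth or attrs.get("id") == self.list_id:
                self.ul_depth += 1
        elif self.ul_depth == 1 and tag == "li":
            self.li_has_title = False
        elif self.ul_depth == 1 and tag == "a" and not self.li_has_title:
            # Only the first link per item is the page itself; later links
            # in the same item are tool links.
            if attrs.get("title"):
                self.titles.append(attrs["title"])
                self.li_has_title = True

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
```

Feeding the saved HTML to an instance with feed() and reading its titles attribute gives the page list. This still only covers one batch of 1000, so the paging links would have to be followed separately.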

Is it possible to pull the list, or even better the source, of all pages which transclude Template:Apn directly? Or do we need to add an autocategorization to the template, so that it adds all of the pages where it's called to a Category in order to identify them? (And going that route, would all of those pages have to be touched afterward, to fully populate the Category, or would that happen automatically? Could the Category update be forced, either by a user or an Administrator?)

If we did have a fully-populated Category, then the rest of the work of collecting APNs should be fairly straightforward, but right now I'm not seeing a way to do what's needed without it. I can easily add that categorization to Template:Apn, but I wanted to get some input on (a) whether that's actually necessary, and/or (b) whether it would really work. (If the pages all have to be visited afterwards, there's kind of no point.)