Board Thread:Policies/@comment-18172179-20170103063927/@comment-3020860-20170110194415

Lawrence King wrote: If someone has the software skills, I would suggest that they write a script that follows all the APP/APN links like a spider-bot, and saves the results. Then after the APP/APN templates are deleted, the saved results could be used to generate for each character an "appearances in canon order" list. Of course, these pages would be incomplete and imperfect, but since this is a wiki, people could then edit them and make them better.

I can actually offer up a little algorithmic trickery that'd save you (meaning "whoever") the page crawl, which I wouldn't recommend as a method anyway, since there'd be no way for such a spider to handle loops or disconnections. (It could detect them, sure, but then what? If you're walking a list and hit an appearance with no "Previous" or "Next" link, depending on the direction you're traveling, where do you go next? How would a bot even begin to figure that out? How would a human? APNs aren't a single continuous chain; they're a vast collection of small, unconnected chains which begin and end at unknown locations.) But spidering the APN links shouldn't be necessary anyway.

We can easily get a list of all pages which contain APNs: that's just the list of template transclusions, Special:WhatLinksHere/Template:Apn?hidelinks=1&hideredirs=1. Scanning all of those pages and pulling out all of the Apn template calls would give you... let's call it the "Master APN Node List" for the entire wiki, a massive list of four-tuples of the form (Character,Prev,Appearance,Next), each one representing a single call to Template:Apn, found on the individual issue page Appearance.
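Here is a rough sketch of that scanning step. It assumes, purely for illustration, that an APN call looks like `{{apn|Character|Prev|Next}}` with positional parameters; the real Template:Apn parameter layout may well differ. `ApnNode` and `extract_nodes` are made-up names, and piped links or nested templates inside a parameter are not handled.

```python
import re
from typing import NamedTuple

class ApnNode(NamedTuple):
    character: str
    prev: str
    appearance: str
    next: str

# ASSUMPTION: calls look like {{apn|Character|Prev|Next}}. The actual
# parameter order/names of Template:Apn are not confirmed here.
APN_CALL = re.compile(r"\{\{\s*apn\s*\|([^{}]*)\}\}", re.IGNORECASE)

def extract_nodes(page_title: str, wikitext: str) -> list[ApnNode]:
    """Return one ApnNode per APN call found in a page's wikitext."""
    nodes = []
    for match in APN_CALL.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        # Pad missing trailing parameters with empty strings.
        parts += [""] * (3 - len(parts))
        character, prev, nxt = parts[:3]
        # The Appearance is simply the page the call was found on.
        nodes.append(ApnNode(character, prev, page_title, nxt))
    return nodes
```

Running this over every page in the transclusion list and concatenating the results would give the master list described above.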

Filter that master list down to any one unique Character, and you have that Character's APN Node List. A quick parsing function that knows how to connect up Appearances via their corresponding Next and Prev entries can at least make a first attempt at sorting the list into a coherent timeline.
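As a minimal sketch of that filter-and-connect step, assuming the master list is a plain list of (character, prev, appearance, next) tuples with empty strings for missing links: the function names are hypothetical, and this only follows Next links, so it is a first attempt rather than a full reconciliation.

```python
from typing import Iterable

Node = tuple[str, str, str, str]  # (character, prev, appearance, next)

def nodes_for(master: Iterable[Node], character: str) -> list[Node]:
    """Filter the master list down to one character's nodes."""
    return [n for n in master if n[0] == character]

def build_chains(nodes: list[Node]) -> list[list[str]]:
    """Follow Next links to group one character's appearances into ordered chains."""
    next_of = {appearance: nxt for _, _, appearance, nxt in nodes}
    prev_of = {appearance: prev for _, prev, appearance, _ in nodes}
    # A chain starts wherever Prev is blank or points at a page with no node.
    starts = [a for a, p in prev_of.items() if not p or p not in next_of]
    seen: set[str] = set()
    chains = []
    # Trying every appearance after the real starts also picks up pure loops
    # and nodes stranded by mismatched links.
    for start in starts + list(next_of):
        chain = []
        current = start
        while current in next_of and current not in seen:
            seen.add(current)
            chain.append(current)
            current = next_of[current]
        if chain:
            chains.append(chain)
    return chains
```

If `build_chains` returns exactly one chain, the character's timeline is fully connected; more than one means there are gaps for a human to look at.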

In the ideal case, theoretically, you end up with a single unbroken chain of Appearances that stretches from (First appearance) to (Death).

Or, more typically, from (First appearance) to (Death) to (Resurrection) to (Death) to (Revealed as clone) to (Original rescued) to (Depowered) to (Lost in parallel dimension) to (Shocking return) to (Rebooted) to (Hated out of existence by new writer). #BecauseComics

More commonly, I strongly suspect, what you'd end up with is a collection of "timeline chunks" — clusters of chained appearances that can internally be sorted into a section of timeline, but aren't connected to each other. The disconnections may represent anything from missing nodes, to holes in the chronology, areas of conflicting chronology, or even simple wiki editor error. This is also the point where any actual disjoint linkings (AppearanceA.Next is AppearanceB, but AppearanceB.Prev is AppearanceC — O NOES!) would be revealed.
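The disjoint-link check in particular is cheap to automate. A small sketch, again assuming (character, prev, appearance, next) tuples for a single character and a hypothetical function name; it only checks the forward direction, and the reverse check (B.Prev is A, but A.Next is something else) would be symmetric.

```python
def find_mismatches(nodes):
    """Return (a, b, b_prev) triples where a.Next is b but b.Prev is not a.

    nodes: (character, prev, appearance, next) tuples for ONE character.
    """
    prev_of = {appearance: prev for _, prev, appearance, _ in nodes}
    problems = []
    for _, _, appearance, nxt in nodes:
        # Only pages that actually have a node for this character can disagree.
        if nxt in prev_of and prev_of[nxt] != appearance:
            problems.append((appearance, nxt, prev_of[nxt]))
    return problems
```

Note that this can only report that two pages disagree. It has no way of telling which side is right.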

Regardless of the nature of any problems or gaps found, this would be the starting point for human curation of the list, and the task of resolving these discrepancies would fall to its editors.

The decommissioning process for the APN functionality would (broadly) involve simply replacing the current Template:Apn with a shim that discards the Prev/Next parameters and passes the character link to Template:A, primarily to avoid having to edit every page where Template:Apn has been transcluded. That means the deactivation isn't a hard deadline, because the APN data doesn't go away. All existing APN calls will still be present right where they are today, in the form of all those transclusions, which become equivalent to plain Template:A calls when APN functionality is turned off. However, it would be best if any sort of APN data extraction happened as quickly as possible, because once Template:Apn no longer provides any functionality over Template:A, it's likely that some editors will begin "cleaning up" APNs (replacing them with direct Template:A calls) as they edit pages, in the interest of readability/simplicity.

But in my opinion, once wiki-editors decide that the presence of errors in a mass of information justifies the full-scale deletion of that information, they have adopted a perfectionist philosophy that is fundamentally incompatible with the very concept of a wiki.

See, on that point I wholeheartedly agree. Believe me, if the only problem with the APNs were that they contained fixable errors, I would be leading the charge to say, "So fix the errors!" Removing information instead of improving it is always a poor decision. But the errors in the APN data are actually the least of my concerns. The APN implementation has fundamental design flaws that make its data unfixable, and any additional work put into "improving" the information is wasted as long as it's still being stored in its current flawed form.

Because every change to APN data requires edits on multiple separate pages, detecting errors (automatically OR manually) in an edit history is impossible, since multiple pages' edit histories have to be compared together in order to reveal mistakes. The process of correcting any errors found becomes even more error-prone, because fixing errors also requires syncing up changes across multiple pages. If something is erroneously inserted into a list on a page, that edit can simply be reverted to remove it, and nothing else is affected. But when a mistake is introduced into an APN chain, reverting any single edit only causes further breakage. The person correcting the information has to make sure they check all affected pages... after first figuring out which pages that would be.

The chief problem with the entire APN concept is that it commits one of the cardinal sins of database design: Data duplication. And it doesn't merely allow or facilitate data duplication, it requires it on a massive scale. Data duplication is inherent to its very design.

Going back again to my list-editing comparison, when you insert a line into a list article (or think of it as "inserting an item into a table row", for the purposes of this discussion), that item's relationship to the other items is implicit in the position it's placed at. The Previous item is the one in the row immediately above it. The Next item is the one in the row immediately below it. That relationship information is only stored once, in the form of the item's row number. Assigning those relationships is done with a single edit, to a single page, which is easily reversible. (An item's data may of course be the subject of multiple edits, either consecutive or scattered through the history. But the only edits that change the Previous/Next relationships between items are ones that either insert new lines, or move existing lines up/down within the list. Those changes are always atomic — contained within a single operation.)
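To make the "position is the relationship" point concrete, here is a tiny sketch using an ordinary Python list as a stand-in for a list article. The `neighbors` helper and the issue names are invented for illustration.

```python
from typing import Optional

def neighbors(timeline: list[str], item: str) -> tuple[Optional[str], Optional[str]]:
    """Derive (Previous, Next) purely from an item's position in the list."""
    i = timeline.index(item)
    prev = timeline[i - 1] if i > 0 else None
    nxt = timeline[i + 1] if i + 1 < len(timeline) else None
    return prev, nxt

timeline = ["Issue 1", "Issue 3"]
# One edit to one "page": every Previous/Next relationship updates with it.
timeline.insert(1, "Issue 2")
```

Nothing about Previous or Next is stored anywhere; undoing the insert restores the old relationships exactly.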

But for every APN node, each of its relationships to other items must be duplicated on two separate pages, and any changes to those relationships cannot be done in a single edit. Nor, therefore, can any single edits that change APN relationships be safely reverted without chasing down corresponding edits on other pages. Effectively, every change to APN information REQUIRES one or more edits which break the existing chain, introducing a temporary error into the data, and then repairing the break with subsequent additional edit(s) on other pages so that the data is put back into sync. Mistakes or omissions that occur in that process are impossible to spot, because it would require examining multiple separate pages' edit histories to determine whether the edits all match up correctly.
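The same insertion, modeled the APN way, looks like this. It is a toy sketch, not how the wiki stores anything: each "page" is a dict entry holding its own prev/next strings, the names are hypothetical, and a link to a page with no entry is treated as inconsistent.

```python
def is_consistent(pages: dict) -> bool:
    """True if every Next/Prev link is mirrored on the page it points to."""
    for page, link in pages.items():
        nxt, prev = link["next"], link["prev"]
        if nxt and pages.get(nxt, {}).get("prev") != page:
            return False
        if prev and pages.get(prev, {}).get("next") != page:
            return False
    return True

def insert_between(pages: dict, before: str, new: str, after: str):
    """Apply the three separate page edits, yielding consistency after each."""
    pages[new] = {"prev": before, "next": after}  # edit 1: the new issue's page
    yield is_consistent(pages)
    pages[before]["next"] = new                   # edit 2: the earlier issue's page
    yield is_consistent(pages)
    pages[after]["prev"] = new                    # edit 3: the later issue's page
    yield is_consistent(pages)

pages = {
    "Issue 1": {"prev": "", "next": "Issue 3"},
    "Issue 3": {"prev": "Issue 1", "next": ""},
}
states = list(insert_between(pages, "Issue 1", "Issue 2", "Issue 3"))
```

The data is only back in sync after the third edit, and reverting any one of the three on its own puts it back into a broken state.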

That doesn't merely result in errors which need to be corrected, it makes it impossible to avoid the continual introduction of new errors as information is modified. Primarily that's due to the duplication of Prev/Next relationships. Not only do those all need to match up in both directions for the APN chains to work correctly, but in any cases where they don't match up, there's no way to determine from the immediately available data which one is correct (assuming they're not both wrong) — someone would have to refer to the source material, or examine the broader context and make a judgement call on what's implied by it. All of that just to correct what'd probably be a simple and obvious typo, if it were done as a flat list edit. There's a reason database designers are so horrified by data duplication, and go to such great lengths to prevent it.