Jump to content

User:Rich Farmbrough/Disambig scripts

From Wikipedia, the free encyclopedia

Scripting

[edit]

Download the database.

Note these scripts work with the SQL databse dumps, not with the XML dumps. Create this perl script dab.pl



#!/usr/bin/perl

while (<>) {
	@lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/;
      foreach $line (@lines){
		$line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/;
		$space=$1;
		$name=$2;
		$text=$3;
            if ($space==0) {
			if ($text =~ m/\{\{disambig\}\}/){
				print $name, "\n";	
			}
			elsif ($text =~ m/\{\{msg:disambig\}\}/){
				print $name, "\n";	
			}
		}	
	}
}


run

perl dab.pl ddddddddd_cur_table.sql > dab.txt

Where dddddd is the appropriate date (Takes a few minutes, I didn't time it.)

then create countdab.pl



#!/usr/bin/perl
%dab=();
open (DAB,"dab.txt");
while (<DAB>){
	chomp();
	$dab{$_}=0;
}
$i=0;
while (<>) {
	@lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/;
      foreach $line (@lines){
		$line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/;
		if ($1==0){
			$_=$3;
			@links= /\[\[(.*?)(?:\||\]\])/g;

 			foreach $link (@links){
				if ( exists $dab{$link} ) {
					$dab{$link}++;
				}
			}
		}
	}
	
	print STDERR ".",++$i;
}
$i=0;
foreach $key (sort { $dab{$b} <=> $dab{$a} } keys %dab) {
  print "# [[", $key,"]] ([[Special:Whatlinkshere/",$key,"|links]] to ",$dab{$key}," articles)\n";
  if ($i++>200) {last;}
}


and run with something like

perl dabcount.pl dddddddddd_cur_table.sql > count.txt

where dddddddddd is the appropriate date. (takes about twenty minutes)

and you have your result. It's not perfect because it ignores nowiki, comments etc. but for a disambiguation league table it's good enough.