27 October 2004

I got mad skilz

An acquaintance needed some email addresses that were listed on a Web site (no, no spamming involved, just business v. business) and asked me to extract them. They were hidden behind an interesting bit of encoding that had to be worked around. What to do?

I guess the skilz weren't all that mad. And maybe the 'z' is unwarranted, too, but the process was interesting. I have, however, left out most of the specific names to avoid any unlikely-but-unwanted recognition.

First, the site had a list of tens of thousands of addresses, available only through searches that returned pages of 10, 20, 50, or 100 results. The specific region we were interested in had around 2400 results spread across at least 24 pages. Getting around that was easy enough: I just changed a URL parameter from '100' to '2500' (or exactly '2429'). I was worried about the long pause, but it eventually worked and spit out a table of 3 gigs of data.

Next, we had to extract the emails. The table had columns for name, email, phone, and location, but the email column contained an image (of a little envelope, natch) with no text. Clicking on the image brought up a new email message addressed to that person.

The HTML for the image contained a link with some JavaScript like this:

<a href='javascript:let("NNN NNN NNN ... ",NNN,NNN)'>
<img src="/images/email.gif" border="0"/>
</a>

Where the groups of NNNs were values between 1 and 999. I compared several entries against the email addresses they generated and found that, although the character replacement was consistent within an entry, it varied across entries. E.g., 123 always equalled 'a' within a single email address, but might equal 't' or '@' in other addresses. That damn JavaScript function let() was doing some positional or length hashing. The top of the page contained the unlikely-named script block:

<script src="javascript/deobscure.js"></script>

And even less likely, the javascript directory was browsable. I grabbed deobscure.js and had a look. I've copied the original file here, but it's small enough to list the contents:

function let(grandfather,alchemy,tree) {
	grandfather += ' ';
	var biologist = grandfather.length;
	var horse = 0;
	var drawer = '';
	for(var cavern = 0; cavern < biologist; cavern++) {
		horse = 0;
		while(grandfather.charCodeAt(cavern) != 32) {
			horse = horse * 10;
			horse = horse + grandfather.charCodeAt(cavern)-48;
			cavern++;
		}
		drawer += String.fromCharCode(shake(horse,alchemy,tree));
	}
	if (arguments[3]) {
		drawer += arguments[3];
	}
	parent.location = 'm'+'a'+'i'+'l'+'t'+'o'+':'+drawer;
}

function shake(people,farm,historian) {
	if (historian % 2 == 0) {
		mathematical = 1;
		for(var message = 1; message <= historian/2; message++) {
			memory = (people*people) % farm;
			mathematical = (memory*mathematical) % farm;
		}
	} else {
		mathematical = people;
		for(var member = 1; member <= historian/2; member++) {
			memory = (people*people) % farm;
			mathematical = (memory*mathematical) % farm;
		}
	}
	return mathematical;
}

The author put his name and address in the comments. Following those eventually takes us to this page where he offers his neat little algorithm. I sincerely hope he doesn't mind what I've done (since, as I said, it's not that mad or skill-filled). As with any of the content here, if you think I've broken protocol, just tell me and I'll do what I can.

The interesting part of the code is the parent.location = 'm'+'a'+'i'+'l'+'t'+'o'+':'+drawer; statement. It takes the decoded email address and forces the browser to open a new email window (basically). So, all I need to do is have the script write that address to the current page instead. The table would then be generated with the email in the cell that currently holds the JavaScript call. BTW, I assume all of those variable names (people, farm, grandfather, alchemy, etc.) and the cumbersome concatenation of 'mailto:' were meant to hide the intent of the script. However, they should have named the file something other than 'deobscure.'
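With plainer names the logic is easier to follow. As far as I can tell from reading it, shake() raises its first argument to the power of its third, modulo its second, so each number in the string is decoded with a modulus and exponent carried by the link itself. If those two numbers differ between links, that would explain why 123 can be 'a' in one address and '@' in another. Here is a rough sketch of the same calculation that returns the address instead of navigating to it. The names modPow, decodeAddress, and encodeAddress are mine, not the original author's, and the numbers 943, 7, and 503 are a toy key pair I made up for the demonstration (943 = 23 * 41), not values from the site.

```javascript
// Sketch only: none of these names appear in the original deobscure.js.

// Square-and-multiply: computes (base ** exponent) % modulus without
// ever holding a number larger than modulus * modulus.
function modPow(base, exponent, modulus) {
  let result = 1 % modulus;
  let square = base % modulus;
  while (exponent > 0) {
    if (exponent % 2 === 1) {
      result = (result * square) % modulus;
    }
    square = (square * square) % modulus;
    exponent = Math.floor(exponent / 2);
  }
  return result;
}

// Takes the same three arguments as let(), in the same order,
// but returns the decoded address as a string.
function decodeAddress(encoded, modulus, exponent) {
  return encoded
    .trim()
    .split(/\s+/)
    .map((n) => String.fromCharCode(modPow(Number(n), exponent, modulus)))
    .join("");
}

// Hypothetical counterpart, only here to produce test input:
// encodes each character code with the matching exponent.
function encodeAddress(address, modulus, exponent) {
  return Array.from(address)
    .map((ch) => modPow(ch.charCodeAt(0), exponent, modulus))
    .join(" ");
}

// Round trip with the made-up key pair (7 * 503 leaves remainder 1 mod 880).
const encoded = encodeAddress("someone@example.com", 943, 7);
console.log(encoded);
console.log(decodeAddress(encoded, 943, 503));
```

The original loops exponent/2 times where this version loops about log2(exponent) times, but for numbers this small either approach is instant.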

The first task was to have the JavaScript output to the page:

// parent.location = 'm'+'a'+'i'+'l'+'t'+'o'+':'+drawer;
document.write(drawer);

Then, the HTML that generated the link needed to be converted to just a script block. These sections:

<a href='javascript:let("NNN NNN NNN ... ",NNN,NNN)'>
<img src="/images/email.gif" border="0"/>
</a>

Were changed to this:

<script>let("NNN NNN NNN ... ",NNN,NNN);</script>

That was done with two simple search-and-replaces, and my task was complete (except for the copy-and-paste into a spreadsheet). I felt a little dirty, but the knowledge gained was worth it and I think the information is going to be used for good and not evil. Or at least it will be used in a neutral manner.
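For the curious, the two search-and-replaces could probably be collapsed into a single regular-expression pass. This is only a sketch, assuming every link has exactly the shape shown above (single-quoted href, one img tag, nothing else inside the anchor); linksToScripts is a name I made up, and it is meant to be run over the saved page source, e.g. from Node.

```javascript
// Hypothetical helper: turns each envelope link into an inline script
// that calls let() directly. Assumes the markup shape shown in the post.
function linksToScripts(html) {
  // Capture the let(...) call from the href, then swallow the image
  // and the closing anchor tag.
  const linkPattern =
    /<a href='javascript:(let\([^']*\))'>\s*<img[^>]*>\s*<\/a>/g;
  return html.replace(linkPattern, "<script>$1;</script>");
}

const before =
  "<a href='javascript:let(\"12 34 56\",943,503)'>\n" +
  '<img src="/images/email.gif" border="0"/>\n' +
  "</a>";
console.log(linksToScripts(before));
```

Anything that doesn't match the pattern is left alone, so the rest of the table survives untouched.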

[ posted by sstrader on 27 October 2004 at 6:35:28 PM in Programming ]