Got my home development environment cleaned up to allow me to do more remote work:
I'm still running a hilariously old version of MySQL (because of an equally old version of MovableType), but that upgrade can wait. Now, I believe Fresh to Order across the street has some free wifi and a bottle of wine with my name on it...
def c = { arg1, arg2-> println "${arg1} ${arg2}" }
def d = c.curry("foo")
d("bar")
function sum(a, b) {
    return a + b;
}
sum(10, 5) // 15
var addTen = sum.curry(10);
addTen(5) // 15
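function curry(f, args) {
    // Copy the arguments collected so far so later calls can't alter them.
    var boundArgs = Array.prototype.slice.call(args || []);
    return function(moreArgs) {
        var allArgs = boundArgs.concat(moreArgs);
        if (allArgs.length >= f.length) {
            // Enough arguments collected: call the original function.
            return f.apply(this, allArgs);
        }
        // Still short: return a new curried function that remembers them.
        return curry(f, allArgs);
    };
}
function sum(a, b, c) {
    return a + b + c;
}
function test() {
    sum(1, 2, 3); // 6
    var x = curry(sum, [1]);
    var y = x([2]); // still waiting on one argument
    y([3]); // 6
    x = curry(sum, [1]);
    x([2, 3]); // 6
}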
I can't heap enough praise on the JavaScript library DWR (Direct Web Remoting). It came to my attention a few months back, but I've only now been able to really dive in and use it. DWR dynamically generates JavaScript classes from the Java classes of your choice. Any calls beyond simple set/get are implemented as Ajax calls back to the server. DWR allows you to leverage your existing Java code without reimplementing in JavaScript.
Given the Java class Users that manages system users with the User class, we could have the following simplified script in a page:
<script type="text/javascript">
// Ajax call to server with JavaScript callback
Users.findActive(showActiveUsers);
// Callback passed collection of User objects, populate page as needed
function showActiveUsers(users) {
    $A(users).each(function(user) {
        new Insertion.Bottom(
            "myDiv",
            "<p>" + user.name + ": " + user.id + "</p>");
    });
}
</script>
Voila. JavaScript code becomes an extension of your Java code (with a little help from the equally invaluable Prototype library).
I've added the ability to search restaurant menus on EventNett. Users can edit restaurant entries and add URLs for their menus. EventNett will read the contents of the menus and update the database with menu items, descriptions, and prices. You can then search for specific dishes. Here's a list of events at restaurants in Midtown that serve lamb. You can view the full menus and links at the bottom of the restaurants' location pages. Here's Ecco's page with their late night and dinner menus. Price ranges are also displayed.
This is a first draft, so bla-bla-bla. So far few menus have been added, it doesn't parse PDFs yet, and the HTML on a very few menu pages is so chaotic that EventNett can't find anything of value and may return nonsense. Finally, screen-scraping is an uncertain art at best, so menus will be only 90% accurate. It remains to be seen whether that's valuable enough.
I had the idea last weekend when Lisa & I were out trying to remember where we had mini-burger appetizers. We never figured it out, but the a-ha moment came soon after.
(new concept, tryin' it out...)
For RadioWave, I continued cleaning up my messy, messy code. Much of it was carried over from when I was learning JEE and so required a tear-down/rebuild. Along with some JSP refactoring, I began converting code over to the data framework I developed for EventNett. It cleanly abstracts objects into classes for HTTP forms, database ResultSets, and read/write POJOs. They make it easy to copy between the different domains and are far better than my first fumblings in Java (without forcing a more invasive third-party framework on the code). All of this refactoring unfortunately means shakier code in the short term as bugs get worked out but a more stable site in the long term.
I added Classic FM, a station in Australia (recommended by a European listener) which has its own weird time zones, and the music and news feeds from KCRW. Along with this new content, I broke out the detailed listings from shows that list tracks they will be playing but that don't say when they'll be playing them. In RadioWave, these un-timed tracks show up with the same schedule range as their parent show, but with the schedule times light gray instead of white.
I also finally got the Arizona stations in the correct time zone. Although, we'll only know for sure when daylight savings (which they don't follow) hits again.
I moved hosting from my server to LunarPages last month. They have something called an add-on domain that is a very inexpensive way (free, +$2/mo. for JSP) to host a new domain using your existing space. They seem to be down for ~15 minutes every week, but the price is right for JSP hosting.
For EventNett I primarily focused on getting the time zone conversions right--converting the GMT database time to the users' times. That was a bitch and it shouldn't have been. Live and learn. The knowledge will eventually be useful for converting RadioWave's times from GMT.
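For the curious, here's roughly what that GMT-to-user conversion looks like. This is a minimal sketch in JavaScript rather than the JSP/Java code the site actually runs, and `formatInZone` is a made-up helper name; it leans on the standard `Intl.DateTimeFormat` API and IANA zone names, and assumes the stored time is a UTC instant.

```javascript
// Hypothetical helper (not EventNett code): format a UTC instant as the
// local wall-clock time in a given IANA time zone.
function formatInZone(utcDate, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
  }).formatToParts(utcDate);
  // Pull one named field (year, month, ...) out of the formatted parts.
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")}`;
}

// The same GMT hour in winter and in summer.
const winter = new Date(Date.UTC(2006, 0, 15, 19, 0));
const summer = new Date(Date.UTC(2006, 6, 15, 19, 0));

console.log(formatInZone(winter, "America/New_York")); // 2006-01-15 14:00
console.log(formatInZone(summer, "America/New_York")); // 2006-07-15 15:00
console.log(formatInZone(winter, "America/Phoenix"));  // 2006-01-15 12:00
console.log(formatInZone(summer, "America/Phoenix"));  // 2006-07-15 12:00
```

The point of keeping everything in GMT and converting only at display time is that the zone rules (including Arizona skipping daylight saving) stay in the time zone database instead of in my code.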
The only other notable update was fixing the JSP/CSS for the embedded feed (you can see an example here). The CSS had kinda skewed from the main site. Most of the problems were with IE's handling of TABLEs that have their widths specified. If IE can't determine the width of a TABLE's parent DIV, it will use the page width. Nice. The solution--as I now know and will cherish--is to create a DIV between the TABLE and its parent and specify a width of 100% in its CSS (not in its width attribute).
Finally, work (isn't it always the least interesting?). I had been struggling with getting the data sent from an existing utility to be received properly by a servlet I had written. The utility had been in the field for years with no issues (AFAIK), but when it sent data to my servlet--via HTTPS PUT--every other message was garbled. Unencrypted data came across fine, but SSL failed. What to do?
After I got schooled in the use of Wireshark by a co-worker, we poked and prodded at the packets for a few days, trying to learn what SSL should look like. I first wrote a Java utility to make the same calls, but it didn't reproduce the mangled packets. Finally, I wrote a C++ utility, mimicking the Win32 calls that the actual application was using. Testing with that, I found that the problem occurred only if data was sent from a Win2000 machine. I did some googling and found that the Win32 calls weren't being passed the correct flags; still waiting on verification though. Man, I hope that fixes it, 'cause I'm all out of ideas.
adj. -tier, -tiest
A good general rule of OO is to look at everything as a class. If you design with classes in mind, you get all of the benefits of polymorphism-encapsulation-inheritance and all of that other crap you need to spew out in interviews. This approach of class-centric design should be remembered even when using raw data types--a point that is easy to forget.
I recently ignored this rule on some simple code I was working on at home. The code manipulated an array of days ("Monday," "Tuesday," etc.). Halfway through coding some bloated utility functions, I came to my senses and quickly refactored. The value of OO--encapsulation specifically--was immediately apparent. Sometimes you get sloppy and forget, and sometimes you file it under "To Be Fixed Later." Far too much corporate code gets written in that way. It's easier to add comments to your code as you're writing it than at a later time (which never comes). Similarly, it's easier to encapsulate data earlier.
This should be, and indeed does sound like, a very duh point, but it's one that gets ignored.
So anyway, the day array began as a simple string array. A collection class provides sort and search and management methods, so I didn't design any further than the collection class. Almost immediately, utility methods started appearing: convert from a delimited list, convert to a delimited list, check for a weekday, check for a weekend. All of these were small but invasive to the surrounding code. In a situation where a simple true or false result was needed, access to the collection required a search and comparison that included one or more temporary objects. If only one spurious type is involved, then your code would not get too mangled. However, many similar nearly-basic types are used throughout programs, and those incur an unwieldy increase in local noise in the form of utility code. As local objects and line lengths increase, readability decreases. It's never one or two extra lines that you're adding, it's the potential for 10s or 100s.
I caught my lazy error early, and the simplification to the code was ... breathtaking. An added bonus--as always with encapsulation--was that the newly created class provided a chance to add additional safety checks that would be avoided in the already-complex surrounding code. Again, this seems like a simple Programming for Dummies recommendation because it's so obvious, and yet it is too often avoided with no good excuse. LOC numbers may go down, but so do headaches.
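To make that concrete, here's a minimal sketch of the kind of class I mean. It is written in JavaScript for brevity and is not the original code; every name in it (`DaySet`, `fromDelimited`, `toDelimited`, `hasWeekday`, `hasWeekend`) is made up for illustration. The delimited-list conversions, the weekday/weekend checks, and the safety checks all live in one place instead of leaking into the callers.

```javascript
// Hypothetical names throughout; a sketch of encapsulating a "days" array.
const ALL_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"];
const WEEKEND = new Set(["Saturday", "Sunday"]);

class DaySet {
  constructor(days = []) {
    // The safety check lives here instead of in every caller.
    for (const day of days) {
      if (!ALL_DAYS.includes(day)) {
        throw new RangeError(`Unknown day: ${day}`);
      }
    }
    // Drop duplicates and keep calendar order.
    this.days = ALL_DAYS.filter((day) => days.includes(day));
  }

  // Convert from a delimited list such as "Saturday, Monday".
  static fromDelimited(text, delimiter = ",") {
    const days = text.split(delimiter).map((s) => s.trim()).filter((s) => s !== "");
    return new DaySet(days);
  }

  // Convert back to a delimited list.
  toDelimited(delimiter = ",") {
    return this.days.join(delimiter);
  }

  hasWeekday() {
    return this.days.some((day) => !WEEKEND.has(day));
  }

  hasWeekend() {
    return this.days.some((day) => WEEKEND.has(day));
  }
}

const days = DaySet.fromDelimited("Saturday, Monday");
console.log(days.toDelimited()); // Monday,Saturday
console.log(days.hasWeekend());  // true
```

The surrounding code now asks a yes/no question and gets a yes/no answer, with no temporary objects or searches cluttering the call site.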
Another troublemaker tries to denounce the efficiency of C/C++ and gets the smackdown. Good discussion. Check out especially the shootout results and the short digression on template metaprogramming. It's statements like this, along with their responses, that produce useful information; it's only when you state your biases publicly that someone can correct them (or support them). This is the only context where I might praise a silly language war.
I stopped reading Joel Spolsky a year or so ago simply because he's a blowhard. Any slight good in an article was undoubtedly surrounded by moderate to heavy non-good or out-and-out dogma. It's good to see someone take the time to call him out on his ramblings against Ruby. The closing quote shows just what you get when you mix arrogance with malicious deceit.
The new Mindstorm is out! And there's also a book for developing Mindstorm robots with Java by Brian Bagnall. Wikipedia has a brief but useful entry with lists of supported languages and links to relevant source material.
I'm assuming that languages developed for first generation Mindstorm (RCX) will not work for NXT. Brian Bagnall is one of the contributors to the LeJOS VM (used to control a robot that visited the ISS in 2001!) and wrote the Core Java book on programming the RCX, but LeJOS's SourceForge page hasn't been updated since January.
The built-in language is LabVIEW, a visual programming language with lots of drag-and-drop fun (this is a toy, after all). In November 2004, I had looked at a very high-level visual programming language called Alice that is intended to teach programming through narrative. The demos were amazing. LabVIEW appears to be a little more gritty but still very kid-friendly. All the same, I'd stick with Java.
In different types of languages, meaning can be represented to varying degrees through morphology or syntax. Meaning is expressed with morphology by, for example, adding "s" for plurals of nouns or "ed" for past participles of regular verbs. In this domain, morphemes are combined through declension and conjugation to generate word-forms with different meanings. Meaning is expressed with syntax by word order and subcategorization. For example: the determiners "the" or "a," when present, must be the first word in a noun phrase; the verb "sympathize" must subcategorize for a "with" clause. The lexical alterations of morphology are replaced with structural alterations of syntax.
At work, we've recently moved from Ant to Maven to manage our build process. Ant uses XML files to define build scripts that contain similar functionality to those of make files, declaring commands that define dependencies, create directories, or that compile and copy files. Although Ant build files are in XML format, they act basically as a procedural script. Maven also contains XML files, but they contain considerably less information and instead rely on a directory structure to define tasks. The presence or absence of a specific folder decides whether or not a standard build action will be executed on the contents of that folder.
The Ant language's imperative emphasis depends largely on morphology; Maven's structural emphasis depends largely on syntax. This is somewhat of an over-generalization but it is useful in understanding the different approaches, which can be jarring. Ant's tasks have structure that serves the same purpose as Maven's directories, yet the Ant tasks read more like word-forms and the Maven directories read more like subcategorization.
Reading some of the comments at /. covering GWT. Several posters bring up the opinion that toolkits are wrong because beginners will use them before first learning "the basics." This opinion has come up recently in different contexts--all presented as a grievous flaw that teaches bad habits to those kids (e.g. the Java-doesn't-have-pointers complaint that appeared a few months back). The problem presented with the GWT is that it generates JavaScript from the user's Java code in order to push HTML and JavaScript from the web developer domain to that of the application developer. This is also its benefit: working in one domain can simplify development.
Critics look suspiciously at the process of skipping the pain of hand-coded JavaScript (and HTML for that matter). Such a position stinks of an elitism that's been around long enough to evoke weathered jokes about programming with only zeros or about what language "real programmers" use. I remember interviews for Windows MFC development where I'd be grilled on the Win32 API. The interviewer would invariably mock the ineptitude of those who didn't cut their teeth on Win32. I'd counter that programmers coming from Win32 were inept in their understanding of vtables and templates. Snobbery should always be defused by an even greater snobbery.
And that gets to the core issues and differences. Is it a problem of language or framework? MFC is a C++ framework that wraps the Win32 C API (an addition to the C language). GWT is a Java framework that wraps a subset of the JavaScript language. Both MFC and GWT--as with all frameworks--reorganize a more raw, low-level domain in order to simplify common tasks. Frameworks are, in a way, narrower manifestations of relative language abstraction and whether you're working in a high- or low-level language. Frameworks are intermediate steps that a language can take to make it a higher-level language.
A well-written framework should strive to elide the necessity to use its lower-level components. Ruby programmers don't need to be conversant in the C language that Ruby was written in, although it would help them understand garbage collection, reference management, etc. C programmers don't need to be conversant in assembly (stretching the argument, since most C compilers are probably written in C), although it would help them understand stack memory or thread management. That knowledge would always help--and many programmers will eventually dig deeper--but its absence doesn't imply ignorance.
Short discussion on LtU about the Google Web Toolkit and commenting on its relationship to the Links programming language (which I had previously puzzled over). The GWT, on the surface, looks like a wonderful tool for Web development. More later...
The Links programming language [ via Lambda the Ultimate ] is an experimental one-tier language that encompasses the three-tier Web model. A single Links program compiles into JavaScript to run on the client and into SQL to run on the database. Groovy, I say. Check out their short, 12-page paper [ PDF ] describing this accomplishment.
I began thinking it was amazing-yet-misguided. Existing technologies are duplicated within its syntax (e.g. XQuery), and toy programs--though mandatory as communicative examples--necessarily avoid the pitfalls of scale. However, I'm beginning to think that I'm just prejudiced by the ubiquity and expressiveness of SQL, and maybe holding too tightly to the opinion that the current domain languages are separate because they must live in separate domains. Just because JavaScript exists on the client should we have to write to it? Maybe flattening can also simplify, and maybe Links is appropriate as sort of a 4GL Web language that sits atop several, grainier languages.
But then, what of the nightmare of HTML embedded in PHP, Servlets, et al.? Don't maintenance issues appear when hasty design flattens the different domains? What about the ideal: separation of layout (CSS), static structure (HTML), client logic (JavaScript), dynamic content (JSP script), and data management (SQL)? That separation exists for the similar reason that we create small methods, classes, and files: modules should only do one thing. The reason that Links seems like a natural progression is that we can so often blur the interface between the different domains. Is that blurring a hint that a single-tiered language is the answer, or simply the artifacts of ideas sketched quickly but not yet complete?
C/C++ Users Journal died a couple of months back, and my subscription was extended with DDJ. This seemed like one of those "signs" that everyone's been talking about for the past five-years-at-least. Even as magic was being accomplished with template programming (the main proponents being the guys at Boost and Alexandrescu), C++ was being relegated to the basement of low-level development.
Just a week ago, Google released their Ctemplate library (different type of templates) for separating layout and content. It basically brings C++ closer to the world of Web scripting languages. For the past several months, I've been getting at least one unsolicited email a week presenting C++ programming opportunities. Anecdotal, but interesting. And a friend just started a job that is--although I don't think exclusively--heavily involved in C++.
Maybe "resurgence" isn't quite right.
In C++, templates provide compile-time polymorphism (similar to parametric polymorphism) by expanding templates into different type-checked functions and types during compilation. std::vector<MyObject> is a unique vector<> type because it implements the required interface, and the determination of the "derived" vector<MyObject> type occurs at compile-time. Circle is a unique Shape type for the same reason, but the determination of derived type (given a Shape reference) occurs at run-time.
Using MyObject within vector<> means that you've entered into a contract stating that MyObject will provide those features that vector<> expects. This contract is different from polymorphism using inheritance because the is-a relationship does not exist. Types used within templates do not need to exist in a specific class hierarchy. However, those types must contain an implied interface--one which the template expects. This implied interface is managed similarly to how dynamically typed languages manage accessing interfaces on class instances. A dynamically typed language only determines if FunctionA() exists on MyObject at run-time and at the point of the call. With templates, FunctionA() is only required at the point of the call, but checking occurs when the code is compiled. In both dynamically typed languages and statically typed templates, an object can be missing the required method if the code never gets called.
Using the following class (in pseudo-code):
class MyClass
{
    function set(String param1) {do something;}
}
In a dynamically typed language, a specific interface is validated only if the code is called:
function FunctionA(param1)
{
    param1.set("a string");
}
function FunctionB(param1)
{
    // Run-time error here, only if MyClass passed in.
    param1.set(23);
}
MyClass myClass;
FunctionA(myClass);
With run-time polymorphism, a specific interface is required but accessed only when the type is used:
class AnotherClass extends MyClass
{
}
function FunctionA(MyClass & param1)
{
    param1.set("a string");
}
function FunctionB(MyClass & param1)
{
    // Compile-time error here, even though it is never called.
    param1.set(23);
}
AnotherClass anotherClass;
FunctionA(anotherClass);
With compile-time polymorphism, a specific interface is validated only when the type is used:
template MyTemplate<type MYTYPE>
{
    function FunctionA(MYTYPE param1)
    {
        param1.set("a string");
    }
    function FunctionB(MYTYPE param1)
    {
        // Compile-time error here, only if this method is referenced.
        param1.set(23);
    }
}
MyTemplate<MyClass> myTemplate;
MyClass myClass;
myTemplate.FunctionA(myClass);
This gives templates the flavor of dynamic typing (classes need not be of a specific type) but with the benefits of static type checking.
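The dynamically typed case above can be tried out for real in JavaScript. This is only a sketch under one loud assumption: JavaScript won't reject a number on its own, so the `set()` method here checks its argument by hand to stand in for the `String` parameter in the pseudo-code. The names mirror the pseudo-code and are otherwise arbitrary.

```javascript
// Mirrors the pseudo-code MyClass; the typeof check is an assumption that
// stands in for the declared String parameter.
class MyClass {
  set(param1) {
    if (typeof param1 !== "string") {
      throw new TypeError("set() expects a string");
    }
    this.value = param1;
  }
}

function functionA(param1) {
  param1.set("a string");
}

// Nothing complains about this function until it is actually called.
function functionB(param1) {
  param1.set(23);
}

const myClass = new MyClass();
functionA(myClass); // fine
// functionB(myClass); // would throw a TypeError, but only at run-time

// Any object with a compatible set() works; no shared base class is needed.
const lookalike = { set(v) { this.value = v; } };
functionA(lookalike);
```

The last two lines are the implied interface at work: `lookalike` has no is-a relationship with `MyClass`, which is the same freedom templates give you, minus the compile-time check.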
Interesting discussions (several clicks deep with comments all along the way) on virtual machines--JVM, CLR, etc.--via Lambda the Ultimate. Mastering a language is trivial, mastering the libraries is where the real effort lies. The common line of reasoning that goes like this 'She knows C++ therefore she will be able to pick up programming on platform X in no time', is becoming increasingly fallacious.
Looking at the code to randomly pixelize the Google logo, I'm amazed at how utterly more efficient others' code is than mine [ via Digg ] (and a perfect example of how idiotic Digg comments are). This is always in the back of my mind, but since reading about the 100-line Lisp project that implements Reddit, I've been considering it more. Although much of the efficiency and terseness is in the additional libraries (both examples) and the more compact syntax (Lisp), a simple solution is something to admire and study.
I'm also reminded of the power of C++ (as I'm entrenched in Java). Many of the gee-whiz functional programming techniques used in Lisp (but by all means not all) can be duplicated with some fancy template programming (for example, using Alexandrescu's generalized functors). I still have template programming stuck in my head and have not had the opportunity to use Java's generics.
Late Wednesday, I decided to do some end-of-year security checks on my Web server. I keep up with Windows updates, but I hadn't run Microsoft Baseline Security Analyzer in a while so that was step one. It made a few good recommendations concerning default IIS Web sites that I'd never removed (just disabled) and the fact that I didn't disable the Guest user. The fatal recommendation was to run something called IIS Lockdown from Microsoft which further cleans up stray IIS settings that could cause problems.
I'm not sure exactly what happened when I ran it, but the result was the elimination of all of my Web sites from IIS (the settings, not the files). Yipes. My fault was two-fold: I should have had IIS backed up and I should have researched more closely what the lockdown app was going to do. Anyway, the past few days--late into the evening Wednesday, a good portion of last night when I got RadioWave (JSPs) and my blog (Perl) up, and today when I finally got my development wiki (PHP) back--were exhausting. Oddly, getting Tomcat working was the biggest headache, mostly because IIS seems to be erratic about refreshing with refresh (the Web site), restart (the server), or reboot (the machine). I need to write down all of the peculiarities as soon as possible before I forget, especially because I found others describing some of the symptoms but with no solutions. I've already updated my notes on configuring MediaWiki with some new links, but there's some more to add. Getting Perl working was effortless. Getting PHP was a little more difficult because it involved some rarely-documented stuff.
All-in-all, it was a good re-learning experience and I was able to clean up many of the spurious settings from my Tomcat config files. The irony now is that my Web server is probably more insecure (I probably shouldn't advertise that, should I?) because of the global changes that were just made. I think I'll be locking down IIS on my own from now on, thank you.
Opening a workspace with several projects, I was presented with the following error for two of the projects:
The project cannot be built until the build path errors are resolved.
The resolution was to force a resave of the selected projects (and their .classpath files):
The only other references I could find were to make minor alterations of contents of the .classpath file.
Interesting articles currently being browsed:
There's a wealth of information and opinion on Java/Ruby/Perl/Python/etc. in the Java discussions.
One of my cube neighbors, a new-ish employee, said that he didn't want to keep his desk clean because he did not yet have a clear understanding of the product he's working on. I understood what he meant, and I think it's important. Only after he understands the system can he organize his environment to fit that system. My note-taking process begins on a small stack of paper-to-be-recycled, white side up, sitting in front of my keyboard. I scribble notes and drawings and UML diagrams as needed. From there, if they're valuable and not just scribbles, I move them to my development wiki in the appropriate location and HTML-ify them with wiki links and external links. Eventually, I may add further notes, link other articles to them, or move them into a more appropriate location as I get a better understanding of the domain...
The CIA paid the Rendon Group more than $23 million to help bring down Saddam Hussein through propaganda and media manipulation. That propaganda, fed to Judith Miller among others, was used by the administration, once reported, to bolster support for the war. In one breath John Rendon criticises the media for reporting unflattering and incorrect information about the war, in the next he boasts of feeding incorrect information to that same media. Jackass.
It reminds me of the essay "Astroturf: How Manufactured 'Grassroots' Movements are Subverting Democracy" from The Best American Nonrequired Reading 2003. In it, Jason Stella outlines how propaganda--lies--from the Kuwaiti government was used to push lawmakers to vote for the first Iraq war.
First, I find out that string theory is in question, now the big bang too? My head is spinning. All of those problems that still exist with the theory could eventually bring it down--and in the process describe a universe that is at least 70 billion years old instead of 13! This is big. At the center of the dispute is plasma cosmology.
The article is, however, absolutely despicable in the way it presents modifications that occurred in the big bang theory. At several points, scientific adjustments are presented as some sort of weaseling out on the part of the scientists. Look: theories are meant to adjust as new facts are presented. That's what science is. If the theory eventually falls apart--which the big bang may-or-may-not--then the theory that best represents the new facts will replace it. Too much sensationalist science reporting. Jackasses.
This, oddly, makes me wish Brian Greene had a blog. I wonder what the discussions are in the physicist and cosmologist circles...
And, bravo to Eric Lerner for his vigilance in keeping the Wikipedia entry on plasma cosmology unmolested by rabid graduate students. New science is new science and it needs to be presented with fact and not ridiculed with emotion.
Self-contained wiki based on JavaScript contained within the HTML pages. Basically, you can save your entire, functioning wiki to a single HTML file. Client-side scripting at its best. Now I have to think about porting my development wiki, and maybe even my blog, to this.
Finishing up layout for a bunch of Web pages: they look perfect in Opera and Firefox but are completely messed up in IE. The causes were: missing selector support, padding or margin differences with LI elements, differences in TABLE width with TABLEs contained in multiple DIVs, and the often-reported DOM differences. All is almost well now, with the only serious problems left involving the CSS menu system. I can either consider it a "learning experience" or be pissed off at IE. Considering that I'm following CSS specs dated 1998, I'm leaning towards the latter.
I had to create a report in Crystal Reports--containing graphs of historical data within subgroups of users--that ended up being more complex than I had expected. Much of the information comes from the Crystal Reports Knowledge Base article C2011945 with little else available on the Web. Here are some notes using CR10:
From /.:
Sony just loves everyone $sys$anally. They are the greatest company ever when it comes to technology $sys$that $sys$sucks. Everyone is gonna love $sys$to $sys$hate Sony, and they will $sys$not buy any Sony product that they see. It's because Sony loves $sys$to $sys$fuck $sys$with their customers.
Good joke. Poorly implemented, but still a good joke. And we really do need to start spreading around the phrase "infected with DRM." Sony's rootkit is the perfect catalyst.
That serial numbers site saves me again! And for the same reason: we own the software, but no one had written down the serial number (probably provided on the disposable packaging).
Using Crystal Reports prior to version 10, tables from different databases cannot be linked using more than one column. So, if two tables are joined by the columns id1 and id2, you can only specify a link for one of those columns. More than one join will display a message box with the error "Invalid file link. Not all fields in same index expression."
References:
I tested using the demo version of CR11, and could successfully link multiple fields.
I encountered a crash a few days ago that took two days of testing to resolve. It was caused by the addition of an otherwise simple 20-line header file.
I reached a milestone of sorts with EventNet this past week. I have the filter for date ranges, relative dates, and keywords written. It's a little slow right now, possibly because some of the schedule logic is executed in code and not on the database side, but that's a point for refactoring later. I also have the basic views written--calendar, timeline, and list--with links to drill-down by week and day, and pagination of longer lists. Most of the read-only aspects of the site are written, with the most important part--user editing--yet to be done.
The past few days have been mostly finishing a couple of site aggregators. I had realized that, like RadioWave, an app on the server could populate the database by screen-scraping from the information that's already out there. Duh. It was oddly easy to write a bot to spider the information from a major-events-calendar-site-who-will-remain-nameless. Instant data, tagged and scheduled.
The primary issue has been the constantly nagging decision of how much work to offload to the database and how much to put in code. That's always an issue and resolved generally case-by-case. You want the code that represents database objects to be as generic and reusable as possible, but often need to write very specific queries to retrieve the data you need. And there's a balance between executing repeated queries but getting hit with extra memory allocation, and executing fewer queries but taking up extra CPU cycles in code. Ideally, the design should be such that you can easily make alterations in either direction. Ideally.
Anyway, production may slow considerably, what with the new job starting this Wednesday. A change, and a return to C++, will be good.
Had a good interview this morning. Well, "good" as far as I could tell--my "good" interview meter is generally broken. At least I think my Tourettes of Cracking Wise was kept in check. And we were in general agreement that Eclipse really needs to fix that problem with duplicate views of a file getting out of sync. Overall it was a fun interview: I like hearing how other companies' development groups work and talking out the pros and cons of such things, even if I dislike the stiffness of the surroundings. All interviews should be done at a bar!
Eclipse will sometimes allow you to have multiple views of the same file open. Often when debugging, it will open a new view of the file when it stops on a breakpoint. If you leave these multiple views open, you run the risk of editing one and losing changes in another. Ouch. I had read about this problem in a review of Eclipse, had kept an eye out for it, had see it happen several times and dodged it, but was just recently stung. So watch out. Sometimes the bug is smarter and more vigilant than the user.
I'm still enjoying Eclipse, it's just that I'm now rubbing the bump on my head where it hit me with a frying pan, and looking at it warily: "Oh, you ... (wagging finger with a slightly pained expression) ..."
If you're going to reference URLs in a Web page generated through code, you need to encode the search part of those URLs to have acceptably escaped HTML characters. In Java you can do the following:
import java.net.*;

// Copy our raw URL.
String encodedURL = rawURL;
// Determine if the URL contains a query.
URL url = new URL(rawURL);
if (url.getQuery() != null) {
    // Encode only the query part: escape ampersands for HTML.
    int queryIndex = rawURL.indexOf(url.getQuery());
    encodedURL = rawURL.substring(0, queryIndex)
        + url.getQuery().replaceAll("&", "&amp;");
}
Friday 26 August 2005
Code changed above. Instead of using URLEncoder.encode() to encode the query, just replace ampersands with their HTML escape (&amp;). We don't want any equal signs encoded.
A recent (humbling) discovery: you need to explicitly define looping variables in functions called recursively. So, in the following pseudo-code, the "var" was left out and a single "index" variable will exist across all calls (probably screwing up your intended results):
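function recurse(collection)
{
    // Ouch! Forgot the var, so "index" is one global shared by every call.
    for (index = 0; index < collection.length; ++index)
    {
        // Each nested call overwrites the caller's "index".
        recurse(collection[index].children);
    }
}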
I hate that I still have scoping issues with JavaScript. Curses.
Continue reading "JavaScript gotcha"
Along with the EventNet project, I've been refactoring the aggregation code behind RadioWave within Eclipse. The debug tools there have been helpful in cleaning up the mess that can occur when learning new techniques. Yeah, excuses excuses. Anyway, while finding my way around Eclipse, I fixed some problems with duplicate entries in BBC 4, eliminated failed updates of file-based Web sites, and sped up the database access. However, with all of that reorganization and testing some of the existing recordings were lost. Duh. Still, it always feels good to clean house.
Kevin recommended WREK's Tuesday show "Loud Smoky Rooms," but their stream hasn't worked over the last few days, so I don't have too much hope of it getting recorded. The two problems with aggregating streaming radio have been lack of online schedules and obfuscated or missing stream links. Independent and public broadcasting stations seem to be the most open with both, with commercial and (surprisingly) college the least. There's a lesson in there somewhere.
Continue reading "Recent work"
This HP paper on tagging systems explains well what I had previously explained so poorly. From the paper:
Like a Venn diagram, the set of all the items marked cats and those marked africa would intersect in precisely one way, namely, those documents that are tagged as being about African cats. Even this is not perfect, however. For example, a document tagged only cheetah would not be found in the intersection of africa and cats, though it arguably ought to; like the foldering example above, a seeker may still need to search multiple locations.
This illustrates the limitations of both folders and tags, and how an ontology, however achieved, is required to provide more encompassing results. A user should be able to specify "Africa" and "cats" and the system should understand all of the hyponyms of "cats" ("cheetah" etc.) as well as those of "Africa" ("Egypt" etc.). A taxonomy gives us this.
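A minimal C++ sketch of that kind of search, assuming a tiny hand-written taxonomy. The `Taxonomy` map, the `expand` and `matches` helpers, and the sample tags are all hypothetical and not taken from the HP paper. Each query term is expanded to itself plus its hyponyms, and a document matches only if it carries at least one tag from every expanded term.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Tags = std::set<std::string>;
// Hypothetical taxonomy: term -> direct hyponyms (narrower terms).
using Taxonomy = std::map<std::string, std::vector<std::string>>;

// Collect a term plus all of its hyponyms, following the taxonomy downward.
Tags expand(const Taxonomy& taxonomy, const std::string& term) {
    Tags result;
    std::vector<std::string> pending{term};
    while (!pending.empty()) {
        std::string current = pending.back();
        pending.pop_back();
        if (!result.insert(current).second) continue;  // already visited
        auto it = taxonomy.find(current);
        if (it != taxonomy.end())
            pending.insert(pending.end(), it->second.begin(), it->second.end());
    }
    return result;
}

// A document matches when, for every query term, it carries the term
// itself or one of its hyponyms.
bool matches(const Taxonomy& taxonomy, const Tags& documentTags,
             const std::vector<std::string>& query) {
    for (const auto& term : query) {
        bool found = false;
        for (const auto& candidate : expand(taxonomy, term)) {
            if (documentTags.count(candidate)) { found = true; break; }
        }
        if (!found) return false;
    }
    return true;
}

// Usage: a document tagged only with narrower terms still matches the broad query.
void example() {
    Taxonomy taxonomy{{"cats", {"cheetah", "lion"}}, {"africa", {"egypt", "kenya"}}};
    assert(matches(taxonomy, Tags{"cheetah", "egypt"}, {"africa", "cats"}));
    assert(!matches(taxonomy, Tags{"cheetah"}, {"africa", "cats"}));
}
```

The tags themselves stay flat. All of the hierarchy lives in the lookup table, so the same tagged data could be searched with or without it.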
People have complained about the brittleness of these implemented tagging systems. I agree that they are not ideal, but as this HP document shows, their existence and popularity allow us to examine how a system could improve on them.
Re-reading some entries on the semantic Web and found Burningbird's link to this summary by Peter Van Dijck of the major concepts being discussed and the people discussing them. With pictures!
The biggest problem here, I think, is the top-down v. bottom-up issue: that v. really needs to go. The argument is that of (primarily) a defined versus an emergent global ontology. The Van Dijck article points out an emotional source to this issue, but I think it originates from the AI realm and the top-down-bottom-up battles there. A power metaphor is compelling but only after the fact.
The Eclipse [Wikipedia] IDE has completely amazed me as I've been using it for my new Java project. I had learned Java/JSP development on the long-defunct Forte IDE from Sun. Forte was powerful but very quirky and has become pretty outdated, so it's nice to get on a more modern IDE. From my basic experience within .NET I think that Eclipse is at least comparable, but I've still considerable digging around to do.
I haven't looked at the C/C++ side of it yet (the C/C++ Development Toolkit). This article provides a simple overview of the C++ tools for Eclipse and points to a demo for configuring MinGW with the CDT. With this configuration, I'm not sure if I will also need gdb and make.
Continue reading "Eclipse"
Today I became a Perl programmer. Not a good Perl programmer but a Perl programmer all the same. It's almost readable to me now! Except ... well, and this is probably one reason why I'll never get a valid nerd card: I despise regular expressions. Yesyesyes, I understand their power and all, but the sheer unreadability and brazen obfuscatalogical syntax is just irritating.
Anyway, there you have it. I kicked around in the MT code to add some spam filtering (task 2 of my current short-term sabbatical), and let-me-tell-you it was way overdue. There'll be some continued tinkering--even if it's not needed, just so that I can play around some more--so we'll see how that turns out. Ultimately, let's hope we're approaching the end of those jackass spammers around here.
Time to update my resume.
Continue reading "@_"
Scientists across several European universities are working on creating a simulated computer environment, a la The Sims or the upcoming Spore, whose AI inhabitants will create their own culture and language. Although their intent is to study the basic processes of language and culture, some scientists feel that an artificial environment will only illuminate artificial processes. There's always value with simulations as long as the relationship and bounds of the simulation's rules and those of the real world are understood.
At the very least, the experiment will be interesting as an accomplishment of NLP and AI programming.
Continue reading "Artificial culture"
Google has provided the schema for the XML of Google Earth data files [via /.]. Their code site points to GE's documentation and a tutorial. Woot.
Continue reading "Hack the Earth"
I've always had a nagging issue with the refactoring item replace conditional with polymorphism. The concept is that you have a bunch of objects that are acted upon in the same situation but with different underlying actions. With the canonical Shape class and its Circle, Square, and Triangle subclasses, a poor design would have the draw method in Shape and a conditional within that method to draw each possible type. That's a little contrived because of the simplicity of the example, but real-world examples are common resulting from tree-forest syndrome in bulky classes, or from code bloat as more and more classes are added--requiring copy-pastes with small alterations to the conditional.
This refactoring method converts the multiple-line conditional blocks into single lines of code that distribute their work across an already-existing class hierarchy (see below). The problem I have is that, while the superfluous conditionals are removed, there's always one or two that must remain: those that create the separate objects in the first place.
if x then
process
else if y then
process
else if z then
process
end if
Becomes:
base->process() - - - - > calls x or y or z
That "redistribution" is key, and eliminates so much noisy, conditional code that I'm sometimes unjustly suspicious of every conditional I see. This is similar to how the standard's algorithms have made loops suspicious--breaking them up into templates (find<>, for_each<>, set_intersection<>, etc.) and predicates. Absolutes are never so absolute, so there are times to use loops and conditionals (and times when the standard binders justdon'twork). The Boost library and features in updates to the C++ standard and especially some of the functional magic going on in Alexandrescu's techniques are all helpful in this regard.
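To make the before-and-after concrete, here is a minimal C++ sketch using the canonical shapes. The names are illustrative assumptions: `makeShape` and `render` are made up for this example, and `draw` returns a description instead of actually rendering anything. Note that the one conditional that survives is the one in the factory that creates the objects in the first place.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Base class: callers only ever see this interface.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string draw() const = 0;  // returns a description instead of rendering
};

class Circle : public Shape {
public:
    std::string draw() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string draw() const override { return "square"; }
};

class Triangle : public Shape {
public:
    std::string draw() const override { return "triangle"; }
};

// The conditional that remains: something has to pick the concrete type.
std::unique_ptr<Shape> makeShape(const std::string& kind) {
    if (kind == "circle")   return std::make_unique<Circle>();
    if (kind == "square")   return std::make_unique<Square>();
    if (kind == "triangle") return std::make_unique<Triangle>();
    throw std::invalid_argument("unknown shape: " + kind);
}

// Every other call site collapses to a single virtual call.
std::string render(const Shape& shape) {
    return shape.draw();
}

// Usage: the caller never tests the concrete type.
void example() {
    assert(render(*makeShape("circle")) == "circle");
    assert(render(*makeShape("triangle")) == "triangle");
}
```

Adding a fourth shape means one new class and one new line in `makeShape`; `render` and every other caller stay untouched.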
Continue reading "Conditionals and polymorphism"
Am I insane?
I recently sat through a rant against OOD because of its (1) over-abstraction and (2) inefficiency. The ranter in question was seething over some guy that was in love with his own code because he designed detailed object relationships. It's tough to argue about code you haven't seen, a coder you don't know, and all of this in relation to a common ailment with coders. I'll only relate this as a simile, but it's like impugning an artist for being in love with their own work. That's a given. However, that being said, I've heard too often that OOD is anathema.
From the Design Patterns [Amazon] book to Modern C++ Design [Amazon], I've been frustrated that the technique I use to implement the visitor pattern is not offered or even refuted. I'm either on to something or, quite the opposite, off of something. Or maybe simply on something.
Continue reading "A truly decoupled visitor"
Required reading for C++ programmers: Alexandrescu's Modern C++ Design. I've been re-reading his chapter on generalized functors in the hope of seeing how I could have improved my recent code with abstracted database access. I wrapped specific MFC database actions in functions or function objects that are passed to a main function for execution. The main function catches the different MFC errors and checks return codes, then converts them into a custom exception hierarchy. This simplified the main code by eliminating multiple exception handlers around the code and multiple error checks inside the code. However, my function objects had limited versatility because their signatures had to stay the same.
Alexandrescu's generalized functors would have been ideal. He has created a small library that can wrap every type of callable entity (C functions, C function pointers, references to functions, functors, and member functions), along with parameter binding, in a single template. Very nice.
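A rough sketch of the wrapping approach, written in standard C++ rather than MFC. Everything named here is a stand-in: `LowLevelError` plays the role of a database library exception, `DataError` and `ConnectionError` are a made-up custom hierarchy, and `execute` is a hypothetical main function. It uses a plain variadic template instead of Alexandrescu's functor library, which is enough to show how the fixed-signature limitation goes away.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for a low-level library error that carries only a numeric code.
struct LowLevelError {
    int code;
};

// Hypothetical custom hierarchy that the rest of the application catches.
class DataError : public std::runtime_error {
public:
    explicit DataError(const std::string& message) : std::runtime_error(message) {}
};

class ConnectionError : public DataError {
public:
    explicit ConnectionError(const std::string& message) : DataError(message) {}
};

// Runs any callable that returns bool, with any arguments, and translates
// low-level failures and bad return codes into the custom hierarchy in one place.
template <typename Action, typename... Args>
void execute(Action&& action, Args&&... args) {
    try {
        bool ok = std::forward<Action>(action)(std::forward<Args>(args)...);
        if (!ok) throw DataError("action reported failure");
    } catch (const LowLevelError& error) {
        if (error.code == 1) throw ConnectionError("connection lost");
        throw DataError("low-level error " + std::to_string(error.code));
    }
}

// Two actions with different signatures, both usable with execute().
bool insertRow(int id) {
    if (id < 0) throw LowLevelError{1};
    return true;
}

bool ping() { return false; }

// Usage: callers handle only the custom exception types.
void example() {
    execute(insertRow, 5);  // succeeds quietly
    bool translated = false;
    try {
        execute(insertRow, -1);
    } catch (const ConnectionError&) {
        translated = true;
    }
    assert(translated);
}
```

The calling code ends up with a single `catch (const DataError&)` instead of a handler and a return-code check around every database call.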
Continue reading "Generalized functors"
A co-worker is working on a VB form that reads custom XML files that define layout and validation. It will communicate with a back-end database and the layout will presumably be tweaked by the customer. This is an extremely common task. In my previous job, I wrote a system on CE that used HTML for layout, XML for data, and JavaScript for validation (sadly, I just heard that upon my leaving, it was declared "unuseable" by the other programmer on the project and completely rewritten). Before that, I worked at a company whose main product was an IDE that allowed users to design and script windows for simulations. The goal in both of those was to allow the client to create applications within a limited domain and with varying levels of complexity. It's unfortunate that these tools probably cannot be made generic enough for portability between projects. That "limited domain" is always so different that very little intersection exists.
Two of the best features of C++ exception handling are transparency and the propagation of rich content. With good design, exceptions can eliminate intrusive error checking and the related structural code required to support that checking, making your error handling effectively transparent. Along with that structural code, error information based on POD types is replaced with class hierarchies that can provide a richer set of information on errors. I'm still refactoring code at work and have just run across a good example of how exception handling simplified an area I was working on.
Continue reading "Exception handling in a limited domain"
I recently summarized a few thoughts about merging the top-down and bottom-up systems of WordNet and tagging as a good solution to fixing the brittleness of a tagged ontology. Clay Shirky suggests that combinatorial principles [Wikipedia] could be used to find the ontology inherent in the set of tags. Instead of merging a semantic hierarchy with search engine predicate calculus, the semantics would be derived from the likelihood that tags occur in specific combinations.
In natural language processing [Wikipedia], many parts of speech taggers will use a similar method (as a hidden Markov model) to tag unknown words based on the surrounding, known words.
It's a compelling argument, and I should have considered it.
Still, if the comparison can be continued, there are no completely probabilistic POS taggers. They at least will know about syntactic items such as articles or common morphemes in order to tag the unknown words. It's a popular method, though, and the principle could be useful for tagged ontologies.
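One way to picture the combinatorial idea is a simple co-occurrence count: if nearly every item tagged cheetah is also tagged cats, the system can guess that one implies the other without being told. A minimal sketch, assuming items are just sets of tags. The `conditional` helper and the sample data are hypothetical, and this is plain counting, neither Shirky's proposal in detail nor a hidden Markov model.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Tags = std::set<std::string>;

// Estimate P(implied | given): of the items tagged `given`,
// what fraction are also tagged `implied`?
double conditional(const std::vector<Tags>& items,
                   const std::string& given, const std::string& implied) {
    int withGiven = 0;
    int withBoth = 0;
    for (const auto& tags : items) {
        if (!tags.count(given)) continue;
        ++withGiven;
        if (tags.count(implied)) ++withBoth;
    }
    return withGiven == 0 ? 0.0 : static_cast<double>(withBoth) / withGiven;
}

// Usage: the asymmetry hints at which tag is the broader one.
void example() {
    std::vector<Tags> items{
        Tags{"cheetah", "cats"},
        Tags{"cheetah", "cats", "africa"},
        Tags{"cats", "pets"},
        Tags{"lion", "cats"},
    };
    assert(conditional(items, "cheetah", "cats") == 1.0);  // cheetah always comes with cats
    assert(conditional(items, "cats", "cheetah") == 0.5);  // cats only sometimes with cheetah
}
```

A high score in one direction and a low one in the other suggests a broader-narrower relationship, which is roughly the hierarchy a taxonomy would otherwise have to supply by hand.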
Continue reading "The alternative to emergent ontologies"
Joel has a new rant about Hungarian notation and exceptions. In it, he outlines the principle that you should code in a manner that makes bad code more apparent. He uses an example of passing string values that must have special formatting in some instances and not others (safe string variable, sValue, and unsafe string variable, usValue). As usual, he goes step-by-step to slowly improve a bad situation. While he generally surprises me by going further than I can imagine with techniques to clean up the code, this time he stopped too soon.
He argued that Hungarian notation would alert programmers when they attempt to copy a normal string variable to a location that requires a formatted string variable (usValue = sValue, wrong). But in his frenzy to praise Hungarian notation (offering up yet another retelling of the history of it) and complain about poor operator overloading (hint: it's bad) he failed to provide a real solution to the problem.
Yeah, tagging variables with clues about their appropriate use is nice, but why not just create classes that forbid inappropriate use? Instead of staying in the realm of data error, use your compiler to notify you of errors.
class String { /* ... */ };
class FormattedString { /* ... */ };
String value1;
FormattedString value2;
// This won't compile: no conversion exists from FormattedString to String.
value1 = value2;
Tada. Am I missing some other point he was trying to make?
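Fleshed out slightly, the idea might look like the sketch below. The class names follow the snippet above, but their contents, the `format` helper, and the HTML-escaping rule are assumptions made for illustration; none of it comes from Joel's article. The `static_assert` checks state that neither type can be assigned to the other, so the only route from raw to formatted text is the explicit conversion.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Raw text exactly as it arrived from the user.
class String {
public:
    explicit String(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// Text that has been escaped and is safe to write into a page.
class FormattedString {
public:
    const std::string& text() const { return text_; }
private:
    explicit FormattedString(std::string text) : text_(std::move(text)) {}
    friend FormattedString format(const String& raw);
    std::string text_;
};

// The only way to obtain a FormattedString: escape the raw text explicitly.
FormattedString format(const String& raw) {
    std::string escaped;
    for (char c : raw.text()) {
        switch (c) {
            case '&': escaped += "&amp;"; break;
            case '<': escaped += "&lt;";  break;
            case '>': escaped += "&gt;";  break;
            default:  escaped += c;
        }
    }
    return FormattedString(escaped);
}

// Mixing the two types is a compile-time error, not a runtime surprise.
static_assert(!std::is_assignable<String&, FormattedString>::value,
              "FormattedString must not be assignable to String");
static_assert(!std::is_assignable<FormattedString&, String>::value,
              "String must not be assignable to FormattedString");

// Usage: the conversion is visible at the call site.
void example() {
    String raw("a < b & c");
    FormattedString safe = format(raw);
    assert(safe.text() == "a &lt; b &amp; c");
}
```

Because the `FormattedString` constructor is private, a function that takes a `FormattedString` parameter cannot be handed unescaped text by accident; the compiler does the job the naming convention was asking the reader to do.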
I had previously lamented the limitations of a semantically empty tagging system (e.g. del.icio.us), and suggested that incorporation of the wonderful WordNet would solve that. The same tag system would be used, but a search tool would be added that allows ontological searches (e.g. a search "flower" would match the "Hydrangea" tag).
Before that post, I also rambled about tagging as a subset of centralized emergent systems on the Web (e.g. flickr or MySpace) as opposed to distributed systems (I had suggested harnessing existing Web sites to create more specific tools, a la Paul Rademacher's stunning use of Google Maps to automatically map Craig's List apartment listings).
The benefit of an emergent system is that you only need to define the rules of the system, not the content. AI long ago discovered the brittleness of creating top-down systems (e.g. write every rule of logic) compared to the fuzzy elegance of bottom-up systems (have genetic algorithms discover those logic rules). Often, these two approaches are merged to harness the benefits of both. Tags seem to be a dumb form of bottom-up system. This could be merged with WordNet (or something similar). I originally felt that the deficiencies of tagging were fixable-but-flawed. Based on a project I'm about to start, I'm now beginning to think that emergent systems are the best method for creating Web content that will evolve. Rather than being "brittle" ontologies (as Tim Bray suggested), I think tags should be viewed more as emergent ontologies.
Continue reading "Tags and emergent systems"
Options available in representing date and time values:
Continue reading "Representing time in C, C++, SQL, MFC, and Java"
I recently had an interesting, small-scale class design that takes advantage of some template tomfoolery. Alexandrescu's Modern C++ Design has some great concepts for compile-time polymorphism, but they're difficult to absorb without implementing. I've used policies many times before but decided to take a slightly different approach this time.
Continue reading "Compile- and run-time polymorphism"
Amazon's A9 is pretty sexy (and loved by all who obsess over short domain names). I am such a Google-head that I don't use A9 enough on reflex. Now, they have a new-ish technology called OpenSearch. It defines requirements for exposing a site's searches as XML--thus turning queries into aggregatable plug-ins. They also point out that existing search engines can be wrapped so that HTML search results are translated into OpenSearch RSS results, although I think that you would just be screen-scraping and rewriting the content.
Caught a little of Donald Knuth [Wikipedia] on Morning Edition this morning (so there is some benefit to getting up early). They have the audio here. Hey! He looks a little like Larry David. With the story, I was once again reminded that I need to purchase The Art of Computer Programming [Wikipedia]. Amazon has volumes 1 through 3 new for $164.99 or used for ~$100. When's that first paycheck come in?
Computer types seem fascinated by Postmodernism
In an essay by the always-wonderfully expressive Paul Ford, he says that [m]any computer types seem fascinated by Postmodernism.
Guilty. He includes an unusually inexpressive (ok, almost always) few paragraphs on the subject of geeks and Postmodernism [Wikipedia]. I have my own idea on the subject based on the work I had done with natural language processing. I hadn't considered that many other geeks were so inclined, but there you have it.
A summary of some upgrades to RadioWave:
Continue reading "Fixes and upgrades to RadioWave"
Quick collection of links relevant to the recent release of Google! Maps! BETA!!
The maps:
The commentary and analysis:
Needless to say, Google's solution is elegant and standards compliant. Beautiful. I'm angry at Opera for not supporting the XSLT technology required by the site. I'm getting (slightly) tempted to switch to Firefox.
Complaints about Google's use of JavaScript:
Some hacks via SourceForge.
Tomcat's a go, but there are many little problems with the RadioWave code itself that got ignored when I became Network Administrator. Yay: back to being coder...
Working on the Web server now.
Continue reading "RadioWave offline for upgrade"
Here's what not to do when trying to upgrade your server software: get distracted with another programming project and exacerbate the original problem that required the upgrade. Sometimes, I'm an undisciplined fool.
Continue reading "Tomcat: 1, Scott: 0"
A few weeks back I got fed up with hacking through my Tomcat [Wikipedia] installation and its various configuration files--with only online flotsam to guide me--and went on the hunt for a book. Yes, a book made of paper and not bits. I went to B&N to use one of those new-fangled gift cards. A book bought from a physical store? Now I've seen everything. Mastering Tomcat Development [Amazon] looked like the very thing, but at $45 I was dubious. So I browsed Amazon with my phone and found it for ~$9.
Online purchasing: 1. Brick and mortar: 0.
So I'm now digging in to update my Tomcat installation to 4.x and fix the hacked configurations that grew from necessity and convenience. The book has very clear explanations going from square zero of where to download the files through configuration, Java servlet programming, JDBC, JSP, Tags, Struts, etc. I bought it for an explanation of the finer points of Tomcat configuration, but it will be a nice addition to my Core Java [Amazon] programming books. Expect some downtime as I move from Tomcat 3.x.
Continue reading "Book: Mastering Tomcat Development"
I got some mysterious failed and half-recorded recordings last night from RadioWave, and the logs only go back a few hours. Apologies to anybody who actually wanted it to work correctly.
(Debug debug debug ...)
I just finished my final phone interview for a day that began at 9:00 AM and went non-stop until now. If anyone else asks me what a template is I'm going to crap a pair of angled brackets. I need ... (footsteps going away) ... (and back) ... have a drink. Ah. Thank goodness for left-over party beer.
The prospects are good for < a week searching. I did better technically than I expected (performance anxiety) ... although I think I pulled a Larry David with two of them: make a good initial impression then do something uncomfortable like compliment their son's penis. Gotta work on my social skills.
To The Vortex! Forthwith!!
Ed Felten and Alex Halderman over at Freedom to Tinker wrote the world's smallest P2P application in 15 lines of Python. Fifteen lines of anything can be pretty obfuscated, so Richard Jones clarified the intent of the code. The code can be run as a server (serving its own files and connecting to other servers to also serve their files) or as a client (able to connect to and download from a network of servers). Connections are password-protected.
Wow.
I'm sorry I have not really tested this, but this was too cool not to pass on. A source code search engine. It may-or-may-not contain cool stuff, but I'm sure it's worth the effort to find out. I was just praising the breadth and depth of SourceForge as a resource for the hapless programmer, while bemoaning its lack of organization ("I need a compression algorithm..."). Koders may answer that.
It's similar-but-different than the code library index I've been slowly building from the C/C++ Users Journal.
Continue reading "Code search"
This entry is a repository for useful code libraries that have been presented in C/C++ Users Journal. The article name links to the zipped file, on CUJ's site, of that issue's source code. The issue date links to the CUJ page for that issue. This will be regularly updated.
Continue reading "Code libraries from C/C++ Users Journal"
In the current issue of Queue, the magazine of the Association for Computing Machinery (ACM), there's an article titled "Natural programming languages and environments." The article is written by a group working to create programming languages and environments that are ... closer to the way people think about their tasks.
This page contains an organized list of Web layout references and tools based on HTML and CSS. An earlier version of the source is available in a previous entry.
Continue reading "CSS Reference"
It's begun: Employees readying class-action lawsuit against EA.
Continue reading "Time and again"
Steve Litt at Troubleshooters.com has this page where he presents an algorithm for calculating prime numbers and then steps through optimizations. The algorithm is in C (very well written) with explanations for each step.
The /. article has some further comments on Litt's code along with some useful links.
[ via /. -> Fun With Prime Numbers ]
Continue reading "Prime geek"
This entry is a repository of links and instructions covering how to install MediaWiki on a Windows 2000 IIS machine (ongoing).
An acquaintance needed some email addresses that were listed on a Web site (no, no spamming involved, just business v. business) and asked me to extract them. They were hidden behind an interesting bit of encoding that had to be worked around. What to do?
I guess the skilz weren't all that mad. And maybe the 'z' is unwarranted, too, but the process was interesting. I have, however, left out most of the specific names to avoid any unlikely-but-unwanted recognition.
Continue reading "I got mad skilz"
WebReference.com has a new article on the semantic Web.
Continue reading "More on the Semantic Web"
This is a simple search, and a simple fix, but I just had to re-learn it so I'll post it here.
When performing a disk cleanup on any machine older than two days, you'll probably be forced to wait for several minutes for the process to compress old files. If this is important to you, enjoy a free coffee break. If, however, the gain is not worth the wait, you can eliminate this step from the process.
Go into the Registry and navigate to this key:
And then delete the subkey "Compress old files". The Google search for "compress old files" registry contains more detailed articles on this.
I finally sat down after work Friday to fix the IE problem. The right column boxes (currently etc.) were resizing irregularly and left justified instead of right. It might have been caused by a template change I made when I added the Comic Page of the Week, it might have been by a CSS change when I applied styles to all of the images, or it might have been just some crap IE problem that surfaced to waste my time. It could have been a lot of things and could have taken up my night (which I value).
However, in less than 30 minutes I had it fixed and republished (what's the emoticon for patting yourself on the back?) and am here to impart wisdom. And, of course, to write this down so that I might retain some of the wisdom.
Continue reading "Why do the facts hate IE? part 2"
Paul Ford has begun a series of articles chronicling his conversion of Congress' Web site into semantic Web content. It's a quick read with clean, complete examples detailing his process and with clear insights into a Web guru's approach to this difficult task. It's a labor of interest more than usefulness: he wants to prove to himself and others the benefits of the semantic Web, and he knows that his solution (a hack based on screen-scraping) can easily go out-of-date. Nevertheless, the process is fascinating.
Continue reading "Semantic government"