17 November 2004

Natural programming

In the current issue of Queue, the magazine of the Association for Computing Machinery (ACM), there's an article titled "Natural programming languages and environments." The article is written by a group working to create programming languages and environments that are ... closer to the way people think about their tasks.

Several people in the /. discussion didn't really get the concept of using more natural language as a programming language. They got stuck in the mistaken idea that it denotes constructs such as "Set the variable X with the value 3.14" or "Loop X beginning at 1 and ending with 10." What most research along these lines focuses on is eliminating variables and simplifying structures by focusing on the actual intent: "Print the first 10 pages." There are obvious difficulties with this approach (or else full-fledged solutions would exist today), but these researchers are there to discover the limitations and possibilities.

As a simple example of how programming languages are more complex than the users' intent, they looked at summing a list of numbers. Their example in C uses three kinds of parentheses and three kinds of assignment operators in five lines of code, yet a spreadsheet requires only the SUM() function. The authors think that this could be extended even further. Interestingly, the C++ standard library already provides this first level of advance (using the std::accumulate algorithm from the numeric header).
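To make the contrast concrete, here is a rough sketch of both styles in C++. It is not the code from the article: the function names sum_with_loop and sum_with_accumulate are made up for illustration, and the loop version only approximates what the authors' C example might look like.

```cpp
#include <numeric>
#include <vector>

// Explicit loop: the reader has to decode an index, a bound, and an
// accumulator before the intent ("add these up") becomes visible.
int sum_with_loop(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Same result, stated closer to the intent. The third argument is the
// starting value of the sum.
int sum_with_accumulate(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}
```

Both functions return the same total for the same input; the second just says so in one line, which is about as close as C++ gets to the spreadsheet's SUM().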

The authors discuss solutions using their graphical programming language called HANDS and another called Alice. They performed studies with children and adults by asking them to solve problems on paper involving, respectively, Pac-Man and database access. They found the following:

  • An event-based or rule-based structure was often used, where actions were taken in response to events. For example, "When PacMan loses all his lives, it's game over."
  • Aggregate operators (acting on a set of objects all at once) were used much more often than iterating through the set and acting on the objects individually. For example, "Move everyone below the 5th place down by one."
  • Participants rarely used Boolean expressions, but when they did they were likely to make errors. That is, their expressions were not correct if interpreted according to the rules of Boolean logic in most programming languages.
  • Participants often drew pictures to sketch out the layout of the program, but resorted to text to describe actions and behaviors.
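The aggregate-versus-iteration finding is the easiest one to show in code. Below is a small C++ sketch of "Move everyone below the 5th place down by one" written both ways. Everything in it is my own assumption, not the article's: the Entry struct, the function names, and reading "below the 5th place" as "a place number greater than 5." It needs a C++11 compiler for the lambda.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for a high-score table; not from the article.
struct Entry {
    std::string name;
    int place;  // 1 is the top of the table
};

// Iterative style: walk the set by index and act on each element.
void move_down_with_loop(std::vector<Entry>& table, int cutoff) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].place > cutoff) {
            table[i].place += 1;
        }
    }
}

// Aggregate style: one operation applied to the whole set, with the
// rule ("below the cutoff, move down by one") kept in a single place.
void move_down_aggregate(std::vector<Entry>& table, int cutoff) {
    std::for_each(table.begin(), table.end(), [cutoff](Entry& e) {
        if (e.place > cutoff) {
            e.place += 1;
        }
    });
}
```

The second version is still a long way from the sentence the participants wrote, but the index and the loop bound are gone, and those are exactly the parts the study suggests people don't naturally think about.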

They also point out that problems occur either from a flaw in the programmer's solution or a flaw in their implementation. Makes sense. When these problems occur, the programmer then asks "why did" 32% of the time and "why didn't" 68% of the time. Even better, 50% of all errors were due to programmers' false assumptions in the hypotheses they formed while debugging existing errors. Yowch. In Steve McConnell's book Code Complete [Amazon], he cites a study showing that 25% to 66% of the bugs being fixed will introduce a new bug (i.e. of 12 bugs being fixed, 3 to 8 new ones will be introduced). This is less than the ACM numbers, but McConnell states that all studies show great variability--I assume based on domain and programmer quality.

To assist in debugging--this is cool--the Alice environment provides a "why did" and "why didn't" menu that highlights the areas in code that would answer those questions. It also offers a "why line" that flow-charts the code that affects those areas. These types of tools are amazing accomplishments and could probably be built for today's languages, at least in simpler domains.

[ via /. -> ACM Queue ]

[ posted by sstrader on 17 November 2004 at 1:39:32 PM in Programming ]