Bill Totten's Weblog

Wednesday, September 19, 2007

Who Needs Hackers?

The biggest threat to increasingly complex systems may be the systems themselves.

by John Schwartz

New York Times (September 12 2007)

Nothing was moving. International travelers flying into Los Angeles International Airport - more than 17,000 of them - were stuck on planes for hours one day in mid-August after computers for the United States Customs and Border Protection agency went down and stayed down for nine hours.

Hackers? Nope. Though it was the kind of chaos that malevolent computer intruders always seem to be creating in the movies, the problem was traced to a malfunctioning network card on a desktop computer. The flawed card slowed the network and set off a domino effect as failures rippled through the customs network at the airport, officials said.

Everybody knows hackers are the biggest threat to computer networks, except that it ain't necessarily so.

Yes, hackers are still out there, and not just teenagers: malicious insiders, political activists, mobsters and even government agents all routinely test public and private computer networks and occasionally disrupt services. But experts say that some of the most serious, even potentially devastating, problems with networks arise from sources with no malevolent component.

Whether it's the Los Angeles customs fiasco or the unpredictable network cascade that brought the global Skype telephone service down for two days in August, problems arising from flawed systems, increasingly complex networks and even technology headaches from corporate mergers can make computer systems less reliable. Meanwhile, society as a whole is growing ever more dependent on computers and computer networks, as automated controls become the norm for air traffic, pipelines, dams, the electrical grid and more.

"We don't need hackers to break the systems because they're falling apart by themselves", said Peter G Neumann, an expert in computing risks and principal scientist at SRI International, a research institute in Menlo Park, California.

Steven M Bellovin, a professor of computer science at Columbia University, said: "Most of the problems we have day to day have nothing to do with malice. Things break. Complex systems break in complex ways."

When the electrical grid went out in the summer of 2003 throughout the Eastern United States and Canada, "it wasn't any one thing, it was a cascading set of things", Dr Bellovin noted.

That is why Andreas M Antonopoulos, a founding partner at Nemertes Research, a technology research company in Mokena, Illinois, says, "The threat is complexity itself".

Change is the fuel of business, but it also introduces complexity, Mr Antonopoulos said, whether by bringing together incompatible computer networks or simply by growing beyond the network's ability to keep up.

"We have gone from fairly simple computing architectures to massively distributed, massively interconnected and interdependent networks", he said, adding that as a result, flaws have become increasingly hard to predict or spot. Simpler systems could be understood and their behavior characterized, he said, but greater complexity brings unintended consequences.

"On the scale we do it, it's more like forecasting weather", he said.

Kenneth M Ritchhart, the chief information officer for the customs and border agency, agreed that complexity was at the heart of the problem at the Los Angeles airport. "As we move from stovepipes to interdependent systems", he said, "it becomes increasingly difficult to identify and correct problems".

At first, the agency thought the source of the trouble was routers, not the network cards. "Many times the problems you see that you try to correct are not the root causes of the problem", he said.

And even though his department takes the threat of hacking and malicious cyberintruders seriously, he said, "I've got a list of sixteen things that I try to address in terms of outages - only one of them is cyber- or malicious attacks". Others include national power failures, data corruption and physical attacks on facilities.

In the case of Skype, the company - which says it has more than 220 million users, with millions online at any time - was deluged on August 16 with login attempts by computers that had restarted after downloading a security update for Microsoft's Windows operating system. A company employee, Villu Arak, posted a note online that blamed a "massive restart of our users' computers across the globe within a very short time frame" for the 48-hour failure, saying it had overtaxed the network. Though the company has software to "self-heal" in such situations, "this event revealed a previously unseen software bug" in the program that allocates computing resources.
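To see why a synchronized restart is so much harder on a login service than the same number of users arriving gradually, consider a rough sketch. It is not Skype's software or its actual numbers: the client count, the 100-second window and the function names are all assumptions chosen for illustration, and the random delay is a generic load-spreading idea rather than anything the company described.

```python
import random
from collections import Counter

def peak_logins_per_second(start_times):
    """Return the largest number of login attempts landing in any one-second bucket."""
    buckets = Counter(int(t) for t in start_times)
    return max(buckets.values())

def synchronized_restart(num_clients):
    # Hypothetical worst case: every client reboots and logs in at the same moment.
    return [0.0] * num_clients

def jittered_restart(num_clients, window_seconds, seed=0):
    # Hypothetical alternative: each client waits a random delay within the window.
    rng = random.Random(seed)
    return [rng.uniform(0, window_seconds) for _ in range(num_clients)]

clients = 10_000  # illustrative figure, not a Skype statistic
burst = peak_logins_per_second(synchronized_restart(clients))
spread = peak_logins_per_second(jittered_restart(clients, window_seconds=100))
print("peak with synchronized restart:", burst)
print("peak with random delays:", spread)
```

The total number of logins is identical in both cases; only the peak differs, and it is the peak that a network has to survive.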

As computer networks are cobbled together, said Matt Moynahan, the chief executive of Veracode, a security company, "the Law of the Weakest Link always seems to prevail". Whatever flaw or weakness allows a problem to occur compromises the entire system, just as one weak section of a levee can inundate an entire community, he said.

This is not a new problem, of course. The first flight of the space shuttle in 1981 was delayed minutes before launching because of a previously undetected software problem.

The "bug heard round the world", as a former NASA software engineer, John B Garman, put it in a technical paper, came down to a failure that would emerge only if a certain sequence of events occurred - and even then only once in 64 times. He wrote: "It is complexity of design and process that got us (and Murphy's Law!). Complexity in the sense that we, the 'software industry', are still naive and forge into large systems such as this with too little computer, budget, schedule and definition of the software code."
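A little arithmetic suggests how a one-in-64 failure can slip through testing. The sketch below assumes, purely for illustration, that each test run independently has a 1-in-64 chance of hitting the problem; the real shuttle bug also depended on a particular sequence of events, so this is a simplification and the function name is hypothetical.

```python
def chance_bug_never_appears(trials, p_fail=1 / 64):
    """Probability that a bug with per-run chance p_fail stays hidden for all trials.

    Assumes independent runs, which is a simplification of the real case.
    """
    return (1 - p_fail) ** trials

for n in (1, 10, 50, 100):
    print(n, "runs:", round(chance_bug_never_appears(n), 3))
```

Under that assumption, ten clean runs in a row would happen about 85 percent of the time, so a short series of successful tests says little about whether the flaw is there.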

In another example, the precursor to the Internet known as the Arpanet collapsed for four hours in 1980 after years of smooth functioning. According to Dr Neumann of SRI, the collapse "resulted from an unforeseen interaction among three different causes" that included what he called "an overly lazy garbage collection algorithm" that allowed the errors to accumulate and overwhelm the fledgling network.

Where are the weaknesses most likely to have grave consequences? Every expert has a suggestion.

Aviel D Rubin, a professor of computer science at Johns Hopkins University, said that glitches could be an enormous problem in high-tech voting machines. "Maybe we have focused too much on hackers and not on the possibility of something going wrong", he said. "Sometimes the worst problems happen by accident".

Dr Rubin, who is director of the Center for Correct, Usable, Reliable, Auditable and Transparent Elections, a group financed by the National Science Foundation to study voting issues, noted that glitches had already shown up in many elections using the new generation of voting machines sold to states in the wake of the Florida election crisis in 2000, when the fate of the national election came down to issues like hanging chads on punch-card ballots.

Dr Bellovin at Columbia said he also worried about what might happen with the massively complex antimissile systems that the government is developing. "It's a system you can't really test until the real thing happens", he said.

There are better ways.

Making systems strong enough to recover quickly from the inevitable glitches and problems can keep disruption to a minimum. The customs service came under some of the most heated criticism for not having a backup plan that could quickly compensate for the network flameout; eventually, airport officials had to provide fuel to the planes so that the airlines could run the air-conditioning, and provided food, beverages and diapers to the trapped passengers.

Mr Ritchhart said it was unfair to characterize his department as having no backup plan. In fact, there were two - but neither addressed the problem. The main backup plan envisions a shutdown of the national customs network, and allows local networks to function independently. Since it was the local network that was in trouble at Los Angeles, he said, that backup plan did not work.

The other fallback involves setting up customs agents with laptops that are equipped to scan the millions of names on the watchlists and to perform other functions. That system was put in place, he said, but the laptops operate at one-third the speed of the computer network, and the delays persisted. The agency is reviewing its policies to improve its response, he said, and is considering having agents call colleagues in other cities to perform searches on functioning parts of the network if a similar slowdown occurs.

The best answer, Dr Neumann says, is to build computers that are secure and stable from the start. A system with fewer flaws also deters hackers, he said. "If you design the thing right in the first place, you can make it reliable, secure, fault tolerant and human safe", he said. "The technology is there to do this right if anybody wanted to take the effort".

He was part of an effort that began in the 1960s to develop a rock-solid network operating system known as Multics, but those efforts gave way to more commercially successful systems. Multics' creators were so farsighted, Dr Neumann recalled, that they even anticipated and prevented the "Year 2000" problem that had to be corrected in other computers. That flaw, known as Y2K, caused some machines to malfunction when they encountered dates on or after January 1 2000. Billions of dollars were spent to prevent problems.
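The Y2K flaw came from storing years as two digits, so that 2000 looked like "00". A minimal sketch of the effect follows; the function names and the sample years are hypothetical and are not drawn from Multics or any particular system.

```python
def years_elapsed_two_digit(start_yy, end_yy):
    # Flawed approach: both years are stored as two digits (65 means 1965).
    return end_yy - start_yy

def years_elapsed_four_digit(start_year, end_year):
    # Corrected approach: store the full year.
    return end_year - start_year

print(years_elapsed_two_digit(65, 99))       # 1965 to 1999: 34
print(years_elapsed_two_digit(65, 0))        # 1965 to 2000 stored as 00: -65
print(years_elapsed_four_digit(1965, 2000))  # full years: 35
```

A negative span like the one in the middle line is the sort of result that could upset interest calculations, expiry checks or sorting once the calendar rolled over.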

Dr Neumann, who has been preaching network stability since the 1960s, said, "The message never got through". Pressures to ship software and hardware quickly and to keep costs at a minimum, he said, have worked against more secure and robust systems.

"We throw this together, shrink wrap it and throw it out there", he said. "There's no incentive to do it right, and that's pitiful".


