
WorldWide Drilling Resource® JULY 2019

Common Cause
by Britt Storkson
Owner, P2FlowLLC

Consider Air France Flight 447, Asiana Airlines Flight 214, Lion Air Flight 610, Ethiopian Airlines Flight 302, the USS McCain, the USS Fitzgerald, and many others.

Question: What do all of these aircraft and military ships have in common?
Answer: All have been suspected of having computer control systems that were flawed in such a way as to cause death and considerable property damage.

These failures were not failures in the classic sense, meaning something disconnected or disintegrated and in turn caused the crash or collision; those could be called equipment failures. These could be described more accurately as system failures. That is, not flaws in the equipment itself, but flaws in how the equipment was put together and implemented. These include system overcomplexity, expecting the computer to do things it wasn't programmed to do, not fully understanding how the various systems interact with each other, and inadequate testing before putting the computer controls into service. This could be called a "mix" of issues, and often one problem compounds the other problems within the systems.

Software and hardware overcomplexity greatly increases the time and cost of testing this equipment. Since product testing doesn't generate immediate cash flow, it is often relegated to minor status when it should be a major issue. I spend far more time testing products than making them, and that's the way it should be. I've observed well-known and highly regarded "cutting-edge" companies spending huge sums of money on flawed computer systems that, while costing the company dearly, did not result in injury or loss of life, because human lives did not depend on those products working properly. When human lives do depend on these systems working correctly and the systems malfunction even for an instant, lives are lost.
Often, these system operators must "cover for" the flawed computer. They have to remember that this function doesn't work, or that function causes the computer to "lock up," so don't use it. Why? Because it's cheaper to place the burden of making the equipment work right on the operator than it is to fix the computer. This raises the question: If the operator must "cover for" the flawed equipment, why have the equipment in the first place? Most operators are highly qualified and up to the task, but no human can learn and recall everything instantly with perfect memory retention. The problem is compounded by vendors constantly changing how the equipment works, so the operator has to be "up to speed" on version 1, then version 2 a few months later, and on and on. Constantly changing the system operation sets up the operator for failure, which is something we don't want.

After an incident resulting in the failure of a system or process, companies will often write a "postmortem" detailing the problems leading up to the failure and recommending "fixes" so it won't happen again. This is a great approach that should be practiced everywhere.

Britt

Britt Storkson may be contacted via e-mail to michele@worldwidedrillingresource.com

The important thing is to never stop questioning. ~Einstein
