Like Surfside, are you “inspecting to failure”?
In June 2021, we learned about the partial collapse of a condominium in Surfside, Florida, with devastating results: only a few survivors and many fatalities. The building had been inspected in 2018 by a qualified engineer, and parts of it were deemed to be “structurally unsound”. In plain English, that means the building’s structure was no longer in good condition and was possibly dangerous to use. The report was submitted to a local authority, which raised no red flags, and the building owners chose to do nothing about it. Roughly 3 years later came disaster. It’s a sad example of “inspecting to failure”.
As you might expect, “failing to inspect” critical assets can allow conditions to develop undetected until the asset fails. Not checking the air pressure in the spare tire in your car’s trunk could leave you with exactly the inconvenience the spare is intended to avoid.
“Inspecting to failure” is subtly different. You actually do the inspection, but then you ignore what you find and fail to act on the problems it reveals.
Inspecting our physical assets tells us whether they are still capable of doing what they are supposed to do. If they are incapable or deteriorating, we should act on that knowledge to avoid the consequences that occur when the asset eventually fails. That is the essence of Condition Monitoring (inspecting) and Condition Based Maintenance (acting on the findings).
In Surfside the consequences were lethal. The failure to act on inspection findings, however, is actually quite commonplace. The same failure occurs in buildings, factories, processing plants, fleets of vehicles, infrastructure assets, transportation assets, utilities, etc. Their owners either fail to inspect, or inspect to failure.
In some instances the consequences could also be lethal; in others, they won’t be. Invariably, the failures that occur disrupt normal operations and cause inconvenience, loss of revenue, and possibly injuries, death, or environmental damage. The whole purpose of the inspections is NOT to avoid the failures themselves; it is to avoid their consequences.
Inspections and condition monitoring cannot reveal a problem until the problem has already begun to form. Many failures take some time to progress from a small defect to a full-blown failure. If it is carried out at the right inspection intervals, Condition Monitoring will catch those failures after they have begun, but before they’ve progressed to a catastrophic conclusion.
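As a rough illustration of why the interval matters, here is a minimal sketch in Python. It assumes that inspections happen at a fixed interval, that a defect becomes detectable at a random moment unrelated to the inspection schedule, and that any inspection falling between that moment and the final failure will find the defect. The function name and the numbers are made up for illustration; this is not a method for setting real inspection intervals.

```python
def chance_of_catching_defect(inspection_interval, deterioration_time):
    """Fraction of defects caught before failure, under simplified assumptions.

    Assumptions (illustrative only):
      - inspections occur at a fixed interval;
      - a defect becomes detectable at a random time, unrelated to the schedule;
      - any inspection between "detectable" and "failed" finds the defect.
    Both arguments use the same time unit (e.g. months).
    """
    if inspection_interval <= 0 or deterioration_time < 0:
        raise ValueError("interval must be positive, deterioration time non-negative")
    # If the interval fits inside the deterioration window, an inspection is
    # guaranteed to land in it; otherwise only a fraction of defects are caught.
    return min(1.0, deterioration_time / inspection_interval)


# Hypothetical asset: 6 months from first detectable defect to failure.
for interval in (3, 6, 12, 24):
    caught = chance_of_catching_defect(interval, deterioration_time=6)
    print(f"inspect every {interval:>2} months -> about {caught:.0%} caught in time")
```

Under these assumptions, an interval longer than the deterioration time still catches some defects, so it can appear to be working while letting other failures through undetected.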
Condition monitoring and condition-based maintenance are very powerful ways of maintaining an asset’s operational condition if used properly. If not, however, they can be just a waste of money. Here are a few common problems:
- Failing to inspect: We know what conditions to monitor for, but we just don’t do the inspections. This does save inspection money, but eventually you pay for it when the failure occurs. Usually, the costs of failure are far greater than the costs of the inspections, so avoiding inspections is not a smart idea; it is just plain irresponsible. Have you ever had to change a tire and found that your spare was deflated?
- Skipping inspections: This is like failing to inspect. You might get away with it sometimes, but concluding that it is ok to skip inspections because nothing happened is invalid. That’s known as an “induction fallacy”: you’ve drawn a false conclusion from incomplete or faulty evidence. This is both irresponsible and ill-informed. It’s largely a result of not really understanding what you are inspecting for and why.
- Inspecting at the wrong intervals (too long between inspections): The intervals should be determined with the deterioration time in mind. When you skip inspections you risk missing deterioration in progress. If you wait too long between inspections you run the same risk; effectively you are skipping inspections. In this case you may still occasionally catch a problem, which looks like evidence that your interval is just fine, but you won’t catch them all. You have incomplete evidence and, again, an induction fallacy. Ever had a lawnmower or snow blower run out of gas? It worked the last couple of times so you didn’t check the fuel this time, but it was low or empty.
- Inspecting at the wrong intervals (too short between inspections): This error just costs you more money than it really should. You are inspecting too often and finding no more than you would with a longer inspection interval. If the inspection is intrusive and involves disturbing operation in some way, it also increases the risks of inducing failures inadvertently. This isn’t usually a problem because our natural tendency is to do too little, not too much, but occasionally this crops up. Usually, it is because someone is overly zealous and feels they must constantly keep checking.
- Misuse or misapplication of Monitoring Equipment: There’s a lot of technology for condition monitoring and it all requires training and practice to get good at using it. It’s not uncommon to see it being used incorrectly (lack of training), or used for the wrong purposes (misapplied). This is largely an education and training issue.
- Monitoring incorrectly: Here you may think you are doing well but are missing important conditions, usually because you don’t know what you are looking for. This occurs if the people doing the monitoring don’t really understand the equipment and its operation and, in many cases, because the inspection wasn’t specified in sufficient detail: the instructions are incomplete or imprecise. Asking a plant operator to check a machine may produce only the observation that the machine is there and running, and nothing about its condition. The operator needs to know what to look for, e.g. oil levels, vibrations, bearing temperatures, flow rates, leaks, etc. If you aren’t specific you never know what you might get.
- Failing to act on findings: This happens if you do your condition monitoring, find a problem, and then don’t do anything about it. This is what happened at the Surfside Condominium. There may be more than one person involved in the process of taking action on findings, but it is absolutely critical that this business process works. Again, it is irresponsible to allow this to happen and in cases like the Surfside condos, where lives were at stake, it should probably be treated as criminal negligence.
Determining inspection intervals is a bit technical and I won’t get into that here. We have methods for doing that.
In the case of the Surfside Condominium, there was a required engineer’s inspection. It did point out problems. It indicated that there were structural problems, and it described the nature and extent of the work needed to correct them.
Why were the warnings ignored?
Greed may be a factor. The repairs needed would have been very expensive, and if residents couldn’t come up with the money the building might have been torn down and they would have lost their investments in their homes. In many jurisdictions, there must be a reserve fund for major repairs. If they had one, it may not have been sufficient. Unfortunately, whatever the details, the fear of not having enough money tends to lead people down the path of doing nothing. Non-action is a choice, albeit passive, and it has consequences.
It’s possible that those who were reading the engineer’s report didn’t understand it. I’ve read it and it wasn’t overly technical, but someone unfamiliar with the terminology may have been confused by it. Of course, the city had someone who looked at it, and presumably that person (or persons) knew what they were looking at. Why didn’t they act? Criminal negligence comes to mind.
The condo board members may well have been misinformed, depending on how they got their information about the report findings. It’s always a good idea to have someone who understands the technical details involved in the decision-making. It would seem that they did not have that, or did not avail themselves of advice from those who did.
It seems to be human nature to believe that because nothing has gone wrong so far, it won’t go wrong in the future. That is called an “induction fallacy”: we draw a false conclusion based on incomplete or irrelevant data. It’s like saying that because you have never been in a car accident, you won’t be in one in the future.
Those responsible for the Surfside condo concluded, perhaps without really thinking, that because the building had been safe so far, it would be safe in the future despite the warnings to the contrary. Deterioration in civil structures and buildings is a slow process. In this case, it was roughly 3 years between the report with warnings and the actual collapse. Timing of your actions in these cases is always going to be tricky, but erring on the side of caution can save lives. When civil structures fail, they often do so with disastrous consequences.
Please don’t read this and conclude that failures in your buildings, infrastructure, systems, equipment, etc. won’t have consequences. Maybe, like the condo residents at Surfside, you believe you are safe because nothing has happened so far. That’s an “induction fallacy” at work.
Every physical asset has a finite number of ways in which it can deteriorate and fail. It is entirely possible to take the time to identify those failure modes, work out the potential consequences of each, and decide what to do to mitigate the risks. There are methods that help us do exactly that. Assets whose failure can threaten life, safety, or the environment, or cause major business losses, are all critical. They deserve that degree of attention and rigor. Without it, you are fooling yourself and sooner or later, someone will pay for it.