Myth Busting 10: Shutdown coming

A lot of people think they understand shutdowns but their actions reveal that they have serious misunderstandings.

Shutdowns are major undertakings performed while production is at a standstill (zero revenue), and because of the scale of the work being undertaken, costs are at a high point. There is a natural and well-justified desire to minimize the duration and frequency of shutdowns.

There is also a problem that arises when getting repair work done in a plant that is running. You often need to shut down at least part of your production to achieve the repair. If there is a way to limp along until there is a convenient time for the downtime, then you defer to that time and hope nothing else goes wrong. If the asset can actually run, even if not at desired performance levels, there is a tendency to put the work off until the next shutdown.

Loading up shutdowns with work that can be done most “conveniently” during a shutdown (as described above) is a big mistake.

Statistically, we know from reliability studies and teaching in Reliability Centered Maintenance that most failures will manifest as “infant mortality”. That means they happen sooner than anyone might expect, either at or shortly after startup. This occurs for a variety of reasons, all of which are related to having shut the asset down, worked on it, then started it back up. Sometimes we have the wrong parts, assemble things incorrectly, forget to tighten all the fasteners, use the wrong lubricants, make mistakes in start-up procedures, etc. We don’t do these things intentionally, but they do happen quite naturally. To avoid infant mortality, it seems logical to avoid the shutdown work as much as possible. When I ask shutdown managers about the success of their startups after the shutdown is over, I find that most experience some form of infant mortality. Usually it is in equipment or systems they worked on, and often in equipment and systems where work arose (was found) during the course of doing other planned shutdown work. Those systems and equipment were often working very well before the shutdown, and now they aren’t. They were not broken, so why did they get fixed?

While I am generally shutdown-averse, I recognize that shutdowns are unavoidable in some cases. There are age or usage-related failures (degradation) that need to be addressed at a regular frequency, which should drive your shutdown frequency. Because you are down, and it is certainly convenient to do all sorts of work that is difficult to schedule and execute with the plant running, you load up the shutdown with those jobs.

Those add to the shutdown scope. That additional work adds up to more resources, cost, and downtime with attendant loss of production. Trying to squeeze it all into an arbitrarily short time frame (as is often done) forces the work to be done in a hurry with an attendant risk of making mistakes (one of the causes of infant mortality). My solution is to NOT do that.

If there is any way that work can be done outside of a shutdown without harming the business more than the shutdown will, then leave that work outside of the shutdown. Do it at night, in a slower shift, on a weekend, or take the lowered production hit and execute the job well in the minimum time through the application of good planning and scheduling. Small hits are usually easier for the business to handle than big ones. Eliminating part of the very big risk of unsuccessful startups after your shutdowns is usually well worth it.
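As a rough illustration of that comparison, here is a minimal Python sketch. It assumes a planner can put a number on the production loss from doing a job outside the shutdown and on the hours the job would add to the shutdown. The names (Job, loss_if_done_outside, added_shutdown_hours, shutdown_loss_per_hour) and all figures are hypothetical, and the sketch deliberately ignores the added startup risk, which would tilt the decision even further toward keeping work out of the shutdown.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One candidate job. All fields are hypothetical planner estimates."""
    name: str
    loss_if_done_outside: float   # production loss if done at night, on a weekend, etc.
    added_shutdown_hours: float   # hours the job would add to the shutdown


def belongs_outside_shutdown(job: Job, shutdown_loss_per_hour: float) -> bool:
    """Return True if doing the job outside the shutdown hurts the business
    no more than adding it to the shutdown would (startup risk not counted)."""
    loss_inside = job.added_shutdown_hours * shutdown_loss_per_hour
    return job.loss_if_done_outside <= loss_inside


# Illustrative numbers only.
seal = Job("replace pump seal", loss_if_done_outside=5_000, added_shutdown_hours=2)
reline = Job("reline vessel", loss_if_done_outside=50_000, added_shutdown_hours=3)

for job in (seal, reline):
    where = "outside" if belongs_outside_shutdown(job, 4_000) else "inside"
    print(f"{job.name}: do it {where} the shutdown")
```

With these made-up figures the seal replacement stays out of the shutdown and the relining stays in. The point is only that the comparison should be made job by job rather than defaulting everything to the shutdown.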

Condition monitoring is a part of the solution. It tells you when equipment is in distress and might need intervention (maintenance). Advanced techniques can even tell you with some precision exactly what is wrong while the equipment is still operating. You can use that information to accurately plan that work, stage materials, and schedule it when most convenient / least disruptive. Defining your Condition Monitoring program requires a solid method like Reliability Centered Maintenance.

Your planning and scheduling (already spoken about in other myth-busters) must be good.

What about work arising during a shutdown? I get asked this a lot. The easy answer is that most of the time you can ignore it. Why? If your Condition Monitoring is comprehensive and you didn’t know there was a problem before you shut down, then why worry about it? Maintainers tend to be perfectionists. If they see a small flaw they want to fix it. Those fixes add to work scope and, worse, that work won’t be planned for because you didn’t realize you were going to find it. By doing that work you add unplanned work to the scope. You probably don’t add time to the shutdown to accommodate it, so you rush the work, increasing the risk of infant mortality. My advice: don’t go there.

If that flaw wasn’t detected before the shutdown, then chances are that it wasn’t that significant. Live with the imperfection and leave it undisturbed. Fix it only if it adds nothing to the time required to do the original job or if it is something big that could not have been detected with condition monitoring (I’d appreciate hearing about your examples of these too).
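The rule of thumb above can be written down as a small triage check. This is only a sketch of the rule as stated here, not a formal procedure; the function name and its three yes/no inputs are hypothetical and would have to be judged by the people on site.

```python
def fix_arising_work(adds_time: bool, is_major: bool, detectable_by_cm: bool) -> bool:
    """Decide whether to fix a flaw found during the shutdown.

    adds_time:        would the fix add time to the original job?
    is_major:         is it something big?
    detectable_by_cm: could condition monitoring have detected it?
    """
    if not adds_time:
        # Costs nothing extra in time, so fix it.
        return True
    # Otherwise fix only big problems that condition monitoring could not see.
    return is_major and not detectable_by_cm


# A small flaw that would extend the job: leave it undisturbed.
print(fix_arising_work(adds_time=True, is_major=False, detectable_by_cm=True))
# A big problem that condition monitoring could not have found: fix it.
print(fix_arising_work(adds_time=True, is_major=True, detectable_by_cm=False))
```

Everything that fails the check is left alone and, if it matters, planned properly for a later time.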

What about regulatory work? Sometimes shutdowns are done to comply with regulations. The regulations are rarely (if ever) based on technical considerations such as you will see if you do RCM. They are usually a result of some accident that happened long ago that people thought they could avoid by doing more maintenance. That’s where the airline industry was before RCM came along, and it was killing the industry with high costs and excessive accidents. If you do RCM you will have a solid basis on which to challenge the regulations. You may not win the argument, but you can at least give it a good shot, and you may well win your case if you do your analysis properly.

Bear in mind that regulations are often minimums that must be done and most are founded on a flawed perception that most failures occur with age. Don’t ignore them, but if you do your RCM, you’ll find that in many cases, the regulations are not an issue.

Bottom line – keep the work in your shutdowns to an absolute minimum with shutdown frequency driven solely by the frequency of those preventive tasks that result from usage or age only.