Except for a few industries where mistakes can have high costs in terms of human life or massive environmental impact, most of us do a poor job of setting ourselves up for success. We build or acquire new capital assets while making little or no provision for spare parts or for training maintainers and operators in new skills and knowledge. Most of us do a very poor job of providing technical documentation to the support disciplines – project engineering often simply files whatever it has gathered, often in a filing scheme tied to contractor contract numbers rather than the asset hierarchy that everyone else will use in one form or another. Commissioning activities rarely push the new assets hard enough – they ramp up slowly, almost gingerly, because so few people really know how it all should work. They are unprepared for any semblance of normal operations.
Contrast that with the airline industry, where new planes are launched, tested and ready for service. When an airline buys a new aircraft it is quick to put it into service on a route where it will be used to its maximum capability. Airlines don’t taxi gingerly down the runway, fly at low altitude and carry half their payload of passengers and luggage. You don’t do that with your new car either – you might put it through a brief “run-in” period where you limit top speed, but you’ll still carry full loads of people, groceries and whatever else you normally carry, and even that self-imposed limit sits above normal highway speeds.
Why do we expect aircraft and cars to operate more or less at full capacity as soon as we take possession while ramping up our plants and factories so slowly?
It’s because we are not ready for them. We’ve done little or nothing to prepare for operations. Spending capital budget on that preparation harms the business case that was used to justify the capital acquisition in the first place, so we bury our heads in the sand. We leave all that to Opex budgets, which make modest provisions at best. Those Opex budgets have to be lean or, again, the whole business case for the new capital asset is at risk. Again, more burying of heads in the sand.
Can we afford to get it right?
I’ve worked on major capital projects where the provision of support was an integral part of the project. We were ready when things started up and we came up to full capacity very rapidly. Everyone knew what to do, and we knew what to do if something went wrong. We had the parts, tools and other resources we could possibly need, the proactive maintenance program in place, the systems debugged and working, and the skills, knowledge and abilities we needed. What’s more, it was neither difficult nor expensive to put all that in place. The cost was on the order of 2 – 3 % of capital cost – far less than the cost of engineering (typically around 10 %) and in line with what companies often pay for warranties.
Warranties usually come with very specific requirements for maintenance or operating practices. Those requirements are often ignored, which invalidates the warranty, so why not spend that money elsewhere? A tailored support program, including a well-designed maintenance program, will usually be different from, and less expensive than, whatever the Original Equipment Manufacturers require for warranty compliance.
Designing a proper support program will cost no more than your warranty spend, 2 – 3 % of capital cost, and it will deliver benefits for many years. Over the life of the assets, those benefits can add up to more than the entire initial capital cost!
What do the best do?
Reliability Centered Maintenance (RCM) was developed in the airline industry for use in the development of reliability programs for new aircraft. It was quickly adopted for use in military, nuclear power and other applications where high reliability is a “must have” and the risks demanded that proactive steps be taken to ensure safety and reliability before systems were put into service. Once they are in service, RCM and other methods are used to ensure that operational systems remain reliable.
Aircraft safety records improved dramatically – more than 120-fold in the years since RCM was developed. Aircraft are delivered ready to work and to do so very safely. The nuclear industry has a similarly impressive track record as does the military with new weapon systems deployment. They all plan ahead and do so very well.
Following on those early examples, RCM has made its way into a wide variety of industries, but the “planning ahead” part that follows from it is often forgotten. In these industries, RCM has usually been applied after plants and equipment enter into service and often in response to disappointment in reliability performance. It is used as a method to correct an intolerable situation.
It would seem that wherever the consequences of failure are largely operational and related only to profit making capability, companies have tended to wait until something goes wrong, or until continued disappointing results are a threat, before taking action. This increases the owners’ risks but it is done in trade-off for a potentially higher short term payback. The savings are up front (less capital spend) and the payback only lasts a short time until the inevitable operational and maintenance problems show up, often not long after commissioning.
The rest of us are missing the boat!
RCM has two potential uses – to set you up for success or to regain success from the jaws of failure.
Regardless of when you use it though, RCM alone isn’t enough to get all the benefits. Applying it after the systems are in service, to recover shortfalls in performance, and then failing to follow up on RCM’s results only delivers part of the benefit. That’s where the aircraft and nuclear industries and the military have it right – they take those extra needed steps. The rest of us don’t!
What they do differently.
RCM outputs can be leveraged to provide additional logistical benefits that go beyond the immediate goal of achieving high levels of reliable performance and risk reduction. Improved performance reduces demand for corrective repair work, and RCM sets the schedule for proactive work. Knowing both, there is an opportunity to define and streamline support requirements, enabling optimum investment in the support needed to ensure the desired system performance, risk reduction and profitability.
Where RCM has been applied at the development and design stages there is an opportunity to define and optimize the entire support infrastructure for the new system.
Where RCM is applied to systems already in service, there is an opportunity to fine tune whatever support is already in place.
In both cases, maximum benefit arises from the leverage we can gain from the knowledge that RCM provides about our systems, how they fail and how to manage their failures. The benefit of that leverage depends on actions taken after the RCM analysis is completed as shown in the diagram below. There is an entire process that most of us overlook – and we pay dearly for it.
RCM has three primary outputs – tasks (with complete descriptions), task frequencies (based on solid technical, risk and cost criteria) and a definition of who should do the tasks. There are other outputs as well: decisions in some cases to run an asset to failure, and identification of the need for changes to design, procedures, skills and knowledge, or processes in order to eliminate failure causes, make failures more evident or reduce the risks associated with their occurrence. These are all failure management strategies.
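As a hedged illustration, the outputs described above could be captured as one record per failure mode. The sketch below is in Python. The class, field and asset names are hypothetical, chosen only for illustration, and are not taken from any RCM standard or maintenance system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Strategy(Enum):
    """Failure management strategies named in the text (labels are assumed)."""
    PROACTIVE_TASK = "proactive task"
    RUN_TO_FAILURE = "run to failure"
    ONE_TIME_CHANGE = "design, procedure, training or process change"


@dataclass
class RcmDecision:
    asset: str
    failure_mode: str
    strategy: Strategy
    task: Optional[str] = None           # complete task description
    interval_days: Optional[int] = None  # task frequency
    done_by: Optional[str] = None        # trade or operator role


def recurring_tasks(decisions: List[RcmDecision]) -> List[RcmDecision]:
    """Select the decisions that must be loaded as scheduled tasks."""
    return [d for d in decisions if d.strategy is Strategy.PROACTIVE_TASK]


# Hypothetical example data, for illustration only.
decisions = [
    RcmDecision("Pump P-101", "bearing wear", Strategy.PROACTIVE_TASK,
                task="Measure bearing vibration", interval_days=30,
                done_by="condition monitoring technician"),
    RcmDecision("Pump P-101", "indicator lamp burns out",
                Strategy.RUN_TO_FAILURE),
    RcmDecision("Pump P-101", "seal installed incorrectly",
                Strategy.ONE_TIME_CHANGE),
]

for d in recurring_tasks(decisions):
    print(d.asset, d.task, d.interval_days, d.done_by)
```

Keeping run-to-failure and one-time-change decisions in the same list as the scheduled tasks makes it harder for them to be lost when only the recurring work is loaded into a maintenance system.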
Those output failure management strategies are thoroughly justified on the basis of technical feasibility, risk reduction and / or costs. Yet clearly defining what to do isn’t enough if we don’t do what is decided and leverage the knowledge of those outputs fully. We know that despite its success in the military, airline and nuclear industries, RCM programs elsewhere often fail to achieve their desired goals and some don’t get beyond the technical analysis phase. All are using the same basic methodology – RCM, but not all are getting the results.
RCM is a bit like golf and baseball where we see the best shots / hits only when we see perfectly executed follow-up. In RCM, the follow-up action isn’t just an artful swing – it’s a set of actions. The obvious and immediate actions are defined, failure mode by failure mode and they include steps to:
- Put new or changed procedures in place,
- Update CMMS / EAM with new and revised maintenance tasks and frequencies, and
- Initiate design changes.
Those are the immediate requirements and sadly they don’t always get implemented, rendering the analysis all but useless. That happens if RCM is treated like a project – it will have a beginning and an end. Often its output is treated as a deliverable and “others” are responsible for those follow up actions. In some cases those who need to take action are blissfully unaware of the outputs and their role in following up. In those cases where that work is done however, there is a lot more that can be done to ensure full benefit from the analysis effort.
In the diagram below, the definition of tasks, engineering and other changes are immediate outputs from RCM appearing on the left hand side. The project deliverable often comprises those first two blocks – carry out the analysis and deliver decisions. As the diagram depicts though, there is a full life cycle support requirements definition process and then staging that follows. The rest of the diagram shows additional actions that can be taken to set yourself up for success.
There is an old saying that “failing to plan is planning to fail”. This diagram depicts a basic process for “planning to succeed” – reactive work, which costs much more, is reduced dramatically and overall spending drops.
Defining Life Cycle Support Requirements.
Wherever a failure can occur or is allowed to occur (run-to-failure decisions) there will be a requirement for a maintenance repair job plan. Even where we take proactive steps (condition based, detective and preventive maintenance) we have a requirement for a maintenance job plan. Those plans define what is needed to execute the job: parts, tools, test equipment, lifting apparatus, transport, shop capabilities, skill sets (trades), documentation and drawings, and time to do the work.
Comparing existing maintainer or technician skill sets with those required to carry out the various defined jobs can reveal the need for additional skills, knowledge or abilities – i.e.: maintainer training. Similarly, for defined operator tasks or checklists, we have a need to make sure the operators have the capabilities – i.e.: operator training.
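The comparison described above amounts to a set difference between the skills the job plans require and the skills already held. A minimal sketch in Python follows; the trade and skill names are hypothetical, used only for illustration.

```python
from typing import Dict, Set


def training_gaps(required: Dict[str, Set[str]],
                  held: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each trade, return the required skills that trade does not yet hold."""
    gaps = {}
    for trade, skills_needed in required.items():
        missing = skills_needed - held.get(trade, set())
        if missing:
            gaps[trade] = missing
    return gaps


# Hypothetical example data, for illustration only.
required = {
    "mechanic": {"laser alignment", "bearing fitting"},
    "operator": {"daily inspection route"},
}
held = {
    "mechanic": {"bearing fitting"},
}

for trade, missing in sorted(training_gaps(required, held).items()):
    print(trade, sorted(missing))
```

Each entry in the result is a candidate item for the maintainer or operator training plan.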
Taking those plan outputs a bit further and comparing what is needed to provide them with what you already have in place, you arrive at a full definition of what you need to add to provide that support – i.e.: the support infrastructure. That consists of training facilities or training providers, store rooms and their optimal locations, tooling and tool cribs, support equipment (e.g.: cranage, transport, shop equipment and tooling) and documentation to support it.
Spare parts are defined when we prepare job plans. For spare parts we know there will be a recurring demand and we also know (from the RCM analysis) the demand rates. We can forecast immediate and future spares requirements, set min/max levels, define parts’ specifications, identify suppliers and lead times. For repairable items we can carry out repair vs. replace analyses to determine if repair is economic. By estimating repairable item attrition rates (i.e.: how many do not make it back from repair) we can forecast how many spares to carry for those repairable items. In taking these actions we enable our supply chain to position itself well to support the reliability program. We inform it of future demands with plenty of lead time and enable it to meet those demands. Our supply chain becomes as proactive as our reliability program.
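To make the arithmetic concrete, here is a minimal sketch in Python of how demand rates could feed min/max levels and a repairable-item forecast. The stocking rule (lead-time demand multiplied by a safety factor, with max set about a quarter of annual demand above min) is an assumed rule of thumb chosen for illustration and is not taken from the text; the function names and figures are likewise hypothetical.

```python
import math


def min_max_levels(demand_per_year, lead_time_days, safety_factor=1.5):
    """Assumed simple rule: min covers lead-time demand with a safety margin,
    max adds roughly one quarter of annual demand on top of min."""
    lead_time_demand = demand_per_year * lead_time_days / 365
    minimum = math.ceil(lead_time_demand * safety_factor)
    maximum = minimum + math.ceil(demand_per_year / 4)
    return minimum, maximum


def repairable_spares(removals_per_year, turnaround_days, attrition_rate):
    """Return (units tied up in the repair loop, units to buy per year
    to replace those that do not make it back from repair)."""
    in_repair_loop = math.ceil(removals_per_year * turnaround_days / 365)
    replacements = math.ceil(removals_per_year * attrition_rate)
    return in_repair_loop, replacements


# Hypothetical figures, for illustration only.
print(min_max_levels(demand_per_year=24, lead_time_days=60))
print(repairable_spares(removals_per_year=12, turnaround_days=30,
                        attrition_rate=0.25))
```

The point of the sketch is that every input (demand per year, lead time, turnaround, attrition) is something the job plans and the RCM analysis can supply well before the asset enters service.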
This also has the potential to remove one of the biggest irritants to maintainers and stores people in operational environments – the mutual antagonism over a lack of the right parts and sufficient warning to procure them. Instead of being antagonists, your future maintainers and supply chain become partners.
Once that is all in place, the operation (new or existing) is well positioned to achieve its designed-in reliability characteristics that are inherent in any physical plant or asset. That is a huge improvement for many operations where maintenance, supply chain and operations are often working at cross purposes due to a lack of understanding and full definition of what they must do to ensure successful operation.
What is added to the capital investment?
Your upfront capital investment must go beyond engineering, procurement and building costs. The additional activities that fit within the 2 – 3 % of capital cost figure are:
- RCM analysis on the new design (preferably at both concept and detailed design phases so you can incorporate design change recommendations easily)
- Consolidating RCM analysis outputs into jobs, task lists, operator inspection routes, etc.
- Writing procedures and practice descriptions to go with the operator tasks.
- Planning for future maintenance work (complete job plans including other logistical support needs such as lifting apparatus, tools, test equipment, spare parts, etc.).
- Putting the defined and planned jobs into your work order management system, setting it up with its library of standard jobs.
- Setting up the proactive maintenance program schedules based on defined task frequencies.
- Skills definition – complete descriptions of skills required to do all the work that is forecast
- Training needs analysis – defining the gaps that will require additional training for existing staff and for induction of new staff.
- Determine how to deliver the needed training.
- Definition of training facilities requirements.
- Defining the spares requirements (analysis of spares and their demand frequencies to determine what to stock, how much and where).
- Repairable item analysis (to determine what to spare in support of repairables).
- Defining tooling, shop and other support equipment requirements
- Determination of what to outsource or keep in-house
- Documentation definition together with a “management of change” process to keep it up to date.
  - Determine asset hierarchy
  - Define all technical documents that are required
  - Catalogue and store documentation that you acquire with the asset
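The scheduling step in the list above can be sketched as follows. This is a minimal illustration in Python that assumes fixed calendar intervals; the function name and dates are hypothetical, and a real work order management system may also need to handle meter-based or condition-based triggers.

```python
from datetime import date, timedelta
from typing import List


def due_dates(start: date, interval_days: int, horizon_days: int) -> List[date]:
    """List the due dates of a recurring task from start to start + horizon."""
    end = start + timedelta(days=horizon_days)
    dates = []
    next_due = start + timedelta(days=interval_days)
    while next_due <= end:
        dates.append(next_due)
        next_due += timedelta(days=interval_days)
    return dates


# Hypothetical example: a 30-day task scheduled over the first 90 days.
for due in due_dates(date(2024, 1, 1), interval_days=30, horizon_days=90):
    print(due.isoformat())
```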
Over and above that 2 – 3 % spend you will also need to invest in things you should be buying anyway, but now you’ll be buying what you need, exactly what you need, where you need it, in the right quantity, and in sufficient time to serve operational needs. If this amounts to more than you may have expected to spend, it’s because your previous spending estimates were understated and would have left you wanting once you entered service with your new capital asset.
These additional items include:
- Spares, tools, test equipment, shop equipment, calibration equipment
- Facility modifications / build for training, shops, laydown areas, store rooms
- Support IT (CMMS, EAM, spares management, etc.) if not already in existence and use
- Staffing – hiring of new human resources
- Training (courses, materials, delivery)
- Contracting set up – for outsourced services such as repairs that are not done by in-house personnel, outside repairs to repairable items, specialized analysis services such as lube, vibration, infra-red, ultrasonic, etc.
- Documentation that does not come with the capital assets:
  - Training syllabi, system diagrams, new procedures and practices, process diagrams and definitions, etc.
All of this may seem like a lot to an organization that has not done this well in the past. It may even seem like overkill to some, but consider what will be missing if you don’t do it. Consider the disruption, loss of production, the risks to safety and the environment that could arise if you are not prepared properly. Consider the time it will take to ramp up to full capacity if you are stumbling through teething problems and their solutions as you gradually come up. Consider the loss of production / capacity that accompanies that long ramp up period.
Can you afford “failing to plan” or can you now afford to “plan for success”?