In the first installment of this series we described the basics behind proactive maintenance and some of the considerations users need to make.
The second installment described RCM programs, the “gold standard” for program development. This third installment describes what you can do if you realize you need a program but have nothing. It also works if you’ve got a PM program but are unhappy with the results you are getting; chances are that something is missing or not being done often enough.
We’ve often encountered maintenance programs that are lacking. They need a stronger proactive component, and they need it quickly. This guideline is intended to help get things under control.
Before you put this into use, note that we cannot be held responsible for any results you may get from using it. Consider this as a guideline only – it is not a replacement for RCM by any means and there are no guarantees you’ll get improved results. What we’ve put here reflects years of experience (both good and bad) and an application of the logic underlying RCM, but we stress, it IS NOT an RCM program so it is not the best you can do by any means. While we are confident it will be beneficial in the right circumstances and that you should see improvements, we cannot guarantee it. Use it at your own risk.
As a special note to RCM users – if you are not getting the desired results from your RCM derived program, this is not a suitable replacement. Your RCM assumptions, decision process or decisions resulting in tasks and frequencies should reflect your operating context. If you are not getting the results you expect, then you need to revisit that program using RCM.
Condition based maintenance entails two steps – checking for faults that are developing and then acting on those faults when you find them. The whole program will fail if you do not act on the defects you find in a timely manner – your work management processes will need to be capable of handling the demands that will arise.
Most of your checks should reveal that things are operating satisfactorily. You can expect that for much of the checking you do, it ends there. A small portion should reveal defects. DO NOT stop inspecting and checking only because you do not find a problem frequently – you shouldn’t find problems in any given equipment all that often. If you do, then you’ve probably got deeper design or operating problems that cannot be corrected with maintenance alone. Our suggestions are:
- If it rotates at high speeds (e.g.: pumps or rotary compressors running faster than 1800 rpm) use an overall velocity or accelerometer reading on vibrations. You’ll need to determine what normal is in each case by checking vibrations when the equipment is known to be operating well. If readings are “high”, act to correct the fault being indicated; if they are normal or only slightly elevated, leave it alone.
- For low speed rotational equipment (e.g.: crushers) go with displacement readings. Any reading greater than half the bearing clearance is bad.
- For anything at very high speeds (e.g.: turbines, turbo-expanders), go with accelerometers and acceleration readings.
- The interval between readings should be shorter than the amount of warning you usually get of problems from the equipment. In the past, if you’ve discovered a problem existed but the equipment was still functioning, how long were you able to run before you had to take it down or before it failed? Divide that time in half and use the result as your checking interval. For all other vibration readings, take them at least weekly. You can generate “routes” to follow (taking readings at every accessible bearing housing, right over the bearings, with one radial and one axial reading each). The routes can be loaded into your CMMS or dealt with on checklists.
- Keep the readings so you can trend them graphically (spreadsheets work well for this but your maintenance software may have a capability to display trends). If this is difficult to set up, then you can simply look at past readings when you take the new ones. If your checklist contains columns you can keep readings from multiple checks on the same sheet and see trends quite easily. Monitor any equipment that is trending upwards more frequently (we suggest daily). Your goal is to catch it before it crashes, but not too long before or people will think they are doing work for no reason. Be aware that in the shop, any equipment taken out of service before it has totally failed may appear to be in good condition. This is normal. The stresses of higher speeds and loads (which you cannot duplicate in the shop) are part of what allows you to detect the problems.
- Whenever equipment is installed (e.g.: after a repair) take a set of readings as a baseline. Begin the trend at the baseline. If possible (and this is difficult to do so don’t dwell on it), take readings when equipment is under constant load and at the same load for all readings (i.e.: make sure it’s running in the same conditions each time) or readings will be highly variable. Over time you will get used to seeing a normal range of variation but it can be alarming at first. Do not take a single high reading as an indicator for removal – you need to watch it a bit to be sure it’s bad.
- For critical equipment with large open oil sumps (not sumps that are typically sealed with screwed plugs) take oil samples, analyze for particulates, water, contaminants and act on recommendations from the analysis lab. You’ll probably have to send samples to a lab for analysis. Sample monthly, act on results within a week unless analysis points to greater urgency. Create routes as above. Note: if oil change frequency from manufacturer is only a month or two (or less), then omit from this sampling / analysis and stick purely with oil changes.
- The best oil analysis techniques to use are ferrographic (visual inspection of particles under a microscope) and particle counts. Do not rely on spectrographic methods – the readings will rise while damage is minor, then fall, giving a false feeling of things getting better just when it gets bad enough to worry about. Spectrographic methods can’t detect larger particles. Don’t get all excited about what metals are in the oil samples – that info is useful only for diagnostics and only really worth doing if the machine is critical and un-spared (i.e.: no backup).
- For smaller or less critical equipment make sure the oil is uncontaminated by dirt and water. Water will show up turning the oil milky (emulsification). Dirt will discolor the oil. Water in small quantities can be detected using a spatter test – heat a small sample of oil in a spoon over an open flame or a plate burner (in a safe environment). If the oil spatters it has water in it.
- Carry out Infra-red inspections using IR cameras on all electrical switchgear quarterly. Look for hot spots that reveal loose connections and other defects. Ideally this is done with cabinets open (watch for arc-flash risks) but can still give results even with cabinets closed if problems are significant.
- IR may also be useful for any other areas where problems show up as heat – blocked pipes carrying normally hot fluids (downstream looks cooler), rotating equipment couplings (no need to remove guards to check this), bypassing steam traps, loose or misaligned belts, excessively loaded motors (which appear hotter than most), motors with dirty cooling fins, overheating gearboxes, uneven exhaust temperatures on engines, blocked or partially blocked heat exchangers, damaged insulation on tanks, pipes, exchangers, etc.
- Visual inspections to be carried out as operator rounds twice in each shift – at the start (immediately before or after shift change) and half way through the shift. Operators need to be taught what to look for (see below under “cleaning”). Any anomalies need to be logged and reported as work requests for action. Operators need to know that any condition they feel is “uncomfortable” is a potential problem that must be dealt with. Better to be overly cautious and find a problem than to miss it and potentially suffer downtime or worse.
- Visual inspections by Maintenance foreman once per shift. As above – can also be used to investigate any reports from operators of potential problems and to see that operators are indeed doing their cleaning.
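The interval and trending advice above can be sketched in a few lines of Python. This is only an illustration: the function names, the `alarm_ratio` of twice the baseline and the three-reading confirmation window are our own assumptions, not site standards, and you will need to set them from what “normal” looks like on your equipment.

```python
def check_interval_days(warning_days=None):
    """Return the interval between vibration checks, in days.

    warning_days: how long the equipment has typically kept running after a
    problem was first noticed. None means there is no such history.
    """
    if warning_days is None:
        return 7.0  # no history: take readings at least weekly
    return warning_days / 2.0  # half the usual warning time


def assess_trend(baseline, readings, alarm_ratio=2.0, confirm=3):
    """Classify a series of readings against the post-installation baseline.

    alarm_ratio and confirm are assumed values for illustration only.
    A single high reading never returns "act"; it takes `confirm`
    consecutive high readings.
    """
    recent = readings[-confirm:]
    full_window = len(recent) == confirm
    if full_window and all(r >= alarm_ratio * baseline for r in recent):
        return "act"
    # Strictly rising over the whole confirmation window
    rising = full_window and all(b > a for a, b in zip(recent, recent[1:]))
    one_high = bool(readings) and readings[-1] >= alarm_ratio * baseline
    if rising or one_high:
        return "monitor daily"
    return "normal"


print(check_interval_days(30))                     # 15.0
print(assess_trend(2.0, [2.0, 2.3, 2.6, 2.9]))     # monitor daily
```

A spreadsheet can do the same job; the point is that the baseline, the checking interval and the rule for acting are written down rather than left to memory.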
This is work that is done regardless of the equipment condition. You are restoring good operating conditions through cleaning or other restoring activities, replacing working fluids or parts. You do it at a regular frequency that is shorter than the usual time to failure. If you don’t have good records that tell you what the times to failure are, then ask your maintainers and operators. They probably have a good “gut feel” for it – that is usually valuable information and surprisingly accurate!
- For mobile equipment – in the absence of a thorough RCM analysis, follow the manufacturer recommendations for oil / filter / component changes. This could be overkill in some cases but doing it generally won’t hurt anything. At worst you use too many filters and too much oil. If you have experience that varies from the recommendations (e.g.: longer intervals), then go with the site experience.
- Ditto for overhauls of mobile equipment components.
- For any plant equipment with closed oil sumps, change oil and filters at manufacturer’s recommended frequency.
- Conveyors – lubricate rollers and idlers, and watch belts for signs of damage, tracking error or slippage. If you have a lot of conveyors, then have someone dedicated to this – they can walk the conveyors and, on reaching the end, start over.
- Cleaning – keep equipment and surrounding areas clean to avoid contaminants, to allow minor problems to be more immediately visible, to eliminate safety hazards and to help foster a sense of pride in the work area. This applies to mobile equipment as well as shops and plants. Anyone doing cleaning must know how to do it without harming equipment (e.g.: no water hoses aimed at bearing housings) and they must know what to watch for. Cleaners should be trained (this doesn’t take long) to look for obvious signs of equipment distress – leaks that soil cleaned areas soon after cleaning is done, higher vibrations or sounds than normal, cracks in grout or foundations of equipment or in floors near equipment, loose guards, etc. Your maintainers can probably come up with a very good and site-specific list of things to watch for in these operator inspections.
- Heat exchanger cleaning – experience at the site probably reveals where heat exchangers have been problematic. In those cases schedule cleaning at an interval shorter than the time it usually takes for the problems to arise. Note that dirt will reduce heat transfer capability, decrease process efficiency and increase energy costs.
- Roads – keep graded and clear of large rocks or other debris that can harm truck tires. If you have dirt roads, keep them watered to reduce dust in the air which can get into equipment and storage areas and contaminate equipment and spare parts.
- Shops – clean them up! Avoid sources of contamination such as dusty laydown areas outside – pave them. If the shop is hot, ventilate using filtered air if you are in a normally dusty environment.
- For all backup equipment – test once per month to prove that it is capable of starting.
- If the equipment is subject to wear out type failures (e.g.: air compressor valves), then use testing to prove operation and auto start (fake the low pressure condition), run for a short time and then switch back to normally operating equipment. Equalizing running hours is a recipe for multiple failures if the failures are related to running time. Don’t simply swap equipment of this type back and forth.
- If the equipment is subject to random types of failures (e.g.: mechanical seals and bearings on centrifugal pumps), then the test is accomplished by starting the standby equipment and putting it into operation until the next test interval (i.e.: swap the equipment back and forth). Equal running hours is not a problem in this situation.
- For safety devices you want to test them periodically to prove that they will work. Note that you will find there are far more of these than you might expect once you really start looking for them. The best test is a full “end to end” test if possible, not a simple push the test button and watch the lights go on – that only tests the bulbs. Consider things like high / low level / pressure / temperature alarms and stops, process parameter driven stops or alarms, fire alarms, etc. Don’t forget warning signs and escape route signs – if they are missing or obscured with dirt they can’t do their job when needed.
- Frequency – the more frequently you test the device, the greater the risk reduction you achieve. If you have a critical protective device you will want to test it more frequently than devices that are not all that critical. A good start is to test most devices monthly unless testing is highly impractical. For things like safety valves that may be covered by some sort of legislation, do them at the legislated frequency and in the manner the regulations call for. Usually regulations require these to be changed and tested offline. These guidelines are no substitute for the regulations you are subject to.
- Signage – check that all warning signs are where they should be, clearly visible (unobstructed) and in good condition (e.g.: lit) so they can be read. If this includes emergency escape route signs, make sure they are pointing the right way! While doing this, inspect escape routes to ensure they are unobstructed by debris, tools, etc.
- Safety equipment (fire extinguishers, first aid kits, etc.): these are probably inspected / tested in accordance with regulations – make sure this is happening.
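The two testing rules above (how to test standby equipment, and why testing protective devices more often reduces risk) can be sketched as follows. Treat this as a rough illustration under stated assumptions. The function names are ours. The unavailability estimate is the standard reliability approximation of test interval divided by twice the MTBF; it is not something this guideline derives, and it assumes a constant failure rate and a test interval much shorter than the MTBF.

```python
def standby_test_plan(failure_pattern):
    """Suggest how to run the monthly standby test.

    failure_pattern: "wear-out" (failures related to running time) or
    "random" (failures unrelated to running time).
    """
    if failure_pattern == "wear-out":
        # Do not equalize running hours on wear-out items
        return "prove auto-start, run briefly, switch back to the duty unit"
    if failure_pattern == "random":
        # Equal running hours are not a problem here
        return "start the standby and leave it in service until the next test"
    raise ValueError("failure_pattern must be 'wear-out' or 'random'")


def hidden_failure_unavailability(test_interval, mtbf):
    """Approximate fraction of time a protective device is failed unnoticed.

    Assumes a constant failure rate and test_interval much smaller than
    mtbf (both in the same time unit). Approximation only.
    """
    return test_interval / (2.0 * mtbf)


# Hypothetical alarm with an MTBF of 120 months
monthly = hidden_failure_unavailability(1, 120)
quarterly = hidden_failure_unavailability(3, 120)
print(monthly < quarterly)  # True: more frequent testing, less exposure
```

The MTBF of 120 months is invented for the example. If you have no failure data for a device, the monthly default suggested above is a reasonable place to start.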
Operational Basic Care (Mobile Equipment)
- Carry out circle checks of equipment before using it. The operators need a good checklist and supervisors must make sure they do this thoroughly. Get your maintainers to work with operators to create the checklists.
- The best checks require that actual readings be recorded for trending purposes. Simply saying it was checked is not a good practice.
- Drive equipment within its operating parameters and do not tolerate abusive equipment operating practices. Industrial equipment is robust but it is not designed for play.
For equipment that is non-critical to operations and unlikely to cause a safety or environmental problem, you might be comfortable running it to failure. Failures will always require repair, but if the losses (production, quality, etc.) that come with them are negligible, then you can probably tolerate the failure and save the money you might otherwise spend preventing or predicting failures.
To accept run-to-failure the equipment must meet a few simple criteria:
- Worst case failure has little to no impact on production (e.g.: consider running spared equipment to failure if it meets the rest of these criteria).
- Worst case failure has little to no environmental impact.
- Worst case failure does not create a safety hazard.
- The cost of repair after failure is less than the cost of preventive maintenance over time.
- The cost of repair after failure is less than the cost of predictive maintenance over time.
A simple way to tell if run-to-failure is acceptable is to ask yourself: in the past, when the equipment failed, did you really need to put it back into service in a hurry? If the answer is “no” then you have a candidate for run-to-failure.
For equipment that you choose to run-to-failure you should make an annotation in your CMMS or in the equipment register to let people know that you’ve made that choice. If you don’t do this, there is a good chance that operators will put high work priorities on jobs that really don’t need them.
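The run-to-failure criteria listed above can be written as a simple screening check. This is a sketch only. The `Asset` fields are hypothetical names, not fields from any particular CMMS, and comparing costs on an annual basis (repair cost times failures per year) is our assumption about how to read “over time”.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    production_impact: bool      # worst case failure noticeably hurts production
    environmental_impact: bool   # worst case failure harms the environment
    safety_hazard: bool          # worst case failure creates a safety hazard
    repair_cost: float           # cost of one repair after failure
    failures_per_year: float     # from records or maintainers' gut feel
    pm_cost_per_year: float      # annual cost of preventive maintenance
    pdm_cost_per_year: float     # annual cost of predictive maintenance


def run_to_failure_candidate(asset):
    """Return True only if the asset meets every criterion in the list."""
    annual_repair = asset.repair_cost * asset.failures_per_year
    return (
        not asset.production_impact
        and not asset.environmental_impact
        and not asset.safety_hazard
        and annual_repair < asset.pm_cost_per_year
        and annual_repair < asset.pdm_cost_per_year
    )


# Hypothetical spared sump pump with invented costs
pump = Asset("spared sump pump", False, False, False, 800.0, 0.5, 1200.0, 2000.0)
print(run_to_failure_candidate(pump))  # True
```

Whatever the check returns, the decision still needs to be recorded against the equipment so that work priorities reflect it.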
These guidelines are by no means an all-inclusive list but they should get you started and help you get things “under control” so that you free up resources to apply more thorough analysis like RCM. As you work your way through these guidelines and apply them in your operation you may see other equipment or systems that are not covered here. Hopefully there is enough here to give you a good idea how to handle those. If not, then you should really be moving to an RCM program.
Ultimately, RCM is where you want to go in all cases. Anything less than that will invariably miss something. If it misses something that is critical to your business, or that can result in a safety problem or environmental non-compliance then you are ultimately to blame. Imagine how you might answer a judge at an inquiry into a fatal accident caused by a preventable equipment failure if he asks you, “did you do everything in your power to prevent that failure?” If you’ve done RCM and done it well you can honestly answer “yes”, even if your analysis was flawed. After all, you are human and you will make mistakes. But if you haven’t done that analysis, you have not done as much as you can.