The Common Network Asset Indices Methodology (CNAIM) was born out of a working group of six Distribution Network Operators (DNOs) in Great Britain, and is:
"A common framework of definitions, principles and calculation methodologies, adopted across all GB Distribution Network Operators, for the assessment, forecasting and regulatory reporting of Asset Risk."
It has been in development and use since 2015, and contains a wealth of information relating to "distribution network operators" and their assets.
The OFGEM – CNAIM Decision can be found here.
The OFGEM – DNO CNAIM document can be found here.
The idea and practice of forming a body of knowledge that captures the current thinking of industry experts is a much-needed one in many industries, not just electricity distribution. The information gathering facilitated by OFGEM and the others involved in creating this methodology is no small endeavour, and due credit must be given to all of the contributors and managers involved.
A common framework for capturing information and knowledge in a way that can be re-used by others is firmly within modla's future vision of asset modelling.
Before we discuss improvement ideas, a distinction must be made between the knowledge captured within the CNAIM and the framework used to capture it.
Knowledge capture is a continuously improving and never-ending endeavour, one which brings its own set of challenges which will not be addressed here.
The framework, on the other hand, plays an important role in allowing accurate capture, analysis and use of the knowledge contained within. It enables users of this methodology to leverage the contained information for decision making. This is where the "value" of a knowledge-centric approach can be obtained, and it must be defensible from a technical, mathematical and engineering standpoint.
The purpose of this article is to highlight improvement opportunities with the CNAIM framework (not the knowledge captured within), and to show how the use of an RCMD framework addresses these problems.
The CNAIM is an asset class level modelling method. The methodology document defines the logic by which asset-specific information is used to determine a health score, and subsequently a Probability of Failure (PoF), for a specific asset within the model class.
The PoF calculation can be loosely represented by the below image, showing how the data (right) is combined to calculate a PoF (left).
CNAIM probability of failure calculation representation
Just like other risk calculations, the CNAIM also contains consequence calculations, namely safety, environment, financial and network performance.
These are broadly represented by the image below.
CNAIM consequence of failure representation
New and innovative approaches to solving asset management problems are being created with increasing frequency. Innovation is valuable if it creates a new solution to an existing problem that is more efficient or accurate. Likewise, innovation is valuable if it solves a previously unsolved problem.
Reliability Centered Maintenance (RCM) is a methodology that has been in use within asset-intensive industries since the 1960s. Moreover, the discipline of reliability engineering is mature and has produced established methods, frameworks and approaches. Yet insights that could have been transferred from these existing approaches seem to be absent from the CNAIM.
It is not clear if these existing approaches were deemed unsuitable, or even if they were considered at all. Perhaps the RCM methodology was too complicated for the purpose, or its implementation was impeded by a lack of understanding of how RCM elements can be used to inform risk and intervention decisions at an asset class level. It could also be the case that RCM was not considered a scalable solution to the class or fleet level problems.
The RCM methodology can be extended to accept inputs the same way as the current CNAIM does. This would produce a scalable solution able to address the fleet level problem the CNAIM set out to solve. In addition, it builds on the foundation of reliability engineering without the introduction of new nomenclature, and without the use of questionable and obfuscated statistical methods. Some of these questionable methods will be highlighted below.
The CNAIM approach was developed with a particular goal in mind; the
"assessment, forecasting and regulatory reporting of asset risk".
Understandably, the CNAIM framework is structured to achieve this single purpose. However, the restrictive framework impedes leveraging the contained knowledge for other uses, especially in the operational and maintenance space, e.g. spares analysis, maintenance strategy optimisation and task list generation.
To perform these other types of analysis, the framework needs to accommodate detailed failure mode and task level information. RCM structures already possess these traits.
An ideal framework would need to be universal in its application within an organisational context, i.e. one single source of asset-related information that can answer any business question thrown at it. This prevents rework for other areas of the business. The CNAIM framework is not conducive to this ideal.
Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect.
Asset management and reliability disciplines are based on cause and effect relationships.
In its simplest form, this is the need to understand the performance and risk of an asset based on cause-and-effect relationships. The basic questions of "Why?" or "What is causing this?" are not impossible to answer, but they are difficult to infer from CNAIM results.
As an example, if a CNAIM model suggests that an asset has a poor health score and subsequently a high likelihood of failure, the driver or cause is not immediately obvious. How the asset is likely to fail has direct implications for how the problem can be addressed.
The solution to this problem is failure mode level detail that is maintained at all points in the calculation.
In addition, the use of factors and their combinations (to be discussed later) masks the significance and influence of any one factor. It is impossible to draw causal links between the health score and its inputs, making it nearly impossible to determine sensitivity, or to contest or improve the model's assumptions.
Part of the reason for the issues around traceability and linking outputs back to their causes is the use of a health score as part of the methodology's core calculation.
Health scores, or health indices, have some utility for reporting and understanding (at a high level) the current state of an organisation's assets or fleet of assets. This utility is not being argued here.
Calculating a PoF from a health score is a blunder. A health score is a summarising metric. To go from granular (inputs) -> broad (health score) -> granular (PoF) loses all the resolution that was derived from the subject matter expert knowledge. This loss of detail and traceability is directly responsible for breaking the causal links.
The result is that an asset can be identified as being in poor condition, but the correct course of action cannot be clearly articulated (unless, of course, the asset only has one intervention type, e.g. replace).
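A toy illustration of this loss of resolution, under assumed inputs: the failure mode names, the scores and the `max`-based aggregation below are hypothetical and are not the CNAIM's actual combination rules. Two assets with different underlying problems collapse to the same health score, so any PoF derived from that score alone cannot tell them apart.

```python
# Hypothetical per-failure-mode condition scores (0 = as new, 10 = end of life).
# The mode names and the max() aggregation are illustrative assumptions only.
asset_a = {"corrosion": 8.0, "insulation_degradation": 2.0, "mechanical_wear": 1.0}
asset_b = {"corrosion": 1.0, "insulation_degradation": 2.0, "mechanical_wear": 8.0}


def health_score(mode_scores):
    """Summarise granular condition scores into one number (worst case)."""
    return max(mode_scores.values())


def dominant_mode(mode_scores):
    """Return the failure mode driving the score; this needs the granular data."""
    return max(mode_scores, key=mode_scores.get)


score_a = health_score(asset_a)
score_b = health_score(asset_b)

# Both assets collapse to the same summary value...
print(score_a, score_b)
# ...but the cause, and therefore the appropriate intervention, differs.
print(dominant_mode(asset_a), dominant_mode(asset_b))
```

Once only the summary value is carried forward, the information needed by `dominant_mode` is gone, which is the broken causal link described above.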
Most reliability engineers are familiar with the three forms of a hazard function: decreasing (infant mortality), constant (random failure) and increasing (wear out).
Combined, these three shapes form the widely known "bathtub curve".
Assets fail in different ways, and therefore their failure characteristics are described by different forms of the hazard function. No single form of the hazard function applies to all assets. For example, an asset can fail randomly, with no infant mortality issues or observable wear out.
The CNAIM uses a Taylor series expansion of an exponential function to represent the hazard function. The result is likened to a hazard function of the wear out form. The CNAIM applies this hazard function to the failure calculation of all asset classes contained within it. The distribution is fitted to an expected life such that on average, one failure will occur every determined lifetime.
Having the same shaped distribution for all assets and asset classes is another blunder, and assumes a "one size fits all" approach. This violates core reliability principles: the way each failure distribution should be considered depends entirely on the failure mode in question and its respective causes. This is a separate issue from, but also an effect of, the use of health scores as an input.
Use of standard reliability distributions (e.g. Weibull distribution) for each specific mode would allow for a flexible and configurable alternative.
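A minimal sketch of what per-mode distributions could look like, using the standard two-parameter Weibull hazard. The failure modes and their shape and scale parameters are hypothetical values chosen only to show the three hazard shapes. Summing the mode hazards assumes the modes are independent competing risks.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# Hypothetical failure modes for one asset class: (shape beta, scale eta in years).
failure_modes = {
    "manufacturing_defect": (0.5, 500.0),  # beta < 1: infant mortality
    "third_party_damage": (1.0, 150.0),    # beta = 1: random, constant hazard
    "corrosion": (3.5, 60.0),              # beta > 1: wear out
}


def asset_hazard_by_mode(age_years):
    """Hazard contribution of each mode at a given age, kept separate."""
    return {
        mode: weibull_hazard(age_years, beta, eta)
        for mode, (beta, eta) in failure_modes.items()
    }


for age in (1.0, 20.0, 50.0):
    by_mode = asset_hazard_by_mode(age)
    total = sum(by_mode.values())  # independent competing modes: hazards add
    driver = max(by_mode, key=by_mode.get)
    print(f"age {age:>4}: total hazard {total:.4f}/yr, driven by {driver}")
```

Because each mode keeps its own distribution, the dominant cause at any age remains visible alongside the total, rather than being absorbed into a single curve.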
Random + wear out
To add a random component to the CNAIM models, each asset has its probability of failure set to random when under a particular health score (four at the time of writing). This random addition increases the area under the hazard function curve and subsequently reduces the asset's predicted life. Again, there are no clear causal links between this period of random failure and actual causes of failure. In addition, the predicted life of the asset is less than the expected life detailed in the CNAIM, and this discrepancy is often unknown to the user.
Modifying an existing distribution to fit a narrative is another blunder.
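The effect of the random region can be sketched roughly as follows. This is an illustration only: it assumes the hazard is the Taylor series of the exponential truncated at the cubic term, and that health scores below the threshold are simply treated as the threshold value. The constants `K` and `C` are placeholders, not CNAIM calibration values.

```python
import math

K = 0.001        # placeholder scaling constant (assumption, not a CNAIM value)
C = 1.0          # placeholder curve-shape constant (assumption)
HS_FLOOR = 4.0   # health score below which the PoF is held constant


def taylor_exp(x, order=3):
    """Taylor series of exp(x) about 0, truncated after the x**order term."""
    return sum(x ** n / math.factorial(n) for n in range(order + 1))


def pof_wear_out_only(health_score):
    """PoF from the truncated expansion with no constant (random) region."""
    return K * taylor_exp(C * health_score)


def pof_with_floor(health_score):
    """Same curve, but held at its HS_FLOOR value for lower health scores."""
    return K * taylor_exp(C * max(health_score, HS_FLOOR))


for hs in (0.5, 2.0, 4.0, 6.0, 8.0):
    print(hs, round(pof_wear_out_only(hs), 5), round(pof_with_floor(hs), 5))
```

Below the threshold the floored curve always returns the larger value. That difference is the extra area under the hazard curve, and it is why the predicted life ends up shorter than the expected life the curve was fitted to.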
Factors and factor combination
Another mathematical objection stems from the use of ambiguous factors and their combinations. The use of factors points to a causal relationship.
As an example, the corrosivity of an environment reduces the life of an asset by 10% (factor of 0.9). This causal inference is a step in the right direction; however, it fails on multiple counts:
Beta and modified ageing rates (ageing reduction factor)
To control the rate at which probability of failure increases, an "ageing reduction factor" is used. This factor reduces or increases the ageing rate to bring it closer to the predicted mean.
From the DNO CNAIM document:
For assets that are approaching end of life (EoL), this can result in a run-away effect in the forecast future PoF, which would not reflect the deterioration that would be observed in real life. The cause of the runaway effect is due to the imperfect match of the selected curve once the asset reaches high values of health and hence resultant PoF. In order to minimise the potential for overstatement of the forecast future PoF, an Ageing Reduction Factor is introduced to modify the asset’s rate of deterioration.
The use of an ageing reduction factor is effectively a patch to address an issue in the underlying calculation. The runaway effect can be partly attributed to the use of a Taylor series expansion of an exponential function, the distribution parameters, and a misinterpretation of the Probability of Failure and subsequently of risk (discussed later).
The CNAIM uses two separate scales for indicating asset health: 0-10 for the current year, and 0-15 for future years. This seems nonsensical: if an asset can exist in the future with a health score of 15, why can't it exist in the same condition today? The use of two scales is ambiguous, reduces the utility of the health score, and does not reflect how assets are assessed and interpreted in the real world.
The CNAIM's primary function is to forecast and enable the regulatory reporting of asset risk.
However, there seems to be a lack of detail around how the predicted asset risk should be reported and interpreted. The CNAIM goes as far as to define Risk as:
Risk = Consequence * Probability
It gives the methods for calculating both consequence and probability of failure, however the answer is more nuanced than a simple equation.
The CNAIM model's probability of failure is represented by a hazard function (Taylor series expansion). It must first be understood that this does not represent the probability of failure in that year, but instead the probability of failure in that year IF the asset survives to that year.
This small distinction means that the probability results are only valid for calculating the risk an organisation is carrying, in the current year (Y0).
To determine the actual risk that the business carries into the future, the equation Risk = Consequence * Probability still holds true, however, the probability component must be expanded to probability of failure * probability of survival.
In effect, when producing risk profiles and predicting costs for future years, a Probability Density Function (PDF) must be used.
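The difference between the two probabilities can be shown with a small discrete-year sketch. The yearly conditional PoF values and the consequence figure are hypothetical, and the asset is assumed not to be replaced after failure. The unconditional probability of failure in a year is the conditional (hazard) value multiplied by the probability of having survived to the start of that year.

```python
def unconditional_failure_probabilities(conditional_pof):
    """Convert yearly conditional PoF (hazard) into unconditional PoF.

    conditional_pof[k] is P(fail in year k | survived to start of year k).
    result[k] is P(fail in year k), i.e. hazard * probability of survival.
    """
    survival = 1.0  # probability the asset is still in service at year start
    unconditional = []
    for hazard in conditional_pof:
        unconditional.append(survival * hazard)
        survival *= 1.0 - hazard
    return unconditional


# Hypothetical rising conditional PoF profile for one asset, years Y0..Y5.
conditional = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70]
consequence = 100_000.0  # hypothetical cost of failure

unconditional = unconditional_failure_probabilities(conditional)

for year, (h, f) in enumerate(zip(conditional, unconditional)):
    print(f"Y{year}: hazard-based risk {consequence * h:>9.0f}, "
          f"survival-weighted risk {consequence * f:>9.0f}")

# The conditional values can sum to more than 1; the unconditional ones cannot.
print(sum(conditional), sum(unconditional))
```

The two calculations agree only in Y0, where the probability of survival is 1. In later years the hazard-based figure exceeds the survival-weighted one, and the gap widens as the asset ages.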
Using the risk calculation as it currently stands leads to a couple of effects:
1) The future projections cannot be used to predict the risk that an organisation is carrying.
2) Future likelihoods are overstated. (It is my understanding that this is the reason for the use of an "ageing reduction factor" above.)
Failure to clarify or highlight the above is leading to widespread misuse of the risk aspects of the CNAIM.
The use of asset-specific approaches (e.g. for transformers, subcomponents, submarine cables, etc.) detracts from the scalability and consistency of the framework. While each asset class has its own nuances, a truly scalable framework should be universal and able to account for them without changing the underlying framework.
Simple: use an RCMD framework.