Condition Trending and Predictive Analytics: It’s Not Extrapolation
Written by Gary T. Fry, Vice President, Fry Technical Services, Inc.
TIMEOUT FOR TECH, RAILWAY AGE OCTOBER 2022 ISSUE: Historical datasets are important to consider, but it is much more than a simple process.
Welcome to “Timeout for Tech.” Every month, we examine a technology topic that professionals in the railway industry have asked to learn more about. This month we focus on using data to support condition trending of engineered systems.
The right data, collected and processed in the right ways, can give critical insight to an engineered system’s fitness for service. In fact, with an archive of suitable data, we can assess the system at various points in time: past, present, and most desirably, future. Reliable knowledge of the present and future fitness of a system can both enhance its safety and reduce its cost to owners. This is a very attractive double-win scenario.
We begin by pointing out that there is no dearth of data for the engineered systems on railways. Railroad rail lasts 15 years or more; ties can last 10-20 years or more depending on materials and environmental conditions; bridges last 100 years or more; and cars and locomotives 30 years or more. And all these assets are inspected and serviced on regular schedules throughout their useful lifetimes: annually, weekly, even daily. With increasing frequency and volume, much of today’s data is generated, transmitted, and archived in digital forms.
In short, there is plenty of data readily available! So, for purposes of condition trending and prediction, how do we ensure we have the right data streams? And how do we ensure we are processing them in the right ways? It turns out that, like the engineered systems themselves, the right data streams and processing methods exist by careful design and in close collaboration with experts knowledgeable in the engineering details of the systems. It is not always easy to reliably achieve the “double-win” safety-cost scenarios we seek. The design and implementation process is often complex and resource-intensive, involving multidisciplinary teams of domain experts and data science experts. Sometimes budget and resource constraints result in less-than-comprehensive efforts.
Let’s begin with a fun example of what can happen when an overly simplified approach is used to make a data-driven future prediction. Beginning in the late 1940s, tail fins became an increasingly popular aesthetic feature on American cars. The fins began as subtle, rounded geometric forms that peeked over the sides of the trunk lid by two or three inches, as on the 1950 Cadillac Series 62 Convertible (above). By the end of the 1950s, some car models sported prominent, sharp-edged fins of 12 inches or more. Figure 1 (top) is a photograph of a 1959 Cadillac Eldorado Biarritz convertible. As many of us know, ’59 Cadillacs had very prominent tail fins!
Here we seem to have the basic ingredients for a data-driven predictive tool. Tail fin height and shape are unarguably objective data sets, and we have them over a period of several years. Let’s take a simple path and extrapolate linearly using the data from the years 1950 through 1959. Doing so, we can predict that many production car models in 2023 will have razor-sharp tail fins roughly six feet tall. How did we do with our prediction? It suffices to say that car body evolution is not well approximated by a linear function of time, at least over the long term.
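The tail-fin extrapolation can be sketched in a few lines of code. The fin-height values below are illustrative assumptions loosely matching the article’s description (about 3 inches in 1950 growing to about 12 inches by 1959), not measurements:

```python
# Illustrative fin-height data (inches) by model year -- assumed values,
# not actual measurements.
years = list(range(1950, 1960))
heights = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

# Ordinary least-squares line: height = slope * year + intercept
n = len(years)
mean_x = sum(years) / n
mean_y = sum(heights) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, heights))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

# Naive linear extrapolation far beyond the data
predicted = slope * 2023 + intercept
print(f"Predicted 2023 tail fin height: {predicted:.0f} in. "
      f"(~{predicted / 12:.1f} ft)")  # ~76 in., roughly six feet
```

The fit itself is perfectly sound within 1950–1959; the error lies in assuming the linear trend holds for six more decades.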
I have used this example many times in different contexts, and it never fails to bring smiles. Unfortunately, however, there are numerous unsmiling real-world examples I have encountered over the years that are directly analogous to the six-foot tail fin prediction. And they sometimes came complete with severe and lasting economic and reputational repercussions for the teams and organizations involved.
Let’s go through a different example that illustrates a logic process for developing a predictive model. We will see that historical data sets are important to consider, but it is much more than a simple process of extrapolation. We must make sure the data sets are relevant and that we include all of the essential influencing parameters.
Imagine that I have a flight to catch. When do I need to leave the office so that I arrive at the boarding gate on time? This is a problem many of us have encountered and usually solve successfully. But how often have we considered this to be a basic exercise in predictive analytics? Suppose we create a model that is an ordered list of the trip segments that largely determine the time of our trip to the gate (the logic of the model) and an estimate for the duration of each segment. For example, Table 1 (above) contains a list of those trip segments and the parameters that might influence the duration of each segment.
With the list assembled, all that remains is to create estimates for the segment durations. I rely largely on my historical experience and current information from my phone apps about traffic, weather conditions, and expected wait times at the airport. I usually add 15 to 30 minutes to the estimate as a buffer.
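The model described above can be sketched as an ordered list of segments with estimated durations. The segment names and minute values below are hypothetical stand-ins, not the contents of the article’s Table 1:

```python
# Hypothetical trip segments and duration estimates (minutes) -- the real
# values would come from experience plus live traffic/wait-time data.
segments = [
    ("walk to car",         5),
    ("drive to airport",   45),  # traffic-dependent
    ("park and shuttle",   15),
    ("check bag",          10),
    ("security screening", 25),  # wait-time-dependent
    ("walk to gate",       10),
]

buffer_minutes = 20  # the article suggests a 15-30 minute buffer

total = sum(minutes for _, minutes in segments) + buffer_minutes
print(f"Leave {total} minutes before boarding time.")  # 130 minutes here
```

The logic of the model is the ordered list; the analytics lie in how each duration estimate is produced and updated.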
The process of modeling the evolving condition of engineered systems as they experience wear and tear requires a logic similar to the airport travel time example. We need to identify the significant parameters that influence the degradation of the system and estimate the amount of influence each parameter contributes to the degradation. For example, consider designing a predictive model for rail wear caused by train loading.
In general, we can’t simply measure rail wear at a particular location for several years and apply that historical data to predict future rail wear at a different location. We can’t even measure rail wear at several locations for several years and apply a statistical summary of the data as a representative predictive model for a new location. This is because each location of track, such as seen in Figure 2 (above), has a specific set of parameters that influence rail wear: for example, car types, axle loads, wheel profiles, train frequency, train speed, track curvature, superelevation, track type, track condition, rail profile, rail metallurgy, rail lubrication, etc. An effective predictive model for rail wear will include all the influencing parameters and their relative contributions with appropriate logic and mathematical representations. When assessing a particular track location, one simply provides the parameters of that location as input to the model and the model will generate estimates of rail wear associated with those parameters.
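The key idea, a model that takes a location’s parameters as input rather than extrapolating another location’s history, can be illustrated with a toy interface. The parameter names follow those listed above, but the coefficients and the simple linear-multiplicative form are placeholder assumptions; a real rail-wear model would encode validated engineering relationships:

```python
from dataclasses import dataclass

@dataclass
class TrackLocation:
    """A subset of the parameters that influence rail wear at a location."""
    annual_mgt: float        # annual traffic, million gross tons
    curvature_deg: float     # track curvature, degrees
    avg_axle_load_tons: float
    lubricated: bool         # rail lubrication in place?

def predicted_annual_wear_mm(loc: TrackLocation) -> float:
    """Toy estimate of annual rail head wear (mm). Coefficients are
    illustrative placeholders, not engineering values."""
    wear = 0.02 * loc.annual_mgt            # traffic contribution
    wear += 0.05 * loc.curvature_deg        # curves wear faster
    wear *= loc.avg_axle_load_tons / 30.0   # scale by axle load
    if loc.lubricated:
        wear *= 0.6                         # lubrication reduces wear
    return wear

# The same model serves different locations -- only the inputs change.
tangent = TrackLocation(annual_mgt=60, curvature_deg=0.0,
                        avg_axle_load_tons=32, lubricated=False)
curve = TrackLocation(annual_mgt=60, curvature_deg=4.0,
                      avg_axle_load_tons=32, lubricated=True)
print(predicted_annual_wear_mm(tangent), predicted_annual_wear_mm(curve))
```

Note the design choice: location-specific history never leaves its location; what generalizes is the model relating influencing parameters to wear.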
The takeaways are these. Condition trending and future condition prediction for engineered systems require more than historical data. All the significant parameters that influence condition must be identified and modeled with appropriate logic. The predictive modeling process often benefits from multidisciplinary teams of domain experts and data scientists working together. It is not simply a process of extrapolating history to predict the future.
Dr. Fry is the Vice President of Fry Technical Services, Inc. (https://www.frytechservices.com/). He has 30 years of experience in research and consulting on the fatigue and fracture behavior of structural metals and weldments. His research results have been incorporated into international codes of practice used in the design of structural components and systems, including structural welds, railway and highway bridges, and high-rise commercial buildings in seismic risk zones. He has extensive experience performing in situ testing of railway bridges under live loading of trains, including high-speed passenger trains and heavy-axle-load freight trains. His research, publications and consulting have advanced the state of the art in structural health monitoring and structural impairment detection.