Big Data drives big results

Written by Dr. Allan M. Zarembski

UP tie cranes position new ties in Post Falls, Idaho. Bruce Kelly photo.

Railway Age, February 2019: As railroads expand their data collection technologies across all operational areas, they are also expanding their ability to analyze this data and convert it into actionable information, that is, information that can be used directly in their operation or maintenance activities.

This was evident at the 2018 Big Data in Railroad Maintenance Planning Conference, held at the University of Delaware Dec. 13-14. With more than 250 registrants, the conference brought together railroads, suppliers, data scientists and university researchers to talk about what is being done in the exciting new arena of Data Analytics, or “Big Data.” Furthermore, the focus was clearly no longer on what is needed or should be done, but rather on what is actually being developed and implemented.

Equally impressive was the presence of railroad and supplier attendees with titles such as Chief Data Scientist, Manager of Advanced Analytics, Data Analyst, etc. Clearly, the railway industry has taken note of the need for this type of advanced analytics approach, which can extract value from data. This is a very practical part of the newly emerging field of Data Science.

Data Science is an interdisciplinary field using evolving analysis tools and techniques to extract knowledge or insights from data in various forms, either structured or unstructured [1]. Data Science has all of the characteristics needed by railway engineering and maintenance personnel to address and handle the enormous amount of data generated by the various technology platforms currently in place [2]. This includes all of the current and emerging data collection systems being used by railways to help monitor infrastructure and equipment condition, optimize and plan maintenance and improve safety, as well as the predictive analytics tools being developed within the field of Data Science and now being implemented in the railway industry. This is illustrated in Figure 1, which shows current and future data acquisition systems integrated with enhanced data analytics and decision support tools with a goal of “lean” and “effective” maintenance, i.e. effective maintenance at a minimum cost [3].

Figure 1: Integration of Data Acquisition Systems and Data Analytics in Maintenance Actions [3].

Safety, too, is a key goal, as illustrated in Figure 2, which shows a clear reduction in track-caused derailments directly associated with increased track miles tested [4]. This was further illustrated by one Class I railroad's development of a model to predict which “Yellow Tags” will turn Red; i.e., which maintenance “exceptions” will develop into Red, or safety, defects that usually require immediate intervention or corrective action [4]. A Yellow Tag refers to an exception identified by an inspection system, such as a track geometry car, that exceeds a railroad’s maintenance threshold but not a safety threshold. A Red Tag refers to an exception that exceeds a safety threshold, thus usually requiring immediate action.
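The Yellow Tag and Red Tag definitions above amount to a two-threshold classification. The short Python sketch below is illustrative only: the function name, the use of absolute deviation, and the numeric limits in the usage note are assumptions made for the example, not values or logic taken from any railroad's standards.

```python
def classify_exception(value, maintenance_limit, safety_limit):
    """Classify one measurement as 'red', 'yellow' or 'none'.

    Hypothetical helper: 'value' is a measured deviation, and both
    limits are assumed to be positive magnitudes in the same units.
    """
    if maintenance_limit > safety_limit:
        raise ValueError("maintenance limit must not exceed safety limit")
    magnitude = abs(value)
    if magnitude > safety_limit:
        return "red"      # exceeds the safety threshold
    if magnitude > maintenance_limit:
        return "yellow"   # exceeds maintenance but not safety threshold
    return "none"         # within maintenance tolerance
```

For instance, with an assumed maintenance limit of 0.75 and an assumed safety limit of 1.25, a deviation of 1.0 would be tagged Yellow and a deviation of 1.5 would be tagged Red.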

Figure 2: Reduction in Track Caused Derailments as a Function of Increased Track Inspection [4].

The range of analytical tools currently being used by railways and their suppliers, as well as by researchers, was equally impressive. They ranged from predictive analytic tools such as Logistic Regression and Bayesian Inference to Machine Learning and Deep Learning approaches, along with Image Recognition, Blockchain Technology, Language Recognition, Text Analytics, etc. [5, 6, 7 and 8]. Figure 3 illustrates how Latent Semantic Analysis (LSA), a Language Recognition technique used to analyze relationships between a set of documents and the terms they contain, can be integrated with a broad range of railroad-collected data into an overall predictive modeling framework [8].

Figure 3: Application of Latent Semantic Analysis (LSA) for Predictive Modeling of Railway Data [8].

The keynote speaker at the conference, the Honorable Ronald Batory, FRA Administrator, gave a personal example of how the use of good data can lead to safer, more efficient and cost-effective operations. When he was President and Chief Operating Officer of Conrail Shared Assets Operations, he was responsible for the major redesign of a yard for improved operations. As part of the design, he instructed his staff to collect an extensive amount of data regarding all aspects of the yard operations, down to an extremely fine level of operating detail. He indicated that, by analyzing this data, they were able to redesign the yard for improved operations and improved safety.

Figure 4: Risk Model for Development of Recurrent Rail Defects [9].

The use of data for improved operations, maintenance and safety was an ongoing theme for the conference. This included applications in all aspects of railroad operations, including track, rolling stock and transportation.

On the track side, data analytics addressed all aspects of track maintenance and safety, including rail wear prediction, broken rail safety, tie design and inspection, and prediction of track geometry degradation and the associated risk of derailment. One presentation discussed a model for calculating the probability of a track geometry-caused derailment as a function of a Geometry Condition Indicator (GCI) and distance from the geometry condition [5]. Prediction of rail failure, including both fatigue and wear, was a recurring focus, with one Class I railroad developing a rail wear modeling tool that has since been incorporated into its capital planning process. Figure 4 illustrates another risk model, focusing on the risk of developing recurrent rail defects and rail service defects [9]. Several FRA-sponsored activities [10] focusing on broken rail risk included:

• Development of Artificial Intelligence Aided Track Risk Analysis (AI-Track Risk) model focused on rail failures.

• Development of an integrated broken-rail derailment risk analysis and simulation framework that included development of a Bayesian analytical framework for predicting the probability of broken rails; prediction of derailment consequence using multivariate data analyses; and evaluation of segment-specific risk and assessment of the impacts of various track risk management strategies.
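As a rough illustration of what a Bayesian approach to broken rail probability can look like, the sketch below uses a simple Beta-Binomial update: a prior belief about the failure probability is revised as inspection results arrive. This is a textbook technique chosen for the example, not the FRA-sponsored framework itself, and the function names and counts are hypothetical.

```python
def update_beta(alpha, beta, failures, trials):
    """Return the posterior Beta parameters after observing data.

    alpha, beta: parameters of the Beta prior on the failure probability.
    failures:    number of observed broken rails (assumed count).
    trials:      number of exposure units observed, e.g. segment-years.
    """
    if failures < 0 or trials < failures:
        raise ValueError("need 0 <= failures <= trials")
    return alpha + failures, beta + (trials - failures)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Illustrative numbers only: a prior centered on 1% and
# 3 failures observed in 100 exposure units.
prior = (1.0, 99.0)
posterior = update_beta(*prior, failures=3, trials=100)
estimate = posterior_mean(*posterior)
```

In this made-up case the estimate moves from the prior mean of 0.01 toward the observed rate of 0.03 and settles at 0.02, which shows how the prior tempers a small sample.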

Figure 5: Overlay of Multiple Rail Wear Measurements before and after Alignment [11].

Another recurring focus was correcting errors in position or location associated with repeated measurements on the same section of track at different times. This problem has long plagued railway engineers trying to compare multiple measurement runs, including track geometry measurement runs, rail profile measurement runs, or any other “continuous” type of measurement. It was addressed by several presenters, who dealt with such issues as:

  • Correcting position errors in the overlay of multiple measurement runs, including rail profile [11, 12] (Figure 5) and track geometry [13].
  • Correction of position errors, such as Absolute Position Error (APE) and Relative Position Error (RPE), associated with measurement systems, wheel slip and adhesion (Figure 6) [14].
  • Correction of position errors due to the use of different measurement systems or vehicles, where measurement systems or sensors are located in different locations within the vehicle (Channel-inside Position Offset (CPO), Figure 6) [14].
  • Addressing “abnormal” data or data exceptions (Figure 5) [11, 14].
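The position-correction problem described above can be pictured with a very small example: slide one measurement run against another and keep the offset at which the two agree best. The sketch below uses a brute-force search over integer sample shifts with a mean squared difference criterion. It is a simplified stand-in, not the method of any of the presenters, and the function and parameter names are hypothetical.

```python
def best_shift(reference, run, max_shift, min_overlap=None):
    """Find the integer shift s so that run[i + s] best matches reference[i].

    reference, run: sequences of measurements sampled at equal spacing.
    max_shift:      largest shift (in samples) to try in either direction.
    min_overlap:    minimum number of overlapping samples required;
                    defaults to half the reference length (an assumption).
    """
    if min_overlap is None:
        min_overlap = max(1, len(reference) // 2)
    best, best_cost = None, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        # Pair up the samples that overlap at this trial shift.
        pairs = [(reference[i], run[i + shift])
                 for i in range(len(reference))
                 if 0 <= i + shift < len(run)]
        if len(pairs) < min_overlap:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = shift, cost
    return best


# Made-up data: the second run is the first one delayed by two samples.
reference = [0, 0, 1, 3, 1, 0, 0, 2, 0, 0]
run = [0, 0] + reference
offset = best_shift(reference, run, max_shift=3)
```

A single constant shift of this kind would only address a fixed offset between runs. Errors that drift along the track, such as those from wheel slip, would need the shift to vary with location.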

Likewise, on the mechanical side, the use of data analytics for both passenger and freight equipment was discussed. The use of data in Condition Based Maintenance (CBM) of rolling stock is illustrated in Figure 7 [15]. Using both onboard data and data from wayside train scanners, maintenance can be performed at a number of levels, ranging from threshold-alerted conditions (reactive maintenance), to rules-based maintenance, to predictive maintenance using trend analysis and forecasting models with inputs from multiple data sources.
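To make the idea of trend-based predictive maintenance concrete, the sketch below fits a straight line to a series of condition measurements and estimates when the trend would reach a maintenance threshold. It is a minimal example that assumes a single data source and a roughly linear, increasing degradation; the function names and numbers are hypothetical and are not drawn from any presenter's system.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(times, values, threshold):
    """Estimate the time at which the fitted trend reaches the threshold.

    Returns None when the trend is flat or decreasing, in which case
    this simple model predicts no crossing.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Made-up wear readings taken at times 0 to 3, with an assumed limit of 4.0.
forecast = time_to_threshold([0, 1, 2, 3], [1.0, 1.5, 2.0, 2.5], 4.0)
```

With these illustrative readings the trend rises by 0.5 per time unit, so the assumed limit would be reached at about time 6, giving planners a lead time rather than a threshold alarm after the fact.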

Figure 6: Methodology for Correction of Position Errors Due to Multiple Causes [14].

Finally, presentations on the use of data analytics for improving operations ranged from prediction of train delay (Figure 8) to predicting wheel slides.

Figure 7: Use of data in Condition Based Maintenance (CBM) of Rolling Stock [15].

The development of these analytical models, on both the track and rolling stock sides of the railroad, has moved the industry from Reactive maintenance, to Preventive maintenance, to Predictive maintenance, with a future goal of Prescriptive maintenance, as illustrated in Figure 9 [17]. Thus, while preventive maintenance, in the form of condition-based maintenance, is currently the most widely accepted approach, the railroad industry is clearly moving to predictive maintenance, which takes condition-based maintenance a step further, using advanced analytics to predict when maintenance should be performed. This was evident throughout virtually all of the conference presentations.

Figure 8: Using Data Analytics for Real Time Train Delay Forecasting [16].

However, as illustrated in Figure 9, the future appears to be prescriptive maintenance, which also uses advanced analytics to make predictions about maintenance, but in which the prescriptive system not only makes recommendations but also acts on them. Prescriptive maintenance thus requires that the various asset management and maintenance systems be well integrated, with the prescriptive system actually issuing work orders to field technicians and overseeing the entire maintenance workflow. Such systems must therefore be ‘cognitive’, or have the ability to think, a technology that is at the intersection of big data, analytics, machine learning, and artificial intelligence [18].

Figure 9: Use of data to Move from Reactive to Prescriptive Maintenance [17].

Overall, this year’s Big Data in Railroad Maintenance conference had a much greater focus on what has been accomplished in the “mining” of the railroads’ Big Data and the implementation of data analytics to develop predictive models and tools for both maintenance and safety. Thus, the conference echoes the movement of the industry itself, from just starting to think about the use of their data, to actual development and application of tools to use the data across the spectrum of railroad operations and maintenance. The University of Delaware expects even more insightful information to be available in its next Big Data 2019 conference.

The 2019 Big Data in Railroad Maintenance Planning conference will be held Dec. 11-12, 2019, at the University of Delaware’s Newark, Del., campus. Contact Professor Allan M. Zarembski at [email protected].


  1. Zarembski, A. M., “The Emerging Role of Data Science in Railroad Maintenance Management,” Railway Age, May 2018.
  2. Attoh-Okine, N., Big Data and Differential Privacy: Analysis Strategies for Railway Track, Wiley, May 2017.
  3. Tegelberg, Erland, “Effective Asset Management and Exciting New Big Data Sources,” Managing Consultant, Strukton Rail North America, 2018 Big Data in Railroad Maintenance Planning Conference.
  4. Messner, M., “BNSF Geometry Tag Prioritization,” Assistant Director of Roadway Planning, BNSF, 2018 Big Data Conf.
  5. Smart, K. and Einbinder, D., “Utilizing Bayesian Inference and Machine Learning to Identify Risks to Railroads,” ENSCO, Inc., 2018 Big Data Conf.
  6. Stewart, L. and Pagliuco, S., “An Artificial Intelligence Approach to Aligning Historical Railroad Data,” GREX, 2018 Big Data Conf.
  7. Attoh-Okine, N., “The Future of Blockchain Technology in Railroad Track Engineering,” University of Delaware, 2018 Big Data Conf.
  8. Williams, T. and Betak, J., “Using Text and Data Analytics to Study Railroad Operations,” Collaborative Solutions, LLC, 2018 Big Data Conf.
  9. He, Q., “Data-Driven Rail Defect Deterioration Modeling for Responsive Maintenance,” University at Buffalo, 2018 Big Data Conf.
  10. Baillargeon, J., “Update on FRA’s Predictive Analytics Research,” Program Manager, FRA, 2018 Big Data Conf.
  11. Palese, J., “Application of Data Analytics to Rail Wear Forecasting,” Senior Scientist, University of Delaware, 2018 Big Data Conf.
  12. Rice, J. S. and Amouie, M., “Norfolk Southern’s Rail Wear Prediction Using Artificial Intelligence and Machine Learning,” 2018 Big Data Conf.
  13. Rome, J., “Developing Geometry Data Alignment for Amtrak,” Navigation Innovations, Inc., 2018 Big Data Conf.
  14. Wang, Y., “Position Synchronization for Track Geometry Inspection Data via Big-Data Fusion and Incremental Learning,” Southwest Jiaotong University, China, 2018 Big Data Conf.
  15. Flix, N., “Acquisition, Processing and Storage of Rolling Stock CBM Data,” Alstom, 2018 Big Data Conf.
  16. Karnik, A., “Evolution of Operational Analysis Using Discrete Data Streams and Big Data Approach: Case study: Prediction of Train Arrival Times,” Volanno, Inc., 2018 Big Data Conf.
  17. Thompson, T., “Utilizing Artificial intelligence to Increase Rolling Stock Maintenance Efficiency,” Uptake, 2018 Big Data Conf.
  18. Bellias, M., “The Evolution of Maintenance,”