Better railroading through Big Data

Written by Dr. Allan M. Zarembski

Figure 1: Converting Large Volumes of Raw Data to Actionable Insights [6].

As railroads develop and implement new generations of sophisticated inspection and monitoring systems, they find themselves collecting large volumes of data, at increased frequencies across a variety of interrelated systems.

Such collections are often called “Big Data,” a term that generally refers to data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them [1]. More recently, the term Big Data has tended to refer to the use of predictive analytics or other advanced data analytics methods that extract value from data [1]. This is considered part of the newly emerging field of Data Science.

Data Science is an interdisciplinary field that uses evolving analysis tools and techniques to extract knowledge or insights from data in various forms, either structured or unstructured. Data Science provides ways (and tools) to deal with and benefit from Big Data, including ways to see patterns, discover relationships and develop predictive analytic capabilities, as well as make sense of varied images, data streams and information [2].

In railway applications, Data Science can look at the full range of inspection and management systems with the goal of obtaining new and useful insights into such phenomena as track and equipment component degradation, interaction and failure. Data Science has all of the characteristics needed by railway engineering and maintenance personnel to address and handle the enormous amount of data generated by the various technology platforms currently in place [3].

Railroads are therefore starting to look at the predictive analytics tools within this field of Data Science to help monitor infrastructure and equipment condition, optimize and plan maintenance, and improve safety, all while trying to keep track of exponential growth in the amount of data being collected. While traditional analysis of data, particularly “threshold”-based analysis, is still being used, there is growing awareness and use of Data Science to provide new and innovative insights and an improved understanding of maintenance and safety issues [4, 5].

As part of this increased focus on Big Data, the University of Delaware organizes an annual “Big Data in Railroad Maintenance Planning” conference, with the goal of bringing together railroad users, data science professionals, consultants, suppliers and academia to look at the latest trends in Data Science analytics and their application in the railroad industry. The 2017 conference focused on the railroads’ specific needs and practical applications to date of Big Data analytics. This focus is reflected in Figure 1 (above), which shows how large volumes of raw data (on the order of Petabytes [7]) can be distilled down to actionable insights [6].

The conference brought together approximately 200 railroad and data analysis professionals to hear 32 technical presentations, led off by a keynote address by Wick Moorman, former President of Amtrak. This was followed by a CIO session, where the Chief Information Officers of three railroads (Union Pacific, Amtrak and SEPTA) talked about the strategic importance of making use of the large volumes of data (“mountains of data”) generated by railroad inspection and management systems. Among the key points made by the CIOs:

  • Sensors and data analytics continue to drive railroad innovations and growth.
  • Not all data will be actionable.
  • Analytics can serve as a guide to which actions would be most effective.

Applications of Big Data analytics included track and equipment maintenance analysis, as well as forecasting and planning algorithms based on large volumes of collected data.

In the case of track maintenance analysis and planning, one presentation looked at deep neural network techniques applied to rail wear data. Multi-layer models, such as the one shown in Figure 2 with two layers of 7 and 5 neurons, produced excellent agreement, corresponding to an actual-to-predicted fit of 87.5%. Such an approach uses a subset of the data to train the forecasting model and then predicts on the remaining (or future) data [8].

Figure 2: Deep Neural Network Application to Rail Wear Modeling [8].
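The train-then-predict workflow described above can be sketched in a few lines. This is a minimal illustration, not the presenter's model: it assumes the 7- and 5-neuron layers are hidden layers with tanh activations, uses a single hypothetical input (scaled cumulative tonnage), and runs on synthetic wear values generated inside the script.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """One dense layer: an n_out-by-n_in weight matrix plus n_out zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layers, x):
    """Return the activations of every layer: the input first, the prediction last."""
    acts = [x]
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + b for row, b in zip(weights, biases)]
        last = index == len(layers) - 1
        acts.append(z if last else [math.tanh(v) for v in z])  # linear output, tanh elsewhere
    return acts

def mse(layers, samples):
    """Mean squared error of the network over (input, target) pairs."""
    return sum((forward(layers, x)[-1][0] - y) ** 2 for x, y in samples) / len(samples)

def train_step(layers, samples, rate):
    """One full-batch gradient-descent step using backpropagation."""
    grads = [([[0.0] * len(row) for row in w], [0.0] * len(b)) for w, b in layers]
    for x, y in samples:
        acts = forward(layers, x)
        delta = [acts[-1][0] - y]  # error signal at the linear output
        for li in range(len(layers) - 1, -1, -1):
            weights = layers[li][0]
            grad_w, grad_b = grads[li]
            for j, d in enumerate(delta):
                grad_b[j] += d
                for i, a in enumerate(acts[li]):
                    grad_w[j][i] += d * a
            if li > 0:  # push the error back through the tanh layer below
                delta = [(1.0 - a * a) * sum(weights[j][i] * delta[j] for j in range(len(delta)))
                         for i, a in enumerate(acts[li])]
    for (weights, biases), (grad_w, grad_b) in zip(layers, grads):
        for j in range(len(biases)):
            biases[j] -= rate * grad_b[j] / len(samples)
            for i in range(len(weights[j])):
                weights[j][i] -= rate * grad_w[j][i] / len(samples)

# Synthetic, hypothetical data: scaled cumulative tonnage (input) vs. scaled rail wear (target).
data = [([i / 39], 0.2 + 0.5 * (i / 39) + 0.2 * (i / 39) ** 2) for i in range(40)]
test_set = data[3::4]                                       # held out for prediction
train_set = [s for k, s in enumerate(data) if k % 4 != 3]   # used for training

rng = random.Random(0)
layers = [init_layer(1, 7, rng), init_layer(7, 5, rng), init_layer(5, 1, rng)]

initial_train_mse = mse(layers, train_set)
for _ in range(400):
    train_step(layers, train_set, rate=0.05)
final_train_mse = mse(layers, train_set)

predictions = [forward(layers, x)[-1][0] for x, _ in test_set]
test_mse = mse(layers, test_set)  # error on data the model never saw during training
```

Comparing `test_mse` with `final_train_mse` is the basic check on whether a fitted model generalizes to wear data it was not trained on. A production model would use real inspection records, more inputs and an established library.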

Another presentation showed how a logistic regression model (with a sigmoid function) can be used to identify exact curve locations from track geometry car data to a high degree of accuracy. This included identification of curve points for simple and compound curves [9]. The resulting benefits, presented in Figure 3, include the ability to automatically and accurately detect 8,000 curve points on 1,560 curves in a matter of hours rather than the months previously needed with traditional surveying practices, with significant savings in cost as well as time.

Figure 3: Benefits of Big Data on Curve Condition Management [9].
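As a rough illustration of how a sigmoid-based logistic regression can separate curve from tangent track, the sketch below fits a one-feature model and reports where the predicted class changes. It is a simplification under stated assumptions: the only feature is the absolute curvature reading, the training labels and survey values are hypothetical, and a change of class stands in for a curve point. The presented method [9] is not described here in enough detail to reproduce.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(features, labels, rate=0.5, epochs=2000):
    """Fit p(curve) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(features, labels)]
        w -= rate * sum(e * x for e, x in zip(errors, features)) / n
        b -= rate * sum(errors) / n
    return w, b

def curve_points(curvature, w, b):
    """Indices where the predicted class switches between tangent and curve."""
    in_curve = [sigmoid(w * abs(c) + b) >= 0.5 for c in curvature]
    return [i for i in range(1, len(in_curve)) if in_curve[i] != in_curve[i - 1]]

# Hypothetical labeled samples: absolute curvature (degrees) and 1 = curve, 0 = tangent.
train_x = [0.0, 0.1, 0.05, 0.2, 0.15, 1.8, 2.0, 2.1, 1.9, 1.0, 1.5]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
w, b = fit_logistic(train_x, train_y)

# Hypothetical geometry-car run: one curvature reading per measurement station.
survey = [0.0, 0.1, 0.05, 1.9, 2.0, 2.1, 2.0, 0.1, 0.0]
points = curve_points(survey, w, b)  # stations where the track enters or leaves a curve
```

Real geometry data also contains spirals and compound curves, so a practical model would need more features than a single curvature value.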

Several presentations addressed the use of Big Data analytics in switch monitoring and maintenance prediction. A switch is a complex system with numerous failure types that can occur simultaneously. Applying data analytics (k-means clustering, statistical process control, decision tree analysis) to switch data has shown the ability to identify decaying asset status at an early stage, which in turn allows maintenance intervals to be optimized, with an associated reduction in failures and cost [10]. By relating anomalies to known root causes, it is possible to improve repair time and save money by performing the right maintenance action at the right time.
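Of the techniques named above, statistical process control is the simplest to sketch. The example below is only an illustration: it assumes switch throw time is the monitored quantity, uses made-up readings, and applies the common three-sigma control-limit convention rather than any rule reported at the conference.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Control limits: baseline mean plus or minus `sigmas` sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread

def out_of_control(readings, limits):
    """Indices of readings that fall outside the control limits."""
    low, high = limits
    return [i for i, value in enumerate(readings) if value < low or value > high]

# Hypothetical switch throw times in seconds.
baseline_throw_s = [4.0, 4.1, 3.9, 4.0, 4.2, 3.8, 4.0, 4.1, 3.9, 4.0]  # healthy period
recent_throw_s = [4.0, 4.1, 4.3, 4.6, 4.9, 5.2]                        # drifting upward

limits = control_limits(baseline_throw_s)
flagged = out_of_control(recent_throw_s, limits)  # readings that warrant a closer look
```

A flag of this kind says only that the switch is behaving abnormally. Tying the anomaly to a root cause requires additional signals or a classifier such as the decision tree analysis mentioned above.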

It was noted that predicting that a failure will occur has limited value if the failure mode cannot also be predicted; a maintenance crew needs accurate and detailed information.

In the area of equipment and rolling stock maintenance, there has likewise been an evolution of maintenance management, reflected in the changing shape of the analysis of onboard diagnostic data from rolling stock shown in Figure 4. This in turn translates into practical savings on the track (through prevention of road failures, optimization of asset performance and asset tracking to inform rail network planning), in the shop (through increased shop productivity, accurate first-time repairs that mitigate repeat repairs, and a decrease in total shop time) and in the yard (through decreased time needed to assemble trains, optimized allocation of locomotives and cars for missions, and full visibility of fleet inventory) [12].

Figure 4: Evolution of Maintenance Management for Rolling Stock [11].

Figure 5 sums up the process, from data acquisition to machine learning and analytics to real business outcomes [13].

Figure 5: Road Map of Advanced Analytics [13].

Organizations such as Railinc, the AAR’s rail data and messaging subsidiary that services the North American freight railway industry, have on the order of 100 Terabytes of data currently in active data repositories [13]. This order of magnitude of data was echoed by many of the presenters at the Big Data conference, including railroads, rolling stock service (and data) managers and other service providers.

The common theme heard throughout the 2017 conference was that the surface of the potential use of this Big Data has only been scratched, and that as railways learn how to more effectively “mine” their data, more actionable insights, systems and processes will become available to help optimize maintenance management across the entire spectrum of railway operations. The University of Delaware expects even more insightful information to be available at its Big Data 2018 conference.

The 2018 Big Data in Railroad Maintenance Planning conference will be held on December 13-14, 2018, at the University of Delaware’s Newark, Delaware, campus. For more information, contact Professor Allan M. Zarembski at [email protected].


  1. Wikipedia, “Big Data.”
  2. Zarembski, A. M., “Using Data Science to Establish Relationships between Key Railroad Engineering Parameters and Behavior,” Trends Tech Sci. Res. 2018; 1(1): 555552.
  3. Attoh-Okine, N., Big Data and Differential Privacy: Analysis Strategies for Railway Track, Wiley, May 2017.
  4. Zarembski, A. M., “Big Data in Railroad Engineering,” IEEE Big Data Conference, Washington D.C., October 2014.
  5. Zarembski, A. M. and Attoh-Okine, N., “Big Data in Railroad Engineering: The Challenge of Vast Amounts of Data,” Railway Track & Structures, November 2017 pp. 28-30.
  6. Engel, Grant, “Building the Future: Railroad Needs from Big Data; A CIO’s Perspective,” Insights and Analytics Manager, Southeastern Pennsylvania Transportation Authority, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  7. Zarembski, Allan M., “Introduction to Big Data 2017,” Professor, University of Delaware, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  8. Palese, Joseph, “Using Big Data to Develop Rail Wear Forecasting Model,” Senior Scientist, University of Delaware, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  9. Hosseinipour, Milad, “New Approaches to Track Geometry Analysis,” Automated Track Inspection System Engineer, Amtrak, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  10. Tegelberg, Erland, “Data Collection and Predictive Maintenance in Health Monitoring of Switches,” CEO EurailScout/Managing Consultant, Strukton Rail North America, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  11. Flix, Nicolas, “HealthHub, Shaped for Best and Easiest Control of Railways System Operations,” Maintenance Engineering Director, Alstom Transport, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  12. Holzer, Eric, “Update Rail Case Study,” Leader of Rail Solutions, Uptake, Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.
  13. Herb, Cathy and Veerappan, Ramesh, “From Big Data Platform to Intelligent Data Platform – Hadoop Modernization,” Railinc Corp., Big Data in Railroad Maintenance Planning Conference, December 2017, University of Delaware, Newark, Del.

Allan M. Zarembski, Ph.D., P.E., FASME, Hon. Mbr. AREMA, is Professor of Practice and Director, Railroad Engineering and Safety Program, Department of Civil and Environmental Engineering, University of Delaware.

