Rolling with Big Data

Written by Dr. Allan M. Zarembski

The emerging role of Data Science, commonly known as "Big Data," in railroad maintenance was discussed in the May 2018 issue of Railway Age ("Better Railroading Through Big Data"). As noted there, railroads are developing and implementing new generations of sophisticated inspection and monitoring systems, and as a result are collecting large volumes of data.

The term Big Data generally refers to data sets so voluminous and complex that traditional data-processing software cannot handle them, which creates the need for advanced data analytics tools. Figure 1 compares the traditional approach to data analysis with the Big Data approach.

The use of Big Data in the railroad industry, in both freight and passenger rail, cuts across traditional departmental lines, with applications in Engineering (Track and Structures), Equipment (Rolling Stock) and Transportation (Operations). These applications have been highlighted at the University of Delaware's annual "Big Data in Railroad Maintenance Planning" conference, where railroad users, data science professionals, consultants, suppliers and academics come together to examine new and emerging uses of data analytics in the railroad industry. The 2017 conference looked at Engineering (Track) and Rolling Stock applications on passenger and freight railroads, in the U.S. and worldwide.

To illustrate the scope of this data, Figure 2 shows the daily transactional volume reported by Railinc at the 2017 Big Data conference. Railinc currently houses nearly 100 terabytes of data and serves 2,500 business customers and 65,000 users. A significant portion of this data is railcar (rolling stock) data. When dealing with this volume of data, the first requirement is to provide access to it at multiple levels, including the ability to "drill down" to an individual railcar.

However, access to data, no matter how complex and sophisticated, is not enough. Railway managers want "information," particularly "business information," that allows them to make intelligent decisions based not just on the raw data but on the information derived from it. Figure 3 illustrates the process: defining the problem and the available data, developing the necessary tools (models), and then deploying these models through interactive data integration and "learning." This is the "decision support/intelligent software" portion of data analytics shown in Figure 1.

Figure 4 presents the process from a slightly different perspective, moving from data acquisition and management, to machine learning and analytics, to real business outcomes that improve operations and provide measurable value. Examples of this value include better identification of "bad actor" railcars, improved equipment failure analysis and, in turn, more effective preventive maintenance.

Another presentation at the Big Data conference illustrated this business benefit, specifically the improvement in locomotive shop productivity gained by applying data analytics in the shop environment, as shown in Figure 5.


In the area of passenger equipment monitoring and maintenance, integrated planning and management systems are emerging that monitor condition and operational data from an entire fleet and provide specific, actionable information to managers and maintenance personnel at multiple levels, as illustrated in Figure 6. The figure shows how analysis of large volumes of data (Big Data analysis, or data analytics) generates usable information and specific actions at four different management levels. For example, the Maintenance Planner/Help Desk (the second level in Figure 6) analyzes data from train events, rolling stock faults, mileage, inspections and other individual car and train records, and generates preventive maintenance information. One example is identifying a car that needs (or will soon need) maintenance. This in turn can lead to a specific action, such as generating a work order for that car.
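The Maintenance Planner step described above (combining fault, mileage and inspection data for each car, flagging cars that need or will soon need maintenance, and turning each flag into a work order) can be sketched in a few lines. This is a minimal, rule-based illustration only, not the method used by any system presented at the conference: the record fields, the thresholds and the `plan_maintenance` function are all hypothetical, and a production system would likely rely on statistical or machine learning models rather than fixed limits.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; real values would come from each railway's maintenance standards.
MILEAGE_LIMIT = 50_000   # miles allowed between preventive maintenance (PM) visits
MILEAGE_WARNING = 0.9    # flag cars at 90% of the limit as "will soon need" maintenance
FAULT_LIMIT = 3          # onboard fault events in the review period that trigger a work order

@dataclass
class CarStatus:
    """Hypothetical per-car summary built from fault, mileage and inspection data."""
    car_id: str
    miles_since_pm: int
    fault_events: int
    open_inspection_defects: int

@dataclass
class WorkOrder:
    car_id: str
    reason: str

def plan_maintenance(fleet: List[CarStatus]) -> List[WorkOrder]:
    """Return one work order for every car that needs, or will soon need, maintenance."""
    orders = []
    for car in fleet:
        reasons = []
        if car.open_inspection_defects > 0:
            reasons.append(f"{car.open_inspection_defects} open inspection defect(s)")
        if car.fault_events >= FAULT_LIMIT:
            reasons.append(f"{car.fault_events} fault events")
        if car.miles_since_pm >= MILEAGE_LIMIT:
            reasons.append("mileage limit reached")
        elif car.miles_since_pm >= MILEAGE_WARNING * MILEAGE_LIMIT:
            reasons.append("approaching mileage limit")
        if reasons:
            # The generated work order is the "specific action" for this car.
            orders.append(WorkOrder(car.car_id, "; ".join(reasons)))
    return orders

# Usage with a small, made-up fleet.
fleet = [
    CarStatus("A101", 12_000, 0, 0),  # healthy car: no work order
    CarStatus("A102", 46_000, 1, 0),  # approaching its mileage limit
    CarStatus("A103", 20_000, 4, 2),  # repeated faults and open defects
]
orders = plan_maintenance(fleet)
for order in orders:
    print(order.car_id, "->", order.reason)
```

Keeping the reasons attached to each work order mirrors the idea that information, not just a flag, should reach the maintenance personnel who act on it.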

Again, as noted in the May article, the use of Big Data and data analytics has so far only scratched the surface of the data now available. As railways learn to "mine" their data more effectively, that data can be converted into information, actionable insights and specific "actions" that help optimize maintenance management for both rolling stock and infrastructure across the entire spectrum of railway operations. The University of Delaware expects even more insightful information to be presented at its 2018 Big Data in Railroad Maintenance Planning conference, December 13-14, 2018, on the University of Delaware's Newark, Delaware campus. For more information, contact Professor Allan M. Zarembski at [email protected].


  • Zarembski, A. M., “Better Railroading Through Big Data,” Railway Age, May 2018.
  • Zarembski, A. M. and Attoh-Okine, N., "Big Data in Railroad Engineering: The Challenge of Vast Amounts of Data," Railway Track & Structures, November 2017, pp. 28-30.
  • Kune et al., "The Anatomy of Big Data Computing," Software: Practice and Experience, Vol. 46(1), pp. 79-105.
  • Attoh-Okine, N., "Big Data and Differential Privacy: Analysis Strategies for Railway Track," Wiley, May 2017.
  • Herb, Cathy and Veerappan, Ramesh, "From Big Data Platform to Intelligent Data Platform – Hadoop Modernization," Railinc Corp., 2017 Big Data in Railroad Maintenance Planning Conference.
  • Holzer, Eric, "Update Rail Case Study," Leader of Rail Solutions, Uptake, 2017 Big Data in Railroad Maintenance Planning Conference.
  • Flix, Nicolas, “HealthHub, Shaped for Best and Easiest Control of Railways System Operations,” Maintenance Engineering Director, Alstom Transport, 2017 Big Data in Railroad Maintenance Planning Conference.