Data Challenges / Solution

business brief

The client is one of the largest manufacturers of agricultural machinery.

The business need is to predict failures of key “prime parts” in the client’s top-end tractor model, so that predictive, condition-based maintenance (CBM) policies can be implemented. CBM recommends maintenance actions based on sensor and telemetry data that reflect the current condition of the vehicle’s subsystems. This can reduce the risk of unexpected failures occurring before the next periodic maintenance operation.

data challenges

The vehicle’s on-board telematics system collects a variety of measurements pertaining to various parts and systems, such as oil temperature and tyre pressure. Approximately 200 such measurements are sent to a HANA server every half hour. The volume of data was immense: more than 10,000 variables came directly from a telemetry database and were refreshed in real time, which added data-velocity issues.
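To give a feel for the raw volume, the transmission figures above can be turned into a rough upper bound. This is only a back-of-the-envelope sketch: `fleet_size` and `days` are hypothetical inputs that do not come from the brief, and the calculation assumes every vehicle transmits around the clock, which real tractors will not do.

```python
# Rough upper bound on raw telemetry readings.
# Only the two constants below come from the text; everything else is assumed.
MEASUREMENTS_PER_TRANSMISSION = 200  # "approximately 200" measurements
TRANSMISSIONS_PER_DAY = 24 * 2       # one transmission every half hour


def readings_per_vehicle_per_day() -> int:
    """Readings one vehicle would send in a day if it never stopped transmitting."""
    return MEASUREMENTS_PER_TRANSMISSION * TRANSMISSIONS_PER_DAY


def total_readings(fleet_size: int, days: int) -> int:
    """Upper bound for a whole fleet; fleet_size and days are hypothetical."""
    return readings_per_vehicle_per_day() * fleet_size * days


if __name__ == "__main__":
    print(readings_per_vehicle_per_day())               # 9600
    print(total_readings(fleet_size=1_000, days=365))   # 3504000000
```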

solution highlights

HANA SQL was used to aggregate the raw telematics data into an “analysis-ready” table for modelling. R (installed on the HANA Linux machine) was chosen for the modelling itself. However, since any statistical modelling software restricts data size (an R matrix has traditionally been limited to 2^31 - 1 cells), we solved the problem by:


Using SAP HANA’s in-memory capabilities to crunch the data in a short time


Using a smaller sample to build the model, with extensive validation to ensure the model was robust
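The overall approach (aggregate first, check the table size against the matrix limit, then model on a sample and keep part of it back for validation) can be sketched as follows. This is an illustration in plain Python, not the project's actual HANA SQL or R code. The function names, the `(vehicle_id, measurement, value)` record layout, the mean/max summaries and the 30% holdout fraction are all assumptions made for the example.

```python
import random
from collections import defaultdict
from statistics import mean

# Cell limit cited in the text, taken here as 2^31 - 1 elements.
R_MAX_CELLS = 2**31 - 1


def aggregate(readings):
    """Collapse raw (vehicle_id, measurement, value) readings into one row
    per vehicle, mapping each measurement name to its (mean, max).
    The record layout and the chosen summaries are hypothetical."""
    grouped = defaultdict(lambda: defaultdict(list))
    for vehicle_id, name, value in readings:
        grouped[vehicle_id][name].append(value)
    return {
        vid: {name: (mean(vals), max(vals)) for name, vals in by_name.items()}
        for vid, by_name in grouped.items()
    }


def fits_in_r_matrix(n_rows, n_cols):
    """True if an n_rows x n_cols table stays within the cell limit."""
    return n_rows * n_cols <= R_MAX_CELLS


def sample_and_split(keys, sample_size, holdout_fraction=0.3, seed=0):
    """Draw a random sample of row keys, then split it into a training
    part and a holdout part reserved for validation."""
    rng = random.Random(seed)
    keys = sorted(keys)  # fixed order so the seed gives a repeatable sample
    sample = rng.sample(keys, min(sample_size, len(keys)))
    n_holdout = int(len(sample) * holdout_fraction)
    return sample[n_holdout:], sample[:n_holdout]


if __name__ == "__main__":
    readings = [
        ("T1", "oil_temp", 90.0),
        ("T1", "oil_temp", 110.0),
        ("T2", "oil_temp", 85.0),
        ("T2", "tyre_pressure", 1.6),
    ]
    table = aggregate(readings)
    print(table["T1"]["oil_temp"])               # (100.0, 110.0)
    print(fits_in_r_matrix(1_000_000, 10_000))   # False: 10^10 cells
    train, holdout = sample_and_split(table.keys(), sample_size=2,
                                      holdout_fraction=0.5)
    print(len(train), len(holdout))              # 1 1
```

Aggregating before sampling means the expensive pass over the raw data happens once, in the database, and the modelling tool only ever sees a table that is already within its size limit.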