FAQ – AI in Process Automation

How much does a data science project cost?

A reasonable question but not always one that can be readily answered at the onset. We offer a transparent pricing structure and the work can be charged at an agreed daily rate or alternatively on a fixed price basis against an agreed project scope. However, to provide an indication for the discovery stage, based on the ‘The anatomy of a Data Science project’ to cover steps 1.1 to 1.4, then around 3 to 5 man-days of work would be required. This estimate is very dependent on the format and quality that the data is provided in. Any initial discussions are carried out free of charge.

Our information is stored in a number of disparate data sources, does this prevent us from conducting a data science project?

No, but it would be fair to say that this represents a minor speed bump. It is relatively common for the data to be stored across different systems. You would be surprised at how often data is provided as a series of Excel workbooks. However, one of the challenges is being able to join the data together. Ideally, the data can be joined by a unique identifier but often the data can only be loosely ‘joined’ based on timestamps.

How are the solutions deployed?

The solution may be deployed on either an on-premise or cloud basis. Whenever possible for machine learning applications, an on-premise solution is preferred, based on a standard server-class hardware, as it is felt that this offers the most cost-effective solution. It also avoids potential issues with manufacturing data, which may represent proprietary data, being moved offsite. Where large language models (LLMs) are required then the system could still be hosted on site but it is likely that an internet connection would be required to support language processing.

Any on-premise solution may include packaged software which is typically delivered as a virtual machine.

Can you work with our preferred solution integrator to deliver the on-premise solution?

Generally, this would be our recommended approach. Although we have significant experience with automation systems, we lack site-specific knowledge of your architecture.

What data formats can you handle?

Typically we can handle the following:

GE Digital historian archives
CSV, text and XML data
PDFs
JSON
Excel workbooks
SQL data e.g. MS SQL Server, PostgreSQL and MySQL

Why have you requested the data to be sent offsite for analysis?

Many of the software tools we use to pre-process the data are either proprietary or are licensed in such a manner that they cannot be used on third party hardware. In addition for machine learning applications there is substantial compute power required during the discovery phase.This requires specialist hardware which would not be available on-site.

How much data do you need for analysis?

A reasonable question but again one that is difficult to answer as it varies enormously on the complexity of the problem. However, assuming a project of medium complexity involving a single product, then probably a minimum of 50 batch records would be required. But more is always better.

We have thousands of inputs which may be relevant to this issue, can you identify which ones?

Techniques are available to handle this situation but they are likely to be computationally expensive. Nevertheless, this is probably going to be a challenge unless you have a very large amount volume of data for analysis. Generally, your best course of action would be to involve your SMEs to try and reduce this list to a more manageable number of inputs, for example the 20 most likely inputs.

We would like to predict the failure of rotating machines, can you help?

A major indicator is vibration analysis, specifically in the frequency domain. In our experience, this requires specialist vibration sensors to be installed. Many such vendors also provide a monitoring service using specialist software. In this niche area, they are likely to provide a better solution.

Can you guarantee a successful data science project?

Unfortunately, nobody can guarantee a successful outcome. One of the reasons for failure is that the process cannot be modelled accurately. Reasons include:

Insufficient data.
Some required input data is not available.
Input data is not being measured to the required accuracy. For example, we are attempting to identify subtle changes to a machine’s performance based on changes in power consumption. After analysing the data, it is found that power is only measured to +/- 5% which lacks sufficient precision.

We adopt a stepped approach to the discovery process to identify issues early so as to minimise wasted effort and cost.

All we can guarantee is that we will work professionally, always putting the client’s interests first.