6 Steps To Create FDA/CE Compliant Data Products

March 2025, Sam Moreland

Getting regulatory clearance is incredibly important for any medical company. Without it you can’t market your product, it’s difficult for the product to be used in clinical settings (even with an IRB), and you have a hurdle for investors that may hamper your next raise. Even worse, if you think your company does not need regulatory clearance and, 2 years down the line, the FDA disagrees, you will be in a world of trouble.

This article will not cover general product management practices. It will just cover the building blocks of creating a compliant model.

A note for this article: when I write medical device or device, this also includes AI. Because physical products were regulated first, newer technologies such as software (software as a medical device, or SaMD) and now AI-enabled device software functions (AI-DSF) are linked to the regulations for hardware products (medical devices).

FDA/CE/UKCA vs CMS/EMA/NICE

One key and very important distinction is the difference between a Regulator (such as the FDA) and Guideline Generators (such as CMS).

Regulators are there to make sure that your device is safe for use in the manner outlined in your indications for use and environment. For example, that a heart rate algorithm is accurate enough, or that an x-ray machine won’t overdose you with radiation.

Guideline Generators are there to work out whether or not your device is efficacious for use in a healthcare setting i.e. will you get paid for it.

These are not the same thing. For example, you might create a heart rate algorithm that’s super accurate only in the range of 50-80 bpm (this is a hypothetical; 60601-2-47 requires it to be 30-240 bpm). It may be super accurate, but it’s totally useless for monitoring anyone with SVT (supraventricular tachycardia) or bradycardia.

Having regulatory clearance is just the first step on the way to getting product-market fit. To get your product into hospitals and generate enough efficacy data for recommended use, you will need regulatory approval. You also need to develop your product to work in real-world settings, and you can only do that in the real world.

Step 1 - Is it even physiologically possible?

The biggest issue I see is companies assuming AI is so smart that it can do anything. There are absolute limits to what you can do with current AI technologies. I see a lot of companies raise money or labs post papers about an amazing breakthrough (usually based off testing on 5 people in the lab) for things that are just physiologically impossible. Continuous blood pressure is a common one, with companies trying to measure it anywhere from the accelerometer on your smartphone to your toilet seat.

Doctors are not stupid. Although medical knowledge has gaps, it has a lot of highly accurate information. It’s not like asking for stock tips. There are often really good reasons why things are done the way they are, or why some breakthroughs haven’t been made. Read the literature and talk to an expert to see if your physiological hypotheses are true.

Step 2 - Open datasets and PoC

Good datasets are few and far between, and they will most likely not have the format or accuracy you are looking for. But they can be a great way of creating the building blocks of your algorithms, understanding the trade-offs between methods, and getting an idea of realistic algorithm accuracy.

Step 3 - Know what you’re aiming for

Assuming you know the product you want to make, your next step is with the regulators themselves. Regulation is not a one-time event; it’s a process where they get to learn about your company, and you get to understand the regulatory landscape and build a company that is capable of creating and supporting a medical product in a safe manner. Crucially, you need to understand how to validate your product to prove that it is safe.

Develop a Q-Sub to the FDA outlining your product, indications for use and environment, and your validation proposal. From this you will understand your needs for algorithm validation and what data you need to be gathering for training and testing.

Most regulatory approvals will fall into two categories: those with predicates and those without (see post here outlining the different routes).

Devices with predicates will already have a pathway to validation that you can follow. Some of these predicates have been encoded into international standards (e.g. IEC 60601-2-47 for heart rate and arrhythmia) and some of them are from novel approvals. Even though there is a defined route to market, you still need to talk to the FDA early, as there may be changes in the validation requirements based on feedback from real-world performance and safety!

For devices without predicates, you will need to come with a validation plan and work with the FDA to agree on the types of data, methods and accuracy requirements needed for approval.

For understanding different types of validation please see my post here!

You will also want to discuss any Predetermined Change Control Plans (PCCP) you have with the FDA early to agree on scope for what you will be able to do. This may change the class of algorithms you choose to use.

Step 4 - Data collection

Data collection can take a while, so it’s best to start early to shine a light on any issues in your product viability, validation protocol or algorithm development. You may not have an FDA-agreed protocol yet, but it’s important to get your team used to doing it.

I highly recommend that, as far as possible, your data team does the data collection. Yes, it’s boring and it may be cheaper to outsource to other people, but no one will care about the quality of the data as much as the people who will use it. Not only will this give you better data quality, but often in generating the data you will get insights into the usefulness of your product, which may change how and what you develop.

Step 5 - Algorithm development

There is a trade-off here between cost, complexity and explainability. Neural network based architectures (AI) definitely have a higher ceiling on capability, but they are very complex and costly to develop and serve. As well as the cost/complexity, they are also less explainable, and explainability is crucial.

In a healthcare setting, knowing when not to make a prediction/diagnosis/measurement is crucial on two levels: patient safety and trust. Regulators and clinicians care deeply about this! As AI products take on more and more of the decision making in hospitals, explainability will become more and more core to your product development.
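As a minimal sketch of "knowing when not to make a measurement", the function below only reports a heart rate when the signal quality is good enough and the estimate sits inside the validated range. The names (`Reading`, `report_heart_rate`, `signal_quality`) and the 0.8 quality threshold are hypothetical and purely for illustration; the 30-240 bpm range is the one mentioned earlier in this article. Your real thresholds would come out of your validation work.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Reading:
    value_bpm: Optional[float]  # None means "no measurement reported"
    reason: str                 # why the value was reported or withheld


def report_heart_rate(
    estimate_bpm: float,
    signal_quality: float,
    min_quality: float = 0.8,                         # hypothetical threshold
    valid_range: Tuple[float, float] = (30.0, 240.0)  # range quoted in the article
) -> Reading:
    """Return the estimate only when it is safe to show it."""
    if signal_quality < min_quality:
        return Reading(None, "signal quality below threshold")
    low, high = valid_range
    if not (low <= estimate_bpm <= high):
        return Reading(None, "estimate outside validated range")
    return Reading(estimate_bpm, "ok")


print(report_heart_rate(72.0, 0.95))  # value reported
print(report_heart_rate(72.0, 0.50))  # withheld: poor signal
```

Returning a reason alongside the withheld value is a deliberate choice here: it gives clinicians something to act on and gives you something to log and monitor later.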

Explainable algorithms may be more applicable, or using a combination of ML with AI may be best. For example, do we need a neural network to predict a heart rate from an ECG? AI is pretty bad at scaling to unseen data, whereas time series methods scale very well. However, neural networks are great at things that are hard for time series or logic models, such as classifying good quality vs bad quality data or data segmentation.

My rule of thumb is usually to start with the more basic algorithms. If they don’t have the level of performance needed, start using more complicated algos and iterate. Using multiple types of models is optimal: a model for general performance and several models for edge cases.
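To make "start with the more basic algorithms" concrete, here is a sketch of about the simplest heart rate estimator I can think of: take the R-peak timestamps and use the median beat-to-beat interval. It assumes R-peaks have already been detected upstream (that detection step is not shown), and the function name is hypothetical. It is a baseline to measure more complex models against, not a validated method.

```python
from statistics import median
from typing import Sequence


def heart_rate_from_r_peaks(r_peak_times_s: Sequence[float]) -> float:
    """Estimate heart rate in bpm from R-peak timestamps given in seconds."""
    if len(r_peak_times_s) < 2:
        raise ValueError("need at least two R-peaks to form an interval")
    # RR intervals: time between consecutive beats.
    rr_intervals = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    if any(rr <= 0 for rr in rr_intervals):
        raise ValueError("R-peak timestamps must be strictly increasing")
    # The median is less sensitive to a single missed or extra beat than the mean.
    return 60.0 / median(rr_intervals)


peaks = [0.0, 0.8, 1.6, 2.4, 3.2]  # one beat every 0.8 s
print(round(heart_rate_from_r_peaks(peaks), 1))
```

Every step here can be explained to a clinician or a regulator in one sentence, which is exactly the property you give up as you move to more complex models.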

Step 6 - Post Market Surveillance (PMS)

Realise that FDA clearance/approval is just the first step. You need to set yourself up to be able to improve your algorithm quickly once it’s actually being used in the real world. As a part of your QMS you will already have a CAPA process in place, but you also need to be proactively monitoring your model performance. You need to monitor both the inputs and outputs of your algorithm, and this monitoring needs to be in place before your product is used on patients!

You will also need to monitor for usage drift. Usage drift occurs when either the inputs to or the outputs from your model change. Your model will have been created on data in a certain context, and this can shift over time. This means your models may lose accuracy and applicability, which can be unsafe for patients and create a poor product experience.

For input drift, you may have created the model on normotensive patients but not on hypertensive patients. Or perhaps you were previously using a different ECG monitor, with different data characteristics, as input to your arrhythmia model. This can cause your algorithm to become inaccurate.

For output drift, you may have refactored some code or changed some architecture, which has brought in a different data processing path and causes different outputs.
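One way to catch the output drift case is to keep a frozen set of reference inputs together with the outputs recorded from the validated version, and re-run them after every change. The sketch below assumes a pipeline that returns a single number; `find_output_drift`, `hr_v1`, `hr_v2` and the reference cases are all hypothetical names and data made up for illustration.

```python
import math
from statistics import mean, median


def find_output_drift(pipeline, reference_cases, abs_tol=1e-6):
    """Return (case_id, expected, actual) for every case whose output changed."""
    drifted = []
    for case_id, inputs, expected in reference_cases:
        actual = pipeline(inputs)
        if not math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol):
            drifted.append((case_id, expected, actual))
    return drifted


def hr_v1(rr_intervals_s):
    """Hypothetical validated version: heart rate from the mean RR interval."""
    return 60.0 / mean(rr_intervals_s)


def hr_v2(rr_intervals_s):
    """Hypothetical refactor that silently switched from mean to median."""
    return 60.0 / median(rr_intervals_s)


REFERENCE_INPUTS = {
    "steady": [0.8, 0.8, 0.8, 0.8],
    "premature_beat": [0.8, 0.8, 0.4, 0.8, 0.8],
}
# Reference outputs are recorded once, from the validated version.
REFERENCE_CASES = [(cid, rr, hr_v1(rr)) for cid, rr in REFERENCE_INPUTS.items()]

for case_id, expected, actual in find_output_drift(hr_v2, REFERENCE_CASES):
    print(f"{case_id}: expected {expected:.2f} bpm, got {actual:.2f} bpm")
```

Note that the "steady" case passes under both versions; only the edge case exposes the change, which is why the reference set needs to cover more than typical data.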

Catching these can be tricky with CI/CD, as the data you have may not be enough to safely test any changes. You want to have some statistical models of key features that describe the input and output distribution ranges used in validation, and that can potentially block data from a distribution you’ve not seen before.
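The crudest possible version of such a statistical model is a per-feature envelope: record the minimum and maximum of each key feature in the validation data, then flag any incoming record that falls outside. The sketch below does only that. The feature names (`systolic_mmhg`, `sampling_rate_hz`) and values are hypothetical, and in practice you would likely want percentiles or a proper density model rather than raw min/max.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class FeatureEnvelope:
    name: str
    low: float
    high: float


def fit_envelopes(validation_rows: Sequence[Dict[str, float]]) -> List[FeatureEnvelope]:
    """Record the min and max of each feature seen in the validation data."""
    if not validation_rows:
        raise ValueError("validation data is empty")
    names = list(validation_rows[0].keys())
    return [
        FeatureEnvelope(
            name,
            min(row[name] for row in validation_rows),
            max(row[name] for row in validation_rows),
        )
        for name in names
    ]


def out_of_envelope(row: Dict[str, float], envelopes: Sequence[FeatureEnvelope]) -> List[str]:
    """Return the names of features that fall outside what validation covered."""
    return [e.name for e in envelopes if not (e.low <= row[e.name] <= e.high)]


# Hypothetical validation set: normotensive patients, one ECG monitor at 250 Hz.
validation = [
    {"systolic_mmhg": 104.0, "sampling_rate_hz": 250.0},
    {"systolic_mmhg": 118.0, "sampling_rate_hz": 250.0},
    {"systolic_mmhg": 127.0, "sampling_rate_hz": 250.0},
]
envelopes = fit_envelopes(validation)

# A hypertensive patient recorded on a different monitor trips both checks.
print(out_of_envelope({"systolic_mmhg": 165.0, "sampling_rate_hz": 125.0}, envelopes))
```

Whether a flagged record is blocked, or passed through with a warning, is a product and safety decision that belongs in your risk management process rather than in the code.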

This is where good data engineering is crucial. Healthcare systems tend to be very complex (especially with microservices), and for that reason tracking data flow is incredibly important. You do not want to have a serious error and not find it, or fail to track the cause of a customer complaint. Regulators demand this of you. Data versioning, lineage, modelling, validation and real-time monitoring are critical. They will help you identify issues and areas for model improvement.
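As a small illustration of lineage, the sketch below attaches a record to every output saying exactly which input, model version and pipeline version produced it. The field names and version strings are hypothetical, and a real system would write these records to durable, access-controlled storage; the point is only that a complaint about one output can be traced back to its source.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint(payload: Dict[str, Any]) -> str:
    """Stable SHA-256 of a JSON-serialisable payload, independent of key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lineage_record(
    raw_input: Dict[str, Any],
    model_version: str,
    pipeline_version: str,
    output: Dict[str, Any],
) -> Dict[str, Any]:
    """Tie one output to the exact input and software versions behind it."""
    return {
        "input_sha256": fingerprint(raw_input),
        "model_version": model_version,
        "pipeline_version": pipeline_version,
        "output": output,
        "output_sha256": fingerprint(output),
    }


record = lineage_record(
    raw_input={"device": "ecg-monitor-a", "rr_intervals_s": [0.8, 0.8, 0.8]},
    model_version="hr-model-1.2.0",   # hypothetical version string
    pipeline_version="pipeline-3.4",  # hypothetical version string
    output={"heart_rate_bpm": 75.0},
)
print(json.dumps(record, indent=2))
```

Hashing a canonical serialisation means the same input always maps to the same fingerprint, so you can check later whether two outputs really came from identical data.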
