The Nima Gluten Sensor has launched in the UK to mixed response. I’ve written a piece for Foods Matter providing an overview of the product and looking at the research into its performance and limitations. But in this post I specifically want to explore the often-quoted 96.9% accuracy figure associated with the Nima, to look at how it was derived and what it actually represents.
The Study
The precise wording of the claim is this: “Nima detects gluten at 20ppm [20 parts per million] and above at 96.9% accuracy” — and that all-important figure comes from this study, published online in mid-2018, and conducted by the Nima team.
For their research they used both gluten-spiked and unspiked gluten-free food samples, as well as various foods from catering outlets.
The first thing to point out is that quite a few of the food samples — as you can see from Figure 5a here — fall below the ’20ppm and above’ range, and yet appear to have contributed towards the calculation of the 96.9% figure (which I’ll come to). Why Nima claim the figure applies only to 20ppm and above samples I can’t answer.
Either way, '20ppm and above' is the enormous range of 20ppm to 1,000,000ppm, and it would be foolish to assume, whatever the 'accuracy' figure really is (presupposing that one can be confidently defined and measured at all), that the figure applies equally across all points on that spectrum. It's more sensible to assume that accuracy is lower at the lower end of the range and higher at the higher end.
But anyway, there is a more serious problem.
It concerns the inclusion of samples within the 2–20ppm range — in other words, samples which contain “detectable gluten” but also fall under a “gluten free” definition.
Much of what I have to say about Nima hinges on this, and the Nimalites out there have failed to see it.
The Numbers
The researchers ran 447 samples through the Nima, each of which had already been analysed by other methods to establish its gluten content.
Of those 447, 31 gave an ‘error’ reading on Nima, leaving 416 successful results.
Nima gives one of two results — either ‘gluten found’, or a smiley face (gluten free).
The results are collated into the table shown in Figure 5b, just below the figure referred to above. Take a close look at it here.
The top half represents the results. I'll break the 416 down into the four outcomes:
1/ Nima returned 'gluten detected' 284 times for samples containing gluten at 2ppm or above. (This was deemed a 'true' result: a true positive.)
2/ Nima returned 'gluten detected' 10 times for samples containing less than 2ppm (ie either zero gluten or undetectable gluten). (A false result: a false positive.)
3/ Nima returned a 'smile' 3 times for samples containing 20ppm or more gluten. (A false result: a false negative.)
4/ Nima returned a 'smile' 119 times for samples containing less than 20ppm gluten. (A true result: a true negative.)
The bottom half of the table contains the calculations, and this is where we see the 96.9% figure. It is calculated by summing the true results (ie 284 + 119 = 403) and dividing by the total number of true and false results (416). That gives 0.969, or 96.9%.
Note first that if the 31 discounted errors are counted against the device, the accuracy is actually 90.2% (403 out of 447).
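To make the arithmetic concrete, here is a minimal Python sketch that reproduces both figures from the counts above (the variable names are mine, not the study's):

# Counts from the study's Figure 5b, as broken down above
true_positives = 284   # 'gluten detected', sample at 2ppm or above
false_positives = 10   # 'gluten detected', sample below 2ppm
false_negatives = 3    # 'smile', sample at 20ppm or above
true_negatives = 119   # 'smile', sample below 20ppm
errors = 31            # device errors, discounted from the published figure

successful = true_positives + false_positives + false_negatives + true_negatives  # 416
correct = true_positives + true_negatives                                         # 403

print(f"Published accuracy: {correct / successful:.1%}")                      # 96.9%
print(f"Accuracy counting the errors: {correct / (successful + errors):.1%}")  # 90.2%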
But further undermining the validity of the purported accuracy figure is the following information, given to me by Nima when I queried their data. It concerns foods in the 2–20ppm range, of which there are a number in the study; their reply uses a 15ppm sample as an example:
“For a 15ppm sample, such as the carrot currant donut, the donut is placed in gluten or gluten free condition depending on the Gluten Found or Smile result of the sensor. If the 15ppm donut is gluten found, then it will be >2ppm condition. But if the 15ppm donut is a Nima smile, then it will be <20ppm condition.”
Take a moment to absorb what is being admitted to here.
ALL samples in that critical 2–20ppm range were counted either as a true positive or a true negative, depending on what result the Nima gave.
And ALL true results counted positively towards the accuracy figure in the calculation shown above.
There is no way for samples in that 2–20ppm range to give a false result by this standard. The Nima will ALWAYS be correct, if you set these parameters for 'accuracy', because both possible results are 'accurate'. Had the researchers selected only samples within that range, the accuracy according to their formula would have been 100%. The Nima would have had no way of failing.
It's not unlike asking a die how good it is at giving you a whole number greater than 0 and less than 7, and awarding it a point for accuracy every time you roll it.
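To see the scoring rule in code, here is a minimal Python sketch of the logic as Nima described it in the reply quoted above (the function name and structure are mine, purely for illustration):

def lenient_score(true_ppm, gluten_found):
    # The condition a sample is judged against depends on the sensor's own result,
    # exactly as the quoted reply describes.
    if gluten_found:
        return true_ppm >= 2    # judged against the 2ppm-and-above condition
    else:
        return true_ppm < 20    # judged against the under-20ppm condition

# The 15ppm 'carrot currant donut': whichever result the sensor gives, it counts as correct.
print(lenient_score(15, gluten_found=True))   # True: counted as a true positive
print(lenient_score(15, gluten_found=False))  # True: counted as a true negative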
The analogy
If you’re struggling with this — and I did for some time — then imagine a proposed blood test for people under 20 (bear with me) to reveal whether the donor is a child or adult.
And then imagine it gives one of two results — not ‘child’ or ‘adult’, but instead ‘child’ or ‘teenager’.
A ‘child’ result is unambiguous, but a ‘teenager’ result doesn’t tell you whether it’s an adult teenager (ie 18 or 19) or a child teenager.
Now consider Nima. This is a test ostensibly intended to determine whether a food is safe or unsafe for coeliacs. For that to be the case, it needs to distinguish between 'gluten free' and 'not gluten free'.
But Nima doesn’t do that. It instead distinguishes between ‘gluten free’ and ‘gluten detected’.
But in the same way that you don’t know whether a ‘teenager’ result belongs to an adult or a child, you also don’t know whether a ‘gluten detected’ result belongs to a ‘gluten free’ or ‘not gluten free’ sample either.
And if you only tested teenagers aged 13 to 17, the fictional blood test would never be wrong: either possible result, 'child' or 'teenager', would be correct, but might fail to tell you what you need to know.
That's the Nima for you, people. At the critical data range of 2–20ppm, where it really matters, it will never be wrong by its own defined standards and binary result, but it may fail to tell you what you need to know: namely, whether the food is within or outside safe levels.
You may be satisfied with this, but consider that you could put ‘gluten detected’ and ‘gluten free’ on opposite sides of a coin, toss it, and get 100% accuracy, according to the standards described in that research paper, when applied to this range of foods.
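If that sounds far-fetched, here is a short sketch of the coin toss, scored in the same lenient way as the study; the ppm values are invented purely for illustration:

import random

samples_ppm = [3, 7, 11, 15, 19]   # hypothetical foods, all in the 2-20ppm zone
correct = 0
for ppm in samples_ppm:
    coin_says_gluten_found = random.choice([True, False])   # the coin toss
    if coin_says_gluten_found:
        correct += ppm >= 2     # scored against the 2ppm-and-above condition
    else:
        correct += ppm < 20     # scored against the under-20ppm condition

print(correct / len(samples_ppm))   # always 1.0: the coin is '100% accurate' here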
If you want to pay for that, it's your prerogative.
The summary
It’s all about input and output.
What questions you ask of a test, and how you judge the answers you receive.
If you’re fairly lenient about the acceptability of the answers you receive, and choose questions to suit the outcome beneficial to you, then you will score well.
But this is bias, should it need spelling out.
Accepting either of the only two possible answers, and including samples which will give either of those two acceptable answers, is not a stringent test in my view.
If you take errors into account, remove outliers such as 0ppm or high-ppm foods (which are statistically unlikely to give a mistaken result, given the chemistry involved), concentrate instead on borderline foods in, say, the 5–40ppm range, and set the standard that 'gluten detected' means 'not gluten free' (which is how many users might interpret that result), then I imagine the 'accuracy' figures would be modest. If pushed, I'd guess around 70%. Remember that a binary test guessing at random will be correct around 50% of the time.
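For comparison, here is a sketch of that stricter standard, judging each result against the 20ppm labelling threshold alone; again, this is my own illustrative code, not anything from the study:

def strict_score(true_ppm, gluten_found):
    # 'Gluten found' is read as 'not gluten free'; a 'smile' as 'gluten free'.
    sample_is_gluten_free = true_ppm < 20
    return gluten_found != sample_is_gluten_free

# The 15ppm donut again: under this standard only one of the two results is correct.
print(strict_score(15, gluten_found=True))   # False: now a false positive
print(strict_score(15, gluten_found=False))  # True: a correct 'smile'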
I quite understand why Nima did not do this. And as it turns out, with 96.9%, Nima has an excellent yet imperfect, and hence more believable, figure, which is being obligingly trotted out by unquestioning bloggers paid to promote the product and sing the praises of its 'accuracy'.
If you ask me whether or not it’s raining outside and I tell you it’s 18 degrees, that may well be true, but it does not tell you whether you need an umbrella.
Would that make me an ‘accurate’ weatherman? According to Nimalite logic, yes.
But you might conclude otherwise.
So, ‘gluten detected’ could mean under 20ppm OR under 2ppm, is that what you are saying here – sorry, got a bit lost!? And a smiley face means under 20ppm, correct?
Sorry – which bit did you get lost at and I’ll try to explain? It also depends on what you mean by ‘mean’!
Gluten detected / detectable could be 2ppm and above – which is an enormous range – but the ‘crunch’ zone, as we all know, is at the 2-20ppm – where there’s gluten, but there’s gluten-freeness too!
Well I suppose it depends if you believe the current accepted 20ppm level is ‘safe’ enough or not – but then you knew I would say that!! I’ve just read your other piece actually and understand it a bit more and, given the obvious issues, it is not something I will be recommending. Thanks for your very useful investigation of it. One thing I am looking at is Gluten Detect, if you’ve come across that? It is a stool test that looks for 33 mer peptides and can tell people if they have been ‘glutened’ in the last 2-7 days. Helps with monitoring and dietary compliance one assumes.
Well I’m going by the labelling law with respect to what is GF and judging it by those standards, which Nima has taken into account. I’ll be posting more on it so will touch upon other issues re: lower level gluten detection in coming weeks.
I have heard of Gluten Detect but not looked at it. Thanks for the nudge. I’ll add it to my ‘to do’.
Hi Alex, good article. Nima has other problems too, as I am sure you are aware – one is that it is based on a small food sample, so unless it is a smoothie or a liquidised meal, there is a risk that the user may not get a sufficiently representative sample of the food to test. Secondly, as it uses light to detect a reaction, it will also give false results in highly coloured foods (containing mustard or turmeric, for example), and finally, it does not detect any fermented gluten-based products, such as soy sauce. So a typical person using it when eating in an Indian or Chinese restaurant, for example (which is very likely where it will often be used), will have little better than chance of getting it right. I think what is more concerning is that they are developing peanut and milk sensors, and whilst as a coeliac the worst that might happen to me short-term is that I could get unwell, for someone with a peanut or milk allergy this could potentially be a life-or-death situation with anaphylaxis, and the device certainly could not be relied on. Essentially, it is a nice idea, but the limitations of the testing process and the accuracy of the sensor itself make it not much better than a flip of a coin.
Thanks Chris – appreciate that. Some of the points you raise are touched upon in the article on Foods Matter (linked to near the beginning), and I will get to both those and others in subsequent posts here later this month or soon after.
The peanut sensor is actually making waves in the US at the moment, but hasn't launched in the UK yet. Yes, there are obvious concerns, but I've not looked into that one yet, so will hold fire until I get the opportunity.