When viral marketing goes too far
A recent post claiming to detect COVID-19 with a deep neural network at very high accuracy has drawn keen media interest.
An Australian PhD candidate in artificial intelligence recently posted on LinkedIn about his research on SARS-CoV-2. The post gathered thousands of views, likes, and shares.
He built a deep learning model which, from chest radiographs, predicts with 97.5% accuracy whether a patient is infected with SARS-CoV-2.
As it stands, the project features:
Everything above was built in one rough week.
Deep convolutional networks have potential benefits for disease diagnosis and treatment. Many scientific publications have appeared in recent years [1]; here are a few of them:
In 2016, a group of UK researchers published a method for diagnosing diabetic retinopathy with 86% accuracy, trained on a dataset of 80,000 fundus photographs [2].
In the same year, Ugandan researchers evaluated the performance of CNNs on microscopic blood smears using a dataset of 10,000 objects [3].
Two Japanese researchers led an effort to classify lung nodules on a dataset of 550,000 CT scans [4].
But here, a quick glimpse into the GitHub repository reveals, at best, an acute lack of understanding of deep learning and AI and, at worst, a devious attempt at self-promotion that capitalizes on the pandemic. Here is why.
These networks learn very complex latent representations, so they need a lot of training samples: the studies above each relied on tens to hundreds of thousands of them.
So far, the COVID-19 detector has been trained on a dataset of... 30 images!
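To make the sample-size argument concrete, here is a toy sketch in standard-library Python. A 1-nearest-neighbour classifier stands in for a high-capacity network (an illustrative assumption: like a large network, it can memorize any training set). It is trained on only 30 random "images" whose labels are pure noise, so there is nothing real to learn. It still scores perfectly on its own training data, while on fresh samples it does no better than a coin flip.

```python
import random

rng = random.Random(0)
DIM = 64  # hypothetical "image" size: 64 random features per sample


def random_dataset(n):
    """Features and labels are independent noise: there is no signal to learn."""
    xs = [[rng.random() for _ in range(DIM)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


def nearest_neighbour_predict(train_x, train_y, x):
    """Return the label of the training vector closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


train_x, train_y = random_dataset(30)   # tiny training set
test_x, test_y = random_dataset(1000)   # fresh, unseen samples

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"training accuracy: {train_acc:.2f}")  # every point is its own nearest neighbour
print(f"test accuracy:     {test_acc:.2f}")   # hovers around chance level
```

The training accuracy is always 1.00, whatever the labels; only a large, independent test set reveals that the model has learned nothing.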
Fig. 1: The model learns from chest X-Ray
For a network with more than 150 layers (counting batch normalization and activation layers alongside its 50 weight layers) and over 20 million parameters, this is completely absurd. This approach was debunked in a Reddit thread.
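The "over 20 million parameters" figure can be checked with a little arithmetic. The sketch below counts the learnable parameters of a ResNet-50, assuming the torchvision layout (bias-free convolutions, batch-norm scale and shift counted, running statistics excluded) and a hypothetical two-class head, since the exact head used in the repository is not detailed here. The "50" in the name counts the stem convolution, the 48 convolutions of the 16 bottleneck blocks, and the final fully connected layer.

```python
def conv(in_ch, out_ch, k):
    """Weights of a bias-free k x k convolution."""
    return in_ch * out_ch * k * k


def bn(ch):
    """Learnable scale and shift of a batch-norm layer (running stats are not parameters)."""
    return 2 * ch


def bottleneck(in_ch, width, downsample):
    """1x1 -> 3x3 -> 1x1 bottleneck block; the last convolution expands channels 4x."""
    out_ch = 4 * width
    n = conv(in_ch, width, 1) + bn(width)
    n += conv(width, width, 3) + bn(width)
    n += conv(width, out_ch, 1) + bn(out_ch)
    if downsample:  # 1x1 projection shortcut on the first block of each stage
        n += conv(in_ch, out_ch, 1) + bn(out_ch)
    return n


def resnet50_params(num_classes):
    total = conv(3, 64, 7) + bn(64)  # 7x7 stem convolution
    in_ch = 64
    for width, blocks in [(64, 3), (128, 4), (256, 6), (512, 3)]:
        for b in range(blocks):
            total += bottleneck(in_ch, width, downsample=(b == 0))
            in_ch = 4 * width
    total += in_ch * num_classes + num_classes  # fully connected head (weights + biases)
    return total


imagenet_head = resnet50_params(1000)  # original 1000-class ImageNet classifier
binary_head = resnet50_params(2)       # hypothetical infected / not-infected head
print(f"{imagenet_head:,} parameters with the ImageNet head")
print(f"{binary_head:,} parameters with a 2-class head")
print(f"{binary_head // 30:,} parameters per training image with 30 images")
```

This prints 25,557,032 parameters for the ImageNet model and 23,512,130 for a two-class head, which leaves roughly 783,737 parameters per training image when only 30 images are available.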
Moreover, there is a huge data bias. The 30 pictures are not labeled according to whether an individual had the virus, but according to lung damage in acute cases of COVID-19. Unless your lungs are already wrecked by the virus, the model has no way to detect an infection. Even when a person shows symptoms of pneumonia, if those symptoms are not acute, the model's accuracy is unproven.
Finally, the COVID model is based on a popular baseline network, ResNet-50 [5]. While this is a common approach for image recognition and classification, ResNet was pre-trained on photos of everyday objects. As a result, its hidden layers are activated by geometric shapes and colorful patterns (Fig. 2).
Fig. 2: Visualization of ResNet features [6]
Such patterns are nowhere to be found in grayscale radiographs, which is why many medical neural networks are trained from scratch.
Many other problems appear when we take a closer look at the code repository. The training, validation and test sets share duplicate images; most of the training code is lifted from a PyTorch tutorial and padded with unnecessary code; the GitHub issues are ridiculous...
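Duplicates across splits matter because a test image that also sits in the training set only checks memorization, not generalization. Exact copies are cheap to detect by hashing file contents. Here is a minimal standard-library sketch, assuming a hypothetical layout with one sub-folder per split (`train`, `val`, `test`) under a root directory; the folder names are illustrative, not taken from the repository.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the raw file bytes: byte-identical files share a digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cross_split_duplicates(root, splits=("train", "val", "test")):
    """Map each digest found in more than one split to the (split, file) pairs sharing it."""
    seen = {}  # digest -> list of (split, path relative to that split's folder)
    for split in splits:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            continue  # tolerate a missing split folder
        for path in sorted(split_dir.rglob("*")):
            if path.is_file():
                rel = str(path.relative_to(split_dir))
                seen.setdefault(file_digest(path), []).append((split, rel))
    # Keep only digests that appear in at least two different splits.
    return {d: files for d, files in seen.items() if len({s for s, _ in files}) > 1}


if __name__ == "__main__":
    for digest, files in cross_split_duplicates("data").items():  # "data" is a placeholder
        print(digest[:12], files)
```

Hashing only catches byte-identical copies; a resized or re-encoded copy of the same radiograph would need a perceptual hash or a similarity search instead.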
Clearly, that post was destined for thousands of likes, shares and views, no matter what the content behind the title was.
Yet the author doesn't give up when faced with the truth, and often replies with the following answer:
«Hi xxx, we have curated 5000 with the support of radiologist from a Research Institute in Canada »
I don't know how much truth there is in this bold answer, but if such a model were used as-is in a medical application, it could be very dangerous.
The author even created a Slack group with multiple channels. Needless to say, it gathered a lot of interest.
The #datascientists channel has little serious content and is filled with enthusiastic newcomers with a lot of hope but very little experience. Similarly, the only tangible content in the #doctors channel comes from professionals addressing medical issues, for instance that chest X-rays are not the recommended approach for a COVID-19 diagnosis. Finally, the #researchers channel is almost empty.
On the other hand, the UI/UX channels are generating a lot of content. The initiative now has 5 different logos, and a mockup for both a mobile and a web interface.
There is even a #marketing channel to find ways to improve communication and raise funds, and a #sponsors channel with potential investors asking about future returns on investment.
See for yourself.
Deep learning is not a silver bullet. Many unprepared companies that tried to bring it in-house through data squads went nuts as they watched their costs rise while little or nothing reached production.
Still, advances in AI are groundbreaking nowadays. One would be crazy to ignore them completely.
That doesn't mean diving straight into the pool and flailing around in the water gasping for air. Hence the importance of a rock-solid team with cross-cutting skills in AI / ML, DataOps, architecture, development and many other topics.
[1] Muhammad Imran Razzak, Saeeda Naz, Ahmad Zaib. Deep Learning for Medical Image Processing: Overview, Challenges and Future, 2017.
[2] Harry Pratt, Frans Coenen, Deborah M. Broadbent, Simon P. Harding, Yalin Zheng. Convolutional Neural Networks for Diabetic Retinopathy, 2016.
[3] John A. Quinn, Rose Nakasi, Pius K. B. Mugagga, Patrick Byanyima, William Lubega, Alfred Andama. Deep Convolutional Neural Networks for Microscopy-Based Point of Care Diagnostics, 2016.
[4] Masaharu Sakamoto, Hiroki Nakano. Cascaded Neural Networks with Selective Classifiers and its Evaluation using Lung X-ray CT Images, 2016.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition, 2015.
[6] Brian Chu, Daylen Yang, Ravi Tadinada. Visualizing Residual Networks, 2017.
Antoine Champion, Mar. 23rd, 2020