Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks. The researchers trained a neural net on 50 photos of camouflaged tanks in trees, and 50 photos of trees without tanks[…].
Wisely, the researchers had originally taken 200 photos, 100 photos of tanks and 100 photos of trees. They had used only 50 of each for the training set. The researchers ran the neural network on the remaining 100 photos, and without further training the neural network classified all remaining photos correctly. Success confirmed! The researchers handed the finished work to the Pentagon, which soon handed it back, complaining that in their own tests the neural network did no better than chance at discriminating photos.
It turned out that in the researchers’ dataset, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest. [emphasis added]
While this story may be apocryphal, it nonetheless illustrates a common pitfall in machine learning: training on a proxy feature instead of the intended feature; in this case, cloudy vs. sunny instead of tank vs. no tank. As CNNs are increasingly used in critical applications, sound training can literally be a matter of life and death.
We developed Picasso to help protect against situations where evaluation metrics like loss and accuracy may not tell the whole story in training neural networks on image classification tasks. Picasso includes two visualizations so far: partial occlusion and saliency mapping. The user may upload new input images and select from among installed visualizations and their attendant settings. Picasso was designed with ease of adding new visualizations in mind, as detailed in the Implementation and architecture and Reuse potential sections. At the time of this writing, Picasso supports neural networks trained in Keras or Tensorflow.
At Merantix, we work with a variety of neural network architectures. Picasso makes it easy to see standard visualizations across our models in various fields, including automotive applications, such as understanding when road segmentation or object detection fails; advertising, such as understanding why certain creatives receive higher click-through rates; and medical imaging, such as analyzing which regions in a CT or X-ray image contain irregularities. See Figure 1 for a screenshot of the Picasso application after computing partial occlusion maps for various images. The user has chosen the VGG16 model for image classification. This example is included with Picasso, along with a trained MNIST model in both Keras and Tensorflow.
Other visualization packages exist to help bring transparency to the learning process, most notably the Deep Visualization Toolbox and keras-vis, which can also generate saliency maps. There are also various applications for visualizing the computational graph itself and monitoring the evaluation metrics, such as Tensorboard. Not all of these tools provide a web application out of the box, however. We furthermore required an application that would easily allow us to add new visualizations, which may in the future include visualizations such as class activation mapping [17, 18] and image segmentation [19, 20].
Let us return to the tank example. Could the visualizations provided with Picasso have helped the Army researchers? We would like to be able to see that our model (VGG16) is classifying based on the “tank-like” features of the image, and not some proxy feature like the weather. See Figure 2 for the partial occlusion maps generated by Picasso. We see that when we occlude portions of the sky, the model still classifies the image as a tank. Conversely, when we occlude parts of the tank treads, the model is far less certain that the image is a tank.
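The occlusion technique itself is straightforward to sketch: slide a gray patch across the image and record the classifier's confidence in the target class at each patch position. The following is a minimal NumPy-only illustration with a toy stand-in for a real classifier; the function names here are ours for illustration, not Picasso's API.

```python
import numpy as np

def occlusion_map(image, classify, patch=8, stride=8, fill=0.5):
    """Slide a square occluding patch over the image and record the
    classifier's confidence for the target class at each position."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill  # gray out the patch
            heat[i, j] = classify(occluded)  # confidence with patch applied
    return heat

# Toy stand-in classifier: "confidence" is just the mean brightness of the
# region where the toy "tank" sits, so hiding that region lowers it.
def toy_classify(img):
    return img[8:16, 8:16].mean()

img = np.zeros((24, 24))
img[8:16, 8:16] = 1.0  # the discriminative region
heat = occlusion_map(img, toy_classify)
```

Positions where confidence drops sharply (here, the patch covering the bright square) mark the regions the classifier actually depends on, which is exactly the signal one wants when ruling out proxy features like the sky.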
That the model is classifying on the correct features is further supported by the saliency maps. Saliency maps compute the derivative of the classification score for a given class with respect to the input image. Regions with high gradient (the bright regions in the map) are thus important to the given classification, because changing them would change the classification score more than changing other pixels would. Figure 3 shows the saliency map for the tank image. Notice that, with a few exceptions, the non-tank areas are largely dark, which means changing these pixels should not make the image more or less “tanky.”
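In Keras or Tensorflow this derivative is obtained by backpropagation through the network; a framework-free finite-difference sketch conveys the same idea. The helper names below are illustrative, not part of Picasso.

```python
import numpy as np

def saliency(image, score_fn, eps=1e-4):
    """Finite-difference approximation of d(score)/d(pixel): bump each
    pixel slightly and measure how much the class score moves."""
    grad = np.zeros_like(image, dtype=float)
    it = np.nditer(image, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        bumped = image.astype(float).copy()
        bumped[idx] += eps
        grad[idx] = (score_fn(bumped) - score_fn(image)) / eps
    return np.abs(grad)  # bright = pixel matters for the classification

# Toy score: only the centre 2x2 block influences the "class score",
# so only those pixels should light up in the saliency map.
def score(img):
    return img[1:3, 1:3].sum()

img = np.zeros((4, 4))
sal = saliency(img, score)
```

In practice one would never use finite differences on a full image; a single backward pass gives the same gradient for every pixel at once, which is why saliency maps are cheap to compute.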
Picasso was written in Python 3.5 using the Flask web application framework. Visualization classes and HTML templates are defined separately by the user and do not require modifying any other source files. Picasso handles the uploading of user-supplied images and generates temporary folders containing input and output images. If the visualization class has a settings attribute, Picasso automatically renders the settings selection as a separate page.
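The plug-in pattern can be sketched roughly as follows. This is an illustrative shape only, assuming a base class with a settings attribute and a single rendering method; the actual class and method names in Picasso's source may differ.

```python
# Hypothetical shape of a Picasso-style visualization plug-in.
class BaseVisualization:
    settings = {}  # if non-empty, rendered by the app as a settings page

    def make_visualization(self, inputs):
        raise NotImplementedError

class InvertExample(BaseVisualization):
    """Trivial example plug-in: 'visualize' by inverting intensities."""
    settings = {'strength': ['1.0', '0.5']}  # user-selectable options

    def make_visualization(self, inputs):
        # inputs: iterable of pixel intensities in [0, 1]
        return [1.0 - x for x in inputs]

viz = InvertExample()
out = viz.make_visualization([0.25, 1.0])
```

Because each plug-in is self-contained, the web application can discover and render it without any changes to the rest of the codebase, which is the property the Reuse potential section relies on.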
Application-level settings are handled via a configuration file, where the user may specify the deep learning framework (Keras or Tensorflow) as well as the location of the checkpoint files for their chosen model. The user must also supply a function to preprocess the image (reshape the image into appropriate input dimensions) and decode the output of the model (provide class labels).
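A hypothetical preprocess/decode pair of the kind described above might look like the following. The labels, shapes, and signatures are ours for illustration; consult the Picasso documentation for the exact interface.

```python
import numpy as np

CLASS_LABELS = ['no tank', 'tank']  # illustrative labels

def preprocess(images, target_shape=(28, 28)):
    """Reshape raw images into the model's input batch and scale to [0, 1]."""
    batch = np.stack([np.resize(im, target_shape) for im in images])
    return batch.astype('float32') / 255.0

def decode(probabilities, top=2):
    """Map the model's output probabilities to (label, score) pairs,
    highest-scoring class first."""
    order = np.argsort(probabilities)[::-1][:top]
    return [(CLASS_LABELS[i], float(probabilities[i])) for i in order]
```

The configuration file then only needs to point at the checkpoint files and these two functions; everything else about the model stays opaque to the web application.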
Picasso has unit tests written in the Pytest framework covering the web application functionality, and automatically tests that new visualizations render without errors. The GitHub repository performs continuous integration via Travis-CI. Test coverage is monitored with Codecov. A user can verify the software is working by starting the web application and pointing a web browser to 127.0.0.1:5000. In addition to docstrings and inline comments, extensive documentation is available on Read the Docs.
Any operating system capable of running Python 3.5 or higher.
Python >= 3.5.
These Python packages will be installed as part of the normal installation process: click >= 6.7, cycler >= 0.10.0, Flask >= 0.12, h5py >= 2.6.0, itsdangerous >= 0.24, Jinja2 >= 2.9.5, Keras >= 1.2.2, MarkupSafe >= 0.23, matplotlib >= 2.0.0, numpy >= 1.12.0, olefile >= 0.44, packaging >= 16.8, Pillow >= 4.0.0, protobuf >= 3.2.0, pyparsing >= 2.1.10, python-dateutil >= 2.6.0, pytz >= 2016.10, PyYAML >= 3.12, requests >= 2.13.0, scipy >= 0.18.1, six >= 1.10.0, tensorflow >= 1.0.0, Werkzeug >= 0.11.15.
Persistent identifier: https://github.com/merantix/picasso/tree/v0.1.1
Version published: v0.1.1
Date published: 12/05/17
Persistent identifier: https://github.com/merantix/picasso
Date published: 12/05/17
Persistent identifier: N/A
Date published: N/A
Any researcher or engineer working with a Tensorflow or Keras model that takes images as input and gives classification probabilities as output can use Picasso with very little effort. Picasso does make some assumptions about the topology of the neural network, but developers can further adapt the Picasso framework to more specialized computational graphs with minimal changes to the code.
Picasso is specifically designed to make implementing new visualizations as painless as possible (see the visualization documentation). New visualization code can be added without modifying any other source code. We hope to add more visualizations as we continue to develop this tool internally, and especially hope for new community-developed visualizations.
The authors would like to thank the Merantix team for support during development and documentation. Also, thanks to David Dohan and Nader Al-Naji for helpful discussions in preparing this manuscript.
The authors have no competing interests to declare.
McCulloch, W S and Pitts, W (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5(4): 115–133, DOI: https://doi.org/10.1007/BF02478259 ISSN 1522-9602.
LeCun, Y, Bottou, L, Bengio, Y and Haffner, P (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11): 2278–2324, DOI: https://doi.org/10.1109/5.726791
LeCun, Y, Huang, F J and Bottou, L (2004). Learning methods for generic object recognition with invariance to pose and lighting. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2: II–104. IEEE. DOI: https://doi.org/10.1109/CVPR.2004.1315150
Zhang, G P (2007). Avoiding pitfalls in neural network research. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 37(1): 3–16, DOI: https://doi.org/10.1109/TSMCC.2006.876059
Zhang, C, Bengio, S, Hardt, M, Recht, B and Vinyals, O (2016). Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, URL: https://arxiv.org/abs/1611.03530.
The Tesla Team (2016). A tragic loss. URL: https://www.tesla.com/blog/tragic-loss. Accessed: 2017-5-12.
Zeiler, M D and Fergus, R (2014). Visualizing and Understanding Convolutional Networks. Cham: Springer International Publishing, pp. 818–833. ISBN 978-3-319-10590-1. DOI: https://doi.org/10.1007/978-3-319-10590-1
Simonyan, K, Vedaldi, A and Zisserman, A (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, URL: http://arxiv.org/abs/1312.6034.
Chollet, F (2015). Keras. URL: https://github.com/fchollet/keras.
LeCun, Y and Cortes, C (2010). MNIST handwritten digit database. URL: http://yann.lecun.com/exdb/mnist/.
Yosinski, J, Clune, J, Nguyen, A and Fuchs, T (2015). Understanding neural networks through deep visualization. Deep Learning Workshop, International Conference on Machine Learning (ICML). URL: https://github.com/yosinski/deep-visualization-toolbox.
Kotikalapudi, R and contributors (2017). keras-vis. URL: https://github.com/raghakot/keras-vis.
Zhou, B, Khosla, A, Lapedriza, A, Oliva, A and Torralba, A (2016). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 2921–2929. URL: http://cnnlocalization.csail.mit.edu/.
Selvaraju, R R, Das, A, Vedantam, R, Cogswell, M, Parikh, D and Batra, D (2016). Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. CoRR, abs/1610.02391, URL: http://arxiv.org/abs/1610.02391.
Li, Y, Qi, H, Dai, J, Ji, X and Wei, Y (2016). Fully convolutional instance-aware semantic segmentation. CoRR, abs/1611.07709, URL: http://arxiv.org/abs/1611.07709.
He, K, Gkioxari, G, Dollár, P and Girshick, R (2017). Mask R-CNN. CoRR, abs/1703.06870, URL: http://arxiv.org/abs/1703.06870.
U.S. Army (n.d.). US Army operating Renault FT tanks. URL: https://en.wikipedia.org/wiki/Light_tank#/media/File:FT-17-argonne-1918.gif.