Nelson Correa, Ph.D.
Data Science, Machine Learning and Natural Language Processing
https://linkedin.com/in/ncorrea
Palm Beach Data Science Meetup West Palm Beach, FL
Aiming for 45 minutes of presentation and 10 - 15 minutes of Q&A.
1950s - F. Rosenblatt (perceptron); W. McCulloch, W. Pitts (artificial neuron); Hubel, Wiesel (cat visual cortex)
1960s - MIT AI: Minsky, Waltz, Winston, Marr; AI, Blocks World, Robotics
1970s - MIT AI & computer vision; E. Harth, Alopex; G. Hinton, connectionist networks
1980s - Symbolic AI & computer vision; Connectionism
1990s - Pattern recognition (PAMI), machine learning
2000s - Deep learning, OpenCV
2010-2019 - Rapid progress in deep learning
"The classical problem in computer vision, image processing, and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. "
(https://en.wikipedia.org/wiki/Computer_vision#Recognition)
NOTE: Other CV and multi-modal CV tasks (captioning, Visual Question Answering)
"ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images."
Source: http://image-net.org/about-stats
Perceptron (trainable; direct solutions or gradient descent). Too limited a learning function: it cannot learn non-linearly-separable functions such as XOR (Minsky and Papert, 1969)
Multi-Layer Perceptron (MLP) and deep networks (backpropagation ~ chain rule)
Convolutional neural networks (CNNs) with a few layers (LeCun, 1989)
Parallel Distributed Processing (Rumelhart, McClelland, and the PDP Research Group, 1986)
Improved deep network training (2012 ILSVRC visual recognition challenge)
Improved deep network models and modules (2011 - present)
Large image datasets, CV tasks
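The perceptron's XOR limitation, and how a hidden layer trained by backpropagation (the chain rule) removes it, can be sketched in NumPy. This is a minimal illustration, not code from the talk's notebooks; the hidden size, learning rates, and random seed are arbitrary choices:

```python
import numpy as np

# XOR is not linearly separable: a single perceptron cannot fit it,
# but an MLP with one hidden layer, trained by backpropagation, can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single logistic unit (perceptron-style), batch gradient descent
w, b = rng.normal(size=2), 0.0
for _ in range(5000):
    p = sigmoid(X @ w + b)
    grad = p - y                      # dLoss/dlogit for cross-entropy
    w -= 0.5 * X.T @ grad / len(y)
    b -= 0.5 * grad.mean()
perceptron_acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)

# MLP with 8 hidden units; gradients via the chain rule (backprop)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=8), 0.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)              # hidden layer
    p = sigmoid(h @ W2 + b2)              # output probability
    dz2 = p - y
    dh = np.outer(dz2, W2) * h * (1 - h)  # chain rule through hidden layer
    W2 -= 1.0 * h.T @ dz2 / len(y)
    b2 -= 1.0 * dz2.mean()
    W1 -= 1.0 * X.T @ dh / len(y)
    b1 -= 1.0 * dh.mean(axis=0)
h = sigmoid(X @ W1 + b1)
mlp_acc = np.mean((sigmoid(h @ W2 + b2) > 0.5) == y)
print(perceptron_acc, mlp_acc)   # the perceptron stays below 100% on XOR
```

No linear decision boundary can satisfy all four XOR points, so the perceptron's accuracy is capped at 3/4; the hidden layer lets the MLP compose two linear boundaries.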
MNIST dataset
MNIST MLP
MNIST CNN
See notebook: MNIST_Digit_Classification.ipynb - html
Regularization of Neural Networks using DropConnect, Li Wan, Matthew D. Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus, ICML 2013.
0.39% error rate for a single model; 0.21% error (99.79% accuracy) with ensembling
Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 2017 (https://arxiv.org/abs/1710.09829) 99.75% accuracy (0.25% error rate)
Simple 3-layer CNN above (130,890 total params)
without any training regularization: accuracy 99.15%
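What a convolutional layer in such a CNN computes can be sketched as a valid-mode 2D cross-correlation. This is a minimal NumPy illustration; the toy step image and edge-detector kernel are made up here:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, as computed by a CNN conv layer."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product of the kernel with the image patch at (i, j)
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# A vertical-edge detector applied to a simple step image.
img = np.zeros((5, 5))
img[:, 2:] = 1.0                      # left half dark, right half bright
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])          # responds to vertical intensity edges
print(conv2d(img, k))                  # strong response at the step boundary
```

Parameter totals such as the 130,890 above are the sum of kernel weights plus biases over all layers; for example, a layer with 32 filters of size 3x3 over 1 input channel contributes 32*(3*3*1) + 32 = 320 parameters.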
See notebook: VGG16_Image_Classification.ipynb - html
Image classification task
VGG16 image classification architecture
Pretrained and custom VGG16 image classifiers
An extension of image classification:
classification: input image -> class probabilities (multinomial logistic regression)
localization (single object): input image -> class probabilities + bounding box
detection (multiple objects at different scales): input image -> predictions over a grid of image cells
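The classification step (input image -> class probabilities) ends in a softmax, i.e. multinomial logistic regression over class scores. A minimal sketch with made-up logits for three classes:

```python
import numpy as np

def softmax(logits):
    """Map raw class scores to a probability distribution."""
    z = logits - np.max(logits)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # probabilities sum to 1
```

In a network, the logits are the final layer's outputs; detectors apply the same idea per grid cell alongside bounding-box regression.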
See notebook: YOLOv3_Object_Detection.ipynb - html
Source: Ayoosh Kathuria (YOLOv3); YOLO: https://pjreddie.com/yolo/
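Grid-based detectors such as YOLO score overlapping box predictions by intersection-over-union (IoU), e.g. when suppressing duplicate detections. A minimal sketch with hypothetical boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # → 1/7 ≈ 0.1429
```

Non-maximum suppression keeps the highest-confidence box and discards others whose IoU with it exceeds a threshold (commonly around 0.5).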
ImageNet 2011 - 2017, human performance (95%) vs. super-human performance (98%) (Source: EFF AI Metrics)
Role of feedback in mammalian vision: a new hypothesis and a computational model P.S. Sastry, Shesha Shah, S. Singh, K.P. Unnikrishnan, Vision Research 39 (1999) 131–148, Elsevier.
Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks, Cao, Chunshui et al., IEEE ICCV, 2015. pdf github IEEE PAMI
Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, 2017 (https://arxiv.org/abs/1710.09829) (https://github.com/XifengGuo/CapsNet-Keras)
Absolute MNIST error rate reduction from 0.21% (99.79% accuracy) to 0.17% (99.83% accuracy), Zhao et al., 2019.
Capsule Networks with Max-Min Normalization, Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan, 2019, (https://arxiv.org/abs/1903.09662)
NOTE: After 30 years, the MNIST test set can no longer be considered a proper test set (it has effectively become a validation set).
Misclassified MNIST images using 3-model majority vote from CapsNets trained using Max-Min normalization
Source: Capsule Networks with Max-Min Normalization, Zhao et al., 2019
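The 3-model majority vote can be sketched as a per-sample vote over each model's predicted labels. This is a generic illustration; the predictions below are made up, not taken from Zhao et al.:

```python
import numpy as np

def majority_vote(predictions):
    """Per-sample majority vote over an (n_models, n_samples) label array."""
    votes = np.array(predictions)
    # bincount each column of labels; ties break toward the smaller label
    return np.array([np.bincount(votes[:, i]).argmax()
                     for i in range(votes.shape[1])])

# Three hypothetical model outputs on five digit images (labels made up).
preds = [[3, 5, 8, 1, 9],
         [3, 6, 8, 1, 4],
         [3, 5, 8, 7, 4]]
print(majority_vote(preds))   # → [3 5 8 1 4]
```

A sample is misclassified by the ensemble only when at least two of the three models agree on a wrong label, which is why voting lowers the error rate of the individual models.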
In this talk we have presented:
computer vision tasks of image classification and object detection
current image benchmark datasets (MNIST, Pascal VOC, ImageNet, MS-COCO)
the MLP and recent deep learning architectures
pre-trained, custom-trained, and applied several deep learning architectures
noted new developments (MNIST SOTA, capsule and feedback networks)
Computer vision and AI already have many impacts on society:
We would like to thank the following for useful discussions and comments on this work:
Deep Learning with Python, François Chollet, 2017,
Manning Publications, (Chapter 5)
https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Aurélien Géron, 2017, 1st ed.
(second edition in October)
https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
Deep Learning, LeCun, Y., Bengio, Y. and Hinton, G. E., Nature, Vol. 521, 2015 (pdf)
Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan & Andrew Zisserman, Visual Geometry Group, Department of Engineering Science, University of Oxford, ICLR 2015. (https://arxiv.org/abs/1409.1556)
You Only Look Once: Unified, Real-Time Object Detection, Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi; University of Washington, Allen Institute for AI, Facebook AI Research, 2015 (https://pjreddie.com/yolo/)
Robust Real-time Object Detection, Paul Viola and Michael Jones, IJCV 2001
Learning OpenCV, Gary Bradski and Adrian Kaehler, O'Reilly, 2008