In Which Instructions were Sought, Part 2
A Fire Under My Posterior
I confess, I lied a little in the last blog post. (Here it is, In Which Instructions were Sought, Part 1 (link).) Take a look. In that blog entry I talked about the kinds of deep learning courses available online, and which ones looked promising.
I had said that the list of courses I had chosen was in no particular order. Actually, it was: the order of my preference for which classes to take. I wanted to take a class with a fee, both to motivate myself and to have proof that I finished it, but I didn't want to go down the path of the expensive Udacity course without knowing it had a significantly better return on investment than the Coursera course. I also wanted an easy-to-access community of fellow students to discuss questions and issues with. Thus the ranking of Coursera -> fast.ai -> Stanford videos -> Udacity. So I signed up for the Coursera Deep Learning Specialization.
Coursera Deep Learning Specialization Review
*Depth is in the eye of the beholder*
Andrew Ng is an approachable instructor who provides clear and simple explanations of the main foundational algorithms used in deep learning. He does not cover bleeding-edge techniques in this course, but an understanding of the basics is required before newer algorithms can be understood. He presents some of the math in a very simplified form, so not much of a refresher is needed, although if you are not familiar with matrix math and calculus, some introductory courses would be helpful. The course has been running for a while, and its long history means that if you have any questions, you can most likely find the answers in the forums. This can sometimes be too much of a shortcut, though, and you need to be careful to think your questions through thoroughly for better understanding before jumping to the forum for a quick answer.
The course load is similar to a light university course, with about 1.5 hours of lecture a week and an assignment that takes a few hours to complete. The lectures are split into easily digestible video chunks of about 5-15 minutes each, so those of us with no attention span (guilty... *raises hand*) can force ourselves to sit through one video without wandering off.
There is a quiz after every week's videos. You have unlimited attempts to get all the questions right, but you are limited to three attempts every eight hours. This is a good way to review the material and check your understanding of the key concepts.
There is also typically one assignment every week where you implement the algorithms discussed during the lectures in a series of steps. The parts that you need to code yourself are well delineated and limited in scope, confined to just the few lines needed, with the rest of the scaffolding already in place. Ample instructions are provided, which go over in some detail what needs to be implemented in each section, sometimes making it easy to write the code without doing much thinking. Assignments have an automatic grader, which compares your outputs with the expected answers, and you have unlimited attempts to get them right as well. All assignments are run as Jupyter notebooks on Google Colab, which provides free computing resources with some limitations.
Here are some quick summaries of the 5 courses that are part of the specialization:
______________________________________________________
Course 1: Neural Networks and Deep Learning (4 weeks)
Week 1: Introduction to Deep Learning (28 minutes of video)
Week 2: Logistic Regression as a Neural Network, Python and Vectorization (2.3 hours of video)
Week 3: Shallow Neural Network (1.5 hours of video)
Week 4: Deep Neural Network (1 hour of video)
Course 1 covers the basics of neural networks, including review of basic calculus and matrix math, derivations of the basic neural network equations, and the use of Python and NumPy for programming neural networks.
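To give a flavor of what Course 1 builds toward, here is a minimal sketch of a vectorized logistic regression step in NumPy. This is my own illustration, not the actual assignment code, and the names `propagate`, `X`, `Y`, `w`, and `b` are hypothetical:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """One vectorized forward/backward pass for logistic regression.
    X: (n_features, m_examples), Y: (1, m_examples), w: (n_features, 1), b: scalar."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                    # predictions, shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))    # cross-entropy cost
    dw = (X @ (A - Y).T) / m                                    # gradient w.r.t. weights
    db = np.mean(A - Y)                                         # gradient w.r.t. bias
    return cost, dw, db

# Toy usage: two features, four examples
X = np.array([[0.5, 1.2, -0.3, 0.8],
              [1.0, -0.7, 0.2, 0.1]])
Y = np.array([[1, 0, 0, 1]])
w, b = np.zeros((2, 1)), 0.0
for _ in range(100):
    cost, dw, db = propagate(w, b, X, Y)
    w -= 0.1 * dw    # gradient descent update
    b -= 0.1 * db
```

The assignments in the course walk through essentially this kind of vectorized computation, one small function at a time.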
______________________________________________________
Course 2: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization (3 weeks)
Week 1: Setting up your Machine Learning Application, Regularizing your neural network, Setting up your optimization problem (1.7 hours of video)
Week 2: Optimization algorithms (1.2 hours of video)
Week 3: Hyperparameter tuning, Batch Normalization, Multi-class classification, Introduction to programming frameworks (1.6 hours of video)
Course 2 goes over the various knobs you can tweak to improve the performance of your neural network. A neural network is a many-dimensional construct, not just in the sense of how the network itself is wired together, but also in that there are many hyperparameters (parameters beyond the ones the network learns itself through training) and optimization algorithms that can be adjusted or added to improve performance.
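As one concrete example of the kind of knob this course covers, here is a rough sketch of a gradient descent update with momentum. This is my own simplification, not course code, and the names `params`, `grads`, and `velocity` are hypothetical:

```python
import numpy as np

def update_with_momentum(params, grads, velocity, learning_rate=0.01, beta=0.9):
    """Momentum update: keep an exponentially weighted average of past gradients
    and step in that smoothed direction instead of the raw gradient."""
    for key in params:
        velocity[key] = beta * velocity[key] + (1 - beta) * grads[key]
        params[key] = params[key] - learning_rate * velocity[key]
    return params, velocity

# Toy usage with a single weight matrix "W1"
params = {"W1": np.array([[1.0, -2.0]])}
grads = {"W1": np.array([[0.5, 0.3]])}
velocity = {"W1": np.zeros_like(params["W1"])}
params, velocity = update_with_momentum(params, grads, velocity)
```

The learning rate and the beta of the moving average here are exactly the kind of hyperparameters the course teaches you to tune.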
______________________________________________________
Course 3: Structuring Machine Learning Projects (2 weeks)
Week 1: Introduction to ML Strategy, Setting up your goal, Comparing to human-level performance (1.3 hours of video)
Week 2: Error Analysis, Mismatched training and dev/test set, Learning from multiple tasks, End-to-end deep learning (1.9 hours of video)
Course 3 explains how to figure out what you need to tweak to improve your neural network. There are a large number of parameters and hyperparameters, and if you try to tune each one, or randomly tune a parameter, you are likely to waste a lot of time for little return. This course teaches you what to look for and how to analyze your current results to figure out what could give the largest improvement to your network.
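For instance, one recurring recipe in this course is comparing training and dev-set error against a human-level baseline to decide whether to attack bias or variance first. A crude sketch of that decision logic, in my own simplified form with hypothetical error values:

```python
def diagnose(human_error, train_error, dev_error):
    """Crude bias/variance diagnosis in the spirit of the course's error analysis."""
    avoidable_bias = train_error - human_error   # gap to human-level performance
    variance = dev_error - train_error           # gap between training and dev sets
    if avoidable_bias >= variance:
        return "Focus on bias: bigger network, longer training, better architecture."
    return "Focus on variance: more data, regularization, data augmentation."

# Hypothetical numbers: 1% human error, 5% training error, 6% dev error
print(diagnose(0.01, 0.05, 0.06))  # the bias gap dominates here
```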
______________________________________________________
Course 4: Convolutional Neural Networks (4 weeks)
Week 1: Convolutional Neural Networks (1.8 hours of video)
Week 2: Case studies, Practical advice for using ConvNets (1.6 hours of video)
Week 3: Detection algorithms (1.3 hours of video)
Week 4: Face Recognition, Neural Style Transfer (1.2 hours of video)
Course 4 introduces convolution, which allows a neural network to generalize what it learns across the whole input. Without convolution, neurons can only learn from the small part of the input they are connected to. For example, if such a neural network is trained on images with faces only in the upper right corner, the neurons connected to the upper right corner may learn to recognize faces, but a face in the lower left corner would confuse the network. In contrast, a convolutional neural network can distill the essence of what it learns so that it applies across the entire image, and can therefore generalize better.
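To make the idea concrete, here is a minimal sketch of a single-channel 2D convolution in NumPy. This is my own illustration of the mechanism, not the assignment code: the same small filter slides over every position of the image, so whatever pattern it detects is detected everywhere.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D convolution of a single-channel image.
    The same kernel weights are applied at every position, which is what
    lets a convolutional layer generalize across the whole input."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy usage: a 5x5 image and a 3x3 vertical-edge filter
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
print(conv2d(image, kernel).shape)  # (3, 3)
```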
______________________________________________________
Course 5: Sequence Models (3 weeks)
Week 1: Recurrent Neural Networks (1.8 hours of video)
Week 2: Introduction to Word Embeddings, Learning Word Embeddings: Word2vec & GloVe, Applications using Word Embeddings (1.6 hours of video)
Week 3: Various sequence to sequence architectures, Speech recognition - Audio data (1.6 hours of video)
Course 5 discusses neural network models that can make inferences on data presented in sequence, such as a string of text or time series data (e.g. sensor data, stocks, audio). Because such data depends on what came before (and sometimes after), a different structure is necessary so that the neural network can remember important features from data already processed, giving later data more meaning and context.
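As a small illustration of that structural difference, here is a sketch of a single vanilla RNN step in NumPy. This is my own simplification with hypothetical weight names, not course code: the hidden state carries information forward so later inputs are interpreted in the context of what came before.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    """One step of a vanilla RNN: combine the current input with the previous
    hidden state, so the network "remembers" earlier parts of the sequence."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

# Toy usage: 3-dimensional inputs, 4-dimensional hidden state, sequence of length 5
rng = np.random.default_rng(0)
Wxh, Whh, bh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros((4, 1))
h = np.zeros((4, 1))
for t in range(5):
    x_t = rng.normal(size=(3, 1))
    h = rnn_step(x_t, h, Wxh, Whh, bh)   # hidden state carries context forward
```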
With a Little Help
*Some shaggy guys that also needed help once in a while, or so I've heard. (Wikipedia link)*
To help me better understand and remember the material, I went old-school and took notes with pen on paper throughout the courses. And whenever I came across an algorithm I did not understand, I made sure to look up other sources on the internet and crank through the equations myself until I understood the derivations. I highly recommend this way of studying for the visceral aspect of reshaping the knowledge into your own interpretation, which immensely helps with retention and understanding.
Having jumped head-first into this course, I needed refreshers on a few points, alternate explanations on a few others, and more detailed derivations for some equations. Below are several sites that helped me work through the points I struggled with:
- A series of posts describing gradient descent, logistic regression, and overfitting, which helped clarify the concepts: https://www.internalpointers.com/post/introduction-machine-learning
- Matrix calculus refresher: https://explained.ai/matrix-calculus/index.html
- The derivation of the derivative of the loss function was a bit of a sticking point for me. This page helped me figure out the proper derivation (a condensed version is also sketched after this list): https://math.stackexchange.com/questions/2623822/how-to-get-the-loss-function-derivative
- Here is a page explaining the derivation of the derivative of the cost function: https://math.stackexchange.com/questions/477207/derivative-of-cost-function-for-logistic-regression
- This Python NumPy tutorial brought me up to speed on all the NumPy functions that were being called in the examples: https://cs231n.github.io/python-numpy-tutorial/
- A cute set of highly condensed notes for the course from Tess Ferrandez, a data scientist from Microsoft: https://www.slideshare.net/TessFerrandez/notes-from-coursera-deep-learning-courses-by-andrew-ng
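For reference, here is the condensed version of that loss-function derivation as I worked it out, using the course's notation where a = σ(z) is the prediction and y the label:

```latex
% Logistic loss: L(a, y) = -( y log a + (1 - y) log(1 - a) ), with a = sigma(z)
\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a},
\qquad
\frac{\mathrm{d}a}{\mathrm{d}z} = a(1-a)
% Chain rule: the two factors cancel almost everything
\frac{\partial \mathcal{L}}{\partial z}
  = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right) a(1-a)
  = -y(1-a) + (1-y)a
  = a - y
```

That tidy a - y result is why the gradient code in the assignments ends up so short.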
Wrapping It Up
*You down with Entropy? (YouTube link)*
Overall I am very satisfied with the course, and would recommend it to people who have a bit of a computer science or artificial intelligence background. The material is definitely on the light side, with the subject covered in a basic manner and much of the math and theory only lightly touched upon. This is to be expected, as the courses are targeted more towards the general public than hardcore university courses, which can expect students to have a solid computer science and math background and to spend years studying the material in depth. The courses give a good starting foundation for learning more about neural networks: after this specialization the common terms used in the field are familiar, and new knowledge can be built on top of that.
One regrettable part is the lack of emphasis and practice on actually designing and deploying neural networks in a modern environment. This specialization is fairly theoretical, and the implementations are closer to the metal than what is commonly used nowadays, namely popular frameworks like Keras or PyTorch. I feel I need more practical experience implementing neural networks, so I will look for something to supplement what I have learned in these courses. I will follow up in the next blog post (In Which Instructions were Sought, Part 3 (link)) and share what I found. Stay tuned!