For quick searching
Course can be found here
Videos on YouTube
Lecture slides can be found in my GitHub (PDF version)
If you want to break into AI, this Specialization will help you do so. Deep Learning is one of the most highly sought-after skills in tech. We will help you become good at Deep Learning.
In five courses, you will learn the foundations of Deep Learning, understand how to build neural networks, and learn how to lead successful machine learning projects. You will learn about Convolutional networks, RNNs, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization, and more. You will work on case studies from healthcare, autonomous driving, sign language reading, music generation, and natural language processing. You will master not only the theory, but also see how it is applied in industry. You will practice all these ideas in Python and in TensorFlow, which we will teach.
You will also hear from many top leaders in Deep Learning, who will share with you their personal stories and give you career advice.
AI is transforming multiple industries. After finishing this specialization, you will likely find creative ways to apply it to your work.
We will help you master Deep Learning, understand how to apply it, and build a career in AI.
Neural Networks and Deep Learning
Course can be found here
Lecture slides can be found here
About this course: If you want to break into cutting-edge AI, this course will help you do so. Deep learning engineers are highly sought after, and mastering deep learning will give you numerous new career opportunities. Deep learning is also a new “superpower” that will let you build AI systems that just weren’t possible a few years ago.
In this course, you will learn the foundations of deep learning. When you finish this class, you will:
 Understand the major technology trends driving Deep Learning
 Be able to build, train and apply fully connected deep neural networks
 Know how to implement efficient (vectorized) neural networks
 Understand the key parameters in a neural network’s architecture
This course also teaches you how Deep Learning actually works, rather than presenting only a cursory or surface-level description. So after completing it, you will be able to apply deep learning to your own applications. If you are looking for a job in AI, after this course you will also be able to answer basic interview questions.
This is the first course of the Deep Learning Specialization.
Who is this class for:
Prerequisites:
Expected:
 Programming: Basic Python programming skills, with the capability to work effectively with data structures.
Recommended:
 Mathematics: Matrix-vector operations and notation.
 Machine Learning: Understanding how to frame a machine learning problem, including how data is represented, will be beneficial. If you have taken my Machine Learning Course here, you have much more than the needed level of knowledge.
Week 1 Introduction to deep learning
Be able to explain the major trends driving the rise of deep learning, and understand where and how it is applied today.
Learning Objectives
Understand the major trends driving the rise of deep learning.
Be able to explain how deep learning is applied to supervised learning.
Understand the major categories of models (such as CNNs and RNNs), and when they should be applied.
Be able to recognize the basics of when deep learning will (or will not) work well.
Welcome to the Deep Learning Specialization
Welcome 5 min
Introduction to Deep Learning
What is a neural network? 7 min
Supervised Learning with Neural Networks 8 min
Why is Deep Learning taking off? 10 min
About this Course 2 min
Frequently Asked Questions 10 min
Course Resources 1 min
How to use Discussion Forums 10 min
Practice Questions
Quiz: Introduction to deep learning 10 questions
QUIZ
Introduction to deep learning
10 questions
To Pass 80% or higher
Attempts 3 every 8 hours
Deadline
September 17, 11:59 PM PDT
1 point
1.What does the analogy “AI is the new electricity” refer to?
Correct
Yes. AI is transforming many fields from the car industry to agriculture to supply chain…
1 point
2.Which of these are reasons for Deep Learning recently taking off? (Check the three options that apply.)
Correct
These were all examples discussed in lecture 3.
Unselected is correct
Correct
Yes! The digitalization of our society has played a huge role in this.
Correct
Yes! The development of hardware, perhaps especially GPU computing, has significantly improved deep learning algorithms’ performance.
1 point
3.Recall this diagram of iterating over different ML ideas. Which of the statements below are true? (Check all that apply.)
1 point
4.When an experienced deep learning engineer works on a new problem, they can usually use insight from previous problems to train a good model on the first try, without needing to iterate multiple times through different models. True/False?
This should not be selected
No. Finding the characteristics of a model is key to good performance. Although experience can help, building a good model requires multiple iterations.
1 point
5.Which one of these plots represents a ReLU activation function?
1 point
6.Images for cat recognition is an example of “structured” data, because it is represented as a structured array in a computer. True/False?
This should not be selected
No. Images for cat recognition is an example of “unstructured” data.
1 point
7.A demographic dataset with statistics on different cities’ population, GDP per capita, economic growth is an example of “unstructured” data because it contains data coming from different sources. True/False?
1 point
8.Why is an RNN (Recurrent Neural Network) used for machine translation, say translating English to French? (Check all that apply.)
Correct
Yes. We can train it on many pairs of sentences x (English) and y (French).
This should not be selected
No. RNN and CNN are two distinct classes of models, with their own advantages and disadvantages.
Correct
Yes. An RNN can map from a sequence of english words to a sequence of french words.
1 point
9.In this diagram which we hand-drew in lecture, what do the horizontal axis (x-axis) and vertical axis (y-axis) represent?
1 point
10.Assuming the trends described in the previous question’s figure are accurate (and hoping you got the axis labels right), which of the following are true? (Check all that apply.)
Heroes of Deep Learning (Optional)
Geoffrey Hinton interview 40 min
Week 2 Neural Networks Basics
Learn to set up a machine learning problem with a neural network mindset. Learn to use vectorization to speed up your models.
Learning Objectives
Build a logistic regression model, structured as a shallow neural network
Implement the main steps of an ML algorithm, including making predictions, derivative computation, and gradient descent.
Implement computationally efficient, highly vectorized, versions of models.
Understand how to compute derivatives for logistic regression, using a backpropagation mindset.
Become familiar with Python and Numpy
Work with iPython Notebooks
Be able to implement vectorization across multiple training examples
Logistic Regression as a Neural Network
Binary Classification 8 min
Logistic Regression 5 min
Logistic Regression Cost Function 8 min
Gradient Descent 11 min
Derivatives 7 min
More Derivative Examples 10 min
Computation graph 3 min
Derivatives with a Computation Graph 14 min
Logistic Regression Gradient Descent 6 min
Gradient Descent on m Examples 8 min
Python and Vectorization
Vectorization 8 min
More Vectorization Examples 6 min
Vectorizing Logistic Regression 7 min
Vectorizing Logistic Regression’s Gradient Output 9 min
Broadcasting in Python 11 min
A note on python/numpy vectors 6 min
Quick tour of Jupyter/iPython Notebooks 3 min
Explanation of logistic regression cost function (optional) 7 min
Practice Questions
Quiz: Neural Network Basics 10 questions
QUIZ
Neural Network Basics
10 questions
To Pass 80% or higher
Deadline
September 24, 11:59 PM PDT
1 point
1.What does a neuron compute?
1 point
2.Which of these is the “Logistic Loss”?
1 point
3.Suppose img is a (32,32,3) array, representing a 32x32 image with 3 color channels red, green and blue. How do you reshape this into a column vector?
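The reshape asked about in question 3 can be sketched as follows, assuming NumPy. The random `img` is a placeholder for illustration, not course data.

```python
import numpy as np

# Placeholder image: 32x32 pixels with 3 color channels (random values, illustration only).
img = np.random.rand(32, 32, 3)

# Flatten all 32*32*3 = 3072 values into a single column vector.
x = img.reshape((32 * 32 * 3, 1))

print(x.shape)  # (3072, 1)
```

Passing the explicit `(3072, 1)` shape gives a true column vector rather than a rank-1 array of shape `(3072,)`, which avoids the ambiguous broadcasting behavior discussed in the lecture on python/numpy vectors.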
1 point
4.Consider the two following random arrays “a” and “b”:


What will be the shape of “c”?
1 point
5.Consider the two following random arrays “a” and “b”:


What will be the shape of “c”?
1 point
6.Suppose you have $$n_x$$ input features per example. Recall that $$X = [x^{(1)} x^{(2)} … x^{(m)}]$$. What is the dimension of X?
1 point
7.Recall that “np.dot(a,b)” performs a matrix multiplication on a and b, whereas “a*b” performs an elementwise multiplication.
Consider the two following random arrays “a” and “b”:


What is the shape of c?
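To make the distinction in question 7 concrete, here is a small sketch contrasting `np.dot` with `*`. The array shapes are hypothetical and chosen only for illustration; they are not the arrays from the quiz, which are not reproduced in these notes.

```python
import numpy as np

# Hypothetical shapes, for illustration only.
a = np.random.randn(4, 3)
b = np.random.randn(3, 2)

# Matrix multiplication: the inner dimensions (3 and 3) must match.
c_dot = np.dot(a, b)
print(c_dot.shape)  # (4, 2)

# Element-wise multiplication: shapes must be equal or broadcastable.
d = np.random.randn(4, 1)
c_elem = a * d  # d is broadcast across the 3 columns of a
print(c_elem.shape)  # (4, 3)
```

Trying `a * b` with these shapes would raise a `ValueError`, because `(4, 3)` and `(3, 2)` cannot be broadcast together.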
1 point
8.Consider the following code snippet:


How do you vectorize this?
1 point
9.Consider the following code:


What will be c? (If you’re not sure, feel free to run this in python to find out).
1 point
10.Consider the following computation graph.
What is the output J?
Programming Assignments
Deep Learning Honor Code 2 min
Deep Learning Honor Code
We strongly encourage students to form study groups, and discuss the lecture videos (including in-video questions). We also encourage you to get together with friends to watch the videos together as a group. However, the answers that you submit for the review questions should be your own work. For the programming exercises, you are welcome to discuss them with other students, discuss specific algorithms, properties of algorithms, etc.; we ask only that you not look at any source code written by a different student, nor show your solution code to other students.
You are also not allowed to post your code publicly on GitHub.
Programming Assignment FAQ 10 min
Python Basics with numpy (optional) 1h
Python Basics with numpy (optional)
Welcome to your first (Optional) programming exercise of the deep learning specialization. In this assignment you will:
Learn how to use numpy.
Implement some basic core deep learning functions such as the softmax, sigmoid, dsigmoid, etc…
Learn how to handle data by normalizing inputs and reshaping images.
Recognize the importance of vectorization.
Understand how python broadcasting works.
This assignment prepares you well for the upcoming assignment. Take your time to complete it and make sure you get the expected outputs when working through the different exercises. In some code blocks, you will find a “#GRADED FUNCTION: functionName” comment. Please do not modify it. After you are done, submit your work and check your results. You need to score 70% to pass. Good luck :) !
Open Notebook
Practice Programming Assignment: Python Basics with numpy (optional) 1h
Logistic Regression with a Neural Network mindset 2h
Logistic Regression with a Neural Network mindset
Welcome to the first (required) programming exercise of the deep learning specialization. In this notebook you will build your first image recognition algorithm. You will build a cat classifier that recognizes cats with 70% accuracy!
As you keep learning new techniques you will increase it to 80+% accuracy on cat vs. non-cat datasets. By completing this assignment you will:
Work with logistic regression in a way that builds intuition relevant to neural networks.
Learn how to minimize the cost function.
Understand how derivatives of the cost are used to update parameters.
Take your time to complete this assignment and make sure you get the expected outputs when working through the different exercises. In some code blocks, you will find a “#GRADED FUNCTION: functionName” comment. Please do not modify these comments. After you are done, submit your work and check your results. You need to score 70% to pass. Good luck :) !
Open Notebook
Programming Assignment: Logistic Regression with a Neural Network mindset
Heroes of Deep Learning (Optional)
Pieter Abbeel interview 16 min
Week 3 Shallow neural networks
Learn to build a neural network with one hidden layer, using forward propagation and backpropagation.
Learning Objectives
Understand hidden units and hidden layers
Be able to apply a variety of activation functions in a neural network.
Build your first forward and backward propagation with a hidden layer
Apply random initialization to your neural network
Become fluent with Deep Learning notations and Neural Network Representations
Build and train a neural network with one hidden layer.
Shallow Neural Network
Neural Networks Overview 4 min
Neural Network Representation 5 min
Computing a Neural Network’s Output 9 min
Vectorizing across multiple examples 9 min
Explanation for Vectorized Implementation 7 min
Activation functions 10 min
Why do you need nonlinear activation functions? 5 min
Derivatives of activation functions 7 min
Gradient descent for Neural Networks 9 min
Backpropagation intuition (optional) 15 min
Random Initialization 7 min
Practice Questions
Quiz: Shallow Neural Networks 10 questions
QUIZ
Shallow Neural Networks
10 questions
To Pass 80% or higher
Attempts 3 every 8 hours
Deadline
October 1, 11:59 PM PDT
1 point
1.Which of the following are true? (Check all that apply.)
1 point
2.The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?
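The claim about output means in question 2 can be checked numerically. This is a rough sketch that assumes zero-mean, unit-variance pre-activations, which is an assumption of this demo rather than something stated in the course.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)  # assumed zero-mean pre-activations

sigmoid_out = 1.0 / (1.0 + np.exp(-z))
tanh_out = np.tanh(z)

# tanh is an odd function, so its outputs average out near 0;
# sigmoid outputs lie in (0, 1) and average out near 0.5.
print(f"mean sigmoid output: {sigmoid_out.mean():.3f}")
print(f"mean tanh output:    {tanh_out.mean():.3f}")
```

Under this assumption the tanh activations are roughly centered, so the next layer receives inputs with a mean close to zero.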
1 point
3.Which of these is a correct vectorized implementation of forward propagation for layer l, where 1≤l≤L?
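For reference, vectorized forward propagation through one layer, in the course notation Z[l] = W[l]A[l−1] + b[l] and A[l] = g[l](Z[l]), might look like the sketch below. The function name `layer_forward`, the ReLU choice, and the layer sizes are assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def layer_forward(A_prev, W, b, g=relu):
    """Compute Z = W A_prev + b and A = g(Z) for all m examples at once."""
    Z = W @ A_prev + b  # b has shape (n_l, 1) and broadcasts across the m columns
    A = g(Z)
    return Z, A

# Hypothetical sizes: 3 units in the previous layer, 4 in this one, 5 examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 5))
W = rng.standard_normal((4, 3)) * 0.01
b = np.zeros((4, 1))

Z, A = layer_forward(A_prev, W, b)
print(Z.shape, A.shape)  # (4, 5) (4, 5)
```

Each column of `A_prev` is one training example, so a single matrix product replaces a loop over examples.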
1 point
4.You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
1 point
5.Consider the following code:


What will be B.shape? (If you’re not sure, feel free to run this in python to find out).
1 point
6.Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
1 point
7.Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?
1 point
8.You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?
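The effect of very large weights on tanh units can be seen in a small sketch. To keep the result reproducible, it uses a fixed, hypothetical weight matrix and input instead of `np.random.randn`, and scales the weights by 0.01 and by 1000.

```python
import numpy as np

# Fixed, hypothetical values (illustration only).
W_base = np.array([[0.5, -1.0, 0.3],
                   [1.2, 0.4, -0.7],
                   [-0.9, 0.8, 0.1],
                   [0.2, 0.6, 1.5]])
x = np.array([[1.0], [2.0], [-1.0]])

grads = {}
for scale in (0.01, 1000):
    z = (W_base * scale) @ x
    a = np.tanh(z)
    grads[scale] = 1 - a ** 2  # derivative of tanh with respect to z
    print(f"scale={scale}: largest tanh'(z) = {grads[scale].max():.2e}")
```

With the large scale, every |z| is far from zero, tanh saturates at ±1, and its derivative is effectively zero, so gradient descent makes very little progress.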
1 point
9.Consider the following 1 hidden layer neural network:
Which of the following statements are True? (Check all that apply).
1 point
10.In the same network as the previous question, what are the dimensions of Z[1] and A[1]?
Programming Assignment
Planar data classification with a hidden layer 2h 30m
Programming Assignment: Planar data classification with a hidden layer
Heroes of Deep Learning (Optional)
Ian Goodfellow interview 14 min
Week 4 Deep Neural Networks
Understand the key computations underlying deep learning, use them to build and train deep neural networks, and apply them to computer vision.
 See deep neural networks as successive blocks stacked one after another
 Build and train a deep L-layer Neural Network
 Analyze matrix and vector dimensions to check neural network implementations.
 Understand how to use a cache to pass information from forward propagation to back propagation.
 Understand the role of hyperparameters in deep learning
Deep Neural Network
Deep L-layer neural network 5 min
Forward Propagation in a Deep Network 7 min
Getting your matrix dimensions right 11 min
Why deep representations? 10 min
Building blocks of deep neural networks 8 min
Forward and Backward Propagation 10 min
Parameters vs Hyperparameters 7 min
What does this have to do with the brain? 3 min
Practice Questions
Quiz: Key concepts on Deep Neural Networks 10 questions
QUIZ
Key concepts on Deep Neural Networks
10 questions
To Pass 80% or higher
Attempts 3 every 8 hours
Deadline
November 5, 11:59 PM PST
1 point
1.What is the “cache” used for in our implementation of forward propagation and backward propagation?
We use it to pass variables computed during backward propagation to the corresponding forward propagation step. It contains useful values for forward propagation to compute activations.
It is used to cache the intermediate values of the cost function during training.
It is used to keep track of the hyperparameters that we are searching over, to speed up computation.
We use it to pass variables computed during forward propagation to the corresponding backward propagation step. It contains useful values for backward propagation to compute derivatives.
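As an illustration of how a cache can carry values from the forward pass to the backward pass, here is a minimal sketch for the linear part of one layer. The function names and sizes are illustrative and are not taken from the assignment code.

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """Forward step: compute Z and store what the backward step will need."""
    Z = W @ A_prev + b
    cache = (A_prev, W, b)
    return Z, cache

def linear_backward(dZ, cache):
    """Backward step: reuse the cached forward values to compute gradients."""
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Hypothetical sizes: 3 inputs, 4 units, 5 examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 5))
W = rng.standard_normal((4, 3))
b = np.zeros((4, 1))

Z, cache = linear_forward(A_prev, W, b)
dZ = rng.standard_normal(Z.shape)  # stand-in for the upstream gradient
dA_prev, dW, db = linear_backward(dZ, cache)
print(dA_prev.shape, dW.shape, db.shape)  # (3, 5) (4, 3) (4, 1)
```

Without the cache, the backward step would have to recompute or separately look up `A_prev` and `W` for every layer.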
1 point
2.Among the following, which ones are “hyperparameters”? (Check all that apply.)
learning rate α
number of iterations
bias vectors b[l]
number of layers L in the neural network
activation values a[l]
weight matrices W[l]
size of the hidden layers n[l]
1 point
3.Which of the following statements is true?
The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.
The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.
1 point
4.Vectorization allows you to compute forward propagation in an L-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers l=1, 2, …,L. True/False?
1 point
5.Assume we store the values for n[l] in an array called layer_dims, as follows: layer_dims = [n_x, 4, 3, 2, 1]. So layer 1 has four hidden units, layer 2 has 3 hidden units and so on. Which of the following for-loops will allow you to initialize the parameters for the model?








1 point
6.Consider the following neural network.
How many layers does this network have?
The number of layers L is 4. The number of hidden layers is 3.
The number of layers L is 3. The number of hidden layers is 3.
The number of layers L is 4. The number of hidden layers is 4.
The number of layers L is 5. The number of hidden layers is 4.
1 point
7.During forward propagation, in the forward function for a layer l you need to know what is the activation function in a layer (Sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what is the activation function for layer l, since the gradient depends on it. True/False?
1 point
8.There are certain functions with the following properties:
(i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) To compute it using a deep network circuit, you need only an exponentially smaller network. True/False?
1 point
9.Consider the following 2 hidden layer neural network:
Which of the following statements are True? (Check all that apply).
W[1] will have shape (4, 4)
b[1] will have shape (4, 1)
W[1] will have shape (3, 4)
b[1] will have shape (3, 1)
W[2] will have shape (3, 4)
b[2] will have shape (1, 1)
W[2] will have shape (3, 1)
b[2] will have shape (3, 1)
W[3] will have shape (3, 1)
b[3] will have shape (1, 1)
W[3] will have shape (1, 3)
b[3] will have shape (3, 1)
1 point
10.Whereas the previous question used a specific network, in the general case what is the dimension of W^{[l]}, the weight matrix associated with layer l?
W[l] has shape (n[l−1],n[l])
W[l] has shape (n[l],n[l+1])
W[l] has shape (n[l+1],n[l])
W[l] has shape (n[l],n[l−1])
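The shape conventions from questions 5 and 10 can be sketched as a parameter-initialization loop over `layer_dims`. The value of `n_x`, the 0.01 scaling factor, and the function name are assumptions made for this example.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Create W[l] with shape (n[l], n[l-1]) and b[l] with shape (n[l], 1)."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters

n_x = 5  # hypothetical number of input features
layer_dims = [n_x, 4, 3, 2, 1]
params = initialize_parameters(layer_dims)

for l in range(1, len(layer_dims)):
    print(f"W{l}: {params[f'W{l}'].shape}, b{l}: {params[f'b{l}'].shape}")
```

The loop starts at 1 because `layer_dims[0]` is the input size and has no weights of its own.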
Programming Assignments
Building your Deep Neural Network: Step by Step 2h 30m
Programming Assignment: Building your deep neural network: Step by Step
Deep Neural Network Application 1h
Programming Assignment: Deep Neural Network Application
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
Course can be found here
Lecture slides can be found here
About this course: This course will teach you the “magic” of getting deep learning to work well. Rather than the deep learning process being a black box, you will understand what drives performance, and be able to more systematically get good results. You will also learn TensorFlow.
After 3 weeks, you will:
 Understand industry best practices for building deep learning applications.
 Be able to effectively use the common neural network “tricks”, including initialization, L2 and dropout regularization, Batch normalization, and gradient checking.
 Be able to implement and apply a variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam, and check for their convergence.
 Understand new best practices for the deep learning era of how to set up train/dev/test sets and analyze bias/variance
 Be able to implement a neural network in TensorFlow.
This is the second course of the Deep Learning Specialization.
Who is this class for: This class is for:
 Learners that took the first course of the specialization: “Neural Networks and Deep Learning”
 Anyone that already understands fully connected neural networks, and wants to learn the practical aspects of making them work well.
Week 1 Practical aspects of Deep Learning
Learning Objectives
Recall that different types of initializations lead to different results
Recognize the importance of initialization in complex neural networks.
Recognize the difference between train/dev/test sets
Diagnose the bias and variance issues in your model
Learn when and how to use regularization methods such as dropout or L2 regularization.
Understand experimental issues in deep learning such as Vanishing or Exploding gradients and learn how to deal with them
Use gradient checking to verify the correctness of your backpropagation implementation
Setting up your Machine Learning Application
Train / Dev / Test sets 12 min
Bias / Variance 8 min
Basic Recipe for Machine Learning 6 min
Regularizing your neural network
Regularization 9 min
Why regularization reduces overfitting? 7 min
Dropout Regularization 9 min
Understanding Dropout 7 min
Other regularization methods 8 min
Setting up your optimization problem
Normalizing inputs 5 min
Vanishing / Exploding gradients 6 min
Weight Initialization for Deep Networks 6 min
Numerical approximation of gradients 6 min
Gradient checking 6 min
Gradient Checking Implementation Notes 5 min
Practice Questions
Quiz: Practical aspects of deep learning 10 questions
QUIZ
Practical aspects of deep learning
10 questions
To Pass 80% or higher
Attempts 3 every 8 hours
Deadline
October 15, 11:59 PM PDT
1 point
1.If you have 10,000,000 examples, how would you split the train/dev/test set?
60% train, 20% dev, 20% test
98% train, 1% dev, 1% test
33% train, 33% dev, 33% test
1 point
2.The dev and test set should:
Come from the same distribution
Come from different distributions
Be identical to each other (same (x,y) pairs)
Have the same number of examples
1 point
3.If your Neural Network model seems to have high variance, which of the following would be promising things to try?
Add regularization
Get more training data
Increase the number of units in each hidden layer
Get more test data
Make the Neural Network deeper
1 point
4.You are working on an automated checkout kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier? (Check all that apply.)
Increase the regularization parameter lambda
Decrease the regularization parameter lambda
Get more training data
Use a bigger neural network
1 point
5.What is weight decay?
A regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration.
A technique to avoid vanishing gradient by imposing a ceiling on the values of the weights.
The process of gradually decreasing the learning rate during training.
Gradual corruption of the weights in the neural network if it is trained on noisy data.
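To show why L2 regularization is also called weight decay, the sketch below applies one gradient descent step with the L2 term added to the gradient. The hyperparameter values and the zero data gradient are assumptions chosen to isolate the shrinking effect.

```python
import numpy as np

def l2_update(W, dW, alpha, lambd, m):
    """One gradient descent step with L2 regularization.

    Equivalent to W * (1 - alpha * lambd / m) - alpha * dW.
    """
    return W - alpha * (dW + (lambd / m) * W)

W = np.array([[1.0, -2.0], [0.5, 4.0]])
dW = np.zeros_like(W)            # zero data gradient isolates the decay effect
alpha, lambd, m = 0.1, 0.7, 10   # hypothetical hyperparameters

W_new = l2_update(W, dW, alpha, lambd, m)
decay_factor = 1 - alpha * lambd / m
print(decay_factor)
print(W_new)
```

Every weight is multiplied by a factor slightly below 1 on each iteration, which is the "decay" in weight decay.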
1 point
6.What happens when you increase the regularization hyperparameter lambda?
Weights are pushed toward becoming smaller (closer to 0)
Weights are pushed toward becoming bigger (further from 0)
Doubling lambda should roughly result in doubling the weights
Gradient descent taking bigger steps with each iteration (proportional to lambda)
1 point
7.With the inverted dropout technique, at test time:
You apply dropout (randomly eliminating units) and do not keep the 1/keep_prob factor in the calculations used in training
You apply dropout (randomly eliminating units) but keep the 1/keep_prob factor in the calculations used in training.
You do not apply dropout (do not randomly eliminate units), but keep the 1/keep_prob factor in the calculations used in training.
You do not apply dropout (do not randomly eliminate units) and do not keep the 1/keep_prob factor in the calculations used in training
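A minimal sketch of inverted dropout for one layer's activations, with a flag that switches between training and test behavior. The function name and the all-ones activation matrix are illustrative assumptions.

```python
import numpy as np

def dropout_forward(A, keep_prob, training, rng):
    """Inverted dropout: mask and rescale during training, do nothing at test time."""
    if not training:
        return A  # test time: no units dropped, no 1/keep_prob scaling
    mask = rng.random(A.shape) < keep_prob  # keep each unit with probability keep_prob
    return (A * mask) / keep_prob           # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
A = np.ones((1000, 100))  # placeholder activations

A_train = dropout_forward(A, keep_prob=0.8, training=True, rng=rng)
A_test = dropout_forward(A, keep_prob=0.8, training=False, rng=rng)

print(f"mean activation during training: {A_train.mean():.3f}")
print(f"mean activation at test time:    {A_test.mean():.3f}")
```

Because the division by `keep_prob` happens during training, the expected activation scale already matches at test time and no further correction is needed.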
1 point
8.Increasing the parameter keep_prob from (say) 0.5 to 0.6 will likely cause the following: (Check the two that apply)
Increasing the regularization effect
Reducing the regularization effect
Causing the neural network to end up with a higher training set error
Causing the neural network to end up with a lower training set error
1 point
9.Which of these techniques are useful for reducing variance (reducing overfitting)? (Check all that apply.)
Dropout
Data augmentation
Vanishing gradient
Xavier initialization
Gradient Checking
Exploding gradient
L2 regularization
1 point
10.Why do we normalize the inputs x?
It makes the parameter initialization faster
It makes the cost function faster to optimize
It makes it easier to visualize the data
Normalization is another word for regularization; it helps to reduce variance
Programming assignments
Initialization 1h
Programming Assignment: Initialization
Regularization 1h 30m
Programming Assignment: Regularization 1h
https://www.coursera.org/learn/deepneuralnetwork/programming/SXQaI
Gradient Checking 1h
Programming Assignment: Gradient Checking 1h
https://www.coursera.org/learn/deepneuralnetwork/programming/n6NBD
Heroes of Deep Learning (Optional)
Yoshua Bengio interview 25 min
Week 2 Optimization algorithms
Learning Objectives
 Remember different optimization methods such as (Stochastic) Gradient Descent, Momentum, RMSProp and Adam
 Use random mini-batches to accelerate the convergence and improve the optimization
 Know the benefits of learning rate decay and apply it to your optimization
Optimization algorithms
Mini-batch gradient descent 11 min
Understanding mini-batch gradient descent 11 min
Exponentially weighted averages 5 min
Understanding exponentially weighted averages 9 min
Bias correction in exponentially weighted averages 4 min
Gradient descent with momentum 9 min
RMSprop 7 min
Adam optimization algorithm 7 min
Learning rate decay 6 min
The problem of local optima 5 min
Practice Questions
Quiz: Optimization algorithms 10 questions
QUIZ
Optimization algorithms
10 questions
To Pass 80% or higher
Attempts 3 every 8 hours
Deadline
October 22, 11:59 PM PDT
1 point
1.Which notation would you use to denote the 3rd layer’s activations when the input is the 7th example from the 8th mini-batch?
a[8]{3}(7)
a[8]{7}(3)
a[3]{8}(7)
a[3]{7}(8)
1 point
2.Which of these statements about mini-batch gradient descent do you agree with?
You should implement mini-batch gradient descent without an explicit for-loop over different mini-batches, so that the algorithm processes all mini-batches at the same time (vectorization).
Training one epoch (one pass through the training set) using mini-batch gradient descent is faster than training one epoch using batch gradient descent.
One iteration of mini-batch gradient descent (computing on a single mini-batch) is faster than one iteration of batch gradient descent.
1 point
3.Why is the best mini-batch size usually not 1 and not m, but instead something in-between?
If the mini-batch size is 1, you end up having to process the entire training set before making any progress.
If the mini-batch size is 1, you lose the benefits of vectorization across examples in the mini-batch.
If the mini-batch size is m, you end up with batch gradient descent, which has to process the whole training set before making progress.
If the mini-batch size is m, you end up with stochastic gradient descent, which is usually slower than mini-batch gradient descent.
1 point
4.Suppose your learning algorithm’s cost J, plotted as a function of the number of iterations, looks like this:
Which of the following do you agree with?
If you’re using mini-batch gradient descent, something is wrong. But if you’re using batch gradient descent, this looks acceptable.
If you’re using mini-batch gradient descent, this looks acceptable. But if you’re using batch gradient descent, something is wrong.
Whether you’re using batch gradient descent or mini-batch gradient descent, this looks acceptable.
Whether you’re using batch gradient descent or mini-batch gradient descent, something is wrong.
1 point
5.Suppose the temperature in Casablanca over the first three days of January is the same:
Jan 1st: θ1 = 10°C
Jan 2nd: θ2 = 10°C
(We used Fahrenheit in lecture, so will use Celsius here in honor of the metric world.)
Say you use an exponentially weighted average with β=0.5 to track the temperature: $$v_0 = 0$$, $$v_t = \beta v_{t-1} + (1-\beta)\theta_t$$. If $$v_2$$ is the value computed after day 2 without bias correction, and $$v_2^{corrected}$$ is the value you compute with bias correction, what are these values? (You might be able to do this without a calculator, but you don’t actually need one. Remember what bias correction is doing.)
$$v_2 = 10$$, $$v_2^{corrected} = 10$$
$$v_2 = 7.5$$, $$v_2^{corrected} = 7.5$$
$$v_2 = 10$$, $$v_2^{corrected} = 7.5$$
$$v_2 = 7.5$$, $$v_2^{corrected} = 10$$
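The arithmetic in question 5 can be worked through directly. The sketch below applies the recurrence with β=0.5 to the two temperatures given, dividing by (1 − β^t) for the bias-corrected value.

```python
beta = 0.5
thetas = [10.0, 10.0]  # temperatures on Jan 1st and Jan 2nd, in °C

v = 0.0  # v_0 = 0
for t, theta in enumerate(thetas, start=1):
    v = beta * v + (1 - beta) * theta
    v_corrected = v / (1 - beta ** t)  # bias correction
    print(f"t={t}: v={v}, v_corrected={v_corrected}")
```

Starting from v_0 = 0 drags the early averages toward zero, and dividing by (1 − β^t) removes that bias.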
1 point
6.Which of these is NOT a good learning rate decay scheme? Here, t is the epoch number.
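$$\alpha = \frac{1}{\sqrt{t}} \alpha_0$$
$$\alpha = \frac{1}{1 + 2t} \alpha_0$$
$$\alpha = e^{t} \alpha_0$$
$$\alpha = 0.95^{t} \alpha_0$$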
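One way to compare the four schedules in question 6 is to evaluate each over a few epochs and check whether the learning rate actually decreases. The initial rate `alpha0` is a hypothetical value.

```python
import math

alpha0 = 0.1  # hypothetical initial learning rate

schemes = {
    "1/sqrt(t)": lambda t: alpha0 / math.sqrt(t),
    "1/(1+2t)": lambda t: alpha0 / (1 + 2 * t),
    "exp(t)": lambda t: alpha0 * math.exp(t),
    "0.95^t": lambda t: alpha0 * 0.95 ** t,
}

trends = {}
for name, schedule in schemes.items():
    rates = [schedule(t) for t in range(1, 6)]  # epochs 1 through 5
    decreasing = all(a > b for a, b in zip(rates, rates[1:]))
    trends[name] = "decreasing" if decreasing else "not decreasing"
    print(f"{name:>10}: {trends[name]}")
```

A decay scheme should shrink the learning rate as the epoch number grows. A schedule that grows with t increases the step size over time instead.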
1 point
7.You use an exponentially weighted average on the London temperature dataset. You use the following to track the temperature: $$v_t = \beta v_{t-1} + (1-\beta)\theta_t$$. The red line below was computed using β=0.9. What would happen to your red curve as you vary β? (Check the two that apply)
Decreasing β will shift the red line slightly to the right.
Increasing β will shift the red line slightly to the right.
Decreasing β will create more oscillation within the red line.
Increasing β will create more oscillations within the red line.
1 point
8.Consider this figure:
These plots were generated with gradient descent; with gradient descent with momentum (β = 0.5) and gradient descent with momentum (β = 0.9). Which curve corresponds to which algorithm?
(1) is gradient descent. (2) is gradient descent with momentum (large β) . (3) is gradient descent with momentum (small β)
(1) is gradient descent with momentum (small β), (2) is gradient descent with momentum (small β), (3) is gradient descent
(1) is gradient descent. (2) is gradient descent with momentum (small β). (3) is gradient descent with momentum (large β)
(1) is gradient descent with momentum (small β). (2) is gradient descent. (3) is gradient descent with momentum (large β)
1 point
9.Suppose batch gradient descent in a deep network is taking excessively long to find a value of the parameters that achieves a small value for the cost function J(W[1],b[1],…,W[L],b[L]). Which of the following techniques could help find parameter values that attain a small value for J? (Check all that apply)
Try better random initialization for the weights
Try tuning the learning rate α
Try mini-batch gradient descent
Try using Adam
Try initializing all the weights to zero
1 point
10.Which of the following statements about Adam is False?
Adam combines the advantages of RMSProp and momentum
Adam should be used with batch gradient computations, not with mini-batches.
We usually use “default” values for the hyperparameters β1,β2 and ε in Adam (β1=0.9, β2=0.999, ε=$$10^{-8}$$)
The learning rate hyperparameter α in Adam usually needs to be tuned.
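For context, a single Adam update combining the momentum and RMSprop ideas can be sketched as below, using the default β1, β2 and ε values quoted in question 10. The toy objective f(w) = w², the learning rate, and the iteration count are assumptions made for this demo.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # momentum-style average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop-style average of squared gradients
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = w^2, whose gradient is 2w.
w = np.array([5.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, alpha=0.05)

print(w)
```

Only `alpha` is changed from its default here, which mirrors the point that the learning rate is the hyperparameter that usually needs tuning.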
Programming assignment
Optimization 2h
Programming Assignment: Optimization 30 min
Heroes of Deep Learning (Optional)
Yuanqing Lin interview 13 min
Week 3 Hyperparameter tuning, Batch Normalization and Programming Frameworks
Master the process of hyperparameter tuning
Hyperparameter tuning
Tuning process (7 min)
Using an appropriate scale to pick hyperparameters (8 min)
Hyperparameters tuning in practice: Pandas vs. Caviar (6 min)
Batch Normalization
Normalizing activations in a network (8 min)
Fitting Batch Norm into a neural network (12 min)
Why does Batch Norm work? (11 min)
Batch Norm at test time (5 min)
Multiclass classification
Softmax Regression (11 min)
Training a softmax classifier (10 min)
Introduction to programming frameworks
Deep learning frameworks (4 min)
TensorFlow (16 min)
Practice Questions
Quiz: Hyperparameter tuning, Batch Normalization, Programming Frameworks (10 questions)
1 point
1.If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?
1 point
2.Every hyperparameter, if set poorly, can have a huge negative impact on training, and so all hyperparameters are about equally important to tune well. True or False?
1 point
3.During hyperparameter search, whether you try to babysit one model (“Panda” strategy) or train a lot of models in parallel (“Caviar”) is largely determined by:
Whether you use batch or mini-batch optimization
The presence of local minima (and saddle points) in your neural network
The amount of computational power you can access
The number of hyperparameters you have to tune
1 point
4.If you think β (hyperparameter for momentum) is between 0.9 and 0.99, which of the following is the recommended way to sample a value for beta?
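As background for this question, the lecture "Using an appropriate scale to pick hyperparameters" samples 1 − β on a logarithmic scale instead of sampling β uniformly. The sketch below illustrates that idea under stated assumptions: the function name `sample_beta` and its parameters are illustrative, and it does not reproduce the quiz's own answer options.

```python
import math
import random


def sample_beta(low=0.9, high=0.99, rng=random):
    """Sample beta so that 1 - beta is uniform on a log10 scale.

    For low=0.9 and high=0.99, 1 - beta ranges over [0.01, 0.1],
    so the exponent r is drawn uniformly from about [-2, -1].
    """
    lo_exp = math.log10(1 - high)  # about -2
    hi_exp = math.log10(1 - low)   # about -1
    r = rng.uniform(lo_exp, hi_exp)
    return 1 - 10 ** r


if __name__ == "__main__":
    random.seed(0)
    print([round(sample_beta(), 4) for _ in range(5)])
```

Sampling this way spends as many trials between 0.98 and 0.99 as a uniform sampler would on a much wider stretch near 0.9, which matters because the algorithm is most sensitive to β when β is close to 1.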








1 point
5.Finding good hyperparameter values is very time-consuming. So typically you should do it once at the start of the project, and try to find very good hyperparameters so that you don’t ever have to revisit tuning them again. True or false?
1 point
6.In batch normalization as presented in the videos, if you apply it on the l-th layer of your neural network, what are you normalizing?
z[l]
b[l]
W[l]
a[l]
1 point
7.In the normalization formula $z^{(i)}_{norm} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, why do we use epsilon?
To speed up convergence
To avoid division by zero
To have a more accurate normalization
In case μ is too small
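The normalization formula in this question, together with the scale and shift parameters γ and β from the Batch Norm lectures, can be sketched for one hidden unit over a mini-batch as follows. This is an illustrative, list-based sketch; the function name `batchnorm_forward` is an assumption, and a real implementation would be vectorized.

```python
import math


def batchnorm_forward(z, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize one unit's pre-activations z over a mini-batch,
    then scale by gamma and shift by beta."""
    m = len(z)
    mu = sum(z) / m
    var = sum((zi - mu) ** 2 for zi in z) / m
    # eps keeps the denominator non-zero when the mini-batch variance is 0
    z_norm = [(zi - mu) / math.sqrt(var + eps) for zi in z]
    return [gamma * zn + beta for zn in z_norm]


if __name__ == "__main__":
    print(batchnorm_forward([1.0, 2.0, 3.0]))
    # A constant mini-batch has zero variance; eps prevents a division by zero.
    print(batchnorm_forward([5.0, 5.0, 5.0]))
```

With `gamma = sqrt(var + eps)` and `beta = mu`, the function returns the original `z` values, which shows that γ and β let the network undo the normalization if that is what training favors.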
1 point
8.Which of the following statements about γ and β in Batch Norm are true?
There is one global value of γ∈R and one global value of β∈R for each layer, and applies to all the hidden units in that layer.
β and γ are hyperparameters of the algorithm, which we tune via random sampling.
They set the mean and variance of the linear variable z[l] of a given layer.
They can be learned using Adam, Gradient descent with momentum, or RMSprop, not just with gradient descent.
The optimal values are $\gamma = \sqrt{\sigma^2 + \varepsilon}$ and $\beta = \mu$.
1 point
9.After training a neural network with Batch Norm, at test time, to evaluate the neural network on a new example you should:
Perform the needed normalizations, use μ and σ² estimated using an exponentially weighted average across mini-batches seen during training.
Skip the step where you normalize using μ and σ² since a single test example cannot be normalized.
Use the most recent mini-batch’s value of μ and σ² to perform the needed normalizations.
If you implemented Batch Norm on mini-batches of (say) 256 examples, then to evaluate on one test example, duplicate that example 256 times so that you’re working with a mini-batch the same size as during training.
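The lecture "Batch Norm at test time" describes keeping exponentially weighted averages of the per-mini-batch mean and variance during training, and using them to normalize single examples at test time. The following is a rough sketch of that bookkeeping for one hidden unit. The class name `RunningBatchNormStats`, the `momentum` parameter, and the initial values are assumptions made for illustration.

```python
import math


class RunningBatchNormStats:
    """Tracks exponentially weighted averages of mini-batch mean and variance."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.mu = 0.0   # running mean (initial value is an arbitrary choice)
        self.var = 1.0  # running variance (initial value is an arbitrary choice)

    def update(self, batch):
        """Call once per training mini-batch with that unit's z values."""
        m = len(batch)
        mu = sum(batch) / m
        var = sum((x - mu) ** 2 for x in batch) / m
        self.mu = self.momentum * self.mu + (1 - self.momentum) * mu
        self.var = self.momentum * self.var + (1 - self.momentum) * var

    def normalize_single(self, z, eps=1e-8):
        """Normalize one test-time value using the running statistics."""
        return (z - self.mu) / math.sqrt(self.var + eps)


if __name__ == "__main__":
    stats = RunningBatchNormStats(momentum=0.9)
    for batch in ([1.0, 3.0], [2.0, 4.0], [0.0, 2.0]):
        stats.update(batch)
    print(stats.mu, stats.var, stats.normalize_single(2.5))
```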
1 point
10.Which of these statements about deep learning programming frameworks are true? (Check all that apply)
Even if a project is currently open source, good governance of the project helps ensure that it remains open even in the long term, rather than become closed or modified to benefit only one company.
Deep learning programming frameworks require cloud-based machines to run.
A programming framework allows you to code up deep learning algorithms with typically fewer lines of code than a lower-level language such as Python.
Programming assignment
TensorFlow (3h)
Programming Assignment: TensorFlow
Structuring Machine Learning Projects
About this course: You will learn how to build a successful machine learning project. If you aspire to be a technical leader in AI, and know how to set direction for your team’s work, this course will show you how.
Much of this content has never been taught elsewhere, and is drawn from my experience building and shipping many deep learning products. This course also has two “flight simulators” that let you practice decision-making as a machine learning project leader. This provides “industry experience” that you might otherwise get only after years of ML work experience.
After 2 weeks, you will:
 Understand how to diagnose errors in a machine learning system, and
 Be able to prioritize the most promising directions for reducing error
 Understand complex ML settings, such as mismatched training/test sets, and comparing to and/or surpassing human-level performance
 Know how to apply end-to-end learning, transfer learning, and multi-task learning
I’ve seen teams waste months or years through not understanding the principles taught in this course. I hope this two-week course will save you months of time.
This is a standalone course, and you can take this so long as you have basic machine learning knowledge. This is the third course in the Deep Learning Specialization.
Who is this class for: Prerequisites:  This course is aimed at individuals with basic knowledge of machine learning, who want to know how to set technical direction and prioritization for their work.  It is recommended that you take course one and two of this specialization (Neural Networks and Deep Learning, and Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization) prior to beginning this course.
Course can be found here
Week 1 ML Strategy (1)
Learning Objectives
 Understand why Machine Learning strategy is important
 Apply satisficing and optimizing metrics to set up your goal for ML projects
 Choose a correct train/dev/test split of your dataset
 Understand how to define human-level performance
 Use human-level performance to define your key priorities in ML projects
 Take the correct ML Strategic decision based on observations of performances and dataset
Introduction to ML Strategy
Why ML Strategy (2 min)
Orthogonalization (10 min)
Setting up your goal
Single number evaluation metric (7 min)
Satisficing and Optimizing metric (5 min)
Train/dev/test distributions (6 min)
Size of the dev and test sets (5 min)
When to change dev/test sets and metrics (11 min)
Comparing to human-level performance
Why human-level performance? (5 min)
Avoidable bias (6 min)
Understanding human-level performance (11 min)
Surpassing human-level performance (6 min)
Improving your model performance (6 min)
Machine Learning flight simulator
Machine Learning flight simulator (2 min)
Quiz: Bird recognition in the city of Peacetopia (case study) (15 questions)
Question 1 (1 point)
Problem Statement
This example is adapted from a real production application, but with details disguised to protect confidentiality.
You are a famous researcher in the City of Peacetopia. The people of Peacetopia have a common characteristic: they are afraid of birds. To save them, you have to build an algorithm that will detect any bird flying over Peacetopia and alert the population.
The City Council gives you a dataset of 10,000,000 images of the sky above Peacetopia, taken from the city’s security cameras. They are labelled:
 y = 0: There is no bird on the image
 y = 1: There is a bird on the image
Your goal is to build an algorithm able to classify new images taken by security cameras from Peacetopia.
There are a lot of decisions to make:
 What is the evaluation metric?
 How do you structure your data into train/dev/test sets?
Metric of success
The City Council tells you that they want an algorithm that
 Has high accuracy
 Runs quickly and takes only a short time to classify a new image.
 Can fit in a small amount of memory, so that it can run in a small processor that the city will attach to many different security cameras.
Note: Having three evaluation metrics makes it harder for you to quickly choose between two different algorithms, and will slow down the speed with which your team can iterate. True/False?
Question 2 (1 point)
After further discussions, the city narrows down its criteria to:
 “We need an algorithm that can let us know a bird is flying over Peacetopia as accurately as possible.”
 “We want the trained model to take no more than 10 sec to classify a new image.”
 “We want the model to fit in 10 MB of memory.”
If you had the three following models, which one would you choose?
Test Accuracy | Runtime | Memory size
97% | 1 sec | 3 MB
99% | 13 sec | 9 MB
97% | 3 sec | 2 MB
98% | 9 sec | 9 MB
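The city's criteria in this question amount to one optimizing metric (accuracy) and two satisficing metrics (runtime and memory). A minimal sketch of that selection rule is shown below. The labels "A" to "D" are made up here and simply follow the order in which the models are listed; `pick_model` is an illustrative helper.

```python
# The four candidate models, in the order listed in the question.
MODELS = [
    {"name": "A", "accuracy": 0.97, "runtime_sec": 1, "memory_mb": 3},
    {"name": "B", "accuracy": 0.99, "runtime_sec": 13, "memory_mb": 9},
    {"name": "C", "accuracy": 0.97, "runtime_sec": 3, "memory_mb": 2},
    {"name": "D", "accuracy": 0.98, "runtime_sec": 9, "memory_mb": 9},
]


def pick_model(models, max_runtime_sec=10, max_memory_mb=10):
    """Keep models that meet the satisficing thresholds, then
    maximize the optimizing metric (accuracy) among them."""
    feasible = [
        m for m in models
        if m["runtime_sec"] <= max_runtime_sec and m["memory_mb"] <= max_memory_mb
    ]
    return max(feasible, key=lambda m: m["accuracy"], default=None)


if __name__ == "__main__":
    print(pick_model(MODELS))
```

The 99% model is excluded because its 13-second runtime exceeds the 10-second limit, so the rule returns the 98% model.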
Question 3 (1 point)
Based on the city’s requests, which of the following would you say is true?
Accuracy is an optimizing metric; running time and memory size are satisficing metrics.
Accuracy is a satisficing metric; running time and memory size are optimizing metrics.
Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three.
Accuracy, running time and memory size are all satisficing metrics because you have to do sufficiently well on all three for your system to be acceptable.
Question 4 (1 point)
Structuring your data
Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice?
Train | Dev | Test
6,000,000 | 1,000,000 | 3,000,000
9,500,000 | 250,000 | 250,000
3,333,334 | 3,333,333 | 3,333,333
6,000,000 | 3,000,000 | 1,000,000
Question 5 (1 point)
After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm.
You should not add the citizens’ data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False?
Question 6 (1 point)
One member of the City Council knows a little about machine learning, and thinks you should add the 1,000,000 citizens’ data images to the test set. You object because:
This would cause the dev and test set distributions to become different. This is a bad idea because you’re not aiming where you want to hit.
A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.
The 1,000,000 citizens’ data images do not have the same x → y mapping as the rest of the data (similar to the New York City/Detroit housing prices example from lecture).
The test set no longer reflects the distribution of data (security cameras) you most care about.
Question 7 (1 point)
You train a system, and its errors are as follows (error = 100% − accuracy):
Training set error: 4.0%
Dev set error: 4.5%
This suggests that one good avenue for improving performance is to train a bigger network so as to drive down the 4.0% training error. Do you agree?
Yes, because having 4.0% training error shows you have high bias.
Yes, because this shows your bias is higher than your variance.
No, because this shows your variance is higher than your bias.
No, because there is insufficient information to tell.
Question 8 (1 point)
You ask a few people to label the dataset so as to find out what human-level performance is. You find the following levels of accuracy:
Bird watching expert #1: 0.3% error
Bird watching expert #2: 0.5% error
Normal person #1 (not a bird watching expert): 1.0% error
Normal person #2 (not a bird watching expert): 1.2% error
If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”?
0.0% (because it is impossible to do better than this)
0.3% (accuracy of expert #1)
0.4% (average of 0.3 and 0.5)
0.75% (average of all four numbers above)
Question 9 (1 point)
Which of the following statements do you agree with?
A learning algorithm’s performance can be better than human-level performance but it can never be better than Bayes error.
A learning algorithm’s performance can never be better than human-level performance but it can be better than Bayes error.
A learning algorithm’s performance can never be better than human-level performance nor better than Bayes error.
A learning algorithm’s performance can be better than human-level performance and better than Bayes error.
Question 10 (1 point)
You find that a team of ornithologists debating and discussing an image gets an even better 0.1% performance, so you define that as “human-level performance.” After working further on your algorithm, you end up with the following:
Human-level performance: 0.1%
Training set error: 2.0%
Dev set error: 2.1%
Based on the evidence you have, which two of the following four options seem the most promising to try? (Check two options.)
Try decreasing regularization.
Train a bigger model to try to do better on the training set.
Try increasing regularization.
Get a bigger training set to reduce variance.
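The numbers in this question can be read through the avoidable bias / variance decomposition from the "Avoidable bias" lecture, using human-level error as a proxy for Bayes error. The helper below is a small illustrative sketch; `diagnose` is not a name from the course.

```python
def diagnose(human_error, train_error, dev_error):
    """Return (avoidable_bias, variance, focus), all errors in percent.

    avoidable bias = training error - human-level error (proxy for Bayes error)
    variance       = dev error - training error
    """
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


if __name__ == "__main__":
    bias, var, focus = diagnose(human_error=0.1, train_error=2.0, dev_error=2.1)
    print(f"avoidable bias = {bias:.1f}%, variance = {var:.1f}%, focus on {focus}")
```

For the figures above, the avoidable bias (about 1.9%) is far larger than the variance (about 0.1%), so bias-reduction tactics are the more promising ones.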
Question 11 (1 point)
You also evaluate your model on the test set, and find the following:
Human-level performance: 0.1%
Training set error: 2.0%
Dev set error: 2.1%
Test set error: 7.0%
What does this mean? (Check the two best options.)
You should try to get a bigger dev set.
You have underfit to the dev set.
You have overfit to the dev set.
You should get a bigger test set.
Question 12 (1 point)
After working on this project for a year, you finally achieve:
Human-level performance: 0.10%
Training set error: 0.05%
Dev set error: 0.05%
What can you conclude? (Check all that apply.)
If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤ 0.05%
This is a statistical anomaly (or must be the result of statistical noise) since it should not be possible to surpass human-level performance.
With only 0.09% further progress to make, you should quickly be able to close the remaining gap to 0%
It is now harder to measure avoidable bias, thus progress will be slower going forward.
Question 13 (1 point)
It turns out Peacetopia has hired one of your competitors to build a system as well. Your system and your competitor’s have about the same running time and memory size, and your system has higher accuracy. However, when Peacetopia tries out both systems, they conclude they actually like your competitor’s system better: even though you have higher overall accuracy, you have more false negatives (failing to raise an alarm when a bird is in the air). What should you do?
Look at all the models you’ve developed during the development process and find the one with the lowest false negative error rate.
Ask your team to take into account both accuracy and false negative rate during development.
Rethink the appropriate metric for this task, and ask your team to tune to the new metric.
Pick false negative rate as the new metric, and use this new metric to drive all further development.
Question 14 (1 point)
You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because it is being tested on a new type of data.
You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?
Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team.
Put the 1,000 images into the training set so as to try to do better on these birds.
Try data augmentation/data synthesis to get more images of the new type of bird.
Add the 1,000 images into your dataset and reshuffle into a new train/dev/test split.
Question 15 (1 point)
The City Council thinks that having more Cats in the city would help scare off birds. They are so happy with your work on the Bird detector that they also hire you to build a Cat detector. (Wow, Cat detectors are just incredibly useful, aren’t they?) Because of years of working on Cat detectors, you have such a huge dataset of 100,000,000 cat images that training on this data takes about two weeks. Which of the statements do you agree with? (Check all that apply.)
Having built a good Bird detector, you should be able to take the same model and hyperparameters and just apply it to the Cat dataset, so there is no need to iterate.
Needing two weeks to train will limit the speed at which you can iterate.
If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it’s trained on less data.
Buying faster computers could speed up your team’s iteration speed and thus your team’s productivity.
Heroes of Deep Learning (Optional)
Andrej Karpathy interview (15 min)
Week 2 ML Strategy (2)
Learning Objectives
 Understand what multitask learning and transfer learning are
 Recognize bias, variance and data mismatch by looking at the performances of your algorithm on train/dev/test sets
Error Analysis
Carrying out error analysis (10 min)
Cleaning up incorrectly labeled data (13 min)
Build your first system quickly, then iterate (6 min)
Mismatched training and dev/test set
Training and testing on different distributions (10 min)
Bias and Variance with mismatched data distributions (18 min)
Addressing data mismatch (10 min)
Learning from multiple tasks
Transfer learning (11 min)
Multi-task learning (12 min)
End-to-end deep learning
What is end-to-end deep learning? (11 min)
Whether to use end-to-end deep learning (10 min)
Machine Learning flight simulator
Quiz: Autonomous driving (case study) (15 questions)
Question 1 (1 point)
To help you practice strategies for machine learning, in this week we’ll present another scenario and ask how you would act. We think this “simulator” of working in a machine learning project will give a taste of what leading a machine learning project could be like!
You are employed by a startup building self-driving cars. You are in charge of detecting road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. As an example, the above image contains a pedestrian crossing sign and red traffic lights.
Your 100,000 labeled images are taken using the front-facing camera of your car. This is also the distribution of data you care most about doing well on. You think you might be able to get a much larger dataset off the internet, that could be helpful for training even if the distribution of internet data is not the same.
You are just getting started on this project. What is the first thing you do? Assume each of the steps below would take about an equal amount of time (a few days).
Spend a few days collecting more data using the front-facing camera of your car, to better understand how much data per unit time you can collect.
Spend a few days getting the internet data, so that you understand better what data is available.
Spend a few days checking what is human-level performance for these tasks so that you can get an accurate estimate of Bayes error.
Spend a few days training a basic model and see what mistakes it makes.
Question 2 (1 point)
Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers.
For the output layer, a softmax activation would be a good choice because this is a multi-task learning problem. True/False?
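To see what is at stake in this question, the sketch below compares softmax with independent sigmoid outputs on a single, made-up vector of five logits (one per object type, in the order stop sign, pedestrian crossing, construction ahead, red light, green light). The logit values are invented for illustration.

```python
import math


def sigmoid(z):
    """Independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Probabilities over mutually exclusive classes (they sum to 1)."""
    shift = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for an image with a stop sign and a red light.
LOGITS = [2.0, -1.0, -3.0, 1.5, -2.0]

if __name__ == "__main__":
    print("sigmoid:", [round(sigmoid(z), 3) for z in LOGITS])
    print("softmax:", [round(p, 3) for p in softmax(LOGITS)])
```

Softmax forces the five outputs to compete for a total probability of 1, whereas sigmoid outputs can each be high at once, which is what an image containing several objects requires.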
Question 3 (1 point)
You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time?
500 images on which the algorithm made a mistake
10,000 images on which the algorithm made a mistake
500 randomly chosen images
10,000 randomly chosen images
Question 4 (1 point)
After working on the data for several weeks, your team ends up with the following data:
 100,000 labeled images taken using the front-facing camera of your car.
 900,000 labeled images of roads downloaded from the internet.
 Each image’s labels precisely indicate the presence of any specific road signs and traffic signals or combinations of them. For example, y(i) = 10010 means the image contains a stop sign and a red traffic light.
Because this is a multitask learning problem, you need to have all your y(i) vectors fully labeled. If one example is equal to 0?11? then the learning algorithm will not be able to use that example. True/False?
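The multi-task learning lecture handles partially labeled examples by summing the loss only over the labels that are present. A minimal sketch of that idea follows, with missing labels (the "?" entries) represented as `None`. The function name and the example probabilities are illustrative assumptions.

```python
import math


def masked_multitask_loss(probs, labels):
    """Binary cross-entropy summed over labeled entries only.

    probs:  predicted probabilities, one per task, each strictly in (0, 1)
    labels: 0, 1, or None where the label is missing ("?")
    """
    total = 0.0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # skip tasks with no label for this example
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


if __name__ == "__main__":
    # The label vector 0?11? from the question, with made-up predictions.
    labels = [0, None, 1, 1, None]
    probs = [0.9, 0.5, 0.8, 0.7, 0.5]
    print(masked_multitask_loss(probs, labels))
```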
Question 5 (1 point)
The distribution of data you care about contains images from your car’s front-facing camera, which come from a different distribution than the images you were able to find and download off the internet. How should you split the dataset into train/dev/test sets?
Choose the training set to be the 900,000 images from the internet along with 20,000 images from your car’s front-facing camera. The 80,000 remaining images will be split equally in dev and test sets.
Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 600,000 for the training set, 200,000 for the dev set and 200,000 for the test set.
Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets.
Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 980,000 for the training set, 10,000 for the dev set and 10,000 for the test set.
Question 6 (1 point)
Assume you’ve finally chosen the following split of the data:
Dataset | Contains | Error of the algorithm
Training | 940,000 images randomly picked from (900,000 internet images + 60,000 car’s front-facing camera images) | 8.8%
Training-Dev | 20,000 images randomly picked from (900,000 internet images + 60,000 car’s front-facing camera images) | 9.1%
Dev | 20,000 images from your car’s front-facing camera | 14.3%
Test | 20,000 images from the car’s front-facing camera | 14.8%
You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Which of the following are True? (Check all that apply).
You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error.
You have a large variance problem because your model is not generalizing well to data from the same training distribution but that it has never seen before.
Your algorithm overfits the dev set because the error of the dev and test sets are very close.
You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set
You have a large variance problem because your training error is quite a bit higher than the human-level error.
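The table in this question can be broken into the successive gaps described in the lecture "Bias and Variance with mismatched data distributions". The sketch below computes those gaps. The function name `error_gaps` and the dictionary keys are illustrative, and human-level error is used as a stand-in for Bayes error.

```python
def error_gaps(human, train, train_dev, dev, test):
    """Successive error gaps, all in percentage points."""
    return {
        "avoidable_bias": train - human,      # human-level -> training
        "variance": train_dev - train,        # training -> training-dev
        "data_mismatch": dev - train_dev,     # training-dev -> dev
        "overfit_to_dev": test - dev,         # dev -> test
    }


if __name__ == "__main__":
    gaps = error_gaps(human=0.5, train=8.8, train_dev=9.1, dev=14.3, test=14.8)
    for name, gap in gaps.items():
        print(f"{name}: {gap:.1f}")
```

With the numbers from the table, the two large gaps are between human-level and training error and between training-dev and dev error, while the training to training-dev gap is small.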
Question 7 (1 point)
Based on the table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think?
Your friend is right. (I.e., Bayes error for the training data distribution is probably lower than for the dev/test distribution.)
Your friend is wrong. (I.e., Bayes error for the training data distribution is probably higher than for the dev/test distribution.)
There’s insufficient information to tell if your friend is right or wrong.
Question 8 (1 point)
You decide to focus on the dev set and check by hand what the errors are due to. Here is a table summarizing your discoveries:
Overall dev set error: 14.3%
Errors due to incorrectly labeled data: 4.1%
Errors due to foggy pictures: 8.0%
Errors due to rain drops stuck on your car’s front-facing camera: 2.2%
Errors due to other causes: 1.0%
In this table, 4.1%, 8.0%, etc. are a fraction of the total dev set (not just examples your algorithm mislabeled). I.e., about 8.0/14.3 = 56% of your errors are due to foggy pictures.
The results from this analysis imply that the team’s highest priority should be to bring more foggy pictures into the training set so as to address the 8.0% of errors in that category. True/False?
True because it is the largest category of errors. As discussed in lecture, we should prioritize the largest category of error to avoid wasting the team’s time.
True because it is greater than the other error categories added together (8.0 > 4.1+2.2+1.0).
False because this would depend on how easy it is to add this data and how much your team thinks it’ll help.
False because data augmentation (synthesizing foggy images from clean/non-foggy images) is more efficient.
Question 9 (1 point)
You can buy a specially designed windshield wiper that helps wipe off some of the raindrops on the front-facing camera. Based on the table from the previous question, which of the following statements do you agree with?
2.2% would be a reasonable estimate of the maximum amount this windshield wiper could improve performance.
2.2% would be a reasonable estimate of the minimum amount this windshield wiper could improve performance.
2.2% would be a reasonable estimate of how much this windshield wiper will improve performance.
2.2% would be a reasonable estimate of how much this windshield wiper could worsen performance in the worst case.
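The error-analysis table from Question 8 can be turned into per-category shares of the overall dev error, as in the 8.0/14.3 ≈ 56% calculation quoted there. The sketch below does that for every category. The short category keys and the function name `error_shares` are made up for illustration.

```python
OVERALL_DEV_ERROR = 14.3  # percent of the dev set

# Percent of the dev set attributed to each cause in the table.
CATEGORIES = {
    "incorrect_labels": 4.1,
    "foggy": 8.0,
    "rain_drops": 2.2,
    "other": 1.0,
}


def error_shares(overall_error, categories):
    """Fraction of all dev-set errors attributed to each category."""
    return {name: err / overall_error for name, err in categories.items()}


if __name__ == "__main__":
    for name, share in error_shares(OVERALL_DEV_ERROR, CATEGORIES).items():
        # categories[name] is the dev error tied to that cause, so it bounds
        # what fixing that one cause alone could recover.
        print(f"{name}: {share:.0%} of errors, {CATEGORIES[name]}% of the dev set")
```

Note that the four categories add up to 15.3%, slightly more than the 14.3% overall error, presumably because some images fall into more than one category.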
Question 10 (1 point)
You decide to use data augmentation to address foggy images. You find 1,000 pictures of fog off the internet, and “add” them to clean images to synthesize foggy days, like this:
Which of the following statements do you agree with?
So long as the synthesized fog looks realistic to the human eye, you can be confident that the synthesized data is accurately capturing the distribution of real foggy images, since human vision is very accurate for the problem you’re solving.
There is little risk of overfitting to the 1,000 pictures of fog so long as you are combining them with a much larger number (>>1,000) of clean/non-foggy images.
Adding synthesized images that look like real foggy pictures taken from the front-facing camera of your car to the training dataset won’t help the model improve because it will introduce avoidable bias.
Question 11 (1 point)
After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set. Which of these statements do you agree with? (Check all that apply).
You should also correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution
You should correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.
You should not correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution
You should not correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.
Question 12 (1 point)
So far your algorithm only recognizes red and green traffic lights. One of your colleagues in the startup is starting to work on recognizing a yellow traffic light. (Some countries call it an orange light rather than a yellow light; we’ll use the US convention of calling it yellow.) Images containing yellow lights are quite rare, and she doesn’t have enough data to build a good model. She hopes you can help her out using transfer learning.
What do you tell your colleague?
She should try using weights pre-trained on your dataset, and fine-tuning further with the yellow-light dataset.
If she has (say) 10,000 images of yellow lights, randomly sample 10,000 images from your dataset and put your and her data together. This prevents your dataset from “swamping” the yellow lights dataset.
You cannot help her because the distribution of data you have is different from hers, and is also lacking the yellow label.
Recommend that she try multitask learning instead of transfer learning using all the data.
Question 13 (1 point)
Another colleague wants to use microphones placed outside the car to better hear if there are other vehicles around you. For example, if there is a police vehicle behind you, you would be able to hear their siren. However, they don’t have much data to train this audio system. How can you help?
Transfer learning from your vision dataset could help your colleague get going faster. Multitask learning seems significantly less promising.
Multitask learning from your vision dataset could help your colleague get going faster. Transfer learning seems significantly less promising.
Either transfer learning or multitask learning could help our colleague get going faster.
Neither transfer learning nor multitask learning seems promising.
Question 14 (1 point)
To recognize red and green lights, you have been using this approach:
(A) Input an image (x) to a neural network and have it directly learn a mapping to make a prediction as to whether there’s a red light and/or green light (y).
A teammate proposes a different, two-step approach: (B) In this two-step approach, you would first (i) detect the traffic light in the image (if any), then (ii) determine the color of the illuminated lamp in the traffic light.
Between these two, Approach B is more of an end-to-end approach because it has distinct steps for the input end and the output end. True/False?
Question 15 (1 point)
Approach A (in the question above) tends to be more promising than approach B if you have a __ (fill in the blank).
Large training set
Multitask learning problem.
Large bias problem.
Problem with a high Bayes error.
Heroes of Deep Learning (Optional)
Ruslan Salakhutdinov interview (17 min)
Convolutional Neural Networks
About this course: This course will teach you how to build convolutional neural networks and apply them to image data. Thanks to deep learning, computer vision is working far better than just two years ago, and this is enabling numerous exciting applications ranging from safe autonomous driving, to accurate face recognition, to automatic reading of radiology images.
You will:
 Understand how to build a convolutional neural network, including recent variations such as residual networks.
 Know how to apply convolutional networks to visual detection and recognition tasks.
 Know how to use neural style transfer to generate art.
 Be able to apply these algorithms to a variety of image, video, and other 2D or 3D data.
This is the fourth course of the Deep Learning Specialization.
Who is this class for:  Learners that took the first two courses of the specialization. The third course is recommended.  Anyone that already has a solid understanding of densely connected neural networks, and wants to learn convolutional neural networks or work with image data.
Week 1 Foundations of Convolutional Neural Networks
Learn to implement the foundational layers of CNNs (pooling, convolutions) and to stack them properly in a deep network to solve multiclass image classification problems.
Learning Objectives
 Understand the convolution operation
 Understand the pooling operation
 Remember the vocabulary used in convolutional neural network (padding, stride, filter, …)
 Build a convolutional neural network for image multiclass classification
Convolutional Neural Networks
Computer Vision (5 min)
Edge Detection Example (11 min)
More Edge Detection (7 min)
Padding (9 min)
Strided Convolutions (9 min)
Convolutions Over Volume (10 min)
One Layer of a Convolutional Network (16 min)
Simple Convolutional Network Example (8 min)
Pooling Layers (10 min)
CNN Example (12 min)
Why Convolutions? (9 min)
Practice questions
Quiz: The basics of ConvNets (10 questions)
Programming assignments
Convolutional Model: step by step (2h)
Programming Assignment: Convolutional Model: step by step
Convolutional Model: application (1h)
Programming Assignment: Convolutional model: application
Week 2
Week 3
Week 4