For quick searching
Course can be found here
Video in YouTube
Lecture Slides can be found in my Github

About This Specialization
This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings.

Projects Overview
You will master your skills by solving a wide variety of real-world problems like image captioning and automatic game playing throughout the course projects. You will gain hands-on experience of applying advanced machine learning techniques that provide the foundation for the current state of the art in AI.

Introduction to Deep Learning

Course can be found here
Lecture slides can be found here

About this course: The goal of this course is to give learners a basic understanding of modern neural networks and their applications in computer vision and natural language understanding. The course starts with a recap of linear models and a discussion of stochastic optimization methods that are crucial for training deep neural networks. Learners will study all popular building blocks of neural networks, including fully connected, convolutional and recurrent layers.
Learners will use these building blocks to define complex modern architectures in the TensorFlow and Keras frameworks. In the course project, learners will implement a deep neural network for image captioning, the task of generating a text description for an input image.

The prerequisites for this course are:
1) Basic knowledge of Python.
2) Basic linear algebra and probability.

Please note that this is an advanced course and we assume basic knowledge of machine learning. You should understand:
1) Linear regression: mean squared error, analytical solution.
2) Logistic regression: model, cross-entropy loss, class probability estimation.
3) Gradient descent for linear models. Derivatives of MSE and cross-entropy loss functions.
4) The problem of overfitting.
5) Regularization for linear models.

Who is this class for: Developers, analysts and researchers who are faced with tasks involving complex structure understanding such as image, sound and text analysis.

Week 1 Introduction to optimization

Welcome to the “Introduction to Deep Learning” course! In the first week you’ll learn about linear models and stochastic optimization methods. Linear models are basic building blocks for many deep architectures, and stochastic optimization is used to learn every model that we’ll discuss in our course.

Learning Objectives

Train a linear model for classification or regression task using stochastic gradient descent
Tune SGD optimization using different techniques
Apply regularization to train better models
Use linear models for classification and regression tasks

Course intro

Welcome! 5 min

Linear model as the simplest neural network

Linear regression 9 min

Linear classification 10 min

Gradient descent 5 min

Quiz: Linear models 3 questions

QUIZ
Linear models
3 questions
To Pass: 80% or higher
Attempts: 3 every 8 hours
Deadline: November 26, 11:59 PM PST

1 point
1. Consider a vector (1,−2,0.5). Apply a softmax transform to it and enter the first component (accurate to 2 decimal places).

1 point
2. Suppose you are solving a 5-class classification problem with 10 features. How many parameters would a linear model have? Don’t forget bias terms!

1 point
3. There is an analytical solution for linear regression parameters and MSE loss, but we usually prefer gradient descent optimization over it. What are the reasons?

Gradient descent is more scalable and can be applied to problems with a large number of features

Gradient descent is a method developed especially for MSE loss

Gradient descent can find parameter values that give lower MSE value than parameters from analytical solution

Gradient descent doesn’t require inverting a matrix
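
The first two questions above can be checked numerically. The snippet below is a minimal NumPy sketch and not part of the course material; the helper names `softmax` and `linear_model_param_count` are made up for illustration, and the parameter count assumes one weight vector plus one bias per class.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D vector."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()      # subtracting the max leaves the result unchanged
    e = np.exp(shifted)
    return e / e.sum()

def linear_model_param_count(n_features, n_classes):
    """One weight per (feature, class) pair plus one bias per class."""
    return n_features * n_classes + n_classes

probs = softmax([1, -2, 0.5])
print(round(probs[0], 2))
print(linear_model_param_count(n_features=10, n_classes=5))
```

Subtracting the maximum before exponentiating is a common trick to avoid overflow for large inputs; it cancels out in the ratio.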

Regularization in machine learning

Overfitting problem and model validation 6 min

Model regularization 5 min

Quiz: Overfitting and regularization 4 questions

QUIZ
Overfitting and regularization
4 questions
To Pass: 80% or higher
Attempts: 3 every 8 hours
Deadline: November 26, 11:59 PM PST

1 point
1. Select correct statements about overfitting:

Overfitting is a situation where a model gives lower quality for new data compared to quality on a training sample

Overfitting happens when the model is too simple for the problem

Overfitting is a situation where a model gives comparable quality on new data and on a training sample

Large model weights can indicate that the model is overfitted

1 point
2. What disadvantages does model validation on a holdout sample have?

It requires multiple model fitting

It is sensitive to the particular split of the sample into training and test parts

It can give biased quality estimates for small samples


1 point
3. Suppose you are using k-fold cross-validation to assess model quality. How many times should you train the model during this procedure?

k(k−1)/2

k^2

1 point
4. Select correct statements about regularization:

Weight penalty reduces the number of model parameters and leads to faster model training

Reducing the training sample size makes data simpler and then leads to better quality

Regularization restricts model complexity (namely the scale of the coefficients) to reduce overfitting

Weight penalty drives model parameters closer to zero and prevents the model from being too sensitive to small changes in features
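
To make the k-fold and weight-penalty questions more concrete, here is a small sketch that counts model fits during k-fold cross-validation and compares weight norms with and without an L2 penalty. It is an illustration under assumptions: the stand-in model is closed-form ridge regression, the data is synthetic, and the names `fit_ridge` and `k_fold_cv` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, l2=1.0):
    """Closed-form ridge regression: w = (X^T X + l2 * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(n_features), X.T @ y)

def k_fold_cv(X, y, k=5, l2=1.0, seed=0):
    """Return per-fold validation MSE and the number of model fits."""
    rng = np.random.RandomState(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores, n_fits = [], 0
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_ridge(X[train_idx], y[train_idx], l2)  # one fit per fold
        n_fits += 1
        scores.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return np.array(scores), n_fits

# Synthetic regression data (an assumption, for illustration only)
rng = np.random.RandomState(42)
X_demo = rng.randn(100, 3)
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

scores, n_fits = k_fold_cv(X_demo, y_demo, k=5)
print("fits:", n_fits, "mean validation MSE:", scores.mean())

# A stronger weight penalty pulls the coefficients towards zero
w_plain = fit_ridge(X_demo, y_demo, l2=0.0)
w_penalized = fit_ridge(X_demo, y_demo, l2=100.0)
print(np.linalg.norm(w_plain), np.linalg.norm(w_penalized))
```

Each fold is held out exactly once, so the loop body (and therefore the fit) runs once per fold.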

Stochastic methods for optimization

Stochastic gradient descent 5 min

Gradient descent extensions 9 min

Linear models and optimization

Programming Assignment: Linear models and optimization 3h


# coding: utf-8
# # Programming assignment (Linear models, Optimization)
# 
# In this programming assignment you will implement a linear classifier and train it using stochastic gradient descent modifications and numpy.
# In[1]:
import numpy as np
get_ipython().magic('matplotlib inline')
import matplotlib.pyplot as plt
# In[2]:
import sys
sys.path.append("..")
import grading
grader = grading.Grader(assignment_key="UaHtvpEFEee0XQ6wjK-hZg", 
                      all_parts=["xU7U4", "HyTF6", "uNidL", "ToK7N", "GBdgZ", "dLdHG"])
# In[3]:
# token expires every 30 min
COURSERA_TOKEN = ""
COURSERA_EMAIL = ""
# ## Two-dimensional classification
# 
# To make things more intuitive, let's solve a 2D classification problem with synthetic data.
# In[4]:
with open('train.npy', 'rb') as fin:
    X = np.load(fin)
    
with open('target.npy', 'rb') as fin:
    y = np.load(fin)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, s=20)
plt.show()
# In[5]:
print(X.shape)
print(y.shape)
# # Task
# 
# ## Features
# 
# As you can see, the data above isn't linearly separable, so we should add features (or use a non-linear model). Note that the decision boundary between the two classes has the form of a circle, so we can add quadratic features to make the problem linearly separable. The idea behind this is shown in the image below:
# 
# ![](kernel.png)
# In[6]:
def expand(X):
    """
    Adds quadratic features. 
    This expansion allows your linear model to make non-linear separation.
    
    For each sample (row in matrix), compute an expanded row:
    [feature0, feature1, feature0^2, feature1^2, feature0*feature1, 1]
    
    :param X: matrix of features, shape [n_samples,2]
    :returns: expanded features of shape [n_samples,6]
    """
    X_expanded = np.zeros((X.shape[0], 6))
    
    # TODO:<your code here>
    X_expanded[:,0], X_expanded[:,1] = X[:,0],X[:,1]
    X_expanded[:,2], X_expanded[:,3]= X[:,0]**2, X[:,1]**2
    X_expanded[:,4], X_expanded[:,5] = X[:,0]*X[:,1], np.ones(X.shape[0])
    
    return X_expanded
# In[7]:
X_expanded = expand(X)
# Here are some tests for your implementation of `expand` function.
# In[8]:
# simple test on random numbers
dummy_X = np.array([
        [0,0],
        [1,0],
        [2.61,-1.28],
        [-0.59,2.1]
    ])
# call your expand function
dummy_expanded = expand(dummy_X)
# what it should have returned:   x0       x1       x0^2     x1^2     x0*x1    1
dummy_expanded_ans = np.array([[ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  1.    ],
                               [ 1.    ,  0.    ,  1.    ,  0.    ,  0.    ,  1.    ],
                               [ 2.61  , -1.28  ,  6.8121,  1.6384, -3.3408,  1.    ],
                               [-0.59  ,  2.1   ,  0.3481,  4.41  , -1.239 ,  1.    ]])
#tests
assert isinstance(dummy_expanded,np.ndarray), "please make sure you return numpy array"
assert dummy_expanded.shape == dummy_expanded_ans.shape, "please make sure your shape is correct"
assert np.allclose(dummy_expanded,dummy_expanded_ans,1e-3), "Something's out of order with features"
print("Seems legit!")
# ## Logistic regression
# 
# To classify objects we will estimate the probability that an object belongs to class '1'. To predict this probability we will use the output of a linear model and the logistic function:
# 
# $$ a(x; w) = \langle w, x \rangle $$
# $$ P( y=1 \; \big| \; x, \, w) = \dfrac{1}{1 + \exp(- \langle w, x \rangle)} = \sigma(\langle w, x \rangle)$$
# 
# In[9]:
def probability(X, w):
    """
    Given input features and weights
    return predicted probabilities of y==1 given x, P(y=1|x), see description above
        
    Don't forget to use expand(X) function (where necessary) in this and subsequent functions.
    
    :param X: feature matrix X of shape [n_samples,6] (expanded)
    :param w: weight vector w of shape [6] for each of the expanded features
    :returns: an array of predicted probabilities in [0,1] interval.
    """
    # TODO:<your code here>
    z = np.dot(X,w)
    
    a = 1./(1+np.exp(-z))
    
    return np.array(a)
# In[10]:
dummy_weights = np.linspace(-1, 1, 6)
ans_part1 = probability(X_expanded[:1, :], dummy_weights)[0]
# In[11]:
print(ans_part1)
# In[12]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("xU7U4", ans_part1)
# In[13]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# In logistic regression the optimal parameters $w$ are found by cross-entropy minimization:
# 
# $$ L(w) =  - {1 \over \ell} \sum_{i=1}^\ell \left[ {y_i \cdot \log P(y=1 \, | \, x_i,w) + (1-y_i) \cdot \log (1-P(y=1\, | \, x_i,w))}\right] $$
# 
# 
# In[14]:
def compute_loss(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute scalar loss function using formula above.
    """
    # TODO:<your code here>
    l = X.shape[0]
    
    a = probability(X, w)
    
    cross_entropy = y*np.log(a) +(1-y)*np.log(1-a)
    cost = -np.sum(cross_entropy)/float(l)
    
    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())
    
    return cost
# In[15]:
# use output of this cell to fill answer field 
ans_part2 = compute_loss(X_expanded, y, dummy_weights)
# In[16]:
print(ans_part2)
# In[17]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("HyTF6", ans_part2)
# In[18]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# Since we train our model with gradient descent, we should compute gradients.
# 
# To be specific, we need a derivative of loss function over each weight [6 of them].
# 
# $$ \nabla_w L = ...$$
# 
# We won't be giving you the exact formula this time — instead, try figuring out a derivative with pen and paper. 
# 
# As usual, we've made a small test for you, but if you need more, feel free to check your math against finite differences (estimate how $L$ changes if you shift $w$ by $10^{-5}$ or so).
# In[19]:
def compute_grad(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute vector [6] of derivatives of L over each weight.
    """
    
    # TODO<your code here>
    m = X.shape[0]
    A = probability(X, w)
    dZ = A - y
    #cost = compute_loss(X, y, w)
    dW = np.dot(dZ, X) / float(m)
    
    return dW
# In[20]:
# use output of this cell to fill answer field 
ans_part3 = np.linalg.norm(compute_grad(X_expanded, y, dummy_weights))
# In[21]:
print(ans_part3)
# In[22]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("uNidL", ans_part3)
# In[23]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# Here's an auxiliary function that visualizes the predictions:
# In[24]:
from IPython import display
h = 0.01
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
def visualize(X, y, w, history):
    """draws classifier prediction with matplotlib magic"""
    Z = probability(expand(np.c_[xx.ravel(), yy.ravel()]), w)
    Z = Z.reshape(xx.shape)
    plt.subplot(1, 2, 1)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    
    plt.subplot(1, 2, 2)
    plt.plot(history)
    plt.grid()
    ymin, ymax = plt.ylim()
    plt.ylim(0, ymax)
    display.clear_output(wait=True)
    plt.show()
# In[25]:
visualize(X, y, dummy_weights, [0.5, 0.5, 0.25])
# ## Training
# In this section we'll use the functions you wrote to train our classifier using stochastic gradient descent.
# 
# You can try changing hyperparameters like batch size, learning rate and so on to find the best ones, but use our hyperparameters when filling in the answers.
# ## Mini-batch SGD
# 
# Mini-batch stochastic gradient descent takes a random batch of $m$ examples on each iteration, calculates the gradient of the loss on it and makes a step:
# $$ w_t = w_{t-1} - \eta \dfrac{1}{m} \sum_{j=1}^m \nabla_w L(w_{t-1}, x_{i_j}, y_{i_j}) $$
# 
# 
# In[26]:
# please use np.random.seed(42), eta=0.1, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta= 0.1 # learning rate
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    # TODO:<your code here>    
    dW = compute_grad(X_expanded[ind, :], y[ind], w)
    w = w - eta * dW 
visualize(X, y, w, loss)
plt.clf()
# In[27]:
# use output of this cell to fill answer field 
ans_part4 = compute_loss(X_expanded, y, w)
# In[28]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("ToK7N", ans_part4)
# In[29]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# ## SGD with momentum
# 
# Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations as can be seen in image below. It does this by adding a fraction $\alpha$ of the update vector of the past time step to the current update vector.
# <br>
# <br>
# 
# $$ \nu_t = \alpha \nu_{t-1} + \eta\dfrac{1}{m} \sum_{j=1}^m \nabla_w L(w_{t-1}, x_{i_j}, y_{i_j}) $$
# $$ w_t = w_{t-1} - \nu_t$$
# 
# <br>
# 
# 
# ![](sgd.png)
# 
# In[30]:
# please use np.random.seed(42), eta=0.05, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta = 0.05 # learning rate
alpha = 0.9 # momentum
nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    # TODO:<your code here>
    dW = compute_grad(X_expanded[ind, :], y[ind], w)
    nu = alpha*nu+eta*dW
    w = w - nu
visualize(X, y, w, loss)
plt.clf()
# In[31]:
# use output of this cell to fill answer field 
ans_part5 = compute_loss(X_expanded, y, w)
# In[32]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("GBdgZ", ans_part5)
# In[33]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# ## RMSprop
# 
# Implement the RMSprop algorithm, which uses squared gradients to adjust the learning rate:
# 
# $$ G_j^t = \alpha G_j^{t-1} + (1 - \alpha) g_{tj}^2 $$
# $$ w_j^t = w_j^{t-1} - \dfrac{\eta}{\sqrt{G_j^t + \varepsilon}} g_{tj} $$
# In[34]:
# please use np.random.seed(42), eta=0.1, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.1 # learning rate
alpha = 0.9 # moving average of gradient norm squared
G = np.zeros_like(w)
g2 = np.zeros_like(w)
eps = 1e-8
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    # TODO:<your code here>
    dW = compute_grad(X_expanded[ind, :], y[ind], w)
    g2 = dW**2
    G = alpha*G+(1-alpha)*g2
    
    w = w - eta*dW/np.sqrt(G+eps)
    
visualize(X, y, w, loss)
plt.clf()
# In[35]:
# use output of this cell to fill answer field 
ans_part6 = compute_loss(X_expanded, y, w)
# In[36]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("dLdHG", ans_part6)
# In[37]:
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# In[ ]:
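
The assignment above suggests checking the hand-derived gradient against finite differences but does not show how. The sketch below is one way to do it, under assumptions: it uses its own small logistic model on random data instead of the assignment files, and the names `logistic_loss`, `logistic_grad` and `numeric_grad` are hypothetical. The analytic gradient has the same form as `compute_grad` above, namely X^T (sigma(Xw) - y) / n.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(X, y, w):
    """Mean cross-entropy of a logistic model with weights w."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_grad(X, y, w):
    """Analytic gradient of logistic_loss with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def numeric_grad(f, w, eps=1e-5):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w, dtype=float)
    for j in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        grad[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

# Random stand-in data with 6 features, like the expanded feature matrix
rng = np.random.RandomState(0)
X_demo = rng.randn(50, 6)
y_demo = (rng.rand(50) > 0.5).astype(float)
w_demo = np.linspace(-1, 1, 6)

analytic = logistic_grad(X_demo, y_demo, w_demo)
numeric = numeric_grad(lambda w: logistic_loss(X_demo, y_demo, w), w_demo)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_diff)
```

If the two gradients disagree by much more than the finite-difference error, the derivation (or the code) most likely has a mistake.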

Week 2 Introduction to neural networks

This module is an introduction to the concept of a deep neural network. You’ll begin with the linear model in numpy and finish with writing your very first deep network.

Learning Objectives

Explain the mechanics of basic building blocks for neural networks
Apply backpropagation algorithm to train deep neural networks using automatic differentiation
Implement, train and test neural networks using TensorFlow and Keras

Multilayer perceptron, or the basic principles of deep learning

Multilayer perceptron 6 min

Deep Learning

Training a neural network 7 min

Matrix Operation

Backpropagation primer 7 min

Practice Quiz: Multilayer perceptron 4 questions

PRACTICE QUIZ
Multilayer perceptron
4 questions
To Pass: 100% or higher
Deadline: December 3, 11:59 PM PST

1 point
Question 1
The best nonlinearity functions to use in a multilayer perceptron are step functions, as they allow the decision boundary to be reconstructed with better precision.
Yes
No

1 point
Question 2
A dense layer applies a linear transformation to its input
Yes
No

1 point
Question 3
For an MLP to work, the nonlinearity function must have a finite upper bound
Yes
No

1 point
Question 4
How many dimensions will a derivative of a 1-D vector by a 2-D matrix have?
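
One way to build intuition for the last question is to compute such a derivative numerically. The sketch below is an assumption-laden illustration, not course code: it takes the linear map y = W x as the vector-valued function, and the helper `jacobian_wrt_matrix` is a hypothetical name. The result has one entry for every pair of (output component, matrix entry).

```python
import numpy as np

def jacobian_wrt_matrix(f, W, eps=1e-6):
    """Finite-difference derivative of a vector-valued f(W) w.r.t. each W[i, j]."""
    y0 = f(W)
    jac = np.zeros(y0.shape + W.shape)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_step = W.copy()
            W_step[i, j] += eps
            jac[:, i, j] = (f(W_step) - y0) / eps
    return jac

rng = np.random.RandomState(0)
W_demo = rng.randn(3, 4)   # 2-D matrix
x_demo = rng.randn(4)      # fixed input, so y = W x is a 1-D vector of length 3

jac = jacobian_wrt_matrix(lambda W: W @ x_demo, W_demo)
print(jac.ndim, jac.shape)
```

For this linear map the entry `jac[k, i, j]` equals `x_demo[j]` when `k == i` and zero otherwise, which is why backpropagation implementations never materialize the full tensor.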

Tensorflow

Tensorflow_task.ipynb

Going deeper with Tensorflow 11 min

Practice Programming Assignment: MSE in TensorFlow 15 min

Gradients & optimization in Tensorflow 8 min

Programming Assignment: Logistic regression in TensorFlow 30 min


# coding: utf-8
# # Going deeper with Tensorflow
# 
# In this video, we're going to study the tools you'll use to build deep learning models. Namely, [Tensorflow](https://www.tensorflow.org/).
# 
# If you're running this notebook outside the course environment, you'll need to install tensorflow:
# * `pip install tensorflow` should install cpu-only TF on Linux & Mac OS
# * If you want GPU support from the outset, see [TF install page](https://www.tensorflow.org/install/)
# In[1]:
import sys
sys.path.append("..")
import grading
# # Visualization
# Please note that if you are running on the Coursera platform, you won't be able to access the tensorboard instance due to the network setup there. If you run the notebook locally, you should be able to access TensorBoard on http://127.0.0.1:7007/
# In[ ]:
get_ipython().system(' killall tensorboard')
import os
os.system("tensorboard --logdir=/tmp/tboard --port=7007 &");
# In[2]:
import tensorflow as tf
s = tf.InteractiveSession()
# # Warming up
# For starters, let's implement a python function that computes the sum of squares of numbers from 0 to N-1.
# In[3]:
import numpy as np
def sum_squares(N):
    return np.sum(np.arange(N)**2)
# In[4]:
get_ipython().run_cell_magic('time', '', 'sum_squares(10**8)')
# # Tensorflow teaser
# 
# Doing the very same thing
# In[5]:
# An integer parameter
N = tf.placeholder('int64', name="input_to_your_function")
# A recipe on how to produce the same result
result = tf.reduce_sum(tf.range(N)**2)
# In[6]:
result
# In[7]:
get_ipython().run_cell_magic('time', '', 'result.eval({N: 10**8})')
# In[8]:
writer = tf.summary.FileWriter("/tmp/tboard", graph=s.graph)
# # How does it work?
# 1. Define placeholders where you'll send inputs
# 2. Make symbolic graph: a recipe for mathematical transformation of those placeholders
# 3. Compute outputs of your graph with particular values for each placeholder
#   * `output.eval({placeholder:value})`
#   * `s.run(output, {placeholder:value})`
# 
# So far there are two main entities: "placeholder" and "transformation"
# * Both can be numbers, vectors, matrices, tensors, etc.
# * Both can be int32/64, floats, booleans (uint8) of various size.
# 
# * You can define new transformations as an arbitrary operation on placeholders and other transformations
#  * `tf.reduce_sum(tf.range(N)**2)` are 3 sequential transformations of placeholder `N`
#  * There's a tensorflow symbolic version for most numpy functions
#    * `a+b, a/b, a**b, ...` behave just like in numpy
#    * `np.mean` -> `tf.reduce_mean`
#    * `np.arange` -> `tf.range`
#    * `np.cumsum` -> `tf.cumsum`
#    * If you can't find the op you need, see the [docs](https://www.tensorflow.org/api_docs/python).
#    
# `tf.contrib` has many high-level features, may be worth a look.
# In[9]:
with tf.name_scope("Placeholders_examples"):
    # Default placeholder that can be arbitrary float32
    # scalar, vector, matrix, etc.
    arbitrary_input = tf.placeholder('float32')
    # Input vector of arbitrary length
    input_vector = tf.placeholder('float32', shape=(None,))
    # Input vector that _must_ have 10 elements and integer type
    fixed_vector = tf.placeholder('int32', shape=(10,))
    # Matrix of arbitrary n_rows and 15 columns
    # (e.g. a minibatch of your data table)
    input_matrix = tf.placeholder('float32', shape=(None, 15))
    
    # You can generally use None whenever you don't need a specific shape
    input1 = tf.placeholder('float64', shape=(None, 100, None))
    input2 = tf.placeholder('int32', shape=(None, None, 3, 224, 224))
    # elementwise multiplication
    double_the_vector = input_vector*2
    # elementwise cosine
    elementwise_cosine = tf.cos(input_vector)
    # difference between squared vector and vector itself plus one
    vector_squares = input_vector**2 - input_vector + 1
# In[10]:
my_vector =  tf.placeholder('float32', shape=(None,), name="VECTOR_1")
my_vector2 = tf.placeholder('float32', shape=(None,))
my_transformation = my_vector * my_vector2 / (tf.sin(my_vector) + 1)
# In[11]:
print(my_transformation)
# In[12]:
dummy = np.arange(5).astype('float32')
print(dummy)
my_transformation.eval({my_vector:dummy, my_vector2:dummy[::-1]})
# In[13]:
writer.add_graph(my_transformation.graph)
writer.flush()
# TensorBoard allows writing scalars, images, audio, histogram. You can read more on tensorboard usage [here](https://www.tensorflow.org/get_started/graph_viz).
# # Summary
# * Tensorflow is based on computation graphs
# * The graphs consist of placeholders and transformations
# # Mean squared error
# 
# Your assignment is to implement mean squared error in tensorflow.
# In[16]:
with tf.name_scope("MSE"):
    y_true = tf.placeholder("float32", shape=(None,), name="y_true")
    y_predicted = tf.placeholder("float32", shape=(None,), name="y_predicted")
    # Your code goes here
    # You want to use tf.reduce_mean
    # mse = tf.<...>
    mse = tf.reduce_mean(tf.squared_difference(y_true, y_predicted)) 
def compute_mse(vector1, vector2):
    return mse.eval({y_true: vector1, y_predicted: vector2})
# In[17]:
writer.add_graph(mse.graph)
writer.flush()
# Tests and result submission. Please use the credentials obtained from the Coursera assignment page.
# In[18]:
import submit
# In[19]:
submit.submit_mse(compute_mse, <your email>, <your token>)
# # Variables
# 
# The inputs and transformations have no value outside function call. This isn't too comfortable if you want your model to have parameters (e.g. network weights) that are always present, but can change their value over time.
# 
# Tensorflow solves this with `tf.Variable` objects.
# * You can assign variable a value at any time in your graph
# * Unlike placeholders, there's no need to explicitly pass values to variables when `s.run(...)`-ing
# * You can use variables the same way you use transformations 
#  
# In[20]:
# Creating a shared variable
shared_vector_1 = tf.Variable(initial_value=np.ones(5),
                              name="example_variable")
# In[21]:
# Initialize variable(s) with initial values
s.run(tf.global_variables_initializer())
# Evaluating shared variable (outside symbolic graph)
print("Initial value", s.run(shared_vector_1))
# Within symbolic graph you use them just
# as any other input or transformation, no "get value" needed
# In[22]:
# Setting a new value
s.run(shared_vector_1.assign(np.arange(5)))
# Getting that new value
print("New value", s.run(shared_vector_1))
# # tf.gradients - why graphs matter
# * Tensorflow can compute derivatives and gradients automatically using the computation graph
# * True to its name it can manage matrix derivatives
# * Gradients are computed as a product of elementary derivatives via the chain rule:
# 
# $$ {\partial f(g(x)) \over \partial x} = {\partial f(g(x)) \over \partial g(x)}\cdot {\partial g(x) \over \partial x} $$
# 
# It can get you the derivative of any graph as long as it knows how to differentiate elementary operations
# In[23]:
my_scalar = tf.placeholder('float32')
scalar_squared = my_scalar**2
# A derivative of scalar_squared by my_scalar
derivative = tf.gradients(scalar_squared, [my_scalar, ])
# In[24]:
derivative
# In[25]:
import matplotlib.pyplot as plt
get_ipython().magic('matplotlib inline')
x = np.linspace(-3, 3)
x_squared, x_squared_der = s.run([scalar_squared, derivative[0]],
                                 {my_scalar:x})
plt.plot(x, x_squared,label="$x^2$")
plt.plot(x, x_squared_der, label=r"$\frac{dx^2}{dx}$")
plt.legend();
# # Why that rocks
# In[26]:
my_vector = tf.placeholder('float32', [None])
# Compute the gradient of the next weird function over my_scalar and my_vector
# Warning! Trying to understand the meaning of that function may result in permanent brain damage
weird_psychotic_function = tf.reduce_mean(
    (my_vector+my_scalar)**(1+tf.nn.moments(my_vector,[0])[1]) + 
    1./ tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(
    2*my_scalar**1.5)*(tf.reduce_sum(my_vector)* my_scalar**2
                      )*tf.exp((my_scalar-4)**2)/(
    1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2)
                                    )/(1+tf.exp(-(my_scalar-4)**2)))**2
der_by_scalar = tf.gradients(weird_psychotic_function, my_scalar)
der_by_vector = tf.gradients(weird_psychotic_function, my_vector)
# In[27]:
# Plotting the derivative
scalar_space = np.linspace(1, 7, 100)
y = [s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 2, 3]})
     for x in scalar_space]
plt.plot(scalar_space, y, label='function')
y_der_by_scalar = [s.run(der_by_scalar,
                         {my_scalar:x, my_vector:[1, 2, 3]})
                   for x in scalar_space]
plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend();
# # Almost done - optimizers
# 
# While you can perform gradient descent by hand with automatic grads from above, tensorflow also has some optimization methods implemented for you. Recall momentum & rmsprop?
# In[28]:
y_guess = tf.Variable(np.zeros(2, dtype='float32'))
y_true = tf.range(1, 3, dtype='float32')
loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2) 
#loss = tf.reduce_mean((y_guess - y_true)**2) 
optimizer = tf.train.MomentumOptimizer(0.01, 0.5).minimize(
    loss, var_list=y_guess)
# In[29]:
from matplotlib import animation, rc
import matplotlib_utils
from IPython.display import HTML
fig, ax = plt.subplots()
y_true_value = s.run(y_true)
level_x = np.arange(0, 2, 0.02)
level_y = np.arange(0, 3, 0.02)
X, Y = np.meshgrid(level_x, level_y)
Z = (X - y_true_value[0])**2 + (Y - y_true_value[1])**2
ax.set_xlim(-0.02, 2)
ax.set_ylim(-0.02, 3)
s.run(tf.global_variables_initializer())
ax.scatter(*s.run(y_true), c='red')
contour = ax.contour(X, Y, Z, 10)
ax.clabel(contour, inline=1, fontsize=10)
line, = ax.plot([], [], lw=2)
def init():
    line.set_data([], [])
    return (line,)
guesses = [s.run(y_guess)]
def animate(i):
    s.run(optimizer)
    guesses.append(s.run(y_guess))
    line.set_data(*zip(*guesses))
    return (line,)
anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=400, interval=20, blit=True)
# In[ ]:
try:
    HTML(anim.to_html5_video())
# In case the built-in renderers are unavailable, fall back to
# a custom one, that doesn't require external libraries
except RuntimeError:
    anim.save(None, writer=matplotlib_utils.SimpleMovieWriter(0.001))
# # Logistic regression
# Your assignment is to implement the logistic regression
# 
# Plan:
# * Use a shared variable for weights
# * Use a matrix placeholder for `X`
#  
# We shall train on a two-class digits dataset (scikit-learn's `load_digits`, a small MNIST-like set of 8x8 images)
# * please note that target `y` are `{0,1}` and not `{-1,1}` as in some formulae
# In[31]:
from sklearn.datasets import load_digits
mnist = load_digits(n_class=2)  # n_class is keyword-only in recent scikit-learn
X, y = mnist.data, mnist.target
print("y [shape - %s]:" % (str(y.shape)), y[:10])
print("X [shape - %s]:" % (str(X.shape)))
# In[32]:
print('X:\n',X[:3,:10])
print('y:\n',y[:10])
plt.imshow(X[0].reshape([8,8]));
# It's your turn now!
# Just a small reminder of the relevant math:
# 
# $$
# P(y=1|X) = \sigma(X \cdot W + b)
# $$
# $$
# \text{loss} = -\log\left(P\left(y_\text{predicted} = 1\right)\right)\cdot y_\text{true} - \log\left(1 - P\left(y_\text{predicted} = 1\right)\right)\cdot\left(1 - y_\text{true}\right)
# $$
# 
# $\sigma(x)$ is available via `tf.nn.sigmoid` and matrix multiplication via `tf.matmul`
# In[33]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42)
# __Your code goes here.__ For the training and testing scaffolding to work, please stick to the names in comments.
# In[102]:
# Model parameters - weights and bias
# weights = tf.Variable(...) shape should be (X.shape[1], 1)
# b = tf.Variable(...)
weights = tf.Variable(initial_value=np.random.randn(X.shape[1], 1)*0.01, name="weights", dtype="float32")
b = tf.Variable(initial_value=0, name="b", dtype="float32")
print(weights)
print(b)
# In[103]:
# Placeholders for the input data
# input_X = tf.placeholder(...)
# input_y = tf.placeholder(...)
input_X = tf.placeholder(tf.float32, name="input_X")
input_y = tf.placeholder(tf.float32, name="input_y")
print(input_X)
print(input_y)
# In[104]:
# The model code
# Compute a vector of predictions, resulting shape should be [input_X.shape[0],]
# This is 1D, if you have extra dimensions, you can  get rid of them with tf.squeeze .
# Don't forget the sigmoid.
# predicted_y = <predicted probabilities for input_X>
predicted_y = tf.squeeze(tf.nn.sigmoid(tf.add(tf.matmul(input_X, weights), b)))
print(predicted_y)
# Loss. Should be a scalar number - average loss over all the objects
# tf.reduce_mean is your friend here
# loss = <logistic loss (scalar, mean over sample)>
loss = -tf.reduce_mean(tf.log(predicted_y)*input_y + tf.log(1-predicted_y)*(1-input_y))
print(loss)
# See above for an example. tf.train.*Optimizer
# optimizer = <optimizer that minimizes loss>
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
print(optimizer)
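The loss above takes `tf.log` of a probability, which can overflow to infinity once the sigmoid saturates. Here is a hedged NumPy sketch of the usual workaround, computing the same loss directly from the logits `z = X·W + b`. The function names are hypothetical. TensorFlow offers `tf.nn.sigmoid_cross_entropy_with_logits` for the same purpose, though the validation code here expects the plain formulation.

```python
import numpy as np

def naive_loss(z, y):
    """Log loss computed from probabilities, as in the formula above."""
    p = 1.0 / (1.0 + np.exp(-z))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def stable_loss(z, y):
    """Same loss computed from logits: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

# Moderate logits: both versions agree
z_small = np.array([-2.0, 0.5, 3.0])
y_small = np.array([0.0, 1.0, 1.0])
print(naive_loss(z_small, y_small), stable_loss(z_small, y_small))

# A confidently wrong prediction: sigmoid(40) rounds to exactly 1.0 in float64
z_big = np.array([40.0])
y_big = np.array([0.0])
with np.errstate(divide="ignore"):
    naive_big = naive_loss(z_big, y_big)   # log(1 - 1.0) = -inf, so the loss is inf
stable_big = stable_loss(z_big, y_big)     # stays finite, close to 40
print(naive_big, stable_big)
```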
# A test to help with the debugging
# In[105]:
validation_weights = 1e-3 * np.fromiter(map(lambda x:
        s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 0.1, 2]}),
                                   0.15 * np.arange(1, X.shape[1] + 1)),
                                   count=X.shape[1], dtype=np.float32)[:, np.newaxis]
# Compute predictions for given weights and bias
prediction_validation = s.run(
    predicted_y, {
    input_X: X,
    weights: validation_weights,
    b: 1e-1})
# Load the reference values for the predictions
validation_true_values = np.loadtxt("validation_predictons.txt")
assert prediction_validation.shape == (X.shape[0],), \
    "Predictions must be a 1D array with length equal to the number " \
    "of examples in input_X"
assert np.allclose(validation_true_values, prediction_validation)
loss_validation = s.run(
        loss, {
            input_X: X[:100],
            input_y: y[-100:],
            weights: validation_weights+1.21e-3,
            b: -1e-1})
assert np.allclose(loss_validation, 0.728689)
# In[106]:
from sklearn.metrics import roc_auc_score
s.run(tf.global_variables_initializer())
for i in range(5):
    s.run(optimizer, {input_X: X_train, input_y: y_train})
    loss_i = s.run(loss, {input_X: X_train, input_y: y_train})
    print("loss at iter %i:%.4f" % (i, loss_i))
    print("train auc:", roc_auc_score(y_train, s.run(predicted_y, {input_X:X_train})))
    print("test auc:", roc_auc_score(y_test, s.run(predicted_y, {input_X:X_test})))
# ### Coursera submission
# In[107]:
grade_submitter = grading.Grader("BJCiiY8sEeeCnhKCj4fcOA")
# In[108]:
test_weights = 1e-3 * np.fromiter(map(lambda x:
    s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 2, 3]}),
                               0.1 * np.arange(1, X.shape[1] + 1)),
                               count=X.shape[1], dtype=np.float32)[:, np.newaxis]
# First, test prediction and loss computation. This part doesn't require a fitted model.
# In[109]:
prediction_test = s.run(
    predicted_y, {
    input_X: X,
    weights: test_weights,
    b: 1e-1})
# In[110]:
assert prediction_test.shape == (X.shape[0],), \
    "Predictions must be a 1D array with length equal to the number " \
    "of examples in X"
# In[111]:
grade_submitter.set_answer("0ENlN", prediction_test)
# In[112]:
loss_test = s.run(
    loss, {
        input_X: X[:100],
        input_y: y[-100:],
        weights: test_weights+1.21e-3,
        b: -1e-1})
# Yes, the X/y indices mismatch is intentional
# In[113]:
grade_submitter.set_answer("mMVpM", loss_test)
# In[114]:
grade_submitter.set_answer("D16Rc", roc_auc_score(y_test, s.run(predicted_y, {input_X:X_test})))
# Please use the credentials obtained from the Coursera assignment page.
# In[115]:
grade_submitter.submit(<your email>, <your token>)

my1stNN boilerplate

Peer-graded Assignment: my1stNN (1h)


# coding: utf-8
# In[1]:
from preprocessed_mnist import load_dataset
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
print(X_train.shape, y_train.shape)
import matplotlib.pyplot as plt
get_ipython().magic('matplotlib inline')
plt.imshow(X_train[0], cmap="Greys");
# In[2]:
import tensorflow as tf
import numpy as np
import math
from tensorflow.python.framework import ops
# In[3]:
print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
print(X_test.shape, y_test.shape)
# In[4]:
print(y_train[0])
# In[5]:
# Reshape the training, validation and test examples to (784, number of examples)
X_train_flatten = X_train.reshape(X_train.shape[0], -1).T   # The "-1" makes reshape flatten the remaining dimensions
X_val_flatten = X_val.reshape(X_val.shape[0], -1).T
X_test_flatten = X_test.reshape(X_test.shape[0], -1).T
print(X_train_flatten.shape)
print(X_val_flatten.shape)
print(X_test_flatten.shape)
# In[6]:
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the i-th class and the j-th column
                     corresponds to the j-th training example. If example j has label i, then entry (i,j)
                     will be 1.
                     
    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension
    
    Returns: 
    one_hot -- one hot matrix
    """
    
    ### START CODE HERE ###
    
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    depth = tf.constant(C, name = "C")
    
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, depth, axis = 0)
    
    # Create the session (approx. 1 line)
    sess = tf.Session()
    
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)
    
    # Close the session (approx. 1 line).
    sess.close()
    
    ### END CODE HERE ###
    
    return one_hot
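For comparison, the same (classes x examples) layout can be produced in plain NumPy. This is a sketch under the assumption that labels are integers in `[0, C)`. The name `one_hot_numpy` is hypothetical and not part of the assignment.

```python
import numpy as np

def one_hot_numpy(labels, num_classes):
    """Return an array of shape (num_classes, n_examples) with a 1 at (label_j, j)."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is the one-hot vector for class k;
    # indexing picks one row per example, and .T puts examples in columns.
    return np.eye(num_classes, dtype=np.float32)[labels].T

demo_labels = [1, 2, 2, 0]
demo = one_hot_numpy(demo_labels, 3)
print(demo.shape)   # (3, 4): 3 classes, 4 examples
print(demo)
```

Taking `argmax` over axis 0 recovers the original labels, which is a quick way to confirm that the axis choice matches `tf.one_hot(..., axis=0)`.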
# In[7]:
# encode y with one-hot
y_train_one_hot = one_hot_matrix(y_train, 10)
y_val_one_hot = one_hot_matrix(y_val, 10)
y_test_one_hot = one_hot_matrix(y_test, 10)
print(y_train_one_hot.shape)
print(y_val_one_hot.shape)
print(y_test_one_hot.shape)
# In[8]:
def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 28 * 28 = 784)
    n_y -- scalar, number of classes (from 0 to 9, so -> 10)
    
    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"
    
    Tips:
    - You will use None because it lets us be flexible on the number of examples fed to the placeholders.
      In fact, the number of examples during test/train is different.
    """
    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(tf.float32, [n_x, None], name = "X")
    Y = tf.placeholder(tf.float32, [n_y, None], name = "Y")
    ### END CODE HERE ###
    
    return X, Y
# In[9]:
def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [50, 784]
                        b1 : [50, 1]
                        W2 : [10, 50]
                        b2 : [10, 1]
    
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2
    """
    
    tf.set_random_seed(1)                   # so that your "random" numbers match ours
        
    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable("W1", [50,784], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [50,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [10, 50], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [10,1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters
# In[10]:
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> SOFTMAX
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2"
                  the shapes are given in initialize_parameters
    Returns:
    Z2 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    ### START CODE HERE ### (approx. 5 lines)              # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)                      # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                                    # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)                     # Z2 = np.dot(W2, A1) + b2
    ### END CODE HERE ###
    
    return Z2
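The shapes in the forward pass are easy to get wrong because examples are stored in columns here. Below is a small NumPy sketch of the same LINEAR -> RELU -> LINEAR computation. The random inputs, the 0.01 weight scale and the batch of 5 examples are arbitrary assumptions for illustration and do not reproduce the Xavier initialization used above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 784, 50, 10, 5   # m = 5 examples is an arbitrary choice

X_demo = rng.random((n_x, m))                 # one flattened image per column
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

Z1 = W1 @ X_demo + b1      # (50, 5); b1 broadcasts across the 5 columns
A1 = np.maximum(Z1, 0)     # ReLU
Z2 = W2 @ A1 + b2          # (10, 5); one column of class scores per example

print(Z1.shape, A1.shape, Z2.shape)
```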
# In[11]:
def compute_cost(Z2, Y):
    """
    Computes the cost
    
    Arguments:
    Z2 -- output of forward propagation (output of the last LINEAR unit), of shape (10, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z2
    
    Returns:
    cost - Tensor of the cost function
    """
    
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z2)
    labels = tf.transpose(Y)
    
    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
    ### END CODE HERE ###
    
    return cost
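To make the cost concrete, here is a hedged NumPy sketch of softmax cross-entropy on (examples x classes) inputs, which is why the code above transposes `Z2` and `Y` first. The function name and the demo arrays are assumptions. Subtracting the row maximum is a standard stability trick and does not change the result.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example loss; logits and one-hot labels have shape (n_examples, n_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)        # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

Z2_demo = np.zeros((10, 4))    # 10 classes x 4 examples, all scores equal
Y_demo = np.eye(10)[:, :4]     # one-hot columns for classes 0, 1, 2, 3

cost_demo = softmax_cross_entropy(Z2_demo.T, Y_demo.T).mean()
print(cost_demo)  # equal scores give probability 1/10 per class, so the cost is log(10)
```

A cost close to log(10) at the start of training is therefore what an untrained 10-class model should roughly produce.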
# In[12]:
def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
    """
    Creates a list of random minibatches from (X, Y)
    
    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- input target, of shape (10, number of examples)
    mini_batch_size -- size of the mini-batches, integer
    
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    
    np.random.seed(seed)            # To make your "random" minibatches the same as ours
    m = X.shape[1]                  # number of training examples
    mini_batches = []
        
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]
    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m/mini_batch_size) # number of mini batches of size mini_batch_size in your partitioning
    for k in range(0, num_complete_minibatches):
        ### START CODE HERE ### (approx. 2 lines)
        mini_batch_X = shuffled_X[:, k*mini_batch_size : (k+1)*mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k*mini_batch_size : (k+1)*mini_batch_size]
        ### END CODE HERE ###
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        ### START CODE HERE ### (approx. 2 lines)
        mini_batch_X = shuffled_X[:, num_complete_minibatches*mini_batch_size : ]
        mini_batch_Y = shuffled_Y[:, num_complete_minibatches*mini_batch_size : ]
        ### END CODE HERE ###
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    return mini_batches
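The partitioning logic can be illustrated on a tiny array. This is a compact sketch of the same idea (shuffle the columns, then slice), not a replacement for the function above. The name `split_into_batches` and the demo data are hypothetical.

```python
import numpy as np

def split_into_batches(X, Y, batch_size, seed=0):
    """Shuffle the columns of X and Y with one permutation, then slice into batches."""
    rng = np.random.RandomState(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_s, Y_s = X[:, perm], Y[:, perm]
    # range(..., batch_size) also yields the shorter final batch when m % batch_size != 0
    return [(X_s[:, i:i + batch_size], Y_s[:, i:i + batch_size])
            for i in range(0, m, batch_size)]

X_demo = np.arange(30).reshape(3, 10)   # 10 examples; row 0 holds the example index
Y_demo = np.arange(10).reshape(1, 10)   # "label" equals the example index

batches = split_into_batches(X_demo, Y_demo, batch_size=4)
sizes = [bx.shape[1] for bx, _ in batches]
print(sizes)  # [4, 4, 2]
```

Because both arrays are indexed with the same permutation, each example stays paired with its label after shuffling.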
# In[27]:
def model(X_train, Y_train, X_val, Y_val, learning_rate = 0.0001,
          num_epochs = 1000, minibatch_size = 32, print_cost = True):
    """
    Implements a two-layer tensorflow neural network: LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 784, number of training examples = 50000)
    Y_train -- training set, of shape (output size = 10, number of training examples = 50000)
    X_val -- validation set, of shape (input size = 784, number of validation examples = 10000)
    Y_val -- validation set, of shape (output size = 10, number of validation examples = 10000)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep consistent results
    seed = 3                                          # to keep consistent results
    (n_x, m) = X_train.shape                          # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]                            # n_y : output size
    costs = []                                        # To keep track of the cost
    
    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x, n_y)
    ### END CODE HERE ###
    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###
    
    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z2 = forward_propagation(X, parameters)
    ### END CODE HERE ###
    
    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z2, Y)
    ### END CODE HERE ###
    
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    ### END CODE HERE ###
    
    # Initialize all the variables
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.                       # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost"; the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###
                
                epoch_cost += minibatch_cost / num_minibatches
            # Print the cost every 100 epochs and record it every 5 epochs
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
            
            # Early stopping condition
            #if np.absolute(costs[-1] - epoch_cost) < 1e-12:
            #    break
                
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")
        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z2), tf.argmax(Y))
        # Calculate accuracy on the training and validation sets
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Validation Accuracy:", accuracy.eval({X: X_val, Y: Y_val}))
        
        return parameters
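The accuracy computation relies on `tf.argmax` reducing over axis 0 by default, which picks the best class per column when examples are stored in columns. A small NumPy sketch of the same calculation, with made-up scores and labels:

```python
import numpy as np

# 3 classes x 3 examples (values invented for illustration)
Z2_demo = np.array([[2.0, 0.1, 0.3],
                    [0.5, 1.5, 0.2],
                    [0.1, 0.2, 0.9]])
Y_demo = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 0]])

pred = Z2_demo.argmax(axis=0)        # predicted class per example
true = Y_demo.argmax(axis=0)         # true class per example
accuracy = np.mean(pred == true)     # fraction of matching columns
print(pred, true, accuracy)
```

Softmax is monotonic, so taking the argmax of the raw scores `Z2` gives the same predictions as taking it over the softmax probabilities.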
# In[ ]:
parameters = model(X_train_flatten, y_train_one_hot, X_val_flatten, y_val_one_hot)
# In[32]:
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
    n_x = 784
    n_y = 10
    X, Y = create_placeholders(n_x, n_y)
    Z2 = forward_propagation(X, parameters)
    # Calculate the correct predictions
    correct_prediction = tf.equal(tf.argmax(Z2), tf.argmax(Y))
    # Calculate accuracy on the test set
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Test Accuracy:", accuracy.eval({X: X_test_flatten, Y: y_test_one_hot}))

Review Your Peers: my1stNN

Keras

Keras-task.ipynb

Keras introduction (10 min)

Programming Assignment: my1stNN - Keras this time (1h)

Philosophy of deep learning

What Deep Learning is and is not (8 min)

Deep learning as a language (6 min)

Optional Honors Content

Neural networks the hard way

NumpyNN (honor).ipynb

Peer-graded Assignment: Your very own neural network (2h)
