Coursera HSE Advanced Machine Learning Specialization

For quick reference:
Course can be found here
Videos on YouTube
Lecture slides can be found in my GitHub

About This Specialization
This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you fill the gaps between theory and practice. Upon completion of the 7 courses you will be able to apply modern machine learning methods in the enterprise and understand the caveats of real-world data and settings.

Projects Overview
You will master your skills by solving a wide variety of real-world problems, such as image captioning and automatic game playing, throughout the course projects. You will gain hands-on experience applying advanced machine learning techniques that provide the foundation for the current state of the art in AI.

Introduction to Deep Learning

Course can be found here
Lecture slides can be found here

About this course: The goal of this course is to give learners a basic understanding of modern neural networks and their applications in computer vision and natural language understanding. The course starts with a recap of linear models and a discussion of stochastic optimization methods that are crucial for training deep neural networks. Learners will study all popular building blocks of neural networks, including fully connected, convolutional and recurrent layers.
Learners will use these building blocks to define complex modern architectures in the TensorFlow and Keras frameworks. In the course project, learners will implement a deep neural network for the task of image captioning, i.e. generating a text description for an input image.

The prerequisites for this course are:
1) Basic knowledge of Python.
2) Basic linear algebra and probability.

Please note that this is an advanced course and we assume basic knowledge of machine learning. You should understand:
1) Linear regression: mean squared error, analytical solution.
2) Logistic regression: model, cross-entropy loss, class probability estimation.
3) Gradient descent for linear models. Derivatives of MSE and cross-entropy loss functions.
4) The problem of overfitting.
5) Regularization for linear models.
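
A quick refresher on items 1-3 (my own note, not part of the course text): for linear regression with MSE loss, the objective and its gradient, which the first week assumes you can derive, are

$$ L(w) = \frac{1}{\ell}\sum_{i=1}^{\ell} (\langle w, x_i \rangle - y_i)^2, \qquad \nabla_w L(w) = \frac{2}{\ell}\sum_{i=1}^{\ell} (\langle w, x_i \rangle - y_i)\, x_i $$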

Who is this class for: Developers, analysts and researchers who are faced with tasks that involve understanding complex structure, such as image, sound and text analysis.

Week 1 Introduction to optimization

Welcome to the “Introduction to Deep Learning” course! In the first week you’ll learn about linear models and stochastic optimization methods. Linear models are basic building blocks for many deep architectures, and stochastic optimization is used to learn every model that we’ll discuss in our course.

Learning Objectives

  • Train a linear model for classification or regression task using stochastic gradient descent
  • Tune SGD optimization using different techniques
  • Apply regularization to train better models
  • Use linear models for classification and regression tasks
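
As a warm-up for these objectives, here is a minimal numpy sketch (my own, not course code) of training a least-squares linear model with SGD; the data and all names (`X`, `y`, `w`, `eta`) are made up for illustration.

import numpy as np

# Hypothetical toy data: 100 samples, 3 features, linear target plus noise.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.randn(100)

w = np.zeros(3)          # model parameters
eta = 0.05               # learning rate
for step in range(1000):
    i = rng.randint(len(X))              # pick one random sample (stochastic step)
    grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of the squared error on that sample
    w -= eta * grad                      # SGD update
print(w)  # should end up close to true_w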

Course intro

Welcome! 5 min

Linear model as the simplest neural network

Linear regression 9 min

Linear classification 10 min

Gradient descent 5 min

Quiz: Linear models 3 questions

QUIZ
Linear models
3 questions
To pass: 80% or higher
Attempts: 3 every 8 hours
Deadline
November 26, 11:59 PM PST

1 point
1. Consider a vector (1, −2, 0.5). Apply a softmax transform to it and enter the first component (accurate to 2 decimal places).




1 point
2. Suppose you are solving a 5-class classification problem with 10 features. How many parameters would a linear model have? Don’t forget the bias terms!




1 point
3. There is an analytical solution for linear regression parameters with MSE loss, but we usually prefer gradient descent optimization over it. What are the reasons?


Gradient descent is more scalable and can be applied to problems with a high number of features


Gradient descent is a method developed especially for MSE loss


Gradient descent can find parameter values that give a lower MSE value than the parameters from the analytical solution


Gradient descent doesn’t require inverting a matrix
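
A small numpy sketch (my own check, not course code) of the two concepts this quiz tests: the softmax transform and counting the parameters of a linear classifier.

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

print(np.round(softmax(np.array([1.0, -2.0, 0.5])), 2))

# A linear model for K classes and D features has K * (D + 1) parameters
# (one weight per class-feature pair plus one bias per class).
K, D = 5, 10
print(K * (D + 1))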

Regularization in machine learning

Overfitting problem and model validation 6 min

Model regularization 5 min

Quiz: Overfitting and regularization 4 questions

QUIZ
Overfitting and regularization
4 questions
To pass: 80% or higher
Attempts: 3 every 8 hours
Deadline
November 26, 11:59 PM PST

1 point
1. Select correct statements about overfitting:


Overfitting is a situation where a model gives lower quality on new data compared to the quality on the training sample


Overfitting happens when the model is too simple for the problem


Overfitting is a situation where a model gives comparable quality on new data and on the training sample


Large model weights can indicate that the model is overfitted


1 point
2. What disadvantages does model validation on a holdout sample have?


It requires multiple model fitting


It is sensitive to the particular split of the sample into training and test parts


It can give biased quality estimates for small samples




1 point
3. Suppose you are using k-fold cross-validation to assess model quality. How many times should you train the model during this procedure?


1


k


k(k−1)/2


k²


1 point
4. Select correct statements about regularization:


Weight penalty reduces the number of model parameters and leads to faster model training


Reducing the training sample size makes data simpler and then leads to better quality


Regularization restricts model complexity (namely the scale of the coefficients) to reduce overfitting


Weight penalty drives model parameters closer to zero and prevents the model from being too sensitive to small changes in features
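
A short illustration of k-fold cross-validation and L2 regularization (my own sketch; it assumes scikit-learn is installed, which the course itself doesn't require here).

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Toy regression data, illustrative only.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=42)

for alpha in (0.0, 1.0, 10.0):                   # alpha is the L2 penalty strength
    model = Ridge(alpha=alpha)
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV: the model is fit 5 times
    print(alpha, scores.mean())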

Stochastic methods for optimization

Stochastic gradient descent 5 min

Gradient descent extensions 9 min

Linear models and optimization

Programming Assignment: Linear models and optimization 3h

# coding: utf-8
# # Programming assignment (Linear models, Optimization)
#
# In this programming assignment you will implement a linear classifier and train it using stochastic gradient descent modifications and numpy.
# In[1]:
import numpy as np
get_ipython().magic('matplotlib inline')
import matplotlib.pyplot as plt
# In[2]:
import sys
sys.path.append("..")
import grading
grader = grading.Grader(assignment_key="UaHtvpEFEee0XQ6wjK-hZg",
all_parts=["xU7U4", "HyTF6", "uNidL", "ToK7N", "GBdgZ", "dLdHG"])
# In[3]:
# token expires every 30 min
COURSERA_TOKEN = ""
COURSERA_EMAIL = ""
# ## Two-dimensional classification
#
# To make things more intuitive, let's solve a 2D classification problem with synthetic data.
# In[4]:
with open('train.npy', 'rb') as fin:
    X = np.load(fin)
with open('target.npy', 'rb') as fin:
    y = np.load(fin)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, s=20)
plt.show()
# In[5]:
print(X.shape)
print(y.shape)
# # Task
#
# ## Features
#
# As you can see, the data above isn't linearly separable, so we should add features (or use a non-linear model). Note that the decision boundary between the two classes has the form of a circle, so we can add quadratic features to make the problem linearly separable. The idea is illustrated in the image below:
#
# ![](kernel.png)
# In[6]:
def expand(X):
    """
    Adds quadratic features.
    This expansion allows your linear model to make a non-linear separation.
    For each sample (row in the matrix), compute an expanded row:
    [feature0, feature1, feature0^2, feature1^2, feature0*feature1, 1]
    :param X: matrix of features, shape [n_samples,2]
    :returns: expanded features of shape [n_samples,6]
    """
    X_expanded = np.zeros((X.shape[0], 6))
    # original features, their squares, the cross term and a constant column
    X_expanded[:, 0], X_expanded[:, 1] = X[:, 0], X[:, 1]
    X_expanded[:, 2], X_expanded[:, 3] = X[:, 0]**2, X[:, 1]**2
    X_expanded[:, 4], X_expanded[:, 5] = X[:, 0]*X[:, 1], np.ones(X.shape[0])
    return X_expanded
# In[7]:
X_expanded = expand(X)
# Here are some tests for your implementation of the `expand` function.
# In[8]:
# a simple sanity check on a few fixed points
dummy_X = np.array([
    [0, 0],
    [1, 0],
    [2.61, -1.28],
    [-0.59, 2.1]
])
# call your expand function
dummy_expanded = expand(dummy_X)
# what it should have returned: x0 x1 x0^2 x1^2 x0*x1 1
dummy_expanded_ans = np.array([[ 0. , 0. , 0. , 0. , 0. , 1. ],
[ 1. , 0. , 1. , 0. , 0. , 1. ],
[ 2.61 , -1.28 , 6.8121, 1.6384, -3.3408, 1. ],
[-0.59 , 2.1 , 0.3481, 4.41 , -1.239 , 1. ]])
#tests
assert isinstance(dummy_expanded,np.ndarray), "please make sure you return numpy array"
assert dummy_expanded.shape == dummy_expanded_ans.shape, "please make sure your shape is correct"
assert np.allclose(dummy_expanded,dummy_expanded_ans,1e-3), "Something's out of order with features"
print("Seems legit!")
# ## Logistic regression
#
# To classify objects we will estimate the probability that an object belongs to class '1'. To predict this probability we will use the output of a linear model and the logistic function:
#
# $$ a(x; w) = \langle w, x \rangle $$
# $$ P( y=1 \; \big| \; x, \, w) = \dfrac{1}{1 + \exp(- \langle w, x \rangle)} = \sigma(\langle w, x \rangle)$$
#
# In[9]:
def probability(X, w):
    """
    Given input features and weights,
    return predicted probabilities of y==1 given x, P(y=1|x), see description above.
    Don't forget to use the expand(X) function (where necessary) in this and subsequent functions.
    :param X: feature matrix X of shape [n_samples,6] (expanded)
    :param w: weight vector w of shape [6] for each of the expanded features
    :returns: an array of predicted probabilities in the [0,1] interval.
    """
    # sigmoid of the linear model output
    z = np.dot(X, w)
    return 1. / (1 + np.exp(-z))
# In[10]:
dummy_weights = np.linspace(-1, 1, 6)
ans_part1 = probability(X_expanded[:1, :], dummy_weights)[0]
# In[11]:
print(ans_part1)
# In[12]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("xU7U4", ans_part1)
# In[13]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# In logistic regression the optimal parameters $w$ are found by cross-entropy minimization:
#
# $$ L(w) = - {1 \over \ell} \sum_{i=1}^\ell \left[ y_i \cdot \log P(y_i \mid x_i, w) + (1-y_i) \cdot \log \left(1-P(y_i \mid x_i, w)\right) \right] $$
#
#
# In[14]:
def compute_loss(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute the scalar loss using the formula above.
    """
    l = X.shape[0]
    a = probability(X, w)
    cross_entropy = y*np.log(a) + (1-y)*np.log(1-a)
    cost = -np.sum(cross_entropy) / float(l)
    cost = np.squeeze(cost)  # make sure the cost's shape is what we expect (e.g. this turns [[17]] into 17)
    assert cost.shape == ()
    return cost
# In[15]:
# use output of this cell to fill answer field
ans_part2 = compute_loss(X_expanded, y, dummy_weights)
# In[16]:
print(ans_part2)
# In[17]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("HyTF6", ans_part2)
# In[18]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# Since we train our model with gradient descent, we should compute gradients.
#
# To be specific, we need a derivative of loss function over each weight [6 of them].
#
# $$ \nabla_w L = ...$$
#
# We won't be giving you the exact formula this time — instead, try figuring out a derivative with pen and paper.
#
# As usual, we've made a small test for you, but if you need more, feel free to check your math against finite differences (estimate how $L$ changes if you shift $w$ by $10^{-5}$ or so).
# In[19]:
def compute_grad(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute the vector [6] of derivatives of L over each weight.
    """
    m = X.shape[0]
    A = probability(X, w)
    dZ = A - y                      # difference between predictions and targets
    dW = np.dot(dZ, X) / float(m)   # average over the sample
    return dW
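# Optional sanity check (my own addition, not part of the graded cells): compare the
# analytic gradient with a central finite-difference estimate at the dummy weights,
# as suggested in the text above.
eps_check = 1e-5
numeric_grad = np.zeros_like(dummy_weights)
for j in range(len(dummy_weights)):
    w_plus, w_minus = dummy_weights.copy(), dummy_weights.copy()
    w_plus[j] += eps_check
    w_minus[j] -= eps_check
    numeric_grad[j] = (compute_loss(X_expanded, y, w_plus) -
                       compute_loss(X_expanded, y, w_minus)) / (2 * eps_check)
assert np.allclose(numeric_grad, compute_grad(X_expanded, y, dummy_weights), atol=1e-4)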
# In[20]:
# use output of this cell to fill answer field
ans_part3 = np.linalg.norm(compute_grad(X_expanded, y, dummy_weights))
# In[21]:
print(ans_part3)
# In[22]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("uNidL", ans_part3)
# In[23]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# Here's an auxiliary function that visualizes the predictions:
# In[24]:
from IPython import display
h = 0.01
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
def visualize(X, y, w, history):
    """draws classifier prediction with matplotlib magic"""
    Z = probability(expand(np.c_[xx.ravel(), yy.ravel()]), w)
    Z = Z.reshape(xx.shape)
    plt.subplot(1, 2, 1)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.subplot(1, 2, 2)
    plt.plot(history)
    plt.grid()
    ymin, ymax = plt.ylim()
    plt.ylim(0, ymax)
    display.clear_output(wait=True)
    plt.show()
# In[25]:
visualize(X, y, dummy_weights, [0.5, 0.5, 0.25])
# ## Training
# In this section we'll use the functions you wrote to train our classifier using stochastic gradient descent.
#
# You can try changing hyperparameters like batch size, learning rate and so on to find the best ones, but use our hyperparameters when filling in the answers.
# ## Mini-batch SGD
#
# Stochastic gradient descent just takes a random mini-batch on each iteration, computes the gradient of the loss on it and makes a step:
# $$ w_t = w_{t-1} - \eta \dfrac{1}{m} \sum_{j=1}^m \nabla_w L(w_t, x_{i_j}, y_{i_j}) $$
#
#
# In[26]:
# please use np.random.seed(42), eta=0.1, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta= 0.1 # learning rate
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    # gradient step on the sampled mini-batch
    dW = compute_grad(X_expanded[ind, :], y[ind], w)
    w = w - eta * dW
visualize(X, y, w, loss)
plt.clf()
# In[27]:
# use output of this cell to fill answer field
ans_part4 = compute_loss(X_expanded, y, w)
# In[28]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("ToK7N", ans_part4)
# In[29]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# ## SGD with momentum
#
# Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations, as can be seen in the image below. It does this by adding a fraction $\alpha$ of the update vector of the past time step to the current update vector.
# <br>
# <br>
#
# $$ \nu_t = \alpha \nu_{t-1} + \eta\dfrac{1}{m} \sum_{j=1}^m \nabla_w L(w_t, x_{i_j}, y_{i_j}) $$
# $$ w_t = w_{t-1} - \nu_t$$
#
# <br>
#
#
# ![](sgd.png)
#
# In[30]:
# please use np.random.seed(42), eta=0.05, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta = 0.05 # learning rate
alpha = 0.9 # momentum
nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    # momentum update: accumulate a velocity and step along it
    dW = compute_grad(X_expanded[ind, :], y[ind], w)
    nu = alpha*nu + eta*dW
    w = w - nu
visualize(X, y, w, loss)
plt.clf()
# In[31]:
# use output of this cell to fill answer field
ans_part5 = compute_loss(X_expanded, y, w)
# In[32]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("GBdgZ", ans_part5)
# In[33]:
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# ## RMSprop
#
# Implement the RMSprop algorithm, which uses squared gradients to adjust the learning rate:
#
# $$ G_j^t = \alpha G_j^{t-1} + (1 - \alpha) g_{tj}^2 $$
# $$ w_j^t = w_j^{t-1} - \dfrac{\eta}{\sqrt{G_j^t + \varepsilon}} g_{tj} $$
# In[34]:
# please use np.random.seed(42), eta=0.1, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.1 # learning rate
alpha = 0.9 # moving average of gradient norm squared
G = np.zeros_like(w)
g2 = np.zeros_like(w)
eps = 1e-8
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    # RMSprop update: keep a moving average of squared gradients
    dW = compute_grad(X_expanded[ind, :], y[ind], w)
    g2 = dW**2
    G = alpha*G + (1-alpha)*g2
    w = w - eta*dW/np.sqrt(G + eps)
visualize(X, y, w, loss)
plt.clf()
# In[35]:
# use output of this cell to fill answer field
ans_part6 = compute_loss(X_expanded, y, w)
# In[36]:
## GRADED PART, DO NOT CHANGE!
grader.set_answer("dLdHG", ans_part6)
# In[37]:
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
# In[ ]:

Week 2 Introduction to neural networks

This module is an introduction to the concept of a deep neural network. You’ll begin with the linear model in numpy and finish with writing your very first deep network.

Learning Objectives

Explain the mechanics of basic building blocks for neural networks
Apply the backpropagation algorithm to train deep neural networks using automatic differentiation
Implement, train and test neural networks using TensorFlow and Keras
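
A tiny numpy sketch (my own, not course code) of what "mechanics of basic building blocks" means in practice: a dense layer with a ReLU, and the gradients that backpropagation sends through it. All names here are illustrative.

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4, 3)                 # batch of 4 inputs, 3 features
W = rng.randn(3, 5) * 0.1           # weights: 3 inputs -> 5 units
b = np.zeros(5)

z = x @ W + b                       # dense layer: linear transformation
a = np.maximum(z, 0)                # ReLU nonlinearity

grad_a = rng.randn(*a.shape)        # pretend gradient coming from the next layer
grad_z = grad_a * (z > 0)           # backprop through ReLU
grad_W = x.T @ grad_z               # gradient w.r.t. weights (chain rule)
grad_b = grad_z.sum(axis=0)         # gradient w.r.t. bias
grad_x = grad_z @ W.T               # gradient passed to the previous layer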

Multilayer perceptron, or the basic principles of deep learning

Multilayer perceptron 6 min

Deep Learning

Training a neural network 7 min

Matrix Operation

Backpropagation primer 7 min

Practice Quiz: Multilayer perceptron 4 questions

PRACTICE QUIZ
Multilayer perceptron
4 questions
To pass: 100% or higher
Deadline
December 3, 11:59 PM PST

  1. Question 1 (1 point)
    The best nonlinearity functions to use in a Multilayer perceptron are step functions as they allow to reconstruct the decision boundary with better precision.


  2. Question 2 (1 point)
    A dense layer applies a linear transformation to its input


  3. Question 3 (1 point)
    For an MLP to work, the nonlinearity function must have a finite upper bound


  4. Question 4 (1 point)
    How many dimensions will a derivative of a 1-D vector by a 2-D matrix have?


Tensorflow

Tensorflow_task.ipynb

Going deeper with Tensorflow 11 min

Practice Programming Assignment: MSE in TensorFlow 15 min

Gradients & optimization in Tensorflow 8 min

Programming Assignment: Logistic regression in TensorFlow 30 min

# coding: utf-8
# # Going deeper with Tensorflow
#
# In this video, we're going to study the tools you'll use to build deep learning models. Namely, [Tensorflow](https://www.tensorflow.org/).
#
# If you're running this notebook outside the course environment, you'll need to install tensorflow:
# * `pip install tensorflow` should install cpu-only TF on Linux & Mac OS
# * If you want GPU support from offset, see [TF install page](https://www.tensorflow.org/install/)
# In[1]:
import sys
sys.path.append("..")
import grading
# # Visualization
# Please note that if you are running on the Coursera platform, you won't be able to access the TensorBoard instance due to the network setup there. If you run the notebook locally, you should be able to access TensorBoard at http://127.0.0.1:7007/
# In[ ]:
get_ipython().system(' killall tensorboard')
import os
os.system("tensorboard --logdir=/tmp/tboard --port=7007 &");
# In[2]:
import tensorflow as tf
s = tf.InteractiveSession()
# # Warming up
# For starters, let's implement a python function that computes the sum of squares of numbers from 0 to N-1.
# In[3]:
import numpy as np
def sum_squares(N):
    return np.sum(np.arange(N)**2)
# In[4]:
get_ipython().run_cell_magic('time', '', 'sum_squares(10**8)')
# # Tensorflow teaser
#
# Doing the very same thing
# In[5]:
# An integer parameter
N = tf.placeholder('int64', name="input_to_your_function")
# A recipe on how to produce the same result
result = tf.reduce_sum(tf.range(N)**2)
# In[6]:
result
# In[7]:
get_ipython().run_cell_magic('time', '', 'result.eval({N: 10**8})')
# In[8]:
writer = tf.summary.FileWriter("/tmp/tboard", graph=s.graph)
# # How does it work?
# 1. Define placeholders where you'll send inputs
# 2. Make symbolic graph: a recipe for mathematical transformation of those placeholders
# 3. Compute outputs of your graph with particular values for each placeholder
# * `output.eval({placeholder:value})`
# * `s.run(output, {placeholder:value})`
#
# So far there are two main entities: "placeholder" and "transformation"
# * Both can be numbers, vectors, matrices, tensors, etc.
# * Both can be int32/64, floats, booleans (uint8) of various size.
#
# * You can define new transformations as an arbitrary operation on placeholders and other transformations
# * `tf.reduce_sum(tf.range(N)**2)` is 3 sequential transformations of placeholder `N`
# * There's a tensorflow symbolic version for every numpy function
# * `a+b, a/b, a**b, ...` behave just like in numpy
# * `np.mean` -> `tf.reduce_mean`
# * `np.arange` -> `tf.range`
# * `np.cumsum` -> `tf.cumsum`
# * If you can't find the op you need, see the [docs](https://www.tensorflow.org/api_docs/python).
#
# `tf.contrib` has many high-level features, may be worth a look.
# In[9]:
with tf.name_scope("Placeholders_examples"):
    # Default placeholder that can be an arbitrary float32
    # scalar, vector, matrix, etc.
    arbitrary_input = tf.placeholder('float32')
    # Input vector of arbitrary length
    input_vector = tf.placeholder('float32', shape=(None,))
    # Input vector that _must_ have 10 elements and integer type
    fixed_vector = tf.placeholder('int32', shape=(10,))
    # Matrix of arbitrary n_rows and 15 columns
    # (e.g. a minibatch of your data table)
    input_matrix = tf.placeholder('float32', shape=(None, 15))
    # You can generally use None whenever you don't need a specific shape
    input1 = tf.placeholder('float64', shape=(None, 100, None))
    input2 = tf.placeholder('int32', shape=(None, None, 3, 224, 224))
    # elementwise multiplication
    double_the_vector = input_vector*2
    # elementwise cosine
    elementwise_cosine = tf.cos(input_vector)
    # difference between squared vector and vector itself plus one
    vector_squares = input_vector**2 - input_vector + 1
# In[10]:
my_vector = tf.placeholder('float32', shape=(None,), name="VECTOR_1")
my_vector2 = tf.placeholder('float32', shape=(None,))
my_transformation = my_vector * my_vector2 / (tf.sin(my_vector) + 1)
# In[11]:
print(my_transformation)
# In[12]:
dummy = np.arange(5).astype('float32')
print(dummy)
my_transformation.eval({my_vector:dummy, my_vector2:dummy[::-1]})
# In[13]:
writer.add_graph(my_transformation.graph)
writer.flush()
# TensorBoard allows writing scalars, images, audio, histogram. You can read more on tensorboard usage [here](https://www.tensorflow.org/get_started/graph_viz).
# # Summary
# * Tensorflow is based on computation graphs
# * The graphs consist of placeholders and transformations
# # Mean squared error
#
# Your assignment is to implement mean squared error in tensorflow.
# In[16]:
with tf.name_scope("MSE"):
    y_true = tf.placeholder("float32", shape=(None,), name="y_true")
    y_predicted = tf.placeholder("float32", shape=(None,), name="y_predicted")
    # mean squared error via tf.reduce_mean
    mse = tf.reduce_mean(tf.squared_difference(y_true, y_predicted))
def compute_mse(vector1, vector2):
    return mse.eval({y_true: vector1, y_predicted: vector2})
# In[17]:
writer.add_graph(mse.graph)
writer.flush()
# Tests and result submission. Please use the credentials obtained from the Coursera assignment page.
# In[18]:
import submit
# In[19]:
submit.submit_mse(compute_mse, <your email>, <your token>)
# # Variables
#
# The inputs and transformations have no value outside a function call. This isn't too convenient if you want your model to have parameters (e.g. network weights) that are always present, but can change their value over time.
#
# Tensorflow solves this with `tf.Variable` objects.
# * You can assign variable a value at any time in your graph
# * Unlike placeholders, there's no need to explicitly pass values to variables when `s.run(...)`-ing
# * You can use variables the same way you use transformations
#
# In[20]:
# Creating a shared variable
shared_vector_1 = tf.Variable(initial_value=np.ones(5),
name="example_variable")
# In[21]:
# Initialize variable(s) with initial values
s.run(tf.global_variables_initializer())
# Evaluating a shared variable (outside the symbolic graph)
print("Initial value", s.run(shared_vector_1))
# Within the symbolic graph you use them just
# as any other input or transformation; no "get value" needed
# In[22]:
# Setting a new value
s.run(shared_vector_1.assign(np.arange(5)))
# Getting that new value
print("New value", s.run(shared_vector_1))
# # tf.gradients - why graphs matter
# * Tensorflow can compute derivatives and gradients automatically using the computation graph
# * True to its name it can manage matrix derivatives
# * Gradients are computed as a product of elementary derivatives via the chain rule:
#
# $$ {\partial f(g(x)) \over \partial x} = {\partial f(g(x)) \over \partial g(x)}\cdot {\partial g(x) \over \partial x} $$
#
# It can get you the derivative of any graph as long as it knows how to differentiate elementary operations
# In[23]:
my_scalar = tf.placeholder('float32')
scalar_squared = my_scalar**2
# A derivative of scalar_squared by my_scalar
derivative = tf.gradients(scalar_squared, [my_scalar, ])
# In[24]:
derivative
# In[25]:
import matplotlib.pyplot as plt
get_ipython().magic('matplotlib inline')
x = np.linspace(-3, 3)
x_squared, x_squared_der = s.run([scalar_squared, derivative[0]],
{my_scalar:x})
plt.plot(x, x_squared,label="$x^2$")
plt.plot(x, x_squared_der, label=r"$\frac{dx^2}{dx}$")
plt.legend();
# # Why that rocks
# In[26]:
my_vector = tf.placeholder('float32', [None])
# Compute the gradient of the next weird function over my_scalar and my_vector
# Warning! Trying to understand the meaning of that function may result in permanent brain damage
weird_psychotic_function = tf.reduce_mean(
(my_vector+my_scalar)**(1+tf.nn.moments(my_vector,[0])[1]) +
1./ tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(
2*my_scalar**1.5)*(tf.reduce_sum(my_vector)* my_scalar**2
)*tf.exp((my_scalar-4)**2)/(
1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2)
)/(1+tf.exp(-(my_scalar-4)**2)))**2
der_by_scalar = tf.gradients(weird_psychotic_function, my_scalar)
der_by_vector = tf.gradients(weird_psychotic_function, my_vector)
# In[27]:
# Plotting the derivative
scalar_space = np.linspace(1, 7, 100)
y = [s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 2, 3]})
for x in scalar_space]
plt.plot(scalar_space, y, label='function')
y_der_by_scalar = [s.run(der_by_scalar,
{my_scalar:x, my_vector:[1, 2, 3]})
for x in scalar_space]
plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend();
# # Almost done - optimizers
#
# While you can perform gradient descent by hand with automatic grads from above, tensorflow also has some optimization methods implemented for you. Recall momentum & rmsprop?
# In[28]:
y_guess = tf.Variable(np.zeros(2, dtype='float32'))
y_true = tf.range(1, 3, dtype='float32')
loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2)
#loss = tf.reduce_mean((y_guess - y_true)**2)
optimizer = tf.train.MomentumOptimizer(0.01, 0.5).minimize(
loss, var_list=y_guess)
# In[29]:
from matplotlib import animation, rc
import matplotlib_utils
from IPython.display import HTML
fig, ax = plt.subplots()
y_true_value = s.run(y_true)
level_x = np.arange(0, 2, 0.02)
level_y = np.arange(0, 3, 0.02)
X, Y = np.meshgrid(level_x, level_y)
Z = (X - y_true_value[0])**2 + (Y - y_true_value[1])**2
ax.set_xlim(-0.02, 2)
ax.set_ylim(-0.02, 3)
s.run(tf.global_variables_initializer())
ax.scatter(*s.run(y_true), c='red')
contour = ax.contour(X, Y, Z, 10)
ax.clabel(contour, inline=1, fontsize=10)
line, = ax.plot([], [], lw=2)
def init():
    line.set_data([], [])
    return (line,)
guesses = [s.run(y_guess)]
def animate(i):
    s.run(optimizer)
    guesses.append(s.run(y_guess))
    line.set_data(*zip(*guesses))
    return (line,)
anim = animation.FuncAnimation(fig, animate, init_func=init,
frames=400, interval=20, blit=True)
# In[ ]:
try:
    HTML(anim.to_html5_video())
except RuntimeError:
    # In case the built-in renderers are unavailable, fall back to
    # a custom one that doesn't require external libraries
    anim.save(None, writer=matplotlib_utils.SimpleMovieWriter(0.001))
# # Logistic regression
# Your assignment is to implement the logistic regression
#
# Plan:
# * Use a shared variable for weights
# * Use a matrix placeholder for `X`
#
# We shall train on a two-class MNIST dataset
# * please note that target `y` are `{0,1}` and not `{-1,1}` as in some formulae
# In[31]:
from sklearn.datasets import load_digits
mnist = load_digits(2)
X, y = mnist.data, mnist.target
print("y [shape - %s]:" % (str(y.shape)), y[:10])
print("X [shape - %s]:" % (str(X.shape)))
# In[32]:
print('X:\n',X[:3,:10])
print('y:\n',y[:10])
plt.imshow(X[0].reshape([8,8]));
# It's your turn now!
# Just a small reminder of the relevant math:
#
# $$
# P(y=1|X) = \sigma(X \cdot W + b)
# $$
# $$
# \text{loss} = -\log\left(P\left(y_\text{predicted} = 1\right)\right)\cdot y_\text{true} - \log\left(1 - P\left(y_\text{predicted} = 1\right)\right)\cdot\left(1 - y_\text{true}\right)
# $$
#
# $\sigma(x)$ is available via `tf.nn.sigmoid` and matrix multiplication via `tf.matmul`
# In[33]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, random_state=42)
# __Your code goes here.__ For the training and testing scaffolding to work, please stick to the names in comments.
# In[102]:
# Model parameters - weights and bias
# weights = tf.Variable(...) shape should be (X.shape[1], 1)
# b = tf.Variable(...)
weights = tf.Variable(initial_value=np.random.randn(X.shape[1], 1)*0.01, name="weights", dtype="float32")
b = tf.Variable(initial_value=0, name="b", dtype="float32")
print(weights)
print(b)
# In[103]:
# Placeholders for the input data
# input_X = tf.placeholder(...)
# input_y = tf.placeholder(...)
input_X = tf.placeholder(tf.float32, name="input_X")
input_y = tf.placeholder(tf.float32, name="input_y")
print(input_X)
print(input_y)
# In[104]:
# The model code
# Compute a vector of predictions, resulting shape should be [input_X.shape[0],]
# This is 1D, if you have extra dimensions, you can get rid of them with tf.squeeze .
# Don't forget the sigmoid.
# predicted_y = <predicted probabilities for input_X>
predicted_y = tf.squeeze(tf.nn.sigmoid(tf.add(tf.matmul(input_X, weights), b)))
print(predicted_y)
# Loss. Should be a scalar number - average loss over all the objects
# tf.reduce_mean is your friend here
# loss = <logistic loss (scalar, mean over sample)>
loss = -tf.reduce_mean(tf.log(predicted_y)*input_y + tf.log(1-predicted_y)*(1-input_y))
print(loss)
# See above for an example. tf.train.*Optimizer
# optimizer = <optimizer that minimizes loss>
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
print(optimizer)
# A test to help with the debugging
# In[105]:
validation_weights = 1e-3 * np.fromiter(map(lambda x:
s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 0.1, 2]}),
0.15 * np.arange(1, X.shape[1] + 1)),
count=X.shape[1], dtype=np.float32)[:, np.newaxis]
# Compute predictions for given weights and bias
prediction_validation = s.run(
predicted_y, {
input_X: X,
weights: validation_weights,
b: 1e-1})
# Load the reference values for the predictions
validation_true_values = np.loadtxt("validation_predictons.txt")
assert prediction_validation.shape == (X.shape[0],), "Predictions must be a 1D array with length equal to the number " "of examples in input_X"
assert np.allclose(validation_true_values, prediction_validation)
loss_validation = s.run(
loss, {
input_X: X[:100],
input_y: y[-100:],
weights: validation_weights+1.21e-3,
b: -1e-1})
assert np.allclose(loss_validation, 0.728689)
# In[106]:
from sklearn.metrics import roc_auc_score
s.run(tf.global_variables_initializer())
for i in range(5):
    s.run(optimizer, {input_X: X_train, input_y: y_train})
    loss_i = s.run(loss, {input_X: X_train, input_y: y_train})
    print("loss at iter %i:%.4f" % (i, loss_i))
print("train auc:", roc_auc_score(y_train, s.run(predicted_y, {input_X:X_train})))
print("test auc:", roc_auc_score(y_test, s.run(predicted_y, {input_X:X_test})))
# ### Coursera submission
# In[107]:
grade_submitter = grading.Grader("BJCiiY8sEeeCnhKCj4fcOA")
# In[108]:
test_weights = 1e-3 * np.fromiter(map(lambda x:
s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 2, 3]}),
0.1 * np.arange(1, X.shape[1] + 1)),
count=X.shape[1], dtype=np.float32)[:, np.newaxis]
# First, test prediction and loss computation. This part doesn't require a fitted model.
# In[109]:
prediction_test = s.run(
predicted_y, {
input_X: X,
weights: test_weights,
b: 1e-1})
# In[110]:
assert prediction_test.shape == (X.shape[0],), "Predictions must be a 1D array with length equal to the number " "of examples in X_test"
# In[111]:
grade_submitter.set_answer("0ENlN", prediction_test)
# In[112]:
loss_test = s.run(
loss, {
input_X: X[:100],
input_y: y[-100:],
weights: test_weights+1.21e-3,
b: -1e-1})
# Yes, the X/y indices mismatch is intentional
# In[113]:
grade_submitter.set_answer("mMVpM", loss_test)
# In[114]:
grade_submitter.set_answer("D16Rc", roc_auc_score(y_test, s.run(predicted_y, {input_X:X_test})))
# Please use the credentials obtained from the Coursera assignment page.
# In[115]:
grade_submitter.submit(<your email>, <your token>)
# In[ ]:

my1stNN boilerplate

Peer-graded Assignment: my1stNN 1h

# coding: utf-8
# In[1]:
from preprocessed_mnist import load_dataset
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
print(X_train.shape, y_train.shape)
import matplotlib.pyplot as plt
get_ipython().magic('matplotlib inline')
plt.imshow(X_train[0], cmap="Greys");
# In[2]:
import tensorflow as tf
import numpy as np
import math
from tensorflow.python.framework import ops
# In[3]:
print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
print(X_test.shape, y_test.shape)
# In[4]:
print(y_train[0])
# In[5]:
# Reshape the training, validate and test examples
X_train_flatten = X_train.reshape(X_train.shape[0], -1).T # The "-1" makes reshape flatten the remaining dimensions
X_val_flatten = X_val.reshape(X_val.shape[0], -1).T
X_test_flatten = X_test.reshape(X_test.shape[0], -1).T
print(X_train_flatten.shape)
print(X_val_flatten.shape)
print(X_test_flatten.shape)
# In[6]:
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the i-th class number and the j-th column
    corresponds to the j-th training example. So if example j has label i, then entry (i,j)
    will be 1.
    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one-hot dimension
    Returns:
    one_hot -- one-hot matrix
    """
    # Create a tf.constant equal to C (depth), name it 'C'
    depth = tf.constant(C, name="C")
    # Use tf.one_hot, be careful with the axis
    one_hot_matrix = tf.one_hot(labels, depth, axis=0)
    # Create and run a session, then close it
    sess = tf.Session()
    one_hot = sess.run(one_hot_matrix)
    sess.close()
    return one_hot
# In[7]:
# encode y with one-hot
y_train_one_hot = one_hot_matrix(y_train, 10)
y_val_one_hot = one_hot_matrix(y_val, 10)
y_test_one_hot = one_hot_matrix(y_test, 10)
print(y_train_one_hot.shape)
print(y_val_one_hot.shape)
print(y_test_one_hot.shape)
# In[8]:
def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 28 * 28 = 784)
    n_y -- scalar, number of classes (from 0 to 9, so -> 10)
    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"
    Tips:
    - You will use None because it lets us be flexible about the number of examples fed to the placeholders.
    In fact, the number of examples during test/train is different.
    """
    X = tf.placeholder(tf.float32, [n_x, None], name="X")
    Y = tf.placeholder(tf.float32, [n_y, None], name="Y")
    return X, Y
# In[9]:
def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
    W1 : [50, 784]
    b1 : [50, 1]
    W2 : [10, 50]
    b2 : [10, 1]
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2
    """
    tf.set_random_seed(1)  # so that the "random" numbers are reproducible
    W1 = tf.get_variable("W1", [50, 784], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [50, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable("W2", [10, 50], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [10, 1], initializer=tf.zeros_initializer())
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
# In[10]:
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> SOFTMAX
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2";
    the shapes are given in initialize_parameters
    Returns:
    Z2 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    Z1 = tf.add(tf.matmul(W1, X), b1)   # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                 # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)  # Z2 = np.dot(W2, A1) + b2
    return Z2
# In[11]:
def compute_cost(Z2, Y):
    """
    Computes the cost
    Arguments:
    Z2 -- output of forward propagation (output of the last LINEAR unit), of shape (10, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z2
    Returns:
    cost -- Tensor of the cost function
    """
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...)
    logits = tf.transpose(Z2)
    labels = tf.transpose(Y)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    return cost
# In[12]:
def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random minibatches from (X, Y)
    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- input target, of shape (10, number of examples)
    mini_batch_size -- size of the mini-batches, integer
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    np.random.seed(seed)  # to make the "random" minibatches reproducible
    m = X.shape[1]        # number of training examples
    mini_batches = []
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]
    # Step 2: Partition (shuffled_X, shuffled_Y), minus the end case.
    num_complete_minibatches = math.floor(m/mini_batch_size)  # number of mini-batches of size mini_batch_size in the partitioning
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[:, k*mini_batch_size : (k+1)*mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k*mini_batch_size : (k+1)*mini_batch_size]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[:, num_complete_minibatches*mini_batch_size :]
        mini_batch_Y = shuffled_Y[:, num_complete_minibatches*mini_batch_size :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches
# In[27]:
def model(X_train, Y_train, X_val, Y_val, learning_rate=0.0001,
          num_epochs=1000, minibatch_size=32, print_cost=True):
    """
    Implements a two-layer tensorflow neural network: LINEAR->RELU->LINEAR->SOFTMAX.
    Arguments:
    X_train -- training set, of shape (input size = 784, number of training examples = 50000)
    Y_train -- training set, of shape (output size = 10, number of training examples = 50000)
    X_val -- validation set, of shape (input size = 784, number of validation examples = 10000)
    Y_val -- validation set, of shape (output size = 10, number of validation examples = 10000)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)      # to keep consistent results
    seed = 3                   # to keep consistent results
    (n_x, m) = X_train.shape   # (n_x: input size, m: number of examples in the train set)
    n_y = Y_train.shape[0]     # n_y: output size
    costs = []                 # to keep track of the cost
    # Create placeholders of shape (n_x, n_y)
    X, Y = create_placeholders(n_x, n_y)
    # Initialize parameters
    parameters = initialize_parameters()
    # Forward propagation: build the forward propagation in the tensorflow graph
    Z2 = forward_propagation(X, parameters)
    # Cost function: add the cost function to the tensorflow graph
    cost = compute_cost(Z2, Y)
    # Backpropagation: define the tensorflow optimizer. Use an AdamOptimizer.
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # Initialize all the variables
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)
        # Do the training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.                            # defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: the line that runs the graph on a minibatch.
                # Execute the "optimizer" and the "cost"; the feed_dict should contain a minibatch for (X, Y).
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                epoch_cost += minibatch_cost / num_minibatches
            # Print the cost every epoch
            if print_cost == True and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
            # Early stopping condition
            # if np.absolute(costs[-1] - epoch_cost) < 1e-12:
            #     break
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        # let's save the parameters in a variable
        parameters = sess.run(parameters)
        print("Parameters have been trained!")
        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z2), tf.argmax(Y))
        # Calculate accuracy on the train and validation sets
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Validation Accuracy:", accuracy.eval({X: X_val, Y: Y_val}))
    return parameters
# In[ ]:
parameters = model(X_train_flatten, y_train_one_hot, X_val_flatten, y_val_one_hot)
# In[32]:
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
    n_x = 784
    n_y = 10
    X, Y = create_placeholders(n_x, n_y)
    Z2 = forward_propagation(X, parameters)
    # Calculate the correct predictions
    correct_prediction = tf.equal(tf.argmax(Z2), tf.argmax(Y))
    # Calculate accuracy on the test set
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Test Accuracy:", accuracy.eval({X: X_test_flatten, Y: y_test_one_hot}))
# In[ ]:

Review Your Peers: my1stNN

Keras

Keras-task.ipynb

Keras introduction 10 min

Programming Assignment: my1stNN - Keras this time 1h
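
For reference, a minimal Keras MLP along the lines of this assignment (my own sketch, not the graded notebook; it uses Keras's built-in MNIST loader instead of the course's `preprocessed_mnist` and omits the grading calls):

import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten

# Built-in MNIST loader stands in for the course's own data pipeline.
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(256, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(X_test, y_test))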


Philosophy of deep learning

What Deep Learning is and is not 8 min

Deep learning as a language 6 min

Optional Honors Content

Neural networks the hard way

NumpyNN (honor).ipynb

Peer-graded Assignment: Your very own neural network 2h
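
The honors notebook asks you to build a network from numpy alone. As a rough sketch of the kind of layer object involved (my own illustration, the notebook defines its own interface), a dense layer with forward and backward passes might look like:

import numpy as np

class Dense:
    def __init__(self, n_in, n_out, lr=0.1):
        self.W = np.random.randn(n_in, n_out) * 0.01
        self.b = np.zeros(n_out)
        self.lr = lr
    def forward(self, x):
        self.x = x                      # cache the input for the backward pass
        return x @ self.W + self.b
    def backward(self, grad_out):
        grad_in = grad_out @ self.W.T   # gradient w.r.t. the layer input
        self.W -= self.lr * self.x.T @ grad_out
        self.b -= self.lr * grad_out.sum(axis=0)
        return grad_in                  # pass the gradient on to the previous layer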


Review Your Peers: Your very own neural network


How to Win a Data Science Competition: Learn from Top Kagglers

Course can be found here
Lecture slides can be found here


Bayesian Methods for Machine Learning

Course can be found here
Lecture slides can be found here


Introduction to Reinforcement Learning

Course can be found here
Lecture slides can be found here


Deep Learning in Computer Vision

Course can be found here
Lecture slides can be found here


Natural Language Processing

Course can be found here
Lecture slides can be found here


Addressing Large Hadron Collider Challenges by Machine Learning

Course can be found here
Lecture slides can be found here
