Writing this blog for quick reference, and for resume and interview preparation.
Shortest Paths Revisited, NP-Complete Problems and What To Do About Them.
stay tuned…
Courses can be found here
Summary can be found here
Algorithms are the heart of computer science, and the subject has countless practical applications as well as intellectual depth. This specialization is an introduction to algorithms for learners with at least a little programming experience. The specialization is rigorous but emphasizes the big picture and conceptual understanding over low-level implementation and mathematical details. After completing this specialization, you will be well-positioned to ace your technical interviews and speak fluently about algorithms with other programmers and computer scientists.
About the instructor: Tim Roughgarden has been a professor in the Computer Science Department at Stanford University since 2004. He has taught and published extensively on the subject of algorithms and their applications.
Divide and Conquer, Sorting and Searching, and Randomized Algorithms
Course can be found here
Lecture slides can be found here
Summary can be found in my Github
About this course: The primary topics in this part of the specialization are: asymptotic (“Big-oh”) notation, sorting and searching, divide and conquer (master method, integer and matrix multiplication, closest pair), and randomized algorithms (QuickSort, contraction algorithm for min cuts).
Who is this class for: Learners with at least a little bit of programming experience who want to learn the essentials of algorithms. In a University computer science curriculum, this course is typically taken in the third year.
Week 1
Introduction;
“big-oh” notation and asymptotic analysis.
I. INTRODUCTION
Welcome and Week 1 Overview 10 min
WELCOME: Welcome to the Algorithms Specialization! Here’s an overview of the first week of material.
INTRODUCTION: The first set of lectures for this week is meant to give you the flavor of the course, and hopefully get you excited about it. We begin by discussing algorithms in general and why they’re so important, and then use the problem of multiplying two integers to illustrate how algorithmic ingenuity can often improve over more straightforward or naive solutions. We discuss the Merge Sort algorithm in detail, for several reasons: it’s a practical and famous algorithm that you should all know; it’s a good warmup to get you ready for more intricate algorithms; and it’s the canonical introduction to the “divide and conquer” algorithm design paradigm. These lectures conclude by describing several guiding principles for how we’ll analyze algorithms in this course.
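To make the Merge Sort discussion concrete, here is a minimal Python sketch of the algorithm (my own code, not the course's official pseudocode):

```python
def merge_sort(a):
    # Base case: arrays of length 0 or 1 are already sorted.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # recursively sort the first half
    right = merge_sort(a[mid:])   # recursively sort the second half
    # Merge step: repeatedly take the smaller front element.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

Each level of recursion does O(n) merge work across O(log n) levels, which is where the O(n log n) bound in the analysis lecture comes from.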
ASYMPTOTIC ANALYSIS: The second set of lectures for this week is an introduction to big-oh notation and its relatives, which belongs in the vocabulary of every serious programmer and computer scientist. The goal is to identify a “sweet spot” of granularity for reasoning about algorithms — we want to suppress second-order details like constant factors and lower-order terms, and focus on how the running time of an algorithm scales as the input size grows large.
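For reference, one standard way to state the definition behind this "sweet spot" (my own phrasing, consistent with the lectures):

```latex
% Big-oh: T(n) is eventually bounded above by a constant multiple of f(n).
T(n) = O(f(n)) \iff \exists\, c, n_0 > 0 \ \text{such that}\ T(n) \le c \cdot f(n) \ \text{for all } n \ge n_0
```

The constant c absorbs constant factors, and the threshold n₀ absorbs lower-order terms that only matter for small inputs.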
PREREQUISITES: This course is not an introduction to programming, and it assumes that you have basic programming skills in a language such as Python, Java, or C. There are several outstanding free online courses that teach basic programming. We also use mathematical analysis as needed to understand how and why algorithms and data structures really work. If you need a refresher on the basics of proofs (induction, contradiction, etc.), I recommend the lecture notes “Mathematics for Computer Science” by Lehman and Leighton (see separate Resources pages).
DISCUSSION FORUMS: The discussion forums play a crucial role in massive online courses like this one. If you have trouble understanding a lecture or completing an assignment, you should turn to the forums for help. After you’ve mastered the lectures and assignments for a given week, I hope you’ll contribute to the forums and help out your fellow students. While I won’t have time to carefully monitor the discussion forums, I’ll check in and answer questions whenever I find the time.
VIDEOS AND SLIDES: Videos can be streamed or downloaded and watched offline (recommended for commutes, etc.). We are also providing PDF lecture slides (typed versions of what’s written in the lecture videos), as well as subtitle files (in English and in some cases other languages as well). And if you find yourself wishing that I spoke more quickly or more slowly, note that you can adjust the video speed to accommodate your preferred pace.
HOMEWORK #1: The first problem set consists of 5 multiple choice problems, mostly about Merge Sort and asymptotic notation. The first programming assignment asks you to implement one or more of the integer multiplication algorithms covered in lecture.
SUGGESTED READINGS FOR WEEK 1: Abbreviations in suggested readings refer to the following textbooks:
- CLRS: Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms (3rd edition)
- DPV: Dasgupta, Papadimitriou, and Vazirani, Algorithms
- KT: Kleinberg and Tardos, Algorithm Design
- SW: Sedgewick and Wayne, Algorithms (4th edition)
- CLRS: Chapters 2 and 3
- DPV: Sections 0.3, 2.1, 2.3
- KT: Sections 2.1, 2.2, 2.4, and 5.1
- SW: Sections 1.4 and 2.2
Overview, Resources, and Policies 10 min
Course overview
Algorithms are the heart of computer science, and the subject has countless practical applications as well as intellectual depth. This specialization is an introduction to algorithms for learners with at least a little programming experience. The specialization is rigorous but emphasizes the big picture and conceptual understanding over low-level implementation and mathematical details. After completing this specialization, you will have a greater mastery of algorithms than almost anyone without a graduate degree in the subject. Specific topics in Part 1 of the specialization include: “Big-oh” notation, sorting and searching, divide and conquer (master method, integer and matrix multiplication, closest pair), and randomized algorithms (QuickSort, contraction algorithm for min cuts).
Resources
 Mathematics for Computer Science (by Eric Lehman and Tom Leighton):
book
Course Policies
Video Lectures
 Lectures will be made available weekly, for a total of four weeks. In a given week, there will be roughly three hours of material. Some weeks include extra optional material (some review, some advanced topics).
 Below each lecture video there is a PDF of typed versions of the slides.
 You can download the videos of the lectures for offline viewing.
 The video player supports speeding up and slowing down the pace.
Weekly Programming Assignments and Problem Sets
 Every week there will be a new problem set and a new programming assignment.
 For each problem set you are allowed a maximum of two attempts in a 12-hour period (we’ll use the best score).
 For each programming assignment you’re allowed a maximum of 10 attempts in a 12-hour period (we’ll use the best score).
 For the final exam, you’re allowed one attempt per 24 hours.
Grading
 To pass a problem set, you must get at least 4 of the 5 questions correct (80%).
 To pass a programming assignment, you must get all of the answers correct (100%).
 To pass the final exam, you must get at least 70% of the total points (7 out of 10).
 To pass the course, you must pass all of the problem sets, all of the programming assignments, and the final exam.
Theory Problems
 These are totally optional theory questions (no deadlines or credit given).
 We encourage you to attempt these questions and discuss them in the forums to develop a deeper understanding of the design and analysis of algorithms.
Lecture slides 10 min
See the zip file below (which includes slides for both Part 1 and Part 2 of the specialization). Also available from the video lectures on a lecture-by-lecture basis.
Why Study Algorithms? 4 min
Integer Multiplication 8 min
Karatsuba Multiplication 12 min
About the Course 17 min
Merge Sort: Motivation and Example 8 min
Merge Sort: Pseudocode 12 min
Merge Sort: Analysis 9 min
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/wW9On/merge-sort-analysis
Guiding Principles for Analysis of Algorithms 15 min
II. ASYMPTOTIC ANALYSIS
The Gist 14 min
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/o2NpH/the-gist
Big-Oh Notation 4 min
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/KGtUp/big-oh-notation
Basic Examples 7 min
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/mb8bV/basic-examples
Big Omega and Theta 7 min
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/SxSch/big-omega-and-theta
Additional Examples [Review - Optional] 7 min
Problem Set #1
Quiz: Problem Set #1 5 questions
Programming Assignment #1
Quiz: Programming Assignment #1 1 question
In this programming assignment you will implement one or more of the integer multiplication algorithms described in lecture.
To get the most out of this assignment, your program should restrict itself to multiplying only pairs of single-digit numbers. You can implement the grade-school algorithm if you want, but to get the most out of the assignment you’ll want to implement recursive integer multiplication and/or Karatsuba’s algorithm.
So: what’s the product of the following two 64digit numbers?
3141592653589793238462643383279502884197169399375105820974944592
2718281828459045235360287471352662497757247093699959574966967627
[TIP: before submitting, first test the correctness of your program on some small test cases of your own devising. Then post your best test cases to the discussion forums to help your fellow students!]
[Food for thought: the number of digits in each input number is a power of 2. Does this make your life easier? Does it depend on which algorithm you’re implementing?]
The numeric answer should be typed in the space below. So if your answer is 1198233847, then just type 1198233847 in the space provided without any space / commas / any other punctuation marks.
(We do not require you to submit your code, so feel free to use any programming language you want — just type the final numeric answer in the following space.)
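A hedged Python sketch of Karatsuba's algorithm (my own implementation, not official course code); it uses Gauss's trick to get by with three recursive multiplications instead of four:

```python
def karatsuba(x, y):
    # Base case: single-digit factors are multiplied directly.
    if x < 10 or y < 10:
        return x * y
    n = max(len(str(x)), len(str(y)))
    half = n // 2
    p = 10 ** half
    a, b = divmod(x, p)  # x = a * 10^half + b
    c, d = divmod(y, p)  # y = c * 10^half + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # Gauss's trick: (a+b)(c+d) - ac - bd = ad + bc, saving one recursive call.
    ad_plus_bc = karatsuba(a + b, c + d) - ac - bd
    return ac * p * p + ad_plus_bc * p + bd
```

The recurrence is T(n) = 3T(n/2) + O(n), which the Week 2 master method resolves to O(n^log₂3) ≈ O(n^1.585). (The test cases below are my own; the 64-digit product asked for above is left for you to compute.)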
Week 2
Divide-and-conquer basics;
the master method for analyzing divide and conquer algorithms.
III. DIVIDE & CONQUER ALGORITHMS
Week 2 Overview 10 min
DIVIDE AND CONQUER ALGORITHMS: The next set of lectures discusses three nontrivial examples of the divide and conquer algorithm design paradigm. The first is for counting the number of inversions in an array. This problem is related to measuring similarity between two ranked lists, which in turn is relevant for making good recommendations to someone based on your knowledge of their and others’ preferences (“collaborative filtering”). The second algorithm is Strassen’s mindblowing recursive algorithm for matrix multiplication, which improves over the obvious iterative method. The third algorithm, which is more advanced and is optional material, is for computing the closest pair of points in the plane.
THE MASTER METHOD: These lectures cover a “black-box” method for solving recurrences. You can then immediately determine the running time of most of the divide-and-conquer algorithms that you’ll ever see! (Including Karatsuba’s integer multiplication algorithm from Week 1.) The proof is a nice generalization of the recursion tree method that we used to analyze MergeSort. Ever wonder about the mysterious three cases of the Master Method? Watch these videos and hopefully all will become clear.
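The black-box flavor of the method can be captured in a few lines of Python (my own sketch, for recurrences of the form T(n) = a·T(n/b) + O(n^d); the three branches correspond to the three cases from lecture):

```python
from math import log

def master_method(a, b, d):
    # Asymptotic solution of T(n) = a*T(n/b) + O(n^d):
    # a = number of recursive calls, b = input shrink factor,
    # d = exponent of the combine step's running time.
    if a == b ** d:
        return f"O(n^{d} log n)"        # Case 1: equal work at every level
    elif a < b ** d:
        return f"O(n^{d})"              # Case 2: the root dominates
    else:
        return f"O(n^{log(a, b):.3f})"  # Case 3: the leaves dominate
```

For example, MergeSort (a=2, b=2, d=1) lands in Case 1, giving O(n log n), while Karatsuba (a=3, b=2, d=1) lands in Case 3, giving O(n^log₂3).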
HOMEWORK: Problem Set #2 has five questions that should give you practice with divide-and-conquer algorithms and the Master Method. Programming assignment #2 asks you to implement the counting inversions algorithm (from Part III) in whatever programming language you please, run it on a quite large input, and enter the answer.
SUGGESTED READINGS FOR WEEK 2:
- CLRS: Chapter 4 (except Section 4.3), and Sections 28.1 and 33.4
- DPV: Sections 2.2 and 2.5
- KT: Sections 5.2-5.5
O(n log n) Algorithm for Counting Inversions I 12 min
O(n log n) Algorithm for Counting Inversions II 16 min
Strassen’s Subcubic Matrix Multiplication Algorithm 22 min
O(n log n) Algorithm for Closest Pair I [Advanced - Optional] 31 min
O(n log n) Algorithm for Closest Pair II [Advanced - Optional] 18 min
IV. THE MASTER METHOD
Motivation 7 min
Formal Statement 10 min
Examples 13 min
Proof I 9 min
Interpretation of the 3 Cases 10 min
Proof II 16 min
Problem Set #2
Quiz: Problem Set #2 5 questions
Optional Theory Problems (Batch #1) 10 min
The following problems are for those of you looking to challenge yourself beyond the required problem sets and programming questions. Most of these have been given in Stanford’s CS161 course, Design and Analysis of Algorithms, at some point. They are completely optional and will not be graded. While they vary in level, many are pretty challenging, and we strongly encourage you to discuss ideas and approaches with your fellow students on the “Theory Problems” discussion forum.
You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most n + log2(n) − 2 comparisons.
You are given a unimodal array of n distinct elements, meaning that its entries are in increasing order up until its maximum element, after which its elements are in decreasing order. Give an algorithm to compute the maximum element that runs in O(log n) time.
You are given a sorted (from smallest to largest) array A of n distinct integers which can be positive, negative, or zero. You want to decide whether or not there is an index i such that A[i] = i. Design the fastest algorithm that you can for solving this problem.
You are given an n by n grid of distinct numbers. A number is a local minimum if it is smaller than all of its neighbors. (A neighbor of a number is one immediately above, below, to the left, or the right. Most numbers have four neighbors; numbers on the side have three; the four corners have two.) Use the divide-and-conquer algorithm design paradigm to compute a local minimum with only O(n) comparisons between pairs of numbers. (Note: since there are n^2 numbers in the input, you cannot afford to look at all of them. Hint: Think about what types of recurrences would give you the desired upper bound.)
Programming Assignment #2
Quiz: Programming Assignment #2 1 question
Download the following text file:
IntegerArray.txt
This file contains all of the 100,000 integers between 1 and 100,000 (inclusive) in some order, with no integer repeated.
Your task is to compute the number of inversions in the file given, where the ith row of the file indicates the ith entry of an array.
Because of the large size of this array, you should implement the fast divide-and-conquer algorithm covered in the video lectures.
The numeric answer for the given input file should be typed in the space below.
So if your answer is 1198233847, then just type 1198233847 in the space provided without any space / commas / any other punctuation marks. You can make up to 5 attempts, and we’ll use the best one for grading.
(We do not require you to submit your code, so feel free to use any programming language you want — just type the final numeric answer in the following space.)
[TIP: before submitting, first test the correctness of your program on some small test cases of your own devising. Then post your best test cases to the discussion forums to help your fellow students!]
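One way to sketch the fast algorithm in Python (my own code): piggyback the inversion count on Merge Sort, counting "split inversions" during the merge step:

```python
def count_inversions(a):
    # Returns (sorted copy of a, number of inversions in a).
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged, i, j, split = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            # Every element still remaining in `left` forms an
            # inversion with the right element just copied out.
            split += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + split
```

Like Merge Sort itself, this runs in O(n log n), comfortably fast for the 100,000-element input (the naive double loop would do roughly 5 billion comparisons).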
Week 3
The QuickSort algorithm and its analysis;
probability review.
V. QUICKSORT - ALGORITHM
Week 3 Overview 10 min
QUICKSORT - THE ALGORITHM (Part V). One of the greatest algorithms ever, and our first example of a randomized algorithm. These lectures go over the pseudocode — the high-level approach, how to partition an array around a pivot element in linear time with minimal extra storage, and the ramifications of different pivot choices — and explain how the algorithm works.
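The partitioning subroutine and the overall recursion can be sketched in Python as follows (my own code; in-place, linear-time partitioning with constant extra storage per call, and random pivot choices):

```python
import random

def partition(a, lo, hi):
    # Partition a[lo..hi] around a randomly chosen pivot;
    # returns the pivot's final (sorted) position.
    pivot_idx = random.randint(lo, hi)
    a[lo], a[pivot_idx] = a[pivot_idx], a[lo]  # move pivot to the front
    pivot = a[lo]
    i = lo + 1  # invariant: a[lo+1..i-1] < pivot <= a[i..j-1]
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]  # place pivot between the two parts
    return i - 1

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)   # pivot at index p is already in place
        quicksort(a, p + 1, hi)
```

Unlike Merge Sort, the work happens before the recursive calls, and no merge step is needed.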
QUICKSORT - THE ANALYSIS (Part VI). These lectures prove that randomized QuickSort (i.e., with random pivot choices) runs in O(n log n) time on average. The analysis is as elegant as the algorithm itself, and is based on a “decomposition principle” that is often useful in the analysis of randomized algorithms. Note that there are some accompanying lecture notes for this part (available for download underneath each video). Also, it may be helpful to watch the first probability review video (below) before watching this sequence.
PROBABILITY REVIEW (Part VII). The first of these videos reviews the concepts from discrete probability that are necessary for the QuickSort analysis — sample spaces, events, random variables, expectation, and linearity of expectation. The second video covers just two topics, although quite tricky ones! (Namely, conditional probability and independence.) You need to review these two topics (via this video or some other source, as you wish) before studying the analysis of the randomized contraction algorithm in Week 4.
HOMEWORK: This week’s quiz will help you understand QuickSort and probability more deeply. Programming Assignment #3 asks you to implement QuickSort and compute the number of comparisons that it makes for three different pivot rules.
SUGGESTED READINGS FOR WEEK 3:
 CLRS Chapter 7
 KT Section 13.5
 SW Section 2.3
Quicksort: Overview 12 min
Partitioning Around a Pivot 24 min
Correctness of Quicksort [Review - Optional] 10 min
Choosing a Good Pivot 22 min
VI. QUICKSORT - ANALYSIS
Analysis I: A Decomposition Principle 21 min
Analysis II: The Key Insight 11 min
Analysis III: Final Calculations 8 min
VII. PROBABILITY REVIEW
Probability Review I 25 min
Probability Review II 17 min
Problem Set #3
Quiz: Problem Set #3 5 questions
Programming Assignment #3
Quiz: Programming Assignment #3 3 questions
Week 4
Linear-time selection;
graphs, cuts, and the contraction algorithm.
VIII. LINEAR-TIME SELECTION
Week 4 Overview 10 min
Part VIII — LINEAR-TIME SELECTION (Required). These lectures study the problem of computing the ith smallest element of an input array (e.g., the median). It’s easy to solve this problem in O(n log n) time using sorting, but we can do better! The required material goes over a super-practical randomized algorithm, very much in the spirit of QuickSort, that has linear expected running time. Don’t forget that it takes linear time just to read the input! The analysis is somewhat different from what we studied for QuickSort, but is equally slick. Basically, there’s a cool way to think about the progress the algorithm makes in terms of a simple coin-flipping experiment. Linearity of expectation (yes, it’s back…) seals the deal.
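The randomized selection idea can be sketched in Python like this (my own code; for brevity it builds sublists rather than partitioning in place the way the lectures do, so it uses extra space):

```python
import random

def rselect(a, i):
    # Return the i-th smallest element of list a (i is 1-indexed),
    # in expected linear time.
    if len(a) == 1:
        return a[0]
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    if i <= len(less):
        return rselect(less, i)                 # answer lies left of the pivot
    elif i <= len(less) + len(equal):
        return pivot                            # the pivot itself is the answer
    else:
        return rselect(greater, i - len(less) - len(equal))
```

Unlike QuickSort, only one side of the partition is ever recursed on, which is why the expected work forms a geometric-style sum that totals O(n).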
Part VIII — LINEAR-TIME SELECTION (Optional). I couldn’t resist covering some advanced related material. The first is an algorithm that has more Turing Award-winning authors than any other algorithm that I know of. It is a deterministic (i.e., no randomization allowed) linear-time algorithm for the Selection problem, based on an ingenious “median-of-medians” idea for guaranteeing good pivot choices. (There are some accompanying lecture notes for this part, available for download underneath each video.) The second optional topic answers the question “can we do better?” for sorting, unfortunately in the negative. That is, a counting argument shows that there is no “comparison-based” sorting algorithm (like MergeSort, QuickSort, or HeapSort) with worst-case running time better than n log n.
Part IX — GRAPHS AND THE CONTRACTION ALGORITHM. The second set of lectures for this week is a segue from randomized algorithms to graph algorithms. We first review graphs and the most standard ways of representing them (most commonly, by adjacency lists). We then consider the random contraction algorithm, discovered by Karger “only” 20ish years ago (while a PhD student here at Stanford). This algorithm solves the minimum cut problem — given an undirected graph, separate the vertices into two nonempty groups to minimize the number of “crossing edges”. Such problems come up when reasoning about, for example, physical networks, social networks, and images. This algorithm was perhaps the first strong evidence that graph problems could be added to the long list of “killer applications” of random sampling. Don’t tune out before the final plot twist — a simple but useful trick for transforming an algorithm that almost always fails into one that almost always succeeds.
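A rough Python sketch of the random contraction algorithm, including the repeated-trials trick mentioned at the end (my own code; representing contracted super-vertices with union-find-style labels is just one of several reasonable implementations):

```python
import random

def karger_min_cut(edges, trials):
    # Estimate the min cut of a connected undirected multigraph, given as a
    # list of (u, v) edges, by running random contraction `trials` times and
    # keeping the smallest cut found.
    best = float("inf")
    for _ in range(trials):
        # Map each vertex to the super-vertex it has been merged into.
        parent = {v: v for e in edges for v in e}

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v

        remaining = len(parent)
        while remaining > 2:
            u, v = random.choice(edges)        # pick a uniformly random edge
            ru, rv = find(u), find(v)
            if ru != rv:                       # contract: merge its endpoints
                parent[ru] = rv
                remaining -= 1
        # Crossing edges: endpoints that ended up in different groups.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best
```

A single run succeeds with probability only about 2/n², but repeating it enough times turns an algorithm that almost always fails into one that almost always succeeds — exactly the plot twist the lectures build up to.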
HOMEWORK: Problem Set #4 has five questions about the randomized selection algorithm, cuts in graphs, and the contraction algorithm. Programming Assignment #4 asks you to implement the contraction algorithm and use it to compute the min cut of the graph that we provide.
SUGGESTED READINGS FOR WEEK 4:
- CLRS: Chapter 9; Chapter 22 (only 22.1)
- DPV: Chapter 3 (only 3.1)
- KT: Chapter 13, Sections 13.2, 13.5
- SW: Chapter 4, Section 4.1
Randomized Selection - Algorithm 21 min
Randomized Selection - Analysis 20 min
Deterministic Selection - Algorithm [Advanced - Optional] 16 min
Deterministic Selection - Analysis I [Advanced - Optional] 22 min
Deterministic Selection - Analysis II [Advanced - Optional] 12 min
Omega(n log n) Lower Bound for Comparison-Based Sorting [Advanced - Optional] 13 min
IX. GRAPHS AND THE CONTRACTION ALGORITHM
Graphs and Minimum Cuts 15 min
Graph Representations 14 min
Random Contraction Algorithm 8 min
Analysis of Contraction Algorithm 30 min
Counting Minimum Cuts 7 min
Problem Set #4
Quiz: Problem Set #4 5 questions
Optional Theory Problems (Batch #2) 10 min
Programming Assignment #4
Quiz: Programming Assignment #4 1 question
Final Exam (1 attempt per 24 hours)
Info and FAQ for final exam 10 min
Quiz: Final Exam 10 questions
Graph Search, Shortest Paths, and Data Structures
Course can be found here
Lecture slides can be found here
Summary can be found in my Github
About this course: The primary topics in this part of the specialization are: data structures (heaps, balanced search trees, hash tables, bloom filters), graph primitives (applications of breadth-first and depth-first search, connectivity, shortest paths), and their applications (ranging from deduplication to social network analysis).
Who is this class for: Learners with at least a little bit of programming experience who want to learn the essentials of algorithms. In a University computer science curriculum, this course is typically taken in the third year.
Week 1
Breadth-first and depth-first search;
computing strong components;
applications.
X. GRAPH SEARCH AND CONNECTIVITY (Week 1)
Week 1 Overview 10 min
WELCOME: Welcome to Part 2 of the Algorithms Specialization: Graph Search, Shortest Paths, and Data Structures! Like the previous part, the course will have four weeks of lectures and assignments, followed by a final exam. Here are the highlights of the course’s first week.
SUMMARY: Week 1 is all about graph search and its applications (section X). We’ll cover a selection of fundamental primitives for reasoning about graphs. One very cool aspect of this material is that all of the algorithms that we’ll cover are insanely fast (linear time with small constants), and yet it can be quite subtle to understand why they work! The culmination of these lectures — computing the strongly connected components of a directed graph with just two passes of depth-first search — vividly illustrates the point that fast algorithms often require deep insight into the structure of the problem that you’re solving. There are also lecture notes for this last topic (available for download underneath the videos). (If you’re feeling REALLY motivated, you might read up on Tarjan’s earlier algorithm for computing SCCs that needs only a single DFS pass!)
THE VIDEOS: We begin with an overview video, which gives some reasons why you should care about graph search, a general strategy for searching a graph without doing any redundant work, and a high-level introduction to the two most important search strategies, breadth-first search (BFS) and depth-first search (DFS). The second video discusses BFS in more detail, including applications to computing shortest paths and the connected components of an undirected graph. The third video drills down on DFS, and shows how to use it to compute a topological ordering of a directed acyclic graph (i.e., to sequence tasks in a way that respects precedence constraints). The fourth and fifth videos cover the description and analysis, respectively, of the aforementioned algorithm for computing SCCs. A final optional video — hopefully one of the more fun ones in the course — discusses the structure of the Web, and lightly touches on topics like Google’s PageRank algorithm and the “six degrees of separation” phenomenon in social networks.
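The two-pass SCC algorithm can be sketched in Python roughly as follows (my own iterative implementation; the lectures describe recursive DFS, which tends to blow the call stack on graphs as large as the programming assignment's):

```python
from collections import defaultdict

def kosaraju_scc(edges):
    # Strongly connected components of a directed graph given as (u, v) edges.
    graph, rev = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))

    # Pass 1: DFS the reversed graph, recording vertices in finishing order.
    visited, order = set(), []
    for s in nodes:
        if s in visited:
            continue
        stack = [(s, iter(rev[s]))]
        visited.add(s)
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(rev[w])))
                    break
            else:
                order.append(v)  # v is finished
                stack.pop()

    # Pass 2: DFS the original graph in decreasing finishing order;
    # each DFS tree discovered is exactly one SCC.
    visited, sccs = set(), []
    for s in reversed(order):
        if s in visited:
            continue
        comp, stack = [], [s]
        visited.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in graph[v]:
                if w not in visited:
                    visited.add(w)
                    stack.append(w)
        sccs.append(comp)
    return sccs
```

Both passes are linear time, so the whole computation is O(m + n).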
THE HOMEWORK: Problem Set #1 should help you solidify your understanding of graph representations and graph search. The programming assignment asks you to implement the SCC algorithm from the lectures, and report your findings about the SCCs of a large graph. Programming Assignment #1 is the most difficult one of the course (and one of the more difficult ones in the entire specialization); as always, I encourage you to use the discussion forums to exchange ideas, tips, and test cases. If you can get through the first week of the course, it should be all downhill from there!
SUGGESTED READINGS FOR WEEK 1: Abbreviations in suggested readings refer to the following textbooks:
- CLRS: Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms (3rd edition)
- DPV: Dasgupta, Papadimitriou, and Vazirani, Algorithms
- KT: Kleinberg and Tardos, Algorithm Design
- SW: Sedgewick and Wayne, Algorithms (4th edition)
- CLRS: Chapter 22
- DPV: Chapter 3
- KT: Chapter 3, Sections 3.5, 3.6
- SW: Chapter 4, Sections 4.1, 4.2
Overview, Resources, and Policies 10 min
Resources
Mathematics for Computer Science (by Eric Lehman and Tom Leighton): https://www.cs.princeton.edu/courses/archive/fall06/cos341/handouts/mathcs.pdf
Lecture slides 10 min
Graph Search - Overview 23 min
Breadth-First Search (BFS): The Basics 14 min
BFS and Shortest Paths 7 min
BFS and Undirected Connectivity 13 min
Depth-First Search (DFS): The Basics 7 min
Topological Sort 21 min
Computing Strong Components: The Algorithm 29 min
Computing Strong Components: The Analysis 26 min
Structure of the Web [Optional] 18 min
Problem Set #1
Quiz: Problem Set #1 5 questions
Optional Theory Problems (Week 1) 10 min
In the 2-SAT problem, you are given a set of clauses, where each clause is the disjunction of two literals (a literal is a Boolean variable or the negation of a Boolean variable). You are looking for a way to assign a value “true” or “false” to each of the variables so that all clauses are satisfied — that is, there is at least one true literal in each clause. For this problem, design an algorithm that determines whether or not a given 2-SAT instance has a satisfying assignment. (Your algorithm does not need to exhibit a satisfying assignment, just decide whether or not one exists.) Your algorithm should run in O(m+n) time, where m and n are the number of clauses and variables, respectively. [Hint: strongly connected components.]
Programming Assignment #1
Quiz: Programming Assignment #1 1 question
Week 2
Dijkstra’s shortest-path algorithm.
XI. DIJKSTRA’S SHORTEST-PATH ALGORITHM (Week 2)
Week 2 Overview 10 min
SUMMARY: This week is the climax of all our work on graph search — Dijkstra’s shortest-path algorithm, surely one of the greatest hits of algorithms (Part XI). It works in any directed graph with nonnegative edge lengths, and it computes the shortest paths from a source vertex to all other vertices. Particularly nice is the blazingly fast implementation that uses a heap data structure (more on heaps next week).
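A heap-based Python sketch of the algorithm (my own code; it uses the common "lazy deletion" variant, re-pushing vertices and skipping stale heap entries, instead of the decrease-key bookkeeping described in lecture):

```python
import heapq

def dijkstra(graph, source):
    # Shortest-path distances from `source` in a directed graph given as
    # {vertex: [(neighbor, length), ...]} with non-negative edge lengths.
    dist = {source: 0}
    heap = [(0, source)]  # (tentative distance, vertex)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale entry; a shorter path to v was already found
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd  # Dijkstra's greedy score improved
                heapq.heappush(heap, (nd, w))
    return dist
```

With the heap, each edge causes at most one push, giving the O(m log n) running time discussed in the implementation video.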
THE HOMEWORK: Problem Set #2 should solidify your understanding of Dijkstra’s shortest-path algorithm. In the programming assignment you’ll implement Dijkstra’s algorithm. You can just implement the basic version, but those of you who want a bigger challenge are encouraged to devise a heap-based implementation.
SUGGESTED READINGS FOR WEEK 2:
- CLRS: Chapter 24 (Sections 24.3, 24.4)
- DPV: Section 4.4
- KT: Section 4.4
- SW: Section 4.4
Dijkstra’s Shortest-Path Algorithm 20 min
Dijkstra’s Algorithm: Examples 12 min
Correctness of Dijkstra’s Algorithm 19 min
Dijkstra’s Algorithm: Implementation and Running Time 26 min
Problem Set #2
Quiz: Problem Set #2 5 questions
Optional Theory Problems (Week 2) 10 min
In lecture we define the length of a path to be the sum of the lengths of its edges. Define the bottleneck of a path to be the maximum length of one of its edges. A minimum-bottleneck path between two vertices s and t is a path with bottleneck no larger than that of any other s-t path. Show how to modify Dijkstra’s algorithm to compute a minimum-bottleneck path between two given vertices. The running time should be O(m log n), as in lecture.
We can do better. Suppose now that the graph is undirected. Give a lineartime (O(m)) algorithm to compute a minimumbottleneck path between two given vertices.
What if the graph is directed? Can you compute a minimum-bottleneck path between two given vertices faster than O(m log n)?
Programming Assignment #2
Quiz: Programming Assignment #2 1 question
Week 3
Heaps;
balanced binary search trees.
XII. HEAPS (Week 3)
Week 3 Overview 10 min
SUMMARY: This week we start our discussion of data structures, a huge (and hugely useful) topic. This week covers heaps and balanced binary search trees. My primary goals here are to teach you the operations that these data structures support (along with their running times), and to develop your intuition about which data structures are useful for which sorts of problems.
THE VIDEOS: Part XII begins with a short overview video in which I explain my approach to teaching data structures in this course. The next two videos discuss heaps and some of their applications (required), and some details about how they are typically implemented (optional, recommended for hardcore programmer/computer science types). Part XIII discusses balanced binary search trees — the supported operations and canonical uses (required) and a glimpse at what’s “under the hood” (optional).
THE HOMEWORK: Problem Set #3 should solidify your understanding of heaps and search trees. In the programming assignment you’ll implement a slick data structure application covered in the lectures (median maintenance).
SUGGESTED READINGS FOR WEEK 3:
- CLRS: Chapters 6, 11, 12, 13
- DPV: Section 4.5
- SW: Sections 3.3, 3.4
Data Structures: Overview 4 min
Heaps: Operations and Applications 18 min
Heaps: Implementation Details [Advanced - Optional] 20 min
XIII. BALANCED BINARY SEARCH TREES (Week 3)
Balanced Search Trees: Operations and Applications 10 min
Binary Search Tree Basics, Part I 13 min
Binary Search Tree Basics, Part II 30 min
Red-Black Trees 21 min
Rotations [Advanced - Optional] 7 min
Insertion in a Red-Black Tree [Advanced] 14 min
Problem Set #3
Quiz: Problem Set #3 5 questions
Programming Assignment #3
Quiz: Programming Assignment #3 1 question
Week 4
Hashing;
bloom filters.
XIV. HASHING: THE BASICS (Week 4)
Week 4 Overview 10 min
SUMMARY: This week wraps up our discussion of data structures. We’ll finish with a bang by covering hash tables — certainly one of the most important data structures — in detail. First (Part XIV) we discuss hash table basics — the supported operations, the types of problems they’re useful for, and an overview of implementation approaches. Part XV has one required video about “pathological data sets” for hash functions, and three optional videos that cover great stuff — how randomization dodges pathological data sets, a construction of simple hash functions that are guaranteed to (in expectation over the choice of a random hash function) spread every data set out evenly, and performance analyses of hashing with both chaining and open addressing. Part XVI contains two required videos on the implementation and performance of bloom filters, which are like hash tables except that they are insanely space-efficient and occasionally suffer from false positives.
THE HOMEWORK: Problem Set #4 should solidify your understanding of hash tables and bloom filters. In the programming assignment you’ll implement another slick data structure application covered in the lectures (solving the 2-SUM problem using hash tables).
SUGGESTED READINGS FOR WEEK 4:
CLRS Chapter 11
KT Chapter 13 (Section 13.6)
SW Section 3.5
Hash Tables: Operations and Applications 19 min
Hash Tables: Implementation Details, Part I 18 min
Hash Tables: Implementation Details, Part II 22 min
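The 2-SUM application from this week's programming assignment can be sketched with a hash set, which gives O(1) expected-time lookups. The function name and signature are mine; for the assignment's huge target range a sorted-array approach is much faster, but this matches the hash-table idea from lecture:

```python
def two_sum_targets(nums, lo, hi):
    """Count targets t in [lo, hi] for which some pair of DISTINCT
    values x, y in nums satisfies x + y = t."""
    values = set(nums)          # hash set: O(1) expected membership tests
    hit = set()
    for t in range(lo, hi + 1):
        for x in values:
            y = t - x
            if y in values and y != x:   # the pair must use distinct values
                hit.add(t)
                break
    return len(hit)
```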
XV. UNIVERSAL HASHING (Week 4)
Pathological Data Sets and Universal Hashing Motivation 21 min
Universal Hashing: Definition and Example [Advanced - Optional] 25 min
Universal Hashing: Analysis of Chaining [Advanced - Optional] 18 min
Hash Table Performance with Open Addressing [Advanced - Optional] 15 min
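A minimal sketch of the classic universal family h(x) = ((a·x + b) mod p) mod n covered in the optional videos, assuming integer keys smaller than the prime p; the helper name is mine:

```python
import random

def make_universal_hash(p, n, seed=None):
    """Draw one function from the family h(x) = ((a*x + b) mod p) mod n.

    p    : a prime larger than every key in the universe
    n    : number of buckets
    a, b : chosen uniformly at random (a nonzero), which is what makes
           the collision probability of any fixed pair at most ~1/n
    """
    rng = random.Random(seed)
    a = rng.randrange(1, p)     # a in {1, ..., p-1}
    b = rng.randrange(0, p)     # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % n
```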
XVI. BLOOM FILTERS (Week 4)
Bloom Filters: The Basics 15 min
Bloom Filters: Heuristic Analysis 13 min
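A toy bloom filter sketch, assuming the k hash functions are derived by salting SHA-256 (a real implementation would use a packed bit array and much cheaper hashes). It supports insert and lookup only, with possible false positives but never false negatives:

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: k salted SHA-256 hashes over num_bits bits."""

    def __init__(self, num_bits, num_hashes):
        self.n = num_bits
        self.k = num_hashes
        self.bits = [False] * num_bits

    def _indices(self, item):
        # k pseudo-independent hash values, one per salt i.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.n

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = True

    def __contains__(self, item):
        # All k bits set => "probably present"; any bit unset => definitely absent.
        return all(self.bits[idx] for idx in self._indices(item))
```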
Problem Set #4
Quiz: Problem Set #4 5 questions
Optional Theory Problems (Week 4) 10 min
Recall that a set H of hash functions (mapping the elements of a universe U to the buckets {0,1,2,…,n−1}) is universal if for every distinct x,y∈U, the probability Prob[h(x)=h(y)] that x and y collide, assuming that the hash function h is chosen uniformly at random from H, is at most 1/n. In this problem you will prove that a collision probability of 1/n is essentially the best possible. Precisely, suppose that H is a family of hash functions mapping U to {0,1,2,…,n−1}, as above. Show that there must be a pair x,y∈U of distinct elements such that, if h is chosen uniformly at random from H, then Prob[h(x)=h(y)] ≥ 1/n − 1/|U|.
Programming Assignment #4
Quiz: Programming Assignment #4 1 question
Final Exam (1 attempt per 24 hours)
Info and FAQ for final exam 10 min
 Unlike the problem sets, you have only ONE attempt per 24 hours. Thus DO NOT START THE EXAM until you are ready.
 There are 10 questions, and each is worth 1 point. Many are of the “select all that apply” type, rather than the strict multiple choice format that is more common on the problem sets. You will receive partial credit for each option that you leave correctly checked or correctly unchecked. (So if you mark 4 out of the 5 options for a problem correctly, you’ll receive 0.8 out of the 1 point.) You need 7 points total to pass the exam. All questions are about material covered in the required (i.e., nonoptional) videos.
 Roughly half of the questions follow fairly directly from the lecture material (assuming that you understand it thoroughly). Roughly a quarter of the questions are variations on problem set questions. Roughly a quarter of the questions demand more thought, for example by asking you to consider whether certain results from lecture do or do not extend to more general situations.
 Good luck!
Quiz: Final Exam 10 questions
Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming
Course can be found here
Lecture slides and solutions can be found here
About this course: The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).
Who is this class for: Learners with at least a little bit of programming experience who want to learn the essentials of algorithms. In a University computer science curriculum, this course is typically taken in the third year.
Week 1
Two motivating applications;
selected review;
introduction to greedy algorithms;
a scheduling application;
Prim’s MST algorithm.
XVII. TWO MOTIVATING APPLICATIONS (Week 1)
Week 1 Overview 10 min
WELCOME: Welcome to Part 3 of the Algorithms Specialization: Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming! Like the previous two parts, the course will have four weeks of lectures and assignments, followed by a final exam. Here are the highlights of the course’s first week.
TWO MOTIVATING APPLICATIONS: We begin with a fairly nontechnical discussion of two motivating applications — distributed shortestpath routing in the Internet, and sequence alignment — to build excitement for the tools that you’ll acquire later in this course (and also in Part 4 of the specialization).
INTRODUCTION TO GREEDY ALGORITHMS: The focus of this week and the next is the greedy algorithm design paradigm. These two nontechnical videos discuss the pros and cons of this paradigm and describe a cool application to the optimal management of the contents of a cache.
A SCHEDULING APPLICATION: Scheduling problems come up all the time (e.g., how should a shared resource be allocated?) and greedy algorithms are often useful for them. We’ll discuss a specific success story — minimizing the weighted sum of completion times of a bunch of tasks — in detail. The correctness proof furnishes a particularly clean example of an “exchange argument.”
PRIM’S MST ALGORITHM: The minimum spanning tree (MST) problem, in addition to enjoying several applications, is a uniquely great problem for the study of greedy algorithms. Unusually, several different greedy algorithms always compute an optimal solution. We begin here with the Dijkstraesque Prim’s algorithm. The correctness proof requires understanding the subtly beautiful structure of cuts in graphs, while its blazingly fast implementation relies on a deft application of the heap data structure.
VIDEOS AND SLIDES: A reminder that videos can be streamed or downloaded and watched offline (recommended for commutes, etc.). We are also providing PDF lecture slides (typed versions of what’s written in the lecture videos), as well as subtitle files (in English and in some cases other languages as well). And if you find yourself wishing that I spoke more quickly or more slowly, note that you can adjust the video speed to accommodate your preferred pace.
HOMEWORK #1: The first problem set consists of 5 problems, about greedy scheduling algorithms and minimum spanning trees. The first programming assignment asks you to implement some of the algorithms that we’ve covered, run them on large inputs, and enter the answer. For the seasoned programmers out there looking for an additional challenge, try doing the programming assignments in a programming language that you don’t already know!
DISCUSSION FORUMS: Discussion forums play an absolutely crucial role in massive online courses. If you have trouble understanding a lecture or completing an assignment, you should turn to the forums for help. After you’ve mastered the lectures and assignments for a given week, I hope you’ll contribute to the forums and help out your fellow learners. While I won’t have time to carefully monitor the discussion forums, I’ll check in and answer questions whenever I find the time.
SUGGESTED READINGS FOR WEEK 1: Abbreviations in suggested readings refer to the following textbooks:
 CLRS - Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms (3rd edition)
 DPV - Dasgupta, Papadimitriou, and Vazirani, Algorithms
 KT - Kleinberg and Tardos, Algorithm Design
 SW - Sedgewick and Wayne, Algorithms (4th edition)
 CLRS: Chapter 16 (Sections 1 and 2) and Chapter 23
 DPV: Sections 5.1.1, 5.1.2, and 5.1.5
 KT: Sections 4.1, 4.2, 4.3, and 4.5
 SW: Section 4.3
Overview, Resources, and Policies 10 min
Course overview
Algorithms are the heart of computer science, and the subject has countless practical applications as well as intellectual depth. This specialization is an introduction to algorithms for learners with at least a little programming experience. The specialization is rigorous but emphasizes the big picture and conceptual understanding over lowlevel implementation and mathematical details. After completing this specialization, you will have a greater mastery of algorithms than almost anyone without a graduate degree in the subject. Specific topics in Part 3 of the specialization include: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).
Resources
Mathematics for Computer Science (by Eric Lehman and Tom Leighton): https://www.cs.princeton.edu/courses/archive/fall06/cos341/handouts/mathcs.pdf
Course Policies
Video Lectures
 Lectures will be made available weekly, for a total of four weeks. In a given week, there will generally be two hours of required material, plus additional optional material (some review, some advanced topics).
 Below each lecture video there is a PDF of typed versions of the slides.
 You can download the videos of the lectures for offline viewing.
 The video player supports speeding up and slowing down the pace.
Weekly Programming Assignments and Problem Sets
 Each week there will be a new problem set and a new programming assignment.
 For each problem set you are allowed a maximum of two attempts in a 12-hour period (we’ll use the best score).
 For each programming assignment you’re allowed a maximum of 10 attempts in a 12-hour period (we’ll use the best score).
 For the final exam, you’re allowed one attempt per 24 hours.
Grading
 To pass a problem set, you must get at least 4 of the 5 questions correct (80%).
 To pass a programming assignment, you must get all of the answers correct (100%).
 To pass the final exam, you must get at least 70% of the total points (14 out of 20).
 To pass the course, you must pass all of the problem sets, all of the programming assignments, and the final exam.
Theory Problems
 These are totally optional theory questions (no deadlines or credit given).
 We encourage you to attempt these questions and discuss them in the forums to develop a deeper understanding of the design and analysis of algorithms.
Lecture slides 10 min
See the zip file below (which includes slides for both Part 3 and Part 4 of the specialization). Also available from the video lectures on a lecturebylecture basis.
Application: Internet Routing 10 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/0VcrE/applicationinternetrouting
Dijkstra’s algorithm
Bellman-Ford algorithm
Application: Sequence Alignment 8 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/ekVkk/applicationsequencealignment
Brute-force search: try all possible alignments and remember the best one.
XVIII. INTRODUCTION TO GREEDY ALGORITHMS (Week 1)
Introduction to Greedy Algorithms 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/WHe2b/introductiontogreedyalgorithms
DANGER: Most greedy algorithms are NOT correct. (Even if your intuition says otherwise!)
Application: Optimal Caching 10 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/VMnNW/applicationoptimalcaching
The “furthest-in-future” algorithm is optimal (i.e., minimizes the number of cache misses).
Least Recently Used (LRU)
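The furthest-in-future policy can be sketched as a miss counter (the function name is mine; the quadratic next-use scan is for clarity, not efficiency). Unlike LRU, it looks forward in the request sequence, which is why it is only an offline benchmark:

```python
def furthest_in_future_misses(requests, cache_size):
    """Count cache misses under the furthest-in-future eviction rule."""
    cache = set()
    misses = 0
    for i, item in enumerate(requests):
        if item in cache:
            continue                      # hit: nothing to do
        misses += 1
        if len(cache) >= cache_size:
            # Evict the cached item whose next request is furthest away
            # (items never requested again rank as infinitely far).
            def next_use(x):
                for j in range(i + 1, len(requests)):
                    if requests[j] == x:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return misses
```

On the sequence a, b, a, c, b with a cache of size 2, furthest-in-future evicts a (next needed never) rather than b (needed soon) when c arrives, so it incurs 3 misses where LRU would incur 4.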
XIX. A SCHEDULING APPLICATION (Week 1)
Problem Definition 5 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/ZvZGZ/problemdefinition
Goal: Minimize the weighted sum of completion times: $\sum_{j=1}^{n} w_j C_j$.
A Greedy Algorithm 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/Jo6gK/agreedyalgorithm
Guess (1): order jobs by decreasing value of $w_j − l_j$.
Guess (2): order jobs by decreasing ratio $w_j/l_j$.
Algorithm #1 is not (always) correct.
Algorithm #2 (order by decreasing ratios $w_j/l_j$) is always correct. Not obvious! Proof coming up next.
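The objective and the winning greedy rule can be sketched in a few lines (function names are mine). The ratio rule below is the one proved optimal in the next videos:

```python
def weighted_completion_time(jobs):
    """jobs: list of (weight, length) pairs, in schedule order.
    Returns sum_j w_j * C_j, where C_j is the completion time of job j."""
    total, finished = 0, 0
    for w, l in jobs:
        finished += l          # C_j = sum of lengths of jobs up through j
        total += w * finished
    return total

def schedule_by_ratio(jobs):
    """Greedy rule (2): schedule jobs in decreasing order of w/l."""
    return sorted(jobs, key=lambda wl: wl[0] / wl[1], reverse=True)
```

For jobs (w, l) = (1, 2) and (3, 5), the ratio rule puts (3, 5) first, giving cost 3·5 + 1·7 = 22, versus 23 for the other order.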
Correctness Proof  Part I 6 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/rFQ5P/correctnessproofparti
Proof: By an Exchange Argument.
Correctness Proof  Part II 4 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/mrXwO/correctnessproofpartii
Cost-Benefit Analysis.
Upshot:
 Cost of exchange is $w_i l_j$. [$C_i$ goes up by $l_j$]
 Benefit of exchange is $w_j l_i$. [$C_j$ goes down by $l_i$]
Note: i > j $\rightarrow$ $w_i/l_i < w_j/l_j$ $\rightarrow$ $w_i l_j < w_j l_i$ $\rightarrow$ COST < BENEFIT $\rightarrow$ the swap improves $\sigma^{\star}$, contradicting the optimality of $\sigma^{\star}$.
Handling Ties [Advanced  Optional] 7 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/YuoAV/handlingtiesadvancedoptional
XX. PRIM’S MINIMUM SPANNING TREE ALGORITHM (Week 1)
MST Problem Definition 11 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/9D5ML/mstproblemdefinition
spanning tree
Prim’s MST Algorithm 7 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/tQ6gK/primsmstalgorithm
Prim’s MST Algorithm
Correctness Proof I 15 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/15UXn/correctnessproofi
Proof that Prim’s algorithm outputs a spanning tree:
(1) The algorithm maintains the invariant that T spans X.
(2) It can’t get stuck with X $\neq$ V.
(3) No cycles ever get created in T.
Correctness Proof II 8 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/hYzal/correctnessproofii
Proof that Prim’s algorithm always outputs a minimum-cost spanning tree.
The Cut Property
Proof of Cut Property [Advanced  Optional] 11 min
Fast Implementation I 14 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/bYMq1/fastimplementationi
Why O(m) time per iteration? Because the algorithm does not keep track of which vertices are in X versus V−X, it must iterate through all m edges to find those with one endpoint v in X and the other w in V−X.
Check: can initialize the heap with O(m + n log n) = O(m log n) preprocessing.
 m: iterate through the m edges once to compute each vertex’s key.
 n log n: the cost of n heap Insert operations, at O(log n) each.
Fast Implementation II 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/qzdR8/fastimplementationii
O(m log n) time:
 (n − 1) Inserts during preprocessing
 (n − 1) Extract-Mins (one per iteration of the while loop)
O((n − 1) + (n − 1) log n) = O(m + m log n) = O(m log n), using m ≥ n − 1 for a connected graph.
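A sketch of heap-based Prim’s in the lazy-deletion style (stale heap entries are simply skipped on extraction) rather than the Delete/re-Insert bookkeeping described in lecture; it is still O(m log n) since each edge contributes at most two heap entries. The function name and edge-list format are mine:

```python
import heapq

def prim_mst_cost(n, edges):
    """Heap-based Prim's algorithm with lazy deletion.

    n     : number of vertices, labeled 0..n-1
    edges : iterable of (u, v, cost) triples (undirected)
    Assumes the graph is connected; returns the total MST cost.
    """
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((c, v))
        adj[v].append((c, u))
    in_tree = [False] * n
    heap = [(0, 0)]                 # (attachment cost, vertex); start at 0
    total = 0
    while heap:
        c, v = heapq.heappop(heap)
        if in_tree[v]:
            continue                # stale entry: v already absorbed into X
        in_tree[v] = True
        total += c
        for cost, w in adj[v]:
            if not in_tree[w]:
                heapq.heappush(heap, (cost, w))
    return total
```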
Problem Set #1
Quiz: Problem Set #1 5 questions
QUIZ
Problem Set #1
5 questions
To pass: 80% or higher
Attempts: 2 every 12 hours
Deadline
July 23, 11:59 PM PDT
1 point
1.We are given as input a set of $n$ requests (e.g., for the use of an auditorium), with a known start time $s_i$ and finish time $t_i$ for each request $i$. Assume that all start and finish times are distinct. Two requests conflict if they overlap in time — if one of them starts between the start and finish times of the other. Our goal is to select a maximum-cardinality subset of the given requests that contains no conflicts. (For example, given three requests consuming the intervals [0,3], [2,5], and [4,7], we want to return the first and third requests.) We aim to design a greedy algorithm for this problem with the following form: At each iteration we select a new request i, including it in the solution-so-far and deleting from future consideration all requests that conflict with i.
Which of the following greedy rules is guaranteed to always compute an optimal solution?
This should not be selected
Exercise: find a counterexample that uses only three requests.
This should not be selected
There are counterexamples, though finding one requires a bit of work!
Correct
Let Rj denote the requests with the j earliest finish times. Prove by induction on j that this greedy algorithm selects the maximum number of non-conflicting requests from Rj.
This should not be selected
What if there is a request with start time 0 and an extremely large finish time?
1 point
2.We are given as input a set of $n$ jobs, where job $j$ has a processing time $p_j$ and a deadline $d_j$. Recall the definition of completion times $C_j$ from the video lectures. Given a schedule (i.e., an ordering of the jobs), we define the lateness $l_j$ of job $j$ as the amount of time $C_j − d_j$ after its deadline that the job completes, or as 0 if $C_j \le d_j$. Our goal is to minimize the maximum lateness, $\max_j l_j$.
Which of the following greedy rules produces an ordering that minimizes the maximum lateness? You can assume that all processing times and deadlines are distinct.
This should not be selected
One of the proposed greedy rules works. Try an exchange argument.
Correct
Proof by an exchange argument, analogous to minimizing the weighted sum of completion times.
1 point
3.In this problem you are given as input a graph T=(V,E) that is a tree (that is, T is undirected, connected, and acyclic). A perfect matching of T is a subset F$\subset$E of edges such that every vertex v∈V is the endpoint of exactly one edge of F. Equivalently, F matches each vertex of T with exactly one other vertex of T. For example, a path graph has a perfect matching if and only if it has an even number of vertices.
Consider the following two algorithms that attempt to decide whether or not a given tree has a perfect matching. The degree of a vertex in a graph is the number of edges incident to it. (The two algorithms differ only in the choice of v in line 5.)
Algorithm A:


Algorithm B:


Is either algorithm correct?
This should not be selected
Consider a three-hop path.
This should not be selected
Try to prove correctness by induction.
This should not be selected
Consider a three-hop path.
Correct
Algorithm A can fail, for example, on a three-hop path. Correctness of algorithm B can be proved by induction on the number of vertices in T. Note that the tree property is used to argue that there must be a vertex with degree 1; if there is a perfect matching, it must include the edge incident to this vertex.
1 point
4.Consider an undirected graph G=(V,E) where every edge e∈E has a given cost ce. Assume that all edge costs are positive and distinct. Let T be a minimum spanning tree of G and P a shortest path from the vertex s to the vertex t. Now suppose that the cost of every edge e of G is increased by 1 and becomes ce+1. Call this new graph G′. Which of the following is true about G′?
Correct
The positive statement has many proofs (e.g., via the Cut Property). For the negative statement, think about two different paths from s to t that contain a different number of edges.
This should not be selected
Think about two different paths from s to t that contain a different number of edges.
This should not be selected
What does the Cut Property tell you about G′ vs. G?
1 point
5.Suppose T is a minimum spanning tree of the connected graph G. Let H be a connected induced subgraph of G. (I.e., H is obtained from G by taking some subset S$\subseteq$V of vertices, and taking all edges of E that have both endpoints in S. Also, assume H is connected.) Which of the following is true about the edges of T that lie in H? You can assume that edge costs are distinct, if you wish. [Choose the strongest true statement.]
This should not be selected
Suppose G is a triangle and H is an edge.
This should not be selected
How do you know that they form a connected subgraph of H?
Correct
Proof via the Cut Property (cuts in G correspond to cuts in H with only fewer crossing edges).
This should not be selected
Think about the Cut Property.
Optional Theory Problems (Week 1) 10 min
The following problems are for those of you looking to challenge yourself beyond the required problem sets and programming questions. They are completely optional and will not be graded. While they vary in level, many are pretty challenging, and we strongly encourage you to discuss ideas and approaches with your fellow students on the “Theory Problems” discussion forum.
 Consider a connected undirected graph G with not necessarily distinct edge costs. Consider two different minimum-cost spanning trees of G, T and T′. Is there necessarily a sequence of minimum-cost spanning trees T=T0,T1,T2,…,Tr=T′ with the property that each consecutive pair Ti,Ti+1 of MSTs differ by only a single edge swap? Prove the statement or exhibit a counterexample.
 Consider the following algorithm. The input is a connected undirected graph with edge costs (distinct, if you prefer). The algorithm proceeds in iterations. If the current graph is a spanning tree, then the algorithm halts. Otherwise, it picks an arbitrary cycle of the current graph and deletes the most expensive edge on the cycle. Is this algorithm guaranteed to compute a minimum-cost spanning tree? Prove it or exhibit a counterexample.
 Consider the following algorithm. The input is a connected undirected graph with edge costs (distinct, if you prefer). The algorithm proceeds in phases. Each phase adds some edges to a tree-so-far and reduces the number of vertices in the graph (when there is only 1 vertex left, the MST is just the empty set). In a phase, we identify the cheapest edge ev incident on each vertex v of the current graph. Let F={ev} be the collection of all such edges in the current phase. Obtain a new (smaller) graph by contracting all of the edges in F — so that each connected component of F becomes a single vertex in the new graph — discarding any self-loops that result. Let T denote the union of all edges that ever get contracted in a phase of this algorithm. Is T guaranteed to be a minimum-cost spanning tree? Prove it or exhibit a counterexample.
Programming Assignment #1
Quiz: Programming Assignment #1 3 questions
QUIZ
Programming Assignment #1
3 questions
To pass: 100%
Attempts: 10 every 12 hours
Deadline
July 23, 11:59 PM PDT
code can be found here
1 point
1.In this programming problem and the next you’ll code up the greedy algorithms from lecture for minimizing the weighted sum of completion times.
Download the text file below.
jobs.txt
This file describes a set of jobs with positive and integral weights and lengths. It has the format
[number_of_jobs]
[job_1_weight] [job_1_length]
[job_2_weight] [job_2_length]
…
For example, the third line of the file is “74 59”, indicating that the second job has weight 74 and length 59.
You should NOT assume that job weights or lengths are distinct.
Your task in this problem is to run the greedy algorithm that schedules jobs in decreasing order of the difference (weight − length). Recall from lecture that this algorithm is not always optimal. IMPORTANT: if two jobs have equal difference (weight − length), you should schedule the job with higher weight first. Beware: if you break ties in a different way, you are likely to get the wrong answer. You should report the sum of weighted completion times of the resulting schedule — a positive integer — in the box below.
ADVICE: If you get the wrong answer, try out some small test cases to debug your algorithm (and post your test cases to the discussion forum).
1 point
2.For this problem, use the same data set as in the previous problem.
Your task now is to run the greedy algorithm that schedules jobs (optimally) in decreasing order of the ratio (weight/length). In this algorithm, it does not matter how you break ties. You should report the sum of weighted completion times of the resulting schedule — a positive integer — in the box below.
1 point
3.In this programming problem you’ll code up Prim’s minimum spanning tree algorithm.
Download the text file below.
edges.txt
This file describes an undirected graph with integer edge costs. It has the format
[number_of_nodes] [number_of_edges]
[one_node_of_edge_1] [other_node_of_edge_1] [edge_1_cost]
[one_node_of_edge_2] [other_node_of_edge_2] [edge_2_cost]
…
For example, the third line of the file is “2 3 8874”, indicating that there is an edge connecting vertex #2 and vertex #3 that has cost 8874.
You should NOT assume that edge costs are positive, nor should you assume that they are distinct.
Your task is to run Prim’s minimum spanning tree algorithm on this graph. You should report the overall cost of a minimum spanning tree — an integer, which may or may not be negative — in the box below.
IMPLEMENTATION NOTES: This graph is small enough that the straightforward O(mn) time implementation of Prim’s algorithm should work fine. OPTIONAL: For those of you seeking an additional challenge, try implementing a heapbased version. The simpler approach, which should already give you a healthy speedup, is to maintain relevant edges in a heap (with keys = edge costs). The superior approach stores the unprocessed vertices in the heap, as described in lecture. Note this requires a heap that supports deletions, and you’ll probably need to maintain some kind of mapping between vertices and their positions in the heap.
Week 2
Kruskal’s MST algorithm and applications to clustering;
advanced union-find (optional).
XXI. KRUSKAL’S MINIMUM SPANNING TREE ALGORITHM (Week 2)
Week 2 Overview 10 min
KRUSKAL’S MST ALGORITHM: Last week we covered Prim’s MST algorithm and a blazingly fast implementation of it. There are several reasons for studying a second greedy MST algorithm, due to Kruskal. First, it’s a beautiful algorithm, worthy of a greatest hits compilation. Second, to obtain a super-fast implementation of it, we’ll need to learn about the simple but fundamental “Union-Find” data structure. The third reason is covered in the next section…
CLUSTERING: Clustering is an important form of unsupervised learning (i.e., extracting patterns from unlabeled data). These two videos discuss how Kruskal’s MST algorithm suggests flexible and useful greedy approaches to clustering problems.
UNION-FIND LECTURES: This is purely optional material about advanced implementations and analysis of the union-find data structure. For those of you looking for some seriously next-level (but beautiful) material, check it out when you get a chance.
HOMEWORK #2: The second problem set is all about MSTs. The second programming assignment asks you to implement the greedy clustering algorithm from lecture. For part (a), a straightforward implementation should suffice. Part (b) involves a graph that is likely too big to fit in your computer’s memory, so answering this part might take some ingenuity.
SUGGESTED READINGS FOR WEEK 2:
CLRS Chapter 21 and Chapter 23 (Section 2)
DPV Sections 5.1.3 and 5.1.4
KT Sections 4.5-4.7
SW Sections 1.5 and 4.3
Kruskal’s MST Algorithm 7 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/PLdBf/kruskalsmstalgorithm
Kruskal’s MST Algorithm
Correctness of Kruskal’s Algorithm 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/U3ukN/correctnessofkruskalsalgorithm
Implementing Kruskal’s Algorithm via Union-Find I 9 min
FIND(X): Return the name of the group that X belongs to.
UNION(Ci, Cj): Fuse groups Ci and Cj into a single one.
Implementing Kruskal’s Algorithm via Union-Find II 13 min
O(m log n) total (matching Prim’s algorithm)
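Kruskal's algorithm plus a union-find sketch (union-by-rank with path compression, as in the optional videos); the function name and edge-list format are mine. Sorting dominates, so the total is O(m log n):

```python
def kruskal_mst_cost(n, edges):
    """Kruskal's algorithm via union-find.

    n     : number of vertices, labeled 0..n-1
    edges : iterable of (u, v, cost) triples (undirected)
    Assumes the graph is connected; returns the total MST cost.
    """
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        # Path compression (halving): point nodes at their grandparents.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Union-by-rank: hang the shallower tree under the deeper one.
        ra, rb = find(a), find(b)
        if ra == rb:
            return False            # same group: edge would create a cycle
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    total = 0
    for u, v, c in sorted(edges, key=lambda e: e[2]):
        if union(u, v):             # keep the edge iff it joins two groups
            total += c
    return total
```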
MSTs: State-of-the-Art and Open Questions [Advanced - Optional] 9 min
XXII. CLUSTERING (Week 2)
Application to Clustering 11 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/QWubN/applicationtoclustering
Call points p and q separated if they’re assigned to different clusters.
 Initially, each point is in a separate cluster.
 Repeat until only k clusters remain:
  Let p, q = the closest pair of separated points (this pair determines the current spacing).
  Merge the clusters containing p and q into a single cluster.
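The pseudocode above is exactly Kruskal's algorithm stopped early, which is easy to sketch with union-find (the function name is mine; `edges` holds the pairwise distances):

```python
def max_spacing(n, edges, k):
    """Single-link clustering: run Kruskal until k clusters remain.

    n     : number of points, labeled 0..n-1
    edges : iterable of (p, q, distance) triples
    k     : desired number of clusters (1 < k <= n)
    Returns the maximum spacing: the smallest distance between two
    points assigned to different clusters.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    clusters = n
    edges = sorted(edges, key=lambda e: e[2])
    i = 0
    while clusters > k:                     # merge closest separated pair
        p, q, d = edges[i]
        i += 1
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rq] = rp
            clusters -= 1
    # Spacing = cheapest remaining edge whose endpoints are separated.
    for p, q, d in edges[i:]:
        if find(p) != find(q):
            return d
```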
Correctness of Clustering Algorithm 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/7lWTf/correctnessofclusteringalgorithm
XXIII. ADVANCED UNION-FIND (Week 2)
Lazy Unions [Advanced - Optional] 10 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/WX3tg/lazyunionsadvancedoptional
Union-by-Rank [Advanced - Optional] 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/RF7aH/unionbyrankadvancedoptional
Analysis of Union-by-Rank [Advanced - Optional] 14 min
Path Compression [Advanced - Optional] 14 min
Path Compression: The Hopcroft-Ullman Analysis I [Advanced - Optional] 9 min
Path Compression: The Hopcroft-Ullman Analysis II [Advanced - Optional] 11 min
The Ackermann Function [Advanced - Optional] 16 min
Path Compression: Tarjan’s Analysis I [Advanced - Optional] 14 min
Path Compression: Tarjan’s Analysis II [Advanced - Optional] 13 min
Problem Set #2
Quiz: Problem Set #2 5 questions
QUIZ
Problem Set #2
5 questions
To pass: 80% or higher
Attempts: 2 every 12 hours
Deadline
July 30, 11:59 PM PDT
1 point
1.Suppose we are given a directed graph G=(V,E) in which every edge has a distinct positive edge weight. A directed graph is acyclic if it has no directed cycle. Suppose that we want to compute the maximumweight acyclic subgraph of G (where the weight of a subgraph is the sum of its edges’ weights). Assume that G is weakly connected, meaning that there is no cut with no edges crossing it in either direction.
Here is an analog of Prim’s algorithm for directed graphs. Start from an arbitrary vertex s, initialize S={s} and F=∅. While S≠V, find the maximumweight edge (u,v) with one endpoint in S and one endpoint in V−S. Add this edge to F, and add the appropriate endpoint to S.
Here is an analog of Kruskal’s algorithm. Sort the edges from highest to lowest weight. Initialize F=∅. Scan through the edges; at each iteration, add the current edge i to F if and only if it does not create a directed cycle.
Which of the following is true?
This should not be selected
Four vertices suffice for a counterexample.
Correct
Indeed. Any ideas for a correct algorithm?
1 point
2.Consider a connected undirected graph G with edge costs that are not necessarily distinct. Suppose we replace each edge cost $c_e$ by $−c_e$; call this new graph G′. Consider running either Kruskal’s or Prim’s minimum spanning tree algorithm on G′, with ties between edge costs broken arbitrarily, and possibly differently, in each algorithm. Which of the following is true?
Correct
Different tiebreaking rules generally yield different spanning trees.
This should not be selected
Does the spanning tree produced by either algorithm depend on how ties between edge costs are broken?
1 point
3.Consider the following algorithm that attempts to compute a minimum spanning tree of a connected undirected graph G with distinct edge costs. First, sort the edges in decreasing cost order (i.e., the opposite of Kruskal’s algorithm). Initialize T to be all edges of G. Scan through the edges (in the sorted order), and remove the current edge from T if and only if it lies on a cycle of T.
Which of the following statements is true?
Correct
During the iteration in which an edge is removed, it was on a cycle C of T. By the sorted ordering, it must be the maximumcost edge of C. By an exchange argument, it cannot be a member of any minimum spanning tree. Since every edge deleted by the algorithm belongs to no MST, and its output is a spanning tree (no cycles by construction, connected by the Lonely Cut Corollary), its output must be the (unique) MST.
1 point
4.Consider an undirected graph G=(V,E) where edge e∈E has cost $c_e$. A minimum bottleneck spanning tree T is a spanning tree that minimizes the maximum edge cost $\max_{e \in T} c_e$. Which of the following statements is true? Assume that the edge costs are distinct.
Correct
For the positive statement, recall the following (from correctness of Prim’s algorithm): for every edge e of the MST, there is a cut (A,B) for which e is the cheapest one crossing it. This implies that every other spanning tree has maximum edge cost at least as large. For the negative statement, use a triangle with one extra high-cost edge attached.
This should not be selected
Is the minimum bottleneck spanning tree unique (even with distinct edge costs)?
1 point
5.You are given a connected undirected graph G with distinct edge costs, in adjacency list representation. You are also given the edges of a minimum spanning tree T of G. This question asks how quickly you can recompute the MST if we change the cost of a single edge. Which of the following are true? [RECALL: It is not known how to deterministically compute an MST from scratch in O(m) time, where m is the number of edges of G.] [Check all that apply.]
Correct
Let A,B be the two connected components of T−{e}. Edge e no longer belongs to the new MST if and only if it is no longer the cheapest edge crossing the cut (A,B) (this can be checked in O(m) time). If f is the new cheapest edge crossing (A,B), then the new MST is T−{e}∪{f}.
Correct
The MST does not change (by the Cut Property), so no recomputation is needed.
Correct
Let C be the cycle of T∪{e}. Edge e belongs to the new MST if and only if it is no longer the most expensive edge of C (this can be checked in O(n) time). If f is the new most expensive edge of C, then the new MST is T∪{e}−{f}.
Correct
The MST does not change (by the Cycle Property of the previous problem), so no recomputation is needed.
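The O(n)-time cycle check described in these answers is easy to sketch in code. Below is a minimal illustration (the helper names `tree_path_edges` and `new_mst_edges` are mine, not from the course): given the MST's edges and a non-tree edge whose cost has dropped, find the tree path between its endpoints, and swap the edge in only if it beats the most expensive edge on that cycle.

```python
def tree_path_edges(tree_adj, a, b):
    """Edges (u, v, cost) on the unique a-b path of a tree."""
    stack, parent = [a], {a: None}
    while stack:                          # DFS from a, recording parents
        x = stack.pop()
        for y, c in tree_adj.get(x, []):
            if y not in parent:
                parent[y] = (x, c)
                stack.append(y)
    path = []
    while parent[b] is not None:          # walk back from b to a
        x, c = parent[b]
        path.append((x, b, c))
        b = x
    return path

def new_mst_edges(mst_edges, e, new_cost):
    """MST after non-tree edge e = (a, b) has its cost lowered to new_cost.

    The cycle of T + e is the a-b tree path plus e itself; e enters the
    new MST iff it is not the most expensive edge of that cycle, evicting
    the current maximum (the O(n) check from the quiz answer above).
    """
    tree_adj = {}
    for u, v, c in mst_edges:
        tree_adj.setdefault(u, []).append((v, c))
        tree_adj.setdefault(v, []).append((u, c))
    a, b = e
    wu, wv, wc = max(tree_path_edges(tree_adj, a, b), key=lambda t: t[2])
    if wc <= new_cost:
        return mst_edges                  # e is still the cycle's max: no change
    return [t for t in mst_edges if {t[0], t[1]} != {wu, wv}] + [(a, b, new_cost)]
```

For example, on the triangle 1-2-3 with MST edges (1,2) of cost 1 and (2,3) of cost 2, dropping the cost of (1,3) to 1.5 evicts (2,3).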
Optional Theory Problems (Week 2) 10 min
The following problems are for those of you looking to challenge yourself beyond the required problem sets and programming questions. They are completely optional and will not be graded. While they vary in level, many are pretty challenging, and we strongly encourage you to discuss ideas and approaches with your fellow students on the “Theory Problems” discussion forum.
 Consider a connected undirected graph G with edge costs, which need not be distinct. Prove the following statement or provide a counterexample: for every MST T of G, there exists a way to sort G’s edges in nondecreasing order of cost so that Kruskal’s algorithm outputs the tree T.
 Recall the definition of a minimum bottleneck spanning tree from Problem Set #2. Give a linear-time (i.e., O(m)) algorithm for computing a minimum bottleneck spanning tree of a connected undirected graph. [Hint: make use of a nontrivial linear-time algorithm discussed in Part 1 of the specialization.]
 Consider a connected undirected graph G with distinct edge costs that are positive integers between 1 and $n^3$, where n is the number of vertices of G. How fast can you compute the MST of G?
 Read about matroids and the greedy algorithm that correctly computes a maximum-weight basis of a matroid. For the matroid of spanning trees of a graph, this algorithm becomes Kruskal’s algorithm. Can you formulate an analog of Prim’s MST algorithm for matroids?
 Prove that our analysis of union-find with lazy unions and union by rank (but without path compression) is asymptotically optimal (i.e., there are sequences of operations where you do Θ(log n) work on most of the operations).
 Prove that in our union-find data structure with lazy unions, union by rank, and path compression, some operations might require Θ(log n) time.
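The last two problems concern the union-find data structure from lecture. For reference, here is a minimal sketch (my own illustration, not course code) with lazy unions, union by rank, and path compression:

```python
class UnionFind:
    """Union-find with lazy unions, union by rank, and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression: repoint to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        """Merge the groups of x and y; return False if already merged."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:  # union by rank: shallower under deeper
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

With both optimizations, a sequence of m operations takes O(m α(n)) time; the problems above ask you to show the Θ(log n) bounds that hold without (and occasionally even with) compression.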
Programming Assignment #2
Quiz: Programming Assignment #2 2 questions Due in 6 days
QUIZ
Programming Assignment #2
2 questions
To Pass: 100% or higher
Attempts: 10 every 12 hours
Deadline
July 30, 11:59 PM PDT
1 point
1.In this programming problem and the next you’ll code up the clustering algorithm from lecture for computing a max-spacing k-clustering.
Download the text file below.
This file describes a distance function (equivalently, a complete graph with edge costs). It has the following format:
[number_of_nodes]
[edge 1 node 1] [edge 1 node 2] [edge 1 cost]
[edge 2 node 1] [edge 2 node 2] [edge 2 cost]
…
There is one edge (i,j) for each choice of 1≤i<j≤n, where n is the number of nodes.
For example, the third line of the file is “1 3 5250”, indicating that the distance between nodes 1 and 3 (equivalently, the cost of the edge (1,3)) is 5250. You can assume that distances are positive, but you should NOT assume that they are distinct.
Your task in this problem is to run the clustering algorithm from lecture on this data set, where the target number k of clusters is set to 4. What is the maximum spacing of a 4-clustering?
ADVICE: If you’re not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!
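A sketch of the greedy clustering algorithm from lecture, under assumed input conventions (nodes numbered 1..n, edges as `(cost, u, v)` tuples): run Kruskal-style union-find merges on edges in increasing cost order until k clusters remain, then report the cheapest edge that still crosses two clusters.

```python
def max_spacing(n, edges, k):
    """Max-spacing of a k-clustering (single-link / Kruskal-style greedy)."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(edges)                  # increasing cost
    clusters, i = n, 0
    while clusters > k:                    # merge closest pairs first
        cost, u, v = edges[i]
        i += 1
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            clusters -= 1
    # spacing = cheapest remaining edge with endpoints in different clusters
    for cost, u, v in edges[i:]:
        if find(u) != find(v):
            return cost
```

Ties in edge cost (which this data set allows) are broken arbitrarily here; the spacing value is unaffected.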
1 point
2.In this question your task is again to run the clustering algorithm from lecture, but on a MUCH bigger graph. So big, in fact, that the distances (i.e., edge costs) are only defined implicitly, rather than being provided as an explicit list.
The data set is below.
The format is:
[# of nodes] [# of bits for each node’s label]
[first bit of node 1] … [last bit of node 1]
[first bit of node 2] … [last bit of node 2]
…
For example, the third line of the file “0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1” denotes the 24 bits associated with node #2.
The distance between two nodes u and v in this problem is defined as the Hamming distance (the number of differing bits) between the two nodes’ labels. For example, the Hamming distance between the 24-bit label of node #2 above and the label “0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0 1” is 3 (since they differ in the 3rd, 7th, and 21st bits).
The question is: what is the largest value of k such that there is a k-clustering with spacing at least 3? That is, how many clusters are needed to ensure that no pair of nodes with all but 2 bits in common get split into different clusters?
NOTE: The graph implicitly defined by the data file is so big that you probably can’t write it out explicitly, let alone sort the edges by cost. So you will have to be a little creative to complete this part of the question. For example, is there some way you can identify the smallest distances without explicitly looking at every pair of nodes?
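One such approach (a sketch; the function name and conventions are mine): read each label as an integer, index nodes by label in a dict, and for each node probe only the labels reachable by flipping at most 2 of its bits, unioning any matches. That is about n·(1 + b + b(b−1)/2) dictionary lookups instead of examining all ~n²/2 pairs.

```python
from itertools import combinations

def count_clusters(labels, num_bits):
    """Clusters remaining after merging all pairs at Hamming distance <= 2,
    i.e. the largest k with spacing >= 3. labels: list of ints."""
    parent = list(range(len(labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    index = {}                              # label value -> node indices
    for i, lab in enumerate(labels):
        index.setdefault(lab, []).append(i)

    # All bit-flip masks of Hamming weight 0, 1, or 2.
    masks = ([0]
             + [1 << b for b in range(num_bits)]
             + [(1 << b1) | (1 << b2)
                for b1, b2 in combinations(range(num_bits), 2)])
    for i, lab in enumerate(labels):
        for m in masks:
            for j in index.get(lab ^ m, []):   # nodes within distance <= 2
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(len(labels))})
```

Duplicate labels (distance 0) are handled automatically, since they share a bucket in the index.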
https://github.com/dbarabanov/coursera/blob/master/algorithms_2/assignment_2/question_1.py
Week 3
Huffman codes; introduction to dynamic programming.
XXIV. HUFFMAN CODES (Week 3)
Week 3 Overview 10 min
HUFFMAN CODES: Everybody loves compression. It means more songs on your smartphone, faster downloads, and so on. Huffman coding is a fundamental type of lossless compression, used for example in the MP3 standard. In these videos we’ll learn about the optimality of Huffman codes, and a blazingly fast greedy algorithm for computing them.
DYNAMIC PROGRAMMING: This week also introduces the dynamic programming design paradigm; next week we’ll see a selection of killer applications. For now, we content ourselves with the development of a linear-time algorithm for a relatively simple problem, computing a maximum-weight independent set of a path graph. We conclude the week by zooming out and identifying the key principles of dynamic programming.
HOMEWORK #3: The third problem set and programming assignment reinforce the concepts and algorithms studied in the lectures, and the programming assignment asks you to implement the greedy algorithm for Huffman coding and the dynamic programming algorithm for finding a maximum-weight independent set of a path.
SUGGESTED READINGS FOR WEEK 3:
 CLRS Chapter 16 (Section 3) and Chapter 15 (Section 3)
 DPV Sections 5.2 and 6.7
 KT Sections 4.8, 6.1, 6.2
 SW Section 5.5
Introduction and Motivation 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/plgXS/introductionandmotivation
fixed-length
variable-length (prefix-free)
Problem Definition 10 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/IWxVe/problemdefinition
A binary tree T minimizing the average encoding length L.
A Greedy Algorithm 16 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/ZIJwv/agreedyalgorithm
Huffman’s Algorithm
A More Complex Example 4 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/rTB4s/amorecomplexexample
Correctness Proof I 10 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/eSz8f/correctnessproofi
Correctness Proof II 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/l3Ss5/correctnessproofii
XXV. INTRODUCTION TO DYNAMIC PROGRAMMING (Week 3)
Introduction: Weighted Independent Sets in Path Graphs 7 min
max-weight independent set
WIS in Path Graphs: Optimal Substructure 9 min
Upshot: A max-weight IS must be either
(i) a max-weight IS of G′, or
(ii) $v_n$ plus a max-weight IS of G′′
WIS in Path Graphs: A LinearTime Algorithm 9 min
A[0] = 0; A[1] = w_1
For i = 2, 3, …, n:
A[i] = max{ A[i - 1], A[i - 2] + w_i }
WIS in Path Graphs: A Reconstruction Algorithm 6 min
 Let S = ∅ and i = n
 While i >= 2 [scan through array from right to left]
 If A[i - 1] >= A[i - 2] + w_i [i.e., case 1 wins]
 Decrease i by 1
 Else [i.e., case 2 wins]
 Add v_i to S, decrease i by 2
 If i = 1, add v_1 to S
 Return S
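Putting the forward DP pass and the right-to-left reconstruction together, a minimal sketch in Python (1-based vertex numbering as in lecture):

```python
def max_weight_is(weights):
    """Linear-time max-weight independent set of a path graph.

    Returns (optimal value, set of chosen 1-based vertices)."""
    n = len(weights)
    A = [0] * (n + 1)                     # A[i] = opt value of first i vertices
    if n >= 1:
        A[1] = weights[0]
    for i in range(2, n + 1):
        A[i] = max(A[i - 1], A[i - 2] + weights[i - 1])

    S, i = set(), n                       # reconstruction: scan right to left
    while i >= 2:
        if A[i - 1] >= A[i - 2] + weights[i - 1]:   # case 1 wins
            i -= 1
        else:                                       # case 2: vertex i is in
            S.add(i)
            i -= 2
    if i == 1:                            # lone leftmost vertex remains
        S.add(1)
    return A[n], S
```

On the path with weights 1, 4, 5, 4, this picks vertices 2 and 4 for a total of 8.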
Principles of Dynamic Programming 7 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/VEc7L/principlesofdynamicprogramming
Problem Set #3
Quiz: Problem Set #3 5 questions
QUIZ
Problem Set #3
5 questions
To Pass: 80% or higher
Attempts: 2 every 12 hours
Deadline
August 6, 11:59 PM PDT
1 point
1.Consider an alphabet with five letters, {a,b,c,d,e}, and suppose we know the frequencies f_a=0.32, f_b=0.25, f_c=0.2, f_d=0.18, and f_e=0.05. What is the expected number of bits used by Huffman’s coding scheme to encode a 1000-letter document?
merge d and e: a:0.32, b:0.25, c:0.2, de:0.23
merge c and de: a:0.32, b:0.25, cde:0.43
merge a and b: ab:0.57, cde:0.43
final merge: abcde:1.0
codewords: a:00, b:01, c:10, d:110, e:111
expected bits per letter: 0.32×2 + 0.25×2 + 0.2×2 + 0.18×3 + 0.05×3 = 2.23
total: 2.23 × 1000 = 2230 bits
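The hand computation above can be cross-checked mechanically: every merge of total weight w adds one bit to each symbol beneath it, so the expected codeword length is simply the sum of the merged weights over all n−1 merges.

```python
import heapq

def huffman_expected_length(freqs):
    """Expected codeword length of a Huffman code for the given frequencies.

    Each merge of weight w lengthens every codeword under it by 1 bit,
    so the answer is the sum of all merged weights."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

print(huffman_expected_length([0.32, 0.25, 0.2, 0.18, 0.05]))  # ~2.23
```

Multiplying by 1000 letters reproduces the 2230-bit answer.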
1 point
2.Under a Huffman encoding of n symbols, how long (in terms of number of bits) can a codeword possibly be?
This should not be selected
This is the bestcase scenario for the length of the longest codeword; in general the longest codeword can be longer than this.
Correct
For the lower bound, take frequencies proportional to powers of 2. For the upper bound, note that the total number of merges is exactly n−1.
1 point
3.Which of the following statements holds for Huffman’s coding scheme?
Such a letter will endure a merge in at least two iterations: the last one (which involves all letters), and at least one previous iteration. In the penultimate iteration, if the letter has not yet endured a merge, at least one of the two other remaining subtrees has cumulative frequency at least (1−.33)/2>.33, so the letter will get merged in this iteration.
This should not be selected
Counterexample: frequencies 40%, 40%, and 20%.
This should not be selected
Counterexample: frequencies 40%, 30%, and 30%.
1 point
4.Which of the following is true for our dynamic programming algorithm for computing a maximumweight independent set of a path graph? (Assume there are no ties.)
Correct
By induction, since the optimal solution to a subproblem depends only on the solutions of the previous two subproblems.
1 point
5.Recall our dynamic programming algorithm for computing the maximum-weight independent set of a path graph. Consider the following proposed extension to more general graphs. Consider an undirected graph with positive vertex weights. For a vertex v, obtain the graph G′(v) by deleting v and its incident edges from G, and obtain the graph G′′(v) from G by deleting v, its neighbors, and all of the corresponding incident edges from G. Let OPT(H) denote the value of a maximum-weight independent set of a graph H. Consider the formula OPT(G) = max{OPT(G′(v)), w_v + OPT(G′′(v))}, where v is an arbitrary vertex of G of weight w_v. Which of the following statements is true?
This should not be selected
Do weighted independent sets in trees have nice optimal substructure?
Correct
Indeed. What running time can you get?
Programming Assignment #3
Quiz: Programming Assignment #3 3 questions
QUIZ
Programming Assignment #3
3 questions
To Pass: 100% or higher
Attempts: 10 every 12 hours
Deadline
August 6, 11:59 PM PDT
1 point
1.In this programming problem and the next you’ll code up the greedy algorithm from the lectures on Huffman coding.
Download the text file below.
huffman.txt
This file describes an instance of the problem. It has the following format:
[number_of_symbols]
[weight of symbol #1]
[weight of symbol #2]
…
For example, the third line of the file is “6852892,” indicating that the weight of the second symbol of the alphabet is 6852892. (We’re using weights instead of frequencies, like in the “A More Complex Example” video.)
Your task in this problem is to run the Huffman coding algorithm from lecture on this data set. What is the maximum length of a codeword in the resulting Huffman code?
ADVICE: If you’re not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!
1 point
2.Continuing the previous problem, what is the minimum length of a codeword in your Huffman code?
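For both of these questions, one way to avoid building the tree explicitly is to carry each subtree's deepest and shallowest leaf depths through the heap merges (a sketch of mine, not the official solution; when weights tie, the arbitrary tie-break can change individual codeword lengths, though not the expected length):

```python
import heapq

def huffman_depth_range(weights):
    """(max, min) codeword length of a Huffman code for the given weights."""
    # Heap items: (weight, tiebreak, max_leaf_depth, min_leaf_depth);
    # the unique tiebreak keeps tuple comparison away from the depths.
    heap = [(w, i, 0, 0) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    tiebreak = len(weights)
    while len(heap) > 1:
        w1, _, mx1, mn1 = heapq.heappop(heap)
        w2, _, mx2, mn2 = heapq.heappop(heap)
        # Merging hangs both subtrees one level deeper under a new root.
        heapq.heappush(heap, (w1 + w2, tiebreak,
                              max(mx1, mx2) + 1, min(mn1, mn2) + 1))
        tiebreak += 1
    _, _, mx, mn = heap[0]
    return mx, mn
```

For weights roughly proportional to powers of two, e.g. 1, 1, 2, 4, 8, the code degenerates into a path and the longest codeword has length n−1.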
1 point
3.In this programming problem you’ll code up the dynamic programming algorithm for computing a maximumweight independent set of a path graph.
Download the text file below.
mwis.txt
This file describes the weights of the vertices in a path graph (with the weights listed in the order in which vertices appear in the path). It has the following format:
[number_of_vertices]
[weight of first vertex]
[weight of second vertex]
…
For example, the third line of the file is “6395702,” indicating that the weight of the second vertex of the graph is 6395702.
Your task in this problem is to run the dynamic programming algorithm (and the reconstruction procedure) from lecture on this data set. The question is: of the vertices 1, 2, 3, 4, 17, 117, 517, and 997, which ones belong to the maximum-weight independent set? (By “vertex 1” we mean the first vertex of the graph; there is no vertex 0.) In the box below, enter an 8-bit string, where the i-th bit should be 1 if the i-th of these 8 vertices is in the maximum-weight independent set, and 0 otherwise. For example, if you think that the vertices 1, 4, 17, and 517 are in the maximum-weight independent set and the other four vertices are not, then you should enter the string 10011010 in the box below.
Week 4
Advanced dynamic programming:
the knapsack problem,
sequence alignment,
and optimal binary search trees.
XXVI. THE KNAPSACK PROBLEM (Week 4)
Week 4 Overview 10 min
DYNAMIC PROGRAMMING BOOT CAMP: This week is devoted to the dynamic programming design paradigm and a selection of killer applications: the famous Knapsack problem in Part XXVI, the sequence alignment problem in Part XXVII, and computing optimal binary search trees in Part XXVIII.
HOMEWORK #4: The fourth problem set and programming assignment reinforce the concepts and algorithms studied in the lectures, and the programming assignment asks you to implement a dynamic programming algorithm for the Knapsack problem.
SUGGESTED READINGS FOR WEEK 4:
 CLRS Chapter 15
 DPV Chapter 6
 KT Sections 6.3-6.6
The Knapsack Problem 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/LIgLJ/theknapsackproblem
Developing a Dynamic Programming Algorithm
Step 1: Formulate a recurrence [optimal solution as a function of solutions to smaller subproblems] based on the structure of an optimal solution.
Cases 1 and 2, with proofs.
A Dynamic Programming Algorithm 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/0n68L/adynamicprogrammingalgorithm
knapsack algorithm
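A compact version of the lecture's table-filling knapsack algorithm. This sketch collapses the two-dimensional table A[i][x] into a single rolling array (a standard space optimization of what the lecture presents); iterating capacities downward ensures each item is counted at most once.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack, O(n * W) time, O(W) space.

    A[x] holds the best value achievable with the items seen so far
    and residual capacity x."""
    A = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Descending capacities: A[x - w] still refers to the previous
        # item's row, so each item is used at most once.
        for x in range(capacity, w - 1, -1):
            A[x] = max(A[x], A[x - w] + v)
    return A[capacity]
```

Weights and the capacity must be integers (as the programming assignment guarantees); values may be arbitrary.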
Example [Review  Optional] 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/LADQc/examplereviewoptional
XXVII. SEQUENCE ALIGNMENT (Week 4)
Optimal Substructure 13 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/QJkyp/optimalsubstructure
A Dynamic Programming Algorithm 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/tNmae/adynamicprogrammingalgorithm
XXVIII. OPTIMAL BINARY SEARCH TREES (Week 4)
Problem Definition 12 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/GKCeN/problemdefinition
Optimal Substructure 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/rUDLu/optimalsubstructure
Proof of Optimal Substructure 6 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/0qjbs/proofofoptimalsubstructure
A Dynamic Programming Algorithm I 9 min
https://www.coursera.org/learn/algorithmsgreedy/lecture/3wrTN/adynamicprogrammingalgorithmi
A Dynamic Programming Algorithm II 9 min
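The optimal BST recurrence from these lectures — for keys i..j, C(i,j) = (sum of the frequencies of keys i..j) + min over roots r of C(i,r−1) + C(r+1,j) — in a short O(n³) sketch (my own illustration):

```python
def optimal_bst_cost(freqs):
    """Minimum average search time over all BSTs on keys 1..n.

    freqs[k-1] is the search frequency of key k; C[i][j] is the optimal
    cost for the contiguous key range i..j (0 when the range is empty)."""
    n = len(freqs)
    C = [[0.0] * (n + 2) for _ in range(n + 2)]
    for s in range(n):                     # s = j - i, smallest ranges first
        for i in range(1, n - s + 1):
            j = i + s
            total = sum(freqs[i - 1:j])    # every key in range costs 1 visit
            C[i][j] = total + min(C[i][r - 1] + C[r + 1][j]
                                  for r in range(i, j + 1))
    return C[1][n]
```

For instance, with three keys of frequencies 0.2, 0.5, 0.3, the optimal tree roots the middle key, for average search time 0.5·1 + 0.2·2 + 0.3·2 = 1.5. Precomputing the range sums improves this to O(n³) with O(1) work per (i, j, r) triple; the optional problem below asks for O(n²) overall.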
Problem Set #4
Quiz: Problem Set #4 5 questions
QUIZ
Problem Set #4
5 questions
To Pass: 80% or higher
Attempts: 2 every 12 hours
Deadline
August 13, 11:59 PM PDT
1 point
1.Consider a variation of the Knapsack problem where we have two knapsacks, with integer capacities W1 and W2. As usual, we are given n items with positive values and positive integer weights. We want to pick subsets S1, S2 with maximum total value (i.e., ∑_{i∈S1} v_i + ∑_{i∈S2} v_i) such that the total weights of S1 and S2 are at most W1 and W2, respectively. Assume that every item fits in either knapsack (i.e., w_i ≤ min{W1, W2} for every item i). Consider the following two algorithmic approaches. (1) Use the algorithm from lecture to pick a max-value feasible solution S1 for the first knapsack, and then run it again on the remaining items to pick a max-value feasible solution S2 for the second knapsack. (2) Use the algorithm from lecture to pick a max-value feasible solution for a knapsack with capacity W1+W2, and then split the chosen items into two sets S1, S2 that have size at most W1 and W2, respectively. Which of the following statements is true?
Correct
Indeed. Can you devise from scratch a dynamic programming algorithm that correctly solves the problem?
This should not be selected
There’s a counterexample with 4 items.
1 point
2.Recall the dynamic programming algorithms from lecture for the Knapsack and sequence alignment problems. Both fill in a twodimensional table using a doublefor loop. Suppose we reverse the order of the two for loops. (I.e., cut and paste the second for loop in front of the first for loop, without otherwise changing the text in any way.) Are the resulting algorithms still well defined and correct?
Correct
The necessary subproblem solutions are still available for constanttime lookup.
1 point
3.Consider an instance of the optimal binary search tree problem with 7 keys (say 1,2,3,4,5,6,7 in sorted order) and frequencies w_1=.05, w_2=.4, w_3=.08, w_4=.04, w_5=.1, w_6=.1, w_7=.23. What is the minimum-possible average search time of a binary search tree with these keys?
Correct
1 point
4.The following problems all take as input two strings X and Y, of length m and n, over some alphabet Σ. Which of them can be solved in O(mn) time? [Check all that apply.]
Correct
Similar dynamic programming to sequence alignment, with one subproblem for each X_i and Y_j. Alternatively, this reduces to sequence alignment by setting the gap penalty to 1 and making the penalty of matching two different characters very large.
Correct
This problem can be solved in O(n) time, without dynamic programming. Just count the frequency of each symbol in each string. The permutation f exists if and only if every symbol occurs exactly the same number of times in each string.
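That counting argument fits in two lines with Python's `Counter` (a trivial but concrete illustration):

```python
from collections import Counter

def is_permutation(x, y):
    """O(n) check, no DP needed: a symbol-permuting map f exists iff
    every symbol occurs the same number of times in both strings."""
    return Counter(x) == Counter(y)
```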
Correct
Variation on the original sequence alignment dynamic program. With each subproblem, you need to keep track of what gaps you insert, since the costs you incur in the current position depend on whether or not the previous subproblems inserted gaps. Blows up the number of subproblems and running time by a constant factor.
Correct
Similar dynamic programming to sequence alignment, with one subproblem for each X_i and Y_j.
1 point
5.Recall our dynamic programming algorithms for maximum-weight independent set, sequence alignment, and optimal binary search trees. The space requirements of these algorithms are proportional to the number of subproblems that get solved: Θ(n) (where n is the number of vertices), Θ(mn) (where m, n are the lengths of the two strings), and Θ(n²) (where n is the number of keys), respectively.
Suppose we only want to compute the value of an optimal solution (the final answer of the first, forward pass) and don’t care about actually reconstructing an optimal solution (i.e., we skip the second, reverse pass over the table). How much space do you then really need to run each of the three algorithms?
This should not be selected
By throwing away old subproblems that you won’t ever need again, you can save space.
This should not be selected
How do you avoid keeping track of Θ(n²) subproblem solutions when computing an optimal binary search tree?
Optional Theory Problems (Week 4) 10 min
Give a dynamic programming algorithm that computes an optimal binary search tree and runs in O(n²) time.
Programming Assignment #4
Quiz: Programming Assignment #4 2 questions
QUIZ
Programming Assignment #4
2 questions
To Pass: 100% or higher
Attempts: 10 every 12 hours
Deadline
August 13, 11:59 PM PDT
1 point
1.In this programming problem and the next you’ll code up the knapsack algorithm from lecture.
Let’s start with a warmup. Download the text file below.
knapsack1.txt
This file describes a knapsack instance, and it has the following format:
[knapsack_size] [number_of_items]
[value_1] [weight_1]
[value_2] [weight_2]
…
For example, the third line of the file is “50074 659”, indicating that the second item has value 50074 and size 659, respectively.
You can assume that all numbers are positive. You should assume that item weights and the knapsack capacity are integers.
In the box below, type in the value of the optimal solution.
ADVICE: If you’re not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!
1 point
2.This problem also asks you to solve a knapsack instance, but a much bigger one.
Download the text file below.
knapsack_big.txt
This file describes a knapsack instance, and it has the following format:
[knapsack_size] [number_of_items]
[value_1] [weight_1]
[value_2] [weight_2]
…
For example, the third line of the file is “50074 834558”, indicating that the second item has value 50074 and size 834558, respectively. As before, you should assume that item weights and the knapsack capacity are integers.
This instance is so big that the straightforward iterative implementation uses an infeasible amount of time and space. So you will have to be creative to compute an optimal solution. One idea is to go back to a recursive implementation, solving subproblems — and, of course, caching the results to avoid redundant work — only on an “as needed” basis. Also, be sure to think about appropriate data structures for storing and looking up solutions to subproblems.
In the box below, type in the value of the optimal solution.
ADVICE: If you’re not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!
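One concrete way to realize the "as needed" idea above (a sketch, not the official solution): a top-down recursion memoized with `lru_cache`, whose backing dict only ever stores the subproblems actually reached. For the real instance you would raise the recursion limit further or convert the recursion to an explicit stack.

```python
import sys
from functools import lru_cache

def knapsack_memoized(values, weights, capacity):
    """Top-down 0/1 knapsack: solve subproblems only on demand,
    caching results so no subproblem is solved twice."""
    sys.setrecursionlimit(10000)   # recursion depth is at most n + a little

    @lru_cache(maxsize=None)       # the cache dict is the lookup structure
    def best(i, cap):
        """Optimal value using only the first i items with capacity cap."""
        if i == 0:
            return 0
        skip = best(i - 1, cap)
        if weights[i - 1] > cap:   # item i doesn't fit
            return skip
        return max(skip, best(i - 1, cap - weights[i - 1]) + values[i - 1])

    return best(len(values), capacity)
```

Compared with the full table, this touches only the capacity values reachable by some subset of item weights, which is what makes the big instance tractable.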
Final Exam (1 attempt per 24 hours)
Info and FAQ for final exam 10 min
Unlike the problem sets, you have only ONE attempt per 24 hours. Thus DO NOT START THE EXAM until you are ready.
There are 10 questions, and each is worth 1 point. Many are of the “select all that apply” type, rather than the strict multiple choice format that is more common on the problem sets. You will receive partial credit for each option that you leave correctly checked or correctly unchecked. (So if you mark 4 out of the 5 options for a problem correctly, you’ll receive 0.8 out of the 1 point.) You need 7 points total to pass the exam. All questions are about material covered in the required (i.e., nonoptional) videos.
Roughly half of the questions follow fairly directly from the lecture material (assuming that you understand it thoroughly). Roughly a quarter of the questions are variations on problem set questions. Roughly a quarter of the questions demand more thought, for example by asking you to consider whether certain results from lecture do or do not extend to more general situations.
Good luck!
Quiz: Final Exam 10 questions
QUIZ
Final Exam
10 questions
To Pass: 70% or higher
Attempts: 1 every 1 day
Deadline
August 13, 11:59 PM PDT
1 point
1.Consider a connected undirected graph with distinct edge costs. Which of the following are true? [Check all that apply.]
correct
1 point
2.You are given a connected undirected graph G with distinct edge costs, in adjacency list representation. You are also given the edges of a minimum spanning tree T of G. This question asks how quickly you can recompute the MST if we change the cost of a single edge. Which of the following are true? [RECALL: It is not known how to deterministically compute an MST from scratch in O(m) time, where m is the number of edges of G.] [Check all that apply.]
correct
1 point
3.Which of the following graph algorithms can be sped up using the heap data structure?
correct
1 point
4.Which of the following problems reduce, in a straightforward way, to the minimum spanning tree problem? [Check all that apply.]
correct
1 point
5.Recall the greedy clustering algorithm from lecture and the maxspacing objective function. Which of the following are true? [Check all that apply.]
correct
1 point
6.We are given as input a set of n jobs, where job j has a processing time p_j and a deadline d_j. Recall the definition of completion times C_j from the video lectures. Given a schedule (i.e., an ordering of the jobs), we define the lateness l_j of job j as the amount of time C_j − d_j by which job j completes after its deadline, or as 0 if C_j ≤ d_j.
Our goal is to minimize the total lateness,
∑_j l_j.
Which of the following greedy rules produces an ordering that minimizes the total lateness?
You can assume that all processing times and deadlines are distinct.
WARNING: This is similar to but not identical to a problem from Problem Set #1 (the objective function is different).
correct
1 point
7.Consider an alphabet with five letters, {a,b,c,d,e}, and suppose we know the frequencies f_a=0.28, f_b=0.27, f_c=0.2, f_d=0.15, and f_e=0.1. What is the expected number of bits used by Huffman’s coding scheme to encode a 1000-letter document?
correct
1 point
8.Which of the following extensions of the Knapsack problem can be solved in time polynomial in n, the number of items, and M, the largest number that appears in the input? [Check all that apply.]
correct
1 point
9.The following problems all take as input two strings X and Y, of length m and n, over some alphabet Σ. Which of them can be solved in O(mn) time? [Check all that apply.]
correct
1 point
10.Consider an instance of the optimal binary search tree problem with 7 keys (say 1,2,3,4,5,6,7 in sorted order) and frequencies w_1=.2, w_2=.05, w_3=.17, w_4=.1, w_5=.2, w_6=.03, w_7=.25. What is the minimum-possible average search time of a binary search tree with these keys?
correct