Quiz: Print All Links
print_all_links must keep going until there are no more links to print. Think about looping forever (set while loop condition) until there are no more links (i.e. else:). What do you do when there are no more links (body of else: condition)?
At 1:26, Dave uses a procedure get_page(). The code for this procedure is given later in the course, in Lesson 4. This is the code:
def get_page(url): try: import urllib return urllib.urlopen(url).read() except: return ''
Include this code above your get_next_target() procedure in your answer.
Quiz: Crawling Process
Pseudo code from the video:
start with tocrawl = [seed] crawled =  while there are more pages tocrawl: pick a page from tocrawl add that page to crawled add all the link targets on this page to tocrawl return crawled
The seed page where crawling begins.
Quiz: Finishing Crawl Web
Hint: at some point, you will have to call get_page on page. It seems counterintuitive, but we use the word page to refer to both the url and the html of a webpage. The get_page procedure takes a url and returns html.
showing list1 and list2: [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, [5, 6]]
Welcome to Unit 4!
By the end of this unit, we’ll have a working search engine that can crawl and build an index of set of web pages, and respond to keyword queries!
You’ll also learn about designing and using complex data structures that build on the list structure we introduced in the previous unit.
Ben Bitdiddle suggests changing the index code by replacing the add_to_index and lookup procedures with the ones shown below the question.
def add_to_index(index, keyword, url): index.append([keyword, url]) def lookup(index, keyword): result =  for entry in index: if entry == keyword: result.append(entry) return result
This changes the structure of index, but suppose the only way we use index is by calling add_to_index and lookup. How would this affect the search engine?
**It would produce the wrong results for some lookup queries.
It would produce the same results for all queries, but lookup would sometimes be faster than the original code.
It would produce the same results for all queries, but add_to_index would be faster and lookup would usually be slower than the original code.
It would produce the same results and take the same amount of time for all queries**
def add_to_index(index, keyword, url):
for entry in index:
if entry == keyword:
# not found, add new keyword to index index.append([keyword, [url]]) def lookup(index, keyword): for entry in index: if entry == keyword: return entry return 
Welcome to Unit 5!
The main goal of Unit 5 is to learn about how computer scientists measure cost, which is mostly about understanding how the resources needed to run a program scale with the size of its input. We’ll also learn about implementing and using a hash table, a data structure that will massively improve the performance of our search engine.
It was a privilege to meet with Gabriel Weinberg, the founder of DuckDuckGo, to film this introduction. DuckDuckGo protects the privacy of its users and gets around 3 million searches per day. Gabriel’s blog is full of interesting articles about computing and startups.
>>> time_execution('lookup(index10000, "udacity")') (None, 0.000968000000000302) >>> time_execution('lookup(index10000, "udacity")') (None, 0.000905999999863066) >>> time_execution('lookup(index100000, "udacity")') (None, 0.008590000000002652) >>> time_execution('lookup(index100000, "udacity")') (None, 0.008517999999998093)
This quiz depends on the code for
make_big_index(size) from a few segments before:
def make_big_index(size): index =  letters = ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'] while len(index) < size: word = make_string(letters) add_to_index(index, word, 'fake') for i in range(len(letters) - 1, 0, -1): if letters[i] < 'z': letters[i] = chr(ord(letters[i]) + 1) break else: letters[i] = 'a' return index
This quiz depends on the code for
make_big_index(size) from a few segments before (as well as the code for
My claim about the performance being better with the
% buckets inside the loop is (often, and possibly always?) incorrect. Some enterprising students have done experiments showing this, and there is more discussion in the forum.
For further information on Hash Tables in Python, please refer to this article here
Here’s the code in a more readable format:
Here’s the code in a more readable way: (thanks to Christina-49) def get_all_links(page): links =  while True: url, endpos = get_next_target(page) if url: links.append(url) page = page[endpos:] else: break return links
Here’s the code in a more readable format: (thanks to Christina-49)
For which of these procedures does the worst-case running time scale linearly in the number of elements in the input list p? (You may assume that the elements in the list are all small numbers)
def sum_list(p): sum = 0 for e in p: sum = sum + e return sum Has_duplicate_element def has_duplicate_element(p): res =  for i in range(0, len(p)): for j in range(0, len(p)): if i != j and p[i] == p[j]: return True return False Mystery def mystery(p): i = 0 while True: if i >= len(p): break if p[i] % 2: i = i + 2 else: i = i + 1 return i
Peter muddles up odd and even in the last question. The statement p[i] % 2 is True whenp[i] is odd and False when it is even, so the worst case is when all the elements in the list are even.
Suppose we have a hash table implemented as described in Unit 5 using the hash_string function.
def hash_string(keyword, buckets): h = 0 for c in keyword: h = (h + ord(c)) % buckets return h
Which of the following are true statements?
The number of string comparisons done to lookup a keyword that is not a key in the hash table may be less than the number needed to lookup a keyword that is a key in the hash table.
We should expect the time to lookup most keywords in the hash table will decrease as we increase the number of buckets.
It is always better to have more buckets in a hash table.
The time to lookup a keyword in the hash table is always less than the time it would take in a linear time list (as used in Unit 4).
The notes for Unit 6 are here: PDF and web.
This unit introduces what I think is the most fascinating and powerful idea in all of computing - recursive definitions. Understanding them requires some mind-bending gymnastics, but once you do, you will find elegant and powerful new ways to think about nearly all problems you encounter.
The course moves through this pretty quickly, but fortunately many students have contributed great additional resources that explain things very well and with more detail than I do in the course, and give you more practice with recursive programs.
Here are a few that I think are especially good:
Around ~0:06, Dave says that factorial takes a positive whole number as its input, but factorial can also take 0 as an input as well. Instead, then, the input to factorial can be any positive integer or 0.
(Side note: whole numbers are defined differently in different contexts, but they are often defined as all of the non-negative integers. This means the whole numbers are 0, 1, 2, 3, 4…, and if we use this terminology, factorial could take any whole number as its input.)
Note: If you get a
The server encountered an error. Please try running again. error, that may mean that your program is not terminating when tested. Make sure your recursion will eventually reach a base case.
Note that this is the standard mathematical definition of the Fibonacci sequence, which is a bit different from the counting rabbits motivation in the original problem. The mathematical sequence starts with 0, which is more elegant mathematically, but wouldn’t make as much sense for rabbits multiplying.
At 1:58 minutes onwards, the formula should be fibo(36 - (n - 1)) = fibo(36 - n + 1) and not fibo(36 - n - 1).
A note on the notation: friends(p) is a list of all friends of p.
AltaVista was finally shut down in July 2013. Here’s an interesting article from the Washington Post: AltaVista is dead. Here’s why it’s so hard to compete with Google. (mostly an interview with Gabriel Weinberg).
rank(page, time) is defined as: ∑
rank(page, 0) = 1/npages
rank(page, t) = (1-d)/npages
+ sum (d * rank(p, t - 1) / number of outlinks from p) over all pages p that link to this page
Thanks to Henry for suggesting to add this.
The URLs have changed around a bit! Here’s a new index page you can start with to test out your search engine: https://www.udacity.com/cs101x/urank/index.html
The provided solution isn’t complete - it doesn’t actually include the ordered_search code, but only the code for sorting the pages.
Atlas7-115196 provided a more complete solution to this problem! See the forum post: http://forums.udacity.com/questions/100371211/corrected-udacity-solution#cs101
Correction - Peter Norvig teaches class CS212.
For the procedures below, check the procedures whose running time scales linearly with the length of the input in the worst case. You may assume the elements in input_list are fairly small numbers.
def proc1(input_list): maximum = None for element in input_list : if not maximum or maximum < element: maximum = element return maximum def proc2(input_list): sum = 0 while len(input_list) > 0: sum = sum + input_list # Assume input_list is constant time input_list = input_list[1:] # Assume input_list[1:] is constant time return sum def proc3(input_list): for i in range(0, len(input_list)): for j in range(0, len(input_list)): if input_list[i] == input_list[j] and i != j: return False return True
For each of the procedures defined below, check the box if the procedure always terminates for all inputs that are natural numbers (1,2,3…).
def proc1(n): while True: n = n - 1 if n == 0: break return 3 def proc2(n): if n == 0: return n return 1 + proc2(n - 2) def proc3(n): if n <= 3: return 1 return proc3(n - 1) + proc3(n - 2) + proc3(n - 3)
There was a typo in the last test example.
# a->a, a->b, b->a, a->c, c->d, c->a links are reciprocal
# a->a, a->b, b->a, a->c, c->d, d->a links are reciprocal
Sorry about this. There is a mistake in the video in generation 3 for pattern 30, which makes all the following lines incorrect as well. The corrected output is:
...x.... (input) ..xxx... ( generations = 1) .xx..x.. ( generations = 2) xx.xxxx. ( generations = 3) x..x.... ( generations = 4) xxxxx..x ( generations = 5) .....xxx ( generations = 6)
Additional information: Elementary Cellular Automaton at Wolfram’s Mathworld
Congratulations on making it to the final project! Your job is to take simple text strings like “Alex likes Carla, Ryan, and Priya” and turn them into a social network. To do this, you must complete a number of required procedures, as described on the next screen. You must also create a “make-your-own” procedure.
Most of this project will take place inside the browser and most of it will be auto-graded. Feel free to share your final code with your peers in the Discussion Forum for additional feedback.
If you have any questions, ask on the Discussion Forum!