diff --git a/.gitignore b/.gitignore index 09b31bb9c5934b45ba086c8962c604c74296dab1..bae9d534fcdd3054a82102e48de9a145600694f4 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,3 @@ __pycache__/ media/ +uv.lock \ No newline at end of file diff --git a/Makefile b/Makefile deleted file mode 100644 index ba9a306f6524bf832f628e10f7e4d4603851c99a..0000000000000000000000000000000000000000 --- a/Makefile +++ /dev/null @@ -1,16 +0,0 @@ -.PHONY: test-search test-tree test-bst test - -test-search: - pytest tests/test_binary_search.py - -test-tree: - pytest tests/test_binary_tree.py - -test-bst: - pytest tests/test_binary_search_tree.py - -test-avl: - pytest tests/test_balanced_tree.py - -test: - pytest diff --git a/README.md b/README.md index 1f1e7ca7ad729467af97986c62004c54357a5d69..12fdebc87bd86ac0e040f15b42571d113ae31cf6 100644 --- a/README.md +++ b/README.md @@ -1,359 +1,7 @@ # Trees -In this class, you will learn about a very common algorithmic tool: trees. -The data structure itself will be motivated by a simple search example and then -implemented and studied from scratch. +This project is part of the 1MAE002 course. Please refer to the [website](https://gitlab-pages.isae-supaero.fr/mae-ac/website/projects/trees/introduction.html) for more information. 
-Finally, a few examples of trees will be showcased: -- _perfect_ maze generation -- loss-less text compression -- neighborhood optimization in $n$-body systems +# Contact -# Table of content -- [_Upload your work to the LMS_](#upload-your-work-to-the-lms-toc) -- [_Binary search_](#binary-search-toc) - - [_Question 1._](#question-1-toc) - - [_Question 2._](#question-2-toc) -- [_Binary search trees_](#binary-search-trees-toc) - - [_The tree data structure_](#the-tree-data-structure-toc) - - [_Question 3._](#question-3-toc) - - [_Question 4._](#question-4-toc) - - [_Question 5._](#question-5-toc) - - [_Question 6._](#question-6-toc) - - [_Tree traversal_](#tree-traversal-toc) - - [_Question 7._](#question-7-toc) - - [_Implementing the BST_](bst.md#implementing-the-bst-toc) - - [_Question 8._](bst.md#question-8-toc) - - [_Insertion_](bst.md#insertion-toc) - - [_Question 9._](bst.md#question-9-toc) - - [_Search_](bst.md#search-toc) - - [_Question 10._](bst.md#question-10-toc) - - [_Deletion_](bst.md#deletion-toc) - - [_Question 11._](bst.md#question-11-toc) -- [_Huffman encoding_](huffman.md#huffman-encoding-toc) - - [_Introduction_](huffman.md#introduction-toc) - - [_Question 12._](huffman.md#question-12-toc) - - [_Question 13._](huffman.md#question-13-toc) - - [_Huffman trees_](huffman.md#huffman-trees-toc) - - [_Question 14._](huffman.md#question-14-toc) - - [_Coding books_](huffman.md#coding-books-toc) - - [_Question 15._](huffman.md#question-15-toc) - - [_Text compression_](huffman.md#text-compression-toc) - - [_Question 16._](huffman.md#question-16-toc) - - [_Text decompression_](huffman.md#text-decompression-toc) - - [_Question 17._](huffman.md#question-17-toc) - - [_Question 18._](huffman.md#question-18-toc) - - [_Writing a CLI compression tool_](huffman.md#writing-a-cli-compression-tool-toc) - - [_The compression format_](huffman.md#the-compression-format-toc) - - [_Writing to and reading from the disk_](huffman.md#writing-to-and-reading-from-the-disk-toc) - - 
[_Question 19._](huffman.md#question-19-toc) - - [_Question 20._](huffman.md#question-20-toc) - - [_Wrapping up in a CLI application_](huffman.md#wrapping-up-in-a-cli-application-toc) - - [_Question 21._](huffman.md#question-21-toc) -- [_Unbalanced trees_ (_optional_)](avl.md#unbalanced-trees-toc-optional) - - [_Question 22._](avl.md#question-22-toc) -- [_Height-balanced BSTs_ (_optional_)](avl.md#height-balanced-bsts-toc-optional) - - [_Question 23._](avl.md#question-23-toc) - - [_Question 24._](avl.md#question-24-toc) - - [_Question 25._](avl.md#question-25-toc) - - [_Tree rotations_](avl.md#tree-rotations-toc) - - [_Question 26._](avl.md#question-26-toc) - - [_Question 27._](avl.md#question-27-toc) - - [_Balancing our BSTs_](avl.md#balancing-our-bsts-toc) - - [_Question 28._](avl.md#question-28-toc) -- [_Maze generation and solving_ (_optional_)](maze.md#maze-generation-and-solving-toc-optional) - - [_Generating perfect mazes_](maze.md#generating-perfect-mazes-toc) - - [_Question 29._](maze.md#question-29-toc) - - [_Question 30._](maze.md#question-30-toc) - - [_Question 31._](maze.md#question-31-toc) - - [_solving perfect mazes_](maze.md#solving-perfect-mazes-toc) - - [_Question 32._](maze.md#question-32-toc) - - [_Question 33._](maze.md#question-33-toc) -- [_Quad trees_ (_optional_)](qtree.md#quad-trees-toc-optional) - - [_$n$-body collision simulation_](qtree.md#n-body-collision-simulation-toc) - - [_Question 34._](qtree.md#question-34-toc) - - [_Question 35._](qtree.md#question-35-toc) - - [_Question 36._](qtree.md#question-36-toc) - - [_Let's speed things up_](qtree.md#lets-speed-things-up-toc) - - [_Question 37._](qtree.md#question-37-toc) - - [_Question 38._](qtree.md#question-38-toc) - - [_Question 39._](qtree.md#question-39-toc) - - [_Question 40._](qtree.md#question-40-toc) - - [_Question 41._](qtree.md#question-41-toc) - - [_Question 42._](qtree.md#question-42-toc) - - [_Question 43._](qtree.md#question-43-toc) - - [_Question 
44._](qtree.md#question-44-toc) - - [_Question 45._](qtree.md#question-45-toc) - - - ---- - -## Upload your work to the LMS [[toc](#table-of-content)] -- open a terminal -- go into the folder containing your project -- use the `zip` command to compress your project -```shell -zip -r project.zip . -x "venv/**" ".git/**" -``` -- upload the ZIP archive to the [LMS](https://lms.isae.fr/mod/assign/view.php?id=116610&action=editsubmission) - ---- - -## Binary search [[toc](#table-of-content)] -Let's start this class with a quite simple problem: given a list $l$ of $n$ -integers and an integer $i$, how can you tell whether or not $i$ is in $l$? - -:gear: You can run `pytest`. You should have 14 failing tests, that's -normal you will complete your code base and make sure the tests pass during the -class. - -### Question 1. [[toc](#table-of-content)] -:file_folder: Create a new file `search.py`. - -:pencil: let's create a big list, and suppose it is not sorted. Using the module `time`, measure the time to find if the element `199_999_999` is in the list (you can use the keyword `in`). -```python -l = list(range(200_000_000)) -``` - -:question: What is the complexity of your algorithm? - -From now on, we will assume that the elements in $l$ are sorted in ascending -order. -This will greatly help us improve our search algorithm. - -The idea of _binary search_ is the following: because the elements are sorted in -ascending order, if we look at any element $j$ in the middle of a list, if -$i \lt j$ we know that, if $i$ is in $l$, then it must be on the left of $j$! -Now that one half of $l$ has been completely discarded, we can repeat the search -on one of the halfs of $l$. - -The algorithm repeats recursively until either -- $i = j$: $i$ is in $l$ -- we find the empty list: $i$ is not in $l$ - -### Question 2. [[toc](#table-of-content)] -:pencil: Implement binary search and measure the time needed to call it. 
-You should write a function `binary_search` in `search.py`: -- arguments: - - `l`: a sorted list of integers - - `i`: an integer -- return: a boolean which tells whether `i` is inside `l` or not - -:question: What is the complexity of this new algorithm? - -:gear: Run `pytest tests/test_binary_search.py`. If it's all green, good job :clap: :clap: - -That's much better! However, we have made a quite huge assumption about the -input data... it should be sorted!! - -In the real world, we won't have perfectly sorted data in general and it is -quite expensive to sort lists (remember, comparison-based sorting algorithms are -$O(n \log n)$ at best) or to insert an element (around $O(n)$). - -## Binary search trees [[toc](#table-of-content)] - -This is where trees, and especially _**B**inary **S**earch **T**ree**s**_ (BSTs), -come into play! - -But before getting to BSTs, which will definitely solve our key problem above, -we need to talk about trees. - -### The tree data structure [[toc](#table-of-content)] - -Trees are a very common algorithmic data structure. A tree is a recursive -structure that can either -- be _empty_ -- hold a _value_ and _references_ to other trees - -Some naming conventions: -- the _references_ to other trees are called _subtrees_ or _children_ and if there - are only two such _subtrees_, they are often called _left_ and _right_ -- the entry point of the tree is called the _root_ -- a tree that has no _subtrees_ is called a leaf -- if $a$ is a tree that has $b$ as a _child_, then $a$ is called the _parent_ of - $b$ - -The properties that a tree must satisfy: -- :exclamation: a tree must NOT contain any cycle, i.e. 
every _node_ in the tree - should have a unique _parent_ - -Below are some examples of trees and non-trees: -- the _empty_ tree (yeah, there is nothing to show with that one) -- a single _leaf_ -```mermaid -flowchart TD - 0((0)) -``` -- a tree where all nodes have at most 2 children -```mermaid -flowchart TD - 0((0)); 1((1)); 2((2)); 3((3)); 4((4)) - 0 --> 1 - 0 --> 2 - 1 --> 3 - 1 --> 4 -``` -- a tree with more than 2 children per node -```mermaid -flowchart TD - 0((0)); 1((1)); 2((2)); 3((3)); 4((4)); 5((5)); 6((6)); 7((7)) - 0 --> 1 - 0 --> 2 - 0 --> 3 - 0 --> 4 - 3 --> 5 - 3 --> 6 - 3 --> 7 -``` -- the following _thing_ is NOT a tree -```mermaid -flowchart TD - 0((0)); 1((1)); 2((2)); 3((3)); 4((4)) - 0 --> 1 - 0 --> 2 - 1 --> 3 - 1 --> 4 - 2 --> 4 -``` - -#### Question 3. [[toc](#table-of-content)] -:file_folder: Create a new file `tree.py`. - -:pencil: In `tree.py`, create a new class called `BinaryTree`. It should have -the following attributes which should all default to `None`: -- `value`: an integer -- `left`: another `BinaryTree`, the left child -- `right`: another `BinaryTree`, the right child - -:gear: Run `pytest tests/test_binary_tree.py`. You should have 6 failing tests, that's normal. - -:pencil: In order to help you debug the code and _see_ the binary trees, please -copy-paste the following method into your `BinaryTree` class. 
Now, you will be -able to call `print(binary_tree)` and see it in your terminal: -```python - def __repr__(self) -> str: - def aux(t, before: str, is_right: bool, has_right_brother: bool): - if t is None: - return '' - - if before is None: - curr = f"{t.value}\n" - else: - marker = '`' if is_right else '|' - curr = before + f"{marker}---- {t.value}\n" - - next = '' if before is None else ( - before + "| " if has_right_brother else before + " " - ) - right = t.right is not None - if t.left is not None: - left = aux( - t.left, next, is_right=not right, has_right_brother=right - ) - else: - left = '' - if t.right is not None: - right = aux( - t.right, next, is_right=True, has_right_brother=False - ) - else: - right = '' - return curr + left + right - - return aux(self, None, is_right=False, has_right_brother=False).strip() -``` - -First, we will implement some general methods on binary trees and then we'll -move on to BSTs. - -#### Question 4. [[toc](#table-of-content)] -:pencil: Add a method `is_empty` to your `BinaryTree` class. It should return a -`bool`, `True` if the tree is empty and `False` otherwise. - -:gear: Run `pytest tests/test_binary_tree.py`. If the "empty" test is green, good job :clap: -:clap: - -#### Question 5. [[toc](#table-of-content)] -:pencil: Add a method `is_leaf` to your `BinaryTree` class. It should return a -`bool`, `True` if the tree is a leaf and `False` otherwise. - -> :bulb: **Note** -> -> remember the naming conventions from [_The tree data structure_](#the-tree-data-structure-) - -:gear: Run `pytest tests/test_binary_tree.py`. If the "leaf" test is green, good job :clap: :clap: - -#### Question 6. [[toc](#table-of-content)] -:pencil: At the end of `tree.py`, in a _main_ block, define a variable that -represents the following tree: -```mermaid -flowchart TD - 0((0)); 1((1)); 2((2)); 3((3)); 4((4)) - 0 --> 1 - 0 --> 2 - 1 --> 3 - 1 --> 4 -``` - -:question: Can you print the tree to your terminal using `print`? 
- -:question: By using `BinaryTree.is_empty` and `BinaryTree.is_leaf`, is your tree -variable empty? Is it a leaf? - -:question: What about the right child of the left child? - -### Tree traversal [[toc](#table-of-content)] -In this section, you will learn about the simplest _real_ operation on trees: -_traversal_. - -_Traversing_ a tree means going through every node of the tree and performing -an operation on each one of the nodes. - -There are four main _traversal_ techniques on binary trees: -- **D**epth **F**irst **S**earch (DFS): the tree is explored as deep as possible - first - - prefix: a node is treated before its children - - infix: a node is treated in between its children - - postfix: a node is treated after its children -- **B**readth **F**irst **S**earch (BFS): the tree is explored level of depth - after level of depth - -Below is an animation for each one of these _traversal_ techniques: - - - -#### Question 7. [[toc](#table-of-content)] -:pencil: Implement the four _traversal_ techniques in your `BinaryTree` class. -At the end of this question, you should have four new methods: -- `BinaryTree.dfs_prefix` -- `BinaryTree.dfs_infix` -- `BinaryTree.dfs_postfix` -- `BinaryTree.bfs` - -These methods take as argument a function that will be applied to each node of the tree. - -:pencil: Call the `dfs_prefix` on the previous tree by providing the `print` method as the argument. You should get the following list: `0 1 3 4 2`. - - -:question: With the tree below, compare the order of _traversal_ for each technique. -```mermaid -flowchart TD - 0((0)); 1((1)); 2((2)); 3((3)); 4((4)); 5((5)); 6((6)) - 0 --> 1 - 0 --> 2 - 1 --> 3 - 1 --> 4 - 2 --> 5 - 2 --> 6 -``` - -:question: If you look back at the `BinaryTree.__repr__` method that was given -to you, what is the _traversal_ technique used to achieve the pretty output? 
- -:gear: Use `pytest tests/test_binary_tree.py` to make sure your traversal implementations are -correct :wink: - ---- ---- -> [go to next](bst.md) +jonathan.detchart@isae-supaero.fr diff --git a/assets/delete.mp4 b/assets/delete.mp4 deleted file mode 100644 index 17e4761f13253b8fdbb27c8a4ba46495e975017a..0000000000000000000000000000000000000000 Binary files a/assets/delete.mp4 and /dev/null differ diff --git a/assets/find.mp4 b/assets/find.mp4 deleted file mode 100644 index ea6597f3e590cb6e102da05d6969dc30903f8464..0000000000000000000000000000000000000000 Binary files a/assets/find.mp4 and /dev/null differ diff --git a/assets/insert.mp4 b/assets/insert.mp4 deleted file mode 100644 index 6fabd0bb2c12987c060548ff0a6ce7a85d7ff789..0000000000000000000000000000000000000000 Binary files a/assets/insert.mp4 and /dev/null differ diff --git a/assets/maze-tree.jpg b/assets/maze-tree.jpg deleted file mode 100644 index 0128283f919c4a019a89548509837b26ca4c67d2..0000000000000000000000000000000000000000 Binary files a/assets/maze-tree.jpg and /dev/null differ diff --git a/assets/maze.jpg b/assets/maze.jpg deleted file mode 100644 index a58adf1c9749cbc75dd08952ac5779a11f32e7ca..0000000000000000000000000000000000000000 Binary files a/assets/maze.jpg and /dev/null differ diff --git a/assets/rotate.mp4 b/assets/rotate.mp4 deleted file mode 100644 index 62ebc6bc2efb0528bfb2ed006949c9edc23f0f2a..0000000000000000000000000000000000000000 Binary files a/assets/rotate.mp4 and /dev/null differ diff --git a/assets/traverse.mp4 b/assets/traverse.mp4 deleted file mode 100644 index 1685efd5cec71197fe67c70b1ef63d72c7f8ce20..0000000000000000000000000000000000000000 Binary files a/assets/traverse.mp4 and /dev/null differ diff --git a/assets/trees.png b/assets/trees.png deleted file mode 100644 index 86567f9884828bd864d800c059040afddcc57f6f..0000000000000000000000000000000000000000 Binary files a/assets/trees.png and /dev/null differ diff --git a/avl.md b/avl.md deleted file mode 100644 index 
caa11df1cb7e6abcc4f25ed8728fae72f5c7d8a2..0000000000000000000000000000000000000000 --- a/avl.md +++ /dev/null @@ -1,186 +0,0 @@ -# Unbalanced trees [[toc](README.md#table-of-content)] (_optional_) - -Let's go back to the Binary Search Trees. - -We have a problem. - -### Question 22. [[toc](README.md#table-of-content)] -:question: What happens if you insert the following values, in that order, in an -empty tree? -```python -values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] -``` - -:question: What does the tree look like? - -:question: What is the consequence on the complexity of all our BST operations? - -Fortunately, there is a way to solve this last issue and make our BST quite -good! - -# Height-balanced BSTs [[toc](README.md#table-of-content)] (_optional_) - -First, let's formalize a bit the intuition from previous section about "_what it -means for a tree to be unbalanced?_" - -The height of a tree is the _length of the longest path from the root to the -leaves_. - -- the empty tree has a height of $0$ -- a leaf has a height of $1$ -- the following tree has a height of $3$ because the longest path from the root - to the leaves is either $0 \rightarrow 3$ or $0 \rightarrow 4$ which are both of - length $3$ -```mermaid -flowchart TD - 0((0)); 1((1)); 2((2)); 3((3)); 4((4)) - 0 --> 1 - 0 --> 2 - 1 --> 3 - 1 --> 4 -``` - -:gear: You can run `make test-avl` and you should have 4 failing tests. - -## Question 23. [[toc](README.md#table-of-content)] -:pencil: Add a method `height` to the `BinaryTree` class in `tree.py`. - -:question: What is the complexity of this algorithm in terms of the number of -nodes in the tree? - -:gear: You can run `make test-avl`. If the "height" test is green, good job -:clap: :clap: - -Next we can define the _balance_ of a binary tree by the difference of the -height of the right child and the height of the left child. - -Below are some examples. 
All the labels of the nodes in the trees show the -height of the tree on the left and the balance on the right, not the real value -held by the tree, which is not relevant when looking at the _balance_ of a tree. - -```mermaid -flowchart TD - a0(("1, 0")); - a0 - - b5(("3, -1")); b0(("1, 0")); b3(("1, 0")); b7(("1, 0")); b1(("2, 0")); - b5 --- b1 - b1 --- b3 - b1 --- b0 - b5 --- b7 - - c5(("4, -2")); c0(("1, 0")); c3(("2, 1")); c7(("1, 0")); c1(("3, -1")); - c4(("1, 0")); - c5 --- c1 - c1 --- c3 - c1 --- c0 - c5 --- c7 - c3 --- c4 - - d0(("1, 0")); d1(("2, 1")); d3(("3, 0")); d4(("1, 0")); d5(("2, 0")); - d7(("1, 0")); - d3 --- d5 - d3 --- d1 - d1 --- d0 - d5 --- d4 - d5 --- d7 -``` -```mermaid -flowchart TD - e1(("4, -1")); e2(("3, 0")); e3(("2, 1")); e4(("2, 0")); e5(("2, 0")); - e6(("1, 0")); e7(("1, 0")); e8(("1, 0")); e9(("1, 0")); e10(("1, 0")); - e1 --- e2 - e1 --- e3 - e3 --- e6 - e2 --- e4 - e2 --- e5 - e4 --- e7 - e4 --- e8 - e5 --- e9 - e5 --- e10 -``` - -## Question 24. [[toc](README.md#table-of-content)] -:pencil: Write a method `balance` to `BinaryTree` to compute the _balance_ of -a tree. - -:gear: You can run `make test-avl`. If the "balance" test is green, good job -:clap: :clap: - -:question: Looking at the example trees above, what could be a _good_ definition -of a _balanced_ tree? - -> :bulb: **Hint** -> -> the trees are symmetric from left to right, so the _balance_ being positive -> or negative does not matter - -## Question 25. [[toc](README.md#table-of-content)] -:pencil: Write a method `is_balanced` to `BinaryTree` to tell whether a tree is -balanced or not. - -:gear: You can run `make test-avl`. If the "is_balanced" test is green, good job -:clap: :clap: - -## Tree rotations [[toc](README.md#table-of-content)] - -If a tree is _unbalanced_, it is possible to fix the balance by applying a -_rotation_ on the root of the tree. - -There are two types of rotations, left and right, depending on the side where -the balance is wrong. 
- -As illustrated in the animation below, a right rotation consists of -- bubbling up the left child as the new root -- moving the previous root to the right child of the new root -- correcting the BST property - - - -### Question 26. [[toc](README.md#table-of-content)] -:pencil: Implement the two tree rotations. - -:gear: You can run `make test-avl`. If the "rotate right" and "rotate left" -tests are green, good job :clap: :clap: - -In order to help keeping track of the height of each node in the tree without -recomputing it all the time, we will add the height to our BST class and -recompute it after each operation on the tree. - -:pencil: Add a field `height` to the BST class. - -:pencil: Recompute the `height` after `insert`, `delete` and the rotations. - -Below is the pseudo-algorithm of the tree "rebalancing" operation: -```js -1 rebalance(t) -2 if t.balance() < -1 and t.left.balance() == -1 then t.rotate_right() -3 else if t.balance() > 1 and t.right.balance() == 1 then t.rotate_left() -4 else if t.balance() < -1 and t.left.balance() == 1 then t.rotate_left_right() -5 else if t.balance() > 1 and t.right.balance() == -1 then t.rotate_right_left() -``` - -Where the "right-left" rotation consists of -- rotating the right subtree to the right -- rotating the root to the left - -and the "left-right" rotation consists of -- rotating the left subtree to the left -- rotating the root to the right - -### Question 27. [[toc](README.md#table-of-content)] -:pencil: Implement "left-right" and "right-left" rotations. - -:pencil: Implement the `rebalance` method for the BST class. - -## Balancing our BSTs [[toc](README.md#table-of-content)] - -### Question 28. [[toc](README.md#table-of-content)] -:pencil: Call `rebalance` at the end of `insert` and `delete`. - -:question: Can you run the same example as in the beginning, i.e. inserting -integers in ascending order and make sure that the tree is balanced at the end? 
- ---- ---- -> [go to next](maze.md) diff --git a/bst.md b/bst.md deleted file mode 100644 index 8a1efdd5244fd9f90b123cb4b636b999dd8e772c..0000000000000000000000000000000000000000 --- a/bst.md +++ /dev/null @@ -1,119 +0,0 @@ -### Implementing the BST [[toc](README.md#table-of-content)] - -Now that we have seen what trees are and the basic operations on binary trees, -it is time to go back to our original problem. - -We will assume that we get elements, i.e. integers, at regular intervals and we -want to manage a data structure that allows us to _quickly_ (1) insert, (2) -search and (3) delete elements. - -In order to achieve that, we will add one property to our quite general binary -trees above: the _Binary Search_ property. - -:bulb: Let $t = (v, l, r)$ be a tree, all values in $l$ should be stricly -smaller than $v$ and all values in $r$ should be strictly greater than $v$. - -And this is the only thing that each operation on a BST should preserve, all -other properties, e.g. the fact that we can search any element in the tree -quickly, will come from it! - -#### Question 8. [[toc](README.md#table-of-content)] -:pencil: First, create a new class `BinarySearchTree` in `tree.py`. Because a -BST is also just a tree, this new class should inherit from `BinaryTree`. - -:pencil: Overwrite the constructor of `BinarySearchTree`. It should only take a -value as argument and, if the value is not `None`, i.e. the tree is not empty, it -should set the left and right children to empty BSTs. - -:gear: You can run `pytest tests/test_binary_search_tree.py`, you should have 3 failing tests. - -#### Insertion [[toc](README.md#table-of-content)] -The first operation will be to insert an element $v$ in the tree. 
- -This first algorithm, as can be seen in the animation below, is very -straightforward: -- if the tree is empty, then set its value to $v$ and initialize its children -- if the tree is not empty and holds a value $u$, there are three cases: - - $v \lt u$, insert $v$ in the left child - - $v \gt u$, insert $v$ in the right child - - $v = u$, don't do anything as we won't treat duplicate elements in this - class - - - -##### Question 9. [[toc](README.md#table-of-content)] -:pencil: Implement `BinarySearchTree.insert`. This method must take as an argument the `value` to insert. - -:question: What is the average complexity of `BinarySearchTree.insert`? You -don't need to prove a precise complexity, just an idea would be enough here as -the computation can be quite... messy :wink: - -:gear: Run `pytest tests/test_binary_search_tree.py`. If the "insert" test is green, good job :clap: -:clap: - -#### Search [[toc](README.md#table-of-content)] -Next is the main operation of our BSTs and our original question: finding if an -element is in the BST or not! - -Thanks to the _Binary search_ property and as can be seen in the following -animation, finding an element $v$ works as follows: -- if the tree is empty, then $v$ is not in the tree -- if the tree is not empty and holds a value $u$, there are three cases - - $v = u$: then $v$ is in the tree - - $v \lt u$: we need to look for $v$ in the left child - - $v \gt u$: we need to look for $v$ in the right child - - - -##### Question 10. [[toc](README.md#table-of-content)] -:pencil: Implement `BinarySearchTree.find`. This method takes as an argument the `value` to find, and return `True` or `False` - -:question: What is the average complexity of `BinarySearchTree.find`? - -:gear: Run `pytest tests/test_binary_search_tree.py`. 
If the "find" test is green, good job :clap: :clap: - -#### Deletion [[toc](README.md#table-of-content)] -Now, we would like to do a bit more with our BSTs, allowing them to do more than -simply search for elements once they have been inserted. - -For instance, we might want to find the smallest element in the BST and then -remove it, that would create a _priority queue_. - -For that, we need to be able to remove an element from the BST if it is in there -in the first place, while preserving the _Binary search_ property. - -As illustrated in the animation below, there are once again a few cases to -consider: -- if the tree is empty, then there is nothing to do -- if the tree is not empty and holds a value $u$, there are three cases - - $v \lt u$: try to remove $v$ from the left child - - $v \gt u$: try to remove $v$ from the right child - - $v = u$: remove $v$ - -But the question is: how can we remove the root of a tree and preserve the BST -property? - -Well, once again, there are a few cases to look at -- if the tree is a leaf, i.e. with no children, then we can simply set it to the - empty tree -- if the tree has a single child, then this child should take the place of its - parent -- if the tree has its two children, we can either - - find the max $M$ in the left child, set the value of the tree to $M$ and - remove $M$ from the left child - - find the min $m$ in the right child, set the value of the tree to $m$ and - remove $m$ from the right child - - - -##### Question 11. [[toc](README.md#table-of-content)] -:pencil: Implement `BinarySearchTree.delete`. This method take as argument the `value` to delete. - -:question: What is the average complexity of `BinarySearchTree.delete`? - -:gear: Run `pytest tests/test_binary_search_tree.py`. 
If the "delete" test is green, good job :clap: -:clap: - ---- ---- -> [go to next](huffman.md) diff --git a/dev-requirements.txt b/dev-requirements.txt deleted file mode 100644 index c75731e6ea69782c38b77ab8010a3f3edc2a2477..0000000000000000000000000000000000000000 --- a/dev-requirements.txt +++ /dev/null @@ -1 +0,0 @@ -pytest==8.2.2 diff --git a/huffman.md b/huffman.md deleted file mode 100644 index a23268e50fe08557b1514494d3d70807e0bc5405..0000000000000000000000000000000000000000 --- a/huffman.md +++ /dev/null @@ -1,665 +0,0 @@ -## Huffman encoding [[toc](README.md#table-of-content)] - -In this section, you will write a little tool that will (1) compress text using -_Huffman_ encoding and (2) decompress and reconstruct the original text, without -any loss. You will also measure the _efficiency_ of your algorithm. - -Of course, the _Huffman_ encoding will use a tree to reduce the size of the -original text. - -:file_folder: Create a file called `huffman.py`. - -### Introduction [[toc](README.md#table-of-content)] - -A text file is composed of symbols, e.g. ASCII or Unicode characters. It is thus -possible to -- define an alphabet of valid symbols -- count each symbol from the alphabet in the text -- compute the frequency of appearance of each symbol -- compute the information entropy of the text - -The entropy is a mathematical concept that measures the "amount of information" -of a probability distribution. If we consider a variable which takes its values -in the set ${a_1, a_2, \ldots, a_n}$ with the probabilities $p_i = p(a_i)$ for -$i = 1, \ldots, n$, then the entropy $H(X)$ is defined by: -$$H(X) = - \sum\limits_{i = 1}^{n}p_i \log_2 p_i$$ - -The entropy gives a bound on the average number of bits used to code a symbol. -That is, it is impossible to compress any binary to a size that is smaller than -its entropy times the number of bytes. 
- -In the rest of this document, we will take as an example input text the following -```text -this is an example of a huffman tree -``` - -You can define it in `huffman.py` and use it later to make sure your implementation is correct. - -#### Question 12. [[toc](README.md#table-of-content)] -:pencil: Write a function `compute_distribution` that will compute the -probability distribution of all the characters in a text. It should have the -following signature: -- arguments: - - `text`: a string containing all the characters of the text, in order -- return: the probability distribution $X$, i.e. the frequencies of appearances, - of all the symbols in the text, represented as a dictionary where the keys are - the $(a_i)$ and the values are the associated $(p_i)$ - -> :bulb: **Note** -> -> another version of this would be to simply compute the number of occurrences, as the rest of the -> Huffman algorithm only needs to be able to sort symbols by "_frequency_" and the two quantities -> are equal up to a multiplicative constant. - -:bulb: The distribution for our example text is the following -| symbol | frequency (3 decimals) | -| ------ | ---------------------- | -| t | $0.056$ | -| h | $0.056$ | -| i | $0.056$ | -| s | $0.056$ | -| | $0.194$ | -| a | $0.111$ | -| n | $0.056$ | -| e | $0.111$ | -| x | $0.028$ | -| m | $0.056$ | -| p | $0.028$ | -| l | $0.028$ | -| o | $0.028$ | -| f | $0.083$ | -| u | $0.028$ | -| r | $0.028$ | - -#### Question 13. [[toc](README.md#table-of-content)] -:pencil: Write a function `entropy` that will compute the entropy of a -probability distribution. It should have the following signature: -- arguments: - - `x`: a dictionary representing the probability distribution $X$, i.e. where - the keys are the $(a_i)$ and the values are the associated $(p_i)$ -- return: the entropy of `x`, i.e. $H(X)$ - -:bulb: The entropy of our example input is $3.714192447093237$. - -:question: What is the entropy of `README.md`? 
- -> :bulb: **Note** -> -> you can use the `read_text` function from `src.helpers` to open text files. - -### Huffman trees [[toc](README.md#table-of-content)] - -The idea behind _Huffman_ compression is to assign to each symbol in the text a -binary sequence of $0$s and $1$s. - -There are two properties that we want to have: -- (1) the more common a symbol in the text, the shorter the encoding. And - vice-versa with the least common symbols, e.g. in English, "_e_" should have - the smallest encoding because it is the most common letter and "_z_" should - have the longest one. -- (2) we want to use an encoding where no encoded symbol is the prefix of - another one - -Intuitively, property (1) will help reduce the size of the text file by -assigning shorter encodings to the most common symbols in the text. - -Property (2) will make sure there is no ambiguity in the encoding and greatly -help the decoding process. - -In order to achieve this, we will be using a particular tree: the _Huffman_ -tree. - -The construction of the tree is as follows: -- compute the probability distribution of the symbols in the text -- put all the symbols with their "weights", i.e. their frequencies at the start, - in a _priority queue_ -- pop the two least frequent symbols from the _queue_ -- merge them into a tree where they are the two children -- set their "weight" to the sum of the "weights" of the two children -- push this tree back into the queue -- rince and repeat as long as there are strictly more than $1$ item in the _queue_ - -When there is only one item left in the queue, you can extract it and you should -have the _Huffman_ tree! - -#### Question 14. [[toc](README.md#table-of-content)] -:pencil: Write a function called `build_huffman_tree`. It should have the -following signature: -- arguments: - - `p`: a probability distribution of all the symbols in the text -- return: the _Huffman_, i.e. 
a nested tuple (see the example tree below for what this can mean) - -> :bulb: **Note** -> -> the [`src/priority_queue.py`](src/priority_queue.py) module provides a _naive_ implementation of a -> _priority queue_ that should be good enough for this class. -> -> you are encouraged to use it or feel free to use the built-in `heapq` module of Python. -> -> there is an example usage of the `src.priority_queue.PriorityQueue` class at the bottom of the module. - -:question: Test your `build_huffman_tree` on our example text. Do you get the same tree as below? - -> :bulb: **Note** -> -> the weights are expressed as the number of occurrences, which is equivalent to -> the frequencies you have computed: you just need to divide / multiply by the total -> number of symbols to get from one to the other. -> -> the weights are here just for reference and easier debugging, they should not -> be there in the final tree output, only the structure of the tree matters, not -> the values in the nodes. - -```mermaid -flowchart TD - %% all the leaves with their respective number of occurrences - t["'t', 2"] - h["'h', 2"] - i["'i', 2"] - s["'s', 2"] - spc["' ', 7"] - a["'a', 4"] - n["'n', 2"] - e["'e', 4"] - x["'x', 1"] - m["'m', 2"] - p["'p', 1"] - l["'l', 1"] - o["'o', 1"] - f["'f', 3"] - u["'u', 1"] - r["'r', 1"] - - %% intermediate nodes whose values are the sums of the values of their children - %% these are required because we want the same labels but different Mermaid nodes - eight1(("8")) - eight2(("8")) - eight3(("8")) - - four1(("4")) - four2(("4")) - four3(("4")) - four4(("4")) - - two1(("2")) - two2(("2")) - two3(("2")) - - %% depth-first traversal - 36 --- 16 - 16 --- eight1 - eight1 --- a - eight1 --- e - 16 --- eight2 - eight2 --- four1 - four1 --- t - four1 --- h - eight2 --- four2 - four2 --- i - four2 --- s - 36 --- 20 - 20 --- eight3 - eight3 --- four3 - four3 --- n - four3 --- m - eight3 --- four4 - four4 --- two1 - two1 --- x - two1 --- p - four4 --- two2 - two2 --- l - two2 ---
o - 20 --- 12 - 12 --- 5 - 5 --- two3 - two3 --- u - two3 --- r - 5 --- f - 12 --- spc -``` - -In terms of Python code, your tree might look like the following _nested tuple_ - -```python -# a nested tuple is simply tuples inside other tuples -( # this is the root - ( # this is the left child of the root - ('a', 'e'), - ( - ('t', 'h'), # t and h are siblings - ('i', 's'), - ), - ), - ( # this is the right child of the root - ( - ('n', 'm'), - ( - ('x', 'p'), - ('l', 'o'), - ), - ), - ( - ( - ('u', 'r'), - 'f', - ), - ' ', - ), - ), -) -``` - -> :bulb: **Note** -> -> the exact placement of the nodes in this tree is not important. However, the -> depth at which the nodes are is really important and should depend on the -> frequency of each symbol compared to the other symbols. -> -> e.g. a and e could be swapped, depending on how they are sorted in the _priority queue_, -> however, they need to remain at the same depth, in order to have an encoding of -> the correct length! - -### Coding books [[toc](README.md#table-of-content)] - -The _Huffman_ tree contains all the information we need about the input text. -However, its shape is not very useful to compress the symbols... We would rather -like something that maps each symbol to the appropriate sequence of $0$s and -$1$s. - -This is where dictionaries come into play! We will build a dictionary, from the -tree computed earlier, that will map each symbol to its binary sequence. - -The idea is to traverse the tree from the root to each one of the leaves, our -symbols. When going to the left, a $0$ will be added to the sequence, going to -the right will add a $1$. - -This will ensure, by construction of the _Huffman_ tree based on the frequencies -of the symbols, that the two _Huffman_ properties defined above are satisfied! - -#### Question 15. [[toc](README.md#table-of-content)] -:pencil: Write a function `build_coding_book` that will compute the coding book -of any _Huffman_ tree.
It should have the following signature: -- arguments: - - `t`: the _Huffman_ tree represented as a nested tuple of symbols -- return: the coding book, represented as a dictionary where the keys are the - symbols and the values are the associated binary sequences - -:question: Do you get the expected encoding for all symbols with our example -sentence? - -Verify that you get the correct encoding, as below: -```mermaid -flowchart TD - a["'a', 000"] - e["'e', 001"] - t["'t', 0100"] - h["'h', 0101"] - i["'i', 0110"] - s["'s', 0111"] - n["'n', 1000"] - m["'m', 1001"] - x["'x', 10100"] - p["'p', 10101"] - l["'l', 10110"] - o["'o', 10111"] - u["'u', 11000"] - r["'r', 11001"] - f["'f', 1101"] - spc["' ', 111"] - - n36((" ")) - n16((" ")) - n20((" ")) - n12((" ")) - n5((" ")) - - eight1((" ")) - eight2((" ")) - eight3((" ")) - - four1((" ")) - four2((" ")) - four3((" ")) - four4((" ")) - - two1((" ")) - two2((" ")) - two3((" ")) - - %% depth-first traversal - n36 -- 0 --- n16 - n16 -- 0 --- eight1 - eight1 -- 0 --- a - eight1 -- 1 --- e - n16 -- 1 --- eight2 - eight2 -- 0 --- four1 - four1 -- 0 --- t - four1 -- 1 --- h - eight2 -- 1 --- four2 - four2 -- 0 --- i - four2 -- 1 --- s - n36 -- 1 --- n20 - n20 -- 0 --- eight3 - eight3 -- 0 --- four3 - four3 -- 0 --- n - four3 -- 1 --- m - eight3 -- 1 --- four4 - four4 -- 0 --- two1 - two1 -- 0 --- x - two1 -- 1 --- p - four4 -- 1 --- two2 - two2 -- 0 --- l - two2 -- 1 --- o - n20 -- 1 --- n12 - n12 -- 0 --- n5 - n5 -- 0 --- two3 - two3 -- 0 --- u - two3 -- 1 --- r - n5 -- 1 --- f - n12 -- 1 --- spc -``` -Which translates to the following in Python -```python -{ - 'a': '000', - 'e': '001', - 't': '0100', - 'h': '0101', - 'i': '0110', - 's': '0111', - 'n': '1000', - 'm': '1001', - 'x': '10100', - 'p': '10101', - 'l': '10110', - 'o': '10111', - 'u': '11000', - 'r': '11001', - 'f': '1101', - ' ': '111', -} -``` - -:question: Can you find any encoded sequence of bits that is the prefix of -another one? 
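The construction steps and the left/right traversal described in this section can be sketched as follows. This is only one possible approach, not the reference solution: it assumes the built-in `heapq` module instead of `src.priority_queue`, uses raw occurrence counts as weights, and assumes that symbols are plain strings so that any tuple is an internal node. The `tiebreak` counter is a helper introduced for this illustration; depending on how ties are broken, the resulting tree may differ from the one drawn above, but the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(p):
    """Build a nested-tuple Huffman tree from a non-empty {symbol: weight} dict."""
    tiebreak = count()  # avoids comparing symbols with tuples when weights are equal
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent item
        w2, _, right = heapq.heappop(heap)  # second least frequent item
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


def build_coding_book(t, prefix=""):
    """Map each leaf of the tree `t` to its sequence of 0s (left) and 1s (right)."""
    if isinstance(t, tuple):
        left, right = t
        book = build_coding_book(left, prefix + "0")
        book.update(build_coding_book(right, prefix + "1"))
        return book
    # a tree made of a single symbol still needs a non-empty code
    return {t: prefix or "0"}


EXAMPLE = "this is an example of a huffman tree"
counts = Counter(EXAMPLE)
tree = build_huffman_tree(counts)
book = build_coding_book(tree)

# total number of bits needed to encode the example text
print(sum(counts[s] * len(code) for s, code in book.items()))  # 135
```

The total of $135$ bits matches the $17$ bytes minus $1$ padding bit quoted for the example text later in this document.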
- -> :bulb: **Spoiler** -> -> thanks to property (2) of the _Huffman_ trees, that should be impossible! - -### Text compression [[toc](README.md#table-of-content)] - -It is now time to put together the compression algorithm. The steps are as follows: -- compute the frequencies of all symbols -- build the _Huffman_ tree -- build the coding book -- build the compressed sequence of $0$s and $1$s by replacing all symbols by - their associated binary encoding -- convert the sequence to real binary -- return the coding book, the binary sequence and the number of _padding_ bits - -But what is _padding_? Well, due to the nature of the _Huffman_ tree, we don't -know what the size of the encodings for each symbol will be. And we know even -less what the final size of the compressed bits will be! In particular, we don't -know if the number of compressed bits will be a multiple of $8$. To make sure the -compressed data fits perfectly into a whole number of bytes, we'll be adding -some _padding_ bits, e.g. let's say the bits are `0101101011`, we need to add 6 -bits of _padding_ to get to 2 bytes, i.e. `0101101011111111` if we pad with $1$s. - -#### Question 16. [[toc](README.md#table-of-content)] -:pencil: Write a function `compress` that implements the _Huffman_ compression. -It should have the following signature: -- arguments: - - `text`: a string containing all the symbols of the input text -- return: the coding book, the encoded binary sequence and the number of padding - bits - -> :bulb: **Note** -> -> you can use the `into_bytes` function from [`src.helpers`](src/helpers.py) to -> convert a string of $0$s and $1$s into real bytes. -> -> there is documentation and you can find some simple examples at the bottom of -> the module. - -:question: Test your `compress` function by compressing a bunch of text.
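The padding arithmetic from the paragraph above can be illustrated with a small sketch. `pad_bits` and `bits_to_bytes` are hypothetical helpers written for this example; they are not the `into_bytes` function from `src.helpers`, whose exact signature is not shown here.

```python
def pad_bits(bits: str, fill: str = "1") -> tuple[str, int]:
    """Pad `bits` with `fill` up to a multiple of 8, return the result and the padding."""
    padding = -len(bits) % 8  # number of bits missing to reach a whole byte
    return bits + fill * padding, padding


def bits_to_bytes(bits: str) -> bytes:
    """Convert a string of 0s and 1s, whose length is a multiple of 8, into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


padded, padding = pad_bits("0101101011")
print(padded, padding)        # 0101101011111111 6
print(bits_to_bytes(padded))  # b'Z\xff'
```

A sequence whose length is already a multiple of $8$ gets $0$ padding bits with this formula.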
- -For instance, on our example text from the start, the bits, arranged in groups -of $8$, should be -``` -01000101 -01100111 -11101100 -11111100 -01000111 -00110100 -00010011 -01011011 -00011111 -01111101 -11100011 -10101110 -00110111 -01100100 -01000111 -01001100 -1001001 -``` -and there should be 1 bit of padding. - -:question: Is the average number of bits used in the compressed output always -higher than the entropy of the input text? - -You can plot your results for -- samples of different lengths in a real text file, e.g. written in English to - use the structure of the language -- sample random character strings of various length, i.e. without structure - -### Text decompression [[toc](README.md#table-of-content)] - -Having a compressed version of any input text, without any loss, is great, but -it would be great to get back the original text! - -In order to achieve that, we need three things -- the decoding book -- the compressed bytes -- the number of padding bits - -Luckily, the decoding book is as simple as inverting the keys and the values in -the coding book. This will give us a dictionary that maps the sequences of bits -to their original symbol, i.e. the exact inverse of the coding book. - -#### Question 17. [[toc](README.md#table-of-content)] -:pencil: Write a function `inverse_dict` that will inverse the keys and the -values of a dictionary. It should have the following signature: -- arguments: - - `d`: a dictionary -- return: the same dictionary as `d` but where the keys are now the values and -vice-versa - -> :bulb: **Note** -> -> this dictionary inversion only makes sense for dictionaries where all values -> are unique. - -#### Question 18. [[toc](README.md#table-of-content)] -:pencil: Write a function `decompress` that implements the decompression. 
It -should have the following signature: -- arguments: - - `book`: the codebook that comes from the compression - - `compressed`: the compressed binary data - - `padding`: the number of padding bits -- return: the original decompressed text - -> :bulb: **Hint** -> -> First, you can use the `from_bytes` function from [`src.helpers`](src/helpers.py) that will -> convert bytes into their string of $0$s and $1$s. -> -> there is documentation and you can find some simple examples at the bottom of -> the module. -> -> Second, you need to iterate over the $0$s and $1$s and read the "decode book" -> to rebuild the original sequence of symbols. - -:question: Can you reconstruct our original example text with that last function? - -As the first byte of this example is `01000101`, the only symbol in our coding -book, thanks to the second _Huffman_ property, that fits is `t` with encoding -`0100`. So we know the first character is `t`. -Then we can remove the first $4$ bits of the compressed bits and search for the -next symbol. -Without the bits of `t`, the first remaining $8$ bits are `01010110`. We see -that the only encoding that fits these first bits is `0101` and corresponds to -the symbol `h`. - -And so on... - - -### Writing a CLI compression tool [[toc](README.md#table-of-content)] - -Finally, we will be writing a little CLI application that will allow us to -compress and decompress any file from the terminal! - -There will be three steps -- define the file format we want to use for the compressed files -- write "write" and "read" functions that will use this format -- write a CLI interface to wrap this all up - -#### The compression format [[toc](README.md#table-of-content)] -In this application, we will be using a very simple and naive output file -format. -We need to write the codebook, the bytes and the number of padding bits to the -disk.
The format can be the following: -- write the codebook as raw JSON on a first line -- add a newline -- write the number of padding bits as a single byte -- add a newline -- write the compressed bytes in the rest of the file - -> :bulb: **Note** -> -> the JSON part is really probably not the best... -> -> you can try to find a better file format for that part :wink: - -Graphically, a compressed file could look something like the following -``` -.-----------------------------. -| JSON codebook | -| .----.---------.----.----| -| | \n | padding | \n | | -|----`----'---------'----' | -| | -| | -| encoded binary | -| | -| | -`-----------------------------' -``` - -#### Writing to and reading from the disk [[toc](README.md#table-of-content)] -##### Question 19. [[toc](README.md#table-of-content)] -:pencil: Write a function `write` that will write all compression data to the -disk following the file format above. It should have the following signature: -- arguments: - - `file`: a filename where to write on the disk - - `book`: the codebook that comes from the compression - - `compressed`: the compressed binary data - - `padding`: the number of padding bits - -> :bulb: **Note** -> -> below is a small example of how to write newline-separated binary data to a file -> ```python -> with open("my_file.bin", "wb") as handle: -> handle.write("hello world\n".encode()) -> handle.write(b"abc") -> ``` - -##### Question 20. [[toc](README.md#table-of-content)] -:pencil: Write a function `read` that will read compression information from -disk following the file format above. 
It should have the following signature: -- arguments: - - `file`: a filename where to read from the disk -- return: the codebook that comes from the compression, the compressed binary - data and the number of padding bits - -> :bulb: **Note** -> -> below is a small example of how to read newline-separated binary data from a file -> ```python -> with open("my_file.bin", 'rb') as handle: -> hello_world = handle.readline().decode() -> abc = handle.readline() -> ``` - -#### Wrapping up in a CLI application [[toc](README.md#table-of-content)] - -In order to write a CLI application, we will be using the `argparse` module. -Here is an example application with two subcommands - -```python -import argparse - -parser = argparse.ArgumentParser("") -subparsers = parser.add_subparsers(title="subcommands", dest="subcommand") - -foo_parser = subparsers.add_parser("foo") -foo_parser.add_argument("arg", type=str) - -bar_parser = subparsers.add_parser("bar") -bar_parser.add_argument("arg", type=str) - -args = parser.parse_args() - -if args.subcommand == "foo": - print(f"foo called with {args.arg}") -elif args.subcommand == "bar": - print(f"bar called with {args.arg}") -else: - print("nothing to do") - exit(0) -``` - -This allows you to call the source file as -```shell -python my_file.py foo 123 -``` -or -```shell -python my_file.py bar 456 -``` - -##### Question 21. [[toc](README.md#table-of-content)] -:pencil: In a _main_ Python block at the end of `huffman.py`, adapt the snippet -above and define two subcommands, `compress` and `decompress` that will either -compress or decompress some data. - -The `compress` subcommand should take two arguments: the input file to compress -and the name of the output file to write to. - -The `decompress` subcommand should take a single argument, the location of the -compressed data and print the decompressed text. - -:question: Test your application. Does it work?
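Before wiring the subcommands together, it can help to check that the file format described earlier round-trips. The sketch below is one possible reading of that format, not the reference solution: it assumes the codebook is a dictionary of strings, that the JSON is written with the default ASCII escaping (so it never contains a raw newline) and that the padding is a single raw byte between $0$ and $7$. The file name `example.huff` is made up for this illustration.

```python
import json
import os
import tempfile


def write(file, book, compressed, padding):
    """Write the codebook, the padding and the compressed bytes to `file`."""
    with open(file, "wb") as handle:
        handle.write(json.dumps(book).encode() + b"\n")  # JSON codebook on the first line
        handle.write(bytes([padding]) + b"\n")           # padding as a single byte
        handle.write(compressed)                         # rest of the file


def read(file):
    """Read back the codebook, the compressed bytes and the padding from `file`."""
    with open(file, "rb") as handle:
        book = json.loads(handle.readline().decode())
        padding = handle.read(1)[0]
        handle.read(1)  # skip the newline that follows the padding byte
        compressed = handle.read()
    return book, compressed, padding


book = {"a": "0", "b": "10", "c": "11"}
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.huff")
    write(path, book, b"\x5a\xff", 6)
    restored = read(path)

print(restored == (book, b"\x5a\xff", 6))  # True
```

Reading the padding with `read(1)` rather than `readline` avoids any ambiguity with bytes of the compressed data that happen to equal a newline.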
- -:question: Compute the compression rate and the average number of bits per -symbol for a few files. Does your application still satisfy the "entropy bound"? - -> :bulb: **Hint** -> -> if you want to count the number of bytes contained in a file `file.txt`, you -> can use the following command: -> ```shell -> wc --bytes file.txt | sed 's/ .*//' -> ``` - ---- ---- -> this is the end of the mandatory part :clap: :clap: -> [return to TOC](README.md#table-of-content) -> -> You can continue with the optional parts : -> -> [go to next](avl.md) diff --git a/maze.md b/maze.md deleted file mode 100644 index 13caecd0493d1b441b164b67d790e42883cf3430..0000000000000000000000000000000000000000 --- a/maze.md +++ /dev/null @@ -1,123 +0,0 @@ -## Maze generation and solving [[toc](README.md#table-of-content)] (_optional_) - -In this section, you will have to produce some code that will (1) generate a -_perfect_ maze and (2) solve the maze. - -But, first of all, what is a _maze_? And what does it mean for a maze to be -_perfect_? - -Typically a _maze_ is defined on a grid of cells and generating a maze is the -same as deciding whether two adjacent cells should be connected or separated by -a wall. - -A _perfect_ maze is simply a maze that has (1) no cycle and (2) no isolated -parts. - -These two statements are equivalent to the following property: _any two cells in -a perfect maze should be connected by exactly one path_. - -Below is an example of a _perfect_ maze: - - - -Thanks to the _perfect maze_ property, if we choose one of the cells to be the -_root_, then the _maze_ is simply a tree! - - - -> :bulb: **Note** -> -> In the image above, the green cell can be chosen to be the root. It is then -> possible to _unfold_ the maze into a tree where the root is at the top. -> -> Thanks to the _perfect maze_ property, it is possible to find the unique path -> between the two purple cells. - -We can thus apply our new knowledge about trees to _perfect mazes_!
:tada: - -In order to visualize the maze generation and the maze solving, you can use the -[`src.ui.maze` library](src/README.md#srcuimaze) provided with the class material. - -:mag: Read the documentation to familiarize yourself with how to use the library in -code and what the keybindings and actions are in the visualization. - -### Generating _perfect_ mazes [[toc](README.md#table-of-content)] - -As said earlier, our mazes will live in a rectangular grid of square cells. - -Let's assume the size of the grid is $n$ rows by $m$ columns. - -The cells will be identified by a number, starting from $0$ up to $nm - 1$, $0$ -being the top-left cell and $nm - 1$ being the bottom-right one. - -The first thing we need to do is compute the neighbours of any given cell $c$. - -> :exclamation: **Important** -> -> most of the cells will have $4$ neighbours, but be careful with the borders -> and corners of the grid! - -#### Question 29. [[toc](README.md#table-of-content)] -:file_folder: Create a new file `maze.py`. - -:pencil: As shown in the documentation of `src.ui.maze`, create a small maze, a -canva and run `canva.loop` in an infinite loop. - -> :bulb: **Note** -> -> for now, you can omit the return value of `canva.loop` and not bother about -> the example `path` function. - -#### Question 30. [[toc](README.md#table-of-content)] -:pencil: Write a function `neighbours` that will compute the list of all valid -neighbours for a given cell. It should have the following signature: -- arguments: - - `c`: the index of the cell - - `w`: the width of the grid - - `h`: the height of the grid -- return: the list of indices of all valid neighbours of `c` - -#### Question 31. [[toc](README.md#table-of-content)] -:pencil: Write an iterative `build` function using a _depth-first_ approach to -build the list of edges in the maze. - -It should take as arguments an empty maze, a starting cell and a _callback_ -function that takes a dictionary as input and doesn't return anything.
- -Call your new `build` function on your `maze`, with a random starting point and -with the following callback: -```python -lambda m: canva.step(m) -``` -where `m` is the maze at any time of the generation. - -If you see a maze building itself, congratulations, your `build` is working -:clap: :clap: - -### Solving _perfect_ mazes [[toc](README.md#table-of-content)] -Now that we have a complete valid _perfect_ maze, which is also a tree, we can -use our knowledge about trees to solve it. - -More specifically, we can use tree traversal algorithms such as DFS and BFS to -find a path between two cells in the maze. - -#### Question 32. [[toc](README.md#table-of-content)] -:pencil: Write a `path` function that will compute the path between two cells -in a maze using a _depth-first_ approach. - -Using the [documentation of `src.ui.maze`](src/README.md#srcuimaze) and the way `canva` will return signals -containing the start and end cells selected with the mouse, call your new `path` -function on your `maze`, using the start and end cells returned by the canva and -the following callback: -```python -lambda m, p, v, n: canva.step(m, p, v, n, complete=False) -``` -where `m` is the maze, `p` is an optional path of cells, `v` is an optional list -of already visited cells and `n` is an optional list of next cells to visit. - -#### Question 33. [[toc](README.md#table-of-content)] -:pencil: Can you adapt the `path` function to use a _breadth-first_ approach? 
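Both `build` and `path` rely on the neighbourhood of a cell, so here is a minimal sketch of the `neighbours` function for the grid numbering used in this section. It assumes the cells are numbered row by row (row-major order), which is compatible with $0$ being the top-left cell and $nm - 1$ the bottom-right one but is not stated explicitly; the order in which the neighbours are returned is also an arbitrary choice.

```python
def neighbours(c: int, w: int, h: int) -> list[int]:
    """List the valid neighbours of cell `c` in a grid of width `w` and height `h`."""
    row, col = divmod(c, w)  # row-major numbering is an assumption of this sketch
    result = []
    if row > 0:
        result.append(c - w)  # cell above
    if col > 0:
        result.append(c - 1)  # cell on the left
    if col < w - 1:
        result.append(c + 1)  # cell on the right
    if row < h - 1:
        result.append(c + w)  # cell below
    return result


print(neighbours(0, 3, 2))  # [1, 3], a corner only has two neighbours
print(neighbours(4, 3, 3))  # [1, 3, 5, 7], the center of a 3x3 grid has four
```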
- ---- ---- -> [go to next](qtree.md) diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000000000000000000000000000000000000..66a4b3e6d58d78f9fdfca33965e6df78e4c1a57b --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,12 @@ +tool.uv.package = true + +[project] +name = "Trees" +version = "0.1.0" +description ="Project 3 of the course '1MAE002' at ISAE-SUPAERO" +readme = "README.md" +requires-python = ">=3.12" +dependencies = [ + "pytest", + "pygame==2.6.0" +] \ No newline at end of file diff --git a/qtree.md b/qtree.md deleted file mode 100644 index e215e3261b8d5c1533ab6a0b5b3f800d0c3285c4..0000000000000000000000000000000000000000 --- a/qtree.md +++ /dev/null @@ -1,179 +0,0 @@ -## Quad trees [[toc](README.md#table-of-content)] (_optional_) - -In this section, we will demonstrate the power of trees by drastically improving -the performance of an $n$-body simulation. -In order to achieve that, we will be using _quad trees_ and split the 2d space -into boxes of equal size, recursively. - -### $n$-body collision simulation [[toc](README.md#table-of-content)] - -In order to demonstrate the drastic performance improvements given by trees, we -will be writing a small $n$-body collision simulation, i.e. $n$ balls will be -moving around randomly in the canva and we will highlight the balls that are -colliding with others. - -#### Question 34. [[toc](README.md#table-of-content)] -:mag: Read and familiarize yourself with the `src.ui.qtree` UI library given with the class -material. The documentation can be found [here](src/README.md#srcuiqtree). - -:file_folder: Create a new file called `qtree.py`. - -:pencil: Complete the following steps: -- create and setup a `Canva` from the `src.ui.qtree` module. -- because we will be manipulating lots of balls, create a `Ball` class. - It should have three number fields, the x and y coordinates of its center and - its radius. -- store a bunch of `Ball`s, e.g. 
$1000$, in a field `points` of your `canva` - -> :bulb: **Note** -> -> you can use the `randint` function from the `random` module to draw random -> integers between $0$ and the dimensions of the canva. -> -> you can set the radius of the balls to $3$. - -#### Question 35. [[toc](README.md#table-of-content)] -We would like the balls to move around with time. - -:pencil: Write an `update` function that takes and returns a list of `Ball`s, -with all their positions updated. Apply a simple _Brownian_ motion to them, e.g. -by adding a random number between $-1$ and $1$ to each one of their coordinates. - -#### Question 36. [[toc](README.md#table-of-content)] -Now, let's compute which balls are colliding with others. For now, we don't have -a choice: for each ball, we need to check all the other balls! - -:pencil: Write a function `f` that will take a list of `Ball`s and the position -of the mouse, which won't be used for now. - -> :bulb: **Note** -> -> see the [doc of `src.ui.qtree`](src/README.md#srcuiqtree) for an example of -> what the signature of `f` should look like. - -`f` should return a list of the same size as `canva.points`, where item `i` -is the number of balls the $i$-th ball is colliding with. - -:pencil: Run `canva.loop` with `update` and `f`. - -:question: What is happening? Is this any good? - -### Let's speed things up [[toc](README.md#table-of-content)] - -So far, when we want to know the number of balls the $i$-th one is colliding -with, we need to look at all the other balls because we don't have the -information about where the balls are relative to their neighbours, especially -when they are all moving! - -However, these balls are living in 2d space. So if a ball is in the top-right -part of the canva, it's probably a good idea to only look at balls in the -top-right part or near its border. If a ball is in the bottom-left part of the -top-right part, there might be even fewer things to check!
- -Extending this idea recursively, we are starting to build a _quad tree_. It is -a tree with 4 children, the top-left, top-right, bottom-left and bottom-right -parts. -Because it is part of 2d space, it also needs to keep track of its bounding box, -a rectangle. -It finally has a list of items and a capacity. - -#### Question 37. [[toc](README.md#table-of-content)] -:pencil: In the `qtree.py` file, create a new class `QuadTree`. Its constructor -should take a `boundary` and a capacity `n`. It should have the following -fields: -- the `boundary` and the capacity `n` -- a list of `Ball` -- a boolean `divided` telling if the tree has been subdivided -- its four children: `se`, `sw`, `ne` and `nw`, named after cardinal directions - and set to `None` as default - -#### Question 38. [[toc](README.md#table-of-content)] -:pencil: Create a class `Rectangle`. This will be our boundary. A `Rectangle` -should have four number fields: `x`, `y`, `w` and `h`, where $(x, y)$ is the -coordinate of the center of the rectangle, and $w$ and $h$ are the width and -height respectively. - -:question: What will the boundary of the root of the quad tree be? - -#### Question 39. [[toc](README.md#table-of-content)] -To know if we need to insert a `Ball` in the quad tree based on its -position, we need to be able to check if a given `Ball` is inside a `Rectangle`. - -:pencil: Add a method `contains` to `Rectangle` that will return `True` if the -argument, a `Ball`, is inside, and `False` otherwise.
- -The insertion of an item in a quad tree is as follows: -- if the item is not contained in the boundary, then there is nothing to do -- otherwise - - if the number of items in the quad tree is less than the capacity, we can - add the item to it - - otherwise - - if the quad tree has not yet been subdivided, subdivide it - - insert the item in its children, recursively - -The subdivision of a quad tree contains three steps: -- compute the boundaries for all 4 children -- initialize all children to empty quad trees with the correct boundaries -- mark the parent quad tree as _divided_ - -#### Question 40. [[toc](README.md#table-of-content)] -:pencil: Write the `insert` method of `QuadTree`. It should take a `Ball` as -argument and implement the algorithm described above. - -> :bulb: **Hint** -> -> you can write another method, `QuadTree.subdivide` to help you. - -#### Question 41. [[toc](README.md#table-of-content)] -:pencil: Visualize your quad tree to make sure it works as expected. - -Write a new function, `f_qtree_viz`, that takes a list of `Ball`s and a mouse -position and returns a quad tree containing all the `Ball`s and `None`. You -should also write an empty `query` method for `QuadTree`, that will take a -boundary, e.g. a `Rectangle` and return a list of `Ball`s. You can simply return -the empty list from `query` and we will get back to it later :relieved: - -> :bulb: **Note** -> -> you can set the capacity of the quad tree to $10$. - -Call `canva.loop` with `f_qtree_viz` instead of `f`. If you see your quad tree, -congratulations :clap: :clap: - -As spoiled by the previous question, the only missing part of our quad tree is -the ability to query the items inside a certain boundary, e.g. a rectangle or -better a circle around a point! - -#### Question 42. [[toc](README.md#table-of-content)] -:pencil: Add an `intersects` method to `Rectangle`. It should take another -`Rectangle` and return `True` if they intersect or overlap, and `False` -otherwise.
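The insertion and subdivision steps described above, together with the two `Rectangle` methods, can be sketched as follows. This is only an illustration under several assumptions: a `Ball` counts as contained when its center is inside the rectangle, containment is half-open so that a ball on a shared border belongs to exactly one child, "north" means smaller `y` as in screen coordinates, and the field name `balls` and the helper method `count` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    x: float
    y: float
    r: float


@dataclass
class Rectangle:
    x: float  # center of the rectangle
    y: float
    w: float  # full width
    h: float  # full height

    def contains(self, ball: Ball) -> bool:
        # only the center of the ball is tested, with half-open bounds
        return (self.x - self.w / 2 <= ball.x < self.x + self.w / 2
                and self.y - self.h / 2 <= ball.y < self.y + self.h / 2)

    def intersects(self, other: "Rectangle") -> bool:
        return (abs(self.x - other.x) <= (self.w + other.w) / 2
                and abs(self.y - other.y) <= (self.h + other.h) / 2)


class QuadTree:
    def __init__(self, boundary: Rectangle, n: int):
        self.boundary, self.n = boundary, n
        self.balls = []
        self.divided = False
        self.nw = self.ne = self.sw = self.se = None

    def subdivide(self) -> None:
        b = self.boundary
        w, h = b.w / 2, b.h / 2  # size of each child
        self.nw = QuadTree(Rectangle(b.x - w / 2, b.y - h / 2, w, h), self.n)
        self.ne = QuadTree(Rectangle(b.x + w / 2, b.y - h / 2, w, h), self.n)
        self.sw = QuadTree(Rectangle(b.x - w / 2, b.y + h / 2, w, h), self.n)
        self.se = QuadTree(Rectangle(b.x + w / 2, b.y + h / 2, w, h), self.n)
        self.divided = True

    def insert(self, ball: Ball) -> bool:
        if not self.boundary.contains(ball):
            return False  # nothing to do
        if len(self.balls) < self.n:
            self.balls.append(ball)
            return True
        if not self.divided:
            self.subdivide()
        # `or` stops at the first child that accepts the ball
        return (self.nw.insert(ball) or self.ne.insert(ball)
                or self.sw.insert(ball) or self.se.insert(ball))

    def count(self) -> int:
        children = (self.nw, self.ne, self.sw, self.se) if self.divided else ()
        return len(self.balls) + sum(child.count() for child in children)


tree = QuadTree(Rectangle(50, 50, 100, 100), 2)
for x, y in [(10, 10), (90, 10), (10, 90), (90, 90), (60, 60)]:
    tree.insert(Ball(x, y, 3))
print(tree.divided, tree.count())  # True 5
```

Returning a boolean from `insert` is a design choice of this sketch: it lets the parent stop as soon as one child has accepted the ball.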
- -#### Question 43. [[toc](README.md#table-of-content)] -:pencil: Write the body of the `QuadTree.query` method. The algorithm is as -follows: -- if the query boundary does not intersect with the boundary of the quad tree, - there are no points in there -- otherwise, add to the list of _found_ items the items from the quad tree that - are contained in the query boundary -- recursively query the children of the tree if it has been divided - -#### Question 44. [[toc](README.md#table-of-content)] -:pencil: Instead of returning a quad tree and `None` from `f_qtree_viz`, return -the same quad tree and a `Rectangle` centered on the mouse. You should see the -quad tree as before, but also the query rectangle and the `Ball`s inside it -highlighted in green. - -#### Question 45. [[toc](README.md#table-of-content)] -We can now wrap this all up and speed up our simulations! - -:pencil: Copy your `f` function and call it `f_qtree`. Instead of iterating over -all other `Ball`s, build a quad tree containing all the `Ball`s and query a -rectangle around each ball. - -:question: Is the simulation running normally now? - ---- ---- -> this is the end of the class :clap: :clap: -> [return to TOC](README.md#table-of-content) \ No newline at end of file diff --git a/requirements.txt b/requirements.txt deleted file mode 100644 index 03f0c54d14a6808351ad5f79deefbb09bf07ebb0..0000000000000000000000000000000000000000 --- a/requirements.txt +++ /dev/null @@ -1,2 +0,0 @@ -# python==3.10.12 -pygame==2.6.0