## Q2. Enhanced occurrence frequency analysis [[toc](#table-of-content)]
:file_folder: Create a new file called `enhanced_occurrence_frequencies.py` in the `src/` directory.
:gear: run the `pytest -k required` command and you should see 5 failing tests. These tests should hopefully
pass by the end of this question!
We will now analyze the text of [_Wuthering Heights_ by Emily Bronte](data/english_wuthering_heights.txt).
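As a starting point, counting character occurrence frequencies could look like the minimal sketch below. The function name `occurrence_frequencies` is hypothetical. Lowercasing the text and keeping only alphabetic characters are also assumptions made for illustration, so this is not necessarily the interface the tests expect.

```python
from collections import Counter


def occurrence_frequencies(text: str) -> dict[str, float]:
    """Hypothetical helper: relative frequency of each letter in `text`.

    Assumes case is ignored and non-alphabetic characters are dropped.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    # Ordered from most to least frequent; an empty text gives an empty dict.
    return {c: n / total for c, n in counts.most_common()}
```

It could then be called on the content of `data/english_wuthering_heights.txt`, read with `encoding="utf-8"` so that accented characters are decoded correctly.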
...
...
## Q4. Compression with Huffman [[toc](#table-of-content)] (_optional_)
:file_folder: Create a new file `src/data_compression.py`.
:gear: run the `pytest -k optional` command and you should see 1 failing test. It should hopefully
pass by the end of this question!
:gear: running `python src/data_compression.py` or `make q4` should print the answers to the question.
The analysis of the character occurrence frequencies shows that they vary strongly from one character to another. This reveals some "redundancy" in the languages: because some characters are much more frequent than others, a text can be compressed by giving shorter codes to frequent characters and longer codes to rare ones.
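This idea can be illustrated with a minimal Huffman coding sketch. It assumes the frequencies are given as a `dict` mapping each character to its frequency, for example the output of a frequency-counting step. The helper name `huffman_code` is hypothetical and is not necessarily the interface the optional test checks.

```python
import heapq
from itertools import count


def huffman_code(frequencies: dict[str, float]) -> dict[str, str]:
    """Hypothetical helper: build a prefix-free binary code from frequencies."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single symbol still needs a one-bit codeword.
        return {next(iter(frequencies)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {s: ""}) for s, f in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees: prefix "0" on one side, "1" on the other.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

With frequencies `{"a": 0.5, "b": 0.25, "c": 0.25}`, `a` gets a 1-bit codeword while `b` and `c` get 2-bit codewords. The average length is therefore 0.5 × 1 + 0.25 × 2 + 0.25 × 2 = 1.5 bits per character, instead of the 2 bits a fixed-length code for three symbols would need.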