Python Programming Assignment

Question Description

I have attached the files. Please see the full details in the PDF file.

The purpose of this project is to demonstrate an acceptable level of expertise with the basic programming
concepts/techniques and Python syntax addressed through the first half of the semester. It also involves
demonstration of appropriate critical reading skills. This includes (but is not necessarily limited to): data types,
variables, operators, expressions, statements, I/O operations, user-defined and built-in functions, modules and
control structures.

IT106- Project #1 Specification
(Simple Data Compression)
NOTE: Recall that the acceptable resources for this assignment differ from those approved for lab assignments
and are limited to the class text, the Python Library, Language, and Tutorial references, and the lecture and lab slides/notes.
The completed project must be submitted via Blackboard NLT (no later than) April 1, 2018 at 11:50 PM.
No late pass is allowed for projects. This is an individual effort; no collaboration is allowed.
Lab Requirements:
• Submitted files should adhere to the following name format:

Firstname + underscore + lastname + underscore + “Project1” + dot + py
• Example:
• The source code should be written in the template file provided with the specification.
Data Compression Background
Data compression is the process of encoding data in fewer bits than would normally be required. The terminology
may be new to you, but you might be familiar with utilities like zip, rar, bzip, etc. These are simply tools for
compressing data.
These tools make use of the inherent redundancy in data to encode it using fewer bits. For this project,
you will learn a simplified version of a common data compression technique known as Lempel-Ziv encoding.
Some minor details have been excluded so that you can implement it using the concepts you have
learned so far. As a result, your program might not actually perform any compression, but it will be very close
to the actual implementation.
Although our implementation does not achieve any real compression, it will help you appreciate many
core concepts of this technique and, above all, help you develop your problem-solving and programming skills.
Discussion of Problem and Implementation Procedure
What has been encoded has to be decoded. Therefore, we discuss both compression/encoding and
decompression/decoding. These two processes are complementary to each other. The first part of this section
discusses the encoding logic, which also introduces the basic idea of the algorithm, so it is
lengthier than the part on decoding logic. Decoding is just the reverse process of encoding, so it will not need much
explanation.
Encoding Logic
Usually compression and decompression are performed on files, but since we have not covered file I/O in
this course yet, our program will work on strings directly. Your program should take its input (which would
normally come from a file) as a string from the user, encode it, and print the result to standard output.
We introduce some terminology here, before we describe the algorithm:
Source String: This is the string that you get as input from the user.
Prefix String: Any contiguous sequence of characters taken from the beginning of a string forms
its prefix string. For the string ‘ABCD’, the strings ‘’, ‘A’, ‘AB’, ‘ABC’, and ‘ABCD’ are all prefix strings
(notice that the list includes the empty string as well as the complete original string). Similarly, we
can define a suffix string, which takes characters from the end of the string. But since we aren’t
going to use them in our algorithm, we will not discuss them.
Processed String: We process the input string one character at a time. At any stage in our algorithm,
the input string can be partitioned into two sections. The first part consists of characters that
we have already encoded or processed. This segment of the original string will be referred to as the
processed string in our discussion.
Unprocessed String: This is the part of the source string that we have not yet processed.
Initially, the entire source string will be unprocessed. As we process the source string, one character at a
time, our processed string will grow in size and the unprocessed string will shrink.
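The terms above map directly onto Python string slicing. The following is a minimal illustration, not part of the specification. The helper names prefixes and split_at are hypothetical and chosen only for this example.

```python
def prefixes(s):
    """Return every prefix of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def split_at(source, position):
    """Split source into (processed, unprocessed) at the given position."""
    return source[:position], source[position:]


print(prefixes('ABCD'))          # ['', 'A', 'AB', 'ABC', 'ABCD']
print(split_at('ABABBABA', 3))   # ('ABA', 'BBABA')
```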
The encoding process works as follows. As we move along the source string, we create a list of unique
prefix strings of the unprocessed string that we have not encountered so far. Initially this list will be empty, and
we will gradually add new strings to it. Since we add only strings that we have not seen before, all the
individual entries of this list will be unique. The notion of a sequence of characters, or word, in our algorithm is
different from its general meaning of a sequence of characters separated by spaces on either side. Our
strings can contain spaces, tabs, and the newline character ‘\n’.
The idea will be illustrated using the source string “ABABBABA”. When you process this string, the list
of unique words that you get will be [‘A’, ‘B’, ‘AB’, ‘BA’]. The process is explained in the following paragraphs.
As you can see from the example, there is some redundancy in the source data. Compression techniques
make use of this redundancy to reduce the number of bits required to code the data. You may also have
noticed that every new string that we add to our list of unique prefixes can be constructed by concatenating
a string that we have already encountered and one more character. In the above example, AB is the
concatenation of A (already encountered) and B. Similarly, BA is the concatenation of B and A. This is the heart of
our algorithm. In summary, you can represent any string in the list of unique entries as a combination of one of
the previous entries from the list plus a character.
As a convenience, we are going to add a dummy entry to the start of the list, which will make our
lives easier. This will be an empty string ‘’.
Initially, our list consists of only one entry, the empty string: [‘’]. Then, as we process the first character
‘A’, we see that this prefix is not in our list of unique prefixes, so we add it to the list. Now the list becomes [‘’, ‘A’].
Next, we search for an unseen prefix in the remaining string ‘BABBABA’. Before you proceed, list all the
prefix strings of this string. They are ‘’, ‘B’, ‘BA’, ‘BAB’, and so on. The prefix ‘B’ is not in the list, so we add it.
Next we take the unprocessed string ‘ABBABA’. We already have ‘’ and ‘A’ in our list of known prefixes, so the
next entry in our list will be ‘AB’. We continue this way along the source string.
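One way to build this list in Python is sketched below. It assumes the prefix list is kept as a plain Python list, and unique_prefixes is a hypothetical name. The function reads one character at a time and records the current run of characters as soon as it becomes a prefix that has not been seen before. Trailing characters that only repeat a known prefix (here the final ‘BA’) are not added; the terminating condition discussed later handles them.

```python
def unique_prefixes(source):
    """Build the list of unique prefixes, starting with the dummy empty string."""
    known = ['']                  # dummy entry at index 0
    current = ''                  # characters read since the last new entry
    for ch in source:
        current += ch
        if current not in known:  # first time we see this prefix
            known.append(current)
            current = ''          # start looking for the next new prefix
    return known


print(unique_prefixes('ABABBABA'))   # ['', 'A', 'B', 'AB', 'BA']
```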
But our primary purpose is to encode the input string, not just to build a list of unique prefixes. We do
that as follows. Whenever we add a new entry to the list of unique prefixes, we have effectively encoded a block
of characters. Suppose we are at the beginning and ‘A’ is our new entry. We can think of ‘A’ as the
concatenation of two parts, ‘’ + ‘A’ (the string at index 0 of the unique prefix list plus the new character
‘A’). Similarly, ‘AB’ can be split into ‘A’ + ‘B’, which is the string at index 1 of the unique prefix list plus ‘B’. When
you encode the input string ‘ABABBABA’, you should get an encoding list that logically looks like this:
[(0, ‘A’), (0, ‘B’), (1, ‘B’), (2, ‘A’), (4, ‘’)] → Encoded List
The list of unique prefixes should be something like this (including the dummy empty string):
[‘’, ‘A’, ‘B’, ‘AB’, ‘BA’] → List of unique prefixes
Using the encoded list and the unique prefixes list, try to reconstruct the original string. Give it a try; it
should be fun :). And it does not require programming. Not yet, at least.
The terminating condition in the encoding process is a bit tricky. When you reach the end of the source
string and no new prefix remains, our logic as described above will fail. The source string in the example
has this problem. When you have processed all except the last two characters of the example string, you will be
left with only ‘BA’, which is not a new string: it is already present at index 4 in the unique prefixes list. So you
represent it as (4, ‘’), the concatenation of the string at index 4 plus an empty string. You will have to handle this
condition in your implementation. Also, make sure that you provide one such input while testing your
program.
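The sketch below combines the prefix search with the encoding step and the terminating condition. It assumes the same plain-list representation, and encode_pairs is a hypothetical name. The membership tests and index() calls are linear searches. That is acceptable for short inputs but not efficient.

```python
def encode_pairs(source):
    """Encode source as a list of (prefix_index, next_char) pairs."""
    known = ['']                 # unique prefixes; index 0 is the dummy ''
    pairs = []
    current = ''                 # longest known prefix read so far
    for ch in source:
        if current + ch in known:
            current += ch        # still a known prefix: keep extending it
        else:
            pairs.append((known.index(current), ch))
            known.append(current + ch)   # record the new unique prefix
            current = ''
    if current:                  # input ended inside an already-known prefix
        pairs.append((known.index(current), ''))
    return pairs


print(encode_pairs('ABABBABA'))
# [(0, 'A'), (0, 'B'), (1, 'B'), (2, 'A'), (4, '')]
```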
Once you have the encoded list, the next step is to convert it to a binary string. Let us take the encoded
list from the previous example:
[(0, ‘A’), (0, ‘B’), (1, ‘B’), (2, ‘A’), (4, ‘’)]
As we have already discussed, an entry in the encoded list has two parts: an index to a previous element
and a character. As we sequentially scan the string and add entries to the list, the list grows in size; as a consequence,
the number of bits required to represent the index also keeps increasing (because the magnitude of the indices
keeps increasing). So we will encode the indices as follows. The first two indices will be encoded using 1 bit, the
next two will use 2 bits, the next two will be encoded with 3 bits, and so on. In general, the index stored at position n
of the encoded list (n is the list position, starting at 0) will be encoded using floor(n/2 + 1) bits. (For a real number k,
floor(k) is the largest integer less than or equal to k.)
For the given encoded list, the indices are [0, 0, 1, 2, 4]. They will be encoded as [‘0’, ‘0’, ‘01’, ‘10’, ‘100’].
Similarly, the index list [0, 1, 1, 2, 3, 1] will be encoded as [‘0’, ‘1’, ‘01’, ‘10’, ‘011’, ‘001’]. Notice that the same
number 1, occurring at different positions, is encoded using different numbers of bits (padded with ‘0’s if required).
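The width rule can be checked against both examples with a short sketch. The helper names are hypothetical. For a non-negative integer n, the Python expression n // 2 + 1 equals floor(n/2 + 1). bin() returns a string with a ‘0b’ prefix, which is sliced off, and zfill() pads on the left with zeros.

```python
def index_width(n):
    """Number of bits for the index stored at position n (n starts at 0)."""
    return n // 2 + 1            # equals floor(n/2 + 1) for n >= 0


def encode_indices(indices):
    """Encode each index as a zero-padded bit string of its position's width."""
    return [bin(value)[2:].zfill(index_width(n)) for n, value in enumerate(indices)]


print(encode_indices([0, 0, 1, 2, 4]))      # ['0', '0', '01', '10', '100']
print(encode_indices([0, 1, 1, 2, 3, 1]))   # ['0', '1', '01', '10', '011', '001']
```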
Similarly, we will convert characters to bit strings by first converting them to a number using their ASCII
value and then to a binary number. We will not use a variable number of bits to encode characters; they will
always be translated into a binary string of 8 bits. Find out more about ASCII code from the web; it is the most
common mapping of characters to integers used in computers. Also look up the Python function that can be used
to get the ASCII value of a character.
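A possible character conversion is sketched below. It assumes every input character is plain ASCII, with a code below 256, so that it fits in 8 bits. The built-in ord() returns the character code, and char_bits is a hypothetical helper name.

```python
def char_bits(ch):
    """Return the 8-bit binary form of an ASCII character, e.g. 'A' -> '01000001'."""
    return bin(ord(ch))[2:].zfill(8)   # drop the '0b' prefix, pad to 8 bits


print(char_bits('A'))   # 01000001
print(char_bits('B'))   # 01000010
```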
The encoded list will translate to the following after the binary conversion:
[(‘0’, ‘01000001’), (‘0’, ‘01000010’), (‘01’, ‘01000010’), (‘10’, ‘01000001’), (‘100’, ‘’)]
The final step is to join all the strings in the encoded list into a single string, in order. The example
after joining will result in:
‘00100000100100001001010000101001000001100’
This can be visualized with | between the individual elements, as shown below:
‘0’| ‘01000001’| ‘0’| ‘01000010’| ‘01’| ‘01000010’| ‘10’| ‘01000001’| ‘100’| ‘’
A standard Python function can be used to convert an integer to a bit string. The string it returns starts
with the prefix ‘0b’; remove it using slicing.
If the converted binary string does not have enough bits, pad it with zeros. There is a string method
for this as well. When we say some value ‘x’ is converted to a bit string of length m bits, we mean that,
irrespective of the magnitude of the value, the bit string representation should be m bits long, possibly
with zeros padded at the front.
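The sketch below puts these pieces together and turns an encoded list into the final bit string. It is one possible arrangement, not a reference solution. to_bits and pairs_to_bitstring are hypothetical names, and characters are again assumed to be ASCII.

```python
def to_bits(value, width):
    """Binary string of value, left-padded with zeros to at least width bits."""
    return bin(value)[2:].zfill(width)


def pairs_to_bitstring(pairs):
    """Convert (index, char) pairs into one bit string."""
    pieces = []
    for n, (index, ch) in enumerate(pairs):
        pieces.append(to_bits(index, n // 2 + 1))   # variable-width index
        if ch:                                       # the final pair may have no character
            pieces.append(to_bits(ord(ch), 8))       # fixed 8-bit character
    return ''.join(pieces)


pairs = [(0, 'A'), (0, 'B'), (1, 'B'), (2, 'A'), (4, '')]
print(pairs_to_bitstring(pairs))   # 00100000100100001001010000101001000001100
```

zfill() only pads and never truncates. If a value needed more bits than the requested width, the result would simply be longer. Under the width rule above, the index at position n is at most n, so it always fits in floor(n/2 + 1) bits.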
Decoding Logic
Decoding process is exactly the reverse of the encoding process. You will start with the encoded bit string
and decode it into the original string:
– The first step is to split the input string into blocks of variable length, as per the logic used in encoding.
Each block will contain the bits required to extract one element of the encoded list that we saw in the
encoding process. The block lengths will gradually increase, as explained in the encoding step. The
general pattern for the length of a block is floor(n/2 + 1) + 8, where n is the index of the block. The
block indexes start with 0, so the pattern will be [1 + 8, 1 + 8, 2 + 8, 2 + 8, 3 + 8, 3 + 8, …]. Notice
that the last block may not have a character, in which case it will have only floor(n/2 + 1) bits and not
floor(n/2 + 1) + 8 bits.
– The first floor(n/2 + 1) bits of each block obtained in the previous step will be converted to an integer, and
the last 8 bits will be converted into a character. After you perform this conversion, you will get a
sequence that looks like this: [(0, ‘A’), (0, ‘B’), (1, ‘B’), (2, ‘A’), (4, ‘’)]. This sequence is
the list of encoded entries that you encountered in the encoding step.
– Similar to the encoding case, you start with a list of unique prefixes that contains only the empty
string, ‘’, and reconstruct the list of unique prefixes. Simultaneously, you decode the list of encoded
entries into a string of the original characters.
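The first two decoding steps, splitting into blocks and converting each block, can be combined into a single loop. The sketch below assumes the bit string was produced by the encoding rules above and that every character occupies exactly 8 bits. bitstring_to_pairs is a hypothetical name. A block is treated as index-only when no bits remain after its index field.

```python
def bitstring_to_pairs(bits):
    """Split bits into blocks of floor(n/2 + 1) + 8 bits and decode each to (index, char)."""
    pairs = []
    pos = 0                      # current read position in bits
    n = 0                        # index of the current block
    while pos < len(bits):
        width = n // 2 + 1
        index = int(bits[pos:pos + width], 2)     # variable-width index field
        pos += width
        if pos < len(bits):      # a full 8-bit character follows the index
            ch = chr(int(bits[pos:pos + 8], 2))
            pos += 8
        else:                    # the last block carried only an index
            ch = ''
        pairs.append((index, ch))
        n += 1
    return pairs


print(bitstring_to_pairs('00100000100100001001010000101001000001100'))
# [(0, 'A'), (0, 'B'), (1, 'B'), (2, 'A'), (4, '')]
```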
Here is how the decoding works on the given encoded sequence in the example:
– Initially, the list of unique prefixes is [‘’] and the running decoded sequence is empty.
– When we encounter the first entry from the encoded list, (0, ‘A’), we concatenate ‘A’ to the end of the entry at
index 0 in the unique prefixes list to obtain the next sequence of characters. So our first decoded set of
characters is ‘’ + ‘A’ = ‘A’. We add this to our list of unique prefixes. Now it becomes [‘’, ‘A’].
– The next entry (0, ‘B’) is decoded in a similar fashion to obtain ‘’ + ‘B’ = ‘B’. We add this to our running
decoded sequence, which now becomes ‘AB’, and we also add it to our list of unique prefixes, which becomes [‘’, ‘A’, ‘B’].
– After processing (1, ‘B’), we get the next unique word ‘A’ + ‘B’ = ‘AB’. We concatenate this to our running
sequence: ‘AB’ + ‘AB’ = ‘ABAB’. The unique prefix list becomes [‘’, ‘A’, ‘B’, ‘AB’].
– After (2, ‘A’), the new word is ‘B’ + ‘A’ = ‘BA’. Unique prefixes = [‘’, ‘A’, ‘B’, ‘AB’, ‘BA’] and the running sequence = ‘ABAB’ + ‘BA’ = ‘ABABBA’.
– The last entry (4, ‘’) does not give a new unique sequence but points to the element at index 4 of the list, ‘BA’. We
append it to our running sequence, which results in ‘ABABBABA’. This is our original string.
– Output the string to the screen.
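The walkthrough above can be written as a short loop, shown here under the same assumptions. decode_pairs is a hypothetical name. Only pairs that carry a character create a new unique prefix; the final index-only pair just reuses an existing one.

```python
def decode_pairs(pairs):
    """Rebuild the original text from (index, char) pairs."""
    known = ['']                 # same dummy entry as in encoding
    pieces = []                  # running decoded sequence, joined at the end
    for index, ch in pairs:
        word = known[index] + ch
        pieces.append(word)
        if ch:                   # only full pairs introduce a new unique prefix
            known.append(word)
    return ''.join(pieces)


print(decode_pairs([(0, 'A'), (0, 'B'), (1, 'B'), (2, 'A'), (4, '')]))   # ABABBABA
```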
You need to define a total of three functions: compression, decompression, and main(), which organizes the
rest of the program (I/O).
You will not be required to handle exceptions, so you can safely assume that the user input will
not break your code.
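Here is one possible layout of the whole program using exactly the three functions named above. It is a sketch, not a reference solution. It makes one deliberate assumption: main() reads all of standard input with sys.stdin.read() instead of input(). This is because input() stops at the first newline, while the source strings may contain ‘\n’. End the input with Ctrl-D, or Ctrl-Z then Enter on Windows. Characters are assumed to be ASCII.

```python
import sys


def compression(source):
    """Encode source as a bit string using the simplified Lempel-Ziv scheme."""
    known = ['']            # unique prefixes; index 0 is the dummy empty string
    pieces = []             # bit strings to be joined at the end
    current = ''            # longest already-known prefix read so far
    for ch in source:
        if current + ch in known:
            current += ch   # still a known prefix; keep reading
            continue
        n = len(known) - 1  # pairs emitted so far = position of this pair
        pieces.append(bin(known.index(current))[2:].zfill(n // 2 + 1))
        pieces.append(bin(ord(ch))[2:].zfill(8))
        known.append(current + ch)
        current = ''
    if current:             # input ended on a prefix that is already known
        n = len(known) - 1
        pieces.append(bin(known.index(current))[2:].zfill(n // 2 + 1))
    return ''.join(pieces)


def decompression(bits):
    """Decode a bit string produced by compression() back into the original text."""
    known = ['']
    pieces = []
    pos = 0                 # current read position in bits
    n = 0                   # index of the current block
    while pos < len(bits):
        width = n // 2 + 1
        index = int(bits[pos:pos + width], 2)
        pos += width
        ch = ''
        if pos < len(bits):  # a full 8-bit character follows the index
            ch = chr(int(bits[pos:pos + 8], 2))
            pos += 8
        word = known[index] + ch
        pieces.append(word)
        known.append(word)
        n += 1
    return ''.join(pieces)


def main():
    """Read text, print its compressed form, then print the decompressed text."""
    print('Enter the text to compress (finish with Ctrl-D):')
    source = sys.stdin.read()
    encoded = compression(source)
    print('Compressed:', encoded)
    print('Decompressed:', decompression(encoded))


if __name__ == '__main__':
    main()
```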
Implementation procedure
The above specs may confuse you at first, but the underlying idea of this compression technique is very
simple. Once you understand it thoroughly, implementation will be quite easy. The most essential skill
for designing an algorithm and developing a program is a good understanding of the problem. This has nothing
to do with the Python language or programming in general. The discussion in this document is self-contained, so
you are not expected to have any background knowledge.
Your first step should be to understand the procedure. Work out the example provided in the document.
Simple examples have the advantage that their calculations are manageable, so they can be done by hand. Trace
out the encoding and decoding steps. Once you are comfortable with the given example, choose your own sequence
of characters and apply the encoding and decoding steps.
After the previous step, you should be quite comfortable with the algorithm. This should help you develop
the algorithm for the program. Now that you know how to go about solving the problem, note down the
sequence of steps that you would follow. Although it is quite easy to extract the pseudo-code from the specs, we
recommend that you try to summarize the idea in your own words.
Now in the next step, think about the various data structures and programming blocks you would need to
develop these programs. What should you use to store the list of unique words and for the encoded list? How
should you iterate over the input string? What are the initial conditions before you start a loop? What are its
terminating conditions? Once these questions are clear, you will find it easier to code.
Know your tools: Although it is quite possible to implement the project without using any library functions,
it would certainly be a pain. And we don’t want you to go through any more pain than is necessary for you to learn.
So look for library functions that can make your life easier (e.g., for converting integer to a binary, or character
to integer etc.). Play with these functions and understand their behavior. It is very easy to commit logical errors
when you don’t know exactly how a function behaves.
Finishing the code is not the same as finishing the implementation. Code review and testing are also very important
parts of the development cycle. Carefully reading your code can reveal a lot of errors. Test your program thoroughly
using good test cases, which take care of different possible scenarios.
Grading Rubric:
File name and header
• Both file names and headers meet the specs.
• Either the filename is incorrect or the headers are missing sections/details.
• Needs Improving: Both the file name and the file header are missing or incorrectly implemented.

Pseudocode
• Pseudocode is present and provides sufficient detail for a developer to accurately implement the process.
• Pseudocode is present but provides insufficient detail for a developer to completely implement the process.
• Needs Improving: Pseudocode is missing or is so general that it provides little assistance in implementing the process.

Comments & variable names
• Comments clearly demonstrate which sections of code are related to specific steps in the pseudocode, and variable names clearly reflect what they represent.
• Comments generally demonstrate which sections of code are related to specific steps in the pseudocode, and variable names generally reflect what they represent.
• Needs Improving: Comments are missing or provide little assistance to the understanding of the code, and/or variable naming conventions provide no insight into the referenced …

Compression Algorithm
• The algorithm was correctly implemented and takes into consideration all the special cases.
• The algorithm was generally implemented correctly but may have overlooked some special cases or has minor logic errors.
• Needs Improving: The compression algorithm is significantly flawed, deviates significantly from the specifications, or contains a significant number of logical errors.

Decompression Algorithm
• The algorithm was correctly implemented and takes into consideration all the special cases.
• The algorithm was generally implemented correctly but may have overlooked some special cases or has minor logic errors.
• Needs Improving: The decompression algorithm is significantly flawed, deviates significantly from the specifications, or contains a significant number of logical errors.

Transformations / Conversions
• Transformation of intermediate results to binary strings in compression and the reverse transformation in decompression have been implemented correctly.
• Transformation of intermediate results to binary strings in compression and the reverse transformation in decompression have been generally implemented correctly, with minor errors such as not padding bits to make the lengths uniform.
• Needs Improving: The required transformations are not implemented or are significantly flawed.
