Page 1 of 1
Not Smaller Yet
Posted: Mon Nov 24, 2008 1:31 pm
by tails
I have a question. How should we use the value 34 on decoding?
Posted: Mon Nov 24, 2008 4:54 pm
by adum
i guess it's not totally necessary in BWT but it serves as a hint. i'm not an expert here by any means...
Posted: Mon Nov 24, 2008 5:23 pm
by papa
I suppose, 34 defines the beginning of the original text. Since the BWT ist rotation invariant (abcde has the same bwt-result as e.g. bcdea or deabc), you cannot otherwise know where the text starts in the backtransformed sequence.
But I thought that this should be the position of the last (or the first) character of the original text in the BWT-Sequence, which is 17 (resp. 20)?
Posted: Mon Nov 24, 2008 7:06 pm
by tails
Yes, that's exactly the same as my thought. If the value were either 17 or 20, I would not have the question.
Posted: Mon Nov 24, 2008 9:06 pm
by adum
was the title of the challenge a hint for you guys? were you already familiar with BWT?
Posted: Mon Nov 24, 2008 9:33 pm
by papa
BWT is the first step in the compression algorithm of bzip2. It sorts the letters such that those appearing in a similar surrounding come together. E.g. a 'q' is always followed by a 'u', so the q's will appear only within a certain region of the BWT-sequence. Therefore this sequence is easier to compress with Move-To-Front- and Huffman-Coding, which are the next steps in bzip2 and make the sequence shorter actually.
By the way, I did pattern matching on BWT-sequences in my diploma thesis, that's why I found the solution to this challenge at first glance (in contrast to some other crypto challenges).
Posted: Mon Nov 24, 2008 9:47 pm
by joki
About the meaning of the number: in this case it denotes which character of the original text appears first in the encoded text. In the given text, the rotation starting with " Burr..." is lexically smallest, so the 'e' of "the" in front of it appears first in the transformed output, and this is character #34 of the original text. To the decoder, this means that after the text has been reconstructed by iterating the "sorting permutation" starting with the first character, the correct rotation will be obtained after moving the last 34 characters to the front of the string.
Note that this is quite similar in spirit, yet opposite in effect to the number expected by tails and papa. In that case the decoder would obtain the correct rotation by applying the "sorting permutation" repeatedly, starting with the specified character of the transformed string.
This is the first challenge I have created and I'm sorry if the number caused much confusion... I wasn't sure whether to include it in the puzzle because there is more than one equally useful way to convey the missing information to the decoder and this puzzle is most likely just as solvable without it (modulo rotation).
Cheers and happy puzzling,
Joachim
Posted: Mon Nov 24, 2008 11:12 pm
by tails
adum wrote:was the title of the challenge a hint for you guys?
Yes, it was a big hint for me. I was like this:
(1) frequency analysis -> hmm, it seems a transposition cipher.
(2) "Not Smaller Yet" -> going to be smaller?
(1) + (2) -> Aha!
adum wrote:were you already familiar with BWT?
No, I didn't even know the word BWT. But I know the very basic things about how bzip2 works. It's called as "block sort" in Japan, so this time I googled it (actually in Japanese) and found the Wikipedia article.
Posted: Sun Dec 14, 2008 11:03 pm
by m!nus
I recognized it even without the challenge title, but the title made it even more clear.
I know BWT because I once tried to implement DEFLATE in PHP (didn't make it but that's why i know about it). For the challenge I did BWT in PHP, not really hard, the number was very confusing though.
Quite nice challenge, maybe I should take a (closer) look at MTF und Huffman again aswell.
Posted: Fri Oct 01, 2010 10:45 pm
by man
'Not smaller yet' immediately reminded me of BWT.
I know that because I read some articles about compression some time ago.
It seems that you can only solve this one if you already knew something about bz2.
Very easy for us, but very hard for the others.
Posted: Thu Jul 21, 2011 6:14 pm
by rmplpmpl
I didn't know about this before, though the challenge's name was a hint that I would be looking for a algorithm which is performed before compressing data.
Nevertheless, it took me ages to learn about BWT (though I used to know Huffman encoding)
Posted: Sat Apr 26, 2014 5:38 pm
by AMindForeverVoyaging
To be honest, I found the (34) rather confusing than helping. Also, it took me ages to find out which sort of algorithm is to be applied here.