diff --git a/README.md b/README.md index e6d7c88..7cd187a 100644 --- a/README.md +++ b/README.md @@ -21,4 +21,58 @@ $ sudo make install ## Algorithm description -TOOD. \ No newline at end of file +Standard brainfuck text generation algorithms are usually stateless (i.e. +load a byte value, output, clear, repeat) or use otherwise low order finite +state. Using [optimal constants](https://esolangs.org/wiki/Brainfuck_constants) +for such a generator renders it optimal on IID inputs. The generator is still +however unable to efficiently exploit the redundancy in such an input stemming +from the underlying probability distribution of the source. + +Low-order finite state generators usually encode transitions between cell states +that are desired to be output. As an example, redundancy on the 2-wide sliding +window level can be exploited by computing a 256x256 table of optimal transition +phrases between cell values. + +This however does not approximate real, variable order redundancy in the source. +Techniques that implement data compressors in brainfuck and load the compressed +data to memory, decompressing it at the runtime, generally exhibit poor performance +characteristics due to the high overhead of random memory access in Brainfuck +(fastest algorithms are quintic-time). + +blz78suf operates by finding long phrases in the input that can be encoded using +procedural logic. For example: + +``` +void the() { printf("the"); } +int main() { + the(); printf(" quick brown fox jumps over "); + the(); printf(" lazy dog"); +} +``` + +Could be shorter than: + +``` +int main() { + printf("the quick brown fox jumps over the lazy dog"); +} +``` + +Had an approperiate, sufficiently repetitive phrase been chosen. The benefit of this +approach is that we can deduplicate repeating phrases and delegate the more granular, +lower order redundancy to a stateless generator. + +blz78suf builds on a stateless text generator nicked from the CodeGolf StackExchange +website for the Brainfuck Golf challenge. The same generator is used by +[copy.sh](https://copy.sh/brainfuck/text). Phrases are found by constructing the +suffix trie and ranking potential replacements by their frequency and length. +Then, the individual messages are encoded and the output is generated. + +## Future improvements + +blz78suf is a prototype and as such, it is not optimized for speed. The algorithm +could be sped up by using an efficient exclusion algorithm for suffix tries and +by using Ukkonen's algorithm for linear-time compressed structures. The procedural +structure of the output could be optimized by allowing phrasal chaining. Further, +a more efficient low order generator could be used. For example, such a desirable +tool would detect patterns via delta encoding and run length encoding. \ No newline at end of file