To display a character you normally send a full 8-bit code, but for plain text the inverse-video bit can be dropped, and bit #6 as well...
In fact, only 6 bits are needed to display a standard character.
So the text can be packed as "111111-11 + 1111-1111 + 11-111111" bits = 4 characters in 3 bytes.
To see how it is read back, have a look at the machine code listing.
To compile it, I had to add a new function in VB (r. 12/05/22)...
Compressed/not compressed: (the compressed REM listing seems longer than the plain-text one, due to the BASIC tokens in it!)
Line 2: 613 bytes uncompressed.
Line 2: 462 bytes compressed.
BASIC code:
Code:
# VB81 XuR [Textc.bas]
#
# 1 REM: 129 Bytes@4082-4102
1 REM [HEX:\
76,76,E7,CD,92,0D,CD,F5,0B,68,61,CD,D8,09,01,05,\
00,09,EB,21,3D,40,0E,03,1A,FD,77,21,06,08,FD,CB,\
21,7E,28,04,3E,80,18,02,3E,00,77,FD,CB,21,16,23,\
10,EC,13,0D,20,E2,21,3D,40,01,04,06,FD,36,21,00,\
FD,CB,21,16,7E,CB,7F,20,06,FD,CB,21,86,18,04,FD,\
CB,21,C6,23,10,EA,FD,7E,21,FE,0A,CA,5B,00,FE,0C,\
20,17,22,7B,40,2A,0E,40,7E,FE,76,28,03,23,18,F8,\
23,22,0E,40,2A,7B,40,18,01,D7,0D,28,96,06,06,18,\
BB ]
2 REM[TX6=\
TEXT COMPRESSION SEEMS NATURAL£\
FOR HUFFMAN CODING. IN TEXT, WE£\
HAVE A DISCRETE ALPHABET THAT,£\
IN A GIVEN CLASS, HAS RELATIVELY£\
STATIONARY PROBABILITIES.£\
FOR EXAMPLE, THE PROBABILITY£\
MODEL FOR A PARTICULAR NOVEL£\
WILL NOT DIFFER SIGNIFICANTLY£\
FROM THE PROBABILITY MODEL FOR£\
ANOTHER NOVEL. SIMILARLY, THE£\
PROBABILITY MODEL FOR A SET OF£\
C PROGRAMS IS NOT GOING TO BE£\
MUCH DIFFERENT THAN THE£\
PROBABILITY MODEL FOR A£\
DIFFERENT SET OF C PROGRAMS.£\
£\
THE PROBABILITIES IN TABLE 3.32£\
ARE THE PROBABILITIES OF THE 26£\
LETTERS (UPPER- AND LOWERCASE)£\
OBTAINED FOR THE U.S. CONSTITUT£\
-ION AND ARE REPRESENTATIVE OF£\
ENGLISH TEXT.¿]
3 REM[TX6=\
THE PROBABILITIES IN TABLE 3.33£\
WERE OBTAINED BY COUNTING THE£\
FREQUENCY OF OCCURRENCES OF£\
LETTERS IN AN EARLIER VERSION£\
OF THIS CHAPTER.\
£\
WHILE THE TWO DOCUMENTS ARE£\
SUBSTANTIALLY DIFFERENT, THE TWO£\
SETS OF PROBABILITIES ARE VERY£\
MUCH ALIKE.¿]
4 REM[TX6=\
WE ENCODED THE EARLIER VERSION£\
OF THIS CHAPTER USING HUFFMAN£\
CODES THAT WERE CREATED USING£\
THE PROBABILITIES OF OCCURRENCE£\
OBTAINED FROM THE CHAPTER.£\
THE FILE SIZE DROPPED FROM ABOUT£\
70,000 BYTES TO ABOUT 43,000£\
BYTES WITH HUFFMAN CODING.£\
WHILE THIS REDUCTION IN FILE£\
SIZE IS USEFUL,WE COULD HAVE£\
OBTAINED BETTER COMPRESSION IF£\
WE HAD FIRST REMOVED THE£\
STRUCTURE EXISTING IN THE FORM£\
OF CORRELATION BETWEEN THE£\
SYMBOLS IN THE FILE. OBVIOUSLY,£\
THERE IS A SUBSTANTIAL AMOUNT£\
OF CORRELATION IN THIS TEXT.£\
FOR EXAMPLE, HUF IS ALWAYS£\
FOLLOWED BY FMAN. UNFORTUNATELY£\
THIS CORRELATION IS NOT AMENABLE\
¿]
5 REM[TX6=\
TO SIMPLE NUMERICAL MODELS, AS£\
WAS THE CASE FOR THE IMAGE£\
FILES. HOWEVER, THERE ARE OTHER£\
SOMEWHAT MORE COMPLEX TECHNIQUES£\
THAT CAN BE USED TO REMOVE THE£\
CORRELATION IN TEXT FILES.£\
WE WILL LOOK MORE CLOSELY AT£\
THESE IN CHAPTERS 5 AND 6.£\
£\
MARKOV MODELS IN TEXT£\
COMPRESSION£\
£\
AS EXPECTED, MARKOV MODELS ARE£\
PARTICULARLY USEFUL IN TEXT£\
COMPRESSION, WHERE THE£\
PROBABILITY OF THE NEXT LETTER£\
IS HEAVILY INFLUENCED BY THE£\
PRECEDING LETTERS.¿]
6 REM[TX6=\
IN FACT, THE USE OF MARKOV£\
MODELS FOR WRITTEN ENGLISH£\
APPEARS IN THE ORIGINAL WORK£\
OF SHANNON (3).£\
IN CURRENT TEXT COMPRESSION£\
LITERATURE, THE KTH-ORDER£\
MARKOV MODELS ARE MORE WIDELY£\
KNOWN AS FINITE CONTEXT MODELS,£\
WITH THE WORD CONTEXT BEING£\
USED FOR WHAT WE HAVE EARLIER£\
DEFINED AS STATE.£\
CONSIDER THE WORD PRECEDING.£\
SUPPOSE WE HAVE ALREADY£\
PROCESSED PRECEDIN AND ARE£\
GOING TO ENCODE THE NEXT LETTER.¿]
7 REM[TX6=\
IF WE TAKE NO ACCOUNT OF THE£\
CONTEXT AND TREAT EACH LETTER£\
AS A SURPRISE, THE PROBABILITY£\
OF THE LETTER G OCCURRING IS£\
RELATIVELY LOW. IF WE USE A£\
FIRST-ORDER MARKOV MODEL OR£\
SINGLE-LETTER CONTEXT (THAT IS,£\
WE LOOK AT THE PROBABILITY MODEL£\
GIVEN N), WE CAN SEE THAT THE£\
PROBABILITY OF G WOULD INCREASE£\
SUBSTANTIALLY. AS WE INCREASE£\
THE CONTEXT SIZE (GO FROM N TO£\
IN TO DIN AND SO ON), THE£\
PROBABILITY OF THE ALPHABET£\
BECOMES MORE AND MORE SKEWED,£\
WHICH RESULTS IN LOWER ENTROPY.¿]
8 REM[TX6=\
SHANNON USED A SECOND-ORDER£\
MODEL FOR ENGLISH TEXT£\
CONSISTING OF THE 26 LETTERS AND£\
ONE SPACE TO OBTAIN AN ENTROPY£\
OF 3.1 BITS/LETTER (4). USING A£\
MODEL WHERE THE OUTPUT SYMBOLS£\
WERE WORDS RATHER THAN LETTERS£\
BROUGHT DOWN THE ENTROPY TO 2.4£\
BITS/LETTER. SHANNON THEN USED£\
PREDICTIONS GENERATED BY PEOPLE£\
(RATHER THAN STATISTICAL MODELS)£\
TO ESTIMATE THE UPPER AND LOWER£\
BOUNDS ON THE ENTROPY OF THE£\
SECOND ORDER MODEL. FOR THE£\
CASE WHERE THE SUBJECTS KNEW£\
THE 100 PREVIOUS LETTERS, HE£\
ESTIMATED THESE BOUNDS TO BE£\
1.3 AND 0.6 BITS/LETTER,£\
RESPECTIVELY.¿]
9 REM[TX6=\
THE LONGER THE CONTEXT, THE£\
BETTER ITS PREDICTIVE VALUE.£\
HOWEVER, IF WE WERE TO STORE£\
THE PROBABILITY MODEL WITH£\
RESPECT TO ALL CONTEXTS OF A£\
GIVEN LENGTH, THE NUMBER OF£\
CONTEXTS WOULD GROW£\
EXPONENTIALLY WITH THE LENGTH£\
OF CONTEXT. FURTHERMORE, GIVEN£\
THAT THE SOURCE IMPOSES SOME£\
STRUCTURE ON ITS OUTPUT, MANY£\
OF THESE CONTEXTS MAY CORRESPOND£\
TO STRINGS THAT WOULD NEVER£\
OCCUR IN PRACTICE. CONSIDER A£\
CONTEXT MODEL OF ORDER FOUR£\
(THE CONTEXT IS DETERMINED BY£\
THE LAST FOUR SYMBOLS).¿]
10 REM[TX6=\
IF WE TAKE AN ALPHABET SIZE OF£\
95, THE POSSIBLE NUMBER OF£\
CONTEXTS IS 954-MORE THAN 81£\
MILLION.£\
£\
THIS PROBLEM IS FURTHER£\
EXACERBATED BY THE FACT THAT£\
DIFFERENT REALIZATIONS OF THE£\
SOURCE OUTPUT MAY VARY£\
CONSIDERABLY IN TERMS OF£\
REPEATING PATTERNS. THEREFORE,£\
CONTEXT MODELING IN TEXT£\
COMPRESSION SCHEMES TENDS TO BE£\
AN ADAPTIVE STRATEGY IN WHICH£\
THE PROBABILITIES FOR DIFFERENT£\
SYMBOLS IN THE DIFFERENT£\
CONTEXTS ARE UPDATED AS THEY£\
ARE ENCOUNTERED. HOWEVER, THIS£\
MEANS THAT WE WILL OFTEN¿]
11 REM[TX6=\
ENCOUNTER SYMBOLS THAT HAVE NOT£\
BEEN ENCOUNTERED BEFORE FOR ANY£\
OF THE GIVEN CONTEXTS (THIS IS£\
KNOWN AS THE ZERO FREQUENCY£\
PROBLEM). THE LARGER THE£\
CONTEXT, THE MORE OFTEN THIS£\
WILL HAPPEN. THIS PROBLEM COULD£\
BE RESOLVED BY SENDING A CODE£\
TO INDICATE THAT THE FOLLOWING£\
SYMBOL WAS BEING ENCOUNTERED£\
FOR THE FIRST TIME, FOLLOWED BY£\
A PREARRANGED CODE FOR THAT£\
SYMBOL.¿]
12 REM[TX6=\
THIS WOULD SIGNIFICANTLY£\
INCREASE THE LENGTH OF THE CODE£\
FOR THE SYMBOL ON ITS FIRST£\
OCCURRENCE (IN THE GIVEN£\
CONTEXT). HOWEVER, IF THIS£\
SITUATION DID NOT OCCUR TOO£\
OFTEN, THE OVERHEAD ASSOCIATED£\
WITH SUCH OCCURRENCES WOULD BE£\
SMALL COMPARED TO THE TOTAL£\
NUMBER OF BITS USED TO ENCODE£\
THE OUTPUT OF THE SOURCE.£\
UNFORTUNATELY, IN CONTEXT-BASED¿]
13 REM[TX6=\
ENCODING, THE ZERO FREQUENCY£\
PROBLEM IS ENCOUNTERED OFTEN£\
ENOUGH FOR OVERHEAD TO BE A£\
PROBLEM, ESPECIALLY FOR LONGER£\
CONTEXTS. SOLUTIONS TO THIS£\
PROBLEM ARE PRESENTED BY THE£\
PPM (PREDICTION WITH PARTIAL£\
MATCH) ALGORITHM AND ITS£\
VARIANTS (DESCRIBED IN DETAIL£\
IN CHAPTER 6).¿]
14 REM[TX6=\
BRIEFLY, THE PPM ALGORITHM£\
FIRST ATTEMPTS TO FIND IF THE£\
SYMBOL TO BE ENCODED HAS A£\
NONZERO PROBABILITY WITH RESPECT£\
TO THE MAXIMUM CONTEXT LENGTH.£\
IF THIS IS SO, THE SYMBOL IS£\
ENCODED AND TRANSMITTED.£\
IF NOT, AN ESCAPE SYMBOL IS£\
TRANSMITTED, THE CONTEXT SIZE£\
IS REDUCED BY ONE, AND THE£\
PROCESS IS REPEATED. THIS£\
PROCEDURE IS REPEATED UNTIL A£\
CONTEXT IS FOUND WITH RESPECT£\
TO WHICH THE SYMBOL HAS A£\
NONZERO PROBABILITY.¿]
15 REM[TX6=\
TO GUARANTEE THAT THIS PROCESS£\
CONVERGES, A NULL CONTEXT IS£\
ALWAYS INCLUDED WITH RESPECT TO£\
WHICH ALL SYMBOLS HAVE EQUAL£\
PROBABILITY. INITIALLY, ONLY£\
THE SHORTER CONTEXTS ARE LIKELY£\
TO BE USED. HOWEVER, AS MORE AND£\
MORE OF THE SOURCE OUTPUT IS£\
PROCESSED, THE LONGER CONTEXTS,£\
WHICH OFFER BETTER PREDICTION,£\
WILL BE USED MORE OFTEN.¿]
16 REM[TX6=\
THE PROBABILITY OF THE ESCAPE£\
SYMBOL CAN BE COMPUTED IN A£\
NUMBER OF DIFFERENT WAYS LEADING£\
TO DIFFERENT IMPLEMENTATIONS(1).£\
THE USE OF MARKOV MODELS IN TEXT£\
COMPRESSION IS A RICH AND ACTIVE£\
AREA OF RESEARCH. WE DESCRIBE£\
SOME OF THESE APPROACHES IN£\
CHAPTER 6 (FOR MORE DETAILS,£\
SEE (1)).¿]
17 REM[TX6=\
FURTHER READING£\
1.£\
TEXT COMPRESSION, BY T.C. BELL,£\
J.G. CLEARY, AND I.H. WITTEN (1)£\
, PROVIDES AN EXCELLENT£\
EXPOSITION OF DICTIONARY-BASED£\
CODING TECHNIQUES.£\
2.£\
THE DATA COMPRESSION BOOK, BY£\
M. NELSON AND J.-L. GAILLEY (69)£\
, ALSO DOES A GOOD JOB OF£\
DESCRIBING THE ZIV-LEMPEL£\
ALGORITHMS.£\
THERE IS ALSO A VERY NICE£\
DESCRIPTION OF SOME OF THE£\
SOFTWARE IMPLEMENTATION ASPECTS.¿]
18 REM[TX6=\
3.£\
DATA COMPRESSION, BY G. HELD AND£\
T.R. MARSHALL (70), CONTAINS A£\
DESCRIPTION OF DIGRAM CODING£\
UNDER THE NAME "DIATOMIC CODING.£\
" THE BOOK ALSO INCLUDES BASIC£\
PROGRAMS THAT HELP IN THE DESIGN£\
OF DICTIONARIES.££\
4.£\
THE PNG ALGORITHM IS DESCRIBED£\
IN A VERY ACCESSIBLE MANNER IN£\
"PNG LOSSLESS COMPRESSION," BY£\
G. ROELOFS (71) IN THE LOSSLESS£\
COMPRESSION HANDBOOK.¿]
19 REM[TX6=\
5.£\
A MORE IN-DEPTH LOOK AT£\
DICTIONARY COMPRESSION IS£\
PROVIDED IN "DICTIONARY-BASED£\
DATA COMPRESSION: AN ALGORITHMIC£\
PERSPECTIVE," BY S.C. SAHINALP£\
AND N.M. RAJPOOT (72) IN THE£\
LOSSLESS COMPRESSION HANDBOOK.¿]
100 FOR A=2 TO 19
101 CLS
102 PRINT USR 16516,A
103 IF INKEY$="" THEN GOTO 103
104 NEXT A
105 STOP
9998 SAVE "TEXT"
9999 RUN
Code:
;------- TASM ASM mnemonics. -------
; Compile this file using:
; Set TASMOPTS = -b
; tasm -80 ThisCode.tas MyBinary.BIN
;-----------------------------------
; Zx81 Program name: VB81 XuR [Decoder.p] :
; REM line name: D=16514/16562 : H=4082/40B2
#define ORG .org ; TASM cross-assembler definitions
#define equ .equ
;-----------------------------------
;------------------------------------
;-Basic sub-routine entry. -
;+----------------------------------+
; Lb4084 ; <- USR BASIC Entry. (16516)
;+----------------------------------+
;------- Rom and Ram Symbols -------
RAM_D_FILE equ $400C
EXTERR .equ $005B ; BASIC break function, ignores the rest of the line.
SPARE16 .equ $407B
PRBUFFER .equ $403D
DFCC .equ $400E ; Next displayed char.
ORG $4082 ; [@16514/@h4082]
Lb4082:
.db $76,$76
Lb4086:
RST 20H
CALL $0D92
CALL $0BF5
LD L,B
LD H,C
CALL $09D8 ; offset to HL
LD BC,5
ADD HL,BC
EX DE,HL ; GET the HL offset from the specified line number.
; [111111-11] [1111-1111] [11-111111] 3 bytes=4 characters
Nxtscan:
LD HL,PRBUFFER
LD C,$03
Rescan:
LD A,(DE)
LD (IY+33),A
LD B,$08
Loop1:
BIT 7,(IY+33); SPARE 8b
JR Z,SB1
LD A,128
JR SB2
SB1:
LD A,0
SB2:
LD (HL),A
RL (IY+33); SPARE 8b
INC HL
DJNZ Loop1
INC DE
DEC C
JR NZ,Rescan
LD HL,PRBUFFER
LD BC,$0604
NXTbyte:
LD (IY+33),0
Feed6b:
RL (IY+33)
LD A,(HL)
BIT 7,A
JR NZ,Feed6b1
RES 0,(IY+33)
JR Feed6bn
Feed6b1:
SET 0,(IY+33)
Feed6bn:
INC HL
DJNZ Feed6b
LD A,(IY+33)
CP $0A ; /~~
JP Z,EXTERR
CP $0C ; "£"
JR NZ,NoNl
LD ($407B),HL
LD HL,(DFCC)
NEXTCHR:
LD A,(HL)
CP $76
JR Z,NoN0
INC HL
JR NEXTCHR
NoN0:
INC HL
LD (DFCC),HL
LD HL,($407B)
JR NoN2
NoNl:
RST 10H
NoN2:
DEC C
JR Z,Nxtscan
LD B,$06
JR NXTbyte
ENDROUT:
.end
Compressed text:
Uncompressed text (REM lines not compiled, to compare the memory used):
This file won't work properly on a plain RUN... Have fun.