diff options
author | marha <marha@users.sourceforge.net> | 2009-06-28 22:07:26 +0000 |
---|---|---|
committer | marha <marha@users.sourceforge.net> | 2009-06-28 22:07:26 +0000 |
commit | 3562e78743202e43aec8727005182a2558117eca (patch) | |
tree | 8f9113a77d12470c5c851a2a8e4cb02e89df7d43 /openssl/crypto/des/asm/readme | |
download | vcxsrv-3562e78743202e43aec8727005182a2558117eca.tar.gz vcxsrv-3562e78743202e43aec8727005182a2558117eca.tar.bz2 vcxsrv-3562e78743202e43aec8727005182a2558117eca.zip |
Checked in the following released items:
xkeyboard-config-1.4.tar.gz
ttf-bitstream-vera-1.10.tar.gz
font-alias-1.0.1.tar.gz
font-sun-misc-1.0.0.tar.gz
font-sun-misc-1.0.0.tar.gz
font-sony-misc-1.0.0.tar.gz
font-schumacher-misc-1.0.0.tar.gz
font-mutt-misc-1.0.0.tar.gz
font-misc-misc-1.0.0.tar.gz
font-misc-meltho-1.0.0.tar.gz
font-micro-misc-1.0.0.tar.gz
font-jis-misc-1.0.0.tar.gz
font-isas-misc-1.0.0.tar.gz
font-dec-misc-1.0.0.tar.gz
font-daewoo-misc-1.0.0.tar.gz
font-cursor-misc-1.0.0.tar.gz
font-arabic-misc-1.0.0.tar.gz
font-winitzki-cyrillic-1.0.0.tar.gz
font-misc-cyrillic-1.0.0.tar.gz
font-cronyx-cyrillic-1.0.0.tar.gz
font-screen-cyrillic-1.0.1.tar.gz
font-xfree86-type1-1.0.1.tar.gz
font-adobe-utopia-type1-1.0.1.tar.gz
font-ibm-type1-1.0.0.tar.gz
font-bitstream-type1-1.0.0.tar.gz
font-bitstream-speedo-1.0.0.tar.gz
font-bh-ttf-1.0.0.tar.gz
font-bh-type1-1.0.0.tar.gz
font-bitstream-100dpi-1.0.0.tar.gz
font-bh-lucidatypewriter-100dpi-1.0.0.tar.gz
font-bh-100dpi-1.0.0.tar.gz
font-adobe-utopia-100dpi-1.0.1.tar.gz
font-adobe-100dpi-1.0.0.tar.gz
font-util-1.0.1.tar.gz
font-bitstream-75dpi-1.0.0.tar.gz
font-bh-lucidatypewriter-75dpi-1.0.0.tar.gz
font-adobe-utopia-75dpi-1.0.1.tar.gz
font-bh-75dpi-1.0.0.tar.gz
bdftopcf-1.0.1.tar.gz
font-adobe-75dpi-1.0.0.tar.gz
mkfontscale-1.0.6.tar.gz
openssl-0.9.8k.tar.gz
bigreqsproto-1.0.2.tar.gz
xtrans-1.2.2.tar.gz
resourceproto-1.0.2.tar.gz
inputproto-1.4.4.tar.gz
compositeproto-0.4.tar.gz
damageproto-1.1.0.tar.gz
zlib-1.2.3.tar.gz
xkbcomp-1.0.5.tar.gz
freetype-2.3.9.tar.gz
pthreads-w32-2-8-0-release.tar.gz
pixman-0.12.0.tar.gz
kbproto-1.0.3.tar.gz
evieext-1.0.2.tar.gz
fixesproto-4.0.tar.gz
recordproto-1.13.2.tar.gz
randrproto-1.2.2.tar.gz
scrnsaverproto-1.1.0.tar.gz
renderproto-0.9.3.tar.gz
xcmiscproto-1.1.2.tar.gz
fontsproto-2.0.2.tar.gz
xextproto-7.0.3.tar.gz
xproto-7.0.14.tar.gz
libXdmcp-1.0.2.tar.gz
libxkbfile-1.0.5.tar.gz
libfontenc-1.0.4.tar.gz
libXfont-1.3.4.tar.gz
libX11-1.1.5.tar.gz
libXau-1.0.4.tar.gz
libxcb-1.1.tar.gz
xorg-server-1.5.3.tar.gz
Diffstat (limited to 'openssl/crypto/des/asm/readme')
-rw-r--r-- | openssl/crypto/des/asm/readme | 131 |
1 files changed, 131 insertions, 0 deletions
diff --git a/openssl/crypto/des/asm/readme b/openssl/crypto/des/asm/readme new file mode 100644 index 000000000..1beafe253 --- /dev/null +++ b/openssl/crypto/des/asm/readme @@ -0,0 +1,131 @@ +First up, let me say I don't like writing in assembler. It is not portable, +dependant on the particular CPU architecture release and is generally a pig +to debug and get right. Having said that, the x86 architecture is probably +the most important for speed due to number of boxes and since +it appears to be the worst architecture to to get +good C compilers for. So due to this, I have lowered myself to do +assembler for the inner DES routines in libdes :-). + +The file to implement in assembler is des_enc.c. Replace the following +4 functions +des_encrypt1(DES_LONG data[2],des_key_schedule ks, int encrypt); +des_encrypt2(DES_LONG data[2],des_key_schedule ks, int encrypt); +des_encrypt3(DES_LONG data[2],des_key_schedule ks1,ks2,ks3); +des_decrypt3(DES_LONG data[2],des_key_schedule ks1,ks2,ks3); + +They encrypt/decrypt the 64 bits held in 'data' using +the 'ks' key schedules. The only difference between the 4 functions is that +des_encrypt2() does not perform IP() or FP() on the data (this is an +optimization for when doing triple DES and des_encrypt3() and des_decrypt3() +perform triple des. The triple DES routines are in here because it does +make a big difference to have them located near the des_encrypt2 function +at link time.. + +Now as we all know, there are lots of different operating systems running on +x86 boxes, and unfortunately they normally try to make sure their assembler +formating is not the same as the other peoples. +The 4 main formats I know of are +Microsoft Windows 95/Windows NT +Elf Includes Linux and FreeBSD(?). +a.out The older Linux. +Solaris Same as Elf but different comments :-(. + +Now I was not overly keen to write 4 different copies of the same code, +so I wrote a few perl routines to output the correct assembler, given +a target assembler type. This code is ugly and is just a hack. +The libraries are x86unix.pl and x86ms.pl. +des586.pl, des686.pl and des-som[23].pl are the programs to actually +generate the assembler. + +So to generate elf assembler +perl des-som3.pl elf >dx86-elf.s +For Windows 95/NT +perl des-som2.pl win32 >win32.asm + +[ update 4 Jan 1996 ] +I have added another way to do things. +perl des-som3.pl cpp >dx86-cpp.s +generates a file that will be included by dx86unix.cpp when it is compiled. +To build for elf, a.out, solaris, bsdi etc, +cc -E -DELF asm/dx86unix.cpp | as -o asm/dx86-elf.o +cc -E -DSOL asm/dx86unix.cpp | as -o asm/dx86-sol.o +cc -E -DOUT asm/dx86unix.cpp | as -o asm/dx86-out.o +cc -E -DBSDI asm/dx86unix.cpp | as -o asm/dx86bsdi.o +This was done to cut down the number of files in the distribution. + +Now the ugly part. I acquired my copy of Intels +"Optimization's For Intel's 32-Bit Processors" and found a few interesting +things. First, the aim of the exersize is to 'extract' one byte at a time +from a word and do an array lookup. This involves getting the byte from +the 4 locations in the word and moving it to a new word and doing the lookup. +The most obvious way to do this is +xor eax, eax # clear word +movb al, cl # get low byte +xor edi DWORD PTR 0x100+des_SP[eax] # xor in word +movb al, ch # get next byte +xor edi DWORD PTR 0x300+des_SP[eax] # xor in word +shr ecx 16 +which seems ok. For the pentium, this system appears to be the best. +One has to do instruction interleaving to keep both functional units +operating, but it is basically very efficient. + +Now the crunch. When a full register is used after a partial write, eg. +mov al, cl +xor edi, DWORD PTR 0x100+des_SP[eax] +386 - 1 cycle stall +486 - 1 cycle stall +586 - 0 cycle stall +686 - at least 7 cycle stall (page 22 of the above mentioned document). + +So the technique that produces the best results on a pentium, according to +the documentation, will produce hideous results on a pentium pro. + +To get around this, des686.pl will generate code that is not as fast on +a pentium, should be very good on a pentium pro. +mov eax, ecx # copy word +shr ecx, 8 # line up next byte +and eax, 0fch # mask byte +xor edi DWORD PTR 0x100+des_SP[eax] # xor in array lookup +mov eax, ecx # get word +shr ecx 8 # line up next byte +and eax, 0fch # mask byte +xor edi DWORD PTR 0x300+des_SP[eax] # xor in array lookup + +Due to the execution units in the pentium, this actually works quite well. +For a pentium pro it should be very good. This is the type of output +Visual C++ generates. + +There is a third option. instead of using +mov al, ch +which is bad on the pentium pro, one may be able to use +movzx eax, ch +which may not incur the partial write penalty. On the pentium, +this instruction takes 4 cycles so is not worth using but on the +pentium pro it appears it may be worth while. I need access to one to +experiment :-). + +eric (20 Oct 1996) + +22 Nov 1996 - I have asked people to run the 2 different version on pentium +pros and it appears that the intel documentation is wrong. The +mov al,bh is still faster on a pentium pro, so just use the des586.pl +install des686.pl + +3 Dec 1996 - I added des_encrypt3/des_decrypt3 because I have moved these +functions into des_enc.c because it does make a massive performance +difference on some boxes to have the functions code located close to +the des_encrypt2() function. + +9 Jan 1997 - des-som2.pl is now the correct perl script to use for +pentiums. It contains an inner loop from +Svend Olaf Mikkelsen <svolaf@inet.uni-c.dk> which does raw ecb DES calls at +273,000 per second. He had a previous version at 250,000 and the best +I was able to get was 203,000. The content has not changed, this is all +due to instruction sequencing (and actual instructions choice) which is able +to keep both functional units of the pentium going. +We may have lost the ugly register usage restrictions when x86 went 32 bit +but for the pentium it has been replaced by evil instruction ordering tricks. + +13 Jan 1997 - des-som3.pl, more optimizations from Svend Olaf. +raw DES at 281,000 per second on a pentium 100. + |