The Block Cipher Square

Joan Daemen (1), Lars Knudsen (2), Vincent Rijmen (2,*)

(1) Banksys, Haachtesteenweg 1442, B-1130 Brussel, Belgium
Daemen.J@banksys.be
(2) Katholieke Universiteit Leuven, ESAT-COSIC, K. Mercierlaan 94, B-3001 Heverlee, Belgium
lars.knudsen@esat.kuleuven.ac.be
vincent.rijmen@esat.kuleuven.ac.be
Abstract. In this paper we present a new 128-bit block cipher called Square. The original design of Square concentrates on the resistance against differential and linear cryptanalysis. However, after the initial design a dedicated attack was mounted that forced us to augment the number of rounds. The goal of this paper is the publication of the resulting cipher for public scrutiny. A C implementation of Square is available that runs at 2.63 MByte/s on a 100 MHz Pentium. Our M68HC05 Smart Card implementation fits in 547 bytes and takes less than 2 msec (4 MHz clock). The high degree of parallelism allows hardware implementations in the Gbit/s range today.
1 Introduction

In this paper, we propose the block cipher Square. It has a block length and key length of 128 bits. However, its modular design approach allows extensions to higher block lengths in a straightforward way. The cipher has a new self-reciprocal structure, similar to that of Threeway and SHARK [2,15].

The structure of the cipher, i.e., the types of building blocks and their interaction, has been carefully chosen to allow very efficient implementations on a wide range of processors. The specific choice of the building blocks themselves has been led by the resistance of the cipher against differential and linear cryptanalysis. After treating the structure of the cipher and its consequences for implementations, we explain the strategies followed to thwart linear and differential cryptanalysis. This is followed by a description of an efficient attack that exploits the particular properties of the cipher structure.

We do not encourage anyone to use Square today in any sensitive application. Clearly, confidence in the security of any cryptographic design must be based on the resistance against effective cryptanalysis after intense public scrutiny.
(*) F.W.O. research assistant, sponsored by the Fund for Scientific Research - Flanders (Belgium).

A reference implementation of Square is available from the following URL: http://www.esat.kuleuven.ac.be/rijmen/square.
2 Structure of Square
Square is an iterated block cipher with a block length and a key length of 128 bits each. The round transformation of Square is composed of four distinct transformations. It is however important to note that these four building blocks can be efficiently combined in a single set of table-lookups and exor operations. This will be treated later in the section on implementation aspects.

The basic building blocks of the cipher are five different invertible transformations that operate on a 4×4 array of bytes. The element of a state $a$ in row $i$ and column $j$ is specified as $a_{i,j}$. Both indexes start from 0.
2.1 A Linear Transformation $\theta$

$\theta$ is a linear operation that operates separately on each of the four rows of a state. We have

$$\theta : b = \theta(a) \Leftrightarrow b_{i,j} = c_j a_{i,0} \oplus c_{j-1} a_{i,1} \oplus c_{j-2} a_{i,2} \oplus c_{j-3} a_{i,3},$$

where the multiplication is in GF($2^8$) and the indices of $c$ must be taken modulo 4. Note that the field GF($2^n$) has characteristic 2 [9]. This means that the addition in the field corresponds to the bitwise exor.

The rows of a state can be denoted by polynomials, i.e., the polynomial corresponding to row $i$ of a state $a$ is given by

$$a_i(x) = a_{i,0} \oplus a_{i,1} x \oplus a_{i,2} x^2 \oplus a_{i,3} x^3.$$

Using this notation, and defining $c(x) = \bigoplus_j c_j x^j$, we can describe $\theta$ as a modular polynomial multiplication:

$$\theta : b = \theta(a) \Leftrightarrow b_i(x) = c(x)\, a_i(x) \bmod (1 \oplus x^4) \quad \text{for } 0 \le i < 4.$$

The inverse of $\theta$ corresponds to a polynomial $d(x)$ given by

$$d(x)\, c(x) = 1 \pmod{1 \oplus x^4}.$$

2.2 A Nonlinear Transformation $\gamma$
$\gamma$ is a nonlinear byte substitution, identical for all bytes. We have

$$\gamma : b = \gamma(a) \Leftrightarrow b_{i,j} = S(a_{i,j}),$$

with $S$ an invertible 8-bit substitution table or S-box. The inverse of $\gamma$ consists of the application of the inverse substitution $S^{-1}$.
2.3 A Byte Permutation $\pi$

The effect of $\pi$ is the interchanging of rows and columns of a state. We have

$$\pi : b = \pi(a) \Leftrightarrow b_{i,j} = a_{j,i}.$$

$\pi$ is an involution, hence $\pi^{-1} = \pi$.
2.4 Bitwise Round Key Addition $\sigma$

$\sigma[k^t]$ consists of the bitwise addition of a round key $k^t$. We have

$$\sigma[k^t] : b = \sigma[k^t](a) \Leftrightarrow b = a \oplus k^t.$$

The inverse of $\sigma[k^t]$ is $\sigma[k^t]$ itself.
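The three key-independent or key-linear building blocks defined so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state is stored as a list of four rows, the reduction polynomial `0x11B` is a stand-in (this section does not fix the field polynomial), and the coefficient lists `C` and `D` are taken from Section 4. The function names are hypothetical.

```python
# Sketch of theta, pi and sigma on a 4x4 byte state (a list of four rows).
# Assumption: GF(2^8) is reduced with 0x11B; the text does not fix the
# reduction polynomial here, so this value is only a stand-in.
POLY = 0x11B
C = [0x02, 0x01, 0x01, 0x03]  # coefficients c_0..c_3 of c(x), see Section 4
D = [0x0E, 0x09, 0x0D, 0x0B]  # coefficients d_0..d_3 of d(x), see Section 4


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo POLY (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def theta(state, coeffs=C):
    """Row-wise map: b[i][j] = XOR over k of coeffs[(j - k) mod 4] * a[i][k]."""
    out = []
    for row in state:
        new_row = []
        for j in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf_mul(coeffs[(j - k) % 4], row[k])
            new_row.append(acc)
        out.append(new_row)
    return out


def pi(state):
    """Transposition: b[i][j] = a[j][i]."""
    return [[state[j][i] for j in range(4)] for i in range(4)]


def sigma(state, key):
    """Bitwise round key addition: b = a XOR k."""
    return [[state[i][j] ^ key[i][j] for j in range(4)] for i in range(4)]


# Arbitrary example state and key, used only to exercise the functions.
a = [[(17 * (4 * i + j) + 3) % 256 for j in range(4)] for i in range(4)]
k = [[(29 * (4 * i + j) + 7) % 256 for j in range(4)] for i in range(4)]
print(theta(theta(a), D) == a, pi(pi(a)) == a, sigma(sigma(a, k), k) == a)
```

Passing `D` as the coefficient list turns `theta` into its inverse, which mirrors the statement that the inverse of $\theta$ is multiplication by $d(x)$.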
2.5 The Round Key Evolution $\psi$

The round keys $k^t$ are derived from the cipher key $K$ in the following way. $k^0$ equals the cipher key $K$. The other round keys are derived iteratively by means of the invertible affine transformation $\psi$:

$$\psi : k^t = \psi(k^{t-1}).$$
2.6 The Cipher Square
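The building blocks are composed into the round transformation denoted by $\rho[k^t]$:

$$\rho[k^t] = \sigma[k^t] \circ \pi \circ \gamma \circ \theta \qquad (1)$$

Square is defined as eight rounds preceded by a key addition $\sigma[k^0]$ and by $\theta^{-1}$:

$$\mathrm{Square}[k] = \rho[k^8] \circ \rho[k^7] \circ \rho[k^6] \circ \rho[k^5] \circ \rho[k^4] \circ \rho[k^3] \circ \rho[k^2] \circ \rho[k^1] \circ \sigma[k^0] \circ \theta^{-1} \qquad (2)$$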
2.7 The Inverse Cipher

As will be shown in Section 9, the structure of Square lends itself to efficient implementations. For a number of modes of operation it is important that this is also the case for the inverse cipher. Therefore, Square has been designed in such a way that the structure of its inverse is equal to that of the cipher itself, with the exception of the key schedule. Note that this identity in structure differs from the identity of components and structure in IDEA [10].

From (2) it can be seen that

$$\mathrm{Square}^{-1}[k] = \theta \circ \sigma^{-1}[k^0] \circ \rho^{-1}[k^1] \circ \rho^{-1}[k^2] \circ \rho^{-1}[k^3] \circ \rho^{-1}[k^4] \circ \rho^{-1}[k^5] \circ \rho^{-1}[k^6] \circ \rho^{-1}[k^7] \circ \rho^{-1}[k^8]$$

with

$$\rho^{-1}[k^t] = \theta^{-1} \circ \gamma^{-1} \circ \pi^{-1} \circ \sigma^{-1}[k^t] = \theta^{-1} \circ \gamma^{-1} \circ \pi \circ \sigma[k^t] \qquad (3)$$

Fig. 1. Geometrical representation of the basic operations of Square. $\theta$ consists of 4 parallel linear diffusion mappings. $\gamma$ consists of 16 separate substitutions. $\pi$ is a transposition.
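It may seem that the structure of the inverse cipher differs substantially from that of the cipher itself. By exploiting some algebraic properties of the building blocks, we can show this not to be the case. Since $\pi$ only transposes the bytes $a_{i,j}$ and $\gamma$ only operates on the individual bytes, independent of their position $(i,j)$, we have

$$\pi \circ \gamma^{-1} = \gamma^{-1} \circ \pi.$$

Moreover, since $\theta^{-1}(a) \oplus k^t = \theta^{-1}(a \oplus \theta(k^t))$, we have

$$\sigma[k^t] \circ \theta^{-1} = \theta^{-1} \circ \sigma[\theta(k^t)].$$

We now define the round transformation of the inverse cipher as

$$\rho'[k^t] = \sigma[k^t] \circ \pi \circ \gamma^{-1} \circ \theta^{-1}, \qquad (4)$$

which has the same structure as $\rho$ itself, except that $\gamma$ and $\theta$ are replaced by $\gamma^{-1}$ and $\theta^{-1}$ respectively. Using the algebraic properties above, we can derive

$$\begin{aligned}
\theta \circ \sigma[k^0] \circ \rho^{-1}[k^1]
 &= \theta \circ \sigma[k^0] \circ \theta^{-1} \circ \gamma^{-1} \circ \pi \circ \sigma[k^1] \\
 &= \theta \circ \theta^{-1} \circ \sigma[\theta(k^0)] \circ \gamma^{-1} \circ \pi \circ \sigma[k^1] \\
 &= \sigma[\theta(k^0)] \circ \gamma^{-1} \circ \pi \circ \sigma[k^1] \\
 &= \sigma[\theta(k^0)] \circ \pi \circ \gamma^{-1} \circ \sigma[k^1] \circ \theta^{-1} \circ \theta \\
 &= \sigma[\theta(k^0)] \circ \pi \circ \gamma^{-1} \circ \theta^{-1} \circ \sigma[\theta(k^1)] \circ \theta \\
 &= \rho'[\theta(k^0)] \circ \sigma[\theta(k^1)] \circ \theta
\end{aligned}$$

This equation can be generalized in a straightforward way to include more than one round. Now, with $\kappa^t = \theta(k^{8-t})$, we have

$$\mathrm{Square}^{-1} = \rho'[\kappa^8] \circ \rho'[\kappa^7] \circ \rho'[\kappa^6] \circ \rho'[\kappa^5] \circ \rho'[\kappa^4] \circ \rho'[\kappa^3] \circ \rho'[\kappa^2] \circ \rho'[\kappa^1] \circ \sigma[\kappa^0] \circ \theta$$

Hence the inverse cipher is equal to the cipher itself with $\gamma$ replaced by $\gamma^{-1}$, with $\theta$ by $\theta^{-1}$ and different round key values.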
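2.8 First Round

The $\theta^{-1}$ before $\sigma[k^0]$ in Square can be incorporated in the first round. We have

$$\rho[k^1] \circ \sigma[k^0] \circ \theta^{-1} = \sigma[k^1] \circ \pi \circ \gamma \circ \theta \circ \sigma[k^0] \circ \theta^{-1} = \sigma[k^1] \circ \pi \circ \gamma \circ \sigma[\theta(k^0)].$$

Hence the initial $\theta^{-1}$ can be discarded by omitting $\theta$ in the first round and applying $\theta(k^0)$ instead of $k^0$. The same simplification can be applied to the inverse cipher.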
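The structural identities of Sections 2.6 to 2.8 can be checked numerically. The sketch below is not the Square reference implementation: it assumes a stand-in reduction polynomial `0x11B`, a randomly shuffled S-box in place of the real one, and arbitrary round keys instead of the key schedule. It builds the cipher from equation (2), the inverse from the $\rho'$ structure with $\kappa^t = \theta(k^{8-t})$, and the simplified first round.

```python
import random

POLY = 0x11B                  # assumed reduction polynomial (stand-in)
C = [0x02, 0x01, 0x01, 0x03]  # c(x), Section 4
D = [0x0E, 0x09, 0x0D, 0x0B]  # d(x), Section 4

rng = random.Random(1)
SBOX = list(range(256))
rng.shuffle(SBOX)             # hypothetical S-box, NOT the Square S-box
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def theta(s, coeffs=C):
    return [[gf_mul(coeffs[j % 4], r[0]) ^ gf_mul(coeffs[(j - 1) % 4], r[1])
             ^ gf_mul(coeffs[(j - 2) % 4], r[2]) ^ gf_mul(coeffs[(j - 3) % 4], r[3])
             for j in range(4)] for r in s]


def gamma(s, box=SBOX):
    return [[box[v] for v in row] for row in s]


def pi(s):
    return [list(col) for col in zip(*s)]


def sigma(s, k):
    return [[v ^ w for v, w in zip(r, kr)] for r, kr in zip(s, k)]


def rho(s, k):
    """Equation (1): sigma[k] o pi o gamma o theta."""
    return sigma(pi(gamma(theta(s))), k)


def rho_prime(s, k):
    """Equation (4): sigma[k] o pi o gamma^-1 o theta^-1."""
    return sigma(pi(gamma(theta(s, D), INV_SBOX)), k)


def encrypt(p, keys):
    """Equation (2); keys = [k^0, ..., k^8]."""
    s = sigma(theta(p, D), keys[0])
    for t in range(1, 9):
        s = rho(s, keys[t])
    return s


def decrypt(c, keys):
    """Same structure as encrypt, with kappa^t = theta(k^(8-t))."""
    kappa = [theta(keys[8 - t]) for t in range(9)]
    s = sigma(theta(c), kappa[0])
    for t in range(1, 9):
        s = rho_prime(s, kappa[t])
    return s


def rand_state():
    return [[rng.randrange(256) for _ in range(4)] for _ in range(4)]


keys = [rand_state() for _ in range(9)]
p = rand_state()
print(decrypt(encrypt(p, keys), keys) == p)

# Section 2.8: rho[k^1] o sigma[k^0] o theta^-1 == sigma[k^1] o pi o gamma o sigma[theta(k^0)]
lhs = rho(sigma(theta(p, D), keys[0]), keys[1])
rhs = sigma(pi(gamma(sigma(p, theta(keys[0])))), keys[1])
print(lhs == rhs)
```

Note that `decrypt` never calls an "inverse round": it reuses the forward control flow with the inverse tables and transformed keys, which is the point of the self-reciprocal structure.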
3 Linear and Differential Cryptanalysis

The resistance against linear cryptanalysis [12] and differential cryptanalysis [1] has been the rationale behind the criteria by which the S substitution and the multiplication polynomial $c(x)$ have been chosen.

A difference propagation along the rounds of an iterated block cipher is generally called a differential characteristic. A characteristic is specified by a series of difference patterns. The probability associated with a characteristic is the probability that all intermediate difference patterns have the value specified in the above series. We call a differential characteristic a differential trail. The probability associated with a differential trail can be approximated by the product of the difference propagations between every pair of subsequent rounds (which can be easily calculated). The probability that a given difference pattern $a'$ at the input of a number of cipher rounds gives rise to a difference pattern $b'$ at the output is equal to the sum of the probabilities of all differential trails starting with $a'$ and ending with $b'$. In general the propagation from the input difference pattern $a'$ to the output difference pattern $b'$ is called a differential.

As can be seen in [12] the correlation between a linear combination of input bits and a linear combination of output bits of an iterated block cipher can be treated in an analogous but slightly different way. A linear trail is specified by a series of selection patterns. For a given cipher key, the correlation coefficient (positive or negative) corresponding to a linear trail consists of the product of the correlation coefficients between the linear combinations of bits of every pair of subsequent rounds. In [2] it was shown that the correlation between a linear combination of input bits, denoted by selection pattern $u$, and a linear combination of output bits, denoted by $v$, is equal to the sum of the correlation coefficients of all linear trails starting with $u$ and ending in $v$. It must be remarked that the correlation coefficients may be positive or negative and that the sign depends on the value of round key bits.

$S$ and $c(x)$ are chosen to minimize the maximum probability of differential trails and the maximum correlation of linear trails over four rounds. This is obtained in the framework of a very specific approach.
3.1 Wide Trail Design Strategy

In [2] the 'wide trail design strategy' was introduced as a means to guarantee low maximum probability of multiple-round differential trails and low maximum correlation of multiple-round linear trails. In this strategy the round transformation is composed of a number of uniform transformations, that are split in the nonlinear blockwise substitution (corresponding to our $\gamma$) and the composition of the linear transformations (corresponding to our $\theta \circ \pi$). The round key addition does not play a role in the strategy. It was shown in [2] that the probability of a differential trail is the product of the input-output difference propagation probabilities of the S-boxes with nonzero input difference ('active S-boxes'). The correlation of a linear trail is the product of the input-output correlations of the S-boxes with nonzero output selection patterns ('active S-boxes'). The two mechanisms for eliminating high-probability differential trails and high-correlation linear trails are the following:

- Choose an S-box where the maximum difference propagation probability and the maximum input-output correlation are as small as possible.
- Choose the linear part in such a way that there are no trails with few active S-boxes.

The first mechanism gives us two clear criteria for the selection of $S$. The second mechanism gives a hint on how to select the multiplication polynomial $c(x)$. In the following section we will focus on the linear part $\theta$.
4 The Multiplication Polynomial c(x)
The transformation $\theta$ treats the different rows of a state $a$ completely separately and in the same way. We will now study the difference propagation and correlation properties of $\theta$, concentrating on a single row. Assume an input difference specified by $a'(x) = a(x) \oplus a^*(x)$. The output difference will be given by

$$b'(x) = c(x)\,a(x) \oplus c(x)\,a^*(x) \bmod (1 \oplus x^4) = c(x)\,a'(x) \bmod (1 \oplus x^4).$$

On the other hand, a linear combination of output bits, specified by the selection polynomial $u(x)$, is equal to (i.e., correlated to, with correlation coefficient 1) a linear combination of input bits, specified by the following selection polynomial [2]:

$$v(x) = c(x^{-1})\,u(x) \bmod (1 \oplus x^4).$$

It is intuitively clear that both linear and differential trails would benefit from a multiplication polynomial that could limit the number of nonzero terms in input and output difference (and selection) polynomials. This is exactly what we want to avoid by choosing a polynomial with a high diffusion power, expressed by the so-called branch number.
Let $w_h(a)$ denote the Hamming weight of a vector, i.e., the number of nonzero components in that vector. Applied to a state $a$, a difference pattern $a'$ or a selection pattern $u$, this corresponds to the number of non-zero bytes. In [2] the branch number $\mathcal{B}$ of an invertible linear mapping was introduced as

$$\mathcal{B}(\theta) = \min_{a \neq 0} \bigl( w_h(a) + w_h(\theta(a)) \bigr).$$

This implies that the sum of the Hamming weights of a pair of input and output difference patterns (or selection patterns) to $\theta$ is at least $\mathcal{B}$. It can easily be shown that $\mathcal{B}$ is a lower bound for the number of active S-boxes in two consecutive rounds of a linear or differential trail. Since $\theta$ operates on each row separately, we can have $\mathcal{B} = 5$ at most.
In [15] it was shown how a linear mapping over GF($2^m$)$^n$ with optimal $\mathcal{B}$ ($\mathcal{B} = n + 1$) can be constructed from an MDS-code. The MDS-code used is a Reed-Solomon code over GF($2^m$): if $G_e = [I_{n \times n}\; B_{n \times n}]$ is the echelon form of the generator matrix of a $(2n, n, n+1)$-RS-code, then

$$\theta : X \mapsto Y = B \cdot X$$

defines a linear mapping with optimal branch number. The polynomial multiplication with $c(x)$ corresponds to a special subset of the MDS-codes, having the additional property that $B$ is a circulant matrix. A circulant matrix is a matrix where every row consists of the same elements, shifted over one position, or $b_{i,j} = b_{0,\,(j-i) \bmod n}$. This property is exploited in Section 9.2 to produce a memory-efficient implementation of the cipher. In [11] we find the following theorem:

Theorem 1. An $(n, k, d)$-code $C$ with generator matrix $G = [I\; B]$ is MDS iff every square submatrix of $B$ is nonsingular.
In a matrix with elements from GF($2^m$) every determinant has a probability of $2^{-m}$ to evaluate to zero. For increasing size of the matrix the number of determinants increases exponentially, making it infeasible to search randomly for an MDS-code. However, in a circulant matrix the number of distinct determinants is only a fraction of the number for arbitrary matrices (cf. Table 1). By imposing the extra constraint that the matrix should be a circulant, we increase the probability to find an MDS-code by random search.
  n   generic   circulant  |  n   generic   circulant
  1        1        1      |  5      252        41
  2        5        3      |  6      924       111
  3       20        7      |  7     3431       309
  4       70       17      |  8    12869       935

Table 1. The number of square submatrices in a generic matrix of order n, and the number of non-equivalent determinants in a circulant matrix of the same order. The numbers of the last column were obtained by an exhaustive computer search.
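$c(x)$ corresponds to a 4×4 matrix, hence if we choose it randomly, the probability that it has $\mathcal{B} = 5$ can be approximated by $(1 - \frac{1}{256})^{17} \approx 0.93$. This gives us a high degree of freedom in the choice of $c(x)$. We choose (the subscript $x$ denotes hexadecimal notation)

$$c(x) = 2_x \oplus 1_x \cdot x \oplus 1_x \cdot x^2 \oplus 3_x \cdot x^3.$$

This determines $d(x)$ uniquely:

$$d(x) = E_x \oplus 9_x \cdot x \oplus D_x \cdot x^2 \oplus B_x \cdot x^3.$$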
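Both claims about $c(x)$ can be checked mechanically: that $d(x)\,c(x) = 1 \bmod (1 \oplus x^4)$, and, via Theorem 1, that the circulant matrix of $c(x)$ has only nonsingular square submatrices. The sketch below assumes the stand-in reduction polynomial `0x11B` (the MDS property depends on the field representation, which this section does not fix); the helper names are hypothetical.

```python
from itertools import combinations

POLY = 0x11B                  # assumed reduction polynomial (stand-in)
C = [0x02, 0x01, 0x01, 0x03]  # c_0..c_3
D = [0x0E, 0x09, 0x0D, 0x0B]  # d_0..d_3


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def poly_mul_mod(p, q):
    """Product of two degree-3 polynomials over GF(2^8), modulo 1 + x^4."""
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            out[(i + j) % 4] ^= gf_mul(p[i], q[j])
    return out


def det(m):
    """Determinant over GF(2^8) by Laplace expansion (no signs in characteristic 2)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col in range(len(m)):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total ^= gf_mul(m[0][col], det(minor))
    return total


def all_square_submatrices_nonsingular(m):
    """Criterion of Theorem 1 applied to the matrix B = m."""
    n = len(m)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                sub = [[m[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    return False
    return True


# Circulant matrix of theta: entry (j, k) is c_{(j - k) mod 4}.
M = [[C[(j - k) % 4] for k in range(4)] for j in range(4)]

print(poly_mul_mod(C, D))                      # coefficients of d(x) c(x) mod 1 + x^4
print(all_square_submatrices_nonsingular(M))   # MDS criterion
```

The product $d(x)\,c(x)$ needs no field reduction at all for these coefficients, so the inverse relation holds for any choice of irreducible polynomial; only the MDS check depends on the assumed field.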
4.1 Motivation for the Choice of $\pi$

Since the branch number of $c(x)$ is 5, the number of active S-boxes in a two-round trail is at least 5. The interchanging of rows and columns by $\pi$ has the effect that any trail over four consecutive rounds will have at least 25 active S-boxes. A simple and clear proof of this is available and will be published in a more theoretical paper that is being written [3].
5 The Nonlinear Substitution $\gamma$

As explained above, the relevant criteria imposed upon the S-box are the highest (in absolute value) occurring correlation between any pair of a linear combination of input bits and a linear combination of output bits (denoted by $\lambda$) and the highest occurring probability corresponding to any pair of input difference and output difference pattern. The latter corresponds to the highest value in the so-called exor table of the S-box, defined as

$$E_{ij} = \#\{x \mid S(x) \oplus S(x \oplus i) = j\}.$$

We define $\delta = \max_{i \neq 0,\, j} \{E_{ij}\} \cdot 2^{-8}$, where the trivial input difference $i = 0$ is excluded.
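The two criteria can be evaluated directly for any 8-bit S-box. The sketch below computes the largest exor-table entry and the largest absolute correlation (using a Walsh-Hadamard transform per output mask) and applies them to the multiplicative inverse mapping discussed in Section 5.1. The reduction polynomial `0x11B` is an assumption made for illustration, and the function names are hypothetical.

```python
POLY = 0x11B  # assumed reduction polynomial (stand-in)


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def gf_inv(a):
    """a^254, which equals a^-1 for a != 0 and maps 0 to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def max_exor_entry(sbox):
    """Largest E_ij over all nonzero input differences i."""
    best = 0
    for i in range(1, 256):
        counts = [0] * 256
        for x in range(256):
            counts[sbox[x] ^ sbox[x ^ i]] += 1
        best = max(best, max(counts))
    return best


def max_correlation(sbox):
    """Largest |correlation| between u.x and v.S(x) over all u and nonzero v."""
    best = 0
    for v in range(1, 256):
        # f[x] = (-1)^(v . S(x)); its Walsh-Hadamard transform gives,
        # for every input mask u, the sum of (-1)^(u.x + v.S(x)) over x.
        f = [1 - 2 * (bin(v & sbox[x]).count("1") & 1) for x in range(256)]
        step = 1
        while step < 256:
            for start in range(0, 256, 2 * step):
                for i in range(start, start + step):
                    lo, hi = f[i], f[i + step]
                    f[i], f[i + step] = lo + hi, lo - hi
            step *= 2
        best = max(best, max(abs(w) for w in f))
    return best / 256


inverse_map = [gf_inv(x) for x in range(256)]
delta = max_exor_entry(inverse_map) / 256
lam = max_correlation(inverse_map)
print(delta, lam)
```

For the inverse mapping this should reproduce the values $\delta = 2^{-6}$ and $\lambda = 2^{-3}$ quoted in Section 5.1; both quantities are unchanged by an affine transformation of the output bits.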
We present three alternative choices for the S-box: explicitly constructed nonlinear algebraic transformations, slightly modified versions of the latter and randomly selected invertible mappings.

5.1 Explicit Construction

In [13] a method is given to construct $m$-bit S-boxes with $\lambda = 2^{1-m/2}$ and $\delta = 2^{2-m}$, the theoretically minimum possible values. From the proposals in [13] we select the mapping $x \mapsto x^{-1}$ over GF($2^8$), with $\delta = 2^{-6}$ and $\lambda = 2^{-3}$.

The problem with this choice is that the mapping has a very simple description in GF($2^8$). The other components of the round transformation also have a simple description in GF($2^8$). This may enable cryptanalytic attacks based on the algebraic manipulation of equations to derive key information [4].

Note that any $m$-bit mapping can be represented as a polynomial or a rational form in GF($2^m$). It is however unlikely that this representation can be exploited in cryptanalysis if the polynomial or rational form is of no special, relatively simple, form.

The feasibility of algebraic manipulation can be severely diminished. The elements of GF($2^8$) can be represented with respect to different bases. By choosing a different basis for the definition of $\gamma$ and $\theta$ we can prevent that the round transformation has a simple description in any basis of GF($2^8$).

Still, even specified in another basis, the chosen nonlinear mapping stays an involution and has two fixed points: 0 and 1. By applying an affine transformation on the individual bits of the output these properties can be removed and a simple algebraic expression of the round transformation in any basis of GF($2^8$) can be prevented.
5.2 Modifications

Another method to prevent a simple algebraic description is by choosing a mapping according to the method explained in the previous subsection and subsequently modifying it slightly to destroy the exploitable algebraic structure. It will be seen that the disadvantage of this approach is that $\delta$ and/or $\lambda$ will increase. We conducted some experiments starting from the mapping multiplicative inverse in GF($2^8$) as proposed above ($\delta = 2^{-6}$ and $\lambda = 8 \cdot 2^{-6}$) and we applied a small number of modifications.

When we consider the mapping as a look-up table and investigate all variants that have a pair of entries swapped, an increase is observed of $\delta$ to $6 \cdot 2^{-8}$ and/or $\lambda$ to $9 \cdot 2^{-6}$. We also tested 300000 variants in which four or eight entries were swapped. Swapping four entries increases $\lambda$ to $9 \cdot 2^{-6}$, swapping eight entries increases $\lambda$ to $10 \cdot 2^{-6}$ and $\delta$ to $6 \cdot 2^{-8}$.

5.3 Random Search
Since algebraically constructed permutations always exhibit some structure that may be exploited in attacks in unanticipated ways, designers often resort to random substitutions: a substitution is selected from a set of substitutions that are generated by the use of a random source and evaluated with respect to (presumably) relevant nonlinearity criteria. In [14] the average differential properties of permutations are investigated and a bound for the expected value of $\delta$ is given. For an $m$-bit permutation

$$\lim_{m \to \infty} \frac{E[\delta\, 2^m]}{2m} \le 1.$$

We verified this experimentally for 1.5 million samples with $m = 8$ and measured at the same time $\delta$ and $\lambda$. The results are given in Table 2. The S-boxes with the highest resistance against both linear and differential cryptanalysis have $\delta = 10 \cdot 2^{-8}$ and $\lambda = 15 \cdot 2^{-6}$.

  λ \ δ      8·2^-8   10·2^-8   12·2^-8   14·2^-8   16·2^-8   18·2^-8   20·2^-8
  15·2^-6    0        0.07      0.07      0.006     0.0001    0         0
  16·2^-6    0.0003   4.77      5.58      0.58      0.04      0.002     0
  17·2^-6    0.002    15.63     20.55     2.24      0.15      0.007     0.0004
  18·2^-6    0.0002   12.21     17.17     1.96      0.13      0.007     0.0005
  19·2^-6    0.0004   4.91      7.31      0.87      0.05      0.003     0
  20·2^-6    0        1.52      2.34      0.28      0.02      0.001     0
  21·2^-6    0        0.41      0.64      0.08      0.004     0.001     0

Table 2. Maximum input-output correlation and difference propagation probability of randomly generated nonlinear permutations. The entries denote the percentage of the generated mappings that have the indicated λ (rows) and δ (columns).
5.4 Our Choice

Because of its optimal values for $\lambda$ and $\delta$, we have decided to take for $S$ an S-box that is constructed by taking the mapping $x \mapsto x^{-1}$ and applying an affine transformation (over GF(2)) to the output bits. This affine transformation has the property that it has a complicated description in GF($2^8$) to thwart interpolation attacks [4].

Our choices force all four-round differential trails to have an associated probability not higher than $2^{-150}$, far below the critical noise value of $2^{-127}$. Equivalently, four-round linear trails have an associated correlation not over $2^{-75}$, far below the critical noise value of $2^{-64}$. Hence, for resistance against conventional LC and DC six rounds may seem sufficient. However, the specific blocked structure of the cipher allows for more efficient dedicated differential attacks. This will be explained in the following section.
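As a worked check of the two four-round bounds, assuming they are obtained by combining the minimum of 25 active S-boxes from Section 4.1 with the per-S-box values $\delta = 2^{-6}$ and $\lambda = 2^{-3}$ of the inverse mapping:

```latex
\begin{aligned}
\text{differential trail probability} &\le \delta^{25} = \left(2^{-6}\right)^{25} = 2^{-150} < 2^{-127}, \\
\text{linear trail correlation}       &\le \lambda^{25} = \left(2^{-3}\right)^{25} = 2^{-75} < 2^{-64}.
\end{aligned}
```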
6 A Dedicated Attack

In this section we describe a dedicated attack that exploits the cipher structure of Square. The attack is a chosen plaintext attack and is independent of the specific choices of $S$, $c(x)$ and the key schedule. It is faster than an exhaustive key search for Square versions of up to 6 rounds. After describing the basic attack on 4 rounds, we will show how it can be extended to 5 and 6 rounds.

6.1 Preliminaries

Let a $\Lambda$-set be a set of 256 states that are all different in some of the (16) state bytes (the active) and all equal in the other state bytes (the passive). Let $\lambda$ be the set of indices of the active bytes. We have

$$\forall x, y \in \Lambda : \begin{cases} x_{i,j} \neq y_{i,j} & \text{for } (i,j) \in \lambda \\ x_{i,j} = y_{i,j} & \text{for } (i,j) \notin \lambda \end{cases}$$

In this section we will make use of the geometrical interpretation as presented in Figure 1. Applying the transformations $\gamma$ and $\sigma[k^t]$ on (the elements of) a $\Lambda$-set results in a (generally different) $\Lambda$-set with the same $\lambda$. Applying $\pi$ results in a $\Lambda$-set in which the active bytes are transposed by $\pi$. Applying $\theta$ to a $\Lambda$-set does not necessarily result in a $\Lambda$-set. However, since every output byte of $\theta$ is a linear combination (with invertible coefficients) of the four input bytes in the same row, an input row with a single active byte gives rise to an output row with only active bytes.
6.2 Four Rounds

Consider a $\Lambda$-set in which only one byte is active. We will now trace the evolution of the positions of the active bytes through 3 rounds. The 1st round contains no $\theta$, hence there is still only one byte active at the beginning of the 2nd round. $\theta$ of the 2nd round converts this to a complete row of active bytes, that is subsequently transformed by $\pi$ to a complete column. $\theta$ of the 3rd round converts this to a $\Lambda$-set with only active bytes. This is still the case at the input to the 4th round.

Since the bytes of the outputs of the 3rd round (denoted by $a$) range over all possible values and are therefore balanced over the $\Lambda$-set, we have (indices taken modulo 4)

$$\bigoplus_{b = \theta(a),\, a \in \Lambda} b_{i,j} = \bigoplus_{a \in \Lambda} \bigoplus_k c_{j-k}\, a_{i,k} = \bigoplus_l c_l \bigoplus_{a \in \Lambda} a_{i,j-l} = \bigoplus_l c_l \cdot 0 = 0.$$

Hence, the bytes of the output of $\theta$ of the fourth round are balanced. This balancedness is in general destroyed by the subsequent application of $\gamma$.

An output byte of the 4th round (denoted by $a$ here) can be expressed as a function of the intermediate state $b$ above:

$$a_{i,j} = S[b_{j,i}] \oplus k^4_{i,j}.$$

By assuming a value for $k^4_{i,j}$, the value of $b_{j,i}$ for all elements of the $\Lambda$-set can be calculated from the ciphertexts. If the values of this byte are not balanced over $\Lambda$, the assumed value for the key byte was wrong. This is expected to eliminate all but approximately 1 key value. This can be repeated for the other bytes of $k^4$.

We implemented the attack and found that two $\Lambda$-sets of 256 chosen plaintexts each are sufficient to uniquely determine the cipher key with an overwhelming probability of success.
6.3 Extension by a Round at the End

If an additional round is added, we have to calculate the above value of $b_{j,i}$ from the output of the 5th round instead of the 4th round. This can be done by additionally assuming a value for a set of 4 bytes of the 5th round key. As in the case of the 4-round attack, wrong key assumptions are eliminated by verifying that $b_{j,i}$ is not balanced.

In this 5-round attack $2^{40}$ key values must be checked, and this must be repeated 4 times. Since checking a single $\Lambda$-set leaves only 1/256 of the wrong key assumptions as possible candidates, the cipher key can be found with overwhelming probability with only 5 $\Lambda$-sets.
6.4 Extension by a Round at the Beginning

The basic idea is to choose a set of plaintexts that results in a $\Lambda$-set at the output of the 2nd round with a single active S-box. This requires the assumption of values of four bytes of the round key $k^0$.

If the intermediate state after $\theta$ of the 2nd round has only a single active byte, this is also the case for the output of the 2nd round. This imposes the following conditions on a row of four input bytes of $\theta$ of the second round: one particular linear combination of these bytes must range over all 256 possible values (active) while 3 other particular linear combinations must be constant for all 256 states. This imposes identical conditions on the bytes in the same row in the input to $\sigma[k^1]$, and consequently on a column of bytes in the input to $\pi$ of the 1st round. If the corresponding column of bytes of $k^0$ is known, these conditions can be converted to conditions on four plaintext bytes.

Now we consider a set of $2^{32}$ plaintexts, such that the array of bytes in one column ranges over all possible values and all other bytes are constant.

Now, make an assumption for the value of the 4 bytes of the relevant column of $k^0$. Select from the set of $2^{32}$ available plaintexts a set of 256 plaintexts that obey the conditions indicated above. Now the 4-round attack can be performed. For the given key assumption, the attack can be repeated for several plaintext sets. If the byte values of $k^5$ suggested by these attacks are not consistent, the initial assumption must have been wrong. A correct assumption for the bytes of $k^0$ will result in the swift and consistent recovery of the last round key.

We implemented this attack where we assumed knowledge of 16 bits of the first-round key. The attack found the other 16 bits of the first-round key and 128 bits of the last-round key using only 2 structures of 256 plaintexts for every key value guessed in the first round.
6.5 Complexity of the Attacks

Combining both extensions results in a 6-round attack. Although infeasible with current technology, this attack is faster than exhaustive key search, and therefore relevant. We have not found extensions to 7 rounds faster than exhaustive key search.

We summarize the attacks in Table 3.

  Attack            #Plaintexts   Time    Memory
  4-round           2^9           2^9     small
  5-round type 1    2^11          2^40    small
  5-round type 2    2^32          2^40    2^32
  6-round           2^32          2^72    2^32

Table 3. Complexities of the attack on Square.
7 Number of Rounds

Due to these attacks we have to increase the number of rounds to at least seven. As a safety margin, we fixed the number of rounds to eight.

Conservative users are free to increase the number of rounds. This can be done in a straightforward way and requires no adaptation of the key schedule whatsoever.
8 The Key Evolution
The key schedule specifies the derivation of the round keys in terms of the cipher key. Its function is to provide resistance against the following types of attack:

- Attacks in which part of the cipher key is known to the cryptanalyst, e.g., if the cipher is used with a key shorter than 128 bits.
- Attacks where the key entry to the cipher is known or can be chosen, e.g., if the cipher is used as the compression function of a hash algorithm [7].
- Related-key attacks.

Resistance against the first type of attack can be improved by a key schedule in which the round key undergoes a transformation with high diffusion. For a good scheme, the knowledge of a certain number of bits of one round key fixes very few bits in other round keys. The other two types of attack exploit regularities in the structure of the key schedule by locally compensating round key differences [5,7].

The key schedule also plays an important role in the elimination of symmetry:

- Symmetry in the round transformation: the round transformation treats all bytes of a state in very much the same way. This symmetry can be removed by having round constants in the key schedule.
- Symmetry between the rounds: the round transformation is the same for all rounds. This equality can be removed by having round-dependent round constants in the key schedule.

The key schedule is defined in terms of the rows of the key. We can define a left byte-rotation operation $\mathrm{rotl}(a_i)$ on a row as

$$\mathrm{rotl}[a_{i,0}\; a_{i,1}\; a_{i,2}\; a_{i,3}] = [a_{i,1}\; a_{i,2}\; a_{i,3}\; a_{i,0}]$$

and a right byte rotation $\mathrm{rotr}(a_i)$ as its inverse.

The key schedule iteration transformation $k^{t+1} = \psi(k^t)$ and its inverse are defined by

$$\begin{aligned}
k_0^{t+1} &= k_0^t \oplus \mathrm{rotl}(k_3^t) \oplus C_t & \qquad \kappa_3^{t+1} &= \kappa_3^t \oplus \kappa_2^t \\
k_1^{t+1} &= k_1^t \oplus k_0^{t+1} & \kappa_2^{t+1} &= \kappa_2^t \oplus \kappa_1^t \\
k_2^{t+1} &= k_2^t \oplus k_1^{t+1} & \kappa_1^{t+1} &= \kappa_1^t \oplus \kappa_0^t \\
k_3^{t+1} &= k_3^t \oplus k_2^{t+1} & \kappa_0^{t+1} &= \kappa_0^t \oplus \mathrm{rotr}(\kappa_3^t) \oplus C'_t
\end{aligned}$$

The simplicity of the inverse key schedule is thanks to the fact that $\theta$ and $\psi$ commute. The round constants $C_t$ are also defined iteratively. We have $C_0 = 1_x$ and $C_t = 2_x \cdot C_{t-1}$.

This choice provides high diffusion and removes the regularities in an efficient way.
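The forward recurrence for $\psi$ translates almost line by line into code. The sketch below makes two assumptions that the text does not settle: the round constant $C_t$ is taken to affect only byte 0 of row 0, and doubling in GF($2^8$) uses the stand-in polynomial `0x11B` (irrelevant for $t \le 7$, where $C_t = 2^t$ needs no reduction). `psi_inverse` is simply the functional inverse of `psi`, included as a round-trip check; it is not the $\kappa$ recurrence given above. All function names are hypothetical.

```python
POLY = 0x11B  # assumed reduction polynomial; never triggered for t <= 7


def round_constant(t):
    """C_0 = 01, C_t = 02 * C_(t-1) in GF(2^8)."""
    c = 0x01
    for _ in range(t):
        c <<= 1
        if c & 0x100:
            c ^= POLY
    return c


def rotl(row):
    """[a0 a1 a2 a3] -> [a1 a2 a3 a0]."""
    return row[1:] + row[:1]


def xor_rows(r, s):
    return [a ^ b for a, b in zip(r, s)]


def psi(k, t):
    """Compute k^(t+1) from k^t, where k is a list of four 4-byte rows."""
    # Assumption: C_t is XORed into byte 0 of row 0 only.
    const = [round_constant(t), 0, 0, 0]
    r0 = xor_rows(xor_rows(k[0], rotl(k[3])), const)
    r1 = xor_rows(k[1], r0)
    r2 = xor_rows(k[2], r1)
    r3 = xor_rows(k[3], r2)
    return [r0, r1, r2, r3]


def psi_inverse(k_next, t):
    """Recover k^t from k^(t+1) by undoing psi row by row."""
    r3 = xor_rows(k_next[3], k_next[2])
    r2 = xor_rows(k_next[2], k_next[1])
    r1 = xor_rows(k_next[1], k_next[0])
    const = [round_constant(t), 0, 0, 0]
    r0 = xor_rows(xor_rows(k_next[0], rotl(r3)), const)
    return [r0, r1, r2, r3]


def expand(cipher_key, rounds=8):
    """Round keys k^0 .. k^rounds, with k^0 equal to the cipher key."""
    keys = [cipher_key]
    for t in range(rounds):
        keys.append(psi(keys[-1], t))
    return keys


K = [[4 * i + j for j in range(4)] for i in range(4)]  # arbitrary example key
round_keys = expand(K)
print(len(round_keys), psi_inverse(round_keys[1], 0) == K)
```

Because each new row depends on the row just computed, a change in a single key byte spreads to several rows within one iteration, which is the diffusion the text asks of the schedule.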
9 Implementation Aspects

9.1 8-bit Processor

On an 8-bit processor Square can be programmed by simply implementing the different component transformations. This is straightforward for $\pi$, $\sigma$ and $\psi$. The transformation $\gamma$ requires a table of 256 bytes. $\theta$ requires multiplication in the field GF($2^8$). However, the multiplication polynomial has been chosen to make this very efficient. We have written a program implementing Square in Assembler for Motorola's M68HC05 microprocessor, typical for Smart Cards. The machine code occupies in total 547 bytes of ROM, needs 36 bytes of RAM and one execution of Square, including the key schedule, takes about 7500 cycles. This corresponds to less than 2 msec with a 4 MHz clock.

The inverse cipher however is significantly slower than the forward cipher. This is caused by the difference in complexity between $\theta$ and $\theta^{-1}$.
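One way to see why the coefficients 1, 1, 2, 3 are cheap on a byte-oriented processor: since $3a = 2a \oplus a$, each output byte of $\theta$ needs a single doubling plus a few exors. The sketch below illustrates that idea; it is not the M68HC05 code, the reduction polynomial `0x11B` is an assumption, and the function names are hypothetical. It compares the shortcut against the generic definition of $\theta$ on one row.

```python
POLY = 0x11B                  # assumed reduction polynomial (stand-in)
C = [0x02, 0x01, 0x01, 0x03]  # c_0..c_3


def xtime(a):
    """Multiply a byte by 02 in GF(2^8): one shift and a conditional exor."""
    a <<= 1
    return a ^ POLY if a & 0x100 else a


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def theta_row_generic(row):
    """b_j = XOR over k of c_{(j - k) mod 4} * a_k, straight from Section 2.1."""
    out = []
    for j in range(4):
        acc = 0
        for k in range(4):
            acc ^= gf_mul(C[(j - k) % 4], row[k])
        out.append(acc)
    return out


def theta_row_fast(row):
    """Same map with one xtime per output byte.

    b_j = 2*a_j + a_{j-1} + a_{j-2} + 3*a_{j+1}
        = xtime(a_j + a_{j+1}) + (a_0 + a_1 + a_2 + a_3) + a_j
    """
    total = row[0] ^ row[1] ^ row[2] ^ row[3]
    return [xtime(row[j] ^ row[(j + 1) % 4]) ^ total ^ row[j] for j in range(4)]


sample = [0x00, 0x53, 0xCA, 0xFF]
print(theta_row_fast(sample) == theta_row_generic(sample))
```

The coefficients E, 9, D, B of $d(x)$ do not reduce to a single doubling in the same way, which is consistent with the remark that the inverse cipher is slower on this platform.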
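9.2 32-bit Processor

In the implementation of the cipher, the succession of steps $\theta \circ \sigma[k^t] \circ \pi \circ \gamma = \sigma[k'^t] \circ \theta \circ \pi \circ \gamma$ with $k'^t = \theta(k^t)$ can be combined in a single set of table lookups. The intermediate state can be represented by four 32-bit words, each containing a row $[a_i]$. Its transpose is denoted by $[a_i]^T$. For $b = \theta(\pi(\gamma(a))) \oplus k'^t$ we have

$$[b_i]^T =
\begin{bmatrix} c_0 & c_3 & c_2 & c_1 \\ c_1 & c_0 & c_3 & c_2 \\ c_2 & c_1 & c_0 & c_3 \\ c_3 & c_2 & c_1 & c_0 \end{bmatrix}
\begin{bmatrix} S[a_{0,i}] \\ S[a_{1,i}] \\ S[a_{2,i}] \\ S[a_{3,i}] \end{bmatrix}
\oplus [k'^t_i]^T$$

$$= \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix} S[a_{0,i}]
\oplus \begin{bmatrix} c_3 \\ c_0 \\ c_1 \\ c_2 \end{bmatrix} S[a_{1,i}]
\oplus \begin{bmatrix} c_2 \\ c_3 \\ c_0 \\ c_1 \end{bmatrix} S[a_{2,i}]
\oplus \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_0 \end{bmatrix} S[a_{3,i}]
\oplus [k'^t_i]^T.$$

We define the tables $M$ and $T$ as

$$M[a] = a \cdot [c_0\; c_1\; c_2\; c_3], \qquad T[a] = M[S[a]].$$

$T$ and $M$ have 256 entries of four bytes each. The table $M$ implements the polynomial multiplication. $T$ combines the nonlinear substitution with this multiplication. Now we have

$$[b_i] = \bigoplus_j \mathrm{rotr}^j(T[a_{j,i}]) \oplus [k'^t_i].$$

We conclude that $\theta \circ \sigma[k^t] \circ \pi \circ \gamma$ can be done with 16 table lookups, 12 rotations and 16 exors of 32-bit words. This implementation needs the table $T$, with 256 entries of four bytes, i.e. one kilobyte in total.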
Last Round. It can be seen that in this implementation, $\theta$ of the last round is already executed in the previous set of table-lookups. In the last round the function to be applied is $\sigma[k^8] \circ \pi \circ \gamma$. This can be realised by replacing the table $T[x] = M[S[x]]$ by $S[x]$. Since $c_2 = 1_x$, the unity in GF($2^8$), the entries of the small table $S$ can be extracted from $T$, removing the extra storage requirement for $S$.

Performance. The reference implementation is written in C and runs at 2.63 MByte/s on a 100 MHz Pentium with the Windows 95 operating system. The inverse cipher can be implemented in exactly the same way as the cipher itself and has the same performance. The difference is in the tables and the precalculation of the round keys.
10 Acknowledgements

We thank Paulo Barreto, who wrote the optimized reference implementation of Square. Paulo Barreto can be reached at pbarreto@uninet.com.br.
References

1. E. Biham and A. Shamir, "Differential cryptanalysis of DES-like cryptosystems," Journal of Cryptology, Vol. 4, No. 1, 1991, pp. 3-72.
2. J. Daemen, "Cipher and hash function design strategies based on linear and differential cryptanalysis," Doctoral Dissertation, March 1995, K.U.Leuven.
3. J. Daemen and V. Rijmen, "Self-reciprocal cipher structures," COSIC internal report 96-3, 1996.
4. T. Jakobsen and L.R. Knudsen, "The interpolation attack on block ciphers," these proceedings.
5. J. Kelsey, B. Schneier and D. Wagner, "Key-schedule cryptanalysis of IDEA, G-DES, GOST, SAFER, and Triple-DES," Advances in Cryptology, Proceedings Crypto'96, LNCS 1109, N. Koblitz, Ed., Springer-Verlag, 1996, pp. 237-252.
6. L.R. Knudsen, "Truncated and higher order differentials," Fast Software Encryption, LNCS 1008, B. Preneel, Ed., Springer-Verlag, 1995, pp. 196-211.
7. L.R. Knudsen, "A key-schedule weakness in SAFER-K64," Advances in Cryptology, Proceedings Crypto'95, LNCS 963, D. Coppersmith, Ed., Springer-Verlag, 1995, pp. 274-286.
8. L.R. Knudsen and T.A. Berson, "Truncated differentials of SAFER," Fast Software Encryption, LNCS 1039, D. Gollmann, Ed., Springer-Verlag, 1996, pp. 15-26.
9. N. Koblitz, "A Course in Number Theory and Cryptography," Springer-Verlag, New York, 1987.
10. X. Lai, J.L. Massey and S. Murphy, "Markov ciphers and differential cryptanalysis," Advances in Cryptology, Proceedings Eurocrypt'91, LNCS 547, D.W. Davies, Ed., Springer-Verlag, 1991, pp. 17-38.
11. F.J. MacWilliams, N.J.A. Sloane, "The Theory of Error-Correcting Codes," North-Holland, Amsterdam, 1977.
12. M. Matsui, "Linear cryptanalysis method for DES cipher," Advances in Cryptology, Proceedings Eurocrypt'93, LNCS 765, T. Helleseth, Ed., Springer-Verlag, 1994, pp. 386-397.
13. K. Nyberg, "Differentially uniform mappings for cryptography," Advances in Cryptology, Proceedings Eurocrypt'93, LNCS 765, T. Helleseth, Ed., Springer-Verlag, 1994, pp. 55-64.
14. L. O'Connor, "On the distribution of characteristics in bijective mappings," Journal of Cryptology, Vol. 8, No. 2, 1995, pp. 67-86.
15. V. Rijmen, J. Daemen et al., "The cipher SHARK," Fast Software Encryption, LNCS 1039, D. Gollmann, Ed., Springer-Verlag, 1996, pp. 99-112.