THEOREM: I <= 4M in GF(p^n)



THEOREM: I <= 4M in GF(p^n)

That is field inversion is no more costly than the cost of 4 field
multiplications in GF(p^n).
A part of this hypotetic formula is in relation with the fact that N
subfield inversions can be computed with "4N- 5" sub-f multiplications
and 1 sub-f inversion. (via a complete binary tree and a specific 2-
pass algorithm that in the end computes the inverses in leafs).
Fast inversion in GF(p^n) comes with a modification of the Euclid
algorithm.
Given the equations: X= Ax+ By
Y= Cx+ Dy,
a reduction is: X= X- Y* W
A= A- C* W, where W= X[nx]/Y[ny]
and nx,ny= highest non-zero digit of X,Y
To avoid divisions, we can consider A[I]= A_hi[I]/A_lo[I] and X[I]=
X_hi[I]/X_lo[I], but this leads to a slow inversion algoritmh, cost
approximatively I<= 10M.
A fast algorithm derives from the point of view that X, Y, A and C
are vectors and pairs (X,A) and (Y,C) each have a common 1-digit
denominator, call them Lax and Lcy.
Now the Euclidean reduction step becomes:
X/Lax= X/Lax - (Y/Lcy)* (X[nx]/Lax)/(Y[ny]/Lcy)
=> X= (X* Y[ny]- Y* X[nx])/ Y[ny], which can be computed in 3 steps:
init : x_hi= X[nx], y_hi= Y[ny]
step 1: X= X* y_hi (X is full vector X[0..nx])
step 2: X[k..]= X[k..] - Y* x_hi (k= nx-ny>= 0, k is the difference
of digits)
step 3: Lax= Lax* y_hi
end .
For A and C computing, 2 additional steps are required. Let nx>= ny:
step 1: A= A* y_hi (A is full vector A from 0 to na)
step 2: A[k..]= A[k..]- C* x_hi (k=nx-ny, after this na becomes: na=
maxim(na, nc+k))
end .

Inside one loop of the algorithm, the number of mul's required is
"3+nx+ny+na+nc"

C++ defines:
- "qint" is the basic sub-field digit.
- NUM is a vector of qint's.
- zero(A, N) makes digits A[0..N-1] = 0
- let(X, Y, N) makes X[0..N-1]= Y[0..N-1]
- -1<= high_digit(X,N)< N is the math polynomial degree of X (=
number'of'digits- 1)
- q_swap(lax, lcy) swaps contents of 2 qint's
- q_mul(X, Y) makes qint: X= X* Y
- q_div(X, Y) returns X * (Y^-1)
- LinearMul(X, W, N) makes: X[0..N-1]*= W (N q_mul's)
- msub(X, Y, W, N) makes: X[0..N-1]-= Y[0..N-1]* W (N q_mul's and N
q_sub's)


The C++ implementation is:
//---------------------------------------------------------------------------
void invert(qint _X[], qint P[], int N)
{ //operation: X= 1/ X % (P + x^N) ; JUST 1 (ONE) FIELD
INVERT...
//rules: A*_X+ B*_Y= X ; firstly: A= D= 1 && B= C= 0
// C*_X+ D*_Y= Y ; X= _X; Y= P
// => (A-CW)*_X+ (B-DW)*_Y= X- YW; when W= hi(X)/
hi(Y)
// Lax = low for A and C; Lcy= low value for C and Y
// W= hi_X/ hi_Y; X= (X* hi_Y - Y* hi_X) / (Lax* hi_Y)
// => using 1. (A, X, Lax)* hi_Y; 2. (A, X)-= (C, Y)*
hi_X
// time: approx. 2(N- 1)* N+ 2N+ 2N* (N+ 1)= 4N^2+ O(N)
mul's
// estimative muls= 4N(N+1) [worst and medium case and
close in practice]
NUM _A, _C, _Y;
qint *A= _A, *C= _C,
*X= _X, *Y= _Y,
Lax, Lcy; //= denominator of (A and X) ; (C and
Y)
Lax= Lcy= 1;
zero(A, N); A[0]= 1;
zero(C, N+ 1);
let(X, X, N);
let(Y, P, N); Y[N]= 1;

int NX= high_digit(X, N),
NY= N, //high_digit(hY, N)
NA= 0,
NC= -1;
int muls= 0;
do {
if (NX< NY) {
qint *tmp1;
tmp1= X; X= Y; Y= tmp1; //swap(X,Y)
tmp1= A; A= C; C= tmp1; //swap(A,C)
q_swap(Lax, Lcy); //swap(Lax, Lcy)
[denominator]
int tmp2;
tmp2= NX; NX= NY; NY= tmp2; //swap(NX, NY)
tmp2= NA; NA= NC; NC= tmp2; //swap(NA, NC)
}//NX>= NY now... (lengths)
if (NY< 0)
break;
qint &hW= X[NX], //not modified, see reduction
below
&lW= Y[NY];
q_mul(Lax, lW);
int K= NX- NY;
//below X= X- [YW << (nx-ny)]
LinearMul(X, lW, NX ); msub(X+ K, Y, hW,
NY ); //-2 mul
muls+= 1+ NX+ NY;
while (--NX>= 0 && X[NX]== 0); //decrease NX
if (NX>= 0) {
LinearMul(A, lW, NA+ 1); msub(A+ K, C, hW, NC+
1);
muls+= 2+ NA+ NC;
if (NC+ K> NA)
NA= NC+ K; //NA= maxim(NA, NC+
K);
}
} while (1);
//result must be (A/Lax) / (x/Lax)= A / x
qint low= q_div(1, X[NX]); //mul-monic denominator (single
sub-f invert)
muls+= NA+ 1;
let(_X, A, N);
LinearMul(_X, low, NA+ 1); //result= A/ X[HX]= (mul-
monic)
//muls is the total number of multiplies required
}
//---------------------------------------------------------------------------

The test "if (NX>= 0)" saves 2+NA+NC mul's because we are sure at the
next step there is a swap and we need only A as result.
The main form of the function with "int NY= N" expects fictive P[N]=
1 (so P is monic and 1 degree higher than average to-invert number).
Without the inner test "if (NX>= 0)", the maximum number of mul's is
reached when "high_digit(X, N)= N -1" and is: muls<= 4N(N+1). When
test is made, muls<= (2N+1/2)^2 -5/4.
Whith this said, we can deduce that I<= 4M in GF(p^n) because I=
4N^2+2N-1 and M= N^2, and
so: I/M= 4+ 2/N- 1/N^2= 4+ O(1/N)+ O(1/N^2), and with this we conclude
the main theorem.
As a practical fact, this approach is faster than "Itoh-Tsujii
Inversion" because of the number of extension field multiplications
involved in the latter. And this approach can be successfuly adapted
to composite fields GF((2^m)^n) where the basic "qint" is the base-
reduced sub-field GF(2^m). This vector-common-denominator Euclid-
inversion method works for arbitrary modulo GF(p^n) monic polynomials.
General polynomials can be reduced to monic-ones, keeping math modular
properties.



Thanks,
Robert, B.

.



Relevant Pages