The Mathematician's Algorithm and Automated Theory Generation
- From: markwh04@xxxxxxxxx
- Date: 27 Oct 2006 13:40:09 -0700
Expanded from the original, 2002 August 26, sci.math.research
"The Mathematician's Algorithm and Automated Theory Generation"
http://groups.google.com/group/sci.math.research/msg/c19f40b0c299bc3a?dmode=source&hl=en
0. INTRODUCTION
The Mathematician's Algorithm uses a construct I call a Boolean Sieve
to closely capture the process a mathematician goes through in
developing a mathematical theory. In it, you see the concepts of
conjectures, counter-examples and theorems emerge quite naturally.
The term "algorithm" is used somewhat loosely. The steps described
herein need not all be computable. Nonetheless, the process described
provides an enveloping framework for computational approximations to
the 'algorithm' and also provides the basis for processes that only
carry out partial automation (i.e., computer-aided theory generation).
Ultimately, it also provides the foundation for fully automated theory
generation.
It also provides a bookkeeping technique for maintaining and updating
conjectures when carrying out the process by hand.
With the aid of the process, for instance, numerous characterization
results have been arrived at, including the following:
Abelian Groups (by the difference): a-(a-b) = b; (a-c)-(b-c) = a-b;
0-0=0
Non-Abelian Groups (by the quotient): (a/c)/(b/c) = a/b; a/(a/a) = a;
(a/a)/(b/b) = b/b; e/e = e
Non-Abelian Groups II: a/e = a; a/a = e; (a/c)/(b/c) = a/b
Torsors (General Affine Spaces) (by a ternary operation): a/b.b = a;
a/a.b = b; a/b.(c/d.e) = (a/b.c)/d.e
Affine Spaces (field F != GF(2), GF(3)) (by the barycentric operation):
[a,0,b] = a; [a,1,b] = b; [a,rt(1-t),[b,s,c]] =
[[a,rt(1-s),b],t,[a,rs(1-t),c]]
as well as numerous other structures, including Principal Bundles,
Affine Bundles, Projective Spaces, Kleene Algebras, Ortholattices, etc.
A toy example will be provided below to serve as demoonstration of the
general technique.
0.1 REFERENCES & RELATED DEVELOPMENTS
(Vector space : Affine space) :: (Group :: ???) & Principal Bundles
2006 May 18
sci.math.research
http://groups.google.com/group/sci.math.research/msg/737b3c114171d36c?dmode=source&hl=en
Axiomatic characterization for torsors.
Re: categorified gauge theory
2005 January 6
sci.physics.research
http://groups.google.com/group/sci.physics.research/msg/5e61e938eb27acc1?dmode=source&hl=en
Illustrates the application of the quotient operation in the definition
of gauge groupoids and principal bundles.
The Loop & Path Representation of Gauge & String Theories
2006 May 6
sci.physics.research
http://groups.google.com/group/sci.physics.research/msg/3643919f895c0213?dmode=source&hl=en
Further description of the "quotient" approach to gauge theory
Defining Groups By Their Inverse Operation
1997 August 22
sci.math.research
http://groups.google.com/group/sci.math.research/msg/45c6402226c498e5?dmode=source&hl=en
Axiomatizations of algebras in terms of - and /.
1996 August 8
sci.logic
http://groups.google.com/group/sci.logic/msg/846e76a513f731a9?dmode=source&hl=en
Mark's Elements I
1996 July 22
sci.math.research
http://groups.google.com/group/sci.math.research/msg/d50c8741e5f791ef?dmode=source&hl=en
Euclidean spaces as metric space + (???)
Metric space completion by constructive (epsilon-delta free) arguments
using "Zeno sequences"
Characterization of inner product spaces and affine spaces
Mark's Elements II
1996 July 24
sci.math.research
http://groups.google.com/group/sci.math.research/msg/e54a9a45eb1ed556?dmode=source&hl=en
Banach Space = Metric Space + (___)
Mark's Elements III
1996 July 25
sci.math.research
http://groups.google.com/group/sci.math.research/msg/a751c566a9103891?dmode=source&hl=en
Axiomatization of Minkowski space geometry
1. GENERAL DESCRIPTION
Consider the case where we have a set, N, of statements, a set, M, of
models over the underlying term language, and a set, T, of theorems.
Each theorem takes the form of a valid sequent formed out of statements
from M.
Let B(N) and L(N) = 2^{2^N} respectively denote the semantic lattice
and free lattice generated by N. The fundamental assertion of Proof
Theory is that the natural [Boolean lattice] homomorphism
phi: L(N) -> B(N)
will factor through the quotient lattice L(N)/T
phi = phi/T . T^,
where
T^: L(N) -> L(N)/T
is the canonical projection and phi/T: L(N)/T -> B(N) the reduced
homomorphism.
[Note: it's an open issue whether the analogous constructions described
here and below can be extended to Heyting Lattices or Quantum Logic].
For any n in L(N) and model m, let
m^(n) = 1 if m |= n; 0 if m |= not n.
This defines a boolean lattice homomorphism m^: L(N) -> 2^{m} = 2. This
definition of a lattice homomorphism can be extended to sets, M, of
models, defining M^: L(N) -> 2^M by
M^(g) = { m: m^(g) = 1, m in M }.
Let L(M;N) = M^(L(N)).
The fundamental assertion of Model Theory is that the homomorphism M^:
L(N) -> L(M;N) factors through the semantic lattice B(N)
M^ = M* . phi,
for some homomorphism
M*: B(N) -> L(M;N).
2. THE BOOLEAN SIEVE
The net result is that we have a net of homomorphisms
L(N) -- T^ --> L(N)/T -- phi/T --> B(N) -- M' --> L(M;N)
that successively approximate B(N) between tighter and tighter bounds
as the sets M and T grow. These bounds converge. As T -> Theory(N),
the theory generated by N, then L(N)/T -> B(N); and as M approaches
the class of all models over the underlying term language, then L(M;N)
-> B(N). In general L(N)/T will be monotonically "increasing" in
T, and L(M;N) will be monotonically "decreasing" in M.
The gap between the bounds is represented by the homomorphism
phi_M,T = M' . (phi/T): L(N)/T -> L(M;N)
which is, in turn, characterized by the filter
phi_{M,T}^ {-1}(1^M) subset of L(N)/T.
This filter can be represented as a filter over L(N) modulo T
phi_{M,T}^{-1}(1^M) = M^^{-1}(1^M)/T,
and, in turn, is a prime filter generated by a statement S_{M,T}
M^^{-1}(1^M)/T = <S_{M,T}> = { n in L(N): n >= S_{M,T}}
where >= is the lattice ordering. So, writing S_{M,T} in the form
S_{M,T} = c1 ^ c2 ^ ... ^ cq = ^(C)
where (n,n') |-> n ^ n' is the lattice join, we have a set C =
{c1,c2,...,cq} of CONJECTURES that mediates between the gap.
Together, this comprises what I call a BOOLEAN SIEVE.
3. INCREMENTAL UPDATES
Since L(N)/T and L(M;N) are monotonic in T and M respectively, then C
can be efficiently maintained by providing a suitable update formula
corresponding to each increment in T or M.
The statements in C are already taken relative to T,
T |- c (c in C).
So, in actuality, we're only seeking an update method for increments
in M.
At the same time, the contents of C direct the update process, itself.
For each c in C, we attempt to either find a proof of the inference
T |- c,
itself, or a model m which provides a counter-example to c
m |= not c.
In the former case, we can safely remove c from the set C and add it to
T. It then represents a conjecture that has been proven as a theorem.
In the latter case, we add the model m to M and then the task is to
update C, itself.
The general properties relating the net phi_M = M^^{-1}(1^M) of filters
are
phi_M(N) = <S_{M,N}> = { n in L(N): n >= S_{M,N} }
S_{M^M', N} = S_{M,N} v S_{M',N}
S_{{m},N} = S_{m,N} = ^ (n) ^ ^(not n)
where the first conjunct is taken over all n in N such that m |= n, the
second over all n in N such that m |= not n. The operation v is the
lattice join.
Thus, if the generating statement S_{M,N} = c1 ^ ... ^ cq = ^C, where C
= {c1,...,cq}, then the updated statement will be
s_{Mv{m},N} = (c1 ^ ... ^ cq) v m^(N).
This reduces to the expression
^(c) ^ ^(c v n) ^ ^(n -> c),
where the conjuncts are taken over c in C, n in N, the first taken
where m |= c, the second where m |= n, not c, the third where m |= not
n, not c.
So, the set C is modified as follows
(1) We take each c in C for which m |= not c and remove it. It is a
conjecture refuted by the model m, which then provides us with a
counter-example to c.
(2) We remove c from C, and for each n in N, we add c v n if m |= n,
or add n -> c if m |= not n.
(3) We reduce C, carrying out basic simplifications and removing all
the redundant statements.
When c is a sequent c = ^G -> vH, then the updated formulas would be
(in Classical logic) respectively
(n -> c) = ^({n} union G) -> v H,
(c v n) = ^G -> v(H union {n}).
The updating preserves the requirement that the filter corresponding to
the homomorphism phi_{M,T}: L(N)/T -> L(M;N) be the subset <^C>/T of
L(N)/T.
4. INITIALIZATIONS
By assumption, our exploration starts out with a concrete application.
So, that means the initial set N of statements comprise all those
statements that are true in the application at hand. If the model
corresponding to this application is m_0, then the initial values for
C,T,M will be
C = N, T = {}, M = {m0}.
Since m0 |= n for all n in N, then
S_{{m0},N} = ^N
So, the filter corresponding to the homomorphism
L(N) = L(N)/{} -> L({m0}; N}
is just
<^N> = <^N>/{}.
Therefore, with this initialization and the update previously
described, the filter for the homomorphism phi_{M,T}: L(N)/T -> L(M;
N) will always be <^C>/T.
5. THE ALGORITHM
So, in summary, the algorithm takes on the following form
GIVEN
A set N of statements and a model m0 of the underlying term language
such that
m0 |= n, for all n in N.
INITIALIZATION
Set C <- N, T <- {}, M <- {m0}.
UPDATING
Pick any c0 in C for which either a proof of T |- c0 can be found or a
counter-example m |= not c0.
(a) If T |- c0, then C <- C - {c0}; T <- T union {c0}.
(b) If m |= not c0, then M <- M union {m} and for each c in C such that
m |= not c
(b1) C <- C - {c}
(b2) For all n in N such that m |= n then C <- C union {c v n}.
(b3) For all n in N such that m |= not n then C <- C union {n -> c}.
Reduce C as far as possible, using the theorems in T, the repeat the
update process.
The set N is considered SOLVED if the process can be carried through to
completion, which occurs when C = {}. The solution will comprise a set,
T, of statements that completely characterize the logical
interrelations that exist between the statements in the set N.
6. THEORY GENERATION
The set M is ancillary to the algorithm, itself and doesn't need to
be explicitly represented in any form, but is useful to retain as it
provides a catalogue of all the relevant counter-examples.
If C can be reduced to {}, then the set T will provide a complete
characterization of all the logical interrelations that exist amongst
the original set N of statements. From this one can
(a) read off an AXIOMATIZATION by taking any minimal subset N0 of N
from which N can be generated via the sequents in T,
(b) read off all the significant WATERSHEDS - subsets of N which are
generated by a large number of different (and usually much smaller
still) subsets.
(c) classify all the GENERALIZATIONS - that is, all the subsets of N
which are closed under T.
So, the net result is that one not only arrives at a theory that
formalizes the statements of interest from the original application but
(even more significantly) a hierarchical classification of the
significant generalizations to this theory.
7. EXERCISES AND APPLICATIONS
The method described is most suitable for application to algebraic
theories. Two useful exercises one may attempt are the following.
7.0 CLASSIFICATION OF "THEORY SPACE"
The problem in its fullest glory is the completely characterize the
entirety of "theory space" -- i.e., determine B(N), where N is the set
of all statements expressed in a given formal language.
Generally, the problem admits no recursively computeable solution, but
siginificant partial solutions may exist.
A special case of interest restricts its focus to algebraic theories,
given by equational axioms. Here, too, one can restrict focus further
without significant compromise by dealing with only those theories that
admit a single binary operation. The most significant example of the
latter variety is the algebra underlying combinatory logic, which can
be captured by a single symbol and identity. Other examples (as
illustrated above) include groups and (by extension) semigroups,
monoids, boolean lattices, etc.
7.0.1 RESTRICTED THEORY SPACE
Here, identities are restricted to those of a given degree, e.g. cubic.
The possible combinations of terms are
a, aa, ab, a(aa), a(ab), a(ba), a(bb), a(bc), (aa)a, (aa)b, (ab)a,
(ab)b, (ab)c.
The non-trivial statements are the universally quantified equations
that can be constructed from these terms. Up to order 2, these include
a = aa; a = ab; a = ba; a = bb; aa = ab; aa = ba; aa = bb; aa = bc; ab
= ba; ab = bc.
A trivial model m0 is the trivial system consisting of {0} with the
operation 00 = 0.
7.0.2 GROUP THEORY
A similar construct can be carried out, with the intended
interpretation that ab denote a + -b in an Abelian group. Here, further
efficiency can be obtained in delimiting the possible range of
identities. Only terms that match in the free Abelian group upon will
be compatible with one another.
Up to order 3, the classification is into the subsets
Class a: a, a(aa), b(ba), a(bb)
Class 0: aa
Class a-b: ab
Class 2a-b: a(ba)
Class a-b+c: a(bc)
Class -a: (aa)a, (bb)a, (ba)b
Class a-2b: (ab)b
Class a-b-c: (ab)c
Potential identities can be posed between the terms in a given class,
allowing for permutations. This includes the following non-trivial
identities
a = a(aa); a = b(ba); a = a(bb);
a(aa) = b(ba); a(aa) = a(bb); b(ba) = c(ca); b(ba) = a(bb); b(ba) =
a(cc); a(bb) = a(cc).
aa = bb
(aa)a = (bb)a, (aa)a = (ba)b, (bb)a = (cc)a, (bb)a = (ba)b, (bb)a =
(ca)c, (ba)b = (ca)c
(ab)c = (ca)b
The identities a(bb) = a(cc), b(ba) = c(ca), (bb)a = (cc)a, (ba)b =
(ca)c are equivalent respectively to a(aa) = a(bb), a(aa) = b(ba),
(aa)a = (bb)a, (aa)a = (ba)b and so could be removed from the list.
In this exercise, not only will the axiomatizations of group theory be
arrived at, but also the significant generalizations of the algebraic
theory -- the "watersheds".
7.1 DERIVATION OF LATTICE THEORY
Starting out with the model
({0,1}, ^, v, not, ->, <->)
with the connectives defined by the standard truth tables, write down a
large set of identities (e.g., the set of all the equational identities
consisting of up to 4 variables on the left and 4 on the right). From
this carry out the reduction process to find the interrelations and
from this the significant watersheds.
to Heyting lattices, distributive lattices, modular lattices, etc.From this, one should be able to read off the watersheds corresponding
7.2 DERIVATION OF ABSTRACT ALGEBRA
Similar exercise, starting out with the model
Q = ({p/q: p in Z, q in Z - {0}}, +, -, x, /).
semigroups, monoids, semirings, etc. should all emerge naturally asFrom this, the structures corresponding to group theory, ring theory,
watersheds.
Using this method on the starting structure
(G, (x,y) |-> x-y); (G, (x,y) |-> x+y, 0, x |-> -x) = Abelian group
for instance, one should find axiomatizations similar to those
described above.
7.3 CASE STUDY
Here, we will consider an instance of 7.0.2 involving a limited number
of statements. This process will only be carried out for the first few
steps -- and will be left as an exercise for you to complete. (I have
carried this out to conclusion in the past; things start getting
interesting past the 5th step or so, just where I'm cutting it short
below).
The demonstration will also serve to illustrate some potential
optimizations that can be taken in an implementation of the algorithm.
To this end, consider the following set of statements, each universally
quantified over all its variables
A) aa = bb
B) a(ab) = b
C) a(bc) = a(ba)
D) a(aa) = a
E) a(bb) = a
F) a(bb) = b(ba)
G) (aa)(aa) = aa
H) (aa)(bb) = aa
I) (aa)(bb) = bb
J) (aa)(ab) = ba
K) (aa)(ba) = ab
L) (aa)(bc) = cb
M) (ac)(bc) = ab
N) (ab)(ac) = cb
O) ab = (a(bc))c
P) ab = (c(ca))b
Q) ab = (c(ba))c
R) a(b(bc)) = ac
S) a(b(ca) = cb
T) a(a(bc)) = bc
U) a(b(cd)) = (a(dc))b
V) a(b(cd)) = (a(bc))d
W) a(b(cd)) = (c(da))b
X) a(b(cd)) = (c(ba))d
Together, statements A, ..., X comprise the set N. These are all
statements that are true in the theory of Abelian groups, where ab = a
+ (-b). There are enough statements in the list to completely
characterize the group-theoretic properties of the additive operation
(a,b) |-> a+b , as well as the inverse a |-> -a and identity 0.
Therefore, the set N, taken in its entirety, provides an equivalent
definition of an Abelian group.
A theoretical characterization of these statements consists of a set,
T, of theorems from which every logical relation involving these
statements can be derived. An axiomatization is a minimal subset N0 of
N such that T, N0 |- N.
The first model is
m0: ab = a + (-b) in Abelian group theory
All of A, ..., X are true in m0. Therefore, the corresponding
statement vector is
m0^ = ABCDEFGHIJKLMNOPQRSTUVWX,
and the initial set of conjectures is read off from the conjunction
*C = ^{A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X}.
(I'll use *C to denote the conjecture list, to distinguish this from
the label for statement C).
First we try A, the statement that aa = bb. This is easily refuted
under the interpretation provided by model
m1: ab = a, for a, b in {0,1}.
The following statements are true in m1
D, E, G, H, K, M, O, R, U, V
and the following statements are false in m1
A, B, C, F, I, J, L, N, P, Q, S, T, W, X
The corresponding vector for m1 is
m1^ = A' B' C' D E F' G H I' J' K L' M N' O P' Q' R S' T' U V W' X'
where ()' is being used here to denote negation and concatenation to
denote the lattice join.
The conjecture list is updated to
*C <- *C v m1^
which reduces to
*C = ^ {D, E, G, H, K, M, O, R, U, V, a v b}
where
a = A B C F I J L N P Q S T W X; b = A' B' C' F' I' J' L' N' P' Q'
S' T' W' X'.
When reduced to minimal form, this becomes a chain of equivalences
involving the statements in u, v. Thus
*C = ^{D, E, G, H, K, M, O, R, U, V}
^ { A->B, B->C, C->F, F->I, I->J, J->L, L->N}
^ { N->P, P->Q, Q->S, S->T, T->W, W->X, X->A }.
We'll abbreviate chains of implications using obvious notation, and
denote conjunctions by commas. So the above statement now reads
*C = D,E,G,H,K,M,O,R,U,V,A=B=C=F=I=J=L=N=P=Q=S=T=W=X.
Now, we pick clause D. This states that a(aa) = a, regardless of how
we interpret the operation (a,b) |-> ab. This is definitely false. A
simple counter-example defines our next model
m2: ab = a+b, for a, b in R^{>=0}
Under m2, the following statements are true
C, F, U, V, W, X
and the remaining statements are false. The corresponding vector is
this
m2^ = A' B'' C D' E' F G' H' I' J' K' L' M' N' O' P' Q' R' S' T' U V
W X.
Updating
*C <- *C v m2^.
entails a major reduction, which ultimately leads to a conjunction of
the following:
*C = U,V, CvD, CD -> A -> C = F = W = X,
^ A = B = I = J = L = N = P = Q = S = T,
^ A -> D = E = G = H = K = M = O = R.
Now, we pick clause U, which reads a(b(cd)) = (a(dc))b. Obviously, this
is also false under an arbitrary interpretation of (a,b) |-> ab, with
the following counterexample
m3: ab = b, for a, b in {0,1}.
The corresponding vector is
m3^ = A' B C' D E' F' G H' I J' K' L' M' N' O' P Q' R S' T U' V W'
X.
The equivalences refine further into
A=B=I=J=K=N=P=Q=S=T becomes A=J=L=N=Q=S -> B=I=P=T
D=E=G=H=K=M=O=R becomes E=H=K=M=O -> D=G=R
C=V=W=X becomes C=F=W -> X.
The statements V, CvD, CD -> A -> C, A -> E continue to be valid, but
the statements
U, X->C, B->A, D->E
are now refuted. The result of the update *C <- *C v m3^ breaks apart
the equivalences as indicated and affects only the refuted statements,
to yield
*C = F, CvD, CD -> A -> C -> U, A -> E -> U, B -> AvX, D -> E v X
^ U v D, U v B, U v X, UX -> C = F = W -> X -> B v C
^ UB -> A = J = L = N = Q = S -> B = I = P = T -> A v D,
^ UD -> E = H = K = M = O -> D = G = R -> E v B.
Next, we pick clause F, which reads a(b(cd)) = (a(bc))d. Here, a
suitable model which falsifies this statement is given by
m4: ab = a b^{-1}, for a, b in S_3 the symmetry group for
permutations on triplets.
The corresponding vector reads
m4^ = A B' C' D E F' G H I J K L M N' O' P' Q' R' S' T' U V' W' X'.
This brings about further refinements in the equivalences, as follows:
A=J=L=N=Q=S becomes N=Q=S -> A=J=L,
B=I=P=T becomes B=P=T -> I,
E=H=K=M=O becomes O -> E=H=K=M,
D=G=R becomes R -> D=G.
The further refinements of this set will left as an exercise.
After the refinements, there will still be statements in the set
included as unconditional conjectures, including C v D. They can all be
refuted by a model in which refutes all the statements A, ..., X:
m5: ab = a^b, for a, b in R^{>= 0}.
Under this model, all statements are false. Hence, the corresponding
vector is
. m5^ = A' B' C' D' E' F' G' H' I' J' K' L' M' N' O' P' Q' R' S' T'
U' V' W' X'.
All the conditionals remain unaffected, each being vacuously true under
this model. But all the unconditional statements are now refuted.
Further tests and refinements will be left as an exercise.
.
- Prev by Date: Re: Counter example for Mr. Diaby algorithm solving TSP problem in polynomial time
- Next by Date: Finding edges of a given polyhedra in 3D
- Previous by thread: pebble automata
- Next by thread: Finding edges of a given polyhedra in 3D
- Index(es):
Relevant Pages
|