152q^22 + 3472q^21 + 38791q^20 + 293021q^19 + 1370892q^18 + 4067059q^17 + 7964012q^16 + 11159003q^15 + 11808808q^14 + 9859915q^13 + 6778956q^12 + 3964369q^11 + 2015441q^10 + 906567q^9 + 363611q^8 + 129820q^7 + 41239q^6 + 11426q^5 + 2677q^4 + 492q^3 + 61q^2 + 3q
value at 1: 60779787
That's the headline; what's the story? Quite a few of you have spent lots of your professional lives trying to understand representations of reductive groups, so I won't say too much about why that's interesting. A little of the background for this particular story is that it's been known for more than twenty years that the unitary dual of a real reductive group can be found by a finite calculation.
Twenty years ago, actually carrying out such computations for any interesting groups looked completely impossible. But in twenty years we've learned a little, and computers have gotten a lot bigger and faster. In 2002, Jeff Adams had the idea of getting computers really to make such calculations. Of course as mathematicians we want completely general theorems, and it's by no means clear that there is a finite calculation to find the unitary duals of all reductive groups at once. But the work of Dan Barbasch makes it possible to hope that one can treat classical groups with pencil and paper. The exceptional groups are finite in number, so treating them _is_ a finite calculation. That became the standard for measuring the computer problem: is it possible to treat E8?
It was clear that the first problem was to write a program that could work with the Cartan subgroups, maximal compact subgroups, and Weyl groups of real reductive groups. Jeff recruited Fokko du Cloux to do this, and he began to work in 2003. By 2004 his software could handle this structure theory (better than most of us, at least).
The next step was less obvious, but Fokko and Jeff settled on computing Kazhdan-Lusztig polynomials for real groups. Fokko had written the best software in the world to do this for Coxeter groups; the algorithms for real groups are more or less similar in structure (although MUCH more complicated in detail). Getting these polynomials would provide formulas for irreducible characters, and it is a critical step in several computational approaches to classifying unitary representations.
So late in 2004, Fokko began to add to his software code to implement the Kazhdan-Lusztig algorithm for real groups. Since the papers in which this algorithm is written down are (a) nearly impenetrable, and (b) full of small mistakes, and (c) written with no consideration for computational practice, it was clear that this job was a matter of five or six years of work. And that's how it turned out: it took Fokko almost a full year to finish. In November of 2005, he did. Very quickly he and Jeff used the software to compute KL polynomials (and so character tables) for all of the real forms of F4, E6, and E7, and for the non-split form of E8. That left the split form of E8.
So how big a computation is this character table for split E8? Fokko's software immediately told us exactly how many different representations we were talking about (453,060); that's also the number of terms that could appear in the character formulas. So the character table is a square matrix of size 453,060. The number of entries is therefore about 200 billion, or 2 x 10^11.
Fortunately the matrix is upper triangular, so we only need 100 billion entries.
Unfortunately the algorithm proceeds not by calculating entries directly but by calculating polynomials whose values at 1 are the entries. The polynomials have average degree about 11, so the total number of coefficients is about 1.2 trillion.
Fortunately many of the coefficients are easily seen to be equal. One needs to store only one representative polynomial for each family of "obviously equal" entries. Fokko's software calculated how many such families there were: a bit more than 6 billion. So we were back down to storing about 70 billion coefficients.
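For anyone who likes to see the bookkeeping written down, here it is as a few lines of C++ (just a sketch; every constant is one quoted in the paragraphs above).

    // Back-of-the-envelope storage estimates for the E8 character table.
    #include <cstdio>

    int main() {
      double reps     = 453060.0;         // number of representations
      double entries  = reps * reps / 2;  // upper-triangular character table
      double coeffs   = entries * 12;     // average degree ~11, so ~12 coefficients each
      double families = 6.0e9;            // "obviously equal" families of entries
      double stored   = families * 12;    // coefficients that actually need storing
      std::printf("matrix entries      ~ %.1e\n", entries); // ~1.0e11
      std::printf("total coefficients  ~ %.1e\n", coeffs);  // ~1.2e12
      std::printf("stored coefficients ~ %.1e, about %.0fG at 4 bytes each\n",
                  stored, stored * 4 / 1e9);                // ~7e10, just under 300G
      return 0;
    }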
Unfortunately we had no clear idea how big the (non-negative integer) coefficients of these polynomials could be. In the case of split D5, the largest is 5. For split E6, the largest is 27. In the case of split E7, it's 3583. This trend was not encouraging; it seemed clear that the coefficients would exceed 65535, so that they could not be stored in two bytes of computer memory. The next practical size is four bytes.
Fortunately Fokko wrote the software to compute with four-byte integers and to test carefully for numeric overflow throughout the computation. (If overflow happened, the plan was to switch to eight-byte integers and try again.)
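Here is a minimal sketch of what "test carefully for overflow" means in practice; this is not Fokko's code, just the shape of the idea.

    // Add two 4-byte coefficients and fail loudly if the true sum
    // no longer fits in 32 bits.
    #include <cstdint>
    #include <stdexcept>

    uint32_t checked_add(uint32_t a, uint32_t b) {
      uint32_t sum = a + b;          // wraps around modulo 2^32 on overflow
      if (sum < a)                   // wrap-around means the result needs more than 32 bits
        throw std::overflow_error("coefficient exceeds 32 bits");
      return sum;
    }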
Unfortunately, 70 billion 4-byte integers require 280G of RAM. (The nature of the algorithm, which constantly looks at widely distributed results from earlier in the computation, made storing results on disk impractical.) That's a lot of RAM even for rich scientists.
Fortunately, some of the six billion polynomials are zero, and some of them are equal to others "by chance" (that is, not for reasons that are easy to understand). So Fokko wrote the software to store only one copy of each distinct polynomial. He hoped that the number of distinct polynomials might be a few hundred million: so perhaps 6 billion coefficients, requiring 25G of RAM. The indexes keeping track of all these polynomials would occupy another 100G or so: perhaps 120G altogether.
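The storage scheme amounts to what programmers call interning: keep one copy of each distinct polynomial and let everything else refer to it by number. A rough sketch (with made-up names, not the atlas classes):

    #include <cstdint>
    #include <map>
    #include <vector>

    using Poly = std::vector<uint32_t>;   // coefficients, degree+1 entries

    class PolyStore {
      std::vector<Poly> pool_;            // one copy of each distinct polynomial
      std::map<Poly, uint32_t> index_;    // polynomial -> position in pool_
    public:
      uint32_t intern(const Poly& p) {    // return the index of p, adding it if new
        auto it = index_.find(p);
        if (it != index_.end()) return it->second;
        uint32_t idx = static_cast<uint32_t>(pool_.size());
        pool_.push_back(p);
        index_.emplace(p, idx);
        return idx;
      }
      const Poly& operator[](uint32_t i) const { return pool_[i]; }
    };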
Unfortunately, we didn't have a computer even with 100G of RAM. We hoped perhaps to buy one using the NSF FRG grant which Jeff had finally pushed through; but that was a year or more away.
Fortunately computer science is often computer art, and even Fokko's work could be improved. Fokko worked on that constantly over the last year, and Marc van Leeuwen began to make serious contributions as well. The two of them rearranged the indexes and the code in some extraordinarily clever ways; Marc's final changes in the last month reduced the sizes of the indexes to about 35G.
Unfortunately, tests running partway through the E8 calculation (done mostly by Birne Binegar, who got a supercomputer in Stillwater, but managed more or less to set fire to it using the atlas software) revealed that Fokko's first hopes about the number of distinct polynomials were too optimistic. Even an optimistic reading of Birne's tests suggested more like 800 million distinct polynomials, meaning 40G or more to hold the coefficients.
Fortunately Dan Barbasch is now chair of the math department at Cornell, and in September he managed to gain access to a machine with 128G of RAM and 128G of swap. He used it to run the E8 computation to the end. The fact that Fokko's overflow tests were never triggered showed that all the coefficients really fit in four bytes.
Unfortunately he had no reasonable way to write the results to disk, so they disappeared. Also unfortunately, Dan didn't have the improvements that Fokko and Marc had made to the code: his computation used 224G of memory (half of it swap space) and took twelve days to finish.
Fortunately, by November, Fokko and Marc had trimmed memory use in the code a great deal. Through the persistence of Birne Binegar, and the generosity of William Stein, the atlas group got access to his computer sage at the University of Washington (with 64G of RAM and 75G of swap). On this machine we could finally do some large fraction of the E8 character table computation. By late November, we believed that we could finish E8 with about 150G of RAM.
Unfortunately, 150G is just a little more than sage has, even with swap. Birne and Jeff suggested that we start looking seriously at applying for an NSF grant to buy a machine with perhaps 256G of memory: something on the order of $150,000 or more. I asked a number of people whether they might be able to make use of such a computer.
Fortunately Noam Elkies had a fascinating reply:
A computation that actually uses N bytes of storage must take time
_at least_ N. But once N gets as large as 256GB it might not be
feasible to spend much _more_ than N time: N*log(N) or N*log^2(N) is
certainly OK [e.g. fast integer or polynomial arithmetic, and other
applications of the Fast Fourier Transform; also solving f(x)=f(x')
by sorting N values of f(x) and finding a consecutive match]; maybe
also N^(3/2) [e.g. linear algebra with dense matrices of size
sqrt(N), or computing the first N coefficients of modular forms such
as Delta without fast arithmetic]; probably not N^2. So there might
not be all that much room for making use of such a huge machine,
though even O(N) allows for some nontrivial computations [as in the
example of exhaustive analysis of endgames in chess and similar
games, or -- less frivolously but using the same basic idea -- of
the distance distribution in a Cayley graph, as in
and in addition
Is it clear that the E8 computation cannot fit into "only" 128 or
64GB -- which would still be significantly larger than neron?
I explained about the polynomial storage:
We know that the coefficients can exceed 2^16 (by computation), and
we hope that they don't exceed 2^32. Each polynomial is stored as a
vector of 32 bit integers, of size exactly equal to its degree plus
one. Assuming an average degree of 19, that's 80 bytes per
polynomial.
On November 30, Noam replied
Well 2^32 is less than the product of the ten odd primes less than
2^5, so unless the computation requires divisions by numbers other
than 2 you could reduce this from 80 bytes to something like
(5/32)*80 = 12.5 bytes, at the cost of running the computation 9
times (counting once for mod 3*5)
I started to write a reply explaining why modular reduction of the KL
algorithm doesn't work, but I had to throw it away. Fokko's code was
beautifully compartmentalized, and Marc is amazing, so by December 4
we were getting character table entries mod n for any n up to 256.
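The change amounts to swapping the coefficient type: a sketch of the idea (not Marc's actual code) is to store each coefficient as a single byte and do all arithmetic modulo n, for any n up to 256.

    #include <cstdint>

    struct ModCoef {
      static uint16_t n;        // the modulus, at most 256
      uint8_t v;                // a residue, 0 <= v < n
      ModCoef(uint32_t x = 0) : v(static_cast<uint8_t>(x % n)) {}
      ModCoef operator+(ModCoef o) const { return ModCoef((uint32_t)v + o.v); }
      ModCoef operator-(ModCoef o) const { return ModCoef((uint32_t)v + n - o.v); }
      ModCoef operator*(ModCoef o) const { return ModCoef((uint32_t)v * o.v); }
    };
    uint16_t ModCoef::n = 251;  // for example, the first modulus we ran on sage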
On December 6 Marc's modifications of Fokko's code on sage computed
about 3/4 of the entries in the character table mod 251, finding
almost 700 million distinct polynomials and growing to 108G in size.
Since 90% of that was indexes, working on their structure became
worthwhile. Marc redesigned the indexes rather completely (replacing
8-byte pointers by four-byte counters several billion times, for
example). He also added code to output the answers to compact hard
disk files (easily re-readable by the atlas or other software, for
subsequent processing).
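The pointer-to-counter change is easy to state in code (hypothetical types, not the atlas ones): refer to a polynomial by a 4-byte position in the common pool rather than by an 8-byte pointer, halving the size of each index entry. With several billion entries, that difference is tens of gigabytes.

    #include <cstdint>
    #include <vector>

    using Poly = std::vector<uint32_t>;

    struct IndexEntryOld { const Poly* poly; };      // 8 bytes per entry (64-bit pointer)
    struct IndexEntryNew { uint32_t poly_number; };  // 4 bytes per entry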
Meanwhile Birne ran various versions of the code on sage. Among other
things he established for certain that there were more than one billion
distinct polynomials.
Early on December 19, Marc's modified code began a computation for E8 mod
251, with the possibility of actually writing the result usefully at
the end. Essentially it worked, finishing the computation in about 16
hours:
Total elapsed time = 62575s. Finished at l = 64, y = 453059
d_store.size() = 1181642979, prim_size = 3393819659
VmData: 64435824 kB
(The number d_store is the number of distinct KL polynomials, and y is
the row number in the character table. The number prim_size is the
number of not-obviously-equal entries in the character table that
turned out to be non-zero.)
Unfortunately, writing the answer to disk took two days. Marc and I
went over Marc's output code to see why. Fortunately, we figured it
out, and Marc improved the speed. Unfortunately, we found at the same
time a bug (he wrote size() in one line where he meant capacity()).
The result was that, even though the polynomials were all correctly
written to disk, the index files explaining which polynomial was
stored where were missing something like half their contents.
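For non-C++ readers: size() is the number of elements a vector actually holds, while capacity() is how much room it has reserved, which is usually larger. A miniature of the sort of output routine involved (hypothetical, not Marc's code):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    void write_index(std::FILE* f, const std::vector<uint32_t>& index) {
      // Correct: write exactly the index.size() entries that are stored.
      // Confusing size() with capacity() in a computation like this one
      // can leave the file short of (or padded beyond) its real contents,
      // which is how half the index entries went missing.
      std::fwrite(index.data(), sizeof(uint32_t), index.size(), f);
    }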
Marc fixed things instantly, and on Friday December 22 we started a
calculation mod 256 on sage. This did 452,174 out of 453,060 rows of
the character table in 14 hours, then sage crashed. We tried again
starting late Friday afternoon, and actually finished with good
output:
Total elapsed time = 40229s. Finished at l = 64, y = 453059
d_store.size() = 1181642979, prim_size = 3393819896
VmData: 54995416 kB
Just eleven hours this time (joys of multithreading!), and with Marc's
adjustments the disk output was fast.
On Saturday December 23 we started a run mod 255. This time sage
crashed a third of the way through the computation. There was no one
physically present to reboot it (apparently some kind of local holiday
in Seattle) so we retired for a bit (still having mod 256 as our only
good output files).
Meanwhile Marc was writing code to combine KL polynomials from several
moduli m1, m2,... into KL polynomials modulo lcm(m1, m2, ...). We
tested that on the first hundred million entries of the E8 character
table modulo 253, 255, and 256, and it worked fine. (The biggest
coefficient appearing to that point was 175836.) When sage came back
up on December 26, we got a character table mod 255 written to disk.
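For the curious, the moduli 253 = 11*23, 255 = 3*5*17, and 256 = 2^8 are pairwise coprime, so their least common multiple is the product 16515840, and the Chinese remainder theorem recovers any coefficient below that bound from its three residues. A toy version of the combination step (not Marc's utility) looks like this:

    #include <cstdint>

    // Lift r (mod m) and s (mod n) to the unique residue mod m*n, assuming
    // gcd(m, n) = 1.  Found by searching r, r+m, r+2m, ... -- slow but
    // obviously correct; a real utility would use the extended Euclidean
    // algorithm instead.
    uint64_t crt2(uint64_t r, uint64_t m, uint64_t s, uint64_t n) {
      uint64_t x = r;
      while (x % n != s) x += m;
      return x;                                  // 0 <= x < m*n
    }

    uint64_t crt3(uint64_t a253, uint64_t a255, uint64_t a256) {
      uint64_t x = crt2(a253, 253, a255, 255);   // residue mod 64515
      return crt2(x, 64515, a256, 256);          // residue mod 16515840
    }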
At 1 a.m. on the morning of December 27, we started a run mod 253.
About halfway through, sage crashed.
The experts I consulted assured me that the atlas software couldn't
possibly be crashing sage. My own opinions about the causes of the
crashes wavered between black helicopters from the NSA and Sasquatch.
We decided to keep our hands off sage until we were older and wiser:
say for a year.
On Wednesday January 3 we were all one year older, which made perhaps
thirty years of additional wisdom counting all the atlas people. This
factor of thirty seemed like a suitable margin of safety, so that
afternoon we started another computation mod 253. This took twelve
hours. By 4 a.m. Thursday morning January 4th we had output for three
moduli (253, 255, and 256) with lcm 16515840: bigger (we had some
hope) than all the coefficients. Marc took unfair advantage of the
time difference in Europe to start running his Chinese remainder
theorem utility on the results. Arranging the indexes takes a long
time, but that finished in nine hours. There was another speed
problem: the first version had a pretty little counter to display the
number of the polynomial being written to disk, to allow for
monitoring the job. Since more than a billion polynomials are
written, this meant writing several billion characters to the
display. Turns out that takes a long time. So we started over on
Friday morning January 5, with a counter that updates only every 4096
polynomials. This went nicely until sage crashed.
William Stein identified sage's problem as a flaky hard drive. (I
personally can never remember which of French pastry and hard drives
is supposed to be flaky and which is supposed to be sticky.) So he
replaced it with some French pastry, on which our 100G of mod m output
files were written in dark chocolate icing. (I'm simplifying the
technical details a little.) The fact that he _had_ such a pastry
puts him high on my long list of heroes for this saga. He did this
more or less instantly, and we restarted the Chinese Remainder
calculation late Friday afternoon.
Early Saturday morning, the file of polynomial coefficients mod
16515840 had grown to be about 7 billion bytes larger than it was
supposed to be. Since it was a day with a "y" in it, I figured Marc
ought to be working, so I sent him off to find out what was up (or
down, or whatever). (He was visiting his family in the Netherlands
for the weekend, further evidence that he had nothing to do.)
The bug was unbelievably subtle. Since the number of polynomials is
about a billion, Marc's Chinese Remainder code represented the index
of a polynomial by a four-byte integer (perfectly good up to
4,294,967,295). Unfortunately, this integer at some point needs to be
multiplied by 5 (the number of bytes in the index entry for one
polynomial); the result is put into an 8-byte integer, where it fits
nicely. But when the polynomial number exceeds 858,993,459, and the
multiplication is done in four bytes, it overflows. The result is
that everything works fine in any reasonable test (like the one we ran
with a hundred million polynomials). To complicate Marc's task
further, the bug was not present in his most recent version of the
code; what I was running on sage was a couple of days older.
So it took Marc almost twenty hours to find and fix this bug (assuming
that he neither slept nor ate nor spoke to his family).
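In code, the bug is only a cast away (hypothetical names): the polynomial number fits in 32 bits, but the byte offset 5 times that number does not once the number exceeds 858,993,459, which is the integer part of 2^32/5.

    #include <cstdint>

    uint64_t index_offset_buggy(uint32_t poly_number) {
      return poly_number * 5;                         // multiplied in 32 bits: wraps!
    }

    uint64_t index_offset_fixed(uint32_t poly_number) {
      return static_cast<uint64_t>(poly_number) * 5;  // widened first: correct
    }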
Marc's analysis showed that the bug came into play only around
polynomial number 858 million; so all the coefficients calculated
before that should be correct. The largest of these was 11,808,808,
at polynomial number 818553156. This convinced me that we would
almost certainly find larger coefficients among the 350 million
polynomials that were not correctly evaluated the first time; and that
very likely we would need a larger modulus than 16515840 to get them all.
So at 6 a.m. on Sunday the 7th I was able to start Marc's current (and
correct) Chinese remainder theorem utility, this time adding modulus
251. Of course nothing went wrong (because what could go wrong?), and
the last polynomial was written just before 9 a.m. Eastern time on
Monday.
I apologize for making this so long, but it has been like nothing else
I've ever done. I've left out much more than I put in. Of course the
central omission was Fokko's story. Almost all of you know that he
was diagnosed with ALS just after finishing the Kazhdan-Lusztig
computation software in November of 2005. By February 2006 he had
little use of his hands, and by May he was entirely paralyzed below
his neck. But he continued to share his fantastic skills and insights
into the mathematics and the programming, with Jeff and with Marc and
with me, until his death November 10.
This project has introduced me to great mathematicians I barely knew,
like Marc van Leeuwen and John Stembridge, and it has shown entirely
new depths in people I knew well, like Jeff Adams and Dan Barbasch.
So it's a very high bar...but what has been best of all,
mathematically and personally, has been spending time with Fokko.
It's _still_ the best: every minute I spend on this stuff is a chance
to think about him, and that's always good for a smile.
So congratulations and thank you to Fokko, who did this all by
himself.
Congratulations and thank you to everyone who helped him - Marc did
more of that than anybody, but there were a lot of indispensable
people.
I haven't had a _boss_ since I worked in the lumberyard in the summer
of 1972, until Jeff. Congratulations and thank you, boss!
So we've got to get going on this code for branching to K, Alfred...
David Vogan
January 8, 2007