CO 739: Analytic and Algorithmic Combinatorics (Winter 2023)

This webpage will not be updated during the term. All course materials will be posted on LEARN.

Course details can also be found in the course syllabus.

Organization: This course will meet Tuesday and Thursdays from 11:30 - 12:50 in MC 6029.

Assessment: Students will be evaluated on three homework assignments (each 20% of final grade) and a final project (40% of final grade). For the final project students will write a report and make a short presentation to the class about a research paper related to their interests and the course content (other options, like a historical survey or coding project, are also possible). Deadlines for the assignments and project will be set after consultation with the class at the beginning of the semester.

Prerequisites: This course may be of interest to students in a variety of areas (combinatorics, computer science, pure math, etc.) who will come in with different backgrounds. Students should be comfortable with the basics of real analysis (sequences and series) and ideally will have previous exposure to basic algebra (rings and fields) and complex analysis (analytic functions, Cauchy residue theorem). Short background lessons will be provided for students wanting to review (or needing to catch up on) some of the necessary background material. Interested students unsure of their background should email the instructor for more information.

References: Our main references are the textbooks

and the course notes

all of which are freely available online.

Examples of Course Topics

These examples can be explored interactively in the browser (after a short load time) by clicking on this link (a non-interactive version is here). Note that randomness will change the output each run.

1. Random Walks in Cones

Consider walks on the steps North, South, East, and West, which start at the origin and stay in the quadrant1 $\mathbb{N}^2$. How many such walks are there on $n$ steps? Equivalently (since there are $4^n$ unrestricted walks) what is the probability that a walk of length $n$ never leaves the first quadrant?

100 random walks of length 300, greyed out once they leave the first quadrant. Of the 100 starting walks, only one stays in the first quadrant after all 300 steps. Readers can generate their own animations using the interactive link above. This animation resets every 7 seconds. If it does not play in your browser click here.

Using tools like creative telescoping, numeric analytic continuation, and singularity analysis we will see, among other things, methods to efficiently compute the number of walks of large size and techniques to determine asymptotic behaviour (it turns out the probability that a walk of length $n$ stays in $\mathbb{N}^2$ goes to zero like $\frac{4}{\pi n}$).

2. Complexity of Integer Sequences

One of the most famous examples of an integer sequence is the Virahanka-Fibonacci sequence $f_n$ defined by the recurrence $$ f_{n+2} = f_{n+1} + f_n, \qquad f_0=f_1=1.$$

How fast can we determine the term $f_N$ for large $N$? Using the recurrence directly, one must perform $N$ additions. Taking the growing size of the sequence terms into account, computing $f_N$ in this manner uses approximately $O(N^2\log N)$ operations on fixed-size integers. Can we do better?

Yes. For instance, it can be seen from the recurrence that for all $N\in\mathbb{N}$,

$$\begin{pmatrix} f_{N+1} \\ f_N \end{pmatrix} = \begin{pmatrix}1&1\\1&0\end{pmatrix}^N \begin{pmatrix} f_1 \\ f_0 \end{pmatrix}.$$

Using binary squaring to compute the $N$th matrix power, we can obtain $f_N$ using around $O(N\log^2N)$ operations on fixed-size integers. Can we do better? (Note: Diagonalizing the matrix above gives $f_N$ as an explicit sum of two powers of algebraic numbers. This representation is great for determining asymptotic behaviour, but terrible for actually computing terms.)

Again, yes (to a lesser degree). If $R$ is the quotient ring $R = \mathbb{Q}[x]/(x^2-x-1)$ then $f_N$ equals the the coefficient of $x$ when $x^{N+1}$ is represented by a linear polynomial in $R$ (we will see why in class). Thus, we can compute $f_N$ by exponentiating polynomials in quotient rings, which is more efficient than taking matrix powers. (This approach also has complexity around $O(N\log^2N)$ but the hidden constants are lower, especially for linear recurrences with higher shifts in index)

Time (in microseconds) needed in Sage to compute $f_N$ using binary powering of a matrix (red) and binary powering in a polynomial quotient ring (blue). The essentially linear complexity of these approaches can be directly observed. Naively computing terms with the recurrence would not terminate for $N$ of this size in any reasonable time. Readers can create their own timings using the interactive link above.

It is also of note that the (essentially) linear complexity comes from the growth of the integers involved. Using the quotient ring approach while computing modulo a fixed integer results in a logarithmic complexity! (For instance, it takes only 5 milliseconds to compute $F_{10^{100}}$ mod $13 = 5$.)

3. Limit Theorems for Combinatorial Objects

Instead of simply counting combinatorial objects by size, it is often illuminating to also keep track of different parameters. This is especially useful in the analysis of algorithms and computer algebra, as the number of polynomials of fixed degree whose coefficients come from a finite field (like the integers modulo a prime) form a finite set.

For instance, let $e_{n,k}$ denote the number of pairs of polynomials $(f,g)$ with coefficients in $\mathbb{Z}_p$ such that $f$ is monic, $n = \deg f > \deg g$, and the Euclidean gcd algorithm applied to $(f,g)$ takes $k$ steps to terminate. The Symbolic Method for generating functions shows how to write the bivariate sequence $e_{n,k}$ as the power series coefficients of a rational function (depending on the prime $p$): $$ \sum_{n,k \geq 0} e_{k,n} u^kz^n = \frac{1}{1-pz - p(p-1)uz}. $$ From here, analytic methods imply that for $n$ fixed and large the sequence $e_{k,n}$ approaches a normal distribution. In other words, as the degrees of the polynomials approach infinity the number of divisions performed by the Euclidean algorithm satsifies a central limit theorem.

A running count of the number of divisions performed when running the Euclidean algorithm on pairs of polynomials in $\mathbb{Z}_3[x]$ with the higher degree polynomial monic of degree 500, compared to the expected distribution from the limit curve. Readers can experiment with different primes and polynomial degrees using the interactive link above. This animation resets every 7 seconds. If it does not play in your browser click here.

Computer Algebra Software

Students will be expected to do some basic coding on their assignments. You are free to use any computer algebra system or programming language you wish (SageMath, Maple, Mathematica, Julia, Python, etc.). Because it is free, open source, and even possible to run in your browser with no installation, I recommend using Sage.

A short tutorial for Sage can be run in the browser at this link (after a short load time of up to a minute) or viewed as a static HTML page here.

We will mainly use Sage for examples in class. Possibilities for running Sage yourself include

  1. Installing Sage on your computer from its website (can be surprisingly tricky)
  2. Using Waterloo’s JupyerHub website (Requires Waterloo login)
  3. Signing up for a free account on the online browser-based CoCalc service
  4. Using a public cloud environment (some notebooks will be distributed like this in class)

In addition to the packages that come built-in to Sage, we will also make use of the ore_algebra package, available at this link.

  1. Walks in orthants (such as a two-dimensional quadrant) are important in queuing theory (among many other areas) as they model jobs/people entering and being processed out of queues and the number of jobs/people in a line can never be negative. ↩︎