# Other articles

1. Published: Thu 11 February 2016

# My computer science course of Jan 2016

As I told before, in January I have been teaching a class preparing for an exam in theoretical computer science master's students.

There were little over 20 people in attendance and it was a nice experience I wouldn't mind repeating. During the evaluation process (I hope to get access to the official evaluation dataset) I got almost only high marks in all categories which I'm rather proud of.

I was a bit baffled when I asked for a list of well-known set operations and stared into blank faces but one never knows if it is shyness or missing knowledge keeping the answer from being given. I'd like it to be the former.

Since, after my first teaching job, I've yet to run out of hope, I offered to answer questions via email after the fact and a few messages laster, here is what most students seem to have problems with:

How do you find out whether a language is regular/context-free?

For both types of language we have something called the "pumping lemma" which is basically that in any infinite language with a finite set of production rules, there exists a list of words that form a repeating pattern that has a relation to a loop within the production rules.

For regular languages (say $$R$$) it is basically $$L \in R \to (\exists uvw \in L: uv^*w \subseteq L)$$ with the condition of $$v$$ being not empty. The idea is, that a regular language can be represented as a deterministic finite automata (DFA) that has a finite number of states while the language is infinite so that some state has to be reached more than once in the course of accepting a word. After reading $$u$$ you are in some state and then you read $$v$$ and are in the same state you were after reading $$u$$ and after then reading $$w$$ the word is accepted. The word $$v$$ can be repeated many times or not at all.

CC BY-SA 3.0 / Jochen Burghardt

For context free languages (say $$CF$$) this is nearly the same, only that there is some nonterminal that can produce itself from the production rules. Repeating words appear left and right of the looping nonterminal equally often ($$v$$ and $$x$$). The looping nonterminal also produces a word without looping (or every production including that nonterminal wouldn't halt) that we call $$w$$. So the formula for the context-free version of the pumping lemma is basically $$L \in CF \to (\exists uvwxy \in L: (\forall n \in N : uv^nwx^ny \in L))$$ with the sidecondition of not both $$vw$$ and $$wx$$ being empty.

In both formulas have the form $$L \in L' \to PL$$ and we get the reversal $$\neg PL \to L \notin L'$$ for free. So to show that a language is regular or context free it is not enough to show that the pumping lemma holds because there are languages that are not in that category and still have words that can be pumped.

So to show that a language is context free/regular one has to construct a context free grammar/(non)deterministic finite automata and if you can't find any way to pump words then the language is not in that category.

The direction of the implication-arrow (or that it isn't an equivalence) seems to have my students stumped.

And here you go with two languages that fulfill the pumping lemma for a language category and are not a language of that type:

• $$a^n | n \text{ is not prime}$$ with $$(u,v,w) = (aa,aa,\varepsilon)$$
• $$a^nb^n | n \text{ not a power of two}$$ with $$(u,v,w,x,y) = (\varepsilon,aaa,aabb,bbb,\varepsilon)$$

2. Published: Thu 05 November 2015

# Musings on Graph Isomorphism pre-results

I think by now everybody has read that Babai et al will hold a seminar next week where they want to "outline an algorithm that solves the Graph Isomorphism problem [...] in quasipolynomial time".

So they are shooting for a result with the complexity $$2^{O(log(n))^c}$$ while the best former result (by Luks, 1983) for Graph Isomorphism was $$2^{O(\sqrt{n*log(n)})}$$.

What is Graph Isomorphism? Given two graphs, the question is, whether there is a mapping function $$f$$ between the nodes of one graph $$G_1$$ to the nodes of the other graph $$G_2$$ so that every edge $$(a,b) \in G_1$$ is also an edge $$(f(a),f(b)) \in G_2$$.

So why is this quasipolynomial when $$2^x$$ is obviously not a polynome? That's where the quasi comes from. If the $$c$$ in $$2^{O(log(n))^c}$$ is $$1$$ then this simplifies to $$2^{O(log(n))}$$ which is a polynomial. Remember that the base of both the logarithm and the exponentiation isn't actually $$2$$ so this doesn't simplify to $$O(n)$$ but some other polynomial (but a polynomial nevertheless). And the $$c$$ is just a factor in the formula (like the base the logarithm and the exponentiation) that have no impact on the complexity properties (the general shape of the formula describing the complexity) but can be changed by optimization (opposite to groundbreaking research). So given proper optimization the algorithm we will be presented could be (theoretically) as fast as a polynomial algorithm.

When we talk about "fast" we talk about how the runtime changes (relatively) when the problem size grows larger. A "faster" algorithm has smaller relative growth while it might still be orders of magnitude slower in actual runtime for any practically sized input.

Is this result surprising? Well, yes and no. On the one hand the "witness" (or positive solution to the problem) is only of linear size (one mapping for every node in the graph) and the algorithm for checking would be basically quadratic in complexity (checking the definition given above for all edges and there are only quadratically many). This holds for problems in NP-complete but also for problems in P. I think it is obvious that this answer would not be very hard to find (we surely don't need to check every possible mapping - this would basically be the case for NP-complete) but it isn't obvious that the answer is easy to find either (like for a problem in P).

So what we're probably going to see is a clever application of some result of group theory that is applicable here and excludes a large number of all possible isomorphism as possible solutions. So the problem is still not easy but math can help a lot.

What does it mean? It is not known whether Graph Isomorphism is NP-Complete. This result would be a strong hint, that it isn't (albeit no proof). Even if we found Graph Isomorphism to be in P, this would have no larger impact on the field of complexity theory in general.

What's the next thing? That would be finding an algorithm properly in P. If such an algorithm exists, we would also find a way to compute a small "witness" (or in this case "proof trail") that is an easily checkable answer to the question "is there no such isomorphism?". How this proof trail would look is even less obvious. There could be some function that, given a graph as input calculates a value and all graphs that have an isomorphism between them have the same value (which would also need to not be exponential in size). In practice it is very difficult to even find hard instances of Graph Isomorphisms. So it might turn out that there are actually only easy cases, we just don't know it yet.