This thread marks the formal launch of “Finding primes” as the massively collaborative research project Polymath4, and now supersedes the proposal thread for this project as the official “research” thread for this project, which has now become rather lengthy. (Simultaneously with this research thread, we also have the discussion thread to oversee the research thread and to provide a forum for casual participants, and also the wiki page to store all the settled knowledge and accumulated insights gained from the project to date.) See also this list of general polymath rules.
The basic problem we are studying here can be stated in a number of equivalent forms:
Problem 1. (Finding primes) Find a deterministic algorithm which, when given an integer k, is guaranteed to locate a prime of at least k digits in length in as quick a time as possible (ideally, in time polynomial in k, i.e. after steps).
Problem 2. (Finding primes, alternate version) Find a deterministic algorithm which, after running for k steps, is guaranteed to locate as large a prime as possible (ideally, with a polynomial number of digits, i.e. at least digits for some .)
To make the problem easier, we will assume the existence of a primality oracle, which can test whether any given number is prime in O(1) time, as well as a factoring oracle, which will provide all the factors of a given number in O(1) time. (Note that the latter supersedes the former.) The primality oracle can be provided essentially for free, due to polynomial-time deterministic primality algorithms such as the AKS primality test; the factoring oracle is somewhat more expensive (there are deterministic factoring algorithms, such as the quadratic sieve, which are suspected to be subexponential in running time, but no polynomial-time algorithm is known), but seems to simplify the problem substantially.
The problem comes in at least three forms: a strong form, a weak form, and a very weak form.
- Strong form: Deterministically find a prime of at least k digits in poly(k) time.
- Weak form: Deterministically find a prime of at least k digits in time, or equivalently find a prime larger than in time O(k) for any fixed constant C.
- Very weak form: Deterministically find a prime of at least k digits in significantly less than time, or equivalently find a prime significantly larger than in time O(k).
The pr0blem in all of these forms remain open, even assuming a factoring oracle and strong number-theoretic hypotheses such as GRH. One of the main difficulties is that we are seeking a deterministic guarantee that the algorithm works in all cases, which is very different from a heuristic argument that the algorithm “should” work in “most” cases. (Note that there are already several efficient probabilistic or heuristic prime generation algorithms in the literature, e.g. this one, which already suffice for all practical purposes; the question here is purely theoretical.) In other words, rather than working in some sort of “average-case” environment where probabilistic heuristics are expected to be valid, one should instead imagine a “Murphy’s law” or “worst-case” scenario in which the primes are situated in a “maximally unfriendly” manner. The trick is to ensure that the algorithm remains efficient and successful even in the worst-case scenario.
Below the fold, we will give some partial results, and some promising avenues of attack to explore. Anyone is welcome to comment on these strategies, and to propose new ones. (If you want to participate in a more “casual” manner, you can ask questions on the discussion thread for this project.)
Also, if anything from the previous thread that you feel is relevant has been missed in the text below, please feel free to recall it in the comments to this thread.
— Partial results —
One way to proceed is to find a sparse explicitly enumerable set of large integers which is guaranteed to contain a prime. One can then query each element of that set in turn using the primality oracle to then find a large prime.
For instance, Bertrand’s postulate shows that there is at least one prime of k digits in length, which can then be located in time. A result of Baker and Harman shows that there is a prime between n and for all sufficiently large n, so by using the interval one can find a k-digit prime in time. This is currently the best result known unconditionally. Assuming the Riemann hypothesis, there is a prime between n and , giving the slight improvement to . (Assuming GRH, it seems that one may be able to improve this a little bit using the W-trick, but this has not been checked.) There is a conjecture of Cramer that the largest prime gap between primes of size n is which would give a substantial improvement, to , thus solving the strong form of the conjecture, but we have no idea how to establish this conjecture though it is heuristically plausible.
In a slightly different direction, there is a result of Friedlander and Iwaniec that there are infinitely many primes of the form , and in fact their argument shows that there is a k-digit prime of this form for all large k. This gives a run time of . A subsequent result of Heath-Brown achieves a similar result for primes of the form , which gives a slightly better run time of , but this is still inferior to the Baker-Harman bound.
Strategy 1: Find some even sparser sets than these for which we actually have a chance of proving that the set captures infinitely many primes.
(For instance, assuming a sufficiently quantitative version of Schinzel’s hypothesis H, one would be able to answer the weak form of the problem, though this hypothesis is far from being resolved in general. More generally, there are many sparse sets which heuristically have an extremely good chance of capturing a lot of primes, but the trick is to find a set for which we can provably establish the existence of primes.)
Another approach is based on variants of Euclid’s proof of the infinitude of primes and rely on the factoring oracle. For instance, one can generate an infinite sequence of primes recursively by setting each to be the largest prime factor of (here we use the factoring oracle). After k steps, this is guaranteed to generate a prime which is as large as the prime, which is about by the prime number theorem. (In general, it is likely that one would find much larger primes than this, but remember that we are trying to control the worst-case scenario, not the average-case one.)
Strategy 2. Adapt the Euclid-style algorithms to get larger primes than this in the worst-case scenario.
For comparison, note that the Riemann hypothesis argument given above would give a prime of size about or more in k steps.
A third strategy is to look for numbers that are not S-smooth for some threshold S, i.e. contain at least one prime factor larger than S. Factoring such a number will clearly generate a prime larger than S. (Furthermore, if the non-S-smooth number is of size less than , it must in fact be prime.) One strategy is to scan an interval [n, n+a] for non-smooth numbers, thus leading to
Problem 3. For which n, a, S can we rigorously show that [n,n+a] contains at least one non-S-smooth number?
If we can make S much larger than a, then we are in business (e.g. if we can make S larger than , then we will beat the RH bound). Unfortunately, if S is larger than a, we know that [0,a] does not contain any non-S-smooth number, which makes it difficult to see how to show that [n,n+a] will contain any non-S-smooth numbers, since sieve theory techniques are generally insensitive to the starting point of an interval.
One interesting proposal to do this, raised by Tim Gowers, is to try to use additive combinatorics. Let J be a set of primes larger than S (e.g. the primes between S and 2S), and let K be the set of logarithms of J. If we can show that one of the iterated sumsets intersects the interval , then we have shown that [n,n+a] contains at least one non-S-smooth number. The hope is that additive combinatorial methods can provide some dispersal and mixing properties of these iterated sumsets which will exclude a “Murphy’s law” type scenario in which the interval is always avoided.
Strategy 3. Find good values of n, a, S for which the above argument has a reasonable chance of working.
A fourth strategy would be to try to generate pseudoprimes rather than primes, e.g. to generate numbers obeying congruences such as . It is not clear though how to efficiently achieve this, especially in the worst-case scenario.
A fifth strategy would be to try to understand the following question:
Problem 4. Given a set A of numbers, what is an efficient way to deterministically find a k-digit number which is not divisible by any element of A?
Note that Problem 1 is basically the special case of Problem 4 when A is equal to all the natural numbers (or primes) less than .
A last minute addition to the above strategies, suggested by Ernie Croot:
Strategy 6. Assume the existence of a Siegel zero at some modulus M, which implies that the Jacobi symbol is equal to -1 for all small primes (say ). Can one use this sort of information to locate a large prime, perhaps by using the quadratic form ? Note that M might not be known in advance. This could lead to a non-trivial disjunction: either we can solve the primes problem, or we can assume that there are no Siegel zeroes in a certain range.
There may be additional viable strategies beyond the above ones. Please feel free to share your thoughts, but remember that we will eventually need a rigorous worst-case analysis for any proposed strategy; heuristics are not sufficient by themselves to resolve the problem.
— Other variants —
There are some variants of the problem that have already been solved, for instance finding irreducible polynomials of a certain degree over a finite field is easy, as is finding square-free numbers of a certain size. It may be worth looking for additional variant problems which are easier but are not yet solved.
We have an argument that shows that in the presence of some oracles, and replacing primes by a similarly dense set of “pseudoprimes”, the problem cannot be solved. It may be of interest to refine this argument to show that even if one assumes complexity-theoretic conjectures such as P=BPP, the general problem of finding pseudoprimes is not efficiently solvable. (We know that the problem is solvable though if P=NP.)
The following toy problem has been advanced:
Problem 5. (Finding consecutive square-free numbers) Find a deterministic algorithm which, when given an integer k, is guaranteed to locate a pair n,n+1 of consecutive square-free numbers of at least k digits in length in as quick a time as possible.
Note that finding one large square-free number is easy: just multiply lots of distinct small primes together. Also, as the density of square-free numbers is , a counting argument shows that pairs of consecutive square-free numbers exist in abundance, and are thus easy to find probabilistically. Testing a number for square-freeness is trivial with a factoring oracle. (Question: is there a deterministic polynomial-time algorithm to test a k-digit number for square-freeness without assuming any oracles?)
For Question 6, it has been suggested that Sylvester’s sequence may be a good candidate; interestingly, this sequence has also been proposed for Strategy 2.