Syncopated Systems
Seriously Sound Science

On Prime Numbers

Since first publishing this, I have made many corrections and improvements. I apologize for inaccuracies that appeared in earlier versions.

What Are Prime Numbers?

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. (The sets of these numbers may be represented with the symbols ℙ and ℕ, respectively.)

So, for any given prime number p, a natural number x divides p if and only if x is equal to either 1 or p. Using mathematical notation, this may be expressed as:

x | p ⟺ x ∈ { 1, p }

(This may be read as “The value x evenly divides p if and only if x is a member of the set containing 1 and p”.)
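For concreteness, here is a minimal C++ sketch (my illustration, not part of the formal definition) that tests primality directly from this definition; checking candidate divisors only up to √p suffices, because any composite number has a divisor no larger than its square root:

// Test primality straight from the definition: p is prime when no
// natural number other than 1 and p divides it evenly.
#include <iostream>

bool is_prime(unsigned long long p) {
    if (p < 2) return false;           // 0 and 1 are not prime
    for (unsigned long long x = 2; x * x <= p; ++x)
        if (p % x == 0) return false;  // found a divisor other than 1 and p
    return true;
}

int main() {
    for (unsigned long long n = 1; n <= 20; ++n)
        if (is_prime(n)) std::cout << n << ' ';
    std::cout << '\n';                 // prints: 2 3 5 7 11 13 17 19
    return 0;
}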

Prime Numbers and Computers

In computer science, prime numbers have notable applications in areas including cryptography, the science of encoding and decoding messages to secure communication. Modern cryptographic practices—such as RSA (Rivest–Shamir–Adleman) encryption—rely upon the selection of secret (and usually very large) prime numbers. Using these cryptographic methods, those able to identify larger prime numbers should, in theory, be better able to securely encode their own messages and to decode their adversaries’ messages. Cryptography also provides a foundation for blockchain-based systems such as Bitcoin that emerged around 2008–2009.
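As a toy illustration of the idea (using textbook-sized numbers far too small for real security; real keys use primes hundreds of digits long), here is a C++ sketch of how RSA builds a key pair from two primes and encodes and decodes a message by modular exponentiation:

// Toy RSA: n = p * q is public, while p and q stay secret; anyone may
// encrypt with (e, n), but decrypting requires the private exponent d.
#include <cstdint>
#include <cstdio>

// Compute (base ^ exp) mod m by repeated squaring.
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;  // multiply in set bits
        base = base * base % m;                   // square
        exp >>= 1;
    }
    return result;
}

int main() {
    const uint64_t p = 61, q = 53;    // the two secret primes
    const uint64_t n = p * q;         // public modulus: 3233
    const uint64_t e = 17, d = 2753;  // public and private exponents
    const uint64_t message = 65;
    uint64_t cipher  = mod_pow(message, e, n);  // encrypt: 2790
    uint64_t decoded = mod_pow(cipher,  d, n);  // decrypt: 65
    std::printf("cipher = %llu, decoded = %llu\n",
                (unsigned long long)cipher, (unsigned long long)decoded);
    return 0;
}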

My Interest in Prime Numbers

Around the time I was born in 1970, my father shared an office at Control Data Corporation (CDC) with LaFarr Stuart, who later became my childhood mentor and lifelong friend.

During the Korean War, my father had taken a break from his studies at the University of Washington to enlist (twice) in the U.S. Army so that he could fight the cold war in Germany rather than the hot war in Korea; Stuart found that he could avoid fighting in Korea by staying in school. While postponing his service in the U.S. Air Force, Stuart earned five four-year degrees, one of which is in mathematics, in which he focused on number theory.

In 1979, my father and Stuart started teaching me how to take apart electronic circuits that were discarded by CDC, which I learned were logic modules designed by the company’s head engineer, Seymour Cray, now known as “the father of supercomputing”. They also taught me to put together circuits using kits from Radio Shack stores (then owned by Tandy Corporation), parts from local surplus stores, and (as I got a little older) a college-level digital logic trainer kit made by Digital Equipment Corporation (DEC), which Stuart loaned me, along with a first edition (1973 hardcover) Texas Instruments TTL Data Book. Stuart also loaned me a COSMAC Elf single-board computer built around the 1802 8-bit microprocessor made by RCA (where he had worked earlier), through which I learned to enter programs through a hexadecimal keypad using machine code.

After convincing my father to buy an Atari 800 home computer (probably in 1983, when retailers dropped its price to make way for the new 1200XL model), one of the first things I did with it was to write a program to test integers to find and print perfect numbers. It wasn’t a great program, but we’ve all got to start somewhere. I also quickly learned that I could increase the speed of calculations by about 25% by turning off the display subsystem, turning it back on only when I pressed a key to monitor the program.

(I had put together my printer from component assemblies for an Atari 820 40-column dot-matrix printer, which I bought at a surplus store, and a wooden box my father built, which unfortunately seemed to amplify the noise of the print head. I later got a disk drive that someone had assembled from surplus parts, which led me to learn about computer hardware maintenance and repair, leading to my first business, its expansion, and my start in online advertising in 1985, a job at Atari in 1987, and likely my following jobs at IBM research and for Atari founder Nolan Bushnell—effectively becoming a springboard for my career.)

Among many other things, Stuart introduced me to the work of another number theorist named Derrick Henry Lehmer (1905–1991), whose interesting work with early electronic computing and prime numbers includes the Lucas–Lehmer test (LLT) for primality and his 1926 electromechanical bicycle-chain sieve device (designed to select numbers to test for primality), which Stuart suggested we might reproduce using modern electronics.

Although such an implementation might circulate the sieve’s patterns through rings of shift registers, I believe that approach would be naïve compared to leaving the data in situ and accessing them by indexing through matrices.
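To illustrate the indexing idea (my own sketch, not Lehmer’s design nor what Stuart proposed), the following C++ fragment keeps one fixed ring of flags per modulus and advances a wrapping index instead of shifting any data; a candidate is selected when every ring shows a “pin” at its current position:

// Find x with x = 1 (mod 3), x = 2 (mod 5), and x = 3 (mod 7) by
// scanning candidates; each ring's index emulates a rotating chain.
#include <cstdio>
#include <vector>

int main() {
    const int moduli[3] = {3, 5, 7};
    const std::vector<std::vector<bool>> ring = {
        {false, true,  false},                               // residues mod 3
        {false, false, true,  false, false},                 // residues mod 5
        {false, false, false, true,  false, false, false}};  // residues mod 7
    int index[3] = {0, 0, 0};  // current position on each ring (x mod m)

    for (int x = 0; x < 3 * 5 * 7; ++x) {
        bool hit = ring[0][index[0]] && ring[1][index[1]] && ring[2][index[2]];
        if (hit) std::printf("x = %d\n", x);            // prints: x = 52
        for (int i = 0; i < 3; ++i)
            if (++index[i] == moduli[i]) index[i] = 0;  // wrap, don't shift
    }
    return 0;
}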

For building such a sieve, many better, faster, and/or cheaper technologies exist today.

Around 2002, as I completed a computer science degree, I had seen some interesting developments from Air Force Research Laboratories that suggested to me that a much better sieve could be constructed using newer technology, though at much greater cost. (Since then, there have also been great advancements in quantum computing, which potentially open entire new frontiers in computer science.)

On August 6, 2002, a trio of authors at the Indian Institute of Technology Kanpur released for review an impressive paper titled “PRIMES is in P”, in which they present a (potentially) relatively fast algorithm for testing a given number for primality, now known as the Agrawal–Kayal–Saxena primality test, often abbreviated as the AKS primality test and also known as the cyclotomic AKS test. (One of my outstanding mathematics professors from St. Edward’s University, Dr. Michael Engquist, brought this paper to my attention hardly more than a week after its publication.) The AKS primality test renewed my interest in prime numbers, and is particularly exciting because it is “the first primality-proving algorithm to be simultaneously general, polynomial, deterministic, and unconditional”.

I had created this page as a brief essay on prime numbers in 2006, the same year that Agrawal, Kayal, and Saxena received the Gödel Prize and the Fulkerson Prize.

In 2009, I taught computer science in Texas as an assistant professor before returning to Silicon Valley.

I occasionally revisit the topic of prime numbers, which I sometimes find useful as a foundation for practical applications in computer science. (I should note that I experiment largely for my own edification, and that my studies have been informal at best.)

How Many Prime Numbers Are There?

Just as there are infinitely many integers, there are infinitely many prime numbers (as Euclid proved around 300 BCE). However, the number of prime numbers in a finite interval is also finite, and the proportion of prime numbers in an interval quickly becomes smaller as the interval becomes larger.

As an example, I created the following table of prime numbers in the interval [1,1000], of which there are 168. In the table:

  • Circles indicate prime numbers and double circles indicate Mersenne prime numbers
  • Strikethroughs indicate composite (non-prime) numbers
  • Arcs over the intervening composite numbers indicate twin primes (pairs of primes that differ by two)
  • Colors indicate factors of 2 (red), factors of 5 (blue), and factors of 10 (violet)
  • The number at the end of each line is a running count of prime numbers

The Prime Numbers in the Interval [1,1000]

[Table: the integers 1 through 1000, ten per row, with primes and composites marked as described above and a running count of primes at the end of each row; the count reaches 168 at 1,000.]

The Prime-Counting Function

The number of primes in a closed interval [1,x] is represented by the prime-counting function π(x), which may also be written without Greek letters as pi(x).

No known formula yields the exact number of primes in an interval directly, and counting them exactly can be tedious and/or computationally intense.

However, there are methods of estimation that range from simple but inaccurate to more accurate but complex.

Methods For Approximating Numbers of Primes

Starting in the 18th century, the number of prime numbers in an interval [1,x] could be approximated as x ÷ ln x. This might be expressed in longer forms as:

|ℙ[1,x]| = π(x) ≈ x ÷ logₑ x = x ÷ ln x

By the end of the 19th century, a more accurate method of approximation was developed using the logarithmic integral function li(x) (the integral of 1 ÷ ln t), but the calculus required makes it more complex to use.

My Experiments

In the early 2000s, after generating a table of several thousand prime numbers with my Apple iMac, I found that the earlier method doesn’t provide a very good approximation.

My old iMac, with only a 600 MHz PowerPC G3 processor, running my quick-and-dirty C++ program, tests for primality and prints only about 4,000 numbers a minute, using Euclid’s algorithm to find the greatest common divisor of the number under test and each candidate divisor.
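I no longer have that original program, but a rough sketch of such a gcd-based test might look like the following; if gcd(p, x) = 1 for every x from 2 through √p, then p shares no factor with any smaller candidate and must be prime:

// Euclid's algorithm for the greatest common divisor.
#include <iostream>

unsigned long gcd(unsigned long a, unsigned long b) {
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

bool is_prime_gcd(unsigned long p) {
    if (p < 2) return false;
    for (unsigned long x = 2; x * x <= p; ++x)
        if (gcd(p, x) != 1) return false;  // shared factor found
    return true;
}

int main() {
    unsigned long count = 0;
    for (unsigned long n = 1; n <= 10000; ++n)
        if (is_prime_gcd(n)) ++count;
    std::cout << count << '\n';            // prints 1229, matching pi(10000)
    return 0;
}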

Running my little program for about 10 minutes, I found:

Interval Size   Actual Number    Rounded Approximate    Deviation
[1,x]           of Primes π(x)   Number of Primes       Δ        Δ%
                                 x ÷ ln x
10              4                4                      0        0%
100             25               22                     -3       -13.2%
1,000           168              145                    -23      -13.8%
10,000          1,229            1,086                  -143     -11.7%
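The rounded estimates and deviations above can be reproduced with a few lines of C++ (a sketch; the actual values of π(x) are hard-coded from the table):

// Compare x / ln x against known prime counts for x = 10 .. 10000.
#include <cmath>
#include <cstdio>

int main() {
    const double x[4]      = {10, 100, 1000, 10000};
    const double primes[4] = {4, 25, 168, 1229};  // actual pi(x)
    for (int i = 0; i < 4; ++i) {
        double rounded = std::round(x[i] / std::log(x[i]));  // x / ln x
        std::printf("%6.0f %6.0f %6.0f %+5.0f\n",
                    x[i], primes[i], rounded, rounded - primes[i]);
    }
    return 0;
}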

Refining Methods for Approximating the Number of Primes

In 2006, I had suggested that the approximation x ÷ ln x could be improved by multiplying by (10 ÷ 9). For relatively small intervals, this method’s results compare favorably to those obtained via the methods discussed previously, as indicated in the table below. Although multiplying x ÷ ln x by (10 ÷ 9) greatly improves estimates for relatively small values of x, multiplying by this constant causes the function to deviate when x reaches about 50,000 (5 × 10⁴) and actually makes the estimate poorer when x exceeds approximately 500,000,000 (5 × 10⁸).

Interestingly, multiplying x ÷ ln x by (100 ÷ 99) improves the estimate over a range of numbers at least through 10²⁷. Such an approximation might look like this:

|ℙ[1,x]| = π(x) ≈ 100x ÷ (99 logₑ x) = 100x ÷ (99 ln x)

As shown in the table below, for intervals less than about 500,000,000 (5 × 10⁸) this estimate is less accurate than multiplying by (10 ÷ 9). However, multiplying by (100 ÷ 99) also appears to converge on the correct value, though not nearly as quickly as li(x). Multiplying by (1000 ÷ 999) yields approximations that are less accurate, so the factor (100 ÷ 99) appears to be what I believe is known in scientific circles as a lucky guess.

Interval   Actual Number of Primes               Deviation (Δ) from π(x)
Size
[1,x]      π(x)                                  x ÷ ln x   10x ÷ 9 ln x   100x ÷ 99 ln x   li(x)
10³        168                                   -13.8%     -4.17%         -13.1%           6.0%
10⁶        78,498                                -7.79%     2.45%          -6.86%           0.17%
10⁹        50,847,534                            -5.10%     5.45%          -4.14%           0.0033%
10¹²       37,607,912,018                        -3.77%     6.93%          -2.79%           0.00010%
10¹⁵       29,844,570,422,669                    -2.99%     7.79%          -2.01%           0.0000035%
10¹⁸       24,739,954,287,740,860                -2.48%     8.36%          -1.49%           0.000000089%
10²¹       21,127,269,486,018,731,928            -2.11%     8.76%          -1.13%           0.0000000028%
10²⁴       18,435,599,767,349,200,867,866        -1.84%     9.06%          -0.85%           0.000000000093%
10²⁷       16,352,460,426,841,680,446,427,399    -1.64%     9.29%          -0.64%           0.0000000000031%
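For the first several rows, the deviations can be checked in ordinary double-precision arithmetic; a sketch follows (the known values of π(x) are hard-coded, and the last digit may differ slightly from the table above depending on rounding conventions):

// Deviation of x/ln x and its (10/9)- and (100/99)-scaled variants
// from the known values of pi(x), for x = 10^3 through 10^15.
#include <cmath>
#include <cstdio>

int main() {
    const int exponents[5] = {3, 6, 9, 12, 15};
    const double pi_x[5]   = {168, 78498, 50847534,
                              37607912018.0, 29844570422669.0};
    for (int i = 0; i < 5; ++i) {
        double x = std::pow(10.0, exponents[i]);
        double base = x / std::log(x);
        std::printf("10^%-2d %+7.2f%% %+7.2f%% %+7.2f%%\n", exponents[i],
                    100 * (base - pi_x[i]) / pi_x[i],
                    100 * (base * 10 / 9 - pi_x[i]) / pi_x[i],
                    100 * (base * 100 / 99 - pi_x[i]) / pi_x[i]);
    }
    return 0;
}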

In my 2006 essay, I described using my Apple iMac (with a PowerPC processor running at 600 MHz) to process about 4,000 numbers per minute. In 2019, my calculations on my Panasonic Toughbook CF-30 (with a 2-core Intel® Core™2 Duo L7500 processor running at 1.60 GHz) were more than 162 times faster. (Over the same small interval I processed in 2006, my code in 2019 runs so much faster that the time is practically immeasurable.)

Total processor resources available in 2019 are about 5.33 times what I had in 2006. But, as of this writing, I am experimenting only with a single thread that uses only a single processor core, so the improvement in hardware should speed up my experiments by only about 2.67 times. Accounting for differences between RISC and CISC processor architectures, the throughput of the newer processor core might actually be about 8 times greater.

So what would make my software run about 20 times faster? The differences in processing speeds between my experiments in 2006 and 2019 are likely best explained by differences in the algorithms I used.

Algorithms

In the field of computer science, we often describe computational time complexity in terms of big O notation, which is more formally known as Bachmann–Landau notation or asymptotic notation.

In 2006, I had noted that I used Euclid’s algorithm, which dates back to circa 300 BCE.

In 2019, I initially used a very slow method of brute-force computation called trial division (first described in 1202 by Fibonacci in his book Liber Abaci), which has a complexity of O(x²). Soon afterward, I improved my method to include a crude sieve. (Though much different in its implementation, it is functionally equivalent to the Sieve of Eratosthenes from circa 200 BCE or Euler’s sieve from circa 1750 CE.) This reduced the time complexity to linear time, with processing on the order of O(x), as observed in 1978 by Gries and Misra.
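For reference, here is a minimal C++ sketch of the classic Sieve of Eratosthenes (simpler than, and not identical to, my crude sieve; the classic form re-strikes some composites, whereas the Gries–Misra linear sieve removes each composite exactly once):

// Strike out every multiple of each prime; whatever survives is prime.
#include <cstdio>
#include <vector>

int main() {
    const long limit = 1000000;
    std::vector<bool> is_prime(limit + 1, true);
    is_prime[0] = is_prime[1] = false;
    for (long p = 2; p * p <= limit; ++p)
        if (is_prime[p])
            for (long m = p * p; m <= limit; m += p)
                is_prime[m] = false;  // m has p as a factor
    long count = 0;
    for (long n = 2; n <= limit; ++n)
        if (is_prime[n]) ++count;
    std::printf("%ld\n", count);      // prints 78498 = pi(10^6)
    return 0;
}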

Afterward, my early experiments with wheel factorization using the relatively small basis { 2, 3, 5 } ran at about the same speed. Using a larger basis improved speed, but yielded solution sets that were incomplete. (I believe that was likely caused by overflowing the integer type I had used, but I have not yet had occasion to delve into it again.)
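Below is a sketch of trial division with the { 2, 3, 5 } wheel, using a 64-bit integer type throughout (overflow of a narrower type being my leading suspect for the earlier incomplete results); after the basis primes, candidate divisors advance by the wheel’s increments, skipping everything that shares a factor with 30:

// Trial division that tests only 2, 3, 5, and numbers coprime to 30.
#include <cstdint>
#include <cstdio>
#include <initializer_list>

bool is_prime_wheel(uint64_t n) {
    if (n < 2) return false;
    for (uint64_t p : {2ULL, 3ULL, 5ULL}) {  // the wheel's basis
        if (n == p) return true;
        if (n % p == 0) return false;
    }
    const uint64_t step[8] = {4, 2, 4, 2, 4, 6, 2, 6};  // mod-30 gaps
    uint64_t d = 7;                                     // 7, 11, 13, 17, ...
    for (int i = 0; d * d <= n; i = (i + 1) % 8) {
        if (n % d == 0) return false;
        d += step[i];
    }
    return true;
}

int main() {
    uint64_t count = 0;
    for (uint64_t n = 1; n <= 1000000; ++n)
        if (is_prime_wheel(n)) ++count;
    std::printf("%llu\n", (unsigned long long)count);   // prints 78498
    return 0;
}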

Pollard’s Rho Algorithm

For comparison, in 2020, my laptop computer that’s already about 10 years old can evaluate the first 10,000 integers in a fraction of a second, and the first billion (10⁹) integers in less than four hours, using the Linux command factor. This command uses Pollard’s rho algorithm, which was developed in 1975 and still appears to be among the most practical algorithms for factoring integers of moderate size. To count the prime numbers in a certain interval (such as from one to one million) and report the amount of time spent doing it, the following string of commands and parameters could be entered at the command-line prompt ($):

$ time seq 1 1000000 | factor | grep "[0-9]*: [0-9]*$" | wc -l

78498

real 0m6.963s
user 0m5.524s
sys  0m0.040s

(In the pipeline above, seq generates the first one million integers and feeds them to factor; grep then passes only the lines containing prime numbers (those whose sole factor is the number itself) to wc, which counts the lines, reporting the expected number of prime numbers in that interval. Finally, time reports the amount of time required to perform the task, which was nearly seven seconds.)

Summary

I expect to continue occasional experiments with prime numbers, especially as they relate to practical applications with computers and electronics.

I welcome input via my contact page.