Syncopated Systems™
Seriously Sound Science™
|
Home | Services | Case Studies | About | Library | Contact |
On Prime NumbersPublished December 26, 2006; Updated October 21, 2020 What Are Prime Numbers?A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. (The sets of these numbers may be represented with the symbols ℙ and ℕ, respectively.) So, for any given prime number p, x divides p if and only if x is equal to either 1 or p. Using mathematical notation, this may be expressed as: x | p, x = { 1, p } (This may be read as “The value x evenly divides p, where x is a member of the set containing 1 and p”.) Prime Numbers and ComputersIn computer science, prime numbers have notable applications in areas including cryptography, which is the science of encoding and decoding messages to create relative security in communication. Modern cryptographic practices—such as RSA (Rivest–Shamir–Adleman) encryption—rely upon the selection of relatively unknown (and usually very large) prime numbers. Using these cryptographic methods, those able to identify larger prime numbers should, in theory, be better able to securely encode their own messages and to decode their adversaries’ messages. Cryptography also provides a foundation for blockchain-based systems such as Bitcoin that emerged around 2008-2009. My Interest in Prime NumbersAround the time I was born in 1970, my father shared an office at Control Data Corporation (CDC) with LaFarr Stuart, who later became my childhood mentor and lifelong friend. During the Korean War, my father had taken a break from his studies at the University of Washington to enlist (twice) in the U.S. Army so that he could fight the cold war in Germany rather than the hot war in Korea; Stuart found that he could avoid fighting in Korea by staying in school. While postponing his service in the U.S. Air Force, he earned five four-year degrees. One of his degrees is in mathematics, in which he focused on number theory. In 1979, my father and Stuart started teaching me how to take apart electronic circuits that were discarded by CDC, which I learned were logic modules designed by the company’s head engineer, Seymour Cray, now known as “the father of supercomputing”. They also taught me to put together circuits using kits from Radio Shack stores (then owned by Tandy Corporation), parts from local surplus stores, and (as I got a little older) a college-level digital logic trainer kit made by Digital Equipment Corporation (DEC), which Stuart loaned me, along with a first edition (1973 hardcover) Texas Instruments TTL Data Book. Stuart also loaned me a COSMAC Elf single-board computer built around the 1802 8-bit microprocessor made by RCA (where he had worked earlier), through which I learned to enter programs through a hexadecimal keypad using machine code. After convincing my father to buy an Atari 800 home computer (probably in 1983, when retailers dropped its price to make way for the new 1200XL model), one of the first things I did with it was to write a program to test integers to find and print perfect numbers. It wasn’t a great program, but we’ve all got to start somewhere. I also learned quickly that I could increase the speed of calculations by about 25% through turning off the display subsystem, and turning it back on only after I pressed a key to monitor the program. (The printer I had put together from component assemblies for an Atari 820 40-column dot-matrix type that I bought at a surplus store and a wooden box my father built, which unfortunately seemed to amplify the noise of the print head. I later got a disk drive that someone had assembled from surplus parts, which made me learn about computer hardware maintenance and repair, leading to my first business, its expansion and my start in online advertising in 1985, a job at Atari in 1987, and likely my following jobs at IBM research and for Atari founder Nolan Bushnell—effectively becoming a springboard for my career.) Among many other things, Stuart introduced me to the work of another number theorist named Derrick Henry Lehmer (1905-1991), whose interesting work with early electronic computing and prime numbers includes the Lucas-Lehmer test (LLT) for primality and his 1926 electromechanical bicycle-chain sieve device (designed to select numbers to test for primality), which Stuart suggested we might reproduce using modern electronics. Although such an implementation might circulate the sieve’s patterns through rings of shift registers, I believe that doing this might seem naïve compared to leaving the data in situ and accessing them by indexing through matrices. For building such a sieve, many technologies exist today that are still better, faster, and/or cheaper. Around 2002, as I completed a computer science degree, I had seen some interesting developments from Air Force Research Laboratories that suggested to me that a much better sieve could be constructed using newer technology, though at much greater cost. (Since then, there have also been great advancements in quantum computing, which potentially open entire new frontiers in computer science.) On August 6, 2002, a trio of authors at the Indian Institute of Technology, Kanpur released for review an impressive paper titled Primes is in P, in which they present a (potential) relatively-fast algorithm for testing a given number for primality, now known as the Agrawal–Kayal–Saxena primality test, often abbreviated as the AKS primality test and also known as the cyclotomic AKS test. (One of my outstanding mathematics professors from Saint Edward’s University, Dr. Michael Engquist, brought this paper to my attention hardly more than a week after its publication.) The AKS primality test renewed my interest in prime numbers, and is particularly exciting because it is “the first primality-proving algorithm to be simultaneously general, polynomial, deterministic, and unconditional”. I had created this page as a brief essay on prime numbers in 2006, the same year that Agrawal, Kayal, and Saxena received the Gödel Prize and the Fulkerson Prize. In 2009, I taught in Texas as an assistant professor computer science before returning to Silicon Valley. I occasionally revisit the topic of prime numbers, which I sometimes find useful as a foundation for practical applications in computer science. (I should note that I experiment largely for my own edification, and that my studies have been informal at best.) How Many Prime Numbers Are There?Just as there are infinitely many integers, there are infinitely many prime numbers. However, the number of prime numbers in a finite interval is also finite, and the proportion of prime numbers in an interval quickly becomes smaller as the interval becomes larger. As an example, I created the following table of prime numbers in the interval [1,1000], of which there are 168. In the table:
The Prime-Counting FunctionThe number of primes in a closed interval [1,x] is represented by the prime-counting function π(x), which may also be written without Greek letters as pi(x). No definitive method exists to date to calculate the number of primes in an interval, and doing so can be tedious and/or computationally intense. However, there do exist methods of estimation that vary from simple though inaccurate to more accurate though complex. Methods For Approximating Numbers of PrimesStarting in the 18th century, the number of prime numbers in an interval [1,x] could be approximated as x ÷ ln x. this might be espressed in longer forms as:
By the end of the 19th century, a more-accurate method of approximation was developed using the logarithmic integral function li(x), but the calculus required makes it more complex to use. My ExperimentsIn the early 2000s, after generating a table of several thousand prime numbers with my Apple iMac, I found that the earlier method doesn’t provide a very good approximation. My old iMac with only a 600 MHz PowerPC G3 processor running my quick-and-dirty C++ program tests for primality and prints only about 4000 numbers a minute, using Euclid’s algorithm to find the greatest common denominator of the number under test. Running my little program for about 10 minutes, I found:
Refining Methods for Approximating the Number of PrimesIn 2006, I had suggested that the approximation x ÷ ln x could be improved by multiplying by (10 ÷ 9). For relatively small intervals, this method’s results compare favorably to those obtained via the methods discussed previously, as indicated via the table below. Although multiplying x ÷ ln x by (10 ÷ 9) greatly improves estimates for relatively small values of x, multiplying by this constant causes the function to deviate when x reaches about 50,000 (5 × 104) and actually makes the estimate poorer when x exceeds approximately 500,000,000 (5 × 108). Interestingly, multiplying x ÷ ln x by (100 ÷ 99) improves the estimate over a range of numbers at least through 1027. Such an approximation might look like this:
As shown in the table below, for intervals less than about 500,000,000 (5 × 108) this estimate is less accurate than multiplying by (10 ÷ 9). However, multiplying by (100 ÷ 99) also appears to converge on the correct value, though not nearly as quickly as li(x). Multiplying by (1000 ÷ 999) yields approximations that are less accurate, so the factor (100 ÷ 99) appears be what I believe is known in scientific circles as a lucky guess.
In my 2006 essay, I described using my Apple iMac (with a PowerPC processor running at 600 MHz) and to process about 4000 numbers per second. In 2019, my calculations on my Panasonic Toughbook CF-30 (with a 2-core Intel® Core™2 Duo L7500 processor running at 1.60 GHz) were more than 162 times faster. (Over the same small interval I processed in 2006, my code in 2019 runs so much faster that it’s practically immeasurable.) Total processor resources available in 2019 are about 5.33 times what I had in 2006. But, as of this writing, I am experimenting only with a single thread that uses only a single processor core, so the improvement in hardware should speed up my experiments by only about 2.67 times. Accounting for differences between RISC and CISC processor architectures, the throughput of the newer processor core might actually be about 8 times faster. So what would make my software run about 20 times faster? The differences in processing speeds between my experiments in 2006 and 2019 are likely best explained by differences in the algorithms I used. AlgorithmsIn the field of computer science, we often describe computational time complexity in terms of big O notation, which is more formally known as Bachmann–Landau notation or asymptotic notation. In 2006, I had noted that I used Euclid’s algorithm, which dates back to nearly 300 BCE. In 2019, I initially used a very slow method of brute force computation called trial division (first described in 1202 by Fibonacci in his book Liber Abaci), which has a complexity of O(x2). Soon afterward, I improved my method to include a crude sieve. (Though much different in its implementation, it is functionally equivalent to the Sieve of Eratosthenes from circa 200 BCE or Euler’s Sieve from circa 1750 CE.) This reduced the time complexity to linear time with processing on the order of O(x), as observed in 1978 by Gries and Misra. Afterward, my early experiments with wheel factorization using the relatively small basis { 2, 3, 5 } were computed at about the same speed. Using a larger basis improved speed, but yielded solution sets that were incomplete. (I believe that was likely caused by overflowing the integer type I had used, but I have not yet had occassion to delve into it again.) Pollard’s Rho AlgorithmFor comparison, in 2020, my laptop computer that’s already about 10 years old can evaluate the first 10,000 integers in a fraction of a second, and the first billion (109) integers in less than four hours using the Linux command factor. This uses Pollard’s rho algorithm, which was developed in 1975 and appears to be still the most practical algorithm for factoring integers. To count the prime numbers in a certain interval (such as from one to one-million, for example), and report the amount of time spent doing it, the following string of commands and parameters could be entered at the command line prompt ($): $ time seq 1 1000000 | factor | grep "[0-9]*: [0-9]*$" | wc -l 78498
real 0m6.963s user 0m5.524s sys 0m0.040s (With the specified parameters above, seq executes factor using the first one million integers and feeds its output to grep, which outputs only lines containing prime numbers to wc, which counts the number of lines, reporting the expected number of prime numbers in that interval. Finally, time reports the amount of time required to perform the task, which was nearly seven seconds.) SummaryI expect to continue occasional experiments with prime numbers, especially as they relate to practical applications with computers and electronics. I welcome input via my contact page. |
Syncopated attempts to present a model Web site that meets or exceeds all applicable technical and legal requirements, including those of the A.D.A., COPPA, GDPR, ICANN, and W3C. | Syntax validated |
Style sheet validated |
Highest accessibility |
“Syncopated Systems” and “Syncopated Software” are registered trademarks, the interlaced tuning forks device and the “seriously sound science” and “recreate reality” slogans are trademarks, and all contents (except as otherwise noted) are copyright ©2004-2024 Syncopated Systems. ALL RIGHTS RESERVED. Any reproduction without written permission or other infringement is prohibited by United States and international laws. |