Computational Work on the Sieve of Eratosthenes to Find Prime Numbers

I have just read about the Sieve of Eratosthenes in a Scientific American article I found on Facebook and was inspired to give it a try to see how it works, especially because the math and programming seemed simple enough. I worked with a dataset of 1-1000. I also tried up to 100,000. My browser freaks out with a dataset larger than that.

I have also tried to set up my process so that no number would be worked on more than once in each round of multiple removal. I am calling this the Square Prime Reduction Method (SPRM).

In order to do so, I looked for a pattern that would identify the remaining numbers. The farther along I worked, the harder it became. Below I list my notes, the pattern used for cumulative reduction, the amount of dataset reduction, and the dataset size required for pattern recognition. This is the result of about a day of work.

See my notes here and the results of my scripting work below.

Square Prime Reduction Method (SPRM)

Using this cumulative multiple removal method, it seems that the starting point for each reduction round is the square of the next prime number (i.e. for 2 it is 4; for 3 it is 9; for 5 it is 25; etc.). For each reduction round I:

  1. start at the square of the next prime
  2. run a clean removal process as if I were removing all of the multiples, which shows zeros in place of the already-removed ones
  3. find the pattern for its remaining multiples
  4. code to remove the remaining multiples

With each successfully processed prime we are able to further reduce the pool. To reveal all primes under 1000 I would have to follow this process up to the square of the prime 31, which is 961.
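In plain code, the round structure above looks roughly like this (a minimal sketch; the function name and shape are illustrative, not my original browser script):

```javascript
// Sketch of the SPRM round structure: for each prime p, the reduction
// round begins at p * p and crosses off its multiples.
function sieve(max) {
  const alive = new Array(max + 1).fill(true);
  alive[0] = alive[1] = false;
  for (let p = 2; p * p <= max; p++) {
    if (!alive[p]) continue; // p itself was removed in an earlier round
    for (let m = p * p; m <= max; m += p) alive[m] = false;
  }
  const primes = [];
  for (let n = 2; n <= max; n++) if (alive[n]) primes.push(n);
  return primes;
}
```

Starting each round at the square works because any smaller multiple of p has a smaller prime factor, so it was already removed in an earlier round.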

Prime Multiple Reduction (PMR)

I started with a dataset of 1-1000 to see if I could get it to work. Starting with 2, I processed the removal of the multiples of each prime number as I encountered them moving up the list. It also seemed to scale fine for 10,000 and 100,000, and my observations and tenuous rules seem to hold as the dataset scales.

Factors of 2

First, in any given number set we can get rid of half of the numbers because they are even and therefore evenly divisible by 2, which speeds things up considerably.

Pattern Complexity: Obvious

Pattern: Ummm...well... Every other number! =)

Dataset Reduction: This reduced the possible number pool by 49.9% (499 numbers).

Factors of 3

Second, we can get rid of every third number because they will be evenly divisible by 3.

Pattern Complexity: Obvious - 1 extra step

Pattern: In writing the code for this I noticed the following pattern after removing the even numbers: start with 3, then repeatedly add 6 and remove that value, since every other multiple of 3 is even and already gone.

Dataset Reduction: This reduced the possible number pool by about 16.6% (166 numbers).
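The step-6 pattern above can be sketched in a few lines (illustrative only; the function name is just a label):

```javascript
// Remaining multiples of 3 once the evens are gone: every other
// multiple of 3 is even, so the survivors are 9, 15, 21, ... (step 6).
function remainingMultiplesOf3(max) {
  const out = [];
  for (let m = 3 + 6; m <= max; m += 6) out.push(m);
  return out;
}
```

For a max of 1000 this yields exactly the 166 numbers listed in the processing work below.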

Factors of 5

Then, removing factors of 5 becomes slightly trickier, since we have already removed the numbers divisible by the primes 2 and 3, creating quite a few holes in its multiple list.

Pattern Complexity: Easy - 2 extra steps

Pattern: The pattern for multiples of 5: start at 5; repeatedly add 30; remove that value and the value 10 below it (35 and 25, then 65 and 55, and so on).

Dataset Reduction: This reduced the possible number pool by 6.6% (66 numbers). Not a whole lot, but we are getting closer.
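The up-and-back pattern above can be sketched like so (illustrative name again):

```javascript
// Remaining multiples of 5 once 2s and 3s are gone: from 5, step by 30;
// each stop removes both v - 10 and v (25 & 35, then 55 & 65, ...).
function remainingMultiplesOf5(max) {
  const out = [];
  for (let v = 5 + 30; v <= max; v += 30) {
    out.push(v - 10, v);
  }
  return out;
}
```

Each +30 jump lands on a multiple of 5 whose cofactor is coprime to both 2 and 3, which is why no already-removed number is ever touched.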

Factors of 7

Removing factors of 7 was much more difficult since we have already removed a broad swath of numbers, ~72.7% of the number pool. Stay on Target!

Pattern Complexity: Moderate - 8 extra steps

Pattern: To find the pattern for multiples of 7, I copied the result of a straight removal by 7 into a text file and looked at how the zeros (already-removed multiples) stacked up between the remaining numbers divisible by 7. There was a visual pattern, and from there I had to work it out. I found a repeating pattern of previously removed multiples: 3/1/3/1/3/5/1/5. Turning that into a computational model was fun!

My second idea, which I ended up using, was the opposite sort of implementation from what I used for the multiples of 5. With 5 I went up and then back. With 7 I programmatically worked my way forward to the next iteration.

Dataset Reduction: This reduced the possible number pool by 3.2% (32 numbers).
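The forward-walking idea with the 3/1/3/1/3/5/1/5 cycle can be sketched as follows: each count of removed slots plus one gives a step in multiples of 7 (4, 2, 4, 2, 4, 6, 2, 6), so the increments cycle through those steps times 7, starting from 49 (names are illustrative):

```javascript
// Remaining multiples of 7 once 2s, 3s and 5s are gone: walk forward
// from 49 using the repeating gap cycle 4,2,4,2,4,6,2,6 (times 7).
function remainingMultiplesOf7(max) {
  const steps = [4, 2, 4, 2, 4, 6, 2, 6].map(s => s * 7);
  const out = [];
  let v = 49;
  let i = 0;
  while (v <= max) {
    out.push(v);
    v += steps[i];
    i = (i + 1) % steps.length;
  }
  return out;
}
```

This reproduces the list of removed numbers shown in the processing work at the bottom of this post.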

Factors of 11

Attempting to remove factors of 11 was too difficult to fully pursue, because I would have needed to jump the max number from 1,000 to 10,000 just to try to see a pattern.

Pattern Complexity: Difficult - based on the numbers and observations below, I expect the pattern would need somewhere between 32 and 512 extra steps.

Pattern: I was able to see a vague pattern in a dataset of about 5,000 items, but did not pursue it due to lack of time and of quality pattern recognition and visual analysis tools.

Dataset Reduction: ?? - given the numbers displayed below, I expect a reduction of about 1.98%.

Observations

Larger Dataset Needed for Pattern Recognition

As you go further and further through the primes, it seems we need a larger dataset to compensate for the reduced information left in the dataset in order to see the pattern.

Possible Rule: To find the pattern for each successive prime we will need a dataset roughly 10 times larger than the previous one. I am not really counting the first two primes here, since their patterns are super simple to find: there is minimal processing, and the datapoint reduction is very easy to compensate for.

Or perhaps the dataset size needed has to do with the number of multiples removed and the number of datapoints remaining beyond the prime's square value.

That is supremely annoying. I would need better tools to play with larger numbers. =O

Increasing Complexity in Pattern

As each square is processed, it becomes more and more difficult to work out the pattern that prevents processing already-removed numbers. As mentioned above, the dataset requirements grow in order for us to see the pattern, and so does the work required to process the pattern to remove subsequent multiples. This is also biased by my limitations, biases, and inexperience with such work. I really ought to get through processing the pattern for 11 before I state anything, but, since I am stopping here, I will throw out the following tenuous rule:

Possible Rule: Based on my very limited work, the complexity increase with each prime processed is one of these 3 ideas: the square of the previous number of complexity steps times 2; the cube of the previous number of complexity steps; or the previous number of complexity steps times 4.

This became more evident going from 5 (2 extra steps) to 7 (8 extra steps), though note that all three candidate rules predict 8 from 2 (2^2 * 2 = 8, 2^3 = 8, 2 * 4 = 8), which is exactly why I cannot yet tell them apart.

Keep in mind, this is probably not close to wholly accurate given my limited data and processing (up to 7), but it might give you a place to start in understanding the complexity of finding the multiples in each successive prime generation using this technique. I really need to find out how many extra steps are required for 11 to have a better idea about my observations.
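For concreteness, here are the three candidate rules applied to the 8 extra steps that 7 needed, giving the three predictions I use for 11 in the data table below (the key names are just labels I made up):

```javascript
// The three candidate complexity-growth rules, applied to the 8 extra
// steps that 7 required, predicting the extra steps for 11.
const prevSteps = 8;
const predictionsFor11 = {
  cube: prevSteps ** 3,               // 8^3 = 512
  squareTimesTwo: prevSteps ** 2 * 2, // 8^2 * 2 = 128
  timesFour: prevSteps * 4            // 8 * 4 = 32
};
```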

Diminishing Returns with each Processing Generation

When we start with 2 we remove half of the datapoints, and as we process each generation we remove fewer and fewer, which results in diminishing returns for each generation.

I expect there is a mathematical relationship to be found here, but I am not sure what it is yet. I also expect that for any given dataset the proportions removed will be similar. At some point, depending on the size of your dataset, it may not be worth the extra work to find a pattern, and a brute-force technique may be the better option.

Possible Rule: Law of Diminishing Returns - the % of multiples removed as compared to the previous generation is roughly equal to the delta removed by the previous generation + 10%. This increase may also grow by a few percentage points per generation, assuming the deltas from 2-3-5 have any real bearing on further results.

Square Prime

Possible Rule: It appears that once you finish removing a prime's multiples from a dataset (along with all previous primes' multiples), all remaining numbers up until the square of the next prime will be prime numbers.
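This rule can be spot-checked for the primes processed so far (2, 3, 5, 7): every number that survives their removal should be prime all the way up to 11^2 = 121. A quick sketch using trial division (helper names are illustrative):

```javascript
// Trial-division primality check, used only for verification.
function trialIsPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) if (n % d === 0) return false;
  return true;
}

// Numbers below `limit` that survive removing multiples of 2, 3, 5, 7
// (the primes themselves are kept).
function survivorsBelow(limit) {
  const out = [];
  for (let n = 2; n < limit; n++) {
    if ([2, 3, 5, 7].some(p => n !== p && n % p === 0)) continue;
    out.push(n);
  }
  return out;
}

// Every survivor below 121 passes the primality check; the first
// non-prime survivor would be 121 = 11 * 11 itself.
const allPrime = survivorsBelow(121).every(trialIsPrime);
```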

Closing

Working through the primes through 7 gives us a total reduction of 76.5% of the dataset up to 1000. =O My primes are correct up to the next square, which is for 11 @ 121. I would need to process that, and then the prime number list would be correct until the next square - 13 @ 169. I expect this would be the pattern if we were to continue. Since I need a really large dataset to see the pattern for 11, I think I will stop now. Hopefully someone will find this useful. =) I am sure there are much more complex mathematical relations here than I am able to discern with my limited knowledge (not a mathematician), time, tools, and experience.

Possible Rules

Here is the collection of the potential rules observed above, all in one place:

  1. Dataset Size: to find the pattern for each successive prime, a dataset roughly 10 times larger than the previous one is needed.
  2. Pattern Complexity: the number of extra steps grows with each prime by one of: the square of the previous step count times 2; the cube of the previous step count; or the previous step count times 4.
  3. Diminishing Returns: the % of multiples removed compared to the previous generation is roughly equal to the previous generation's delta + 10%.
  4. Square Prime: once a prime's multiples (and all previous primes' multiples) are removed, every remaining number below the square of the next prime is prime.

Data Analysis

The info for primes 11, 13, and 17 is predicted based on my limited observational projections:

prime | # of non-repeating datapoints reduced | % of total datapoints reduced | % delta in reduction from previous gen | dataset required to see pattern | extra programmatic steps
1     | --     | --     | --     | --      | --
2     | 499    | 49.90% | 49.90% | --      | 0
3     | 166    | 16.60% | 33.27% | 30      | 1
5     | 66     | 6.60%  | 39.76% | 75      | 2
7     | 33     | 3.30%  | 50.00% | 500     | 8
11    | 19.8   | 1.98%  | 60.00% | 5,000   | 512 or 128 or 32
13    | 13.86  | 1.39%  | 70.00% | 50,000  | 134,217,728 or 524,288 or 128
17    | 11.088 | 1.11%  | 80.00% | 500,000 | 2.41785E+24 or 36,028,797,018,964,000 or 512

Prime Multiple Processing Work

Listed below are the numbers which are cumulatively removed from the dataset in each round of prime multiple removal.

Removing Numbers Divisible by 2

4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120 122 124 126 128 130 132 134 136 138 140 142 144 146 148 150 152 154 156 158 160 162 164 166 168 170 172 174 176 178 180 182 184 186 188 190 192 194 196 198 200 202 204 206 208 210 212 214 216 218 220 222 224 226 228 230 232 234 236 238 240 242 244 246 248 250 252 254 256 258 260 262 264 266 268 270 272 274 276 278 280 282 284 286 288 290 292 294 296 298 300 302 304 306 308 310 312 314 316 318 320 322 324 326 328 330 332 334 336 338 340 342 344 346 348 350 352 354 356 358 360 362 364 366 368 370 372 374 376 378 380 382 384 386 388 390 392 394 396 398 400 402 404 406 408 410 412 414 416 418 420 422 424 426 428 430 432 434 436 438 440 442 444 446 448 450 452 454 456 458 460 462 464 466 468 470 472 474 476 478 480 482 484 486 488 490 492 494 496 498 500 502 504 506 508 510 512 514 516 518 520 522 524 526 528 530 532 534 536 538 540 542 544 546 548 550 552 554 556 558 560 562 564 566 568 570 572 574 576 578 580 582 584 586 588 590 592 594 596 598 600 602 604 606 608 610 612 614 616 618 620 622 624 626 628 630 632 634 636 638 640 642 644 646 648 650 652 654 656 658 660 662 664 666 668 670 672 674 676 678 680 682 684 686 688 690 692 694 696 698 700 702 704 706 708 710 712 714 716 718 720 722 724 726 728 730 732 734 736 738 740 742 744 746 748 750 752 754 756 758 760 762 764 766 768 770 772 774 776 778 780 782 784 786 788 790 792 794 796 798 800 802 804 806 808 810 812 814 816 818 820 822 824 826 828 830 832 834 836 838 840 842 844 846 848 850 852 854 856 858 860 862 864 866 868 870 872 874 876 878 880 882 884 886 888 890 892 894 896 898 900 902 904 906 908 910 912 914 916 918 920 922 924 926 928 930 932 934 936 938 940 942 944 946 948 950 952 954 956 958 960 962 964 966 968 970 972 974 976 978 980 982 984 986 988 990 992 994 996 998 1000

Removing Remaining Numbers Divisible by 3

9 15 21 27 33 39 45 51 57 63 69 75 81 87 93 99 105 111 117 123 129 135 141 147 153 159 165 171 177 183 189 195 201 207 213 219 225 231 237 243 249 255 261 267 273 279 285 291 297 303 309 315 321 327 333 339 345 351 357 363 369 375 381 387 393 399 405 411 417 423 429 435 441 447 453 459 465 471 477 483 489 495 501 507 513 519 525 531 537 543 549 555 561 567 573 579 585 591 597 603 609 615 621 627 633 639 645 651 657 663 669 675 681 687 693 699 705 711 717 723 729 735 741 747 753 759 765 771 777 783 789 795 801 807 813 819 825 831 837 843 849 855 861 867 873 879 885 891 897 903 909 915 921 927 933 939 945 951 957 963 969 975 981 987 993 999

Removing Remaining Numbers Divisible by 5

25 35 55 65 85 95 115 125 145 155 175 185 205 215 235 245 265 275 295 305 325 335 355 365 385 395 415 425 445 455 475 485 505 515 535 545 565 575 595 605 625 635 655 665 685 695 715 725 745 755 775 785 805 815 835 845 865 875 895 905 925 935 955 965 985 995

Removing Remaining Numbers Divisible by 7

49 77 91 119 133 161 203 217 259 287 301 329 343 371 413 427 469 497 511 539 553 581 623 637 679 707 721 749 763 791 833 847 889 917 931 959 973

Removing Remaining Numbers Divisible by 11

Not completed, but you can see the work in progress below (a 0 marks a number that was already removed in an earlier round):

121 0 143 0 0 0 187 0 209 0 0 0 253 0 0 0 0 0 319 0 341 0 0 0 0 0 407 0 0 0 451 0 473 0 0 0 517 0 0 0 0 0 583 0 0 0 0 0 649 0 671 0 0 0 0 0 737 0 0 0 781 0 803 0 0 0 0 0 869 0 0 0 913 0 0 0 0 0 979 0 0

Prime Verification

As mentioned above, this list is correct up until the square of the next prime - 11 @ 121. You can see a list of primes up to 1000 here to compare.

1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 169, 173, 179, 181, 191, 193, 197, 199, 211, 221, 223, 227, 229, 233, 239, 241, 247, 251, 257, 263, 269, 271, 277, 281, 283, 289, 293, 299, 307, 311, 313, 317, 323, 331, 337, 347, 349, 353, 359, 361, 367, 373, 377, 379, 383, 389, 391, 397, 401, 403, 409, 419, 421, 431, 433, 437, 439, 443, 449, 457, 461, 463, 467, 479, 481, 487, 491, 493, 499, 503, 509, 521, 523, 527, 529, 533, 541, 547, 551, 557, 559, 563, 569, 571, 577, 587, 589, 593, 599, 601, 607, 611, 613, 617, 619, 629, 631, 641, 643, 647, 653, 659, 661, 667, 673, 677, 683, 689, 691, 697, 701, 703, 709, 713, 719, 727, 731, 733, 739, 743, 751, 757, 761, 767, 769, 773, 779, 787, 793, 797, 799, 809, 811, 817, 821, 823, 827, 829, 839, 841, 851, 853, 857, 859, 863, 871, 877, 881, 883, 887, 893, 899, 901, 907, 911, 919, 923, 929, 937, 941, 943, 947, 949, 953, 961, 967, 971, 977, 983, 989, 991, 997,