ProjectEuler: 001

Project Euler

I've decided to take on Project Euler, as a way for me to improve my programming skills, force me to grow math skills, and train myself to think more about algorithms.

My purpose is to push myself, not take the "yeah I can do that"  specific approach here. I want to solve the general problem, not the specific. Solving a fizz buzz is easy, but solving the general problem effectively requires a bit more thinking.

If you're reading this and interested in writing code, I suggest you try the problems before reading my notes. Problem solving is not a spectator sport.

If I wanted to do this the easy way in which I learnt nothing I would simply paste the first dozen or so answers into wolfram alpha, because it calculates the answer for us.

Problem 001
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
This seems to be like simple "fizz buzz" problem, but I'm curious if I can solve this with an equation rather than iterate through a loop. The loop to me, would be taking the easy way out. That's because I'm better at writing code to solve problems than I am at writing math to solve problems.

My old way

To do so, I would simply create one loop that adds to a sum on modulo 3 and modulo 5 (but not twice for both). I like JavaScript, especially because you can show it online.

This took me 5 minutes to write out, and you can see the live code at

Solving every set of multiple numbers 
(between 1 and any number greater than the lowest multiple)

Like I said, that's my easy way out and I've learnt nothing. I could do some really neat code re-use patterns here to make this capable of any combination of multiples.

Pushing myself to think a little

So how else might I try to solve this? Thinking back to high-school math I remember sums of arithmetic and geometric series. Arithmetic series are useful for things like "Add/multiply all the numbers in 1..n together". This may be useful, because 3, 6, 9, 12, 15... and 5, 10, 15, 20... are both arithmetic sequences that progress by 3 and 5 respectively.

Now, we can calculate them separately, but I'm not sure how we might count them so we don't have things like the number 15 counted twice. Generally, the equation for a sum of the sequence is as follows:

Basically, you add the first and last term in a series, and multiply by half the number of elements. So, "one plus one hundred times fifty (half of one hundred)" ends up being the sum of all numbers between one and one hundred. That's quite a bit easier than writing out 100 numbers and adding them.

We could do this by adding the series for 3, and series for 5, but we're also adding the over-lapping numbers that are multiples of 3 and 5 twice. So how can we subtract that? What numbers are common to 3 and 5? Well, the common multiple to 3 and 5 is always a multiple of 15, so we can likely add 3s and 5s, then subtract the 15 to compensate.

Now here we are, using no loops or IF statements, simple functions and mathematics to figure out the sum.

How do I solve all of the combinations like I did previously?

Well, in the above example, 15 is the least common multiple of both 3 and 5 (multiplied, they are 15), so when we sum the small sets and negate the over-lapping numbers once, we find our value.

Thinking back to set theory and Venn diagrams, I imagine if I wanted to make that function Expand into any set of multiples, we would need to sum all the series, and subtract a sum of all the combinations of series. So, in other words, if we were adding the multiples of a, b, and c we would add the series values of ab, and c but have to subtract series sums of abac, bc, and abc.

I understand how to do that using two for loops, but again, that's the easy way out. We aren't going to make the computer work harder because I'm ignorant of combinations and set theory. So, lets bring search engines up to maximum thrust.

I found two answers on my favorite Q&A community:
Most of the answers on the programmer site seem to be using loops or recursive solutions. I decided to ask my own question and I got a really nice answer back. In essence, these all can be thought of as binary lists of values, and the only thing I'm doing is subtracting parts of the total set to find another part.

This really made me think of something interesting.

Instead of adding the first set and subtracting the second, I should sum all the sets of this for 2^k, where k is the number of multiples, and each combination of k would represent it's own subsection of these sets as if they were on a Venn Diagram.

Lets say I have a set of multiples {3, 5, 7} for values 1..999 inclusive:
(generated by Google charts API)

Lets abstract the numbers out, so instead of {3, 5, 7} the list is {A, B, C} I would have my binary list such as:

001 = C
010 = B
011 = BC
100 = A
101 = AC
110 = AB
111 = ABC

The individual circles are the single values. Intersections of circles are intersections of the values. The center of all three circles is ABC.

Using this concept, I could simply multiply each value out, thus getting the multiple, and then I could do any calculation on any Venn diagram I want, and even pick and chose specific to each intersection.

I just figured out something really cool there. Think about it, rather than checking for every single number in a range, against every single divisor, all I'm doing is seeing how big a series is without counting them, and summing them together. So lets put it into code.

What's really exciting here is I have a good reason to think about bitwise operators for the first time since I wrote x86 assembly code. This is exciting.

I've written a bit of code here that shows how we can generate these combinations, I'd like to see if I can improve on it somehow as it seems the second for-loop could be removed. This will take ["a", "b",  "c" , "d"] and return ["a", "b", "ab", "c", "ac", "bc", "abc", "d", "ad", "bd", "abd", "cd", "acd", "bcd", "abcd"] (all combinations of ABCD). The problem is now how to combine these values correctly.

The inclusion-exclusion principle, as it turns out, perfectly describes how we can find the value for the union of all the sets, but discount the duplicate data correctly. The equation on the wikipedia page is somewhat complicated for someone who is not literate with all the symbols, so I will attempt to re-write and explain what's going on here.

On Wikipedia, there is a "general" formula that describes how to find the value for any number of sets. There's a symbol that is shaped like a "cup" which represents a "union." The equation states that for a union of n sets starting with the first, we can find this by using this compact equation:

That equation states that the union of n sets is equal to the sum of all combinations, where we add or negate based on the cardinality of a set. That sounds complicated so lets expand what that equation would look like if we were going to write it out the long way.

Okay, it's still a bit confusing. So lets look. That first line with the cup is the same as above, and that next line says we're going to add up the sum of the intersection (the upside down cup is a cap). The sum of all the intersections would mean (if we had 4 items, A, B, C, D) adding together A, B and C and D. The second line would mean we subtract the sum of AB, AC, AD, BC, BD, and CD. The third line means we add the sum of ABC, ABD, and BCD, the fourth line means we subtract the value of ABCD. The pattern here is we alternate the operation (adding or subtracting a sum) based on the cardinality (how many items are in a set) of something. Even cardinalities are subtracted, odd cardinalities are added. We sum together all combinations of that cardinality.

So we need to express this process with some code. I could iterate over each set, but I know another trick I could use. Since we already know the cardinality of any subset I'm not sure how to express that this works mathematically, so I posed a question to the Math Stack Exchange site. I can show it works, and if you look at it, this makes sense. While writing the equation's general form is presently beyond me, writing the code is dead-simple.

So all we do is add or subtract a value based on the count of "on bits" being even or odd.

Lets put that together and generate all combinations of all multiples and negate the most common divisors and the least common divisors, and it should look something like this.

Better, smarter, faster, stronger?

Who's faster?

For the small data-set with the range of 1..999, and two sets of multiples, using a brute-force method is roughly 60% slower. If you notice, using the brute-force method on a huge data set doesn't even register (actual ops per second came in at 0.01, meaning it took over a minute and a half just to compute), yet using the same data set on the math-oriented approach, we see nearly 3248 operations per second with the same performance regardless of the series size, entirely dependent on the number of multiples. FireFox must be doing something interesting with their JavaScript engine optimization.

What did I learn?
  • Speed is way better when you don't just brute-force it, unless a compiler does something funky.
  • Brute force ended up crashing on larger ranges.
  • Math-based approach.
  • A bit more about set theory, a cool way to use bitwise to generate combinations
  • Some cool bitwise techniques.
  • Execution time on math only increases when I add more multiples, it can solve 3's and 5's faster, for any data set imaginable -- yet, the brute force cannot do this.
  • Inclusion-exclusion principle
  • Basic LaTeX for math notation
  • How to read some more math
What did I learn that I need to learn?
  • Set theory
  • Euclidean theorem
  • More bitwise stuff
  • More LaTeX
  • Better understanding of math.
  • Why my getCombinations function has some quirks in it, perhaps a recursive approach would be better.
  • Maybe I should learn how to properly translate my algos into O-notation so I can sound fancy, might also help me find bottlenecks in my code.
I'm excited to look at this post in a year and find all the improvements I could have made, or other approaches I could have taken. For anyone who looks at their year-old code, you will understand.

I've also created a backup of all the code on github if for some reason jsFiddle dumps their database in the future.

Labels: ,

Full list of archived posts