A friend of mine showed me this site. I tried several puzzles manually and looked at modulo first from programmer's point of view but didn't solve much.

Then crossflip catched my attention. It looked quite obvious and simple XORing.

My language of choice is pure C. Not using it for work, but i do enjoy implementing complex data structures in it.

Just to validate i can parse tasks and submit solutions i implemented a simple bruteforce first. E.g. pack the whole solution into 'unsigned long long int' and just increment it starting from 0 until a valid state found. Surprisingly this worked fine until level 74! Quick calculation showed it will take at most 200 days to check all the variants at 74.

At which point i started looking for optimizations. And failed to find them. Lots of microoptimisations of course, but they won't help reducing time significantly.

In a search for new ideas i tried google and found a hardware game called "Lights Out". This was it! Lots of papers on solving it using linear algebra. Quite popular among mathematicians. And a beautiful solution too.

I implemented Gauss. Any solution would do, hence unknown variables i chose just to replace them with 0s.

I used 'int' to hold cell value and this worked fine until field sizes grew larger than 32 bit memory. Okay, switched to byte here. As expected performance dropped by 30%.

Somewhere in the middle i decided to try parallel runs. This makes sense as adding rows could be easily parallelised.

Well, this didn't work of course. Parallel version was 6 times slower. It makes sense to run only computationally intensive operations in parallel. XORing lines of bytes relies on memory throughput, not CPU performance. So i abandoned this version.

Then at some point i had to switch to 64 bit because again memory requirements grew greater than 32 bit process can handle, even for a byte.

As levels grew so performance dropped. I started looking at cleverer algorithms than Gaussian Elimination, like LU decomposition based ones and all that clever stuff. Matrix is symmetric and extremely sparse. There HAVE to be some better algorithms.

While i was trying to understand some of the papers the user Ignorance outrun me in solving levels. Argh! I was between 400 and 500 at the time. I then decided to pack 8 variables into a single byte still using Gauss. After all, all is required is XOR on forward step and XOR and AND on backward step of Gauss.

Packing variables sped up my code by the factor of 30! Wow! this was quite unexpected to me.

I decided to stay with Gauss for now and just optimize my solution. Rows were aligned on 16 bytes, padding added, SSE instructions engaged via _mm_xor_si128() and voila! Speed up by 10.

This was the final code to reach level 643.

## My journey to 643

congrats on beating the game! I always like to hear about what problems and solutions others have.

But since this forum is open for everyone, I fear your post is taking away too much of the challenge. Note that only about 70 people completed crossflip so far.

I think it would be great if we could have something similar to the "challenges solved"-section for the games.

@adum: what do you think about that?

The speed factor you achieved with the data type you used is very promising, but i think better algorithm can speed things up much more

Would be interesting to compare our algorithms on level 1000.