hacker.org Forum Index
RegisterSearchFAQMemberlistUsergroupsLog in
Code optimization
Goto page Previous  1, 2, 3, 4  Next
 
Reply to topic    hacker.org Forum Index » Crossflip View previous topic
View next topic
Code optimization
Author Message
Zeta



Joined: 16 Apr 2009
Posts: 62

Post Reply with quote
gcc is conservative about the used instruction set. When compiling for 32 bit you get x386 compatible code. You have to provide the right switches:

-O3 -march=native -ftree-vectorizer-verbose=2

-O3 enables among other things the vectorizer (-ftree-vectorize)
-march=native enables every instruction set extension available on your computer
-ftree-vectorizer-verbose=2 gives some information how successful the vectorizer was
Wed Aug 11, 2010 2:02 pm View user's profile Send private message
MyNameIsAlreadyTaken



Joined: 17 Oct 2010
Posts: 31
Location: Germany

Post Reply with quote
How fast are your algorithms?


I got to lvl 489 by now and it took me approx 10 minutes to calculate its solution. Assuming that my algorithm runs in O(n^3) and that 643 has two times the size of 489, it is going to take at least 1 1/2 hours and 1.2GB of RAM to complete just the last one :shock:


Is it possible to optimize my code a lot more or do I just have to run my script everyday for two weeks?
Mon Nov 15, 2010 10:14 pm View user's profile Send private message
tails



Joined: 10 Jun 2008
Posts: 191
Location: Tokyo

Post Reply with quote
Hi, it took 18 seconds for me to solve the last level.
I think we can do much quicker if we use a better algorithm and a better computer.
Tue Nov 16, 2010 9:10 am View user's profile Send private message
MyNameIsAlreadyTaken



Joined: 17 Oct 2010
Posts: 31
Location: Germany

Post Reply with quote
18 seconds? Holy shit oO


I used Gauss on an average Computer [dualcore 2,2GHz, 64 Bit Ubuntu 10.10] and may be able to double the speed if I use threads [gonna try out that one next]. I guess you used a way better algorithm?


Edit: Ok, threading seems to slow down my program :[ Looks like I can't optimize anything without having to use a completely other approach.


Last edited by MyNameIsAlreadyTaken on Tue Nov 16, 2010 3:48 pm; edited 1 time in total
Tue Nov 16, 2010 3:00 pm View user's profile Send private message
tails



Joined: 10 Jun 2008
Posts: 191
Location: Tokyo

Post Reply with quote
Threading sounds good. I couldn't do that because I used a laptop with Celeron M (single core, single thread).
Also, I didn't even use SSE (just because I was lazy :-)).
So, I'm pretty sure we can do much quicker.
Tue Nov 16, 2010 3:48 pm View user's profile Send private message
Zeta



Joined: 16 Apr 2009
Posts: 62

Post Reply with quote
Threading is not easy if you want to gain speed. Multi-core processors have a 1st level cache for every core and and a shared 2nd level cache. Several copies of a cache line may exist in different 1st level caches. In this case a coherence protocol ensures some form of consistency between these copies. The necessary memory operations may slow down your program considerably.

In your case the data should be aligned to the cache boundary and different threads should not access data in the same cache line: Each thread should work on a vertical slice of the matrix.

I believe it's possible to get the time for the last level down to 0.1s with a optimized (multithreaded, SIMD) c implementation. But always optimize the algorithm first and then the implementation Wink.
Wed Nov 17, 2010 3:06 am View user's profile Send private message
helly0d



Joined: 13 Feb 2009
Posts: 27
Location: Iasi Romania

Post Reply with quote
How's your algorithm run in O(n^3) mine is in o(n^6) or by n you mean width*length ? I too use Gaussian Elimination with some freak combination of C with ASM and at 224 is taking about 1 min to solve but at 460 it takes 1h and 10 min. I still tried to play with flags. How could you get 128 bits in one instruction ? I am able just 32 bits. Some hints of optimization from this point further ?
Sun May 08, 2011 2:39 am View user's profile Send private message
MyNameIsAlreadyTaken



Joined: 17 Oct 2010
Posts: 31
Location: Germany

Post Reply with quote
Yeah, by n i meant length * height.

Solving #224 takes 4.78s on my computer [not counting the time to get the level and sending the solution]. In every instruction I manipulate 64 bits at the same time by using unsigned long long as type for saving the data [which improves my code a lot since I use a 64 bit OS].

My Code is basically an implementation of Gauss specialized to work mod 2. I can't tell why it runs faster than yours, the only compiler-optimization I used is -O3 [and I didn't use any ASM].
Sat May 21, 2011 4:03 pm View user's profile Send private message
bsguedes



Joined: 24 Feb 2009
Posts: 103
Location: Porto Alegre

Post Reply with quote
In my computer it takes 15 minutes to solve level 220.

I'm using Java, and I'm solving the linear system using Gauss' method. My solution is very raw, I've made no simplifications in my solver. I know that we are dealing with a sparse matrix, but I don't know how can I use this to improve my solver time. (yeah I'm not a mathematician lol)
Sat Jul 02, 2011 10:29 am View user's profile Send private message
Adzeye



Joined: 04 Apr 2009
Posts: 9
Location: Canada

Post Reply with quote
Gauss is a good start, of course, but what you have to remember is to apply it to the context of this puzzle. We have huge matrices of 0's and 1's, but only a tiny fraction of it is actually a '1', which is what you are interested in. Most of the time you spend iterating through the matrix does not advance you towards a solution. So ask yourself, how can you make it such that you only read/write your matrix when it is actually serves a purpose?

This puzzle has a ton of small optimization you can make, I started with about 15 minutes for a level around 220 and it took me 15 minutes for lvl 643. Benchmarking the different steps your solver takes is quite important to know what takes time. I still have a few optimizations that could speed the thing up by about 4x but my solver already finished the game before I could implement them Razz
Sat Jul 02, 2011 3:34 pm View user's profile Send private message
bsguedes



Joined: 24 Feb 2009
Posts: 103
Location: Porto Alegre

Post Reply with quote
Thanks Adzeye !

I've thought about what you've said and I tried to change the data structure of my matrix. I was using a bi-dimensional array, full of zeros and I'm trying now an implementation using linked lists. I thought the second one would be better, but it fails horribly.

I think I'm going to follow your advice about benchmarking the different steps of my solver... that should give me a hint where I need to improve my algorithm. Thanks again!

PS.: level 236 : 56 minutes. And level 237 destroys may alocated memory. Sweet.
Sat Jul 02, 2011 5:21 pm View user's profile Send private message
Adzeye



Joined: 04 Apr 2009
Posts: 9
Location: Canada

Post Reply with quote
I think that bitfield representation and operations is still the fastest option. The idea is more that, among your thousands of "equations", both during the elimination and substitution steps of gaussian, only a few will contain the parameter/variable you are trying to do something with.
Sat Jul 02, 2011 5:52 pm View user's profile Send private message
bsguedes



Joined: 24 Feb 2009
Posts: 103
Location: Porto Alegre

Post Reply with quote
I have innocently thought that implementing the Gauss method with all the adaptations needed to this problem would be enough to achieve level 643 (similarly to Oneofus, for example). However, it seems to be not so simple.

I've found the slowest part of my code, which is the simplest portion of it ... I cannot see how I can reduce the complexity of this little set of lines still solving the puzzle... I guess I should think of a completely different approach Sad
Mon Jul 04, 2011 1:53 pm View user's profile Send private message
compudemon



Joined: 13 Aug 2011
Posts: 33

Post Reply with quote
i switched over my code to use BigInteger seems to have cut my memory use by a factor of 16 for now but it should improve as the memory needed for the matrix itself grows compared to other variables. basically i am just converting the matrix to an augmented identity matrix. is to nice that binary ops are mathematically valid. so long as you obey the commutative rules.
Sun Aug 28, 2011 1:26 am View user's profile Send private message
compudemon



Joined: 13 Aug 2011
Posts: 33

Post Reply with quote
Java is slow
C++ is full of memory pointer hassles

so i converted all my code to c++
created my own classes for bigints, stringbuffer, stringbufferarray
as any true hacker might do
since i have no idea how to use sockets i got libcurl

bigints is inlined and optimized with no bounds checking
basically an array of _int64 arrays

setbit is data[index >> 6] |= 1LL<<(index & 63);
clearbit is data[index >> 6] &= ~(1LL<<(index & 63));

so far its running 10~5 x faster then java and it seems to have a memory leak somewhere
not very impressive speed increase considering all the inline-ing and optimizing
i may just stay with java and its very impressive jit compiler for other challenges
c++ sucks with all the pointers
Sat Sep 03, 2011 1:58 am View user's profile Send private message
Display posts from previous:    
Reply to topic    hacker.org Forum Index » Crossflip All times are GMT
Goto page Previous  1, 2, 3, 4  Next
Page 2 of 4

 
Jump to: 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group
Design by Freestyle XL / Flowers Online.