System.out.format("M %08X %08X %08X %08X\n", buf[0], buf[1], buf[2], buf[3]);
for(int x = 3; x > -1; x--){
int y = buf[x] * 5;
System.out.println(buf[x] * 5);
buf[x] = decodedByte + y;
decodedByte = y;
}
System.out.format("M %08X %08X %08X %08X\n", buf[0], buf[1], buf[2], buf[3]);
decodedByte starts at 2;
M 00000000 00000000 00000000 00000001
5
0
0
0
M 00000000 00000000 00000005 00000007
WHY THE HELL is the 3rd int being set to 5?
"y" is 5 on the first iteration, so decodedByte is 5 on the second iteration. You add 5 to 0, you should get 5, no?
Ya, I noticed that. I'm trying to figure out what makes it different from this:
while(rounds-- > 0){
long param1 = bufA[posA--] & 0x00000000FFFFFFFFl;
long param2 = mulx & 0x00000000FFFFFFFFl;
long edxeax = param1 * param2;
bufB[posB--] = decodedByte + (int)edxeax;
System.out.println(decodedByte + " " + edxeax);
decodedByte = (int)(edxeax >> 32);
}
m 00000000 00000000 00000000 00000001
2 5
0 0
0 0
0 0
m 00000000 00000000 00000000 00000007
God damn it
32-bit ints != 64-bit ints
lol, yuck. I might try to stare at it later. I'm going to crank out some more crap for an assignment I have due in a few days and hit the sack for now, though.
Quote from: Sidoh on November 02, 2008, 01:55:46 AM
lol, yuck. I might try to stare at it later. I'm going to crank out some more crap [...] and hit the sack for now, though.
vivid imagery
btw, "int" and "long" and stuff aren't portable.
Use stdint.h and then use int32_t, uint32_t, int64_t, uint64_t, etc.
Nevermind, thought you were using C. :P
Quote from: HdxBmx27 on November 02, 2008, 01:52:54 AM
God damn it
32-bit ints != 64-bit ints
I was using a 32-bit int when I should have used a 64-bit int.
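For anyone following along, here is a minimal C sketch of that multiply-with-carry step. The name mul_add_128 and the word layout (buf[0] most significant) are assumptions made for illustration, not code from the decoder. Unlike the Java loop above, it folds the carry into the 64-bit product before truncating, so an overflow in the addition is not lost.

```c
#include <stdint.h>

/* Hypothetical helper: multiply the 128-bit number in buf[0..3]
 * (buf[0] is the most significant word) by mul, then add `add`.
 * The 64-bit intermediate keeps the high half of each product,
 * which becomes the carry into the next, more significant word. */
static uint32_t mul_add_128(uint32_t buf[4], uint32_t mul, uint32_t add)
{
    uint32_t carry = add;
    for (int i = 3; i >= 0; i--) {
        uint64_t prod = (uint64_t)buf[i] * mul + carry;
        buf[i] = (uint32_t)prod;          /* low 32 bits stay in this word */
        carry = (uint32_t)(prod >> 32);   /* high 32 bits move up one word */
    }
    return carry;                         /* overflow out of the top word */
}
```

Calling mul_add_128 on {0, 0, 0, 1} with mul = 5 and add = 2 leaves {0, 0, 0, 7}, which matches the lowercase "m" trace above. The 32-bit version carried the whole product (5) into the next word instead of the high half (0), which is where the stray 00000005 came from.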
I fixed that. Now I'm on the final loop of WC3 decoding.
uint32_t eax, ecx, edx, ebp;
uint32_t location;
e = 0;
for(c = 0; c < 120; c++){
eax = c & 0x1F;
ecx = e & 0x1F;
edx = 3 - (c >> 5); /* c is the loop counter; edi was never declared */
location = 12 - ((e >> 5) << 2);
ebp = values[location / 4];
ebp = (ebp & (1 << ecx)) >> ecx;
values[edx] = ((ebp & 1) << eax) | (~(1 << eax) & values[edx]);
printf("%08X %08X %08X %08X\n", values[0], values[1], values[2], values[3]);
e = (e + 11) % 120;
}
Not working correctly. I'll look at it more later. But now, I'm at the mall hoping the owner of the WiFi I'm on [SSID: Uv been hAcked] will try and hack/talk to me.
#ifndef CDKEYS_C
#define CDKEYS_C
#define hexvalue(n) (((n) & 0x0F) + (((n) & 0x0F) < 10 ? 0x30 : 0x37))
#define numvalue(n) (toupper(n) - (isdigit(n) ? 0x30 : 0x37))
#include "cdkeys.h"
uint8_t *orig_key = "246789BCDEFGHJKMNPRTVWXZ";
uint8_t *war3_key = "246789BCDEFGHJKMNPRTVWXYZ";
uint8_t star_seq[12] = {6, 0, 2, 9, 3, 11, 1, 7, 5, 4, 10, 8};
uint8_t wcd2_seq[16] = {5, 6, 0, 1, 2, 3, 4, 9, 1, 11, 12, 13, 14, 15, 7, 8};
uint8_t TranslateTable[] = {
0x09, 0x04, 0x07, 0x0F, 0x0D, 0x0A, 0x03, 0x0B, 0x01, 0x02, 0x0C, 0x08,
0x06, 0x0E, 0x05, 0x00, 0x09, 0x0B, 0x05, 0x04, 0x08, 0x0F, 0x01, 0x0E,
0x07, 0x00, 0x03, 0x02, 0x0A, 0x06, 0x0D, 0x0C, 0x0C, 0x0E, 0x01, 0x04,
0x09, 0x0F, 0x0A, 0x0B, 0x0D, 0x06, 0x00, 0x08, 0x07, 0x02, 0x05, 0x03,
0x0B, 0x02, 0x05, 0x0E, 0x0D, 0x03, 0x09, 0x00, 0x01, 0x0F, 0x07, 0x0C,
0x0A, 0x06, 0x04, 0x08, 0x06, 0x02, 0x04, 0x05, 0x0B, 0x08, 0x0C, 0x0E,
0x0D, 0x0F, 0x07, 0x01, 0x0A, 0x00, 0x03, 0x09, 0x05, 0x04, 0x0E, 0x0C,
0x07, 0x06, 0x0D, 0x0A, 0x0F, 0x02, 0x09, 0x01, 0x00, 0x0B, 0x08, 0x03,
0x0C, 0x07, 0x08, 0x0F, 0x0B, 0x00, 0x05, 0x09, 0x0D, 0x0A, 0x06, 0x0E,
0x02, 0x04, 0x03, 0x01, 0x03, 0x0A, 0x0E, 0x08, 0x01, 0x0B, 0x05, 0x04,
0x02, 0x0F, 0x0D, 0x0C, 0x06, 0x07, 0x09, 0x00, 0x0C, 0x0D, 0x01, 0x0F,
0x08, 0x0E, 0x05, 0x0B, 0x03, 0x0A, 0x09, 0x00, 0x07, 0x02, 0x04, 0x06,
0x0D, 0x0A, 0x07, 0x0E, 0x01, 0x06, 0x0B, 0x08, 0x0F, 0x0C, 0x05, 0x02,
0x03, 0x00, 0x04, 0x09, 0x03, 0x0E, 0x07, 0x05, 0x0B, 0x0F, 0x08, 0x0C,
0x01, 0x0A, 0x04, 0x0D, 0x00, 0x06, 0x09, 0x02, 0x0B, 0x06, 0x09, 0x04,
0x01, 0x08, 0x0A, 0x0D, 0x07, 0x0E, 0x00, 0x0C, 0x0F, 0x02, 0x03, 0x05,
0x0C, 0x07, 0x08, 0x0D, 0x03, 0x0B, 0x00, 0x0E, 0x06, 0x0F, 0x09, 0x04,
0x0A, 0x01, 0x05, 0x02, 0x0C, 0x06, 0x0D, 0x09, 0x0B, 0x00, 0x01, 0x02,
0x0F, 0x07, 0x03, 0x04, 0x0A, 0x0E, 0x08, 0x05, 0x03, 0x06, 0x01, 0x05,
0x0B, 0x0C, 0x08, 0x00, 0x0F, 0x0E, 0x09, 0x04, 0x07, 0x0A, 0x0D, 0x02,
0x0A, 0x07, 0x0B, 0x0F, 0x02, 0x08, 0x00, 0x0D, 0x0E, 0x0C, 0x01, 0x06,
0x09, 0x03, 0x05, 0x04, 0x0A, 0x0B, 0x0D, 0x04, 0x03, 0x08, 0x05, 0x09,
0x01, 0x00, 0x0F, 0x0C, 0x07, 0x0E, 0x02, 0x06, 0x0B, 0x04, 0x0D, 0x0F,
0x01, 0x06, 0x03, 0x0E, 0x07, 0x0A, 0x0C, 0x08, 0x09, 0x02, 0x05, 0x00,
0x09, 0x06, 0x07, 0x00, 0x01, 0x0A, 0x0D, 0x02, 0x03, 0x0E, 0x0F, 0x0C,
0x05, 0x0B, 0x04, 0x08, 0x0D, 0x0E, 0x05, 0x06, 0x01, 0x09, 0x08, 0x0C,
0x02, 0x0F, 0x03, 0x07, 0x0B, 0x04, 0x00, 0x0A, 0x09, 0x0F, 0x04, 0x00,
0x01, 0x06, 0x0A, 0x0E, 0x02, 0x03, 0x07, 0x0D, 0x05, 0x0B, 0x08, 0x0C,
0x03, 0x0E, 0x01, 0x0A, 0x02, 0x0C, 0x08, 0x04, 0x0B, 0x07, 0x0D, 0x00,
0x0F, 0x06, 0x09, 0x05, 0x07, 0x02, 0x0C, 0x06, 0x0A, 0x08, 0x0B, 0x00,
0x0F, 0x04, 0x03, 0x0E, 0x09, 0x01, 0x0D, 0x05, 0x0C, 0x04, 0x05, 0x09,
0x0A, 0x02, 0x08, 0x0D, 0x03, 0x0F, 0x01, 0x0E, 0x06, 0x07, 0x0B, 0x00,
0x0A, 0x08, 0x0E, 0x0D, 0x09, 0x0F, 0x03, 0x00, 0x04, 0x06, 0x01, 0x0C,
0x07, 0x0B, 0x02, 0x05, 0x03, 0x0C, 0x04, 0x0A, 0x02, 0x0F, 0x0D, 0x0E,
0x07, 0x00, 0x05, 0x08, 0x01, 0x06, 0x0B, 0x09, 0x0A, 0x0C, 0x01, 0x00,
0x09, 0x0E, 0x0D, 0x0B, 0x03, 0x07, 0x0F, 0x08, 0x05, 0x02, 0x04, 0x06,
0x0E, 0x0A, 0x01, 0x08, 0x07, 0x06, 0x05, 0x0C, 0x02, 0x0F, 0x00, 0x0D,
0x03, 0x0B, 0x04, 0x09, 0x03, 0x08, 0x0E, 0x00, 0x07, 0x09, 0x0F, 0x0C,
0x01, 0x06, 0x0D, 0x02, 0x05, 0x0A, 0x0B, 0x04, 0x03, 0x0A, 0x0C, 0x04,
0x0D, 0x0B, 0x09, 0x0E, 0x0F, 0x06, 0x01, 0x07, 0x02, 0x00, 0x05, 0x08
};
uint32_t cdkeys_instr(uint8_t *string, uint8_t find){
uint8_t *location = strchr(string, toupper(find));
return (location == NULL ? -1 : location - string);
}
/*******************************************************
*Converts an ASCII string of digits into a long.
*Used in Starcraft Key decoding.
*******************************************************/
uint32_t cdkeys_strtol(uint8_t *number, uint32_t length, uint32_t base){
uint8_t *temp = safe_malloc(length + 1); /* +1 for the terminating NUL written below */
memcpy(temp, number, length);
temp[length] = 0;
uint32_t ret = strtol(temp, NULL, base);
free(temp);
return ret;
}
BOOLEAN cdkeys_verifystarcraft(uint8_t *key){
uint32_t sum = 3;
uint32_t i = 0;
if(strlen(key) != 13) return FALSE;
for(i = 0; i < 12; i++)
sum += (key[i] - '0') ^ (sum << 1);
return (sum % 10 == key[12] - '0');
}
/*****************************************************************************
*Retrieves the Public, Private, and Product values from an 'old style'
*Starcraft cdkey, i.e. the 13-digit numerical key.
*****************************************************************************/
BOOLEAN cdkeys_getstarcraftvalues(uint8_t *key, uint32_t *public_key, uint32_t *private_key, uint32_t *product){
if(cdkeys_verifystarcraft(key) == FALSE) return FALSE;
uint32_t hashkey = 0x13AC9741;
uint8_t sequance[12] = {6, 0, 2, 9, 3, 11, 1, 7, 5, 4, 10, 8};
uint8_t *temp_key = safe_malloc(13);
int32_t c;
for(c = 11; c >= 0; c--){
if(key[sequance[c]] <= '7'){
temp_key[c] = key[sequance[c]] ^ (uint8_t)(hashkey & 7);
hashkey >>= 3;
}else
temp_key[c] = key[sequance[c]] ^ (uint8_t)(c & 1);
}
*product = cdkeys_strtol(temp_key, 2, 10);
*public_key = cdkeys_strtol(temp_key+2, 7, 10);
*private_key = cdkeys_strtol(temp_key+9, 3, 10);
free(temp_key);
return (*product != 0 && *public_key != 0 && *private_key != 0 ? TRUE : FALSE); /* test the decoded values, not the pointers */
}
BOOLEAN cdkeys_verifydiablo(uint8_t *key, uint8_t **decoded){
if(strlen(key) != 16) return FALSE;
uint8_t *temp_key = safe_malloc(16);
uint8_t i;
uint32_t n = 0;
uint32_t checksum = 0;
uint32_t r = 1;
for(i = 0; i < 16; i +=2){
n = (cdkeys_instr(orig_key, key[i + 1]) + (cdkeys_instr(orig_key, key[i]) * 24));
if(n >= 0x100){
n -= 0x100;
checksum |= r;
}
r <<= 1;
temp_key[i] = hexvalue(n >> 4);
temp_key[i + 1] = hexvalue(n);
}
n = 3;
for(i = 0; i < 16; i++)
n += numvalue(temp_key[i]) ^ n * 2;
if((n & 0xFF) != checksum){
free(temp_key);
return FALSE;
}
if(decoded != NULL) *decoded = temp_key;
else free(temp_key);
return TRUE;
}
BOOLEAN cdkeys_getdiablovalues(uint8_t *key, uint32_t *public_key, uint32_t *private_key, uint32_t *product){
uint8_t *temp_key = NULL; /* cdkeys_verifydiablo allocates the decoded key on success */
if(cdkeys_verifydiablo(key, &temp_key) == FALSE){
free(temp_key);
return FALSE;
}
uint32_t hashkey = 0x13AC9741;
uint16_t i;
uint8_t n;
uint8_t c;
for(i = 15; i < 16; i--){
n = (i > 8 ? i - 9 : 7 + i);
c = temp_key[n];
temp_key[n] = temp_key[i];
temp_key[i] = c;
}
for(i = 15; i < 16; i--){
if(temp_key[i] <= '7'){
temp_key[i] ^= (hashkey & 7);
hashkey >>= 3;
}else if(temp_key[i] < 'A')
temp_key[i] ^= (i & 1);
}
*product = cdkeys_strtol(temp_key, 2, 16);
*public_key = cdkeys_strtol(temp_key+2, 6, 16);
*private_key = cdkeys_strtol(temp_key+8, 8, 16);
free(temp_key);
return ((*product != 0) && (*public_key != 0) && (*private_key != 0)); /* test the decoded values, not the pointers */
}
BOOLEAN cdkeys_getwarcraftvalues(uint8_t *key, uint32_t *public_key, uint32_t *private_key, uint32_t *product){
uint8_t *table = safe_malloc(52);
uint32_t *values = safe_malloc(16);
memset(values, 0, 16); /* the accumulator must start at zero before the multiply-by-5 loop */
uint32_t c = 0;
uint8_t d = 0;
uint64_t e;
uint32_t f = 30;
uint32_t g;
uint32_t h;
for(c = 0; c < 26; c++){
d = cdkeys_instr(war3_key, toupper(key[c]));
table[f] = (d / 5);
f = (f + 49) % 52;
table[f] = (d % 5);
f = (f + 49) % 52;
}
for(c = 51; c < 52; c--){
for(d = 3; d < 4; d--){
e = (uint64_t)values[d] * 5;
values[d] = table[c] + (uint32_t)e;
table[c] = (uint8_t)(e >> 32);
}
}
f = 24;
d = 2;
for(c = 0x1D0; c < 0x1D1; c -= 0x10){
f = (f + 28) % 32;
g = (values[d / 8] & (0x0F << f)) >> f;
h = 24;
for(e = 29; e < 30; e--){
h = (h + 28) % 32;
if(e != (c / 0x10))
g = TranslateTable[((values[3 - (e >> 3)] & (0x0F << h)) >> h) ^ TranslateTable[g + c] + c];
}
values[d / 8] = ((TranslateTable[g + c] & 0x0F) << f) | ~(0x0F << f) & values[d / 8];
d++;
}
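uint32_t eax, ecx, edx, ebp;
uint32_t location;
e = 0;
/* NOTE: the Java reference below reads from a separate Copy buffer; this loop reads values in place. */
for(c = 0; c < 120; c++){
eax = c & 0x1F;
ecx = e & 0x1F;
edx = 3 - (c >> 5); /* c is the loop counter; edi was never declared */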
location = 12 - ((e >> 5) << 2);
ebp = values[location / 4];
ebp = (ebp & (1 << ecx)) >> ecx;
values[edx] = ((ebp & 1) << eax) | (~(1 << eax) & values[edx]);
printf("%08X %08X %08X %08X\n", values[0], values[1], values[2], values[3]);
e = (e + 11) % 120;
}
/*for(edi = 0; edi < 120; edi++)
{
eax = edi & 0x1F;
ecx = esi & 0x1F;
edx = 3 - (edi >>> 5);
int location = 12 - ((esi >>> 5) << 2);
ebp = IntFromByteArray.LITTLEENDIAN.getInteger(Copy, location);
ebp = (ebp & (1 << ecx)) >>> ecx;
keyTable[edx] = ((ebp & 1) << eax) | (~(1 << eax) & keyTable[edx]);
esi = (esi + 11) % 120;
System.out.format("%03d %d %08X %08X %08X %08X %08X\n", location, edx, ebp, keyTable[0], keyTable[1], keyTable[2], keyTable[3]);
}
*/
for(c = 0; c < 4; c++){
printf("%08X ", values[c]);
}
}
#endif
Once I get it finished I will clean it up. I'm going for clean code, and I'm willing to sacrifice a bit of efficiency to get it, so please don't complain to me about it.
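The Java reference kept in the comment at the end of the file reads each source bit from a separate buffer (Copy) while writing into keyTable. Below is a minimal sketch of the same permutation in C. It assumes Copy is a snapshot of the four words taken before the loop, and that byte offset location maps to word index location / 4. The name permute_bits is made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: permute the low 120 bits of the 128-bit value in
 * values[0..3], where values[3] holds bits 0..31. Destination bit dst
 * receives source bit (11 * dst) % 120. Source bits come from a snapshot,
 * so bits written earlier in the loop cannot leak into later reads. */
static void permute_bits(uint32_t values[4])
{
    uint32_t copy[4];
    memcpy(copy, values, sizeof copy);   /* assumed role of Copy in the Java */

    uint32_t src = 0;
    for (uint32_t dst = 0; dst < 120; dst++) {
        uint32_t bit = (copy[3 - (src >> 5)] >> (src & 0x1F)) & 1u;
        uint32_t shift = dst & 0x1F;
        uint32_t *word = &values[3 - (dst >> 5)];
        *word = (*word & ~(1u << shift)) | (bit << shift);
        src = (src + 11) % 120;
    }
}
```

With only bit 11 set ({0, 0, 0, 0x800}) this produces {0, 0, 0, 0x2}. Reading from values in place would give 0x802 instead, because destination bit 11 takes its value from source bit 1, which has already been overwritten by then.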
Why would we complain to you about it? Not like your code is the first and only option. You're months behind, imo. Have fun :P
Quote from: HdxBmx27 on November 02, 2008, 08:24:35 PM
But now, I'm at the mall hoping the owner of the WiFi I'm on [SSID: Uv been hAcked] will try and hack/talk to me.
lolwut
Quote
System.out.format("M %08X %08X %08X %08X\n", buf[0], buf[1], buf[2], buf[3]);
for(int x = 3; x > -1; x--){
int y = buf[x] * 5;
System.out.println(buf[x] * 5);
buf[x] = decodedByte + y;
decodedByte = y;
}
System.out.format("M %08X %08X %08X %08X\n", buf[0], buf[1], buf[2], buf[3]);
Ya, I noticed that. I'm trying to figure out what makes it different from this:
while(rounds-- > 0){
long param1 = bufA[posA--] & 0x00000000FFFFFFFFl;
long param2 = mulx & 0x00000000FFFFFFFFl;
long edxeax = param1 * param2;
bufB[posB--] = decodedByte + (int)edxeax;
System.out.println(decodedByte + " " + edxeax);
decodedByte = (int)(edxeax >> 32);
}
Just a guess, but I'm pretty sure that
decodedByte = y;
and
decodedByte = (int)(edxeax >> 32);
do radically different things :)
Well duh, I'm years behind. I haven't gotten it working 100% yet. [Who knew a 7 month old could erase 1/2 a file and save it?]
But I just did a comparison, and it runs ~27 times faster in C than it does in Java. [No surprise there] It's helping me learn a bit more about the cdkey system/C. So it's worth it.
Quote from: HdxBmx27 on November 04, 2008, 09:52:39 PM
Well duh, I'm years behind. I haven't gotten it working 100% yet. [Who knew a 7 month old could erase 1/2 a file and save it?]
But I just did a comparison, and it runs ~27 times faster in C than it does in Java. [No surprise there] It's helping me learn a bit more about the cdkey system/C. So it's worth it.
In a year or two this knowledge is going to be all outdated, even more so than it is now.
Quote from: Hdx on November 04, 2008, 09:52:39 PM
But I just did a comparison, and it runs ~27 times faster in C than it does in Java.
I'd be willing to bet money that one or all of the following is true:
* You have hotspot disabled (or have a debugger attached) during the Java sample
* You are compiling the C code to 64-bit machine code, and comparing it to a 32-bit JVM
* You are not logging in your C code, or if you are, you are using stdout instead of cout
If none of these are the case, then you have something seriously wrong with your Java configuration. Hotspot transforms the Java bytecode to machine code that should run similarly to the machine code generated by the C compiler. Since you're not allocating any objects in either case (all your variables are primitives on the stack), the gc vs malloc argument doesn't apply.
That said, hotspot is not as good at optimization as GCC is (which I'm assuming you're using; correct me if I'm wrong), so you'll still find that the C code will run marginally faster, but not by an order of magnitude. In my preliminary tests, it's somewhere between 5% and 30% slower than GCC-generated machine code.
This will have no bearing if you're not logging, but System.out is a stream, so it's not fair to compare it to stdout; use cout in your C code for a fair comparison.
Ya, I had a few problems with the code when I first tested it. As I didn't feel like verifying the results, I just let them both run 0x1000000 times with no logging or anything.
Turns out I had a bug that made the C code not compute 1/2 the crap.
Now I've finished it and Java runs about 70% faster than the C.
Both are still < 1/2 ms per decode, but still.
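Since the earlier numbers were thrown off by C code that skipped half the work, a timing loop that also checks its result seems worth sketching. This is only an illustration: decode_once is a hypothetical stand-in for the real decode routine, and time_decodes is an invented name. It uses clock() from the standard library, which measures CPU time rather than wall-clock time.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one key decode; swap in the real routine. */
static uint32_t decode_once(uint32_t seed)
{
    uint32_t x = seed * 2654435761u;
    return x ^ (x >> 15);
}

/* Runs decode_once `iterations` times and returns the CPU seconds used.
 * Every result is folded into *sink, so the compiler cannot drop the loop
 * and the caller can compare the value against a known-good run. */
static double time_decodes(uint32_t iterations, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; i++)
        acc ^= decode_once(i);
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by the iteration count (0x1000000 in the runs described above) gives the per-decode cost. Comparing the sink value between the C and Java versions is a cheap way to confirm both did the same work before trusting either number.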
Quote from: Hdx on November 07, 2008, 04:31:06 PM
Now I've finished it and Java runs about 70% faster than the C.
Really? Using -O2?
Java does have JIT which supposedly generates some native code at run time. However, I find it hard to believe a highly suggestive and properly structured C code would underperform a Java code. Please compare your C code with a real compiler (e.g. icc). GCC is utter garbage when it comes to performance. Also, take advantage of C99 features (e.g. inline and restrict). Be as suggestive to the compiler as you can (e.g. use register qualifier where appropriate, order code to make vectorization more obvious..."neatness" and "style" have no meaning!). Also order your C code to mix as many operations as possible to reduce register pressure (compiler can generally do this). Finally, JIT probably takes advantage of processor-specific features (e.g. SSE for vectorization), please turn these features on when compiling the C code.
Why do you have to do this in C? Because C is complicated! The compiler cannot make too many assumptions. If you can, code the same program in Fortran and use ifort (again, GCC blows!). Fortran will probably blow both Java and C away because the language is so simple, the compiler can make more aggressive optimizations.
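To make the C99 suggestion concrete, here is a small sketch of inline and restrict in use. The functions rotl32 and rotate_all are invented for the example and have nothing to do with the key decoder. Whether the qualifiers change the generated code depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Small helper the compiler is invited to inline. r must be in 1..31. */
static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));
}

/* restrict is a promise that dst and src never overlap, so the compiler
 * may reorder or vectorize the loads and stores in this loop. */
static void rotate_all(uint32_t *restrict dst, const uint32_t *restrict src,
                       size_t n, unsigned r)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = rotl32(src[i], r);
}
```

Passing overlapping buffers to rotate_all breaks the restrict promise and is undefined behavior, so the qualifier is only safe where the caller can guarantee separate arrays.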
why stop at fortran? assembly is what performance pros use
Quote from: Chavo on November 20, 2008, 01:16:34 AM
why stop at fortran? assembly is what performance pros use
Sometimes, but rarely. You have to know all sorts of details about the processor and machine in order to produce high performance applications in assembler. So, performance pros don't often use assembler. There are a few exceptions. As an example, my friend does numerical linear algebra and works on a cluster of PS3s here; the current C/Fortran compilers for Cell cannot take full advantage of the Cell's vector capabilities (i.e. BLAS and LAPACK would suck!). As a consequence, he wrote some common vector routines in assembler. Almost nobody writes extensive high performance applications in assembler...it's not portable and a human often does a worse job than a compiler.
You know what's sad? Fortran is crazy simple and it is the language of choice for many high performance numerical codes...written by people older than us with less knowledge about the computer. You have to know a good deal about what you're doing to do anything high performance in Assembler or C. Actually, despite my dislike of C++...C++'s templates, inline functions, and reference types (similar to restrict pointers) make it easier for a C++ application to even outperform a C equivalent!
EDIT: P.S. And with regards to your comment, C is actually lower level than Fortran. It would have been more appropriate to say "Why stop at C?".
My comment was not designed to be a serious suggestion as much as a joking comment regarding the silliness of even having the discussion.
The main reason I chose C was because I wanted to become more fluent in it. My code right now is by no means optimized/efficient. JBLS uses ~56 MB of memory just idling, and 3/4ths of the damn thing isn't even written yet! But what I figure is I'll optimize things after I get the bulk of it working. There is no release date; this is purely for my own enjoyment.
I plan to teach myself other languages in the future, Fortran included. But I figured C was a good place to start.
Quote from: Hdx on November 20, 2008, 12:00:30 PM
The main reason I chose C was because I wanted to become more fluent in it. My code right now is by no means optimized/efficient. JBLS uses ~56 MB of memory just idling, and 3/4ths of the damn thing isn't even written yet! But what I figure is I'll optimize things after I get the bulk of it working. There is no release date; this is purely for my own enjoyment.
I plan to teach myself other languages in the future, Fortran included. But I figured C was a good place to start.
Are we talking about Java Battle.net Login Server or Just Another Battle.net Login Server?
Java uses like 90 with everything loaded.
C uses 56
Quote from: nslay on November 20, 2008, 12:41:03 AM
Java does have JIT which supposedly generates some native code at run time. However, I find it hard to believe a highly suggestive and properly structured C code would underperform a Java code. Please compare your C code with a real compiler (e.g. icc). GCC is utter garbage when it comes to performance. Also, take advantage of C99 features (e.g. inline and restrict). Be as suggestive to the compiler as you can (e.g. use register qualifier where appropriate, order code to make vectorization more obvious..."neatness" and "style" have no meaning!). Also order your C code to mix as many operations as possible to reduce register pressure (compiler can generally do this). Finally, JIT probably takes advantage of processor-specific features (e.g. SSE for vectorization), please turn these features on when compiling the C code.
Why do you have to do this in C? Because C is complicated! The compiler cannot make too many assumptions. If you can, code the same program in Fortran and use ifort (again, GCC blows!). Fortran will probably blow both Java and C away because the language is so simple, the compiler can make more aggressive optimizations.
Hotspot does not do much optimization; what's the point of having code that executes fast if it takes five minutes to JIT compile? Hotspot does have the advantage of already having the bytecode instructions, which are much quicker to translate into machine code than C source, but it is nonetheless a non-trivial operation, thereby limiting the jitter's ability to consider optimizations. If a block of code is observed to be used very often, Hotspot may bring it up as a candidate for further optimization. However, for a test like this, GCC with -O2 should beat Hotspot-enhanced Java every time.