Python + Psyco vs Java vs C++ vs C
Posted on July 11th, 2008 in C/C++, Java, Python |
In my last post, I demonstrated how Psyco can be used to speed up Python code using the example of a brute-force password cracker. To see how the speed of the Python-Psyco combination compared to the speed of other languages I ported to code to Java, C++ and C. In each language I wrote the code as if it were a regular program written in that language (i.e. I didn’t try to optimize the code) - in C++ I used C++ STL Strings, in Java I used the StringBuilder class and in C I used char arrays.
The timings for the Python+Psyco code are as follows (see last post for full details):
Password | Time taken zzzz | 0m0.134s zzzzz | 0m3.486s zzzzzz | 2m8.056s
Here is the Java code (I’m not a Java programmer at all - I just know the basics so I apologise if the code is poorly written):
class BrutePass { static int count = 0; public static void main( String args[] ) { if( args.length < 1 ) { System.out.println( "Usage: java BrutePass <password>" ); return; } String pwd = args[0]; int maxlen = 10; StringBuilder p = new StringBuilder(); for( int w=0; w<maxlen; w++ ) if( crack( w+1, 0, p, pwd ) > 0 ) break; System.out.println( "Found password " + p.toString() + " with " + count + " attempts." ); } public static int crack( int w, int pos, StringBuilder p, String pwd ) { String chars = "abcdefghijklmnopqrstuvwxyz"; for( int i=0; i<chars.length(); i++ ) { p.append( chars.charAt( i ) ); if( pos == w-1 ) { count++; if( pwd.compareTo( p.toString() ) == 0 ) return 1; } int ret = 0; if( pos < w-1 ) ret = crack( w, pos+1, p, pwd ); if( ret > 0 ) return 1; p.delete( p.length() - 1, p.length() ); } return 0; } }
Here are the timings for the java program:
Password | Time taken zzzz | 0m0.176s zzzzz | 0m1.334s zzzzzz | 0m31.369s
Much faster than the Psyco+Python version. The reason that the four-character password appears to take longer to crack in Java than in Python is that the JVM takes a while to get going. Now for the C++ code:
#include <iostream> #include <string> using namespace std; int crack( int w, int pos, string& p, string& pass ); int main( int argc, char **argv ) { if( argc < 2 ) { cout << "Usage: " << argv[0] << " <password>" << endl; return( 0 ); } int maxlen = 10; if( argc > 2 ) maxlen = atoi( argv[2] ); string pwd( argv[1] ); string p; int count; for( int i=0; i<maxlen; i++) if( count = crack( i+1, 0, p, pwd ) ) break; cout << "Found password " << p << " with " << count << " attempts." << endl; return 0; } int crack( int w, int pos, string& p, string& pass ) { static int count = 0; string chars( "abcdefghijklmnopqrstuvwxyz" ); for( int i=0; i<chars.length(); i++) { p.push_back( chars[i] ); if( pos == w-1 ) { if( p == pass ) return( count ); count++; } int ret = 0; if( pos < w-1 ) ret = crack( w, pos+1, p, pass ); if( ret ) return( count ); p.erase( p.end()-1 ); } return( 0 ); }
And the timings:
Password | Time taken zzzz | 0m0.036s zzzzz | 0m0.730s zzzzzz | 0m18.999s
I’d heard that Java is in many cases faster than C++; clearly this is not one of those cases. Now the C version:
#include <stdio.h> #include <stdlib.h> #include <string.h> unsigned long crack( int w, int pos, char *p, char *pass ); int main( int argc, char **argv ) { int maxlen = 10, i; unsigned long count; char *pwd, *p; if( argc < 2 ) { printf( "Usage: %s <password>\n", argv[0] ); return( 1 ); } if( argc > 2 ) maxlen = atoi( argv[2] ); pwd = argv[1]; p = (char *)calloc( maxlen + 1, sizeof(char) ); for( i=0; i<maxlen; i++) if( count = crack( i+1, 0, p, pwd ) ) break; printf( "Found password %s with %lu attempts.\n", p, count ); return 0; } unsigned long crack( int w, int pos, char *p, char *pass ) { static unsigned long count = 0; int i, ret; char *chars = "abcdefghijklmnopqrstuvwxyz"; for( i=0; i<strlen(chars); i++) { p[strlen(p)] = chars[i]; if( pos == w-1 ) { count++; if( !strcmp( p, pass ) ) return( count ); } ret = 0; if( pos < w-1 ) ret = crack( w, pos+1, p, pass ); if( ret ) return( count ); p[strlen(p)-1] = 0; } return( 0 ); }
And the timings:
Password | Time taken zzzz | 0m0.022s zzzzz | 0m0.381s zzzzzz | 0m9.663s
That’s pretty much twice the speed of the C++ version! This shows how much overhead there is for using STL Strings rather than C strings.
Although Psyco provided a fantastic speed boost, it has not eliminated the need to know C as some people have claimed - the native C version of the program performed 13.25x faster than the Psyco version.
9 Responses
I searched for \’Hi Speed Usb Host Controller\’ in google and found this your post (\’o.us poetry\’) in search results. Not very relevant result, but still interesting to read.
On the Java version:
First off, if you are not using the server VM for Java then do so using java -server. Run java -version to see if you are (it will default on for “server class” machines running Unix). If you are running Windows then you need to use the JDK not a JRE to get the server VM. The server vm is around 40% faster for me.
Second, you are creating a new string to compare against in this line: if( pwd.compareTo( p.toString() ) == 0 )
This is expensive because it has to copy the character array (because Strings are immutable, but the StringBuilder could change it).
Just use the following method to do the comparison:
static boolean eq(CharSequence a, CharSequence b) {
if (a.length() != b.length()) return false;
for (int i = 0; i < a.length(); i++) {
if (a.charAt(i) != b.charAt(i)) return false;
}
return true;
}
and call it like this: if( eq(pwd, p)). This gives me a huge improvement (with the server VM)
By the way, there is no reason to use compareTo here. You don’t need ordering so you could just have used .equals()
@Asd: Thanks for that - I’m not at all a Java programmer (I spent about 10 mins on the Java trail website to learn enough to be able to write the code that is in the post). I’m not using the server version of java; I’m primarily on a Mac (+ Linux) so I’ll have a go at installing that and making the changes to the code that you suggested.
Don’t forget that Java is doing maybe twice as much work as C/C++ versions, because sizeof(char) == 2 in Java, but 1 in C & C++.
It would be more cumbersome to rewrite using byte arrays, but that would be necessary for an apples-to-apples comparison - or alternatively, using wchar_t in C & C++ (wstring etc.).
And using a char array instead of a StringBuilder also boosts performance for Java. It gives me a 3x speed increase.
Of course, it’s a bit more difficult to write, but since you compared C++ strings with C char *, I think the comparison is fair.
Here is my current version:
public static void main(String args[]) {
if (args.length < 1) {
System.out.println(”Usage: java BrutePass “);
return;
}
String pwd = args[0];
int maxlen = 10;
long start = System.currentTimeMillis();
char[] buff = new char[maxlen];
String found = “”;
for (int w = 0; w < maxlen; w++) {
found = crack(w + 1, 0, buff, pwd);
if (found != null) {
break;
}
}
long end = System.currentTimeMillis();
System.out.println(”Found password ” + found + ” with ” + count + ” attempts in ”
+ (end - start) + ” ms.”);
}
private static final String CHARS = “abcdefghijklmnopqrstuvwxyz”;
private static final char[] CHARS_A = CHARS.toCharArray();
static boolean eq(char[] a, int al, char[] b, int bl) {
if (al != bl) {
return false;
}
for (int i = 0; i < al; ++i) {
if (a[i] != b[i]) {
return false;
}
}
return true;
}
public static String crack(int w, int pos, char[] p, String pwd) {
final char[] pb = pwd.toCharArray();
for (int i = 0; i < CHARS_A.length; i++) {
p[pos] = CHARS_A[i];
if (pos == w - 1) {
count++;
if (eq(pb, pb.length, p, pos + 1))
return new String(p, 0, pos + 1);
}
String ret = null;
if (pos < w - 1)
ret = crack(w, pos + 1, p, pwd);
if (ret != null)
return ret;
}
return null;
}
Probably all you have to do is add -server to the command line. I’d guess it is available on Macs.
You also used a-z&0-9 in the python version and “only” a-z in the C/C++/Java versions.
@laurentvaucher: Yea, I was trying to compare the standard language constructs rather than providing the most optimized solution - the fastest C++ solution would be using char arrays, and the fastest python solutions would be implementing it in c! Nice implementation though - comments on this page have taught me quite a lot about java!
@mphipps: good point - i actually used a slightly different version of the python one when i was originally trying out the benchmarks where only a-z was used. it made a bit of difference but as the passwords were repeated z’s rather than 9’s it didn’t make a vast amount of difference.