In a past life, I was trained to be a hardware engineer. The regular mantra was that the closer the code was to the hardware, the faster it was going to be. That's why Java was so slow (yeah, sure). This morning, I came across an interesting article about how the Java JIT compiler may actually be faster than running Java bytecode directly on the silicon.
Mark Lam goes into serious detail about the differences in execution cycles and memory accesses between the JIT compiler and a Java processor, and I have to say that I'm pretty impressed. I had no idea that the JIT compiler could push things that fast (even in a non-ideal situation), and he makes a point of starting with simple situations and moving on to increasingly difficult problems.
See, the hardware that can execute Java byte code is still limited when it comes to executing complicated instructions. As the instructions get more complicated, they have to interact with the JVM at a deeper level. Take any instruction that may need to deal with the garbage collector: unless the garbage collector is implemented in silicon as well, the processor has to call out to something in the JVM to handle it. In this area, the JIT can really pull ahead, because it can have special knowledge of the JVM's internals, whereas the processor tries to remain fairly generic so that it can support many different garbage collection and thread synchronization algorithms. Of course, there are definitely reasons why you'd want a Java CPU instead of a JIT compiler.
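To make that split concrete, here is a toy model (not based on any real Java processor) in which a few simple bytecodes execute directly "in silicon", while `new`, which needs the allocator and the garbage collector, traps to a software handler standing in for the JVM. The opcode subset, the class name and the `trapToJvm` hook are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy Java processor: simple stack bytecodes are handled "in hardware";
// anything needing the runtime (here only NEW) traps to software.
final class TrapDemo {
    enum Op { ICONST_1, ICONST_2, IADD, NEW }

    int hardwareOps = 0;   // bytecodes executed directly
    int traps = 0;         // bytecodes handed off to the software runtime
    final Deque<Object> stack = new ArrayDeque<>();

    // Hypothetical stand-in for calling out to the JVM (allocator, GC, ...).
    private void trapToJvm(Op op) {
        traps++;
        if (op == Op.NEW) {
            stack.push(new Object()); // allocation is the runtime's job
        } else {
            throw new IllegalStateException("unhandled trap: " + op);
        }
    }

    void run(List<Op> program) {
        for (Op op : program) {
            switch (op) {
                case ICONST_1: stack.push(1); hardwareOps++; break;
                case ICONST_2: stack.push(2); hardwareOps++; break;
                case IADD: {
                    int b = (Integer) stack.pop();
                    int a = (Integer) stack.pop();
                    stack.push(a + b);
                    hardwareOps++;
                    break;
                }
                default: trapToJvm(op);
            }
        }
    }

    public static void main(String[] args) {
        TrapDemo cpu = new TrapDemo();
        cpu.run(Arrays.asList(Op.ICONST_1, Op.ICONST_2, Op.IADD, Op.NEW));
        System.out.println("hardware ops: " + cpu.hardwareOps + ", traps: " + cpu.traps);
    }
}
```

Running it reports three bytecodes handled directly and one trap. The point is only that every trapped bytecode costs a round trip into software, which is the cost the article attributes to complex instructions.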
Back in my senior design project in college, we used a Java-based microprocessor in a very memory-constrained environment. It's in environments like these that Java in the processor makes sense. In tight memory situations, you may not be able to spare the overhead of a JIT compiler, and you may not be able to afford the extra startup time the JIT needs. Here, long-term speed may not be your primary concern, and being able to write embedded systems in Java instead of C is certainly a nice change of pace.
I think it’s a wonderful testament to the power and progress that Java has made over the years. The fact that, in many cases, Java is simply the fastest-running option available is something many of us would have scoffed at years ago. Now Java runs on nearly everything from PCs to phones to toasters (OK, maybe not reliably on toasters!), and I think Java as a platform has a long life ahead of it.
When is software faster than hardware?
Posted by Matthew Schmidt at 7:54 AM on Feb 14, 2007.
Re: When is software faster than hardware?
> See, the hardware that can execute Java byte code is still
> limited when it comes to executing complicated instructions
Processors are generally limited.
If you have complicated instructions, they increase the overall time taken to execute all the other instructions.
That is why complex instructions are split into multiple simple instructions or passed to a different processing unit.
Software faster than hardware?
Am I crazy, or are you taking a Java problem (complicated bytecode that was not designed for processors) and blaming it on the hardware?
Re: When is software faster than hardware?
The simple add example in the original article is modeled incorrectly for a Java processor. The author counts the JVM interpreter's steps in machine cycles, which is not how a Java processor usually executes bytecode. All those actions are performed in parallel - that's what pipelining is in a standard processor and also in a Java processor.
So the simple instruction sequence iload_0, iload_1, iadd, istore_1 takes 4 cycles on a simple Java processor (and not 10 machine cycles). Furthermore, this instruction sequence does not access memory. Java processors usually have a few stack elements and local variables in registers. This is the same as a register file in a standard RISC processor.
There is also a comparison between a Java processor and JIT on the ARM available:
http://www.jopdesign.com/perf.jsp
In this comparison, a simple Java processor outperforms the JIT/ARM solution.
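As a sanity check on the stack effects of the four-bytecode sequence iload_0, iload_1, iadd, istore_1: the method `add` below is one that javac compiles to exactly that sequence, followed by `iload_1`, `ireturn` (you can confirm with `javap -c`). The sketch then replays the four bytecodes on an explicit operand stack, with the `locals` array standing in for the on-chip register file, and counts one cycle per bytecode. That one-cycle cost is the assumption stated above for a simple pipelined Java processor, so the count is a property of the model, not a measurement. The class and method names are hypothetical.

```java
// javac compiles add() to: iload_0, iload_1, iadd, istore_1, iload_1, ireturn
final class AddSequence {
    static int add(int a, int b) {
        b = a + b;
        return b;
    }

    // Replays iload_0, iload_1, iadd, istore_1 on an explicit operand stack.
    // Returns {cycles, locals[1]}, assuming one cycle per bytecode.
    static int[] replay(int local0, int local1) {
        int[] locals = { local0, local1 }; // "register file" for the locals
        int[] stack = new int[2];          // max stack depth of this sequence is 2
        int sp = 0;                        // next free stack slot
        int cycles = 0;

        stack[sp++] = locals[0]; cycles++; // iload_0
        stack[sp++] = locals[1]; cycles++; // iload_1
        int rhs = stack[--sp];             // iadd: pop two operands, push the sum
        int lhs = stack[--sp];
        stack[sp++] = lhs + rhs; cycles++;
        locals[1] = stack[--sp]; cycles++; // istore_1

        return new int[] { cycles, locals[1] };
    }

    public static void main(String[] args) {
        int[] r = replay(3, 4);
        System.out.println("cycles=" + r[0] + " result=" + r[1] + " add=" + add(3, 4));
    }
}
```

Nothing in the replay touches anything but the two small arrays, which mirrors the point that this sequence needs no memory access when locals and the top of stack live in registers.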
Re: When is software faster than hardware?
> All those actions are performed in
> parallel - that's what pipelining is in a standard
> processor and also in a Java processor.
Since pipelining takes effect in both the Java processor and the native processor, won't the savings in cycles be similar?
It's really hard to measure the impact of pipelining, especially in superscalar CPUs, so I think Mark simplified, as he said, for the theoretical case.
I intend to read the article you linked more closely; it seems rather interesting, and from a brief glance it fits perfectly with Mark's article. It seems to me that it illustrates the best test cases for a JPU from the other direction.
> So the simple instruction sequence iload_0, iload_1,
> iadd, istore_1 takes 4 cycles on a simple Java
> processor (and not 10 machine cycles). Furthermore,
> this instruction sequence does not access memory.
> Java processors usually have a few stack elements and
> local variables in registers. This is the same as a
> register file in a standard RISC processor.
Isn't it the same with a JIT in a RISC processor that has enough registers?
A JIT can even make better use of the registers based on profiling data.
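A JIT on a RISC machine can get the same effect by simulating the operand stack at compile time: each stack slot is tracked as a register name rather than a value, so the loads generate no code and no stack traffic survives into the output. The sketch below does this for the four-bytecode example. The `add`/`mov` mnemonics and the register naming (`r0`, `r1` for locals, `t0`, `t1`, ... for temporaries) are hypothetical and not taken from any real JIT.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Translates a tiny bytecode subset into three-address RISC-style code by
// tracking, at compile time, which register holds each operand-stack entry.
final class StackToRegisters {
    static List<String> translate(String[] bytecodes) {
        Deque<String> stack = new ArrayDeque<>(); // register names, not values
        List<String> out = new ArrayList<>();
        int temps = 0;
        for (String bc : bytecodes) {
            switch (bc) {
                case "iload_0": stack.push("r0"); break; // local 0 lives in r0
                case "iload_1": stack.push("r1"); break; // local 1 lives in r1
                case "iadd": {
                    String rhs = stack.pop();
                    String lhs = stack.pop();
                    String dst = "t" + temps++;
                    out.add("add " + dst + ", " + lhs + ", " + rhs);
                    stack.push(dst);
                    break;
                }
                case "istore_1": out.add("mov r1, " + stack.pop()); break;
                default: throw new IllegalArgumentException("unsupported: " + bc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        translate(new String[] { "iload_0", "iload_1", "iadd", "istore_1" })
            .forEach(System.out::println);
    }
}
```

Only two instructions come out, `add t0, r0, r1` and `mov r1, t0`; a slightly smarter pass could target `r1` directly and emit a single `add r1, r0, r1`.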
Re: When is software faster than hardware?
> > All those actions are performed in
> > parallel - that's what pipelining is in a standard
> > processor and also in a Java processor.
>
> Since pipelining takes effect in both the Java
> processor and the native processor, won't the
> savings in cycles be similar?
The issue in the original article was that the author counted work such as loading a local variable and pushing it onto the TOS as sequential operations, both on the interpreting JVM and on a JPU. However, in a JPU this is a pipelined action taking just a single cycle for all the work.
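The two ways of counting can be compared with the textbook ideal-pipeline model: with S stages and N instructions, a machine that finishes each instruction before starting the next needs N × S cycles, while a stall-free pipeline needs S + N - 1, i.e. one instruction completes per cycle once the pipeline is full. The three-stage split and the absence of stalls are assumptions for illustration, not the design of any particular Java processor or RISC.

```java
// Ideal pipeline with no hazards or stalls (an assumption real designs only approximate).
final class PipelineCycles {
    // Every instruction passes through all stages before the next one starts.
    static long sequential(int stages, int instructions) {
        return (long) stages * instructions;
    }

    // A new instruction enters each cycle; the first needs `stages` cycles to finish.
    static long pipelined(int stages, int instructions) {
        if (instructions == 0) return 0;
        return stages + instructions - 1L;
    }

    public static void main(String[] args) {
        int stages = 3;        // hypothetical fetch / decode / execute
        int instructions = 4;  // iload_0, iload_1, iadd, istore_1
        System.out.println("sequential: " + sequential(stages, instructions));
        System.out.println("pipelined:  " + pipelined(stages, instructions));
    }
}
```

Under these assumptions the four bytecodes take 12 cycles sequentially and 6 pipelined; as the instruction stream gets longer, the pipelined cost approaches one cycle per bytecode, which is the sense in which each bytecode "takes a single cycle".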
> It's really hard to measure the impact of pipelining,
> especially in superscalar CPUs, so I think Mark
> simplified, as he said, for the theoretical case.
I agree. The ARM and the available Java processors (aJile, picoJava, JOP, Komodo, ...) are all simple scalar designs, with no out-of-order execution and none of the resulting complexity.
> > So the simple instruction sequence iload_0, iload_1,
> > iadd, istore_1 takes 4 cycles on a simple Java
> > processor (and not 10 machine cycles). Furthermore,
> > this instruction sequence does not access memory.
> > Java processors usually have a few stack elements and
> > local variables in registers. This is the same as a
> > register file in a standard RISC processor.
>
> Isn't it the same with a JIT in a RISC processor that
> has enough registers?
Yes, it should result in the same number of cycles. A small advantage of Java bytecode is that you can actually address more 'registers' than are usually available on a RISC: the locals (up to 255 with a one-byte index) and the elements on the stack. A stack is not ideal for execution, but it is a very regular structure, which makes it 'cache friendly'.
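The 255 figure comes from the instruction encoding: `iload` carries a one-byte unsigned local index, the four lowest slots have one-byte shorthand forms (`iload_0` to `iload_3`), and only indices above 255 need the `wide` prefix with a two-byte index. The helper below picks the shortest encoding using the opcode values from the JVM specification; the class name is hypothetical.

```java
// Encodes a JVM `iload` of local variable `index` in its shortest form.
// Opcode values are from the JVM specification (chapter 6).
final class IloadEncoding {
    static final int ILOAD_0 = 0x1a; // iload_0..iload_3 are 0x1a..0x1d
    static final int ILOAD = 0x15;
    static final int WIDE = 0xc4;

    static byte[] encode(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("local index out of range: " + index);
        }
        if (index <= 3) {
            return new byte[] { (byte) (ILOAD_0 + index) };    // 1 byte
        }
        if (index <= 0xFF) {
            return new byte[] { (byte) ILOAD, (byte) index };  // 2 bytes
        }
        return new byte[] { (byte) WIDE, (byte) ILOAD,         // 4 bytes
                            (byte) (index >>> 8), (byte) index };
    }

    public static void main(String[] args) {
        for (int i : new int[] { 1, 200, 300 }) {
            System.out.println("local " + i + " -> " + encode(i).length + " byte(s)");
        }
    }
}
```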
> A JIT can even make better use of the registers based
> on profiling data.
I don't think that profiling at this granularity is useful. Would you really profile register usage at runtime?
Re: When is software faster than hardware?
> The issue in the original article was that the author
> counted work such as loading a local variable and
> pushing it onto the TOS as sequential operations,
> both on the interpreting JVM and on a JPU. However,
> in a JPU this is a pipelined action taking just a
> single cycle for all the work.
Correct me if I'm wrong here, but a native instruction set is also multi-pipelined, so it would be able to reduce the number of instructions by at least the same factor as a typical JPU, right?
I'm sorry if I'm being dense but I don't understand why the JPU's multiple pipelines have an advantage over a general purpose CPU's pipeline.
> > A JIT can even make better use of the registers based
> > on profiling data.
>
> I don't think that profiling at this granularity is
> useful. Would you really profile register usage at
> runtime?
No but you would profile to see which are the most frequently accessed methods. These methods which are a bottleneck would need extensive optimization which can make use of available registers from the pool.
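Profiling at method granularity, as described here, can be as simple as an invocation counter per method that flags a method for aggressive optimization once it crosses a threshold. The sketch below assumes a hypothetical `HotMethodCounter` class and an arbitrary threshold; production VMs such as HotSpot use more elaborate policies (loop back-edge counters, tiered compilation) that are not modeled here.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Counts invocations per method and records which ones have become "hot".
final class HotMethodCounter {
    private final int threshold;                 // arbitrary, for illustration
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> hot = new LinkedHashSet<>();

    HotMethodCounter(int threshold) {
        this.threshold = threshold;
    }

    // Called on every method entry; returns true the first time the method turns hot.
    boolean onInvoke(String method) {
        int n = counts.merge(method, 1, Integer::sum);
        return n == threshold && hot.add(method);
    }

    Set<String> hotMethods() {
        return hot;
    }

    public static void main(String[] args) {
        HotMethodCounter profiler = new HotMethodCounter(1000);
        for (int i = 0; i < 5000; i++) {
            profiler.onInvoke("Vector.dot");                  // inner-loop method
            if (i % 100 == 0) profiler.onInvoke("Config.load");
        }
        System.out.println("optimize: " + profiler.hotMethods());
    }
}
```

Only the method that dominates the run crosses the threshold, so the expensive optimizations (including careful register allocation) are spent where they pay off.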
Re: When is software faster than hardware?
> Correct me if I'm wrong here, but a native instruction
> set is also multi-pipelined, so it would be able to
> reduce the number of instructions by at least the
> same factor as a typical JPU, right?
You're right. However, the author of the original article showed instructions executing sequentially on a hypothetical JPU, whereas a real JPU pipelines them. There is no special benefit in a JPU: it's just the same (and not worse for a JPU compared to JIT compilation to a RISC).
> I'm sorry if I'm being dense but I don't understand
> why the JPU's multiple pipelines have an advantage
> over a general purpose CPU's pipeline.
No advantage, just the same; see above.
> No but you would profile to see which are the most
> frequently accessed methods. These methods which are
> a bottleneck would need extensive optimization which
> can make use of available registers from the pool.
Hmm. In my understanding of optimization, register allocation is primarily considered at the method level. Even if one does it inter-procedurally, it's still not that easy when you only have the invocation frequency of a single method.
Re: When is software faster than hardware?
> Hmm. In my understanding of optimization, register
> allocation is primarily considered at the method level.
> Even if one does it inter-procedurally, it's still not
> that easy when you only have the invocation frequency
> of a single method.
I didn't mean to imply assigning a field to a register, but rather a stack frame entry; if I did, that was a mistake.
There is, however, a way to extend the scope in which that register is used through inlining, as Mark explains in his follow-up here:
http://weblogs.java.net/blog/mlam/archive/2007/02/software_territ_1.html
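A source-level picture of what inlining buys a register allocator: in `sumSquares`, a per-method allocator sees the loop and `square` separately and must follow the calling convention at every call. After inlining (done by hand in `sumSquaresInlined`, as a JIT would do internally), the whole loop is one method body, so `total` and `v` are locals of a single frame that can stay in registers for the entire loop. The method names are hypothetical, and a real JIT performs this transformation on its intermediate representation, not on source code.

```java
// Before inlining: the loop calls a tiny helper, so the allocator sees two methods
// and must follow the calling convention at every call.
final class InliningDemo {
    static int square(int x) {
        return x * x;
    }

    static int sumSquares(int[] values) {
        int total = 0;
        for (int v : values) {
            total += square(v);
        }
        return total;
    }

    // After inlining (by hand here): one method body, so `total` and `v`
    // are locals of a single frame that can live in registers for the whole loop.
    static int sumSquaresInlined(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v * v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3, 4 };
        System.out.println(sumSquares(data) + " " + sumSquaresInlined(data));
    }
}
```

Both versions compute the same result; the difference is only in how much code a single register-allocation pass can see at once.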
Re: When is software faster than hardware?
Very interesting and nice article indeed.
Re: When is software faster than hardware?
Something to ponder on. Long live Java!