Discussion:
Two poorly optimized functions in SQLParser.java
(too old to reply)
RPost
2004-12-22 22:45:04 UTC
Permalink
Does anyone have any experience or knowledge about the JavaCC process used to produce SQLParser.java?

There are two functions, jj_3R_315() and jj_3R_338(), in SQLParser.java that are very poorly optimized.

jj_3R_315 contains around 115 nested IF statements and jj_3R_338 contains over 200 nested IF statements.

The statement: PreparedStatement pstmt = con.prepareStatement("insert into table1 values (?)");

winds up calling jj_3R_315 three times. For each call the current token position is at the "(" and the function executes the entire chain of 115 calls:

if (jj_scan_token(208)) {
jj_scanpos = xsp;
if (jj_scan_token(209)) {
jj_scanpos = xsp;
... over 100 more nested if statements before finishing.

A stack trace shows that the actual call to jj_3R_315 is about 15 layers deep:

jj_3_9
jj_3R_51
jj_3R_91
jj_3R_167, which calls 204, 261, 289, 310, 322, 333, 337, 343, jj_3R_315

Most of the calling functions do no more than call one function and returns its boolean result.

I have not been able to create code that calls the jj_3R_338 function but it is similar in structure with a depth of over 200 nested 'if' statements. Each 'if' calls the jj_scan_token() function.

The sysinfo for this is:


D:\cloudscape_derby\test>d:\jdk1.3.1\bin\java -classpath .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar org.apache.derby.tools.sysinfo
------------------ Java Information ------------------
Java Version: 1.3.1
Java Vendor: Sun Microsystems Inc.
Java home: d:\jdk1.3.1\jre
Java classpath: .;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
OS name: Windows 2000
OS architecture: x86
OS version: 5.0
Java user name: Administrator
Java user home: C:\Documents and Settings\Administrator
Java user dir: D:\cloudscape_derby\test
--------- Derby Information --------
[D:\cloudscape_derby\lib\derby.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbynet.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbytools.jar] 10.0.2.1 - (56458)
------------------------------------------------------
----------------- Locale Information -----------------
------------------------------------------------------

Each of the three executions appear to take less than 1ms so I don't know how long it is really taking to execute the functions but these functions should probably be produced with 'switch' statements like many of the other functions in this file are.

Preparation time for the 'insert' statement averages about 250ms. Inserting 100 records with:

for (int i = 0; i < 100; i++)
{
pstmt.setInt(1, i);
lStart = System.currentTimeMillis();
pstmt.execute();
lEnd = System.currentTimeMillis();
System.out.println(i + " " + (lEnd - lStart));
}

averages about 16ms for each of the 1st 30 records and then about 266ms for each of the remaining 70 records.
Dain Sundstrom
2004-12-27 18:55:57 UTC
Permalink
You really never know what is going to happen to code once it is passes
through the JIT compiler. The "if" statement may be converted to a
switch statement. The "if" statement may be inlined. The only way to
tell is to time the code in isolation. Can you write a test case that
tests just the parser code in isolation? If you can write a isolated
test, don't forget to run the test twice to force the JIT compiler to
kick in and then throw out the first result. One final thing, a VM
can't really accurately measure a single period shorter then 500 ms. I
normally do some work in a for loop with a few thousand or million
iterations depending on the thing being measured. I time the entire
loop, and divid by the number of iterations to get to the time for a
single loop (normally in nanoseconds).

-dain
Post by RPost
Does anyone have any experience or knowledge about the JavaCC process
used to produce SQLParser.java?
 
There are two functions, jj_3R_315() and jj_3R_338(), in
SQLParser.java that are very poorly optimized.
 
jj_3R_315 contains around 115 nested IF statements and jj_3R_338
contains over 200 nested IF statements.
 
The statement:    PreparedStatement pstmt =
con.prepareStatement("insert into table1 values (?)");
winds up calling jj_3R_315 three times. For each call the current
token position is at the "(" and the function executes the entire
 
    if (jj_scan_token(208)) {
    jj_scanpos = xsp;
    if (jj_scan_token(209)) {
    jj_scanpos = xsp;
    ... over 100 more nested if statements before finishing.
 
A stack trace shows that the actual call to jj_3R_315 is about 15
 
jj_3_9
  jj_3R_51
    jj_3R_91
      jj_3R_167, which calls 204, 261, 289, 310, 322, 333, 337, 343,
jj_3R_315
 
Most of the calling functions do no more than call one function and
returns its boolean result.
 
I have not been able to create code that calls the jj_3R_338 function
but it is similar in structure with a depth of over 200 nested 'if'
statements. Each 'if' calls the jj_scan_token() function.
 
 
D:\cloudscape_derby\test>d:\jdk1.3.1\bin\java -classpath
.;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
org.apache.derby.tools.sysinfo
------------------ Java Information ------------------
Java Version:    1.3.1
Java Vendor:     Sun Microsystems Inc.
Java home:       d:\jdk1.3.1\jre
Java classpath: 
.;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
OS name:         Windows 2000
OS architecture: x86
OS version:      5.0
Java user name:  Administrator
Java user home:  C:\Documents and Settings\Administrator
Java user dir:   D:\cloudscape_derby\test
--------- Derby Information --------
[D:\cloudscape_derby\lib\derby.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbynet.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbytools.jar] 10.0.2.1 - (56458)
------------------------------------------------------
----------------- Locale Information -----------------
------------------------------------------------------
Each of the three executions appear to take less than 1ms so I don't
know how long it is really taking to execute the functions but these
functions should probably be produced with 'switch' statements like
many of the other functions in this file are.
 
Preparation time for the 'insert' statement averages about 250ms.
 
    for (int i = 0; i < 100; i++)
    {
      pstmt.setInt(1, i);
      lStart = System.currentTimeMillis();
      pstmt.execute();
      lEnd   = System.currentTimeMillis();
      System.out.println(i + " " + (lEnd - lStart));
    }
averages about 16ms for each of the 1st 30 records and then about
266ms for each of the remaining 70 records.
 
 
RPost
2004-12-27 19:13:49 UTC
Permalink
I don't see how the presence or absence of a JIT compiler is relevant.

These extremely nested 'if' statements are part of source code
(SQLParser.java) that is being produced by JavaCC. The source file is then
compiled during build to a class file.

Depending on the JVM being used at run-time there may not even be a JIT
compiler.

The decompiled class file, produced by either Sun's provided 'javac' or
Jikes, also shows a nested if with a separate function call as shown in the
source code.

I am merely suggesting that it would be more efficient if JavaCC produced
'switch' statements for these rather than the nested 'if' statements.

So far, I haven't been able to determine how to modify the sqlgrammar.jj
file to achieve this.

----- Original Message -----
From: "Dain Sundstrom" <dsundstrom-FwHK5tIWHUxWk0Htik3J/***@public.gmane.org>
To: "Derby Development" <derby-dev-***@public.gmane.org>
Sent: Monday, December 27, 2004 10:55 AM
Subject: Re: Two poorly optimized functions in SQLParser.java


You really never know what is going to happen to code once it is passes
through the JIT compiler. The "if" statement may be converted to a
switch statement. The "if" statement may be inlined. The only way to
tell is to time the code in isolation. Can you write a test case that
tests just the parser code in isolation? If you can write a isolated
test, don't forget to run the test twice to force the JIT compiler to
kick in and then throw out the first result. One final thing, a VM
can't really accurately measure a single period shorter then 500 ms. I
normally do some work in a for loop with a few thousand or million
iterations depending on the thing being measured. I time the entire
loop, and divid by the number of iterations to get to the time for a
single loop (normally in nanoseconds).

-dain
Post by RPost
Does anyone have any experience or knowledge about the JavaCC process
used to produce SQLParser.java?
There are two functions, jj_3R_315() and jj_3R_338(), in
SQLParser.java that are very poorly optimized.
jj_3R_315 contains around 115 nested IF statements and jj_3R_338
contains over 200 nested IF statements.
The statement: PreparedStatement pstmt =
con.prepareStatement("insert into table1 values (?)");
winds up calling jj_3R_315 three times. For each call the current
token position is at the "(" and the function executes the entire
if (jj_scan_token(208)) {
jj_scanpos = xsp;
if (jj_scan_token(209)) {
jj_scanpos = xsp;
... over 100 more nested if statements before finishing.
jj_3_9
jj_3R_51
jj_3R_91
jj_3R_167, which calls 204, 261, 289, 310, 322, 333, 337, 343,
jj_3R_315
Most of the calling functions do no more than call one function and
returns its boolean result.
I have not been able to create code that calls the jj_3R_338 function
but it is similar in structure with a depth of over 200 nested 'if'
statements. Each 'if' calls the jj_scan_token() function.
D:\cloudscape_derby\test>d:\jdk1.3.1\bin\java -classpath
.;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
org.apache.derby.tools.sysinfo
------------------ Java Information ------------------
Java Version: 1.3.1
Java Vendor: Sun Microsystems Inc.
Java home: d:\jdk1.3.1\jre
.;..\lib\derby.jar;..\lib\derbynet.jar;..\lib\derbytools.jar
OS name: Windows 2000
OS architecture: x86
OS version: 5.0
Java user name: Administrator
Java user home: C:\Documents and Settings\Administrator
Java user dir: D:\cloudscape_derby\test
--------- Derby Information --------
[D:\cloudscape_derby\lib\derby.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbynet.jar] 10.0.2.1 - (56458)
[D:\cloudscape_derby\lib\derbytools.jar] 10.0.2.1 - (56458)
------------------------------------------------------
----------------- Locale Information -----------------
------------------------------------------------------
Each of the three executions appear to take less than 1ms so I don't
know how long it is really taking to execute the functions but these
functions should probably be produced with 'switch' statements like
many of the other functions in this file are.
Preparation time for the 'insert' statement averages about 250ms.
for (int i = 0; i < 100; i++)
{
pstmt.setInt(1, i);
lStart = System.currentTimeMillis();
pstmt.execute();
lEnd = System.currentTimeMillis();
System.out.println(i + " " + (lEnd - lStart));
}
averages about 16ms for each of the 1st 30 records and then about
266ms for each of the remaining 70 records.
Dain Sundstrom
2004-12-27 19:41:16 UTC
Permalink
Post by RPost
I don't see how the presence or absence of a JIT compiler is relevant.
Your kidding, right? If not, please take some time to look at VM
benchmarks over the last few years. It shows that JITs have added
several orders of magnitude to the performance of Java.
Post by RPost
These extremely nested 'if' statements are part of source code
(SQLParser.java) that is being produced by JavaCC. The source file is then
compiled during build to a class file.
If you can convert it into a switch statement, so can the JIT compiler.
Post by RPost
Depending on the JVM being used at run-time there may not even be a JIT
compiler.
Every major VM has a JIT compiler, so there we should play to them. If
the vm doesn't have a jit, you have much bigger issues.
Post by RPost
The decompiled class file, produced by either Sun's provided 'javac' or
Jikes, also shows a nested if with a separate function call as shown in the
source code.
I am merely suggesting that it would be more efficient if JavaCC produced
'switch' statements for these rather than the nested 'if' statements.
So far, I haven't been able to determine how to modify the
sqlgrammar.jj
file to achieve this.
Seriously, if you believe that this is a real problem with performance,
prove it. This is a policy we have had in the geronimo project since
the beginning. There have been several times where people believed
that certain code was slow, tried to prove it, and it turned out it was
another chunk of code. Also, without data you don't know if your
solution is optional (or possibly worse). Anyways, I suggest you time
the code in isolation as it stands, and then hand modify the generated
code as you describe above and time it again.

IMNOHO, only if it is a true provable performance problem should the
code be changed to address the problem. This is because normally
addressing such issues introduces complexity, and this complexity adds
a tax to all future development. Also, complex code is more difficult
for a JIT to understand so it can cut you off from future innovation in
JIT technology (ask that guys that wrote the first XML parsers in
java).

-dain
Daniel John Debrunner
2004-12-27 20:21:44 UTC
Permalink
Post by Dain Sundstrom
Seriously, if you believe that this is a real problem with performance,
prove it. This is a policy we have had in the geronimo project since
the beginning. There have been several times where people believed that
certain code was slow, tried to prove it, and it turned out it was
another chunk of code.
Another way to look at it is to say what difference would it make to
real world applications if you made this piece of code, say, ten times
faster?

Hopefully any well written JDBC application spends less than 1% of its
time compiling SQL statements. The compile phase is made up of four
stages, parse, bind, optimise and generate. Parse time is probably the
quickest stage, let's say 15% of the compile time. So if your ten times
improvement made parsing twice as fast, you will only improve overall
performance by less than 0.1%.

Traditionally performance work on Cloudscape mainly focussed on making
execution time improvements, not compile time improvements.

Dan.
RPost
2004-12-27 21:14:38 UTC
Permalink
Now your the one that's kidding right?

You gave a great answer. Too bad it wasn't related to the question.
Post by RPost
Does anyone have any experience or knowledge about the JavaCC process
used to produce SQLParser.java?
I will assume your answer to that one is 'no'.

I'm trying to determine why JavaCC is spitting out code with 115 and 200
nested IFs rather than the (much) more efficient SWITCH statement it puts
out for most of the other code performing similar functionality.

I have generally found in the past that 115 executions of a function takes
longer than one execution-:)

----- Original Message -----
From: "Dain Sundstrom" <dsundstrom-FwHK5tIWHUxWk0Htik3J/***@public.gmane.org>
To: "Derby Development" <derby-dev-***@public.gmane.org>
Sent: Monday, December 27, 2004 11:41 AM
Subject: Re: Two poorly optimized functions in SQLParser.java
Post by RPost
Post by RPost
I don't see how the presence or absence of a JIT compiler is relevant.
Your kidding, right? If not, please take some time to look at VM
benchmarks over the last few years. It shows that JITs have added
several orders of magnitude to the performance of Java.
Post by RPost
These extremely nested 'if' statements are part of source code
(SQLParser.java) that is being produced by JavaCC. The source file is then
compiled during build to a class file.
If you can convert it into a switch statement, so can the JIT compiler.
Post by RPost
Depending on the JVM being used at run-time there may not even be a JIT
compiler.
Every major VM has a JIT compiler, so there we should play to them. If
the vm doesn't have a jit, you have much bigger issues.
Post by RPost
The decompiled class file, produced by either Sun's provided 'javac' or
Jikes, also shows a nested if with a separate function call as shown in the
source code.
I am merely suggesting that it would be more efficient if JavaCC produced
'switch' statements for these rather than the nested 'if' statements.
So far, I haven't been able to determine how to modify the
sqlgrammar.jj
file to achieve this.
Seriously, if you believe that this is a real problem with performance,
prove it. This is a policy we have had in the geronimo project since
the beginning. There have been several times where people believed
that certain code was slow, tried to prove it, and it turned out it was
another chunk of code. Also, without data you don't know if your
solution is optional (or possibly worse). Anyways, I suggest you time
the code in isolation as it stands, and then hand modify the generated
code as you describe above and time it again.
IMNOHO, only if it is a true provable performance problem should the
code be changed to address the problem. This is because normally
addressing such issues introduces complexity, and this complexity adds
a tax to all future development. Also, complex code is more difficult
for a JIT to understand so it can cut you off from future innovation in
JIT technology (ask that guys that wrote the first XML parsers in
java).
-dain
Dain Sundstrom
2004-12-27 21:26:09 UTC
Permalink
Post by RPost
Now your the one that's kidding right?
You gave a great answer. Too bad it wasn't related to the question.
Post by RPost
Does anyone have any experience or knowledge about the JavaCC process
used to produce SQLParser.java?
I will assume your answer to that one is 'no'.
Actually, I have extensive experience with javacc, but as Daniel
pointed out it is a waste of time and effort to optimize non-critical
code paths. Basically, you have a trade off between fast code and
complex code (i.e. more expensive to maintain and improve), so unless
it affects real world performance, I personally will always choose the
simple code.

-dain

Jeffrey Lichtman
2004-12-27 19:45:27 UTC
Permalink
Post by RPost
I am merely suggesting that it would be more efficient if JavaCC produced
'switch' statements for these rather than the nested 'if' statements.
So far, I haven't been able to determine how to modify the sqlgrammar.jj
file to achieve this.
It's time for me to out myself. I wrote most of sqlgrammar.jj, along with a
lot of the other Cloudscape/Derby language code. I no longer work on
Cloudscape or Derby, but I'm still interested in it.

The code in SQLParser.java consists mostly of rules from sqlgrammar.jj and
lookahead code. I suspect the massive nested if statements are for lookahead.

sqlgrammar.jj uses a single token of lookahead by default. The first
implementation of sqlgrammar.jj used numeric lookahead to disambiguate the
grammar. That is, where javacc would report that more tokens of lookahead
were needed, I would put in LOOKAHEAD(2) (or 3 or 4, or whatever javacc
suggested). This led to a very large parser.

The javacc development team told me that the size of the parser would be
reduced significantly if, in those places where more than one token of
lookahead is needed, we used semantic lookahead rather than numeric
lookahead. What that means is that, where the grammar is ambiguous with a
single token of lookahead, we use a hand-written method for lookahead
rather than tell javacc to look ahead two or more tokens. When I made this
change, the number of generated methods starting with "jj_" shrank
significantly.

I suspect the ugly massive nested if statements in SQLParser.java have to
do with lookahead. I thought that maybe someone had put in some numeric
lookahead, but I didn't find any such thing when I looked through
sqlgrammar.jj. It's possible that I missed something, though.

In any case, it would help to have the complete call stack for the methods
in question. I am particularly interested in which grammar rules are
calling these methods. Given that, I might be able to figure out what's
going on by looking at sqlgrammar.jj


- Jeff Lichtman
swazoo-***@public.gmane.org
Check out Swazoo Koolak's Web Jukebox at
http://swazoo.com/
RPost
2004-12-27 21:26:49 UTC
Permalink
Thanks Jeffrey. That's the info I was looking for.

I will supply a call stack from Derby. I will also try to find the spot in
JavaCC that is spitting out the JJ_3R_315 method to see what part of
sqlgrammer.jj it is interpreting.

----- Original Message -----
From: "Jeffrey Lichtman" <swazoo-***@public.gmane.org>
To: "Derby Development" <derby-dev-***@public.gmane.org>
Sent: Monday, December 27, 2004 11:45 AM
Subject: Re: Two poorly optimized functions in SQLParser.java
Post by Jeffrey Lichtman
Post by RPost
I am merely suggesting that it would be more efficient if JavaCC produced
'switch' statements for these rather than the nested 'if' statements.
So far, I haven't been able to determine how to modify the sqlgrammar.jj
file to achieve this.
It's time for me to out myself. I wrote most of sqlgrammar.jj, along with a
lot of the other Cloudscape/Derby language code. I no longer work on
Cloudscape or Derby, but I'm still interested in it.
The code in SQLParser.java consists mostly of rules from sqlgrammar.jj and
lookahead code. I suspect the massive nested if statements are for lookahead.
sqlgrammar.jj uses a single token of lookahead by default. The first
implementation of sqlgrammar.jj used numeric lookahead to disambiguate the
grammar. That is, where javacc would report that more tokens of lookahead
were needed, I would put in LOOKAHEAD(2) (or 3 or 4, or whatever javacc
suggested). This led to a very large parser.
The javacc development team told me that the size of the parser would be
reduced significantly if, in those places where more than one token of
lookahead is needed, we used semantic lookahead rather than numeric
lookahead. What that means is that, where the grammar is ambiguous with a
single token of lookahead, we use a hand-written method for lookahead
rather than tell javacc to look ahead two or more tokens. When I made this
change, the number of generated methods starting with "jj_" shrank
significantly.
I suspect the ugly massive nested if statements in SQLParser.java have to
do with lookahead. I thought that maybe someone had put in some numeric
lookahead, but I didn't find any such thing when I looked through
sqlgrammar.jj. It's possible that I missed something, though.
In any case, it would help to have the complete call stack for the methods
in question. I am particularly interested in which grammar rules are
calling these methods. Given that, I might be able to figure out what's
going on by looking at sqlgrammar.jj
- Jeff Lichtman
Check out Swazoo Koolak's Web Jukebox at
http://swazoo.com/
Loading...