Today, I had to query a table in PostgreSQL DB that had 1,84,692 records in it, using JDBC.
I got a Connection object, prepared a statement as “select * from foo” and got the result set. I tried to print the field I was intersted in. Bam, I got this error “Exception in thread “main” java.lang.OutOfMemoryError: Java heap space”. I was a bit baffled as I had taken care to set the fetch size of the prepared statement, so that all the results are not loaded into memory. I changed my eclipse heap size to 512 MB and gave it a shot again. Still the same error. Now I went to my very good friend google and started poking around. Rest of this post is a summary of what I learnt.
I was under the impression that setting the fetch size on a prepared statement would not load the whole result set into memory. A closer look at the API spec says something else. This is what the API spec says about setting the fetch size “Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed”. The operative word here is “HINT”. So it means this behavior is not gauranteed.
For PostgreSQL not to load the whole resultset into memory, it should use cursors. Setting the fetch size of the prepared statement should have told PostgreSQL to use cursors. When a cursor is used, multiple queries are fired when more results are needed instead of loading all the results into memory in a single shot. But the catch is that, PostgreSQL can use cursors only in a transaction block. And I guess the JDBC driver of PostgreSQL executes statements inside a transaction block only if the connection object’s auto commit is set to false. As I had not set my connection object’s auto commit to false, my query was not being executed in a transaction block. As soon as I set my connection object’s auto commit to false, my memory error disappeared and things were back to normal.