Wednesday, July 21, 2010

New optimisation in Twig for merged parent queries

I have just added a really nifty corner-case optimisation to Twig for merged parent queries (relation index entities) which is needed for Target Rooms location based search.  Target Rooms searches a number of map blocks at the same time using async parallel queries and then merges the results together to get rid of the duplicates. You can see the blocks it chooses by adding the debug parameter:

http://www.targetrooms.com/place/montpellier?debug

See how tightly the blocks fit to the map?  It chooses the blocks using a lowest-cost formula to minimize the amount of overlap and the number of blocks queried. If there is a major city just off the screen Twig may need to sort through thousands of hotels to find the best available prices and then throw them away because they are not on the map. That makes a really slow worst case that was actually quite common before this best-fit approach.

It now asks each child query which entities it wants and gets rid of duplicates before sending off a single request to the datastore for all of them.  When any of the children needs more parent entities it asks this EntitySupplier for more which in turn asks all of the children again if they need any more.

It has speed up the geospatial search HUGELY!  I only recently realised that the complex part of the query (searching 6 blocks of geospatial data in parallel and sorting it by price) was completing in about 50ms!!!  The simple part (bulk get of the parent results) was taking between 200 and 1200ms to do 6 bulk gets.  What a surprise. You can actually see in the Google App Engine Status page that some days each simple get (not even bulk get) can take like 500ms.

Now Twig combines all bulk gets for each child query into a single bulk get which has made a huge improvement.

Eclipse debug running very slow

I hit a problem developing with App Engine which took me a long time to figure out.  I have been flipping between projects so didn't really notice when GAE and GWT started so run so slowly.  I have also been experimenting with JRebel and messing with JVM memory parameters.  Well "suddenly" every started running really really slowly and App Engine would take around 10 minutes to start up and run my 6 async parallel queries.  Was it the new version of App Engine Java introducing a concurrency problem?  Was it that my local datastore local_db.bin was too big - holy shit its 10MB!

After a lot of kafuffle I finally removed all break points and viola - Every thing was lightening fast again!  Even though they were not being hit they somehow slowed the dev server down to a crawl.  What a massive waste of time.