Osmosis/Benchmarking
Benchmark tests
A few scenarios were tested to see how modern multicore CPUs can best be utilized. All scenarios produce the same result: a large number (196) of bounding boxes extracted from the planet file. All tests were performed with Osmosis 0.24 on an AMD64 X2 4200+ (S939, 2200MHz) with 3GB RAM and a single SATA Seagate 7200rpm 250GB disk. To save space, only the relevant parts of each command are shown below; see the Osmosis manual for the complete syntax.
- No optimizations
java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml file=planet/planet-latest.osm outPipe.0=planet --tee inPipe.0=planet outputCount=196 <outPipe.0....195> --bounding-box inPipe.<tee> <etc.>
Duration 4:45h, utilizing only a single CPU. An AMD64 3000+ (S939, 1800MHz) takes about a quarter of an hour longer, so the roughly 20% higher clock speed apparently has little effect. The CPU runs constantly at 100% load in this scenario, so the single hard disk is not the limiting factor.
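The command above abbreviates the 196 tee outputs and bounding-box filters. As a rough sketch of how such an argument list could be generated rather than typed by hand, the shell function below prints the --tee, --bounding-box and --write-xml arguments for a regular grid of boxes. The grid layout, the pipe names (t0, t1, ...) and the output file names (box0.osm, ...) are assumptions made for illustration; the boxes actually used in the benchmark are not listed on this page.

```shell
#!/bin/sh
# Sketch: print Osmosis arguments for a ROWS x COLS grid of bounding boxes.
# Assumptions (not from the benchmark): the regular grid, the pipe names
# t0, t1, ... and the output file names box0.osm, box1.osm, ...
build_tee_args() (
  rows=$1
  cols=$2
  count=$((rows * cols))

  # One named output pipe per box on the tee.
  args="--tee inPipe.0=planet outputCount=$count"
  i=0
  while [ "$i" -lt "$count" ]; do
    args="$args outPipe.$i=t$i"
    i=$((i + 1))
  done

  # Integer-degree steps; any remainder of the division is left uncovered.
  lon_step=$((360 / cols))
  lat_step=$((180 / rows))

  r=0
  while [ "$r" -lt "$rows" ]; do
    c=0
    while [ "$c" -lt "$cols" ]; do
      i=$((r * cols + c))
      left=$((c * lon_step - 180))
      top=$((90 - r * lat_step))
      args="$args --bounding-box inPipe.0=t$i left=$left right=$((left + lon_step))"
      args="$args top=$top bottom=$((top - lat_step)) --write-xml file=box$i.osm"
      c=$((c + 1))
    done
    r=$((r + 1))
  done

  printf '%s\n' "$args"
)

# Example: a 2 x 2 grid (4 boxes).
build_tee_args 2 2
```

The output can be appended to the --read-xml part of the command, for example via $(build_tee_args 14 14) for 196 boxes. Because the sketch works in whole degrees, a grid that does not divide 360 and 180 evenly leaves a strip of the planet uncovered.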
- Disable date parsing
Extracting bounding boxes only requires scanning for positions. Dates are not needed here, but date parsing is enabled by default. Let's see what happens when it is disabled.
java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml enableDateParsing=no file=planet/planet-latest.osm outPipe.0=planet --tee inPipe.0=planet outputCount=196 <outPipe.0....195> --bounding-box inPipe.<tee> <etc.>
Duration 1:30h, utilizing only a single CPU. Skipping date parsing cuts the run time to less than a third (down from 4:45h), but still leaves room for improvement.
- Disable date parsing, using a buffer
Osmosis offers an option specifically for improving speed on multicore CPUs: buffering. It allows data to be exchanged between threads. The read-xml task that parses the XML now runs in a separate thread from the bounding-box filters.
java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml enableDateParsing=no file=planet/planet-latest.osm outPipe.0=planet --buffer inPipe.0=planet outPipe.0=b1 --tee inPipe.0=b1 outputCount=196 <outPipe.0....195> --bounding-box inPipe.<tee> <etc.>
Duration 1:27h, utilizing both CPUs. This construction does not provide any real speed advantage; the 3 minutes gained can be ignored. Maybe reading the XML is faster than extracting the bounding boxes, and the communication overhead is too large to gain any improvement.
- Disable date parsing, using a large buffer
The buffer has space for 100 objects by default. Let's see if enlarging this space 10x has any use:
java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml enableDateParsing=no file=planet/planet-latest.osm outPipe.0=planet --buffer bufferCapacity=1000 inPipe.0=planet outPipe.0=b1 --tee inPipe.0=b1 outputCount=196 <outPipe.0....195> --bounding-box inPipe.<tee> <etc.>
Duration 1:28h, utilizing both CPUs. No improvement whatsoever.
- Disable date parsing, bz2 output
It is also interesting to see what compression does to speed. In this case the output files are compressed using bz2.
java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml enableDateParsing=no file=planet/planet-latest.osm outPipe.0=planet --tee inPipe.0=planet outputCount=196 <outPipe.0....195> --bounding-box inPipe.<tee> <etc.> write-xml=file<0....195>.bz2
Duration 1:42h, utilizing only a single CPU. The loss in speed shows that, in this setup, Osmosis is CPU bound. Compressing the stream to lessen the load on the hard disk is not necessary.
- Disable date parsing, using a large buffer, bz2 output
Throwing every method so far into the command:
java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml enableDateParsing=no file=planet/planet-latest.osm outPipe.0=planet --buffer bufferCapacity=1000 inPipe.0=planet outPipe.0=b1 --tee inPipe.0=b1 outputCount=196 <outPipe.0....195> --bounding-box inPipe.<tee> <etc.> write-xml=file<0....195>.bz2
This resulted in a crash. Under normal circumstances Osmosis does not use all of the memory it is given (-Xmx1048m here); it is rather conservative with memory. So I'm not sure whether the cause is the command itself or just the CPU overheating after running so many benchmarks ;-)
Exception in thread "Thread-1-read-xml" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.bretth.osmosis.core.xml.v0_5.impl.NodeElementProcessor.begin(NodeElementProcessor.java:69)
    at com.bretth.osmosis.core.xml.v0_5.impl.OsmHandler.startElement(OsmHandler.java:91)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:501)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1357)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at com.bretth.osmosis.core.xml.v0_5.XmlReader.run(XmlReader.java:101)
    at java.lang.Thread.run(Thread.java:619)
Dec 1, 2007 6:55:09 AM com.bretth.osmosis.core.xml.common.BaseXmlWriter release
SEVERE: Unable to close writer.
java.lang.ArithmeticException: / by zero
    at org.apache.tools.bzip2.CBZip2OutputStream.mainSort(CBZip2OutputStream.java:1135)
    at org.apache.tools.bzip2.CBZip2OutputStream.doReversibleTransformation(CBZip2OutputStream.java:1347)
    at org.apache.tools.bzip2.CBZip2OutputStream.endBlock(CBZip2OutputStream.java:438)
    at org.apache.tools.bzip2.CBZip2OutputStream.close(CBZip2OutputStream.java:389)
    at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:301)
    at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
    at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
    at java.io.BufferedWriter.close(BufferedWriter.java:248)
    at com.bretth.osmosis.core.xml.common.BaseXmlWriter.release(BaseXmlWriter.java:181)
    at com.bretth.osmosis.core.filter.v0_5.AreaFilter.release(AreaFilter.java:333)
    at com.bretth.osmosis.core.tee.v0_5.EntityTee$ProxySinkSource.release(EntityTee.java:135)
    at com.bretth.osmosis.core.tee.v0_5.EntityTee.release(EntityTee.java:85)
    at com.bretth.osmosis.core.buffer.v0_5.EntityBuffer.run(EntityBuffer.java:84)
    at java.lang.Thread.run(Thread.java:619)
Exception in thread "Thread-2-buffer" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.tools.bzip2.CBZip2OutputStream.qSort3(CBZip2OutputStream.java:999)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
- Disable date parsing, buffer for each bounding-box
It might seem a good idea to give every bounding box its own buffer, thus creating a thread for every bounding box and its output. However, spreading that many threads over only two CPUs slows things down considerably. During the test the kernel accounted for 70% of the 200% total CPU capacity (instead of nearly 0% without buffers). With a total load of about 110% (of the 200% available), that leaves only about 40% for actual processing, which makes it very slow. I terminated the benchmark after 3 hours of processing. Perhaps a quad-core CPU can handle this better.
- Using bzcat for decompressing the planet file
I have not run benchmarks to prove this, but using the Unix utility bzcat to decompress the planet file before passing it to Osmosis gives a good performance boost, certainly if you have at least two processor cores.
The built-in decompression library of Osmosis isn't very speedy compared to bzcat. This combined with the beneficial split into two separate processes, allowing both processors to work side-by-side, gives a good performance boost on a multicore CPU. Below is an example to use bzcat in combination with Osmosis:
bzcat planet/planet-latest.osm.bz2 | java -Xmx1048m -jar utils/osmosis/osmosis.jar --read-xml file=/dev/stdin outPipe.0=planet .....
Conclusion
For planet splitting applications (bbox or polygon) the best optimizations by far are disabling date parsing and using bzcat to decompress the planet file. Using buffers to spread the load over multiple CPUs sounds good but has not delivered on its promise (with this setup) because of the huge overhead. That is not to say it won't help in different applications or with different setups...
More tests to compare the speed of osmosis
All of these tests were run on an Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz machine under 64bit Linux with 4GB RAM. I am extracting a part of europe.osm.bz2 from the Geofabrik extracts, cutting out a bounding box with top=49 left=13 bottom=47.2 right=16.
1st plain osmosis:
osmosis --read-xml enableDateParsing=no file=europe.osm.bz2 --bounding-box top=49 left=13 bottom=47.2 right=16 --write-xml file=- | bzip2 > map.osm.bz2
result: 107min 2sec
2nd using bzcat:
bzcat europe.osm.bz2 | osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=49 left=13 bottom=47.2 right=16 --write-xml file=- | bzip2 > map.osm.bz2
result: 50min 57sec
3rd using the pipe-viewer pv (see http://www.ivarch.com/programs/pv.shtml)
bzcat europe.osm.bz2 |pv | osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=49 left=13 bottom=47.2 right=16 --write-xml file=- | bzip2 > map.osm.bz2
result: 33min 50sec
4th using pv on the 2nd pipe too:
time bzcat europe.osm.bz2 |pv | osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=49 left=13 bottom=47.2 right=16 --write-xml file=- |pv -q | bzip2 > map.osm.bz2
result: 33min 19sec
5th using pv with a 100MB buffer instead of its default of a few hundred KB:
time bzcat europe.osm.bz2 |pv -B 100m| osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=49 left=13 bottom=47.2 right=16 --write-xml file=- |pv -q -B 100m | bzip2 > map.osm.bz2
result: 32min 51sec
Final result: the big boost in osmosis speed comes from using the "buffered pipe" with pv. With pv in place, top shows the osmosis process using one of the cores at 100%.
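To put the five timings side by side, the small helper below converts each result to seconds and prints the speed-up relative to the plain osmosis run. The function name speedup is just an illustrative choice; the timings are the ones reported above.

```shell
#!/bin/sh
# speedup BASE_MIN BASE_SEC RUN_MIN RUN_SEC
# Prints how many times faster the run is than the baseline (two decimals).
speedup() (
  base=$(($1 * 60 + $2))
  run=$(($3 * 60 + $4))
  LC_ALL=C awk -v b="$base" -v r="$run" 'BEGIN { printf "%.2f\n", b / r }'
)

speedup 107 2 50 57   # 2nd run: bzcat
speedup 107 2 33 50   # 3rd run: bzcat and pv
speedup 107 2 33 19   # 4th run: pv on both pipes
speedup 107 2 32 51   # 5th run: pv with 100MB buffers
```

With these numbers, bzcat alone gives a speed-up of about 2.1, and adding pv raises it to roughly 3.2 to 3.3; the extra pv and the larger buffer change little after that.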
Applying diffs on pbf planet
Windows
Tests were done on an i7 920 using Java 1.6.0_22 on Windows. Full planet size: 10.846.296.936 bytes (planet-20110103.pbf).
osmosis.bat --read-xml-change file=20110102-20110103.osc.gz --buffer-change bufferCapacity=1000 --simplify-change --read-pbf file=planet-20110102.pbf --buffer bufferCapacity=1000 --apply-change --buffer bufferCapacity=1000 --write-pbf file=planet-20110103.pbf
result: 80 min
Increasing the buffer size to much larger numbers:
osmosis.bat --read-xml-change file=20110102-20110103.osc.gz --buffer-change bufferCapacity=50000 --simplify-change --read-pbf file=planet-20110102.pbf --buffer bufferCapacity=50000 --apply-change --buffer bufferCapacity=50000 --write-pbf file=planet-20110103.pbf
result: 50 min
More tests:
- Smaller buffers (12000) result in 51 min. This seems to be enough, as there is no big difference from buffers of 50000.
- Disabling compression (compress=none) while keeping buffers of 50000 results in 58 min: no real benefit. The higher I/O seems to outweigh the savings from skipping compression.
Linux Quad Core
On a quad-core server running Ubuntu and Osmosis 0.39 with a 750MB changes.osc and a 13GB planet-current.osm.pbf:
osmosis --read-xml-change changes.osc --read-bin planet-latest.osm.pbf --apply-change --write-bin planet-new.osm.pbf
result: 106 min
With the same server, changes.osc and planet file:
osmosis --read-xml-change changes.osc --read-bin planet-latest.osm.pbf --buffer bufferCapacity=12000 --apply-change --buffer bufferCapacity=12000 --write-bin planet-new.osm.pbf
result: 58 min
In this scenario using buffers helps significantly.
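If you want to try several buffer sizes on your own hardware, a loop can generate the commands. The sketch below is a dry run that only prints one apply-change command per capacity; the function name print_sweep and the per-capacity output file names are hypothetical, while the other arguments are copied from the command above.

```shell
#!/bin/sh
# Dry run: print one apply-change command per buffer capacity given as argument.
# Nothing is executed; the output file name pattern is an assumption.
print_sweep() (
  for cap in "$@"; do
    printf '%s\n' "osmosis --read-xml-change changes.osc --read-bin planet-latest.osm.pbf --buffer bufferCapacity=$cap --apply-change --buffer bufferCapacity=$cap --write-bin planet-new-$cap.osm.pbf"
  done
)

# Example: the capacities tried on this page.
print_sweep 1000 12000 50000
```

Each printed line can then be run under time to compare the capacities.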