Berkeley DB XML: Native XML Database With XQuery Support
Berkeley DB XML has been on my radar screen for a long time now. It came up recently in one of the meetings I attended, so I thought I would take a deeper look. Here's my first-impression report:
Wow! Cool! Yeay! Way to go! Well done!
The Name Game
The official name of the software is Berkeley DB XML, sometimes shortened to BDB XML. The file I downloaded from the website is called dbxml-2.1.8.tar.gz (23M in size), which extracts to a directory dbxml-2.1.8 (190M). However, if you search the web for dbxml, something other than Berkeley DB XML will show up. It is a bit of a confusing situation.
Sleepycat Software, License
Sleepycat Software is the maker of Berkeley DB XML. They are better known as the maker of Berkeley DB, the widely deployed open source embedded database engine.
Berkeley DB XML is released under an open source license that permits its use in open source applications. Proprietary software vendors may purchase a proprietary license.
What's In The Bundle
The dbxml-2.1.8 directory contains the following:
[weiqi@gao dbxml-2.1.8] $ du -sh *
20K     buildall.sh
48M     db-4.3.28
49M     dbxml
16M     pathan
8.0K    README
74M     xerces-c-src_2_6_0
4.8M    xquery-1.1.0
Berkeley DB XML uses Berkeley DB for data management, Xerces-C for XML processing, and Pathan for XPath parsing and evaluation. It also includes an XQuery processing engine that is written just for Berkeley DB XML.
Building It
Berkeley DB XML can be built using the configure; make; make install method for each of the piece parts of the system. There is also a buildall.sh script that will do the whole thing for you.
I used this command line to build Berkeley DB XML:
./buildall.sh --prefix=/opt/dbxml-2.1.8 --enable-java --enable-perl
It took 26 minutes to build on my Athlon XP 2700+ with 1GB RAM. The result is 154MB of stuff in /opt/dbxml-2.1.8, including goodies such as these:
[weiqi@gao dbxml-2.1.8] $ ls bin
db_archive     db_dump      db_recover  db_verify   dbxml_load
db_checkpoint  db_load      db_stat     dbxml       dbxml_load_container
db_deadlock    db_printlog  db_upgrade  dbxml_dump  query_runner
[weiqi@gao dbxml-2.1.8] $ ls lib
db.jar               libdb_java-4.3.so       libpathan.a
dbxml.jar            libdb_java-4.so         libpathan.la
libdb-4.3.a          libdb_java.so           libpathan.so
libdb-4.3.la         libdb.so                libpathan.so.3
libdb-4.3.so         libdbxml-2.1.a          libpathan.so.3.0.1
libdb-4.so           libdbxml-2.1.la         libxerces-c.so
libdb.a              libdbxml-2.1.so         libxerces-c.so.26
libdb_cxx-4.3.a      libdbxml-2.so           libxerces-c.so.26.0
libdb_cxx-4.3.la     libdbxml.a              libxquery-1.1.a
libdb_cxx-4.3.so     libdbxml_java-2.1.a     libxquery-1.1.la
libdb_cxx-4.so       libdbxml_java-2.1_g.so  libxquery-1.1.so
libdb_cxx.a          libdbxml_java-2.1.la    libxquery-1.so
libdb_cxx.so         libdbxml_java-2.1.so    libxquery.a
libdb_java-4.3.a     libdbxml_java-2.so      libxquery.so
libdb_java-4.3_g.so  libdbxml_java.so
libdb_java-4.3.la    libdbxml.so
The Interactive Command Line Tool
After adding the appropriate directories/jar files into the PATH, LD_LIBRARY_PATH, CLASSPATH and ld.so cache, I can run the interactive command line tool called dbxml:
[weiqi@gao] $ dbxml
dbxml> createContainer foo.dbxml
Creating document storage container
dbxml> putDocument foo1 '<foo>bar</foo>' s
Document added, name = foo1
dbxml> query ' collection("foo.dbxml")/foo'
1 objects returned for eager expression ' collection("foo.dbxml")/foo'
dbxml> print
<foo>bar</foo>
I have just created a file called foo.dbxml, inserted an XML document into it, run an XQuery query against the database, and printed the results.
The download bundle includes very professional-looking documentation, from an introduction to a programmer's guide to API references. The documentation is also available on the web.
The Guided Tour is especially illuminating.
The C++ And Java APIs
The interactive tool is only meant to be the programmer's helper. The typical Berkeley DB XML application's users won't ever see it. The application can use the C/C++/Java APIs to manipulate XML documents in one or more .dbxml database files in a transactional and multi-threaded fashion.
The following snippet of C++ code does roughly the same thing as the interactive session above:
[weiqi@gao] $ cat foo.cc
#include <iostream>
#include "dbxml/DbXml.hpp"

int main(int argc, char *argv[])
{
    try {
        // The manager is the entry point for creating containers and running queries
        DbXml::XmlManager mgr;
        // Create the container file foo.dbxml (fails if it already exists)
        DbXml::XmlContainer cont = mgr.createContainer("foo.dbxml");
        // Store the document <foo>bar</foo> under the name foo1
        DbXml::XmlUpdateContext uc = mgr.createUpdateContext();
        cont.putDocument("foo1", "<foo>bar</foo>", uc);
        // Run the same XQuery expression as in the interactive session
        DbXml::XmlQueryContext qc = mgr.createQueryContext();
        DbXml::XmlResults res = mgr.query("collection('foo.dbxml')/foo", qc);
        // Print every value in the result set
        DbXml::XmlValue value;
        while (res.next(value))
            std::cout << "Value: " << value.asString() << std::endl;
    } catch (DbXml::XmlException& e) {
        std::cout << "Exception: " << e.what() << std::endl;
    }
    return 0;
}
[weiqi@gao] $ g++ -I /opt/dbxml-2.1.8/include -L /opt/dbxml-2.1.8/lib -o foo foo.cc -lpathan -lxquery -lxerces-c -ldbxml-2.1 -ldb_cxx-4.3 -lpthread
[weiqi@gao] $ ./foo
Value: <foo>bar</foo>
Of course, the second time this program is run, it complains that foo.dbxml already exists.
XQuery Support
By far the most exciting feature of Berkeley DB XML, for me at least, is its support for the full XQuery 1.0 and XPath 2.0 languages. Although the W3C standardization process for XQuery has been very slow (I wrote about it 585 days ago), the power of the XQuery language has been recognized by the big three relational database vendors and a whole lot of native XML database vendors.
In addition, Berkeley DB XML also supports several XML storage strategies, indexing, metadata, W3C XML Schema validation, and an update syntax extension for the query language.
Got XML?
If you have a lot of XML documents lying around in the file system, why not pour them into a Berkeley DB XML container, add a few indices and some metadata, and query away?
Contributions Start To Flow Into Project Harmony
Geir Magnusson Jr. wrote:
>
> On Aug 6, 2005, at 12:38 PM, Ian Darwin wrote:
>
> > Well, this is tiny and trivial but I guess its a start.
>
> A river starts with a single raindrop
Harmony is the Apache project, announced 93 days ago, to build an open source Java implementation.