nexml
phylogenetic data in XML
The future data exchange standard is here!
nexml is an exchange standard for representing phylogenetic data. It is inspired by the commonly used NEXUS format, but is more robust and easier to process.
Overview
The NEXUS file format is widely used for phylogenetic data. Over time, however, the format has become overloaded, which has caused various problems. Meanwhile, new technologies built around the XML standard have emerged. These technologies could greatly simplify the processing of phylogenetic data and make it more robust:
- Validation: some of the issues hampering interoperability arise because no formal specification exists for NEXUS and other flat-file formats, so there is no objective way to validate them. With XML Schema we can now define a grammar against which data files can be validated syntactically, and with the wider EvoInfo working group we are developing an ontology against which data can be validated semantically.
- Web services: a number of technologies (such as XML-RPC, REST and SOAP) have emerged that allow disparate XML-based services to be glued together over the internet. Using such services, researchers can "farm out" their calculations to dedicated servers, such as those of the CIPRES project. The wider plan is to integrate such services in an ontology-mediated architecture.
- Native XML databases: relational databases are built on a fixed schema. For phylogenetic data this is a problem, because the field of phyloinformatics moves very rapidly and new metrics and analysis types are published constantly. XML databases are not bound by a fixed schema, which simplifies storing unusual data types.
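As a rough illustration of the most basic layer of checking, the Python sketch below uses only the standard library to test whether a document is well-formed XML and whether its root element lives in the nexml namespace. It assumes the root element is named `nexml` and uses the `http://www.nexml.org/1.0` namespace named later on this page; `check_document` is a hypothetical helper, not part of any nexml library. Real syntactic validation against the XML Schema needs a schema-aware parser (lxml's `etree.XMLSchema` class is one option).

```python
import xml.etree.ElementTree as ET

# Namespace URI as given on this page; the root element name is an assumption.
NEXML_NS = "http://www.nexml.org/1.0"
EXPECTED_ROOT = f"{{{NEXML_NS}}}nexml"


def check_document(text: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    Only well-formedness and the root element's namespace are tested here;
    this does not replace validation against the nexml XML Schema.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # ElementTree reports namespaced tags as "{uri}localname"
    if root.tag != EXPECTED_ROOT:
        return [f"unexpected root element {root.tag!r}, expected {EXPECTED_ROOT!r}"]
    return []


if __name__ == "__main__":
    print(check_document('<nex:nexml xmlns:nex="http://www.nexml.org/1.0"/>'))  # []
    print(check_document("<nexml><otus></nexml>"))  # one "not well-formed" message
```

A document can pass both checks and still break the schema, for example by nesting elements incorrectly. Catching that is the job of the XML Schema, and checking what the data means is the job of the ontology.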
To put these technologies to work, a group of phylogenetic software developers has come together as part of the NESCent working group for evolutionary informatics to develop a new data exchange standard based on them.
What are we doing about it?
Nexml development is being undertaken in a number of subprojects:
- First, we're designing an XML schema. The schema uses the namespace http://www.nexml.org/1.0 and is documented on our wiki. The bleeding-edge version is available from svn, and the source code can be browsed on our site (a checkout of our repository, updated every five minutes). For bug reports and feature requests, please visit our issue tracker page.
- Second, we're developing Java class libraries to help software developers use nexml in their applications. These libraries will support reading nexml data through a SAX API (to handle large data sets and data streams) and writing through a set of simple interfaces that objects implement to become XML-writable. The code for this subproject is in the java subfolder of the svn repository.
- Third, we're developing Perl modules that plug into the IO backend of the Bio::Phylo package on CPAN. Much of the validator code on this website runs on these modules. Because Bio::Phylo is becoming compatible with BioPerl and Bio::NEXUS, this will make nexml IO available to a large number of Perl programmers.
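The two ideas behind the Java libraries, streaming input through SAX and output through interfaces that objects implement, can be sketched in Python with the standard library's `xml.sax` module. This is a minimal sketch, not the API of the nexml class libraries: `XmlWritable`, `Otu`, `write_document` and `read_labels` are hypothetical names, and the `otus`/`otu` elements with `id` and `label` attributes are a simplified stand-in for the real schema. Because a SAX parser reports one element at a time instead of building a tree of the whole document, only what the handler chooses to keep stays in memory, which is why the approach suits large data sets and streams.

```python
import io
import xml.sax
from abc import ABC, abstractmethod
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import quoteattr

# Namespace URI taken from this page; element and attribute names are assumed.
NEXML_NS = "http://www.nexml.org/1.0"


class XmlWritable(ABC):
    """Hypothetical interface: implementing to_xml() makes an object writable."""

    @abstractmethod
    def to_xml(self) -> str:
        ...


class Otu(XmlWritable):
    """Hypothetical operational taxonomic unit with an id and a label."""

    def __init__(self, otu_id: str, label: str):
        self.otu_id = otu_id
        self.label = label

    def to_xml(self) -> str:
        # quoteattr adds the surrounding quotes and escapes &, <, > and quotes
        return f"<otu id={quoteattr(self.otu_id)} label={quoteattr(self.label)}/>"


def write_document(items: list[XmlWritable]) -> str:
    """Serialize writable objects inside a simplified nexml wrapper."""
    body = "".join(item.to_xml() for item in items)
    return (
        f'<nex:nexml xmlns:nex="{NEXML_NS}" xmlns="{NEXML_NS}">'
        f'<otus id="taxa1">{body}</otus></nex:nexml>'
    )


class OtuLabelHandler(ContentHandler):
    """SAX handler that keeps only the label of each <otu> element it sees."""

    def __init__(self):
        super().__init__()
        self.labels: list[str] = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name  # namespace mode reports (uri, localname) pairs
        if uri == NEXML_NS and local == "otu":
            # unprefixed attributes are keyed as (None, localname)
            self.labels.append(attrs.get((None, "label"), ""))


def read_labels(stream) -> list[str]:
    """Stream-parse a binary file-like object and return the otu labels."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = OtuLabelHandler()
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler.labels


if __name__ == "__main__":
    doc = write_document([Otu("t1", "Homo sapiens"), Otu("t2", "Pan & Gorilla")])
    print(read_labels(io.BytesIO(doc.encode("utf-8"))))
    # prints ['Homo sapiens', 'Pan & Gorilla']
```

Putting the serialization logic behind a small interface lets each object type render itself, so the writer only has to walk a collection of `XmlWritable` objects; escaping of characters such as `&` is handled by `quoteattr` on the way out and undone by the parser on the way back in.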
In addition, the developers of the Phycas project are working on parsing and serializing their Python objects to and from nexml, and Sergei Kosakovsky Pond of HyPhy is interested in nexml IO for that project. We've also had (very tentative) conversations with Paul Lewis of NCL (the NEXUS Class Library, written in C++) about nexml integration. In short, a lot of active but alpha-stage development is going on.
Get involved!
If you are interested in the nexml project in any way, please get involved! Here are some ways to do so:
- Get informed: information about the nexml project is spread over the wiki (for an overview of the vision, plans and implementation), the documentation (for a formal description of the schema) and the mailing list (for immediate plans and discussion).
- Try it out: the download section of the website has nightly builds of bindings for various languages. Using these, you can create and manipulate nexml data in Mesquite, Bio::Phylo and pynexml.
- Contribute: if you are a programmer interested in extending nexml support, please contact us through the mailing list to get commit access to the subversion repository.