[nexml]
phylogenetic data in xml
The future data exchange standard is here!
nexml is an exchange standard for representing phylogenetic data — inspired by the commonly used NEXUS format, but more robust and easier to process.
Overview
The NEXUS file format is a commonly used format for phylogenetic data. Unfortunately, over time the format has become overloaded, which has caused various problems. Meanwhile, new technologies built around the XML standard have emerged. These technologies have the potential to greatly simplify the processing of phylogenetic data and make it more robust (a quick visual demonstration of the integration of NeXML with web services and client-side AJAX scripting is available online):
- Syntax validation: some of the issues hampering interoperability arise because no formal specification exists for NEXUS and other flat-file formats, so there is no unambiguous way to validate them. Thanks to XML Schema, we can now define a grammar against which data files can be validated syntactically.
- Semantic annotation: another issue with current file formats is that their semantics are not well defined. For example, what does it mean to use an ambiguity code in a matrix? Is it uncertainty or polymorphism? With the wider EvoInfo working group we are developing an ontology onto which we are mapping nexml schema types, so that the semantics of data files become well defined.
- Web services: a number of technologies (such as XML-RPC, REST and SOAP) have emerged that allow disparate, XML-based services to be glued together over the internet. Using such services, researchers can "farm out" their calculations to dedicated servers, such as those of the CIPRES project. The wider plan is to integrate such services in an ontology-mediated architecture.
- Native XML databases: relational databases are based on a fixed schema. For phylogenetic data this poses problems, because the field of phyloinformatics moves very rapidly: new metrics and analysis types are published constantly. XML databases are free of this constraint, which simplifies storage of unusual data types.
Therefore, a group of developers of phylogenetic software have come together as part of the NESCent working group for evolutionary informatics to develop a new data exchange standard based on these technologies.
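To give a feel for why a formally structured format is easier to check than a flat file, here is a minimal sketch in Python. It is not XML Schema validation (the Python standard library does not provide that). It only tests well-formedness and a few cross-references by hand. The namespace is the one named on this page; the element and attribute names (`otus`, `otu`, `trees`, `tree`, `node`, `edge`, `source`, `target`) and the function `check_document` are illustrative assumptions, not the published schema.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the text; element and attribute names below are
# illustrative assumptions, not the published schema.
NS = {"nex": "http://www.nexml.org/1.0"}

SAMPLE = """\
<nex:nexml xmlns:nex="http://www.nexml.org/1.0">
  <nex:otus id="taxa1">
    <nex:otu id="t1" label="Homo sapiens"/>
    <nex:otu id="t2" label="Pan troglodytes"/>
  </nex:otus>
  <nex:trees id="trees1" otus="taxa1">
    <nex:tree id="tree1">
      <nex:node id="n1"/>
      <nex:node id="n2" otu="t1"/>
      <nex:node id="n3" otu="t2"/>
      <nex:edge id="e1" source="n1" target="n2"/>
      <nex:edge id="e2" source="n1" target="n3"/>
    </nex:tree>
  </nex:trees>
</nex:nexml>
"""


def check_document(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser itself rejects text that is not well-formed XML.
        return [f"not well-formed: {err}"]

    problems = []
    otu_ids = {otu.get("id") for otu in root.findall(".//nex:otu", NS)}
    for tree in root.findall(".//nex:tree", NS):
        node_ids = set()
        for node in tree.findall("nex:node", NS):
            node_ids.add(node.get("id"))
            otu = node.get("otu")
            # A node may point at a taxon; if it does, the taxon must exist.
            if otu is not None and otu not in otu_ids:
                problems.append(f"node {node.get('id')} refers to unknown otu {otu}")
        for edge in tree.findall("nex:edge", NS):
            # Both ends of an edge must be nodes declared in the same tree.
            for end in ("source", "target"):
                if edge.get(end) not in node_ids:
                    problems.append(
                        f"edge {edge.get('id')} has unknown {end} {edge.get(end)}"
                    )
    return problems
```

Calling `check_document(SAMPLE)` returns an empty list, while a document with a dangling edge or taxon reference yields one message per problem. A real schema expresses constraints like these declaratively, so any validating parser can enforce them without hand-written checks.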
What are we doing about it?
Nexml development is being undertaken in a number of subprojects:
- First, we're designing an XML schema. This schema (designated as namespace http://www.nexml.org/1.0) is documented on our wiki. The bleeding-edge version is available from svn, and the source code can be browsed on our site (a checkout from our repository that is updated every five minutes). For bug reports and feature requests, please visit our issue tracker page.
- Second, we're implementing nexml read/write abilities in a number of software packages:
  - Mesquite now supports reading and writing of nexml. This implementation has been developed by Peter Midford and Rutger Vos.
  - At the most recent EvoInfo meeting, Xuhua Xia demonstrated DAMBE's abilities to read and write nexml data transparently.
  - The phylobase package for R reads and writes tree descriptions, with character matrices under way. This implementation is being developed by Aaron Mackey.
  - David Swofford has started implementing nexml I/O for PAUP*. This implementation uses C++ bindings analogous to XMLBeans.
  - Jeet Sukumaran has implemented nexml I/O for Python, in a package tentatively called pyNexml.
  - Weigang Qiu has started implementing nexml I/O for BioPerl's TreeIO and AlignIO interfaces, which under the hood reuse Rutger Vos's Bio::Phylo parser libraries.
  In addition, Sergei Kosakovsky Pond is interested in nexml I/O for HyPhy. We've also had (very tentative) conversations with Paul Lewis of NCL (the NEXUS Class Library, in C++) about nexml integration. In short, a lot of active (but sometimes alpha-stage) development is going on.
- Third, we're cross-referencing the nexml schema with the Character Data Analysis Ontology, which is being developed by other members of the EvoInfo working group.
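What "read/write support" amounts to can be pictured with a small round-trip sketch in Python: a tree held in memory is written out as XML and read back unchanged. This does not use any of the packages listed above, whose APIs are not described here. The functions `write_tree` and `read_tree` are hypothetical, and the element and attribute names are illustrative assumptions rather than the published schema. Only the namespace comes from this page.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.nexml.org/1.0"  # namespace from the text
ET.register_namespace("nex", NS_URI)  # serialize with a readable "nex:" prefix


def q(tag):
    """Qualify a tag name with the namespace, in ElementTree's {uri}tag form."""
    return f"{{{NS_URI}}}{tag}"


def write_tree(edges, labels):
    """Serialize a tree to XML text.

    edges:  list of (parent_id, child_id, branch_length)
    labels: dict mapping tip node ids to taxon labels
    Element and attribute names are assumptions made for this sketch.
    """
    root = ET.Element(q("nexml"))
    otus = ET.SubElement(root, q("otus"), id="taxa1")
    trees = ET.SubElement(root, q("trees"), id="trees1", otus="taxa1")
    tree = ET.SubElement(trees, q("tree"), id="tree1")

    # Collect node ids in order of first appearance.
    node_ids = []
    for parent, child, _ in edges:
        for node_id in (parent, child):
            if node_id not in node_ids:
                node_ids.append(node_id)

    for node_id in node_ids:
        attrs = {"id": node_id}
        if node_id in labels:
            # Labelled nodes get a taxon entry and a reference to it.
            otu_id = f"otu_{node_id}"
            ET.SubElement(otus, q("otu"), id=otu_id, label=labels[node_id])
            attrs["otu"] = otu_id
        ET.SubElement(tree, q("node"), attrs)

    for i, (parent, child, length) in enumerate(edges, start=1):
        ET.SubElement(tree, q("edge"), id=f"e{i}", source=parent,
                      target=child, length=repr(length))
    return ET.tostring(root, encoding="unicode")


def read_tree(xml_text):
    """Parse text produced by write_tree back into (edges, labels)."""
    root = ET.fromstring(xml_text)
    label_by_otu = {o.get("id"): o.get("label") for o in root.iter(q("otu"))}
    tree = root.find(f".//{q('tree')}")
    labels = {n.get("id"): label_by_otu[n.get("otu")]
              for n in tree.iter(q("node")) if n.get("otu")}
    edges = [(e.get("source"), e.get("target"), float(e.get("length")))
             for e in tree.iter(q("edge"))]
    return edges, labels
```

For example, `read_tree(write_tree(edges, labels))` gives back the same `edges` and `labels` that went in. Because taxa, nodes and edges are separate elements linked by ids, a reader needs only a generic XML parser rather than a custom grammar for a flat file.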
Get involved!
If you would like to be involved in the nexml project in any way, please join in! Here are some ways to get started:
- Get informed: information about the nexml project is distributed over the wiki (for an overview of vision, plans, implementation), the documentation (for a formal description of the schema) and the mailing list (for immediate plans and discussion).
- Try it out: the download section of the website has nightly builds of bindings for various languages. Take these for a spin!
- Contribute: if you are a programmer interested in extending nexml support, please contact us through the mailing list to get commit access to the Subversion repository.