OpenOffice.org filters using the XML based file format
Abstract: This document explains the implementation of OpenOffice.org import and export filter components, focusing on filter components based on the OpenOffice.org XML file format. It is intended as a brief introduction for developers who want to implement OpenOffice.org filters for foreign file formats.
Table Of Contents
2 The Innards of an OpenOffice.org Filter Component
2.3 Waiter, the Export Please!
2.4 A Second Look at the Filter Wrapper
2.8 Registering a New Filter With the Application
3.1 The Filter Wrapper: Instantiating the XML Filters
3.2 Exporting through the XML filter
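1 Preliminaries
There are several ways to get information into or out of OpenOffice.org: You can (1) link against the application core and use the core data structures, (2) use the OpenOffice.org API (based on UNO), or (3) use the XML based file format.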
Each of these ways has unique advantages and disadvantages, which I will briefly summarize:
Using the core data structures and linking against the application core is the traditional way to implement filters in OpenOffice.org. The advantages this method offers are efficiency and direct access to the document. However, the core implementation provides a very implementation-centric view of the applications. Additionally, there are a number of technical disadvantages: Every change in the core data structures or objects has to be followed up by corresponding changes in the code that uses them. Hence filters need to be recompiled to match the binary layout of the application core objects. While these things are manageable (albeit cumbersome) for closed source applications, this method is expected to create a maintenance nightmare if application and filter are developed separately, as is customary in open source applications. Simultaneous delivery of a new application build and the corresponding filters developed by outside parties looks challenging.
Using the OpenOffice.org API (based on UNO) is a much better way, since it solves the technical problems indicated in the last paragraph. The UNO component technology insulates the filter from binary layout (and other compiler and version dependent issues). Additionally, the API is expected to be more stable than the core interfaces, and it even provides a shallow level of abstraction from the core applications. In fact, the native XML filter implementations largely make use of this strategy and are based on the OpenOffice.org API.
The third (and possibly surprising) choice is to import and export documents using the XML based file format. UNO-based XML import and export components feature all of the advantages of the previous method, but additionally provide the filter implementer with a clean, structured, and fully documented view of the document.
Since a significant difficulty in converting between formats is the conceptual mapping from one format to the other, a clean, well-structured view of the document may turn out to be beneficial.
2 The Innards of an OpenOffice.org Filter Component
First, we will try to get an overview of the import and export process using UNO components. Let's first attempt to gain a view of...
2.1 The Big Picture
An in-memory OpenOffice.org document is represented by its document model. On disk, the same document is represented as a file. An import component must turn the latter into the former, as shown by the diagram (Illustration 1).
If you make use of UNO, this diagram can be turned into programming reality quite easily. The three entities in the diagram (the file, the model, and the filter) all have direct counterparts in UNO services. The services themselves may consist of several interfaces that finally map into C++ or Java classes. The following diagram annotates the entities with their corresponding services and interfaces:
In Illustration 2 (and all following illustrations) the gray part marks the part a filter implementer will have to program, while the white parts are already built into OpenOffice.org. If the implementer decides to make use of the OpenOffice.org API directly, this diagram is the proper starting point: The filter writer must create a class that implements the
2.2 Where XML Comes In...
If the advantages of an XML based import or export are desired, the filter implementer may make use of the existing XML import and export components. This way, the import logic does not need to deal with the document model itself, but rather generates the document in its OpenOffice.org XML file format representation. Done in a naive way, such a filter component would generate the XML, write it to a file, and then call the built-in XML import to read it again. However, since the XML import is based on the SAX API, a better way exists: The import logic calls the SAX API. Since the XML reader component implements the SAX API, the document thus gets translated from the foreign format into its XML representation and then into the document model without the need to use temporary files, or even to render and subsequently parse an XML character stream.
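The idea of driving a SAX handler directly, instead of writing and re-parsing a file, can be illustrated with a small self-contained C++ sketch. It deliberately does not use the UNO headers: `DocumentHandler` is a hypothetical stand-in that mirrors only a few methods of a SAX document handler, `RecordingHandler` stands in for the built-in XML import, and `importPlainText` is an invented import routine for a trivial foreign format (one paragraph per line of text).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX document handler (NOT the UNO interface).
using Attributes = std::map<std::string, std::string>;

struct DocumentHandler {
    virtual ~DocumentHandler() = default;
    virtual void startDocument() = 0;
    virtual void endDocument() = 0;
    virtual void startElement(const std::string& name, const Attributes& attrs) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Stand-in for the built-in XML import: it only records the events it receives.
struct RecordingHandler : DocumentHandler {
    std::vector<std::string> events;
    int depth = 0;  // number of currently open elements

    void startDocument() override { events.push_back("startDocument"); }
    void endDocument() override { events.push_back("endDocument"); }
    void startElement(const std::string& name, const Attributes&) override {
        ++depth;
        events.push_back("start:" + name);
    }
    void endElement(const std::string& name) override {
        assert(depth > 0);  // every end event needs a matching start event
        --depth;
        events.push_back("end:" + name);
    }
    void characters(const std::string& text) override {
        events.push_back("text:" + text);
    }
};

// Invented import logic: every line of the foreign "format" becomes a paragraph.
// The events go straight to the handler; no XML text is written or parsed.
void importPlainText(const std::string& input, DocumentHandler& handler) {
    Attributes rootAttrs;
    rootAttrs["office:class"] = "text";
    const Attributes noAttrs;

    handler.startDocument();
    handler.startElement("office:document", rootAttrs);
    handler.startElement("office:body", noAttrs);

    std::istringstream lines(input);
    std::string line;
    while (std::getline(lines, line)) {
        handler.startElement("text:p", noAttrs);
        handler.characters(line);
        handler.endElement("text:p");
    }

    handler.endElement("office:body");
    handler.endElement("office:document");
    handler.endDocument();
}
```

Calling `importPlainText("Hello World!", handler)` with a `RecordingHandler` delivers the events of a one-paragraph document in order. In a real filter, the handler would be the XML import component rather than this recorder.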
The link between the XML based import filter and the XML reader is the SAX
2.3 Waiter, the Export Please!
The export into a foreign format may of course be implemented in the same fashion. Instead of the
2.4 A Second Look at the Filter Wrapper
How do the built-in XML export or import components cooperate with the self-programmed filter? As was briefly mentioned above, the export filter services consist of two major interfaces:
In the case of an XML-based filter, this functionality gets distributed to two components. For the import, the built-in XML import component implements the
The export case is slightly more complicated. The additional problem is that the filter() call of the
2.5 The Services
We should now have a closer look at the involved services: The service The twin of the The
The document model cannot be described by a single service, as it obviously has to vary greatly, depending on the type of document (e.g., text or spreadsheet). An example for a document model service is the
2.6 Interfaces
The The interface The The The interface
Initialization of components can be supported through the
Properties of the filters can be queried using the
2.7 Built-in Components
All of OpenOffice.org's applications have built-in XML import and export components. The component names are summarized in the following table:
Additionally, the XML reader and writer components should be mentioned, even though they have not been discussed in the previous chapters. These two components implement the XML reader (or parser) and writer (or unparser) used by OpenOffice.org for reading and writing all its XML files. They implement (XML writer) or use (XML parser) the
2.8 Registering a New Filter With the Application
There is a final, crucial step that will not be covered here: registering a filter with the application. The registration process makes sure that the application knows the filter, and also knows which files the filter can be applied to. The filter registration is described here.
3 Code examples
This chapter is intended to give brief code examples for the crucial steps in creating XML-based import or export filters. We'll start with the filter wrapper, followed by short examples for importing into and exporting from the XML filters.
3.1 The Filter Wrapper: Instantiating the XML Filters
The filter wrapper needs to instantiate the built-in XML import or export components. The following code snippet will demonstrate this for an XML-based export filter.
3.2 Exporting through the XML filter
The following code snippet could be located in a filter wrapper for an XML-based export filter. The following two methods implement the gist of a filter wrapper for an XML-based export. They are really simple because the filter wrapper doesn't really do much on its own. It only delegates to its two components.
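The delegation described here can also be sketched in a self-contained way, without the UNO headers. Everything in the sketch is a hypothetical stand-in: `XmlExporter` plays the role of the built-in XML export, `PlainTextWriter` is an invented self-programmed filter that renders SAX-style events as plain text, and `ExportFilterWrapper` shows a wrapper whose two methods merely forward to its components. The method names `setSourceDocument` and `filter` are assumptions chosen to resemble the text, and handing the handler to the exporter at construction time is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; none of these are UNO types.
struct DocumentModel {
    std::string text;  // a one-paragraph "document"
};

struct DocumentHandler {
    virtual ~DocumentHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Self-programmed part: turns SAX-style events into the foreign format
// (here simply plain text, one line per paragraph).
class PlainTextWriter : public DocumentHandler {
public:
    void startElement(const std::string&) override {}
    void endElement(const std::string& name) override {
        if (name == "text:p") out_ += '\n';
    }
    void characters(const std::string& text) override { out_ += text; }
    const std::string& output() const { return out_; }

private:
    std::string out_;
};

// Stand-in for the built-in XML export: walks the model and emits events
// to the handler it was given at construction time.
class XmlExporter {
public:
    explicit XmlExporter(DocumentHandler& handler) : handler_(handler) {}
    void setSourceDocument(const DocumentModel& doc) { doc_ = &doc; }
    bool filter() {
        if (doc_ == nullptr) return false;  // no document to export
        handler_.startElement("office:body");
        handler_.startElement("text:p");
        handler_.characters(doc_->text);
        handler_.endElement("text:p");
        handler_.endElement("office:body");
        return true;
    }

private:
    DocumentHandler& handler_;
    const DocumentModel* doc_ = nullptr;
};

// The filter wrapper owns both components and only delegates to them.
class ExportFilterWrapper {
public:
    ExportFilterWrapper() : exporter_(writer_) {}
    ExportFilterWrapper(const ExportFilterWrapper&) = delete;
    ExportFilterWrapper& operator=(const ExportFilterWrapper&) = delete;

    void setSourceDocument(const DocumentModel& doc) { exporter_.setSourceDocument(doc); }
    bool filter() { return exporter_.filter(); }
    const std::string& result() const { return writer_.output(); }

private:
    PlainTextWriter writer_;  // declared first: the exporter refers to it
    XmlExporter exporter_;
};

// Usage example: export a one-paragraph document through the wrapper.
std::string exportHello() {
    DocumentModel doc{"Hello World!"};
    ExportFilterWrapper wrapper;
    wrapper.setSourceDocument(doc);
    const bool ok = wrapper.filter();
    assert(ok);
    (void)ok;  // silence the unused-variable warning in NDEBUG builds
    return wrapper.result();
}
```

The wrapper contains no conversion logic of its own: the exporter knows the document, the writer knows the foreign format, and the wrapper only connects the two.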
3.3 Import: Writing into the XML Filter
The next example should detail how an import filter would communicate with the XML import component. Basically, it only needs to call the
using namespace ::com::sun::star;
// instantiate the XML import component
::rtl::OUString sService =
    ::rtl::OUString::createFromAscii("com.sun.star.comp.Writer.XMLImporter");
uno::Reference<xml::sax::XDocumentHandler> xHandler(
    xServiceFactory->createInstance(sService), uno::UNO_QUERY );
ASSERT( xHandler.is(), "can't instantiate XML import" );
// OK. Now we have the import. Let's make a real simple document.
// a few comments:
// 1. We will use string constants from xmloff/xmlkywd.hxx
// 2. For convenience, we'll use a globally shared attribute list from the
//    xmloff project (xmloff/attrlist.hxx)
// 3. In a real project, we would pre-construct our OUString, rather than use
//    the slow createFromAscii() method every time.
// We will write the following document: (the unavoidable 'Hello World!')
// <office:document
//     office:class="text"
//     xmlns:office="http://openoffice.org/2000/office"
//     xmlns:text="http://openoffice.org/2000/text" >
//   <office:body>
//     <text:p>Hello World!</text:p>
//   </office:body>
// </office:document>
// The attribute list is a reference-counted UNO object, so it is allocated on
// the heap and held by a reference, which is also what startElement() expects.
SvXMLAttributeList* pAttrList = new SvXMLAttributeList;
uno::Reference<xml::sax::XAttributeList> xAttrList( pAttrList );
xHandler->startDocument();
// our first element: first build up the attribute list, then start the element
// DON'T FORGET TO ADD THE NAMESPACES!
pAttrList->AddAttribute(
    ::rtl::OUString::createFromAscii("xmlns:office"),
    ::rtl::OUString::createFromAscii("CDATA"),
    ::rtl::OUString::createFromAscii("http://openoffice.org/2000/office") );
pAttrList->AddAttribute(
    ::rtl::OUString::createFromAscii("xmlns:text"),
    ::rtl::OUString::createFromAscii("CDATA"),
    ::rtl::OUString::createFromAscii("http://openoffice.org/2000/text") );
pAttrList->AddAttribute(
    ::rtl::OUString::createFromAscii("office:class"),
    ::rtl::OUString::createFromAscii("CDATA"),
    ::rtl::OUString::createFromAscii("text") );
xHandler->startElement(
    ::rtl::OUString::createFromAscii("office:document"),
    xAttrList );
// body element (no attributes)
pAttrList->Clear();
xHandler->startElement(
    ::rtl::OUString::createFromAscii("office:body"),
    xAttrList );
// paragraph element (no attributes)
pAttrList->Clear();
xHandler->startElement(
    ::rtl::OUString::createFromAscii("text:p"),
    xAttrList );
// write text
xHandler->characters(
    ::rtl::OUString::createFromAscii("Hello World!") );
// close paragraph
xHandler->endElement(
    ::rtl::OUString::createFromAscii("text:p") );
// close body
xHandler->endElement(
    ::rtl::OUString::createFromAscii("office:body") );
// close document element
xHandler->endElement(
    ::rtl::OUString::createFromAscii("office:document") );
// close document
xHandler->endDocument();
4 Appendix
4.1 Other Uses
This chapter briefly mentions a few other uses of XML-based filter components that provide additional value and versatility.
In some circumstances, it may be desirable to have standalone format conversion tools. This would, for example, enable batch conversion of legacy documents. The XML-based filter components allow us to do that with little extra effort. Let us recall that an XML-based import filter uses OpenOffice.org's built-in XML import to generate the document. It calls the (generic)
A different possible use is the chaining of XML-based filters. Suppose the foreign file format in question is also based on XML. Now it doesn't make sense to re-implement the XML parser inside that component, so it seems natural to use the existing parser (or unparser) component. This way, our import (or export) filter would have to implement the
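A chained filter of this kind can be pictured as a handler that receives SAX-style events and passes translated events on to the next handler in the chain. The following self-contained C++ sketch illustrates the principle only; it does not use UNO. `DocumentHandler`, `XmlWriter` (a stand-in for the XML writer component), and `RenamingFilter` are hypothetical names, and the foreign element names `doc` and `para` are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SAX document handler (NOT the UNO interface).
struct DocumentHandler {
    virtual ~DocumentHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Escape the characters that must not appear literally in XML text content.
std::string escapeText(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Stand-in for the XML writer (unparser): serializes events into a string.
class XmlWriter : public DocumentHandler {
public:
    void startElement(const std::string& name) override {
        open_.push_back(name);
        out_ += "<" + name + ">";
    }
    void endElement(const std::string& name) override {
        assert(!open_.empty() && open_.back() == name);  // events must nest properly
        open_.pop_back();
        out_ += "</" + name + ">";
    }
    void characters(const std::string& text) override { out_ += escapeText(text); }
    const std::string& output() const { return out_; }

private:
    std::vector<std::string> open_;  // stack of currently open elements
    std::string out_;
};

// XML-to-XML filter: renames elements and forwards everything downstream.
class RenamingFilter : public DocumentHandler {
public:
    RenamingFilter(std::map<std::string, std::string> names, DocumentHandler& next)
        : names_(std::move(names)), next_(next) {}
    void startElement(const std::string& name) override { next_.startElement(translate(name)); }
    void endElement(const std::string& name) override { next_.endElement(translate(name)); }
    void characters(const std::string& text) override { next_.characters(text); }

private:
    // Unknown element names are passed through unchanged.
    std::string translate(const std::string& name) const {
        const auto it = names_.find(name);
        return it == names_.end() ? name : it->second;
    }
    std::map<std::string, std::string> names_;
    DocumentHandler& next_;
};

// Usage example: foreign events go in, OpenOffice.org-style XML comes out.
std::string convertSample() {
    XmlWriter writer;
    std::map<std::string, std::string> names;
    names["doc"] = "office:body";
    names["para"] = "text:p";
    RenamingFilter filter(names, writer);

    filter.startElement("doc");
    filter.startElement("para");
    filter.characters("Hello & goodbye");
    filter.endElement("para");
    filter.endElement("doc");
    return writer.output();
}
```

Because the filter both consumes and produces the same kind of events, several such filters can be placed one after another, and the last handler in the chain can be either a writer (for a standalone converter) or an import component.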
Note that, if the other application is also an open source application, it could use UNO component technology as well, and thus use the very same filter components for its own import and export. A filter converting from the foreign XML into OpenOffice.org XML would be an import filter for OpenOffice.org, and simultaneously an export filter for the other application.
As OpenOffice.org is developed further, it is likely that changes to the file format will eventually have to be made. Of course, it is mandatory to supply users with the ability to read and write the old formats. This could indeed be handled by an XML to XML transformation, with one format being the old OpenOffice.org XML format, and the other being the new format. Note that such a filter could also be used by users of the older versions to read and write documents in the new format! Additionally, it could be chained between other XML-based import or export filters, allowing users to utilize import and export filters for versions other than their own.
Essentially, this would achieve a decoupling of application, filter, and file format version. The opportunities this opens up are quite amazing: If a new file format is implemented, users would not be forced to upgrade their application to make use of the new filter. Also, users of newer application versions could still use filters developed for the older format.
4.2 Resources
The following resources may provide additional information: