RefDB-lite - a First Stab

Permission is granted to copy, adapt, modify and distribute this document under the terms of the GNU Free Documentation License.

Mar 19 2006

Abstract

Exploring the boundaries, fringes and possibilities of RefDB - the TEI and DocBook compatible reference database.

mailto:refdb-users@lists.sourceforge.net

http://refdb.sourceforge.net/


Table of Contents

1. Introduction
References
2. Usage
References
3. Making Citations
References
4. XSLT Document Processing
References
5. Constructing the "raw" Bibliography
References
6. Possible Customisations
References
7. Configuring the RefDB-lite gateway server
References
8. Example Document Structures
9. The Pitfalls
References
1. Citation Formatting Tests
References

List of Examples

8.1. A perfectly valid book
8.2. Another perfectly valid book
8.3. A third perfectly valid book
8.4. A perfectly valid part
8.5. A perfectly valid article
8.6. A weird concoction that should still be valid

Chapter 1. Introduction

Two of the more popular dialects for XML authoring are those of the Text Encoding Initiative (1), and DocBook (2). A third, a relative newcomer on the block, is the Darwin Information Typing Architecture (DITA) (3), though it remains outside the scope of this package. Both of the former dialects have extensive facilities for encoding or encapsulating bibliographic information, but their associated tools are devoid of comprehensive bibliographic authoring and collation facilities (to the best of the present authors' knowledge).

The program RefDB (4) fills that void. RefDB implements a relational database interface to various database management systems such as SQLite, MySQL and PostgreSQL. RefDB has a two-tier client-server architecture, providing methods for adding, retrieving and searching reference data in externally managed databases. RefDB also provides convenience commands and interfaces for editing and annotating the reference data contained therein and for formatting the citations and bibliographical data that are emitted. Typically "cooked", i.e. preformatted, citations and bibliography entries are then simply included in, or with, the source TEI or DocBook documents, written in either XML or SGML formats.

But as powerful as RefDB is, there are some instances where the overhead of a full database management system may be inexpedient, or might unnecessarily restrict the portability of the source document. Maybe you work in a small laboratory without the manpower to devote to comprehensive reference management, or perhaps a document needs to exist standalone in its entirety, or will be used in non-standard ways, such as submission to different journals (not that you would ever need that!), or inclusion in laboratory webpages and productivity manuals. In such situations it could well be useful to apply standard bibliographic and citation style formatting to the complete document, formatting it in different ways consistent with those different purposes.

For those somewhat esoteric reasons, some of the useful sorting and formatting features of the RefDB package have been externalised here as a CITESTYLE (5) driven eXtensible Stylesheet Language Transformation (6). RefDB-lite is XSLT 1.0 compatible but makes considerable use of the standard extension function, exsl:node-set(). In fact, support for that function is a fundamental prerequisite for any XSLT 1.0 engine intended to apply these stylesheets.

For the moment, RefDB-lite has been written explicitly with DocBook output in mind and, in fact, that is further restricted to XML documents rather than the combined XML and SGML support of RefDB. However, there are no compelling reasons why analogous XML transformation stylesheets could not be written to extend formatting to other documentation systems, such as TEI. For that matter, it is possible that the programmatic logic could be mapped into DSSSL to cater for SGML documents, though this is left as an exercise for the reader.

Good luck.

References

1. TEI Consortium. Text Encoding Initiative. (2000). Charlottesville: TEI Consortium (TEI-C).

2. DocBook Technical Committee. DocBook 4.5b1. (2005). Billerica: Organization for the Advancement of Structured Information Standards (OASIS).

3. IBM. Darwin Information Typing Architecture. (2000). Somecity: International Business Machines (IBM).

4. Hoenicka Markus. RefDB 0.9.6. (2005).

5. Hoenicka M. (2005)

6. W3C. XSL Transformations (XSLT) Version 1.0. (1999). Cambridge: World Wide Web Consortium (W3C).

Chapter 2. Usage

Using RefDB-lite involves little more than adding an import instruction to your existing XSLT customisation file.

<xsl:import href="./docbook-xsl-1.69.0/html/chunk.xsl"/>
<xsl:import href="./refdb-lite/xsl/docbook/html/biblio.xsl"/>

The import order above is quite crucial, because RefDB-lite overrides some of the processing templates of the standard DocBook XSL stylesheets. You would replace instances of html in the above with xhtml or fo for alternative output styles (if and when implemented).

Locating the RefDB style file

It is also necessary to select a bibliography style file (CITESTYLE) such as Eur.J.Pharmacol.xml. This could be selected from command line arguments using xsltproc (1) for example:

    --stringparam  refdb.citation.style.file.name  "Eur.J.Pharmacol.xml"

or alternatively by configuring it within your customisation file.

<xsl:param name="refdb.citation.style.file.name"
      select="'Eur.J.Pharmacol.xml'"/>

Of course, both of these selection mechanisms assume that your desired style exists within the RefDB-lite styles directory. If that isn't the case, you need to override the refdb.citation.style xsl:variable:

<xsl:variable name="refdb.citation.style" select="document(
   concat('../../../styles/',$refdb.citation.style.file.name),document(''))/CITESTYLE" />

The document() function used above locates the style file with respect to the refdb-lite/xsl/docbook/common/collation.xsl file. Replace document('') in the above with "/" in your customisation, to locate your style file with respect to your source document.
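
As a minimal sketch of that override, the customisation file below resolves the style file relative to the source document instead. The styles/ subdirectory is an assumption made purely for illustration; substitute whatever path actually holds your CITESTYLE files.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="./docbook-xsl-1.69.0/html/chunk.xsl"/>
  <xsl:import href="./refdb-lite/xsl/docbook/html/biblio.xsl"/>

  <!-- "/" is the root node of the source document, so the relative
       path below is resolved against the source document's location.
       The styles/ directory name is hypothetical. -->
  <xsl:variable name="refdb.citation.style" select="document(
     concat('styles/', $refdb.citation.style.file.name), /)/CITESTYLE"/>

</xsl:stylesheet>
```

Because a definition in the importing stylesheet has higher import precedence than one in an imported stylesheet, this variable takes the place of the one defined by RefDB-lite.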

Locating the Auxiliary Reference Database

For use with DocBook you can specify an auxiliary reference database file. This is simply a single DocBook XML file with a bibliography root element, filled with biblioentry elements whose id attributes may or may not match those of your biblioref citation targets. To specify this file you should either add

<xsl:param name="bibliography.collection" select="'docbook.bib.data.xml'"/>

to your customisation file, or set it on the command line as an argument e.g. for xsltproc:

     --stringparam  bibliography.collection  "./docbook.bib.data.xml"

Obviously this assumes that docbook.bib.data.xml is the relative (w.r.t. your source doc) pathname to your raw bibliography database.
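
Putting the pieces described so far together, a complete customisation file might look like the following sketch. The file name custom.xsl is hypothetical, and the import paths simply repeat those used earlier in this chapter; adjust them to match your installation.

```xml
<?xml version="1.0"?>
<!-- custom.xsl: a hypothetical name for your customisation file -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Import order matters: RefDB-lite must come after DocBook XSL
       so that its templates take precedence. -->
  <xsl:import href="./docbook-xsl-1.69.0/html/chunk.xsl"/>
  <xsl:import href="./refdb-lite/xsl/docbook/html/biblio.xsl"/>

  <!-- Citation style, looked up in the RefDB-lite styles directory -->
  <xsl:param name="refdb.citation.style.file.name"
             select="'Eur.J.Pharmacol.xml'"/>

  <!-- Auxiliary "raw" bibliography, relative to the source document -->
  <xsl:param name="bibliography.collection"
             select="'docbook.bib.data.xml'"/>

</xsl:stylesheet>
```

With xsltproc this would be applied as xsltproc custom.xsl mydoc.xml, where mydoc.xml stands for your DocBook source. A value passed with --stringparam on the command line takes precedence over the default given in the file.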

Interfacing with RefDB

For citation references that resolve neither internally nor to the external database file (bibliography.collection), it is possible to initiate an HTTP GET connection to a remotely running RefDB installation. XSLT 1.0 allows the document() function to read from URLs, potentially through a Common Gateway Interface (CGI) program that returns the missing items from your bibliography. Because this interface only ever needs to read references from RefDB and its underlying databases, it is best achieved by creating a "public" user account configured with read-only access permissions (http://refdb.sourceforge.net/manual/sect1-add-user.html), to be used by everyone intending to work with RefDB-lite. This would provide the greatest level of reference database security.

Notwithstanding the potentially severe security risk, a user intending to give RefDB-lite CGI access will need to set some or all of the following parameters in their XSLT customisation file:


<xsl:param name="refdb.server.address"
   select="'http://localhost/refdb/refdb-lite/refdb-lite-server.cgi'" />
<xsl:param  name="refdb.server.username" select="'anon'" />
<xsl:param  name="refdb.server.password" select="'password'" />
<xsl:param  name="refdb.server.default.database" select="'refs'" />
<xsl:param  name="refdb.server.data.format" select="'risx'" />
<xsl:param  name="refdb.server.timeout" select="1"/> <!-- autodestroy session id in N minutes?? -->

In the interests of local security, you could add some or all of those parameters to an auxiliary customisation file with read/write permissions restricted only to yourself (chmod 600 auxfile.xsl), and then include that within your primary customisation file:

<xsl:include href="auxfile.xsl"/>

While that will protect your RefDB access account details locally, your database password is still passed in cleartext form in the HTTP access URL. That is easily intercepted in transit, and it is also recorded in the gateway server's access and error logs. (This needs further investigation, but presumably a variable password encoding key of some description could be negotiated and implemented in XSLT?) Do tread very, very carefully!

References

1. Veillard Daniel. xsltproc (1.1.15-2). (2005).

Chapter 3. Making Citations

The RefDB-lite stylesheets are, in essence, activated on matching <citation role="REFDB"><biblioref endterm="SomeRef2003-X"/></citation> elements. For RefDB-lite there are no separate intermediate, short-citation expansion and collation processing phases analogous to those required with the runbib or refdbib commands documented in the RefDB manual (1). Note also that, in contrast to RefDB usage, we use the DocBook biblioref element rather than xref, and its endterm attribute rather than the linkend attribute of xref. The utility of the role="REFDB" attribute is also debatable. The underlying philosophy was to permit coexistence with standard DocBook citation and cross-referencing mechanisms, but as it stands, that may not work and it seems unlikely to be reliable because RefDB-lite doctors the target bibliography substantially (and yet ...?).

The biblioref element was introduced circa DocBook 4.3 and is a more appropriate container for bibliographic reference information. Moreover, the linkend attribute of xref is intended to be an IDREF (reference) to a specific unique ID, but for RefDB-lite those IDs may only exist in the transformed document, so that a stand-alone source XML document including unresolvable xrefs may technically be invalid.

Note

According to DocBook 5.0, biblioref has the following attributes:

begin     (token)  e.g. start page number
end       (token)  e.g. end page number
endterm   (IDREF)
units     (token)  e.g. page
xrefstyle          e.g. extra formatting style info 

of which you will note that endterm should also point to a unique document ID, identical to the xref's linkend attribute.

You should note also that the -X endterm extensions are stripped and, in the simple case, SomeRef2003-X would refer to a <biblioentry id="SomeRef2003"/> entry in the transformed document's <bibliography/>.

Taking the above into consideration, citations are therefore made in RefDB "full" notation, where the ultimate citation format is governed by the trailing "-N" extension on the endterm. In keeping with the "full" notation, a document that knowingly makes use of the RefDB server interface to resolve missing references may prepend the database identification string to the RefDB citekey identifier, for example: endterm="MyOtherRefDB-Smith00-X". That is only necessary if you are resolving references from multiple RefDB databases and need to distinguish between them. The default database, if needed, should be identified in the XSLT customisation parameter refdb.server.default.database.
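
The notation can be illustrated with a small stand-alone stylesheet that splits each endterm into its parts. This is only a sketch of the idea, not the code used by RefDB-lite, and it assumes that neither database names nor citekeys contain a hyphen.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Visit only the RefDB-lite citations; everything else is ignored. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//citation[@role='REFDB']/biblioref"/>
  </xsl:template>

  <xsl:template match="biblioref">
    <xsl:variable name="head" select="substring-before(@endterm, '-')"/>
    <xsl:variable name="tail" select="substring-after(@endterm, '-')"/>
    <xsl:choose>
      <!-- Two hyphens: database-citekey-format -->
      <xsl:when test="contains($tail, '-')">
        <xsl:value-of select="concat('database=', $head,
                              ' citekey=', substring-before($tail, '-'),
                              ' format=', substring-after($tail, '-'))"/>
      </xsl:when>
      <!-- One hyphen: citekey-format, with the default database implied -->
      <xsl:otherwise>
        <xsl:value-of select="concat('database=(default)',
                              ' citekey=', $head,
                              ' format=', $tail)"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For endterm="MyOtherRefDB-Smith00-X" this prints database=MyOtherRefDB citekey=Smith00 format=X, and for endterm="Smith00-X" it prints database=(default) citekey=Smith00 format=X.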

The following citation formatting rules are intended to apply:

 endterm="Smith00-X"  ->  (Smith, Jones & Murphy, 2000) INTEXTDEF first
 endterm="Smith00-S"  ->  (Smith et al., 2000)          INTEXTDEF subsequent
 endterm="Smith00-W"  ->  Smith, Jones & Murphy (2000)  combined AUTHORONLY/YEARONLY first 
 endterm="Smith00-U"  ->  Smith et al., (2000)          combined AUTHORONLY/YEARONLY subsequent 
 endterm="Smith00-A"  ->  Smith, Jones & Murphy         AUTHORONLY first
 endterm="Smith00-Q"  ->  Smith et al.                  AUTHORONLY subsequent
 endterm="Smith00-Y"  ->  (2000)                        YEARONLY
i.e.  -X -W -A  for initial citations
and   -S -U -Q  for subsequent citations

In fact -S, -U and -Q aren't needed at all because with the source document at hand we can just count the number of previous citations using the same key and can adjust the style accordingly (when xsl:number works properly!). On the other hand, when RefDB itself provides active support for biblioref elements, then compatibility could require them, for rigorous citation formatting.
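
A minimal sketch of that counting idea follows. It uses count() over the preceding axis rather than xsl:number, which is a substitution made here for simplicity and not necessarily what RefDB-lite does, and it assumes endterms of the plain citekey-X form without a database prefix.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//citation[@role='REFDB']/biblioref"/>
  </xsl:template>

  <xsl:template match="biblioref">
    <!-- Citekey: everything before the first hyphen -->
    <xsl:variable name="key" select="substring-before(@endterm, '-')"/>
    <!-- Number of earlier citations, in document order, of the same key -->
    <xsl:variable name="earlier"
      select="count(preceding::biblioref[substring-before(@endterm, '-') = $key])"/>
    <xsl:value-of select="$key"/>
    <xsl:text>: </xsl:text>
    <xsl:choose>
      <xsl:when test="$earlier = 0">first</xsl:when>
      <xsl:otherwise>subsequent</xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A formatting template could use the same test to choose between, say, the -X and -S renderings without the author having to encode the distinction in the endterm.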

But in fact that is not the final word on citation formatting, because the specific citation renderings are specified in every RefDB CITESTYLE (2) file, e.g.:


  <CITSTYLE> 
    <INTEXTDEF><REFNUMBER/></INTEXTDEF>
    <YEARONLY><REFNUMBER/></YEARONLY>
    <AUTHORONLY>
      ...
    </AUTHORONLY>
  </CITSTYLE> 

With those specifications, the citations above could present as:

 endterm="Smith00-X"  ->  [1]
 endterm="Smith00-S"  ->  [1]
 endterm="Smith00-W"  ->  Smith, Jones & Murphy (2000) (should that be [1]?) 
 endterm="Smith00-U"  ->  Smith et al., (2000)          (should that be [1]?) 
 endterm="Smith00-A"  ->  Smith, Jones & Murphy     
 endterm="Smith00-Q"  ->  Smith et al.                  
 endterm="Smith00-Y"  ->  [1]

and the corresponding bibliography would be ordered by citation sequence, and accordingly indexed.

I would also like to believe that a style configuration file resembling the following:


  <CITSTYLE> 
    <INTEXTDEF><CITEKEY/></INTEXTDEF>
    <YEARONLY><CITEKEY/></YEARONLY>
    ...
  </CITSTYLE> 

might permit citations of the form:

 endterm="Smith00-X"  ->  [Smith00]
 endterm="Smith00-S"  ->  [Smith00]
 endterm="Smith00-W"  ->  Smith, Jones & Murphy (2000) 
 endterm="Smith00-U"  ->  Smith et al., (2000)          
 endterm="Smith00-A"  ->  Smith, Jones & Murphy     
 endterm="Smith00-Q"  ->  Smith et al.                  
 endterm="Smith00-Y"  ->  [Smith00]

with the corresponding bibliography ordered alphabetically and indexed in terms of citation keywords. This would provide consistency with the standard DocBook bibliographic style as documented in DocBook: The Definitive Guide (3). As yet it is not supported.

Note

There is no facility for page-note i.e. bottom of page style bibliographies in DocBook and consequently none in RefDB-lite.

Personally I would like to see new, possibly configurable citation formats such as:

 endterm="Walsh99-T"  ->  DocBook: The Definitive Guide (hyperlinked to bib)
 endterm="Smith00-F2" ->  Some extended title/volume CITESTYLE specified free format

This would simplify writing extended citations for things like "DocBook: The Definitive Guide (3)" by abbreviating


<citetitle>DocBook: The Definitive Guide</citetitle> 
     <citation role="REFDB"><biblioref endterm="Walsh99-X"/></citation>. 

to just:


     <citation role="REFDB"><biblioref endterm="Walsh99-TW"/></citation>. 

That would be very useful for citing software programs too. But is it practical and likely to be compatible with some future RefDB? Dunno.

References

1. Hoenicka M. (2005)

2. Hoenicka M. (2005)

3. Walsh N., Muellner L., and Stayton B. (1999) O'Reilly & Associates Inc., Sebastopol

Chapter 4. XSLT Document Processing

aka: How it All Works

For an in-depth review of XSLT (1), you are urged to consult a good reference book, such as the XSLT Programmer's Reference (2).

With that under your belt, you would clearly grasp that RefDB-lite works by initially creating two global xsl:variables named refdb.citation.style and refdb-lite.bib.doc. The former loads the RefDB citation style, with the latter being a temporary result-tree node-set that contains an assembly of source document citation references and their accompanying, resolved and sorted, bibliographic data. Because these are global variables, they are evaluated in a pre-parse stage, typically before the target transformation template matching rules are applied to the document root node - and certainly before any template matching rules encounter their first citation node in the source document.

The upshot of the foregoing is that by the time the first citation tag is encountered, the citation formatting style and the relevant reference data are both on hand to transform that citation to its desired output format and link it with an appropriate bibliographic entry.

The citations are collated in document order (for numerical schemes) or alphabetically by citekey (for citekey based schemes) or by author (for author-year schemes) and saved to a temporary result tree as a list of, somewhat abused, DocBook (3) xref tags with associated linkend attributes containing unique citation basenames (stripped of -X extensions), all encapsulated within a citations fragment.

The citations node also holds a complete DocBook bibliography, containing fully resolved biblioentry elements that hold the "raw" reference data obtained either from within the document or, if missing, from the auxiliary bibliography database indicated by the DocBook xsl:param bibliography.collection. Any remaining unresolved biblioentry elements are optionally resolved further from a remote RefDB HTTP gateway server. Such URL resolution occurs on an entry-by-entry basis, with each access resolving to a single biblioentry. In actuality, the RefDB server serves RISX (4) documents and these are transformed with XSLT to a compliant DocBook "raw" format. Failure to resolve at this stage is considered fatal and terminates further stylesheet processing.
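
The first two stages of that resolution order can be sketched as a stand-alone stylesheet. The template name sketch.resolve is hypothetical and this is not the RefDB-lite implementation: the remote gateway stage is left out, repeated citations of one key are not merged, and endterms are assumed to have the plain citekey-X form.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="bibliography.collection" select="'docbook.bib.data.xml'"/>

  <!-- Root of the source document, kept for lookups and as the base
       against which the collection file name is resolved -->
  <xsl:variable name="source" select="/"/>

  <xsl:template match="/">
    <bibliography>
      <xsl:for-each select="//citation[@role='REFDB']/biblioref">
        <xsl:call-template name="sketch.resolve">
          <xsl:with-param name="key" select="substring-before(@endterm, '-')"/>
        </xsl:call-template>
      </xsl:for-each>
    </bibliography>
  </xsl:template>

  <!-- Hypothetical helper: internal entry first, then the collection -->
  <xsl:template name="sketch.resolve">
    <xsl:param name="key"/>
    <!-- [*] skips "empty" placeholder entries that carry only an id -->
    <xsl:variable name="internal"
                  select="$source//biblioentry[@id = $key][*]"/>
    <xsl:variable name="external"
      select="document($bibliography.collection, $source)//biblioentry[@id = $key]"/>
    <xsl:choose>
      <xsl:when test="$internal">
        <xsl:copy-of select="$internal[1]"/>
      </xsl:when>
      <xsl:when test="$external">
        <xsl:copy-of select="$external[1]"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Mirrors the fatal failure described above -->
        <xsl:message terminate="yes">Unresolved citation key: <xsl:value-of select="$key"/></xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```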

For a source document containing multiple bibliography elements, the temporary refdb-lite.bib.doc node-set resembles the following structure:


 <refdb-lite>
   <citations id="BIB1-parent-id">
     <xref linkend="ref1" database="" type="A" sortkey="ref1"/>  
     <xref linkend="ref2" database="" type="A" sortkey="ref2"/>  
     <xref linkend="ref3" database="db2" type="A" sortkey="ref3db2"/>  
     <bibliography id="bib1-id">
       <biblioentry id="ref1">
         <abbrev>ref1</abbrev>
         <bibliomisc role="sortkey">AUTH1AUTH2AUTHnDATE</bibliomisc>
         <biblioset>
           ...
         </biblioset>
       </biblioentry>
       <biblioentry id="ref2">
         <abbrev>ref2</abbrev>
         <bibliomisc role="sortkey">AUTH1AUTH2AUTHnDATE</bibliomisc>
         <biblioset>
           ...
         </biblioset>
       </biblioentry>
       <biblioentry id="db2-ref3">
         <abbrev>ref3</abbrev>
         <bibliomisc role="sortkey">AUTH1AUTH2AUTHnDATE</bibliomisc>
         <biblioset>
           ...
         </biblioset>
       </biblioentry>
     </bibliography>
   </citations> 
   <citations id="BIB2-parent-id">
     <xref linkend="ref4" database="db2" type="A" sortkey="ref4db2"/>  
     <xref linkend="ref2" database="" type="A"    sortkey="ref2"/>  
     <xref linkend="ref3" database="db2" type="A" sortkey="ref3db2"/>  
     <bibliography id="bib2-id">
       <biblioentry id="db2-ref4">
         <abbrev>ref4</abbrev>
         <bibliomisc role="sortkey">AUTH1AUTH2AUTHnDATE</bibliomisc>
         <biblioset>
           ...
         </biblioset>
       </biblioentry>
       <biblioentry id="ref2">
         <abbrev>ref2</abbrev>
         <bibliomisc role="sortkey">AUTH1AUTH2AUTHnDATE</bibliomisc>
         <biblioset>
           ...
         </biblioset>
       </biblioentry>
       <biblioentry id="db2-ref3">
         <abbrev>ref3</abbrev>
         <bibliomisc role="sortkey">AUTH1AUTH2AUTHnDATE</bibliomisc>
         <biblioset>
           ...
         </biblioset>
       </biblioentry>
     </bibliography>
   </citations> 
 </refdb-lite>

By the time citation elements are encountered and actively being transformed, the xref elements in the above are fully redundant, having served their purpose of ordering the biblioentry elements.

At transformation time then, each source document citation/biblioref node is linked to a target biblioentry with an id attribute created from the following components: {bibL-}{dbM-}{refN}. If the source document contains a single bibliography element, then the {bibL-} component is omitted (L is an integer and "bib" is specified by the global parameter refdb-multi-bib-prefix). If the source citation <biblioref endterm="?-?-X"/> did not specify a database component, then the {dbM-} component is also omitted.
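
The way those components combine can be sketched with a small named template. The template name sketch.target.id and its parameter names are hypothetical, and the exact placement of the hyphens is an assumption read off the {bibL-}{dbM-}{refN} pattern above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:param name="refdb-multi-bib-prefix" select="'bib'"/>

  <!-- Hypothetical helper: builds {bibL-}{dbM-}{refN}, omitting the
       optional components when they are empty -->
  <xsl:template name="sketch.target.id">
    <xsl:param name="bib.number" select="''"/>
    <xsl:param name="database" select="''"/>
    <xsl:param name="citekey"/>
    <xsl:if test="string($bib.number) != ''">
      <xsl:value-of select="concat($refdb-multi-bib-prefix, $bib.number, '-')"/>
    </xsl:if>
    <xsl:if test="string($database) != ''">
      <xsl:value-of select="concat($database, '-')"/>
    </xsl:if>
    <xsl:value-of select="$citekey"/>
  </xsl:template>

  <!-- Demonstration: one fully qualified id and one bare citekey -->
  <xsl:template match="/">
    <xsl:call-template name="sketch.target.id">
      <xsl:with-param name="bib.number" select="2"/>
      <xsl:with-param name="database" select="'db2'"/>
      <xsl:with-param name="citekey" select="'ref3'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
    <xsl:call-template name="sketch.target.id">
      <xsl:with-param name="citekey" select="'ref1'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against any XML input, this prints bib2-db2-ref3 followed by ref1.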

The content of each biblioentry in the above listing consists of DocBook "raw" bibliographic information. Thus, it can equally well be included in the source DocBook document, or stored in an external bibliography resource file.

In addition to the "raw" DocBook data, the refdb.raw.biblist biblioentry data is augmented with a bibliomisc element carrying a role='sortkey' attribute. The sortkey consists of the uppercased concatenation of all PRIMARY author surnames OR SECONDARY editor surnames OR TERTIARY editor surnames OR the PRIMARY title, postfixed with the publication year. Very probably this is publication style specific as well as reference type specific. You could well wish to customise the refdb.add.raw.bibentry template in collation.xsl to accommodate your sorting requirements.
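
The sortkey rule can be sketched as follows. This is an illustration only, not the refdb.add.raw.bibentry template itself. It assumes that SECONDARY and TERTIARY editors are held in authorgroup elements with the corresponding role, as PRIMARY authors are, and it uppercases ASCII letters only.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//biblioentry" mode="sketch.sortkey"/>
  </xsl:template>

  <xsl:template match="biblioentry" mode="sketch.sortkey">
    <xsl:variable name="primary"
                  select=".//authorgroup[@role='PRIMARY']//surname"/>
    <xsl:variable name="secondary"
                  select=".//authorgroup[@role='SECONDARY']//surname"/>
    <xsl:variable name="tertiary"
                  select=".//authorgroup[@role='TERTIARY']//surname"/>
    <!-- First non-empty choice: PRIMARY, SECONDARY or TERTIARY surnames,
         falling back to the PRIMARY title -->
    <xsl:variable name="names">
      <xsl:choose>
        <xsl:when test="$primary">
          <xsl:for-each select="$primary"><xsl:value-of select="."/></xsl:for-each>
        </xsl:when>
        <xsl:when test="$secondary">
          <xsl:for-each select="$secondary"><xsl:value-of select="."/></xsl:for-each>
        </xsl:when>
        <xsl:when test="$tertiary">
          <xsl:for-each select="$tertiary"><xsl:value-of select="."/></xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select=".//title[@role='PRIMARY']"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- Append the publication year, then uppercase the whole key -->
    <xsl:value-of select="translate(concat($names, .//pubdate[@role='PRIMARY']),
                                    $lower, $upper)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the FoxCoef entry shown in Chapter 5, this sketch should emit FOXO'KEEFETABBERNOR1989.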

During ordinary DocBook XSLT (5) stylesheet processing, when the templates finally encounter a bibliography node, the RefDB-lite stylesheets override the default templates and call the refdb.process.bibliography template. This applies the previously requested publication style to all biblioentry elements contained within the corresponding refdb-lite.bib.doc bibliography node-set, matched by its associated id attribute.

References

1. W3C. XSL Transformations (XSLT) Version 1.0. (1999). Cambridge: World Wide Web Consortium (W3C).

2. Kay M. (2003)

3. Walsh N., Muellner L., and Stayton B. (1999) O'Reilly & Associates Inc., Sebastopol

4. Hoenicka M. (2005)

5. Stayton B. (2005) Sagehill Enterprises, Santa Cruz

Chapter 5. Constructing the "raw" Bibliography

RefDB-lite requires an input source of biblioentry data in "raw" DocBook format. The organisation of DocBook bibliographies is incredibly flexible so in fact it is necessary to restrict the potential formats to a consistent though legal DocBook subset. Otherwise the source document would not validate. The raw bibliography format itself is a tentative proposal and open for negotiation. It should capture most of the potential of RISX (1) and ultimately, perhaps MODS (2).

Each biblioentry must have an id attribute corresponding to the basename of a citation biblioref element.

  <biblioentry role='JOUR' id='FoxCoef'>                      <!-- 1 -->
    <abbrev>FoxCoef</abbrev>

    <biblioset relation='JOUR'>                               <!-- 2 -->
      <titleabbrev role='SECONDARY'>Acta Cryst.</titleabbrev> <!-- 3 -->
    </biblioset>

    <biblioset relation='SERIES' role='TERTIARY'>             <!-- 4 -->
      <titleabbrev role='TERTIARY'>A</titleabbrev>
    </biblioset>
    <bibliomisc role='USERDEF1'>A</bibliomisc>                <!-- 5 -->

    <biblioset relation='ARTICLE'>                            <!-- 6 -->
      <authorgroup role='PRIMARY'>                            <!-- 7 -->
        <author>
          <firstname>A</firstname>
          <othername role='mi'>G</othername>
          <surname>Fox</surname>
        </author>
        <author>
          <firstname>M</firstname>
          <othername role='mi'>A</othername>
          <surname>O'Keefe</surname>
        </author>
        <author>
          <firstname>M</firstname>
          <othername role='mi'>A</othername>
          <surname>Tabbernor</surname>
        </author>
      </authorgroup>
      <title role='PRIMARY'>Relativistic Hartree-Fock X-ray and electron atomic scattering factors at high angles</title>
      <volumenum>45</volumenum>
      <pubdate role='PRIMARY'>1989</pubdate>                  <!-- 8 -->
      <pagenums role='start'>786</pagenums>                   <!-- 9 -->
      <pagenums role='end'>793</pagenums>
      <bibliosource class="uri">
        <ulink url='http://www.iucr.org/paper?hh0289'/>       <!-- 10 -->
      </bibliosource>
    </biblioset>

  </biblioentry>
  1. The role must correspond to one of the RIS reference TY (type) entries: "ABST", "ADVS", "ART", "BILL", "BOOK", "CASE", "CHAP", "COMP", "CONF", "CTLG", "DATA", "ELEC", "GEN", "HEAR", "ICOMM", "INPR", "JFULL", "JOUR", "MAP", "MGZN", "MPCT", "MUSIC", "NEWS", "PAMP", "PAT", "PCOMM", "RPRT", "SER", "SLIDE", "SOUND", "STAT", "THES", "UNBILL", "UNPB" or "VIDEO".

  2. The relation attribute should be something consistent to identify the class of data, in this instance a JOURnal title.

  3. The titleabbrev is used to specify journal titles and should probably conform to the abbreviation styles of Chemical Abstracts(??). It should be uniformly preformatted as J.Irrep.Results. Use "." with no space to identify an abbreviated word or a space to indicate a complete word.

  4. The SERIES titleabbrev does not exist in RIS, so it is obsolete here because there is no way to specify how to use it in the CITESTYLE file when it is needed.

  5. Instead the SERIES titleabbrev is stashed here and accessed as USERDEF1 in PUBTYPE styles.

  6. This contains reference specific details, rather than general info.

  7. Article authors are PRIMARY authors. Book editors are SECONDARY authors. Series editors are TERTIARY editors.

  8. What is a SECONDARY date? Submitted, accepted?

  9. Split on role just to be pedantic.

  10. A ulink element could be obtained from a RefDB bibliographic database, but an olink could exist in a document's static internal bibliography.

If the "raw" data exists in an external database, then the internal document bibliography could consist of "empty" biblioentry tags, one to match each citation basename used within the document. However, this is not strictly necessary, and is probably superfluous.

<bibliography> <title>Bibliography</title>
 <biblioentry id="Smith00"/>
 <biblioentry id="Walsh99"/>
 <biblioentry id="Stayton05"/>
</bibliography>

In this case the associated "raw" data must be accessible through the bibliography.collection parameter or the RefDB CGI (web) interface.

References

1. Hoenicka M. (2005)

2. Library of Congress. Metadata Object Description Schema (MODS). (2004). Washington: The Network Development and MARC Standards Office (LoC).

Chapter 6. Possible Customisations

Being an all-XSLT solution, the options and possibilities for customisation are endless. To that extent, many of the actual formatting templates were written expressly to provide opportunities and hooks to capture the processing at different stages and to cater for different reference types in different fashions. These options exist over and above the style options provided by the CITESTYLE (1) file.

In addition, there are many possibilities to configure the gross behaviour and common processing features.

Remote raw DocBook Bibliography Database

The XSLT parameter refdb.bibliography.collection.relative defines a node-set with respect to which the DocBook XSL (2) auxiliary bibliographic database file parameter, bibliography.collection, is located. The default definition of refdb.bibliography.collection.relative loads the bibliography.collection database with respect to your DocBook source document, which seems the logical choice. It could, however, be made with respect to the RefDB-lite stylesheet directory.

Conceivably the template that makes use of the refdb.bibliography.collection.relative parameter (that would be refdb.raw.biblist in file collation.xsl) could be overridden to access that bibliography database via HTTP. According to my understanding of Kay (3), the XSLT document() function only ever loads a given unique URL once, regardless of the modification status of that URL. So basically, it would not be inefficient to make multiple access calls to such a database file ...

Simplify the citation format

One possibility is to customise citation matching to obviate the role="REFDB" requirements on citation. You would need the following template added to your customisation file.


<xsl:template match="citation">
  <xsl:choose>
    <!-- xsl:when test="@role='REFDB' and child::biblioref" -->
    <xsl:when test="child::biblioref" >
      <xsl:call-template name="refdb-render-citation"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-imports /> <!-- normal DocBook XSL citations -->
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

You might also need this:


<xsl:template match="citation" mode="refdb-lite.collate.mode">
   <!-- ... -->
</xsl:template>

Match on biblioref and/or xref

This is closely related to the previous customisation. Search for all biblioref elements and replace them with xref.
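
One reading of this is a pre-processing pass over the source document. The sketch below is an identity transformation that copies everything unchanged except biblioref, which it rewrites as xref. It is an illustration only and is not part of RefDB-lite.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rewrite each biblioref as an xref, carrying endterm across as
       linkend. Other biblioref attributes (begin, end, units) are
       dropped in this sketch. -->
  <xsl:template match="biblioref">
    <xref linkend="{@endterm}"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that an XSLT transformation does not carry the source document's DOCTYPE declaration over to its output, so that would need to be restored separately if the result is to be validated.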

Uncited References

Another possibility is to permit uncited references to be included in the bibliography. To do so, add

<xsl:param name="refdb.use.uncited.references" select="1"/>

to your customisation file, or set it on the command line as an argument e.g. for xsltproc:

     --param  refdb.use.uncited.references  1

Apply Conditional Processing Beyond the CITESTYLE Limit

Do some references need special treatment? Need a sort key based on series title/volume/date instead of by author/date? Get in there and customize your own refdb.add.raw.bibentry template.

Special Treatment for USERDEFn and MISCn

How are these things used in practice? Can we standardise some uses? For instance, the Acta.Cryst.xml style uses (I believe) USERDEF1 as a container for the abbreviated series title (that being A, B, C, D or E). Was that a good choice? Is it treated correctly?

You really want specialised citation formats?

You want to cite by title? or include a computer program title as part or all of a citation link? This would be a very useful customisation!

Convert a RISX database file to DocBook raw

Dump your RefDB database as RISX and apply the RefDB-lite RISX-to-DocBook-raw XSL Transformation stylesheet as a once-off conversion.

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:import href="refdb-lite/xsl/docbook/risx/risx2dbk.xsl"/>
<xsl:template match="/">
  <xsl:call-template name="RISX-to-DocBook-raw">
    <xsl:with-param name="risx" select="."/>
  </xsl:call-template>
</xsl:template>
</xsl:stylesheet>

Want to convert DocBook raw back to RISX?

Write yourself a complete XSL Transformation stylesheet. It can't be that hard to do? Can it?

Actually, there may be problems of ambiguity.

Choose the citation style based on a processing instruction?

You want to change citation styles midstream? Initially, that sounds a bit inconsistent and esoteric. Yet it could happen, particularly in, say, different parts of a book, or different books of a set, or to faithfully amalgamate published articles from journals, with differing styles, into a thesis.

It would require a significant reorganisation, rather than a trivial XSLT customisation. But it is not inconceivable.

What else?

Do you need to resolve DOI addresses or MODS via SRU? http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/02/26/opa-proxy-script I have no idea what that involves!

Do we have to handle citebiblioid as well?

References

1. Hoenicka M. (2005)

2. Stayton B. (2005) Sagehill Enterprises, Santa Cruz

3. Kay M. (2003)

Chapter 7. Configuring the RefDB-lite gateway server

Although installation of the RefDB-lite CGI gateway script is trivial, there are a number of reasonably intuitive, but essential prerequisites.

  • For testing, you need a client computer with XSLT capability, with both the DocBook and RefDB-lite stylesheets installed, and with your DocBook XML source document, including citations.

  • You need a second computer running an HTTP server, such as Apache. This computer must have the RefDB client program refdbc installed and operational.

  • You need a third computer running the RefDB refdbd daemon process and hosting the SQL based bibliographic reference-database.

Installing RefDB is beyond the scope of this document. Consult the RefDB manual (1) for details.

In actual fact, all three of those processes can run concurrently on the same computer, communicating via the loopback (lo) network interface. This would be the most secure scenario. Your RefDB CGI interface would then be accessed via the localhost address, for example:

 <xsl:param name="refdb.server.address"
   select="'http://localhost/refdb/refdb-lite-server.cgi'" />
 

Assuming your prerequisites are in place and fully operational, you need to install the RefDB-lite server script and configure your HTTP server to enable CGI execution permissions in the script's installation directory.

  1. On your HTTP server computer, create a directory such as /usr/local/share/refdb/www/

  2. Copy the RefDB-lite server script, refdb-lite/www/server/refdb-lite-server.cgi to the directory /usr/local/share/refdb/www/. It has to be executable (chmod 555 refdb-lite-server.cgi).

  3. Configure your HTTP server to serve files and enable CGI execution of .cgi files in the /usr/local/share/refdb/www/ directory. For the Apache2 server running on a Debian GNU/Linux box, this amounts to adding the following to either /etc/apache2/httpd.conf or, better, a file in the /etc/apache2/sites-enabled/ directory, such as 000-default:

     
        Alias /refdb/ "/usr/local/share/refdb/www/"
    
        <Directory "/usr/local/share/refdb/www/">
            Options  +ExecCGI
            AddHandler cgi-script .cgi
            AllowOverride None
            Order allow,deny
            Allow from all
        </Directory>
     

    Follow that with /etc/init.d/apache2 restart and you should be away!

Warning

You use this script at your own risk! Potentially it is a very large security hole. Tread carefully. Read the source code. Figure out how to do it better!

Server Operation

To communicate with the RefDB refdbc client, the XSLT client should first provide a username and database password. This information is stored in a file named /tmp/refdb-sid* to provide persistence of state over repeated transactions. Sadly, the temporary file permissions currently allow every bona fide user account on the server to read those files. (You thought I was joking about the security thing, didn't you?)
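
One way to tighten this up is to create the state file so that only the account running the CGI script can read it. The sketch below is in Python and is not taken from the RefDB-lite server script. The function names and the JSON layout are assumptions made for illustration. It relies on tempfile.mkstemp, which on POSIX systems creates the file readable and writable by the creating user only.

```python
import json
import os
import stat
import tempfile


def create_session_file(username, password, directory=None):
    """Store the credentials in a new file with owner-only permissions."""
    # directory=None uses the default temporary directory, normally /tmp.
    fd, path = tempfile.mkstemp(prefix="refdb-sid", dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"username": username, "password": password}, handle)
    return path


def read_session_file(path):
    """Load the credentials written by create_session_file."""
    with open(path) as handle:
        return json.load(handle)


def is_private(path):
    """True if neither group nor other has any access to the file."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0
```

Keeping the password in a file at all is still questionable, but at least other local accounts can no longer read it.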

Worse than that, under a heavy HTTP server load, it is possible that your HTTP server thread could be reallocated to service a different HTTP client address and in doing so, reallocate your session state file to the new client. You'll know when it happens, because your RefDB references won't resolve and the stylesheet processing will abort with a message saying you aren't recognised any more.
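
A common way to avoid that kind of mix-up is to key the session state on an unguessable token that the client sends back with every request, rather than on whichever server thread happens to handle the request. The Python sketch below only illustrates the principle. The class and method names are made up, and it keeps the state in memory, whereas a real CGI script would need to persist it between invocations.

```python
import secrets


class SessionStore:
    """Map random session tokens to the credentials supplied at login."""

    def __init__(self):
        self._sessions = {}

    def open(self, username, password):
        """Register a new session and return its token for the client."""
        token = secrets.token_hex(16)  # 32 hexadecimal characters
        self._sessions[token] = {"username": username, "password": password}
        return token

    def lookup(self, token):
        """Return the credentials for a token, or None if it is unknown."""
        return self._sessions.get(token)

    def close(self, token):
        """Forget a session, for example once processing has finished."""
        self._sessions.pop(token, None)
```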

I would like to think that people who properly knew what they were doing could improve on this relatively easily.

References

1. Hoenicka M. (2005)

Chapter 8. Example Document Structures

For any particular citation, the basic principle for deciding which bibliography receives the associated biblioentry data, and which the citation then links to, is that it will be the first bibliography child of the closest ancestor that has one. The following examples should therefore demonstrate plausible DocBook document structures where RefDB-lite would operate in a perfectly logical and consistent manner.

Example 8.1. A perfectly valid book

    <book>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
        <bibliography id="bib1"/>
      </chapter>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib2 -->
        <bibliography id="bib2"/>
      </chapter>
      <appendix>
        <para><citation/></para> <!-- definitely pointing to bib3 -->
        <bibliography id="bib3"/>
      </appendix>
    </book>

Example 8.2. Another perfectly valid book

    <book>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </chapter>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </chapter>
      <appendix>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </appendix>
      <bibliography id="bib1"/>
    </book>

Example 8.3. A third perfectly valid book

    <book>
      <bibliography id="bib1"/>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </chapter>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </chapter>
      <appendix>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </appendix>
    </book>

Example 8.4. A perfectly valid part

    <part>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </chapter>
      <chapter>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </chapter>
      <bibliography id="bib1"/>
      <appendix>
        <para><citation/></para> <!-- definitely pointing to bib2 -->
        <bibliography id="bib2"/> <!-- but if bib2 was missing, then also to bib1 -->
      </appendix>
    </part>

Example 8.5. A perfectly valid article

    <article>
      <section>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </section>
      <section>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </section>
      <appendix>
        <para><citation/></para> <!-- definitely pointing to bib1 -->
      </appendix>
      <bibliography id="bib1"/>
    </article>

Example 8.6. A weird concoction that should still be valid

    <article>
      <articleinfo>
        <para><!-- a citation here would have no target bibliography --></para>
      </articleinfo>
      <section>
        <para><citation/></para> <!-- definitely pointing to bib3 -->
        <section>
          <para><citation/></para> <!-- definitely pointing to bib1 -->
          <bibliography id="bib1"/>
        </section>
        <section>
          <para><citation/></para> <!-- definitely pointing to bib2 -->
          <bibliography id="bib2"/>
        </section>
        <section>
          <para><citation/></para> <!-- definitely pointing to bib3 -->
        </section>
        <section>
          <para><citation/></para> <!-- definitely pointing to bib3 -->
        </section>
        <bibliography id="bib3"/>
      </section>
    </article>
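
The rule behind the examples above can be sketched in a few lines of Python. This is an independent illustration of the principle, not the RefDB-lite XSLT implementation. The function name is made up, and the sketch assumes un-namespaced DocBook element names.

```python
import xml.etree.ElementTree as ET


def target_bibliographies(xml_text):
    """Return the id of the target bibliography for each citation.

    For every <citation>, climb the ancestors and stop at the first one
    that has a <bibliography> child. The first such child is the target.
    A citation with no such ancestor gets None.
    """
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    targets = []
    for citation in root.iter("citation"):
        target = None
        node = parent_of.get(citation)
        while node is not None:
            bibliography = node.find("bibliography")  # direct children only
            if bibliography is not None:
                target = bibliography.get("id")
                break
            node = parent_of.get(node)
        targets.append(target)
    return targets
```

Fed the structure of Example 8.4, this gives bib1, bib1 and bib2. With bib2 removed, the appendix citation falls back to bib1.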

Chapter 9. The Pitfalls

Table of Contents

References

The astute author would note that DocBook permits bibliography elements within appendix, article, book, chapter, glossary, part, preface, sect1, sect2, sect3, sect4, sect5 and section parents. RefDB-lite should generally work quite happily with those. However, it is technically possible to create a complete nonsense hierarchy without too much effort, such as the following pseudo DocBook structure:

  <set>
    <setinfo>
       <citation/> <!-- With no possible target bibliography -->
    </setinfo>
    <book>
      <bibliography id="bib1"/>
      <chapter>
        <citation/> <!-- definitely pointing to bib2 -->
        <bibliography id="bib2"/>
      </chapter>
      <chapter>
        <citation/> <!-- Which bibliography should this point to?-->
      </chapter>
      <bibliography id="bib3"/>
      <appendix>
        <citation/> <!-- Which bibliography should this point to ?-->
      </appendix>
    </book>
  </set>

Clearly there is ample scope for processing confusion. But, it is to be hoped that the wary author will exercise a modicum of restraint in their placement of bibliographies, despite the flexibility that DocBook provides.
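
If restraint fails, a quick check before processing can at least flag the troublesome spots. The Python sketch below reports every element that has more than one bibliography child, which is what makes the structure above ambiguous. The function name is made up, and nothing like it ships with RefDB-lite.

```python
import xml.etree.ElementTree as ET


def ambiguous_parents(xml_text):
    """List (tag, bibliography ids) for parents with several bibliographies."""
    root = ET.fromstring(xml_text)
    report = []
    for element in root.iter():
        # findall with a bare tag name only looks at direct children.
        ids = [bib.get("id") for bib in element.findall("bibliography")]
        if len(ids) > 1:
            report.append((element.tag, ids))
    return report
```

For the nonsense hierarchy shown earlier, this reports the book element together with bib1 and bib3.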

Another major pitfall lies with the mismatch between the RISX and DocBook "raw" bibliographic formats, and the application of the CITESTYLE formatting thereof. To be clear, RISX is a relatively simple mapping into XML of the aged, but widely supported, RIS bibliographic information format. Although useful, RIS is rather crude in terms of the detailed semantic and formatting information it can handle. The RefDB CITESTYLE formatting specification was designed to reflect the keys and elements of the RDBMS relational tables into which RefDB internally partitions and dissects the imported RIS data. In contrast, the capacity of DocBook to hold "raw" bibliographic information is almost limitless, though not necessarily clear, concise, easy to author or unique.

As a result there are several issues that will need careful consideration. In particular, can the representation of dates in the RefDB exported RISX be standardised in a numerical format? Are there better ways to handle journal series abbreviations? Can use of the USERDEFn and MISCn attributes be standardised across different citation styles? Could corporate authors, editors and publisher items be identified and transformed more consistently? And does the current RefDB-lite RISX (1) to DocBook "raw" conversion process rigorously capture all the intricacies of the source format?

Other issues will arise in the form of bugs and defects in the current XSL implementation of RefDB-lite. For instance, RISX uses an abbreviated journal title compactification scheme whose expansion hasn't yet been implemented here.

Another glaring omission is that the ordering of multiple references in a multi-biblioref citation, according to a numeric citation style, does not apply ascending or descending date ordering. Of course you could do it yourself, but this is a computer, dammit. It automates things.

A third example arose with the handling of olinks in a handcrafted DocBook "raw" biblioentry, which exposed a deficiency in the underlying DocBook stylesheets. If olinks are added within an internal bibliography, resolving them in the DocBook stylesheets requires customising the xref.xsl olink template to support <xsl:param name="context" select="/"/>, and then modifying its document($target.database.filename,/) statement to use document($target.database.filename,$context). (Potentially this could be remedied in the DocBook XSL stylesheets, but of course no one ever needed it before.)

You will have to pass the context parameter to the select.target.database template as well (called within the olink template) and then repeat the above for the common/olink.xsl select.target.database template too. The problem is that the olink is copied into a temporary result tree with an obscure base URI, whereas document() needs to resolve the olink with respect to the source DocBook document's identifier in the olink database (hmmm, maybe we could copy the document ID to the root node of the temporary tree?).

That's about it. Enjoy!

References

1. Hoenicka M. (2005)

Appendix 1. Citation Formatting Tests

Table of Contents

References

This is just a practical demonstration of using and formatting citations. Consider it a tutorial, but there is really nothing new to be learnt here.

We used the CITESTYLE INTEXTDEF format (-X) citation, for example "(1)", throughout this document. This so-called parenthetic allusion effortlessly switches presentation style between numerically ordered (numeric) and alphabetically sorted (author-year) bibliography formats.

However, we could have used an AUTHORONLY (-A) format to say something like, "Walsh, Muellner, and Stayton wrote a tremendously useful book, DocBook: The Definitive Guide (1)", where we followed the explicit title with a YEARONLY (-Y) format citation. Like INTEXTDEF, the YEARONLY output can change from a year (obviously) to a numerical reference, according to the specifics of the CITESTYLE file you choose. It really is a shame that we can't explicitly cite the reference title here, though.

Anyway, those are the most portable citation formats across the different styles. But if you know from the outset that you will only be using an author-year format, then it would be perfectly OK to use whole AUTHORYEAR (-W) style citations, willy-nilly, to your heart's content, so long as the sentence grammar remains intact and, for example, Walsh, Muellner, and Stayton (1) as well as the USA Library of Congress (2) do not object to having their names cast about gratuitously, just to illustrate a rather irrelevant issue. You should note that AUTHORYEAR is not supported by RefDB proper, so in the event you switch from RefDB-lite to full RefDB support, using the runbib command, you might be sadly disappointed.

Just for the record though, most bibliographic styles do not include a parenthetic AUTHORONLY mode, for the simple reason that it does not in general discriminate unambiguously between different references by the same author(s). You could however construct your own CITESTYLE to achieve this effect if it was really important to you.

Beyond the foregoing it starts to get tricky, as we move into the domain of multiple citations. For instance Markus appears to be really quite a prolific chap herein, with four references to his name (3-6), created as:

<citation role="REFDB">
<biblioref endterm="RefDB-X"/>
<biblioref endterm="RefDB_Man-X"/>
<biblioref endterm="RISX-X"/>
<biblioref endterm="CITESTYLE-X"/>
</citation>

But sadly this package, RefDB-lite, doesn't really do him justice, formatting-wise. A good formatting package, in an author-year mode, would have contracted that down to something like (Hoenicka, 2005; 2005b; 2005c; 2005d). Possibly we could simulate that by citing years only on the last three, e.g. (3-6), but that is more fluffing around than is really desirable in an automated world. In a numeric CITESTYLE scheme, of course, that is irrelevant, as it contracts quite nicely to something like "(3-6)".

For the final citation formatting tests, it is useful to examine the treatment of multiple references, particularly sequential references in a numeric style. For example:

<citation role="REFDB">
<biblioref endterm="Walsh99-X"/>
<biblioref endterm="MODS-X"/>
<biblioref endterm="RefDB_Man-X"/>
<biblioref endterm="RISX-X"/>
<biblioref endterm="CITESTYLE-X"/>
<biblioref endterm="XSLT_1.0-X"/>
</citation>

(One could really get carried away here). Now that parenthetic allusion renders as: "(1; 2; 4-7)". But watch what happens if we change the citation order to:

<citation role="REFDB">
<biblioref endterm="XSLT_1.0-X"/>
<biblioref endterm="RefDB_Man-X"/>
<biblioref endterm="Walsh99-X"/>
<biblioref endterm="RISX-X"/>
<biblioref endterm="MODS-X"/>
<biblioref endterm="CITESTYLE-X"/>
</citation>

Hopefully the citations "(1; 2; 4-7)" are now ordered and sorted identically to the previous example. Note that these biblioref references explicitly used the portable -X formatting command. Watch out for that or your carefully crafted document may unexpectedly transform into rubbish!

Clearly there are certain issues to be settled regarding optional sorting and reordering of multiple bibliorefs, but that's a job for another day.
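
The sorting and contraction seen in these two tests can be mimicked with a short Python sketch. It is a reconstruction from the rendered output, not the RefDB-lite code. In particular, the rule that only runs of three or more consecutive numbers are contracted is inferred from "(1; 2; 4-7)", where 1 and 2 stay separate.

```python
def format_numeric_citation(numbers):
    """Sort reference numbers and contract runs of three or more."""
    ordered = sorted(set(numbers))
    runs = []
    for number in ordered:
        if runs and number == runs[-1][-1] + 1:
            runs[-1].append(number)   # extends the current run
        else:
            runs.append([number])     # starts a new run
    parts = []
    for run in runs:
        if len(run) >= 3:             # assumed threshold, see above
            parts.append(f"{run[0]}-{run[-1]}")
        else:
            parts.extend(str(number) for number in run)
    return "(" + "; ".join(parts) + ")"


# The same six references cited in two different orders.
print(format_numeric_citation([1, 2, 4, 5, 6, 7]))
print(format_numeric_citation([7, 4, 1, 5, 2, 6]))
```

Both calls print the same string, because the input order is discarded by the sort.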

A better chap than I might also have demonstrated the presentation of accented characters and foreign language titles etc. But I don't know nuttin' bout such things.

So that concludes tonight's viewing. Goodnight.

References

1. Walsh N., Muellner L., and Stayton B. (1999) O'Reilly & Associates Inc., Sebastopol

2. Library of Congress. Metadata Object Description Schema (MODS). (2004). Washington: The Network Development and MARC Standards Office, LoC.

3. Hoenicka Markus. RefDB 0.9.6. (2005).

4. Hoenicka M. (2005)

5. Hoenicka M. (2005)

6. Hoenicka M. (2005)

7. W3C. XSL Transformations (XSLT) Version 1.0. (1999). Cambridge: World Wide Web Consortium, W3C.