Gorazd Export

The application was created within the project GORAZD: The Old Church Slavonic Digital Hub (implemented thanks to the NAKI II programme of the Ministry of Culture of the Czech Republic, DG16P02H024). Its goal is to export dictionary entries from the system Invenio (https://invenio-software.org).

As input, the application expects a MARC XML file from the managing system Invenio containing the metadata of the exported entries, and optionally XSL transformations for Gorazd XML (XSD schema of the Gorazd XML format: http://gorazd.org/sites/default/files/software/gXML.zip). Its output consists of entries transformed for the presentation environment, or an HTML file that can be further prepared for print in standard editors (for example MS Word).

Installation

The application Gorazd Export is distributed as a Python package/module and can be installed in the standard way with the utility pip. To run properly, Gorazd Export needs several libraries; these are installed automatically unless they are already present in the system. It may therefore be necessary to install Gorazd Export with root rights, or into a Python virtual environment (virtualenv).

  • $ pip install [-t target_dir] gorazd_export-1.zip

Launch

This application is launched from the command line directly on the server where Invenio runs.

  • $ python exporter.py
  • usage: exporter.py [-h] [-p] [-l LOG_DIR]
    marcxml output_dir [transform [transform ...]]
  • exporter.py: error: too few arguments

Parameters are passed to the application on the command line in the standard way; help is available through the parameter -h.

  • $ python exporter.py -h
  • usage: exporter.py [-h] [-p] [-l LOG_DIR]
    marcxml output_dir [transform [transform ...]]

Exporter of Gorazd XML files

positional arguments:

  • marcxml MARC XML export from Invenio
  • output_dir Directory for storing the exported files
  • transform XSL transforms for producing the output

optional arguments:

  • -h, --help show this help message and exit
  • -p, --print_file Create print.html file in the output directory that contains all the transformed records sorted by their PAGE ID
  • -l LOG_DIR, --log_dir LOG_DIR Directory for log files. Default is the current working directory.

The parameters marcxml and output_dir are obligatory.

The parameter marcxml contains the path to the input file for Gorazd Export. This input file is a MARC XML export of entries from Invenio; it contains the metadata of the entries that are intended for export.
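The help text above is enough to reconstruct an equivalent command-line interface. The following is a minimal sketch, assuming Python 3 and the standard argparse module; it is not the actual source of exporter.py, and the function name build_parser is hypothetical. The default "." for the log directory stands in for "the current working directory" from the help text.

```python
import argparse


def build_parser():
    """Rebuild the interface described by the help text (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="exporter.py",
        description="Exporter of Gorazd XML files",
    )
    # Two obligatory positional arguments.
    parser.add_argument("marcxml", help="MARC XML export from Invenio")
    parser.add_argument("output_dir",
                        help="Directory for storing the exported files")
    # Zero or more XSL transformations, kept in the order given.
    parser.add_argument("transform", nargs="*",
                        help="XSL transforms for producing the output")
    parser.add_argument("-p", "--print_file", action="store_true",
                        help="Create print.html file in the output directory")
    parser.add_argument("-l", "--log_dir", default=".",
                        help="Directory for log files")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-p", "marc.xml", "out/", "part1.xsl", "part2.xsl"]
)
print(args.marcxml, args.output_dir, args.transform, args.print_file)
```

Leaving out marcxml or output_dir makes argparse print the usage line and exit with an error, which matches the behaviour shown in the Launch section.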

Input into the application

The input file is created on the server using bibexport, a native script of Invenio. Before bibexport can run, it is necessary to configure a request that selects the relevant MARC XML records from the database. The configuration file “marcxml.cfg” is in the folder /opt/invenio/etc/bibexport. Requests are written in the same way as in the application Invenio itself.

The script bibexport is launched by the command /opt/invenio/bin/bibexport with the relevant parameters. The parameters are passed on the command line in the standard way; help is available through the parameter -h.

  • Usage: /opt/invenio/bin/bibexport [options]

Command options:

Export options:

  • -w, --wjob=j1[,j2] Run specific exporting jobs j1, j2, etc (e.g. 'sitemap').
  • --force-recrawling When using the sitemap export will force all the timestamp there included to refer to correspond at least to now. In this way crawlers are going to crawl all the content again. This is useful in case of a major update in the detailed view of records.

Scheduling options:

  • -u, --user=USER User name under which to submit this task.
  • -t, --runtime=TIME Time to execute the task. [default=now]
    Examples: +15s, 5m, 3h, 2002-10-27 13:57:26.
  • -s, --sleeptime=SLEEP Sleeping frequency after which to repeat the task.
    Examples: 30m, 2h, 1d. [default=no]
  • --fixed-time Avoid drifting of execution time when using --sleeptime
  • -I, --sequence-id=SEQUENCE-ID Sequence Id of the current process
  • -L --limit=LIMIT Time limit when it is allowed to execute the task.
    Examples: 22:00-03:00, Sunday 01:00-05:00.
    Syntax: [Wee[kday]] [hh[:mm][-hh[:mm]]].
  • -P, --priority=PRI Task priority (0=default, 1=higher, etc).
  • -N, --name=NAME Task specific name (advanced option).

General options:

  • -h, --help Print this help.
  • -V, --version Print version information.
  • -v, --verbose=LEVEL Verbose level (0=min, 1=default, 9=max).
  • --profile=STATS Print profile information. STATS is a comma-separated list of desired output stats (calls, cumulative, file, line, module, name, nfl, pcalls, stdname, time).
  • --stop-on-error In case of unrecoverable error stop the bibsched queue.
  • --continue-on-error In case of unrecoverable error don't stop the bibsched queue.
  • --post-process=BIB_TASKLET_NAME[parameters] Postprocesses the specified bibtasklet with the given parameters between square brackets.
    Example:--post-process "bst_send_email[fromaddr='foo@xxx.com', toaddr='bar@xxx.com', subject='hello', content='help']"
  • --email-logs-to=EMAILS Sends an email with the results of the execution of the task, and attached the logs (EMAILS could be a comma-separated lists of email addresses)
  • --email-logs-on-error Send an e-mail to user that ran the task in case of ERROR.
  • --host=HOSTNAME Bind the task to the specified host, it will only ever run on that host.

For example: /opt/invenio/bin/bibexport --wjob=marcxml

Output

The application Gorazd Export saves files containing the Gorazd XML of all entries in the input file given in the parameter marcxml into the folder given in the parameter output_dir. If that folder does not exist, it is created. The files in the output folder are named according to RECORD ID.
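The naming of the output files can be illustrated as follows. This is only a sketch under stated assumptions, not the actual implementation: it assumes that RECORD ID corresponds to MARC controlfield 001 (where Invenio stores its record identifier) and that the files carry the extension .xml. The function names record_ids and output_paths are hypothetical, and the conversion to Gorazd XML itself is not shown.

```python
import os
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # standard MARC21 "slim" namespace

# Tiny hand-written sample standing in for a real bibexport file.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record><controlfield tag="001">12</controlfield></record>
  <record><controlfield tag="001">57</controlfield></record>
</collection>"""


def record_ids(marcxml_text):
    """Return the value of controlfield 001 for every record in the export."""
    root = ET.fromstring(marcxml_text)
    ids = []
    for elem in root.iter():
        # Accept records with or without the MARC21 namespace.
        if elem.tag in ("record", "{%s}record" % MARC_NS):
            for field in elem:
                if (field.tag.endswith("controlfield")
                        and field.get("tag") == "001"):
                    ids.append((field.text or "").strip())
                    break
    return ids


def output_paths(marcxml_text, output_dir, extension=".xml"):
    """Create output_dir if it is missing and return one path per record."""
    os.makedirs(output_dir, exist_ok=True)
    return [os.path.join(output_dir, rid + extension)
            for rid in record_ids(marcxml_text)]


print(record_ids(SAMPLE))
```

Calling output_paths(SAMPLE, "out") would create the folder out if necessary and return one path per record in it.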

HTML output

The optional parameter -p generates the file “print.html” in the output folder; the file contains all entries ordered by PAGE ID.

Transformation

It is also possible to add transformations that process the input files into the expected form. The names of the transformation files are passed in the parameter transform. Several transformations can be given; they are applied one after another. Because their order matters, the transformations used with Gorazd Export are numbered and should be listed in the parameter transform in ascending order.

The generator of print outputs currently uses 4 XSL transformations. The first one transforms the individual Gorazd XML files into one HTML unit that is not yet ready for display. The second transformation applies the rules for controversial characters and handles spaces. The third transformation performs the same step. The fourth one checks the remaining unprocessed spaces and controversial characters and transforms the whole output into a valid HTML file.

For example:

  • $ python exporter.py -p marc.xml vystup/ gorazd-GTV-part1.xsl gorazd-GTV-part2.xsl
  • $ python exporter.py -p -l log/ in/vstup.xml out/vystup/ gorazd-GTV-part1.xsl gorazd-GTV-part2.xsl gorazd-GTV-part3.xsl gorazd-GTV-part4.xsl
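Because the order of the transformations matters, it can help to sort the file names by their number before passing them to exporter.py. The following is a minimal sketch, assuming that the number is the last run of digits in the file name (as in gorazd-GTV-part3.xsl); the helper names transform_number and ordered_transforms are hypothetical and are not part of Gorazd Export.

```python
import os
import re


def transform_number(path):
    """Return the last integer in the file name, e.g. 3 for 'gorazd-GTV-part3.xsl'."""
    numbers = re.findall(r"\d+", os.path.basename(path))
    if not numbers:
        raise ValueError("transform file name has no number: %r" % path)
    return int(numbers[-1])


def ordered_transforms(paths):
    """Sort transformation files in ascending numeric order."""
    # Comparing integers keeps 'part10' after 'part2', unlike a plain string sort.
    return sorted(paths, key=transform_number)


files = ["gorazd-GTV-part3.xsl", "gorazd-GTV-part1.xsl",
         "gorazd-GTV-part4.xsl", "gorazd-GTV-part2.xsl"]
print(" ".join(ordered_transforms(files)))
```

The resulting list can be appended to the command line after output_dir.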

The file “print.html” can be further used to prepare data for printing.

Logging

The application Gorazd Export writes information about the export process to the terminal and also into a log directory, which is set to log/. This directory can be changed with the optional parameter -l LOG_DIR. If an error occurs, the program is not interrupted; a notification about the error is logged into a file with the extension .err in the log directory.
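The behaviour described above (progress on the terminal, errors collected in an .err file, no interruption on failure) can be sketched with the standard logging module. This is an illustration under assumptions, not the application's actual code: the file name export.err, the logger name and the functions make_logger and export_all are hypothetical.

```python
import logging
import os


def make_logger(log_dir, name="gorazd_export_sketch"):
    """Log everything to the terminal and errors to <log_dir>/export.err."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False
    console = logging.StreamHandler()  # writes to stderr by default
    console.setLevel(logging.INFO)
    errors = logging.FileHandler(os.path.join(log_dir, "export.err"),
                                 encoding="utf-8")
    errors.setLevel(logging.ERROR)     # only errors reach the .err file
    for handler in (console, errors):
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def export_all(record_ids, export_one, logger):
    """Export each record; log a failure and continue with the next record."""
    exported = []
    for record_id in record_ids:
        try:
            export_one(record_id)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("export of record %s failed", record_id)
        else:
            logger.info("exported record %s", record_id)
            exported.append(record_id)
    return exported
```

Here export_one stands for whatever function converts a single record; a record that raises an exception is reported in the .err file while the remaining records are still processed.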

Licence

The application is distributed under the open licence GNU GPL v3. It can be adapted for generating entries of other dictionaries by altering the source code. The authors would be very grateful to be informed about any use of the source code of this application, or of its parts, in other projects. You can contact them via e-mail: gorazd@slu.cas.cz.

Installation package:

Gorazd Export 1.0: http://gorazd.org/sites/default/files/software/gorazd_export-1.zip

Authors:

  • Mgr. Vít Tuček, Ph.D. (programmer)
  • Mgr. Olga Čiperová (development analyst)
  • Bc. Martin Majer (development manager)
  • PhDr. Štefan Pilát, Ph.D. (expert development consultant)

© 2018, Institute of Slavonic Studies of the Czech Academy of Sciences