DSpace on Fedora Core 4


[Introduction] - [Preparation] - [Installation] - [Configuration] - [Tweaks] - [Summary]

Introduction:

After collecting papers on molecular simulation for several years, I realized that I was in dire need of some new organizational structure. I found myself downloading the same paper repeatedly and saving the copies in different locations on my computer. Even worse, I would often find multiple printed copies at home and at work, not to mention the needless time spent searching online for papers I had already downloaded. To solve this problem I decided to install a document server on my office workstation. The idea is to eventually upload all electronic articles, preprints, reprints, and reports into a searchable database and end the paper searching/downloading/printing madness. In addition, it is my hope that my collaborators will help me with this task when we are working on projects together.

My requirements for this document server were fairly simple: (1) it must use a common web browser interface; (2) it must be able to organize scientific PDF articles and reports; (3) it must provide a flexible interface for searching the contents of PDF documents; (4) I must be able to install and configure a working server in less than a day; (5) it must use a minimal number of extra packages; (6) it must be managed and configured through the web interface.

I first tried CDSware from CERN but after several days of hacking I became extremely frustrated. At the time installation required installing far too much software (most of it unpackaged) and the default out-of-the-box configuration was not close enough to what I needed. I have seen many CDSware installations that look very beautiful, but it is clear that this took a substantial amount of extra configuration. I believe CDSware will become a viable alternative for the casual user, but not at the time of writing this guide.

As you can guess from the title of this guide, I ended up using DSpace. It uses Java, PostgreSQL, Apache Ant, and Apache Tomcat and basically nothing else. It met almost all of my criteria, and it was very easy to get a workable server running within a day. The following guide outlines how I set up DSpace on my workstation running Fedora Core 4. There are many other ways to configure things, and what follows is specific to my configuration (although yours may be similar).

Preparation:

I decided that for security and maintenance reasons I would install almost everything as the user "dspace", so the first step was to create that user.

# su -
# useradd dspace

Before I could start installing DSpace I needed to make sure I had some software installed.

  1. JDK 1.5.0: After downloading and installing the JDK from Sun, I created the file /etc/profile.d/java-jdk.sh and put the following in the file:
    #!/bin/sh
    JAVA_HOME=`ls -td /usr/java/jdk* 2>/dev/null | head -1`
    if [ -d "${JAVA_HOME}" ]
    then
      export JAVA_HOME
      export PATH=${JAVA_HOME}/bin:$PATH
    fi
    
    Basically all this does is set JAVA_HOME to point to the most recently modified JDK directory under /usr/java and put the JDK bin directory in your path. You could also accomplish this by putting the export commands in your bashrc file.
  2. Apache Ant: This is the build tool that will be used to compile and package DSpace for Apache Tomcat. You can think of it as an alternative to "make". Since it's a build tool and not a server, I could have installed it globally, but I opted instead to install it as the user dspace in the directory /home/dspace/ant. I put the following command in user dspace's bashrc file:
    export ANT_HOME=/home/dspace/ant
    
  3. Apache Tomcat: This is the webserver. I am already running a standard Apache webserver on my machine, so I opted for the standalone version and installed it as user dspace in /home/dspace/tomcat. I then put the following command in user dspace's bashrc file:
    export CATALINA_HOME=/home/dspace/tomcat
    
  4. PostgreSQL: This is the main backend database server. Installing this one was easy using yum:
    # su -
    # yum install postgresql postgresql-server \
                  postgresql-jdbc postgresql-libs
    
    PostgreSQL-JDBC is the Java interface for PostgreSQL, which will allow DSpace to access the database through Tomcat.
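
Before moving on, it can help to confirm that the environment variables set above actually resolve to directories. The following is a minimal sketch and was not part of my original setup. The helper name check_home is made up for illustration, and the script assumes it is run from a shell that has already read /etc/profile.d/java-jdk.sh and user dspace's bashrc.

```shell
#!/bin/sh
# check_home NAME: report whether the environment variable called NAME
# is set and points at an existing directory.
# (check_home is a hypothetical helper, not something DSpace provides.)
check_home() {
  # Indirectly read the variable whose name is in $1.
  eval "_value=\${$1:-}"
  if [ -n "$_value" ] && [ -d "$_value" ]; then
    echo "ok: $1=$_value"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the variables this guide relies on; keep going after a failure.
for name in JAVA_HOME ANT_HOME CATALINA_HOME; do
  check_home "$name" || true
done
```

Any line that starts with "missing:" points at a variable that is unset or refers to a directory that does not exist.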

Before I could move on to installing DSpace, I needed to configure the software listed above.

  1. I tested to make sure Tomcat was working:
    # su - dspace
    [dspace@localhost]# /home/dspace/tomcat/bin/startup.sh
    
    I then connected to Tomcat through my web browser using http://localhost:8080/. This was denied at first, then I remembered to add an entry to the iptables firewall to allow connections to port 8080 and port 8443:
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8080 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8443 -j ACCEPT
    
  2. I initialized the PostgreSQL database:
    # su - postgres
    [postgres@localhost]# initdb -D /var/lib/pgsql/data
    
    Next I opened the /var/lib/pgsql/data/pg_hba.conf configuration file, and at the end of the file I commented out the following lines:
    # TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
    # "local" is for Unix domain socket connections only
    local   all         all                               trust
    # IPv4 local connections:
    host    all         all         127.0.0.1/32          trust
    # IPv6 local connections:
    host    all         all         ::1/128               trust
    
    and added the following lines
    host dspace dspace 127.0.0.1/32 md5
    local dspace postgres trust
    local template1 postgres trust
    local template1 dspace trust
    
    Next I started the PostgreSQL server:
    # /etc/init.d/postgresql start
    
    Then I switched back to user postgres and created the dspace database user and the dspace database:
    # su - postgres
    [postgres@localhost]# createuser -U postgres -d -A -P dspace
    Enter password for new user:
    Enter it again:
    CREATE USER
    
    [postgres@localhost]# createdb -U dspace -E UNICODE dspace
    CREATE DATABASE
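
A typo in pg_hba.conf tends to show up only later as a confusing connection error, so a quick check of the rules can save time. This is a sketch only: has_hba_rule is a hypothetical helper, and it compares just the first three fields of each rule (type, database, user), not the address or the authentication method.

```shell
#!/bin/sh
# has_hba_rule FILE TYPE DATABASE USER
# Succeeds if FILE contains an active (uncommented) rule whose first three
# fields are TYPE, DATABASE and USER. Hypothetical helper for illustration.
has_hba_rule() {
  awk -v t="$2" -v d="$3" -v u="$4" '
    $1 == t && $2 == d && $3 == u { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Path used in this guide; adjust if your data directory differs.
HBA=/var/lib/pgsql/data/pg_hba.conf
if [ -r "$HBA" ]; then
  if has_hba_rule "$HBA" host dspace dspace; then
    echo "host rule for dspace present"
  else
    echo "host rule for dspace not found"
  fi
fi
```

Once the server is running, connecting with psql -h 127.0.0.1 -U dspace dspace should prompt for the password chosen above, since that connection falls under the md5 rule.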
    

Installation:

After following all of the steps above, I was ready to install DSpace.

  1. I downloaded DSpace 1.3.1 as user dspace and saved it as /home/dspace/dspace-1.3.1-source.tar.gz. Then I unpacked the archive:
    [dspace@localhost]# tar -xzvf dspace-1.3.1-source.tar.gz
    
  2. I created symbolic links for PostgreSQL-JDBC in dspace-1.3.1-source/lib by issuing the following commands
    [dspace@localhost]# cd dspace-1.3.1-source/lib
    [dspace@localhost]# ln -s /usr/share/java/postgresql-8.0-311.jdbc3.jar  \
                              postgresql-jdbc3.jar 
    
    Just for good measure I did the same with jdbc2 and jdbc2ee, although I don't think they are needed.
  3. I had to edit the dspace.cfg file in dspace-1.3.1-source/config with the following entries
    dspace.dir = /home/dspace
    dspace.url = https://me.dept.uni.edu:8443/dspace
    dspace.hostname = me.dept.uni.edu
    dspace.name = Dspace Server
    config.template.log4j.properties = /home/dspace/config/log4j.properties
    config.template.log4j-handle-plugin.properties = /home/dspace/config/log4j-handle-plugin.properties
    config.template.oaicat.properties = /home/dspace/config/oaicat.properties
    db.name = postgres
    db.url = jdbc:postgresql://localhost:5432/dspace
    db.driver = org.postgresql.Driver
    db.username = dspace
    db.password = <password used in postgresql setup above>
    mail.server=localhost
    mail.from.address = dspace-noreply@me.dept.uni.edu
    feedback.recipient = <my email address>
    mail.admin = <my email address>
    alert.recipient = <my email address>
    assetstore.dir = /home/dspace/assetstore
    history.dir = /home/dspace/history
    search.dir = /home/dspace/search
    log.dir = /home/dspace/log
    upload.temp.dir = /home/dspace/upload
    upload.max = 536870912
    report.public = false
    report.dir = /home/dspace/reports/
    handle.prefix = 0
    handle.dir = /home/dspace/handle-server
    
    Note that if you want to set up a real handle prefix with handle.net you should follow the directions on their website. I decided to make my server only internal for my group, so this was not necessary.
  4. I then built and installed DSpace with the following commands:
    [dspace@localhost]# cd /home/dspace/dspace-1.3.1-source
    [dspace@localhost]# ant
    [dspace@localhost]# ant fresh_install
    [dspace@localhost]# cp build/*.war /home/dspace/tomcat/webapps/
    
  5. I created an administrator account with
    [dspace@localhost]# /home/dspace/bin/create-administrator
    
  6. Finally I stopped Tomcat and cleaned up the old references, moved the old ROOT, and restarted Tomcat:
    [dspace@localhost]# /home/dspace/tomcat/bin/shutdown.sh
    [dspace@localhost]# cd /home/dspace/tomcat/webapps
    [dspace@localhost]# rm -rf dspace dspace-oai
    [dspace@localhost]# /home/dspace/tomcat/bin/startup.sh
    
    I was now able to see the DSpace website by directing my browser to http://localhost:8080/dspace/. Later on I outline how I configured Tomcat for SSL so I could use the final interface on https://localhost:8443/dspace/.
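
Several later steps depend on values in dspace.cfg (the database URL, the public URL, and so on), so a small helper for reading a single value can be handy in scripts. This is a sketch under the assumption that each setting sits on one line in key = value form. The name get_prop is hypothetical, and the installed config path /home/dspace/config/dspace.cfg is inferred from the dspace.dir setting above rather than stated anywhere.

```shell
#!/bin/sh
# get_prop FILE KEY: print the value of KEY from a "key = value" file.
# (get_prop is a hypothetical helper written for this guide.)
get_prop() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                     # skip comment lines
    {
      pos = index($0, "=")                  # split on the first "="
      if (pos == 0) next
      k = substr($0, 1, pos - 1)
      v = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)        # trim whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)        # and around the value
      if (k == key) { print v; exit }
    }
  ' "$1"
}

# Assumed location of the installed configuration file.
CFG=/home/dspace/config/dspace.cfg
if [ -r "$CFG" ]; then
  echo "database URL: $(get_prop "$CFG" db.url)"
  echo "public URL:   $(get_prop "$CFG" dspace.url)"
fi
```

A key that is absent simply produces an empty string, which makes it easy to spot a setting that was forgotten.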

Configuration:

At this point everything was working fine. However, I wanted to make sure it would all come back up automatically if I restarted the machine, and I wanted to automate the indexing and other maintenance tasks.

  1. I added the vacuumdb command to the crontab of user postgres with the following commands:
    # su - postgres
    [postgres@localhost]# crontab -e
    [postgres@localhost]# crontab -l
    # Clean up the database nightly at 2.40am
    40 2 * * * vacuumdb --analyze dspace > /dev/null 2>&1
    
  2. I added commands to process email subscriptions and update the search indexes each night:
    # su - dspace
    [dspace@localhost]# crontab -e
    [dspace@localhost]# crontab -l
    0 1 * * * /home/dspace/bin/sub-daily > /home/dspace/log/sub-daily.log 2>&1
    0 2 * * * /home/dspace/bin/filter-media > /home/dspace/log/filter-media.log 2>&1
    
  3. I configured PostgreSQL and Tomcat to start on boot by first issuing the command:
    # /sbin/chkconfig postgresql on
    
    Then I created the file /etc/init.d/tomcat with the following contents
    #!/bin/bash
    #
    # tomcat       Startup script for the Apache-Tomcat HTTP Server
    #
    # chkconfig: - 71 19
    # description:  Start up the Tomcat servlet engine.
    
    # Source function library.
    . /etc/init.d/functions
    
    
    # Exit status reported by the script; updated by start and stop.
    RETVAL=0
    CATALINA_HOME=/home/dspace/tomcat
    TOMCAT_USER=dspace
    
    start () {
      if [ -f $CATALINA_HOME/bin/startup.sh ];
      then
        echo $"Starting Tomcat"
        # "su -" gives a login shell, so JAVA_HOME from /etc/profile.d is set.
        /bin/su - $TOMCAT_USER -c $CATALINA_HOME/bin/startup.sh
        RETVAL=$?
      fi
    }
    
    stop () {
      if [ -f $CATALINA_HOME/bin/shutdown.sh ];
      then
        echo $"Stopping Tomcat"
        /bin/su - $TOMCAT_USER -c $CATALINA_HOME/bin/shutdown.sh
        RETVAL=$?
      fi
    }
    
    case "$1" in
     start)
            start
            ;;
     stop)
            stop
            ;;
     restart)
            stop
            start
            ;;
     *)
            echo $"Usage: $0 {start|stop|restart}"
            exit 1
            ;;
    esac
    
    exit $RETVAL
    
    I made it executable
    # chmod +x /etc/init.d/tomcat
    
    And made it start up in runlevels 2-5
    # /sbin/chkconfig --add tomcat
    # /sbin/chkconfig tomcat on
    
  4. I enabled SSL in Tomcat by first creating a self-signed certificate as user dspace stored in the file /home/dspace/.keystore:
    [dspace@localhost]# keytool -genkey -alias tomcat -keyalg RSA
    Enter keystore password: changeit
    What is your first and last name? <me.dept.uni.edu>
    What is the name of your organizational unit? My Department
    What is the name of your organization? My University
    What is the name of your City or Locality? My City
    What is the name of your State or Province? My State
    What is the two-letter country code for this unit? US
    Enter key password for <tomcat>
            (RETURN if same as keystore password): <RETURN>
    
    I then uncommented the following section in /home/dspace/tomcat/conf/server.xml:
    <!-- Define a SSL Coyote HTTP/1.1 Connector on port 8443 -->
    <Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
        port="8443" minProcessors="5" maxProcessors="75"
        enableLookups="true"
        keystore="/home/dspace/.keystore"
        keypass="changeit"
        acceptCount="100" debug="0" scheme="https" secure="true"
        useURIValidationHack="false" disableUploadTimeout="true">
    <Factory 
         className="org.apache.coyote.tomcat4.CoyoteServerSocketFactory"
         clientAuth="false" protocol="TLS" />
    </Connector>
    
    Finally, I restarted Tomcat with
    [dspace@localhost]# /home/dspace/tomcat/bin/shutdown.sh
    [dspace@localhost]# /home/dspace/tomcat/bin/startup.sh
    
    And connected to the SSL server at https://localhost:8443/dspace/. If something goes wrong, you can look in the /home/dspace/tomcat/logs/catalina.out log file.
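
The "# chkconfig: - 71 19" comment in the init script above is what chkconfig reads when the service is added. Its three fields are the default runlevels ("-" meaning the service is not enabled in any runlevel by default), the start priority, and the stop priority. The sketch below just pulls those fields out of a script so they can be inspected. chkconfig_header is a hypothetical helper, not a chkconfig feature.

```shell
#!/bin/sh
# chkconfig_header FILE: print the runlevels and start/stop priorities
# from the first "# chkconfig:" comment line in an init script.
chkconfig_header() {
  awk '$1 == "#" && $2 == "chkconfig:" {
         print "levels=" $3, "start=" $4, "stop=" $5
         exit
       }' "$1"
}

# Init script created in step 3 above.
SCRIPT=/etc/init.d/tomcat
if [ -r "$SCRIPT" ]; then
  chkconfig_header "$SCRIPT"
fi
```

To see which runlevels actually ended up enabled after turning the service on, /sbin/chkconfig --list tomcat prints the on/off state for each one.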

Tweaks:

  1. After adding a Community and a few Collections, I noticed that there were some references to http://www.handle.net, and I am not planning to use this service since my DSpace is only locally accessible. To fix this I found some tips in the dspace-tech email archives, where someone indicated that they were using a different persistent URL service, PURL. I followed these instructions to create self-referring handles instead of the ones through handle.net, so my handles point right back at my server. To do this I opened the file /home/dspace/dspace-1.3.1-source/src/org/dspace/handle/HandleManager.java in a text editor and changed getCanonicalForm to the following:
    public static String getCanonicalForm(String handle)
    {
        // return "hdl:" + handle;
        // [ORIG] return "http://hdl.handle.net/" + handle;
        return "https://me.dept.uni.edu:8443/dspace/handle/" + handle;
    }
    
    Now when handles are returned in email notices or elsewhere, they will only point to my server. To update DSpace with this change I rebuilt the war files, copied them into tomcat/webapps, and restarted Tomcat.
    [dspace@localhost]# cd /home/dspace/dspace-1.3.1-source
    [dspace@localhost]# ant -Dconfig=config/dspace.cfg update
    [dspace@localhost]# cp build/*.war /home/dspace/tomcat/webapps/
    [dspace@localhost]# cd /home/dspace/tomcat/webapps
    [dspace@localhost]# ../bin/shutdown.sh
    [dspace@localhost]# rm -rf dspace dspace-oai
    [dspace@localhost]# ../bin/startup.sh
    
  2. After uploading a few items I decided to reorganize things and have a single community for all uploads, then use the item mapper to link items to "focus group communities". Unfortunately, this meant moving items from one community to another, which is not supported by DSpace. However, it is easy to do if you connect to the PostgreSQL server directly and edit the tables. To do this I first installed pgadmin3, which is a nice GUI for the PostgreSQL server. Once I started pgadmin3, it asked for some information on my SQL server: Server=127.0.0.1, Description=Dspace, Port=5432, Database=dspace, Username=dspace, Service=<blank>, Password=<My Password>. If you forgot your password, it's in the dspace.cfg file. To move items you need to edit columns in public->Tables->item and in public->Tables->collection2item. In the "item" table, you should change the owning_collection column. In the "collection2item" table, you should change the collection_id column. In both cases you right-click on the table to get the "View Data" option, which pulls up the spreadsheet.
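
The same change can be made without the GUI by feeding SQL to psql. The sketch below only prints the statements so they can be reviewed first. It assumes both tables identify the item through a column named item_id, which is not stated above and should be checked against your schema. move_item_sql is a hypothetical helper, and the ids in the example are made up.

```shell
#!/bin/sh
# move_item_sql ITEM OLD NEW: print SQL that moves item ITEM from
# collection OLD to collection NEW in both tables mentioned above.
# Assumption: the item is identified by an item_id column in each table.
move_item_sql() {
  # Refuse anything that is not three non-empty numeric ids.
  for _id in "$1" "$2" "$3"; do
    case "$_id" in
      ''|*[!0-9]*)
        echo "usage: move_item_sql ITEM OLD NEW (numeric ids)" >&2
        return 1
        ;;
    esac
  done
  cat <<EOF
BEGIN;
UPDATE item SET owning_collection = $3 WHERE item_id = $1 AND owning_collection = $2;
UPDATE collection2item SET collection_id = $3 WHERE item_id = $1 AND collection_id = $2;
COMMIT;
EOF
}

# Example with made-up ids: move item 12 from collection 3 to collection 7.
move_item_sql 12 3 7
```

To apply the statements, the output could be piped to psql -h 127.0.0.1 -U dspace dspace. Taking a backup first, for example with pg_dump, is a sensible precaution when editing tables by hand.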

Summary and Acknowledgments:

I was quite pleased with the help I found on the DSpace website and from various user blogs and emails. By far the two best resources were Clive Gould's Blog and this Dspace install guide. Without these two guides I would never have tried to install DSpace, and would probably have attempted to write my own PHP PostgreSQL server. That would have wasted at least a week of my life.

Dspace Links:

Dspace

Dspace Docs

Dspace-Tech Mail

DSpace on FC3

Dspace Install Guide

Other Links:

CDSware


Last Updated: 18-Aug-05