After collecting papers on molecular simulation for several years, I realized that I was in dire need of a new organizational structure. I found myself downloading the same paper repeatedly and saving the copies in different locations on my computer. Even worse, I would often find multiple printed copies at home and at work, not to mention the needless time spent searching online for papers I had already downloaded. To solve this problem I decided to install a document server on my office workstation. The idea is to eventually upload all electronic articles, preprints, reprints, and reports into a searchable database and end the paper searching/downloading/printing madness. In addition, I hope that my collaborators will help me with this task when we are working on projects together.
My requirements for this document server were fairly simple: (1) It must use a common web browser interface. (2) It must be able to organize scientific PDF articles and reports. (3) It must provide a flexible interface for searching the contents of PDF documents. (4) I must be able to install and configure a working server in less than a day. (5) It must use a minimal number of extra packages. (6) It must be managed and configured through the web interface.
I first tried CDSware from CERN, but after several days of hacking I became extremely frustrated. At the time, installation required far too much additional software (most of it unpackaged), and the default out-of-the-box configuration was not close enough to what I needed. I have seen many CDSware installations that look very beautiful, but it is clear that they took a substantial amount of extra configuration. I believe CDSware will become a viable alternative for the casual user, but it was not one at the time of writing this guide.
As you can guess from the title of this guide, I ended up using DSpace. It uses Java, PostgreSQL, Apache-Ant, and Apache-Tomcat and basically nothing else. It met almost all of my criteria, and it was very easy to get a workable server running within a day. The following guide outlines how I set up DSpace on my workstation running Fedora Core 4. There are many other ways to configure things, and what follows is specific to my configuration (although yours may be similar).
I decided that for security and maintenance reasons I would install almost everything as the user "dspace". So the first step was to create the user "dspace".
# su -
# useradd dspace
Before I could start installing DSpace I needed to make sure I had some software installed.
#!/bin/sh
JAVA_HOME=`ls -td /usr/java/jdk* | head -1`
if [ -d ${JAVA_HOME} ]
then
export JAVA_HOME
export PATH=${JAVA_HOME}/bin:$PATH
fi
Basically, all this does is set JAVA_HOME to point to the JDK installation and put the JDK bin directory in your path. You could also accomplish this by putting the export commands in your bashrc file.
export ANT_HOME=/home/dspace/ant
export CATALINA_HOME=/home/dspace/tomcat
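Because the later steps depend on these variables, a quick check that each one points at a real directory can catch a typo early. The following is only a sketch and not part of the original setup; the helper name check_home is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the environment variable named in $1
# is set and points at an existing directory.
check_home() {
    name=$1
    # Indirect lookup: expands to value=${JAVA_HOME:-} and so on.
    eval "value=\${$name:-}"
    if [ -n "$value" ] && [ -d "$value" ]; then
        echo "$name ok: $value"
        return 0
    fi
    echo "$name is unset or not a directory: '$value'" >&2
    return 1
}

# Check the three variables used in this guide; keep going on failure.
for var in JAVA_HOME ANT_HOME CATALINA_HOME; do
    check_home "$var" || true
done
```

Running this as the dspace user after logging in again should report all three variables as ok if the exports above took effect.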
# su -
# yum install postgresql postgresql-server \
postgresql-jdbc postgresql-libs
PostgreSQL-JDBC is the Java interface for PostgreSQL, which will allow DSpace to access the database through Tomcat.
Before I could move on to installing DSpace, I needed to configure the software listed above.
# su - dspace
[dspace@localhost]# /home/dspace/tomcat/bin/startup.sh
I then connected to Tomcat through my web browser using http://localhost:8080/. This was denied at first; then I remembered to add an entry to the iptables firewall to allow connections to port 8080 and port 8443:
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8080 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8443 -j ACCEPT
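Each rule has to sit on a single line in the firewall rules file (on Fedora Core 4 this is typically /etc/sysconfig/iptables, which is an assumption here since the file is not named above). As a small sketch, the rules for both ports can be generated rather than typed by hand; print_rule is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print one single-line ACCEPT rule for the TCP port in $1,
# in the same form as the rules shown in this guide.
print_rule() {
    printf '%s\n' "-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport $1 -j ACCEPT"
}

# Tomcat's plain HTTP port and its SSL port.
for port in 8080 8443; do
    print_rule "$port"
done
```

The output would then be pasted into the rules file ahead of any final REJECT rule in the chain, followed by a restart of the iptables service.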
# su - postgres
[postgres@localhost]# initdb -D /var/lib/pgsql/data
Next I opened the /var/lib/pgsql/data/pg_hba.conf configuration file, and at the end of the file I commented out the following lines:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
and added the following lines:
host dspace dspace 127.0.0.1/32 md5
local dspace postgres trust
local template1 postgres trust
local template1 dspace trust
Next I started the PostgreSQL server:
# /etc/init.d/postgresql start
Then I switched back to user postgres and added the dspace user and database:
# su - postgres
[postgres@localhost]# createuser -U postgres -d -A -P dspace
Enter password for new user:
Enter it again:
CREATE USER
[postgres@localhost]# createdb -U dspace -E UNICODE dspace
CREATE DATABASE
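If the commands above cannot connect, a mistyped pg_hba.conf entry is a likely cause. The sketch below checks that an uncommented entry with the expected type, database, user, and method is present in the file. The helper name has_hba_entry is hypothetical, and the check only compares whitespace-separated fields; it does not validate the file the way PostgreSQL does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an uncommented line whose first
# three fields are TYPE, DATABASE, USER and whose last field is METHOD.
has_hba_entry() {
    file=$1; type=$2; db=$3; user=$4; method=$5
    awk -v t="$type" -v d="$db" -v u="$user" -v m="$method" '
        /^[ \t]*#/ { next }                                   # skip commented lines
        $1 == t && $2 == d && $3 == u && $NF == m { found = 1 }
        END { exit !found }                                   # 0 if found, 1 otherwise
    ' "$file"
}

# Path from this guide; reading it usually requires the postgres user or root.
HBA=${HBA:-/var/lib/pgsql/data/pg_hba.conf}
if [ -r "$HBA" ]; then
    has_hba_entry "$HBA" host dspace dspace md5 && echo "md5 entry for dspace found"
fi
```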
After following all of the steps above, I was ready to install DSpace.
[dspace@localhost]# tar -xzvf dspace-1.3.1-source.tar.gz
[dspace@localhost]# cd dspace-1.3.1-source/lib
[dspace@localhost]# ln -s /usr/share/java/postgresql-8.0-311.jdbc3.jar \
postgresql-jdbc3.jar
Just for good measure I did the same with jdbc2 and jdbc2ee, although I don't think they are needed.
dspace.dir = /home/dspace
dspace.url = https://me.dept.uni.edu:8443/dspace
dspace.hostname = me.dept.uni.edu
dspace.name = Dspace Server
config.template.log4j.properties = /home/dspace/config/log4j.properties
config.template.log4j-handle-plugin.properties = /home/dspace/config/log4j-handle-plugin.properties
config.template.oaicat.properties = /home/dspace/config/oaicat.properties
db.name = postgres
db.url = jdbc:postgresql://localhost:5432/dspace
db.driver = org.postgresql.Driver
db.username = dspace
db.password = <password used in postgresql setup above>
mail.server=localhost
mail.from.address = dspace-noreply@me.dept.uni.edu
feedback.recipient = <my email address>
mail.admin = <my email address>
alert.recipient = <my email address>
assetstore.dir = /home/dspace/assetstore
history.dir = /home/dspace/history
search.dir = /home/dspace/search
log.dir = /home/dspace/log
upload.temp.dir = /home/dspace/upload
upload.max = 536870912
report.public = false
report.dir = /home/dspace/reports/
handle.prefix = 0
handle.dir = /home/dspace/handle-server
Note that if you want to set up a real handle prefix with handle.net you should follow the directions on their website. I decided to make my server only internal for my group, so this was not necessary.
[dspace@localhost]# cd /home/dspace/dspace-1.3.1-source
[dspace@localhost]# ant
[dspace@localhost]# ant fresh_install
[dspace@localhost]# cp build/*.war /home/dspace/tomcat/webapps/
[dspace@localhost]# /home/dspace/bin/create-administrator
[dspace@localhost]# /home/dspace/tomcat/bin/shutdown.sh
[dspace@localhost]# cd /home/dspace/tomcat/webapps
[dspace@localhost]# rm -rf dspace dspace-oai
[dspace@localhost]# /home/dspace/tomcat/bin/startup.sh
I was now able to see the DSpace website by directing my browser to http://localhost:8080/dspace/. Later on I outline how I configured Tomcat for SSL so I could use the final interface on https://localhost:8443/dspace/.
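If the site does not come up, a typo in dspace.cfg is one thing worth ruling out. The sketch below reads individual keys from the file and checks that a few of the configured directories exist. The helper name cfg_get is hypothetical, and the location /home/dspace/config/dspace.cfg is an assumption based on the dspace.dir setting above; override CFG if your copy lives elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a "key = value" properties file.
# Only the first "=" on a line separates key from value; comment lines are skipped.
cfg_get() {
    key=$1; file=$2
    awk -v k="$key" '
        /^[ \t]*#/ { next }
        {
            i = index($0, "=")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            val = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim whitespace around the value
            if (name == k) { print val; exit }
        }
    ' "$file"
}

# Assumed location of the installed configuration file.
CFG=${CFG:-/home/dspace/config/dspace.cfg}
if [ -r "$CFG" ]; then
    for key in assetstore.dir search.dir log.dir; do
        dir=$(cfg_get "$key" "$CFG")
        if [ -d "$dir" ]; then
            echo "$key ok: $dir"
        else
            echo "$key missing: $dir"
        fi
    done
fi
```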
By this point everything was working just fine. However, I wanted to make sure it would all come back up automatically if I restarted the machine, and I wanted to automate the indexing and other tasks.
# su - postgres
[postgres@localhost]# crontab -e
[postgres@localhost]# crontab -l
# Clean up the database nightly at 2.40am
40 2 * * * vacuumdb --analyze dspace > /dev/null 2>&1
# su - dspace
[dspace@localhost]# crontab -e
[dspace@localhost]# crontab -l
0 1 * * * /home/dspace/bin/sub-daily > /home/dspace/log/sub-daily.log 2>&1
0 2 * * * /home/dspace/bin/filter-media > /home/dspace/log/filter-media.log 2>&1
# /sbin/chkconfig postgresql on
Then I created the file /etc/init.d/tomcat with the following contents:
#!/bin/bash
#
# tomcat Startup script for the Apache-Tomcat HTTP Server
#
# chkconfig: - 71 19
# description: Start up the Tomcat servlet engine.
# Source function library.
. /etc/init.d/functions
RETVAL=$?
CATALINA_HOME=/home/dspace/tomcat
TOMCAT_USER=dspace
start () {
if [ -f $CATALINA_HOME/bin/startup.sh ];
then
echo $"Starting Tomcat"
/bin/su $TOMCAT_USER $CATALINA_HOME/bin/startup.sh
fi
}
stop () {
if [ -f $CATALINA_HOME/bin/shutdown.sh ];
then
echo $"Stopping Tomcat"
/bin/su $TOMCAT_USER $CATALINA_HOME/bin/shutdown.sh
fi
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
*)
echo $"Usage: $0 {start|stop|restart}"
exit 1
;;
esac
exit $RETVAL
I made it executable:
# chmod +x /etc/init.d/tomcat
And made it start up in runlevels 2-5:
# /sbin/chkconfig --add tomcat
# /sbin/chkconfig tomcat on
[dspace@localhost]# keytool -genkey -alias tomcat -keyalg RSA
Enter keystore password: changeit
What is your first and last name? <me.dept.uni.edu>
What is the name of your organizational unit? My Department
What is the name of your organization? My University
What is the name of your City or Locality? My City
What is the name of your State or Province? My State
What is the two-letter country code for this unit? US
Enter key password for <tomcat>
(RETURN if same as keystore password): <RETURN>
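The same answers can also be supplied on the command line instead of interactively, which is handy if the key ever needs to be regenerated. The sketch below only assembles the distinguished name from the values used above; build_dname is a hypothetical helper, and the keytool call is left commented out so that running the sketch does not touch an existing keystore.

```shell
#!/bin/sh
# Hypothetical helper: assemble the distinguished name that keytool otherwise
# asks for one question at a time (name, unit, organization, city, state, country).
build_dname() {
    printf 'CN=%s, OU=%s, O=%s, L=%s, ST=%s, C=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

DNAME=$(build_dname "me.dept.uni.edu" "My Department" "My University" "My City" "My State" "US")
echo "$DNAME"

# The non-interactive call would then look roughly like this:
# keytool -genkey -alias tomcat -keyalg RSA \
#     -keystore /home/dspace/.keystore -storepass changeit -keypass changeit \
#     -dname "$DNAME"
```

Note that passing passwords on the command line exposes them in the shell history, so the interactive form above may still be preferable on a shared machine.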
I then uncommented the following section in /home/dspace/tomcat/conf/server.xml:
<!-- Define a SSL Coyote HTTP/1.1 Connector on port 8443 -->
<Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
port="8443" minProcessors="5" maxProcessors="75"
enableLookups="true"
keystore="/home/dspace/.keystore"
keypass="changeit"
acceptCount="100" debug="0" scheme="https" secure="true"
useURIValidationHack="false" disableUploadTimeout="true">
<Factory
className="org.apache.coyote.tomcat4.CoyoteServerSocketFactory"
clientAuth="false" protocol="TLS" />
</Connector>
Finally, I restarted Tomcat with
[dspace@localhost]# /home/dspace/tomcat/bin/shutdown.sh
[dspace@localhost]# /home/dspace/tomcat/bin/startup.sh
and connected to the SSL server at https://localhost:8443/dspace/. If something goes wrong, you can look in the /home/dspace/tomcat/logs/catalina.out log file.
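catalina.out grows quickly, so it can help to filter it rather than read it top to bottom. The sketch below prints only the lines that look like errors. The helper name scan_log is hypothetical, and matching on SEVERE or Exception is a rough heuristic, not a complete list of what Tomcat may log.

```shell
#!/bin/sh
# Hypothetical helper: print lines of a log file that look like errors,
# prefixed with their line numbers.
scan_log() {
    grep -n -E 'SEVERE|Exception' "$1"
}

# Log path from this guide; show at most the 20 most recent matches.
LOG=${LOG:-/home/dspace/tomcat/logs/catalina.out}
if [ -r "$LOG" ]; then
    scan_log "$LOG" | tail -n 20
fi
```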
public static String getCanonicalForm(String handle)
{
// return "hdl:" + handle;
//[ORIG] return "http://hdl.handle.net/" + handle;
return "https://me.dept.uni.edu:8443/dspace/handle/" + handle;
}
Now when handles are returned in email notices or elsewhere, they will only point to my server. To update DSpace with this change I recompiled the war files, copied them into tomcat/webapps, and restarted Tomcat.
[dspace@localhost]# cd /home/dspace/dspace-1.3.1-source
[dspace@localhost]# ant -Dconfig=config/dspace.cfg update
[dspace@localhost]# cp build/*.war /home/dspace/tomcat/webapps/
[dspace@localhost]# cd /home/dspace/tomcat/webapps
[dspace@localhost]# ../bin/shutdown.sh
[dspace@localhost]# rm -rf dspace dspace-oai
[dspace@localhost]# ../bin/startup.sh
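Since this rebuild-and-redeploy sequence is needed after every source change, it could be wrapped in a small script. The following is only a sketch under a few assumptions: the function names run and redeploy are made up, the war files are assumed to land in build/ as in the fresh install step, and DRY_RUN defaults to 1 so that the script only prints what it would do.

```shell
#!/bin/sh
# Paths taken from this guide; override via the environment if yours differ.
SRC=${SRC:-/home/dspace/dspace-1.3.1-source}
TOMCAT=${TOMCAT:-/home/dspace/tomcat}
# Dry-run by default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rebuild the war files, copy them into Tomcat, and restart Tomcat.
redeploy() {
    run cd "$SRC" || return 1
    run ant -Dconfig=config/dspace.cfg update || return 1
    run cp build/*.war "$TOMCAT/webapps/" || return 1
    run "$TOMCAT/bin/shutdown.sh"
    # Remove the expanded webapps so Tomcat unpacks the new war files.
    run rm -rf "${TOMCAT:?}/webapps/dspace" "${TOMCAT:?}/webapps/dspace-oai"
    run "$TOMCAT/bin/startup.sh"
}

redeploy
```

Run as the dspace user, first as is to review the printed commands, then with DRY_RUN=0 to perform the redeploy.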
I was quite pleased with the help I found on the DSpace website and from various user blogs and emails. By far the two best resources were Clive Gould's Blog and this DSpace install guide. Without these two guides I would never have tried to install DSpace, and would probably have attempted to write my own PHP/PostgreSQL server. That would have wasted at least a week of my life.