This principle should also apply to data on CDs. The example that started me off on this exercise was the Archive Magazine "Volume 12" CD, including the complete contents of all the magazines for the last eight years (the magazine is currently on volume 13, but the early issues are not available electronically). The CD is mastered to the ISO9660 standard, and the text is in HTML form - so theoretically it should be possible to read the contents on any platform, including the home-built Linux system which currently shares my desk with the Acorn RiscPC. Being able to do so would have assured me that it was safe to throw out the paper magazines, and reclaim a metre or so of shelf space, yet be able to turn to the CD for reference at any time in the future. So I mounted the CD and started browsing...
The task turned out not to be as easy as expected, and this article describes what had to be set up in order to be able to browse the disc and the magazine articles under Linux. It is not intended as a step-by-step guide, but rather as an example of how problems can be solved and the necessary conversions implemented using the available applications and tools. It is also an illustration of the value of open-source software, especially the ability for anyone to make changes and extensions to suit their needs.
I make no claims that this is the only way the problem could be solved, or that it is the "correct" way to do so, but it works for me. Although the examples relate to the Archive CD mentioned above, the general principles should be applicable to any similar CD.
$ ls /cdrom/html
archive gallery index/htm menus
$ ls -laF /cdrom/html
ls: /cdrom/html/index/htm: No such file or directory
total 10
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 .
dr-xr-xr-x 1 root root 2048 Sep 21 20:01 ..
dr-xr-xr-x 1 root root 2048 Sep 21 20:13 archive
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 gallery
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 menus
$
KDE fared little better.
It seemed that the file in question ("index/htm") was actually recorded on the CD with a slash in the name. RiscOS uses "." as its directory separator, so a file that would be called index.htm elsewhere is conventionally named index/htm there. This naturally causes great confusion for any Unix-based system, since "/" is the one special filename character which the kernel interprets as the directory separator, and which can never exist in a leaf name. The Unix-Haters' Handbook describes a similar situation.
The same applied to the article files further down the directory tree. This was not promising: in this state it was not even possible to read the vital HTML files, let alone display them!
The solution to this was to modify the isofs file system (in source file fs/isofs/dir.c).
I had recently upgraded my ancient Slackware system to SuSE 6.2, which included kernel 2.2.10, and rebuilt the kernel with isofs as a module. The kernel already incorporated a patch by Matthew Wilcox to partially handle the Acorn CD format and append a file type suffix where appropriate, but it did not address the "/"-in-filename problem. This turned out to be easy enough to fix in the existing function 'isofs_name_translate'. The second change, in Matthew's new function 'get_acorn_filename', ensures that the ",xxx" extension is not added if the filename already contains a "." (which may originally have been a "/"), so that the HTML files don't get the Acorn filetype appended as well.
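To make the translation rules concrete, here is a minimal user-space sketch of the name translation with the extra "/" to "." step. It is modelled on the behaviour described above (lower-casing, dropping the ";1" version suffix, converting ";" and "/" to "."), but it is a simplified illustration rather than the kernel's isofs_name_translate; the function name translate_iso_name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified sketch of isofs-style name translation.
 * old/len is the raw ISO 9660 name, which is NOT NUL-terminated.
 * The result goes to out, which must have room for len + 1 bytes.
 * Returns the length of the translated name. */
static size_t translate_iso_name(const char *old, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)old[i];

        if (c == '\0')
            break;
        if (c >= 'A' && c <= 'Z')
            c |= 0x20;                  /* lower case */

        /* Drop a trailing ".;1" or ";1" version suffix */
        if (c == '.' && i + 3 == len && old[i + 1] == ';' && old[i + 2] == '1')
            break;
        if (c == ';' && i + 2 == len && old[i + 1] == '1')
            break;

        if (c == ';')
            c = '.';                    /* any other ';' becomes '.' */
        if (c == '/')
            c = '.';                    /* RiscOS hack: '/' becomes '.' */

        out[i] = (char)c;
    }
    out[i] = '\0';
    return i;
}
```

With these rules a name recorded as "INDEX/HTM;1" comes out as "index.htm", which is exactly the form that Unix tools expect.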
--- fs/isofs/dir.c.orig Thu Nov 25 11:22:55 1999
+++ fs/isofs/dir.c Fri Nov 26 09:30:43 1999
@@ -88,4 +88,8 @@
c = '.';
+ /* RiscOS hack - convert '/' to '.' */
+ if (c == '/')
+ c = '.';
+
new[i] = c;
}
@@ -106,13 +110,16 @@
if ((*((unsigned char *) de) - std) != 32) return retnamlen;
chr = ((unsigned char *) de) + std;
- if (strncmp(chr, "ARCHIMEDES", 10)) return retnamlen;
+ if (strncmp(chr, "ARCHIMEDES", 10) !=0 ) return retnamlen;
if ((*retname == '_') && ((chr[19] & 1) == 1)) *retname = '!';
- if (((de->flags[0] & 2) == 0) && (chr[13] == 0xff)
- && ((chr[12] & 0xf0) == 0xf0))
+ if (memchr(retname, '.', retnamlen) == 0)
{
- retname[retnamlen] = ',';
- sprintf(retname+retnamlen+1, "%3.3x",
- ((chr[12] & 0xf) << 8) | chr[11]);
- retnamlen += 4;
+ if (((de->flags[0] & 2) == 0) && (chr[13] == 0xff)
+ && ((chr[12] & 0xf0) == 0xf0))
+ {
+ retname[retnamlen] = ',';
+ sprintf(retname+retnamlen+1, "%3.3x",
+ ((chr[12] & 0xf) << 8) | chr[11]);
+ retnamlen += 4;
+ }
}
return retnamlen;
Having made this change, rebuilt the module, carefully read the mount(8) manual page and discovered the map=acorn option to enable the Acorn extensions, it was possible to mount the CD and get directory listings with sensible file names (not only for the HTML, but for the sprite files as well):
$ mount -o ro,map=a /cdrom
$ ls -laF /cdrom/html
total 11
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 ./
dr-xr-xr-x 1 root root 2048 Sep 21 20:01 ../
dr-xr-xr-x 1 root root 2048 Sep 21 20:13 archive/
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 gallery/
-r-xr-xr-x 1 root root 466 Apr 23 1999 index.htm*
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 menus/
$ ls -laF /cdrom/html/gallery
total 248
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 ./
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 ../
-r-xr-xr-x 1 root root 156984 Apr 23 1999 cbot,ff9*
-r-xr-xr-x 1 root root 89592 Apr 23 1999 ctop,ff9*
$
It was now possible to point a browser at file:/cdrom/html/index.htm and get a page displayed.
But there were two problems: what should have been the front page of the magazine was displayed as two broken images, and the links to the volume and article indexes led nowhere. Taking the second of these as the more serious, the cause was that the index.htm file read something like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD><TITLE>Archive Magazine Front Cover</TITLE></HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
<CENTER>
<a href="Menus/volume.htm"><img src="Gallery/CTop" width="416" height="214" border="0"></a><br>
<a href="Menus/artcle.htm"><img src="Gallery/CBot" width="416" height="376" border="0"></a><br>
</CENTER>
<HR><br>
<CENTER><font size=1>Copyright © 1996-99 Magnate.</font><br></CENTER>
</BODY>
</HTML>
Leaving aside the missing ALT= attributes on the images for the moment, the problem is that the link filenames are correct (and relative, a good mark there!) but use mixed case. This is not a problem on RiscOS with its case-independent filing system, but fails miserably on Unix (filenames recorded on a CD are normally in upper case, and isofs converts them to all lower case).
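The difference can be illustrated with a small sketch of a directory lookup that tries an exact match first and, only when a relaxed mode is enabled, falls back to a case-independent comparison. This is a hypothetical user-space model of what a relaxed lookup is meant to achieve, not the isofs implementation; lookup_name and names_equal_nocase are illustrative names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare two names ignoring ASCII case; returns non-zero if equal. */
static int names_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both ended together */
}

/* Find name among n directory entries. An exact match is tried first;
 * a case-independent match is accepted only when relaxed is non-zero.
 * Returns the entry index, or -1 if there is no match. */
static int lookup_name(const char *const entries[], size_t n,
                       const char *name, int relaxed)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(entries[i], name) == 0)
            return (int)i;
    if (relaxed)
        for (i = 0; i < n; i++)
            if (names_equal_nocase(entries[i], name))
                return (int)i;
    return -1;
}
```

Given the lower-case entries "artcle.htm" and "volume.htm", a strict lookup of "Volume.htm" fails, while the relaxed one finds the second entry. The same applies to the directory name "Menus" used in the links.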
Another scan through the mount(8) manual page led me to the check=relaxed option, to make all filename lookups case-independent. But while this seemed like the ideal option to use, it didn't seem to work:
$ mount -o ro,map=a,check=r /cdrom
$ ls -laF /cdrom/html/Menus
ls: /cdrom/html/Menus: No such file or directory
$ ls -laF /cdrom/html/menus
total 102
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 ./
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 ../
-r-xr-xr-x 1 root root 88235 Sep 30 15:54 artcle.htm*
-r-xr-xr-x 1 root root 8644 Sep 30 10:38 volume.htm*
$ cat /cdrom/html/Menus/volume.htm
cat: /cdrom/html/Menus/volume.htm: No such file or directory
$
At this point I couldn't work out what was happening: all was correct according to the man page, but the check=relaxed option
just didn't seem to be working. After a lot of scanning through the sources, and following the age-old debugging technique of inserting printk(9) calls at key places, I noticed that there appeared to be some missing else keywords in the big option decoding loop (function 'parse_options' in source file fs/isofs/inode.c).
Without going into too much detail, the effect was that if certain mount options (map= and session=) were used, they would be decoded correctly but any subsequent options would be ignored. I don't know whether this is intended behaviour, but it is not documented and I can think of no good reason why it should work this way, so it is probably a bug.
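A heavily simplified, hypothetical model of such an option loop shows why the order of options mattered. It assumes, for illustration, that the chain of tests ends in a catch-all else that stops parsing. Under that assumption, a missing else after the map= or session= branch sends those options on into the catch-all, so any later options are silently dropped. The structure, field names and function names are invented for this sketch and are not copied from fs/isofs/inode.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified option set (not the kernel's structure). */
struct iso_opts {
    char map;       /* 'n' = normal, 'a' = acorn */
    char check;     /* 's' = strict, 'r' = relaxed */
    int session;
    int sbsector;
};

/* Buggy shape: "session" and "sbsector" start new if statements instead
 * of continuing the else-if chain, so a recognised "map=" or "session="
 * option falls through to the catch-all at the end, which stops parsing. */
static int parse_buggy(char *options, struct iso_opts *popt)
{
    char *this_char, *value;

    for (this_char = strtok(options, ","); this_char;
         this_char = strtok(NULL, ",")) {
        if ((value = strchr(this_char, '=')) != NULL)
            *value++ = '\0';
        if (!strcmp(this_char, "map") && value)
            popt->map = value[0];
        if (!strcmp(this_char, "session") && value)     /* missing else */
            popt->session = atoi(value);
        if (!strcmp(this_char, "sbsector") && value)    /* missing else */
            popt->sbsector = atoi(value);
        else if (!strcmp(this_char, "check") && value)
            popt->check = value[0];
        else
            return 1;       /* assumed catch-all: stop, ignoring the rest */
    }
    return 1;
}

/* Fixed shape: one unbroken else-if chain, so each option is handled
 * exactly once and parsing continues with the next one. */
static int parse_fixed(char *options, struct iso_opts *popt)
{
    char *this_char, *value;

    for (this_char = strtok(options, ","); this_char;
         this_char = strtok(NULL, ",")) {
        if ((value = strchr(this_char, '=')) != NULL)
            *value++ = '\0';
        if (!strcmp(this_char, "map") && value)
            popt->map = value[0];
        else if (!strcmp(this_char, "session") && value)
            popt->session = atoi(value);
        else if (!strcmp(this_char, "sbsector") && value)
            popt->sbsector = atoi(value);
        else if (!strcmp(this_char, "check") && value)
            popt->check = value[0];
        else
            return 1;       /* unknown option: stop (assumed behaviour) */
    }
    return 1;
}
```

In this model the buggy parser given "map=a,check=r" sets map but leaves check at its default, while "check=r,map=a" sets both; the fixed parser handles either order.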
Putting the mount options the other way round fixed the problem:
$ mount -o ro,check=r,map=a /cdrom
$ ls -laF /cdrom/html/Menus
total 102
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 ./
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 ../
-r-xr-xr-x 1 root root 88235 Sep 30 15:54 artcle.htm*
-r-xr-xr-x 1 root root 8644 Sep 30 10:38 volume.htm*
$
and correcting the option parsing ensured that it would work for all combinations of options:
--- fs/isofs/inode.c.orig Thu Nov 25 12:36:36 1999
+++ fs/isofs/inode.c Thu Nov 25 12:40:06 1999
@@ -344,5 +344,5 @@
else return 0;
}
- if (!strcmp(this_char,"session") && value) {
+ else if (!strcmp(this_char,"session") && value) {
char * vpnt = value;
unsigned int ivalue = simple_strtoul(vpnt, &vpnt, 0);
@@ -350,5 +350,5 @@
popt->session=ivalue+1;
}
- if (!strcmp(this_char,"sbsector") && value) {
+ else if (!strcmp(this_char,"sbsector") && value) {
char * vpnt = value;
unsigned int ivalue = simple_strtoul(vpnt, &vpnt, 0);
It was now possible to browse the CD (starting from file:/cdrom/html/index.htm) and follow the hyperlinks. But there were, of course, no illustrations: they are all stored in Acorn sprite format, which the RiscOS browsers understand but the "industry standard" ones do not (Netscape somehow managed to deduce that they were MIME type video/unknown...).
Before even thinking about how to convert the sprite files into a suitable form for displaying (preferably on the fly as required), there was the problem of recognising the sprite files (the Acorn extensions report the names as "foo,ff9" and similarly for other file types). Unfortunately the filename matching in pretty much every Unix application expects the extension to be separated by a dot rather than a comma, so the next step was to fix this.
It would have been simple enough to just substitute a dot for the comma in Matthew Wilcox's patch above. However, in response to a question that I posted on the Acorn programming newsgroup, Darren Salt sent me a patch which incorporates a table of known Acorn file types and extensions, and some additional mount options. Known file types are converted to dot-separated extensions (exactly the format that was required), while other file types would be appended as a comma-separated number as before. He had also anticipated my original "/" to "." change.
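The idea of such a table can be sketched in C as follows. The entries are an illustrative handful of well-known RiscOS file types with extensions chosen for this example (only the ff9 to .spr mapping is taken from the listings in this article), and the structure, table and function name acorn_append_type are hypothetical rather than Darren's actual code. Consistent with the earlier change, names that already contain a "." are left alone. The test for an existing "." uses memchr with an explicit length, because a name taken from a directory record is not guaranteed to be zero-terminated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical filetype -> extension table (illustrative subset only). */
struct type_ext {
    unsigned type;          /* 12-bit RiscOS file type */
    const char *ext;        /* extension to append, without the dot */
};

static const struct type_ext known_types[] = {
    { 0xff9, "spr" },       /* Sprite */
    { 0xfff, "txt" },       /* Text */
    { 0xfaf, "html" },      /* HTML */
    { 0xc85, "jpg" },       /* JPEG */
    { 0x695, "gif" },       /* GIF */
};

/* Append a type suffix to name[0..len), which is NOT assumed to be
 * NUL-terminated. The buffer must have room for len + 6 bytes.
 * Known types get ".ext", unknown ones ",xxx" (three hex digits);
 * names that already contain '.' are left unchanged.
 * Returns the new length; the result is always NUL-terminated. */
static size_t acorn_append_type(char *name, size_t len, unsigned filetype)
{
    size_t i;

    if (memchr(name, '.', len) == NULL) {
        for (i = 0; i < sizeof known_types / sizeof known_types[0]; i++) {
            if (known_types[i].type == filetype) {
                size_t n = strlen(known_types[i].ext);

                name[len] = '.';
                memcpy(name + len + 1, known_types[i].ext, n);
                len += 1 + n;
                name[len] = '\0';
                return len;
            }
        }
        /* Unknown type: comma plus the file type number, as before */
        sprintf(name + len, ",%3.3x", filetype & 0xfffu);
        return len + 4;
    }
    name[len] = '\0';
    return len;
}
```

So a sprite recorded as "cbot" becomes "cbot.spr", "index.htm" is untouched, and a file of an unlisted type keeps the ",xxx" form.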
After applying Darren's patch, and restoring the two missing else keywords in the newly-patched inode.c, the rebuilt module should have given sensible results for both the HTML and sprite files. But it didn't: sometimes it would fail to add either the registered extension or the file type number, inconsistently and apparently at random. Looking more closely at Darren's code, it turned out that he was using strchr(3) on a string that is not zero-terminated, possibly running off the end (the sort of thing that can easily panic the system!). Making yet another patch:
--- fs/isofs/dir.c.ds Fri Nov 26 14:40:55 1999
+++ fs/isofs/dir.c Fri Nov 26 14:41:30 1999
@@ -133,5 +133,5 @@
/* We have a filetype -> extension mapping;
* only append it if no existing extension */
- if (strchr(retname, '.') == 0)
+ if (memchr(retname, '.', retnamlen) == 0)
{
const char *ext = acorn_file_extensions[i].extension;
and again rebuilding the module now gives the correct results for all file types:
$ mount -o ro,map=A,check=r /cdrom
$ ls -laF /cdrom/html
total 11
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 ./
dr-xr-xr-x 1 root root 2048 Sep 21 20:01 ../
dr-xr-xr-x 1 root root 2048 Sep 21 20:13 archive/
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 gallery/
-r-xr-xr-x 1 root root 466 Apr 23 1999 index.htm*
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 menus/
$ ls -laF /cdrom/html/Gallery
total 248
dr-xr-xr-x 1 root root 2048 Sep 21 20:32 ./
dr-xr-xr-x 1 root root 2048 Sep 21 20:12 ../
-r-xr-xr-x 1 root root 156984 Apr 23 1999 cbot.spr*
-r-xr-xr-x 1 root root 89592 Apr 23 1999 ctop.spr*
$
Now that the sprite files could be recognised by their ".spr" extension, I needed a converter which would allow them to be displayed in a "standard" graphics format, preferably one that the browser would understand natively so that inline images would be displayed as intended. A quick search of the Web and Hensa turned up the sprtools package, produced by DEEJ Technology (aka David J. Ruck). This converts between Acorn sprites and a number of other formats, and works on Windows and Unix as well as RiscOS. Unfortunately Hensa only had the ARC format file available (according to the documentation a tar file is available for Unix), but nspark unpacked the archive quite happily.
After moving some files around to convert the RiscOS directory structure to the corresponding Unix version, and configuring it to support Linux in accordance with the instructions (the package as distributed supported HP-UX and SunOS/Solaris only), the tools compiled and installed with no problems. Having done that, and after some experimenting with image formats and additional netpbm filters, it was possible to do something like:
$ spr2ras /cdrom/html/gallery/ctop.spr | display
and view the sprite files (in their own window, for the moment) in all their glory.
With /cdrom made available in the Apache document root, it was possible to browse the CD (starting from URL http://localhost/cdrom/html/) in the same way as before. Still no images yet, but...
Using Apache's AddType and Action directives, it is possible to associate first of all a MIME type with a file extension, and then a filter script which is called to handle that MIME type. The filter script works in the same way as a CGI, so it can convert the input file to any content type. So all it took was a simple handler script, installed in the server's CGI directory:
#!/usr/bin/perl
# Action handler: Apache passes the sprite file's path in PATH_TRANSLATED
$ENV{PATH} .= ":/usr/local/bin:/usr/X11/bin"; # directories holding the converters
$| = 1; # unbuffered, so the header is sent before exec
print "Content-Type: image/gif\n\n";
# quote the file name, since it is passed to the shell
exec "spr2ras '$ENV{PATH_TRANSLATED}' | rasttopnm | ppmtogif";
(There is no single conversion in sprtools from sprite to GIF, or to any other format that Netscape can handle directly, so the pipeline through an intermediate format and the additional netpbm filters are necessary.) Now this handler can be registered in the Apache configuration file:
<Location /cdrom>
AddType image/x-acorn-sprite .spr
Action image/x-acorn-sprite /cgi-bin/sprite-handler.pl
</Location>
With all this up and running, it was finally possible to view a sprite image in the browser.
The remaining problem was to make sure that a reference to "Gallery/CTop" as above will load and convert the sprite file "gallery/ctop.spr" via the handler above. This involves not only doing a case-independent lookup (taken care of with check=relaxed above), but also loading the file "ctop.spr" when the file "ctop" was requested.
My first attempt at fixing this was to use Apache's spelling checker (compiled in as standard, but normally inactive unless it is enabled with the CheckSpelling directive), since one of the checks that it will perform is for a file with the same basename as requested but with an additional extension. Unfortunately, even if this results in only one correction, Apache considers it sufficiently "different" from the original that it always outputs a confirmation page (with a link to the correction); it does not issue an automatic redirect. Netscape, of course, does not follow the link (although it would follow an automatic redirect) and still displays "broken" images.
So the next approach was to use the ErrorDocument directive to implement a handler for error 404 ("Not Found"), which checks for the existence of the file with an extension and returns a redirect to it if appropriate. This script is installed in the server's CGI directory:
#!/usr/bin/perl
require "cgi-lib.pl"; # the form parsing library
use File::Basename; # file name processing
sub html_escape # make text safe to include in HTML
{
my ($s) = @_;
$s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/>/&gt;/g;
return $s;
}
$file = $ENV{REQUEST_URI}; # file that was requested
$file =~ s/^\///; # make relative to root
$base = basename($file); # base name of file
$base =~ tr/A-Z/a-z/; # make all lower case
$file = dirname($file)."/".$base; # reconstruct full pathname
# find all alternatives (only for plain path characters,
# since $file is interpolated into a shell command)
@alts = ();
if ($file =~ m{^[\w./,-]+$})
{
@alts = split(' ',`cd $ENV{DOCUMENT_ROOT} && ls -d $file.* $file,??? 2>/dev/null`);
}
if (scalar(@alts)==1) # only one alternative
{
print "Status: 302 Use Alternative\n"; # so redirect to that
print "Location: /",$alts[0],"\n";
print "\n";
exit 0;
}
# No alternative found, or more than one, so display a normal error page.
print &PrintHeader;
print &HtmlTop("404 Not Found");
print "URL not found: <CODE>",html_escape($ENV{REQUEST_URI}),"</CODE>\n";
print "<BR>\n";
print "Error information: <CODE>",html_escape($ENV{REDIRECT_ERROR_NOTES}),"</CODE>\n";
print "<P>\n";
if (scalar(@alts)>0)
{
print "Possible alternatives:<BR>\n";
print "<UL>\n";
foreach (@alts)
{
print "<LI><A HREF=\"/$_\">/$_</A>\n";
}
print "</UL>\n";
}
print &HtmlBot;
and registered in the server configuration file:
ErrorDocument 404 /cgi-bin/cgi-error_404.pl
And that works nicely: requesting a sprite file (without the ".spr" extension) runs the script to find the corresponding file with the extension, and returns a redirect to that location:
$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /cdrom/html/Gallery/CTop HTTP/1.0
HTTP/1.1 302 Use Alternative
Date: Fri, 26 Nov 1999 21:00:15 GMT
Server: Apache/1.3.6 (Unix) (SuSE/Linux)
Location: http://localhost/cdrom/html/Gallery/ctop.spr
Connection: close
Content-Type: text/plain
Connection closed by foreign host.
$
(Note: according to the Apache documentation, status 303 ("See Other") ought to be more appropriate than 302 ("Moved Temporarily"). But this status code is not mentioned in RFC 1945, and neither Netscape nor KDE recognises it as a redirect.)
And, as a bonus, it all works in KDE as well. And even, as a final perverse twist, in Fresco on the RiscPC (over the network)...
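The matching rule used by the 404 script (an entry named base followed by "." and anything, or by "," and exactly three characters, with a redirect only when there is a single candidate) can also be expressed without a shell as a small C function. This is a hypothetical re-implementation for illustration: find_alternatives and format_redirect are invented names, the directory listing is passed in as an array rather than read from disk, and the base name is assumed to have been lower-cased already.

```c
#include <stdio.h>
#include <string.h>

/* Count directory entries that could stand in for the requested base
 * name: "base.<anything>" or "base,<three characters>", mirroring the
 * shell patterns "$file.*" and "$file,???". The first match is stored
 * in *match (NULL if there is none). */
static size_t find_alternatives(const char *base,
                                const char *const entries[], size_t n,
                                const char **match)
{
    size_t blen = strlen(base), count = 0, i;

    *match = NULL;
    if (blen == 0)
        return 0;               /* would otherwise match "." and ".." */
    for (i = 0; i < n; i++) {
        const char *rest;

        if (strncmp(entries[i], base, blen) != 0)
            continue;
        rest = entries[i] + blen;       /* safe: entry starts with base */
        if (rest[0] == '.' || (rest[0] == ',' && strlen(rest) == 4)) {
            if (count++ == 0)
                *match = entries[i];
        }
    }
    return count;
}

/* Build the CGI header for a redirect to /dir/name, in the same form
 * as the Perl script; returns the value from snprintf. */
static int format_redirect(char *buf, size_t size,
                           const char *dir, const char *name)
{
    return snprintf(buf, size,
                    "Status: 302 Use Alternative\nLocation: /%s/%s\n\n",
                    dir, name);
}
```

A caller would send the redirect only when find_alternatives returns exactly 1, and fall back to the ordinary error page otherwise, as the Perl script does.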
Alternatively, by adding the ".spr" extension and the image/x-acorn-sprite MIME type to Netscape's list of applications, and setting up the filter in the plugger(7) configuration, it may be possible to convert and display the sprite images this way. However, that doesn't solve the problem of needing to have the extension added to the filename in a hyperlink, so for the moment I will stick to the Apache approach.
But why did I go to all that trouble, rather than just giving up and reading the CD on the RiscPC? Mainly because I don't have it switched on all the time (and, unfortunately, seem to find myself doing that less and less as time goes on). But also, much as it saddens me to say this, Netscape (certainly on the Linux platform anyway) is a far better browser than any Acorn product will ever be, definitely beating Fresco on usability (if it resizes the window of its own accord just once more I will DO SOMETHING VERY REGRETTABLE INDEED!!!) and performance (even simply reading local files). And how long has it been since even a bugfix version of Fresco was released, let alone an upgrade?
Unfortunately it seems that, whatever new hardware and software may be produced in the future, and whatever the developers and enthusiasts end up doing, the Acorn/RiscOS systems will always be several steps behind other platforms in terms of what they can do. Sad, but undeniably true.
Page by Jonathan Marten
Last modified: Thu Jan 27 10:32:20 GMT 2005