[Tux Meets Acorn]  
How to Browse a RiscOS CD under Linux


Making the Disc Readable
Fixing the Hyperlinks
Identifying the Sprite Files
Converting the Sprite Files
Doing That Automatically
 

Fixing the Image Links
Success at last!
An alternative approach?
So what was the point of all that?

"Cross-platform" should be one of the most important concepts in the designer's mind when publishing electronically. This is certainly true for the Internet, most of which at least is accessible to anyone regardless of the hardware platform or operating system that they happen to be using (despite the best efforts of some software producers and Web site designers to continue to lock users into proprietary software and data formats).

This principle should also apply to data on CDs. The example that started me off on this exercise was the Archive Magazine "Volume 12" CD, including the complete contents of all the magazines for the last eight years (the magazine is currently on volume 13, but the early issues are not available electronically). The CD is mastered to the ISO9660 standard, and the text is in HTML form - so theoretically it should be possible to read the contents on any platform, including the home-built Linux system which currently shares my desk with the Acorn RiscPC. Being able to do so would have assured me that it was safe to throw out the paper magazines, and reclaim a metre or so of shelf space, yet be able to turn to the CD for reference at any time in the future. So I mounted the CD and started browsing...

The task turned out to be not as easy as expected, and this article describes what had to be set up in order to be able to browse the disc and the magazine articles under Linux. It is not intended as a step-by-step guide, but rather as an example of how problems can be solved and the necessary conversions implemented using the available applications and tools. It is also an illustration of the value of open-source software, especially the ability for anyone to make changes and extensions to suit their needs.

I make no claims that this is the only way the problem could be solved, or that it is the "correct" way to do so, but it works for me. Although the examples relate to the Archive CD mentioned above, the general principles should be applicable to any similar CD.

 

Making the Disc Readable

The CD could be mounted with no problems (using the default mount options for now). However, the results of looking in the obvious place for the HTML files were not quite right:
  $ ls /cdrom/html
  archive    gallery    index/htm  menus
  $ ls -laF /cdrom/html
  ls: /cdrom/html/index/htm: No such file or directory
  total 10
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 .
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:01 ..
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:13 archive
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 gallery
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 menus
  $
and KDE fared little better:

[KFM screen shot]

It seemed that the file in question ("index/htm") was actually recorded on the CD with a slash in the name. This will naturally cause great confusion for any Unix-based system, since the slash is the one special filename character which is interpreted by the kernel as the directory separator and which can never exist in a leaf name. The Unix-Haters' Handbook describes a similar situation.

The same applied to the article files, further down in this directory tree. This was not promising: in this state it was not even possible to read the vital HTML files, let alone display them!

The solution to this was to modify the isofs file system (in source file fs/isofs/dir.c). I had recently upgraded my ancient Slackware system to SuSE 6.2 which included kernel 2.2.10, and rebuilt the kernel with isofs as a module. The kernel already incorporated a patch by Matthew Wilcox to partially handle the Acorn CD format and append a file type suffix where appropriate, but it did not address the "/"-in-filename problem. But this turned out to be easy enough to fix, in the existing function 'isofs_name_translate'. The second change, in Matthew's new function 'get_acorn_filename', ensures that the ",xxx" extension is not added if the filename already contains a "." (which may have originally been a "/"), so that the HTML files don't get the Acorn filetype appended as well.

  --- fs/isofs/dir.c.orig Thu Nov 25 11:22:55 1999
  +++ fs/isofs/dir.c      Fri Nov 26 09:30:43 1999
  @@ -88,4 +88,8 @@
                          c = '.';
 
  +               /* RiscOS hack - convert '/' to '.' */
  +               if (c == '/')
  +                       c = '.';
  +
                  new[i] = c;
          }
  @@ -106,13 +110,16 @@
          if ((*((unsigned char *) de) - std) != 32) return retnamlen;
          chr = ((unsigned char *) de) + std;
  -       if (strncmp(chr, "ARCHIMEDES", 10)) return retnamlen;
  +       if (strncmp(chr, "ARCHIMEDES", 10) !=0 ) return retnamlen;
          if ((*retname == '_') && ((chr[19] & 1) == 1)) *retname = '!';
  -       if (((de->flags[0] & 2) == 0) && (chr[13] == 0xff)
  -               && ((chr[12] & 0xf0) == 0xf0))
  +       if (memchr(retname, '.', retnamlen) == 0)
          {
  -               retname[retnamlen] = ',';
  -               sprintf(retname+retnamlen+1, "%3.3x",
  -                       ((chr[12] & 0xf) << 8) | chr[11]);
  -               retnamlen += 4;
  +               if (((de->flags[0] & 2) == 0) && (chr[13] == 0xff)
  +                   && ((chr[12] & 0xf0) == 0xf0))
  +               {
  +                       retname[retnamlen] = ',';
  +                       sprintf(retname+retnamlen+1, "%3.3x",
  +                               ((chr[12] & 0xf) << 8) | chr[11]);
  +                       retnamlen += 4;
  +               }
          }
          return retnamlen;
Having made this change and rebuilt the module, and having discovered from a careful reading of the mount(8) manual page that the map=acorn option enables the Acorn extensions, it was possible to mount the CD and get directory listings with sensible file names (not only for the HTML, but for the sprite files as well):
  $ mount -o ro,map=a /cdrom
  $ ls -laF /cdrom/html
  total 11
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 ./
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:01 ../
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:13 archive/
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 gallery/
  -r-xr-xr-x   1 root     root          466 Apr 23  1999 index.htm*
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 menus/
  $ ls -laF /cdrom/html/gallery
  total 248
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 ./
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 ../
  -r-xr-xr-x   1 root     root       156984 Apr 23  1999 cbot,ff9*
  -r-xr-xr-x   1 root     root        89592 Apr 23  1999 ctop,ff9*     
  $
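
The combined effect of the two changes to dir.c can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: the function name acorn_display_name and its parameters are hypothetical, and the 12-bit file type is assumed to have been extracted from the directory record already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not part of the kernel patch).
 * Copies 'len' bytes of an on-disc name into 'out', mapping '/' to '.',
 * and appends ",xxx" (the hex file type) only if the name has no '.'.
 * Returns the length of the resulting name.
 */
static size_t acorn_display_name(const char *raw, size_t len,
                                 unsigned filetype,
                                 char *out, size_t out_size)
{
    size_t i;

    assert(out_size >= len + 5);        /* room for ",xxx" and the NUL */

    for (i = 0; i < len; i++)
        out[i] = (raw[i] == '/') ? '.' : raw[i];

    /* The name is length-counted, so search with memchr, not strchr. */
    if (memchr(out, '.', len) == NULL) {
        snprintf(out + len, out_size - len, ",%03x", filetype & 0xfffu);
        len += 4;
    }
    out[len] = '\0';
    return len;
}
```

Given the raw name "index/htm" this produces "index.htm" with no suffix, while "ctop" with file type 0xff9 becomes "ctop,ff9", matching the listings above.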

 

Fixing the Hyperlinks

Now it was possible to point Netscape at file:/cdrom/html/index.htm and get a page displayed. But there were two problems: what should have been the front page of the magazine was displayed as two broken images, and the links to the volume and article indexes led nowhere. Taking the second of these as the more serious, this is because the index.htm file read something like this:
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
  <HTML>
  <HEAD><TITLE>Archive Magazine Front Cover</TITLE></HEAD>
  <BODY TEXT="#000000" BGCOLOR="#FFFFFF">
  <CENTER>
  <a href="Menus/volume.htm"><img src="Gallery/CTop" width="416" height="214" border="0"></a><br>
  <a href="Menus/artcle.htm"><img src="Gallery/CBot" width="416" height="376" border="0"></a><br>
  </CENTER>
   <HR><br>
  <CENTER><font size=1>Copyright © 1996-99 Magnate.</font><br></CENTER>
  </BODY>
  </HTML>
Leaving aside the absence of ALT= specifications for the images for the moment, the problem is that the link filenames are correct (and relative, good mark there!) but use mixed case. This is not a problem on RiscOS with its case-independent filing system, but fails miserably on Unix (filenames recorded on a CD are normally in upper case, and isofs converts them to all lower case).

Another scan through the mount(8) manual page led me to the check=relaxed option, to make all filename lookups case-independent. But while this seemed like the ideal option to use, it didn't seem to work:

  $ mount -o ro,map=a,check=r /cdrom
  $ ls -laF /cdrom/html/Menus
  ls: /cdrom/html/Menus: No such file or directory
  $ ls -laF /cdrom/html/menus
  total 102
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 ./
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 ../
  -r-xr-xr-x   1 root     root        88235 Sep 30 15:54 artcle.htm*
  -r-xr-xr-x   1 root     root         8644 Sep 30 10:38 volume.htm*
  $ cat /cdrom/html/Menus/volume.htm
  cat: /cdrom/html/Menus/volume.htm: No such file or directory
  $
At this point I couldn't work out what was happening - all was correct according to the man page, but the check=relaxed option just didn't seem to be working. After a lot of scanning through the sources, and following the age-old debugging technique of inserting printk(9)'s at key places, I noticed that there appeared to be some missing elses in the big option decoding loop (function 'parse_options' in source file fs/isofs/inode.c). Without going into too much detail, the effect of this was that if certain mount options (map= and session=) were used, they would be decoded correctly but any subsequent options would be ignored. I don't know if this is intended behaviour - although it is not documented, and I can think of no good reason why it should work this way, so it is probably a bug.

Putting the mount options the other way round fixed the problem:

  $ mount -o ro,check=r,map=a /cdrom
  $ ls -laF /cdrom/html/Menus
  total 102
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 ./
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 ../
  -r-xr-xr-x   1 root     root        88235 Sep 30 15:54 artcle.htm*
  -r-xr-xr-x   1 root     root         8644 Sep 30 10:38 volume.htm*
  $
and correcting the option parsing ensured that it would work for all combinations of options:
  --- fs/isofs/inode.c.orig       Thu Nov 25 12:36:36 1999
  +++ fs/isofs/inode.c    Thu Nov 25 12:40:06 1999
  @@ -344,5 +344,5 @@
                          else return 0;
                  }
  -               if (!strcmp(this_char,"session") && value) {
  +               else if (!strcmp(this_char,"session") && value) {
                          char * vpnt = value;
                          unsigned int ivalue = simple_strtoul(vpnt, &vpnt, 0);
  @@ -350,5 +350,5 @@
                          popt->session=ivalue+1;
                  }
  -               if (!strcmp(this_char,"sbsector") && value) {
  +               else if (!strcmp(this_char,"sbsector") && value) {
                          char * vpnt = value;
                          unsigned int ivalue = simple_strtoul(vpnt, &vpnt, 0);
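
How a missing else can swallow later options is easiest to see in a cut-down model. The sketch below is not the kernel's parse_options: the struct, the handful of options handled, the default values and the catch-all return are all simplifying assumptions, chosen only to reproduce the symptom described above.

```c
#include <assert.h>
#include <string.h>

struct opts { char map; char check; };   /* hypothetical, much simplified */

/*
 * Parse a comma-separated option string (modified in place by strtok).
 * With 'missing_else' set, a handled "map" option falls through into the
 * next if/else chain, reaches its catch-all branch, and parsing stops, so
 * every later option is silently ignored. Always returns 1.
 */
static int parse_options_model(char *options, struct opts *o, int missing_else)
{
    char *opt, *value;

    assert(options != NULL && o != NULL);
    o->map = 'n';                        /* model defaults */
    o->check = 's';

    for (opt = strtok(options, ","); opt != NULL; opt = strtok(NULL, ",")) {
        value = strchr(opt, '=');
        if (value != NULL)
            *value++ = '\0';             /* split "name=value" */

        if (strcmp(opt, "map") == 0 && value != NULL) {
            o->map = *value;
            if (!missing_else)
                continue;                /* what an 'else if' chain would do */
        }
        if (strcmp(opt, "check") == 0 && value != NULL)
            o->check = *value;
        else if (strcmp(opt, "ro") != 0)
            return 1;                    /* catch-all branch: stop parsing */
    }
    return 1;
}
```

In this model "ro,map=a,check=r" leaves check at its default when missing_else is set, whereas "ro,check=r,map=a" sets both options; with missing_else clear, the order no longer matters.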

 

Identifying the Sprite Files

By now it was possible to browse the text (starting from URL file:/cdrom/html/index.htm) and follow the hyperlinks. But there were, of course, no illustrations - they are all stored in Acorn sprite format, which the RiscOS browsers understand but the "industry standard" ones do not (Netscape somehow managed to deduce that they were MIME type video/unknown...).

Before even thinking about how to convert the sprite files into a suitable form for displaying (preferably on the fly as required), there was the problem of recognising the sprite files (the Acorn extensions report the names as "foo,ff9" and similarly for other file types). Unfortunately the filename matching in pretty much every Unix application expects the extension to be separated by a dot rather than the comma, so the next step was to fix this.

It would have been simple enough to just substitute a dot for the comma in Matthew Wilcox's patch above. However, in response to a question that I posted on the Acorn programming newsgroup, Darren Salt sent me a patch which incorporates a table of known Acorn file types and extensions, and some additional mount options. Known file types are converted to dot-separated extensions (exactly the format that was required), while other file types would be appended as a comma-separated number as before. He had also anticipated my original "/" to "." change.
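
The table-driven approach can be sketched roughly as follows. This is an illustration only and not Darren's code: the table contents, the function name append_type_suffix and the "txt" mapping are assumptions (only the 0xff9 to "spr" mapping is confirmed by the listings in this article), and the check for an existing extension is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative table only; the entries in the real patch may differ. */
static const struct { unsigned filetype; const char *extension; }
file_extensions[] = {
    { 0xff9, "spr" },   /* sprite, as seen on this CD */
    { 0xfff, "txt" },   /* plain text (assumed mapping) */
};

/*
 * Append ".ext" for a known file type, or ",xxx" otherwise.
 * 'name' must be NUL-terminated with room for 4 more characters.
 */
static void append_type_suffix(char *name, size_t size, unsigned filetype)
{
    size_t i, len = strlen(name);

    assert(len + 5 <= size);
    for (i = 0; i < sizeof file_extensions / sizeof file_extensions[0]; i++) {
        if (file_extensions[i].filetype == filetype) {
            snprintf(name + len, size - len, ".%s",
                     file_extensions[i].extension);
            return;
        }
    }
    /* Unknown type: fall back to the comma-separated hex number. */
    snprintf(name + len, size - len, ",%03x", filetype & 0xfffu);
}
```

With this table "ctop" and type 0xff9 gives "ctop.spr", while an unlisted type such as 0xffd gives "data,ffd" for a file called "data".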

After applying Darren's patch, and restoring the two missing elses in the newly-patched inode.c, the rebuilt module should have given sensible results for both the HTML and sprite files. But it didn't - sometimes it would fail to add either the registered extension or the file type number, inconsistently and apparently at random. After looking more closely at Darren's code it turned out that he was using strchr(3) on a string that is not zero-terminated, possibly running off the end (the sort of thing that can easily panic the system!). Making yet another patch:

  --- fs/isofs/dir.c.ds   Fri Nov 26 14:40:55 1999
  +++ fs/isofs/dir.c      Fri Nov 26 14:41:30 1999
  @@ -133,5 +133,5 @@
                                  /* We have a filetype -> extension mapping;
                                   * only append it if no existing extension */
  -                               if (strchr(retname, '.') == 0)
  +                               if (memchr(retname, '.', retnamlen) == 0)
                                  {
                                          const char *ext = acorn_file_extensions[i].extension; 
and again rebuilding the module now gives the correct results for all file types:
  $ mount -o ro,map=A,check=r /cdrom
  $ ls -laF /cdrom/html
  total 11
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 ./
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:01 ../
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:13 archive/
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 gallery/
  -r-xr-xr-x   1 root     root          466 Apr 23  1999 index.htm*
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 menus/               
  $ ls -laF /cdrom/html/Gallery
  total 248
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:32 ./
  dr-xr-xr-x   1 root     root         2048 Sep 21 20:12 ../
  -r-xr-xr-x   1 root     root       156984 Apr 23  1999 cbot.spr*
  -r-xr-xr-x   1 root     root        89592 Apr 23  1999 ctop.spr*
  $
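
The difference between the two library calls can be shown with a small stand-alone example. The record layout here is invented for the demonstration: a 4-byte name followed directly by unrelated bytes, with the whole array NUL-terminated only so that the strchr(3) call is well defined.

```c
#include <assert.h>
#include <string.h>

/* A 4-byte name ("ctop") followed by unrelated data containing a dot. */
static const char record[] = "ctop" "NEXT.REC";

/* Bounded search: looks only at the first 'len' bytes of the name. */
static int name_has_dot(const char *name, size_t len)
{
    assert(name != NULL);
    return memchr(name, '.', len) != NULL;
}

/* Unbounded search: keeps going until it meets a NUL byte. */
static int strchr_sees_dot(const char *name)
{
    return strchr(name, '.') != NULL;
}
```

Here strchr_sees_dot(record) reports a dot that belongs to the following data, while name_has_dot(record, 4) correctly reports none. In the kernel there is no guarantee of a terminating NUL at all, so the unbounded search may also read past the end of the buffer.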

 

Converting the Sprite Files

Now that the sprite files could be identified by their ".spr" extension, I needed a converter which would allow them to be displayed in a "standard" graphics format - preferably one that the browser would understand natively so that inline images would be displayed as intended. A quick search of the Web and Hensa turned up the sprtools package, produced by DEEJ Technology (aka David J. Ruck). This converts between Acorn sprites and a number of other formats, and works on Windows and Unix as well as RiscOS. Unfortunately Hensa only had the ARC format file available (according to the documentation a tar file is available for Unix), but nspark unpacked the archive quite happily.

After moving some files around to convert the RiscOS directory structure to the corresponding Unix version, and configuring it to support Linux in accordance with the instructions (the package as distributed supported HP-UX and SunOS/Solaris only), the tools compiled and installed with no problems. Having done that, and after some experimenting with image formats and additional netpbm filters, it was possible to do something like:

  $ spr2ras /cdrom/html/gallery/ctop.spr | display
and view the sprite files (in their own window, for the moment) in all their glory:

[ImageMagick screen shot]

 

Doing That Automatically

Somehow the sprite files have to be converted to a form that the browser can display inline, on demand. At this point I couldn't find a way of doing so with the browser alone (but see below), so some more help was needed. I'd installed the Apache web server on the system, so after setting up a symlink to /cdrom in the document root it was possible to browse the CD (starting from URL http://localhost/cdrom/html/) in the same way as before. Still no images yet, but...

Using Apache's AddType and Action directives, it is possible to associate first of all a MIME type with a file extension, and then a filter script which is called to handle that MIME type. The filter script works in the same way as a CGI, so it can convert the input file to any content type. So all it took was a simple handler script, installed in the server's CGI directory:

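  #!/usr/bin/perl
  # Convert the requested sprite file to GIF and send it to the browser.
  $ENV{PATH} .= ":/usr/local/bin:/usr/X11/bin";   # where the converters live
  $| = 1;                                         # unbuffered, so the header goes out before exec
  print "Content-Type: image/gif\n\n";
  exec "spr2ras $ENV{PATH_TRANSLATED} | rasttopnm | ppmtogif";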
(There is no single conversion in sprtools from sprite to GIF - or any other format that Netscape can handle directly - so the pipeline through the additional netpbm filters is necessary.) Now this handler can be registered in the Apache configuration file:
  <Location /cdrom>
  AddType        image/x-acorn-sprite .spr
  Action         image/x-acorn-sprite /cgi-bin/sprite-handler.pl
  </Location>
With all this up and running, it was finally possible to view a sprite image in the browser:

[Netscape screen shot]

 

Fixing the Image Links

There is one more thing that needs to be done so that a link to, for example, "Gallery/CTop" as above will load and convert the sprite file "gallery/ctop.spr" via the handler above. This involves not only doing a case-independent lookup (taken care of with check=relaxed above), but also loading the file "ctop.spr" when file "ctop" was requested.

My first attempt at fixing this was to use Apache's spelling checker (compiled in as standard, but normally inactive unless it is enabled with the CheckSpelling directive), since one of the checks that it will perform is for a file with the same basename as requested but with an additional extension. Unfortunately, even if this results in only one correction Apache considers it sufficiently "different" from the original that it always outputs a confirmation page (with a link to the correction) - it does not issue an automatic redirect for the correction. Netscape, of course, does not follow the link (although it would follow an automatic redirect) and still displays "broken" images.

So the next approach was to use the ErrorDocument directive to implement a handler for the error 404 ("Not Found"), check for the existence of the file with extension, and return a redirect to that if appropriate. This script is installed in the server's CGI directory:

  #!/usr/bin/perl

  require "cgi-lib.pl";                                 # the form parsing library
  use File::Basename;                                   # file name processing

  $file = $ENV{REQUEST_URI};                            # file that was requested
  $file =~ s/^\///;                                     # make relative to root

  $base = basename($file);                              # base name of file
  $base =~ tr/A-Z/a-z/;                                 # make all lower case
  $file = dirname($file)."/".$base;                     # reconstruct full pathname
                                                        # find all alternatives
  @alts = split(' ',`cd $ENV{DOCUMENT_ROOT} && ls -d $file.* $file,??? 2>/dev/null`);

  if (scalar(@alts)==1)                                 # only one alternative
  {
      print "Status: 302 Use Alternative\n";            # so redirect to that
      print "Location: /",$alts[0],"\n";
      print "\n";
      exit 0;
  }

  #  No alternative found, or more than one, so display a normal error page.

  print &PrintHeader;
  print &HtmlTop("404 Not Found");

  print "URL not found: <CODE>$ENV{REQUEST_URI}</CODE>\n";
  print "<BR>\n";
  print "Error information: <CODE>$ENV{REDIRECT_ERROR_NOTES}</CODE>\n";
  print "<P>\n";

  if (scalar(@alts)>0)
  {
      print "Possible alternatives:<BR>\n";
      print "<UL>\n";
      foreach (@alts)
      {
          print "<LI><A HREF=\"/$_\">/$_</A>\n";
      }
      print "</UL>\n";
  }

  print &HtmlBot;
and registered in the server configuration file:
  ErrorDocument 404 /cgi-bin/cgi-error_404.pl
And that works nicely: requesting a sprite file (without the ".spr" extension) runs the script to find the corresponding file with the extension, and returns a redirect to that location:
  $ telnet localhost 80
  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.
  GET /cdrom/html/Gallery/CTop HTTP/1.0
 
  HTTP/1.1 302 Use Alternative
  Date: Fri, 26 Nov 1999 21:00:15 GMT
  Server: Apache/1.3.6 (Unix)  (SuSE/Linux)
  Location: http://localhost/cdrom/html/Gallery/ctop.spr
  Connection: close
  Content-Type: text/plain
 
  Connection closed by foreign host.
  $
(Note: according to the Apache documentation, status 303 ("See Other") ought to be more appropriate than 302 ("Moved Temporarily"). But this status code is not mentioned in RFC 1945, and neither Netscape nor KDE recognises it as a redirect.)
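
The decision made by the error handler (redirect only when exactly one candidate exists) can be restated as a small function. This is a sketch in C rather than the Perl actually used, and the function name unique_alternative and the idea of passing the directory entries as an array are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the script's "ls -d $file.* $file,???".
 * Lower-cases 'leaf' and returns the single entry in 'entries' of the
 * form "<leaf>.<anything>" or "<leaf>,xxx"; returns NULL if there are
 * no candidates or more than one.
 */
static const char *unique_alternative(const char *leaf,
                                      const char *const *entries, size_t n)
{
    char lower[256];
    const char *found = NULL;
    size_t i, len = strlen(leaf);

    assert(len < sizeof lower);
    for (i = 0; i < len; i++)
        lower[i] = (char)tolower((unsigned char)leaf[i]);
    lower[len] = '\0';

    for (i = 0; i < n; i++) {
        const char *rest;

        if (strncmp(entries[i], lower, len) != 0)
            continue;                   /* different base name */
        rest = entries[i] + len;
        if (rest[0] == '.' || (rest[0] == ',' && strlen(rest) == 4)) {
            if (found != NULL)
                return NULL;            /* ambiguous: show the error page */
            found = entries[i];
        }
    }
    return found;
}
```

A request for "CTop" against a directory holding "cbot.spr" and "ctop.spr" yields "ctop.spr"; if both "ctop.spr" and "ctop,ff9" were present the result would be NULL, corresponding to the script's list of possible alternatives.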

 

Success at last!

After doing all of this, it was finally possible to view the pages in Netscape, as they were intended to appear and with all the links working. Here's the front page, just to prove it:

[Netscape screen shot]

And, as a bonus, it all works in KDE as well. And even, as a final perverse twist, in Fresco on the RiscPC (over the network)...

 

An alternative approach?

Having looked more closely at the Netscape installation, it appears to use something called Plugger to display MIME types which are not built in (it comes configured for various image, video and audio formats). So by adding the ".spr" extension and the image/x-acorn-sprite MIME type to Netscape's list of applications, and setting up the filter in the plugger(7) configuration, it may be possible to convert and display the sprite images this way. However, that doesn't solve the problem of needing to have the extension added to the filename in a hyperlink, so for the moment I will stick to the Apache approach.

 

So what was the point of all that?

Well, it was an interesting exercise, and a demonstration of the power of free software. And now I can have all those old Archive issues available for reference, yet reclaim that shelf space after all!

But instead of going to all that trouble, why didn't I just give up and read the CD on the RiscPC instead? Mainly because I don't have it switched on all the time (and, unfortunately, seem to find myself doing that less and less as time goes on). But also, much as it saddens me to say this, Netscape (certainly on the Linux platform anyway) is a far better browser than any Acorn product will ever be, definitely beating Fresco on usability (if it resizes the window of its own accord just once more I will DO SOMETHING VERY REGRETTABLE INDEED!!!) and performance (even simply reading local files). And how long has it been since even a bugfix version of Fresco was released, let alone an upgrade?

Unfortunately it seems that, whatever new hardware and software may be produced in the future, and whatever the developers and enthusiasts end up doing, the Acorn/RiscOS systems will always be several steps behind other platforms in terms of what they can do. Sad, but undeniably true.


Page by Jonathan Marten
Last modified: Thu Jan 27 10:32:20 GMT 2005