Data Management & Warehousing

Tag Archives: Unix

Fly Fishing File System

Posted on 29 June 2003 by David M. Walker in Humour

San Francisco USENIX Conference 1992 – Contest

The conference contest, “Name That File System”, was a great success, with several hundred (yes, you read that right) entries. The contest announcement read as follows:

In the beginning, there was the file system, and it was good enough for the disk technology of the time. Then disks got bigger, and the block size was expanded from 512 bytes to 1K, and the 1KFS was born.

But this was not Good Enough, and so the Fast File System (FFS) had to be. And it came, and 1KFS went.

Then the Sun rose in the west, and there were networks, and NFS (Network File System) came to pass. But the east wanted to get in on the deal, and so there was RFS (Remote File System) too.

But there also had to be a local file system, and so the file system switch was born of virgin parents. And that was the end of self-control.

Now we have LFS (Log Structured File System), TFS (Translucent File System), MFS (Memory-based File System – the file system that has files and is a system, but isn’t a file system), and being introduced at this conference, the 3DFS (Three-Dimensional File System). Not to mention a few others (which I won’t mention).

What’s next? Surely we can think of something! Like:

SFS – Stochastic File System. When you open a file, it opens a file at random. Opens go very quickly, and it is useful for selecting random input to programs. (No – I couldn’t use the name RFS [Random File System] here, because the initials were taken.)

MMFS – Mickey Mouse File System. Runs in your watch.

LIFS – Language Independent File System. You have to do a set_locale() call before you open the file; all text files are translated into the appropriate language on reads and writes.

SSFS – Slow SLIP File System. Carefully tuned to give performance commensurate with Serial Line IP over a 1200-baud modem. (Also, note the rare triply nested acronym [S [=Slow] S [=S [=Serial] L [=Line] IP [=Internet Protocol]] F [=File] S [=System]].)

SMFS – Sendmail File System. I don’t know what this does, but I had to propose it now to avoid the obvious submissions. In any case, the semantics are certainly defined by rewriting rules.

The rules of the contest are simple:

(1) All submissions must have an “XXFS” style name and a one-line expansion of the initials.
(2) There must be a (short) semantic description of the file system. Short and snappy is better than full descriptions (after all, we can always read the man page).
(3) All submissions must have your name and email address. Teams are OK, but pick one person to act as a representative.
(4) Submissions are due by 5:00 PM on Thursday. Boxes will be set up in the registration area.
(5) We’ll try to publish all submissions in ;login: and in comp.org.usenix, so try to make your submissions printable.
(6) Winners will be announced at 3:20 PM Friday (immediately before the last session).
(7) Decisions of the (guaranteed biased, arbitrary, etc.) judges are final.
(8) Prizes cannot be returned.
(9) There may be other arbitrary rules added as we think them up.

We ignored some of our own rules (as specifically permitted; we made rules that unmade rules), as will be clear in the winning entries.

The biased panel of judges apologizes if it missed some especially good ones; when you read that many, your definition of what is funny seems to get a bit blurry.

Whilst not winning, a special award went to the following entry for being the only submission that actually compiles. It is printed here in its entirety.

The Code

/*
* Copyright (c) 1992 The Regents of the Restaurant of Le Central.
* All rights deserved.
*
* Fly-Fishing File system (FFFS) public definitions.
*
* This source code is derived from a restaurant tablecloth scribbled on
* by Eric Brunner, Marc Donner, Jan-Simon Pendry and Bucky Pope.
*
* Redistribution and use in cooked and raw forms, with or without garlic,
* are permitted provided that the following conditions are met:
* 1. Redistributions of fish must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in cooked form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* menu and/or other materials provided with the food.
* 3. All advertising materials mentioning features or use of this fish
* must display the following acknowledgement:
* This food includes fish caught with the Fly-Fishing Filesystem
* developed at the Restaurant of Le Central by its diners.
* 4. Neither the name of the Restaurant nor the names of its diners
* may be used to endorse or promote fast food products derived from
* this fish without specific prior written permission.
*
* 1.2 (Le Central) 1/22/92
*/

#ifndef _FFFS_H
#define _FFFS_H

/*
* NB: Requires the new "live", "dead" and "fresh" type
* modifiers in ANSI C.
*/

#include <fish.h> /* for fishing license and legal size */
#include <pan.h> /* for type pan */
#include <lure.h> /* for struct lure and struct line */

#define LEGAL_SIZE(fish) (sizeof(fish) > TOO_SMALL)

/* to kill the fish, just cast it to "dead" */
#define KILL(fish) ((dead) (fish))

struct fish;
struct bite;
typedef struct bite meal[10];

/*
* VFS (victual file system) operations for FFFS.
*/

/*
* cast() takes a lure, casts it and returns a pointer to the fishing line.
* on error it returns null and sets errno to one of:
*
* ECALIFORNIA – No water.
*/
struct line *cast(const struct lure *fly);

/*
* hook() takes a pointer to a fishing line and returns
* a fish descriptor (fd).
*/
int hook(const struct line *l);

/*
* reel() takes a fish descriptor and returns a pointer to a fish.
* reel() closes the fish descriptor (fd).
* on error it returns null and sets errno to one of:
*
* EBADF – Bad Fish.
* ECONNABORTED – It got away.
* ENETUNREACH – Fishing net unreachable.
* ETOOBIG – The one that got away.
*/
live struct fish *reel(int fd);

/*
* unhook() removes the fish from the hook, returning the fresh fish
*/
live struct fish unhook(live struct fish *catch);

/*
* release() returns the live fish to the free pool.
*/
void release(live struct fish f);

/*
* fry() takes a fish, and a pan and makes a meal.
* on error it returns null and sets errno to one of:
*
* EBONES – Fish couldn’t be filleted.
* EINEDIBLE – Fish caught off Long Island.
* ENOSPC – Frying pan full.
* ESTALE – Fish not fresh.
* ETOOSMALL – Fish is illegally small.
*/
fresh meal *fry(dead struct fish f, pan p);

#endif /* _FFFS_H */

Tags: File System, Humour, Programming, Unix, USENIX

KeySum – Using Checksum Keys

Posted on 15 January 1997 by David M. Walker in Articles

In 1997 we were working on a project for Swisscom Mobile and needed to devise a way of using a checksum key on a database. Here is the solution we designed:

Introduction

KeySum is a new and interesting technique (not a product) for the generation of keys within a database. It has particular application within Data Warehouses, where keys are often made up of de-normalised alphanumeric data.

The Problems

Data that has been de-normalised often has a primary key that is made up of a single string, a series of concatenated strings, or other data types that can be converted to strings. The key is traditionally costly in terms of storage requirements and access speed when used in an index. It is, however, vital to the usability of the data.

The second issue is that in a data warehousing environment data may be loaded and assigned an arbitrary unique number as a key. If the data needs to be re-loaded at a later date, possibly with additions, then it is impossible to guarantee that the same arbitrary key will be assigned to the same row.

The Solution

The solution is simplicity itself: the generated key of the row should be the checksum of the string that makes up the unique key. Depending on the checksum algorithm chosen, this generates a large integer that will be nearly unique within the scope of the data. For example, the industry-standard CRC32 algorithm generates a number in the range 0 to 4,294,967,295 (2^32 − 1), whilst the MD5 message digest generates a number between 0 and about 3.4 × 10^38 (2^128 − 1).

In addition, the result can incorporate the length of the original string, which considerably improves the uniqueness of results from lower-order algorithms, as sketched below.
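As a minimal sketch of the idea (not the original Swisscom implementation), the following C function computes such a key using the crc32() routine from zlib and packs the string length into the upper half of a 64-bit value. The function name keysum() and the packing scheme are illustrative assumptions:

#include <stdint.h>
#include <string.h>
#include <zlib.h> /* for crc32(); link with -lz */

/*
 * keysum() - illustrative sketch, not the article's original code.
 * The caller supplies the string that makes up the unique key
 * (e.g. the de-normalised key columns joined with a separator).
 * The string length fills the upper 32 bits to improve uniqueness,
 * and the CRC32 checksum fills the lower 32 bits. Keeping the
 * result unsigned side-steps the signed-versus-unsigned issue
 * discussed later in this article.
 */
uint64_t keysum(const char *key)
{
    size_t len = strlen(key);
    uint32_t crc = (uint32_t) crc32(0L, (const Bytef *) key, (uInt) len);
    return ((uint64_t) (uint32_t) len << 32) | crc;
}

A row keyed on, say, "SMITH|JOHN|LONDON" would then be stored and looked up by keysum("SMITH|JOHN|LONDON") rather than by the string itself.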

How Does This Help?

The table key is now an integer, the optimal format on which to index. The user now calls a function to convert the required string into the checksum and uses the index to look up the appropriate row. On very large tables this is considerably faster than conventional string look-up.

Furthermore, the data can be validated: if the current checksum differs from the stored checksum, the data has changed. This also works when re-loading data, as any existing data will still be able to reference the old key. It should also be noted that when a field within the key is altered, the key must be re-generated.
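As an example, reusing the illustrative keysum() sketch from above, re-load validation reduces to a single comparison (row_has_changed() is again an assumed name, not from the original article):

/*
 * Returns non-zero when the re-computed checksum no longer matches
 * the keysum stored when the row was first loaded, i.e. the
 * underlying key string has changed.
 */
int row_has_changed(const char *current_key_string, uint64_t stored_keysum)
{
    return keysum(current_key_string) != stored_keysum;
}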

If this technique is used in contexts such as trend analysis within a Data Warehouse, the occasional mismatch caused by a duplicate checksum may not be statistically significant, and the key can therefore be considered unique.

What are the issues?

No checksum is guaranteed to be unique; it is therefore possible for two different records to return the same value. Including the length in the checksum still does not guarantee uniqueness, but it further reduces the risk. When choosing a checksum algorithm it is important to consider the number of records for which the checksum will provide a key. If you have a table with 500,000 rows (such as a table that contains addresses) then, with CRC32, each new row has roughly a 1 in 8,500 chance (500,000 / 2^32) of duplicating an existing checksum, without considering the length of the original string.

MD5, on the other hand, has a remote 6.8 × 10^32 to 1 chance of generating a duplicate checksum, because it uses 128 bits where CRC32 uses only 32.

When implementing the algorithm it is important to note that checksums normally return unsigned integers as their result. Your database and routines that access the checksum must all be able to handle the size of the result and ensure that they deal with the issue of signed versus unsigned variables.

Is this feature available now?

There is no direct implementation of a checksum within the SQL dialects of the major vendors currently available; however, it can be implemented via an external procedure call.

The author has implemented this technique within an Oracle7™ database. A daemon was created that took the string as its input and returned two values, the checksum and the length. This was connected to the database via a ‘Database Pipe’. When a checksum was required, a PL/SQL stored procedure was called that placed the string into the database pipe and received the two values back.

The daemon was also implemented as a shared library so that it could be accessed from the command line and from other utilities that could call a shared ‘C’ library.

An optional parameter was included to allow the use of different algorithms in different contexts; for example, CRC32 may be suitable where only a small data set needs a checksum key, whilst MD5 is reserved for the largest data sets.

Where do I get a checksum algorithm?

The inevitable answer to this question is ‘From the Internet’. Any site that distributes the source for FreeBSD includes an implementation of CRC32, and MD5 is also widely available.

The Future Direction

The author hopes that in the future database vendors such as Oracle will add a checksum function to their SQL dialects. Once available as an in-built function, the need to implement checksums via external procedure calls will disappear and performance will improve even further. It will also allow some standardisation in the choice and handling of checksum algorithms.

Download KeySum: Using Checksum Keys now

Tags: Checksum, Data Warehousing, MD5, MIME-64, Programming, Unix

Client Server Very Large Databases

Posted on 1 February 1993 by David M. Walker in Articles, Presentations

In 1993 I was still working for Sequent Computer Systems as the Technical Leader for Databases. We won a contract to build a then-massive 1,000-user Unix system using Oracle 7.1, with ten 16-processor systems and up to 50 GB of data in a client-server arrangement. Building this system, which ran the Perot/Europcar car rental administration system, was a major achievement at the time.

At the end of the project I presented the story at the International Oracle User Group in the Moscone Center, San Francisco. In those days a PowerPoint deck alone was not sufficient for a presenter, and I also had to produce a paper to go with the presentation.

Summary

The replacement of ‘Legacy’ systems by ‘Right-Sized’ solutions has led to a growth in the number of large open client/server installations. In order to achieve the required return on investment these solutions must at the same time be flexible, resilient and scaleable. This paper sets out to describe the building of an open client/server solution by breaking it up into components. Each component is examined for scaleability and resilience. The complete solution, however, is more than the sum of its parts, and so a view of the infrastructure required around each element is also discussed.

Download Client Server Very Large Databases (and Paper) Now

Tags: Client/Server, Oracle, Technical Architecture, Unix


© 1995-2013 Data Management & Warehousing