Date: Mon, 11 May 1998 22:27:43 +1000 (EST)
Subject: categories: the FISh language
From: Barry Jay

[apologies for multiple announcements]

   ***********************************
   * Functional = Imperative + Shape *
   ***********************************

FISh is a new array programming language that combines (and extends) the
EXPRESSIVE POWER of functional programming with the EFFICIENT EXECUTION of
imperative, or procedural, programming by performing STATIC SHAPE ANALYSIS
on all programs. This shape computation reduces higher-order functional
programs to simple imperative forms, i.e.

   F - Sh = I

Conversely, FISh works best when functions are constructed from imperative
procedures and shape functions, as recommended by the slogan that gives
the language its name:

   F = I + Sh

FISh execution speeds on typical array problems are SEVERAL TIMES FASTER
than those of other higher-order, polymorphic languages.

This announcement, research papers, reference material and the
implementation are all available from

   http://linus.socs.uts.edu.au/~cbj/FISh/announcement.html

The main themes are introduced in the paragraphs below.

Barry Jay
http://linus.socs.uts.edu.au/~cbj

   ***********************************

Expressive Power
~~~~~~~~~~~~~~~~

FISh supports

- the usual functional features such as strong typing, higher-order
  functions and parametric polymorphism.

It also supports

- simple imperative features such as assignment, for- and while-loops,
  and local variables, using a type of commands.

Procedures are represented as functions into commands, as in Algol-like
languages. Unlike existing Algol-like languages, it also supports data
types of arrays, so that array programs may be polymorphic in the size of
the array. Now this has been extended to support

- poly-dimensional array programming.

If a is any array type then [a] is the type of all arrays with entries in
a that are

- finite dimensional, and
- regular,

such as vectors, matrices, three-dimensional arrays, etc. Thus,
poly-dimensional array programs can be written that act on a vector,
matrix, or higher-dimensional array. For example, the standard prelude
supports a polymorphic constant

   map : (a -> b) -> [a] -> [b]

which will map a function over a vector, matrix, or higher-dimensional
array.

Note that a matrix of integers, of type [int], has a different type from a
vector of vectors, of type [[int]]. Thus, given

   sum_int : [int] -> int

which adds up an array of integers, we have

   map sum_int : [[int]] -> [int]

which will sum each entry of the outer array. This is useful when

- representing complex numbers as arrays of length 2, or
- each entry in an array has an associated array of data, e.g.
  - each entry is given an array of nearest neighbours, or
- decomposing an array into an array of blocks, e.g.
  - a matrix into a matrix of matrices.

Efficiency
~~~~~~~~~~

   --------------------------------------------------------------
   |               | Map   Reduce  Q'sort  FFT   MM-loops  MM-ip|
   |_______________|_____________________________________________|
   |FISh           | 1.04  0.37    1.93    3.57  4.36      7.05 |
   |---------------|---------------------------------------------|
   |Ocaml(in-lined)| 4.00  2.66    3.53                          |
   |---------------|---------------------------------------------|
   |Ocaml          | 5.99  4.59    15.47   8.16  7.71      60.61|
   |---------------|---------------------------------------------|
   |speedup        | 5.8   12.4    8.0     2.3   1.8       8.6  |
   --------------------------------------------------------------

                      Benchmark user times

See http://linus.socs.uts.edu.au/~cbj/FISh/Benchmarks for more details.
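To see what shape analysis buys at the level of generated code, here is a
minimal C sketch (not actual FISh compiler output) of the kind of loop nest
that a shape-specialised

   map sum_int : [[int]] -> [int]

can reduce to. The flat row-major buffer and the explicit rows/cols
parameters are assumptions made for this sketch; they merely stand in for
the shape information that FISh infers statically.

/* A minimal sketch, not FISh output: once the shape (rows x cols) of the
   argument is known, map sum_int specialises to a plain loop nest over an
   unboxed, flat buffer, with one loop per dimension of the outer array.  */
#include <stddef.h>

void map_sum_int(int *out, const int *in, size_t rows, size_t cols)
{
    size_t i, j;
    for (i = 0; i < rows; i++) {        /* outer array dimension              */
        int s = 0;
        for (j = 0; j < cols; j++)      /* sum the entries of the inner array */
            s += in[i * cols + j];
        out[i] = s;
    }
}

No entry is boxed and no function pointer is involved; the two size
parameters play the role of the statically inferred shape.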
Static Shape Analysis
~~~~~~~~~~~~~~~~~~~~~

Shape analysis is used to determine the number of dimensions, and the size
in each dimension, of every array appearing in a program. (A program is a
closed term of array or command type.) In particular, it checks that every
array is regular, i.e. is a hyper-cube whose entries all have the same
shape. The latter requirement is necessary to ensure that every entry in a
vector of vectors has the same length, i.e. that a vector of vectors is a
matrix.

As well as detecting irregularities, it is able to detect all other shape
errors, such as multiplying matrices of inappropriate sizes or with
incompatible numbers of dimensions. It is the power of shape analysis that
makes poly-dimensional programming feasible.

Knowledge of the shapes can be used to improve memory management during
compilation of the resulting imperative program. In particular, FISh
supports a clear stack discipline, and so does not require a garbage
collector. The existing FISh compiler translates programs into simple C
code, which is then compiled and executed in the usual way. This approach
is also being applied in the design and implementation of a parallel
version of FISh, called GoldFISh.

F = I + Sh
~~~~~~~~~~

The slogan is represented within the standard prelude by a function

   proc2fun : (var a -> var b -> comm) -> (#b -> #a) -> b -> a

for converting procedures (and shapes) to functions. proc2fun pr sh x uses
the shape function sh and the shape #x of the array x to determine the
shape of the result. It then creates a local variable y of this shape,
invokes the procedure pr on y and a stored value of x, and finally returns
the value of y.

Semantics
~~~~~~~~~

Shape analysis is supported by a clean and powerful categorical semantics,
in which the shape-data (or shape-entry) decomposition is represented by a
pullback. For multi-dimensional arrays, this is

                     entries
            [a] --------------> List a
             |  _|                 |
             |                     |
       shape |                     | map #
             |                     |
             v                     v
        List N x #a ---------> List #a
                        c

where c takes the list of numbers ns and the a-shape sh and produces a
list of length given by the product of ns whose entries are all sh. In
other words, all entries of an array must have the same shape.

Poly-dimensionality
~~~~~~~~~~~~~~~~~~~

FISh supports the usual Hindley-Milner style of polymorphism. It also
supports polymorphism in array sizes (unlike earlier Algol-like languages)
and in the number of dimensions of an array. That is, FISh supports
poly-dimensional programming. For example, the map function is defined to
have type

   map : (a -> b) -> [a] -> [b]

and so may act on a vector, matrix or higher-dimensional array. However,
the existence of the simple imperative features means that this can be
compiled into a sequence of nested for-loops corresponding to the number
of dimensions of the array argument. Poly-dimensionality is a form of
*shape polymorphism*.

Array Programming
~~~~~~~~~~~~~~~~~

Poly-dimensionality means that many array programs, e.g. numerical
recipes, can be written for any number of dimensions while maintaining
static type and shape checking. For example, FISh supports
poly-dimensional stencilling and a poly-dimensional difference equation
solver (see Sample Programs).

Parallel Programming
~~~~~~~~~~~~~~~~~~~~

Size and shape are important parameters in estimating the cost of
alternative parallel implementations. FISh has been designed to support a
parallel variant, called GoldFISh, currently under development. GoldFISh
will support additional parallel combinators for some of the existing FISh
functions that express the usual second-order constants and also the usual
array distributions (see Sample Programs).
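Returning to the F = I + Sh slogan above: the following is a minimal
C-style sketch of the proc2fun conversion, under some simplifying
assumptions. A shape is collapsed to an element count, arrays are flat
double buffers, malloc stands in for the stack discipline that FISh itself
uses, and the names shape_t, proc_t and shape_fn_t are invented for the
sketch; none of this is FISh code or part of its distribution.

/* Sketch of proc2fun pr sh x: compute the result shape from sh and the
   shape of x, create a local y of that shape, run the procedure pr on y
   and (a stored value of) x, and return y.                              */
#include <stdlib.h>

typedef size_t shape_t;                          /* a shape reduced to a length */

typedef void (*proc_t)(double *out, const double *in,
                       shape_t out_sh, shape_t in_sh);  /* var a -> var b -> comm */
typedef shape_t (*shape_fn_t)(shape_t in_sh);           /* sh : #b -> #a          */

double *proc2fun(proc_t pr, shape_fn_t sh, const double *x, shape_t x_sh)
{
    shape_t y_sh = sh(x_sh);                 /* shape of the result from #x     */
    double *y = malloc(y_sh * sizeof *y);    /* the local variable y            */
    if (y != NULL)
        pr(y, x, y_sh, x_sh);                /* invoke pr on y and x            */
    return y;                                /* return the value of y           */
}

In FISh itself the shape computation is carried out at compile time, so
run-time shape arguments of this kind are not needed.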
   ***********************************

From: "Frode Odegard"
Date: Mon, 11 May 1998 12:26:46 -0700
Subject: categories: Re: the FISh language

The right URL appears to be:

   http://www-staff.socs.uts.EDU.AU:8080/~cbj/FISh/Announcement/

Best regards,
Frode Odegard

--
..........................................................
Frode Odegard - frode@odegard.com - http://www.odegard.com


Date: Wed, 28 Oct 1998 19:22:05 +1100 (EST)
Subject: categories: Update on FISh and shape
From: Barry Jay

The FISh homepage

   http://www-staff.socs.uts.EDU.AU:8080/~cbj/FISh/

has a lot of new material. Here are a couple of pages.

Barry Jay

What's New!
===========

- Polymorphic quicksort in FISh is faster than in C for quicksort (see
  below).
- Download FISh-1.5. Check the list of new features.
- The formal language definition for FISh is based on the original design.
  Versions since 1.4 have been checked for compliance.
- Source code for FISh is also available with the distribution. Now you
  can compile your own executables, or try out your own language design
  ideas.
- Poly-dimensional array programming (revised 4/8/98)
- Costing Parallel Programs as a Function of Shapes (revised 29/9/98)
- A simple example of poly-dimensionality in solving difference equations.
- A new function "selfmap" in standard_prelude.fsh gives more efficient
  mapping algorithms for idempotent functions.
- Mail list for FISh users - sign up now!

Sometimes FISh faster than C
============================

Shape analysis in FISh allows polymorphic functions to be specialised to
the exact shape of their arguments, so that it is never necessary to box
data structures. This can have a significant impact when the cost of
pointer-chasing is high compared to the operation being performed.

This is illustrated by polymorphic quicksort. The FISh program looks like
a standard recursive functional program of type [a] -> [a]. The
corresponding C code uses C's standard polymorphic quicksort, qsort. FISh
takes less than half the time on large arrays of integers. Details of the
tests can be found in the Benchmarks page.

The reason is that the C program achieves polymorphism by requiring the
comparison function to act on pointers, rather than values. The actual
comparison function for floats is

int comp(const void *i, const void *j)
{
   int res;
   if (*(double*)i - *(double*)j > 0.0) { res = 1; }
   else { res = -1; }
   return res;
}

As well as slowing down the program, these pointers are a source of
program clutter, potential errors, and type violations.


From: "Ralph Loader"
Date: Thu, 29 Oct 1998 13:46:02 +0000
Subject: categories: Re: Update on FISh and shape

> > Polymorphic quicksort in FISh is faster than in C for quicksort (see
> > below).

Some comments. You're simply not comparing like-with-like. The differences
in performance that you're seeing have nothing to do with the
boxing/unboxing issues that you claim are involved.

(1) Your C code compares using two floating point operations (subtraction,
compare), while the fish program uses just one (no subtraction).

(2) The output of the fish-to-C translation uses statically allocated
memory, and won't multi-thread correctly. Sun Solaris does support
multi-threading, so while your compiler uses statically allocated memory
here, the Solaris C library can't.
[As a matter of interest, what happens if you call your fish quicksort
with a comparison function that itself calls quicksort?]

(3) The C library qsort takes a three-valued comparison function, while
your fish code uses a two-valued comparison.

> Shape analysis in FISh allows polymorphic functions to be specialised
> to the exact shape of their arguments, so that it is never necessary
> to box data structures. This can have a significant impact when the
> cost of pointer-chasing is high compared to the operation being
> performed.

["never"? Show me an unboxed cyclic data structure.]

(4) As you note, your fish code gets optimised by specialising to the
datatype and comparison function used. This is the main performance issue,
and has nothing to do with boxing/unboxing issues. Note that the C library
qsort is a polymorphic function, is not specialised to particular
arguments, and still takes an array of unboxed elements.

Specialising functions to their arguments is nothing new to FISh - some
(but not all) C compilers, for instance, on inlining a function will also
specialise it. The reason the qsort function is not specialised to its
arguments is that it's a precompiled library function, not that it's
written in C.

> The reason is that the C program achieves polymorphism by requiring
> the comparison function to act on pointers, rather than values. The
> actual comparison function for floats is

The reality is that the machine code generated by your fish program ends
up dealing with pointers also: on a typical RISC processor, to compare two
doubles held in memory, you need to compute the addresses of the doubles,
load them into registers and do the comparison. Passing pointers to the
comparison function just means that the values are loaded into registers
in the function, rather than before the function call. Whether it's faster
to load the values into registers before the function call or in the
function will depend on things like instruction scheduling in the compiler
& CPU, & register allocation in the caller - & in any case the difference
will be tiny.

> As well as slowing down the program, these pointers are a source of

It's not very often that you want to sort an array of numbers. If you want
to sort some records that contain both a sort-key and some extra data,
then using pointers is faster.

> program clutter, potential errors, and
> type violations.

Ralph. (not speaking on behalf of Paradigm)

Ralph Loader
Paradigm Technology, Level 13, Paxus House,
79 Boulcott Street, Wellington, New Zealand
Phone: +64 4 495 1004    Fax: +64 4 499 7762


Date: Mon, 2 Nov 1998 16:46:48 +1100 (EST)
Subject: categories: Re: FISh performance
From: Barry Jay

Dear Ralph,

thank you for taking an interest in the FISh performance results for
quicksort. You raise two basic questions: is the FISh program faster, and
if so, why?

Is quicksort in FISh faster than qsort (on arrays of doubles)?
==============================================================

> (1) Your C code compares using two floating point operations
> (subtraction, compare), while the fish program uses just one (no
> subtraction).

This makes a small difference to the overall performance of the algorithm.
I re-ran the test for qsort with the following comparison function, which
has only one operation:

int comp(const void *i, const void *j)
{
   return (*(double*)i > *(double*)j);
}
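For reference, the call itself looks like the following sketch; the array
size and the input data here are placeholders rather than the benchmark
settings. It shows the interface being discussed: the comparison function
receives pointers into the array, and the element size is supplied
explicitly by the programmer.

/* Sketch only: driving the library qsort with the one-operation
   comparator above.  N and the data are placeholders.            */
#include <stdio.h>
#include <stdlib.h>

static int comp(const void *i, const void *j)
{
    return (*(const double*)i > *(const double*)j);
}

int main(void)
{
    enum { N = 100000 };                   /* placeholder array size          */
    static double a[N];
    int k;

    for (k = 0; k < N; k++)
        a[k] = rand() / (double)RAND_MAX;  /* placeholder data                */

    qsort(a, N, sizeof(double), comp);     /* element size passed explicitly  */
    printf("%g ... %g\n", a[0], a[N - 1]);
    return 0;
}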
The C results are better by about 10%, but FISh is still twice as fast:

   C benchmark1    C benchmark2    FISh benchmark
   3.59            3.35            1.65

> (3) The C library qsort takes a three-valued comparison function,
> while your fish code uses a two-valued comparison.

I'm not sure what point you're making. The designers of qsort had a free
choice. I suspect that the three-valued comparison is used to exploit
equality of values, to obtain greater efficiency. In any event, it is
unlikely to make much difference in the light of (1) above.

Conclusion: the FISh program *is* faster.

Why is quicksort in FISh faster?
================================

> (4) As you note, your fish code gets optimised by specialising to the
> datatype and comparison function used.

Yes. This is a given. The question is to pinpoint the source of the
speedup.

> ... This is the main
> performance issue, and has nothing to do with boxing/unboxing issues.

Point taken. The data supplied to quicksort is not boxed, but this is
because (the length of the array and) the size of the array entries are
given as explicit parameters. In this sense qsort is *not* polymorphic in
its choice of data type. If this information were not provided by the C
programmer then boxing would add another layer of indirection to the
program. Such information about the size of entries etc. is inferred by
the FISh compiler.

> ... Whether [FISh is faster] will depend on
> things like instruction scheduling in the compiler & CPU, & register
> allocation in the caller & in any case the difference will be tiny.

Your conclusion does not match the performance figures above. The FISh
program avoids making a function call at all. That is, instead of passing
the addresses to a function, a primitive comparison is made. The key point
is that this can only be done because information like the length of the
array and the size of the entries (in bytes) can be inferred by the FISh
compiler, instead of being supplied by the user.

Yours,
Barry Jay

*************************************************************************
| Associate Professor C. Barry Jay,                                     |
| Reader in Computing Sciences            Phone:  (61 2) 9514 1814      |
| Head, Algorithms and Languages Group,   Fax:    (61 2) 9514 1807      |
| University of Technology, Sydney,       e-mail: cbj@socs.uts.edu.au   |
| P.O. Box 123 Broadway, 2007,    http://www-staff.socs.uts.edu.au/~cbj |
| Australia.                              FISh homepage ... ~cbj/FISh   |
*************************************************************************


Date: Tue, 03 Nov 1998 15:41:11 +1100
From: Robbie Gates
Subject: categories: Re: FISh performance

Warning: there's no category theory in this letter - it's purely a
response to the discussion of execution speed of FISh quicksort.

Barry Jay wrote:
> Why is quicksort in FISh faster?
> ================================

I would like to propose another reason the FISh quicksort appears to be
faster than the library qsort - they aren't the same algorithm. In
particular, the standard library qsort and the FISh sort are using
different methods to pick the pivot.

Choice of pivot in quicksort is important to ensure the worst case doesn't
degenerate to O(n^2). The choice of pivot in the FISh code is always the
first element, which does give very poor performance in the case the array
is already sorted. I'm not sure what the qsort in the libc I have here
does. However, typical tricks include selecting the median of three
elements at the bottom, middle and top of the array, choosing a random
element, choosing the median of three random elements, and so on.
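To make that concrete, here is a sketch of median-of-three pivot selection
on an array of doubles. It is not taken from any particular libc; the index
handling is only illustrative.

/* Sketch: pick the index of the median of the first, middle and last
   elements of a[lo..hi], a common pivot-selection trick.             */
#include <stddef.h>

static size_t median_of_three(const double *a, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    double x = a[lo], y = a[mid], z = a[hi];

    if ((x <= y && y <= z) || (z <= y && y <= x))
        return mid;                            /* a[mid] is the median */
    if ((y <= x && x <= z) || (z <= x && x <= y))
        return lo;                             /* a[lo] is the median  */
    return hi;                                 /* otherwise a[hi]      */
}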
Importantly, each of these choices requires some work, and thus negatively
impacts on the average speed of the sort in order to alleviate the worst
case speed.

Caveat: I'm not a professional benchmarker/tester, and the tests below are
probably very flawed. I have tried to give details on everything that
seems relevant, but your mileage may vary. This data is given more to
suggest an alternative, plausible explanation for the observed speed
difference in the FISh and C benchmarks than to claim I have identified
all the issues in the timing results found for FISh and C quicksort.

To test this hypothesis, I coded up quicksort in C++ using the same pivot
picking strategy as the FISh code. I then modified my C++ code, the C
benchmark code and the C translated FISh code (CTFC) from the FISh website
(by CTFC I mean the contents of test01.c), by altering the random number
generator to allow specification of two command line arguments giving the
multiplier and step for the seeding algorithm. I.e. rather than

   seed = seed * 25173 + 17431

I used

   seed = seed * mult + 17431

for varying mult. This allowed me to test the extent to which the
algorithm was sensitive to the data being sorted.

What follows are timing values on a DGUX AlphaStation 255 (233 MHz) with
mult = 1,2,3,...,10 (on an array of 10000 values). For each program, I
give the total (user + system) time reported by the builtin time function
of bash version 2.02.0(1)-release (alpha-dec-osf4.0), averaged over 10
trials in each case. The time columns are libqsort (= the standard library
qsort), cppqsort (= my C++ qsort), and fishqsort (= CTFC). They were
compiled with gcc, g++ and gcc respectively, with -O optimization.

NOTE: With these mult values, the data supplied to the qsort is
deliberately nonrandom, to try to detect the effect proposed in the second
paragraph.

   mult   libqsort   cppqsort   fishqsort
     1     .0350      3.462      3.410
     2     .0348      4.039      4.039
     3     .1003      .0334      .0391
     4     .0350      4.085      4.052
     5     .1130      .0332      .0393
     6     .0356      4.042      4.039
     7     .1025      .0341      .0382
     8     .0357      4.061      4.054
     9     .1088      .0336      .0382
    10     .0355      4.053      4.041

Observe that the libqsort has much better average performance (and that
its profile is different to the other two - one could probably determine
something about the choice of pivots with enough tests ;-). Also, the CTFC
at best beats the library by a factor of a bit under 3, whereas the
library beats the CTFC by a factor of over 110. More importantly, the C++
and fish code run in very similar time, neither consistently beating the
other. Also, their profiles are almost identical.

I also ran one bigger test - in the case of a 90000 element array
consisting of 1 to 90,000 in order, the library took around .3 seconds,
whereas the CTFC overflowed the stack and crashed just under 200 seconds
into the test (the C++ code took 315 seconds to overflow the stack and
crash - I'm not sure what's causing the difference here, but I suspect it
has to do with the fact my C++ code used parameters rather than a static
struct).

Based on this, I'm proposing that the standard library quicksort is
trading off speed for the average cases to bring the worst cases down to
something acceptable.
This is presumably a serious issue for a library quicksort - it's better
to run on average somewhat slower in order to get consistency, as this
makes applications easier to test for market acceptance of speed, rather
than the application apparently hanging when the user manages to sort the
worst case database with about 1 million entries in it (when your testers
happily sorted their million entry db's with no trouble).

One other point - the compile time polymorphism is not specific to FISh
(as Ralph Loader mentioned). You write:

> The FISh program avoids making a function call at all. That is,
> instead of passing the addresses to a function, a primitive comparison
> is made. The key point is that this can only be done because
> information like the length of the array and the size of the entries
> (in bytes) can be inferred by the FISh compiler, instead of being
> supplied by the user.

C++ templates also give the ability to supply a compile time comparison
function to sort routines; the code will likewise be specialised on the
given comparison and on the size of the entries. One could write sort
templates that take the length as well; however, since the size of the
data to be sorted typically varies with the current application state
(i.e. in a manner unknown at compile time), this tends not to be enough of
a win to be worth doing.

   - robbie

--
*--------------------------------> + <---------------------------------*
|_| robbie gates                   |                pgp key available |_|
 V  category theorist              V    //cat.maths.usyd.edu.au/~robbie V
*--------------------------------> + <---------------------------------*