The MT Stack: Paging Algorithm and Performance in a Distributed Virtual Memory System


 
 
Advances in parallel computation are of central importance to Artificial Intelligence due to the significant amount of time and space its programs require. Functional languages have been identified as providing a clear and concise way of programming parallel machines for artificial intelligence tasks. The problems of exporting, creating, and manipulating processes have been thoroughly studied in relation to the parallelization of functional languages, but none of the support structures needed for the abstraction, such as a distributed memory, have been properly designed. In order to design and implement parallel functional languages efficiently, we propose the development of an all-software based distributed virtual memory system designed specifically for the memory demands of a functional language. In this paper, we review the MT architecture and briefly survey the related literature that led to its development. We then present empirical results obtained from observing the paging behavior of the MT stack. Our empirical results suggest that LRU is superior to FIFO as a page replacement policy for MT stack pages. We present a proof that LRU is an optimal page replacement policy for the MT stack. Based on this proof the MT stack page replacement policy was developed and implemented. We outline the paging algorithm and present an argument of partial correctness. The MT stack page replacement policy is superior to LRU, because it does not incur the expensive time penalties associated with implementing LRU in software.

*Partially supported by the Seton Hall University Research Council.
†Partially supported by NSF grant CDA-9114481.
‡Partially supported by NSF grant HRD-9703600.
 
 



INTRODUCTION
Functional languages are closely intertwined with modern software and knowledge engineering. In software engineering, LISP-like languages provide software engineers with an environment in which they can rapidly build powerful prototypes that evolve as the needs of end-users change. Furthermore, they provide an environment in which specification testing can be done easily and rapidly, and they facilitate the development of provably correct software systems. In knowledge engineering, as with most artificial intelligence (AI) applications, the inherent flexibility and orientation toward symbol manipulation of LISP-like languages make these the preeminent programming languages. AI programs often manipulate complex information whose natural representations make full use of LISP's ability to create novel data structures; in addition, functional languages permit full flexibility in defining and manipulating programs as well as data.
CLEI ELECTRONIC JOURNAL, VOLUME 5, NUMBER 1, PAPER 2, JUNE 2002
Given the important role that functional languages play in both software and knowledge engineering, improving the performance of these languages is critical. This is especially true for AI programs due to the significant amount of time and space they require [41]. Parallelism holds great promise for improving the performance of functional languages. Writing parallel programs, however, remains a difficult and non-intuitive task, despite efforts to parallelize AI programming languages such as LISP. Parallel programming is difficult, because programs must be partitioned into independent tasks, these tasks must be mapped onto a network of processors, and communication between tasks must be explicitly programmed without causing the system to deadlock [40].
In order to design and implement parallel functional languages efficiently, we propose the development of an all-software based distributed virtual memory system designed to efficiently satisfy the memory demands of a functional language. There is, however, no clear understanding of the paging behavior of functional languages. The MT system is being developed to study the design of an all-software distributed virtual memory (DVM) for a list-based pure functional language. Efficient storing and accessing of the stack data in functional languages is of particular importance, because function calling is more prevalent than in other languages. In this paper, we present results obtained from observing the paging behavior of the MT stack. Our focus in this paper is on the policy used for swapping stack pages in and out of the evaluator's local (and private) memory. We empirically establish that LRU (least recently used) performs better than FIFO (first in, first out). Furthermore, we argue for the optimality of LRU and present a page replacement algorithm that behaves like LRU for stack pages, but that does not incur the expensive time penalties per stack access associated with implementing LRU in software. DVM systems to date have not been designed or developed to satisfy the needs of a specific type of programming language [27,31,39]. To the authors' knowledge MT is the first such attempt.

Parallelism in Functional Languages
Functional languages that are used by AI developers, such as LISP, provide a clear and concise way to program parallel computers [11,13,16,17,25,28,40,49]. These languages are attractive candidates for parallelization, because no new language constructs need to be developed to extract parallelism, the results of all programs are deterministic, deadlock cannot arise, and establishing program correctness is no more difficult than in the sequential counterpart [40]. In fact, the parallel code may look the same as sequential code.
We can identify two approaches to the exploitation of parallelism in functional languages. The first has focused on running the evaluator in parallel with a garbage collector. The second approach identifies parallelism in user code. In this approach, the amount of parallelism that can be exploited depends on the type of code a programmer writes. One of the goals of the MT system is to foster locality of reference without employing a garbage collector. The results presented in this article were obtained from a system in which garbage collection was not enabled. Therefore, we will not survey the distributed garbage collection literature. The interested reader is referred to [1] for a survey of distributed garbage collection algorithms.
There have been several attempts to parallelize user code [6,15,18,29,30]. These approaches attempt to evaluate arguments in parallel (horizontal parallelism) or attempt to pipeline processes that produce values with processes that consume values (vertical parallelism). Some systems require the programmer to develop parallel algorithms [40], to use predefined templates [6], or to use annotations [29]. These approaches break the abstraction barrier that functional languages provide and require the programmer to identify parallelism, thus diverting the programmer from the problem they want to solve.
Systems that do not require the programmer to identify parallelism have not achieved their theoretical potential [15,35]. Some of the inefficiency has been associated with the excessive copying of data between processors or the amount of dereferencing of objects that are not locally stored [15]. In fact, Goldberg [15], who used matrix multiplication as one of his benchmarks, concluded that exploiting implicit parallelism proved successful only for programs without large shared data structures and that more work needed to be done on data partitioning.
Parallel functional languages have also fallen victim to granularity. That is, these systems will take longer than their sequential counterparts to execute some programs due to communication overhead and/or to processor speed. The granularity problem, as suggested by Goldberg's conclusion, is in part due to storage. That is, how large data structures are stored and accessed is what makes the granule of computation too small. There have been very few initiatives to design and develop the efficient distributed memory system (i.e. storage system) that is needed by parallel functional languages. The exporting, creation, and manipulation of processes have been thoroughly studied [42], but none of the necessary support structures (e.g. virtual memory and network topology) for the abstraction have been studied in depth to date.
The pHluid system [9] is a distributed memory implementation of a parallel functional language. The compiler for this system produces fine-grained multithreaded code which generates parallelism at two levels: horizontal parallelism and parallel communication. Parallel communication is achieved by overlapping the communication requests of the multiple threads executing at one node. One of the main objectives of the system is to minimize communication. The pHluid system separates objects that can be explicitly deallocated from those that need to be garbage collected. In this manner, the amount of communication generated by the garbage collection processes is reduced. The memory system for pHluid is not based on a DVM system and the only distributed data structure is the heap. Each processing node owns a part of the heap. When a node runs out of heap space, all processing is halted so that every node can participate in a global garbage collection process.
The approaches taken to exploit parallelism have mostly focused on analyzing user code. The exception has been systems that run a parallel garbage collector. Both of these approaches, however, have failed to parallelize the engine that evaluates programs. The MT System, in contrast, is an initiative to apply parallelism to memory management beyond garbage collection. The main goals of MT are to speed up memory accesses and memory management during program evaluation by parallelizing the engine that executes programs.

DVM Systems
General purpose DVM systems have been developed to provide a single large address space within a multiprocessor. The idea is to provide the illusion of a single address space comprised of the memories at each processor. In order to provide this illusion several design choices must be made regarding structure, granularity, access, coherence semantics, scalability, and heterogeneity [39].
Figure 1: Abstract view of the architecture of an MT node.
Structure refers to the layout of data in memory. Most DVM systems, for example Ivy [31], do not structure memory beyond viewing it as an array of pages. In MT, heap memory and stack memory are each structured as an array of pages. Each page is an array of words where each word can hold an S-expression (i.e. nil, a value, a symbol, or a cons-cell). Heap and stack memory as well as the other MT components (see below) will have their own address space. In this regard, MT is novel, being the only DVM system with multiple address spaces.
Granularity refers to the size of the sharing unit. In traditional DVM systems sharing occurs between processes that need to access the same data on different processors. In the current version of MT, sharing occurs between the different components of a functional system. The sharing unit in MT is a page. Pages have been used in other DVM systems such as Ivy [31] and Mirage [10].
Coherence semantics define how memory updates are propagated throughout the system. Functional programmers assume a coherence semantics called strict consistency and this is what is implemented in MT. In a system with strict consistency a read operation returns the last value written to the memory location requested. In a general purpose DVM system this is an ambiguous concept, because many processes may attempt to write to the same location at the same time. In the current version of MT, the semantics are clear given that there is only one evaluator and only the evaluator can mutate data.
Scalability defines how the system's performance is affected by an increase in the number of available processors. Unlike tightly-coupled processors [48], the distributed nature of the MT System potentially permits the number of processors to increase without memory becoming a bottleneck. Furthermore, there are no global broadcasts or propagation of clock signals that can cause a bottleneck.
The degree to which different types of processors are used is called heterogeneity. The current MT System is a homogeneous system. The all-software nature of our design, however, facilitates the integration of different types of processors.

THE MT SYSTEM
The MT architecture is based on an all-software DVM tailored to the needs of a pure functional language [34,36]. Unlike previous attempts to apply parallelism to functional languages by parallelizing user code, the primary goal of the MT system is to exploit parallelism to make memory management faster. In MT, the basic computing entity is an MT node instead of a processor.
At a high level of abstraction, each MT node consists of five different memory spaces: the heap, the stack, the function space, the evaluator, and the garbage collector (the major components of a functional system). Figure 1 displays an abstract view of an MT node, which can be implemented on a Beowulf machine [47] using MPI [19]. Each component of the language has its own backing store which is managed by an all-software DVM system. In essence, each MT node gives its evaluator an intelligent backing store that organizes data and code according to the demands of the evaluator and/or any other component. In contrast, pHluid [9] and GUM [32,33] only attempt to efficiently provide the illusion of a large shared memory space and do not attempt to exploit intelligence to separate program evaluation from memory management locally at each computing element. If a system with multiple MT nodes is desired, data structures can be shared by communicating via the DVM system without interrupting program evaluation, while simultaneously providing efficient local access to distributed data structures.
The management of the distributed address space is done in the management networks of an MT node in parallel with the computation taking place at the evaluator. Unlike classical virtual memory on sequential systems, MT's separate memory spaces mean that each component does not have to share a common address space with the other components.
The results we present in this article are of interest to designers of sequential functional languages as well as designers of parallel functional systems that parallelize user code. Designers of sequential functional systems will be interested in the characterization of the stack access patterns that we provide. In addition, our results suggest efficient ways of organizing memory, which is still a major concern in sequential systems as evidenced by [43]. Designers of traditional parallel functional systems will be interested in our results, because they suggest an efficient mechanism to share data between multiple evaluators. Figure 2 displays an abstract view of a system with multiple MT nodes. Each MT node is a computing element that can communicate with other MT nodes through the interconnection network. The interconnection network may or may not implement a fully connected graph (e.g. the interconnection network may be a hypercube).
This approach is expected to deliver superior virtual memory performance for parallel functional languages by providing efficient access to a large virtual memory space within an MT node. In addition to efficient sharing of data within an MT node, MT also offers the possibility of having a truly parallel garbage collector, parallel replication for data sharing, multiple paging policies (for the different components), multiple views of memory (e.g. pages and segments), parallel communication, and parallel resolution of faults.

The MT Language
The MT language is a pure subset of LISP which includes the arithmetic operators, the relational operators, the Boolean operators, cons, car, cdr, and the predicates null?, eq?, atom?, list? and symbol?. In addition, the MT language has two random number generating primitives. We chose this small subset of LISP in order to gain insight into how virtual memory is used by a list-based system. Once an efficient DVM is designed and implemented for this small language, we can build on it to add more complex objects such as closures and continuations. The implementation of these more complex objects is still a major concern as evidenced in recent literature [5,23,44].
The MT set of primitives is contained in most Lisp implementations (e.g. [2,7,8,21,22]). This subset includes the primitives that are most commonly used by programmers [45]. Computational, heap, and stack intensive programs have been written in MT, such as matrix multiplication, graph searching, and a metacircular interpreter for MT (all of which are part of our benchmarking set). The semantics of the MT primitives are the same as those of Common Lisp or Scheme. The MT evaluator is implemented using applicative-order evaluation and uses the MT allocation algorithm for heap allocation [36].
The evaluator node runs an interpreter for the MT language. The current version of MT is implemented on a network of transputers [24].

The MT Data Stack
The MT data stack (henceforth the MT stack) is used for parameter passing. That is, the arguments passed to a function are stored on the stack. Along with the arguments to a function, each activation record also contains the address of the previous activation record. Each stack page can hold 512 objects (e.g. numbers, symbols and cons-cells). Each object is 72 bits (9 bytes) wide, making each page 4608 bytes (4.5 KB), which is close to the 4 KB page size commonly used in modern computers.
The MT stack has its own address space and is not heap allocated despite the fact that stack memory is dynamically allocated. The decision not to heap-allocate the stack was based on the observation that stack memory can be recycled without having to call a garbage collector. The memory space used by an activation record can be immediately recycled after the function it was created for is applied. Thus, after popping an activation record off the stack the space it occupied can immediately be used for the next activation record. In Figure 1, the virtual channel between the stack network and the garbage collection network only exists because the stack may contain pointers into the heap that are needed to identify heap memory that can safely be recycled when the collector is enabled.
When a function is called, an activation record is created to hold the arguments it is passed. As each argument is computed it is pushed on the stack. The construction of an activation record is complete when all arguments to a function are evaluated. This construction may be temporarily suspended in order to create other activation records to compute the values of the arguments being passed in. The partially computed activation record is left on the stack and any other activation records needed to compute arguments are pushed on top of it. When an argument is computed it is returned on the top of the stack, which in MT is the location where that argument is to be stored in the suspended activation record. When the construction of an activation record is complete, the function it was created for is applied.
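The construction sequence described above can be sketched in Python. This is an illustrative model only and not MT's implementation: the tuple-based expression format, the `PRIMS` table, the `trace` parameter, and the record layout (a link to the previous record followed by the arguments) are assumptions made for this example.

```python
import operator

# Hypothetical primitive table; MT's real primitives are not modeled here.
PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, stack, top_record=-1, trace=None):
    """Applicative-order evaluation that builds activation records on `stack`.

    Every record pushed while evaluating `expr` has been popped again by the
    time the value is returned.
    """
    if not isinstance(expr, tuple):        # a constant needs no record
        return expr
    op, *args = expr
    base = len(stack)                      # address of the new record
    stack.append(top_record)               # link to the previous record
    for arg in args:
        # Construction of this record is suspended while `arg` is evaluated;
        # records needed for `arg` are pushed on top of the partial record.
        value = evaluate(arg, stack, base, trace)
        stack.append(value)                # the value lands in its argument slot
    if trace is not None:
        trace.append(list(stack))          # snapshot once the record is complete
    result = PRIMS[op](*stack[base + 1:])  # record complete: apply the function
    del stack[base:]                       # pop: the space is reusable at once
    return result
```

For example, evaluating `("+", 1, ("*", 2, 3))` suspends the record for `+` after its first argument, builds and pops the record for `*` on top of it, and then completes the `+` record with the returned value.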
We present the empirical data collected from 80 experiments using our benchmarks. Data on stack fault rates for both FIFO and LRU, along with the relative difference between the two, are presented. In addition, we present an argument establishing the optimality of LRU for the MT stack. We then proceed to describe the MT stack page replacement algorithm that simulates LRU and avoids the overhead traditionally associated with software implementations of LRU.
The data on stack fault rates refer to how pages are swapped between the evaluator and the stack network. That is, they serve as an indication of the expected traffic on the virtual channel between the evaluator node and the stack network, which is represented as a line segment in Figure 1. In contrast with other studies [9,32], the first goal of MT is to make local access of distributed data structures efficient by studying the paging performance of the different memory spaces. After we have achieved this goal, we can focus on the task of how to efficiently share these data structures by exploiting parallel accesses and replication.
The fault rates reported count all accesses to stack data regardless of how the local memory hierarchy stores data at the evaluator. The use of registers and cache memory, for example, does affect memory access time, but does not affect the number of times the MT stack is accessed.
Our software system does not assume the availability of registers, cache, and/or any other special hardware beyond the existence of random access memory at each processor that is part of an MT node. All stack data is held in a data structure defined in software.

VIRTUAL MEMORY PERFORMANCE OF THE MT STACK
We chose five programs to run as benchmarks. The selected programs are representative of many pure Lisp programs; we note in passing that Gabriel's benchmarks [12] use assignment. The MT measurements presented in this section were taken on a system employing an exclusive pool of frames for the stack pages. That is, the stack does not compete with other MT components at the evaluator node for memory frames (i.e. for memory space). Thus, the results pertain exclusively to stack behavior.

Benchmarks Used
A brief description of the benchmarks used is given below. The number of run-time stack accesses for each benchmark, which ranges from 3.8 to 22.4 million, is presented in Table 1. The number of stack accesses performed by our benchmarks is comparable to the total number of memory accesses performed by the benchmarks used by Bobrow et al. [4]. In their study, the total number of memory accesses ranged from 1.2 to 8.4 million. To the authors' knowledge, past studies have not specified the number of accesses to any individual component of a functional system.
• ins: This is an implementation of insertion sort. The program was used to sort a randomly generated list of three hundred positive integers. The input list of random numbers is traversed once and the list of sorted numbers so far is repeatedly traversed until it contains all the elements of the input list.
• qs: This is a version of quicksort. Each sublist of unsorted numbers created is traversed twice. The first pass extracts the numbers less than or equal to the pivot while the second pass extracts the numbers greater than the pivot. These traversals, however, do not occur back-to-back. Instead, they are separated by the quicksorting of the smaller elements. This program ran on a list of one thousand randomly generated positive integers.
• MM: This is matrix multiplication. Each row of the matrix is represented by a list. The resulting matrix is built one row at a time. This means that computing AxB requires each column of B to be extracted multiple times (i.e. once for each row of A). After creating matrices A and B this program spends most of its time traversing list-based structures. The program was used to multiply two randomly generated 20x20 matrices.
• eval: This is an interpreter for MT written in MT. It takes as inputs an expression, an environment, and a function table. One of the reasons eval was chosen as a benchmark is that it is unclear from a syntactic inspection what the dominating access pattern is. These experiments were the most memory intensive of the whole set.

Empirical Data
Table 2 presents the page fault rates observed for our benchmarks. Each benchmark was executed with the same input using different sizes of memory allocated to the stack at the evaluator node. In the column labeled memory size, stack memory size is presented as a percentage of the total number of stack pages needed during the execution of the benchmark. Unlike the MT heap, for which FIFO is a superior page replacement algorithm [34], the data strongly suggest that LRU is a better page replacement policy for stack pages. For most memory sizes, LRU achieved a significantly lower fault rate. For small memory sizes (i.e. at 20%), FIFO achieved the same fault rate as LRU for all benchmarks except quicksort. This close performance between the two policies is expected for small memory sizes since the working set of the programs is larger than the number of memory frames allocated to the stack.
As expected, for both algorithms fault rates fall as the memory size is increased. No anomalies, such as Belady's anomaly [46], are observed for FIFO. The gains obtained from increasing the stack's memory size level off when memory can hold between 50% and 60% of the total number of stack pages used. Thus, the system should strive to allocate to the stack at the evaluator node a number of frames that can hold half of the total number of stack pages accessed. Allocating more stack frames would not justify the overhead incurred in management. Furthermore, these extra frames can probably be better employed by other MT components. In the 50% to 60% interval, we can also observe some of the largest relative differences between FIFO and LRU. This suggests that employing LRU is of critical importance for the MT stack. Since MT is an all-software based system, we can adopt different paging policies (e.g. FIFO for heap pages and LRU for stack pages). Such a design would not be considered feasible for a hardware-dependent system.
6 We define the relative difference as
The performance gap between LRU and FIFO is much larger for the MT stack than for the MT heap [34,37,38]. This is evidenced by the relative difference between the two page replacement algorithms. For matrix multiplication and for eval, the maximum difference is 50%. For insertion sort, FIFO produced up to 60% more faults than LRU, while for graph searching LRU can be twice as good as FIFO. For quicksort the relative difference reaches 600% in favor of LRU. Such differences suggest that FIFO is not a viable page replacement algorithm for the MT stack.
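A comparison of this kind can be reproduced on any page reference trace with a small fault counter. The following Python sketch is not the MT measurement harness. The function names are hypothetical, and the relative difference is assumed here to be the number of extra FIFO faults expressed as a percentage of the LRU faults, which is consistent with the phrase "more faults than LRU" but is not confirmed by the text.

```python
from collections import OrderedDict, deque

def fifo_faults(trace, frames):
    """Count page faults under FIFO replacement with `frames` frames."""
    resident, order, faults = set(), deque(), 0
    for page in trace:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(order.popleft())   # evict the oldest loaded page
        resident.add(page)
        order.append(page)
    return faults

def lru_faults(trace, frames):
    """Count page faults under LRU replacement with `frames` frames."""
    recency, faults = OrderedDict(), 0          # least recently used page first
    for page in trace:
        if page in recency:
            recency.move_to_end(page)           # the "time stamp" on every access
            continue
        faults += 1
        if len(recency) == frames:
            recency.popitem(last=False)         # evict the least recently used
        recency[page] = True
    return faults

def relative_difference(fifo, lru):
    """Assumed definition: extra FIFO faults as a percentage of LRU faults."""
    return 100.0 * (fifo - lru) / lru
```

The `move_to_end` call on every hit is the per-access bookkeeping that a software LRU has to pay for and that FIFO avoids.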

THE MT STACK PAGING ALGORITHM
All the information needed to evaluate a function is stored within its activation record. Since there are no nested functions or global variables in MT, it follows that the evaluator only needs to access the top activation record on the stack for the information it requires to apply a function or to build the next activation record. Within an activation record there may be random access, but there is no random access between activation records. In fact, the MT stack accesses memory pages in a mostly last-in first-out discipline if LAR < SPS, that is, if the largest activation record (LAR) is smaller than the stack page size (SPS). When an activation record straddles the boundary between two pages, the top two pages may not be accessed in a strict LIFO manner. For this memory access pattern, nonetheless, LRU is an optimal page replacement policy.

LRU is Optimal for the MT Stack
The assumption needed about the activation record size in comparison to the stack page size is reasonable for a real system, since the number of parameters to a LISP function is usually between 2 and 3 [45], which makes activation records much smaller than our page size. The following theorem establishes our claim.
Theorem 1. Given that LAR < SPS, LRU is an optimal page replacement policy for the MT stack.
Proof: For the proof, it is important to remember that stack pages are held in a set of frames that are allocated for exclusive use by the stack. This means that stack pages cannot be swapped out by non-stack pages (e.g. heap pages). Without loss of generality, assume that the stack grows towards higher addresses.
As the stack grows, stack memory is allocated linearly to hold each new activation record. This means that higher numbered pages are being accessed as the stack grows. Similarly, when the stack is shrinking, stack space is deallocated linearly as each activation record is popped. Now, let low and high be the indices of the lowest and highest numbered stack pages, respectively, held in the evaluator's memory at any given time and let SP_low and SP_high be the pages themselves. Since allocation and deallocation of activation records is always linear, we have that all stack pages from SP_low to SP_high are contained in the stack frames at the evaluator before the first fault occurs. Under these conditions the evaluator can only fault if it tries to access SP_{low-1} or SP_{high+1}. Faulting on SP_{high+1} means the stack is growing and that the least recently used page is SP_low. This page is also the memory resident page that will not be accessed for the longest period of time, since all pages from high down to low + 1 must be accessed at least once before accessing SP_low again. This follows from the assumption that all activation records are smaller than a page. Thus, when faulting on SP_{high+1}, LRU selects as a victim the same page the optimal page replacement algorithm would. When the fault is serviced all pages SP_{low+1} to SP_{high+1} are left memory resident. These are all the stack pages from the low-
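The claim of Theorem 1 can be checked empirically on synthetic traces. The Python sketch below is an illustration and not part of the proof. It assumes that the LAR < SPS condition can be modeled as a page trace in which consecutive accesses differ by at most one page number, and it compares the LRU fault count against Belady's farthest-next-use policy. All function names are hypothetical.

```python
import random
from collections import OrderedDict

def stack_walk(steps, seed):
    """Synthetic stack trace: consecutive accesses differ by at most one page."""
    rng = random.Random(seed)
    page, trace = 0, [0]
    for _ in range(steps):
        page = max(0, page + rng.choice((-1, 0, 1)))
        trace.append(page)
    return trace

def lru_faults(trace, frames):
    """Count faults under LRU, starting from empty frames."""
    recency, faults = OrderedDict(), 0          # least recently used page first
    for page in trace:
        if page in recency:
            recency.move_to_end(page)
            continue
        faults += 1
        if len(recency) == frames:
            recency.popitem(last=False)
        recency[page] = True
    return faults

def opt_faults(trace, frames):
    """Count faults under Belady's policy: evict the farthest next use."""
    resident, faults = set(), 0
    for t, page in enumerate(trace):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            def next_use(p):
                try:
                    return trace.index(p, t + 1)
                except ValueError:
                    return float("inf")         # never accessed again
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults
```

On such traces the two counters agree for every frame count, which is what the theorem predicts. On an arbitrary trace LRU will in general fault more often than the optimal policy.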

The MT Stack Page Replacement Algorithm
In an all-software based distributed virtual memory system, implementing LRU requires time stamping a page after every access, and this overhead adds directly to the (local) access time. The proof that LRU is optimal for the MT stack suggests a page replacement algorithm that does not incur this heavy penalty. Under this algorithm, the access time for accesses that do not cause a fault is constant. For accesses that cause a fault, the code that must be executed at the evaluator also runs in a constant amount of time. The algorithm does not require stack pages to be time stamped and only requires four variables.
The design roles of the four variables are as follows:
• slow: the index of the lowest numbered stack page resident at the evaluator.
• shigh: the index of the highest numbered stack page resident at the evaluator.
• flow: the index of the frame holding the lowest numbered page.
• fhigh: the index of the frame holding the highest numbered page.
Let SFRAMES be the number of frames allocated to the stack in the evaluator's memory and pagenum be the index of the stack page that is to be accessed. The memory and faulting algorithm can be described as follows:
1. Load evaluator stack frames 0..(SFRAMES-1) with the stack pages 0..(SFRAMES-1). Set slow and flow to 0 and shigh and fhigh to (SFRAMES-1).
2. Upon receiving a stack request, determine if pagenum is between slow and shigh. If so, go to step 3; otherwise go to step 4 to service the fault before accessing the stack.
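The following Python sketch shows one way the four variables could drive the algorithm. Steps 1 and 2 follow the description above. Steps 3 and 4 are not given in the text, so their bodies here are assumptions reconstructed from the frame formula m = (flow + (i - slow)) MOD SFRAMES used in the correctness argument. The class name, the method names, and the dictionary standing in for the stack network are hypothetical.

```python
class MTStackPager:
    """Four-variable stack pager; no time stamps and no page table."""

    def __init__(self, sframes):
        self.sframes = sframes
        self.backing = {}                          # stand-in for the stack network
        # Step 1: frames 0..SFRAMES-1 hold stack pages 0..SFRAMES-1.
        self.frames = [[] for _ in range(sframes)]
        self.slow, self.flow = 0, 0
        self.shigh, self.fhigh = sframes - 1, sframes - 1
        self.faults = 0

    def frame_of(self, pagenum):
        """Return the frame holding `pagenum`, servicing a fault if needed."""
        # Step 2: the request is a hit when slow <= pagenum <= shigh.
        if not (self.slow <= pagenum <= self.shigh):
            self._service_fault(pagenum)           # assumed step 4
        # Assumed step 3: constant-time lookup by circular offset.
        return (self.flow + (pagenum - self.slow)) % self.sframes

    def _swap(self, frame, out_page, in_page):
        """Write `out_page` back and load `in_page` into the same frame."""
        self.backing[out_page] = self.frames[frame]
        self.frames[frame] = self.backing.pop(in_page, [])

    def _service_fault(self, pagenum):
        self.faults += 1
        if pagenum == self.shigh + 1:              # stack growing: evict page slow
            self._swap(self.flow, self.slow, pagenum)
            self.slow += 1
            self.shigh += 1
            self.fhigh = self.flow                 # new highest page reuses the frame
            self.flow = (self.flow + 1) % self.sframes
        elif pagenum == self.slow - 1:             # stack shrinking: evict page shigh
            self._swap(self.fhigh, self.shigh, pagenum)
            self.slow -= 1
            self.shigh -= 1
            self.flow = self.fhigh                 # new lowest page reuses the frame
            self.fhigh = (self.fhigh - 1) % self.sframes
        else:
            raise ValueError("requested stack page is not adjacent to the window")
```

With three frames, touching pages 0, 1, 2 causes no fault. Touching page 3 evicts page 0 and reuses frame 0. Walking back down to page 0 then evicts page 3 from that same frame. Both the hit path and the fault path run a fixed number of operations, independent of SFRAMES.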

Proof of Partial Correctness
In order to establish the correctness of the MT stack replacement algorithm (MTSRA), we must prove that MTSRA causes the same paging behavior as LRU. That is, we must establish that when a fault occurs under MTSRA the following holds:
• LRU would fault on the same page.
• MTSRA will swap out of the evaluator's memory the same page LRU would swap out.
Proving that the following assertions are invariant will help us establish that MTSRA produces the same paging behavior as LRU:
• Stack pages slow to shigh are held in the evaluator's memory.
• Stack pages slow to shigh would also be held in the evaluator's memory under LRU.
The memory access stream of program P written in MT on input x will always be the same regardless of the paging algorithm employed. Therefore, we know that MTSRA is equivalent to LRU if it always faults on the same page LRU would fault on and if it always swaps out the same page LRU would swap out.

Proof: The proof is achieved by induction on the number of times a stack fault occurs. For the base case, we must establish that the invariant properties hold when there are no faults.
Step 1 of MTSRA initializes stack frames in order with SP_0..SP_{SFRAMES-1}, initializes flow to 0, and fhigh to (SFRAMES-1). Stack frames are initialized in the same manner for LRU. LRU and MTSRA would hold the same pages, SP_0..SP_{SFRAMES-1}, in the evaluator's memory. SP_i, 0 ≤ i ≤ (SFRAMES-1), is stored in the frame given by m = (flow + (i - slow)) MOD SFRAMES.
In addition, there is no paging activity, so all the invariant properties hold for our base case.
For the inductive step, assume that the three invariant assertions hold after k stack faults.

MTSRA is Optimal
The above proof establishes that MTSRA causes the same paging behavior as LRU, meaning that they both fault on the same pages and both swap out the same page for any given fault. When MTSRA swaps out the least recently used page, it is swapping out the memory-resident page that will not be accessed for the longest period of time, as established in Theorem 5.1. This means that MTSRA is optimal for MT stack pages.
MTSRA is superior to LRU, however, because it does not incur the overhead associated with implementing LRU in software. MTSRA does not time stamp a page after every access, its page searching time is constant since it always knows where the page to be swapped out is stored, and it requires no page table. These properties significantly reduce the stack's effective access time and fault service time.
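The claimed equivalence can also be checked empirically on synthetic access streams. The following is a minimal Python sketch, not the MT implementation. The function names, the timestamp-based LRU model, and the trace generator are hypothetical and introduced only for illustration. The sketch assumes that consecutive stack accesses move by at most one page, which is the LIFO assumption used in the proof.

```python
import random


def mtsra_events(trace, sframes):
    """Return (faulting page, victim) pairs under MTSRA.

    Victim selection is constant-time arithmetic on four variables.
    """
    slow, shigh = 0, sframes - 1      # resident stack pages slow..shigh
    flow, fhigh = 0, sframes - 1      # frames holding SP_slow and SP_shigh
    events = []
    for page in trace:
        if page > shigh:              # stack grew past the window
            events.append((page, slow))
            step = 1
        elif page < slow:             # stack shrank below the window
            events.append((page, shigh))
            step = -1
        else:
            continue                  # no fault
        slow, shigh = slow + step, shigh + step
        flow, fhigh = (flow + step) % sframes, (fhigh + step) % sframes
    return events


def lru_events(trace, sframes):
    """Return (faulting page, victim) pairs under a time-stamping LRU.

    Every access updates a time stamp and every fault searches for the
    oldest one; this is the software overhead that MTSRA avoids.
    """
    last_use = {p: p for p in range(sframes)}   # preload pages 0..sframes-1
    clock = sframes
    events = []
    for page in trace:
        if page not in last_use:
            victim = min(last_use, key=last_use.get)   # linear search
            events.append((page, victim))
            del last_use[victim]
        last_use[page] = clock                         # time stamp on access
        clock += 1
    return events


def synthetic_stack_trace(length, seed=0):
    """Hypothetical trace: the accessed page moves by -1, 0, or +1."""
    rng = random.Random(seed)
    page, trace = 0, []
    for _ in range(length):
        trace.append(page)
        page = max(0, page + rng.choice((-1, 0, 1)))
    return trace
```

Comparing `mtsra_events(t, n)` with `lru_events(t, n)` for a trace `t` produced by `synthetic_stack_trace` gives a quick sanity check of the argument. It does not replace the proof, and it says nothing about traces that violate the one-page-per-step assumption.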

FUTURE WORK AND FINAL REMARKS
The MT system is being developed as a research tool and test bed for the development of a distributed virtual memory system tailored to the demands of a pure parallel functional language. This is the first effort made to design the necessary storage support needed by a specific type of language. This approach is expected to provide the necessary tools for data sharing and accessing both at the intra-node and inter-node levels.
We have empirically established that LRU is superior to FIFO as a page replacement policy for MT stack pages. For small stack memory sizes, FIFO will perform as well as LRU because the system's paging performance will be poor. If the evaluator can only cache a small number of pages, then the stack's working set will be too large, causing excessive paging. As stack memory space is increased, the performance of both FIFO and LRU improves, but LRU's performance significantly surpasses FIFO's performance.
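The kind of comparison summarized above can be illustrated with a small fault counter. This is a hedged Python sketch under stated assumptions, not the instrumentation used to obtain the reported results. The function names and the synthetic trace generator are hypothetical, both policies start with pages 0..(sframes-1) preloaded, and the trace moves by at most one page per access.

```python
import random
from collections import OrderedDict, deque


def fifo_faults(trace, sframes):
    """Count faults when the page resident longest is evicted."""
    queue = deque(range(sframes))          # preload pages 0..sframes-1
    resident = set(queue)
    faults = 0
    for page in trace:
        if page not in resident:
            resident.discard(queue.popleft())   # evict oldest arrival
            queue.append(page)
            resident.add(page)
            faults += 1
    return faults


def lru_faults(trace, sframes):
    """Count faults when the least recently used page is evicted."""
    recency = OrderedDict((p, None) for p in range(sframes))
    faults = 0
    for page in trace:
        if page in recency:
            recency.move_to_end(page)       # mark as most recently used
        else:
            recency.popitem(last=False)     # evict least recently used
            recency[page] = None
            faults += 1
    return faults


def synthetic_stack_trace(length, seed=0):
    """Hypothetical trace: the accessed page moves by -1, 0, or +1."""
    rng = random.Random(seed)
    page, trace = 0, []
    for _ in range(length):
        trace.append(page)
        page = max(0, page + rng.choice((-1, 0, 1)))
    return trace
```

On the short trace 0, 1, 2, 1, 0, 1, 2 with two frames, this sketch counts 3 faults under LRU and 4 under FIFO. Fault counts on synthetic traces are only illustrative and should not be read as a reproduction of the measured fault rates.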
We have also proven that LRU is optimal as a page replacement policy for MT stack pages. Based on our proof, we have developed the MT stack page replacement algorithm, which is also an optimal page replacement policy for MT stack pages. We have established the partial correctness of our page replacement algorithm. This algorithm yields a memory access time that is constant for accesses that do not cause a fault. The code needed by the evaluator to resolve a fault is also executed in a constant amount of time. Furthermore, MTSRA is superior to LRU because stack pages do not have to be time stamped. The time and space complexity of the MT stack page replacement algorithm are superior to those of LRU.
Stack space in MT is being studied further to establish how efficiently stack frames are being used. Given that MT has several address spaces, it is important to determine how efficiently frames are being used. We expect our findings to suggest ways to initially allocate frames and to dynamically change frame allocations at the evaluator.
The MT system is currently being expanded to study the memory accessing patterns to function space. In this new version, functions will be first-class and the function space will be distributed. One of the interesting questions that we will need to answer is whether any necessary changes should be integrated into the current MT memory spaces and data structures or whether new memory spaces and data structures are justified. In particular, we are studying the effects of introducing complex objects such as closures and continuations. The flexibility inherent in software design allows us to consider designs that would be considered prohibitive in hardware and to design different memory spaces for the different components of a functional language.

3. Access frame ((flow + pagenum - slow) MOD SFRAMES) by the appropriate displacement. Goto 2.
4. If pagenum > shigh
(a) Victim is $SP_{slow}$ and is stored in frame flow.
(b) Swap $SP_{slow}$ out to backing store.
(c) Swap requested page in from backing store into frame flow.
(d) Increase all four variables by 1, with flow and fhigh taken MOD SFRAMES.
(e) Access the requested page and goto 2.
5. If pagenum < slow
(a) Victim is $SP_{shigh}$ and is stored in frame fhigh.
(b) Swap $SP_{shigh}$ out to backing store.
(c) Swap requested page in from backing store into frame fhigh.
(d) Decrease all four variables by 1.
(e) If flow or fhigh becomes negative, set it to (SFRAMES-1).
(f) Access the requested page and goto 2.
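The steps of the memory and faulting algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MT evaluator's code. The class and method names are hypothetical, the backing store is not modeled (the sketch only records which page is evicted), and a fault is assumed to hit only the page adjacent to the resident range, as in the proof of partial correctness.

```python
class MTStack:
    """Sketch of MTSRA: a sliding window of resident stack pages."""

    def __init__(self, sframes):
        self.sframes = sframes
        # Step 1: frame i starts out holding stack page i.
        self.frames = list(range(sframes))
        self.slow, self.shigh = 0, sframes - 1    # resident page range
        self.flow, self.fhigh = 0, sframes - 1    # frames of SP_slow, SP_shigh
        self.victims = []                         # evicted pages, in order

    def frame_of(self, pagenum):
        # Step 3: constant-time frame lookup, no page table needed.
        return (self.flow + pagenum - self.slow) % self.sframes

    def access(self, pagenum):
        """Step 2: service a fault if needed, then return the frame index."""
        if pagenum > self.shigh:
            self._slide(pagenum, +1)              # step 4
        elif pagenum < self.slow:
            self._slide(pagenum, -1)              # step 5
        return self.frame_of(pagenum)

    def _slide(self, pagenum, step):
        if step > 0:
            victim, frame, expected = self.slow, self.flow, self.shigh + 1
        else:
            victim, frame, expected = self.shigh, self.fhigh, self.slow - 1
        # Assumption: only the page adjacent to the window can fault.
        if pagenum != expected or pagenum < 0:
            raise ValueError("fault on a non-adjacent stack page")
        self.victims.append(victim)               # (a)-(b) swap victim out
        self.frames[frame] = pagenum              # (c) swap requested page in
        self.slow += step                         # (d) page bounds do not wrap
        self.shigh += step
        self.flow = (self.flow + step) % self.sframes    # frame indices wrap
        self.fhigh = (self.fhigh + step) % self.sframes
```

With `MTStack(4)`, accessing pages 0 through 3 causes no fault, accessing page 4 evicts page 0 and places page 4 in frame 0, and a later access to page 5 evicts page 1. Python's `%` operator returns a non-negative result for a negative left operand, so it covers the wrap to (SFRAMES-1) in step 5(e) without a separate test.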
however, does not affect the memory access patterns of the data stack (see below for the definition of the data stack), which is the focus of this article. The next version of MT is being developed to study the memory demands made by distributing user-defined function space, by making functions first-class, and by adding closures and continuations.

Table 2: Page Fault Rates for the MT Stack

Table 3: Relative difference between FIFO and LRU for MT stack pages

lowest indexed to the highest indexed which are resident at the evaluator. It follows that this condition is invariant and allows us to conclude that all subsequent faults will leave stack memory in a similar state. Faulting on $SP_{low-1}$ means that the stack is shrinking and that the least recently used page is $SP_{high}$. This page is also the page that will not be accessed for the longest period of time. Thus, LRU is optimal for MT stack pages. Q.E.D.

be caused by attempting to access $SP_{slow-1}$ or $SP_{shigh+1}$. Faulting on $SP_{slow-1}$ means that MTSRA swaps out $SP_{shigh}$ and correctly updates the four variables needed by MTSRA to restore the invariant. By the inductive hypothesis, we know that LRU will have the same pages in memory. Since the memory access stream is the same for LRU and MTSRA, we can conclude that LRU would also fault on $SP_{slow-1}$. Faulting on $SP_{slow-1}$ means that the stack is shrinking. Given that activation records are accessed in a LIFO manner and that the stack page size is larger than any activation record, $SP_{shigh}$ is the page that has not been accessed for the longest period of time. This is the page LRU selects for eviction. MTSRA will swap out the page stored in frame fhigh, which by the inductive hypothesis is $SP_{shigh}$. An analogous analysis establishes that faulting on $SP_{shigh+1}$ will cause MTSRA to behave like LRU. Thus, we can conclude that MTSRA and LRU exhibit the same paging behavior. Q.E.D.