Ace: An Object-Oriented Language Which Supports Teamwork

Lachlan Patrick
Basser Department of Computer Science
University of Sydney
N.S.W. 2006
Australia
Email: loki@cs.usyd.edu.au

Topics: Programming Language, Object-Oriented Programming, Collaborative Teamwork

Abstract

Teamwork is an important strategy used in effective object-oriented programming. Objects provide a way for a team of programmers to break a large problem into pieces and solve each of these separately; however, the challenge then becomes to communicate and document the design effectively within that team. Ace is a new, high-level, object-oriented language currently being developed to support a design by contract methodology. The language features enforced indentation, concise syntax, strongly typed data structures such as arrays, hash tables and tuples, and a way to define variables by initialisation. The compiler produces documentation from source code and runs testing code as part of the compilation process. Ace should have advantages over other languages currently used in industry and teaching. This paper describes the key design principles of the language, and the reasons for those choices.

1 Introduction

Learning software engineering skills is important to a programmer's maturation, and effective teamwork, good communication, and concise, well tested, well documented programs are the fruits of such learning. In recent years there have been moves towards teaching object-oriented languages as the first programming language of tertiary educated programmers [4,9,12]. Institutions in Australia such as Monash University and the University of Sydney already teach languages such as Java [1,3] as the first language, using teamwork as a motivation for object-oriented design. Such moves bode well for the next generation of software engineers. However, within the languages and development tools used in these courses, there is often little explicit support for teamwork itself, such as sharing code, testing code integration or planning and designing in a team.
Programmers instead might use electronic mail to send files to each other, and have difficulties integrating their code into a complete product because they have not incrementally shared and tested their code. Surely, students learning to be good software engineers should be exposed to modern programming tools which support these processes?

A common technique when teaching software engineering is to expose students to a development environment which supports file transfers and group work, for example the version control system CVS [2], but this does not address the language-specific issues of how programmers learn to divide problems into smaller pieces and integrate the solutions. So while a file-sharing mechanism might form part of a good teaching system, a well designed language would complement and reinforce the programming styles we wish future software engineers to employ.

Ace is a language designed to directly support literate object-oriented programming in teams, and to teach software engineering skills to students of a tertiary education institution. An experimental compiler has been created, but no formal evaluation of its use in teaching has yet been attempted. This paper gives an overview of Ace, and describes design principles aimed at supporting teamwork as well as good high-level programming.

2 Design Principles

Ace was designed with a few guiding principles in mind. Most of these principles have an educational basis, aiming at creating a language which can introduce students to software engineering principles such as testing, teamwork, and designing robust and correct programs. The principles are described in this section, and their application to the language design is discussed in the following sections.

Principle 1: Simple and High-Level

The first design principle is that a language should be simple and high-level, which can be conflicting goals.
Simplicity for the language designer can often be anathema to the programmer. For example, LISP uses a single data type, the list, to the exclusion of all others, which is certainly simple but can lead to obscure and unreadable code. At the other extreme, C++ overburdens programmers with punctuation and keywords without adding significantly to the problem-solving ability of the language. Often there are historical rather than functional reasons for these designs, and their removal or redesign could improve the simplicity of the language without reducing its usefulness.

A language should be simple to facilitate learning and comprehension of code, and thus aid reuse of code. Concise, high-level code is also faster to produce, so programmers can solve real-world problems in less time. A high-level language should also have a greater variety of data structures than simply integers, strings and arrays. Other powerful data structuring constructs, such as classes, tuples and hash tables, are becoming standard inclusions in programming languages. They greatly increase the ease with which programmers can describe solutions to problems, which also helps learners develop good problem solving skills.

Principle 2: Correctness support

Correctness is very important when solving problems in industry, and when learning to write programs, and static type checking is the cornerstone of ensuring a program's correctness. Some languages, such as Python [13,14], dispense altogether with the idea of static type checking, in favour of dynamic type checking performed as the program is executing. This increases the speed with which programs can be written at the cost of complicating debugging efforts and, in the worst case, making it impossible to verify correctness. While dynamic type checking has its place, it is a high price to pay when the correctness of a program is of prime importance.
For learners it can also be confusing because variables might change type dynamically or unintentionally, leading to strange errors remote from the source of the problem.

Some languages go further than static type checking, and mandate pre- and postconditions as part of the correctness support of the language. A general purpose mechanism of assertion checking can be used to implement pre- and postconditions as well as loop invariants while keeping the language simple.

Principle 3: Teamwork support

Object-orientation is the primary way many languages support teamwork, but there are other ways to help programmers work together. Simple techniques such as facilitating a good programming style can help. Some languages, such as Blue [8] and Java, either enforce the inclusion of special documentation comments in programs, or provide tools which allow such comments to be readily converted into documentation. Python enforces indentation, which has payoffs in terms of readability, conciseness, and consistency of the coding style used by different members of a programming team.

Ace goes further than this by enabling the compiler to produce from a program's source an interface view of the code, as a programmer using the code might need to see it, including the documentation comments. Ace also allows special testing code to be executed during each compilation as a way of verifying that the design has not been violated during modifications. These features all follow from the principle that object-orientation alone does not solve all teamwork problems; rather, there are a number of language ideas which, when taken together, greatly increase the ease with which programmers can collaborate.

3 Language Details

Ace is an object-oriented language which supports strong static typing, single inheritance, garbage collection, dynamic dispatch and separate compilation.
Its novel feature set includes enforced indentation, high-level generic data structures in the language, and a compiler which produces documentation from the source code and runs testing code as part of the compilation process. This section describes the language and highlights the features which make it useful for teaching object-orientation and teamwork concepts. Not all details of the language can be discussed in the space available, but the important characteristics will give a good overall picture of the design.

3.1 General Language Structure

Classes and Modules

Program code in Ace is organised into classes or modules. A class is a named user-defined data structure, which supports the concepts of information hiding, inheritance and code reuse. A module is similar to a class, except only one instance of a module is ever created, at loading time. This provides a way of using functions and data which would be called global or static in another language.

Code is organised into blocks defined by indentation. Classes and modules are named in the outermost block, while data fields and functions are indented inside those blocks, and executable code indented further. The following example of an Ace class demonstrates how to define a class which contains two data fields, an integer and a string, and two functions, an initialiser and a routine to print one of the data fields.

    class Example    # an example class
        count: int
        name: string

        ## create a new instance
        init(s:string)
            count = size(s)
            name = s

        ## print the name
        display(): int
            if count > 0
                print(name)
            else
                print("No name.")
            return count

Fields are named and given a type, separated by a colon. The colon is used consistently as a means to define the type of what precedes it. Functions are declared using parentheses, which may include formal parameters, and may optionally be followed by a colon and a return type. The format is line based, so there is no need to separate statements with a semicolon or other punctuation.
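
For readers more familiar with Python, which this paper cites as another language with enforced indentation, the Example class can be approximated as follows. This is a Python sketch written for comparison only, not Ace and not output of the Ace compiler. The type annotations are optional hints in Python and are not checked when the program runs, whereas Ace checks the declared types statically.

```python
class Example:
    """An example class (Python approximation of the Ace listing)."""

    def __init__(self, s: str) -> None:
        # Ace's init(s:string); the two fields are created by assignment.
        self.count: int = len(s)
        self.name: str = s

    def display(self) -> int:
        """Print the name and return the stored count."""
        if self.count > 0:
            print(self.name)
        else:
            print("No name.")
        return self.count


shown = Example("Ace").display()
```

The Python version needs a colon after each block header and an explicit self on every field access, which the Ace listing does without.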

Indentation is used for lexical structure. When used consistently, indentation has been shown to improve comprehension of program code [7,10]. The enforced use of indentation to provide block structure is a simple and convenient technique which prevents some problems common to even experienced programmers, such as the "dangling else" problem, since indentation and meaning are synchronised. Some languages use keywords like loop ... end loop to signify the start and end of a block, but these keywords can be misunderstood by learners. For instance, end loop could be taken to mean stop looping, whereas the language designers might intend it to mean go back to the loop test. Keywords, if poorly chosen, can mislead the beginner and slow the expert. Punctuation, although adding redundant cues about block structure which might help the beginner, can also lead learners astray. The use of the semicolon in Pascal [15] as a statement separator, rather than a terminator, is one notorious example [6].

It can be seen from the above example that Ace code is reasonably low on punctuation compared with an industrial language such as C++, while retaining some of the same syntactic conventions, such as function calls, the equals sign for assignment and so on. Another point to note is that, like Java, Ace has two kinds of comments: normal comments, which begin with a single hash symbol, and documentation comments, which begin with two hash symbols. Both forms of comment continue only to the end of the line. The compiler can treat the two forms differently, as discussed later.

Variables

Ace borrows from Newsqueak [11] and Limbo [5] the idea of declaring and initialising variables by usage. In Ace, a variable can be defined and/or initialised in a variety of ways:

    name : string
    name : string = "Rachel"
    name := "Rachel"

The colon operator is used to define the type of the variable, for static type checking. The equals sign is the assignment operator.
Note that := is not the same as the Pascal assignment operator; rather, it is two operators, definition followed by assignment. Thus, name := "Rachel" is a way of defining name implicitly to be a string, and initialising it to the string "Rachel". Given the following definition of a function readstring

    readstring(): string
        # code omitted

the following lines are also equivalent and define name to be a string:

    name : string = readstring()
    name := readstring()

These declarations and assignments can occur anywhere within a function, so variables can be declared close to where they are used, similar to Java and C++ practice. Because variables can be declared by usage, Ace code is often more concise than the equivalent code in those other languages. Type names can often be omitted, except when declaring functions or class fields. This makes Ace code almost as concise as Python, while providing static type checking and better error handling. This fits neatly with design principles 1 and 2, that code should be high level and simple but also provide static type checking.

The compiler helps to reduce errors by enforcing another rule: local variables, function parameters, class fields and function names must be unique within their scope. For instance, the following will not compile:

    class Person
        name: string
        age: int

        init(Name:string, Age:int)
            name = Name
            age = Age

        print_age()
            name := "Age is " + string(age)
            print(name)

The compiler will complain that the local variable name in the function print_age masks the class field name (remember that the colon defines a new variable). Similarly, if a parameter or function were called name, that would also be an error. Of course, a local variable or parameter within one function can have the same name as one within another function; it is when scopes are nested that masking problems occur.

Ace includes a number of high-level data types, besides numeric types and strings, including tuples, arrays and hash tables.
These are all strongly typed generic classes of object. The following declarations define variables of those types:

    tuple1 : (int,string)
    array1 : [real]
    table1 : [string -> real]

The variable tuple1 is a structure which contains one integer and one string (tuples store disparate kinds of objects). The variable array1 is a dynamic array of real numbers, while table1 is a hash table where the keys are strings, and the values are real numbers. The arrow notation is used to signify the hash look-up relationship. Parentheses are used when defining tuples, while brackets are used in array and hash table definitions. These notations were chosen to represent the conceptual differences between a fixed size data structure such as a tuple, and a data structure which dynamically grows, as dynamic arrays and hash tables do. Note that there is no need for arrays to use different brackets from hash tables, since the arrow notation reveals this difference. Like other variables, these collection types can be defined by usage, as in:

    tuple2 := (5, "Hello")
    array2 := [3.14159, 2.718281828, -1.0]
    table2 := ["Pi" -> 3.14159, "e" -> 2.718281828]

The above variables have the same types as the corresponding variables previously defined. Defining by usage makes these data types easy to use and helps the language remain concise and high level (principle 1). Ace uses an infix method of defining these types, which allows complex types to be defined quite easily:

    complex : ([int],boolean,[string -> real])

In the above, complex is defined as a tuple containing three kinds of object: an array of integers, a single boolean value, and a hash table where the keys are strings and the values are real numbers. Brackets are also used to access elements within these objects, for example:

    tuple2[1] == 5
    array2[2] == 2.718281828
    table2["Pi"] == 3.14159

Ace uses braces to allow for generic collection classes.
The above notations for tuples, arrays and hash tables are convenient special cases of generic classes, and are equivalent to the following:

    tuple3 : tuple{int,string}
    array3 : array{real}
    table3 : table{string,real}

A generic stack class might be defined thus:

    class stack{TYPE}
        data: [TYPE] = []

        push(thing: TYPE)
            data.append(thing)

        pop(): TYPE
            last := size(data)
            return data.remove(last)

This generic class implements a stack using an array of the given type of object. Arrays have operations such as appending a new element to the end, or removing a particular element from the array. The stack class defines two operations, push and pop, which work by manipulating the data array. All of these operations are parameterised by the given type. This generic class might be used thus:

    stack1 : stack{int}
    stack1.push(7)
    stack1.push(-4)
    i := stack1.pop()
    stack2 : stack{Person}
    stack2.push(Person("Rachel",27))

At the end of this sequence of instructions, stack1 holds one integer, 7, and i holds the integer -4, while stack2 holds one object of type Person, which was created by the constructor for that class. Tuples, arrays and hash tables have their own notation because they are so commonly used, and because they form the basis for building other generic classes. They make learning and solving problems with Ace much simpler, in accordance with the first design principle.

Functions

In Ace, references to functions can be stored into variables and later called. Functions are always bound to the object to which they belong, so this does not violate the object orientation principle. For example, suppose we define the variable compare as:

    compare : function(s:string, t:string): int

This defines a variable which can hold a reference to a function. That function accepts two string parameters and returns an integer.
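
The generic stack class defined earlier in this section has a rough counterpart in Python's typing module. The sketch below is a Python approximation for comparison, not Ace; the names Stack and T are chosen here for illustration. Python itself does not enforce the type parameter, so a separate static checker such as mypy would be needed to reject, say, pushing a string onto a stack of integers, whereas the paper describes Ace as checking this in the compiler.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # plays the role of TYPE in the Ace listing


class Stack(Generic[T]):
    """A stack backed by a dynamic array of elements of type T."""

    def __init__(self) -> None:
        self.data: List[T] = []

    def push(self, thing: T) -> None:
        self.data.append(thing)

    def pop(self) -> T:
        # Remove and return the last element, as the Ace pop() does.
        return self.data.pop()


# Same sequence of operations as the Ace usage example.
stack1: Stack[int] = Stack()
stack1.push(7)
stack1.push(-4)
i = stack1.pop()
```

After these statements stack1 holds the single integer 7 and i holds -4, matching the outcome described for the Ace version.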
Suppose we define a class which contains such a function:

    class StringSorter
        compare_strings(s:string, t:string): int
            # code omitted

Given an instance of this class, we can refer to that function as if it were a field:

    sorter := StringSorter()
    compare = sorter.compare_strings

The function can be called through the variable by using normal function call syntax:

    result := compare("First", "Second")

If the compare_strings function used any instance data from its parent class, this would still work correctly, because the variable stores a reference not only to the function but to the object to which it belongs. There are many important kinds of problems for which function references are an elegant solution, including sorting, pattern matching and evaluating expressions. The inclusion of these strongly typed function references is therefore a useful addition to the language, making it high level while retaining its support for correctness, as principles 1 and 2 state.

3.2 Object Orientation

The Object Model

Ace uses object orientation to allow large projects to be constructed, to facilitate code reuse and to allow teamwork. Code can be present in either classes or modules. Classes can have private and public data or functions, to support encapsulation. Instances of a class are passed by reference into functions, so they can be modified. Modules allow code which is not tied to a single instance of an object to be used. An example module might be:

    module string    ## string operations
        tolower(s:string): string
            # code omitted
        toupper(s:string): string
            # code omitted

The functions tolower and toupper can be used thus:

    import "string"
    name := string.tolower("Andrew")
    print(name)    # prints "andrew"

Modules can also have public and private data and functions, but only one copy of the module is ever loaded, so they are essentially global (although still only available within a named scope).
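
The function references described under Functions in Section 3.1 behave much like bound methods in Python, which may help readers picture how a reference can carry its object with it. The following is a Python sketch, not Ace. The ignore_case field is a hypothetical piece of instance data added here so that the effect of the binding is visible, and the body of compare_strings is invented for illustration, since the paper omits it.

```python
from typing import Callable


class StringSorter:
    def __init__(self, ignore_case: bool = False) -> None:
        # Hypothetical instance data consulted by the comparison.
        self.ignore_case = ignore_case

    def compare_strings(self, s: str, t: str) -> int:
        """Return -1, 0 or 1 as s sorts before, equal to, or after t."""
        if self.ignore_case:
            s, t = s.lower(), t.lower()
        return (s > t) - (s < t)


sorter = StringSorter(ignore_case=True)

# The variable holds both the function and the object it belongs to.
compare: Callable[[str, str], int] = sorter.compare_strings
result = compare("First", "Second")
```

Because compare remembers sorter, the call still sees ignore_case even though sorter is not mentioned at the call site.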
The Java language allows global data and functions to be defined within classes using the static keyword, but this approach was dismissed in Ace because it presents to learners an unnecessarily complicated conceptual model of a class. By separating global and instance data a simpler model can be presented which has the same power, which design principle 1 favours. This can also lead to cleaner program designs, which facilitates teamwork (principle 3).

Information Hiding and Inheritance

Ace supports the notions of information hiding and inheritance. Data and functions are normally publicly available, but can be hidden within a class or module by using the private keyword, for example:

    class RoboticProbe    ## a Robotic Probe class
        private frequency: real
        name: string      # public name

        ## start transmission
        set_frequency(f:real)
            frequency = f

The frequency variable will only be accessible within this class, or its child classes. Ace allows a form of single inheritance, with a simple syntax which uses a colon to specify the parent-child relationship:

    class MarsLander: RoboticProbe
        water: int

        locate(): real
            return frequency

Here, MarsLander inherits from the RoboticProbe class, adding new implementation fields and functions. Because the locate function is not private, it can be accessed by any class using MarsLander, and because MarsLander is a RoboticProbe, locate can access the private frequency variable.

Multiple inheritance is not defined within Ace because of the scope complications this introduces. Java has a nice solution to this problem, by allowing a class to inherit the definitions of functions from many places, but an implementation from only one class. Classes which only define functions but no implementation have been termed interface classes (not to be confused with the interface view of a class mentioned above). That kind of inheritance has not been attempted in Ace because of principle 1, that the language should be simple and high-level.
In theory, a compiler should be able to recognise that a class implements the same set of functions as an interface class without the programmer having to inform the compiler of that fact. In Ace, it should be possible to simply add to a class all the functions required for a particular task, without having to use any form of multiple inheritance to check the usage of those functions.

3.3 Teamwork

Dynamic Linking

All linking to objects in Ace is achieved by linking to the public interface of the class or module, not to the implementation. Such dynamic linking avoids problems which occur often in languages such as C++, where reordering a private variable within a class can force a complete recompilation of all code which uses that class, despite the fact that the variable was private. Linking to the interface allows truly separate compilation, which speeds teamwork by allowing programmers to work concurrently without worrying about dependencies between code being violated. This is in accordance with principle 3.

Enforced Indentation

Enforcing indentation may seem harsh to programmers who like to space their code in their own personal style, but there are many pay-offs, particularly in teamwork situations. Indentation allows a reduction in punctuation and a simpler, cleaner style of coding. It is easier to see what the code does without having the "noise" of braces around blocks. Indenting is fast and doesn't require balancing of braces when using several nested loops or if-statements. From a learner's point of view, it helps beginners to learn good indentation habits. From a team's point of view, it means code will be in a fixed format across all team members' work, which supports such concepts as "ego-less programming" where all team members own and are responsible for all code. Indentation also allows the compiler to easily generate the interface view of a class or module, which is very useful in team projects.
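
To illustrate why indentation makes an interface view easy to derive, the following Python sketch filters Ace-like source text by indentation depth. It is a toy written for this explanation and is not the Ace compiler's algorithm. It assumes four-space indentation, a private keyword written at the start of a declaration line, and no hash characters inside string literals; the names interface_view and strip_plain_comment are invented here.

```python
def strip_plain_comment(line: str) -> str:
    """Drop a trailing '# ...' comment but keep '## ...' documentation."""
    pos = line.find("#")
    if pos == -1 or line.startswith("##", pos):
        return line.rstrip()
    return line[:pos].rstrip()


def interface_view(source: str, step: int = 4) -> str:
    """Keep the header and the public declarations one level inside it."""
    kept = []
    for raw in source.splitlines():
        line = strip_plain_comment(raw)
        if not line.strip():
            continue  # blank line, or a line holding only a plain comment
        indent = len(line) - len(line.lstrip(" "))
        if indent > step:
            continue  # deeper indentation means executable code
        if line.split()[0] == "private":
            continue  # hidden from users of the class
        kept.append(line)
    return "\n".join(kept)


SAMPLE = """\
class RoboticProbe  ## a Robotic Probe class
    private frequency: real
    name: string  # public name
    ## start transmission
    set_frequency(f:real)
        frequency = f
"""

view = interface_view(SAMPLE)
print(view)
```

Because block depth is visible on every line, no parsing of braces or end keywords is needed to separate declarations from bodies; a real compiler would of course work from its parse tree rather than from raw text.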

Interface Views

It is useful to be able to share an external or interface view of a class with a colleague when working in a team to produce a program. The interface view of the RoboticProbe class given earlier would be:

    class RoboticProbe    ## a Robotic Probe interface
        name: string

        ## start transmission
        set_frequency(f:real)

The compiler can automatically generate this view of the class. Note that it only shows public fields and functions, and documentation comments, since this is all that should be visible from outside the class. This produces a usable piece of code which can be easily fleshed out to a final working product. For instance, a team might design a project first at this interface level, then distribute these small pieces of code to the team members, who add actual lines of code. It is easy to verify that the design has not been violated by comparing the interface views to the original design as programming proceeds. The interface view is also useful in specifying a design, and then producing several different implementations of the same class to compare performance.

Testing Code

There is a way to write a class in Ace which tests another class at compile-time.

    class MarsTester tests MarsLander
        test(): boolean
            m := MarsLander()
            f := 83.6
            m.set_frequency(f)
            if m.locate() != f
                return false
            return true

When this class is compiled, the test routine is executed, and if the result is false (or any other error occurs) the compiler informs the programmer how the test failed. This is useful for checking that the behaviour of a class remains consistent through many revisions of the source code, which is particularly valuable in teamwork (principle 3).

4 Implementation

An experimental Ace compiler has been developed which implements the functionality discussed in this paper. Currently, the compiler outputs Java source code, which is then compiled by a Java compiler into executable byte code. This approach has a number of advantages.
The code is verified not only by the Ace compiler, but by the Java compiler as well. Programs are kept in the very high level Ace language, so changes in the specification of Java won't require changes to program source code, only to the compiler. Ace has a few features Java does not, such as generic classes. The compiler can produce correct casting to and from the Java collection classes without requiring the creation of a "wrapper" class, which increases the speed of the final program and avoids the introduction of a lot of extra classes into a project.

There are a few disadvantages in producing Java source code as an intermediate step. Compilation is slower than it would be to directly produce byte code (although this is offset by run-time speed, since Java compilers produce highly optimised byte code). Also, the Ace compiler must use a command-line Java compiler to operate, which rules out many of the integrated Java environments which exist.

Future work would involve directly producing Java byte code from Ace source code, and adding the ability to link existing Java code into Ace projects without having to rewrite everything as Ace. Another project is to add a learning environment to Ace to help beginners start programming.

5 Conclusion

Many existing languages do not support the same range of features as Ace. Even Java does not provide for generic classes, and Python, which shares many similarities with Ace, is more error prone since it lacks static type checking. The concepts used in the design of Ace are not new; rather it is the combination of concepts which is unique. It is hoped this language could be used as a good introduction to software engineering for students working in teams in a tertiary education institution. With a basic implementation completed, more work and evaluation of the system is needed to continue development of this new language.

References

1. K. Arnold, J. Gosling, The Java™ Programming Language, Addison-Wesley, 1996.
2. P. Cederqvist et al., Version Management with CVS, available from http://www.cyclic.com
3. D. Clark, C. MacNish, G.F. Royle, Java as a Teaching Language - opportunities, pitfalls and solutions, 3rd Australasian Conference on Computer Science Education, ACM, 1998.
4. R. Decker, St. Hirshfield, Top-Down Teaching: Object-Oriented Programming in CS 1, SIGCSE, pp. 270-273, ACM, 1993.
5. S. Dorward, R. Pike, D. Presotto, D.M. Ritchie, H.W. Trickey, P. Winterbottom, The Inferno Operating System, Bell Labs Technical Journal, 2, 1, Winter, 1997.
6. B.W. Kernighan, Why Pascal is Not My Favorite Programming Language, Bell Labs Comp. Sci. Tech. Rep. No. 100, July 1981.
7. T.E. Kesler, R.B. Uram, F. Magareh-Abed, A. Fritzsche, C. Amport and H.E. Dunsmore, The Effect of Indentation on Program Comprehension, pp. 415-428, International Journal of Man-Machine Studies 21, 1984.
8. M. Kölling, J. Rosenberg, Blue - A Language for Teaching Object-Oriented Programming, 27th SIGCSE Technical Symposium, pp. 190-194, ACM, 1996.
9. M. Kölling, J. Rosenberg, An Object-Oriented Program Development Environment for the First Programming Course, 27th SIGCSE Technical Symposium, pp. 83-87, ACM, 1996.
10. R.J. Miara, J.A. Musselman, J.A. Navarro and B. Schneiderman, Program Indentation and Comprehensibility, pp. 861-867, Communications of the ACM 26, 1983.
11. R. Pike, Newsqueak: A Language for Communicating with Mice, Bell Labs Comp. Sci. Tech. Rep. No. 143, March 1989.
12. R.J. Reid, The Object-Oriented Paradigm in CS 1, SIGCSE, pp. 265-269, ACM, 1993.
13. A.R. Watters, G. van Rossum, J. Ahlstrom, Internet Programming with Python, MIS Press/Henry Holt, 1996.
14. A.R. Watters, The What, Why, Who, and Where of Python, Tutorial Article No. 005, UnixWorld Online, 1995.
15. N. Wirth, The Programming Language Pascal, Acta Informatica, 1, pp. 35-63, June 1971.