Abstract:
We present an algorithmic method for the quantitative, performance-aware synthesis of concurrent programs. The input consists of a nondeterministic partial program and of a parametric performance model. The nondeterminism allows the programmer to omit which (if any) synchronization construct is used at a particular program location. The performance model, specified as a weighted automaton, can capture system architectures by assigning different costs to actions such as locking, context switching, and memory and cache accesses. The quantitative synthesis problem is to automatically resolve the nondeterminism of the partial program so that both correctness is guaranteed and performance is optimal. As is standard for shared memory concurrency, correctness is formalized "specification free", in particular as race freedom or deadlock freedom. For worst-case (average-case) performance, we show that the problem can be reduced to 2-player graph games (with probabilistic transitions) with quantitative objectives. While we show, using game-theoretic methods, that the synthesis problem is NEXP-complete, we present an algorithmic method and an implementation that works efficiently for concurrent programs and performance models of practical interest. We have implemented a prototype tool and used it to synthesize finite-state concurrent programs that exhibit different programming patterns, for several performance models representing different architectures.

Abstract:
Predicate abstraction is a key enabling technology for applying finite-state model checkers to programs written in mainstream languages. It has been used very successfully for debugging sequential system-level C code. Although model checking was originally designed for analyzing concurrent systems, there is little evidence of fruitful applications of predicate abstraction to shared-variable concurrent software. The goal of this paper is to close this gap. We have developed a symmetry-aware predicate abstraction strategy: it takes into account the replicated structure of C programs that consist of many threads executing the same procedure, and generates a Boolean program template whose multi-threaded execution soundly overapproximates the concurrent C program. State explosion during model checking parallel instantiations of this template can now be absorbed by exploiting symmetry. We have implemented our method in the SATABS predicate abstraction framework, and demonstrate its superior performance over alternative approaches on a large range of synchronization programs.

Abstract:
In shared-memory concurrent programming, shared resources can be protected using synchronization mechanisms such as monitors or channels. The connection between these mechanisms and the resources they protect is, however, only given implicitly; this makes it difficult both for programmers to apply the mechanisms correctly and for compilers to check that resources are properly protected. This paper presents a mechanism to automatically check that shared memory is accessed properly, using a methodology called shared ownership. In contrast to traditional ownership, shared ownership offers more flexibility by permitting multiple owners of a resource. On the basis of this methodology, we define an abstract model of resource access that provides operations to manage data dependencies, as well as sharing and transfer of access privileges. The model is rigorously defined using a formal semantics, and shown to be free from data races. This property can be used to detect unsafe memory accesses when simulating the model together with the execution of a program. The expressiveness and efficiency of the approach is demonstrated on a variety of programs using common synchronization mechanisms.

Abstract:
Synchronization operations make a huge expense for concurrent Java programs. This paper proposes an effective and precise static analysis algorithm for the redundant synchronization removal. The algorithm consists of two phases-basic analysis and inter-thread temporal analysis. Both phases take the effect of control flow relation and thread control relation into count. This paper also constructs a Java compiler-JTool and implements the algorithm on it. To deterministic single-threaded programs, the removal ratio reaches 100% and to multi-threaded programs, the removal ratio is higher than the existing analysis tools.

Abstract:
This note recapitulates an algorithmic observation for ordered Depth-First Search (DFS) in directed graphs that immediately leads to a parallel algorithm with linear speed-up for a range of processors for non-sparse graphs. The note extends the approach to ordered Breadth-First Search (BFS). With $p$ processors, both DFS and BFS algorithms run in $O(m/p+n)$ time steps on a shared-memory parallel machine allowing concurrent reading of locations, e.g., a CREW PRAM, and have linear speed-up for $p\leq m/n$. Both algorithms need $n$ synchronization steps.

Abstract:
We present a \emph{pairwise normal form} for finite-state shared memory concurrent programs: all variables are shared between exactly two processes, and the guards on transitions are conjunctions of conditions over this pairwise shared state. This representation has been used to efficiently (in polynomial time) synthesize and model-check correctness properties of concurrent programs. Our main result is that any finite state concurrent program can be transformed into pairwise normal form. Specifically, if $Q$ is an arbitrary finite-state shared memory concurrent program, then there exists a finite-state shared memory concurrent program $P$ expressed in pairwise normal form such that $P$ is strongly bisimilar to $Q$. Our result is constructive: we give an algorithm for producing $P$, given $Q$.

Abstract:
A critical component in the implementation of a concurrent tabling system is the design of the table space. One of the most successful proposals for representing tables is based on a two-level trie data structure, where one trie level stores the tabled subgoal calls and the other stores the computed answers. In this work, we present a simple and efficient lock-free design where both levels of the tries can be shared among threads in a concurrent environment. To implement lock-freedom we took advantage of the CAS atomic instruction that nowadays can be widely found on many common architectures. CAS reduces the granularity of the synchronization when threads access concurrent areas, but still suffers from low-level problems such as false sharing or cache memory side-effects. In order to be as effective as possible in the concurrent search and insert operations over the table space data structures, we based our design on a hash trie data structure in such a way that it minimizes potential low-level synchronization problems by dispersing as much as possible the concurrent areas. Experimental results in the Yap Prolog system show that our new lock-free hash trie design can effectively reduce the execution time and scale better than previous designs.

Abstract:
Asynchronous programming is a ubiquitous systems programming idiom to manage concurrent interactions with the environment. In this style, instead of waiting for time-consuming operations to complete, the programmer makes a non-blocking call to the operation and posts a callback task to a task buffer that is executed later when the time-consuming operation completes. A co-operative scheduler mediates the interaction by picking and executing callback tasks from the task buffer to completion (and these callbacks can post further callbacks to be executed later). Writing correct asynchronous programs is hard because the use of callbacks, while efficient, obscures program control flow. We provide a formal model underlying asynchronous programs and study verification problems for this model. We show that the safety verification problem for finite-data asynchronous programs is expspace-complete. We show that liveness verification for finite-data asynchronous programs is decidable and polynomial-time equivalent to Petri Net reachability. Decidability is not obvious, since even if the data is finite-state, asynchronous programs constitute infinite-state transition systems: both the program stack and the task buffer of pending asynchronous calls can be potentially unbounded. Our main technical construction is a polynomial-time semantics-preserving reduction from asynchronous programs to Petri Nets and conversely. The reduction allows the use of algorithmic techniques on Petri Nets to the verification of asynchronous programs. We also study several extensions to the basic models of asynchronous programs that are inspired by additional capabilities provided by implementations of asynchronous libraries, and classify the decidability and undecidability of verification questions on these extensions.

Abstract:
We study the problem of automatically computing the time complexity of concurrent object-oriented programs. To determine this complexity we use intermediate abstract descriptions that record relevant information for the time analysis (cost of statements, creations of objects, and concurrent operations), called behavioural types. Then, we define a translation function that takes behavioural types and makes the parallelism explicit into so-called cost equations, which are fed to an automatic off-the-shelf solver for obtaining the time complexity.

Abstract:
The semantics of assignment and mutual exclusion in concurrent and multi-core/multi-processor systems is presented with attention to low level architectural features in an attempt to make the presentation realistic. Recursive functions on event sequences are used to define state dependent functions and variables in ordinary (non-formal-method) algebra.