C,
which indicates that we might get an E that is not consistent
with P. If the program order (P edges) of all the critical
cycles is enforced by delays, we would get an E of C that
is sequentially consistent. We refer to such a delay relation as
D. uDv indicates that u must complete before v is issued. The
following lemma formally describes the behavior of memory
access pairs in a delay relation:
Delay Lemma [1]. For any execution, E should be
consistent with D.
Further, if the delay relation D contains all P edges of
all the critical cycles in P
[Figure 11 plot. Y-axis: Normalized Detection Time. X-axis benchmarks: water-ns, barnes, fmm, ocean, water-sp, fft, cholesky, lu, radix. Legend: No_PVSC Detection, Lock-set, Hybrid.]
Figure 11: Cost of PVSC detection over race detection.
Table II: False positives of PVSC detection scheme in 10 SPLASH-2
benchmarks. LS is short for lock-set, Hbr for hybrid, and HB for
happens-before. We determine each false positive by manually
examining the source code.
Prog.      LS    Hbr   HB
MySQL      408   272   160
Apache     184   106   94
water-ns   81    50    0
ocean      573   124   0
water-sp   42    37    0
fft        73    73    0
barnes     116   64    25
fmm        444   178   6
raytrace   36    23    1
cholesky   140   56    8
lu         22    12    0
radix      22    15    2
detector other than Helgrind is used, the additional overhead
would still be low.
Note that the PVSC detection overhead with lock-set is
slightly higher than with the hybrid race detection algorithm.
The reason is that lock-set reports more races yet has a lower
race detection overhead of its own. Thus, PVSC detection
with lock-set contributes a larger percentage of the overall
detection time than it does with the hybrid scheme. Because
our approximate implementation of the happens-before
algorithm could distort the execution time, we do not report
its PVSC detection overhead. A full implementation is left as
future work; we expect its PVSC detection overhead to be
similar, since the additional cost is still quite low.
D. False Positives
Table II shows the number of false positives introduced
by different race detection techniques on the SPLASH-2
benchmarks and the two real applications. As we can
see, the happens-before scheme inserts significantly fewer
unnecessary fences than the lock-set scheme for the
SPLASH-2 programs. There are two reasons. First, the lock-set
algorithm generally suffers from more false positives; as a
result, the corresponding PVSC detector finds more false
PVSC bugs. Second, the SPLASH-2 benchmarks make heavy
use of barrier synchronization, which the lock-set algorithm
cannot handle, worsening the situation. However, for MySQL
and Apache, the disparity in false positives is rather small,
since they mostly use lock synchronization, which both
techniques handle well.
Another cause of false positives for both techniques comes
hash_delete(HASH *h,byte *r) {
  blength=h->blength;
  data=&h->array.buf;
  pos=data+hash_mask(r);
  gpos=0;
  while (pos->data != r) {
    gpos=pos;
    pos=data+pos->next;
  }
  if(--(h->records)<
     h->blength>>1)
    h->blength>>=1;
  lastpos=data+h->records;
  empty=pos;
  empty_idx=(uint) (empty-data);
  if (gpos) { tmp = pos->next;
    gpos->next=tmp;}
  else {
    empty_idx=pos->next;
    empty=data+empty_idx;
    tmp = empty->data;
    pos->data=tmp;
    pos->next=tmp;
  }
  tmp = empty->next;
  if (array->elements) {
    --h->array->elements;
    return (h->array.buf +
            h->array.elements *
            h->array.size)
  }
  return 0;
}//end hash_delete
Thread 1
pthread_mutex_lock(L1);
hash_delete(&table1,r1);
pthread_mutex_unlock(L1);
Thread 2
pthread_mutex_lock(L2);
hash_delete(&table2, r2);
pthread_mutex_unlock(L2);
Legend: program order edge; value: conflict access; delay and fence.
(a) (b)
Figure 12: (a) hash_delete code from MySQL 5.0.2/hash.c and (b) the threads that execute hash_delete. With a conservative compiler analysis, at least 20
fences would be inserted. Our scheme does not insert any fences since no race cycle is found.
from the imprecision of our delay computation algorithm,
which is discussed in Section III-D. For example, among
the 30 fences inserted by the happens-before algorithm in
barnes, 25 are redundant for this reason. Although the
number of false positives listed seems rather high, the
performance loss is tolerable even for lock-set. This is
because the dynamic fence count is low compared to the
total memory access count, so the impact of dynamic fences
on overall execution time is small percentage-wise. This
result indicates that it is still fairly affordable to insert all
the fences generated by our scheme without further
optimization.
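The barrier-related false positives described above can be illustrated with a minimal sketch of lock-set refinement in the style of Eraser [13]. This is not the detector used in the paper: the names shadow_t, on_access, barrier_case_reported and lock_case_reported are hypothetical, locks are modeled as bits in a mask, and Eraser's full state machine is reduced to a single "shared" flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per lock. */
typedef uint32_t lockset_t;
#define ALL_LOCKS ((lockset_t)~(lockset_t)0)

/* Shadow state kept for one shared variable v. */
typedef struct {
    lockset_t candidates; /* C(v): locks held on every shared access so far */
    int first_thread;     /* -1 until the first access */
    bool shared;          /* true once a second thread has accessed v */
    bool reported;        /* true once C(v) has become empty */
} shadow_t;

static void shadow_init(shadow_t *s) {
    s->candidates = ALL_LOCKS;
    s->first_thread = -1;
    s->shared = false;
    s->reported = false;
}

/* Record one access by thread `tid`, which currently holds `held`. */
static void on_access(shadow_t *s, int tid, lockset_t held) {
    if (s->first_thread < 0)
        s->first_thread = tid;
    else if (tid != s->first_thread)
        s->shared = true;
    if (!s->shared)
        return;                 /* still thread-local: no refinement */
    s->candidates &= held;      /* intersect with the locks held now */
    if (s->candidates == 0)
        s->reported = true;     /* no common lock: report a race */
}

/* Two accesses ordered by a barrier, with no locks held. */
bool barrier_case_reported(void) {
    shadow_t v;
    shadow_init(&v);
    on_access(&v, 1, 0);  /* thread 1 writes v before the barrier */
    /* The barrier itself is invisible to lock-set analysis. */
    on_access(&v, 2, 0);  /* thread 2 reads v after the barrier */
    return v.reported;
}

/* Two accesses protected by the same lock. */
bool lock_case_reported(void) {
    const lockset_t L1 = 1u << 0;
    shadow_t v;
    shadow_init(&v);
    on_access(&v, 1, L1);
    on_access(&v, 2, L1);
    return v.reported;
}
```

In this sketch barrier_case_reported() returns true even though the barrier orders the two accesses, while lock_case_reported() returns false. A happens-before detector that models the barrier would not report the first case, which is consistent with the zero HB counts for several benchmarks in Table II.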
V. RELATED WORK
Language Memory Model. Emerging language-level
memory models, such as the Java Memory Model [39] and
the C++ Concurrency Model [26], suggest that programmers
use volatiles or atomics instead of explicit fences to
impose ordering. These models, however, are still under
development [40] and are not yet supported by most
compilers. Further, even if they become widely adopted, our
tool could still help programmers identify data variables
that should be marked as volatile or atomic, e.g., by
marking the variables accessed in race cycles as volatile.
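As a rough illustration of marking the variables in a race cycle as atomic, the sketch below uses C11 <stdatomic.h> and POSIX threads on a Dekker-style pair of flags. C11 atomics are an assumption made for this example (the paper itself discusses Java volatiles and the then-draft C++ model), and the names x, y, thread1, thread2 and count_non_sc_outcomes are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared flags marked atomic. Plain atomic_store/atomic_load
   default to memory_order_seq_cst in C11. */
static atomic_int x, y;
static int r1, r2; /* written by one thread each, read after join */

static void *thread1(void *arg) {
    (void)arg;
    atomic_store(&x, 1);      /* write x ... */
    r1 = atomic_load(&y);     /* ... then read y */
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store(&y, 1);      /* write y ... */
    r2 = atomic_load(&x);     /* ... then read x */
    return NULL;
}

/* Runs the two threads `trials` times and counts the outcome
   r1 == 0 && r2 == 0, which no sequentially consistent execution
   can produce. Returns -1 if a thread cannot be created. */
int count_non_sc_outcomes(int trials) {
    int bad = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&a, NULL, thread1, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, thread2, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            bad++;
    }
    return bad;
}
```

Because both variables are sequentially consistent atomics, the count should stay at zero. If x and y were plain ints, the program would contain data races and the store-then-load pairs could be reordered by the compiler or the hardware.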
Data Race Detection. Previous work on data race detection
can be divided into dynamic and static approaches. Dynamic
detection includes lock-set [7], [13], [14], happens-before
[9], [10], [12], and hybrid schemes that use both [6], [8],
[15], [20]. Some work gives special consideration to weak
memory models [19]. Static data race detection techniques
generally require type-safe systems [16], [17]. Tools have
also been developed to classify data races [11], [21].
However, as discussed in Section I, none of these data race
detectors directly helps with PVSC detection and elimination.
Verification. Verification tools [25], [24] aim at inserting
fence instructions accurately. These tools take the concurrent
program and a relaxed memory consistency model, e.g.,
TSO [24], as inputs, then enumerate all possible execution
patterns and simulate them according to the memory
consistency model. Fences can be inserted according to the
executions that lead to non-SC results. Verification tools work
well for relatively small applications that involve a small
number of memory accesses. However, even with some proposed
optimization techniques [24], they still cannot handle large
applications with many shared-memory accesses.
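The enumerate-and-simulate idea can be sketched on a two-thread store-buffering test (thread 0: x = 1; r0 = y and thread 1: y = 1; r1 = x). The model below is a deliberately simplified assumption, not the algorithm of [24] or [25]: each thread has a one-entry store buffer that may be flushed at any later step, and a fence is modeled as forbidding the load until the buffer is flushed. The names state_t, explore and outcome_reachable are hypothetical.

```c
#include <stdbool.h>

/* Thread t stores 1 to variable t, then loads variable 1 - t. */
typedef struct {
    int mem[2];       /* shared memory: x is mem[0], y is mem[1] */
    bool stored[2];   /* thread t has issued its store */
    bool flushed[2];  /* thread t's store is visible in memory */
    bool loaded[2];   /* thread t has executed its load */
    int reg[2];       /* value loaded by thread t */
} state_t;

/* Explore every interleaving from state s and mark final outcomes. */
static void explore(state_t s, bool store_buffer, bool fenced,
                    bool seen[2][2]) {
    bool done = true;
    for (int t = 0; t < 2; t++) {
        if (!s.stored[t]) {
            state_t n = s;
            n.stored[t] = true;
            if (!store_buffer) {   /* SC: the store is visible at once */
                n.mem[t] = 1;
                n.flushed[t] = true;
            }
            explore(n, store_buffer, fenced, seen);
            done = false;
            continue;              /* program order: store comes first */
        }
        if (!s.flushed[t]) {       /* drain the store buffer */
            state_t n = s;
            n.mem[t] = 1;
            n.flushed[t] = true;
            explore(n, store_buffer, fenced, seen);
            done = false;
        }
        /* With a fence, the load must wait for the flush. */
        if (!s.loaded[t] && (s.flushed[t] || !fenced)) {
            state_t n = s;
            n.reg[t] = s.mem[1 - t];
            n.loaded[t] = true;
            explore(n, store_buffer, fenced, seen);
            done = false;
        }
    }
    if (done)
        seen[s.reg[0]][s.reg[1]] = true;
}

/* Is the final outcome (r0, r1) reachable under the chosen model? */
bool outcome_reachable(bool store_buffer, bool fenced, int r0, int r1) {
    bool seen[2][2] = {{false, false}, {false, false}};
    state_t init = {0};
    explore(init, store_buffer, fenced, seen);
    return seen[r0][r1];
}
```

In this model the outcome (0, 0) is unreachable without store buffers, reachable with them, and unreachable again once the fence is added, which mirrors how a non-SC execution points to where a fence is needed. The number of interleavings grows quickly with the number of accesses, which is the scalability limit noted above.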
Compiler Analysis. Compiler techniques [2], [3] statically
analyze a concurrent program and identify all possible
concurrent accesses to shared memory locations. Then,
primarily based on Shasha and Snir's algorithm [1], a delay
set is computed. Finally, fences are inserted (with some
possible optimizations [4], [3]) according to the delay set.
Compiler approaches can be quite effective for strongly typed
programs with simple synchronization support [2]. However,
they can be quite conservative for general concurrent
C/C++ programs, which are hard to analyze statically because
of pointer aliasing and more complex synchronization
schemes. As shown in Figure 12, the compiler would identify
at least 20 possible delays and fences if it could not figure
out that hash_delete is correctly synchronized with locks
by its callers in different threads. The unnecessary
fences could badly hurt performance.
Other Concurrency Bug Detection Schemes. Atomicity
violation (serializability violation) detection has been studied
in recent years [29], [30]. MUVI [32] identifies correlated
variables and can detect concurrency bugs associated
with different variables. However, as seen in Section I,
the nature of PVSC bugs is different from that of atomicity
violation bugs, so these tools cannot help. Although PVSC
bugs are not characterized in [31], we believe they are
important because they are subtle and difficult to detect.
VI. CONCLUSION
In this paper, we proposed an effective and efficient
scheme to detect and eliminate bugs called potential
violations of sequential consistency (PVSC) using existing data
race detection techniques. A PVSC bug refers to a series
of data races that might lead to a non-sequentially-consistent
execution, and it can be eliminated by inserting fences.
Compared with static compiler analysis schemes, our
approach has less impact (less than 6.3%) on the performance
of the original concurrent programs because unnecessary
fences are substantially reduced. Compared with some
existing verification tools, our approach is more scalable. We
have detected and eliminated PVSC bugs in real-world
applications, such as MySQL, Apache, SPLASH-2, and Cilk
programs, with our prototype implementation. Moreover, the
cost of our scheme over race detection is low, 3.3% on
average.
Our approach inherently suffers from the limitations of data
race detection. However, as data race detection techniques
improve, our approach will show more potential, since it
requires only a small extension to them.
ACKNOWLEDGMENT
This paper is supported by a project of the National
Basic Research Program of China (No. 2005CB321602),
a project of the National Natural Science Foundation of
China (No. 60736012) and a project of the National High
Technology Research and Development Program of China
(No. 2007AA01Z110).
REFERENCES
[1] D.Shasha, M.Snir, Efficient and correct execution of parallel programs
that share memory. ACM Trans. Program. Lang. Syst.,10(2):282-
312,1988.
[2] A.Kamil, J.Su, K.Yelick, Make Sequential Consistency Practical in
Titanium, Proc. of the ACM/IEEE SC 2005 Conf. Supercomputing,
2005.
[3] X.Fang, J.Lee, S.P.Midkiff, Automatic Fence Insertion for Shared
Memory Multiprocessing, Proc. of the Intl. Conf. on Supercomputing,
2003.
[4] J.Lee, D.A.Padua, Hiding Relaxed Memory Consistency with a Com-
piler. Proc. of Intl Conf. on Parallel Architectures and Compilation
Techniques, 2000.
[5] W.Y.Chen, A.Krishnamurthy, K.Yelick, Polynomial-Time algorithms
for Enforcing Sequential Consistency in SPMD Programs with Arrays.
In Languages and Compilers for Parallel Computing, 2003.
[6] Y.Yu, T.Rodeheffer, W.Chen, RaceTrack: Efficient Detection of Data
Race Conditions via Adaptive Tracking. In 20th ACM Symposium on
Operating Systems Principles, 2005.
[7] J.-D.Choi et al. Efficient and precise data race detection for
multithreaded object-oriented programs. In Proc. of Programming Language
Design and Implementation, 2002.
[8] R.O'Callahan, J.-D.Choi, Hybrid Dynamic Data Race Detection. In
Principles and Practice of Parallel Programming, 2003.
[9] A.Dinning and E.Schonberg. An empirical comparison of monitoring
algorithms for access anomaly detection. In Principles and Practice of
Parallel Programming, 1990.
[10] R.H.B.Netzer and B.P.Miller. Improving the accuracy of data race
detection. In Principles and Practice of Parallel Programming, 1991.
[11] S.Narayanasamy, Z.Wang, J.Tigani, A.Edwards, B.Calder. Automatically
classifying benign and harmful data races using replay analysis.
In Programming Language Design and Implementation, 2007.
[12] D.Perkovic and P.J.Keleher. Online data-race detection via coherency
guarantees. In Operating System Design and Implementation, 1996.
[13] S.Savage, M.Burrows, G.Nelson, P.Sobalvarro, and T. Anderson.
Eraser: A dynamic data race detector for multithreaded programs. In
ACM Tran. On Computer System, 1997.
[14] C. von Praun and T.R.Gross. Object race detection. In Object-Oriented
Programming, Systems, Languages and Applications, 2001.
[15] E.Pozniansky and A.Schuster. Efficient on-the-fly data race detection
in multithreaded c++ programs. In Principles and Practice of Parallel
Programming, 2003.
[16] C.Boyapati, R.Lee, and M.Rinard. Ownership types for safe programming:
Preventing data races and deadlocks. In Object-Oriented
Programming, Systems, Languages and Applications, 2002.
[17] C.Flanagan and S.N.Freund. Type-based race detection for java. In
Programming Language Design and Implementation, 2000.
[18] K.Gharachorloo, P.B.Gibbons, Detecting violations of sequential con-
sistency. In Symposium on Parallel Algorithms and Architectures,
1991.
[19] S.V.Adve, M.D.Hill, B.P.Miller, R.H.B.Netzer, Detecting data races on
weak memory systems. In Intl. Symposium on Computer Architecture,
1991.
[20] M.Prvulovic, CORD: Cost-effective (and nearly overhead-free) Order-
Recording and Data race detection. In High Performance Computer
Architecture, 2006.
[21] M.Prvulovic and J.Torrellas. ReEnact: Using thread-level speculation
mechanisms to debug data races in multithreaded codes. In Intl.
Symposium on Computer Architecture, 2003.
[22] S.L.Min and J.-D.Choi. An efficient cache-based access anomaly detection
scheme. In Architectural Support for Programming Languages
and Operating Systems, 1991.
[23] J.D.Choi, S.L.Min, Race Frontier: Reproducing Data Races in Parallel
Program Debugging, In Principles and Practice of Parallel Program-
ming, 1991.
[24] S.Burckhardt, M.Musuvathi, Effective Program Verification for Relaxed
Memory Models. In Computer Aided Verification, 2008.
[25] S.Burckhardt, R.Alur, M.M.K.Martin, CheckFence: checking con-
sistency of concurrent data types on relaxed memory models. In
Programming Language Design and Implementation, 2007.
[26] H.J.Boehm, S.V.Adve, Foundations of the C++ Concurrency Memory
Model, In Programming Language Design and Implementation, 2008.
[27] L.Lamport. How to make a multiprocessor computer that correctly
executes multiprocess programs. In IEEE Tran. On Computer, 1979.
[28] S.V.Adve, K.Gharachorloo, Shared Memory Consistency Models: A
tutorial. In IEEE computer, 1995.
[29] S.Lu, J.Tucek, F.Qin, Y.Y.Zhou. AVIO: Detecting atomicity violations
via access interleaving invariants. In Architecture Support for Program
Languages and Operating Systems, 2006.
[30] M.Xu, R.Bodik, M.Hill. A serializability violation detector for shared-
memory server programs. In Programming Language Design and
Implementation, 2005.
[31] S.Lu, S.Park, E.Seo, Y.Y.Zhou. Learning from mistakes-a compre-
hensive study on real world concurrency bug characteristics. In Archi-
tecture Support for Programming Languages and Operating Systems,
2008.
[32] S.Lu, S.Park, C.Hu, X.Ma, W.Jiang, Z.Li, R.Popa, Y.Y.Zhou. MUVI:
automatically inferring multi-variable access correlations and detecting
related semantic and concurrency bugs. In Symposium on Operating
System Principles, 2007.
[33] D.Schmidt, T.Harrison. Double-checked-locking: an optimization pattern
for efficiently initializing and accessing thread-safe objects. In
Programming Language Design and Implementation, 1996.
[34] The Double-checked-locking is broken declaration.
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html.
[35] M.Frigo, C.E.Leiserson, K.H.Randall. The Implementation of the
cilk-5 multithreaded language. In Programming Language Design and
Implementation, 1998.
[36] N. Nethercote and J. Seward. Valgrind: A Program Supervision
Framework. In Electr. Notes Theor. Comput. Sci., 2003.
[37] P.Zhou, R.Teodorescu, and Y.Zhou. Hard: Hardware-assisted lockset-
based race detection. In High Performance Computer Architecture,
2007.
[38] Intel 64 Architecture Memory Ordering White Paper.
http://developer.intel.com/products/processor/manuals/318147.pdf
[39] J.Manson, W.Pugh and S.Adve. The Java memory model. In Proc.
Symp. on Principles of Programming Languages, 2005.
[40] D.Aspinall and J.Sevcik. Java Memory Model examples: Good, bad,
and ugly. VAMP07 Proceedings,
http://www.cs.ru.nl/~chaack/VAMP07/, 2007.