Marjan Gusev, Sasko Ristov


Cache memory plays a major role in determining performance when solving scientific problems. Most of these problems involve many repetitions of simple or complex calculations on data elements stored sequentially in memory as arrays. When the algorithms execute, these elements are brought into the cache and then used by the processor. This process is usually accompanied by conflict and capacity misses in the cache, and performance is degraded by the cache placement function, the replacement policy, or capacity constraints. In this paper we analyze the algorithms and the performance impact of set associative caches when a large array stored sequentially in memory is referenced. We map the problem of cache use onto an IT-related mathematical model to analyze the performance and to give a scientific explanation for the performance drops caused by associativity and conflict misses in caches.
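The conflict-miss effect described above can be illustrated with a minimal sketch. It is not the model from the paper: the cache geometry (64 KB, 64-byte lines, 2-way, LRU replacement), the 8-byte element size, and the names `count_misses` and `column_walk` are all assumptions chosen for illustration. The sketch walks down one column of a row-major array several times and counts misses for two row lengths.

```python
from collections import OrderedDict

def count_misses(addresses, cache_bytes=64 * 1024, line_bytes=64, ways=2):
    """Count misses in a set-associative cache with LRU replacement.

    All cache parameters are illustrative assumptions, not measured values.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]  # one LRU list per set
    misses = 0
    for addr in addresses:
        block = addr // line_bytes   # memory block (cache line) number
        index = block % num_sets     # placement function: block -> set
        lines = sets[index]
        if block in lines:
            lines.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict the least recently used line
            lines[block] = None
    return misses

ELEM_BYTES = 8  # assumed size of one array element (a double)

def column_walk(row_len_elems, rows=8, passes=4):
    """Byte addresses touched when reading one column `passes` times."""
    return [r * row_len_elems * ELEM_BYTES
            for _ in range(passes)
            for r in range(rows)]

critical = count_misses(column_walk(4096))      # every row maps to the same set
padded = count_misses(column_walk(4096 + 8))    # one extra line of padding per row

print(critical, padded)
```

Under these assumptions a row length of 4096 elements spans exactly 512 cache lines, which equals the number of sets, so all eight column elements compete for the two ways of a single set and every access misses. Padding each row by one cache line spreads the same elements over eight different sets, leaving only the initial cold misses.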


high performance computing; performance drops; shared memory multiprocessor; superlinear speedup








Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

About the journal

CSNMBS is a part of the MASA Contribution series. Published by the Section Natural, Mathematical and Biotechnical Sciences.