Acta Electronica Sinica 2012

A Pre-Execution Guided Data Prefetching Method for In-Order Processors

DOI: 10.3969/j.issn.0372-2112.2012.11.001, PP. 2145-2151

Dang Xianglei (党向磊), Wang Xiaoyin (王箫音), Tong Dong (佟冬), Lu Junlin (陆俊林), Cheng Xu (程旭), Wang Keyi (王克义)

Keywords: data prefetching, pre-execution, memory access latency tolerance, in-order processor


Abstract:

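To improve the memory access performance of in-order processors, this paper proposes a pre-execution guided data prefetching method (PEDP). PEDP uses a stride prefetcher to prefetch regular memory access patterns and, after an L2 cache miss, pre-executes the subsequent instructions to prefetch irregular memory access patterns accurately, combining the strengths of both techniques to raise prefetch coverage. PEDP also uses the real memory access information captured early during pre-execution to guide the stride prefetcher. Under this guidance, the stride prefetcher can issue prefetch requests earlier for addresses that pre-execution would generate and that follow a stride pattern, which improves prefetch timeliness. To further optimize the guidance process, PEDP uses an update filter that removes harmful updates to the stride prefetcher during guidance, which improves prefetch accuracy. Experimental results show that, on average, PEDP improves the performance of the baseline processor by 33.0%. Compared with stride prefetching alone and pre-execution alone, PEDP improves performance by 16.2% and 7.3%, respectively.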
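
The guidance idea described in the abstract can be illustrated with a minimal software sketch. It is not the paper's hardware design: the names `StridePrefetcher` and `guided_prefetches` are hypothetical, the table is an unbounded dictionary indexed by load PC, and the filter rule is an assumption made only for illustration (the abstract does not say which updates the filter discards). Here an update is dropped when the pre-executed load's address is marked invalid, for example because it depends on data that has not yet returned.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confident: bool = False


class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (illustrative, not the paper's design)."""

    def __init__(self, degree=1):
        self.table = {}       # load PC -> StrideEntry
        self.degree = degree  # number of addresses prefetched per trigger

    def access(self, pc, addr):
        """Train on one load and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        # Confident only when the same non-zero stride is seen twice in a row.
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        if not entry.confident:
            return []
        return [addr + stride * i for i in range(1, self.degree + 1)]


def guided_prefetches(prefetcher, pre_executed_loads):
    """Feed loads seen during pre-execution to the stride prefetcher.

    Each load is a (pc, addr, addr_valid) tuple. Loads with addr_valid set to
    False are skipped; this is the assumed update filter, which keeps a bogus
    address from corrupting the learned stride.
    """
    issued = []
    for pc, addr, addr_valid in pre_executed_loads:
        if not addr_valid:
            continue  # assumed filter rule: drop the harmful update
        issued.extend(prefetcher.access(pc, addr))
    return issued
```

In this sketch, loads observed during pre-execution train the same table that normal execution uses, so a learned stride triggers prefetches before the processor actually reaches those loads. Skipping the filtered load preserves the stride; feeding it in unfiltered would reset the entry's confidence and suppress the following prefetches.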

