P Kongetira,et al.Niagara:A 32-way multithreaded Sparc processor[J].IEEE Micro,2005,25(2):21-29.
[2]
X Y Wang,et al.An energy-efficient executing ahead mechanism for improving the performance of single-issue in-order microprocessors[J].Acta Electronica Sinica,2011,39(2):458-463.(in Chinese)
[3]
T F Chen,J L Baer.Effective hardware-based data prefetching for high-performance processors[J].IEEE Transactions on Computers,1995,44(5):609-623.
[4]
D Joseph,D Grunwald.Prefetching using Markov predictors.Int’l Symposium on Computer Architecture.Denver,Colorado,USA:IEEE Computer Society,1997.252-263.
[5]
G Hinton,et al.The microarchitecture of the Pentium 4 processor[J].Intel Technology Journal,2001,5(1):1-13.
[6]
A Hilton,et al.iCFP:Tolerating all-level cache misses in in-order processors.Int’l Symposium on High Performance Computer Architecture.Raleigh,North Carolina,USA:IEEE Computer Society,2009.431-442.
[7]
T F Wenisch,et al.Making address-correlated prefetching practical[J].IEEE Micro,2010,30(1):50-59.
[8]
S Iacobovici,et al.Effective stream-based and execution-based data prefetching.Int’l Conference on Supercomputing.Saint-Malo,France:IEEE Computer Society,2004.1-11.
[9]
T Austin,et al.SimpleScalar:An infrastructure for computer system modeling[J].IEEE Computer,2002,35(2):59-67.
[10]
D Wang,et al.DRAMsim:A memory system simulator[J].ACM Computer Architecture News,2005,33(4):100-107.
[11]
D Brooks,et al.Wattch:A framework for architectural-level power analysis and optimizations.Int’l Symposium on Computer Architecture.Vancouver,British Columbia,Canada:IEEE Computer Society,2000.83-94.
[12]
E Perelman,et al.Picking statistically valid and early simulation points.Int’l Conf on Parallel Architectures and Compilation Techniques.New Orleans,Louisiana,USA:IEEE Computer Society,2003.244-255.
[13]
X Cheng,et al.Research progress of UniCore CPUs and PKUnity SoCs[J].Journal of Computer Science and Technology,2010,25(2):200-213.
[14]
K Asanovic,et al.The landscape of parallel computing research:A view from Berkeley.California,USA:Dept of EECS,University of California at Berkeley,2006.
[15]
S P Vanderwiel,D J Lilja.Data prefetch mechanisms[J].ACM Computing Surveys,2000,32(2):174-199.
[16]
S Palacharla,R E Kessler.Evaluating stream buffers as a secondary cache replacement.Int’l Symposium on Computer Architecture.Chicago,Illinois,USA:IEEE Computer Society,1994.24-33.
[17]
S Somogyi,et al.Spatio-temporal memory streaming.Int’l Symposium on Computer Architecture.Austin,Texas,USA:IEEE Computer Society,2009.69-80.
[18]
H W Cain,P Nagpurkar.Runahead execution vs.conventional data prefetching in the IBM POWER6 microprocessor.Int’l Symposium on Performance Analysis of Systems and Software.White Plains,New York,USA:IEEE Computer Society,2010.203-212.
[19]
J Dundas,T Mudge.Improving data cache performance by pre-executing instructions under a cache miss.Int’l Conference on Supercomputing.Vienna,Austria:IEEE Computer Society,1997.68-75.
[20]
O Mutlu,et al.Runahead execution:An effective alternative to large instruction windows[J].IEEE Micro,2003,23(6):20-25.
[21]
R D Barnes,et al.Tolerating cache-miss latency with multipass pipelines[J].IEEE Micro,2006,26(1):40-47.
[22]
S Nekkalapu,et al.A simple latency tolerant processor.Int’l Conference on Computer Design.Lake Tahoe,California,USA:IEEE Computer Society,2008.384-389.
[23]
K I Farkas,N P Jouppi.Complexity/performance trade-offs with non-blocking loads.Int’l Symposium on Computer Architecture.Chicago,Illinois,USA:IEEE Computer Society,1994.211-222.