%0 Journal Article
%T Optimization of BLAS Level 2 Based on Multi-Core Loongson 3A
多核龙芯3A上二级BLAS库的优化
%A LI Yi
%A HE Song-Song
%A LI Kai
%A 李毅
%A 何颂颂
%A 李恺
%J 计算机系统应用
%D 2011
%I
%X Based on the characteristics of the Loongson 3A architecture and of the BLAS Level 2 routines, this article derives parallel optimization schemes at the instruction level, the storage level, and the thread level. We summarize the optimization methods that proved suitable and analyze them quantitatively. Experiments show that the single-threaded performance of BLAS Level 2 improves by 20% and that the multi-threaded speedup reaches 2.5. These results can help guide the optimization of system software on the multi-core Loongson 3A.
%K Loongson 3A
%K BLAS
%K optimization
%K Gemv
%K Ger
%K memory access
%K multi-threading
%K 龙芯3A
%K BLAS
%K 优化
%K Gemv
%K Ger
%K 访存
%K 多线程
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=D4F6864C950C88FFCE5B6C948A639E39&aid=9030C91A80F5BF44B076F1D2C0C97AFF&yid=9377ED8094509821&vid=A04140E723CB732E&iid=CA4FD0336C81A37A&sid=D5C9DC4EF2F78008&eid=ED01F5AE50BE09C0&journal_id=1003-3254&journal_name=计算机系统应用&referenced_num=0&reference_num=9