{"id":302,"date":"2015-01-08T17:12:17","date_gmt":"2015-01-08T09:12:17","guid":{"rendered":"http:\/\/planckscale.info\/?p=302"},"modified":"2020-04-30T17:20:45","modified_gmt":"2020-04-30T09:20:45","slug":"cuda-%e8%bd%af%e4%bb%b6%e6%8a%bd%e8%b1%a1%e7%9a%84%e5%b9%bb%e5%bd%b1%e8%83%8c%e5%90%8e-%e4%b9%8b%e4%ba%8c","status":"publish","type":"post","link":"https:\/\/mindspectrum.xyz\/en\/2015\/01\/08\/cuda-%e8%bd%af%e4%bb%b6%e6%8a%bd%e8%b1%a1%e7%9a%84%e5%b9%bb%e5%bd%b1%e8%83%8c%e5%90%8e-%e4%b9%8b%e4%ba%8c\/","title":{"rendered":"CUDA, Behind the Illusion of Software Abstraction, Part 2"},"content":{"rendered":"<p>This is as far as I have gotten for now; I will come back later to polish it and check for errors. CUDA is a sprawling topic, and once I start writing I tend to ramble and introduce mistakes, so corrections are welcome.<\/p>\n<p>**********************************************************************<\/p>\n<h5><strong>Copyright notice: this is an original work. Reposting is welcome, provided that you credit, via hyperlink, the article source (<a href=\"https:\/\/mindspectrum.xyz\" target=\"_blank\" rel=\"noopener 
noreferrer\">planckscale.info<\/a>), the author information, and this notice; otherwise legal action will be pursued.<\/strong><\/h5>\n<p>The previous post noted that two things greatly affect CUDA's computing power: data parallelism, and hiding latency with multithreading. Next we go down into the hardware implementation to see how these mechanisms actually work.<\/p>\n<p>People often say that a GPU has hundreds or even thousands of CUDA cores, which easily brings multi-core CPUs to mind. In fact the two kinds of \u201ccore\u201d are different concepts: a GPU's CUDA core corresponds only to an execution unit inside a processor. It executes instructions and performs arithmetic, but it contains no control unit. The counterpart of a CPU core is the Streaming Multiprocessor (abbreviated SM. 
It is called SMX in Kepler and SMM in Maxwell). A GPU typically contains several SMs, and each SM contains dozens to over a hundred CUDA cores plus several warp schedulers (the equivalent of control units). In the GM204 shown below, for example, there are 16 SMs, each with 128 CUDA cores and 4 warp schedulers.<\/p>\n<p><a href=\"https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/GeForce_GTX_980_SM_Diagram-545x1024.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-303 size-large aligncenter\" src=\"https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/GeForce_GTX_980_SM_Diagram-545x1024-545x1024.png\" alt=\"GeForce_GTX_980_SM_Diagram-545x1024\" width=\"545\" height=\"1024\" srcset=\"https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/GeForce_GTX_980_SM_Diagram-545x1024.png 545w, https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/GeForce_GTX_980_SM_Diagram-545x1024-160x300.png 160w\" sizes=\"auto, (max-width: 545px) 100vw, 545px\" \/><\/a><\/p>\n<p style=\"text-align: center;\">Figure 1. 
\u00a0SM structure of GM204<\/p>\n<p>Each SM has a large register file. In GM204, each SM has 64K 32-bit registers in total, enough to keep thousands of threads resident. Another important resource in the SM is Shared Memory. Yes, it is exactly the hardware counterpart of Shared Memory in the software abstraction. In GM204, each SM has 96 KB of Shared Memory.<\/p>\n<p>At this point the software counterpart of the SM is easy to guess: it is the Block. Let us lay out the correspondence first:<br \/>Block &lt;-&gt; SM<br \/>Thread execution &lt;-&gt; CUDA Cores<br \/>Thread data &lt;-&gt; Register\/Local Memory<\/p>\n<p>The Blocks of a Grid are distributed across the SMs for execution. An SM may have several Blocks resident at once, and they need not come from the same kernel function. Each Thread's local variables are mapped to the SM's registers, while the Thread's execution is carried out by the CUDA cores.<\/p>\n<p>How many Blocks can be resident on an SM at once? That is determined by hardware resource consumption: each Block occupies a certain number of registers and a certain amount of Shared 
Memory, so the number of Blocks simultaneously resident on an SM cannot exceed what these hardware resources allow. Because Blocks from different kernels can coexist on an SM, the remaining resources may sometimes be insufficient for another Block of kernel A yet still sufficient for a Block of kernel B.<\/p>\n<p>The next important question is how a Block gets executed. The CUDA cores on an SM are finite, and they determine how many threads can truly run in parallel physically. In the software abstraction all threads of a Block execute in parallel, but that is only a logically airtight abstraction: we cannot provide an equally large array of CUDA cores for a Block of arbitrary size and truly execute all of its threads in parallel.<br \/>Hence the concept of the Warp: physically, a Block is split into chunks that are mapped onto the CUDA core array for execution, and each chunk is called a Warp. Currently, Warps in CUDA are formed starting from threadIdx = 0 by grouping every 32 threads with consecutive threadIdx; if fewer than 32 threads remain at the end, they still form a Warp of their own. CUDA 
kernel launch configurations usually set the Block size to a multiple of 32, precisely so that the Block divides into a whole number of Warps (the deeper reason has to do with memory access performance, but even that is still tied to the Warp size).<br \/>In the GM204 SM diagram we can see that the SM is divided into four identical partitions, each with its own Warp Scheduler and 32 CUDA cores. This is where Warps are executed.<br \/>Warp execution closely resembles SIMD. The active threads of a Warp are driven by the Warp Scheduler and execute in lockstep. As the diagram shows, in GM204 every 32 CUDA cores share one Warp Scheduler. The more complicated issues that can arise during Warp execution are left for a later post.<\/p>\n<p>We can now put the picture of this world together. Several Blocks are resident on an SM; the variables of each Block occupy their own registers and Shared Memory; and each Block is divided into Warps of 32 threads. 
In this way a large number of Warps live on the SM, waiting to be scheduled onto the CUDA core array for execution.<\/p>\n<p>As its name suggests, the Warp Scheduler is the scheduler of this world of Warps. When a Warp stalls during execution (for example on memory read\/write latency), the Warp Scheduler quickly switches to the next ready Warp and issues instructions to it until that Warp stalls in turn, and so on. This is what the \u201chiding latency with multithreading\u201d described in the previous post looks like at the hardware level.<\/p>\n<p><a href=\"https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/CPU_GPU_COMPARE.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-307 size-large aligncenter\" src=\"https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/CPU_GPU_COMPARE-1024x291.png\" alt=\"CPU_GPU_COMPARE\" width=\"625\" height=\"178\" srcset=\"https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/CPU_GPU_COMPARE-1024x291.png 1024w, https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/CPU_GPU_COMPARE-300x85.png 300w, https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/CPU_GPU_COMPARE-624x177.png 624w, https:\/\/mindspectrum.xyz\/wp-content\/uploads\/2015\/01\/CPU_GPU_COMPARE.png 1388w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p style=\"text-align: center;\">Figure 2. 
\u00a0The GPU hides latency with multiple Warps \/ comparison with the CPU computing model<\/p>\n<p style=\"text-align: center;\">This figure is taken from the slide deck\u00a0&#8220;CUDA Overview&#8221; by\u00a0Cliff Woolley, NVIDIA.<\/p>\n<p>As the figure shows, the GPU hides latency by switching rapidly among many Warps, whereas the CPU reduces latency with fast registers. An important difference between the two is the number of registers: CPU registers are fast but few, so a context switch is expensive because register state has to be saved and restored; GPU registers are many but slower, and their sheer number lets every resident thread keep its state in registers, which makes a thread context switch very fast.<\/p>\n<p>How many threads does it take to hide the common latencies? On the GPU, the most common latency is probably the register read-after-write dependency: a local variable is assigned and then read again shortly afterwards, which incurs a latency of roughly 24 clock cycles. To hide it we need at least 24 Warps taking turns: while one Warp is stalled, the other 23 Warps execute, keeping the hardware busy. On Compute Capability 2.0, where an SM has 32 CUDA cores, at an average issue rate of one instruction per cycle we need 24*32 = 768 threads to hide this latency.<br 
\/>Keeping the hardware busy is, in CUDA terminology, maintaining sufficient Occupancy, which is an important metric in optimizing CUDA programs.<\/p>\n<p>(To be continued)<\/p>\n<p>***********************************************<\/p>\n<p>Some follow-up additions.<\/p>\n<p>From reader Shao:<\/p>\n<p>The number of Blocks that can be resident on an SM at once is limited not only by resources but also by device limits: each SM has a Device Limit, and the numbers of warps and blocks cannot exceed the corresponding maximums.<\/p>\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is as far as I have gotten for now; I will come back later to polish it and check for errors. CUDA is a sprawling topic, and once I start writing I tend to ramble and introduce mistakes, so corrections are welcome. ********************************************************************** Copyright notice: this is an original work. Reposting is welcome, provided that you credit, via hyperlink, the article source (planckscale.info), the author information, and this notice; otherwise legal action will be pursued. The previous post noted that two things greatly affect CUDA's computing power: data parallelism, and hiding latency with multithreading. Next we go down into the hardware implementation to see how these mechanisms actually work. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[36],"tags":[7,70,24,43],"class_list":["post-302","post","type-post","status-publish","format-standard","hentry","category-tech","tag-cuda","tag-occupancy","tag-warp","tag-43"],"translation":{"provider":"WPGlobus","version":"2.12.2","language":"en","enabled_languages":["zh","en"],"languages":{"zh":{"title":true,"content":true,"excerpt":false},"en":{"title":false,"content":false,"excerpt":false}}},"_links":{"self":[{"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/posts\/302","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/comments?post=302"}],"version-history":[{"count":11,"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/posts\/302\/revisions"}],"predecessor-version":[{"id":1015,"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/posts\/302\/revisions\/1015"}],"wp:attachment":[{"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/media?parent=302"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/categories?post=302"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mindspectrum.xyz\/en\/wp-json\/wp\/v2\/tags?post=302"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}