Compiler-assisted Operator Template Library for DNN Accelerators

References

  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems (2015). http://tensorflow.org/. Software available from tensorflow.org
  2. AnandTech: Cambricon, Makers of Huawei’s Kirin NPU IP. https://www.anandtech.com/show/12815/cambricon-makers-of-huaweis-kirin-npu-ip-build-a-big-ai-chip-and-pcie-card (2018)
  3. Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., Temam, O.: DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 269–284. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2541940.2541967
  4. Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2012)
  5. Cover, T., Hart, P.: Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theor. 13(1), 21–27 (1967)
  6. Culberson, J.C.: Iterated Greedy Graph Coloring and the Difficulty Landscape. Tech. rep. (1992)
  7. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR (2009)
  8. DMLC teams: mshadow. https://github.com/dmlc/mshadow (2018)
  9. Guennebaud, G., Jacob, B., et al.: Eigen v3. http://eigen.tuxfamily.org (2010)
  10. He, K., et al.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
  11. Hearst, M.A.: Support Vector Machines. IEEE Intelligent Systems 13(4), 18–28 (1998)
  12. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
  13. Howard, A.G., et al.: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017)
  14. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017). https://doi.org/10.1109/CVPR.2017.243
  15. Iandola, F.N., et al.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and \(<\)1MB model size. CoRR abs/1602.07360 (2016)
  16. Härdtlein, J., Pflaum, C., Linke, A., Wolters, C.H.: Advanced expression templates programming. In: Computing and Visualization in Science. Springer (2010). https://doi.org/10.1007/s00791-009-0128-2
  17. Zhu, J.: Static memory allocation by pointer analysis and coloring. In: Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001, pp. 785–790 (2001). https://doi.org/10.1109/DATE.2001.915121
  18. Jouppi, N.P., Young, C., Patil, N., Patterson, D., et al.: In-datacenter performance analysis of a tensor processing unit. In: ISCA ’17, pp. 1–12. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3079856.3080246
  19. Krizhevsky, A., et al.: ImageNet Classification with Deep Convolutional Neural Networks. NIPS’12, pp. 1097–1105. Curran Associates Inc., USA (2012)
  20. Li, L., Feng, H., Xue, J.: Compiler-directed scratchpad memory management via graph coloring. ACM Trans. Archit. Code Optim. 6(3) (2009). https://doi.org/10.1145/1582710.1582711
  21. Li, L., Gao, L., Xue, J.: Memory coloring: a compiler approach for scratchpad memory management. In: 14th International Conference on Parallel Architectures and Compilation Techniques (PACT’05), pp. 329–338 (2005). https://doi.org/10.1109/PACT.2005.27
  22. Liao, H., Tu, J., Xia, J., Zhou, X.: DaVinci: A scalable architecture for neural network computing. In: 2019 IEEE Hot Chips 31 Symposium (HCS), pp. 1–44. IEEE Computer Society, Los Alamitos, CA, USA (2019). https://doi.org/10.1109/HOTCHIPS.2019.8875654
  23. Liu, S., Du, Z., Tao, J., Han, D., Luo, T., Xie, Y., Chen, Y., Chen, T.: Cambricon: An instruction set architecture for neural networks. In: Proceedings of the 43rd International Symposium on Computer Architecture, ISCA ’16, pp. 393–405. IEEE Press (2016). https://doi.org/10.1109/ISCA.2016.42
  24. Moazeni, M., Bui, A., Sarrafzadeh, M.: A memory optimization technique for software-managed scratchpad memory in GPUs. In: 2009 IEEE 7th Symposium on Application Specific Processors, pp. 43–49 (2009). https://doi.org/10.1109/SASP.2009.5226334
  25. Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1998)
  26. Munshi, A., Gaster, B., Mattson, T.G., Fung, J., Ginsburg, D.: OpenCL Programming Guide, 1st edn. Addison-Wesley Professional, Boston (2011)
  27. NVIDIA teams: Cutlass. https://github.com/NVIDIA/cutlass (2017)
  28. Briggs, P., Cooper, K.D., Torczon, L.: Improvements to graph coloring register allocation. ACM Trans. Program. Lang. Syst. 16(3), 428–455 (1994)
  29. Progsch, J., Ineichen, Y., Adelmann, A.: A new vectorization technique for expression templates in C++. CoRR abs/1109.1264 (2011). arXiv:1109.1264
  30. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). arXiv:1409.1556
  31. Springer, M., Sun, Y., Masuhara, H.: Inner Array Inlining for Structure of Arrays Layout. In: Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY 2018, pp. 50–58. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3219753.3219760
  32. Szegedy, C., et al.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR) (2015). arXiv:1409.4842
  33. Szegedy, C., et al.: Rethinking the inception architecture for computer vision. CoRR abs/1512.00567 (2015)
  34. Williams, S., Waterman, A., Patterson, D.: Roofline: An Insightful Visual Performance Model for Multicore Architectures. Commun. ACM 52(4), 65–76 (2009). https://doi.org/10.1145/1498765.1498785
  35. Wu, J., Belevich, A., Bendersky, E., Heffernan, M., Leary, C., Pienaar, J., Roune, B., Springer, R., Weng, X., Hundt, R.: Gpucc: An Open-Source GPGPU Compiler. In: Proceedings of the 2016 International Symposium on Code Generation and Optimization, CGO ’16, pp. 105–116. Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2854038.2854041
