The objective of this course is to help students understand the different architectural approaches to efficiently implementing training and inference of deep learning algorithms. This includes a thorough review of the most relevant hardware alternatives (CPU, GPU, FPGA, etc.), an analysis of their limitations and bottlenecks together with possible solutions, and a tour of parallelism and massively parallel architectures.