Deep Learning to See: Towards New Foundations of Computer Vision
The remarkable progress in computer vision over the last few years is, by and large, attributed to deep learning, fueled by the availability of huge sets of labeled data, and paired with the explosive growth of the GPU paradigm. While subscribing to this view, this work criticizes the supposed scientific progress in the field, and proposes the investigation of vision within the framework of information-based laws of nature.
This work poses fundamental questions about vision that remain far from understood, leading the reader on a journey populated by novel challenges resonating with the foundations of machine learning. The central thesis proposed is that for a deeper understanding of visual computational processes, it is necessary to look beyond the applications of general-purpose machine learning algorithms, and focus instead on appropriate learning theories that take into account the spatiotemporal nature of the visual signal.
Serving to inspire and stimulate critical reflection and discussion, yet requiring no prior advanced technical knowledge, the text can naturally be paired with classic textbooks on computer vision to better frame the current state of the art, open problems, and novel potential solutions. As such, it will be of great benefit to graduate and advanced undergraduate students in computer science, computational neuroscience, physics, and other related disciplines.