There’s been a constant demand for smaller, lighter, and cheaper cameras. But miniaturization is restricted by the lens system and the focusing distance required by refractive lenses.
Researchers have developed a new image reconstruction method that improves computation time and provides high-quality images. The typical optical hardware of the lens-less camera is a thin mask and an image sensor. A mathematical algorithm then reconstructs the image.
The mask and the sensor can be fabricated together in established semiconductor manufacturing processes for future production. The mask optically encodes the incident light and casts patterns on the sensor. Though the casted patterns are entirely non-interpretable for the human eye, they can be decoded with explicit knowledge of the optical system.
The decoding process is the problem.
Deep learning could help avoid the limitations of model-based decoding since it can learn the model and decode the image by a non-iterative direct process instead, but it cannot produce good-quality images. Tokyo Tech has proposed a novel, dedicated machine learning algorithm for image reconstruction. The proposed algorithm is based on Vision Transformer (ViT), which is better at global feature reasoning. The novelty of the algorithm lies in the structure of the multistage transformer blocks with overlapped ”patchify” modules, allowing it to efficiently learn image features in a hierarchical representation.
In summary, the proposed method solves the limitations of conventional methods such as iterative image reconstruction-based processing and CNN-based machine learning with the ViT architecture, enabling the acquisition of high-quality images in a short computing time.