# A 128x128, 34µm pitch, 8.9mW, 190mK NETD, TECless Uncooled IR bolometer image sensor with column-wise processing

Laurent Alacoque, Sébastien Martin, Wilfried Rabaud, Edith Beigné, Antoine Dupret, Bertrand Dupont; Université Grenoble Alpes, CEA-Leti, Minatec Campus F38054 Grenoble, France.

# Abstract

In this paper, a 128x128, $34\mu m$ pixel-pitch, room-temperature infrared image sensor and processor is presented. With a measured power consumption of 8.9mW ($0.54\mu W$ / pixel) in full operating mode (image acquisition and data processing), the sensor exhibits a Noise Equivalent Temperature Difference (NETD) of 190mK at room temperature and does not require a Thermo-Electric Cooler (TEC). The circuit also features a novel $\Sigma \Delta$ Analogue to Digital Conversion architecture, 12 frames of built-in SRAM and 128 column-wise full-custom processors that target a broad range of applications such as a 2-point corrected IR camera or feature extraction for privacy-compliant presence detection, localization and counting. Built-in analogue and digital pixel-level offset pre-correction improves operability and manufacturing yields, thus pushing bolometer IR technology one step further towards high-end applications for the consumer market.

# Introduction

The presence detection market relies heavily on cheap pyroelectric motion sensors. Because they only sense motion and consist of single elements or small arrays (e.g. 8x8), these sensors cannot support emerging applications that need steady detection with a higher spatial resolution, e.g. people localization and counting.

On the other hand, visible image sensors fail to operate in the dark, and infrared (IR) camera based detection systems are often too expensive for home applications and too power hungry to be battery operated. The cost is driven by the requirement for expensive vacuum packaging, by the Thermo-Electric Cooler (TEC) thermal regulation of the system and by the high variability of the bolometers, which impacts production yields and requires off-chip image correction. The power consumption is partly due to the fact that IR imagers feature a thermal resolution below 100mK and a high pixel count. These features are over-specified for autonomous people detection applications, leaving room for a better power vs. resolution tradeoff. Moreover, in both visible and IR, the use of off-the-shelf image sensors raises privacy concerns.

Some recent developments [1] present a new type of IR sensor tailored to the needs of cost-effective IR presence sensing, but they still require the output of images for counting and localization.

In this paper, an 8.9mW, 6x5mm<sup>2</sup> active area, combined TEC-less infrared digital camera and computing system is presented. It relies on a 128x128 bolometer array, compliant with pixel-level packaging and operated at room temperature, that allows cost-effective steady presence detection. Its 128 dedicated processing elements (PE) for column-wise digital signal processing enable the generation of privacy-compliant output and alleviate the need for a companion Image Signal Processor (ISP). Built-in analogue and digital Offset Pre-Correction (OPC) improves yields. The chip's moderate power consumption in full operating mode, along with its built-in power management unit, makes it suitable for battery-operated applications.

# **Circuit Overview**

Figure 1 presents an architecture overview of the circuit while Figure 2 presents a correspondingly annotated photo of the chip's die.

The circuit consists of a 128 by 128 matrix of infrared bolometer pixels with a $34\mu$m pitch. Each pixel contains a $25x25 \mu$m<sup>2</sup> a-Si bolometer designed at the CEA-Leti facility that provides response in the thermal $8\mu$m spectral band. The $9\mu$m gap between bolometers makes the pixel compliant with Pixel Level Packaging [1], where the pixel encapsulation is performed at post-process level for further cost reduction.



Figure 1 Circuit's functional overview

In addition to the analogue readout circuit, each pixel contains an SRAM of 12 words of 12 bits, which amounts to a built-in 12-frame memory.

At each column end, the circuit features a column-wise current-integration Analogue to Digital Converter (ADC) and a dedicated full-custom nano-processor, which is also responsible for filtering the output of the ADC's $\Sigma\Delta$ modulator.

The circuit is operated by means of digital instructions, which cover circuit control, configuration and the Processing Elements' instruction subset.

# Infrared image sensor readout

The architecture of the image sensor readout and A-D conversion is presented in Figure 3 and detailed hereafter.

Infrared bolometers are thermistors whose resistance varies slightly with their temperature. When packaged under vacuum using either a dedicated package, Wafer-Level Packaging (WLP) or Pixel-Level Packaging (PLP), bolometers are independently heated by infrared radiation from the scene and therefore converge towards temperatures that are images of the temperature of their respective sources in the scene.



Figure 2 Annotated chip die with an active part of 6x5 mm<sup>2</sup>, fabricated in the Altis Semiconductor 130nm 6MP general-purpose CMOS process. The bolometer post-process was realized in the CEA-Leti clean-room facility.

This temperature change causes a slight resistance variation which can be measured to evaluate the radiance of the scene and build a thermal image.

To measure the bolometer matrix, a voltage difference is applied to the active bolometers and the current flowing through the bolometer is integrated during a given integration time using a Capacitive Trans-Impedance Amplifier (CTIA).

The bolometer resistance variation caused by the imaged scene is several orders of magnitude smaller than the bolometer's nominal resistance, which results in a large offset of the signal. Moreover, the bolometer resistance variations caused by changes in the circuit temperature are greater than the thermal dynamic range of the scene. For these reasons, the current flowing through the active bolometer must be skimmed by a passive reference bolometer whose temperature is kept close to that of the circuit thanks to a dedicated layout design.

## Pixel architecture

The pixel readout architecture uses a conventional p-type injection MOSFET, MPI, operated as a source follower. The MPI source voltage is therefore set to (bias\_active + $V_{SGsat}$), which fixes the voltage difference ($V_{Bol}$ – bias\_active – $V_{SGsat}$) across the terminals of the active bolometer. The readout circuit is completed by a selection switch SL that is controlled row-wise by the line\_sel selection signal from the row controller. When SL is closed, a current $I_{sense}$ flows through the active bolometer into the column-shared analogue bus.
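To give a feel for the orders of magnitude involved, the bias relation above can be turned into a short numerical sketch. All component values below (supply, bias, $V_{SGsat}$, bolometer resistance and relative resistance change) are hypothetical placeholders chosen for illustration, not values measured on this circuit.

```python
def sense_current(v_bol, bias_active, v_sg_sat, r_bolo):
    """Current through the active bolometer for a given bias (Ohm's law)."""
    return (v_bol - bias_active - v_sg_sat) / r_bolo


# Hypothetical operating point (illustrative values only).
V_BOL, BIAS_ACTIVE, V_SG_SAT = 3.3, 1.0, 0.8  # volts
R_NOMINAL = 1.0e6                             # ohms
SCENE_DELTA = 1.0e-3                          # relative resistance change

i_nominal = sense_current(V_BOL, BIAS_ACTIVE, V_SG_SAT, R_NOMINAL)
i_scene = sense_current(V_BOL, BIAS_ACTIVE, V_SG_SAT,
                        R_NOMINAL * (1.0 - SCENE_DELTA))

# The useful signal is a small difference riding on a large offset,
# which is why most of the current is skimmed before integration.
signal = i_scene - i_nominal
print(f"offset = {i_nominal:.3e} A, signal = {signal:.3e} A")
```

With these placeholder numbers the scene-induced signal is about three orders of magnitude below the offset current, which illustrates why skimming is needed before integration.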

In addition to this readout circuit, the remainder of the $34x34\mu m^2$ pixel is dedicated to an SRAM block of 12 words of 12b and its controller. To achieve the highest possible memory density, pixels are arranged in blocks of 2x2 pixels in which the four pixels share a single SRAM cut of 48 words of 12b with common R/W controllers. The controller is connected to a column-wise 12b read and write data bus driven by the column-end digital block.

The layout of a pixel quad is presented in Figure 4.



Figure 3 Simplified schematic of the pixel (top), temperature-dependent source (bottom) and column-end readout, analog-digital conversion and computing blocks (middle). OTA output  $(\mathcal{D})$  is plotted on Fig. 5.

## Column-end architecture

The column-end block is responsible for the analogue readout and A-D conversion of the pixel's bolometer resistance.

The main advantages of massively parallel $\Sigma$-$\Delta$ Analogue-Digital (A-D) conversion, presented in [2,3], are linearity, robustness to process variability and an improved TECless temperature range. In order to reduce power consumption and keep room for in-pixel memory, a novel column-wise $\Sigma$-$\Delta$ A-D conversion architecture optimized for TECless operation was implemented in this circuit. The architecture of this converter, similar to [3], is described in [4] and below.

On the circuit, a single reference bolometer (Figure 3, bottom), highly thermally-coupled to the focal plane, is biased similarly to the sensing pixels' bolometers. It is used as a temperature-dependent skimming current source [5] through the current mirror MRI-MS1,2 to provide TECless capability.



Figure 4 Cropped view of the layout of the matrix showing pixels quads (Orange). Active pixels are represented with white squares. SRAM cut (Yellow, dashed) and SRAM controller (Yellow, full) are shared by 4 pixels. Analogue readout circuits (Green) are shared by two consecutive lines.

The end of column readout and A-D conversion circuit is based on an improved CTIA architecture. In order to combine a high conversion gain with a high dynamic range, it operates like a first order Sigma-Delta modulator. For this, a comparator monitors the output of the CTIA and the skimming current reference provided by the reference block is modulated by the output of this comparator.

When the CTIA's output rises above the reference, S1 is switched off and, due to the 1/8 imbalance of the current mirror, a current of $I_{skim} = I_{ref} \times 7/8$ is skimmed from $I_{sense}$, causing the CTIA output to fall until it crosses the reference again.

Then the switch closes and $I_{skim} = I_{ref} \times 9/8$ is drawn, causing the CTIA output to rise again. This regulation loop reduces the signal swing at the OTA's output, as can be seen in Figure 5. Regulating the OTA output in this way allows the use of a high gain in the CTIA, reduces the size of the integrating capacitor and prevents saturation.
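The regulation loop described above can be illustrated with a minimal behavioural sketch. It is a normalized discrete-time model, not the authors' circuit. The unit capacitor and time step, the reference at zero and the function names `simulate_modulator` and `estimate_current` are assumptions introduced here. The sign convention is chosen so that the larger skimming current makes the output rise, as in the text.

```python
def simulate_modulator(i_sense, i_ref=1.0, n_cycles=1024,
                       c_int=1.0, dt=1.0, v_ref=0.0):
    """First-order sigma-delta loop around a CTIA (normalized units)."""
    v_out = v_ref
    bits = []
    for _ in range(n_cycles):
        # Comparator: 1 when the CTIA output is above the reference.
        bit = 1 if v_out > v_ref else 0
        # bit = 1: S1 off, 7/8 of I_ref is skimmed (output falls).
        # bit = 0: S1 on, 9/8 of I_ref is skimmed (output rises).
        i_skim = i_ref * (7.0 / 8.0 if bit else 9.0 / 8.0)
        v_out += (i_skim - i_sense) * dt / c_int
        bits.append(bit)
    return bits


def estimate_current(bits, i_ref=1.0):
    """Recover I_sense from the density of ones in the bitstream."""
    ones_density = sum(bits) / len(bits)
    # The average skimming current equals I_sense when the loop is stable.
    return i_ref * (9.0 / 8.0 - ones_density * 2.0 / 8.0)


bits = simulate_modulator(i_sense=1.03)
print(f"estimated I_sense / I_ref = {estimate_current(bits):.4f}")
```

In this model any $I_{sense}$ between $7/8$ and $9/8$ of $I_{ref}$ keeps the integrator output bounded, and the estimate improves as the number of cycles (the OSR) grows.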

Furthermore, in order to address the technological mismatches of the bolometers, a set of 4 additional MOS transistors is used. They can be programmatically added or removed in parallel with the unswitched skimming MOS MS1 to provide current offset pre-correction (OPC) as described in [5]. This provides pixel-wise offset pre-compensation, thus improving circuit operability and manufacturing yields.

The comparator output modulation is fed to a column-wise PE, which implements modulation digital filtering during the A-D conversion thanks to a specialized instruction.

Figure 6 shows the thermal resolution, given as the Noise Equivalent Temperature Difference (NETD), at room temperature. Because the circuit can directly process the pixel data, the digital video output is not mandatory. Omitting the video output in favor of e.g. statistical descriptors, or outputting video only when the scene contains significant changes, can therefore lead to higher frame rates. The results shown in Figure 6 were therefore acquired for various oversampling ratios (OSR), with and without frame output.

Measured results show that the circuit is compatible with presence detection applications at up to 150 frames per second. The Sigma-Delta architecture makes it possible to flexibly change the accuracy of the A-D conversion by lowering the OSR, in order to adjust the balance between power saving, higher frame rate and NETD.
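The link between OSR and frame rate can be sketched with a simple instruction budget. The model below is an assumption made for illustration: it supposes that rows are converted one after the other, that each modulator sample costs one instruction, and that the 10 million instructions per second rate reported for the power measurements applies. The function name `max_osr` and the `overhead_per_row` parameter (standing in for processing and output instructions) are hypothetical.

```python
def max_osr(instruction_rate_hz, n_rows, frame_rate_hz, overhead_per_row=0):
    """Largest OSR that fits in one row time under the stated assumptions."""
    instructions_per_row = instruction_rate_hz / (n_rows * frame_rate_hz)
    return max(0, int(instructions_per_row) - overhead_per_row)


for fps in (25, 50, 150):
    print(f"{fps:3d} fps -> OSR budget {max_osr(10e6, 128, fps)}")
```

Under these assumptions, tripling the frame rate divides the available OSR by three, and any instructions spent on frame output come directly out of the conversion budget.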







Figure 5 OTA output during integration (top) and corresponding Sigma-Delta modulation (bottom).

# **Processors array**

Each column-end block contains a processing element that is responsible for three different operations:

- Sigma-Delta modulation filtering
- Digital signal processing
- Circuit serialized output

The Processing Elements work in a Single Instruction Multiple Data (SIMD) parallel architecture, each PE executing the same instruction on its own data.

## Nano-processor architecture

Figure 7 presents the PE layout and architecture.



Figure 7 Processing Elements array at the column-end of the pixel matrix and their communication channels (top) and Processing Element detailed architecture (bottom)

Each PE contains four 12b registers and an ALU to perform simple arithmetical, logical and conditional operations. When a row is selected for pixel operation, its 128x12x12b SRAM is R/W accessible by the 128 PEs, thus effectively providing a 12-frame memory to the circuit.

Specialized output instructions were added to allow Shift-Register style PE local communications as well as first and second order integration filtering for  $\Sigma\Delta$  modulation.
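The shift-register style communication can be pictured with a small software model. The direction of the shift (towards PE#0, whose output reaches either the circuit output or the input of the last PE) follows the description in this section. The zero fed to the last PE in non-circular mode and the function names are assumptions made for illustration.

```python
def shift_out(registers, circular=False):
    """One shift step: every PE hands its value to the next lower PE.

    Returns (new_registers, value_seen_on_circuit_output).
    """
    head = registers[0]
    # Input of the last PE: PE#0 output in circular mode, else zero (assumed).
    last_in = head if circular else 0
    chip_output = None if circular else head
    return registers[1:] + [last_in], chip_output


def read_row(registers):
    """Serialize one row of PE registers through the PE#0 output."""
    regs, pixels = list(registers), []
    for _ in range(len(registers)):
        regs, out = shift_out(regs)
        pixels.append(out)
    return pixels


print(read_row([5, 6, 7, 8]))
print(shift_out([1, 2, 3], circular=True)[0])
```

In circular mode no value leaves the chip, so the same mechanism can be used to move data between neighbouring columns.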

24-bit arithmetic operations are also available, in which case the CH and CL registers are merged into a single 24b register.

During pixel A-D conversion, the modulation (red) is accumulated in register A while the ALU is configured to perform a 24b accumulation of A in register CHCL, thus acting as a second-order digital integrator suitable for $\Sigma\Delta$ modulation filtering. At the end of the conversion phase, the pixel value is available in register A for first-order integration or in CHCL for second-order integration, and simple operations such as 1-point correction can be performed immediately before pixel output. Readout is performed using the OUTCL or OUTCHCL instructions, which activate the transmission of the CL or CHCL register, respectively, to the next processing element in a shift-register-like operation. Depending on a configuration bit, the PE#0 output is connected either to the circuit output or to the PE#127 input, thus providing circular communication among the processing elements.
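The two accumulations described above can be sketched as a pair of cascaded integrators. This is a minimal model, not the PE's actual datapath: the wrap-around at 12 and 24 bits, and the choice of updating A before CHCL within a cycle, are assumptions. The function name `integrate_bitstream` is hypothetical.

```python
MASK_12B = (1 << 12) - 1
MASK_24B = (1 << 24) - 1


def integrate_bitstream(bits):
    """Return (A, CHCL) after first- and second-order integration."""
    a = 0      # 12b register: running count of ones (first order)
    chcl = 0   # 24b register: running sum of A (second order)
    for bit in bits:
        a = (a + bit) & MASK_12B
        chcl = (chcl + a) & MASK_24B
    return a, chcl


print(integrate_bitstream([1, 0, 1, 1]))
```

The second-order result weights early samples more heavily than late ones, which is what distinguishes it from a plain count of ones.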

Table 1 summarizes the instruction set of the processing elements.

Table 1 Processors instruction set

| Category    | Instructions                                |
|-------------|---------------------------------------------|
| Arithmetic  | ADD, SUB, MULSTEP, INTEG, DBLINTEG          |
| Logical     | AND, OR, XOR, NOT, SHL, SHR, ASSIGN         |
| Flags       | CARRY, ZERO, MSB                            |
| Conditional | IF, IFNOT, ELSE, ENDIF                      |
| Registers   | PIX_RAM, A, B, CL, CH, CHCL, DATA*, PROCID* |
| Readout     | GOTOLINE, NEXTLINE, PREVLINE                |
| ADC         | SETPCO, ADCCYCLE, ADCRESET                  |
| Output      | OUTCL, OUTCHCL                              |

Each PE can perform operations on data available from its registers A, B, CL and CH, from the merged 24b register CHCL, and from the pixel-local 12-word SRAM. A read-only circuit-wide DATA register allows PEs to perform operations with a globally shared data argument, which is useful for operations such as initialization and thresholding. Finally, a read-only local PROCID register contains the PE's unique number. It can be used along with logical and conditional operations to target a specific set of processing elements for location-dependent processing.

Conditional operations are based on the Carry, Zero and MSB flags. Depending on the state of these flags and the chosen conditional operation, the PE either enters a hibernation state or executes the following instructions until an ENDIF or ELSE conditional instruction occurs. The hibernation state allows processors that failed to match the condition to safely ignore the code block that depends on it.
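The flag-based conditional execution can be illustrated with a toy SIMD model in which every PE receives the same instruction stream and hibernating PEs ignore it. The class `PEArray`, its method names and the way the MSB flag is derived from a 12b subtraction are assumptions for illustration; they do not describe the actual instruction encoding or flag semantics of the chip.

```python
class PEArray:
    """Toy model of SIMD PEs with flag-based conditional execution."""

    def __init__(self, values):
        self.a = list(values)                 # register A of every PE
        self.msb = [False] * len(self.a)      # MSB flag of every PE
        self.awake = [True] * len(self.a)     # False means hibernating

    def compare_with_data(self, data):
        """Set MSB from the 12b result of A - DATA (A is left unchanged)."""
        self.msb = [(((a - data) & 0xFFF) >> 11) == 1 for a in self.a]

    def if_msb(self):
        self.awake = list(self.msb)

    def else_(self):
        self.awake = [not awake for awake in self.awake]

    def endif(self):
        self.awake = [True] * len(self.a)

    def assign_a(self, value):
        """Executed by awake PEs only; hibernating PEs keep their A."""
        for i, awake in enumerate(self.awake):
            if awake:
                self.a[i] = value


# Thresholding against the globally shared DATA value (here 100).
pes = PEArray([10, 200, 50, 300])
pes.compare_with_data(100)
pes.if_msb()
pes.assign_a(0)      # PEs whose pixel is below the threshold
pes.else_()
pes.assign_a(1)      # all other PEs
pes.endif()
print(pes.a)
```

Both branches are broadcast to every PE, and the awake mask decides which PEs act on each one, which is the usual cost of branching in a SIMD array.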

## **Power consumption**

Power consumption was measured in both standby and full operating modes; the results are presented in Table 2. For the full operating mode, a scenario of image acquisition, offset correction and digital video output was used, and the circuit was operated at 10 million instructions per second.

Table 2 Measured power consumption

| Domain     | Voltage | Power<br>(full op.) | Power<br>(hibernation) |
|------------|---------|---------------------|------------------------|
| Digital    | 1.5 V   | 3.1 mW              | 0.7 mW                 |
| IOs        | 1.8 V   | 0.2 mW              | 0.0 mW                 |
| Analogue   | 3.3 V   | 4.5 mW              | 0.0 mW                 |
| Bolometers | 3.3 V   | 1.1 mW              | 0.0 mW                 |
| Total      | -       | 8.9 mW              | 650 μW                 |

As can be seen, the circuit's power consumption in full operating mode is below 9 mW, which is about 5 times lower than the best comparison candidate [1]. If the number of pixels is taken into account, this work exhibits a power/pixel Figure of Merit (FoM) about 13 times better than [1] (see Table 3 at the end of the paper for an extended comparison with previous work).
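The power-per-pixel figures can be re-derived from the power and matrix size values listed in Table 3. The short script below only restates that arithmetic; the dictionary layout and variable names are illustrative.

```python
# (power in watts, number of pixels), taken from Table 3.
designs = {
    "[1] digital output": (45e-3, 80 * 80),
    "[1] analogue output": (15e-3, 80 * 80),
    "[5]": (170e-3, 640 * 480),
    "this work": (8.9e-3, 128 * 128),
}

# Power per pixel expressed in microwatts.
fom_uw = {name: power / pixels * 1e6
          for name, (power, pixels) in designs.items()}

for name, value in fom_uw.items():
    print(f"{name:20s} {value:5.2f} uW/pixel")

power_ratio = designs["[1] digital output"][0] / designs["this work"][0]
fom_ratio = fom_uw["[1] digital output"] / fom_uw["this work"]
print(f"power ratio vs [1]: {power_ratio:.1f}, FoM ratio vs [1]: {fom_ratio:.1f}")
```

The ratios are taken against the digital-output figure of [1], since this work also delivers digital output.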

# Scalability of the imager architecture

## Scalability towards smaller pixels

This circuit was designed for pixel-level-packaging compliant 34µm-pitch pixels. Smaller pixel-pitches can be attained at the price of SRAM size and processing elements complexity.

The automatic noise reduction of the $\Sigma\Delta$ conversion [3] and the reduced dynamic range at the OTA output [4] already make it possible to reduce the size of the pixels' MPI injection MOS and of the column-end integration capacitor. This decreases the footprint of the active elements in the pixel and in the column-end block.

In this implementation, the pixel's SRAM represents 90% of the pixel size. Pixel pitch reduction can therefore be achieved by using smaller SRAM bitcells in a more advanced technology node or by reducing the amount of built-in frame memory.

Finally, the column pitch is also constrained by the column-wise processing elements. As for the SRAM, smaller pitches can be reached by targeting a more advanced technology node or by simplifying the architecture of the processing elements.

As an illustration, reducing the PEs to the minimum requirement of  $\Sigma\Delta$  modulation filters would allow a size reduction of about 80% of the end-of-column digital block, thus relaxing size constraints and permitting pixel pitch reduction.

## Scalability towards higher formats and framerates

Because of its column-wise architecture, the circuit is naturally scalable to a greater number of columns at the same framerate. On the other hand, the increase of the number of lines or framerate would require an adaptation of the ADC.

In this circuit's conversion architecture, like in any  $\Sigma\Delta$  ADC architecture, there is an ADC resolution vs. OSR tradeoff that needs to be adjusted in order to increase the number of lines at the same framerate.

One way to achieve this is to replace the first-order $\Sigma\Delta$ modulator with a higher-order one. This would improve the conversion speed by lowering the OSR required for a given resolution, thus shortening the conversion time and allowing formats with a greater number of lines at the same framerate.
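As a reminder of why a higher modulator order relaxes the OSR, the textbook estimate of the peak signal-to-quantization-noise ratio of an ideal $L$-th order modulator with an $N$-bit quantizer ($N = 1$ for a comparator) can be written as follows. This is an idealized model that ignores thermal noise and circuit non-idealities, not a characterization of this circuit:

$$
\mathrm{SQNR}_{\max} \approx 6.02\,N + 1.76 - 10\log_{10}\!\left(\frac{\pi^{2L}}{2L+1}\right) + (20L + 10)\log_{10}(\mathrm{OSR}) \quad [\mathrm{dB}]
$$

Each doubling of the OSR therefore adds about $6L + 3$ dB, i.e. $L + 0.5$ bits of resolution: roughly 1.5 bits for a first-order loop and 2.5 bits for a second-order one. For the same target resolution, a second-order modulator thus needs a markedly lower OSR.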

# Applications for this circuit

Even though the circuit can be operated as a raw IR camera, the processors, along with the built-in 12-frame memory, are designed to perform a variety of tasks, which range from 1- or 2-point pixel correction to the extraction of image features.

As shown in [6], on-chip signal processing enables low-latency processing by closing the sense-and-react loop on the chip without the need for image output. This also allows privacy-compliant output and alleviates the need for a companion ISP, hence reducing system costs.

From that perspective, we demonstrate a person counting and localization application based only on the output of 256 on-chip-computed image statistical descriptors per frame.

Figure 8 shows sample images taken by the circuit using a "camera" script with on-chip 1-point correction. Figure 9 shows screenshots of an Android application that displays the output of a "presence detection and localization" script, with or without broadcast of the processed IR image. In the latter case, the chip output is 256 words of statistical data that allow fast external computation of the annotated presence bounding boxes while preserving privacy.

# Conclusion

Compared to state-of-the-art micro-bolometer IR image sensors, our circuit is capable of privacy-respectful IR image acquisition and processing for only 8.9mW. This is achieved not only thanks to its low pixel count: its energy per pixel, including processing, is either similar to or an order of magnitude lower than that of best-in-class IR image sensors without processing.

Moreover, its Sigma-Delta end-of-column readout circuits allow NETD to be balanced against power consumption or frame rate to address several application cases. Built-in $\Sigma\Delta$ A-D conversion and built-in OPC improve manufacturing yields, while reduced chip area, pixel-level packaging and TECless operation greatly reduce system costs, making the circuit compatible with the consumer market.



Figure 8 Four different 24-bit infrared images acquired using "simple camera script" (top left) and "1pt correction camera" script (others). The limited field of view is due to the f/2 lens vignetting.



Figure 9 Two screenshots of the Enlink location and counting Android application with preprocessed IR images (left) and bounding-boxes-only broadcast (right)

# Acknowledgment

This work was sponsored by the French Government (DGCIS) via the SEEL (Catrene) project and the European Research Agency through the ENLIGHT (Eniac) project.

The authors wish to thank the ULIS packaging lab for the customization of the packaging, and Leti's bolometer design and characterization teams and Semir Mayel for their valuable help.

# References

- [1] P. Robert et al., "Low power consumption infrared thermal sensor array for smart detection and thermal imaging applications", in Proceedings IRS<sup>2</sup> 2013, AMA Conferences, pp. 24-27, May 2013.
- [2] D. Weiler et al., "Uncooled digital IRFPA-family with 17μm pixel-pitch based on amorphous silicon with massively parallel Sigma-Delta-ADC readout", in Proc. SPIE 9070, Infrared Technology and Applications XL, 90703B, June 2014.
- [3] F. Guellec et al., "Sigma-delta column-wise A/D conversion for cooled ROIC", in Proc. SPIE 6542, Infrared Technology and Applications XXXIII, 65423N, May 2007.
- [4] L. Alacoque, "Measurement circuit for bolometric detector", EPO Patent WO2015090925, June 2015.
- [5] B. Dupont et al., "A [10°C; 70°C] 640x480 17μm Pixel Pitch TEC-Less IR Bolometer Imager with Below 50mK and Below 4V Power Supply", ISSCC 2013, vol. 56, pp. 394-395, Feb. 2013.
- [6] J. W. Little et al., "Digital pixel CMOS focal plane array with on-chip multiply accumulate units for low-latency image processing", in Proc. SPIE 9070, Infrared Technology and Applications XL, 90703B, June 2014.

# **Author Biography**

Laurent Alacoque received the engineering degree in electronics and information processing from the ESCPE of Lyon, France (1998), and the Ph.D. degree from the INSA institute, Villeurbanne, France (2002). He joined CEA-LETI, Grenoble, France (2003) first as a Postdoctoral Student and then as a full-time member of the Smart-Imaging Lab. Since then, his work has focused on the imaging chain, from pixel-level design, imager-specific analogue–digital conversion, to image signal-processing algorithms.

| Reference                      | [1]                           | [2]                     | [5]                     | [6]                                | This Work                          |
|--------------------------------|-------------------------------|-------------------------|-------------------------|------------------------------------|------------------------------------|
| Year                           | 2013                          | 2014                    | 2013                    | 2014                               |                                    |
| Matrix                         | 80x80                         | 640x480                 | 640x480                 | 256x256                            | 128x128                            |
| Frame rate                     | 50                            | 30                      | 50                      | N.A.                               | flexible, 50 typ.                  |
| Pixel Pitch                    | 34µm                          | 17µm                    | 17µm                    | 30µm                               | 34µm                               |
| Bolometer process              | a-Si                          | a-Si                    | a-Si                    | InGaAs                             | a-Si                               |
| Readout Architecture           | Single Path                   | Direct conversion       | Mirror / Single<br>Path | Direct conversion                  | Mirror / Single Path               |
| ADC Architecture               | -                             | in macropixel SD<br>ADC | -                       | in Pixel SD ADC                    | In Column SD ADC                   |
| Power Supply                   | 3.3V                          | -                       | 4V                      | -                                  | 3.3V                               |
| NETD (at 50 fps, 300K)         | 95mK                          | <80mK                   | 40mK                    | -                                  | 190mK                              |
| Pixel SRAM                     | -                             | -                       | 4b / pixel              | -                                  | 12 × 12b /pixel                    |
| Offset pre-correction<br>(OPC) | -                             | -                       | analog                  | -                                  | analog and digital                 |
| On-chip data processing        | -                             | -                       | -                       | 2x256 MAC<br>units                 | 128 processors                     |
| Output nature                  | pixels values                 | pixels values           | pixels values           | pixels values or<br>image features | pixels values or<br>image features |
| Output type                    | 14b digital or<br>analog      | 16b digital             | analog                  | 28b digital                        | 12b or 24b digital                 |
| Power (Full operation)         | 45mW dig. or<br>15mW ana.     | -                       | 170mW                   | -                                  | 8.9mW                              |
| FoM (Power/pixel)              | 7.0 μW dig. or<br>2.3 μW ana. | -                       | 0.55 µW                 | -                                  | 0.54 μW                            |

Table 3 Comparison to relevant previous work