Principal Engineer, EMC
February 14th, 2014, 11am-12pm, DBH 6011
Lossless compression of high resolution medical images in a block-based storage system
EMC DataDomain is a purpose-built, high-throughput streaming appliance designed for storing and archiving backup workloads. It reduces the storage cost of highly redundant backup data using in-line data de-duplication. Any remaining unique segments are then compressed using general-purpose compressors such as LZ77 and Gzip. Typically, it achieves about 10x compression with data de-duplication and an additional 2x with local compression.
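As a rough illustration of the two-stage reduction described above (de-duplicate first, then compress whatever is unique), the following is a minimal sketch and not the appliance's actual design. The fixed 4 KiB segment size, the SHA-256 fingerprints, the in-memory `index` dictionary, and the function names `store` and `restore` are all assumptions made for this example; zlib stands in for the general-purpose local compressor.

```python
import hashlib
import zlib

SEGMENT_SIZE = 4096  # hypothetical fixed segment size, chosen only for illustration


def store(stream: bytes, index: dict) -> list:
    """De-duplicate a stream segment by segment, compressing only new segments.

    `index` maps a segment fingerprint to that segment's compressed bytes.
    Returns the ordered list of fingerprints needed to rebuild the stream.
    """
    recipe = []
    for offset in range(0, len(stream), SEGMENT_SIZE):
        segment = stream[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).digest()
        if fingerprint not in index:
            # Unique segment: apply local compression before storing it.
            index[fingerprint] = zlib.compress(segment)
        # Duplicate or not, the recipe records which segment belongs here.
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, index: dict) -> bytes:
    """Rebuild the original stream from its recipe of fingerprints."""
    return b"".join(zlib.decompress(index[fp]) for fp in recipe)
```

In this sketch, a stream made of the same two segments repeated five times produces a recipe of ten fingerprints but stores only two compressed segments. If the two stages compose multiplicatively, the figures quoted above would correspond to roughly 20x overall reduction.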
In this work, we focus on storing high-resolution medical images. Typical image archives are large. Newer modalities, capture improvements and regulations exacerbate the storage requirements. For legal reasons, lossless compression schemes are typically required. However, medical device regulations hinder widespread deployment of compression. Instead of forcing applications and end users to deal with the complexity of deploying compression, we investigated ways to transparently compress images in the storage system.
Deploying a data-specific compression algorithm in a general-purpose storage system is challenging. First, there are multiple layers of indirection between the application and the underlying storage system. Maintaining image intelligence, boundaries, and semantics is difficult, especially when multiple users are writing images to the same storage. Second, compression degrades system performance. We designed our Medical Image Compression Algorithm (MICA) to address these challenges. We compared the compression ratio and throughput of MICA against LZ4, Gzip and CharLS using 5 million images totaling 2.2 TB from a variety of formats, modalities and sources. MICA markedly outperformed all other algorithms in both compression ratio and throughput.
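The two metrics used in the comparison above, compression ratio and throughput, can be measured with a small harness like the one below. This is only a sketch of the measurement, not the evaluation used in this work: the `measure` function and its result keys are hypothetical names, and any compressor can be passed in as a callable taking and returning bytes (for example `zlib.compress` from the Python standard library as a stand-in).

```python
import time
import zlib


def measure(name, compress, payloads):
    """Compress every payload once and report ratio and throughput.

    ratio    = total original bytes / total compressed bytes
    mb_per_s = original megabytes processed per second of compression time
    """
    original = sum(len(p) for p in payloads)
    start = time.perf_counter()
    compressed = sum(len(compress(p)) for p in payloads)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "ratio": original / compressed,
        # Guard against a zero reading on very small inputs.
        "mb_per_s": original / 1e6 / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Synthetic, highly repetitive data; real images would behave differently.
    sample = [bytes(range(256)) * 64 for _ in range(4)]
    print(measure("zlib", zlib.compress, sample))
```

Reporting both numbers matters because a compressor can trade one for the other: a higher ratio often costs throughput, which is why outperforming on both at once is the notable claim.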
Surendar Chandra is a principal engineer in the CTO office of the EMC Data Protection and Availability Division. His research interests are directed towards system topics in a variety of application areas. At EMC, he investigated the virtualization capabilities of our storage platform, improved system performance using Intel hardware instructions, developed a data-type-specific compression algorithm, and improved system performance by unifying security as an end-to-end concern. Earlier, he was a visiting scientist at Fuji Xerox Palo Alto Lab, where he built and open sourced components of a real-time, multi-modal and multi-party collaboration system. Specifically, he built a high-performance screen sharing system, re-purposed smartphone cameras for high-definition tele-immersion, and built middleware for specifying and managing complex collaboration sessions. He was also on the faculty of the University of Notre Dame and the University of Georgia. His work in academia was supported by the US Defense Intelligence Agency, HP, the US National Science Foundation, VMware and Yamacraw. He received his Ph.D. in Computer Science from Duke University under the supervision of Carla Schlatter Ellis. He is the recipient of a US NSF CAREER award and is a senior member of the ACM.