In this post, we will see various ways to compress and decompress a String in Java:
– Compress and decompress data.
– Write string to zip file, and read contents from the same file.
– Write string to gz file, and read contents from the same file.
All classes mentioned here use the deflate lossless compression algorithm.
Difference in compressed data size:
– Original string size: 445 bytes
– deflater data size: 270 bytes
– file.zip file size: 396 bytes
– file.gz file size: 282 bytes
Note: This is not an ideal example for comparing compressed data sizes, because the input string is very small and the fixed overhead of each format is a large share of the total.
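To see where the size differences come from, the same bytes can be compressed in memory with both stream classes and compared. This is a minimal sketch, not code from the examples below: the class name WrapperOverheadDemo and its helper methods are hypothetical. Both classes produce the same deflate data and differ only in the header and trailer they add, which matches the 12-byte gap between the deflater and gz sizes listed above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, used only for this illustration.
final class WrapperOverheadDemo
{
    // Compresses with DeflaterOutputStream (deflate data in a zlib wrapper).
    static byte[] deflate( byte[] input )
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try( DeflaterOutputStream out = new DeflaterOutputStream( buffer ) )
        {
            out.write( input );
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return buffer.toByteArray();
    }

    // Compresses with GZIPOutputStream (deflate data in a GZIP wrapper).
    static byte[] gzip( byte[] input )
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try( GZIPOutputStream out = new GZIPOutputStream( buffer ) )
        {
            out.write( input );
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return buffer.toByteArray();
    }

    public static void main( String[] args )
    {
        // Build a sample input by repeating one sentence a few times.
        StringBuilder text = new StringBuilder();
        for( int i = 0; i < 8; i++ )
        {
            text.append( "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " );
        }
        byte[] input = text.toString().getBytes( StandardCharsets.UTF_8 );
        byte[] deflated = deflate( input );
        byte[] gzipped = gzip( input );
        System.out.printf( "Input: %d, deflate: %d, gzip: %d, gzip overhead: %d %n",
            input.length, deflated.length, gzipped.length, gzipped.length - deflated.length );
    }
}
```

The overhead stays constant as the input grows, so it matters less and less for larger inputs.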
Example 1: Input string used in the below examples
public static StringBuilder inputStr = new StringBuilder();
public ZipTest()
{
    inputStr.append( "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " )
            .append( "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. " )
            .append( "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris " )
            .append( "nisi ut aliquip ex ea commodo consequat. " )
            .append( "Duis aute irure dolor in reprehenderit in voluptate velit " )
            .append( "esse cillum dolore eu fugiat nulla pariatur. " )
            .append( "Excepteur sint occaecat cupidatat non proident, " )
            .append( "sunt in culpa qui officia deserunt mollit anim id est laborum." );
}
Example 2: Deflate & inflate
The classes DeflaterOutputStream and InflaterInputStream use the deflate lossless compression algorithm.
As opposed to the next two examples, this is not a file format.
For the purpose of this example, we will pass the output byte[] of deflate() as input to the inflate() method.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import org.apache.commons.io.IOUtils; // assumed: IOUtils in inflate() comes from Apache Commons IO
…
private byte[] deflate()
{
    byte[] data = null;
    final byte[] input = inputStr.toString().getBytes( StandardCharsets.UTF_8 );
    try( final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream() )
    {
        // Closing the DeflaterOutputStream writes the remaining compressed bytes,
        // so the byte array is read only after the inner try block ends.
        try( final DeflaterOutputStream deflaterOutputStream = new DeflaterOutputStream( byteArrayOutputStream ) )
        {
            deflaterOutputStream.write( input );
        }
        data = byteArrayOutputStream.toByteArray();
        System.out.printf( "Deflate – Input bytes: %s, Output bytes: %s %n", input.length, data.length );
    }
    catch( IOException e )
    {
        e.printStackTrace();
    }
    return data;
}
private void inflate( byte[] data )
{
    String output = null;
    try( final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream( data );
         final InflaterInputStream inflaterInputStream = new InflaterInputStream( byteArrayInputStream ) )
    {
        // Decode with the same charset that was used when compressing.
        output = IOUtils.toString( inflaterInputStream, StandardCharsets.UTF_8 );
        System.out.printf( "Inflate – Input bytes: %s, Output bytes: %s %n", data.length, output.getBytes( StandardCharsets.UTF_8 ).length );
    }
    catch( IOException e )
    {
        e.printStackTrace();
    }
}
Output
Deflate – Input bytes: 445, Output bytes: 270
Inflate – Input bytes: 270, Output bytes: 445
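For readers who prefer to avoid a third-party dependency, the same round trip can be sketched with JDK classes only. This assumes Java 9 or later for InputStream.readAllBytes(); the class name DeflateRoundTrip and the method signatures are hypothetical and differ from the methods above in that they take and return values instead of using a shared field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, used only for this illustration.
final class DeflateRoundTrip
{
    // Encodes the text as UTF-8 and compresses it with deflate.
    static byte[] deflate( String text )
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try( DeflaterOutputStream out = new DeflaterOutputStream( buffer ) )
        {
            out.write( text.getBytes( StandardCharsets.UTF_8 ) );
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        // The stream is closed at this point, so the buffer holds the complete data.
        return buffer.toByteArray();
    }

    // Decompresses the bytes and decodes them as UTF-8.
    static String inflate( byte[] data )
    {
        try( InflaterInputStream in = new InflaterInputStream( new ByteArrayInputStream( data ) ) )
        {
            // readAllBytes() requires Java 9+.
            return new String( in.readAllBytes(), StandardCharsets.UTF_8 );
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
    }

    public static void main( String[] args )
    {
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit";
        byte[] compressed = deflate( text );
        String restored = inflate( compressed );
        System.out.println( "Round trip successful: " + text.equals( restored ) );
    }
}
```

Using the same charset on both sides matters: compressing UTF-8 bytes and decoding with the platform default charset can corrupt non-ASCII text.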
Example 3: GZIP compress & decompress
GZIPOutputStream writes compressed data in the GZIP file format, and GZIPInputStream reads it back as uncompressed data.
GZIPOutputStream extends the DeflaterOutputStream class used above, and GZIPInputStream extends InflaterInputStream.
Note: In Example 2 above, you can replace DeflaterOutputStream with GZIPOutputStream to compress the String in GZIP format in memory, before writing it to a gz file.
Similarly, replace InflaterInputStream with GZIPInputStream to decompress GZIP data after reading it from a file.
Unlike ZIP, GZIP only compresses a single stream of data and cannot archive multiple files. Hence it is commonly used with tar to archive multiple files and/or folders.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import org.apache.commons.io.IOUtils; // assumed: IOUtils in gunzip() comes from Apache Commons IO
…
private void gzip()
{
    try( final FileOutputStream fileOutputStream = new FileOutputStream( "F:/work/test/file.gz" ) )
    {
        try( final GZIPOutputStream gzipOutputStream = new GZIPOutputStream( fileOutputStream ) )
        {
            gzipOutputStream.write( inputStr.toString().getBytes( StandardCharsets.UTF_8 ) );
        }
    }
    catch( IOException e )
    {
        e.printStackTrace();
    }
}
private void gunzip()
{
    String output = null;
    try( final FileInputStream fileInputStream = new FileInputStream( "F:/work/test/file.gz" );
         final GZIPInputStream gzipInputStream = new GZIPInputStream( fileInputStream ) )
    {
        output = IOUtils.toString( gzipInputStream, StandardCharsets.UTF_8 );
    }
    catch( IOException e )
    {
        e.printStackTrace();
    }
}
Output
file.gz gets created in the specified path.
Original string size – 445 bytes
file.gz file size – 282 bytes
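The in-memory variant mentioned in the note above (swapping the deflate stream classes for their GZIP counterparts) could look roughly like the following. This is a sketch under assumptions: the class name GzipRoundTrip and its methods are hypothetical, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, used only for this illustration.
final class GzipRoundTrip
{
    // Compresses the text into GZIP-formatted bytes held in memory.
    static byte[] gzip( String text )
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try( GZIPOutputStream out = new GZIPOutputStream( buffer ) )
        {
            out.write( text.getBytes( StandardCharsets.UTF_8 ) );
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return buffer.toByteArray();
    }

    // Decompresses GZIP-formatted bytes back into text.
    static String gunzip( byte[] data )
    {
        try( GZIPInputStream in = new GZIPInputStream( new ByteArrayInputStream( data ) ) )
        {
            // readAllBytes() requires Java 9+.
            return new String( in.readAllBytes(), StandardCharsets.UTF_8 );
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
    }

    public static void main( String[] args )
    {
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit";
        byte[] compressed = gzip( text );
        // GZIP data starts with the magic bytes 0x1f 0x8b.
        System.out.printf( "Magic bytes: %02x %02x %n", compressed[0] & 0xff, compressed[1] & 0xff );
        System.out.println( "Round trip successful: " + text.equals( gunzip( compressed ) ) );
    }
}
```

The byte array returned by gzip() has the same content a gz file would have, so it can be written to disk as-is or sent over a network.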
Example 4: Zip & unzip
ZipOutputStream writes compressed data in the ZIP file format, and ZipInputStream reads it back as uncompressed data.
ZipOutputStream extends the DeflaterOutputStream class used above, and ZipInputStream extends InflaterInputStream.
ZIP can compress data and archive multiple files and/or folders. Each file entry is compressed individually and then added to the final archive file.
In this example, two entries (first.txt and second.txt) are created from the input string, compressed, and added to file.zip.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
…
private void zipFile()
{
    byte[] output1 = inputStr.toString().getBytes( StandardCharsets.UTF_8 );
    byte[] output2 = inputStr.toString().getBytes( StandardCharsets.UTF_8 );
    try( final FileOutputStream fileOutputStream = new FileOutputStream( "F:/work/test/file.zip" ) )
    {
        try( final ZipOutputStream zipOutputStream = new ZipOutputStream( fileOutputStream ) )
        {
            ZipEntry zipEntry1 = new ZipEntry( "first.txt" );
            zipEntry1.setSize( output1.length );
            zipOutputStream.putNextEntry( zipEntry1 );
            zipOutputStream.write( output1 );
            zipOutputStream.closeEntry();
            ZipEntry zipEntry2 = new ZipEntry( "second.txt" );
            zipEntry2.setSize( output2.length );
            zipOutputStream.putNextEntry( zipEntry2 );
            zipOutputStream.write( output2 );
            zipOutputStream.closeEntry();
            // The compressed size is known only after closeEntry() has been called.
            System.out.printf( "Zip – File name: %s, Uncompressed size: %s, Compressed size: %s %n", zipEntry1.getName(), zipEntry1.getSize(), zipEntry1.getCompressedSize() );
            System.out.printf( "Zip – File name: %s, Uncompressed size: %s, Compressed size: %s %n", zipEntry2.getName(), zipEntry2.getSize(), zipEntry2.getCompressedSize() );
        }
    }
    catch( IOException e )
    {
        e.printStackTrace();
    }
}
private void unzipFile()
{
    ZipEntry zipEntry;
    String output;
    try( final FileInputStream fileInputStream = new FileInputStream( "F:/work/test/file.zip" );
         final ZipInputStream zipInputStream = new ZipInputStream( fileInputStream ) )
    {
        while( ( zipEntry = zipInputStream.getNextEntry() ) != null )
        {
            // Read the whole current entry. A single read() into a fixed 4096-byte buffer
            // may return fewer bytes than the entry holds and leaves trailing zero bytes.
            // readAllBytes() requires Java 9+ and stops at the end of the current entry.
            output = new String( zipInputStream.readAllBytes(), StandardCharsets.UTF_8 );
            zipInputStream.closeEntry();
            System.out.printf( "Unzip – File name: %s, Uncompressed size: %s, Compressed size: %s %n", zipEntry.getName(), zipEntry.getSize(), zipEntry.getCompressedSize() );
        }
    }
    catch( IOException e )
    {
        e.printStackTrace();
    }
}
Output
Zip – File name: first.txt, Uncompressed size: 445, Compressed size: 264
Zip – File name: second.txt, Uncompressed size: 445, Compressed size: 264
Unzip – File name: first.txt, Uncompressed size: 445, Compressed size: 264
Unzip – File name: second.txt, Uncompressed size: 445, Compressed size: 264
file.zip gets created in the specified path and it contains two compressed files – first.txt and second.txt.
Original string size – 890 bytes
file.zip file size – 772 bytes
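The same archive can also be built and read entirely in memory, which is handy when the zip is sent over a network rather than stored on disk. The following is a minimal sketch, assuming Java 9 or later for readAllBytes(); the class name InMemoryZipDemo and the Map-based method signatures are hypothetical and chosen only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, used only for this illustration.
final class InMemoryZipDemo
{
    // Writes one zip entry per map entry (entry name -> text content).
    static byte[] zip( Map<String, String> files )
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try( ZipOutputStream out = new ZipOutputStream( buffer ) )
        {
            for( Map.Entry<String, String> file : files.entrySet() )
            {
                out.putNextEntry( new ZipEntry( file.getKey() ) );
                out.write( file.getValue().getBytes( StandardCharsets.UTF_8 ) );
                out.closeEntry();
            }
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return buffer.toByteArray();
    }

    // Reads every entry of the archive back into a map, preserving entry order.
    static Map<String, String> unzip( byte[] archive )
    {
        Map<String, String> files = new LinkedHashMap<>();
        try( ZipInputStream in = new ZipInputStream( new ByteArrayInputStream( archive ) ) )
        {
            ZipEntry entry;
            while( ( entry = in.getNextEntry() ) != null )
            {
                // readAllBytes() (Java 9+) stops at the end of the current entry.
                files.put( entry.getName(), new String( in.readAllBytes(), StandardCharsets.UTF_8 ) );
            }
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return files;
    }

    public static void main( String[] args )
    {
        Map<String, String> files = new LinkedHashMap<>();
        files.put( "first.txt", "Lorem ipsum dolor sit amet" );
        files.put( "second.txt", "consectetur adipiscing elit" );
        byte[] archive = zip( files );
        System.out.println( "Archive size in bytes: " + archive.length );
        System.out.println( "Round trip successful: " + files.equals( unzip( archive ) ) );
    }
}
```

A LinkedHashMap is used so that the entries come back in the order in which they were written.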