Quick and Dirty Dedupe Analyzer

Intro

QDDA - The Quick & Dirty Dedupe Analyzer

  • Author: Bart Sjerps (Dell EMC)
  • License: GPLv3+

QDDA can analyze files, block devices and data streams to find duplicate blocks, so that you can estimate the potential storage savings of deduplication-capable storage (such as Dell EMC XtremIO). It also estimates the compression ratio.

QDDA is written in C/C++ for performance and requires Linux. It uses MHASH for hash calculations, ZLib for compression and SQLite3 as its database.

Tested on EL6 (CentOS, but RHEL/OEL should also work fine) and Linux Mint 17.3.

Disclaimer

QDDA is licensed under GPLv3+, which means it's free to use, but use it at your own risk. I accept no claims for any damage resulting from QDDA showing incorrect results.

(Phew, that was the legal stuff. Also see the section below on protection against bugs.)

Protection against bugs

QDDA is safe to run even on files/devices that are in use. It opens streams read-only and cannot modify any files. It writes to a database file that needs to be either newly created or a pre-existing SQLite3 database. It can remove the database file but ONLY if it is an SQLite3 file (it tests the file magic).

For added safety you may run "qdda" as a non-privileged user and only open named pipes or standard input (stdin).
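
The file magic test mentioned above can be pictured with a small sketch (this is an illustration, not the actual qdda code): an SQLite3 database file starts with the 16-byte header "SQLite format 3" followed by a NUL byte, and only a file that passes such a test would ever be deleted.

#include <cstdio>
#include <cstring>

// Sketch only: return true if 'path' starts with the SQLite3 file header
// ("SQLite format 3" plus a terminating NUL byte, 16 bytes in total).
bool looks_like_sqlite3(const char* path) {
    char magic[16];
    FILE* f = fopen(path, "rb");
    if (!f) return false;
    size_t n = fread(magic, 1, sizeof(magic), f);
    fclose(f);
    return n == sizeof(magic) && memcmp(magic, "SQLite format 3", 16) == 0;
}

int main(int argc, char** argv) {
    if (argc > 1)
        printf("%s: %s\n", argv[1],
               looks_like_sqlite3(argv[1]) ? "SQLite3 file" : "not an SQLite3 file");
    return 0;
}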

Usage

qdda 1.4.2 - the quick & dirty dedupe analyzer

Usage: qdda [-D] [-B blksize] [-b <bandw>] [-c] [-d] [-f <dbpath>] [-k] [-p gb] [-r] [-v] [file list]
  -D (debug)       : show lots of annoying debug info
  -B <blksize_kb>  : set blocksize to blksize_kb kilobytes
  -b <bandw>       : throttle bandwidth in MB/s (default 200, set to 0 to disable throttling)
  -c (no-Compress) : skip compression estimate
  -d (dump)        : dump block offsets and hashes
  -k (keep)        : keep existing database when feeding more data
  -f <db filepath> : specify alternative database location (default /var/tmp/qdda.db)
  -p (perftest)    : run raw performance test with 1GB random data
  -q (quiet)       : don't show progress indicator or intermediate results
  -r (no-Report)   : don't show report
  -v (version)     : print version and copyright info

qdda is safe to run even on files/devices that are in use. It opens streams read-only and cannot modify any files.
It writes to a database file that needs to be either newly created or a pre-existing SQLite3 database.
It can remove the database file but ONLY if it is an SQLite3 file (it tests the file magic).
For added safety you may run qdda as a non-privileged user and only open named/unnamed pipes or standard input (stdin), i.e.
unix pipe:  sudo cat /dev/disk | qdda <options>
named pipe: mkfifo p; sudo cat /dev/disk > p & qdda <options> p

total               = Total data blocks scanned
free                = Free (zeroed) space
used                = Used (non-zero) space
unique              = Blocks that only appear once (non-dedupable)
deduped 2x          = Blocks that appear exactly 2 times
deduped 3x          = Blocks that appear exactly 3 times
deduped 4x          = Blocks that appear exactly 4 times
deduped >4x         = Blocks that appear 5 times or more
deduped total       = Required capacity after deduplication
stream compressed   = Sum of deduped block bytes compressed with LZ4
compress buckets 2k = Compressed blocks that fit in 2k slots (4 per 8K block)
compress buckets 4k = Compressed blocks that fit in 4k slots (2 per 8K block)
compress buckets 8k = Remaining blocks (not compressed, require full 8K block)
total compressed    = Blocks required to fit all compress buckets (overall required capacity)

Summary:
percentage used     = Percentage used/total (logical capacity, before optimization)
percentage free     = Percentage free/total (logical capacity, before optimization)
deduplication ratio = capacity used / deduped
compression ratio   = capacity deduped / required (bucket compression)
thin ratio          = capacity total / used
combined            = all ratios combined (total possible optimization efficiency)
raw capacity        = Total scanned capacity (same as total)
net capacity        = Total required capacity (same as required)

More info: http://outrun.nl/wiki/qdda

Usage example

Output from a run against a test VM running Oracle on ASM.

[bart@outrun01 ~]$ qdda -f /var/tmp/q.db 
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
blocksize           =           8 KiB
total               =    27648.00 MiB (   3538944 blocks)
free                =    15875.69 MiB (   2032088 blocks)
used                =    11772.31 MiB (   1506856 blocks)
unique              =     6559.70 MiB (    839642 blocks)
deduped 2x          =     2205.86 MiB (    282350 blocks)
deduped 3x          =       22.11 MiB (      2830 blocks)
deduped 4x          =       36.87 MiB (      4719 blocks)
deduped >4x         =       96.55 MiB (     12359 blocks)
deduped total       =     8921.09 MiB (   1141900 blocks)
stream compressed   =     3559.90 MiB (     60.10 %)
compress buckets 2k =      799.27 MiB (    409227 buckets)
compress buckets 4k =      799.02 MiB (    204550 buckets)
compress buckets 8k =     4125.96 MiB (    528123 buckets)
total compressed    =     5724.25 MiB (    732704 blocks)
                      *** Summary ***
percentage used     =           42.58 %
percentage free     =           57.42 %
deduplication ratio =            1.32
compression ratio   =            1.56
thin ratio          =            2.35
combined            =            4.83
raw capacity        =    27648.00 MiB
net capacity        =     5724.25 MiB

Explanation: We scanned 6 Oracle ASM devices with various types of data, 27GB in total of which less than 12GB is actually used. Oracle rarely writes duplicate blocks (with some exceptions) so the dedupe ratio is not very high. But we also used RMAN to back up (as copy) some files to one of the diskgroups, and those blocks get deduped against the primary database.

So we find a lot of blocks in the "deduped 2x" category and a few in the higher dedupe categories (probably non-Oracle data or other side effects).

The stream compression is about 60%, which means that if we compressed all the data (using LZ4) we would get a 60% compression ratio (empty blocks are not included, so be aware). Dell EMC XtremIO avoids having to fragment data by sorting compressed blocks into 2K chunks (if a block compresses into 2048 bytes or less, i.e. 75% savings or more) or 4K chunks (if it compresses into 4096 bytes or less, i.e. 50% savings or more). Blocks that don't compress by at least 50% are not compressed at all.

This compression comes at the expense of a bit of additional capacity but avoids heavy fragmentation and performance issues, so the "bucket" compression ratio is a bit lower than the "stream" compression ratio. As one may expect, Oracle blocks usually contain data and we get a "bucket" (total) compression ratio of 1.56. The required capacity on an array that does thin provisioning, bucket compression and deduplication with 8K blocks (XtremIO) would be roughly 5700 MB (and we have 27GB), so the overall efficiency here is 1:4.83.
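
To make the bucket logic and the ratio arithmetic concrete, here is a minimal C++ sketch (not qdda's actual code; the figures are simply copied from the report above). It assigns a compressed block to a 2K/4K/8K bucket and recomputes the combined efficiency:

#include <cstdio>

// Bucket rule as described above: 2K if a block compresses to 2048 bytes or
// less (75%+ savings), 4K if it fits in 4096 bytes (50%+ savings), otherwise
// the block is stored uncompressed in a full 8K block.
static int bucket_bytes(int compressed_bytes) {
    if (compressed_bytes <= 2048) return 2048;
    if (compressed_bytes <= 4096) return 4096;
    return 8192;
}

int main() {
    // Figures (in 8K blocks) taken from the example report above
    double total    = 3538944;   // total scanned blocks
    double used     = 1506856;   // non-zero blocks
    double deduped  = 1141900;   // blocks left after deduplication
    double required = 732704;    // 8K blocks needed to hold all buckets

    double thin     = total / used;        // ~2.35
    double dedupe   = used / deduped;      // ~1.32
    double compress = deduped / required;  // ~1.56
    printf("combined = %.2f\n", thin * dedupe * compress);  // ~4.83
    printf("a block compressed to 1800 bytes lands in a %d byte bucket\n",
           bucket_bytes(1800));
    return 0;
}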

Install

EL6 (CentOS/RHEL/OEL):

# Install the Outrun Extras repository
yum install http://yum.outrun.nl/outrun-extras.rpm
# Install qdda from repo
yum install qdda

Non-EL (Ubuntu, Mint, etc.):

# install rpm2cpio
sudo apt-get install rpm2cpio
# Download latest RPM
wget http://yum.outrun.nl/outrun/0.9/base/extras/qdda-<version>.x86_64.rpm
# List files in package
rpm2cpio qdda-1.0-0.x86_64.rpm | cpio -ivt
-rwxr-xr-x   1 root     root       209287 Apr  4 17:25 ./usr/bin/qdda
-rw-r--r--   1 root     root            0 Apr  4 17:25 ./var/tmp/qdda.db
# Extract qdda binary
rpm2cpio qdda-1.0-0.x86_64.rpm | cpio -idmv
./usr/bin/qdda
./var/tmp/qdda.db
410 blocks
# Copy to bin
cp usr/bin/qdda $HOME/bin

You can also copy /usr/bin/qdda from an RPM-compatible system.

ready to use!

Concept

QDDA creates a key-value store using an SQLite3 database. The key-value pairs are filled with hash values (the key) and the number of occurrences (the value).

The hash algorithm used is CRC32 (4 bytes), calculated for every 8K block. The first time a hash value is encountered, a key-value pair is added to the database where k,v = hash,1. Every time a scanned block generates the same CRC32 hash again, the value is increased. Note that there is a very small chance of hash collisions (1 in 4294967296 for any given pair of blocks), but for statistical analysis this is not a problem.

When scanning is complete, the number of unique keys represents the actual storage capacity that would be required after de-duplication. The sum of all values is the number of raw blocks scanned. So dividing the total scanned blocks by the number of unique keys gives the deduplication ratio.

Every (unique) block is also compressed using the LZ4 compression algorithm. We keep track of the total bytes before and after compression for each block, so we can estimate the compression ratio. Some all-flash arrays like Dell EMC XtremIO use a performance-optimized method that stores compressed data in "buckets" to avoid fragmentation overhead. This "bucket compression" can also be calculated.

A zero block (8K with only zero values) will generate the special hash value 0 (the real CRC32 would be 3639908756 but is ignored) so we can also figure out how much space is allocated vs. unused.
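
The scan loop can be pictured roughly as in the sketch below. It is only an illustration of the concept, not qdda's actual code: it uses zlib's crc32() and an in-memory std::map instead of MHASH and SQLite3, skips the LZ4 compression step, and ignores a partial block at the end of the input.

#include <cstdio>
#include <cstdint>
#include <map>
#include <zlib.h>   // crc32()

int main(int argc, char** argv) {
    const size_t BLKSZ = 8192;                 // 8K blocks as described above
    unsigned char buf[8192];
    std::map<uint32_t, uint64_t> kv;           // k = hash, v = occurrence count

    FILE* f = (argc > 1) ? fopen(argv[1], "rb") : stdin;
    if (!f) return 1;
    while (fread(buf, 1, BLKSZ, f) == BLKSZ) {
        bool zero = true;
        for (size_t i = 0; i < BLKSZ && zero; i++)
            if (buf[i] != 0) zero = false;
        // Zero blocks get the special hash value 0, all other blocks get CRC32
        uint32_t hash = zero ? 0 : (uint32_t)crc32(0L, buf, (uInt)BLKSZ);
        kv[hash]++;                            // first occurrence inserts k,v = hash,1
    }
    if (f != stdin) fclose(f);
    printf("%zu distinct hashes (including the zero block)\n", kv.size());
    return 0;
}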

Summary

qdda uses a simple key-value table "kv" where k is the CRC32 hash and v is the number of times the hash is found.

For simplicity, say we scan 10 blocks where the hashes would be:

              0  5  6  7  8
block 0 = 5      *
block 1 = 0   *
block 2 = 6         *
block 3 = 7            *
block 4 = 0   *
block 5 = 6         *
block 6 = 8               *
block 7 = 6         *
block 8 = 6         *
block 9 = 5      *

Sums:         2  2  4  1  1
                       ^--^ - non-dedupeable
                    ^------ - deduped 4:1
                 ^--------- - deduped 2:1
              ^------------ - zero (not used)

Here we have:

total   = 10
free    = 2
used    = 8
unique  = 2 (i.e. 7,8 appear only once)
duped   = 2 (5 and 6 appear more than once)
deduped = 4 (i.e. 4 non-zero hashes: 5,6,7,8)

FYI, the k,v table would look like this (5 rows):

k=0, v=2
k=5, v=2
k=6, v=4
k=7, v=1
k=8, v=1

How to query the kv table:

Total blocks      = sum(v)                      - total logical blocks scanned  (2+2+4+1+1 = 10)
Free blocks       = v        where k=0          - total logical zero blocks     (v[0]=2)
Used blocks       = sum(v)   where k!=0         - total logical non-zero blocks (2+4+1+1 = 8)
Deduped blocks    = count(k) where k!=0         - total amount of hashes - required capacity after dedupe - excluding free blocks (4 keys with k!=0)
Unique blocks     = count(v) where k!=0 and v=1 - total blocks which occur only once (no deduplication) (2, k[7] and k[8] )
Duped blocks      = count(k) where k!=0 and v>1 - total blocks which occur more than once (deduplication occurs) (2, k[5] and k[6])
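
Expressed against an in-memory map like the one in the Concept sketch above (qdda itself runs the equivalent SQL against the SQLite3 database), these queries look roughly like this:

#include <cstdio>
#include <cstdint>
#include <map>

int main() {
    // The kv table from the 10-block example above: k = hash, v = occurrences
    std::map<uint32_t, uint64_t> kv = { {0,2}, {5,2}, {6,4}, {7,1}, {8,1} };

    uint64_t total = 0, free_blocks = 0, used = 0, deduped = 0, unique = 0, duped = 0;
    for (const auto& p : kv) {
        total += p.second;                                        // sum(v)
        if (p.first == 0) { free_blocks = p.second; continue; }   // v where k=0
        used    += p.second;                                      // sum(v)   where k!=0
        deduped += 1;                                             // count(k) where k!=0
        if (p.second == 1) unique++;                              // count(k) where k!=0 and v=1
        else               duped++;                               // count(k) where k!=0 and v>1
    }
    // Prints: total=10 free=2 used=8 deduped=4 unique=2 duped=2
    printf("total=%llu free=%llu used=%llu deduped=%llu unique=%llu duped=%llu\n",
           (unsigned long long)total, (unsigned long long)free_blocks,
           (unsigned long long)used, (unsigned long long)deduped,
           (unsigned long long)unique, (unsigned long long)duped);
    return 0;
}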

More features

  • Scanning from named pipes, stdin
You can scan from other hosts via (named or unnamed) pipes. Examples:
dd if=/dev/urandom bs=1M count=1024          | qdda # via pipe on stdin
ssh root@bighost "dd if=/dev/oracleasm/data" | qdda # via pipe from remote ssh-> dd

# Using named pipe
mkfifo pipe
# start background dump to pipe
dd if=/dev/urandom of=pipe bs=1M count=1024 &
# let qdda read from pipe
qdda pipe
Note that ssh encrypts all data so performance may be limited. Use Netcat or other means to speed up.
  • Scanning multiple files/devices/pipes
You can scan a combination of multiple files or pipes at once. Just put multiple file names as parameters. stdin (unnamed pipe) is of course limited to one.
You may scan multiple files via stdin using a compound command:
{ cat file1 file2 file3 pipe ; ssh bighost "dd...." ; } | qdda # note that qdda thinks you're scanning only one file (/dev/stdin).
  • Running as non-privileged
qdda requires read access to the files/devices/pipes it is reading. You can avoid granting a non-privileged user direct read access to the devices by using the named pipe method as described:
# as root (make sure the pipes have read access for "others": chmod o+r <pipe>)
# Note that dd can be extremely dangerous if you use wrong parameters!
mkfifo /tmp/pipe-sda /tmp/pipe-sdb
dd if=/dev/sda bs=1M status=none of=/tmp/pipe-sda &
dd if=/dev/sdb bs=1M status=none of=/tmp/pipe-sdb &

# alternative (may not perform as well)
cat /dev/sda > /tmp/pipe-sda &
cat /dev/sdb > /tmp/pipe-sdb &

# On another terminal, non-root:
qdda /tmp/pipe*
Another method is (temporarily) making the scanned devices readable for others (chmod o+r /dev/sda), but note that this can be a security issue (at least until the next reboot or udev update)!
  • Change blocksize
You can change the blocksize between 1 KiB and 64 KiB (65536 bytes), in multiples of 1 KiB (1024 bytes). Note that you need to reset the database when starting with a new blocksize. Also be aware that bucket compression into 2K and 4K slots is still based on the default 8K blocksize, even when using other blocksizes.
  • Bandwidth throttling
qdda prevents overloading I/O stacks by silently limiting bandwidth to 200MB/s. You can go full speed with "-b 0" or throttle to other values, e.g. "-b 50" for 50MB/s.
  • No compress
You can skip compression of blocks with the "-c" (no-Compress) option. This may speed up CPU-limited analysis runs or produce reports without the compression estimates.
  • Change DB filename
Default database is /var/tmp/qdda.db. Change with "-f mydb" if you want to store the database elsewhere (or don't want to lose the contents of the other one).
  • In-memory
If you don't care about keeping the results and have enough RAM, you can use SQLite's special in-memory database name with "-f :memory:". qdda will then not create a database file.
  • Keep DB
The "-k" (Keep) option prevents qdda from deleting the existing database. This allows multiple qdda runs where the results are added to the existing database.
  • Quiet
Use "-q" (Quiet) to avoid showing the progress indicator or intermediate results.
  • No-report
Use "-r" (no-Report) to avoid showing the results when finished.
  • Performance test
Use "-p" to run a performance test. Qdda will show individual speeds for hashing, compression, db operations and reading from a stream and an estimate for the max performance on your system.
  • Debug
use "-D" (Debug) to show lots of annoying debug info.
  • Blocksize
The default blocksize is 8192 bytes; it can be changed with the -B option (see above), or you can compile the source code with a different default if needed.
  • Advanced: Run your own queries against the data
Open the database with:
sqlite3 /var/tmp/qdda.db
and run your own queries. Examples:
-- count hashes that occur exactly 3 times (deduped 3x)
select count(k) from kv where v=3;
-- count hashes that appear more than twice, grouped by occurrence count
select v,count(k) from kv where k!=0 and v>2 group by v order by v;

Notes

  • You can test blocksizes other than 8K with the -B option. The max (currently) is 64K and it must be specified in kibibytes (multiples of 1024 bytes).
  • Not tested (yet) with very large data sets. SQLite3 can handle very large numbers of values, but I have not tuned SQLite much for performance. I expect it to run fine, however, up to several hundreds of gigabytes.
  • CRC32 has a higher chance of hash collisions than, say, SHA256 because it uses fewer (32) bits (roughly 1 in 4294967296 for any given pair of blocks would be falsely counted as a duplicate). Because this is just an analysis tool and not used to store real, precious data, we can get away with using fewer bits for the hash and no hash collision check.
  • The compression ratio can differ from what the storage system actually achieves. It's just a rough estimate; mileage may vary. The algorithm is based on Dell EMC XtremIO and may work very differently for other platforms.
  • SQLite is used because it is available on all EL6 systems (and probably most other Linux distros) and because it's simple and fast. Other (real) key-value stores might be better, but this does the job. We can also store information other than key-value data in the database, which makes it a good choice.
  • The database needs roughly 4MB per 1GB of unique (non-dedupable) data (I found a 32MiB DB size after scanning 8GB of non-dedupable data). It may require more when processing huge datasets (due to requiring deeper indices).
  • You can set the database filename to e.g. "/bigfilesystem/bla.db" if you run out of space on /var/tmp. Also note that many systems clean up /var/tmp after a day or so; place your database elsewhere if you want to keep the results.

Performance

qdda is single-threaded (for now). On my Intel i5 @ 3.10GHz I get the following profile test results:

$ qdda -p
Deleting /var/tmp/qdda.db
Test set:             131072 random 8k blocks (1024 MiB)
Hashing:              926446 microseconds,    1158.99 MB/s
Compressing:          328950 microseconds,    3264.15 MB/s
DB insert:            269533 microseconds,    3983.71 MB/s
Reading:              114576 microseconds,    9371.44 MB/s
Total:               1639506 microseconds,     654.92 MB/s


So expect roughly 600MB/s on typical systems; I expect high-end server CPUs to be even faster. Note that to get more than about 200MB/s you need to disable throttling (-b 0), which was built in as a safeguard against I/O starvation on production systems.
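
Because the stages run back-to-back in a single thread, the combined throughput is the reciprocal of the sum of the per-stage reciprocals, which is why the total is lower than any individual stage. A quick C++ sketch of that arithmetic, using the numbers from the profile run above:

#include <cstdio>

int main() {
    // Per-stage throughput in MB/s from the profile run above
    double stages[] = { 1158.99, 3264.15, 3983.71, 9371.44 }; // hash, compress, db, read
    double t = 0;                                 // total time spent per MB
    for (double mbps : stages) t += 1.0 / mbps;
    printf("combined throughput: %.0f MB/s\n", 1.0 / t);      // roughly 655 MB/s
    return 0;
}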