Quick and Dirty Dedupe Analyzer



Intro

QDDA - The Quick & Dirty Dedupe Analyzer

  • Author: Bart Sjerps (DellEMC)
  • License: GPLv3+

QDDA can analyze files, block devices and data streams to find duplicate blocks so one can estimate potential storage savings by implementing deduplication-capable storage (such as Dell-EMC XtremIO). It will also estimate the compression ratio.

QDDA is written in C/C++ for performance and requires Linux. It uses MD5 for hash calculations, LZ4 for compression and SQLite3 as its database.

Tested on EL6 (CentOS, but RHEL/OEL and EL7 should also work fine) and Linux Mint 17.3.

Disclaimer

QDDA is licensed under GPLv3+ which means it's free to use, but use at your own risk. I accept no claims for any damage resulting from QDDA showing incorrect results.

(Phew, that was the legal stuff. See also the section below on protection against bugs.)

Protection against bugs

QDDA is safe to run even on files/devices that are in use. It opens streams read-only and cannot modify any files. It writes to a database file that needs to be either newly created or a pre-existing SQLite3 database. It can remove the database file but ONLY if it is an SQLite3 file (it tests the file magic).

For added safety you may run "qdda" as a non-privileged user and only open named pipes or standard input (stdin).

Usage

qdda 1.8.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary

Usage: qdda [-D] [-B blksize] [-b <bandw>] [-c] [-d] [-f <dbpath>] [h|H] [-i importdb] [-k] [-n] [-q] [-r] [-t [gb]] [-v] [-x] [file list]
  -D (debug)        : Show lots of annoying debug info
  -B <blksize_kb>   : Set blocksize to blksize_kb kilobytes
  -P (purge)        : Reclaim unused space in database (sqlite vacuum)
  -T <tempdir>      : Use tempdir instead of /var/tmp for SQLite temp files (or use export SQLITE_TMPDIR)
  -a (append)       : Append data to existing database
  -b <bandw>        : Throttle bandwidth in MB/s (default 200, set to 0 to disable throttling)
  -c (no-Compress)  : Skip compression estimate
  -d (dump)         : Dump block offsets and hashes (to search for block offset with hash)
  -f <database>     : Specify alternative database location (default /var/tmp/qdda.db)
  -i <import-db>    : Import data from another database (merge with existing data)
  -n (noupdate)     : Dryrun - just read data (no processing)
  -q (quiet)        : Don't show progress indicator or intermediate results
  -r (no-Report)    : Don't show report / don't merge
  -t<size_gb>       : Run raw performance test (no opts) or merge test (data size in GB)
  -v (version)      : print version and copyright info
  -x (eXtended)     : Extended report (file info and histograms)
  -h or ? (help)    : This help
  -H Long help      : More detailed information, tips & tricks

qdda is safe to run even on files/devices that are in use. It opens streams read-only and cannot modify any files.
It writes to a database file that needs to be either newly created or a pre-existing SQLite3 database.
It can remove the database file but ONLY if it is an SQLite3 file (it tests the file magic).
For added safety you may run qdda as a non-privileged user and only open named/unnamed pipes or standard input (stdin), i.e.
unix pipe:  sudo cat /dev/disk | qdda <options>
named pipe: mkfifo p; sudo cat /dev/disk > p & qdda <options> p

total               = Total data blocks scanned
free                = Free (zeroed) space
used                = Used (non-zero) space
unique              = Blocks that only appear once (non-dedupable)
dupcount = 2        = Blocks that appear exactly 2 times
dupcount > 2        = Blocks that appear more than 2 times (more detail: run histograms)
deduped             = Required capacity after deduplication
compressed (bytes)  = Sum of compressed block bytes (compressed with LZ4)
compressed (block)  = Sum of bucket sizes (compressed and sorted in buckets)
uncompressed        = Same as deduped (if blocksize is not 8K or 16K)
Summary:
percentage used     = Percentage used/total (logical capacity, before optimization)
percentage free     = Percentage free/total (logical capacity, before optimization)
deduplication ratio = capacity used / deduped
compression ratio   = capacity deduped / required (bucket compression)
thin ratio          = capacity total / used
combined            = all ratios combined (total possible optimization efficiency)
raw capacity        = Total scanned capacity (same as total)
net capacity        = Total required capacity (same as required)

More info: http://outrun.nl/wiki/qdda


Usage example

Output from a run against a test VM running Oracle on ASM.

[bart@outrun01 ~]$ qdda /dev/oracleasm/*
qdda 1.8.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File 01, 262144 blocks (4096 MiB) processed, 187/9999 MB/s, 240 MB/s avg
File 02, 262144 blocks (4096 MiB) processed, 354/9999 MB/s, 241 MB/s avg
File 03, 65536 blocks (1024 MiB) processed, 349/9999 MB/s, 248 MB/s avg
File 04, 131072 blocks (2048 MiB) processed, 351/9999 MB/s, 348 MB/s avg
Merging 720896 blocks (11264 MiB) with 0 blocks (0 MiB)
Indexing in 1.45 sec (496700 blocks/s, 7760 MiB/s), Joining in 0.84 sec (853774 blocks/s, 13340 MiB/s)
                      *** Details ***
blocksize           =          16 KiB
total               =    11264.00 MiB (    720896 blocks)
free                =     3622.19 MiB (    231820 blocks)
used                =     7641.81 MiB (    489076 blocks)
unique              =      469.91 MiB (     30074 blocks)
dupcount = 2        =     2973.20 MiB (    190285 blocks)
dupcount > 2        =      308.67 MiB (     19755 blocks)
deduped             =     3751.78 MiB (    240114 blocks)
compressed (bytes)  =      365.13 MiB (     90.27 %)
compressed (block)  =      524.97 MiB (     33598 blocks)
                      *** Summary ***
percentage used     =           67.84 %
percentage free     =           32.16 %
deduplication ratio =            2.04
compression ratio   =            7.15
thin ratio          =            1.47
combined            =           21.46
raw capacity        =    11264.00 MiB
net capacity        =      524.97 MiB

Explanation: We scanned 4 Oracle ASM devices with various types of data: 11GB in total, of which less than 8GB is actually in use. Oracle rarely writes duplicate blocks (with some exceptions), so the dedupe ratio is not very high. But we copied the data disk to data-copy to simulate a cloned database.

So we find a lot of blocks in the "deduped 2x" category and a few in the higher dedupe ratios (probably non-Oracle data or due to other side effects).

The bytes compression is about 90%, which means that if we compressed all the files (using LZ4) we would get roughly a 90% compression ratio (note that empty blocks are not included). Dell EMC XtremIO avoids having to fragment data by sorting compressed blocks into "buckets", which results in a slightly lower compression ratio. Multiple buckets can fit in a block, depending on the bucket sizes.

This bucket compression comes at the expense of a bit of additional capacity, but avoids heavy fragmentation and performance issues. So the "block" compression ratio is a bit lower than the "bytes" compression ratio. As one may expect, Oracle blocks usually contain data, but in my test it was mostly empty tablespace, so we get a high "block" (total) compression ratio of 7.15. The required capacity on an array that does thin provisioning, bucket compression and deduplication with 16K blocks would be roughly 524 MB (and we started with 11GB), so the overall efficiency here is about 1:21 (real-world environments would not get that much reduction).
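
For reference, the summary ratios follow directly from the detail numbers above. As a quick cross-check (values copied from the report; the awk snippet is purely illustrative and not part of qdda):

awk 'BEGIN {
  total=11264.00; used=7641.81; deduped=3751.78; net=524.97  # MiB, from the report above
  printf "deduplication ratio = %.2f\n", used/deduped        # 2.04
  printf "compression ratio   = %.2f\n", deduped/net         # 7.15
  printf "thin ratio          = %.2f\n", total/used          # 1.47
  printf "combined            = %.2f\n", total/net           # 21.46 (all ratios multiplied)
}'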

Install

EL6 (CentOS/RHEL/OEL)
# Install the Outrun Extras repository
yum install http://yum.outrun.nl/outrun-extras.rpm
# Install qdda from repo
yum install qdda
Non-EL (Ubuntu, Mint, etc.)

Copy /usr/bin/qdda from an RPM-compatible system.

Ready to use!
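
To verify the installation:
# print version and copyright info
qdda -v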

Concept

QDDA creates a key-value store using an SQLite3 database. The key-value pairs are filled with hash values (the key) and the number of occurrences (the value).

The hash algorithm used is MD5 (truncated to 6 bytes), calculated for every block. The compressed size of each block is also recorded. The results are merged into a key-value table with 3 columns: key (the MD5 hash), value (the number of blocks with the same hash) and bytes (the compressed size). A zeroed block gets the special "hash" value of 0.

When scanning is complete, the number of unique keys represents the actual storage capacity that would be required after deduplication. The sum of all values is the number of raw blocks scanned. So by dividing the total number of scanned blocks by the number of unique keys you get the deduplication ratio.

Every (unique) block is also compressed using the LZ4 compression algorithm. We keep track of the total bytes before and after compression for each block, so we can estimate the compression ratio. Some all-flash arrays use a performance-optimized method for storing compressed data in "buckets" to increase performance and avoid fragmentation overhead. This "bucket compression" can also be calculated.

A zero block (only zero values) will generate the special hash value 0 (the real hash is ignored) so we can also figure out how much space is allocated vs. unused.
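
The idea can be illustrated with standard shell tools (a conceptual sketch only, not how qdda works internally; it assumes a file named testfile, uses a 16K blocksize and ignores compression and zero-block handling):

# hash every 16K block of testfile with md5sum and count duplicate hashes;
# this is essentially the information the kv table holds
blocks=$(( $(stat -c%s testfile) / 16384 ))
for ((i=0; i<blocks; i++)); do
  dd if=testfile bs=16k skip=$i count=1 status=none | md5sum
done | awk '{print $1}' | sort | uniq -c | sort -rn | head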

Summary

qdda uses a simple key-value table "kv" where k is the hash and v is the number of times the hash is found.

For simplicity, say we scan 10 blocks where the hashes would be:

              0  5  6  7  8
block 0 = 5      *
block 1 = 0   *
block 2 = 6         *
block 3 = 7            *
block 4 = 0   *
block 5 = 6         *
block 6 = 8               *
block 7 = 6         *
block 8 = 6         *
block 9 = 5      *

Sums:         2  2  4  1  1
                       ^--^ - non-dedupeable
                    ^------ - deduped 4:1
                 ^--------- - deduped 2:1
              ^------------ - zero (not used)

Here we have:

total   = 10
free    = 2
used    = 8
unique  = 2 (i.e. 7,8 appear only once)
duped   = 2 (5 and 6 appear more than once)
deduped = 4 (i.e. 4 non-zero hashes: 5,6,7,8)

FYI, the k,v table would look like this (5 rows):

k=0, v=2
k=5, v=2
k=6, v=4
k=7, v=1
k=8, v=1

How to query the kv table:

Total blocks      = sum(v)                      - total logical blocks scanned  (2+2+4+1+1 = 10)
Free blocks       = v        where k=0          - total logical zero blocks     (v[0]=2)
Used blocks       = sum(v)   where k!=0         - total logical non-zero blocks (2+4+1+1 = 8)
Deduped blocks    = count(k) where k!=0         - total amount of hashes - required capacity after dedupe - excluding free blocks (4 keys with k!=0)
Unique blocks     = count(v) where k!=0 and v=1 - total blocks which occur only once (no deduplication) (2, k[7] and k[8] )
Duped blocks      = count(k) where k!=0 and v>1 - total blocks which occur more than once (deduplication occurs) (2, k[5] and k[6])
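
The queries above translate into SQL roughly as follows (a sketch against the kv table, assuming the default database location):

sqlite3 /var/tmp/qdda.db <<'EOF'
select sum(v)   from kv;                       -- total blocks
select v        from kv where k=0;             -- free (zero) blocks
select sum(v)   from kv where k!=0;            -- used blocks
select count(k) from kv where k!=0;            -- deduped blocks
select count(v) from kv where k!=0 and v=1;    -- unique blocks
select count(k) from kv where k!=0 and v>1;    -- duped blocks
EOF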

More features

Scanning from named pipes, stdin
You can scan from other hosts via (named or unnamed) pipes. Examples:
dd if=/dev/urandom bs=1M count=1024          | qdda # via pipe on stdin
ssh root@bighost "dd if=/dev/oracleasm/data" | qdda # via pipe from remote ssh-> dd

# Using named pipe
mkfifo pipe
# start background dump to pipe
dd if=/dev/urandom of=pipe bs=1M count=1024 &
# let qdda read from pipe
qdda pipe
Note that ssh encrypts all data so performance may be limited. Use Netcat or other means to speed up.
Scanning multiple files/devices/pipes
You can scan a combination of multiple files or pipes at once. Just pass multiple file names as parameters. stdin (unnamed pipe) is of course limited to one.
You may scan multiple files via stdin using a compound command:
{ cat file1 file2 file3 pipe ; ssh bighost "dd...." ; } | qdda # note that qdda thinks you're scanning only one file (/dev/stdin).
Running with non-privileged user for added security
qdda requires read access to the files/devices/pipes it reads. You can avoid running it with elevated privileges by using the named pipe method as described:
# as root: create the pipes and make sure they have read access for "others" (chmod o+r <pipe>)
# Note that dd can be extremely dangerous if you use wrong parameters!
mkfifo /tmp/pipe-sda /tmp/pipe-sdb
dd if=/dev/sda bs=1M status=none of=/tmp/pipe-sda &
dd if=/dev/sdb bs=1M status=none of=/tmp/pipe-sdb &

# alternative (may not perform as well)
cat /dev/sda > /tmp/pipe-sda &
cat /dev/sdb > /tmp/pipe-sdb &

# On another terminal, non-root:
qdda /tmp/pipe*
Another method is to (temporarily) grant read access to others on the scanned devices (chmod o+r /dev/sda), but note that this can be a security issue (at least until the next reboot or udev update)!
Change blocksize
You can change the default 16K blocksize to anything between 1K and 128K, in multiples of 1K (1024 bytes), using the -B option. Note that you need to reset the database when starting with a new blocksize.
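For example (the device name is illustrative):
# scan with a 64K blocksize; the default (non-append) run recreates the database,
# so the new blocksize takes effect
qdda -B 64 /dev/sdb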
Bandwidth throttling
qdda prevents overloading I/O stacks by silently limiting bandwidth to 200MB/s. You can go full speed with "-b 0" or throttle to other values, e.g. "-b 50" for 50MB/s.
No compress
You can skip compression of blocks with the "-c" (no-Compress) option. This may speed up CPU-limited analysis runs or force reports without compression optimizations.
Change DB filename
The default database is /var/tmp/qdda.db. Change it with "-f mydb" if you want to store the database elsewhere (or don't want to overwrite an existing one).
In-memory
If you don't care about keeping the results and have enough RAM, you can keep the database in memory by passing the special SQLite name with "-f :memory:". qdda will then not create a database file.
Append instead of replace
The default behaviour is to wipe the database with each scan. The "-a" (append) option prevents deleting the database. This allows multiple qdda runs where the results are added to the existing database.
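For example (device names are illustrative):
qdda -r /dev/sdb     # first run (re)creates the default database, skip the report
qdda -a /dev/sdc     # second run appends and reports on the combined data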
Quiet
Use "-q" (Quiet) to avoid showing the progress indicator or intermediate results.
No-report
Use "-r" (no-Report) to avoid showing the results when finished. This keeps the database smaller and allows it to copy to another host before running the report.
Performance test
Use "-p" to run a performance test. Qdda will show individual speeds for hashing, compression, db operations and reading from a stream and an estimate for the max performance on your system.
Debug
use "-D" (Debug) to show lots of annoying debug info.
Advanced
Run your own queries against the data
sqlite3 /var/tmp/qdda.db
and run your own queries. Example:
-- count hashes that occur exactly 3 times
select count(k) from kv where v=3;
-- for hashes that appear more than twice, show each occurrence count and how many hashes have it
select v,count(k) from kv where k!=0 and v>2 group by v order by v;

Notes

  • You can test blocksizes other than 16K with the -B option. The max (currently) is 128K, and the value must be specified in kibibytes (1024 bytes).
  • Not tested (yet) with very large data sets; I've tested up to 4TB datasets, which took a few hours to process. SQLite3 can handle very large numbers of values, but I have not tuned SQLite much for performance. I expect it to run fine, however, up to at least 10TB with the default blocksize.
  • The compression ratio can differ from what the storage system achieves. It's just a rough estimate; mileage may vary. The estimate is based on LZ4 and bucket compression and may behave very differently on various platforms.
  • SQLite is used because it is available on all EL6 systems (and probably most other Linux distros) and because it's simple and fast. Other (real) key-value stores might be better, but this does the job. We can also store info other than key-value data in the database, which makes it a good choice.
  • The database roughly needs 20 bytes per block for scanning and about 3X that size for processing (non-dedupable) data (I found a 16MiB DB size for scanning 8GB of non-dedupable data). It may require more when processing huge datasets (due to requiring deeper indices).
  • You can set the database filename to "/bigfilesystem/bla.db" if you run out of space on /var/tmp. Also note that many systems clean up /var/tmp after a day or so; place your database elsewhere if you want to keep the results.

Sizing

The scanning phase roughly requires 20 bytes per scanned block. So with a blocksize of 16K and 1TiB scanned data, the staging table requires:

1099511627776 bytes / 16384 = 67108864 blocks; at 20 bytes/block that means 1280MiB of disk space. This is linear, so 10TiB requires about 12.5GiB of staging space.
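
A hypothetical helper (not part of qdda) to repeat this estimate for other data sizes and blocksizes:

estimate_staging() {   # usage: estimate_staging <data size in TiB> <blocksize in KiB>
  awk -v tib="$1" -v bs_kb="$2" 'BEGIN {
    blocks = tib * 2^40 / (bs_kb * 1024)            # number of scanned blocks
    printf "%.0f blocks, ~%.0f MiB staging space\n", blocks, blocks * 20 / 2^20
  }'
}
estimate_staging 1 16    # 67108864 blocks, ~1280 MiB
estimate_staging 10 16   # 671088640 blocks, ~12800 MiB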

The merge phase creates temporary tables as well as the final kv table and then truncates the staging space. Scanning 1TiB random data resulted in a database size of about 5GB. After purging it was reduced to 2.5 GB.
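
The unused space can be reclaimed afterwards with the purge option:

# vacuum the SQLite database to reclaim unused space
qdda -P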

Performance

qdda is single-threaded (for now). On my Intel i5 @ 3.10GHz I get the following profile test results:

$ qdda -p
Deleting /var/tmp/qdda.db
Test set:             131072 random 8k blocks (1024 MiB)
Hashing:              926446 microseconds,    1158.99 MB/s
Compressing:          328950 microseconds,    3264.15 MB/s
DB insert:            269533 microseconds,    3983.71 MB/s
Reading:              114576 microseconds,    9371.44 MB/s
Total:               1639506 microseconds,     654.92 MB/s


So expect roughly 600MB/s on typical systems; I expect high-end server-type CPUs to be even faster. Note that to get more than about 200MB/s you need to disable throttling (-b 0), which was built in as a safeguard against I/O starvation on production systems.