Qdda example

From Outrun Wiki


Intro

This example shows how to use the qdda tool and how to interpret its output.

Example

We simulate a block device by creating a 100 MiB test file, four 1 MiB files with random data, and one 1 MiB file with zeroed data:

Preparation

# preparation
dd if=/dev/zero    of=testvol bs=1M count=100 status=none
dd if=/dev/zero    of=zero1   bs=1M count=1 status=none
dd if=/dev/urandom of=random1 bs=1M count=1 status=none
dd if=/dev/urandom of=random2 bs=1M count=1 status=none
dd if=/dev/urandom of=random3 bs=1M count=1 status=none
dd if=/dev/urandom of=random4 bs=1M count=1 status=none

Test 1

Test 1 - blank device
dd if=/dev/zero of=testvol bs=1M count=100 status=none
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =      100.00 MiB (     12800 blocks)
used                =        0.00 MiB (         0 blocks)
unique              =        0.00 MiB (         0 blocks)
deduped 2x          =        0.00 MiB (         0 blocks)
deduped 3x          =        0.00 MiB (         0 blocks)
deduped 4x          =        0.00 MiB (         0 blocks)
deduped >4x         =        0.00 MiB (         0 blocks)
deduped total       =        0.00 MiB (         0 blocks)
stream compressed   =        0.00 MiB (    100.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        0.00 MiB (         0 buckets)
total compressed    =        0.00 MiB (         0 blocks)
                      *** Summary ***
percentage used     =            0.00 %
percentage free     =          100.00 %
deduplication ratio =            0.00
compression ratio   =            0.00
thin ratio          =            0.00
combined            =            0.00
raw capacity        =      100.00 MiB
net capacity        =        0.00 MiB

We can see that no blocks are actually hashed; the used capacity is zero because all blocks are empty.
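The block counts in the report follow directly from the 8 KiB block size; a quick check of the arithmetic:

```shell
# 100 MiB at 8 KiB per block gives 12800 blocks in total,
# and each 1 MiB test file spans 128 blocks
awk 'BEGIN { print 100 * 1024 / 8; print 1024 / 8 }'
```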

Test 2

Test 2 - 1M random
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=0
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =       99.00 MiB (     12672 blocks)
used                =        1.00 MiB (       128 blocks)
unique              =        1.00 MiB (       128 blocks)
deduped 2x          =        0.00 MiB (         0 blocks)
deduped 3x          =        0.00 MiB (         0 blocks)
deduped 4x          =        0.00 MiB (         0 blocks)
deduped >4x         =        0.00 MiB (         0 blocks)
deduped total       =        1.00 MiB (       128 blocks)
stream compressed   =        1.00 MiB (      0.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        1.00 MiB (       128 buckets)
total compressed    =        1.00 MiB (       128 blocks)
                      *** Summary ***
percentage used     =            1.00 %
percentage free     =           99.00 %
deduplication ratio =            1.00
compression ratio   =            1.00
thin ratio          =          100.00
combined            =          100.00
raw capacity        =      100.00 MiB
net capacity        =        1.00 MiB

The used capacity is now 1M (the file with random data we wrote). Because the data is random, no duplicate blocks are found: every non-zero block is unique. Random data also does not compress at all, so the compression ratio is 1.
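That random data is incompressible is easy to verify outside qdda, for example with gzip standing in for the array's compression algorithm:

```shell
# Compress 1 MiB of random data; the result is no smaller than the input
# (gzip adds a little framing overhead, so it is usually slightly larger)
dd if=/dev/urandom of=/tmp/rnd bs=1M count=1 status=none
gzip -c /tmp/rnd | wc -c
rm -f /tmp/rnd
```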

Test 3

Test 3 - 2x 1M random1
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=0
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=1
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =       98.00 MiB (     12544 blocks)
used                =        2.00 MiB (       256 blocks)
unique              =        0.00 MiB (         0 blocks)
deduped 2x          =        1.00 MiB (       128 blocks)
deduped 3x          =        0.00 MiB (         0 blocks)
deduped 4x          =        0.00 MiB (         0 blocks)
deduped >4x         =        0.00 MiB (         0 blocks)
deduped total       =        1.00 MiB (       128 blocks)
stream compressed   =        1.00 MiB (      0.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        1.00 MiB (       128 buckets)
total compressed    =        1.00 MiB (       128 blocks)
                      *** Summary ***
percentage used     =            2.00 %
percentage free     =           98.00 %
deduplication ratio =            2.00
compression ratio   =            1.00
thin ratio          =           50.00
combined            =          100.00
raw capacity        =      100.00 MiB
net capacity        =        1.00 MiB

As we wrote random1 twice, qdda finds 1M of data whose blocks appear twice (deduped 2x). For the 2M of written data we only need 1M of deduped capacity.
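The detection can be reproduced with standard tools: split the volume into 8 KiB blocks and hash each one (md5sum here is only a stand-in for qdda's internal hash):

```shell
# Write the same 1 MiB of random data twice, then hash every 8 KiB block:
# 256 blocks in total, but only 128 distinct hashes - every block deduped 2x
dd if=/dev/urandom of=/tmp/r1 bs=1M count=1 status=none
cat /tmp/r1 /tmp/r1 > /tmp/vol2
split -b 8192 /tmp/vol2 /tmp/blk.
ls /tmp/blk.* | wc -l                                    # prints 256
md5sum /tmp/blk.* | awk '{print $1}' | sort -u | wc -l   # prints 128
rm -f /tmp/blk.* /tmp/vol2 /tmp/r1
```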

Test 4

Test 4 - 3x 1M random1 + 2x 1M random2 + 1x 1M random3
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=0
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=1
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=2
dd of=testvol conv=notrunc status=none bs=1M if=random2 seek=3
dd of=testvol conv=notrunc status=none bs=1M if=random2 seek=4
dd of=testvol conv=notrunc status=none bs=1M if=random3 seek=5
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =       94.00 MiB (     12032 blocks)
used                =        6.00 MiB (       768 blocks)
unique              =        1.00 MiB (       128 blocks)
deduped 2x          =        1.00 MiB (       128 blocks)
deduped 3x          =        1.00 MiB (       128 blocks)
deduped 4x          =        0.00 MiB (         0 blocks)
deduped >4x         =        0.00 MiB (         0 blocks)
deduped total       =        3.00 MiB (       384 blocks)
stream compressed   =        3.00 MiB (      0.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        3.00 MiB (       384 buckets)
total compressed    =        3.00 MiB (       384 blocks)
                      *** Summary ***
percentage used     =            6.00 %
percentage free     =           94.00 %
deduplication ratio =            2.00
compression ratio   =            1.00
thin ratio          =           16.67
combined            =           33.33
raw capacity        =      100.00 MiB
net capacity        =        3.00 MiB

Here qdda finds blocks that appear 3 times (random1), blocks that appear 2 times (random2), and blocks that are unique (random3). The total number of unique hashes indicates we need 3M of deduped capacity.

Test 5

Test 5 - 5x 1M random1 + 2x 1M random2 + 1x 1M random3
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=0
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=1
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=2
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=3
dd of=testvol conv=notrunc status=none bs=1M if=random1 seek=4
dd of=testvol conv=notrunc status=none bs=1M if=random2 seek=5
dd of=testvol conv=notrunc status=none bs=1M if=random2 seek=6
dd of=testvol conv=notrunc status=none bs=1M if=random3 seek=7
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =       92.00 MiB (     11776 blocks)
used                =        8.00 MiB (      1024 blocks)
unique              =        1.00 MiB (       128 blocks)
deduped 2x          =        1.00 MiB (       128 blocks)
deduped 3x          =        0.00 MiB (         0 blocks)
deduped 4x          =        0.00 MiB (         0 blocks)
deduped >4x         =        1.00 MiB (       128 blocks)
deduped total       =        3.00 MiB (       384 blocks)
stream compressed   =        3.00 MiB (      0.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        3.00 MiB (       384 buckets)
total compressed    =        3.00 MiB (       384 blocks)
                      *** Summary ***
percentage used     =            8.00 %
percentage free     =           92.00 %
deduplication ratio =            2.67
compression ratio   =            1.00
thin ratio          =           12.50
combined            =           33.33
raw capacity        =      100.00 MiB
net capacity        =        3.00 MiB

Similar to the previous test, but now qdda finds blocks that appear more than 4 times (5, in fact). The total capacity required is still 3M, as we only stored 3 unique datasets (some of them multiple times).
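The summary ratios can be reproduced from the block counts in the report. This is a sketch of the arithmetic as it appears from the outputs above, not taken from the qdda source:

```shell
# Test 5 block counts: 12800 total, 1024 used, 384 deduped, 384 compressed
awk 'BEGIN {
  total = 12800; used = 1024; deduped = 384; compressed = 384
  printf "deduplication ratio = %.2f\n", used / deduped        # 2.67
  printf "compression ratio   = %.2f\n", deduped / compressed  # 1.00
  printf "thin ratio          = %.2f\n", total / used          # 12.50
  printf "combined            = %.2f\n", total / compressed    # 33.33
}'
```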

Test 6

Test 6 - 1x 1M zero + 4x 1M random1 + 2x 1M random2 + 1x 1M random3 (zero reclaim test)
dd of=testvol conv=notrunc status=none bs=1M if=zero1 seek=0
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =       93.00 MiB (     11904 blocks)
used                =        7.00 MiB (       896 blocks)
unique              =        1.00 MiB (       128 blocks)
deduped 2x          =        1.00 MiB (       128 blocks)
deduped 3x          =        0.00 MiB (         0 blocks)
deduped 4x          =        1.00 MiB (       128 blocks)
deduped >4x         =        0.00 MiB (         0 blocks)
deduped total       =        3.00 MiB (       384 blocks)
stream compressed   =        3.00 MiB (      0.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        3.00 MiB (       384 buckets)
total compressed    =        3.00 MiB (       384 blocks)
                      *** Summary ***
percentage used     =            7.00 %
percentage free     =           93.00 %
deduplication ratio =            2.33
compression ratio   =            1.00
thin ratio          =           14.29
combined            =           33.33
raw capacity        =      100.00 MiB
net capacity        =        3.00 MiB

Let's say some data gets deleted from our imaginary file system. The array does not know this, as deleting files does not clear the block structures on disk, but we can overwrite the deleted files with zero blocks. Here we assume the first 1M of data we had written is cleared. Remember that this was random1, which we had written multiple times (as if the deleted file had copies elsewhere), so the capacity requirements do not drop.
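This can be seen with the same block-hashing trick: zeroing one of two identical copies leaves the number of distinct non-zero blocks unchanged (md5sum again stands in for qdda's hash, and all-zero blocks are excluded as "free"):

```shell
# Two copies of 1 MiB random data; then zero out the first copy
dd if=/dev/urandom of=/tmp/r bs=1M count=1 status=none
cat /tmp/r /tmp/r > /tmp/vol6
dd if=/dev/zero of=/tmp/vol6 conv=notrunc bs=1M count=1 status=none
split -b 8192 /tmp/vol6 /tmp/zblk.
# exclude all-zero blocks, then count the distinct hashes that remain
zhash=$(head -c 8192 /dev/zero | md5sum | awk '{print $1}')
md5sum /tmp/zblk.* | awk -v z="$zhash" '$1 != z { print $1 }' \
  | sort -u | wc -l                                      # still prints 128
rm -f /tmp/zblk.* /tmp/vol6 /tmp/r
```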

Test 7

Test 7 - 1x 1M zero1 at offset 7M (zero reclaim test 2)
dd of=testvol conv=notrunc status=none bs=1M if=zero1 seek=7
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
Deleting /var/tmp/qdda.db
blocksize           =           8 KiB
total               =      100.00 MiB (     12800 blocks)
free                =       94.00 MiB (     12032 blocks)
used                =        6.00 MiB (       768 blocks)
unique              =        0.00 MiB (         0 blocks)
deduped 2x          =        1.00 MiB (       128 blocks)
deduped 3x          =        0.00 MiB (         0 blocks)
deduped 4x          =        1.00 MiB (       128 blocks)
deduped >4x         =        0.00 MiB (         0 blocks)
deduped total       =        2.00 MiB (       256 blocks)
stream compressed   =        2.00 MiB (      0.00 %)
compress buckets 2k =        0.00 MiB (         0 buckets)
compress buckets 4k =        0.00 MiB (         0 buckets)
compress buckets 8k =        2.00 MiB (       256 buckets)
total compressed    =        2.00 MiB (       256 blocks)
                      *** Summary ***
percentage used     =            6.00 %
percentage free     =           94.00 %
deduplication ratio =            3.00
compression ratio   =            1.00
thin ratio          =           16.67
combined            =           50.00
raw capacity        =      100.00 MiB
net capacity        =        2.00 MiB

By overwriting the last 1M of data, which was the unique random3, we reclaim capacity: the net capacity drops from 3M to 2M.