DiracDeBroglie

HDD performance <-> Allocation unit size

46 posts in this topic

The system cache (since W95 at least) is a brutal feature: every read from or write to disk is kept in main Ram as long as that memory is not needed for other purposes. Once data has been read or written, it stays available in Ram. Win takes hundreds of megabytes for this purpose, with no absolute size limit.

Is this some kind of "RAM disk" whose content is a perfect mirror of the corresponding file content on the HDD (so that a power failure doesn't mess up file consistency on the HDD)?

Does that mean the UDMA controller can access any amount of motherboard RAM (hundreds of MB, or even GB), wherever it is located?

I know that old OSs had such a thing as a RAM disk, but its content was not always consistent with the content of the HDD, floppy, Jaz or Zip drive. As a result, a power failure was usually very damaging to your files on the HDD. I assume that kind of power-failure damage cannot happen when DMA is used for data transfer between motherboard RAM and the HDD? Or can it?

johan

Edited by DiracDeBroglie

[Numbering and slicing by Pointertovoid, apologies for my bad manners]

(1) |# of Outstanding I/Os| =1, then =4 and then =32; I suppose this is the queue depth, equivalent (or equal) to the NCQ depth of the HDD.

(2) I'm not all too sure how to interpret the options |Maximum Disk Size (in sectors)| and |Starting Disk Sector|.

(3) I've put the value |4K; 100%Read; 0%Random|.

(4) I couldn't figure out how to insert the image files inline in the text; they are now attached at the end.

(1) I too understand it as the number of simultaneous requests. 32 is huge, fits servers only. 2 or 4 is already much for a W2k or Xp workstation.

(2) IOMeter measures within a file of the size you define, beginning where you wish if possible (I hope it checks this beforehand). The file size should represent the size of the useful part of the OS or application you want to access; 100MB is fine for W2k but too small for newer OSes. After the first run begins, I stop it and defragment the volume so this file is contiguous.

(3) You lost. 0% random means contiguous reads, which explains why you get figures as unrealistic as with Atto. I measure with 100% random, which is pessimistic. You define it by editing (or edit-copying) a test from the list; a slider adjusts that.

(4) Something like "place inline" after the file is uploaded and the cursor is where you want.
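To make points (2) and (3) concrete, here is a rough sketch of what "x% random within a file of a given size" can mean. It is only an illustration and not IOMeter's actual implementation: the function name `access_offsets`, the 512-byte sector size and the rule "each request either jumps to a random block or continues after the previous one" are all assumptions.

```python
import random

SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def access_offsets(max_disk_sectors, block_bytes, percent_random, count,
                   start_sector=0, seed=0):
    """Return `count` byte offsets for reads inside the test area.

    The test area starts at `start_sector` and spans `max_disk_sectors`.
    Each request jumps to a random block with probability
    percent_random / 100, otherwise it continues right after the
    previous request.
    """
    rng = random.Random(seed)
    base = start_sector * SECTOR_BYTES
    blocks = (max_disk_sectors * SECTOR_BYTES) // block_bytes
    block = 0
    offsets = []
    for _ in range(count):
        if rng.random() * 100 < percent_random:
            block = rng.randrange(blocks)      # random jump inside the area
        offsets.append(base + block * block_bytes)
        block = (block + 1) % blocks           # next contiguous block, wrapping
    return offsets


# 200,000 sectors of 512 bytes is roughly the 100MB file mentioned above.
print(access_offsets(200_000, 4096, 0, 4))    # contiguous: [0, 4096, 8192, 12288]
print(access_offsets(200_000, 4096, 100, 4))  # four unrelated offsets
```

With 0% random the arm never has to move far, which is why such a run mostly measures the sequential transfer rate.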


[system cache]

Is this some kind of "RAM disk" whose content is a perfect mirror of the corresponding file content on the HDD (so that a power failure doesn't mess up file consistency on the HDD)?

Does that mean the UDMA controller can access any amount of motherboard RAM (hundreds of MB, or even GB), wherever it is located?

As I understand it, read or write requests to sectors not yet accessed are sent to the disk immediately (unless you check "delayed write" in the driver properties), but any data read or written stays in the mobo Ram as long as possible for future reuse, that is, as long as Ram space is available. This means the disk is still updated immediately (if delayed write is off).

And, yes, I believe the Udma controller accesses any page of the mobo Ram. Which isn't very difficult if you see a block diagram of the memory controller.
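The difference between immediate and delayed writes can be sketched with a toy model. This is not how the Windows cache manager is implemented; the class `CachedDisk` and all of its method names are made up for illustration, and the model ignores eviction when Ram runs short.

```python
class CachedDisk:
    """Toy model: a dict stands in for the platters, another for the Ram cache."""

    def __init__(self, delayed_write=False):
        self.disk = {}        # sector -> data actually on the platters
        self.cache = {}       # sector -> data kept in Ram for reuse
        self.dirty = set()    # sectors changed in Ram but not yet on disk
        self.delayed_write = delayed_write
        self.disk_reads = 0   # how often the disk itself had to be read

    def read(self, sector):
        if sector not in self.cache:          # first access goes to the disk
            self.disk_reads += 1
            self.cache[sector] = self.disk.get(sector)
        return self.cache[sector]             # later accesses come from Ram

    def write(self, sector, data):
        self.cache[sector] = data
        if self.delayed_write:
            self.dirty.add(sector)            # disk is updated later, on flush()
        else:
            self.disk[sector] = data          # disk is updated immediately

    def flush(self):
        for sector in self.dirty:
            self.disk[sector] = self.cache[sector]
        self.dirty.clear()

    def power_failure(self):
        self.cache.clear()                    # everything held only in Ram is lost
        self.dirty.clear()


immediate = CachedDisk(delayed_write=False)
immediate.write(7, "data")
immediate.power_failure()
print(immediate.read(7))   # the data survived

delayed = CachedDisk(delayed_write=True)
delayed.write(7, "data")
delayed.power_failure()
print(delayed.read(7))     # nothing reached the disk before the failure
```

In this model the cache with immediate writes behaves as a mirror of what is on disk, while delayed write opens a window in which Ram and disk disagree.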


Jaclaz: and why I asked the question in the form:

Which role (if any) would NCQ (Native Command Queuing) have in the benchmark(s)?

With IOMeter, the Ncq effect is clearly observable; very few other benchmarks show an effect; and the amount of improvement shown by IOMeter is consistent with real-life experience.

Just adding some duplicate plethoric redundant repetition again, to confirm we agree...


Just adding some duplicate plethoric redundant repetition again, to confirm we agree...

Sure we agree on this :thumbup .

The question was aimed at DiracDeBroglie, meant as a hint :rolleyes: towards the fact that Atto is not the "ideal" benchmark for a SATA III drive with NCQ ;).

jaclaz


What a disappointment!! With 10% random the Total MBs per Second drops from 90 (with 0% random) to about 3.5. With 100% random the performance went down to 1.5 MB/s!!! It's time for an SSD. #I/O was 32 all the time.

(1) I too understand it as the number of simultaneous requests. 32 is huge, fits servers only. 2 or 4 is already much for a W2k or Xp workstation.

(2) IOMeter measures within a file of the size you define, beginning where you wish if possible (I hope it checks this beforehand). The file size should represent the size of the useful part of the OS or application you want to access; 100MB is fine for W2k but too small for newer OSes. After the first run begins, I stop it and defragment the volume so this file is contiguous.

Oh, I forgot to ask in my previous post: How can one create a test file of a particular length in IOmeter?

Thanks

j

Edited by DiracDeBroglie

Specifications of tested HDD: Seagate Barracuda 7200.12 (ST31000524AS); 1TB, 32MB cache; Queue Depth =32; NCQ =supported; average track size =0.8MB.

Some more IOmeter test results with parameters: 4K, 100% Read, a range of randomness percentages, and 4 different numbers of simultaneous I/Os (32; 4; 2; 1). The following figures show the throughput (Total MBs per Second) as a function of the % randomness. Test file = about 500MB (this can be set by specifying the number of sectors in the option Disk Targets -> Maximum Disk Size; in my case I specified 1,000,000 sectors of 512 bytes each, i.e. 512,000,000 bytes; Starting Disk Sector was kept at 0).

# simultaneous I/Os =32
------------------------
0% randomness -> 95 MB/s
1% randomness -> 96 MB/s (a bit higher than in the 0% case)
2% randomness -> 6.8 MB/s
3% randomness -> 6.3 MB/s
4% randomness -> 5.3 MB/s
5% randomness -> 5.0 MB/s
10% randomness -> 4.1 MB/s
20% randomness -> 3.2 MB/s
50% randomness -> 2.1 MB/s
100% randomness -> 1.4 MB/s

# simultaneous I/Os =4
------------------------
0% randomness -> 76 MB/s
1% randomness -> 84 MB/s (8 MB/s higher than in the 0% case)
2% randomness -> 17 MB/s
3% randomness -> 14 MB/s
4% randomness -> 10 MB/s
5% randomness -> 9.2 MB/s
10% randomness -> 4.8 MB/s
20% randomness -> 2.6 MB/s
50% randomness -> 1.4 MB/s
100% randomness -> 1.1 MB/s

# simultaneous I/Os =2
------------------------
0% randomness -> 67 MB/s
1% randomness -> 67 MB/s
2% randomness -> 14 MB/s
3% randomness -> 12 MB/s
4% randomness -> 9.8 MB/s
5% randomness -> 8.2 MB/s
10% randomness -> 4.7 MB/s
20% randomness -> 2.6 MB/s
50% randomness -> 1.2 MB/s
100% randomness -> 0.8 MB/s

# simultaneous I/Os =1
------------------------
0% randomness -> 43 MB/s
1% randomness -> 40 MB/s
2% randomness -> 9.8 MB/s
3% randomness -> 8.2 MB/s
4% randomness -> 7.2 MB/s
5% randomness -> 6.4 MB/s
10% randomness -> 4.0 MB/s
20% randomness -> 2.4 MB/s
50% randomness -> 1.1 MB/s
100% randomness -> 0.6 MB/s

All tests clearly show a maximum throughput at 0% or 1% randomness after which the performance rolls off quickly for higher randomness, usually from 2% randomness onwards.

One would expect the highest throughput at the highest #I/O value, but that is not always so; it depends on the randomness. #I/O=32 performs best at a randomness of 0%, 1%, 20%, 50% and 100%. In the randomness range 2% -> 10%, the highest throughput comes with #I/O=4, and a randomness of 2% -> 10% probably comes closest to an average "real life" system configuration. Now take the folder C:\Windows, which dominates how fast Win7 boots; the Windows folder contains 21GB in 92,000 files, hence an average file size of roughly 0.22MB, which is a lot smaller than the average track size of 0.8MB on my system drive (so NCQ "could" be doing a good job and shortening the boot time).
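The comparison across #I/O values can be checked mechanically. The sketch below simply copies the measured figures from this post into a table and picks the best #I/O value for each randomness; the names `THROUGHPUT` and `best_queue_depth` are made up for this example.

```python
# Randomness percentages used in the measurements above.
RANDOMNESS = [0, 1, 2, 3, 4, 5, 10, 20, 50, 100]

# Measured throughput in MB/s, keyed by number of simultaneous I/Os.
THROUGHPUT = {
    32: [95, 96, 6.8, 6.3, 5.3, 5.0, 4.1, 3.2, 2.1, 1.4],
    4:  [76, 84, 17, 14, 10, 9.2, 4.8, 2.6, 1.4, 1.1],
    2:  [67, 67, 14, 12, 9.8, 8.2, 4.7, 2.6, 1.2, 0.8],
    1:  [43, 40, 9.8, 8.2, 7.2, 6.4, 4.0, 2.4, 1.1, 0.6],
}


def best_queue_depth(throughput=THROUGHPUT, randomness=RANDOMNESS):
    """Map each randomness percentage to the #I/O value with the top throughput."""
    best = {}
    for i, pct in enumerate(randomness):
        best[pct] = max(throughput, key=lambda qd: throughput[qd][i])
    return best


for pct, qd in best_queue_depth().items():
    print(f"{pct:>3}% random: best #I/O = {qd}")
```

Note how small the margins are at 10% random (4.8 vs 4.7 vs 4.1 vs 4.0 MB/s); with a single run per setting, that particular ranking should not be over-interpreted.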

Intra-file randomness (on the platters) can be reduced by defragging the folder/boot volume, but file-to-file randomness cannot be detected by any defragger (at least I do not know of any defragger that can), for the simple reason that defraggers don't know the order in which files are read during boot. Hence, even when the boot volume is very well defragmented, one could still face a large file-to-file randomness that takes the "real life" throughput down significantly. Bottom line: I haven't got a clue how good or bad the file-to-file randomness is on my boot volume. Furthermore, I wouldn't be surprised if, after running a defragger, the intra-file randomness had gone down but the file-to-file randomness had increased, thereby (partially) offsetting the gain (shorter boot time) of the defragmentation.

Fortunately, in Win7 (on my notebook) communication between the motherboard and the system drive happens through (U)DMA, although I have to say I have no idea how large the data chunks are in my system at boot time. It could be that UDMA (in Win7) reduces (significantly) the importance of randomness for the throughput; in (older) OSs that don't have UDMA (or have only limited UDMA), the throughput is likely more sensitive to the file-to-file randomness (I think it's fair to say this was already implicitly confirmed by pointertovoid).

Conclusion:

1) jaclaz was/is right; benchmark results need very careful interpretation when extrapolating them to "real life" situations, as some parameters (like the file order in the booting process) are not known to the benchmarkers.

2) Defragging the volume may help improve the booting speed (we all know that already); however, the gain in boot time might be small in Win7 due to UDMA.

johan

Edited by DiracDeBroglie

Conclusion:

1) jaclaz was/is right; benchmark results need very careful interpretation when extrapolating them to "real life" situations, as some parameters (like the file order in the booting process) are not known to the benchmarkers.

2) Defragging the volume may help improve the booting speed (we all know that already); however, the gain in boot time might be small in Win7 due to UDMA.

OT :ph34r:, but JFYI.

Many, many years ago, the world motocross champion was a Belgian guy, named Georges Jobé:

http://en.wikipedia.org/wiki/Georges_Jobé

http://web.archive.org/web/20091211201410/http://jobe-racing.com/en/georges/index.html

To a journalist who asked him how he prepared himself physically (which kind of exercises, jogging, gym, etc.) for the races, he replied something like:

I tend to go on the bike a lot.

Usually 4 hours every morning and a couple more hours in the afternoon.

I find it better for me than the gym.

Well, he was right ;):

JobeMalherbe_130087.jpg

jaclaz

Conclusion:

1) jaclaz was/is right; benchmark results need very careful interpretation when extrapolating them to "real life" situations, as some parameters (like the file order in the booting process) are not known to the benchmarkers.

2) Defragging the volume may help improve the booting speed (we all know that already); however, the gain in boot time might be small in Win7 due to UDMA.

You should also look around more here on the board. :) http://www.msfn.org/board/topic/140247-trace-windows-7-bootshutdownhibernatestandbyresume-issues/ That is the reason that Andre only recommends the use of MS's own defragging tools. As you found, defragging just to eliminate fragmented files is only part of the solution. File placement is also a factor.

Cheers and Regards

Edited by bphlpt

Your figures with random access are now consistent with other disk measurements. I dare to assert that random access measured with IOMeter, within a limited size and with some parallelism, is a better hint to real-life performance than other benchmarks are.
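One way to see why the random figures look plausible is to convert them into requests per second. The sketch below does this for the 100% random results quoted earlier in the thread. It assumes that IOMeter's "MB" means 10^6 bytes, which I have not verified; the helper name `iops` is made up.

```python
BLOCK_BYTES = 4096  # the 4K transfer size used in the tests

# 100% random throughput in MB/s from the earlier post, keyed by #I/O.
RANDOM_100_MBPS = {32: 1.4, 4: 1.1, 2: 0.8, 1: 0.6}


def iops(mb_per_s, block_bytes=BLOCK_BYTES, mb=10**6):
    """Requests completed per second for a given throughput.

    `mb` is the number of bytes in one "MB" (assumed 10**6 here).
    """
    return mb_per_s * mb / block_bytes


for qd, mbps in RANDOM_100_MBPS.items():
    rate = iops(mbps)
    print(f"#I/O={qd:>2}: {rate:4.0f} requests/s, "
          f"{1000 / rate:.1f} ms between completions")
```

At #I/O=1 this gives about 7 ms per request. A 7200 rpm disk needs 8.3 ms per revolution, so on average about 4.2 ms of that is spent just waiting for the sector to come around; the rest is plausibly a short seek inside the 500MB test area. The gain at #I/O=32 fits the idea that the drive reorders queued requests to shorten that wait.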

Udma is fully usable with Win95 if you provide the driver. Ncq only needs parallel requests, which exist already in Nt4.0 and maybe before (but not in W95-98-Me); it also needs adequate drivers willing to run on a particular OS. Seven (and Vista?) bring only additional mechanisms to help the wear leveling algorithm of Flash media, keeping writes fast over time.

To paraphrase bphlpt: Windows since 98 does know which files are loaded in what order because a special task observes this. Windows' defragger uses this information to optimize file placement on the disk and reduce the arm's movements. Xp and its prefetch go further by pre-loading the files that belong together before Win or the application requests them; this combines very well with Ncq as it issues many requests in parallel, and partly explains why Xp starts faster than 2k with less arm noise.
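Why placement in load order reduces arm movement can be shown with a deliberately crude model. This is not the algorithm Windows uses: it treats every file as one equal-sized slot, measures arm travel in slots, ignores rotation, and the file names and the function `head_travel` are hypothetical.

```python
import random


def head_travel(load_order, placement):
    """Total distance the arm moves when files are read in `load_order`.

    `placement` maps a file name to its slot on the disk (arbitrary units).
    """
    positions = [placement[name] for name in load_order]
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


# Hypothetical boot files, listed in the order the OS loads them.
files = [f"file{i}" for i in range(100)]

# Placement that follows the load order: neighbours on disk are loaded in sequence.
in_order = {name: slot for slot, name in enumerate(files)}

# Placement that ignores the load order: each file is contiguous but lands anywhere.
slots = list(range(len(files)))
random.Random(0).shuffle(slots)
scattered = dict(zip(files, slots))

print("placed in load order:", head_travel(files, in_order))
print("placed arbitrarily:  ", head_travel(files, scattered))
```

Both layouts are "fully defragmented" in the sense that no file is split, yet the second one makes the arm travel much further, which is the file-to-file randomness discussed above.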


You should also look around more here on the board. :) http://www.msfn.org/board/topic/140247-trace-windows-7-bootshutdownhibernatestandbyresume-issues/ That is the reason that Andre only recommends the use of MS's own defragging tools. As you found, defragging just to eliminate fragmented files is only part of the solution. File placement is also a factor.

A very, very useful link. Many thanks.

j


To paraphrase bphlpt: Windows since 98 does know which files are loaded in what order because a special task observes this. Windows' defragger uses this information to optimize file placement on the disk and reduce the arm's movements.

Hey, I did not know that. So defragging is best done by Windows' defragger then.

Many thanks for this tip to pointertovoid and bphlpt,

j

Edited by DiracDeBroglie

So defragging is best done by Windows' defragger then.

NOT really "necessarily", tools like Mydefrag (there may be others :unsure:):

http://www.mydefrag.com/

use similar techniques and - more than that - allow you to experiment with different approaches through a scripting-like language:

http://www.mydefrag.com/Manual-Scripts.html

As always, tests need to be conducted but I wouldn't swear that the MS built-in strategy (whatever it is) is actually the "best possible", there may be room for further optimizations.

jaclaz


Off topic...

Many, many years ago, the world motocross champion was a Belgian guy

Not so "many many" years ago (I truncated the quote on purpose :D ), but it seems another country has taken over recently indeed (scroll up a few lines). 2008 can be seen as an arguable transition phase.

However, if you remove the engine... we're still ahead ! :)
