MSFN Forum: Binary Data Compare Batch - [fc] - MSFN Forum


Binary Data Compare Batch - [fc]

#21 jaclaz

  • The Finder
  • Group: Developers
  • Posts: 9,112
  • Joined: 23-July 04
  • OS: none specified

Posted 28 January 2012 - 11:38 AM

CoffeeFiend, on 28 January 2012 - 10:51 AM, said:

(and as Jaclaz already pointed out, MD5 is quite old and a bad idea in general, SHA1 is a common replacement for it).

Actually jaclaz only said:

jaclaz, on 28 January 2012 - 05:01 AM, said:


and he will do that AGAIN :ph34r: , this time actually implying that also SHA-1 is getting old:
Hmmm.
http://blog.dustintr.../07/md5-really/
http://csrc.nist.gov...ds_comments.pdf
http://csrc.nist.gov.../statement.html

And the new SHA-3 is soon to be selected.
http://csrc.nist.gov...ha-3/index.html
http://csrc.nist.gov...2012/index.html

jaclaz


#22 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,260
  • Joined: 14-July 04
  • OS: Windows 7 x64

Posted 28 January 2012 - 12:05 PM

I'm well aware of those points :)

SHA1 is getting old indeed, but it's still "good enough" for most file comparison tasks and what's still getting used the most today, even when security is involved. Other hash algos tend to be slower and mainly overkill for this particular job here. MD5 though... I can't think of a reason I'd start on a new program/design using that in 2012.

#23 jaclaz

  • The Finder
  • Group: Developers
  • Posts: 9,112
  • Joined: 23-July 04
  • OS: none specified

Posted 28 January 2012 - 01:36 PM

CoffeeFiend, on 28 January 2012 - 12:05 PM, said:

MD5 though...

Ow, come on, how many accidental (NOT intentional) MD5 collisions have you ever seen in your experience?
I would call them UNlikely.

But as seen here:
http://www.msfn.org/...s/page__st__136
given the right scenario ;), ....:

jaclaz

#24 AceInfinity

  • Newbie
  • Group: Members
  • Posts: 21
  • Joined: 10-August 11
  • OS: Windows 7 x64

Posted 28 January 2012 - 05:31 PM

CoffeeFiend said:

Modified Windows files? How about running sfc /scannow? That's built in, and meant to fix precisely those kinds of problems (there's system restore too). Or otherwise, why not compare the SHA1 hash of the file with one of the online lists that already exist, or from a known good file on another machine? As for identifying malware by running fc /b on 2 files... Most people have an antivirus which seems like a far better option for that


I don't need to endure belittling comments like this in my attempts to just try and learn something. All I want is advice, and I'll pursue the necessary steps based on feedback I've been given to improve on this. That's all I really want here...

Please note that I already mentioned I'm aware of the System File Checker command, and that I was just trying to find a very quick example. The use of this thing is limited only by imagination, leaving aside the script's own limits with larger files, which several other members have also already pointed out to me throughout this thread. I don't need to have it reiterated to me over and over; that really just gives me a headache and adds extra posts to this thread that give me deja vu when I find I'm reading what I've already been told.

I KNOW I'm not the greatest batch programmer, but I need some new information as well. Having it bashed into my head from criticism on how my script here is useless does not help me. It may be useless no matter what I do to this script. But I just want to learn how to improve on it so I can become better at batch.

I'm not looking to make a masterpiece here, or the absolute best script in the world for comparing files. If I were to do that, I probably wouldn't even use batch as it's slow anyway.

Take a look at what you've said to me below here...

CoffeeFiend said:

A simple byte-for-byte comparison would catch that. That's pretty easy to write in any language if it's not already built-in (you said you're using Perl, which has File::Compare; Python has filecmp.cmp, etc). Hashing here only adds CPU load for no reason (and as Jaclaz already pointed out, MD5 is quite old and a bad idea in general, SHA1 is a common replacement for it). It also increases comparison time, not only by being CPU bound but also by forcing you to hash the whole thing, whereas when you're doing a byte-for-byte comparison you can easily quit at the first byte that's dissimilar (and it very well may be the first byte of a file that's hundreds of MBs). Using hashes is mainly useful in different scenarios, like comparing one file to a known hash, i.e. when you don't have the other file on hand or don't want to send/copy it elsewhere to compare it there (and other tasks like password authentication, obviously). Unless you want to compare a large number of files together and identify duplicates (not necessarily comparing against one specific file), in which case hashing indeed works nicely (it saves time by not having to re-read lots of files, lots of times).
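
For illustration, the byte-for-byte comparison with an early exit described in the quote above could be sketched roughly like this in Python. This is only a minimal sketch, not anyone's actual script from this thread: the function name files_equal, the 64 KB chunk size, and the file-size shortcut at the top are all assumptions added for the example.

```python
import os


def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two files have identical contents (hypothetical helper)."""
    # Files of different sizes can never match, so skip reading them at all.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False  # quit at the first chunk that differs
            if not block_a:
                return True   # both files ended without any difference
```

Python's built-in filecmp.cmp does a similar job; note that by default it only compares os.stat() signatures, so pass shallow=False if you want the contents actually read.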


This part is something new to take into account. I already know well about password authentication and hash comparison methods, specifically used on websites for the most part to authenticate a user to a MySQL database, and either SHA1 or MD5 are most commonly used for that.

CoffeeFiend said:

Multi-threading is of no use here anyway. I'm not sure how you were expecting to use it, or what for. But if you try to read two or more files at once and then hash them it's going to be quite a bit slower, due to drastically increased seeking (except on SSDs). Unless you plan on having one thread reading the entire file (which might be huge) to RAM, and then while it hashes the other thread loads another file to RAM -- or one thread that queues files to hash in byte arrays in RAM while the other thread does the hashing. That would require TONS of RAM if there are large files (e.g. comparing two DVD9 .iso's would require more than 16GB of free RAM), and the speed gain is rather minimal vs using streams (which uses very little memory).
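
To make the "streams use very little memory" point concrete, here is a rough Python sketch of hashing a file in fixed-size chunks, so that memory use stays around one chunk no matter how big the file is. The name sha1_of_file and the 1 MB chunk size are assumptions made up for this example; hashlib is Python's standard hashing module.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it one chunk at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files can then be compared with sha1_of_file(a) == sha1_of_file(b), at the cost of always reading both files in full.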


What would be the purpose of comparing DVD's though? Multi-threading can be of use, it just depends on how you want to use it. As you say you could compare 100 different files on the same thread, but memory would be a factor there.

CoffeeFiend said:

I'm well aware of those points :)

SHA1 is getting old indeed, but it's still "good enough" for most file comparison tasks and what's still getting used the most today, even when security is involved. Other hash algos tend to be slower and mainly overkill for this particular job here. MD5 though... I can't think of a reason I'd start on a new program/design using that in 2012.


There's nothing wrong with MD5 in my opinion. It still works. There's tons of hashes out there, but MD5 is common in most things. If you say SHA1 is still "good enough" then I don't see a reason as to why MD5 can't be classified in the same way.


As jaclaz said (and I have never come across an MD5 collision myself either), an accidental collision is highly unlikely, and a collision is still fairly unlikely unless someone deliberately tries to produce one. But that would be hard to do with something like malware, because you'd need to keep the malicious function operational while still matching an MD5, and if you're binding your code to a Windows file you'd also have to work out how that can be done. This might include compressing that Windows file to make sure the bound result also matches the original file size once everything is done. Not a very easy task for malicious code developers.

It's still a VERY common and used hashing algorithm out there, so there must not be anything wrong with it. The C programming language is old, but it's still used a lot out there. Something being old doesn't by itself mean there must be something better out there and that we're forced to change.

This post has been edited by AceInfinity: 28 January 2012 - 05:54 PM


#25 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,260
  • Joined: 14-July 04
  • OS: Windows 7 x64

Posted 28 January 2012 - 06:09 PM

AceInfinity, on 28 January 2012 - 05:31 PM, said:

belittling comments like this

:blink: In what way is it? I never criticized your batch skills (or other skills either), I'm just saying I don't see what this would be useful for. You seem rather easily offended.

AceInfinity, on 28 January 2012 - 05:31 PM, said:

I probably wouldn't even use batch as it's slow anyway.

There's plenty more reasons NOT to use batch files :) It's pretty much my dead-last choice for just about anything, except for one particular task: passing a couple arguments to an installer (then again, you can barely call that a "batch" file, it's just one command line). Outside of that task, I can't think of any other language or scripting language that sucks so badly and is so limited. You said that you know perl, I don't see why you don't just use that instead.

AceInfinity, on 28 January 2012 - 05:31 PM, said:

What would be the purpose of comparing DVD's though?

That's the question we've all been asking ;) What would be the purpose of comparing DVD's or any other files though? It's definitely not clear.

AceInfinity, on 28 January 2012 - 05:31 PM, said:

Multi-threading can be of use, it just depends on how you want to use it. As you say you could compare 100 different files on the same thread, but memory would be a factor there.

Comparing 100 different files against each other is better done with hashing anyway, and multithreading can barely help here really (it's very much IO-bound, unless you have a fancy SSD). Either way, batch files don't give you that option.
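
As a rough illustration of why hashing suits the many-files case: each file is read once, and files that share a digest are grouped as likely duplicates. The Python sketch below is only one way to do it under stated assumptions; the names sha1_of_file and find_duplicates are made up for the example, and grouping by file size first (so that files with a unique size are never read) is an extra shortcut not mentioned in this thread.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks to keep memory low."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return a list of groups (lists of paths) whose size and SHA-1 both match."""
    # Pass 1: bucket by size, which needs no file reads at all.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for size, candidates in by_size.items():
        if len(candidates) < 2:
            continue
        for path in candidates:
            by_digest[(size, sha1_of_file(path))].append(path)

    return [group for group in by_digest.values() if len(group) > 1]
```

If a false match would matter, each reported group can still be confirmed with a byte-for-byte comparison afterwards.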

Anyway. Have fun. I don't think I can be of much help with trying to improve on something without seeing the big picture (the end goal/purpose).

#26 AceInfinity

  • Newbie
  • Group: Members
  • Posts: 21
  • Joined: 10-August 11
  • OS: Windows 7 x64

Posted 29 January 2012 - 03:24 PM

Quote

I don't see why you don't just use that instead.


I have used Perl; I created a file compare script in the past, which was mentioned in one of my earlier posts in this thread. But I wanted to see what I could do with batch just for personal learning experience :)

Quote

Comparing 100 different files against each other is better done with hashing anyway, and multithreading can barely help here really (it's very much IO-bound, unless you have a fancy SSD). Either way, batch files don't give you that option.

Anyway. Have fun. I don't think I can be of much help with trying to improve on something without seeing the big picture (the end goal/purpose).


That's true, I've been told to not use MD5, so maybe an SHA1 hash version of my Perl script for the file comparison? My Perl script does use a hashing comparison method, but currently the only way I knew to do this in batch was through a program called fc.exe which I found on Windows through some searching lol.

jaclaz told me about Comp, which I may look into next...

#27 Yzöwl

  • Wise Owl
  • Group: Super Moderator
  • Posts: 4,195
  • Joined: 13-October 04
  • OS: Windows 7 x64

Posted 29 January 2012 - 04:03 PM

Just bear something in mind: if I wanted to verify that myfile.dll was not in some way altered by checking it against a known good myfile.dll using your script, they would both need to be in the same directory to drag and drop them. As you know, they could not both be in the same directory because they have the same name. It would seem crazy to take one or the other of them, rename it and then move it alongside the other one just to then drag and drop them onto a batch file!

#28 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,260
  • Joined: 14-July 04
  • OS: Windows 7 x64

Posted 29 January 2012 - 08:46 PM

AceInfinity, on 29 January 2012 - 03:24 PM, said:

That's true, I've been told to not use MD5, so maybe an SHA1 hash version of my Perl script for the file comparison?

That change should be pretty easy to make. Yes, MD5 collisions aren't all that likely, but why settle on such an old algo that's been decertified 14 years ago by NIST? Yes, that was for "secure purposes", but then again, it's so simple to use something more modern (then again, my own file hasher tool optionally does both -- just in case I want to check a "md5sum" which is quite unlikely).
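
As a hedged sketch of what "optionally does both" can look like (this is NOT the actual file hasher tool mentioned above, whose code was never posted), the Python below feeds every chunk of a file to several hash objects at once, so MD5 and SHA-1 both come out of a single read. The name digests_of_file and its parameters are assumptions for the example; hashlib.new() is the standard-library call that creates a hash object by algorithm name.

```python
import hashlib


def digests_of_file(path, algorithms=("md5", "sha1"), chunk_size=1024 * 1024):
    """Return {algorithm name: hex digest} for one file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Every hash object sees the same chunk, so the file is read once.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Switching from MD5 to SHA-1 (or adding another algorithm that hashlib supports) is then just a change to the algorithms tuple.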

AceInfinity, on 29 January 2012 - 03:24 PM, said:

the only way I knew to do this in batch was through a program called fc.exe which I found on Windows through some searching lol.

Batch files don't have built-in methods for comparing files, or hashing, or anything of the sort. We're talking about early 1980's technology here, a VERY primitive kind of "scripting" language, which was replaced by vbscript & jscript back in the 90's. And now even vbscript/jscript are being quickly replaced (and haven't been meaningfully updated in over a decade) by powershell. You're stuck relying on external tools (often 3rd party) for pretty much everything but very simple loops and copying/moving/deleting files, and there's no way to write a replacement for them in batch either.

Honestly, even batch files' main replacement (vbscript) isn't so great. I mean, it was pretty nice for its time but it's really showing its lack of being updated. The error handling is laughable, I hope you don't plan on sorting data too often (it's quite a pain), arrays are pretty limited (and forget about fancier data structures), all of your code must be in a single file, etc. Never mind the VB syntax. But it's still far better than batch files for numerous reasons (namely native access to WMI, Databases, ADSI, COM, FSO, etc, and tons more very basic things batch files can't do, like math, working with dates or strings, etc). I personally moved from batch files to vbscript in the win2k era, and then started moving away from that in the last couple of years.

AceInfinity, on 29 January 2012 - 03:24 PM, said:

jaclaz told me about Comp, which I may look into next...

It's a little bit fancier: It'll tell you if the sizes are different, otherwise it'll produce even more text when two files differ (3 whole lines of text per byte, or over 50 bytes of text for one byte that's different), but in some cases it'll stop at the first occurrence. IMO it doesn't change the overall picture all that much. You're still relying on a similar external tool for a basic task (file comparison) which you wouldn't have to do if you were using basically any other language.

#29 AceInfinity

  • Newbie
  • Group: Members
  • Posts: 21
  • Joined: 10-August 11
  • OS: Windows 7 x64

Posted 31 January 2012 - 01:01 AM

PowerShell is something I'm quite familiar with, but I never would have found it a while back without trying to learn about batch. Perhaps it's senseless to keep batch as a part of my background knowledge, and I should stick with PowerShell? I've made lots more with PowerShell because I love its functionality over batch, and I've created a few cmdlets of my own that I can run through a quick console to do certain tasks that save me time on my computer.

Don't get me wrong though, batch still has some uses, but with Windows 7 where PowerShell is built in by default for me, and the same with Windows 8, I'm pretty sure it will become the new batch in a little while here.

I find PowerShell lots more enjoyable though, as it can create GUIs as well if you utilize and reference the system namespaces.

This post has been edited by AceInfinity: 31 January 2012 - 01:02 AM


#30 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,260
  • Joined: 14-July 04
  • OS: Windows 7 x64

Posted 31 January 2012 - 02:27 AM

AceInfinity, on 31 January 2012 - 01:01 AM, said:

Perhaps it's senseless to keep batch as a part of my background knowledge, and I should stick with PowerShell?

If you know powershell then I don't see much of a point to using batch files in general. As far as I'm concerned, they died when Win2k came out -- along with the other MS-DOS legacy stuff. This is even more true today with x64 OS'es not even having the NTVDM (not being able to run 16 bit apps from the same era or even later). The 80's are over. Yes, there is still some support for batch files for legacy purposes but that's about it.

Powershell takes a while to get used to and I wouldn't exactly call it perfect either but it's quite nice compared to the other built-in options (batch and WSH languages). I don't normally make GUIs for scripts myself (much like I never wasted much time creating HTA's from vbscript scripts) as I mostly run them from the command line (or in powershell's case, the ISE), or automated/scheduled.

Oh, and if you wanted to do something like this in powershell it's definitely possible too (as you're most likely already aware). You have full access to the System.Security.Cryptography namespace for hashing, etc. But I would personally still rather use C# for such a tool.




All trademarks mentioned on this page are the property of their respective owners
Copyright © 2001 - 2011 msfn.org