Binary Data Compare Batch - [fc]

**jaclaz** · January 27, 2012

This kind of feedback I can respect though, now we're actually getting somewhere lol

Good.

JFYI, a small batch file (by mere coincidence also using FC.EXE) I just posted:

Take into account that it is posted in a very specific thread and aimed mainly at people that do know where their towel is and it is admittedly experimental.

jaclaz

**Yzöwl** · January 27, 2012

But this could be used to scan for modified Windows files or to identify malware in any case if it's been bound to an important system file and that system file has been acting up.

Taking the scenarios above, could you explain how I would do this using your script as posted and improved upon thus far.

Which files would I select in, for instance, system32 together with my suspect system32 file in order to start a comparison?

narrowing the search by only comparing files of similar filesize.

Now why would you do that in perl and not in the batch script? Based on the method you are using, the potential time it takes and, as jaclaz states above, the possibly huge output file to check, that is exactly the kind of thing I'm expecting you to think about. I know that's a decision you could make yourself manually when selecting the files in your explorer window, but why accept doing it manually when you're creating a script to save time?

**AceInfinity** · January 28, 2012

This kind of feedback I can respect though, now we're actually getting somewhere lol
Good.
JFYI, a small batch file (by mere coincidence also using FC.EXE) I just posted:
Take into account that it is posted in a very specific thread and aimed mainly at people that do know where their towel is and it is admittedly experimental.
jaclaz

Thanks for that reference I will definitely take a look jaclaz

Much appreciated!

But this could be used to scan for modified Windows files or to identify malware in any case if it's been bound to an important system file and that system file has been acting up.
Taking the scenarios above, could you explain how I would do this using your script as posted and improved upon thus far.
Which files would I select in, for instance, system32 together with my suspect system32 file in order to start a comparison?
narrowing the search by only comparing files of similar filesize.
Now why would you do that in perl and not in the batch script? Based on the method you are using, the potential time it takes and, as jaclaz states above, the possibly huge output file to check, that is exactly the kind of thing I'm expecting you to think about. I know that's a decision you could make yourself manually when selecting the files in your explorer window, but why accept doing it manually when you're creating a script to save time?

I was just trying to find an example that would show you how it may be useful. Maybe a utility like System File Checker would be better for checking the integrity of system files. But suppose you saved a backup of explorer.exe somewhere, knowing that it was the factory default for that file, and you noticed odd behavior in your current explorer.exe (regardless of any file size differences): you could compare the original explorer.exe that you have backed up with the one that you currently use on your system. I'm just being optimistic now, but trying to think of ideas here.

Now why would you do that in perl and not in the batch script?

... Hmm, actually, see now we're getting somewhere! Thank you for that suggestion... That is actually something to think about, and finally a post that I can use for improving this script.

I am personally apologizing for the past posts I've made towards you; that's all I was really looking for in a reply from you. I'm dumbfounded as to why I didn't think of this earlier, but when I made this script I had forgotten that I had made a perl version to do a similar task as well. All my Perl scripts are backed up on my external hard drive, some of which are more than a year old.

I appreciate all of the responses I've been given so far, thanks guys!

Edit: Wait... Yzöwl... What about file sizes that may be the exact same but have different binary data? In the case of my perl script, I was comparing for SIMILAR files, and that's only possible if the file size is the same; otherwise it would indicate that it's a different file and the MD5 would change. In this case I'm trying to compare for DIFFERENT files. But if I limit the scan to files of different sizes only, what about the files of the same size that have different binary data, which I would then not be scanning?

Also, how would you go about scanning files of mass filesizes? (Bigger files)

In Perl I'm given the benefit of multi-threading, but I'm not sure how I could do this in batch (not multi-threading, just scanning larger files)

~Ace

Edited January 28, 2012 by AceInfinity

**jaclaz** · January 28, 2012

Hmmm.

http://www.mscs.dal.ca/~selinger/md5collision/

jaclaz

**CoffeeFiend** · January 28, 2012

If I remove from it the several lines of copyright related matters, it amounts to a handful of "normal", "common", batch commands (which can be made "better", as seen).
[...]
To invoke a Copyright on something, it must be something more "substantiated"

That. A copyright on something that can be summed up in 2 lines of pseudo-code?

It completely misses a number of checks for "sanity" and "safety" (just imagine that you compare with it two, say, DVD .iso's: you will get a bin_output\results.txt of several Gb's, and it will probably take a bit of time to run with the CPU at 100%).
It cannot run from read-only media (and does not provide the user with a way to choose an alternate target location), it does not check for the existence of the compared files, it makes no checks for the running OS, etc., etc.

This. These checks are fairly important and they're quick to add as well. And you're not kidding when you say several Gb's (or GBs or whatever). fc /b outputs 17 bytes per byte that's different. If you're comparing two full DVD5 images, you'd get a difference file of ~76GB!
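
As a rough check of that arithmetic, here is a small Python sketch. It assumes each differing byte produces one report line of the form `0000001A: 41 42` plus a CR/LF pair, and it uses a nominal 4.7 billion bytes for a DVD5 image; both figures are assumptions for illustration, and the result lands in the same ballpark as the number quoted above.

```python
# Rough size estimate for `fc /b` output in the worst case (every byte differs).
# Assumption: one line per differing byte, e.g. "0000001A: 41 42" + CR/LF,
# i.e. 8 (offset) + 1 (colon) + 1 (space) + 2 + 1 (space) + 2 + 2 (CR/LF) bytes.
BYTES_PER_DIFF_LINE = 8 + 1 + 1 + 2 + 1 + 2 + 2


def fc_output_bytes(differing_bytes: int) -> int:
    """Approximate report size for a given number of differing bytes."""
    return differing_bytes * BYTES_PER_DIFF_LINE


dvd5_bytes = 4_700_000_000  # nominal single-layer DVD capacity (assumed)
worst_case = fc_output_bytes(dvd5_bytes)
print(f"{worst_case / 1e9:.1f} GB ({worst_case / 2**30:.1f} GiB)")
```

The header lines that fc prints are ignored here, since they are negligible next to the per-byte lines.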

I have to echo the general feeling of "gosh, I have absolutely NO idea what I would use this for".

this could be used to scan for modified Windows files or to identify malware in any case if it's been bound to an important system file and that system file has been acting up

Modified Windows files? How about running sfc /scannow? That's built in, and meant to fix precisely those kinds of problems (there's system restore too). Or otherwise, why not compare the SHA1 hash of the file with one of the online lists that already exist, or from a known good file on another machine? As for identifying malware by running fc /b on 2 files... Most people have an antivirus which seems like a far better option for that, and there's all the websites where you upload a file and it scans it with multiple AVs too. I for one, cannot identify a threat based on 2 narrow columns of hex numbers flying past at that speed. Also, viruses may hook APIs which may make infected files appear clean (unless you scan offline) i.e. identical (it may even infect your clean file).

Basically, the "comparing files" problem has already been solved a number of times by lots of people. There's several cmd line utils just for this (e.g. diff and diff3), there's many GUI tools for this as well (WinMerge is pretty popular), and there's many more tools for generating/comparing hashes. It looks to me like a solution in search of a problem.

What about file sizes that may be the exact same but have different binary data?

A simple byte-for-byte comparison would catch that. That's pretty easy to write in any language if it's not already built-in (you said you're using perl which has File::Compare, python has filecmp.cmp, etc). Hashing here only adds CPU load for no reason (and as Jaclaz already pointed out, MD5 is quite old and a bad idea in general, SHA1 is a common replacement for it). It also increases comparison time not only by being CPU bound, but also by forcing you to hash the whole thing, whereas when you're doing a byte-for-byte comparison you can easily quit at the first byte that's dissimilar (and it very well may be the first byte of a file that's hundreds of MBs). Using hashes is mainly useful in different scenarios, like comparing one file to a known hash i.e. when you don't have the other file on hand, or don't want to send/copy it elsewhere to compare it there (and other tasks like for password authentication obviously). Unless you want to compare a large number of files together and identify duplicates (not necessarily comparing against one specific file), in which case hashing indeed works nicely (it saves time by not having to re-read lots of files, lots of times).
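
A minimal Python sketch of that byte-for-byte approach, for illustration only: the function name `files_identical` and the 64 KiB chunk size are arbitrary choices, not anything from the script under discussion. It checks sizes first, then reads both files in chunks and stops at the first chunk that differs.

```python
import os


def files_identical(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Return True if both files hold exactly the same bytes."""
    # Files of different sizes can never be identical, so nothing is read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit at the first differing chunk
            if not chunk_a:
                return True  # both files ended without a difference
```

The standard library call `filecmp.cmp(path_a, path_b, shallow=False)` performs a comparable content check, so the hand-written loop is mainly useful for showing where the early exit happens.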

In Perl I'm given the benefit of multi-threading, but I'm not sure how I could do this in batch (not multi-threading, just scanning larger files)

Multi-threading is of no use here anyway. I'm not sure how you were expecting to use it, or what for. But if you try to read two or more files at once and then hash them it's going to be quite a bit slower, due to drastically increased seeking (except on SSDs). Unless you plan on having one thread reading the entire file (which might be huge) to RAM, and then while it hashes the other thread loads another file to RAM -- or one thread that queues files to hash in byte arrays in RAM while the other thread does the hashing. That would require TONS of RAM if there are large files (e.g. comparing two DVD9 .iso's would require more than 16GB of free RAM), and the speed gain is rather minimal vs using streams (which uses very little memory).
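
To illustrate the streaming point, here is a hedged Python sketch of hashing a file in fixed-size chunks, so memory use stays around one chunk no matter how large the file is. The helper name `sha1_of_file` and the 1 MiB chunk size are illustrative assumptions.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file as a stream; only one chunk is held in memory at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Swapping `hashlib.sha1` for another `hashlib` constructor changes the algorithm without touching the read loop.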

Honestly, it's hard to be really helpful when we have zero idea what you're really trying to do here -- comparing files seemingly, but what for? And what data should be shown (identical or not? which bytes are different? etc) and how (wall-of-text? excel sheet? GUI app?)

**jaclaz** · January 28, 2012

(and as Jaclaz already pointed out, MD5 is quite old and a bad idea in general, SHA1 is a common replacement for it).

Actually jaclaz only said:

Hmmm.
http://www.mscs.dal.ca/~selinger/md5collision/

and he will do that AGAIN, this time actually implying that SHA-1 is also getting old:

Hmmm.

http://blog.dustintrammell.com/2009/01/07/md5-really/

http://csrc.nist.gov/groups/ST/toolkit/documents/shs/hash_standards_comments.pdf

http://csrc.nist.gov/groups/ST/hash/statement.html

And the new SHA-3 is soon to be selected.

http://csrc.nist.gov/groups/ST/hash/sha-3/index.html

http://csrc.nist.gov/groups/ST/hash/sha-3/Round3/March2012/index.html

jaclaz

**CoffeeFiend** · January 28, 2012

I'm well aware of those points

SHA1 is getting old indeed, but it's still "good enough" for most file comparison tasks and what's still getting used the most today, even when security is involved. Other hash algos tend to be slower and mainly overkill for this particular job here. MD5 though... I can't think of a reason I'd start on a new program/design using that in 2012.

**jaclaz** · January 28, 2012

MD5 though...

Oh, come on, how many accidental (NOT intentional) MD5 collisions did you ever see in your experience?

I would call them UNlikely.

But as seen here:

given the right scenario , ....:

jaclaz

**AceInfinity** · January 28, 2012

Modified Windows files? How about running sfc /scannow? That's built in, and meant to fix precisely those kinds of problems (there's system restore too). Or otherwise, why not compare the SHA1 hash of the file with one of the online lists that already exist, or from a known good file on another machine? As for identifying malware by running fc /b on 2 files... Most people have an antivirus which seems like a far better option for that

I don't need to endure belittling comments like this in my attempts to just try and learn something. All I want is advice, and I'll pursue the necessary steps based on feedback I've been given to improve on this. That's all I really want here...

Please note that I already mentioned that I'm aware of the System File Checker command, and that I was just trying to find a very quick example. The use of this thing is limited only by imagination, disregarding the limits of the script itself for larger files, which several other members have also already made me well aware of throughout this thread. I don't need to have it reiterated to me over and over; that only gives me a headache and adds extra posts to this thread that give me deja vu when I find I'm reading what I've already been told.

I KNOW I'm not the greatest batch programmer, but I need some new information as well. Having it bashed into my head from criticism on how my script here is useless does not help me. It may be useless no matter what I do to this script. But I just want to learn how to improve on it so I can become better at batch.

I'm not looking to make a masterpiece here, or the absolute best script in the world for comparing files. If I was to do that, I probably wouldn't even use batch as it's slow anyway.

Take a look at what you've said to me below here...

A simple byte-for-byte comparison would catch that. That's pretty easy to write in any language if it's not already built-in (you said you're using perl which has File::Compare, python has filecmp.cmp, etc). Hashing here only adds CPU load for no reason (and as Jaclaz already pointed out, MD5 is quite old and a bad idea in general, SHA1 is a common replacement for it). It also increases comparison time not only by being CPU bound, but also by forcing you to hash the whole thing, whereas when you're doing a byte-for-byte comparison you can easily quit at the first byte that's dissimilar (and it very well may be the first byte of a file that's hundreds of MBs). Using hashes is mainly useful in different scenarios, like comparing one file to a known hash i.e. when you don't have the other file on hand, or don't want to send/copy it elsewhere to compare it there (and other tasks like for password authentication obviously). Unless you want to compare a large number of files together and identify duplicates (not necessarily comparing against one specific file), in which case hashing indeed works nicely (it saves time by not having to re-read lots of files, lots of times).

This part is something new to take into account. I already know well about password authentication and hash comparison methods, specifically used on websites for the most part to authenticate a user to a MySQL database, and either SHA1 or MD5 are most commonly used for that.

Multi-threading is of no use here anyway. I'm not sure how you were expecting to use it, or what for. But if you try to read two or more files at once and then hash them it's going to be quite a bit slower, due to drastically increased seeking (except on SSDs). Unless you plan on having one thread reading the entire file (which might be huge) to RAM, and then while it hashes the other thread loads another file to RAM -- or one thread that queues files to hash in byte arrays in RAM while the other thread does the hashing. That would require TONS of RAM if there are large files (e.g. comparing two DVD9 .iso's would require more than 16GB of free RAM), and the speed gain is rather minimal vs using streams (which uses very little memory).

What would be the purpose of comparing DVD's though? Multi-threading can be of use, it just depends on how you want to use it. As you say you could compare 100 different files on a same thread, but memory would be a factor there.

I'm well aware of those points
SHA1 is getting old indeed, but it's still "good enough" for most file comparison tasks and what's still getting used the most today, even when security is involved. Other hash algos tend to be slower and mainly overkill for this particular job here. MD5 though... I can't think of a reason I'd start on a new program/design using that in 2012.

There's nothing wrong with MD5 in my opinion. It still works. There's tons of hashes out there, but MD5 is common in most things. If you say SHA1 is still "good enough" then I don't see a reason as to why MD5 can't be classified in the same way.

As jaclaz said, and I myself have never come across an MD5 collision either, it's highly unlikely, and still fairly unlikely unless you try to match them purposely. But that would be hard to do with something like malware, because you'd need to keep the malicious function operational while still trying to match an MD5, and if you're binding to a Windows file you'd have to calculate how that can be done with the binding alone. This would include possible compression of that Windows file to make sure that your bound malicious code matches the original file size as well after everything is done. Not a very easy task for malicious code developers.

It's still a VERY common and widely used hashing algorithm out there, so there must not be anything wrong with it. The C programming language is old, but it's still used out there lots. Being old doesn't mean we're forced to change just because there must be something better out there.

Edited January 28, 2012 by AceInfinity

**CoffeeFiend** · January 29, 2012

belittling comments like this

In what way is it? I never criticized your batch skills (or other skills either), I'm just saying I don't see what this would be useful for. You seem rather easily offended.

I probably wouldn't even use batch as it's slow anyway.

There's plenty more reasons NOT to use batch files. It's pretty much my dead-last choice for just about anything, except for one particular task: passing a couple arguments to an installer (then again, you can barely call that a "batch" file, it's just one command line). Outside of that task, I can't think of any other language or scripting language that sucks so badly and is so limited. You said that you know perl, I don't see why you don't just use that instead.

What would be the purpose of comparing DVD's though?

That's the question we've all been asking. What would be the purpose of comparing DVD's or any other files though? It's definitely not clear.

Multi-threading can be of use, it just depends on how you want to use it. As you say you could compare 100 different files on a same thread, but memory would be a factor there.

Comparing 100 different files between themselves is better done with hashing anyway, and multithreading can barely help here really (it's very much IO-bound, unless you have a fancy SSD). Either way, batch files don't give you that option.
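
For the many-files case, a common pattern is to group files by size first and hash only the groups that share a size. The Python sketch below assumes that approach; the names `find_duplicates` and `_sha1` are hypothetical, and SHA-1 is used only because it is the algorithm suggested in this thread.

```python
import hashlib
import os
from collections import defaultdict


def _sha1(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-1 so large files need little memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose size and SHA-1 digest both match."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[_sha1(path)].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Each file is read at most once, and files with a unique size are never read at all, which is where the time saving over pairwise comparison comes from.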

Anyway. Have fun. I don't think I can be of much help with trying to improve on something without seeing the big picture (the end goal/purpose).

**AceInfinity** · January 29, 2012

I don't see why you don't just use that instead.

I have used perl, I created a file compare script in the past, which was mentioned in one of my earlier posts in this thread. But I wanted to see what I could do with batch just for personal learning experience

Comparing 100 different files between themselves is better done with hashing anyway, and multithreading can barely help here really (it's very much IO-bound, unless you have a fancy SSD). Either way, batch files don't give you that option.
Anyway. Have fun. I don't think I can be of much help with trying to improve on something without seeing the big picture (the end goal/purpose).

That's true, I've been told to not use MD5, so maybe an SHA1 hash version of my perl script for the file comparison? My perl script does use a hashing comparison method, but currently the only way I knew this through batch was through a program called fc.exe which I found on Windows through some searching lol.

jaclaz told me about Comp, which I may look into next...

**Yzöwl** · January 29, 2012

Just bear something in mind: if I wanted to verify that myfile.dll was not in some way altered, then in order to check it against a known good myfile.dll using your script, they would both need to be in the same directory to drag and drop them. As you know, they could not both be in the same directory because they have the same name. It would seem crazy to take one or the other of them, rename it and then move it alongside the other one in order to then drag and drop it onto a batch file!
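
One way around the same-name problem is to pass directories rather than dropped files. As a sketch only (the function name and parameters are hypothetical, and this is Python rather than batch), the comparison can build both full paths from one file name:

```python
import filecmp
import os


def same_named_file_matches(name: str, suspect_dir: str, known_good_dir: str) -> bool:
    """Compare <suspect_dir>/<name> with <known_good_dir>/<name> by content."""
    suspect = os.path.join(suspect_dir, name)
    known_good = os.path.join(known_good_dir, name)
    # shallow=False makes filecmp read the contents instead of
    # trusting matching size and modification time alone.
    return filecmp.cmp(suspect, known_good, shallow=False)
```

No renaming or moving is needed, because the two files are told apart by their directories.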

**CoffeeFiend** · January 30, 2012

That's true, I've been told to not use MD5, so maybe an SHA1 hash version of my perl script for the file comparison?

That change should be pretty easy to make. Yes, MD5 collisions aren't all that likely, but why settle on such an old algo that's been decertified 14 years ago by NIST? Yes, that was for "secure purposes", but then again, it's so simple to use something more modern (then again, my own file hasher tool optionally does both -- just in case I want to check a "md5sum" which is quite unlikely).

the only way I knew this through batch was through a program called fc.exe which I found on Windows through some searching lol.

Batch files don't have built-in methods for comparing files, or hashing, or anything of the sort. We're talking about early 1980's technology here, a VERY primitive kind of "scripting" language, which was replaced by vbscript & jscript back in the 90's. And now even vbscript/jscript are being quickly replaced (and haven't been meaningfully updated in over a decade) by powershell. You're stuck relying on external tools (often 3rd party) for pretty much everything but very simple loops and copying/moving/deleting files, and there's no way to write a replacement for them in batch either.

Honestly, even batch files' main replacement (vbscript) isn't so great. I mean, it was pretty nice for its time but it's really showing its lack of being updated. The error handling is laughable, I hope you don't plan on sorting data too often (it's quite a pain), arrays are pretty limited (and forget about fancier data structures), all of your code must be in a single file, etc. Nevermind the VB syntax. But it's still far better than batch files for numerous reasons (namely native access to WMI, Databases, ADSI, COM, FSO, etc, and tons more very basic things batch files can't do, like math, working with dates or strings, etc). I personally moved from batch files to vbscript in the win2k era, and then started moving away from that in the last couple of years.

jaclaz told me about Comp, which I may look into next...

It's a little bit fancier: It'll tell you if the sizes are different, otherwise it'll produce even more text when two files differ (3 whole lines of text per byte, or over 50 bytes of text for one byte that's different), but in some cases it'll stop at the first occurrence. IMO it doesn't change the overall picture all that much. You're still relying on a similar external tool for a basic task (file comparison) which you wouldn't have to do if you were using basically any other language.

**AceInfinity** · January 31, 2012

Powershell is something I'm quite familiar with, but I never would have found it a while back without trying to learn about batch. Perhaps It's senseless to keep batch as a part of my background knowledge and keep with Powershell? I've made lots more with powershell for my love of its functionality over Batch, and I've created a few cmdlets of my own that I can run through a quick console to do certain tasks that save me time on my computer.

Don't get me wrong though, batch still has some uses, but with Windows 7 where Powershell is built in by default for me, and the same with Windows 8, I'm pretty sure it will become the new batch in a little while here.

I find Powershell lots more enjoyable though, as it can create GUI's as well if you utilize and reference the system namespaces.

Edited January 31, 2012 by AceInfinity

**CoffeeFiend** · January 31, 2012

Perhaps It's senseless to keep batch as a part of my background knowledge and keep with Powershell?

If you know powershell then I don't see much of a point to using batch files in general. As far as I'm concerned, they died when Win2k came out -- along with the other MS-DOS legacy stuff. This is even more true today with x64 OS'es not even having the NTVDM (not being able to run 16 bit apps from the same era or even later). The 80's are over. Yes, there is still some support for batch files for legacy purposes but that's about it.

Powershell takes a while to get used to and I wouldn't exactly call it perfect either but it's quite nice compared to the other built-in options (batch and WSH languages). I don't normally make GUIs for scripts myself (much like I never wasted much time creating HTA's from vbscript scripts) as I mostly run them from the command line (or in powershell's case, the ISE), or automated/scheduled.

Oh, and if you wanted to do something like this in powershell it's definitely possible too (as you're most likely already aware). You have full access to the System.Security.Cryptography namespace for hashing, etc. But I would personally still rather use C# for such a tool.
