MSFN Forum: Binary Data Compare Batch - [fc] - MSFN Forum


Binary Data Compare Batch - [fc]

#1 User is offline   AceInfinity 

  • Newbie
  • Group: Members
  • Posts: 21
  • Joined: 10-August 11
  • OS:Windows 7 x64

Posted 24 January 2012 - 12:12 AM

I created this some months ago to take inputs from dragged files as arguments/params to the batch file and loop through them with fc using shift, so that it would take in more than a max of 9 args :)

@echo off
SETLOCAL ENABLEEXTENSIONS 
title Binary File Compare - Created by AceInfinity
set /a args=0 && for %%a in (%*) do ( set /a args+=1 )
if %args% lss 2 call :noArgs

if not exist "bin_output" md "bin_output"
(
echo.----------------------------------------------------------------
echo.    Binary File Comparison Script - Created by AceInfinity
echo.                      Copyright 2012
echo.----------------------------------------------------------------
echo.
) > "bin_output\results.txt"

echo Processing Binary Data...

for %%a in (%*) do (
	if not %%a==%1 ( 
		fc /b %1 %%a >> "bin_output\results.txt"
	)
)
exit

:noArgs
echo No other input files as arguments could be found... 
echo Please specify at least 2 input files to be used for comparison 
echo. && pause
exit


Perhaps someone else might find a use for it, so I thought I'd share it and accept feedback on it.


#2 User is offline   Yzöwl 

  • Wise Owl
  • Group: Super Moderator
  • Posts: 4,195
  • Joined: 13-October 04
  • OS:Windows 7 x64

Posted 24 January 2012 - 03:39 AM

Some general observations:
You say you created the file some months ago, but strangely it was copyrighted only this month!
You say that you have used 'SHIFT' in order to allow for more than nine arguments, but 'SHIFT' appears to not exist within the code.
Your method of determining whether any arguments were input seems a little convoluted.

You didn't explain in enough detail what the purpose of the script was:
It compares one file which when dropped with a bunch of others onto a batch file is allocated as the first parameter, (using luck), with every other file following it. i.e. compare 1 with 2, 1 with 3, 1 with 4 etc.

#3 User is offline   AceInfinity 

Posted 24 January 2012 - 06:24 PM

Quote

It compares one file which when dropped with a bunch of others onto a batch file is allocated as the first parameter, (using luck), with every other file following it. i.e. compare 1 with 2, 1 with 3, 1 with 4 etc.


No, that is unfortunately wrong. You can determine which file becomes %1: it is the file you grab when dragging the selection onto the batch file. I've tested this, and the file you drag the group with is passed as the first argument after %0. For example, suppose you have 3 files: file1, file2 and file3. If you select all of them and drag them onto the batch file by holding file3, then file3 becomes argument %1.
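To check the drag-and-drop ordering described above on a given machine, a throwaway script can print each argument in the order it arrives. This is only a minimal sketch; the file name show_args.cmd is hypothetical and not part of the original script.

```batch
@echo off
rem show_args.cmd (hypothetical name): drop a group of files onto this script
rem to see which one arrives as the first argument.
setlocal EnableExtensions EnableDelayedExpansion
set /a n=0
for %%a in (%*) do (
    set /a n+=1
    echo Argument !n!: %%a
)
rem Keep the window open so the list can be read after a drag and drop.
pause
```

Note that `for %%a in (%*)` expands wildcards, so an argument containing `*` or `?` would be listed as the files it matches rather than literally.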

I updated the year in the batch file lol. :) Batch files aren't compiled, so I'm free to edit that if I wish. Believe it or not, I created the script in 2011 but already had the year 2012 embedded in it anyway. It really proves nothing about when I created it, though. For all you know I could have created this script more than a year ago (assuming the batch syntax would still fit the NT version available to the user), and I could still change it to the year 9999 if I wanted lol.

My mistake though... I had a version which used SHIFT. I had it in my notepad, but it seems I copied out this one instead. I would link to where I posted the version with shift, but that may be against the rules, so how about I just copy here the code that I pasted on another forum, to show you the version that uses shift to move the input arguments up by one?

Here's my first version:
@echo off
title Binary File Compare - Created by AceInfinity

set i=0
set args=0 && for %%a in (%*) do set /a args+=1 && set "original=%1"
if %args% lss 2 call :noArgs
set /a args-=2

if not exist "bin_output" md "bin_output"
(
echo.----------------------------------------------------------------
echo.    Binary File Comparison Script - Created by AceInfinity
echo.                            Copyright 2012
echo.----------------------------------------------------------------
echo.
) > "bin_output\results.txt"

echo Processing Binary Data...
:start
fc /b %original% %2  >> "bin_output\results.txt"
shift
set /a i+=1
if not %i% gtr %args% goto start
exit

:noArgs
echo No other input files as arguments could be found... 
echo Please specify at least 2 input files to be used for comparison 
echo. && pause
exit


This uses fc to compare files by their binary data, with the /b flag that I've specified directly in the code. The first argument is treated as the original file, and every other argument is compared against it. The results are written to an output text file as a log. If you have lots of files to compare, the console window's buffer probably won't let you view all of the data, so it's exported to a text file instead. It might have been better to export each comparison to a new file so that the text file doesn't become too large, but I don't think anyone will be comparing that many files; it will most likely be 1-5 files.

If anyone wants it to export to a new file every time, I'll create a revised script.
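The per-comparison export mentioned above could look roughly like the following. It is a sketch rather than the author's revised script: naming each log after the compared file (`%~nx2`) is an assumption made for illustration, and two inputs with the same file name in different folders would overwrite each other's log.

```batch
@echo off
rem Sketch: write each comparison against %1 to its own log file.
setlocal EnableExtensions
if "%~2"=="" (
    echo Please specify at least 2 input files to be used for comparison
    pause
    exit /b 1
)
if not exist "bin_output" md "bin_output"

:next
rem %~nx2 expands to the name and extension of the file being compared.
fc /b "%~1" "%~2" > "bin_output\%~nx2.txt"
rem SHIFT /2 keeps %1 in place and moves the next file into %2.
shift /2
if not "%~2"=="" goto next
```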

Important note: I believe the remaining arguments are passed in the filesystem's default order, from top to bottom. So with drag and drop you can't really choose where each file goes, unless you open a command prompt and type each file path manually. That doesn't matter much, though, because when you drag a group of selected files onto my batch script, the one file you used to drag the group is passed as the first argument (%1).

This may be handy for you to know, when you want to quickly compare all other selected files to a specific file of your choice.

I DID have a video of it in use, but the video got removed: my YouTube account was deleted a week ago when I decided to delete the Google account linked to it, and unfortunately Google deleted the video along with the account.

My apologies though... I can't believe I missed that it didn't use shift. (the version of my script I posted)

If you have trouble reading or using the script, just ask me. The first script is the most up-to-date version, as I found that shift was useless if I could just use the for loop to loop through all of the values in the arg list.

I know you're a good batch programmer, so no personal tension between you and me as a member, but I thought that someone may find this useful, or someone like yourself could help me improve it further.

If you don't feel like providing the advice that I so kindly asked for, even after I clarified the kind of feedback I wanted for this script, then kindly do not post in this thread anymore. That's all. At least I've gotten some advice on how to check for a minimum number of input args without looping and counting each one, which makes my script a bit faster, as well as a couple of other little things posted by other members here.

Have a nice day

~Ace

This post has been edited by AceInfinity: 26 January 2012 - 07:52 PM


#4 User is offline   Yzöwl 

Posted 25 January 2012 - 01:51 PM

BTW, you didn't tell me anything I didn't already know, I only mentioned the parameter allocation and how the file works because you hadn't. (It's a little pointless posting a file for others to use if they don't know what to do or what it does.)

Your method of determining whether any arguments were input still seems a little convoluted.

AceInfinity, on 24 January 2012 - 06:24 PM, said:

I know you're a good batch programmer, so no personal tension between you and me as a member, but I thought that someone may find this useful, or someone like yourself could help me improve it further.


I'm not just good, I'm excellent.

Why would there be tension? You asked for feedback, and I gave some!

TBH if I were to ever come across a similar kind of problem I would be unlikely to use FC anyhow.

#5 User is offline   AceInfinity 

Posted 25 January 2012 - 11:40 PM

Alright, you tell me all the bad things about my script, but you don't attempt to help me, you just point out that these things are bad. I can't claim that to be the kind of proper feedback that I was looking for to be honest.

So far you're just telling me that you're absolutely "excellent" with batch, but you can't provide some help to others beyond telling them that their method may not be the best? That shows a form of arrogance at its lowest level in my opinion, which is not a bad thing, but if you have such knowledge, can't you use it to provide feedback that can help others? Feedback that can be of some use to this person? You would be unlikely to use FC, and my method of determining whether any arguments were input seems "convoluted". Okay... Why then?

Your posts don't provide any support for me, and are of zero value as of now. It's like going to a math class and having the teacher tell you "No, that's not how you do it, try again". Well then... "How do I do it?" That is my perspective here, by analogy.

This post has been edited by AceInfinity: 25 January 2012 - 11:49 PM


#6 User is offline   Yzöwl 

Posted 26 January 2012 - 04:00 AM

You didn't ask for help, you didn't ask for changes, you asked for feedback!

Here's what I've already done for you:
  • Informed you that there was a discrepancy in the date you'd written in the script compared with that in which you'd stated you had written it.
  • Informed you that the main method you said you'd used wasn't in fact used at all.
  • Informed you that you could simplify a specific portion of your code.
  • Informed all readers what would happen if they ran your script, thus prompting the explanation it required from the author.
  • Informed you that there are other ways of performing the task when faced with similar problems.


If your file works for you in your situation, and you're happy enough to post it on a public forum for others to use, then be pleased that you've done your best with it.

If I were your maths teacher I'd say that for your specific problem you got the question right. I'd then suggest that you investigate different theories or formulas to improve its efficiency and satisfy a greater diversity of working situations. I certainly wouldn't do it for you!

Please don't allow any inferiority frustration you feel to propagate further argumentative responses.

#7 User is offline   Scr1ptW1zard 

  • Junior
  • Group: Members
  • Posts: 57
  • Joined: 05-July 07

Posted 26 January 2012 - 04:24 AM

AceInfinity, Yzöwl is providing valuable feedback.
Perhaps you meant to ask "How could this be done more efficiently?"
In that case, here are some changes I would make:

For the checking of command-line parameters, ask yourself "What do I need as input?" Your answer would be "at least 2 parameters".
So, just check for that:

if [%2] equ [] goto noArgs
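A common alternative spelling of the same check uses the `%~2` modifier, which strips any surrounding quotes from the argument before the comparison. The sketch below only illustrates that variant; the :noArgs label follows the original script, and the success message is a placeholder.

```batch
@echo off
rem Sketch: require at least two arguments without counting them in a loop.
rem %~2 removes surrounding quotes, so quoted and unquoted paths are treated alike.
if "%~2"=="" goto noArgs
echo Two or more arguments were supplied.
goto :eof

:noArgs
echo Please specify at least 2 input files to be used for comparison
pause
```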


You can also do away with most of the variables.

Set "original=%1"

:start
	shift
	if [%1] equ [] goto :eof
	>>"bin_output\results.txt" fc /b %original% %1
	goto start



HTH

#8 User is offline   jaclaz 

  • The Finder
  • Group: Developers
  • Posts: 9,112
  • Joined: 23-July 04
  • OS:none specified

Posted 26 January 2012 - 05:14 AM

Scr1ptW1zard, on 26 January 2012 - 04:24 AM, said:

You can also do away with most of the variables.

Set "original=%1"

:start
	shift
	if [%1] equ [] goto :eof
	>>"bin_output\results.txt" fc /b %original% %1
	goto start



HTH

Since you have Extensions enabled, you can also avoid setting the "original" variable.

This should work (I'm not certain):
:start
	>>"bin_output\results.txt" fc /b %1 %2
	shift /2
	if [%2] equ [] goto :eof
	goto start
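The effect of SHIFT /2 can be seen without touching any files by echoing the pairs instead of running fc. This is a small demonstration sketch, not part of the posted script.

```batch
@echo off
rem Demonstrates SHIFT /2: %0 and %1 stay fixed while %3 moves to %2, %4 to %3, and so on.
setlocal EnableExtensions
:loop
if "%~2"=="" goto :eof
echo Would compare %1 with %2
shift /2
goto loop
```

Run with the arguments `a b c d`, it pairs `a` with `b`, then with `c`, then with `d`, which is the order in which the fc loop above would compare files.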


;)

jaclaz

#9 User is offline   Yzöwl 

Posted 26 January 2012 - 05:23 AM

You are too kind Scr1ptW1zard, jaclaz.

As end users we really need to ask ourselves why we want to run this file. What are we looking for from it? Are we trying to find identical files, or are we trying to find files which appear the same but aren't?

What are the likely scenarios under which this type of comparison would be required?

#10 User is offline   AceInfinity 

Posted 26 January 2012 - 12:42 PM

These last three posts are what I was looking for :) Finally some advice that I can use towards the latter if I was to create a newer version now. Thanks guys! I'll see what I can come up with, and maybe to answer this question, Yzöwl:

Quote

As end users we really need to ask ourselves why we want to run this file. What are we looking for from it? Are we trying to find identical files, or are we trying to find files which appear the same but aren't?

What are the likely scenarios under which this type of comparison would be required?


I can do both? Provide a list of files that are identical and files that might not be?

Maybe just loop more than once, to compare not just one file to all others, but every file to each other file.
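Comparing every file with every other file could be sketched with two nested loops over the argument list, using counters so that each pair is compared only once. This is an illustration under assumptions: delayed expansion is enabled, so file names containing `!` would not survive, and the number of comparisons grows as n(n-1)/2.

```batch
@echo off
rem Sketch: compare every argument with every later argument (each pair once).
rem Assumption: no file name contains an exclamation mark (delayed expansion is on).
setlocal EnableExtensions EnableDelayedExpansion
if "%~2"=="" (
    echo Please specify at least 2 input files to be used for comparison
    pause
    exit /b 1
)
if not exist "bin_output" md "bin_output"
set /a i=0
for %%a in (%*) do (
    set /a i+=1
    set /a j=0
    for %%b in (%*) do (
        set /a j+=1
        rem Only compare when %%b comes after %%a in the argument list.
        if !j! gtr !i! fc /b %%a %%b >> "bin_output\results.txt"
    )
)
```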

#11 User is offline   jaclaz 

Posted 26 January 2012 - 01:04 PM

AceInfinity, on 26 January 2012 - 12:42 PM, said:

Finally some advice that I can use towards the latter if I was to create a newer version now.


Though it may be harder to copyright this new version :whistle:
I hereby claim copyright 2012 :w00t: for the use of SHIFT /2 :ph34r:

:lol:

jaclaz

#12 User is offline   Yzöwl 

Posted 26 January 2012 - 02:25 PM

You still don't get it AceInfinity, I provided advice, Scr1ptW1zard did the job for you and jaclaz improved on Scr1ptW1zard's work.

My questions weren't about whether you could do something; they were designed to lead you to the reasons why you posted the script and what your end users would use it for.

You haven't answered them, and TBH examining the scenarios whereby the file becomes useful to others is the most important thing about posting it.

What did you need it for? Are many others likely to be in the same or a similar situation? How would their situation differ? Are many people likely to have a single directory containing many files which may be identical in all but name and, if so, why would they want to know and what would they do with them, etc.?

Take a look at the output file you get and see if there is anything in it which could be bettered for the end user, how would you change the code to do that?

Is the end user likely to need to know the differences or just whether or not there were any?

Without answering these, the only real part of the script which becomes useful to us is the portion produced by Scr1ptW1zard and improved on by jaclaz.
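If the end user only needs to know whether there were differences, fc's exit code can be tested instead of logging its full output. The sketch below assumes the commonly documented exit codes for fc (0 for identical files, 1 for differences, 2 when a file cannot be found); that is worth verifying on the target Windows version.

```batch
@echo off
rem Sketch: report only IDENTICAL / DIFFERENT for each file compared with %1.
rem Assumption: fc returns 0 = identical, 1 = different, 2 = file not found.
setlocal EnableExtensions
if "%~2"=="" exit /b 1

:next
rem Discard fc's byte-by-byte listing; only the exit code is used.
fc /b "%~1" "%~2" > nul 2>&1
if errorlevel 2 (
    echo ERROR      "%~2"
) else if errorlevel 1 (
    echo DIFFERENT  "%~2"
) else (
    echo IDENTICAL  "%~2"
)
shift /2
if not "%~2"=="" goto next
pause
```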

#13 User is offline   AceInfinity 

Posted 26 January 2012 - 07:41 PM

jaclaz, on 26 January 2012 - 01:04 PM, said:

AceInfinity, on 26 January 2012 - 12:42 PM, said:

Finally some advice that I can use towards the latter if I was to create a newer version now.


Though it may be harder to copyright this new version :whistle:
I hereby claim copyright 2012 :w00t: for the use of SHIFT /2 :ph34r:

:lol:

jaclaz


lol no, that copyright is more just for looks anyway. I personally don't care if anyone uses my scripts or modifies them in any way. I can provide credits with the copyright if I ever repost it, but I doubt I will; I'm already making newer projects outside of batch.

Quote

You still don't get it AceInfinity, I provided advice, Scr1ptW1zard did the job for you and jaclaz improved on Scr1ptW1zard's work.

My questions weren't whether you could do something they were designed to lead you to the reasons why you posted the script and what your end users would use it for.

You haven't answered it, and TBH examining the scenarios whereby the file becomes useful to others is the most important thing about posting it.


You're still on about this? I don't see what there is to "get" personally. This is my thread, if someone has any batch skills, which they should before attempting to run any batch script they find on the net then they can analyze it and figure it out. Otherwise there's this thread here where people can ask questions on how my script works if they are interested. I'm not here to cater to folks, and it's up to them whether they are willing to use the script or not. It's useful for comparing files, simple as that. If you're a developer, a troubleshooter, etc... Or just a curious Windows user, then you might find a use for it.

I personally made this for my own use, although decided to post it out there in case anyone else found it useful because it is universal in the way it functions.

If you just wanted to remove duplicates, then maybe comparing MD5 hashes is better. But this could be used to scan for modified Windows files or to identify malware, in case it has been bound to an important system file and that system file has been acting up. Otherwise, again, you can choose to compare by MD5, but I'm sure comparing the binary data inside the file is just as good.

Are you interrogating me for any specific reason? Perhaps learning how this would be useful as you cannot figure it out with putting some scenarios together on your own? I'm having a hard time understanding why you can't just let me off the hook for posting this. Nobody else had much troubles with me posting here, so it can't be as bad as you claim it to be. Even if I was to explain everything, chances are there might still be questions left unanswered as I can't put together a limitless amount of probable questions and pair them off with answers in my head. No one can.

Quote

Take a look at the output file you get and see if there is anything in it which could be bettered for the end user, how would you change the code to do that?


Things can ALWAYS be improved, but is this possibly not one of the reasons to why I came here for feedback? Maybe someone else could add in something that they might find useful as an addition to my script here? Who knows. But again, this was just a friendly contribution. I never intended to work on this script for others, mainly myself, but others with permission to freely edit as they wish.

Quote

Without answering these, the only real part of the script which becomes useful to us is the portion produced by Scr1ptW1zard and improved on by jaclaz.


That doesn't really make any sense to me. But I think the reason I'm not getting it is that I, as a human on this earth, am entitled to my own opinion and point of view, and I'm just not seeing yours. You still haven't been too helpful to me in this thread after the 3 posts that you've made. You're acting like I don't have a brain and can't come up with suggestions on how to improve my script on my own. However, I know for a fact that I'm not going to have the same ideas as others, which is my main intention in creating this thread.

I'm not intimidated by your modship here. I'm a decent member of many forums, and I'm entitled to my own thoughts. I just don't see you as such a friendly guy after seeing the difference between your posts and those of the 2 other members in this thread, who can joke around and provide me with some constructive criticism on the code that I have. Yours just seem more conceptual, yet you don't back them up with much of an explanation of why you say such things, so I can't use your posts to better my script much.

Even if I was to think of more to add to my script that could be useful to others, as you suggest, what would be the point if the current code I have now might not be ethical, based on the assumptions I've made from your previous replies? You still haven't answered my questions here.

And for my main question - what WOULD you use other than fc? Seeing as how you stated that fc is NOT what you would use. I've been waiting for answers to these questions so that maybe I can at least look something up if you were to suggest anything, but I haven't been getting very far with you in this thread so far. Which is ironic, because I mainly posted to see what kind of advice you would give to me here, knowing that you have posted various batch help around the forum in the past.

This post has been edited by AceInfinity: 26 January 2012 - 07:48 PM


#14 User is offline   jaclaz 

Posted 27 January 2012 - 03:34 AM

Aceinfinity,
take it easy, man :).

I will try to explain to you the issues with your batch and post (nothing personal, only trying to tell you plainly what may have contributed to the present misunderstanding)
Facts:
  • what you posted is NOT a batch file :w00t: (see below ;))
  • it misses a simple, basic explanation of its intended usage paradigm


Explanation:
If I remove from it the several lines of copyright-related matters, it amounts to a handful of "normal", "common" batch commands (which can be made "better", as seen).
It completely misses a number of "sanity" and "safety" checks (just imagine comparing two, say, DVD .iso files with it: you will get a bin_output\results.txt of several GB :ph34r:, and it will probably take a while to run with the CPU at 100%).
It cannot run from read-only media (and does not provide the user with a way to choose an alternate target location), it does not check for the existence of the compared files, it makes no checks for the running OS, etc., etc.
To invoke a Copyright on something, it must be something more "substantiated", IMHO, compare with:
http://www.copyright...ircs/circ61.pdf

All in all it amounts to a quick and dirty script to quickly do something VERY specific, like:
  • on 2K/XP and later ONLY
  • with relatively small files only
  • on a Read/Write device
  • only reliable if drag 'n drop is used


Nothing particularly "bad" about the above :), simply these things might have been specified from the beginning.
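Two of the missing checks listed above, file existence and a writable output location, could be sketched as follows. Writing the log under `%TEMP%` is an assumption chosen for illustration so that the script itself can sit on read-only media; it is not something proposed in the thread.

```batch
@echo off
rem Sketch: basic sanity checks before running fc.
rem Assumption: %TEMP% is writable, so results go there instead of the current directory.
setlocal EnableExtensions
set "outdir=%TEMP%\bin_output"
if "%~2"=="" exit /b 1
if not exist "%~1" (
    echo Reference file not found: "%~1"
    exit /b 2
)
if not exist "%outdir%" md "%outdir%"

:next
rem Skip any argument that does not exist instead of letting fc fail on it.
if exist "%~2" (
    fc /b "%~1" "%~2" >> "%outdir%\results.txt"
) else (
    echo Skipping missing file: "%~2"
)
shift /2
if not "%~2"=="" goto next
```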

jaclaz

#15 User is offline   AceInfinity 

Posted 27 January 2012 - 05:02 AM

I agree, but I doubt anyone still uses anything earlier than Windows 2K/XP. I have a 95 machine, a 98 machine and a 2000 machine, all 3 still in working condition, that I haven't used for general or "real" computer use in years.

This kind of feedback I can respect though, now we're actually getting somewhere lol :)

When I first started writing this script I was going to iterate through the files in a directory, but I decided to take specific files as input args instead. I personally don't know of any other realistic built-in Windows utility that is like fc but can handle larger data, though. Not off the top of my head at least. Perhaps you have other suggestions for a "reference" that can be used? Then I'll see what I can do to utilize it to make a better, more adaptable batch script.

I have actually created a much better version of this in Perl, although using MD5 file checksums as my comparison and narrowing the search by only comparing files of similar filesize. I was just curious one day as to what I would be limited to if I was to do the same in batch :)
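For a rough batch counterpart to the checksum approach, one option is the `certutil -hashfile` verb that ships with Windows 7 and later. This is a sketch under that assumption, not a port of the Perl script; it only prints a hash per file and leaves the comparison of hashes to the reader.

```batch
@echo off
rem Sketch: print an MD5 hash for every file passed as an argument.
rem Assumption: certutil is available and supports the -hashfile verb.
setlocal EnableExtensions
:next
if "%~1"=="" goto :eof
certutil -hashfile "%~1" MD5
shift
goto next
```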

This post has been edited by AceInfinity: 27 January 2012 - 05:04 AM


#16 User is offline   jaclaz 

Posted 27 January 2012 - 05:56 AM

AceInfinity, on 27 January 2012 - 05:02 AM, said:

This kind of feedback I can respect though, now we're actually getting somewhere lol :)

Good. :)

JFYI, a small batch file (by mere coincidence also using FC.EXE) I just posted:
http://www.msfn.org/...s/page__st__142

Take into account that it is posted in a very specific thread and aimed mainly at people that do know where their towel is and it is admittedly experimental. :hello:

jaclaz

#17 User is offline   Yzöwl 

Posted 27 January 2012 - 01:18 PM

AceInfinity, on 26 January 2012 - 07:41 PM, said:

But this could be used to scan for modified Windows files or to identify malware, in case it has been bound to an important system file and that system file has been acting up.

Taking the scenarios above, could you explain how I would do this using your script as posted and improved upon thus far?
Which files would I select in, for instance, system32 together with my suspect system32 file in order to start a comparison?

AceInfinity, on 27 January 2012 - 05:02 AM, said:

narrowing the search by only comparing files of similar filesize.

Now why would you do that in Perl and not in the batch script? Given the method you are using, the time it potentially takes and, as jaclaz states above, the possibly huge output file to check, that is exactly the kind of thing I'm expecting you to think about. I know that's a decision you could make yourself manually when selecting the files in your Explorer window, but why accept a manual step when you're creating a script to save time?
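A size check of this kind is available in batch through the `%~z` parameter modifier. The sketch below runs fc only when the sizes match, since files of different sizes cannot be byte-identical; files of equal size still go through the full binary comparison. The sizes are compared as strings here, a choice made to sidestep cmd's 32-bit integer arithmetic on very large files, and the log file name is illustrative.

```batch
@echo off
rem Sketch: skip the byte-by-byte comparison when the file sizes already differ.
rem Assumption: every argument is an existing file, so %~z1 and %~z2 are not empty.
setlocal EnableExtensions
if "%~2"=="" exit /b 1

:next
rem %~z1 and %~z2 expand to the file sizes in bytes.
if "%~z1"=="%~z2" (
    fc /b "%~1" "%~2" >> "results.txt"
) else (
    echo Different size, so different content: "%~2" >> "results.txt"
)
shift /2
if not "%~2"=="" goto next
```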

#18 User is offline   AceInfinity 

Posted 27 January 2012 - 06:03 PM

Thanks for that reference I will definitely take a look jaclaz :)

Much appreciated!

Yzöwl, on 27 January 2012 - 01:18 PM, said:

AceInfinity, on 26 January 2012 - 07:41 PM, said:

But this could be used to scan for modified Windows files or to identify malware, in case it has been bound to an important system file and that system file has been acting up.

Taking the scenarios above, could you explain how I would do this using your script as posted and improved upon thus far?
Which files would I select in, for instance, system32 together with my suspect system32 file in order to start a comparison?

AceInfinity, on 27 January 2012 - 05:02 AM, said:

narrowing the search by only comparing files of similar filesize.

Now why would you do that in Perl and not in the batch script? Given the method you are using, the time it potentially takes and, as jaclaz states above, the possibly huge output file to check, that is exactly the kind of thing I'm expecting you to think about. I know that's a decision you could make yourself manually when selecting the files in your Explorer window, but why accept a manual step when you're creating a script to save time?

I was just trying to find an example to show you how it may be useful. Maybe a utility like System File Checker would be better for checking the integrity of system files. But suppose you saved a backup of explorer.exe somewhere, knowing that it was the factory default for that file, and you then noticed odd behavior in your current explorer.exe. Regardless of any difference in file size, you could compare the original explorer.exe that you backed up with the one currently in use on your system. I'm just being optimistic now, but trying to think of ideas here.

Yzöwl, on 27 January 2012 - 01:18 PM, said:

Now why would you do that in perl and not in the batch script?


... Hmm, actually, see now we're getting somewhere! :) Thank you for that suggestion... That is actually something to think about, and finally a post that I can use for improving this script.

I am personally apologizing for the past posts I've made towards you; that's all I was really looking for in a reply from you. I'm dumbfounded as to why I didn't think of this earlier, but when I made this script I had forgotten that I had made a perl version to do a similar task as well. All my Perl scripts are backed up on my external hard drive, some of which are more than a year old.

I appreciate all of the responses I've been given so far, thanks guys!

Edit: Wait... Yzöwl... What about file sizes that may be the exact same but have different binary data? In the case of my perl script, I was looking for SIMILAR files, and that's only possible if the file size is the same; otherwise it would indicate that it's a different file and the MD5 would change. In this case I'm trying to find DIFFERENT files. If I limit the scan to files of different sizes only, what about the files of the same size that have different binary data, which I would then never scan?

Also, how would you go about scanning files of massive size? (Bigger files)

In Perl I'm given the benefit of multi-threading, but I'm not sure how I could do this in batch (not multi-threading, just scanning larger files)


~Ace

This post has been edited by AceInfinity: 27 January 2012 - 06:08 PM


#19 jaclaz

  • The Finder
  • Group: Developers
  • Posts: 9,112
  • Joined: 23-July 04
  • OS:none specified

Posted 28 January 2012 - 05:01 AM

Hmmm.
http://www.mscs.dal....r/md5collision/

jaclaz

#20 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,260
  • Joined: 14-July 04
  • OS:Windows 7 x64

Posted 28 January 2012 - 10:51 AM

jaclaz, on 27 January 2012 - 03:34 AM, said:

If I remove from it the several lines of copyright related matters, it amounts to a handful of "normal", "common", batch commands (which can be made "better", as seen).
[...]
To invoke a Copyright on something, it must be something more "substantiated"

That. A copyright on something that can be summed up in 2 lines of pseudo-code?

jaclaz, on 27 January 2012 - 03:34 AM, said:

It completely misses a number of checks for "sanity" and "safety" (just imagine that you compare with it two, say, DVD .iso's, and you will get a bin_output\results.txt of several Gb's :ph34r:, and it will probably take a bit of time to run with CPU at 100%).
It cannot run from read-only media (and does not provide the user with a way to choose an alternate target location), it does not check for existence of compared files, it makes no checks for running OS, etc., etc..

This. These checks are fairly important and they're quick to add as well. And you're not kidding when you say several Gb's (or GBs or whatever). fc /b outputs 17 bytes per byte that's different. If you're comparing two full DVD5 images, you'd get a difference file of ~76GB!
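A rough back-of-the-envelope sketch of that estimate, in Python. The 17-byte figure is assumed here to come from one report line per differing byte, such as `0000001F: 4D 5A` (15 characters), plus a CR/LF line ending. The function name and the nominal 4.7 GB DVD5 capacity are hypothetical choices for illustration, and the worst case assumes every single byte differs.

```python
# Assumption: fc /b prints one line per differing byte, e.g. "0000001F: 4D 5A".
# That is 8 (offset) + 2 (": ") + 2 + 1 + 2 = 15 characters, plus CR/LF = 17 bytes.
BYTES_PER_MISMATCH_LINE = 17


def estimate_fc_output_bytes(differing_bytes: int) -> int:
    """Rough size of the fc /b report for a given number of differing bytes."""
    return differing_bytes * BYTES_PER_MISMATCH_LINE


if __name__ == "__main__":
    dvd5_bytes = 4_700_000_000  # nominal single-layer DVD capacity (assumed)
    worst_case = estimate_fc_output_bytes(dvd5_bytes)
    # Print both decimal GB and binary GiB, since the two are easily confused.
    print(f"{worst_case / 1e9:.1f} GB ({worst_case / 2**30:.1f} GiB)")
```

Depending on whether you count in decimal or binary gigabytes, the result lands in the same ballpark as the figure quoted above.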

I have to echo the general feeling of "gosh, I have absolutely NO idea what I would use this for".

AceInfinity, on 26 January 2012 - 07:41 PM, said:

this could be used to scan for modified Windows files or to identify malware in any case if it's been bound with an important system file and that system file has been acting up

Modified Windows files? How about running sfc /scannow? That's built in, and meant to fix precisely those kinds of problems (there's system restore too). Or otherwise, why not compare the SHA1 hash of the file with one of the online lists that already exist, or with a known good file from another machine? As for identifying malware by running fc /b on 2 files... Most people have an antivirus, which seems like a far better option for that, and there are all the websites where you upload a file and it gets scanned with multiple AVs too. I, for one, cannot identify a threat based on 2 narrow columns of hex numbers flying past at that speed. Also, viruses may hook APIs, which may make infected files appear clean, i.e. identical (unless you scan offline), and may even infect your clean file.

Basically, the "comparing files" problem has already been solved a number of times by lots of people. There's several cmd line utils just for this (e.g. diff and diff3), there's many GUI tools for this as well (WinMerge is pretty popular), and there's many more tools for generating/comparing hashes. It looks to me like a solution in search of a problem.

AceInfinity, on 27 January 2012 - 06:03 PM, said:

What about file sizes that may be the exact same but have different binary data?

A simple byte-for-byte comparison would catch that. That's pretty easy to write in any language if it's not already built-in (you said you're using perl which has File::Compare, python has filecmp.cmp, etc). Hashing here only adds CPU load for no reason (and as jaclaz already pointed out, MD5 is quite old and a bad idea in general; SHA1 is a common replacement for it). It also increases comparison time, not only by being CPU bound, but also by forcing you to hash the whole thing, whereas when you're doing a byte-for-byte comparison you can easily quit at the first byte that's dissimilar (and it very well may be the first byte of a file that's hundreds of MBs). Using hashes is mainly useful in different scenarios, like comparing one file to a known hash, i.e. when you don't have the other file on hand, or don't want to send/copy it elsewhere to compare it there (and other tasks like password authentication, obviously). The exception is when you want to compare a large number of files together and identify duplicates (not necessarily comparing against one specific file), in which case hashing indeed works nicely (it saves time by not having to re-read lots of files, lots of times).
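For illustration, a minimal Python sketch of such a byte-for-byte comparison with an early exit. The function name `files_identical` and the 64 KiB chunk size are arbitrary choices made for this example, not anything taken from the scripts discussed in this thread.

```python
import os


def files_identical(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Byte-for-byte comparison that stops at the first difference."""
    # Files of different sizes can never be identical, so no data is read at all.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit: the rest of both files is never read
            if not chunk_a:
                return True   # both files reached EOF without a mismatch
```

The standard library already covers this case: `filecmp.cmp(path_a, path_b, shallow=False)` compares the contents of two files, so the hand-written loop is mostly useful for seeing where the early exit happens.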

AceInfinity, on 27 January 2012 - 06:03 PM, said:

In Perl I'm given the benefit of multi-threading, but I'm not sure how I could do this in batch (not multi-threading, just scanning larger files)

Multi-threading is of no use here anyway. I'm not sure how you were expecting to use it, or what for. But if you try to read two or more files at once and then hash them, it's going to be quite a bit slower, due to drastically increased seeking (except on SSDs). Unless you plan on having one thread reading the entire file (which might be huge) to RAM, and then while it hashes the other thread loads another file to RAM -- or one thread that queues files to hash in byte arrays in RAM while the other thread does the hashing. That would require TONS of RAM if there are large files (e.g. comparing two DVD9 .iso's would require more than 16GB of free RAM), and the speed gain is rather minimal vs using streams (which use very little memory).
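As a sketch of the stream approach, the Python below hashes a file one chunk at a time, so memory use stays at roughly one chunk regardless of file size. It also shows the duplicate-finding case where hashing does pay off, grouping by file size first so that only files sharing a size are ever hashed. The names `sha1_of_file` and `find_duplicates` and the 1 MiB chunk size are assumptions for this example only.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash a file as a stream, holding only one chunk in memory at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group files by size first, then hash only the sizes that collide."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in same_size:
            by_digest[(size, sha1_of_file(path))].append(path)

    # Keep only the groups that contain more than one file.
    return [group for group in by_digest.values() if len(group) > 1]
```

Each file is read at most once here, which is the saving mentioned above. Files that end up in the same group could still be confirmed with a byte-for-byte comparison if hash collisions are a concern.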

Honestly, it's hard to be really helpful when we have zero idea what you're really trying to do here -- comparing files seemingly, but what for? And what data should be shown (identical or not? which bytes are different? etc) and how (wall-of-text? excel sheet? GUI app?)
