
Del White/Empty Space, Double Lines/Spaces


#1
R4D3

R4D3

    Newbie

  • Member
  • 42 posts
  • Joined 12-July 14
  • OS:none specified
  • Country: Country Flag

Hi,

I want to delete empty space in many files with a script, but so far I haven't gotten the result I want.

Here is an example (there are tabs and doubled spaces, including at the ends of lines):

; Generated by ERROR
[Data]

	AutomaticUpdates= " No"			  
  Autopartition	=0  			
 	MsDosInitiated  = 0   			
	 UnattendedInstall   	=" 2 + 4 "   				
			
		  
  		
[Unattended]

  	 	 		UnattendMode=				FullUnattended
ProgramFilesDir="\Program Files (x86)"
NoWaitAfterGUIMode=1
 

Exactly, I would like the following steps:

01) Convert everything to UTF-8 or UTF-32
02) Convert all tabs to spaces
03) Replace *" * with *"* (optional)
04) Replace * "* with *"* (optional)
05) Replace * " * with *"* (optional)
06) Replace *=* with * = * (optional)
07) Replace * =* with * = * (optional)
08) Replace *= * with * = * (optional)
09) Remove/replace doubled whitespace everywhere
10) Remove whitespace at the beginning of a line
11) Remove whitespace at the end of a line
12) Add an empty line before [
13) Replace doubled empty lines with one line

 

I tried the fart tool, sed, a batch file, and Notepad++.

My batch file truncated the output, a Perl script didn't delete the whitespace at the ends of lines, and another attempt missed it at the beginning. I would really like one solution for everything.

Afterwards it should look like this:

; Generated by ERROR

[Data]
AutomaticUpdates = "No"
Autopartition = 0
MsDosInitiated = 0
UnattendedInstall = "2 + 4"

[Unattended]
UnattendMode = FullUnattended
ProgramFilesDir = "\Program Files (x86)"
NoWaitAfterGUIMode = 1

It would be nice if someone could tell me how to get this result (maybe with AutoIt?).

 

Thx

R4


Edited by R4D3, 28 May 2015 - 08:31 PM.




#2
jaclaz

jaclaz

    The Finder

  • Developer
  • 15,522 posts
  • Joined 23-July 04
  • OS:none specified
  • Country: Country Flag

It would probably be faster to rewrite the file from scratch using batch and FOR /F "tokens= delims=", though that depends on the characters used and on all the lines being like that (your example essentially looks like an .ini or .inf file).

I am not sure I understand the need to convert (from what?) to "UTF 8 or 32".

If it is plain ASCII, I would probably use gsar:

http://home.online.no/~tjaberg/

 

See if this seemingly unrelated thread:

http://www.msfn.org/...two-text-files/

 

gives you some inspiration. The object there was to split and selectively merge, nicely formatted, a whole lot of .inf files, but the basic approach should be the same or similar.

 

jaclaz



#3
MHz

MHz

    Just simple

  • Member
  • PipPipPipPipPipPipPip
  • 1,685 posts
  • Joined 02-August 04
  • OS:Windows 7 x64
  • Country: Country Flag

Certainly something with Regular Expression support is suitable. Surprised that Perlscript cannot do it. But hey, AutoIt has Perl Compatible Regular Expression support ;).

 

This is what I came up with. Many here seem to like using CMD scripts so I made this AutoIt script to be compiled as a CUI program. I am using AutoIt 3.3.10.2 at present which should be compatible with the latest version.

; Name of compiled file.
#pragma compile(Out, 'reformatini.exe')
; CUI program. Set to False for a GUI program.
#pragma compile(Console, True)
; Bit x86|x64. Set to true for 64 bit program.
#pragma compile(x64, False)
; AutoIt version. Tested on this version.
#pragma compile(FileVersion, 3.3.10.2)
; What this file is meant for.
#pragma compile(FileDescription, 'Reformat ini file content. Use /?, -? or -h for help.')
; A name for the program.
#pragma compile(ProductName, 'Ini File Reformat Tool')
; Version for this program.
#pragma compile(ProductVersion, 1.0.0.0)

#NoTrayIcon

If $CMDLINE[0] > 2 Then
	; More than 2 parameters are not supported.
	ConsoleWriteError('Only maximum of 2 parameters is allowed.' & @CRLF)
	Exit 1
ElseIf $CMDLINE[0] = 2 Then
	; Read direct from the ini file.
	$sContent = FileRead($CMDLINE[1])
	If @error Then
		ConsoleWriteError('Failed to read the file.' & @CRLF)
		Exit 2
	EndIf
ElseIf $CMDLINE[0] = 1 Then
	Switch $CMDLINE[1]
		Case '/?', '-?', '-h'
			; Help
			ConsoleWrite( _
			 'Ini File Reformat Tool' & @CRLF & _
			 'Outputs the reformatted content to the console or to a file.' & @CRLF & @CRLF & _
			 'Pass 2 parameters as 1st being path to the source file and 2nd' & @CRLF & _
			 'to the destination file. The file encoding of the destination file' & @CRLF & _
			 'will be based on the source file encoding.' & @CRLF & _
			 ' i.e. "' & @ScriptName & '" source.ini destination.ini' & @CRLF & @CRLF & _
			 'Or, pass 1 parameter as being the path to the source file.' & @CRLF & _
			 ' i.e. "' & @ScriptName & '" source.ini' & @CRLF & @CRLF & _
			 'Or, pipe to this file.' & @CRLF & _
			 ' i.e. Type source.ini | "' & @ScriptName & '"' & @CRLF & @CRLF & _
			 'Exitcode:' & @CRLF & _
			 ' 1 Only maximum of 2 parameters is allowed.' & @CRLF & _
			 ' 2 Failed to read the file.' & @CRLF & _
			 ' 3 No parameters and no input provided.' & @CRLF & _
			 ' 4 Failed to open the file for write.' & @CRLF _
			)
			Exit
		Case Else
			; Read direct from the ini file.
			$sContent = FileRead($CMDLINE[1])
			If @error Then
				ConsoleWriteError('Failed to read the file.' & @CRLF)
				Exit 2
			EndIf
	EndSwitch
Else
	; Read from stdin.
	$sContent = ''
	Do
		$sContent &= ConsoleRead()
	Until @error
	If $sContent == '' Then
		ConsoleWriteError('No parameters and no input provided.' & @CRLF)
		Exit 3
	EndIf
EndIf

; Clean the content from the ini file.
$sNewContent = _CleanIniFileContent($sContent)

; Output the cleaned content.
If $CMDLINE[0] = 2 Then
	; Open the output file for erase and then write in the same encoding as the source file.
	$hWrite = FileOpen($CMDLINE[2], FileGetEncoding($CMDLINE[1]) + 0x2)
	If $hWrite = -1 Then
		ConsoleWriteError('Failed to open the file for write.' & @CRLF)
		Exit 4
	Else
		; Write the new content to the output file.
		FileWrite($hWrite, $sNewContent & @CRLF)
		FileClose($hWrite)
	EndIf
Else
	; Just output the new content to console.
	ConsoleWrite($sNewContent & @CRLF)
EndIf

Exit

Func _CleanIniFileContent($sContent)
	; Trim horizontal whitespace from both ends of each line.
	$sContent = StringRegExpReplace($sContent, '(?m)^\h*(.+?)\h*$', '\1')
	; Remove horizontal whitespace on lines that have no other content.
	$sContent = StringRegExpReplace($sContent, '(?m)^\h+$', '')
	; Remove empty lines.
	$sContent = StringRegExpReplace($sContent, '(\r\n|\n){2,}', '\1')
	; Fix the spacing between the key values and the data values.
	$sContent = StringRegExpReplace($sContent, '(?m)^([^;#[])(.*?)\h*=\h*(.*)$', '\1\2 = \3')
	; Trim the spacing from the quoted data values. i.e. " string " to "string".
	$sContent = StringRegExpReplace($sContent, '(?m)^([^;#[])(.+?) = "\h*(.+?)\h*"$', '\1\2 = "\3"')
	; Add empty lines before section names.
	$sContent = StringRegExpReplace($sContent, '(?m)^(\[.+\])$', @CRLF & '\1')
	; Trim both ends of the content.
	$sContent = StringStripWS($sContent, 0x3)
	Return $sContent
EndFunc
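
For readers without AutoIt at hand, the same sequence of substitutions can be sketched in Python. This is a rough port and not the script above: the function name clean_ini_text is made up for illustration, [ \t] stands in for PCRE's \h, and line endings are normalised to "\n" first as a simplifying assumption.

```python
import re


def clean_ini_text(text: str) -> str:
    """Rough Python port of the substitutions in _CleanIniFileContent()."""
    # Assumption: normalise line endings so the patterns only see "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Trim spaces and tabs from both ends of every line.
    text = re.sub(r"(?m)^[ \t]+|[ \t]+$", "", text)
    # Remove empty lines.
    text = re.sub(r"\n{2,}", "\n", text)
    # Put one space on each side of the first "=" on key lines
    # (lines starting with ";", "#" or "[" are left alone).
    text = re.sub(r"(?m)^([^;#\[\n][^=\n]*?)[ \t]*=[ \t]*(.*)$", r"\1 = \2", text)
    # Trim whitespace just inside a quoted value: " No" becomes "No".
    text = re.sub(r'(?m)^([^;#\[\n].*?) = "[ \t]*(.+?)[ \t]*"$', r'\1 = "\2"', text)
    # Insert an empty line before every section header.
    text = re.sub(r"(?m)^(\[.+\])$", r"\n\1", text)
    # Trim both ends of the content.
    return text.strip()


if __name__ == "__main__":
    print(clean_ini_text('[Data]\n\t AutomaticUpdates= " No" \t\n'))
```

Keeping one substitution per step, as the AutoIt function does, makes it easier to trace a wrong result back to a single pattern.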

Here is a cmd script used to test it and the output.

Spoiler

 

Use the executable like any one of these commands:

 

Syntax: reformatini.exe /? | -? | -h
Syntax: reformatini.exe source.ini [destination.ini]
Syntax: type source.ini | reformatini.exe

 

| separates alternatives, except in the last command, where it is a pipe. [ ] marks an optional parameter.

 

 

Let me know how it works for you.

 

Edit: Updated about the | not being an alternate in the last command which may have been confusing otherwise.


Edited by MHz, 29 May 2015 - 03:59 AM.


#4
R4D3

First: thanks a lot, MHz. Your script works perfectly. Now I just need to modify it a bit, or the batch file with something like %1, to process a number of files in a folder.

Second: I am sure that a Perl script can do it easily, but I am not a programmer (I use the trial-and-error method instead) and wasn't able to combine the expressions for the end and the beginning of a line.

@jaclaz: I would agree if I wanted to change one single file, but my plan is to run the script over all the extracted .mof, .css, .inf, .ini and .sif files (all text-based files) from my Windows XP ISO. There are more than 12 million empty spaces in those files. (Only 3 files carry a "Do not edit" warning, and I am not sure whether MS means blanks with that too.)

The input encoding of the files varies. Windows normally uses UTF-16LE, but some files do not. There is also the problem that a whitespace character isn't always the same character, because some language components/layouts use different ones (I read something about that a few days ago).

The reason I would like to choose UTF-8 or UTF-32 as output is that I have read that UTF-8 (the most common standard) is the smallest in file size and UTF-32 the fastest. Some of those files are used by the OS all the time, and I wonder what happens when my system runs those files (12 million blanks "lighter") in UTF-8 or UTF-32 instead of UTF-16LE. Maybe I get a speed change, or maybe less memory usage. Because I don't know, I want to test it. (I know that the digital signatures of some files will be lost by that. If I get errors, I will be expecting them.)

;)

If I succeed, I will do the same thing on resource files extracted with Reshack (UI code parts from .exe, .ocx, .dll). And yes, I will also try to slim down the color palettes of all internal resource files with a script, because in the past I was able to slim down my explorer.exe to 330 KB by using a 3-color palette (black/white/transparent) for all used resources and by deleting the unused ones. There is no need for a 48px 3-color icon to use a 16.7 million color palette; it just blows up the file size and the memory usage. So the idea is to slim down the size of Windows without deleting any files, or before deleting them ;)

Edit: Your link to this "unrelated" thread is interesting ;)

Edited by R4D3, 29 May 2015 - 08:59 AM.


#5
MHz


 

First: thanks a lot, MHz. Your script works perfectly. Now I just need to modify it a bit, or the batch file with something like %1, to process a number of files in a folder.

Second: I am sure that a Perl script can do it easily, but I am not a programmer (I use the trial-and-error method instead) and wasn't able to combine the expressions for the end and the beginning of a line.
...

You're welcome.

I made it a CUI (i.e. console) program so it can be used in a CMD script, something like:

if not exist subfolder md subfolder
for %%A in (*.ini) do reformatini.exe "%%~A" "subfolder\%%~nxA"

Actually, you could do the above with it compiled as a GUI program, though you would get no output in the CMD window.

Look, I do not expect you to be a (professional, serious, whatever) programmer. Learning programming is like climbing a ladder. You go step by step. The rate of the climb is up to you. Make it as steep as you can handle. What I have learnt came not by magic but by determination. You can probably get there one day. About regular expressions: I considered doing it in 1 regular expression and came close, but kept failing. So, bleep it, I did it over several expressions. Maybe not the best, but it got done, and if a bug is present then I can usually track it down to one of the expressions or to the need for another expression. PCRE may need Unicode support (i.e. UCP) turned on, though I do not handle Unicode characters in the patterns, so hopefully it should not be needed. If I sound a little too advanced for you then say, hey, can you break this down. I will try, though I can only try to remember how it was when I knew little about programming, i.e. I may need to be reminded. :)



#6
R4D3


Nah, it's OK, you're right ;) I just often get confused by all the "special" characters in scripts. I am able to do some small things, but I really have problems when I need to come up with a "concept" and work out how to use it. I understand 80% of what you have done there, but wouldn't be able to get there myself from "zero". Mostly I modify script examples and copy them together to reach my target.

I already have something like a GUI that I can maybe use for it. (Someone else and I wrote an AutoIt script for batchmod, which is itself a script for Reshacker, to change the resource files of an extracted XP ISO in order to slipstream resource packs like FlyAKite, but we never finished it. I think I already posted that here "somewhere", under an older username like R2-D2 or D5D4 or something similar, for which I have forgotten the password and the email used :D)

I will try your "MasterCode" on all those files in the next few days ;) (I have tried to open all 1200 files with UltraEdit by hand to modify them, and sometimes I got a warning that the file is not DOS coded, but I can fix those files beforehand. I just have to make a list.)


Edited by R4D3, 29 May 2015 - 10:10 AM.


#7
MHz


Reshacker, hmm, not 64-bit compatible AFAIK. A shame really, as it was renowned as such a great program in its prime. Even good programmers and their programs may need to retire.

I will try your "MasterCode" on all those files in the next few days ;) (I have tried to open all 1200 files with UltraEdit by hand to modify them, and sometimes I got a warning that the file is not DOS coded, but I can fix those files beforehand. I just have to make a list.)

Cool. 1200 files, major BLEEP! By the time you have edited a couple of dozen or so, you could have a script to do the rest in seconds. Yeah, yeah, yeah, it takes some knowledge, though it is something to strive for. If UltraEdit is complaining about a file not being DOS coded, then perhaps it thinks it is a binary file. A plain text file does not have a header, let alone a DOS header. Something strange may be going on there.



#8
jaclaz


Well, nothing can beat plain ASCII/ANSI (non-Unicode). It is the first time I hear that UTF-8 is common (I mean for text in Windows .iso's, in files like .mof, .css, .inf, .ini, .sif; it is very common on the web). Open one of those text files in a hex editor/viewer.

If it's plain ASCII/ANSI, you will see exactly what you can see in (say) Notepad.

If it's Unicode (UTF-16), its first two bytes will be "ÿþ" or FFFE and you will see all letters separated by a dot.

If it's UTF-8 with a BOM, its first three bytes will be "ï»¿" or EFBBBF.
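
The byte signatures described above can also be checked by script instead of a hex viewer. The following is a minimal Python sketch; the function name sniff_bom is made up for illustration. It only recognises a BOM: a file without one may still be ANSI, UTF-8 without BOM, or something else, and a UTF-16 LE file whose first character is NUL would be misread as UTF-32 LE.

```python
import codecs

# Longer signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    print(sniff_bom(b"\xff\xfe[\x00D\x00"))   # starts with FF FE
    print(sniff_bom(b"[Data]\r\n"))           # no BOM at all
```

Reading only the first four bytes of each file is enough for this check, so it stays cheap even over thousands of files.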

 

You CANNOT change the text encoding of files like .inf, .ini or .sif, as the Windows Setup simply would not be able to read them; you have to keep the SAME text encoding as the original file.

ASCII/ANSI is an 8-bit FIXED-width text encoding: each character takes exactly one byte.

UTF-8 and UTF-16 are VARIABLE-length text encodings: each character takes AT LEAST 8 bits (1 byte) or 16 bits (2 bytes) respectively, and extended/regional symbols take more.

UTF-32 is a FIXED-width format (just like ASCII/ANSI), but each character needs 32 bits (4 bytes) instead of 8 bits (1 byte).

So, if your goal is to reduce the size of the files (even if it were possible, i.e. if all the tools/programs involved could read any of those encodings interchangeably, which is NOT the case :no:), you are doing it wrong :w00t: :ph34r: Plain ASCII/ANSI and UTF-8 would be roughly the same size, UTF-16 will be AT LEAST double that size and UTF-32 will be 4 times that size.
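
The size relationship is easy to confirm for a single line. A small Python sketch, assuming cp1252 as a stand-in for "ANSI" and using the BOM-less codec names so that only the characters themselves are counted; the helper name encoded_sizes is hypothetical:

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of the same text in several encodings (no BOM included)."""
    encodings = ("cp1252", "utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


if __name__ == "__main__":
    line = 'AutomaticUpdates = "No"\r\n'
    for name, size in encoded_sizes(line).items():
        print(f"{name:10} {size:4d} bytes")
```

For pure ASCII content like this line, the ANSI and UTF-8 sizes are identical, UTF-16 is twice that and UTF-32 four times.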

 

jaclaz


Edited by jaclaz, 29 May 2015 - 01:09 PM.


#9
R4D3

Hey, come on ;) the number of UTF-8 files overall is probably bigger than the number of files in other code pages.

Some files on the XP ISO are already in different code pages, so I thought it would be a good idea to unify them before converting the whitespace. If I don't do this first, MHz's script may not work right on tabs or spaces in some files.
Now I have tested a batch file with all kinds of code pages, and you're right: just 3 of them worked 100% in this single test (ANSI, UTF-8 without BOM, and Unicode).

But slimming down the size of XP without deleting files is not impossible!

Here is how you can prove it yourself:
A) Extract all files from your NT 5.x ISO (*.*_; *.cab; *.zip)
1) Delete more than 12 million empty spaces from all text-based files (maybe except those 3 "do not edit" files, and files with a .cat file)
2) Remove all ; comments
3) Replace all www.********* with blank, router address, or blackhole
4) Replace the word Microsoft with MS a billion times (yes, I know that XP is from them, so keep it in the version string)
5) Extract the resource files from all .exe, .dll and .ocx files, and do steps 1-4 for their menu files
6) Delete unused resources (.wav; .ani; .cur; .jpg; .avi; .wmv; .bmp; .gif; .png; .ico)
7) Delete every second frame in .avi and .gif animations and reduce or speed up the playback speed (example: the Explorer copy animation)
8) Convert the color palettes of all "inside" images/animations that are not bigger than 96x96 to 256 colors + alpha, or 4 colors (black, white, grey, alpha), or whatever you like
9) Convert the bigger pictures to a 256 color + alpha palette (except the startup picture/animation)
10) Remove multiple duplicated icon files (just keep 16x16 and 32x32; I do not like BIG icons at all). If you enlarge a 32px icon to 96px and the 96px icon is gone, the system will show the rescaled 32px icon. It doesn't look so fine, but it doesn't break the OS. The icon files contain resources for display from 8x8 on a 16-color monitor up to 512x512.
B) Compile all files back to their originals and try your new XP ISO in a VM

The best thing would be to move all of that "lovely stuff" to shell32.dll (and set a link to it instead), because there are thousands and thousands of similar or duplicated resources in these files. You would never believe how many.
As I said before, you can slim down explorer.exe with Reshacker to 330 KB! And if you use a good set of icons it looks way nicer.

Edit: No, I am not able to write a script for all that (I don't even know a good command-line palette reducer). I hope I am close to finishing step 1 ;)

Update: After some testing of your code, I found an entry that seems to clean tabs/spaces from some other code pages. I just don't know why it begins with ( and has the & @ in a slightly different order:
	$sContent = (StringRegExpReplace($sContent, "\h+", " ") & @CRLF)

 


Edited by R4D3, 29 May 2015 - 09:39 PM.


#10
MHz


<snip>
Update: After some testing of your code, I found an entry that seems to clean tabs/spaces from some other code pages. I just don't know why it begins with ( and has the & @ in a slightly different order
 

	$sContent = (StringRegExpReplace($sContent, "\h+", " ") & @CRLF)

The extra parentheses force everything within them to be evaluated. In that example, they serve no meaningful purpose. Take care with a pattern like that: paths in your text files, for example, can contain multiple spaces or tabs, and collapsing them could break those paths.
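
To make the risk concrete, here is a small Python sketch that contrasts collapsing whitespace everywhere with collapsing it only outside double quotes. The path in the sample line is invented for illustration, the helper name collapse_outside_quotes is hypothetical, and treating everything between two double quotes as protected is a simplifying assumption (no escaped quotes).

```python
import re


def collapse_outside_quotes(text: str) -> str:
    """Collapse runs of spaces/tabs, leaving double-quoted parts untouched."""
    # re.split with a capturing group keeps the quoted pieces at odd indexes.
    parts = re.split(r'("[^"\n]*")', text)
    return "".join(
        part if index % 2 else re.sub(r"[ \t]+", " ", part)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    line = 'SourceDir \t =   "\\My  Tools\\Setup  Files"'
    print(re.sub(r"[ \t]+", " ", line))      # also rewrites the quoted path
    print(collapse_outside_quotes(line))     # quoted path stays as written
```

Unquoted paths would still be affected, so a check of this kind only narrows the problem.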
 
Adding (*UCP) at the beginning of each Regular Expression pattern may change how certain characters are matched, i.e. \h may match Unicode horizontal spaces as well as the ASCII + Extended ASCII horizontal spaces.
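
The general idea, that the same whitespace class can match more or fewer characters depending on a Unicode switch, can be tried out in Python. This sketch shows the behaviour of Python's re module only and says nothing about how PCRE or AutoIt treat \h; the sample line and the two helper names are made up for illustration.

```python
import re

# Sample line ending in a no-break space (U+00A0) and an ideographic space (U+3000).
LINE = "Autopartition = 0\u00a0\u3000"


def trim_ascii(text: str) -> str:
    """Trim trailing whitespace, treating only ASCII characters as whitespace."""
    return re.sub(r"\s+$", "", text, flags=re.ASCII)


def trim_unicode(text: str) -> str:
    """Trim trailing whitespace using Python's default Unicode-aware matching."""
    return re.sub(r"\s+$", "", text)


if __name__ == "__main__":
    print(repr(trim_ascii(LINE)))
    print(repr(trim_unicode(LINE)))
```

A test file containing a few such characters is a quick way to find out which behaviour a given tool actually has.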
 
So you can change this function to:

Func _CleanIniFileContent($sContent)
	; Trim horizontal whitespace from both ends of each line.
	$sContent = StringRegExpReplace($sContent, '(*UCP)(?m)^\h*(.+?)\h*$', '\1')
	; Remove horizontal whitespace on lines that have no other content.
	$sContent = StringRegExpReplace($sContent, '(*UCP)(?m)^\h+$', '')
	; Remove empty lines.
	$sContent = StringRegExpReplace($sContent, '(*UCP)(\r\n|\n){2,}', '\1')
	; Fix the spacing between the key values and the data values.
	$sContent = StringRegExpReplace($sContent, '(*UCP)(?m)^([^;#[])(.*?)\h*=\h*(.*)$', '\1\2 = \3')
	; Trim the spacing from the quoted data values. i.e. " string " to "string".
	$sContent = StringRegExpReplace($sContent, '(*UCP)(?m)^([^;#[])(.+?) = "\h*(.+?)\h*"$', '\1\2 = "\3"')
	; Add empty lines before section names.
	$sContent = StringRegExpReplace($sContent, '(*UCP)(?m)^(\[.+\])$', @CRLF & '\1')
	; Trim both ends of the content.
	$sContent = StringStripWS($sContent, 0x3)
	Return $sContent
EndFunc

This is untested so it may work OK or it may need to be updated to handle the different behavior.
 
Edit1: Added extra info about using just \h+ in a pattern.
 
Edit2: Created another version. v1.1. See below.
 

Spoiler


Edited by MHz, 30 May 2015 - 12:28 AM.


#11
R4D3


Thx...

 

Now, after a first test (over all files from the ISO that could be opened in Notepad), your script deletes 19 MB of whitespace in total. I guess it could be more :sneaky: Maybe I will now just look for the worst offenders by size, because it will be hard to write a repack script for all those files (sometimes there is a cabbed *._ file inside a .cab that is inside another .cab, and some files get overwritten when I expand them because their expand targets end up with the same file name).

Attached file: Compare.PNG (23.47 KB)

In this test I just used your new script with this batch file and a *.* command; I haven't tested yet whether the new one can handle different file types like the old one.

@ECHO OFF & COLOR 3F & ECHO Script by msfn User: MHZ
if not exist subfolder md subfolder
for %%A in (*.inf; *.ini; *.sif) do reformatini.exe "%%~A" "subfolder\%%~nxA"
Pause

Edited by R4D3, 30 May 2015 - 11:36 AM.


#12
MHz


You should be able to use *.* as the first parameter and a folder as the 2nd parameter. This works as long as you have only text-based files in the source directory, as the script does no file type filtering.

Another way could be to rename the files by adding a temporary extension, i.e. file1.ini to file1.ini.text, file2.inf to file2.inf.text etc. Do a reformatini.exe *.text destfolder and then, once done, rename the files to remove the temporary extension. A CMD For loop using Rename should be able to do the mass file renaming.

As for whitespace that could be removed: the ini file format is usually not so spacious when written by the default API.

For example, try this test. It requires reformatini.exe for the comparison.

; Create a default ini layout.
IniWrite('test1.ini', 'section 1', 'key1', 'value1')
IniWrite('test1.ini', 'section 1', 'key2', 'value2')
IniWrite('test1.ini', 'section 2', 'key1', 'value1')
IniWrite('test1.ini', 'section 2', 'key2', 'value2')

; Clean the ini.
RunWait('reformatini.exe test1.ini test2.ini')

; Read the characters of the files into a variable.
$test1 = FileRead('test1.ini')
$test2 = FileRead('test2.ini')

; Show some results.
MsgBox(0x40000, @ScriptName, _
 'Size of test1 = ' & StringLen($test1) & @CRLF & @CRLF & _
 $test1 & @CRLF & @CRLF & _
 'Size of test2 = ' & StringLen($test2) & @CRLF & @CRLF & _
 $test2 & @CRLF _
)

I get this output.

Size of test1 = 78

[section 1]
key1=value1
key2=value2
[section 2]
key1=value1
key2=value2

Size of test2 = 88

[section 1]
key1 = value1
key2 = value2

[section 2]
key1 = value1
key2 = value2

The cleaned-up file has 10 characters of extra whitespace, as the cleanup adds spaces around the = character and adds an empty line before section name lines. Something to consider. Minor changes to the Regular Expressions can change that result.
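
The difference of 10 can be broken down by hand. A short Python sketch, assuming both files use CRLF line endings and end with a final CRLF, as the output above suggests:

```python
# The two layouts from the test above, written out with explicit CRLF endings.
compact = (
    "[section 1]\r\nkey1=value1\r\nkey2=value2\r\n"
    "[section 2]\r\nkey1=value1\r\nkey2=value2\r\n"
)
spaced = (
    "[section 1]\r\nkey1 = value1\r\nkey2 = value2\r\n"
    "\r\n"
    "[section 2]\r\nkey1 = value1\r\nkey2 = value2\r\n"
)

# Two extra spaces per key line, plus one CRLF for the inserted empty line.
extra_from_equals = 2 * spaced.count(" = ")
extra_from_empty_line = len("\r\n")

print(len(compact), len(spaced), extra_from_equals + extra_from_empty_line)
```

So 8 of the 10 characters come from the spaces around = and 2 from the empty line before the second section.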



#13
R4D3

;) Thx again,

I know the script:

- adds some whitespace around the =
- fixes possible errors like " this" and "this "
- adds a line before a [ at the beginning of a line, or in general? I hadn't thought about that before; I will test it soon ;)

In the 7050 files, " = " occurred 551,584 times. If a space needs 1 byte, these 1,103,168 spaces amount to about 1 MB, so in total I would save 20 MB instead of 19 MB. (I am still thinking about also removing all tabs and doubled whitespace between "things", ; comments at line ends, and lines starting with a ;.)
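
The estimate can be checked with a few lines of Python. The occurrence count is the one reported above; the bytes-per-space figures (1 for ANSI or UTF-8, 2 for UTF-16) are the only assumptions.

```python
occurrences = 551_584        # count of " = " reported above
spaces = occurrences * 2     # one space on each side of "="


def to_mib(byte_count: int) -> float:
    """Convert a byte count to mebibytes."""
    return byte_count / (1024 * 1024)


for label, bytes_per_space in (("ANSI / UTF-8", 1), ("UTF-16", 2)):
    print(f"{label:12} {to_mib(spaces * bytes_per_space):.2f} MiB")
```

For files stored as UTF-16 the saving per space doubles, so the real figure depends on how the 7050 files are encoded.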
 
The problem is: I now have to decide how many of these files I really want to change.

I could use:
A) files that are easy to re-cab to their position in the ISO
B) just the big ones
C) just the files with the most whitespace (by counting it beforehand)
D) files chosen by list
E) files by type
F) all text-based files (but I am not sure I can handle the expanding and re-cabbing of all archived files: some files have different names and end up in the same file, others have their real name after expanding and need to keep it, and some do not and will be renamed by Windows Setup). I would also need something like a "codepage detector" that first checks whether the file can be cleaned, by looking at the first bytes somehow?

At first I thought of doing it like this in the end, because if I use *.* I need to move all these extensions to another place. (*.* was just for a test, where I copied all readable files into one folder to find out about those 19 MB of whitespace inside them.)
@ECHO OFF & COLOR 3F & ECHO Script by msfn User: MHZ
if not exist subfolder md subfolder
for %%A in (*.adm; *.adr; *.asa; *.asp; *.aspx; *.bat; *.cer; *.cf; *.chs; *.cht; *.cmd; *.cnt; *.config; *.cpx; *.crl; *.css; *.csv; *.default; *.df; *.dns; *.dtd; *.dun; *.dxt; *.ecf; *.eng; *.gpd; *.h; *.h2; *.hex; *.hht; *.hkf; *.hpj; *.hta; *.htm; *.htt; *.htx; *.hxx; *.icw; *.inc; *.inf; *.ini; *.ins; *.isp; *.jpn; *.js; *.key; *.kor; *.man; *.manifest; *.mfl; *.mib; *.mof; *.msc; *.nt; *.obe; *.osc; *.p7b; *.pmc; *.ppd; *.ppt; *.pro; *.prx; *.rat; *.reg; *.rsp; *.sam; *.sed; *.sep; *.set; *.sif; *.smc; *.sp2; *.spd; *.sql; *.srg; *.sym; *.tha; *.the; *.txt; *.uninstall; *.url; *.vbs; *.vcf; *.ver; *.wpl; *.wsc; *.wsx; *.xdr; *.xml; *.xsd; *.xsl; *.xslt) do reformatini.exe "%%~A" "subfolder\%%~nxA"
Pause

- Maybe I need to open a new thread in the Windows XP subforum for this idea of preparing an ISO.


Edited by R4D3, 31 May 2015 - 12:41 PM.


#14
MHz


The problem is: I now have to decide how many of these files I really want to change.

Depends on how much you can do by script. Manually doing it would be a PITA.

... I would also need something like a "codepage detector" that first checks whether the file can be cleaned, by looking at the first bytes somehow?

The codepage affects the extended ASCII characters and depends on the default system language that is set. A BOM does not define the codepage; it indicates the file encoding. If you look at the FileOpen() call in the script, you may notice that I used FileGetEncoding(), which gets the encoding that the file uses.
 

@ECHO OFF & COLOR 3F & ECHO Script by msfn User: MHZ
if not exist subfolder md subfolder
for %%A in (*.adm; *.adr; *.asa; *.asp; *.aspx; *.bat; *.cer; *.cf; *.chs; *.cht; *.cmd; *.cnt; *.config; *.cpx; *.crl; *.css; *.csv; *.default; *.df; *.dns; *.dtd; *.dun; *.dxt; *.ecf; *.eng; *.gpd; *.h; *.h2; *.hex; *.hht; *.hkf; *.hpj; *.hta; *.htm; *.htt; *.htx; *.hxx; *.icw; *.inc; *.inf; *.ini; *.ins; *.isp; *.jpn; *.js; *.key; *.kor; *.man; *.manifest; *.mfl; *.mib; *.mof; *.msc; *.nt; *.obe; *.osc; *.p7b; *.pmc; *.ppd; *.ppt; *.pro; *.prx; *.rat; *.reg; *.rsp; *.sam; *.sed; *.sep; *.set; *.sif; *.smc; *.sp2; *.spd; *.sql; *.srg; *.sym; *.tha; *.the; *.txt; *.uninstall; *.url; *.vbs; *.vcf; *.ver; *.wpl; *.wsc; *.wsx; *.xdr; *.xml; *.xsd; *.xsl; *.xslt) do reformatini.exe "%%~A" "subfolder\%%~nxA"
Pause

I do not know probably half of those extensions, nor whether it is safe to use the Regular Expressions on them, as the expressions are designed for an ini file type structure. You may need to put Regular Expressions into different functions, to be called according to the detected file type.
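
One way to organise that is a lookup table from file extension to cleaning function, with "leave the file alone" as the default. The Python sketch below is only an outline: all names are hypothetical, the ini-style cleaner is a placeholder that merely trims line ends, and which extensions really share the ini layout would have to be checked file type by file type.

```python
import re
from pathlib import PurePath


def clean_ini_like(text: str) -> str:
    """Placeholder for the ini-specific rules: only trims line ends here."""
    return re.sub(r"(?m)[ \t]+$", "", text)


def leave_untouched(text: str) -> str:
    """Default for file types whose structure has not been checked."""
    return text


# Assumed mapping for illustration; extend it only after testing each type.
CLEANERS = {
    ".ini": clean_ini_like,
    ".inf": clean_ini_like,
    ".sif": clean_ini_like,
}


def clean_for(filename: str, text: str) -> str:
    """Pick the cleaner by (case-insensitive) extension and apply it."""
    suffix = PurePath(filename).suffix.lower()
    return CLEANERS.get(suffix, leave_untouched)(text)


if __name__ == "__main__":
    print(repr(clean_for("LAYOUT.INF", "key=value  \t\n")))
    print(repr(clean_for("script.js", "var a = 1;  \n")))
```

Defaulting to no change means an unknown extension can never be damaged by rules written for another format.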



