
How to merge two text files?



Does anyone know what the problem may be with the following command?

MORE /T0 1.txt>2.txt

It removes all TABs from a file and seems to work fine on small files. With larger files, however, the command suddenly stops at around 1.65 MB (the size varies, but the point is that it just won't go any further). The last line of 2.txt is something like:

-- More (86%) -- 

What's going on here? :blink:

Edited by tomasz86
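For comparison only, here is a minimal Python sketch (Python is not used anywhere in this thread) of the job `MORE /T0 1.txt>2.txt` is doing here: copy a file while dropping every TAB. It reads one line at a time, so it does not hold the whole file in memory and involves no pager. The function name `strip_tabs` and the Latin-1 encoding (chosen so that every byte passes through unchanged) are assumptions of the sketch.

```python
import os
import tempfile


def strip_tabs(src_path, dst_path):
    """Copy src_path to dst_path, deleting every TAB character.

    Latin-1 maps each byte to exactly one character, so non-ASCII bytes
    survive untouched; newline="" keeps the original CRLF line endings.
    """
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:  # one line at a time, whatever the file size
            dst.write(line.replace("\t", ""))


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "1.txt")
    dst = os.path.join(folder, "2.txt")
    with open(src, "w", encoding="latin-1", newline="") as f:
        f.write("a\tb\r\n\tc\r\n")
    strip_tabs(src, dst)
    with open(dst, encoding="latin-1", newline="") as f:
        print(repr(f.read()))  # 'ab\r\nc\r\n'
```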


I've never run into this, but could it be the amount of memory available? It's just a guess, but it sounds like it is trying to load and process the entire thing in memory.

Cheers and Regards


"more" is meant to show a file one screenful at a time, sized to the DOS box, so it is normal that it works like this. Perhaps you could trick it by changing the size of the DOS box before running "more".

If you need to remove tabs, a tool like Unix sed would be useful.


@allen2 MORE seems to work like TYPE when you redirect its output to a file.

MORE 1.txt>2.txt

TYPE 1.txt>2.txt

I don't see any difference in the output, except in the case mentioned above, where larger files are processed and it gets stuck at some point...

Edited by tomasz86

I find gsar quite problematic. There are characters like ":" or "\" which have to be replaced with something else in order to use them with gsar's "-r" option. It's also limited to 256 characters. At the moment I've changed the script so that everything which was done by gsar can be done in pure batch using the SET command, and there's no significant difference in speed (it actually got faster by a few seconds). I can remove TABs using SET too. But that's not the point. I discovered the "MORE /T" switch by accident and played with it for a while until I encountered the problem described above. I was just wondering why MORE suddenly stops when processing larger files.

By the way, I think I've managed to get the Strings sorted:


SETLOCAL ENABLEDELAYEDEXPANSION
REM Rewrite every key = value line of 1.txt as key="value"
FOR /F "tokens=1* delims== " %%A IN (1.txt) DO (
IF "%%B"=="" (
SET Line=%%A=""
) ELSE (
FOR /F tokens^=1*^ delims^=^" %%C IN ("%%A="%%B"") DO (
SET Line=%%C"%%D
IF !Line:~-2!==^"^" SET Line=!Line:~0,-1!
)
)
ECHO !Line!
)

This seems to work for all kinds of strings, including these:

TZROOT=SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones
HelpLink = "http://support.microsoft.com{##}kbid=2829069"
MainCancelIntroString = "Thank you for reporting the Request. When you click ""Send Report"" button, data concerning why install failed will be sent to Microsoft"
PowerShell_ReleaseNotesDir=

the result being:

TZROOT="SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones"
HelpLink="http://support.microsoft.com{##}kbid=2829069"
MainCancelIntroString="Thank you for reporting the Request. When you click ""Send Report"" button, data concerning why install failed will be sent to Microsoft"
PowerShell_ReleaseNotesDir=""

Edited by tomasz86
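The same key/value normalisation can be cross-checked outside cmd. This Python sketch is one reading of the rule the batch loop applies, not a port of it: split at the first `=`, trim both sides, turn an empty value into `""`, keep a value that already starts with a double quote, and quote anything else. `normalize_line` is a hypothetical name, and leaving lines without `=` untouched is an assumption (the batch loop handles them differently).

```python
def normalize_line(line):
    """Rewrite 'key = value' as key="value".

    - an empty value becomes ""
    - a value that already starts with a double quote is kept as it is
    - anything else is wrapped in double quotes
    Lines without '=' are returned unchanged (an assumption of this sketch).
    """
    key, sep, value = line.partition("=")  # split at the FIRST '=' only
    if not sep:
        return line
    key, value = key.strip(), value.strip()
    if not value:
        value = '""'
    elif not value.startswith('"'):
        value = '"' + value + '"'
    return key + "=" + value


samples = [
    r"TZROOT=SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones",
    'HelpLink = "http://support.microsoft.com{##}kbid=2829069"',
    "PowerShell_ReleaseNotesDir=",
]
for s in samples:
    print(normalize_line(s))
```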

I find gsar quite problematic. There are characters like ":" or "\" which have to be replaced with something else in order to use them with gsar's "-r" option. It's also limited to 256 characters.

I don't get it. :unsure:

Use the decimal or hex code of the character instead of its textual representation.
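To look up those codes, Python's `ord()` gives the decimal value and `format(..., "02X")` the hex value of each character. The exact escape syntax gsar expects is not shown in this thread, so check gsar's own help for how to write these codes on its command line; `char_codes` is a hypothetical helper.

```python
def char_codes(chars):
    """Map each character to its (decimal, two-digit hex) code."""
    return {c: (ord(c), format(ord(c), "02X")) for c in chars}


print(char_codes(":\\"))  # {':': (58, '3A'), '\\': (92, '5C')}
```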

I initially suggested gsar only because it's one of the tools I use commonly (and it works on binary files, something I do a lot); you may want to find an alternative to it dedicated only to "text" files.

Despite the name :ph34r:, this one doesn't seem bad:

http://sourceforge.net/projects/fart-it/

http://fart-it.sourceforge.net/

or this one:

http://findandreplace.codeplex.com/

Most probably the behaviour of MORE is a glitch in the matrix; I don't think many people have ever used MORE for anything bigger than a few KB. :unsure:

jaclaz


You're probably right about MORE. I'll leave it for now.

As for gsar, I just wanted to say that with gsar you need to take more special characters into account than with pure batch. In your SPLITINF script gsar is used to replace characters like "?" or "&", etc. Is it really a problem to just use batch like this instead of gsar?

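SETLOCAL ENABLEDELAYEDEXPANSION
REM Replace characters that are special to cmd.exe with {#}-style placeholders
FOR /F delims^=^ eol^= %%A IN (1.txt) DO (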
SET Line=%%A
SET Line=!Line: =!
SET Line=!Line:%%=%%%%!
SET Line=!Line:^&={#}!
SET Line=!Line:^?={##}!
SET Line=!Line:^<={###}!
SET Line=!Line:^>={####}!
SET Line=!Line:^^!={#####}!
SET Line=!Line:^|={######}!
ECHO !Line!
)

Edited by tomasz86
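The substitution table above can also be written as data. This Python sketch applies the same placeholders (`{#}` to `{######}`, taken from the snippet) and adds a reverse step, which the snippet does not show. Space removal and `%` doubling are specific to the batch version and left out. `protect`, `restore` and `PLACEHOLDERS` are hypothetical names, and `restore` assumes the original text never contained one of the placeholders.

```python
# Placeholders copied from the batch snippet; no placeholder is a
# substring of another, so the replacement order does not matter.
PLACEHOLDERS = [
    ("&", "{#}"),
    ("?", "{##}"),
    ("<", "{###}"),
    (">", "{####}"),
    ("!", "{#####}"),
    ("|", "{######}"),
]


def protect(text):
    """Replace characters that are awkward for cmd.exe with placeholders."""
    for char, token in PLACEHOLDERS:
        text = text.replace(char, token)
    return text


def restore(text):
    """Undo protect(); assumes the input never contained a placeholder."""
    for char, token in PLACEHOLDERS:
        text = text.replace(token, char)
    return text


line = "a<b & c|d?"
print(protect(line))                   # a{###}b {#} c{######}d{##}
print(restore(protect(line)) == line)  # True
```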

Why tiptoe around sed for so long? :unsure:

Granted, it converts DOS ASCII (CRLF line endings) to Unix ASCII (LF line endings). Then one pipes it through unix2dos -D, and lo! it's DOS ASCII all right again.

Both have existed in Cygwin since way back, requiring just the inevitable cygwin1.dll (and perhaps one or two more DLLs).

And I bet there's a good MinGW standalone implementation too...

sed1line.7z
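The DOS/Unix round trip described above comes down to a line-ending conversion: CRLF becomes LF, and LF becomes CRLF again. A Python sketch of just that step; it does not reproduce any other behaviour of sed or unix2dos, and `to_unix`/`to_dos` are hypothetical names.

```python
def to_unix(text):
    """CRLF -> LF."""
    return text.replace("\r\n", "\n")


def to_dos(text):
    """LF -> CRLF, normalising first so existing CRLFs are not doubled."""
    return to_unix(text).replace("\n", "\r\n")


print(repr(to_dos("a\nb\r\nc")))  # 'a\r\nb\r\nc'
```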


Is it really a problem to just use batch like this instead of gsar?

Not at all :).

As a matter of fact it makes sense since you are parsing the files line by line.

@dencorso

See the above, it is just a matter of "philosophy", either processing the file(s) as a whole or parsing them line by line.

And anything needing cygwin1.dll is philosophically "wrong". :ph34r:

And anything provided through their installer is a crazy, senseless mass of bloat :( ; compare:

http://reboot.pro/topic/15207-why-everything-is-so-dmn-diificult-a-web-quest-for-ddexe/

jaclaz


I'm now trying to solve a different problem...

I want to retrieve the real filename from a cabbed file. Let's say that the file is called "abc.dl_" and the real filename is "a b c.dll".

If you run

cabarc l abc.dl_

you get

Microsoft (R) Cabinet Tool - Version 5.2.3790.0
Copyright (c) Microsoft Corporation. All rights reserved..

Listing of cabinet file 'abc.dl_' (size 18989):
1 file(s), 1 folder(s), set ID 0, cabinet #0

File name File size Date Time Attrs
----------------------------- ---------- ---------- -------- -----
a b c.dll 36352 2013/03/30 12:13:14 -a--

I've come up with this script:

@ECHO OFF

SETLOCAL ENABLEDELAYEDEXPANSION

SET tokens1=1
:loop1
FOR /F "skip=9 tokens=%tokens1%" %%A IN ('cabarc l abc.dl_') DO (
SET/A tokens1+=1
GOTO :loop1
)
SET/A tokens1-=5

SET tokens2=1
:loop2
FOR /F "skip=9 tokens=%tokens2%-%tokens1%" %%A IN ('cabarc l abc.dl_') DO (
IF DEFINED File (
SET File=!File! %%A
) ELSE (
SET File=%%A
)
SET/A tokens2+=1
GOTO :loop2
)

ECHO "!File!"

PAUSE

which does work, the result being

"a b c.dll"

but I'm just wondering if there's any simpler way to do it than using two such loops. My method is also far from perfect because it won't work if the real filename has more than one space in a row, e.g.

"a     b c.dll"

Edited by tomasz86
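Another approach: the last four columns of the listing (size, date, time, attributes) contain no spaces, so splitting each table row at most four times from the right leaves the file name whole, runs of spaces included. A Python sketch follows; the sample listing is made up and only laid out like the one above, and treating every non-empty line after the dashed separator as a table row is an assumption about cabarc's output. `names_from_listing` is a hypothetical name.

```python
def names_from_listing(listing):
    """Return the file names from a 'cabarc l'-style listing.

    Rows are the non-empty lines after the dashed separator; the last four
    fields never contain spaces, so rsplit(None, 4) keeps the name intact.
    """
    names = []
    in_table = False
    for line in listing.splitlines():
        if line.startswith("---"):
            in_table = True
        elif in_table and line.strip():
            names.append(line.rsplit(None, 4)[0])
    return names


sample = """File name                     File size       Date     Time Attrs
----------------------------- ---------- ---------- -------- -----
a     b c.dll                     36352 2013/03/30 12:13:14 -a--
"""
print(names_from_listing(sample))  # ['a     b c.dll']
```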

but I'm just wondering if there's any simpler way to do it than using two such loops. My method is also far from perfect because it won't work if the real filename has more than one space in a row, e.g.

"a     b c.dll"

Do files with spaces in their names exist in CAB files? :unsure:

Anyway, see if this fits:

@ECHO off
SETLOCAL ENABLEDELAYEDEXPANSION
FOR /F "tokens=*" %%A IN ('cabarc L test.cab ^| FIND "/"') do (
SET Line=%%A
SET Line=!Line:~0,28!
CALL :rem_trail_spaces !Line!
ECHO [!Line!]
)

GOTO :EOF

:rem_trail_spaces
SET Line=%*
GOTO :EOF

jaclaz
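The `!Line:~0,28!` slice above hard-codes the width of the name column. A variant is to take that width from the dashed separator line under "File name". This Python sketch does that; it assumes names are padded to the width of the dashes, so a name longer than the column would be cut off, and `names_by_column` is a hypothetical name.

```python
def names_by_column(listing):
    """Cut the 'File name' column at the width of its dashed underline."""
    lines = listing.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("-"):
            width = len(line.split(" ", 1)[0])  # length of the first dash run
            return [row[:width].rstrip() for row in lines[i + 1:] if row.strip()]
    return []  # no separator line found


# Made-up sample: a 29-character name column followed by a size column.
row = "a     b c.dll".ljust(29) + " " + "36352".rjust(10) + " 2013/03/30 12:13:14 -a--"
sample = "File name\n" + "-" * 29 + " ----------\n" + row + "\n"
print(names_by_column(sample))  # ['a     b c.dll']
```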


See the above, it is just a matter of "philosophy", either processing the file(s) as a whole or parsing them line by line.

And anything needing cygwin1.dll is philosophically "wrong". :ph34r:

And anything provided through their installer is a crazy, senseless, mass of bloat :( , compare:

http://reboot.pro/topic/15207-why-everything-is-so-dmn-diificult-a-web-quest-for-ddexe/

Yes, I agree with your points there.

But I also need sed, so I did compromise. :blushing:

Now, our friend submix8c (thanks, man... you do rock! :thumbup ) found a pearl he pointed me to in another thread... the link he gave no longer works, but the good old Wayback Machine is always there to the rescue (at least for the time being...): UnxUtils 2001 version (truly standalone)!!! :thumbup

Grab the sed from it and do give it a try (it's a 45 KiB PE file!)... you'll fall in love. :wub:

What's *really* limiting with sed is that it's ASCII only, period... so if you need Unicode, then it's not an option. But life is like that, anyway... :}


Have you tried just using 'expand'?

@ECHO OFF & SETLOCAL ENABLEEXTENSIONS
(SET TESTFILE=D:\My Files\abc.dl_)
FOR /F "TOKENS=*" %%# IN ('EXPAND -D "%TESTFILE%"^|FIND /I "%TESTFILE%"') DO (
SET _=%%#)
ECHO/%_:*: =%
PING -n 4 127.0.0.1 1>NUL
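In this script, `%_:*: =%` deletes everything up to and including the first ": ", leaving the name stored in the cabinet. The equivalent string operation in Python is `str.partition`. The sample line below assumes EXPAND -D prints `<cabinet path>: <stored name>`, which is inferred from that substitution rather than captured output; `name_after_colon_space` is a hypothetical name.

```python
def name_after_colon_space(line):
    """Return the text after the first ': ', like %_:*: =% in cmd.

    A line without ': ' is returned unchanged, as the cmd substitution
    also leaves the variable unchanged when the search text is absent.
    """
    head, sep, tail = line.partition(": ")
    return tail if sep else line


# Hypothetical EXPAND -D line; "D:\" has no space after the colon.
print(name_after_colon_space(r"D:\My Files\abc.dl_: a b c.dll"))  # a b c.dll
```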


@Yzöwl Expand.exe works but is extremely slow compared to "cabarc L".

@jaclaz Why "!0,28!"? It won't work for longer filenames, will it?

Do files with spaces in their names exist in CAB files? :unsure:

This is a good question :P Probably not, but I just want to play it safe.

Edited by tomasz86
