MSFN Forum: renaming files in CMD scripts


renaming files in CMD scripts

#21 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,399
  • Joined: 14-July 04
  • OS:Windows 7 x64

Posted 21 February 2012 - 06:07 AM

bphlpt, on 21 February 2012 - 04:18 AM, said:

I would love to see an example as you have described. I can think of other possible situations where knowing how to do this would be handy.

That's a good point. Here's the VBScript again for those it might help at some point:

Option Explicit
Dim oXmlHttp, oRegExp, oMatch, adoStr, sChildPages(), i, url

Set oXmlHttp = createobject ("Msxml2.ServerXMLHTTP.6.0")
oXmlHttp.Open "GET", "http://www.slv.dk/Dokumenter/dsweb/View/Collection-357", False
oXmlHttp.Send

Set oRegExp = New RegExp
oRegExp.IgnoreCase = True
oRegExp.Global = True
oRegExp.Pattern = "<a\shref=""(/Dokumenter/dsweb/View/Collection-\d*)"">"

Set oMatch = oRegExp.Execute(oXmlHttp.ResponseText)
If oMatch.Count = 0 Then WScript.Quit

'really ugly hack where we skip the first child page found (itself)
ReDim sChildPages(oMatch.Count-2)
For i = 1 to oMatch.Count-1
	sChildPages(i-1) = "http://www.slv.dk" & oMatch.Item(i).Submatches(0)    
Next

oRegExp.Pattern = "<a\shref=""(/Dokumenter/dsweb/Get/Document.*pdf)""\sclass=""uline""><b>(.*?)</b>"
For Each url in sChildPages
	oXmlHttp.Open "GET", url, False
	oXmlHttp.Send
	Set oMatch = oRegExp.Execute(oXmlHttp.ResponseText)
	For i = 0 to oMatch.Count-1
		DownloadBinaryFile "http://www.slv.dk" & oMatch.Item(i).Submatches(0), oMatch.Item(i).Submatches(1) & ".pdf"
	Next
Next

Sub DownloadBinaryFile(sUrl, sFileName)
    oXmlHttp.Open "GET", sUrl, False
    oXmlHttp.Send
    Set adoStr = CreateObject("ADODB.Stream")
    adoStr.Type = 1 'adTypeBinary
    adoStr.Open
    adoStr.Write oXmlHttp.ResponseBody
    adoStr.SaveToFile sFileName, 2 'adSaveCreateOverWrite
    adoStr.Close
End Sub


It's pretty ugly, and there's no error handling of any kind, but it gets the job done. Writing essentially the same thing in other languages should be pretty straightforward too (most of the work here is getting the regular expressions right). And in most cases it would be nicer/better/simpler too (VBScript data structures suck hard, downloading binary files here is a bit of a hack, error handling is beyond awful, etc.).
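For comparison, here is a minimal Python 3 sketch of just the download step, with the error handling the VBScript version skips. It uses only the standard `urllib.request` module; the function name and its True/False return convention are illustrative choices, not part of CoffeeFiend's script.

```python
import urllib.request


def download_binary_file(url: str, filename: str, timeout: float = 30.0) -> bool:
    """Save the body fetched from `url` to `filename`.

    Rough counterpart of the VBScript DownloadBinaryFile routine. Returns False
    (after printing why) instead of stopping the whole run on the first error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()  # already raw bytes, no ADODB.Stream step needed
    except OSError as exc:
        # URLError, HTTPError (4xx/5xx responses) and socket timeouts
        # are all subclasses of OSError.
        print(f"download failed: {url}: {exc}")
        return False

    try:
        # "wb" creates or overwrites the file, like adSaveCreateOverWrite.
        with open(filename, "wb") as out:
            out.write(data)
    except OSError as exc:
        print(f"could not save {filename}: {exc}")
        return False
    return True
```

Each (path, title) pair found by the second regular expression would then be passed in as `download_binary_file("http://www.slv.dk" + path, title + ".pdf")`. The design choice here is that a failed download is reported and skipped rather than ending the whole run.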


#22 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 21 February 2012 - 07:48 AM

Check my last post. I edited it; it now contains code that could be merged into the code I posted before, though one small change still needs to be made.

This post has been edited by DosCode: 21 February 2012 - 07:51 AM


#23 bphlpt

  • MSFN Expert
  • Group: Members
  • Posts: 1,082
  • Joined: 12-May 07

Posted 21 February 2012 - 07:54 AM

CoffeeFiend - That is marvelous! It certainly does get the job done!

I copied your script above and saved it as "GetPdf.vbs". I put the file in an otherwise empty folder named "Temp", opened a command prompt in that folder and ran the file with the command:

cscript getpdf.vbs

and waited. The script ran silently. When the command completed and I got a command prompt back, I refreshed the contents of "Temp" in Windows Explorer, and all the PDF files that the OP was looking for were in the folder and named correctly! The files opened correctly in Foxit Reader, my PDF reader of choice. Absolutely no problems at all. No extra files, nothing to rename, and no external apps were required that aren't already part of Windows 7. It all looked great!

Now that I know it WORKS, I've just got to do some reading so I can understand WHY it works and HOW I need to modify it to meet future needs. I hate to ask for more after you've put this together, but to save time blindly using Google, would you mind pointing me to a few links where I can read about the key parts of your script? Maybe The Scripting Guys address something similar?

Many Thanks!

Cheers and Regards

This post has been edited by bphlpt: 21 February 2012 - 08:11 AM


#24 jaclaz

  • The Finder
  • Group: Developers
  • Posts: 11,447
  • Joined: 23-July 04
  • OS:none specified

Posted 21 February 2012 - 10:46 AM

This might do:
@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
echo In file:%source%
ECHO.

FOR /F "tokens=* delims=" %%A IN ('FIND ".pdf" "%source%" ^|FIND "href"^|FIND "class="') DO (
SET Line="%%A"
CALL :Line_process
)
GOTO :EOF

:Line_process
SET Line=!Line:^<=§!
SET Line=!Line:^>=§!
:loop_href
IF NOT "!line:~1,4!"=="href" SET Line="!Line:~2,-1!"&GOTO :loop_href
SET File="!Line:~7,-1!"
CALL :File_name !File!
SET Line="!File:%Filepath%=!"
:loop_class
IF NOT "!line:~1,4!"=="§§b§" SET Line="!Line:~2,-1!"&GOTO :loop_class
SET File="!Line:~4,-1!"
FOR /F "tokens=1 delims=§" %%B IN (!File!) DO SET FileTitle=%%B&SET File=
SET File
GOTO :EOF

:File_name
SET Filepath="%~1"
SET Filename="%~nx1"
GOTO :EOF


jaclaz

#25 bphlpt

  • MSFN Expert
  • Group: Members
  • Posts: 1,082
  • Joined: 12-May 07

Posted 21 February 2012 - 11:24 AM

DosCode - No offense was meant, and I'm a CMD script fan. I've written CMD scripts that are almost 3000 lines long. There are definitely cases, IMO, where CMD script is faster and more flexible than other options; there's a reason it has survived from before Windows to the present day. It can be very powerful. I encourage you to pursue learning how to accomplish what you want using CMD script.

But I would also STRONGLY suggest you try CoffeeFiend's script at least once, using the instructions I listed in my last post. In that one post CoffeeFiend accomplished everything you have asked for in every post you have made here since you became a member. Everything. CoffeeFiend and I don't need to add anything more; that script is all you need. If nothing else, you can keep it as a backup approach.

The post is appropriate to leave in this thread. This section deals with all types of scripting, not just CMD script, and that script deals directly with what you wanted to accomplish as an end goal. Others who read this thread might be interested in alternative approaches, as I was. You don't have to listen to our advice, but the threads here are for the benefit of all readers, not just you. Our two posts distract from your overall goal less than your own posts, which have been scattered over multiple threads and have yet to produce a working solution. You have been asking about bits and pieces for a week now, and we didn't even know what your overall goal was until 18 hours ago.

Now that jaclaz, our CMD script wizard, has stepped in, I'm sure he can help you come up with a script that meets your needs, and I wish you well. But that does not make alternatives less valid.

Cheers and Regards

This post has been edited by bphlpt: 21 February 2012 - 11:25 AM


#26 CoffeeFiend

  • Coffee Aficionado
  • Group: Super Moderator
  • Posts: 5,399
  • Joined: 14-July 04
  • OS:Windows 7 x64

Posted 21 February 2012 - 04:54 PM

bphlpt, on 21 February 2012 - 07:54 AM, said:

I hate to ask for more after you've put this together, but to save time blindly using Google, would you mind pointing me to a few links where I can read about the key parts of your script?

There is no central place for all of this that I'm aware of, nor am I a VBScript guru (I've mostly given up on it, and most of my VBScript knowledge dates back to the Win2k era, partly from writing classic ASP pages). As such I'm not certain what the best resources out there are today. But here are some bits and pieces that might help:

Msxml2.ServerXMLHTTP.6.0 is one of several objects you can use to get content from the web (it makes the same kind of request that "AJAX" web pages do). The Open method initializes the request: you pass it the HTTP verb to use, the URL, and whether the call should be asynchronous (False here, so the script waits for the answer). The Send method actually makes the request and, once it returns, the response (HTML here) is available in the ResponseText property, which I then parse using regular expressions.

As for the regular expressions, the idea is to design them with submatches for the content you want (the desired chunks surrounded by parentheses). That way you already have the info you want without any further parsing or processing.

And finally, the regular expressions explained:

<a\shref="(/Dokumenter/dsweb/View/Collection-\d*)">
<a matches literal text
\s matches a space
href=" matches more literal text
( this marks the beginning of the information I'm interested in (the submatch which here is the URL of a child page)
/Dokumenter/dsweb/View/Collection- matches some more literal text
\d matches a numeric digit (0 to 9)
* means that this previous digit can be present any amount of times (zero to infinity)
) marks the end of the information I care about
"> matches literal text
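To see what this first pattern picks up, it can be tried in Python, whose `re` module treats these particular tokens (`\s`, `\d*`, the parentheses) the same way VBScript's RegExp does. This is only a sketch: the listing HTML below is made up, and only the Collection-357 URL comes from the script.

```python
import re

# The first pattern from the script, with VBScript's doubled "" quotes undone.
# re.IGNORECASE plays the role of IgnoreCase = True; findall() returns every
# match, like Global = True.
CHILD_PAGE = re.compile(
    r'<a\shref="(/Dokumenter/dsweb/View/Collection-\d*)">', re.IGNORECASE
)


def child_pages(html: str, base: str = "http://www.slv.dk") -> list[str]:
    """Absolute URLs of the child collections, skipping the first match.

    Skipping index 0 mirrors the script's "ugly hack": per the script's
    comment, the first collection link found is the page itself.
    """
    paths = CHILD_PAGE.findall(html)  # one group, so a list of strings
    return [base + path for path in paths[1:]]


# Made-up listing page: a self-link first, then two children.
sample = (
    '<a href="/Dokumenter/dsweb/View/Collection-357">this page</a>\n'
    '<a href="/Dokumenter/dsweb/View/Collection-358">child A</a>\n'
    '<a href="/Dokumenter/dsweb/View/Collection-359">child B</a>\n'
)
for url in child_pages(sample):
    print(url)
```

Note that `\d*` would also accept `Collection-` with no number at all; `\d+` is the stricter choice if that ever matters.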

<a\shref="(/Dokumenter/dsweb/Get/Document.*pdf)"\sclass="uline"><b>(.*?)</b>

<a matches literal text
\s matches a single whitespace character (here, the space)
href=" matches more literal text
( this marks the beginning of the information I'm interested in (the submatch, which here is the URL of the PDF)
/Dokumenter/dsweb/Get/Document matches literal text
. matches any character (except a line break)
* repeats it zero or more times, taking as many characters as possible ("greedy")
pdf matches more literal text
) marks the end of the information I care about
" literal text
\s whitespace (the space)
class="uline"><b> literal text
( marks the beginning of the second group of info I want (the next numbered submatch, which is the desired filename for the PDF)
. is still any old character
*? is the "lazy" version of *: it also matches any number of times, but keeps the match as short as possible
) marks the end of the 2nd group
</b> literal text
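The second pattern can be checked the same way, here in Python against two of the link lines Yzöwl quotes from the GEN 0 GENERAL page later in the thread. The last lines show why `*?` matters for the title group. Note that `.*pdf` in the first group is greedy too; it only works because `.` does not cross line breaks and each link sits on its own line in this HTML.

```python
import re

# Second pattern from the script: group 1 = PDF path, group 2 = link title.
PDF_LINK = re.compile(
    r'<a\shref="(/Dokumenter/dsweb/Get/Document.*pdf)"\sclass="uline"><b>(.*?)</b>',
    re.IGNORECASE,
)

# Two link lines as quoted from the GEN 0 GENERAL page source in this thread.
page = (
    '<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" class="uline">'
    "<b>GEN 0.1 Preface</b></a>\n"
    '<a href="/Dokumenter/dsweb/Get/Document-410/gen_0_3.pdf" class="uline">'
    "<b>GEN 0.3 Record of AIP Supplements</b></a>\n"
)

# With two groups, findall() yields (group1, group2) tuples: the same pair
# the VBScript reads from Submatches(0) and Submatches(1).
for path, title in PDF_LINK.findall(page):
    print(f"{path} -> {title}.pdf")

# Greedy versus lazy on a line with two bold runs.
two_bolds = "<b>first</b> and <b>second</b>"
print(re.findall(r"<b>(.*)</b>", two_bolds))   # ['first</b> and <b>second']
print(re.findall(r"<b>(.*?)</b>", two_bolds))  # ['first', 'second']
```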

I think this covers the most interesting parts :) Not that I use this myself for page scraping/parsing, mind you.

#27 bphlpt

  • MSFN Expert
  • Group: Members
  • Posts: 1,082
  • Joined: 12-May 07

Posted 21 February 2012 - 09:46 PM

Thank you.

Cheers and Regards

#28 Aacini

  • Group: Members
  • Posts: 2
  • Joined: 21-February 12
  • OS:XP Home

Posted 22 February 2012 - 02:14 AM

Hi. I am new to this forum. DosCode invited me here to post my latest solution to this problem, which was worked out in detail on another site. So here it is:

@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"

for /F "delims=" %%c in ('findstr /C:"<a " "%source%" ^| findstr /C:"%pdf%"') do (
   set "tag=%%c"
   rem Get the value of "<b>" sub-tag
   set "tag=!tag:<b>=$!"
   set "tag=!tag:</b>=$!"
   for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
   echo Title found: "!title!"
)



If you wish, you may review the development of this solution at this site.

Regards...
Antonio

EDIT: I fixed a detail in my code (changed "<a>" to "<a ") that prevented it from running correctly...

This post has been edited by Aacini: 22 February 2012 - 07:37 PM
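For readers following the batch logic, here is the same filtering restated in Python as a sketch; the function name is hypothetical, and unlike the batch loop it returns the first matching title instead of echoing every one. The sample lines are the ones Yzöwl quotes from the page source later in the thread, and the final check shows why the original `"<a>"` search found nothing.

```python
def title_for_pdf(html_lines, pdf_name):
    """Return the <b> text of the first anchor line mentioning pdf_name, else None.

    Mirrors the batch steps: keep lines containing '<a ' (note the space) and
    the PDF name, replace <b> and </b> with '$', then take token 2 of a
    '$'-delimited split. FOR /F ignores empty tokens, hence the filter.
    """
    for line in html_lines:
        if "<a " in line and pdf_name in line:
            tag = line.replace("<b>", "$").replace("</b>", "$")
            tokens = [t for t in tag.split("$") if t]
            if len(tokens) >= 2:
                return tokens[1]
    return None


lines = [
    '<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>',
    '<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>',
]
print(title_for_pdf(lines, "0_1_en.pdf"))  # GEN 0.1 Preface
# The literal text "<a>" never occurs, because every anchor has attributes.
print(any("<a>" in line for line in lines))  # False
```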


#29 Yzöwl

  • Wise Owl
  • Group: Super Moderator
  • Posts: 4,364
  • Joined: 13-October 04
  • OS:Windows 7 x64

Posted 22 February 2012 - 04:39 AM

It is not the solution!

Unless the original remit has changed, it is simply a sub function.

BTW I'm unable to access the link to take a look at your .html (I cannot access 'http://www.slv.dk/Dokumenter/' at all; I receive a "Connection closed by remote server" message).

#30 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 22 February 2012 - 05:49 AM

Yzöwl, on 22 February 2012 - 04:39 AM, said:

It is not the solution!

Unless the original remit has changed it is simply a sub function.

BTW I'm unable to access the link to take a look at your .html, (I cannot access 'http://www.slv.dk/Dokumenter/' at all I receive Connection closed by remote server message).


It is a solution for the sub function. I guess you tested the Visual Basic script that someone posted here, and the site admin has blocked your access. I did not tell anyone to go and download from/overload their server. If anybody did so, it's his own fault, and I am not glad this happened. But to be exact, I did not test or check that VB script; I just guessed it contained instructions to download the PDF files. I will paste my code here after your answer.

#31 Yzöwl

  • Wise Owl
  • Group: Super Moderator
  • Posts: 4,364
  • Joined: 13-October 04
  • OS:Windows 7 x64

Posted 22 February 2012 - 06:14 AM

DosCode, on 22 February 2012 - 05:49 AM, said:

I guess you tested the Visual Basic script from the guy that has written here and admin has blocked your access. I did not say go and download/overload their server.

I haven't tested anyone else's code; I have no need to. I'm prevented from accessing the site simply because I have a blocking application which does not like that site.

However, I've taken a look at the HTML code for GEN 0 GENERAL, and it appears that your .pdf link lines look like this:

Quote

<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>
<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>
<a href="/Dokumenter/dsweb/Get/Document-410/gen_0_3.pdf" class="uline"><b>GEN 0.3 Record of AIP Supplements</b></a>
<a href="/Dokumenter/dsweb/Get/Document-411/EK_GEN_0_4_en.pdf" class="uline"><b>GEN 0.4 Checklist of AIP Pages</b></a>
<a href="/Dokumenter/dsweb/Get/Document-412/EK_GEN_0_5_en.pdf" class="uline"><b>GEN 0.5 List of Hand Amendments to the AIP</b></a>
<a href="/Dokumenter/dsweb/Get/Document-413/EK_GEN_0_6_en.pdf" class="uline"><b>GEN 0.6 Table of Contents to Part 0 and 1</b></a>

If these are indeed the lines you are looking for, then the following will always fail:

findstr /C:"<a>"


#32 bphlpt

  • MSFN Expert
  • Group: Members
  • Posts: 1,082
  • Joined: 12-May 07

Posted 22 February 2012 - 06:55 AM

DosCode, on 22 February 2012 - 05:49 AM, said:

I guess you tested the Visual Basic script from the guy that has written here and admin has blocked your access. I did not say go and download/overload their server. If anybody did so, his own fault. I am not glad of this happen. But to be exact, I did not test or checked that VB scipt, I just guess there were some instructions to download the pdf files. I will paste here my code after your answer.


Yzöwl might not have tested the VB script, but I did, of course, once, and my access has not been blocked. So there were no consequences to my "fault". Sorry to disappoint you. Don't worry, I'm not going to download more from the site; there is no need. And it is wrong to overload anyone's server for any reason. You also downloaded the PDF files from the site at least once (with wget, I believe you said); otherwise you wouldn't need to rename them. It is too bad you didn't test CoffeeFiend's script; you would have a backup plan and you might have learned something. I look forward to seeing your completed, working script. I will be happy to learn any new tricks you're willing to share, in any programming language. I'm just sorry you don't seem to have the same attitude.

Cheers and Regards

#33 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 22 February 2012 - 08:10 AM

Yzöwl: Sure,

findstr /C:"<a>"

will fail. That tag makes no sense here, because the <a> tags on these pages always have attributes.

bphlpt: Well, I was worried somebody could abuse the link by downloading a large number of files; if several people did that at the same time, I think it could slow down or maybe overload the server. I don't know. When I downloaded the files it took me a minute or three, but I downloaded more files than just these, and I did it in parts. I don't need the VB script, even if it is good. Consider how much time I have spent learning the CMD things, even though I am pretty bored (not in a bad sense, rather "lazy") and not devoting as much time to this job as I could. So for now I'm sending just what I have on my PC, and then I'm going to get some rest.

Feel free to correct me if you see any mistakes in my code.

@echo off
setlocal EnableDelayedExpansion
for %%P in (*.pdf) do (
  set "pdfFile=%%P"
  set htmlMask="GEN !pdfFile:~0,1! *.html"
  REM echo !htmlMask!
  echo Testing "!pdfFile!": Looking for !htmlMask!
  set "found="
  for %%H in (!htmlMask!) do (
    set found=1
    echo "%%H"
    for /f "delims=" %%b in ('find /i "<title>" ^< "%%H"') do (
    set "pdf=%%P"
    set "source=%%H"
    set "var=%%b"
    call :JUMP
    )

    REM do whatever you need to do with the %%P pdf file and %%H html file
  )
  REM if not defined found echo NOT FOUND
)

echo done - check tempren.bat
goto :EOF

:JUMP
REM Get title for pdf from html file

set "source=%source%"
set "pdf=%pdf%"

rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
   set "line=%%c"
   REM Test if the line contains pdf file I look for:
   SET "pdfline=!line:%pdf%=!"
   if not "!pdfline!" == "!line!" (
      REM Test if the pdfline contains tag b
      if not "!pdfline:*><b>=!" == "!pdfline!" (
         cls     
         set "tag=!pdfline:<b>=$!"
         set "tag=!tag:</b>=$!"
         for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
         set "title=!title::=-!"
         set "title=!title:\=-!"
         set "title=!title:/=-!"
         set "title=!title:|=-!"
         set "title=!title:?=-!"
         echo Title found: "!title!"
         pause
      )
   )
)



I will finish it later :-) I'm so lazy...
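On the character replacements near the end of the script: Windows rejects `< > : " / \ | ? *` (and control characters) in file names, so `*`, `"`, `<` and `>` would still slip through. A minimal Python sketch of a complete replacement follows; the function name is hypothetical, and `-` as the replacement simply matches the script.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
# The batch code above handles : \ / | ? ; this also covers * " < >.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, replacement: str = "-") -> str:
    """Make a link title usable as a Windows file name (without extension)."""
    cleaned = _ILLEGAL.sub(replacement, title)
    # Windows advises against names ending in a dot or space, so drop them.
    return cleaned.rstrip(" .")


print(safe_filename("A/B: C?"))          # A-B- C-
print(safe_filename("GEN 0.1 Preface"))  # GEN 0.1 Preface
```

It does not deal with reserved device names such as CON or NUL, which Windows also refuses as file names.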

#34 Yzöwl

  • Wise Owl
  • Group: Super Moderator
  • Posts: 4,364
  • Joined: 13-October 04
  • OS:Windows 7 x64

Posted 22 February 2012 - 10:16 AM

Why then did you invite someone who has not only failed to answer your question but also provided non-working code to join our Forum and post it in this Topic?

Now, if I think back to a previous Topic of yours, I seem to recall you asking about changing the .pdf file names by removing some characters from the beginning of them. Are these in fact the same files that you are once again renaming?
e.g. taking the following line from the GEN 0 GENERAL.html output in my previous post:

Quote

<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>

Have you not already renamed the downloaded file, "EK_GEN_0_2_en.pdf", to "0_2_en.pdf"? And are you now wanting to rename it further to "GEN 0.2 Record of AIP Amendments.pdf"?

#35 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 22 February 2012 - 01:54 PM

Yzöwl, on 22 February 2012 - 10:16 AM, said:

Why then did you invite someone who has not only failed to answer your question but also provided non working code to join our Forum and post it in this Topic?


The name of the topic is "renaming files in CMD scripts". I believe his code is fine and works, but I just want a CMD solution, or CMD plus GnuWin tools.

I originally did not know that I would rename the files by their HTML titles; I decided that later. I can still remove the unwanted prefix with a script.

This post has been edited by DosCode: 22 February 2012 - 01:54 PM


#36 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 22 February 2012 - 02:03 PM

Oh, I overlooked the post from Aacini (and some other posts from that time), and I did not see the error in his code because I did not test it. Sorry about all that. But I'm glad you came here, Aacini; thanks for trying. The problem is that the a tag always has attributes. Still, your solution is inspiring to me. I could try to use a regex for this, but I have no time to think about it now. What about trying a findstr regex like <a *>.*</a>, but with an ungreedy option added?

This post has been edited by DosCode: 22 February 2012 - 02:05 PM
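On the `<a *>.*</a>` idea: `findstr` only selects whole lines, and its regular expressions have no lazy (ungreedy) quantifier and no capture groups, so the title would still have to be cut out afterwards. Also, `*` repeats only the character before it, so `<a *>` means `<a`, then zero or more spaces, then `>`, which cannot match a tag with attributes. The Python snippet below (Python's `re`, not `findstr`) just demonstrates that pattern logic on one of the sample lines; `[^>]*` is the usual way to step over the attributes.

```python
import re

line = ('<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" '
        'class="uline"><b>GEN 0.1 Preface</b></a>')

# "<a *>" = "<a", zero or more spaces, ">": the attributes make it fail.
print(re.search(r"<a *>", line))  # None

# Skip the attributes with [^>]*, then take the link text lazily.
m = re.search(r"<a [^>]*>(.*?)</a>", line)
print(m.group(1))  # <b>GEN 0.1 Preface</b>

# Or go straight for the bold text, as the VBScript pattern does.
print(re.search(r"<b>(.*?)</b>", line).group(1))  # GEN 0.1 Preface
```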


#37 jaclaz

  • The Finder
  • Group: Developers
  • Posts: 11,447
  • Joined: 23-July 04
  • OS:none specified

Posted 22 February 2012 - 02:23 PM

DosCode, on 22 February 2012 - 02:03 PM, said:

Oh, I have overlooked the post from Aacini (and more posts at that time), and I did not see the error in his code, because I did not test his code.

What about the snippet I posted? :unsure:
Did you also fail to try it? :w00t:

jaclaz

This post has been edited by jaclaz: 22 February 2012 - 02:24 PM


#38 Yzöwl

  • Wise Owl
  • Group: Super Moderator
  • Posts: 4,364
  • Joined: 13-October 04
  • OS:Windows 7 x64

Posted 22 February 2012 - 04:19 PM

DosCode, on 22 February 2012 - 01:54 PM, said:

I originally did not know that I would rename the files by html title. This decision I did later. I still can remove the non-willing prefix by script.

Is that your answer to my question? Is the following example what you are trying to achieve?

Yzöwl, on 22 February 2012 - 10:16 AM, said:

e.g. Taking the following line from my previous posts GEN 0 GENERAL.html output:

Quote

<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>
Have you not already renamed the downloaded file, "EK_GEN_0_2_en.pdf", to "0_2_en.pdf"? And are you now your wanting to further rename it to "GEN 0.2 Record of AIP Amendments.pdf"

Is your goal to rename the downloaded file "EK_GEN_0_2_en.pdf" to "GEN 0.2 Record of AIP Amendments.pdf"?

#39 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 22 February 2012 - 05:05 PM

Yzöwl, on 22 February 2012 - 04:19 PM, said:

Is your goal to rename the downloaded file "EK_GEN_0_2_en.pdf" to "GEN 0.2 Record of AIP Amendments.pdf"?


Yes. And sorry, I can't keep up with things right now.
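With the goal confirmed, the whole rename can be sketched in a few lines of Python, reusing CoffeeFiend's second pattern. This is a hedged sketch rather than a drop-in tool: `rename_plan` and `apply_plan` are hypothetical names, the `EK_GEN_` prefix is an assumption based on Yzöwl's example, and `dry_run=True` only prints what would happen.

```python
import os
import re

# CoffeeFiend's second pattern: group 1 = PDF path, group 2 = link title.
PDF_LINK = re.compile(
    r'<a\shref="(/Dokumenter/dsweb/Get/Document.*pdf)"\sclass="uline"><b>(.*?)</b>',
    re.IGNORECASE,
)
# Characters Windows rejects in file names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def rename_plan(html, existing, strip_prefix="EK_GEN_"):
    """List (old_name, new_name) pairs for PDFs whose names are in `existing`.

    A file may still have its downloaded name (EK_GEN_0_2_en.pdf) or the
    shortened one (0_2_en.pdf); strip_prefix is an assumption about how the
    earlier rename shortened it.
    """
    plan = []
    for path, title in PDF_LINK.findall(html):
        downloaded = path.rsplit("/", 1)[-1]
        candidates = [downloaded]
        if downloaded.startswith(strip_prefix):
            candidates.append(downloaded[len(strip_prefix):])
        new_name = _ILLEGAL.sub("-", title).rstrip(" .") + ".pdf"
        for old in candidates:
            if old in existing:
                plan.append((old, new_name))
                break
    return plan


def apply_plan(folder, plan, dry_run=False):
    """Rename files in `folder`; never overwrite an existing target."""
    for old, new in plan:
        target = os.path.join(folder, new)
        if os.path.exists(target):
            print(f"skip {old}: {new} already exists")
            continue
        print(f"{old} -> {new}")
        if not dry_run:
            os.rename(os.path.join(folder, old), target)
```

For example, reading the saved GEN 0 GENERAL.html into a string `html` and running `apply_plan(".", rename_plan(html, set(os.listdir("."))), dry_run=True)` in the PDF folder would list the planned renames without touching any file. Existing targets are skipped rather than overwritten.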

#40 DosCode

  • Newbie
  • Group: Members
  • Posts: 47
  • Joined: 15-February 12
  • OS:none specified

Posted 22 February 2012 - 05:08 PM

jaclaz:
Yes, I overlooked it. Maybe because when I sat down at my PC, I saw CoffeeFiend's reply about VBS and didn't read it, and I just didn't notice there were one or two posts above it. Your post was right between CoffeeFiend's and bphlpt's posts about VBS, which I didn't pay attention to. The posts are long on my screen and take up a lot of space, so it's not easy to find my way around. I guess the site's design is to blame, because the left column is too wide! I will try to find some time tomorrow. Now it's time to sleep.

This post has been edited by DosCode: 22 February 2012 - 05:16 PM


Copyright © 2001 - 2013 msfn.org