ITT: I find out I may not be the power user I think I am.
Haha. I used to write long and intricate BATCH files for MS DOS to automate all sorts of shit back in the 1990s. Bash is more powerful but much the same thing.
I've written a few tiny ones but I'm too old for that shit these days :)
My work is all Windows based, I only put on the Linux hat while I'm at home. PowerShell is fun when it's not broken and my commands don't shit the bed
I've been "daily driving" Linux for over 15 years, and I've learned a lot about configuration files, but I've never learned how to program anything. Couldn't write a "for loop" to save my life.
I think config files, yaml, and xml are reasonable pieces of code. I think program scripting is absolutely unreasonable fluff.
be careful using rm in a loop and/or with variable arguments, things can go very wrong :)
when i'm writing a complicated command line involving rm, i often write and run it first with echo in place of rm, just to be sure i am getting the results i expect. also when i re-run it actually using rm, i tend to use the -v option (which tells rm to print what it is doing) to reassure myself that i've just deleted what i wanted to and nothing else.
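For anyone following along, here is a minimal sketch of that echo-first habit. The scratch directory and file names are made up for the demo, so nothing real gets touched.

```shell
# Scratch directory with hypothetical files, so nothing real is at risk.
demo="$(mktemp -d)"
touch "$demo/a.txt" "$demo/b.txt" "$demo/keep.log"

# 1. Dry run: echo prints the rm command lines instead of running them.
dry_run="$(for f in "$demo"/*.txt; do echo rm -v "$f"; done)"
printf '%s\n' "$dry_run"

# 2. Real run, once the dry run looks right. -v makes rm report each removal.
for f in "$demo"/*.txt; do rm -v "$f"; done

# Whatever did not match the glob is still there.
remaining="$(ls "$demo")"
echo "left over: $remaining"
```

The dry run and the real run share the same loop, so the only thing that changes between them is the echo.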
Great tip, thank you!
Just fyi, if you add a second question that you want people to see, you might avoid adding it as an edit and just post a new question instead or ideally include it all in the first post to begin with.
Out of curiosity, what did you end up using for the first part? I know how I would have done it, but I'm self taught and always interested in learning new/different ways to do things.
For the bonus EDIT question, are you moving all html files from any subdirectory under your current directory? If so, that's much easier, but I would avoid putting your done folder under the scope that you're scanning against as well as ensuring no files have matching names to avoid overwriting files already moved.
All in all, I'm sure you can get there, but it does also help to have more information up front so we can provide clearer help.
Ok, thanks for the tip. I'm still getting used to Lemmy.
I ended up using
for f in *; do find ./"$f" -type f | sort | tail -n 2 | xargs -n 1 rm; done
and it worked perfectly. For the bonus question, I'm moving the html files from 127 subfolders. They are the only content of the subfolders. I want to prepend an integer to each and copy them to a different folder, so instead of
- folder1/file1
- folder1/file2
- folder2/file1
- folder2/file2
- folder2/file3
I'll have
- 001file1
- 002file2
- 003file1
- 004file2
- 005file3
xargs rm -rf ?
for f in *; do ls $f | tail -n 2 | xargs rm -rf; done
You mean like that? rm -rf followed by a question mark does not inspire confidence XD
Additionally, for safety you can add the -i flag to be prompted to confirm each removal. It may be tedious depending on the number of files, but it may also save you from deleting files and/or directories you don't want deleted.
For clarity, be careful with that -rf combo of flags. As another commenter mentioned, -r means recursive, which will delete directories and their contents. You're talking about deleting files. If you do not want directories and their contents removed, DO NOT use the -r flag.
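A quick way to see what leaving out -r buys you, sketched in a throwaway directory (the names are hypothetical): plain rm refuses to remove a directory, so its contents survive.

```shell
# Throwaway directory tree with made-up names.
demo="$(mktemp -d)"
mkdir "$demo/subdir"
touch "$demo/subdir/file1"

# Without -r, rm refuses the directory and exits with a non-zero status.
status=0
rm "$demo/subdir" 2>/dev/null || status=$?
echo "rm exit status: $status"

# The directory and the file inside it are still there.
ls "$demo/subdir"
```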
Thank you for the tips, but now I'm getting "Cannot remove: No such file or directory" all the way down! The files are there, I see them, they come up in the terminal, but for some reason xargs rm does not want to delete them. When I put the -f flag, rm doesn't give an error but the files are still there! wtf
When you run the command without the xargs bit, like this:
for f in *; do ls $f | tail -n 2; done
does the output give you the full file path, or just the file names?
The full file path will look something like:
/dir1/dir2/actual-file
And of course the file name would just be:
some-file
If you're getting just the file name, that's the problem. Unless you're in the directory with the file you wish to delete, rm needs the full path.
Edit: grammar
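To make the difference concrete, here is a small sketch with a made-up folder1 in a scratch directory. ls prints bare names, while find prints each name with the starting directory in front of it, which is what rm can actually use from the parent directory.

```shell
# Scratch directory with a hypothetical folder1 holding two files.
demo="$(mktemp -d)"
mkdir "$demo/folder1"
touch "$demo/folder1/file1" "$demo/folder1/file2"
cd "$demo" || exit 1

# ls prints only the name, which rm cannot resolve from here.
ls_out="$(ls folder1 | tail -n 1)"
# find prints the path relative to where we are standing.
find_out="$(find ./folder1 -type f | sort | tail -n 1)"

echo "ls gives:   $ls_out"
echo "find gives: $find_out"
```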
Yea that must be it! It's spitting out just the file name and not the whole path. There is only 1 level of depth, so I want to remove
- ./folder1/file3
- ./folder1/file4
- ./folder2/file11
- ./folder2/file12
so how do I get the whole path into xargs? I tried xargs "$f"/ but fortunately that didn't work because it was trying to delete all the directories lmao XD
Here's the command to delete the files:
for f in *; do find ./"$f" -type f | sort | tail -n 2 | xargs -n 1 rm; done
If you want to ensure it will target the correct files, first run this command (I HIGHLY recommend you do this first. Verify BEFORE you delete so you don't lose data):
for f in *; do find ./"$f" -type f | sort | tail -n 2; done
I'll be adding another comment reply with a breakdown of the command shortly (just need to write it up)
Here's what's happening in the command:
for f in *; do
You already know this for loop, which is using the * glob to iterate over each entry (here, each directory) in the current directory.
find ./"$f" -type f
Instead of your original ls command, which gives the file names, and not their full paths, we're using GNU find, which outputs each result with the starting directory in front of it. The arguments are:
./"$f"
- This tells find where to start its search. I double quoted the $f variable to properly expand the directory name even if it has nonstandard characters in it like spaces.
-type f
- This tells find what kind of file object to look for. So it's two parts: -type to tell find there will be a specific type to look for, and the f flag, which means regular file. Meaning, it will only find files.
The output of find is not sorted alphabetically, so before piping the output to tail, we first pipe it to sort, which by default will sort alphanumerically. We then pipe that to tail to grab just the last two files, and finally we get to the xargs bit.
Here I added the -n 1 argument to xargs to get it to work on the files one at a time. This isn't actually necessary. You could just run it as xargs rm. I didn't realize that before I posted the command. (I'm still learning too! The learning never ends. :D )
Thanks so much harsh!!! I will study this and hit Enter after I understand it.
Thanks again, that's epic.
You're welcome! Happy I could help.
One other quick note, do the filenames or directories have spaces in them? If they do, that will cause a problem with the command as it is and need some additional modification. I accounted for the possible spaces in the directory names with the find command, but not with xargs. I just realized that as I was looking it over again.
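One possible modification for names with spaces, as a sketch rather than the command that was actually run in this thread: pass NUL-separated names through the whole pipeline. This assumes the GNU versions of find, sort, tail and xargs (-print0, -z and -0 are GNU options), and the folder and file names are made up.

```shell
# Scratch directory with hypothetical names that contain spaces.
demo="$(mktemp -d)"
mkdir "$demo/my folder"
touch "$demo/my folder/file 1" "$demo/my folder/file 2" "$demo/my folder/file 3"
cd "$demo" || exit 1

for d in */; do
    # NUL bytes separate the names, so spaces never split a path.
    # -r keeps xargs from running rm when nothing was found.
    find "./$d" -type f -print0 | sort -z | tail -z -n 2 | xargs -0 -r rm -v
done

# Only the alphabetically first file should survive.
left="$(ls "my folder")"
echo "left over: $left"
```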
That was it! Thank you. I got rid of over 150 files in 127 directories with a lot less clicks than through the file explorer.
Luckily this time there were no spaces in the names. Spaces in names are a PITA at my stage of learning, and I'm never sure if I should use ' or ".
Btw, new challenge in the edited original post, if you haven't yet exhausted your thinking quota for the day lol.
I'd tackle this as two different commands rather than trying to come up with a oneliner to handle moving and renaming with incrementing numbers all in one go. It could be done all in one go, but to accomplish that, I'd probably write a bash script over a one liner.
So we'll start with moving them, which is the easy part using find:
find html-dir -name '*.html' -exec mv {} /path/to/new/dir/ \;
Let's look at each argument:
find
- calls find
html-dir
- is the path of the top level directory that contains all of the html files and/or other directories that contain the html files you want to move. This command will go recursively through all directories it finds inside of this top level directory.
-name
- this flag tells find to use the following expression to match filenames
'*.html'
- This is the expression being used to find filenames. It's using globbing to match any file that has the .html extension
-exec
- this is kind of like xargs, but for find. It allows you to take what has been found and act on it.
mv
- the move command to move the files to the new directory
{}
- this is the placeholder that gets replaced with the find results
/path/to/new/dir/
- the directory you want to move the .html files to
\;
- this terminates the exec action. The backslash escapes the termination character so the shell doesn't interpret it. You can use two different termination characters, ; and +. The semicolon tells exec to run the command once per result, while the plus tells exec to pass as many results as possible to a single run of the command. (With +, the {} has to come right before the +, so this particular mv command needs the semicolon.)
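The difference between the two terminators is easy to see by swapping mv for echo. This is just a sketch in a scratch directory with made-up file names.

```shell
# Scratch directory with three hypothetical html files.
demo="$(mktemp -d)"
touch "$demo/a.html" "$demo/b.html" "$demo/c.html"

# \; runs echo once per file, so there are three lines of output.
per_file="$(find "$demo" -name '*.html' -exec echo {} \; | wc -l | tr -d ' ')"

# + hands all the paths to a single echo, so there is one line of output.
batched="$(find "$demo" -name '*.html' -exec echo {} + | wc -l | tr -d ' ')"

echo "lines with \\; : $per_file"
echo "lines with +  : $batched"
```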
Once all of the files are in the new folder, cd into that folder and use this to enumerate the files:
n=1; for file in *; do pad="$(printf '%02d' $n)"; mv $file $pad$file; ((n++)); done
This is a one liner similar to what you were trying to do.
n=1;
- sets our first number for the enumeration with a semicolon to terminate and set up the next command
for file in *; do
- Sets up our for loop. file will be the variable in the loop representing each file as it loops, and * will bring in everything in the directory you are currently in (which at this point is just the .html files), and ; do terminates and sets up the loop.
pad="$(printf '%02d' $n)";
- You may or may not want to use this. I like to pad my numbers so they list properly. This creates a new variable inside the loop called pad. Using command substitution, the printf command will take our $n variable and pad it. The padding is dictated by the '%02d', which will turn 1 into 01, and so on. If there will be more than 99 .html files, and you want padding to three digits, then change '%02d' to '%03d'. Again the semicolon terminates and sets up the next command
mv $file $pad$file;
- this is pretty straightforward. It's taking the file being acted on in the loop as represented by the variable $file and moving it inside the same directory with the mv command to give it a new enumerated name by concatenating the $pad and $file variables into a single filename. So a.html would become 01a.html. This command is also terminated with a semicolon
((n++));
- this is an easy way in bash to increment a numeric variable, again terminated by a semicolon.
done
- ends the loop, and the command.
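If any of the names might contain spaces, a more defensive variant of the same loop could look like the sketch below. The quoting, the *.html glob, the -- before the file names and the $((n + 1)) arithmetic are my own substitutions rather than part of the one liner above, and the file names are made up.

```shell
# Scratch directory with hypothetical files, one with a space in its name.
demo="$(mktemp -d)"
touch "$demo/a b.html" "$demo/c.html"
cd "$demo" || exit 1

n=1
for file in *.html; do
    pad="$(printf '%02d' "$n")"
    # Quotes keep "a b.html" as one argument; -- stops option parsing
    # in case a name starts with a dash.
    mv -- "$file" "$pad$file"
    n=$((n + 1))
done

result="$(ls)"
printf '%s\n' "$result"
```

The glob is expanded once before the loop starts, so the renamed files are not picked up a second time.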
Let me know how it works for you!
Thanks harsh!! I'll study this and report back. I really appreciate your time and effort. There is a lot to learn here, and actually the padding is on my list of things to learn, so thank you sensei! As to your question about the integers, the files need to be in alphabetical order before getting the integer prepended to them, so like
- folder1/file1
- folder1/file2
- folder2/file1
- folder2/file2
- folder2/file3
turns to
- folder1/001file1
- folder1/002file2
- folder2/003file1
- folder2/004file2
- folder2/005file3
that way in the folder when it's all said and done I'll have
- 001file1
- 002file2
- 003file1
- 004file2
- 005file3
I'll check if your method works out of the box for that or if I have to use the sort function like you showed me last time. Thanks again!
It won't work the way you need it to as is, since the find command will first move all of those files together into the same folder, and if there are identical names in different folders, like:
folder1/file1
folder2/file1
Then it will throw errors and/or overwrite files.
I'm at work so I can't look at this right now but I'll look at it later when I get home.
Yea, I just came back to say this. Since cp overwrites by default (I tried copying first before trying moving) and each folder has files named index001 index002 etc, the folder where they all go ends up with only ONE of index001.html, ONE of index002.html etc. So I think what I need to do is find each html file, rename it with a unique integer in front of the name, move it to the common folder.
Okay, took me awhile to write everything up. The script itself is pretty short, but it's still much easier to do in a script than to try to make this a one line command.
I tested this creating a top level directory, and then creating three subdirectories inside it with a different number of html files inside those directories, and it worked perfectly. I'm going to break down exactly what's going on in the script, but do note the two commented commands. I set this script up so you can test it before actually executing it on your files. In the breakdown of the script I'm going to ignore the testing command as if it were not in the script.
The script:
#! /bin/bash
# script to move all html files from a series of directories into a single directory, and rename them as they are moved by adding a numeric indicator
# to the beginning of the filename, while keeping files of the same folder grouped.
fileList="$(find ~/test -name '*.html' | sort)"
num=1
while IFS= read -r line; do
    pad="$(printf '%03d' $num)"
    # The below echo command will test the script to ensure it works.
    # The output to the terminal will show the mv command for each file.
    # If the results are what you want, you can comment this line out to disable the command, or delete it entirely.
    echo "mv $line ~/done/$pad${line##*/}"
    # This commented out mv command will actually move and rename all of the files.
    # When you are certain based on the testing that the script will work as desired
    # uncomment this line to allow the command to run and move & rename the files.
    # mv "$line" ~/done/"$pad${line##*/}"
    ((num++))
done<<<"$fileList"
The breakdown of the script is in the reply comment. Lemmy wouldn't let me post it as one comment.
Hey I finally got to try this out and tbh I hit Enter without understanding the whole thing but anyway it's perfect! And it left me with a lot to study, your explanation was really helpful. Thanks so much for all your help! I really appreciate the time you spent :)
Excellent! I'm glad it worked for you. :)
The breakdown:
#! /bin/bash
- This heads every bash script and is necessary to tell your shell environment what interpreter to use for the script. In this case we're using /bin/bash to execute the script.
fileList="$(find ~/path/to/dir/with/html/files -name '*.html' | sort)"
- What this command is doing is creating a variable called fileList using command substitution. Command substitution encloses a command in "$()" to tell bash to execute the command(s) contained within the substitution group and save the output of the command(s) to the variable. In this case the commands are a find command piped into a sort command.
find ~/path/to/dir/with/html/files -name '*.html' | sort
- So this is the command set that will execute and the output of this command will be saved to the variable fileList.
find
- invokes the find tool for finding files
~/path/to/dir/with/html/files
- This tells find where to start looking for files. You'll want to change this to the top level directory containing all the subdirectories with the html files.
-name
- tells find to match file names using the expression that follows
'*.html'
- This is the expression find will use to match files. In this case it's using globbing to find all files with a file extension of .html. The * glob means to match any number of characters (including no characters at all). So when you combine the glob with the file extension for *.html, you're telling find to find any files that have any characters at all in the filename as long as that filename ends with .html
|
- The pipe sends the output of the find command, which in this case is a list of files with the full path of those files, to sort, which sorts them alphanumerically. That sorted list is then saved to the variable fileList
num=1
- Here we're creating a variable called num with a value of 1. This is for adding sequential numbers to the files as they are moved from their source directory to the destination directory.
while IFS= read -r line; do
- This script uses a while loop to process each item saved to the fileList variable. Within the while loop, the moving and renaming of the files will take place.
while
- Invokes the while command. What while does is repeat all of the commands contained inside the loop as long as the given condition is true. In this case the condition is the read command itself: as long as read can get another line from the fileList variable, it succeeds and the loop keeps processing. When there are no more lines to read, read fails, the condition becomes false, and the loop terminates.
IFS=
- This sets the Internal Field Separator to an empty value, just for the read command. The Internal Field Separator is the set of characters the shell uses to split input into fields. Those default characters are Space, Tab, and Newline. read already takes the list one line at a time on its own, so that's not what IFS= is for. With the default separator, read would trim any leading and trailing spaces or tabs from each line. Emptying it keeps every line exactly as find printed it.
read -r
- The read command does what it says and reads the input given. In this case our variable fileList (we'll get to how we make the while loop read the variable below.). The -r flag tells read not to treat the backslash as an escape character (if it finds one anywhere) and just keep it as a normal part of the input.
line
- this is the variable that each line will be saved in as they are worked through the loop. You can call this whatever you want. I just used line since it's working through lines of input.
; do
- The semicolon terminates the while setup, and do opens the loop for the commands we want to run using the input saved in line.
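Here's a small sketch of what IFS= and -r each protect against. The sample line is made up: it has two leading spaces and a backslash, which is exactly the kind of input a bare read would quietly alter.

```shell
# Hypothetical input line with leading spaces and a backslash.
sample='  dir one/a\b.html'

# IFS= keeps the leading spaces, -r keeps the backslash.
with_flags="$(printf '%s\n' "$sample" | while IFS= read -r line; do printf '[%s]\n' "$line"; done)"

# A bare read trims the leading spaces and swallows the backslash.
without_flags="$(printf '%s\n' "$sample" | while read line; do printf '[%s]\n' "$line"; done)"

echo "with IFS= and -r: $with_flags"
echo "bare read:        $without_flags"
```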
pad="$(printf '%03d' $num)"
- This is another variable being created using command substitution. What the command in the substitution group does is take the num variable and pad it with zeroes to be a three digit number.
printf '%03d' $num
- This is the command that runs inside the substitution set.
printf
- calls the printf command, which is similar to echo in that it prints output to standard out (the terminal), but printf has more options for manipulating that output.
'%03d' $num
- This is a format specifier that you use with printf, followed by the argument it formats. The % indicates that what follows is a format specifier. The 0 is a flag that says to pad with zeroes instead of the default spaces (it has to be 0 here; printf won't pad with just any character). The 3 is the minimum width, in this case formatting the number to at least three digits, and the d indicates that what's being formatted is an integer. The combined format specifier of %03d will then format the argument that follows it. In this case, the variable num.
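A few printf calls side by side show what each part of the specifier does. This is only an illustration; the variable names are made up.

```shell
zero_padded="$(printf '%03d' 7)"      # 0 flag, width 3
two_digits="$(printf '%03d' 42)"      # still padded to width 3
space_padded="$(printf '%3d' 7)"      # no 0 flag, so spaces are used
too_wide="$(printf '%03d' 1234)"      # width is a minimum, nothing is cut off

printf '%s\n' "$zero_padded" "$two_digits" "[$space_padded]" "$too_wide"
```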
mv "$line" /path/to/dest/"$pad${line##*/}"
- This command actually moves and renames the file.
mv
- Invokes the mv command to move some files
"$line"
- this argument is the file to be moved. This being the variable line, it will be expanded to the full file path of the current html file working through the while loop. It's double quoted so a path containing spaces stays one argument.
/path/to/dest/"$pad${line##*/}"
- This argument is the destination and renaming of the file being moved. The path part is pretty self explanatory. Replace this with the path to your desired destination. The filename bit needs its own explanation.
"$pad${line##*/}"
- This is the bit that renames the file. What this is doing is concatenating (joining) two different variables to create the new filename. $pad is the formatted num variable to result in a zero padded three digit number that will be added at the beginning of the new filename. ${line##*/} is the line variable modified using parameter expansion, which is indicated by the curly braces.
Inside the curly braces you have three parts. The first is the parameter, which in this case is the line variable. This is followed by the special characters to modify the expansion, and then following the modification characters is the pattern to be matched.
Without modification, the line variable will look something like: /full/path/to/file1.html.
For the purposes of this mv command, we don't want the full file path in the destination argument. If the destination argument was /path/to/dest/$pad$line, the expanded result would be /path/to/dest/001/full/path/to/file1.html. That's obviously no good.
What we want here is /path/to/dest/001file1.html.
To get that, we use the ## modification characters, which take the pattern that follows them, match it against the beginning of the parameter, and delete the longest match.
After the ## special characters is the actual pattern to be matched, which is */. That pattern is a shell glob (the same kind of pattern as '*.html', not a regular expression) made up of two characters. The * means to match any number of characters (including no characters at all). The / is the literal forward slash. Both characters together (*/) mean: "Match any characters at all, followed by a forward slash".
When you combine the ## modification characters with the */ pattern like so: ##*/ it means: "Starting from the beginning, find the longest run of characters that ends in a forward slash, which is everything up to and including the last forward slash, and delete it".
When you put all of that together you have ${line##*/}, which will take /full/path/to/file1.html and output file1.html.
Finally, when you combine the pad variable with it like so: "$pad${line##*/}" (enclosed in double quotes to ensure correct expansion), you get 001file1.html as the new name for the html file as it is moved to the new destination.
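To see the parameter expansion on its own, here is a short sketch using the example path from the breakdown. The patterns in these expansions are shell globs, so ## with */ strips the longest prefix ending in a slash. The % and single # forms are extra illustrations that the script itself doesn't use, and the variable names are made up.

```shell
line="/full/path/to/file1.html"
pad="001"

name="${line##*/}"        # longest prefix matching */ removed: the file name
dir="${line%/*}"          # shortest suffix matching /* removed: the directory
short="${line#*/}"        # shortest prefix matching */ removed: just the first slash
with_dot="${line##.*/}"   # .*/ needs a literal leading dot, so nothing is removed
new_name="$pad${line##*/}"

printf '%s\n' "$name" "$dir" "$short" "$with_dot" "$new_name"
```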
((num++))
- Is a nice easy way to increment a number variable in bash
done<<<"$fileList"
- Is three parts. done indicates the end of the while loop. <<< is a here-string, a bash feature that feeds a string into a command's standard input. (It's a relative of the heredoc, written with two less than symbols (<<), which passes a multiline chunk of text typed directly into the script.) To pass the contents of a variable into a command you use three less than symbols (<<<). Finally, "$fileList" is the variable holding the chunk of text we want fed into the while loop (double quoted to ensure proper expansion ignoring spaces and other nonstandard characters).
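If you'd like to try the whole flow somewhere harmless first, here is a rough end-to-end sketch on a scratch tree. It is a variation and not the script from this thread: it feeds the loop from a temporary list file instead of a here-string so it also runs in plain sh, it quotes every expansion, and the src, dest and list names are made up.

```shell
# Scratch source tree and destination with hypothetical names.
src="$(mktemp -d)"
dest="$(mktemp -d)"
list="$(mktemp)"
mkdir "$src/folder1" "$src/folder2"
touch "$src/folder1/index001.html" "$src/folder1/index002.html" "$src/folder2/index001.html"

# Sorted list of full paths, one per line.
find "$src" -type f -name '*.html' | sort > "$list"

num=1
while IFS= read -r line; do
    pad="$(printf '%03d' "$num")"
    # The number goes in front of the bare file name.
    mv -- "$line" "$dest/$pad${line##*/}"
    num=$((num + 1))
done < "$list"

result="$(ls "$dest")"
printf '%s\n' "$result"
```

Both index001.html files survive because each one gets its own number before it lands in the shared folder.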
And that's the script! Let me know how it works for you.
Agree. I think for that a bash script will be a better approach. I'll be off work in a few hours. I'll take a look then.
Okay, for the HTML files, do the integers need to be applied to the files in a certain order, like, alphanumerically from the folders they're coming from?
yes. that's what I suggested.. the question mark was there to ask you if you tried that :-D I'm at work, pretty busy :-D I hope you read the rm manual.
-r means recursive
-f means force, which will delete the files/directories without interaction
Oh I see, lol. Now I'm getting "Cannot remove: No such file or directory" all the way down! The files are there, I see them, they come up in the terminal, but for some reason xargs rm does not want to delete them. When I put the -f flag, rm doesn't give an error but the files are still there! wtf