Was just working on some server cleanup for a client and thought that this might be a handy tip for anyone out there in a similar situation...
Here's the problem: I have a directory on a server that has lots (i.e. thousands) of subdirectories. I need to copy those up to an S3 backup bucket - organized by year and month - and then delete them.
The tricky bit is they're not organized by month/year right now. It's just a giant mess of subdirectories without much of a coherent naming strategy.
So, here's what I've come up with so far.
First, I generate a text file listing all the directories, filtering through grep for the year I'm interested in:
ls -ltr | grep 2019 > list.txt
This gives me a list of all the directory entries with '2019' in them, which should catch them all since 'ls -l' prints the modification date - the year shows up in the date column even when it isn't in the name. (One caveat: ls only prints the year for entries older than about six months; more recent entries show a time instead.)
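For example, with a completely made-up directory name, a line in list.txt looks roughly like this:

drwxr-xr-x  2 deploy deploy 4096 Apr 12  2019 client-export-a1b2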
Next, I open that file in vim, pull out the lines for the month I'm backing up, and use a regex to remove the leading bit of each line that shows the permissions, owner, size, date, etc.
:% s/.*2019 //g
So now I have a list of just the directory names, one per line.
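Using that same made-up example from above, after the substitution that line is down to just:

client-export-a1b2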
Then I use another regex to turn each line into an aws cli command that copies that directory up to the bucket. The command looks something like this:
aws s3 cp --recursive [directory] s3://[bucket]/2019/04/[directory]
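Filled in with the example directory and a made-up bucket name, that works out to:

aws s3 cp --recursive client-export-a1b2 s3://my-backup-bucket/2019/04/client-export-a1b2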
The vim regex looks like this:
:% s/\(.*\)/[s3 command from above]\/\1/g
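Written out in full with those same made-up names (and using # as the substitution delimiter so the slashes in the S3 URL don't need escaping), it's something like:

:%s#\(.*\)#aws s3 cp --recursive \1 s3://my-backup-bucket/2019/04/\1#

Note that \1 shows up twice - once as the local source directory and once at the end of the destination path.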
Once that's done, I have a big old text file of aws cli commands. I quit out of vim, make that file executable, and run it on the command line.
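Assuming the file is still named list.txt, that last bit is just:

chmod +x list.txt
./list.txt

(or skip the chmod and just run 'bash list.txt').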
To remove the directories after the copy finishes, I use basically the exact same process - except that the last bit generates the appropriate rm -rf command for each directory - and then execute that file.
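Starting again from the list of bare directory names, the substitution for that pass is something along the lines of:

:%s/\(.*\)/rm -rf \1/

and then it's the same make-it-executable-and-run routine as before.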
I'm sure there are better/smoother ways to do this, but this one works pretty well for my needs. One thing I need to remember to do next time I kick off a command like this is to run it in a screen session in case of connection issues. That's a post for another day!
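(For the impatient: starting a named session is just 'screen -S backup', you detach with Ctrl-a d, and reattach later with 'screen -r backup'.)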