Chapter 10: Organizing Files

Scene: The “Desktop Disaster”

Chaitanya is staring at his computer screen, looking completely overwhelmed. His main SchoolSystem folder is an absolute disaster zone.

Aditi Ma’am walks up and squints at the screen. “Chaitanya, what happened here? There are hundreds of files just dumped in the root directory.”

Chaitanya: “It’s the script from yesterday! It generated 35 quizzes and 35 answer keys perfectly. But it dropped all 70 files right next to the main Python script. And then I ran the attendance script, and the gradebook script… Now everything is mixed together. I’m spending more time dragging and dropping files into folders than I am actually coding.”

Aditi Ma’am: “You have learned how to create files. Now you must learn how to organize them. Dragging and dropping with your mouse is for beginners. We are going to automate your file management using the shutil module.”

Chaitanya: “What does shutil stand for?”

Aditi Ma’am: “Shell Utilities. It lets Python control your computer’s file explorer. It can copy, move, rename, and delete files at lightning speed.”

Copying Files and Folders (shutil.copy)

Aditi Ma’am: “Let’s start by backing up the Principal’s master attendance file before you accidentally break it. To copy a single file, use shutil.copy(source, destination).”

Python

import shutil, os
from pathlib import Path

# Setup our paths
os.chdir('C:/SchoolSystem')
p = Path.cwd()

# Copy the file to the Backup folder
shutil.copy(p / 'attendance.txt', p / 'Backup')

Chaitanya: “That copied attendance.txt into the Backup folder. What if I want to change the name of the file while I copy it?”

Aditi Ma’am: “If the destination you provide ends in a filename instead of a folder, Python will copy the file and rename it at the same time.”

Python

# Copy and rename in one step
shutil.copy(p / 'attendance.txt', p / 'Backup/attendance_backup_monday.txt')

Chaitanya: “That is incredibly useful for daily backups! But what if I want to copy an entire folder? Like the entire Grade_9 folder, with all its files and subfolders inside?”

Aditi Ma’am:shutil.copy() only works on single files. To copy an entire folder tree, use shutil.copytree().”

Python

# Copy the entire Grade_9 folder and everything inside it
shutil.copytree(p / 'Grade_9', p / 'Grade_9_Backup')

Moving and Renaming (shutil.move)

Chaitanya: “Okay, that handles backups. But I don’t want to copy those 70 geography quizzes; I want to move them out of my main folder and into a Geography_Test folder.”

Aditi Ma’am: “For that, we use shutil.move(source, destination). It deletes the file from the original location and puts it in the new one.”

Python

# Move a quiz file into a specific folder
shutil.move('capitalsquiz_1.txt', 'C:/SchoolSystem/Geography_Test')

Chaitanya: “What happens if there is already a file named capitalsquiz_1.txt inside the Geography_Test folder?”

Aditi Ma’am: “It will overwrite it instantly. No warning popup, no ‘Are you sure?’ message. Just gone. Remember Rule #1: Python assumes you know what you are doing.”

Chaitanya: “Yikes. I need to be careful with that. Can I use move() to rename files, too?”

Aditi Ma’am: “Yes. In fact, Windows doesn’t actually have a separate ‘rename’ command in its core structure. Renaming a file is literally just moving it to the exact same folder, but with a different name.”

Python

# Renaming a file by "moving" it to the same directory
shutil.move('capitalsquiz_1.txt', 'India_Capitals_Quiz_1.txt')

The Danger of Deleting Files permanently

Aditi Ma’am: “Now for the dangerous part. Cleaning up the garbage. There are three functions you can use to delete things in Python, depending on what you are trying to destroy.”

  1. os.unlink(path): Deletes a single file.
  2. os.rmdir(path): Deletes a folder, but only if it is completely empty.
  3. shutil.rmtree(path): The nuclear option. It deletes a folder and absolutely everything inside it.

Chaitanya: “Let me write a loop to delete all those .txt files that end in .bak (backup files). They are taking up too much space.”

Python

import os
from pathlib import Path

for filename in Path.cwd().glob('*.bak'):
    os.unlink(filename)
    print(f'Deleted {filename}')

Aditi Ma’am: “Stop! Before you run that, Chaitanya, look at me. If you run os.unlink(), the files do not go to the Recycle Bin. They do not go to the Trash. They are permanently destroyed from your hard drive.”

Chaitanya: “Oh. So if I make a typo in my loop and tell it to delete *.txt instead of *.bak…”

Aditi Ma’am: “You will instantly and irreversibly delete every single text file in the school system.”

Chaitanya: (slowly pulls his hand away from the Enter key) “How do I test my code safely, then?”

Aditi Ma’am: “Whenever you write a deletion script, you should always comment out the actual delete command and replace it with a print() statement first. This is called a ‘Dry Run’.”

Python

for filename in Path.cwd().glob('*.bak'):
    # os.unlink(filename)  <-- Commented out!
    print(f'Would be deleted: {filename}')

Aditi Ma’am: “Run the Dry Run. Check the printed list. If it looks correct, then—and only then—uncomment the os.unlink() line.”

Safe Deletion with send2trash

Chaitanya: “Even with a Dry Run, I’m terrified of os.unlink(). What if I want to send things to the Recycle Bin so I can restore them if I mess up?”

Aditi Ma’am: “Python’s built-in modules don’t support the Recycle Bin because every operating system handles trash differently. But a brilliant programmer wrote a third-party module to solve this. Open your terminal and type pip install send2trash.”

Chaitanya: (Types it in) “Installed. How do I use it?”

Aditi Ma’am: “Import it, and use it exactly like os.unlink(). But this time, you have a safety net.”

Python

import send2trash

# Create a dummy file
dummy_file = open('garbage.txt', 'w')
dummy_file.write('This is trash.')
dummy_file.close()

# Send it to the Recycle Bin safely
send2trash.send2trash('garbage.txt')

Chaitanya: (Checks his Windows Recycle Bin) “It’s there! I can just right-click and restore it. I am never using os.unlink() again.”

Aditi Ma’am: “A wise choice. send2trash is slightly slower than os.unlink(), but for everyday scripts, the safety is absolutely worth it.”


Aditi Ma’am: “You now know how to copy, move, rename, and safely delete files. But right now, you can only operate on one folder at a time.”

Chaitanya: “What do you mean?”

Aditi Ma’am: “What if the Principal asks you to find every single .pdf file hidden anywhere in the School System, across 50 different nested folders, and copy them to a flash drive?”

Chaitanya: “I would have to write a loop inside a loop inside a loop… checking every folder manually?”

Aditi Ma’am: “No. You use Python’s directory-walking algorithm. In Part 2, I will teach you the legendary os.walk() function. It will allow your scripts to crawl through every folder on your hard drive automatically.”

PART 2

Scene: The “Nested Folder” Nightmare

Chaitanya is clicking rapidly through his file explorer. He opens SchoolSystem, then Grade_10, then Section_A, then Math_Assignments. He sighs and clicks the “Back” arrow three times.

Aditi Ma’am: “You look like a rat in a maze, Chaitanya. What are you looking for?”

Chaitanya: “The Principal just dropped a massive job on me. She needs a copy of every single .pdf syllabus in the entire school database. The problem is, every teacher organizes their folders differently.”

  • Mr. Sharma put his in Grade_9/Science/Syllabus.pdf.
  • Mrs. Gupta put hers in Grade_11/Humanities/History/Term1/syllabus_final.pdf.

Chaitanya: “There are hundreds of nested folders. Do I have to write a for loop, and then put another for loop inside it to check the subfolders, and then another one inside that?”

Aditi Ma’am: “If you do that, your code will look like a staircase to nowhere. And what happens when a teacher creates a folder five levels deep? Your code will miss it.”

Chaitanya: “So how do I search a maze if I don’t know how deep it goes?”

Aditi Ma’am: “You use Python’s built-in bloodhound: os.walk().”

Walking the Directory Tree

Aditi Ma’am: “The os.walk() function is designed to crawl through every single folder, subfolder, and sub-subfolder inside a directory, automatically. You don’t need to know how deep the maze is; os.walk() explores every dead end and hallway for you.”

Aditi Ma’am: “Think of it like a security guard inspecting a building. The guard walks into a room and writes down three things on his clipboard:”

  1. The name of the room he is currently in.
  2. A list of all the doors leading to other rooms.
  3. A list of all the loose papers sitting on the desks.

Chaitanya: “Room, doors, papers.”

Aditi Ma’am: “Exactly. In Python terms, os.walk() is a loop that gives you three values every time it steps into a new folder:”

  1. folderName (a string)
  2. subfolders (a list of strings)
  3. filenames (a list of strings)

Writing the Walk Loop

Chaitanya: “Let me try writing the loop. I want it to start at the very top of the SchoolSystem folder.”

Python

import os

# Start the walk at the root of the school directory
for folderName, subfolders, filenames in os.walk('C:/SchoolSystem'):
    print('The current folder is ' + folderName)

    for subfolder in subfolders:
        print('Inside ' + folderName + ', I found subfolder: ' + subfolder)

    for filename in filenames:
        print('Inside ' + folderName + ', I found file: ' + filename)

    print('---') # Draw a line before walking into the next room

Aditi Ma’am: “Run it.”

Chaitanya: (Hits Enter) “`text The current folder is C:/SchoolSystem Inside C:/SchoolSystem, I found subfolder: Grade_9 Inside C:/SchoolSystem, I found subfolder: Grade_10 Inside C:/SchoolSystem, I found file: master_attendance.csv

The current folder is C:/SchoolSystem/Grade_9 Inside C:/SchoolSystem/Grade_9, I found subfolder: Science Inside C:/SchoolSystem/Grade_9, I found file: roster.txt

The current folder is C:/SchoolSystem/Grade_9/Science Inside C:/SchoolSystem/Grade_9/Science, I found file: Syllabus.pdf Inside C:/SchoolSystem/Grade_9/Science, I found file: Lab_Rules.pdf


**Chaitanya:** "Whoa. It just mapped the entire hard drive structure in a split second. It went into the main folder, checked everything, then automatically stepped into `Grade_9`, checked everything there, and then stepped into `Science`!"

**Aditi Ma'am:** "Yes. And you didn't have to write any nested loops to handle the depth. The single `os.walk()` loop handles the diving and surfacing for you."

### Project: The PDF Extractor

**Chaitanya:** "Okay, I understand how it moves. Now I need to actually solve the Principal's problem. I need to find the `.pdf` files and copy them to a flash drive (let's say drive `F:/Syllabus_Backup`)."

**Aditi Ma'am:** "Take it step-by-step.
1. Walk the tree.
2. Look at the `filenames` list.
3. If a filename ends with `.pdf`, construct its full path.
4. Use `shutil.copy()` to send it to the flash drive."

**Chaitanya:** "Got it. I'll use the `endswith()` string method we learned earlier."

```python
import os, shutil

backup_folder = 'F:/Syllabus_Backup'

# Walk the whole school system
for folderName, subfolders, filenames in os.walk('C:/SchoolSystem'):
    
    # Loop through the files in the current folder
    for filename in filenames:
        
        # Check if it's a PDF
        if filename.endswith('.pdf'):
            
            # Create the full absolute path of the file
            full_path = os.path.join(folderName, filename)
            
            # Copy it to the backup drive
            print(f'Copying {filename} to backup...')
            shutil.copy(full_path, backup_folder)

print('Backup complete.')

Aditi Ma’am: “Notice you used os.path.join(folderName, filename)? Why did you do that?”

Chaitanya: “Because filename is just a string, like 'Syllabus.pdf'. If I tell shutil to copy 'Syllabus.pdf', it will look for it in my Current Working Directory and crash. I have to glue it to folderName so shutil knows exactly which room in the maze the file is sitting in.”

Aditi Ma’am: “Perfect. You avoided the most common mistake beginners make with os.walk().”

Chaitanya: (Runs the script) “It found 142 PDFs across 50 different folders and copied them all to the flash drive in about two seconds. The Principal is going to be thrilled.”

Modifying the Walk (The Skip Trick)

Aditi Ma’am: “One last trick. What if you want to skip an entire folder? Say you have a massive folder called Old_Archives that has 10,000 files in it, and you don’t want to waste time searching it.”

Chaitanya: “Can I just tell it not to go in there?”

Aditi Ma’am: “Yes. Because subfolders is a standard Python list, you can actually modify it during the walk. If you remove a folder name from the subfolders list, os.walk() will completely ignore that branch of the maze.”

Python

for folderName, subfolders, filenames in os.walk('C:/SchoolSystem'):
    
    # If we see 'Old_Archives', remove it from the list of doors to open
    if 'Old_Archives' in subfolders:
        subfolders.remove('Old_Archives')
        print('Skipping the archives...')
        
    # ... rest of the code ...

Chaitanya: “That is powerful. I have complete control over where the script goes.”


Aditi Ma’am: “You have mastered moving, deleting, and searching through thousands of files. But the Principal’s flash drive only has 2 Gigabytes of space left. If you need to send her massive databases, copying raw files won’t work.”

Chaitanya: “I have to shrink them.”

Aditi Ma’am: “Exactly. In Part 3, I will teach you how to use the zipfile module to compress hundreds of files into tiny, easily emailable ZIP archives, directly from your Python script.”

PART 3

Scene: The “Out of Space” Error

Chaitanya is staring at a Windows error popup: Not enough space on drive F:/

Chaitanya: “Ma’am, the PDF extraction script worked perfectly, but the Principal’s flash drive is ancient. It only has 2GB of space. The 142 PDFs take up 3GB. I can’t fit them all on there, and they are too big to email.”

Aditi Ma’am: “Then you must shrink them. When you pack for a vacation and your suitcase won’t close, what do you do?”

Chaitanya: “I sit on it. Or use those vacuum-seal bags to suck the air out.”

Aditi Ma’am: “Exactly. In the digital world, sitting on your data is called Compression, and the vacuum-seal bag is a ZIP file. Python can create, read, and unpack these bags using the zipfile module.”

Reading ZIP Files

Aditi Ma’am: “Before we make a ZIP file, you need to know how to handle one. Suppose the Principal hands you a file called Archived_Exams.zip. You want to know what’s inside without actually unpacking the whole thing on your hard drive.”

Chaitanya: “Like looking through the clear plastic of the vacuum bag?”

Aditi Ma’am: “Precisely. We use zipfile.ZipFile() to open the bag, and the namelist() method to look inside.”

Python

import zipfile, os
from pathlib import Path

# Move to the folder with the ZIP file
os.chdir('C:/SchoolSystem')

# 1. Open the ZIP file object
exampleZip = zipfile.ZipFile('Archived_Exams.zip')

# 2. Look at the names of the files inside
print(exampleZip.namelist())

Output: ['Math_2024.pdf', 'Science_2024.pdf', 'History_2024.pdf']

Aditi Ma’am: “You can even check how much space the compression actually saved using the getinfo() method. It gives you a ZipInfo object containing the original file size and the new, squished file size.”

Python

examInfo = exampleZip.getinfo('Math_2024.pdf')

print(f'Original size: {examInfo.file_size} bytes')
print(f'Compressed size: {examInfo.compress_size} bytes')

# Calculate the efficiency
ratio = round(examInfo.file_size / examInfo.compress_size, 2)
print(f'Compressed file is {ratio}x smaller!')

Chaitanya: “Wow. 4x smaller? That means my 3GB of PDFs could shrink to under 1GB. They’ll easily fit on the flash drive!”

Extracting from ZIP Files

Chaitanya: “Okay, if I need to unpack Archived_Exams.zip, do I use shutil?”

Aditi Ma’am: “No, shutil moves folders, but it doesn’t unzip them. The ZipFile object has its own methods for that: extractall().”

Python

# Unpack everything into a new folder called 'Unpacked_Exams'
exampleZip.extractall('C:/SchoolSystem/Unpacked_Exams')
exampleZip.close()

Aditi Ma’am: “If the Unpacked_Exams folder doesn’t exist, Python will automatically create it for you and dump all the files inside. If you only wanted to pull out one specific file, you would use exampleZip.extract('Math_2024.pdf').”

Creating and Adding to ZIP Files

Chaitanya: “Alright, I’m ready to create my own ZIP file for the Principal’s PDFs. How do I make the vacuum bag?”

Aditi Ma’am: “Remember the File I/O dance from Chapter 9? Open, Write, Close. Creating a ZIP file works exactly the same way. You must open the ZipFile object in Write Mode ('w').”

Python

import zipfile

# 1. Create a brand new ZIP file (Write Mode)
newZip = zipfile.ZipFile('Syllabus_Backup.zip', 'w')

Aditi Ma’am: “Now, you add files to it using the write() method. But here is the critical part, Chaitanya. You must tell Python how to compress the data. Pass compress_type=zipfile.ZIP_DEFLATED to use the standard compression algorithm.”

Python

# 2. Add files to the ZIP and compress them
newZip.write('Syllabus_Science.pdf', compress_type=zipfile.ZIP_DEFLATED)
newZip.write('Syllabus_Math.pdf', compress_type=zipfile.ZIP_DEFLATED)

# 3. Close the bag!
newZip.close()

Chaitanya: “What if I forget ZIP_DEFLATED?”

Aditi Ma’am: “Then Python will put the files into the ZIP folder, but it won’t actually squish them. It will just be a regular folder with a .zip extension. It won’t save you any disk space.”

Chaitanya: “And if I want to add another PDF to Syllabus_Backup.zip tomorrow? Do I use 'w' again?”

Aditi Ma’am: “If you use 'w', it will overwrite the whole ZIP file and destroy yesterday’s work! To add a new file to an existing ZIP, open it in Append Mode ('a').”

The Grand Project: Automated Versioned Backups

Aditi Ma’am: “Let’s put Chapters 9 and 10 together into a master script. The Principal wants a backup of the entire Grade_10 folder. But she wants to run the script every Friday. If the script always names the file Backup.zip, it will overwrite last week’s backup. We need the script to automatically figure out the next version number—like Backup_1.zip, Backup_2.zip—so we never lose history.”

Chaitanya: “That requires logic. I have to check the hard drive to see what numbers already exist.”

Aditi Ma’am: “Exactly. Here is how a professional sets up an automated backup.”

Step 1: Figure out the ZIP filename

Python

import zipfile, os

def backupToZip(folder):
    # Make sure we have the absolute path
    folder = os.path.abspath(folder) 
    
    # Figure out what number to use for the backup file
    number = 1
    while True:
        zipFilename = f'{os.path.basename(folder)}_{number}.zip'
        if not os.path.exists(zipFilename):
            break # We found a number that doesn't exist yet!
        number = number + 1

Chaitanya: “Clever. If Grade10_1.zip exists, the while loop bumps number to 2. If Grade10_2.zip exists, it bumps to 3. It only breaks the loop when it finds an empty slot.”

Step 2: Create the ZIP and Walk the Directory Aditi Ma’am: “Now we create the ZIP file, and use os.walk() to crawl through the Grade_10 folder and add every single file we find to the vacuum bag.”

Python

    print(f'Creating {zipFilename}...')
    backupZip = zipfile.ZipFile(zipFilename, 'w')
    
    # Walk the entire folder tree
    for folderName, subfolders, filenames in os.walk(folder):
        print(f'Adding files in {folderName}...')
        # Add the current folder to the ZIP
        backupZip.write(folderName)
        
        # Add all the files in this folder to the ZIP
        for filename in filenames:
            # Don't backup the backup ZIP files themselves!
            if filename.startswith(os.path.basename(folder) + '_') and filename.endswith('.zip'):
                continue 
            
            # Write the file to the ZIP, applying compression
            backupZip.write(os.path.join(folderName, filename), compress_type=zipfile.ZIP_DEFLATED)
            
    backupZip.close()
    print('Backup completed successfully.')

# Run the function on the Grade 10 folder
backupToZip('C:/SchoolSystem/Grade_10')

Chaitanya: (Runs the script) “`text Creating Grade_10_1.zip… Adding files in C:/SchoolSystem/Grade_10… Adding files in C:/SchoolSystem/Grade_10/Math… Adding files in C:/SchoolSystem/Grade_10/Science… Backup completed successfully.


**Chaitanya:** "I ran it again, and it instantly created `Grade_10_2.zip`. It backed up hundreds of files in seconds, compressed them, and versioned them automatically. I can just schedule this script to run every Friday and never worry about backups again."

**Aditi Ma'am:** "You have graduated from simply writing code, Chaitanya. You are now automating the boring stuff."

### Summary Box (Chapter 10)

* **Copying:** `shutil.copy(source, destination)` for files, `shutil.copytree()` for entire folders.
* **Moving/Renaming:** `shutil.move(source, destination)`.
* **Safe Deletion:** Install and use `send2trash.send2trash(path)` to avoid permanent loss.
* **Tree Traversal:** `os.walk(path)` returns the current folder, a list of subfolders, and a list of filenames for *every* directory in the tree.
* **Compression:** `zipfile.ZipFile('name.zip', 'w')` creates archives. Use `compress_type=zipfile.ZIP_DEFLATED` to shrink file sizes.
* **Extraction:** `ZipFile.extractall()` unpacks everything.

**Aditi Ma'am:** "You now control the file system. In the next phase of your training, we will stop looking at the files themselves, and start looking at the *data* inside them. Next up is **Web Scraping**—how to pull data straight off the internet without a browser."

***

Leave a Comment

💬 Join Telegram