Simplest way to move files between VM and Google drive

Pradeep Vanga
3 min read · Jan 26, 2022

In this short post, I am going to explain a simple way to move files between a VM (such as an EC2 instance or a GCP VM) and Google Drive without having to write any code or leverage any Drive APIs, using just a few Linux CLI commands.

This approach is something I came up with while looking for ways to store deep learning training outputs on Drive when running training jobs on GCP spot instances. I tried tools like rclone and explored the Drive REST APIs before realizing that I could do this with simple Linux CLI tools.

This involves using Google Colab, a service from Google that lets us write Python code and execute it in the browser, based on Jupyter notebooks. It is targeted at the machine learning community, giving free access to GPUs, but one can also run arbitrary Linux commands in it.

Google Colab has support for mounting Google Drive as a virtual directory, so the files stored on Drive can be accessed just like any other file on the file system. This is enabled by a pip package that Google has made available (which only works inside Google Colab, hence the need to use Colab).

rsync is a Linux CLI tool to synchronize files across computers/drives.

By leveraging these, we can easily set up SSH access between Colab and the VM and be able to sync files between them.

Leveraging Colab to sync files between Drive and a VM involves the following steps.

  • Create/download the SSH private key to Colab
  • Mount Google Drive in Colab
  • Add the SSH public key to the VM's authorized keys file (~/.ssh/authorized_keys)
  • Make sure rsync is installed on the VM
  • Use rsync to sync files between the Colab VM and Drive.
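The first step, getting the private key onto Colab with permissions SSH will accept, can be sketched as follows. This is a self-contained sketch: a temp directory stands in for Colab's ~/.ssh, and a throwaway key is generated in place of the one you would copy from the mounted drive folder.

```shell
# Stand-in for the home .ssh directory; on Colab this would be /root/.ssh
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"

# On Colab you would copy the stored key from the drive mount instead, e.g.
#   cp /content/drive/MyDrive/file-sync-tutorial/id_rsa "$SSH_DIR/id_rsa"
# Here we generate a throwaway key so the sketch runs anywhere.
ssh-keygen -b 2048 -t rsa -f "$SSH_DIR/id_rsa" -q -N ""

# SSH refuses private keys that are readable by other users
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/id_rsa"

ls -l "$SSH_DIR"
```

The chmod calls matter: ssh will silently ignore a key file with loose permissions.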

This is a neat trick that is useful if you manage your content in Google Drive and want to use VMs from cloud providers for compute. If you are a hobbyist or a small company with a G Suite business subscription, using Drive as storage is a good fit. It can be especially useful with the cloud providers' spot instances, which are 80–90% cheaper than regular VMs but carry a risk of data loss, since they can be terminated at any time depending on demand.

Here is an example Colab notebook to try out.

Mount Google Drive

from google.colab import drive
drive.mount("/content/drive")

With this, we can access the files from drive like any other file on a server.
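To illustrate what "like any other file" means, the sketch below performs ordinary shell file operations on a directory; a temp directory stands in for the /content/drive/MyDrive mount point, since the real mount only exists inside Colab.

```shell
# Stand-in for the mount point; on Colab this would be /content/drive/MyDrive
DRIVE="$(mktemp -d)"

# Plain reads and writes work directly on the mounted path --
# no Drive API calls needed
echo "hello from colab" > "$DRIVE/notes.txt"
cat "$DRIVE/notes.txt"
```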

Creating an SSH key

To be able to do this seamlessly, we need to create an SSH key without a passphrase and store the private key in Drive itself, so that it can be copied to Colab as part of the notebook without having to create it every time.

ssh-keygen -b 2048 -t rsa -f /content/drive/MyDrive/file-sync-tutorial/id_rsa -q -N ""

This generates two files: id_rsa, the private key that needs to be present on Colab, and id_rsa.pub, the public key that needs to be configured on the target VM (on GCP, AWS, etc.).
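Configuring the public key on the VM means appending it to ~/.ssh/authorized_keys there. The sketch below shows the shape of that step using temp files as stand-ins for the VM's .ssh directory, with a throwaway keypair generated in place of the one stored on Drive.

```shell
# Temp stand-in: on the real VM this directory is ~/.ssh
VM_SSH="$(mktemp -d)"
AUTH_KEYS="$VM_SSH/authorized_keys"

# Throwaway keypair; in practice you would use the id_rsa.pub from above
ssh-keygen -b 2048 -t rsa -f "$VM_SSH/id_rsa" -q -N ""

# Append (>>), not overwrite, so any existing keys keep working
cat "$VM_SSH/id_rsa.pub" >> "$AUTH_KEYS"
chmod 600 "$AUTH_KEYS"

grep -c "ssh-rsa" "$AUTH_KEYS"
```

On GCP and AWS this can also be done through the console's SSH-key metadata instead of editing the file by hand.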

Rsync to sync files

Copying files from the Cloud VM to Google Drive

rsync -a ssh-username@your-server-ip:/home/ubuntu/ /content/drive/MyDrive/destination/

Run rsync every 120 seconds to keep the files synchronized.

watch -n 120 rsync -a ssh-username@your-server-ip:/home/ubuntu/ /content/drive/MyDrive/destination/

Note that even a Colab notebook can get terminated at any time, so relying on the watch command to synchronize files over a long period is not viable.

We can also do tricks like this to zip/unzip files stored on Drive, since uploading/downloading a large number of small files to Drive is painful and not resilient. We can instead use Colab to do the zipping/unzipping of the files stored on Drive.
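The pattern looks like the sketch below. It uses tar, which ships everywhere, as a stand-in for zip/unzip (which Colab also provides and works the same way), and a temp directory as a stand-in for a drive folder full of small files.

```shell
# Stand-in for a drive folder with many small files
DATA="$(mktemp -d)"
for i in 1 2 3; do echo "sample $i" > "$DATA/file_$i.txt"; done

# Pack the small files into one archive -- one large upload/download
# is far more reliable against Drive than thousands of tiny ones
tar -czf "$DATA.tar.gz" -C "$DATA" .

# Unpack into a fresh directory, as Colab would into the drive mount
OUT="$(mktemp -d)"
tar -xzf "$DATA.tar.gz" -C "$OUT"

ls "$OUT"
```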
