Alibaba Cloud - Cloud Disks Backup and Restore

Creating virtual machines on the cloud, regardless of the platform, is pretty easy. It is so easy that you might jump in and spin up machines without any prior planning. The advantage of the cloud is that you can get started without worrying about initial costs, and scale as your business grows. But this does not mean that you don’t have to plan before getting started. Each cloud provider has its own way of providing management features for its products. You should take the time to learn them, and design not only for your application but also for operating your resources once your services go live. The operation and maintenance part is easily overlooked, but I think it is important to think it through during your architecture design process. Operation and maintenance is a wide topic, especially for large-scale infrastructures, so today I will focus on disk and data management on Alibaba Cloud.

What is a Cloud Disk

From the Alibaba Cloud product documentation:

Cloud Disk, which is a block-level data storage product provided by Alibaba Cloud for ECS, uses a multiple distributed system, and features low latency, high performance, persistence, high reliability, and more. Cloud disks can be created, resized, and released at any time.

When you create virtual machines with ECS, cloud disks are basically the disk devices of your virtual machines. Every virtual machine that you create must have at least one cloud disk attached, which is used as the system disk. In addition, you can attach up to 16 additional cloud disks as data disks. The system disk is tied to the virtual machine and cannot be detached at will. Data disks, on the other hand, can be attached to and detached from any virtual machine. Other notes:

  • A cloud disk cannot be attached to multiple virtual machines at the same time, so it does not support simultaneous reads and writes from multiple virtual machines. For this kind of need, you’d have to rely on other storage products such as Object Storage Service or Network Attached Storage.
  • The capacity of a cloud disk ranges from 20GiB up to 32,768GiB. When a cloud disk is used as a system disk, the minimum size depends on the operating system you choose for the virtual machine: 20GiB for Linux and 30GiB for Windows.
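The limits above can be checked in a script before calling any API. The following POSIX shell sketch is illustrative only: the function names are hypothetical, and it encodes just the numbers quoted in this section (20 to 32,768 GiB, a 20GiB Linux or 30GiB Windows system-disk minimum, and at most 16 data disks), so the real limits for a given disk category may differ.

```shell
# Hypothetical helpers that encode the limits described above.
# check_disk_size SIZE_GIB ROLE [OS]: ROLE is "system" or "data";
# OS ("linux" or "windows") only matters for system disks.
check_disk_size() {
  size=$1 role=$2 os=${3:-linux}
  min=20
  # A system disk must also fit the OS: 30 GiB minimum for Windows.
  if [ "$role" = "system" ] && [ "$os" = "windows" ]; then
    min=30
  fi
  # 32768 GiB is the general upper bound quoted above; actual limits may be
  # lower depending on disk category and role.
  if [ "$size" -lt "$min" ] || [ "$size" -gt 32768 ]; then
    echo "invalid: ${size}GiB $role disk (allowed: $min-32768 GiB)" >&2
    return 1
  fi
  echo "ok: ${size}GiB $role disk"
}

# check_data_disk_count COUNT: at most 16 data disks per instance.
check_data_disk_count() {
  if [ "$1" -gt 16 ]; then
    echo "invalid: $1 data disks (maximum is 16)" >&2
    return 1
  fi
}
```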

Backup and Restore

Snapshots

Before talking about backup and restore of cloud disks on Alibaba Cloud, we must understand snapshots, as they are the main feature that allows automatic backups of cloud disks. In a nutshell, a snapshot is a copy of the data on a cloud disk at a specified point in time. The snapshot feature also lets you set automatic snapshot policies, which means you can easily set a backup policy on any cloud disk that you deploy on Alibaba Cloud.
There are two ways to create snapshots:

  • manual snapshot: at any point in time, you can manually create a snapshot of a cloud disk.
  • automatic snapshot: automatically created based on a snapshot policy.

Snapshot creation may slightly decrease I/O performance: according to Alibaba Cloud’s official documentation, I/O performance can drop by up to 10% while a snapshot is being created. Backup policies depend on things such as the best practices or security guidelines of your organization, but you must also consider this performance impact when designing your backup policy. For example, if your cloud disk stores database data, you should avoid creating snapshots at times when there is a high volume of database transactions, as degraded disk I/O may hurt database performance.
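One way to follow this advice is to work out which hours fall outside your peak window before choosing snapshot times. This is a hypothetical sketch, not an Alibaba Cloud feature: it assumes your busy period can be described as a single window from START (inclusive) to END (exclusive), possibly wrapping past midnight, and that you express it in the same time zone as the policy's time points.

```shell
# Hypothetical helper: succeed if HOUR (0-23) falls inside the busy window
# [START, END). Windows that wrap past midnight (e.g. 22 -> 6) are supported.
in_busy_window() {
  hour=$1 start=$2 end=$3
  if [ "$start" -le "$end" ]; then
    [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
  else
    # Wrapping window: busy from START to midnight and from midnight to END.
    [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
  fi
}

# Print the hours of the day outside the busy window, one per line.
# Usage: quiet_hours START END
quiet_hours() {
  h=0
  while [ "$h" -le 23 ]; do
    in_busy_window "$h" "$1" "$2" || echo "$h"
    h=$((h + 1))
  done
}
```

For example, `quiet_hours 8 20` lists 0 to 7 and 20 to 23 as candidate snapshot hours for a database that is busiest during the day.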

Making use of snapshots

The last thing that we need to understand before getting started is how we can make use of snapshots. There are 2 ways to use a snapshot:

  1. Restore data to a cloud disk.
  2. Create a custom image. Images and custom images are out of the scope of this post, but I’ll give a brief introduction anyway. An image is a running environment template for ECS instances. Images allow the provisioning of ECS instances with a preloaded operating system, optionally with additional software and configurations. There are three types of images that you can use on Alibaba Cloud:

    • public images, which are officially provided by Alibaba Cloud
    • custom images, which are created by the end users
    • marketplace images, which are created by third-party providers and made available through the Alibaba Cloud Marketplace.

Restoring data to a cloud disk from a snapshot is an easy-to-understand concept. But what can we do with custom images in terms of backup and restore, and why? I’ll come back to this later when we talk about the details of data restoration.

Backing up your disks

As described in the previous sections, backing up a disk essentially means creating a snapshot of it. You can create a disk snapshot manually at any time, or set up a schedule for automatic snapshots. We will walk through both methods.

Manual snapshot

A walkthrough with a newly created disk.

Verify the target disk ID

$ aliyun ecs DescribeDisks --output cols=DiskId
DiskId
------
d-6wecl9xdfms5r1vxml07

Verify current snapshots of the target disk

$ aliyun ecs DescribeSnapshots --DiskId d-6wecl9xdfms5r1vxml07
{
	"PageNumber": 1,
	"TotalCount": 0,
	"PageSize": 10,
	"RequestId": "BABE43C3-B759-494D-AB2E-8C4C8CEBC71F",
	"Snapshots": {
		"Snapshot": []
	}
}

At this moment, no snapshot exists for this disk. We can create one manually with the CreateSnapshot API:

$ aliyun ecs CreateSnapshot --DiskId d-6wecl9xdfms5r1vxml07 --SnapshotName d-6wecl9xdfms5r1vxml07_initial_backup --Description "Very first backup"
{
	"RequestId": "7469877A-AF3D-47E8-8E5B-22509EDA4D9A",
	"SnapshotId": "s-6we4k062vs82hmgt90pv"
}

Let’s verify:

$ aliyun ecs DescribeSnapshots --DiskId d-6wecl9xdfms5r1vxml07
"Snapshots": {
	"Snapshot": [
		{
			"Description": "Very first backup",
			"SnapshotName": "d-6wecl9xdfms5r1vxml07_initial_backup",
			"ProductCode": "",
			"Encrypted": false,
			"SourceDiskSize": 40,
			"Progress": "100%",
			"Usage": "none",
			"CreationTime": "2019-02-24T04:11:40Z",
			"SourceStorageType": "disk",
			"Tags": {
				"Tag": []
			},
			"Status": "accomplished",
			"KMSKeyId": "",
			"SourceDiskType": "system",
			"SourceDiskId": "d-6wecl9xdfms5r1vxml07",
			"SnapshotId": "s-6we4k062vs82hmgt90pv"
		}
	]
}
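Snapshot creation takes time, as the Progress and Status fields above suggest, so a script that continues after CreateSnapshot may need to wait for completion. A minimal sketch, assuming DescribeSnapshots accepts a --SnapshotIds JSON array and that --output cols=Status prints a header, a separator line and then the value, as in the DescribeDisks and DescribeInstances examples in this post. The helper name and polling defaults are hypothetical.

```shell
# Hypothetical helper: poll DescribeSnapshots until a snapshot reaches the
# "accomplished" status, fail fast on "failed", or give up after MAX_TRIES.
# Assumes --output cols=Status prints a header, a separator, then the value.
# Usage: wait_for_snapshot SNAPSHOT_ID [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_snapshot() {
  snap=$1 tries=${2:-60} interval=${3:-10}
  n=0
  while [ "$n" -lt "$tries" ]; do
    snap_status=$(aliyun ecs DescribeSnapshots --SnapshotIds "[\"$snap\"]" \
      --output cols=Status | tail -n 1 | tr -d ' \t')
    case $snap_status in
      accomplished) echo "$snap accomplished"; return 0 ;;
      failed) echo "$snap failed" >&2; return 1 ;;
    esac
    n=$((n + 1))
    sleep "$interval"
  done
  echo "$snap not accomplished after $tries checks" >&2
  return 1
}
```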

In my view, a manual snapshot is like a checkpoint. You could use it in the following cases:

  • After initial setup of your ECS instance
  • Before deploying new code, or applying a major change to your applications or configurations
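A small wrapper can keep these checkpoints consistent. This sketch (create_checkpoint is a hypothetical name) only wraps the CreateSnapshot call from the walkthrough and reuses its <DiskId>_<label> naming; that convention is a choice made in this post, not something the API requires.

```shell
# Hypothetical wrapper around the CreateSnapshot call from the walkthrough.
# It reuses the "<DiskId>_<label>" naming so that every checkpoint can be
# traced back to its disk.
# Usage: create_checkpoint DISK_ID LABEL DESCRIPTION
create_checkpoint() {
  if [ $# -ne 3 ]; then
    echo "usage: create_checkpoint DISK_ID LABEL DESCRIPTION" >&2
    return 2
  fi
  aliyun ecs CreateSnapshot \
    --DiskId "$1" \
    --SnapshotName "${1}_$2" \
    --Description "$3"
}

# Example: a checkpoint before a deployment (label and text are made up).
# create_checkpoint d-6wecl9xdfms5r1vxml07 before_v2_deploy "Before deploying v2"
```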

As your application keeps running and new data is written to your disk over time, you would not want to keep creating snapshots manually. You could of course set up a cron job that runs the command periodically, but the automatic snapshot feature is available and is the right way to go.

Automatic snapshot

Creating snapshots automatically for a given disk takes just two steps:

  1. Create an automatic snapshot policy
  2. Apply the automatic snapshot policy to the target disk

An automatic snapshot policy is like a backup policy. There are two things that you need to determine before creating the policy:

  • The backup schedule: a combination of weekdays and time points for creating automatic snapshots. You can have a policy that takes snapshots every hour of every day, or at a specific hour on specific weekdays. Example: create a snapshot weekly at 00:00 on Sundays.
  • The retention time of the snapshots: how long do you want to keep your backups? You can keep snapshots permanently, or set a retention period between 1 and 65536 days.
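Both decisions can be sanity-checked before creating a policy. The hypothetical check_policy function below takes space-separated weekdays and hours plus a retention value, and validates them against the ranges used in this post: weekdays 1 to 7, hours 0 to 23, and a retention of 1 to 65536 days, or -1 to keep snapshots permanently.

```shell
# Hypothetical helper: succeed only for a non-empty string of digits.
is_uint() {
  case $1 in ''|*[!0-9]*) return 1 ;; esac
}

# Usage: check_policy "WEEKDAYS" "HOURS" RETENTION_DAYS
#   e.g. check_policy "1 2 3 4 5 6 7" "0" 30
check_policy() {
  for d in $1; do
    if ! is_uint "$d" || [ "$d" -lt 1 ] || [ "$d" -gt 7 ]; then
      echo "invalid weekday: $d" >&2; return 1
    fi
  done
  for h in $2; do
    if ! is_uint "$h" || [ "$h" -gt 23 ]; then
      echo "invalid hour: $h" >&2; return 1
    fi
  done
  r=$3
  # -1 means "keep permanently"; anything else must be 1..65536 days.
  if [ "$r" != "-1" ]; then
    if ! is_uint "$r" || [ "$r" -lt 1 ] || [ "$r" -gt 65536 ]; then
      echo "invalid retention: $r" >&2; return 1
    fi
  fi
}
```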

Creating an auto snapshot policy

You can do so with the CreateAutoSnapshotPolicy API:

$ aliyun ecs CreateAutoSnapshotPolicy --regionId ap-northeast-1 --repeatWeekdays '["1", "2", "3", "4", "5", "6", "7"]' --timePoints '["0"]' --retentionDays 30 --autoSnapshotPolicyName dailybackup
{
	"RequestId": "91972457-966B-43FB-A5EA-41C81E0DAAFF",
	"AutoSnapshotPolicyId": "sp-6we4k062vs82i66zyj8n"
}

The policy above automatically creates a snapshot daily at 00:00. We could change it to an hourly snapshot by changing the --timePoints parameter to ["0", "1", ..., "23"]. Note that --retentionDays is 30 days in this example; you can keep snapshots permanently by setting it to -1. Let’s verify our policy:
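Typing the full 24-element hour list by hand is error-prone. A hypothetical to_json_array helper can generate the JSON string arrays that --repeatWeekdays, --timePoints and --diskIds take; the commented command shows how it might be plugged into the CreateAutoSnapshotPolicy call above (the policy name hourlybackup is made up).

```shell
# Hypothetical helper: print its arguments as a JSON array of strings,
# e.g. to_json_array 0 1 -> ["0", "1"]
to_json_array() {
  out="["
  sep=""
  for item in "$@"; do
    out="$out$sep\"$item\""
    sep=", "
  done
  printf '%s]\n' "$out"
}

# Example: an hourly policy that keeps snapshots for 30 days.
# aliyun ecs CreateAutoSnapshotPolicy --regionId ap-northeast-1 \
#   --repeatWeekdays "$(to_json_array 1 2 3 4 5 6 7)" \
#   --timePoints "$(to_json_array $(seq 0 23))" \
#   --retentionDays 30 --autoSnapshotPolicyName hourlybackup
```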

$ aliyun ecs DescribeAutoSnapshotPolicyEx --AutoSnapshotPolicyId sp-6we4k062vs82i66zyj8n
{
	"PageNumber": 1,
	"TotalCount": 1,
	"AutoSnapshotPolicies": {
		"AutoSnapshotPolicy": [
			{
				"CreationTime": "2019-02-24 12:54:46",
				"Status": "Normal",
				"DiskNums": 0,
				"AutoSnapshotPolicyName": "dailybackup",
				"RegionId": "ap-northeast-1",
				"VolumeNums": 0,
				"RetentionDays": 30,
				"TimePoints": "[\"0\"]",
				"AutoSnapshotPolicyId": "sp-6we4k062vs82i66zyj8n",
				"RepeatWeekdays": "[\"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\"]"
			}
		]
	},
	"PageSize": 10,
	"RequestId": "3256603C-7B7E-46C9-AB7F-A9BF9173E2B9"
}

In the console, it looks like this: [console screenshot]

Apply the auto snapshot policy to the target disk

We can do this with the ApplyAutoSnapshotPolicy API:

$ aliyun ecs ApplyAutoSnapshotPolicy --autoSnapshotPolicyId sp-6we4k062vs82i66zyj8n --regionId ap-northeast-1 --diskIds '["d-6wecl9xdfms5r1vxml07"]'
{
	"RequestId": "8AC003A1-B1C1-4FBA-9E81-39D5208E73B9"
}
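The --diskIds parameter is a JSON array, so a single call can bind the same policy to several disks. A sketch with a hypothetical apply_policy function that builds that array from its arguments; the policy, region and disk IDs you pass are your own.

```shell
# Hypothetical helper: apply one automatic snapshot policy to several disks in
# a single ApplyAutoSnapshotPolicy call by building the --diskIds JSON array.
# Usage: apply_policy POLICY_ID REGION_ID DISK_ID [DISK_ID ...]
apply_policy() {
  if [ $# -lt 3 ]; then
    echo "usage: apply_policy POLICY_ID REGION_ID DISK_ID [DISK_ID ...]" >&2
    return 2
  fi
  policy=$1 region=$2
  shift 2
  ids=""
  for disk in "$@"; do
    # Add a ", " separator before every ID except the first.
    ids="$ids${ids:+, }\"$disk\""
  done
  aliyun ecs ApplyAutoSnapshotPolicy \
    --autoSnapshotPolicyId "$policy" \
    --regionId "$region" \
    --diskIds "[$ids]"
}
```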

You can now verify by checking the policy and the disk description:

$ aliyun ecs DescribeAutoSnapshotPolicyEx --AutoSnapshotPolicyId sp-6we4k062vs82i66zyj8n
{
	"PageNumber": 1,
	"TotalCount": 1,
	"PageSize": 10,
	"AutoSnapshotPolicies": {
		"AutoSnapshotPolicy": [
			{
				"CreationTime": "2019-02-24 12:54:46",
				"Status": "Normal",
				"DiskNums": 1,
				"AutoSnapshotPolicyName": "dailybackup",
				"RegionId": "ap-northeast-1",
				"VolumeNums": 0,
				"RetentionDays": 30,
				"TimePoints": "[\"0\"]",
				"AutoSnapshotPolicyId": "sp-6we4k062vs82i66zyj8n",
				"RepeatWeekdays": "[\"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\"]"
			}
		]
	},
	"RequestId": "D9610639-B144-43E1-875E-65D3383027DF"
}

"DiskNums": 1 tells us that the policy has been applied to one disk.

$ aliyun ecs DescribeDisks --DiskIds '["d-6wecl9xdfms5r1vxml07"]'
{
	"PageNumber": 1,
	"TotalCount": 1,
	"PageSize": 10,
	"RequestId": "135118D7-E5B9-4631-BC70-E3A240D041C0",
	"Disks": {
		"Disk": [
			{
				"DiskChargeType": "PostPaid",
				"ImageId": "ubuntu_18_04_64_20G_alibase_20181212.vhd",
				"Device": "/dev/xvda",
				"DetachedTime": "",
				"Type": "system",
				"InstanceId": "i-6we7nc6j9lm4iz6pypmh",
				"Encrypted": false,
				"ZoneId": "ap-northeast-1a",
				"EnableAutoSnapshot": true,
				"AttachedTime": "2019-02-24T03:49:57Z",
				"SourceSnapshotId": "",
				"DeleteAutoSnapshot": false,
				"KMSKeyId": "",
				"Size": 40,
				"Description": "",
				"Portable": false,
				"ProductCode": "",
				"EnableAutomatedSnapshotPolicy": true,
				"ResourceGroupId": "",
				"DiskName": "",
				"AutoSnapshotPolicyId": "sp-6we4k062vs82i66zyj8n",
				"CreationTime": "2019-02-24T03:49:52Z",
				"Tags": {
					"Tag": []
				},
				"MountInstances": {
					"MountInstance": []
				},
				"Status": "In_use",
				"Category": "cloud_efficiency",
				"RegionId": "ap-northeast-1",
				"DeleteWithInstance": true,
				"OperationLocks": {
					"OperationLock": []
				},
				"ExpiredTime": "2999-09-08T16:00Z",
				"DiskId": "d-6wecl9xdfms5r1vxml07"
			}
		]
	}
}

The disk description shows that the automatic snapshot policy has been applied, through the values "EnableAutomatedSnapshotPolicy": true and "AutoSnapshotPolicyId": "sp-6we4k062vs82i66zyj8n".
From this point on, snapshots of your disk will be created according to the snapshot policy (backup schedule).
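In a script, the same verification can be reduced to a yes/no check on the AutoSnapshotPolicyId field. A hypothetical sketch; it assumes --output cols=AutoSnapshotPolicyId prints a header, a separator line and then the value, as the --output cols= examples for DescribeDisks do elsewhere in this post.

```shell
# Hypothetical check that a disk is bound to the expected automatic snapshot
# policy, based on the AutoSnapshotPolicyId field of DescribeDisks.
# Usage: policy_applied DISK_ID POLICY_ID
policy_applied() {
  actual=$(aliyun ecs DescribeDisks --DiskIds "[\"$1\"]" \
    --output cols=AutoSnapshotPolicyId | tail -n 1 | tr -d ' \t')
  if [ "$actual" != "$2" ]; then
    echo "disk $1 uses policy '${actual:-none}', expected $2" >&2
    return 1
  fi
}

# Example:
# policy_applied d-6wecl9xdfms5r1vxml07 sp-6we4k062vs82i66zyj8n && echo applied
```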

Restore

You can restore data by rolling back the disk with the ResetDisk API. You only need to specify the target disk ID and the snapshot ID. Note that the snapshot you specify must have been created from the disk you are rolling back: you cannot roll back Disk B with a snapshot created from Disk A. Before performing a rollback, make sure that the ECS instance the disk is attached to is stopped.
If you look at the target disk description, you can identify which ECS instance is attached to it:

$ aliyun ecs DescribeDisks --DiskIds '["d-6wecl9xdfms5r1vxml07"]' --output cols=InstanceId,DiskId
InstanceId             | DiskId
----------             | ------
i-6we7nc6j9lm4iz6pypmh | d-6wecl9xdfms5r1vxml07

Let’s check the instance status

$ aliyun ecs DescribeInstances --InstanceIds '["i-6we7nc6j9lm4iz6pypmh"]' --output cols=HostName,InstanceId,Status
HostName        | InstanceId             | Status
--------        | ----------             | ------
ebsarr-ecs-test | i-6we7nc6j9lm4iz6pypmh | Stopped
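Here the instance is already Stopped. In a script you might enforce this instead of checking by hand. This sketch combines the real StopInstance action with the same DescribeInstances --output cols= query used above; the function names, retry count and interval are hypothetical.

```shell
# Hypothetical helper: print the Status of one instance.
# Assumes --output cols=Status prints a header, a separator, then the value.
instance_status() {
  aliyun ecs DescribeInstances --InstanceIds "[\"$1\"]" --output cols=Status |
    tail -n 1 | tr -d ' \t'
}

# Hypothetical guard to run before ResetDisk: stop the instance if needed,
# then wait until DescribeInstances reports "Stopped".
# Usage: ensure_stopped INSTANCE_ID [MAX_TRIES] [INTERVAL_SECONDS]
ensure_stopped() {
  inst=$1 tries=${2:-30} interval=${3:-10}
  if [ "$(instance_status "$inst")" != "Stopped" ]; then
    aliyun ecs StopInstance --InstanceId "$inst" >/dev/null || return 1
  fi
  n=0
  while [ "$(instance_status "$inst")" != "Stopped" ]; do
    n=$((n + 1))
    if [ "$n" -ge "$tries" ]; then
      echo "$inst is still not Stopped after $tries checks" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "$inst is Stopped"
}
```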

Now let’s roll back the disk. First, identify the snapshot:

$ aliyun ecs DescribeSnapshots --DiskId d-6wecl9xdfms5r1vxml07
{
	"PageNumber": 1,
	"TotalCount": 1,
	"PageSize": 10,
	"RequestId": "98EA4B7E-5E03-45D9-9CE1-6B18D2D33CEE",
	"Snapshots": {
		"Snapshot": [
			{
				"Description": "Very first backup",
				"SnapshotName": "d-6wecl9xdfms5r1vxml07_initial_backup",
				"ProductCode": "",
				"Encrypted": false,
				"SourceDiskSize": 40,
				"Progress": "100%",
				"Usage": "none",
				"CreationTime": "2019-02-24T04:11:40Z",
				"SourceStorageType": "disk",
				"Tags": {
					"Tag": []
				},
				"Status": "accomplished",
				"KMSKeyId": "",
				"SourceDiskType": "system",
				"SourceDiskId": "d-6wecl9xdfms5r1vxml07",
				"SnapshotId": "s-6we4k062vs82hmgt90pv"
			}
		]
	}
}

We will roll back using "SnapshotId": "s-6we4k062vs82hmgt90pv". This is the initial backup created right after the instance was created.
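Because ResetDisk only accepts a snapshot taken from the same disk, a script can check the snapshot's SourceDiskId before rolling back. A hypothetical sketch, assuming DescribeSnapshots accepts a --SnapshotIds JSON array and that --output cols=SourceDiskId prints a header, a separator line and then the value.

```shell
# Hypothetical pre-check for ResetDisk: succeed only if SNAPSHOT_ID was taken
# from DISK_ID, using the SourceDiskId field shown in DescribeSnapshots.
# Usage: snapshot_from_disk SNAPSHOT_ID DISK_ID
snapshot_from_disk() {
  src=$(aliyun ecs DescribeSnapshots --SnapshotIds "[\"$1\"]" \
    --output cols=SourceDiskId | tail -n 1 | tr -d ' \t')
  if [ "$src" != "$2" ]; then
    echo "snapshot $1 comes from ${src:-an unknown disk}, not $2" >&2
    return 1
  fi
}

# Example: only run the rollback when the check passes.
# snapshot_from_disk s-6we4k062vs82hmgt90pv d-6wecl9xdfms5r1vxml07 &&
#   aliyun ecs ResetDisk --DiskId d-6wecl9xdfms5r1vxml07 --SnapshotId s-6we4k062vs82hmgt90pv
```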

$ aliyun ecs ResetDisk --DiskId d-6wecl9xdfms5r1vxml07 --SnapshotId s-6we4k062vs82hmgt90pv
{
	"RequestId": "AFB9FBD7-8BFE-4C8A-8AA6-BAB0D2620D7E"
}

That’s it. You can restart your ECS instance and verify the data on the disk.

Another way to restore data is to create an image from the snapshot, and then create a new ECS instance from that image. I would not call this restoring data, but rather recovering your ECS instance: you can use this method to recover an ECS instance if you delete it by mistake. The steps are:

  1. Create a custom image from the snapshot.
  2. Create a new ECS instance from the custom image.

Let’s walk through the steps.

Creating a custom image from a snapshot

$ aliyun ecs CreateImage --RegionId ap-northeast-1 --SnapshotId s-6we4k062vs82hmgt90pv --ImageName "TestImageFromSnapshot"
{
	"ImageId": "m-6wecl9xdfms5vjoiu9y8",
	"RequestId": "D229CF0D-605F-4404-B346-7AA76084D1F9"
}

We can verify with DescribeImages:

$ aliyun ecs DescribeImages --ImageId m-6wecl9xdfms5vjoiu9y8
{
	"PageNumber": 1,
	"TotalCount": 1,
	"PageSize": 10,
	"RegionId": "ap-northeast-1",
	"RequestId": "3B0E70C6-28BF-4F87-8994-3820E57B30E8",
	"Images": {
		"Image": [
			{
				"ImageId": "m-6wecl9xdfms5vjoiu9y8",
				"Description": "",
				"ProductCode": "",
				"ResourceGroupId": "",
				"OSType": "linux",
				"Architecture": "x86_64",
				"OSName": "Ubuntu  18.04 64位",
				"DiskDeviceMappings": {
					"DiskDeviceMapping": [
						{
							"ImportOSSObject": "",
							"Format": "",
							"Device": "/dev/xvda",
							"Type": "system",
							"SnapshotId": "s-6we4k062vs82hmgt90pv",
							"ImportOSSBucket": "",
							"Progress": "",
							"Size": "40"
						}
					]
				},
				"ImageOwnerAlias": "self",
				"Progress": "100%",
				"IsSupportCloudinit": true,
				"Usage": "none",
				"CreationTime": "2019-02-24T09:41:03Z",
				"Tags": {
					"Tag": []
				},
				"ImageVersion": "",
				"Status": "Available",
				"ImageName": "TestImageFromSnapshot",
				"IsSupportIoOptimized": true,
				"IsSelfShared": "",
				"IsCopied": false,
				"IsSubscribed": false,
				"Platform": "Ubuntu",
				"Size": 40
			}
		]
	}
}

DiskDeviceMappings shows that this image was created from the snapshot. You can recover the data (in other words, the ECS instance) by creating a new instance from this custom image.
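The second step, creating the new instance, can also be scripted. One option is the ECS CreateInstance action, which leaves the new instance in the Stopped state, followed by StartInstance. This is a sketch under assumptions: the instance type, security group and VSwitch values are hypothetical placeholders to replace with values from your own account, and the InstanceId is pulled out of the JSON response with sed.

```shell
# Placeholders: replace with real values from your account (all hypothetical).
REGION_ID=${REGION_ID:-ap-northeast-1}
INSTANCE_TYPE=${INSTANCE_TYPE:-ecs.example-type}
SECURITY_GROUP_ID=${SECURITY_GROUP_ID:-sg-example}
VSWITCH_ID=${VSWITCH_ID:-vsw-example}

# Hypothetical sketch: create and start an instance from a custom image and
# print the new InstanceId.
# Usage: recover_from_image IMAGE_ID INSTANCE_NAME
recover_from_image() {
  if [ $# -ne 2 ]; then
    echo "usage: recover_from_image IMAGE_ID INSTANCE_NAME" >&2
    return 2
  fi
  # Extract "InstanceId" from the CreateInstance JSON response.
  new_id=$(aliyun ecs CreateInstance \
    --RegionId "$REGION_ID" \
    --ImageId "$1" \
    --InstanceName "$2" \
    --InstanceType "$INSTANCE_TYPE" \
    --SecurityGroupId "$SECURITY_GROUP_ID" \
    --VSwitchId "$VSWITCH_ID" |
    sed -n 's/.*"InstanceId": *"\([^"]*\)".*/\1/p')
  if [ -z "$new_id" ]; then
    echo "CreateInstance did not return an InstanceId" >&2
    return 1
  fi
  # The instance is created Stopped, so start it explicitly.
  aliyun ecs StartInstance --InstanceId "$new_id" >/dev/null || return 1
  echo "$new_id"
}

# Example:
# recover_from_image m-6wecl9xdfms5vjoiu9y8 ebsarr-ecs-restored
```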

This post ended up being quite long, but it demonstrates that backup and restore on Alibaba Cloud are very easy to achieve. At the same time, you can see how powerful snapshots are: you’d need many tools, storage, and servers to do this on your own, especially if you run a large infrastructure with hundreds of servers. Also, Alibaba Cloud provides an API, which means all of the operations I performed manually on the command line for this post can be automated.