What is Amazon Virtual Private Cloud (AWS VPC Tutorial 2019)
This tutorial explains Amazon Virtual Private Cloud (VPC) and the AWS Database Migration Service (DMS) from scratch, and it gives you an overview of the process of creating a custom AWS VPC in 2019.
The default Amazon VPC can serve the needs of most individuals and small businesses. In fact, even a medium-sized business can probably use the default VPC without any problem.
However, some situations exist, such as when you need to create custom subnets or obtain access to special VPC features, in which creating a custom VPC becomes important.
Virtual Private Cloud (VPC) Features
The idea behind a VPC is to create an environment in which a system separates the physical world from an execution environment.
Essentially, VPC is a kind of virtual machine combined with a Virtual Private Network (VPN) and some additions that you probably won’t find with similar setups. Even so, the concept of using VPC as a virtual machine is the same as any other virtual machine.
You can read more about the benefits of using a virtual machine at https://www.linux.com/learn/why-when-and-how-use-virtual-machine. The connectivity provided by a VPC is akin to the connectivity provided by any other VPN.
When working with AWS, you never actually see or interact with the physical device running the code that makes the resources you create active. You don’t know where the physical hardware resides or whether other VPCs are also using the same physical hardware as you are.
In fact, you have no idea of whether the code used to create an EC2 instance even resides on just one physical machine. The virtual environment — this execution environment that doesn’t exist in the physical world — lets you improve overall reliability and make recovering from crashes easier.
In addition, the virtual nature of the environment fully separates the code that your organization executes from code that any other organization executes. This concept of total separation tends to make the environment more secure as well.
The following sections describe what a VPC is in more detail and why you can benefit from one in making your organization Internet-friendly.
Defining the VPC and the reason you need it
The Internet is the public cloud. Anyone can access the Internet at any time given the correct software. You don’t even need a browser. Applications access the Internet all the time without using one — a browser is simply a special kind of Internet access application.
Despite the public nature of the Internet, it actually provides four levels or steps that you follow from being completely public to being nearly private:
1. The public, unrestricted Internet
2. Sites that limit access to community data using logins and other means
3. Sites that provide access to individually identifiable data for pay or other considerations through a secure connection
4. A nearly private connection that is accessible only between consenting parties (and any hackers that may be listening)
The initial step that everyone takes is the public Internet.
You do something to access the Internet; you may use your smart television or some alternative means, but you take this initial step every time you begin a session.
What the Internet really provides is access to a much larger network in which anyone can find resources and use them to meet specific needs. For example, you might read the news stories on a site while someone else downloads precisely the same information and analyzes it in some manner.
Seeing the data of the Internet is important because it helps you understand that the Internet isn’t about games or information; rather, it’s about connections to resources that mainly revolve around data.
To take the next step, you need to consider all the sites out there that limit your access to the data that the Internet provides. For example, when you want to read news stories on some sites, you must first log in to the site.
The need to log in to the site represents a connectivity hurdle that you must overcome in order to gain access to the resource, which is data. Whether you read the data or analyze it, you must still log in. The site is still public. Anyone who has an account can access it.
A third step is public sites that host private data. For example, when you make a purchase at Amazon, you first log in to your account. The data is visible only to you, not anyone else with an account. All others with an account see only their private data as well. However, the site itself is still public.
A VPC is the fourth step. In this case, you separate everything possible from everything else using a variety of software-oriented techniques, with a little machine-level hardware reinforcement.
Keeping everything separated reduces security issues. After all, you don’t want another organization (or a hacker) to know anything about what you’re doing.
Realize that you are potentially using the same physical hardware and definitely using the same cable as other people. The lack of capability to create a separate physical environment is the reason that hackers continue to create methods of overcoming security and gaining access to your resources anyway.
The reason you need a VPC is to ensure that your cloud computing is secure, or at least as secure as possible when it comes to allowing any connectivity to the outside world. In fact, without a VPC, you couldn’t use the cloud for any sensitive data, even if you had no requirement for keeping the data secure legally or ethically.
Without a VPC, every communication would be akin to creating a post on Facebook: Anyone could see it. VPCs are actually quite common because they’re so incredibly useful. Here are some other vendors that make VPCs available as part of their offerings:
HP Hybrid Cloud – HPE Helion
Microsoft Azure
Other offerings exist, especially on the regional level. VPC certainly isn’t unique to Amazon; it’s becoming a common technology, and you need to ensure that the Amazon offering suits your needs.
Of course, if you want to use a VPC with a product such as EC2, you really do need the Amazon offering because both are part of AWS.
Getting an overview of the connectivity options
How you make a connection to a VPC is important because different connection types have different features and characteristics. Choosing the right connection option will yield significant gains in efficiency, reliability, and security. You might also see a small boost in speed. The following list describes the common VPC connectivity options:
AWS Hardware VPN: You generally use a hardware router and gateway to provide the Internet Protocol Security (IPSec) connection to your VPC.
Software VPN: A software VPN offers a minimalistic approach to creating a connection between a VPC and your network. You rely on software to simulate the actions normally performed by hardware to create the connection.
Of all the options, this one is the slowest because you rely on software to perform a task best done with hardware. However, small and even medium-sized businesses may find that it works without a problem.
The only issue is that AWS doesn’t actually provide software VPN support, so you need to rely on one of the third parties listed in the AWS Marketplace.
Working with subnets
Generally, you begin with a number of subnets for your AWS setup, using one for each of the availability zones in your region.
For example, when working in the us-west-2 region, you have three subnets: us-west-2a, us-west-2b, and us-west-2c. You want to avoid confusing the region with the availability zone.
Using the wrong value can result in commands that don’t work or that incorrectly configure features.
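The difference between a region and an availability zone is easy to check in code. The following minimal sketch assumes that standard zone names consist of the region name followed by a single lowercase letter; the helper names region_of and is_region are made up for this illustration and are not part of any AWS tool.

```python
import re

# Assumption: a standard zone name is the region name plus one lowercase
# letter, for example "us-west-2" + "a". Other kinds of zones use longer
# names and are deliberately rejected by this pattern.
_REGION = r"[a-z]{2}(?:-[a-z]+)+-\d+"
_ZONE_PATTERN = re.compile(rf"^({_REGION})([a-z])$")


def region_of(zone_name):
    """Return the region part of a standard availability zone name."""
    match = _ZONE_PATTERN.match(zone_name)
    if match is None:
        raise ValueError(f"not a standard availability zone name: {zone_name!r}")
    return match.group(1)


def is_region(name):
    """Return True when the name looks like a region rather than a zone."""
    return re.fullmatch(_REGION, name) is not None


print(region_of("us-west-2a"))
print(is_region("us-west-2"), is_region("us-west-2a"))
```

A check like this is useful before passing a value to a command that expects a region, because a zone name such as us-west-2a raises an error here instead of silently misconfiguring something later.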
To access these subnets, choose Subnets in the Navigation pane. You see a listing of subnets. Each subnet lists its status along with other essential information that you need to access features in AWS.
These three subnets are internal. You use them as part of working with AWS. Deleting these subnets will cause you to lose access to AWS functionality, so the best idea is to leave them alone unless you need to perform specific configuration tasks.
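You can view the same subnet listing from code instead of the console. The sketch below assumes you have the boto3 library and pass in its EC2 client, whose describe_subnets call returns the subnets for the client's region. The function name summarize_subnets is hypothetical, and the sketch ignores result pagination, which is reasonable only when you have a handful of subnets.

```python
def summarize_subnets(ec2_client):
    """Return (zone, subnet ID, state, CIDR block, is default) tuples sorted by zone.

    ec2_client is expected to behave like boto3.client("ec2"): its
    describe_subnets() call returns a dict with a "Subnets" list.
    """
    response = ec2_client.describe_subnets()
    rows = [
        (
            subnet["AvailabilityZone"],
            subnet["SubnetId"],
            subnet["State"],
            subnet["CidrBlock"],
            # DefaultForAz marks the subnet AWS created for the zone.
            subnet.get("DefaultForAz", False),
        )
        for subnet in response["Subnets"]
    ]
    return sorted(rows)
```

With boto3 installed and credentials configured, you would call summarize_subnets(boto3.client("ec2", region_name="us-west-2")). Note that region_name takes the region (us-west-2), not an availability zone (us-west-2a). The function only reads information, so it can't delete or change a subnet.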
Moving Data Using Database Migration Service
Database movement occurs in two scenarios: homogenous moves between installations of the same Database Management System (DBMS) product (the software that performs the actual management of the data you send to it for storage) and heterogeneous moves among different DBMS products.
Homogenous moves are easiest because you don’t need to consider issues such as differences in database features nearly as often (except, possibly, when performing an upgrade move). The blog covers homogenous moves first for this reason. However, the blog does discuss both homogenous and heterogeneous moves.
This service is free, but the compute time, data transfer time, and storage resources above a certain amount aren’t. The charges for these items are quite small, however. According to Amazon’s documentation, you can migrate a 1TB database for as little as $3.
A list of prices appears at https://aws.amazon.com/dms/pricing/. Pricing varies by EC2 instance type (with the t2.micro instance used in the “Creating an instance” section of blog4 costing the least).
Data transfer charges don’t exist when you transfer information into a database, but you are charged when you transfer data out. Storage prices vary, but you get a certain amount of storage free (50GB in the case of the setup described in this blog). The setups in this blog won't cost you any money to perform.
Actually completing a migration will cost you money, but not much, and it gives you experience in performing the tasks described. You need to decide how far you want to go in performing the exercises in this blog.
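To decide how far to go, it helps to estimate the bill before you start. The following sketch only encodes the billing rules described above: compute time is charged, storage is charged above a free allowance (50GB for this setup), inbound transfer is free, and outbound transfer is charged. The function name and every rate in the example call are placeholders, not Amazon's prices, so substitute the current figures from the pricing page.

```python
def estimate_migration_cost(
    instance_hours,
    hourly_rate,
    storage_gb,
    storage_rate_per_gb,
    data_out_gb,
    data_out_rate_per_gb,
    free_storage_gb=50,
):
    """Estimate a migration's cost in dollars from caller-supplied rates."""
    compute = instance_hours * hourly_rate
    # Only storage above the free allowance is billed.
    billable_storage_gb = max(0, storage_gb - free_storage_gb)
    storage = billable_storage_gb * storage_rate_per_gb
    # Inbound transfer is free, so only outbound data is charged.
    transfer = data_out_gb * data_out_rate_per_gb
    return round(compute + storage + transfer, 2)


# Placeholder rates chosen only to show the arithmetic.
cost = estimate_migration_cost(
    instance_hours=10,
    hourly_rate=0.02,
    storage_gb=80,
    storage_rate_per_gb=0.10,
    data_out_gb=5,
    data_out_rate_per_gb=0.09,
)
print(f"Estimated cost: ${cost:.2f}")
```

Keeping the rates as parameters means that the estimate stays usable when prices change or when you pick a different instance type.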
Considering the Database Migration Service Features
It’s important to know what to expect from the DMS before you begin using it in an actual project. For example, the main page at https://aws.amazon.com/dms/ advertises zero downtime.
However, when you read the associated text, you discover that some downtime is actually involved in migrating the database, which makes sense because you can’t migrate a database containing open records (even with continuous replication between the source and target).
The fact is that you experience some downtime in migrating any database, so you have to be careful about taking any claims to the contrary at face value. Likewise, the merits of a claim that service is easy to use depend on the skills of the person performing the migration.
An expert DBA will almost certainly find the DMS easy to use, but a less experienced administrator may encounter difficulties. With these caveats in mind, the following sections provide some clarification in what you can expect from the DMS in terms of features you can use to make your job easier.
Choosing a target database
You already have a source database in place on your local network. If you’re happy with that database and simply want to move it to the cloud, you can perform a homogenous migration. A homogenous migration is the simplest type, in most cases, as long as you follow a few basic rules:
Ensure that the source and target database are the same version, have the same updates installed, and use the same extensions.
Configure the target database to match the source database if at all possible (understanding that the configuration may not provide optimal speed, reliability, and security in a cloud environment).
Define the same characteristics for the target database as are found in the source database, such as ensuring that both databases support the same security.
Perform testing during each phase of the move to ensure that the source and target databases really do perform the same way.
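The first three rules amount to comparing two descriptions of a database and acting on every difference. Here is a minimal sketch of that comparison. The settings dictionaries are hypothetical stand-ins for whatever inventory you collect from your own source and target, and find_mismatches is a name invented for this example.

```python
def find_mismatches(source, target):
    """Return {setting: (source value, target value)} for settings that differ."""
    mismatches = {}
    # Walk the union of keys so a setting missing on one side is reported too.
    for key in sorted(set(source) | set(target)):
        source_value = source.get(key)
        target_value = target.get(key)
        if source_value != target_value:
            mismatches[key] = (source_value, target_value)
    return mismatches


# Hypothetical descriptions of the two installations.
source_db = {"engine": "mysql", "version": "5.7.22",
             "extensions": {"audit"}, "charset": "utf8mb4"}
target_db = {"engine": "mysql", "version": "5.7.26",
             "extensions": {"audit"}, "charset": "utf8mb4"}

for setting, (have, want) in find_mismatches(source_db, target_db).items():
    print(f"{setting}: source={have!r} target={want!r}")
```

An empty result doesn't prove that the databases behave the same way, which is why the testing rule still applies. It only tells you that the settings you thought to record match.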
Don’t make the error of thinking that moving Microsoft SQL Server to Amazon Aurora is a homogenous data move.
Anytime that you must marshal the data (make the source database data match the type, format, and context of the destination database data) or rely on a product such as the AWS Database Migration Service to move the data, you are performing a heterogeneous data move (despite what the vendor might say).
Saying that two DBMSs are compatible is an admission that they aren’t precisely the same, which means that you can still encounter the issues related to heterogeneous moves.
Treating a move that involves two different products, even when those products are compatible, as a heterogeneous move is a smart way to view the process. Otherwise, you’re opening yourself to potential unexpected delays.
In some cases, you may decide to move data from a source database that works well in a networked environment to a target database that works well in the cloud environment.
The advantage of performing a heterogeneous move (one in which the source and target aren’t the same) is that you can experience gains in speed, reliability, and security. In addition, the target database may include features that your current source database lacks.
The disadvantage is that you must perform some level of marshaling (modifying the data of the source database to match the target database) to ensure that your move is successful.
Modifying data usually results in some level of content (the actual value of the data) or context (the data’s value when associated with other data) loss. In addition, you may find yourself rewriting scripts that perform well on the source database but may not work at all with the target database.
A decision to move to a new target database may come with some surprises as well (most of the bad sort). For example, you can move data from your Microsoft SQL Server database to the Amazon Aurora, MySQL, PostgreSQL, or MariaDB DBMS. Each of these target databases has advantages and disadvantages that you must consider before making the move.
For example, Amazon provides statistics to show that Amazon Aurora performs faster than most of its competitors, but it also locks you into using AWS with no clear migration strategy to other cloud-vendor products.
In addition, Amazon Aurora contains features that may not allow you to move your scripts with ease, making recoding an issue.
You also need to research the realities of some moves. For example, some people may feel that moving to MySQL has advantages in providing a larger platform support base than Microsoft SQL Server.
However, Microsoft has offered a Linux version of Microsoft SQL Server since SQL Server 2017, which makes platform independence less of an issue. The point is that choosing a target for your cloud-based DBMS will require time and an understanding of your organization's specific needs when making the move.
No matter what a vendor tries to tell you, you will have some downtime when migrating data of any kind from any source to any target. The amount of time varies, but some sort of downtime is guaranteed, so you must plan for it. The following list provides some common sources of downtime during a migration:
Performing the data transfer often means having all records locked, which means that users can’t make changes (although they can still potentially use the data for read-only purposes).
Data marshaling problems usually incur a time penalty as administrators, DBAs, developers, and DevOps all work together to discover solutions that will work.
Changing applications to use a new data source always incurs a time penalty. The changeover could result in major downtime when the change doesn’t work as expected.
Unexpected scripting issues can mean everything from data errors to reports that won’t work as expected. Repairs are usually time-consuming at best.
Modifications that work well in the lab suddenly don’t work in the production environment because the lab setup didn’t account for some real-world difference.
Users who somehow don’t get a required update end up using outdated data sources or applications that don’t work well with the new data source.
Schema conversions can work well enough to transfer the data, but they can change its content or context just enough to cause problems with the way in which applications interact with the data.
Consequently, full application testing when performing a heterogeneous move of any sort is a requirement that some organizations skip (and end up spending more time remediating than if they had done the proper testing in the first place).
Differences in the cloud environment add potential latency or other timing issues not experienced in the local network configuration.
An essential part of keeping downtime to a minimum, despite these many sources of problems, is to be sure to use real-world data for testing in a lab environment that duplicates your production environment as closely as possible.
This means that you need to address even the small issues, such as ensuring that the lab systems rely on the same hardware and use the same configuration as your production environment.
You also need to perform real-world testing that relies on users who will actually use the application when it becomes part of the production environment. If you don’t perform real-world testing under the strictest possible conditions, the amount of downtime you experience will increase sharply.
Not only does an optimistic lab setup produce unrealistic expectations, but it also creates a domino effect in which changes, procedures, and policies that would work with proper testing don’t work because they aren’t properly tested and verified in the lab.
You must also use as many tools as you can to make the move simpler. The “Understanding the AWS Schema Conversion Tool” section, later in this blog, discusses the use of this tool to make moves between heterogeneous databases easier.
However, a great many other tools are on the market, so you may find one that works better for your particular situation.
Organizations can end up with data in a number of different DBMSs because of mergers and inefficiencies within the organization itself. A workgroup database may eventually see use at the organization level, so some of these DBMS scenarios also occur as a result of growth.
Whatever the source of the multitude of DBMSs, consolidating the data into a single DBMS (and sometimes a single database) can result in significant improvement in organizational efficiency.
However, when planning the consolidation, view it as multiple homogenous or heterogeneous moves rather than a single big move. Each move will require special considerations, so each move is unique. All you’re really doing is moving multiple sources to the same target.
A potential issue with data consolidation occurs when multiple source databases have similar data. When you consolidate the data, not only do you have to consider marshaling the data from the source schema to the destination schema, but you must also consider the effects of combining the data into a coherent whole.
This means considering what to do with missing, errant, outdated, or conflicting data. One database can quite possibly have data that doesn’t match a similar entry in another database. Consequently, test runs that combine the data and then look for potential data issues are an essential part of making a consolidation work.
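A test run of this kind can start very small. The sketch below assumes that each source database has been reduced to a dictionary of records indexed by a shared key, which is a simplification because real consolidations usually have to establish that shared key first. The function and sample data are hypothetical.

```python
def consolidate(sources):
    """Merge {source name: {key: record}} and report keys whose records disagree.

    Returns (merged, conflicts). Conflicting keys are left out of merged so
    that a person can decide which value is correct.
    """
    seen = {}
    for source_name, records in sources.items():
        for key, record in records.items():
            seen.setdefault(key, []).append((source_name, record))

    merged, conflicts = {}, {}
    for key, entries in seen.items():
        distinct = []
        for _, record in entries:
            if record not in distinct:
                distinct.append(record)
        if len(distinct) == 1:
            merged[key] = distinct[0]
        else:
            conflicts[key] = entries
    return merged, conflicts


# Hypothetical customer records held by two departments.
sources = {
    "sales": {"c1": {"email": "ann@example.com"}, "c2": {"email": "bob@example.com"}},
    "support": {"c1": {"email": "ann@example.com"}, "c2": {"email": "bob@example.org"},
                "c3": {"email": "cy@example.com"}},
}
merged, conflicts = consolidate(sources)
print("merged keys:", sorted(merged))
print("conflicting keys:", sorted(conflicts))
```

The sketch refuses to pick a winner when records disagree. Choosing the newest or most trusted value is a policy decision that belongs to your organization, not to the merge code.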
One of the ways in which you can use the AWS DMS is to replicate data. Data replication to a cloud source has a number of uses, which include:
Providing continuous backup
Acting as a data archive
Performing the role of the main data storage while the local database acts as a cache
Creating an online data source for users who rely on mobile applications
Developing a shareable data source for partners
When used in this way, the AWS DMS sits between the source database and one or more target databases. You can use a local, networked, or cloud database as the source.
Normally, the target resides in the cloud. Theoretically, you can create a heterogeneous replication, but homogenous replications are far more reliable because you don’t need to worry about constantly marshaling the data between different source and target DBMSs.
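Conceptually, replication is an initial copy followed by a stream of changes applied in order. The following toy model shows that idea with plain dictionaries. It is an illustration only, it makes no claim about how the AWS DMS is implemented, and the function names and event format are invented for this sketch.

```python
def full_load(source_rows):
    """Copy every row from the source into a new, independent target."""
    return {key: dict(row) for key, row in source_rows.items()}


def apply_changes(target, changes):
    """Apply ordered (operation, key, row) change events to the target in place."""
    for operation, key, row in changes:
        if operation in ("insert", "update"):
            target[key] = dict(row)
        elif operation == "delete":
            # Deleting a row that is already gone is treated as harmless.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return target


source = {1: {"name": "alpha"}, 2: {"name": "beta"}}
target = full_load(source)
apply_changes(target, [
    ("update", 1, {"name": "ALPHA"}),
    ("delete", 2, None),
    ("insert", 3, {"name": "gamma"}),
])
print(target)
```

In a homogenous replication, each row can be copied as is, as it is here. A heterogeneous replication would need a marshaling step inside apply_changes for every event, which is where the reliability concerns come from.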
Moving Data between Homogenous Databases
Moving data between homogeneous databases (those of precisely the same type) is the easiest kind of move because you have a lot less to worry about than when performing a heterogeneous move (described in the “Moving Data between Heterogeneous Databases” section, later in this blog).
For example, because both databases are the same, you don’t need to consider the need to marshal (convert from one type to another) data between data types. In addition, the databases will have access to similar features, and you don’t necessarily need to consider issues such as database storage limitations.
The definition for homogenous can differ based on what you expect in the way of functionality. For the purposes of this blog, a homogenous data move refers to moving data between copies of precisely the same DBMS.
A move between copies of SQL Server 2016 is homogenous, but moving between SQL Server and Oracle isn’t, even though both DBMSs support relational functionality. Even a move between SQL Server 2016 and SQL Server 2014 could present problems because the two versions have differing functionality.
Trying a homogenous move before you attempt a heterogeneous move is important because the homogenous move presents an opportunity to separate database issues from movement issues.
The following sections help you focus on the mechanics of a move that doesn’t involve any database issues. You can use these sections to build your knowledge of how moves are supposed to work and to ensure that you fully understand how moves work within AWS.
Obtaining access to a source and target database
Moving the data
To move data, you must create a migration task. The following steps describe how to create a task that will migrate data from the source test database to the target test database:
1. Click Create Migration.
A Welcome page appears that tells you about the process for migrating a database. This page also specifies the steps you need to perform in the Navigation pane and provides a link for downloading the AWS Schema Conversion Tool.
2. Click Next.
The wizard displays the Create Replication Instance page. This page helps you define all the requirements for performing the migration task.
3. Type MoveMySQLData in the Name field.
Be sure to name your task something descriptive. You may end up using the replication task more than once, and trying to remember what the task is for is hard if you don’t use a descriptive name.
4. (Optional) Type a detailed description of the task’s purpose in the Description field.
5. Choose the dms.t2.micro option from the Instance Class field.
The move relies on your EC2 instance. To get free-tier EC2 support, you need to use the dms.t2.micro option.
However, consider the cost of using the service. All incoming data is free. You can also transfer data between Amazon RDS and Amazon EC2 Instances in the same Availability Zone free. Any other transfers will cost the amount described at https://aws.amazon.com/dms/pricing/.
6. Click the down arrow next to the Advanced heading.
You see the advanced options for transferring the data.
7. Type 30 (or less) in the Allocated Storage GB field.
Remember that you get only 30GB of free EBS storage per month (see https://aws.amazon.com/free/), so experimenting with a larger storage amount will add to your costs.
8. Choose Default-Launch in the VPC Security Group(s) field.
Using this security group ensures that you have access to the migration as needed.
Migrate Existing Data: Copies all the data from the source database to the target database.
Migrate Existing Data and Replicate Ongoing Changes: Performs the initial data copy and then monitors the source database for changes. When AWS detects changes, it copies just the changes to the target database (saving resources and maintaining speed).
Replicate Data Changes Only: Assumes that the source and target databases are already in sync. AWS monitors the source database and copies only the changes to the target.
Select the Start Task on Create check box.
This option specifies that you want the migration to start immediately. If you deselect this check box, you need to start the task manually.
Click Create Task.
After a few moments, you see the Tasks page of the DMS Management Console. The task’s Status field contains Creating until the creation process is complete.
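The console choices above have counterparts in the DMS API, so you can script the same setup. The sketch below only builds the argument dictionaries. It assumes that you later pass them to the boto3 DMS client's create_replication_instance and create_replication_task calls. The helper names, the security group IDs, and the endpoint and instance ARNs are placeholders that you supply yourself, and the table mapping simply includes every table.

```python
import json

# The three migration options described above, mapped to the MigrationType
# values that the DMS API accepts.
MIGRATION_TYPES = {
    "Migrate Existing Data": "full-load",
    "Migrate Existing Data and Replicate Ongoing Changes": "full-load-and-cdc",
    "Replicate Data Changes Only": "cdc",
}


def replication_instance_params(name, security_group_ids):
    """Build arguments for create_replication_instance that mirror the console steps."""
    return {
        "ReplicationInstanceIdentifier": name,
        "ReplicationInstanceClass": "dms.t2.micro",  # instance class from step 5
        "AllocatedStorage": 30,  # in GB, as in step 7
        "VpcSecurityGroupIds": list(security_group_ids),
    }


def replication_task_params(name, source_arn, target_arn, instance_arn,
                            migration_option, schema="%"):
    """Build arguments for create_replication_task."""
    # A single selection rule that includes every table in the chosen schema;
    # "%" is the wildcard in DMS table mappings.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": name,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": MIGRATION_TYPES[migration_option],
        "TableMappings": json.dumps(table_mappings),
    }


print(replication_instance_params("MoveMySQLData", ["sg-placeholder"]))
```

With boto3 installed, you would call dms = boto3.client("dms") and then dms.create_replication_instance(**params). Creating a task through the API doesn't start it, so the equivalent of the Start Task on Create check box is a separate start_replication_task call with StartReplicationTaskType set to "start-replication".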
Moving Data between Heterogeneous Databases
DBMSs come in many different forms because people expect them to perform a wide variety of tasks. The relational DBMS serves the interests of businesses because it provides organized data storage that ensures data integrity and moderately fast response times.
However, the relational database often doesn’t work well for data that isn’t easy to organize, such as large quantities of text, which means that you must use a text-based DBMS instead.
The popular NoSQL DBMSs provide a nontabular approach to working with big data and real-time applications that aren’t easy to model using relational strategies.
In short, the need for multiple DBMS types is well established and reasonable because each serves a different role. However, moving data between DBMSs of different types can become quite difficult.
The following sections can’t provide you with a detailed description of every move of this type (which would require an entire blog), but they do give you an overview of the AWS perspective of heterogeneous data moves.
Considering the essential database differences
No single section can describe every difference between DBMSs. Even if providing such a discussion were possible, considering the wealth of available DBMSs, the resulting text would be immense.
Fortunately, you don’t need to know the particulars of every DBMS; all you really need to think about are the types of differences you might encounter so that you’re better prepared to deal with them.
The following list presents essential database differences by type and in order of increasing complexity. As a difference becomes more complex to handle, the probability of successfully dealing with it becomes lower. In some cases, you must rely on compromises of various sorts to achieve success.
Features: Whenever a vendor introduces a new version of a DBMS product, the new version contains new features. Fortunately, many of these products provide methods for saving data in a previous version format, making the transition between versions easy.
This same concept holds for working with products (the data target) that can import specific versions of another product’s data (the data source). Exporting the data from the source DBMS in the required version makes data transfers to the target easier.
Functionality: One DBMS may offer the capability to store graphics directly in the database, while another can store only links to graphics data. The transition may entail exporting the graphic to another location and then providing that location as input to the new DBMS as a link.
Platform: Some platform differences can prove quite interesting to solve. For example, one platform may store paths and filenames in a manner in which case doesn’t matter, while another stores this same information in a case-sensitive way. The data exchange may require the use of some level of automation to ensure consistency of path and filename case.
Data types: Most data type issues are relatively easy to fix because software commonly provides methods to marshal (change) one data type to another.
However, you truly can’t convert a Binary Large Object (BLOB) type text field into a fixed-length text field of the sort used by relational databases, so you must create a custom conversion routine of some sort. In short, data type conversions can become tricky because they can change the context or meaning of the data.
Automation: Adding automation, such as code stored in data fields, to a DBMS significantly increases the complexity of moving data from one DBMS to another. In many cases, you must choose to leave the automation behind when making the data move or represent it in some other way.
Data organization: Dealing with DBMSs of different types, such as moving data from a NoSQL database to a relational database, can involve some level of data loss because the organization of the data between the two DBMSs is so different.
Any conversion will result in data loss in this case. In addition, you may have to calculate some values, replace missing values, and perform other sorts of conversions to successfully move the data from one DBMS to another of a completely different organizational type.
Storage methodology: The reason that storage methodology can incur so many issues is that the mechanics of working with the data are now different.
Having different storage technologies in play means that you must now consider all sorts of issues that you don’t ordinarily need to consider, such as whether the data requires encryption in the target database to ensure that the storage meets any legal requirements.
Given that cloud storage is inherently different from storage on a local drive, you always encounter this particular difference when moving your data to AWS, and you really need to think about all that a change in storage methodology entails.
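Two of the differences in this list, platform and data types, are easy to probe before a move. The sketch below shows one possible check for each: finding paths that would collide on a platform that ignores case, and fitting long text into a fixed-length field while reporting exactly what was lost. Both function names are invented for this illustration, and a real conversion routine would need rules specific to your source and target DBMSs.

```python
def find_case_collisions(paths):
    """Group paths that differ only by letter case."""
    groups = {}
    for path in paths:
        # casefold() gives a case-insensitive comparison key.
        groups.setdefault(path.casefold(), []).append(path)
    return [group for group in groups.values() if len(group) > 1]


def to_fixed_length(text, width):
    """Fit text into a fixed-width character field.

    Returns (value, lost), where value is padded or cut to exactly width
    characters and lost holds any characters that didn't fit.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    kept, lost = text[:width], text[width:]
    return kept.ljust(width), lost


paths = ["/data/Report.csv", "/data/report.csv", "/data/other.csv"]
print(find_case_collisions(paths))

value, lost = to_fixed_length("A long free-form comment", 10)
print(repr(value), repr(lost))
```

Returning the lost characters instead of discarding them silently makes the content loss visible, so you can decide whether a wider field, a separate overflow table, or a manual review is the right response.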