Technologies for data analytics and business insight have grown dramatically in number and reach over the past decade, and the cloud has undoubtedly been a catalyst for this growth. Microsoft Azure and Amazon Web Services have each been building services for collecting, uploading, storing, and processing data. Here is a comparison of the services the two ecosystems offer at different stages of data analytics.
The lifecycle of data for analysis is broadly divided into the following stages:
Data Ingestion
Preservation of original data source
LifeCycle Management & Cold Storage
Capture Metadata
Governance, Security and Privacy Management
Self-Service Discovery, Search and Access
Quality Management
Preparing for Analytics
Orchestration and Job Scheduling
Capturing Data Change
Part 1 of this blog focuses on the first five of these stages and discusses how Azure and AWS can help you accomplish each of them.
Data Ingestion
Both AWS and Azure offer REST APIs, so users need only make HTTP(S) calls to upload data to the cloud. Azure does not offer a dedicated service for ingesting data into Azure resources, although it does provide a few connectors for moving data into its databases. AWS, on the other hand, has supported the ingestion stage for quite some time.
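Because both clouds expose REST endpoints, an upload is ultimately just an HTTP PUT. The sketch below uses only Python's standard library to build such a request for each provider without sending it. It assumes you already hold an Azure shared access signature (SAS) token or an S3 presigned URL. The account, container and blob names are hypothetical. The `x-ms-version` value is an assumption and should be checked against the current Blob service REST documentation.

```python
from urllib.parse import quote
from urllib.request import Request


def azure_put_blob(account: str, container: str, blob_name: str,
                   sas_token: str, data: bytes) -> Request:
    """Build (but do not send) an Azure Blob 'Put Blob' request authorised by a SAS token."""
    # Blob names may contain '/', which quote() leaves intact by default.
    url = (f"https://{account}.blob.core.windows.net/"
           f"{container}/{quote(blob_name)}?{sas_token}")
    return Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Put Blob requires the blob type; BlockBlob suits whole-file uploads.
            "x-ms-blob-type": "BlockBlob",
            # Assumed service version; verify against the Blob service REST docs.
            "x-ms-version": "2020-10-02",
            "Content-Type": "application/octet-stream",
        },
    )


def s3_put_object(presigned_url: str, data: bytes) -> Request:
    """Build (but do not send) an S3 PUT to a presigned URL generated beforehand."""
    # The signature travels in the URL's query string, so no auth header is added.
    return Request(presigned_url, data=data, method="PUT")


# Sending either request is then a single call: urllib.request.urlopen(req).
```

For example, `azure_put_blob("examplestore", "raw", "sensors/2016/reading.json", sas, payload)` targets `https://examplestore.blob.core.windows.net/raw/sensors/2016/reading.json`.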
AWS Data Pipeline was launched in April 2013. It can be used to schedule data transformation and loading into multiple AWS storage solutions, moving data from an original source such as S3 or RDS into an analysis environment such as Redshift or EMR.
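To make the source-to-analysis flow concrete, here is a sketch of a pipeline definition in Data Pipeline's JSON object format, built as a Python dictionary. It defines a daily schedule, an S3 input, a Redshift output table, and a copy activity that runs on a temporary EC2 instance. The object types and field names reflect my reading of the Data Pipeline object reference and should be verified before use. The cluster, bucket, table and role names are hypothetical placeholders. The `unresolved_refs` helper is a hypothetical addition that checks whether every `{"ref": ...}` points at a defined object.

```python
import json


def pipeline_definition(source_path: str, table: str) -> dict:
    """Hypothetical daily S3 -> Redshift load; every name is a placeholder."""
    objects = [
        {"id": "Default", "name": "Default", "scheduleType": "cron",
         "schedule": {"ref": "DailySchedule"},
         "failureAndRerunMode": "CASCADE",
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        {"id": "DailySchedule", "name": "DailySchedule", "type": "Schedule",
         "period": "1 day", "startAt": "FIRST_ACTIVATION_DATE_TIME"},
        {"id": "SourceData", "name": "SourceData", "type": "S3DataNode",
         "directoryPath": source_path},
        {"id": "Warehouse", "name": "Warehouse", "type": "RedshiftDatabase",
         "clusterId": "example-cluster", "databaseName": "analytics",
         "username": "loader", "*password": "REPLACE_ME"},
        {"id": "TargetTable", "name": "TargetTable", "type": "RedshiftDataNode",
         "tableName": table, "database": {"ref": "Warehouse"}},
        {"id": "CopyWorker", "name": "CopyWorker", "type": "Ec2Resource",
         "terminateAfter": "1 Hour"},
        {"id": "LoadToRedshift", "name": "LoadToRedshift",
         "type": "RedshiftCopyActivity",
         "input": {"ref": "SourceData"}, "output": {"ref": "TargetTable"},
         "insertMode": "KEEP_EXISTING", "runsOn": {"ref": "CopyWorker"}},
    ]
    return {"objects": objects}


def unresolved_refs(definition: dict) -> list:
    """Return every referenced id that no object in the definition declares."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append(value["ref"])
    return missing


if __name__ == "__main__":
    definition = pipeline_definition("s3://example-bucket/raw/", "sensor_readings")
    print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of file you would pass to `aws datapipeline put-pipeline-definition`.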
AWS later introduced Kinesis, which has become very popular for its real-time data streaming capabilities. Many organizations working on IoT and sensor data collection use Kinesis.
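Kinesis scales by splitting a stream into shards. Each record's partition key determines which shard receives it. AWS documents that the key is hashed with MD5 into a 128-bit integer, which falls into exactly one shard's hash-key range. The sketch below reproduces that routing under one assumption: the stream's shards split the hash space evenly, as on a freshly created stream (after resharding, the ranges can be uneven). It also assumes the digest is read as a big-endian integer. Hypothetical sensor IDs serve as partition keys, which is the typical IoT pattern.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as an unsigned big-endian integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash-key range contains this key.

    Assumes shard i owns [i * 2**128 / n, (i + 1) * 2**128 / n), i.e. the
    evenly split ranges of a freshly created stream.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE


if __name__ == "__main__":
    for sensor in ("sensor-1", "sensor-2", "sensor-3", "sensor-4"):
        print(sensor, "-> shard", shard_index(sensor, 4))
```

Records with the same key always land on the same shard, so readings from one sensor stay in order. Many distinct sensor IDs spread the load across shards.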
Preservation of Original Data Source
Both Azure and AWS have taken great care to offer secure, long-lasting solutions for storing and preserving data, and in many cases neither charges customers for uploading data to its resources.
AWS and Azure both offer highly durable object storage: S3 and Blob storage, respectively. Both are built to hold very large numbers of objects.
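Durability targets are easier to grasp as expected losses. AWS publishes eleven nines (99.999999999%) of annual durability as the design target for S3 Standard, and Azure quotes a comparable figure for locally redundant Blob storage. Assuming that target, the expected number of objects lost per year is the object count multiplied by the annual per-object loss probability. The function names below are illustrative.

```python
def expected_annual_losses(object_count: int, nines: int = 11) -> float:
    """Expected objects lost per year, with durability given as a count of nines."""
    annual_loss_probability = 10.0 ** -nines  # 11 nines -> 1e-11 per object per year
    return object_count * annual_loss_probability


def years_between_losses(object_count: int, nines: int = 11) -> float:
    """Average interval, in years, between single-object losses."""
    return 1.0 / expected_annual_losses(object_count, nines)
```

For 10 million objects, this gives 0.0001 expected losses per year, or roughly one lost object every 10,000 years.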
In addition to object stores, both providers offer SQL and NoSQL database solutions. On AWS, Amazon RDS is the SQL storage option; Azure offers virtual machines running MSSQL databases as well as managed database pools for SQL data. In the NoSQL space, both offer document databases: DynamoDB on AWS and DocumentDB on Azure.
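The practical difference between the two database families is the data model. The sketch below stores the same hypothetical sensor reading both ways. Python's built-in `sqlite3` serves as a local stand-in for a managed SQL service such as RDS or Azure's SQL offerings. A dictionary of JSON documents stands in for a document store such as DynamoDB or DocumentDB. Neither provider's API is called.

```python
import json
import sqlite3

READING = {
    "device_id": "sensor-42",
    "ts": "2016-05-01T12:00:00Z",
    "temperature_c": 21.5,
    "tags": {"site": "plant-a", "line": 3},  # nested attributes
}


def store_relational(conn: sqlite3.Connection, reading: dict) -> None:
    """Relational model: a fixed schema, so nested tags are flattened into columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " device_id TEXT, ts TEXT, temperature_c REAL, site TEXT, line INTEGER,"
        " PRIMARY KEY (device_id, ts))"
    )
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
        (reading["device_id"], reading["ts"], reading["temperature_c"],
         reading["tags"]["site"], reading["tags"]["line"]),
    )


def store_document(collection: dict, reading: dict) -> None:
    """Document model: the whole reading, nesting included, is saved under its key."""
    collection[(reading["device_id"], reading["ts"])] = json.dumps(reading)
```

Adding a new attribute to future readings requires a schema change in the relational case, but none in the document case.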
LifeCycle Management & Cold Storage
Both source and processed data undergo multiple transformations and extractions from the moment they enter a provider's analytics ecosystem until they are delivered to stakeholders. Cold storage is crucial for archiving source data that may need to be retained for compliance purposes or later re-use.
AWS offers Amazon Glacier, a highly durable cold storage service that has become very popular. In 2016, Azure introduced Cool Blob Storage, a cost-effective and durable archival solution.
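Moving ageing data into cold storage is usually automated with a lifecycle rule rather than done by hand. The dictionary below follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. Its single rule transitions objects under a hypothetical `raw/` prefix to Glacier after 30 days. The `storage_class_after` function is a hypothetical, simplified local model of how such a rule plays out. It is not an AWS API, and it ignores details such as overlapping rules and expirations.

```python
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "archive-raw-data",        # hypothetical rule name
            "Filter": {"Prefix": "raw/"},    # only source data under raw/
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    ]
}


def storage_class_after(key: str, age_days: int, config: dict) -> str:
    """Simplified local model: the storage class an object should have after age_days."""
    storage_class = "STANDARD"
    for rule in config["Rules"]:
        prefix = rule.get("Filter", {}).get("Prefix", "")
        if rule["Status"] != "Enabled" or not key.startswith(prefix):
            continue
        # Walk transitions in order of age; the latest one reached wins.
        for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                storage_class = transition["StorageClass"]
    return storage_class


# Applying the rule for real would look something like:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=LIFECYCLE_CONFIGURATION)
```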
Capture Metadata
Information about the data and the transformations it undergoes during its lifecycle must be captured, either as metadata or as a record of API calls. Both providers offer options for attaching metadata, but neither provides a built-in mechanism to store or update custom metadata as the data is transformed. Neither provider tracks transformations at the data level, although both offer API-level tracking: CloudTrail in AWS and API Management in Azure.
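Both providers' metadata options work in the same basic way. User-defined name/value pairs travel as request headers when an object is written, prefixed `x-amz-meta-` for S3 and `x-ms-meta-` for Azure Blob storage. Recording lineage therefore falls to your own code. The sketch below turns a lineage record into those headers, using hypothetical field names such as `source_uri` and `transform`. It also enforces Azure's rule that metadata names be valid C# identifiers, approximated here with Python's `str.isidentifier()`.

```python
PREFIXES = {"aws": "x-amz-meta-", "azure": "x-ms-meta-"}


def lineage_headers(provider: str, lineage: dict) -> dict:
    """Turn a lineage record into user-defined metadata headers for an upload."""
    prefix = PREFIXES[provider]
    headers = {}
    for name, value in lineage.items():
        # Azure requires metadata names to be valid C# identifiers (no hyphens);
        # str.isidentifier() is a close approximation of that rule.
        if provider == "azure" and not name.isidentifier():
            raise ValueError(f"invalid Azure metadata name: {name!r}")
        # S3 stores these names in lower case and Azure compares them
        # case-insensitively, so normalise to lower case up front.
        headers[prefix + name.lower()] = str(value)
    return headers
```

These headers would simply be added to the PUT request that uploads the object.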