AWS :: AWS GLUE :: Creating ETL in the cloud
For any consultation, query, or doubt, you may contact: (+91) 9804 436 193, debabrataguha20@gmail.com, or join the group https://www.facebook.com/groups/331231161093613/

AWS Glue is a cloud service that prepares data for analysis through automated extract, transform, and load (ETL) processes. Let’s jump directly into some ETL examples by handling some small sample files.

1) What are Parquet files:

What is Parquet?
Parquet is an open source file format for Hadoop. It stores nested data structures in a flat columnar format. Compared with the traditional approach, where data is stored row by row, Parquet is generally more efficient in terms of storage and query performance.

Why Parquet?
Parquet stores binary data in a column-oriented way: the values of each column are stored adjacent to one another, which enables better compression. It is especially good for queries which read particular columns from a “wide” (with many...