20 Posts

Data engineering

Apache Spark for Data Science - User-Defined Functions (UDF) Explained

Do you find Python easier than SQL? User-Defined Functions in PySpark might be what you’re …

Apache Spark for Data Science - Hands-On Introduction to Spark SQL

Spark SQL - From basics to Regular Expressions and User-Defined Functions (UDF) in 10 minutes - …

Apache Spark for Data Science - Word Count With Spark and NLTK

Learn to count the words in a book and handle the common stop word problem, implemented in PySpark.

Apache Spark for Data Science - How to Work with Spark RDDs

Spark is built on Resilient Distributed Datasets (RDDs). Make sure you know how to use them.

Apache Spark for Data Science - How to Install and Get Started with PySpark

Want to learn Apache Spark for Data Science? This guide will help you get started. Learn how to …

Apache Airflow for Data Science - How to Work with Variables

Hardcoding values in your Airflow DAGs is a bad practice. Learn how to use Airflow variables …

Apache Airflow for Data Science - How to Download Files from Amazon S3

Learn how to download files from Amazon S3 (AWS) to your local machine with Apache Airflow and …

Apache Airflow for Data Science - How to Upload Files to Amazon S3

Learn how to set up an Amazon S3 (AWS) bucket and how to upload files from local disk with …

Apache Airflow for Data Science - How to Work with REST APIs

Learn to work with REST APIs in Apache Airflow by utilizing HttpSensor and HttpOperator Airflow …

Apache Airflow for Data Science - How to Communicate Between Tasks with Airflow XComs

Learn to send and receive data between Airflow tasks with XComs, and when you shouldn’t …

Apache Airflow for Data Science - How to Run Tasks in Parallel

Build a data pipeline (DAG) in Apache Airflow that makes four GET API requests in parallel.

Apache Airflow for Data Science - How to Migrate Airflow Metadata DB to Postgres and Enable Parallel Execution

Apache Airflow doesn’t run tasks in parallel by default - but there’s an easy fix. …

Apache Airflow for Data Science - How to Work With Databases (Postgres)

Learn how to extract, transform, and load data with Airflow and a Postgres database by coding a …

Apache Airflow for Data Science - How to Write Your First DAG in 10 Minutes

Apache Airflow is a tool commonly used by data engineers. Learn how to write your first data …

Stop Using Python to Aggregate Data - Use SQL Instead

Are you using Python to extract raw data from the database? It could be a huge bottleneck in …

Apache Airflow for Data Science - How to Install Airflow Locally

Want to learn Apache Airflow as a Data Engineer? Start by installing it locally. Go from zero …

Apache Kafka in Python: How to Stream Data With Producers and Consumers

Apache Kafka Tutorial Series 3/3 - Learn how to write Kafka Producers and Consumers in Python, …

Master the Kafka Shell in 5 Minutes — Topics, Producers, and Consumers Explained

Apache Kafka Tutorial Series 2/3 - Learn all about Kafka topics, console Producers, and …

How to Install Apache Kafka Using Docker — The Easy Way

Apache Kafka Tutorial Series 1/3 - Learn how to install Apache Kafka using Docker and how to …

Python has a Built-in Database — Here’s How to use it

Learn to use Python’s built-in database in minutes with this complete guide.