Pandas API on Spark for Reading SQL Database Tables: read_sql_table()


Pandas API on Spark serves as a bridge between the Pandas and Spark ecosystems, offering versatile functionality for data manipulation. In this article, we’ll explore the read_sql_table() function, which reads SQL database tables into DataFrame objects within the Spark environment. We’ll cover its usage and parameters, and walk through a practical example with output for efficient data retrieval from SQL databases.

Understanding the read_sql_table() Function: The read_sql_table() function in Pandas API on Spark retrieves data from a SQL database table and loads it into a DataFrame object, enabling seamless integration and analysis. Because it connects over JDBC, it works with any database that provides a JDBC driver, and it offers flexibility in specifying connection details, table names, and optional parameters for customization.
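
In its simplest form, the call takes just a table name and a JDBC connection URI. Here is a minimal sketch of the call shape; the table name and URI below are hypothetical placeholders, and a SparkSession with the appropriate JDBC driver on its classpath is assumed to be available:

import pyspark.pandas as ps

# General form: read_sql_table(table_name, con, schema=None,
#                              index_col=None, columns=None, **options)
# "my_table" and the URI are placeholder values.
df = ps.read_sql_table("my_table", con="jdbc:postgresql://host:5432/my_db")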

Parameters of read_sql_table() Function:

  1. table_name: Name of the SQL database table to read data from.
  2. con: JDBC connection URI string for the database. Note that this must be a JDBC URI, not a Python database connection object.
  3. schema: Name of the database schema to query, if the database flavor supports it. Optional parameter.
  4. index_col, columns, and **options: Additional optional parameters for customization, such as setting an index column, selecting specific columns, and passing extra options through to Spark’s JDBC reader (see the sketch after this list).
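
To see these optional parameters together, the following sketch reads only the “name” and “amount” columns, uses “id” as the index, and forwards placeholder credentials to the JDBC reader (the table, URI, and credentials are hypothetical):

import pyspark.pandas as ps

df = ps.read_sql_table(
    "sales_data",
    con="jdbc:postgresql://localhost:5432/sales_db",
    index_col="id",                 # use the "id" column as the DataFrame index
    columns=["name", "amount"],     # read only these columns
    user="your_username",           # extra keyword options are forwarded
    password="your_password",       # to Spark's JDBC data source
)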

Example: Reading a SQL Database Table into a DataFrame: Let’s illustrate read_sql_table() with a practical example. Suppose we have a PostgreSQL database named “sales_db” with a table named “sales_data”, and we want to read this table into a DataFrame.

# Import necessary libraries
from pyspark.sql import SparkSession
import pyspark.pandas as ps
# Initialize SparkSession; the PostgreSQL JDBC driver must be on the
# classpath (the driver package version below is illustrative)
spark = SparkSession.builder \
    .appName("ReadSQLTable") \
    .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3") \
    .getOrCreate()
# Define the JDBC connection URI for the database
uri = "jdbc:postgresql://localhost:5432/sales_db"
# Read the SQL table into a pandas-on-Spark DataFrame; credentials are
# passed as extra options to Spark's JDBC reader
df = ps.read_sql_table("sales_data", con=uri,
                       user="your_username", password="your_password")
# Display the DataFrame
print(df)
# Stop SparkSession
spark.stop()

Output:

   id   name  amount
0   1   John    1000
1   2  Alice    1500
2   3    Bob    2000
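
Since the result is a pandas-on-Spark DataFrame, familiar pandas-style operations run distributed on the Spark cluster, and the data can be handed off to native Spark when needed. A brief hypothetical follow-up on the df above:

# Pandas-style operations execute on Spark
total = df["amount"].sum()            # aggregate across the cluster
big_sales = df[df["amount"] > 1200]   # boolean filtering

# Convert to a native Spark DataFrame for Spark SQL workflows
sdf = df.to_spark()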