Loading JSON schema from a JSON string in PySpark

user October 12, 2023

In PySpark, you can load a schema from a JSON string instead of hard-coding it, which lets you define the structure of your data dynamically. This is useful when the schema is kept outside your code (for example, in a configuration file), when your data structure evolves, or when you need to handle several different JSON structures with the same code.

1. Importing PySpark

First, make sure you have PySpark installed. You can install it using pip:

pip install pyspark

Import the necessary modules:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
import json

2. Creating a SparkSession

Create a SparkSession, the entry point for using PySpark:

spark = SparkSession.builder.appName("JSONSchema from JSONString at Freshers.in").getOrCreate()

3. Defining the JSON schema

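Define the schema as a JSON string. The string must use the format Spark itself produces when it serializes a StructType: a top-level object with "type": "struct" and a "fields" list, where each field has a name, a type, a nullable flag, and metadata. Note that this is Spark's own schema format, not the JSON Schema (json-schema.org) standard. Here’s an example schema string: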

schema_json_string = """
{
    "type": "struct",
    "fields": [
        {"name": "id", "type": "integer", "nullable": true, "metadata": {}},
        {"name": "first_name", "type": "string", "nullable": true, "metadata": {}},
        {"name": "last_name", "type": "string", "nullable": true, "metadata": {}},
        {"name": "age", "type": "integer", "nullable": true, "metadata": {}},
        {"name": "salary", "type": "double", "nullable": true, "metadata": {}}
    ],
    "metadata": {}
}
"""

4. Creating a StructType schema

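Parse the schema string and create a StructType object. json.loads turns the string into a Python dictionary, and StructType.fromJson builds the schema from that dictionary: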

schema_dict = json.loads(schema_json_string)
schema = StructType.fromJson(schema_dict)

5. Loading JSON Data with the schema

Now, create a DataFrame that uses the schema. In this example the records are Python dictionaries whose keys match the field names in the schema:

json_data = [
    {"id": 1, "first_name": "Sachin", "last_name": "Tendulkar", "age": 30, "salary": 50000.0},
    {"id": 2, "first_name": "Rajesh", "last_name": "Kanna", "age": 25, "salary": 60000.0},
    {"id": 3, "first_name": "Mahesh", "last_name": "Raj", "age": 35, "salary": 75000.0}
]

df = spark.createDataFrame(json_data, schema=schema)

6. Viewing the DataFrame

You can now work with the DataFrame as usual. For example, printSchema() prints the schema as a tree, and show() displays the first rows (up to 20 by default):

df.printSchema()
df.show()

Output

root
 |-- id: integer (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- salary: double (nullable = true)

+---+----------+---------+---+-------+
| id|first_name|last_name|age| salary|
+---+----------+---------+---+-------+
|  1|    Sachin|Tendulkar| 30|50000.0|
|  2|    Rajesh|    Kanna| 25|60000.0|
|  3|    Mahesh|      Raj| 35|75000.0|
+---+----------+---------+---+-------+
