Package io.delta.kernel.defaults.engine
Class DefaultJsonHandler
Object
io.delta.kernel.defaults.engine.DefaultJsonHandler
- All Implemented Interfaces:
JsonHandler
Default implementation of
JsonHandler
based on Hadoop APIs.-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionparseJson
(ColumnVector jsonStringVector, StructType outputSchema, Optional<ColumnVector> selectionVector) Parse the given json strings and return the fields requested byoutputSchema
as columns in aColumnarBatch
.readJsonFiles
(CloseableIterator<FileStatus> scanFileIter, StructType physicalSchema, Optional<Predicate> predicate) Read and parse the JSON format file at given locations and return the data as aColumnarBatch
with the columns requested byphysicalSchema
.void
writeJsonFileAtomically
(String filePath, CloseableIterator<Row> data, boolean overwrite) Makes use ofLogStore
implementations in `delta-storage` to atomically write the data to a file depending upon the destination filesystem.
-
Constructor Details
-
DefaultJsonHandler
-
-
Method Details
-
parseJson
public ColumnarBatch parseJson(ColumnVector jsonStringVector, StructType outputSchema, Optional<ColumnVector> selectionVector) Description copied from interface:JsonHandler
Parse the given json strings and return the fields requested byoutputSchema
as columns in aColumnarBatch
.There are a couple special cases that should be handled for specific data types:
- FloatType and DoubleType: handle non-numeric numbers encoded as strings
- NaN:
"NaN"
- Positive infinity:
"+INF", "Infinity", "+Infinity"
- Negative infinity:
"-INF", "-Infinity""
- NaN:
- DateType: handle dates encoded as strings in the format
"yyyy-MM-dd"
- TimestampType: handle timestamps encoded as strings in the format
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
- Specified by:
parseJson
in interfaceJsonHandler
- Parameters:
jsonStringVector
- StringColumnVector
of valid JSON strings.outputSchema
- Schema of the data to return from the parsed JSON. If any requested fields are missing in the JSON string, a null is returned for that particular field in the returnedRow
. The type for each given field is expected to match the type in the JSON.selectionVector
- Optional selection vector indicating which rows to parse the JSON. If present, only the selected rows should be parsed. Unselected rows should be all null in the returned batch.- Returns:
- a
ColumnarBatch
of schemaoutputSchema
with one row for each entry injsonStringVector
- FloatType and DoubleType: handle non-numeric numbers encoded as strings
-
readJsonFiles
public CloseableIterator<ColumnarBatch> readJsonFiles(CloseableIterator<FileStatus> scanFileIter, StructType physicalSchema, Optional<Predicate> predicate) throws IOException Description copied from interface:JsonHandler
Read and parse the JSON format file at given locations and return the data as aColumnarBatch
with the columns requested byphysicalSchema
.- Specified by:
readJsonFiles
in interfaceJsonHandler
- Parameters:
scanFileIter
- Iterator of files to read data from.physicalSchema
- Select list of columns to read from the JSON file.predicate
- Optional predicate which the JSON reader can optionally use to prune rows that don't satisfy the predicate. Because pruning is optional and may be incomplete, caller is still responsible apply the predicate on the data returned by this method.- Returns:
- an iterator of
ColumnarBatch
s containing the data in columnar format. It is the responsibility of the caller to close the iterator. The data returned is in the same as the order of files given inscanFileIter
- Throws:
IOException
- if an I/O error occurs during the read.
-
writeJsonFileAtomically
public void writeJsonFileAtomically(String filePath, CloseableIterator<Row> data, boolean overwrite) throws IOException Makes use ofLogStore
implementations in `delta-storage` to atomically write the data to a file depending upon the destination filesystem.- Specified by:
writeJsonFileAtomically
in interfaceJsonHandler
- Parameters:
filePath
- Destination file pathdata
- Data to write as Jsonoverwrite
- Iftrue
, the file is overwritten if it already exists. Iffalse
and a file existsFileAlreadyExistsException
is thrown.- Throws:
IOException
FileAlreadyExistsException
- if the file already exists andoverwrite
is false.
-