Parquet files are self-describing: the schema is stored with the data, and the result of loading a Parquet file is itself a DataFrame. The Java example below saves a DataFrame as Parquet, reads it back, registers it as a temporary view, and queries it with SQL:

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> peopleDF = spark.read().json("examples/src/main/resources/people.json");

// DataFrames can be saved as Parquet files, maintaining the schema information
peopleDF.write().parquet("people.parquet");

// Read in the Parquet file created above.
// Parquet files are self-describing so the schema is preserved.
// The result of loading a parquet file is also a DataFrame.
Dataset<Row> parquetFileDF = spark.read().parquet("people.parquet");

// Parquet files can also be used to create a temporary view and then used in SQL statements
parquetFileDF.createOrReplaceTempView("parquetFile");
Dataset<Row> namesDF = spark.sql("SELECT name FROM parquetFile WHERE age BETWEEN 13 AND 19");
Dataset<String> namesDS = namesDF.map(
    (MapFunction<Row, String>) row -> "Name: " + row.getString(0),
    Encoders.STRING());
namesDS.show();
// +------------+
// |       value|
// +------------+
// |Name: Justin|
// +------------+
```

The same query can be run from SparkR, and custom R UDFs can be applied to Spark DataFrames with `dapply`:

```r
teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
head(teenagers)
##     name
## 1 Justin

# We can also run custom R-UDFs on Spark DataFrames.
# Here we prefix all the names with "Name:"
schema <- structType(structField("name", "string"))
teenNames <- dapply(df, function(p) { cbind(paste("Name:", p$name)) }, schema)
```

Parquet also supports schema merging: you can start with a simple schema and gradually add columns, then read the resulting partitioned table back with the `mergeSchema` option enabled:

```java
import java.util.ArrayList;
import java.util.List;

// Create a simple DataFrame, store into a partition directory
List<Square> squares = new ArrayList<>();
for (int value = 1; value <= 5; value++) {
  Square square = new Square();
  square.setValue(value);
  square.setSquare(value * value);
  squares.add(square);
}
Dataset<Row> squaresDF = spark.createDataFrame(squares, Square.class);
squaresDF.write().parquet("data/test_table/key=1");

// Create another DataFrame in a new partition directory,
// adding a new column and dropping an existing column
List<Cube> cubes = new ArrayList<>();
for (int value = 6; value <= 10; value++) {
  Cube cube = new Cube();
  cube.setValue(value);
  cube.setCube(value * value * value);
  cubes.add(cube);
}
Dataset<Row> cubesDF = spark.createDataFrame(cubes, Cube.class);
cubesDF.write().parquet("data/test_table/key=2");

// Read the partitioned table with schema merging enabled
Dataset<Row> mergedDF = spark.read().option("mergeSchema", true).parquet("data/test_table");
```
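The schema-merging example above relies on JavaBean classes named `Square` and `Cube` that the excerpt never defines. A minimal sketch of what they could look like, assuming plain int fields (Spark's `createDataFrame(List, Class)` infers the column names and types from the bean getters):

```java
import java.io.Serializable;

// Hypothetical JavaBeans backing the schema-merging example.
// Each getter/setter pair becomes one column in the inferred schema.
class Square implements Serializable {
  private int value;
  private int square;

  public int getValue() { return value; }
  public void setValue(int value) { this.value = value; }
  public int getSquare() { return square; }
  public void setSquare(int square) { this.square = square; }
}

class Cube implements Serializable {
  private int value;
  private int cube;

  public int getValue() { return value; }
  public void setValue(int value) { this.value = value; }
  public int getCube() { return cube; }
  public void setCube(int cube) { this.cube = cube; }
}
```

The beans implement `Serializable` because Spark ships them to executors when building the DataFrame.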