

Spark SQL pass variable to query

I have looked all over for an answer to this and tried everything, but nothing seems to work. I'm trying to reference a variable inside a spark.sql query in Python. I'm running Python 3 and Spark 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

The error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax
Comments:
2023-01-21 00:30:11
bkt = 1
prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
    FROM pwrcrv_tmp\
    where EXTR_CURR_NUM_CYC_DLQ="%bkt%"\
    and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
    group by ((year(fdr_date))*100)+month(fdr_date)\
    order by ((year(fdr_date))*100)+month(fdr_date)")
prime.show(50)
2023-01-21 00:30:11
Is this a question? Not sure why you've posted more code in a comment, as well. Please read How to Ask a Question too.
2023-01-21 00:30:11
First of all, s"..." is a syntax error in Python — what is that supposed to mean? Secondly, trying to format a string with $bkt is not valid Python syntax either. Look up string formatting in Python.
2023-01-21 00:30:11
The title of my post is my question. I got the s"..." from this answer, which was marked correct on Stack Overflow: stackoverflow.com/questions/37284216/…
2023-01-21 00:30:11
@email83 I don't know what language that is, but the answer you're looking for is this one: stackoverflow.com/a/37284354/5858851
2023-01-21 00:30:11
It's Python. Yes, I saw and tested that syntax as well:

bkt = "1"
prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
    FROM pwrcrv_tmp\
    where EXTR_CURR_NUM_CYC_DLQ='{}'\
    and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
    group by ((year(fdr_date))*100)+month(fdr_date)\
    order by ((year(fdr_date))*100)+month(fdr_date)").format(id)
prime.show(50)

AttributeError: 'DataFrame' object has no attribute 'format'
Answers (2):

I found the correct syntax buried in this databricks post.

https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html

You add a lowercase f in front of the quoted query (making it an f-string) and wrap braces around the name of the variable inside the query.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")


prime.show(50)
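The same f-string query can also be written with a triple-quoted string, which avoids the backslash line continuations. This is a minimal sketch using the table and column names from the question; only the string construction is shown here, since the spark.sql() call needs a live SparkSession:

```python
bkt = 1

# Triple-quoted f-string: no backslash continuations needed, and {bkt}
# is substituted into the text before it is ever handed to spark.sql().
query = f"""
    SELECT ((year(fdr_date))*100)+month(fdr_date) AS fdr_year,
           count(*) AS counts
    FROM pwrcrv_tmp
    WHERE EXTR_CURR_NUM_CYC_DLQ = {bkt}
      AND EXTR_ACCOUNT_TYPE IN ('PS','PT','PD','PC','HV','PA')
    GROUP BY ((year(fdr_date))*100)+month(fdr_date)
    ORDER BY ((year(fdr_date))*100)+month(fdr_date)
"""

# The interpolated text is what Spark receives:
#   prime = spark.sql(query)
#   prime.show(50)
print("EXTR_CURR_NUM_CYC_DLQ = 1" in query)  # True
```

Note that f-string interpolation happens in Python before Spark sees the SQL, so this works on any Spark version as long as the Python interpreter is 3.6 or newer.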

This should work:

p_filename ='some value'
z='some value'

query = "INSERT into default.loginfordetails (filename,logdesc) values ('{}','{}') ".format(p_filename,z)
 
spark.sql(query)
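The key point in this answer is the order of operations: call .format() on the query string first, then pass the finished SQL to spark.sql(). (The asker's earlier attempt failed because .format() was called on the DataFrame returned by spark.sql(), not on the string.) A minimal sketch of just the string-building step:

```python
p_filename = 'some value'
z = 'some value'

# Format the string FIRST, producing the final SQL text; only then
# would it be passed to spark.sql(query).
query = ("INSERT into default.loginfordetails (filename,logdesc) "
         "values ('{}','{}')").format(p_filename, z)

print(query)
# INSERT into default.loginfordetails (filename,logdesc) values ('some value','some value')
```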

Comments:
2023-01-21 00:30:11
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.