Using the impyla module, I've downloaded the results of an Impala query into a pandas DataFrame and done some analysis. I would now like to write the results back to a table in Impala, or at least to a file on HDFS. However, I cannot find any information on how to do this, or even on how to ssh into impala-shell and write the table from there. What I'd like to do:
    from impala.dbapi import connect
    from impala.util import as_pandas

    # connect to my host and port
    conn = connect(host='myhost', port=111)

    # build the query; my_table_name holds the name of the source table
    create_query = """
        SELECT * FROM {}
        """.format(my_table_name)

    # run query on impala
    cur = conn.cursor()
    cur.execute(create_query)

    # store results as pandas data frame
    pandas_df = as_pandas(cur)
    cur.close()
Once I've done whatever I need to do with pandas_df, I'd like to save those results back to Impala as a table:
    # create query to save pandas_df back to impala
    # (this is what I would like to be able to write; it does not work as-is)
    save_query = """
        CREATE TABLE new_table AS
        SELECT * FROM pandas_df
        """

    # run query on impala
    cur = conn.cursor()
    cur.execute(save_query)
    cur.close()
The above scenario would be ideal, but I'd be happy if I could figure out how to ssh into impala-shell and do this from Python, or even just save the table to HDFS. I'm writing this as a script for other users, so it's essential that everything happens within the script. Thanks so much!
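To make the goal concrete, here is a minimal sketch of the kind of fallback I have in mind, not a confirmed solution: build plain `CREATE TABLE` and `INSERT ... VALUES` statements from the rows and send them through an ordinary DB-API cursor. The helper names (`sql_literal`, `create_table_sql`, `insert_sql`, `write_rows`), the column types, and the batch size are all hypothetical, and the literal escaping is deliberately simplified. I assume the cursor only needs an `execute(sql)` method, and I suspect row-by-row `INSERT ... VALUES` would only be reasonable for small result sets.

```python
import numbers


def sql_literal(value):
    """Render a Python value as a SQL literal (simplified, illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, float) and value != value:  # NaN, e.g. a missing value
        return "NULL"
    if isinstance(value, bool):  # must be checked before numbers: bool is an int
        return "TRUE" if value else "FALSE"
    if isinstance(value, numbers.Real):
        return str(value)
    # Everything else is sent as a string, escaping backslashes and single quotes.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def create_table_sql(table, columns):
    """columns is a list of (name, sql_type) pairs, e.g. [("id", "BIGINT")]."""
    cols = ", ".join("{} {}".format(name, sql_type) for name, sql_type in columns)
    return "CREATE TABLE IF NOT EXISTS {} ({})".format(table, cols)


def insert_sql(table, rows):
    """Build one multi-row INSERT statement from an iterable of row tuples."""
    values = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO {} VALUES {}".format(table, values)


def write_rows(cursor, table, columns, rows, batch_size=1000):
    """Create the table if needed, then insert the rows in batches."""
    cursor.execute(create_table_sql(table, columns))
    rows = list(rows)
    for start in range(0, len(rows), batch_size):
        cursor.execute(insert_sql(table, rows[start:start + batch_size]))
```

With the DataFrame from above, I imagine the call would look something like `write_rows(conn.cursor(), "new_table", [("id", "BIGINT"), ("name", "STRING")], pandas_df.itertuples(index=False, name=None))`, where the column list is made up for illustration. I don't know whether this is the recommended way to load data, which is part of what I'm asking.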