Download Click here to download full source code file
Here, sample_models.json is for your reference
{
"model": "suppression_model1",
"columns": "company name,address1,city,state,zipcode,email",
"created_date": "3/23/2021"
}
you can also download sample csv files from here csv file and you may have this file in clientfiles folder
Step 1 : Load the Json columns in a variable
with open('sample_models.json', 'r') as file:
data_json = json.load(file)
jsonColumns = data_json["columns"]
jsonColumns = jsonColumns.split(",")
Step 2: Loop through the CSV files from a specific folder and match with the json column
inputFile = r"C:\yourfile\path\clientfiles"
all_files = glob.glob(inputFile + "/*.csv")
Step 3: Clean the column names and make it as lower case
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
#print(repr(filename[-1]))
csvfilename = os.path.basename(filename)
df.columns= df.columns.str.lower()
Step 4: Loop through the json columns for each file, If the cleaned column name match with the json column then take that matching column and write to a new curate file
df.columns= df.columns.str.replace(' ', '', regex=True).str.replace('.', '_', regex=True)
datacsvfile = list(df.columns)
new_df = pd.DataFrame()
for jdata in jsonColumns:
if jdata in datacsvfile:
new_df = new_df.append(df[jdata])
else:
df[jdata] = ''
new_df = new_df.append(df[jdata])
Step 5: Copy the curate file to another folder named curated
new_df.T.to_csv(f'.\\curatedfiles\\' + csvfilename + '', index=False)