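I have an inventory table in SQL Server named dbo.inventory that contains Year, Month, Material and Stock_quantity. Every day I receive a new inventory count as a CSV file and need to load it into the dbo.inventory table. However, I need to delete the records in the database if the Year and Month from the CSV file already exist there, to avoid loading multiple inventory counts for the same month.
In SQL I would do it like this: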
Delete t1
FROM dbo.inventory t1
JOIN csv t2 ON t1.Year = t2.Year and t1.Month = t2.Month
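I don't know how to do this in a Python script without loading the CSV file into the data warehouse as a staging table. I just want to delete the existing rows that match on Year and Month and then load the new ones.
I have used the following in another setup: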
delete_date = sales.Date.max()
connection = engine.connect()
connection.execute(f"""delete from sales where Date = '{delete_date}'""")
connection.close()
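But that does not work here, because the input that determines what to delete is a dataframe, and if it is a correction to previously loaded data it could in theory contain multiple years and months.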
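To make the multi-month case concrete, here is a minimal sketch that collects the distinct (Year, Month) pairs present in one file. It uses only the standard library csv module rather than pandas, and the sample content and the helper name distinct_periods are made up for illustration. It also assumes Year and Month are integers.

```python
import csv
import io

# Hypothetical CSV content: a new count for 2023-02 plus a correction for 2023-01.
CSV_TEXT = """Year,Month,Material,Stock_quantity
2023,2,A100,40
2023,2,B200,15
2023,1,A100,38
"""

def distinct_periods(csv_file):
    """Return the sorted, de-duplicated (Year, Month) pairs found in a CSV file object."""
    reader = csv.DictReader(csv_file)
    # A set comprehension drops repeated pairs; int() assumes numeric Year/Month.
    return sorted({(int(row["Year"]), int(row["Month"])) for row in reader})

periods = distinct_periods(io.StringIO(CSV_TEXT))
print(periods)
```

Every pair in that list is a month whose existing rows would have to be removed before the load.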
Pandas doesn't support deleting SQL rows based on specific conditions. You have to use SQLAlchemy for that:
import sqlalchemy as sa

engine = sa.create_engine('mssql+pyodbc://...')
meta = sa.MetaData()

# Map the Inventory table in your database to a SQLAlchemy object.
# autoload_with alone reflects the table; autoload=True is deprecated since SQLAlchemy 1.4.
inventory = sa.Table('Inventory', meta, autoload_with=engine)

# Build the WHERE clause of your DELETE statement from the distinct
# (Year, Month) pairs in the dataframe df (the data read from the CSV).
# Equivalent SQL:
#   WHERE (Year = <Year of pair 1> AND Month = <Month of pair 1>)
#      OR (Year = <Year of pair 2> AND Month = <Month of pair 2>)
#      ...
pairs = df[['Year', 'Month']].drop_duplicates().itertuples(index=False, name=None)
cond = sa.or_(*[
    sa.and_(inventory.c['Year'] == year, inventory.c['Month'] == month)
    for year, month in pairs
])

# Define the DELETE
delete = inventory.delete().where(cond)

# Run the DELETE and the insert in one transaction: engine.begin() commits
# on success and rolls back on error.
with engine.begin() as conn:
    conn.execute(delete)
    # Now insert the new data
    df.to_sql('Inventory', conn, if_exists='append', index=False)
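To try the delete-then-append pattern without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The column layout follows the question. The sample rows and the helper name replace_periods are hypothetical, and SQLite's `?` placeholders differ from what a SQL Server driver may expect.

```python
import sqlite3

def replace_periods(conn, rows):
    """Delete existing rows for every (Year, Month) in `rows`, then insert `rows`.

    `rows` is a list of (Year, Month, Material, Stock_quantity) tuples.
    """
    # Distinct periods covered by the incoming data.
    periods = sorted({(year, month) for year, month, _, _ in rows})
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM inventory WHERE Year = ? AND Month = ?", periods
        )
        conn.executemany(
            "INSERT INTO inventory (Year, Month, Material, Stock_quantity) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (Year INTEGER, Month INTEGER, "
    "Material TEXT, Stock_quantity INTEGER)"
)
# Existing data (hypothetical values).
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [(2022, 12, "A100", 30), (2023, 1, "A100", 35), (2023, 2, "A100", 40)],
)
conn.commit()

# New file: a corrected January count and a fresh March count.
new_rows = [(2023, 1, "A100", 38), (2023, 3, "A100", 42)]
replace_periods(conn, new_rows)

result = conn.execute(
    "SELECT Year, Month, Material, Stock_quantity FROM inventory "
    "ORDER BY Year, Month"
).fetchall()
print(result)
```

Only the months present in the new file are replaced; December 2022 and February 2023 are left alone. Keeping the delete and the insert in one transaction means a failed load does not leave a month with no rows.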