python3 | A simple openpyxl application, part 2

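Background

I have an HTML file and need to batch-replace some of the text in it with values of my own, such as name, sex, birthday and so on. I also have an xlsx file that holds the matching values: each row lists one person's name, sex, birthday and so on.

The result I want: a series of HTML files, each one being the template with its placeholder text filled in from one row of the xlsx file.

The tool I know best is R. I imagine I could first change the names I want to replace in the HTML into variable names like name, sex and dob, then write a script that has R read each row of the xlsx, assign the values to those variables, and output an HTML file. But I haven't worked out how to write it yet...

Does anyone know of a simpler method or tool?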

Preparation

Original HTML

Assume it looks like this:

<!DOCTYPE html>
<meta charset="UTF-8">
<title>{user-id}_{user-name}</title>
<html>
<h1>{user-name}</h1>
<h1>{user-birthday}</h1>
<h1>{user-sex}</h1>
</html>
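
Before wiring the template to the spreadsheet, it can help to list which placeholders the template actually contains, so that each one can be matched to a column. The following is a minimal sketch, assuming every placeholder follows the `{user-xxx}` shape used above; the helper name `find_placeholders` and the regular expression are illustrative and not part of the original script.

```python
import re

# Matches placeholders such as {user-name}. The pattern is an assumption
# based on the template shown above.
PLACEHOLDER_PATTERN = re.compile(r'\{user-[a-z]+\}')


def find_placeholders(template):
    """Return the distinct placeholders in the template, in order of first appearance."""
    seen = []
    for match in PLACEHOLDER_PATTERN.findall(template):
        if match not in seen:
            seen.append(match)
    return seen


template = """<!DOCTYPE html>
<meta charset="UTF-8">
<title>{user-id}_{user-name}</title>
<html>
<h1>{user-name}</h1>
<h1>{user-birthday}</h1>
<h1>{user-sex}</h1>
</html>
"""

print(find_placeholders(template))
```

For this template the list contains the four placeholders in the same order as the columns of the spreadsheet, which is the order the solution relies on.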

re_user.xlsx

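I added an id column in case two people share the same name. The generated HTML files are named after the user, so without the id one file would overwrite the other.

Also, set the cell format of the birthday column in Excel. A value typed in this form defaults to the "date" format, and Python ends up reading it as a number such as 34343. Change the cell format to "text" in Excel to avoid this.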

id	name	birthday	sex
1	张三	1月4日	男
2	李四	11月5日	女
3	王麻子	12月6日	女
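
If changing the cell format in Excel is not an option, the birthday value could instead be normalized on the Python side. The sketch below is an assumption-laden alternative, not what the original script does: depending on how the cell is stored, the value may arrive as a `datetime`, as a serial number like the 34343 mentioned above, or as plain text. The helper name `format_birthday` is hypothetical, and the conversion assumes Excel's default 1900 date system.

```python
from datetime import date, datetime, timedelta

# In Excel's default (1900) date system, modern serial numbers count days
# from this origin. Assumption: the workbook does not use the 1904 system.
EXCEL_EPOCH = date(1899, 12, 30)


def format_birthday(value):
    """Turn a birthday cell value into text such as '1月4日' (hypothetical helper)."""
    if isinstance(value, datetime):
        # A date-formatted cell that was read back as a datetime.
        value = value.date()
    elif isinstance(value, (int, float)):
        # A raw serial number: count days from the Excel origin.
        value = EXCEL_EPOCH + timedelta(days=int(value))
    if isinstance(value, date):
        return f'{value.month}月{value.day}日'
    # Already text (the cell format was set to "text"), so leave it alone.
    return str(value)


print(format_birthday(34343))
print(format_birthday('11月5日'))
```

Setting the column to "text" in Excel, as described above, remains the simpler route; this helper only matters when the spreadsheet cannot be edited.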

File structure

files
- html (folder)
  - ……
  - ……
- re_user.xlsx
- user.html

Solution

import os
import openpyxl
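# Read the xlsx file first and turn its rows into plain Python lists.
def get_xlsx_info(file_path):
    users = []
    wb = openpyxl.load_workbook(file_path)
    sheet = wb.worksheets[0]
    # min_row=2 skips the header row (id, name, birthday, sex);
    # otherwise the header would be treated as a user and get its own HTML file.
    for row in sheet.iter_rows(min_row=2, values_only=True):
        users.append(list(row))
    return users    # a list whose elements are one list per user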

# Create one copy of the template per user, named after the user's id and name.
# The placeholders in these copies are replaced in the next step.
def mk_html(users):
    for user_info in users:
        with open('files/user.html', mode='r', encoding='utf-8') as file_1, \
        open(f'files/html/{user_info[0]}_{user_info[1]}.html', mode='w', encoding='utf-8') as file_2:
            content = file_1.read()
            file_2.write(content)

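# Replace the old strings with the new values. This is still rather clumsy: each time a file is
# opened only one placeholder is replaced, so with 5 files and 5 placeholders the loop body runs 5*5 times.
# On top of that, every iteration ends by deleting the old file and renaming the new one.
# Note: re_html is a module-level list defined in the __main__ block.
def replace_html(users):
    # A file has N places to replace; N is the length of re_html.
    for i in range(0, len(re_html)):
        # Go through every user's row of data.
        for user_info in users:
            with open(f'files/html/{user_info[0]}_{user_info[1]}.html', mode="r", encoding="utf-8") as f1,\
            open(f'files/html/{user_info[0]}_{user_info[1]}_new.html', mode="w", encoding="utf-8") as f2:
                # Open the existing HTML for this user and create a "_new" HTML to hold the result.
                content = f1.read()
                # Replace the i-th placeholder with the value from the i-th column.
                content_new = content.replace(re_html[i], str(user_info[i]))
                f2.write(content_new)
                # Flush right away, although the with block closes the file anyway, so this line is optional.
                f2.flush()
            # Delete the old HTML.
            os.remove(f'files/html/{user_info[0]}_{user_info[1]}.html')
            # Rename the new one to the original name.
            os.rename(f'files/html/{user_info[0]}_{user_info[1]}_new.html', f'files/html/{user_info[0]}_{user_info[1]}.html')
        print(f'Updated every {re_html[i]}')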

if __name__ == '__main__':
    path = 'files/re_user.xlsx'
    list_user = get_xlsx_info(path)
    # The old strings to search for when replacing, in the same order as the xlsx columns.
    re_html = ['{user-id}','{user-name}','{user-birthday}','{user-sex}']
    mk_html(list_user)
    replace_html(list_user)
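
As the comments above admit, copying the template, reopening every file once per placeholder, and then deleting and renaming is fairly roundabout. One possible simplification is to do all replacements in memory and write each file only once. The sketch below is an alternative under stated assumptions rather than the original approach: the names `render`, `write_all` and `PLACEHOLDERS` are made up for illustration, and the placeholder list is assumed to be in the same order as the xlsx columns.

```python
import os

# Assumed to be in the same order as the columns: id, name, birthday, sex.
PLACEHOLDERS = ['{user-id}', '{user-name}', '{user-birthday}', '{user-sex}']


def render(template, user_info, placeholders=PLACEHOLDERS):
    """Replace each placeholder with the value in the same column position."""
    for placeholder, value in zip(placeholders, user_info):
        template = template.replace(placeholder, str(value))
    return template


def write_all(template_path, out_dir, users):
    """Read the template once, then write one finished HTML file per user."""
    with open(template_path, mode='r', encoding='utf-8') as f:
        template = f.read()
    # Create the output folder if it does not exist yet.
    os.makedirs(out_dir, exist_ok=True)
    for user_info in users:
        out_path = os.path.join(out_dir, f'{user_info[0]}_{user_info[1]}.html')
        with open(out_path, mode='w', encoding='utf-8') as f:
            f.write(render(template, user_info))
```

With the layout described earlier, a call such as `write_all('files/user.html', 'files/html', get_xlsx_info('files/re_user.xlsx'))` would stand in for both `mk_html` and `replace_html`. Each output file is opened once regardless of how many placeholders there are, and no temporary `_new` files are needed.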