In Office Open XML for Word (*.docx), the altChunk element provides a way to describe parts of a document using plain HTML.
Two important notes about altChunk:
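First: It is used only for importing content. If you open the document in Word and save it, the newly saved document will contain neither the alternative format content part nor the altChunk markup that referenced it. Word saves all imported content as default Office Open XML elements.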
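Second: Most applications other than Word that can read *.docx will not read the altChunk content at all. For example, LibreOffice or OpenOffice Writer will not read the altChunk content, and apache poi will not read the altChunk content when opening a *.docx file.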
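Because such applications skip altChunk content silently, it can help to check whether a *.docx contains any altChunk elements before handing it to them. The following is a minimal sketch using only the JDK (no apache poi). The class AltChunkDetectorSketch and its methods are hypothetical names. It assumes Java 9 or later (for readAllBytes) and that the main document part is stored as word/document.xml in the ZIP archive. The demo builds a tiny in-memory ZIP that is not a complete *.docx.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of apache poi.
class AltChunkDetectorSketch {

    // Namespace of the w: prefix in WordprocessingML.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts <w:altChunk> elements in word/document.xml of the given ZIP bytes.
    // Returns 0 if the archive has no word/document.xml entry.
    static int countAltChunks(byte[] docxBytes) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    byte[] xml = zip.readAllBytes(); // reads the current entry only
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml));
                    return doc.getElementsByTagNameNS(W_NS, "altChunk").getLength();
                }
            }
        }
        return 0;
    }

    // Demo helper: builds an in-memory ZIP holding only word/document.xml.
    static byte[] zipWithDocumentXml(String documentXml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<w:document xmlns:w=\"" + W_NS + "\" "
            + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
            + "<w:body><w:p/><w:altChunk r:id=\"htmlDoc1\"/>"
            + "<w:p/><w:altChunk r:id=\"htmlDoc2\"/></w:body></w:document>";
        System.out.println(countAltChunks(zipWithDocumentXml(xml))); // prints 2
    }
}
```

For a real file, the bytes could come from java.nio.file.Files.readAllBytes on the *.docx path.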
How is altChunk implemented in the *.docx ZIP file structure?
There are /word/*.html files in the *.docx ZIP file. These are referenced by Id in /word/document.xml, as <w:altChunk r:id="htmlDoc1"/> for example. The relation between the Ids and the /word/*.html files is given in /word/_rels/document.xml.rels, as <Relationship Id="htmlDoc1" Target="htmlDoc1.html" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"/> for example.
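To make the Id lookup concrete, here is a small sketch that resolves relationship Ids to their HTML targets from the content of /word/_rels/document.xml.rels. It uses only the JDK's XML parser, not apache poi. The class AltChunkRelsSketch and the method altChunkTargets are hypothetical names, and the sample relationships XML in main is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of apache poi.
class AltChunkRelsSketch {

    static final String AF_CHUNK_TYPE =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";

    // Returns Id -> Target for every relationship whose Type is the aFChunk type.
    static Map<String, String> altChunkTargets(String relsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> result = new LinkedHashMap<>();
        // "*" matches the element regardless of its namespace.
        NodeList relationships = doc.getElementsByTagNameNS("*", "Relationship");
        for (int i = 0; i < relationships.getLength(); i++) {
            Element rel = (Element) relationships.item(i);
            if (AF_CHUNK_TYPE.equals(rel.getAttribute("Type"))) {
                result.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String relsXml =
            "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId1\" Target=\"styles.xml\" "
            + "Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles\"/>"
            + "<Relationship Id=\"htmlDoc1\" Target=\"htmlDoc1.html\" "
            + "Type=\"" + AF_CHUNK_TYPE + "\"/>"
            + "</Relationships>";
        System.out.println(altChunkTargets(relsXml)); // prints {htmlDoc1=htmlDoc1.html}
    }
}
```

The Target is relative to the /word/ directory, so htmlDoc1.html here corresponds to /word/htmlDoc1.html in the archive.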
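So we first need POIXMLDocumentParts for the /word/*.html files and POIXMLRelations for the relation between the Ids and the /word/*.html files. The following code provides this with a wrapper class that extends POIXMLDocumentPart for the /word/htmlDoc#.html file in the *.docx ZIP archive. The wrapper also provides methods for manipulating the HTML. The code also provides a method for creating the /word/htmlDoc#.html file in the *.docx ZIP archive and creating the relation to it.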
Code:
import java.io.*;

import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;
import org.apache.poi.xwpf.usermodel.*;

public class CreateWordWithHTMLaltChunk {

    //a method for creating the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
    //String id will be htmlDoc#.
    private static MyXWPFHtmlDocument createHtmlDoc(XWPFDocument document, String id) throws Exception {
        OPCPackage oPCPackage = document.getPackage();
        PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
        PackagePart part = oPCPackage.createPart(partName, "text/html");
        MyXWPFHtmlDocument myXWPFHtmlDocument = new MyXWPFHtmlDocument(part, id);
        //register the relation between the Id and the new part in /word/_rels/document.xml.rels
        document.addRelation(myXWPFHtmlDocument.getId(), new XWPFHtmlRelation(), myXWPFHtmlDocument);
        return myXWPFHtmlDocument;
    }

    public static void main(String[] args) throws Exception {

        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph;
        XWPFRun run;
        MyXWPFHtmlDocument myXWPFHtmlDocument;

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("Default paragraph followed by first HTML chunk.");

        myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc1");
        myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
            "<body><p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p></body>"));
        //reference the HTML part from /word/document.xml via <w:altChunk r:id="htmlDoc1"/>
        document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("Default paragraph followed by second HTML chunk.");

        myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc2");
        myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
            "<body>" +
            "<table>" +
            "<caption>A table</caption>" +
            "<tr><th>Name</th><th>Date</th><th>Amount</th></tr>" +
            "<tr><td>John Doe</td><td>2018-12-01</td><td>1,234.56</td></tr>" +
            "</table>" +
            "</body>"
        ));
        document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());

        try (FileOutputStream out = new FileOutputStream("CreateWordWithHTMLaltChunk.docx")) {
            document.write(out);
        }
        document.close();
    }

    //a wrapper class for the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
    //provides methods for manipulating the HTML
    //TODO: We should *not* be using String methods for manipulating HTML!
    private static class MyXWPFHtmlDocument extends POIXMLDocumentPart {

        private String html;
        private String id;

        private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
            super(part);
            //empty HTML skeleton; callers replace "<body></body>" with their content
            this.html = "<!DOCTYPE html><html><head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\"><style></style><title>HTML import</title></head><body></body></html>";
            this.id = id;
        }

        private String getId() {
            return id;
        }

        private String getHtml() {
            return html;
        }

        private void setHtml(String html) {
            this.html = html;
        }

        //called while the document is written; stores the HTML as UTF-8 in the package part
        @Override
        protected void commit() throws IOException {
            PackagePart part = getPackagePart();
            try (Writer writer = new OutputStreamWriter(part.getOutputStream(), "UTF-8")) {
                writer.write(html);
            }
        }
    }

    //the XWPFRelation for /word/htmlDoc#.html
    private final static class XWPFHtmlRelation extends POIXMLRelation {
        private XWPFHtmlRelation() {
            super(
                "text/html",
                "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
                "/word/htmlDoc#.html");
        }
    }
}
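As the TODO in the code says, manipulating HTML with String methods is fragile. One small mitigation, assuming plain text (for example user input) is to be placed inside the HTML body, is to escape it first so that characters such as < and & cannot break the markup. The following is only a sketch using the JDK. The class HtmlTextEscapeSketch and its methods are hypothetical names and not part of apache poi. A real HTML library would be the more robust choice.

```java
// Hypothetical helper, not part of apache poi.
class HtmlTextEscapeSketch {

    // Escapes the characters that have a special meaning in HTML text and attributes.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps escaped plain text in a paragraph element.
    static String paragraph(String text) {
        return "<p>" + escapeHtml(text) + "</p>";
    }

    public static void main(String[] args) {
        // prints <p>Amount &lt; 1,000 &amp; &quot;due&quot; &gt; 0</p>
        System.out.println(paragraph("Amount < 1,000 & \"due\" > 0"));
    }
}
```

The result of paragraph(...) could then be placed between <body> and </body> in the same way the code above inserts its HTML strings.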
Note: Because it uses altChunk, this code needs the full jar of all of the schemas, ooxml-schemas-*.jar, as mentioned in the apache poi FAQ-N10025.
Result: