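I assume that when you talk about buttons in a PDF, you mean interactive form buttons.
In general
PDFBox has no explicit icon extractor for buttons. However, buttons with custom icons (and annotations in general) define those icons as part of their appearance. So you can simply walk (recursively) through the resources of the annotation appearances and collect the XObjects with the subtype Image: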
// PDFBox 1.8.x API
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceDictionary;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceStream;

public void extractAnnotationImages(PDDocument document, String fileNameFormat) throws IOException
{
    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
    if (pages == null)
        return;

    for (int i = 0; i < pages.size(); i++)
    {
        // e.g. "buttons%s.%s" becomes "buttons-0%s.%s" for the first page
        String pageFormat = String.format(fileNameFormat, "-" + i + "%s", "%s");
        extractAnnotationImages(pages.get(i), pageFormat);
    }
}

public void extractAnnotationImages(PDPage page, String pageFormat) throws IOException
{
    List<PDAnnotation> annotations = page.getAnnotations();
    if (annotations == null)
        return;

    for (int i = 0; i < annotations.size(); i++)
    {
        PDAnnotation annotation = annotations.get(i);
        // use the annotation name (NM) if present, otherwise the annotation index
        String name = annotation.getAnnotationName();
        String annotationFormat = name != null && name.length() > 0
                ? String.format(pageFormat, "-" + name + "%s", "%s")
                : String.format(pageFormat, "-" + i + "%s", "%s");
        extractAnnotationImages(annotation, annotationFormat);
    }
}

public void extractAnnotationImages(PDAnnotation annotation, String annotationFormat) throws IOException
{
    PDAppearanceDictionary appearance = annotation.getAppearance();
    if (appearance == null) // annotation without an /AP entry
        return;

    extractAnnotationImages(appearance.getDownAppearance(), String.format(annotationFormat, "-Down%s", "%s"));
    extractAnnotationImages(appearance.getNormalAppearance(), String.format(annotationFormat, "-Normal%s", "%s"));
    extractAnnotationImages(appearance.getRolloverAppearance(), String.format(annotationFormat, "-Rollover%s", "%s"));
}

public void extractAnnotationImages(Map<String, PDAppearanceStream> stateAppearances, String stateFormat) throws IOException
{
    if (stateAppearances == null)
        return;

    for (Map.Entry<String, PDAppearanceStream> entry : stateAppearances.entrySet())
    {
        // the key is the name of the appearance state
        String appearanceFormat = String.format(stateFormat, "-" + entry.getKey() + "%s", "%s");
        extractAnnotationImages(entry.getValue(), appearanceFormat);
    }
}

public void extractAnnotationImages(PDAppearanceStream appearance, String appearanceFormat) throws IOException
{
    PDResources resources = appearance.getResources();
    if (resources == null)
        return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null)
        return;

    for (Map.Entry<String, PDXObject> entry : xObjects.entrySet())
    {
        PDXObject xObject = entry.getValue();
        String xObjectFormat = String.format(appearanceFormat, "-" + entry.getKey() + "%s", "%s");
        if (xObject instanceof PDXObjectForm)        // form XObject: recurse into its resources
            extractAnnotationImages((PDXObjectForm)xObject, xObjectFormat);
        else if (xObject instanceof PDXObjectImage)  // image XObject: write it out
            extractAnnotationImages((PDXObjectImage)xObject, xObjectFormat);
    }
}

public void extractAnnotationImages(PDXObjectForm form, String imageFormat) throws IOException
{
    PDResources resources = form.getResources();
    if (resources == null)
        return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null)
        return;

    for (Map.Entry<String, PDXObject> entry : xObjects.entrySet())
    {
        PDXObject xObject = entry.getValue();
        String xObjectFormat = String.format(imageFormat, "-" + entry.getKey() + "%s", "%s");
        if (xObject instanceof PDXObjectForm)
            extractAnnotationImages((PDXObjectForm)xObject, xObjectFormat);
        else if (xObject instanceof PDXObjectImage)
            extractAnnotationImages((PDXObjectImage)xObject, xObjectFormat);
    }
}

public void extractAnnotationImages(PDXObjectImage image, String imageFormat) throws IOException
{
    // drop the remaining name placeholder, fill in the image's own suffix, and close the file afterwards
    try (OutputStream out = new FileOutputStream(String.format(imageFormat, "", image.getSuffix())))
    {
        image.write2OutputStream(out);
    }
}
(from ExtractAnnotationImageTest.java https://github.com/mkl-public/testarea-pdfbox1/blob/master/src/test/java/mkl/testarea/pdfbox1/extract/ExtractAnnotationImageTest.java)
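The output file names are built by threading a format string with two `%s` placeholders through the recursion: each level replaces the first placeholder with `"-<part>%s"` (so a fresh placeholder is re-inserted) and passes the second one, the future file suffix, through unchanged as `"%s"`; only the image method finally fills in `""` and the suffix. The self-contained snippet below replays this for one path. The helper names `descend` and `finish` are hypothetical, and the state name `Off` and resource name `img0` are made up for illustration.

```java
public class FileNameFormatDemo {
    // One recursion level: replace the first %s with "-<part>%s", keep the second %s for the suffix.
    static String descend(String format, String part) {
        return String.format(format, "-" + part + "%s", "%s");
    }

    // Final step, as in the image method: drop the name placeholder, fill in the suffix.
    static String finish(String format, String suffix) {
        return String.format(format, "", suffix);
    }

    public static void main(String[] args) {
        String f = "buttons%s.%s";
        f = descend(f, "0");       // page index
        f = descend(f, "1");       // annotation index (used when the annotation has no name)
        f = descend(f, "Normal");  // appearance type
        f = descend(f, "Off");     // appearance state (hypothetical)
        f = descend(f, "img0");    // XObject resource name (hypothetical)
        System.out.println(finish(f, "png")); // buttons-0-1-Normal-Off-img0.png
    }
}
```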
Unfortunately, the OP did not provide a sample PDF, so I applied the code to this example file http://examples.itextpdf.com/results/part2/chapter08/buttons.pdf (stored as a resource) like this:
/**
* Test using <a href="http://examples.itextpdf.com/results/part2/chapter08/buttons.pdf">buttons.pdf</a>
* created by <a href="http://itextpdf.com/examples/iia.php?id=154">part2.chapter08.Buttons</a>
* from ITEXT IN ACTION — SECOND EDITION.
*/
@Test
public void testButtonsPdf() throws IOException
{
    // close both the resource stream and the document when done
    try (   InputStream resource = getClass().getResourceAsStream("buttons.pdf");
            PDDocument document = PDDocument.load(resource)   )
    {
        extractAnnotationImages(document, new File(RESULT_FOLDER, "buttons%s.%s").toString());
    }
}
(from ExtractAnnotationImageTest.java https://github.com/mkl-public/testarea-pdfbox1/blob/master/src/test/java/mkl/testarea/pdfbox1/extract/ExtractAnnotationImageTest.java)
and got these images:
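Two caveats:
- We extract all image resources attached to the annotation appearances and do not check whether they are actually used anywhere in the appearance streams. Thus, you may find more icons than expected. In the case above, the first image is not used on its own but only as a mask of the second image.
- We extract only image resources, not inline images, so some images may be missed.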
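Both caveats could be addressed by looking at the appearance content streams themselves. As a rough illustration only, the plain-Java sketch below scans content stream text for name operands directly followed by the `Do` operator and for a standalone `BI` token, which starts an inline image. The class and method names are hypothetical. A regex scan like this ignores string literals, comments and `#`-escaped names, so a real implementation would rather use a content stream parser such as PDFBox's `PDFStreamParser`; also, each form XObject has its own content stream, so the check would have to be repeated per stream.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentStreamUsageSketch {
    // A name operand (no whitespace or PDF delimiters) followed by the Do operator, e.g. "/Im1 Do".
    private static final Pattern DO_OPERATOR = Pattern.compile("/([^\\s/\\[\\]<>(){}%]+)\\s+Do\\b");
    // "BI" as a whitespace-delimited token marks the start of an inline image.
    private static final Pattern INLINE_IMAGE = Pattern.compile("(?<![^\\s])BI(?![^\\s])");

    // Resource names actually invoked via Do in this content stream (naive scan).
    static Set<String> invokedXObjects(String content) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = DO_OPERATOR.matcher(content);
        while (m.find())
            names.add(m.group(1));
        return names;
    }

    // True if the content stream appears to contain an inline image (naive scan).
    static boolean containsInlineImage(String content) {
        return INLINE_IMAGE.matcher(content).find();
    }

    public static void main(String[] args) {
        String content = "q 20 0 0 20 0 0 cm /Im2 Do Q\nBI /W 1 /H 1 /CS /G /BPC 8 ID x EI";
        System.out.println(invokedXObjects(content));     // [Im2]
        System.out.println(containsInlineImage(content)); // true
    }
}
```

An image resource whose name never shows up in `invokedXObjects` for any stream would then be a candidate for skipping.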
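Thus, please check this code against your PDFs and improve it if necessary.
The OP's file
In the meantime the OP provided a sample file imageicon.pdf https://github.com/mkl-public/testarea-pdfbox1/blob/master/src/test/resources/mkl/testarea/pdfbox1/extract/imageicon.pdf
Calling the methods above like this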
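One conceivable improvement for the mask case mentioned in the first caveat is to skip images that another image references as its mask. With PDFBox, the mask would have to be read from the image dictionary's /SMask entry (or a /Mask entry that is a stream); the sketch below instead uses a hypothetical `Img` class whose `mask` field plays that role. It compares by object identity; in a real implementation that would presumably have to be the identity of the underlying PDF stream objects rather than of freshly created wrapper objects.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class MaskFilterSketch {
    // Hypothetical image model: 'mask' is the image used as this image's mask, or null.
    static final class Img {
        final Img mask;
        Img(Img mask) { this.mask = mask; }
    }

    // Returns the resource names of images that no other image in the map uses as its mask.
    static Set<String> withoutMasks(Map<String, Img> images) {
        Set<Img> masks = Collections.newSetFromMap(new IdentityHashMap<Img, Boolean>());
        for (Img img : images.values())
            if (img.mask != null)
                masks.add(img.mask);

        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Img> entry : images.entrySet())
            if (!masks.contains(entry.getValue()))
                result.add(entry.getKey());
        return result;
    }

    public static void main(String[] args) {
        Img alpha = new Img(null);
        Map<String, Img> images = new LinkedHashMap<>();
        images.put("img0", alpha);           // only used as a mask
        images.put("img1", new Img(alpha));  // the visible icon
        System.out.println(withoutMasks(images)); // [img1]
    }
}
```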
/**
* Test using <a href="http://www.docdroid.net/TDGVQzg/imageicon.pdf.html">imageicon.pdf</a>
* created by the OP.
*/
@Test
public void testImageiconPdf() throws IOException
{
    // close both the resource stream and the document when done
    try (   InputStream resource = getClass().getResourceAsStream("imageicon.pdf");
            PDDocument document = PDDocument.load(resource)   )
    {
        extractAnnotationImages(document, new File(RESULT_FOLDER, "imageicon%s.%s").toString());
    }
}
(from ExtractAnnotationImageTest.java https://github.com/mkl-public/testarea-pdfbox1/blob/master/src/test/java/mkl/testarea/pdfbox1/extract/ExtractAnnotationImageTest.java)
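outputs this image:
So it works just fine!
Starting as a standalone tool
In a comment the OP stated:
Using the JUnit test method is still confusing, but when I try to call it from my main program, it always returns a "Stream closed" error. I have already put the file in the same directory as my jar and also tried giving the path manually, but I still get the same error.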
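Thus, I added a main method to the class to allow it
- to be started without the JUnit framework and
- to extract from PDFs anywhere in the local file system whose file names are given on the command line.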
In code:
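public static void main(String[] args) throws IOException
{
    ExtractAnnotationImageTest extractor = new ExtractAnnotationImageTest();

    for (String arg : args)
    {
        // each argument is a PDF file path; the images are written next to it,
        // named <arg>-<page>-<annotation>-<appearance>-<state>-<resource>.<suffix>
        try (PDDocument document = PDDocument.load(arg))
        {
            extractor.extractAnnotationImages(document, arg + "%s.%s");
        }
    }
}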
(from ExtractAnnotationImageTest.java https://github.com/mkl-public/testarea-pdfbox1/blob/master/src/test/java/mkl/testarea/pdfbox1/extract/ExtractAnnotationImageTest.java)
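As the command line argument becomes part of the format string (`arg + "%s.%s"`), a file path containing `%` will be misinterpreted; for example, with `my%20file.pdf` the first `String.format` call fails because `%20f` is parsed as a floating-point conversion. The same holds for `%` in annotation or resource names. A minimal sketch of one alternative, assuming one would rather collect the name segments and join them once at the end, follows; `SegmentNamingSketch`, `child` and `fileName` are hypothetical names, not part of the test class above.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentNamingSketch {
    // Joins prefix, segments and suffix once, so '%' in any part is never treated as a format specifier.
    static String fileName(String prefix, List<String> segments, String suffix) {
        StringBuilder sb = new StringBuilder(prefix);
        for (String segment : segments)
            sb.append('-').append(segment);
        return sb.append('.').append(suffix).toString();
    }

    // Returns a copy of the parent's segments plus one more, leaving the parent list untouched
    // (mirrors how each recursion level derives its own format string).
    static List<String> child(List<String> parent, String segment) {
        List<String> result = new ArrayList<>(parent);
        result.add(segment);
        return result;
    }

    public static void main(String[] args) {
        List<String> segments = child(child(child(new ArrayList<String>(), "0"), "1"), "Normal");
        System.out.println(fileName("my%20file.pdf", segments, "png")); // my%20file.pdf-0-1-Normal.png
    }
}
```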