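The code in this answer may appear fairly generic, because it first determines the field map in the document and then allows clearing any combination of text fields. Be aware, though, that it was developed using only the single example PDF from this question. I therefore cannot be sure that I correctly understood how HelloSign marks fields and, in particular, how it fills them in.
This answer presents two classes: one analyzes a HelloSign form, and the other manipulates it by clearing selected fields; the latter relies on the information gathered by the former. Both classes are built on top of the PDFBox PDFTextStripper utility class.
The code was developed against the current PDFBox development version 2.1.0-SNAPSHOT. Most likely it also works with all 2.0.x versions.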
HelloSignAnalyzer
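This class analyzes the given PDDocument, looking for the sequences
- [$varname ]
  which appear to define placeholders for placing form field contents, and
- [def:$varname|type|req|signer|display|label]
  which appear to define the properties of those placeholders.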
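To make the two sequences concrete, here is a minimal, PDFBox-free sketch of how they could be picked out of a plain string. The class name PlaceholderSketch and the regular expressions are my own assumptions for illustration; the analyzer below works on PDFTextStripper output with indexOf instead, and the real HelloSign syntax may have variants this sketch misses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox or HelloSign.
final class PlaceholderSketch {
    // [$name   ] : the name ends at the first blank; the rest up to ']' is ignored
    static final Pattern POSITION = Pattern.compile("\\[\\$([^\\s\\]]+)[^\\]]*\\]");
    // [def:$name|type|req|signer|display|label]
    static final Pattern DEFINITION = Pattern.compile("\\[def:\\$([^\\]]*)\\]");

    /** Maps each field name to its definition pieces (empty array if no definition was seen). */
    static Map<String, String[]> parse(String text) {
        Map<String, String[]> result = new LinkedHashMap<>();
        Matcher position = POSITION.matcher(text);
        while (position.find()) {
            result.putIfAbsent(position.group(1), new String[0]);
        }
        Matcher definition = DEFINITION.matcher(text);
        while (definition.find()) {
            String[] pieces = definition.group(1).split("\\|");
            if (pieces[0].isEmpty()) {
                continue; // no field name, nothing to attach the attributes to
            }
            // pieces[0] is the name, followed by type, req, signer, display, label
            result.put(pieces[0], pieces);
        }
        return result;
    }
}
```

In this sketch the second array element is the field type and the sixth is the label, mirroring the order in the definition sequence.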
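It creates a collection of HelloSignField instances, each of which describes one such placeholder. If text could be found located over the placeholder, the instance also contains that text as the value of the respective field.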
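The value matching can be sketched without PDFBox as follows. Glyph is a hypothetical stand-in for a PDFBox TextPosition, and the tolerances (3 units vertically, 1 unit horizontally) are the ones hard-coded in the inField method of the class below; they fit the sample document and are not known to be general.

```java
import java.util.List;

// Hypothetical, simplified model of a field and the glyphs drawn over it.
final class FieldHitTestSketch {
    /** Stand-in for a TextPosition: text, origin, advance width, width of a space. */
    static final class Glyph {
        final String text;
        final float x, y, width, spaceWidth;

        Glyph(String text, float x, float y, float width, float spaceWidth) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.spaceWidth = spaceWidth;
        }
    }

    final float x, y, width; // placeholder origin and width
    String value = "";
    float lastX = 0;         // right edge of the last glyph appended

    FieldHitTestSketch(float x, float y, float width) {
        this.x = x;
        this.y = y;
        this.width = width;
    }

    /** True if the position lies on the placeholder, within the fixed tolerances. */
    boolean inField(float xPos, float yPos) {
        return yPos >= y - 3 && yPos <= y + 3 && xPos >= x - 1 && xPos <= x + width + 1;
    }

    /** Appends matching glyphs; a gap wider than half a space becomes a blank. */
    void checkForValue(List<Glyph> glyphs) {
        for (Glyph glyph : glyphs) {
            if (!inField(glyph.x, glyph.y)) {
                continue;
            }
            if (glyph.x > lastX + glyph.spaceWidth / 2 && !value.isEmpty()) {
                value += " ";
            }
            value += glyph.text;
            lastX = glyph.x + glyph.width;
        }
    }
}
```

Glyphs on a different text line, for example those of a neighbouring field 16 units lower, fail the vertical check and are ignored.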
Furthermore, it stores the name of the last xobject drawn on the page; in the sample document, that is where HelloSign draws its field contents.
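How the "last xobject drawn on the page" is tracked may be easier to see in isolation. The following sketch models form xobjects as a simple tree (the Form class and the names Fm0 and Inner are hypothetical) and records only those drawn directly from the page, not those nested inside another form, which is what the processOperator override in the class below does with the Do operator.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of nested form xobject drawing.
final class LastFormSketch {
    static final class Form {
        final String name;
        final List<Form> children;

        Form(String name, Form... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    String formName = null;     // form currently being drawn, null at page level
    String lastFormName = null; // last form drawn directly from the page

    void draw(Form form) {
        String enclosing = formName;
        formName = form.name;
        if (enclosing == null) {
            lastFormName = form.name; // only top-level forms are recorded
        }
        try {
            for (Form child : form.children) {
                draw(child);
            }
        } finally {
            formName = enclosing; // restore the state when leaving the form
        }
    }
}
```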
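import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
public class HelloSignAnalyzer extends PDFTextStripper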
{
public class HelloSignField
{
public String getName()
{ return name; }
public String getValue()
{ return value; }
public float getX()
{ return x; }
public float getY()
{ return y; }
public float getWidth()
{ return width; }
public String getType()
{ return type; }
public boolean isOptional()
{ return optional; }
public String getSigner()
{ return signer; }
public String getDisplay()
{ return display; }
public String getLabel()
{ return label; }
public float getLastX()
{ return lastX; }
String name = null;
String value = "";
float x = 0, y = 0, width = 0;
String type = null;
boolean optional = false;
String signer = null;
String display = null;
String label = null;
float lastX = 0;
@Override
public String toString()
{
return String.format("[Name: '%s'; Value: `%s` Position: %s, %s; Width: %s; Type: '%s'; Optional: %s; Signer: '%s'; Display: '%s', Label: '%s']",
name, value, x, y, width, type, optional, signer, display, label);
}
void checkForValue(List<TextPosition> textPositions)
{
for (TextPosition textPosition : textPositions)
{
if (inField(textPosition))
{
float textX = textPosition.getTextMatrix().getTranslateX();
if (textX > lastX + textPosition.getWidthOfSpace() / 2 && value.length() > 0)
value += " ";
value += textPosition.getUnicode();
lastX = textX + textPosition.getWidth();
}
}
}
boolean inField(TextPosition textPosition)
{
float yPos = textPosition.getTextMatrix().getTranslateY();
float xPos = textPosition.getTextMatrix().getTranslateX();
return inField(xPos, yPos);
}
boolean inField(float xPos, float yPos)
{
if (yPos < y - 3 || yPos > y + 3)
return false;
if (xPos < x - 1 || xPos > x + width + 1)
return false;
return true;
}
}
public HelloSignAnalyzer(PDDocument pdDocument) throws IOException
{
super();
this.pdDocument = pdDocument;
}
public Map<String, HelloSignField> analyze() throws IOException
{
if (!analyzed)
{
fields = new HashMap<>();
setStartPage(pdDocument.getNumberOfPages());
getText(pdDocument);
analyzed = true;
}
return Collections.unmodifiableMap(fields);
}
public String getLastFormName()
{
return lastFormName;
}
//
// PDFTextStripper overrides
//
@Override
protected void writeString(String text, List<TextPosition> textPositions) throws IOException
{
{
for (HelloSignField field : fields.values())
{
field.checkForValue(textPositions);
}
}
int position = -1;
while ((position = text.indexOf('[', position + 1)) >= 0)
{
int endPosition = text.indexOf(']', position);
if (endPosition < 0)
continue;
if (endPosition > position + 1 && text.charAt(position + 1) == '$')
{
String fieldName = text.substring(position + 2, endPosition);
int spacePosition = fieldName.indexOf(' ');
if (spacePosition >= 0)
fieldName = fieldName.substring(0, spacePosition);
HelloSignField field = getOrCreateField(fieldName);
TextPosition start = textPositions.get(position);
field.x = start.getTextMatrix().getTranslateX();
field.y = start.getTextMatrix().getTranslateY();
TextPosition end = textPositions.get(endPosition);
field.width = end.getTextMatrix().getTranslateX() + end.getWidth() - field.x;
}
else if (endPosition > position + 5 && "def:$".equals(text.substring(position + 1, position + 6)))
{
String definition = text.substring(position + 6, endPosition);
String[] pieces = definition.split("\\|");
if (pieces.length == 0)
continue;
HelloSignField field = getOrCreateField(pieces[0]);
if (pieces.length > 1)
field.type = pieces[1];
if (pieces.length > 2)
field.optional = !"req".equals(pieces[2]);
if (pieces.length > 3)
field.signer = pieces[3];
if (pieces.length > 4)
field.display = pieces[4];
if (pieces.length > 5)
field.label = pieces[5];
}
}
super.writeString(text, textPositions);
}
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
{
String currentFormName = formName;
if (operator != null && "Do".equals(operator.getName()) && operands != null && operands.size() > 0)
{
COSBase base0 = operands.get(0);
if (base0 instanceof COSName)
{
formName = ((COSName)base0).getName();
if (currentFormName == null)
lastFormName = formName;
}
}
try
{
super.processOperator(operator, operands);
}
finally
{
formName = currentFormName;
}
}
//
// helper methods
//
HelloSignField getOrCreateField(String name)
{
HelloSignField field = fields.get(name);
if (field == null)
{
field = new HelloSignField();
field.name = name;
fields.put(name, field);
}
return field;
}
//
// inner member variables
//
final PDDocument pdDocument;
boolean analyzed = false;
Map<String, HelloSignField> fields = null;
String formName = null;
String lastFormName = null;
}
(HelloSignAnalyzer.java https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/main/java/mkl/testarea/pdfbox2/content/HelloSignAnalyzer.java#L31)
Usage
One can apply the HelloSignAnalyzer to a document as follows:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
Map<String, HelloSignField> fields = helloSignAnalyzer.analyze();
System.out.printf("Found %s fields:\n\n", fields.size());
for (Map.Entry<String, HelloSignField> entry : fields.entrySet())
{
System.out.printf("%s -> %s\n", entry.getKey(), entry.getValue());
}
System.out.printf("\nLast form name: %s\n", helloSignAnalyzer.getLastFormName());
(PlayWithHelloSign.java https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/PlayWithHelloSign.java#L43 test method testAnalyzeInput)
For the OP's sample document, the output is
Found 8 fields:
var1001 -> [Name: 'var1001'; Value: `123 Main St.` Position: 90.0, 580.0; Width: 165.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'Address', Label: 'address1']
var1004 -> [Name: 'var1004'; Value: `12345` Position: 210.0, 564.0; Width: 45.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'Postal Code', Label: 'zip']
var1002 -> [Name: 'var1002'; Value: `TestCity` Position: 90.0, 564.0; Width: 65.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'City', Label: 'city']
var1003 -> [Name: 'var1003'; Value: `AA` Position: 161.0, 564.0; Width: 45.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'State', Label: 'state']
date2 -> [Name: 'date2'; Value: `2016/12/09` Position: 397.0, 407.0; Width: 124.63202; Type: 'date'; Optional: false; Signer: 'signer2'; Display: 'null', Label: 'null']
signature1 -> [Name: 'signature1'; Value: `` Position: 88.0, 489.0; Width: 236.624; Type: 'sig'; Optional: false; Signer: 'signer1'; Display: 'null', Label: 'null']
date1 -> [Name: 'date1'; Value: `2016/12/09` Position: 397.0, 489.0; Width: 124.63202; Type: 'date'; Optional: false; Signer: 'signer1'; Display: 'null', Label: 'null']
signature2 -> [Name: 'signature2'; Value: `` Position: 88.0, 407.0; Width: 236.624; Type: 'sig'; Optional: false; Signer: 'signer2'; Display: 'null', Label: 'null']
Last form name: Xi0
HelloSignManipulator
This class makes use of the information a HelloSignAnalyzer has gathered and clears the contents of the text fields given by their names.
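The core idea of the clearing step is a filtered copy: every instruction of the form xobject is written to a replacement stream, except text-showing instructions (Tj, TJ) that start inside one of the selected fields. The following PDFBox-free sketch shows just that filter; Instruction and Box are hypothetical simplifications, and the sketch assumes the text position of each instruction is already known, whereas the class below derives it from the text matrix and the current transformation matrix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the selective copy.
final class SelectiveCopySketch {
    static final List<String> TEXT_SHOWING = Arrays.asList("Tj", "TJ");

    /** One content stream instruction together with the text position it applies at. */
    static final class Instruction {
        final String operator, operand;
        final float x, y;

        Instruction(String operator, String operand, float x, float y) {
            this.operator = operator;
            this.operand = operand;
            this.x = x;
            this.y = y;
        }
    }

    /** Field area with the same tolerances as HelloSignField.inField. */
    static final class Box {
        final float x, y, width;

        Box(float x, float y, float width) {
            this.x = x;
            this.y = y;
            this.width = width;
        }

        boolean contains(float xPos, float yPos) {
            return yPos >= y - 3 && yPos <= y + 3 && xPos >= x - 1 && xPos <= x + width + 1;
        }
    }

    /** Copies all instructions except text-showing ones located in a cleared field. */
    static List<Instruction> copyExcept(List<Instruction> stream, List<Box> cleared) {
        List<Instruction> copy = new ArrayList<>();
        for (Instruction instruction : stream) {
            boolean keep = true;
            if (TEXT_SHOWING.contains(instruction.operator)) {
                for (Box box : cleared) {
                    if (box.contains(instruction.x, instruction.y)) {
                        keep = false;
                    }
                }
            }
            if (keep) {
                copy.add(instruction);
            }
        }
        return copy;
    }
}
```

Note that only the text-showing instruction itself is dropped; the surrounding font and positioning instructions are still copied, which is harmless but leaves some unused state changes in the stream.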
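import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.contentstream.operator.MissingOperandException;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.OperatorProcessor;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDTransparencyGroup;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.util.Matrix;
// Package as in the linked repository; adjust it if HelloSignAnalyzer lives elsewhere.
import mkl.testarea.pdfbox2.content.HelloSignAnalyzer.HelloSignField;
public class HelloSignManipulator extends PDFTextStripper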
{
public HelloSignManipulator(HelloSignAnalyzer helloSignAnalyzer) throws IOException
{
super();
this.helloSignAnalyzer = helloSignAnalyzer;
addOperator(new SelectiveDrawObject());
}
public void clearFields(Iterable<String> fieldNames) throws IOException
{
try
{
Map<String, HelloSignField> fieldMap = helloSignAnalyzer.analyze();
List<HelloSignField> selectedFields = new ArrayList<>();
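for (String fieldName : fieldNames)
{
HelloSignField field = fieldMap.get(fieldName);
// Skip unknown names; a null entry would cause a NullPointerException in processOperator.
if (field != null)
selectedFields.add(field);
}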
fields = selectedFields;
PDDocument pdDocument = helloSignAnalyzer.pdDocument;
setStartPage(pdDocument.getNumberOfPages());
getText(pdDocument);
}
finally
{
fields = null;
}
}
class SelectiveDrawObject extends OperatorProcessor
{
@Override
public void process(Operator operator, List<COSBase> arguments) throws IOException
{
if (arguments.size() < 1)
{
throw new MissingOperandException(operator, arguments);
}
COSBase base0 = arguments.get(0);
if (!(base0 instanceof COSName))
{
return;
}
COSName name = (COSName) base0;
// Compare from the non-null side: getLastFormName() is null if the page draws no xobject.
if (replacement != null || !name.getName().equals(helloSignAnalyzer.getLastFormName()))
{
return;
}
if (context.getResources().isImageXObject(name))
{
throw new IllegalArgumentException("The form xobject to edit turned out to be an image.");
}
PDXObject xobject = context.getResources().getXObject(name);
if (xobject instanceof PDTransparencyGroup)
{
throw new IllegalArgumentException("The form xobject to edit turned out to be a transparency group.");
}
else if (xobject instanceof PDFormXObject)
{
PDFormXObject form = (PDFormXObject) xobject;
PDFormXObject formReplacement = new PDFormXObject(helloSignAnalyzer.pdDocument);
formReplacement.setBBox(form.getBBox());
formReplacement.setFormType(form.getFormType());
formReplacement.setMatrix(form.getMatrix().createAffineTransform());
formReplacement.setResources(form.getResources());
OutputStream outputStream = formReplacement.getContentStream().createOutputStream(COSName.FLATE_DECODE);
replacement = new ContentStreamWriter(outputStream);
context.showForm(form);
outputStream.close();
getResources().put(name, formReplacement);
replacement = null;
}
}
@Override
public String getName()
{
return "Do";
}
}
//
// PDFTextStripper overrides
//
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
{
if (replacement != null)
{
boolean copy = true;
if (TjTJ.contains(operator.getName()))
{
Matrix transformation = getTextMatrix().multiply(getGraphicsState().getCurrentTransformationMatrix());
float xPos = transformation.getTranslateX();
float yPos = transformation.getTranslateY();
for (HelloSignField field : fields)
{
if (field.inField(xPos, yPos))
{
copy = false;
}
}
}
if (copy)
{
replacement.writeTokens(operands);
replacement.writeToken(operator);
}
}
super.processOperator(operator, operands);
}
//
// helper methods
//
final HelloSignAnalyzer helloSignAnalyzer;
final Collection<String> TjTJ = Arrays.asList("Tj", "TJ");
Iterable<HelloSignField> fields;
ContentStreamWriter replacement = null;
}
(HelloSignManipulator.java https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/main/java/mkl/testarea/pdfbox2/content/HelloSignManipulator.java#L42)
Usage: Clearing a single field
One can apply the HelloSignManipulator to a document as follows to clear a single field:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
helloSignManipulator.clearFields(Collections.singleton("var1001"));
pdDocument.save(...);
(PlayWithHelloSign.java https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/PlayWithHelloSign.java#L77 test method testClearAddress1Input)
Usage: Clearing multiple fields at once
One can apply the HelloSignManipulator to a document as follows to clear multiple fields at once:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
helloSignManipulator.clearFields(Arrays.asList("var1004", "var1003", "date2"));
pdDocument.save(...);
(PlayWithHelloSign.java https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/PlayWithHelloSign.java#L107 test method testClearZipStateDate2Input)
Usage: Clearing multiple fields successively
One can apply the HelloSignManipulator to a document as follows to clear multiple fields one after another:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
helloSignManipulator.clearFields(Collections.singleton("var1004"));
helloSignManipulator.clearFields(Collections.singleton("var1003"));
helloSignManipulator.clearFields(Collections.singleton("date2"));
pdDocument.save(...);
(PlayWithHelloSign.java https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/PlayWithHelloSign.java#L138 test method testClearZipStateDate2SuccessivelyInput)
Caveat
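These classes are mere proofs of concept. On the one hand, they were built on a single sample HelloSign file, so there is a good chance that important details have been missed. On the other hand, there are some built-in assumptions, e.g. the fixed tolerances in the HelloSignField method inField.
Furthermore, manipulating signed HelloSign files may generally be a bad idea. If I understand their concept correctly, they store a hash of each signed document to allow verification of its content, and if the document is manipulated as shown above, the hash will no longer match.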
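To illustrate why any manipulation breaks such a check, here is a small sketch comparing digests of two byte arrays. It assumes SHA-256 purely for illustration; I do not know which hash algorithm HelloSign actually uses or over which bytes it is computed, and DigestSketch is a hypothetical helper.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the algorithm choice is an assumption.
final class DigestSketch {
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** True only if both byte sequences produce the same digest. */
    static boolean sameDigest(byte[] original, byte[] candidate) {
        return MessageDigest.isEqual(sha256(original), sha256(candidate));
    }
}
```

Comparing the bytes of the originally signed file with the bytes saved after clearFields would return false here, since the replaced form xobject changes the file content.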