Extract Comments from Word Documents in Java

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.6.0</version>
    </dependency>
</dependencies>

Extract Text from Comments in a Word Document with Java

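Getting the text of the comments in a Word document with Java is straightforward. Loop through all comments in the document and fetch each one with the Document.getComments().get() method provided by Spire.Doc for Java. Then loop through the paragraphs in the comment body and read the text of each paragraph with the Paragraph.getText() method. The detailed steps are as follows: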

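Create an object of the Document class.

Load a Word document with the Document.loadFromFile() method.

Loop through all comments in the document.

For each comment, loop through all paragraphs in its body.

For each paragraph, extract its text with the Paragraph.getText() method.

Save the extracted content to a text file.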

Java

import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.*;
import java.io.*;

public class ExtractComments {
   public static void main(String[] args) throws IOException {

       // Create an object of the Document class
       Document doc = new Document();

       // Load a Word document
       doc.loadFromFile("/AI绘画的利弊及法律应对.docx");

       // Collect the text of all comments so that the file is written only once
       StringBuilder result = new StringBuilder();

       // Loop through the comments in the document
       for (int i = 0; i < doc.getComments().getCount(); i++) {
           // Get the comment at the current index
           Comment comment = doc.getComments().get(i);

           // Loop through the paragraphs in the comment body
           for (int j = 0; j < comment.getBody().getParagraphs().getCount(); j++) {
               // Get the current paragraph
               Paragraph para = comment.getBody().getParagraphs().get(j);

               // Append the paragraph text, followed by a line break
               result.append(para.getText()).append("\r\n");
           }
       }

       // Save the extracted comments to a text file
       writeStringToTxt(result.toString(), "/批注信息.txt");

       // Release resources
       doc.dispose();
   }

   // Custom method that writes a string to a text file
   public static void writeStringToTxt(String content, String txtFileName) throws IOException {
       // try-with-resources flushes and closes the writer automatically.
       // FileWriter without the append flag replaces any existing file.
       try (FileWriter fWriter = new FileWriter(txtFileName)) {
           fWriter.write(content);
       }
   }
}
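
One detail worth keeping in mind when saving the extracted text: a FileWriter opened without the append flag truncates the target file each time it is opened, so it is safer to collect all paragraph texts first and write the file once. The following is a minimal sketch of that idea using only the standard library. The class and method names (CommentTextWriter, joinLines, writeOnce) are hypothetical and are not part of Spire.Doc, and the sample strings merely stand in for values returned by Paragraph.getText().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Spire.Doc.
class CommentTextWriter {

    // Joins the paragraph texts into one string, ending each with CRLF.
    static String joinLines(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String p : paragraphs) {
            sb.append(p).append("\r\n");
        }
        return sb.toString();
    }

    // Writes the joined text in a single call, replacing any existing file.
    static void writeOnce(List<String> paragraphs, Path target) throws IOException {
        Files.write(target, joinLines(paragraphs).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder strings standing in for Paragraph.getText() results.
        List<String> texts = Arrays.asList("First comment", "Second comment");
        Path out = Files.createTempFile("comments", ".txt");
        writeOnce(texts, out);
        System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

Writing with an explicit UTF-8 charset, as assumed here, also avoids depending on the platform default encoding when the comments contain non-ASCII text.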

Extract Images from Comments in a Word Document with Java

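To extract images from the comments in a Word document, loop through the child objects of each comment paragraph and find the DocPicture objects. Then get the image data with the DocPicture.getImageBytes() method and save it as an image file.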

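Create an object of the Document class.

Load a Word document with the Document.loadFromFile() method.

Create a list to store the extracted image data.

Loop through the comments in the document.

For each comment, loop through the paragraphs in its body.

For each paragraph, loop through all of its child objects.

Check whether the object is of type DocPicture.

If the object is a DocPicture, get the image data with the DocPicture.getImageBytes() method and add it to the list.

Save the image data in the list as image files.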

Java

import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.*;
import java.io.*;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

public class ExtractCommentImages {
   public static void main(String[] args) {
       // Create a Document object
       Document document = new Document();

       // Load a Word document that contains comments
       document.loadFromFile("/AI绘画的利弊及法律应对.docx");

       // Create a list to store the extracted image data
       List<byte[]> images = new ArrayList<>();

       // Loop through the comments in the document
       for (int i = 0; i < document.getComments().getCount(); i++) {
           Comment comment = document.getComments().get(i);

           // Loop through all paragraphs in the comment body
           for (int j = 0; j < comment.getBody().getParagraphs().getCount(); j++) {
               Paragraph paragraph = comment.getBody().getParagraphs().get(j);

               // Loop through all child objects of the paragraph
               for (int k = 0; k < paragraph.getChildObjects().getCount(); k++) {
                   DocumentObject obj = paragraph.getChildObjects().get(k);

                   // Check whether the object is a picture
                   if (obj instanceof DocPicture) {
                       DocPicture picture = (DocPicture) obj;

                       // Get the image data and add it to the list
                       images.add(picture.getImageBytes());
                   }
               }
           }
       }

       // Specify the output directory and create it if it does not exist
       String outputDir = "/批注图片/";
       new File(outputDir).mkdirs();

       // Save each image to its own file, numbered by index
       for (int i = 0; i < images.size(); i++) {
           String fileName = String.format("批注图片-%d.png", i);
           Path filePath = Paths.get(outputDir, fileName);
           try (FileOutputStream fos = new FileOutputStream(filePath.toFile())) {
               fos.write(images.get(i));
           } catch (IOException e) {
               e.printStackTrace();
           }
       }
   }
}
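
The example above gives every output file a .png extension. If the bytes returned by DocPicture.getImageBytes() are the embedded image data in its original format (an assumption, not verified here), some files may actually be JPEG, GIF, or BMP. A rough sketch of picking the extension from the leading signature bytes is shown below. ImageTypeSniffer and guessExtension are hypothetical names, not part of Spire.Doc, and only a few common formats are checked.

```java
// Hypothetical helper for illustration; not part of Spire.Doc.
class ImageTypeSniffer {

    // Returns a file extension guessed from the leading signature bytes,
    // or "bin" when the signature is not one of the few checked here.
    static String guessExtension(byte[] data) {
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpg";
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) return "gif";
        if (startsWith(data, 0x42, 0x4D)) return "bmp";
        return "bin";
    }

    // Compares the first bytes of data (as unsigned values) with the prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample headers standing in for real image data.
        byte[] pngHeader = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(guessExtension(pngHeader));  // prints "png"
        System.out.println(guessExtension(jpegHeader)); // prints "jpg"
    }
}
```

With such a helper, the file name could be built as String.format("批注图片-%d.%s", i, ImageTypeSniffer.guessExtension(bytes)) instead of hard-coding the extension.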