Class Xml
- java.lang.Object
-
- com.gengoai.io.Xml
-
public final class Xml extends Object
Common methods for parsing and handling XML files
- Author:
- David B. Bracewell
-
-
Field Summary
Fields
Modifier and Type, Field, Description:
static EventFilter WHITESPACE_FILTER
An EventFilter that ignores character elements that are white space
-
Method Summary
Modifier and Type, Method, Description:
static XPath createXPath()
- Creates an XPath instance
static String getAttributeValue(Node n, String atrName)
- Gets the value of an attribute for a node using a case-insensitive string matcher
static String getAttributeValue(Node n, String atrName, BiPredicate<String,String> matcher)
- Gets the value of an attribute for a node
static String getTextContent(Collection<Node> nodes)
- Gathers the text from the child TEXT_NODE from each of the nodes in the given collection.
static String getTextContent(Node n)
- Gathers the text from the child TEXT_NODE of the given node
static String getTextContent(NodeList nodes)
- Gathers the text from the child TEXT_NODE from each of the nodes in the given NodeList.
static String getTextContentRecursive(Collection<Node> nl)
- Gets all text from all the nodes and their children in a list of nodes
static String getTextContentRecursive(Node n)
- Gets all text from the node and its child nodes
static String getTextContentRecursive(NodeList nl)
- Gets all text from all the nodes and their children in a list of nodes
static String getTextContentWithComments(Collection<Node> nodes)
- Gathers the text from the child TEXT_NODE and COMMENT_NODE from each of the nodes in the given collection.
static String getTextContentWithComments(Node n)
- Gathers the text from the child TEXT_NODE and COMMENT_NODE of the given node
static String getTextContentWithComments(NodeList nodes)
- Gathers the text from the child TEXT_NODE and COMMENT_NODE from each of the nodes in the given NodeList.
static String getTextContentWithCommentsRecursive(Collection<Node> nl)
- Gets all text from all the nodes and their children in a list of nodes
static String getTextContentWithCommentsRecursive(Node n)
- Gets all text from the node and its child nodes
static String getTextContentWithCommentsRecursive(NodeList nl)
- Gets all text from all the nodes and their children in a list of nodes
static Document loadXMLFromFile(File f)
static Iterable<Document> parse(Resource xmlResource, String tag)
- Parses the given XML resource, creating sub-documents (DOM) from the elements with the given tag name.
static Iterable<Document> parse(Resource xmlResource, String tag, EventFilter eventFilter)
- Parses the given XML resource, creating sub-documents (DOM) from the elements with the given tag name and filtering out events using the given event filter.
static void removeNodeAndChildren(Node n)
- Removes a node by first recursively removing all of its children and their children
static void removeNodes(Collection<? extends Node> nl)
- Removes a set of nodes in a Collection from the document.
static void removeNodes(NodeList nl)
- Removes a set of nodes in a NodeList from the document.
static List<Node> selectAllNodes(Node n)
- Selects all nodes under and including the given node
static Node selectChildNodeBreadthFirst(Node n, Predicate<Node> predicate)
- Selects the first node passing a given predicate using a breadth-first search
static List<Node> selectChildNodes(Node n, Predicate<Node> predicate)
- Selects nodes under and including the given node that pass the given predicate
static List<Node> selectNodes(Node n, Predicate<Node> predicate)
- Selects nodes under and including the given node that pass the given predicate
static Predicate<Node> tagMatchPredicate(String tagName)
- Creates a Predicate that evaluates to true if the input node's name is the same as the given node name.
static Predicate<Node> typeMatchPredicate(short nodeType)
- Creates a Predicate that evaluates to true if the input node's type is the same as the given node type
-
-
-
Field Detail
-
WHITESPACE_FILTER
public static final EventFilter WHITESPACE_FILTER
An EventFilter that ignores character elements that are white space
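The following is a minimal sketch of what a whitespace-skipping filter can look like using only the JDK's StAX API. It is not the library's implementation; the class name `WhitespaceFilterSketch`, the constant `SKIP_WHITESPACE`, and the helper `countCharacterEvents` are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;

public class WhitespaceFilterSketch {
    // Hypothetical stand-in for WHITESPACE_FILTER: accept every event
    // except character events that consist only of white space.
    static final EventFilter SKIP_WHITESPACE =
        event -> !(event.isCharacters() && event.asCharacters().isWhiteSpace());

    // Counts character events in the XML string, optionally through the filter.
    static int countCharacterEvents(String xml, boolean filtered) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        if (filtered) {
            reader = factory.createFilteredReader(reader, SKIP_WHITESPACE);
        }
        int count = 0;
        while (reader.hasNext()) {
            if (reader.nextEvent().isCharacters()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\n  <b>x</b>\n</a>";
        System.out.println("unfiltered: " + countCharacterEvents(xml, false));
        System.out.println("filtered:   " + countCharacterEvents(xml, true));
    }
}
```

With the filter applied, only the character event carrying "x" should remain; the indentation between elements is dropped.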
-
-
Method Detail
-
getAttributeValue
public static String getAttributeValue(Node n, String atrName, BiPredicate<String,String> matcher)
Gets the value of an attribute for a node
- Parameters:
n - Node to get attribute value for
atrName - Name of attribute whose value is desired
matcher - BiPredicate used to match attribute names
- Returns:
- The attribute value or null
-
getAttributeValue
public static String getAttributeValue(Node n, String atrName)
Gets the value of an attribute for a node using a case-insensitive string matcher
- Parameters:
n - Node to get attribute value for
atrName - Name of attribute whose value is desired
- Returns:
- The attribute value or null
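To illustrate the idea of matching attribute names through a pluggable matcher, here is a small JDK-only sketch. It is an assumption about how such a lookup might be written, not the library's code: the class `AttributeLookupSketch`, the method `attributeValue`, and the argument order passed to the matcher are all hypothetical.

```java
import java.io.StringReader;
import java.util.function.BiPredicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeLookupSketch {
    // Returns the value of the first attribute whose name the matcher accepts,
    // or null if the node has no attributes or none match.
    static String attributeValue(Node n, String name, BiPredicate<String, String> matcher) {
        NamedNodeMap attrs = n.getAttributes(); // null for non-element nodes
        if (attrs == null) {
            return null;
        }
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (matcher.test(attr.getNodeName(), name)) {
                return attr.getNodeValue();
            }
        }
        return null;
    }

    // Parses an XML string and returns its root element.
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node item = parseRoot("<item ID=\"42\"/>");
        // Case-insensitive matching finds "ID" when asked for "id".
        System.out.println(attributeValue(item, "id", String::equalsIgnoreCase));
        // Exact matching does not.
        System.out.println(attributeValue(item, "id", String::equals));
    }
}
```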
-
getTextContent
public static String getTextContent(Node n)
Gathers the text from the child TEXT_NODE of the given node
- Parameters:
n - Node to get text from
- Returns:
- String with text from the child TEXT_NODE
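The difference between collecting only the direct child TEXT_NODE content and collecting text recursively can be seen in a short JDK-only sketch. The class `ChildTextSketch` and the method `directText` are hypothetical; the recursive side of the comparison uses the standard DOM method `Node.getTextContent()`, not the library's own method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildTextSketch {
    // Concatenates the values of the node's direct TEXT_NODE children only;
    // text nested inside child elements is not included.
    static String directText(Node n) {
        StringBuilder sb = new StringBuilder();
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses an XML string and returns its root element.
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node p = parseRoot("<p>Hello <b>bold</b> world</p>");
        System.out.println("[" + directText(p) + "]");       // direct children only
        System.out.println("[" + p.getTextContent() + "]");  // standard DOM, recursive
    }
}
```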
-
getTextContent
public static String getTextContent(Collection<Node> nodes)
Gathers the text from the child TEXT_NODE from each of the nodes in the given collection.
A line separator is added between the content of each node
- Parameters:
nodes - the nodes
- Returns:
- String with text from the child TEXT_NODE
-
getTextContent
public static String getTextContent(NodeList nodes)
Gathers the text from the child TEXT_NODE from each of the nodes in the given NodeList.
A line separator is added between the content of each node
- Parameters:
nodes - the nodes
- Returns:
- String with text from the child TEXT_NODE
-
getTextContentRecursive
public static String getTextContentRecursive(Node n)
Gets all text from the node and its child nodes
- Parameters:
n - Node to get text from
- Returns:
- String with all text from the node and its children or null
-
getTextContentRecursive
public static String getTextContentRecursive(NodeList nl)
Gets all text from all the nodes and their children in a list of nodes
- Parameters:
nl - the NodeList of nodes to get text from
- Returns:
- String with all text from all the nodes and their children in a list of nodes or null
-
getTextContentRecursive
public static String getTextContentRecursive(Collection<Node> nl)
Gets all text from all the nodes and their children in a list of nodes
- Parameters:
nl - the collection of nodes to get text from
- Returns:
- String with all text from all the nodes and their children in a list of nodes or null
-
getTextContentWithComments
public static String getTextContentWithComments(Node n)
Gathers the text from the child TEXT_NODE and COMMENT_NODE of the given node
- Parameters:
n - Node to get text from
- Returns:
- String with text from the child TEXT_NODE and COMMENT_NODE
-
getTextContentWithComments
public static String getTextContentWithComments(Collection<Node> nodes)
Gathers the text from the child TEXT_NODE and COMMENT_NODE from each of the nodes in the given collection.
A line separator is added between the content of each node
- Parameters:
nodes - the nodes
- Returns:
- String with text from the child TEXT_NODE and COMMENT_NODE
-
getTextContentWithComments
public static String getTextContentWithComments(NodeList nodes)
Gathers the text from the child TEXT_NODE and COMMENT_NODE from each of the nodes in the given NodeList.
A line separator is added between the content of each node
- Parameters:
nodes - the nodes
- Returns:
- String with text from the child TEXT_NODE and COMMENT_NODE
-
getTextContentWithCommentsRecursive
public static String getTextContentWithCommentsRecursive(NodeList nl)
Gets all text from all the nodes and their children in a list of nodes
- Parameters:
nl - the NodeList of nodes to get text from
- Returns:
- String with all text from all the nodes and their children in a list of nodes or null
-
getTextContentWithCommentsRecursive
public static String getTextContentWithCommentsRecursive(Collection<Node> nl)
Gets all text from all the nodes and their children in a list of nodes
- Parameters:
nl - the collection of nodes to get text from
- Returns:
- String with all text from all the nodes and their children in a list of nodes or null
-
getTextContentWithCommentsRecursive
public static String getTextContentWithCommentsRecursive(Node n)
Gets all text from the node and its child nodes
- Parameters:
n - Node to get text from
- Returns:
- String with all text from the node and its children or null
-
parse
public static Iterable<Document> parse(Resource xmlResource, String tag) throws IOException, XMLStreamException
Parses the given XML resource, creating sub-documents (DOM) from the elements with the given tag name. An example use of this method is constructing DOM documents from each "page" element in the Wikipedia dump.
- Parameters:
xmlResource - the XML resource
tag - the tag to create documents on
- Returns:
- an Iterable over the sub-documents
- Throws:
IOException - if an I/O error occurs
XMLStreamException - if the XML stream cannot be processed
-
parse
public static Iterable<Document> parse(Resource xmlResource, String tag, EventFilter eventFilter) throws IOException, XMLStreamException
Parses the given XML resource, creating sub-documents (DOM) from the elements with the given tag name and filtering out events using the given event filter. An example use of this method is constructing DOM documents from each "page" element in the Wikipedia dump.
- Parameters:
xmlResource - the XML resource
tag - the tag to create documents on
eventFilter - the event filter
- Returns:
- an Iterable over the sub-documents
- Throws:
IOException - if an I/O error occurs
XMLStreamException - if the XML stream cannot be processed
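The idea of cutting a large XML stream into one DOM document per matching element can be sketched with the JDK's StAX and DOM APIs alone. This is an illustrative assumption, not the library's implementation: the class `SubDocumentSketch` and the method `split` are hypothetical, the input is a String rather than a Resource, no event filter is applied, and the result is built eagerly as a List where the documented method returns an Iterable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SubDocumentSketch {
    // Builds one DOM document for each top-most element named `tag`.
    static List<Document> split(String xml, String tag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<Document> docs = new ArrayList<>();
        Document current = null; // document under construction, if any
        Node cursor = null;      // node that receives the next child
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                String name = start.getName().getLocalPart();
                if (current == null && name.equals(tag)) {
                    current = builder.newDocument();
                    cursor = current;
                }
                if (current != null) {
                    Element element = current.createElement(name);
                    Iterator<?> attrs = start.getAttributes();
                    while (attrs.hasNext()) {
                        Attribute attr = (Attribute) attrs.next();
                        element.setAttribute(attr.getName().getLocalPart(), attr.getValue());
                    }
                    cursor.appendChild(element);
                    cursor = element;
                }
            } else if (event.isCharacters() && current != null) {
                cursor.appendChild(current.createTextNode(event.asCharacters().getData()));
            } else if (event.isEndElement() && current != null) {
                cursor = cursor.getParentNode();
                if (cursor == current) { // the matching element has closed
                    docs.add(current);
                    current = null;
                    cursor = null;
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        String dump = "<dump><page id=\"1\"><title>A</title></page>"
                    + "<page id=\"2\"><title>B</title></page></dump>";
        for (Document doc : split(dump, "page")) {
            Element page = doc.getDocumentElement();
            System.out.println(page.getAttribute("id") + ": " + page.getTextContent());
        }
    }
}
```

Because events are consumed one at a time, only the sub-document currently being built needs to be held as a DOM tree, which is what makes this approach attractive for inputs as large as a Wikipedia dump.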
-
removeNodeAndChildren
public static void removeNodeAndChildren(Node n)
Removes a node by first recursively removing all of its children and their children
- Parameters:
n - Node to remove
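A children-first removal of this kind might look like the following JDK-only sketch. It is written under the assumption that "removing" means detaching each node from its parent; the class `RemoveSketch` and its methods are hypothetical names, not the library's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RemoveSketch {
    // Detaches all descendants of n (deepest first), then n itself.
    static void removeRecursively(Node n) {
        while (n.hasChildNodes()) {
            // Each call detaches the last child, so the loop terminates.
            removeRecursively(n.getLastChild());
        }
        Node parent = n.getParentNode();
        if (parent != null) {
            parent.removeChild(n);
        }
    }

    // Removes the first element named `tag` and reports how many
    // children the root element has afterwards.
    static int childCountAfterRemoval(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        removeRecursively(doc.getElementsByTagName(tag).item(0));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childCountAfterRemoval("<a><b><c/></b><d/></a>", "b"));
    }
}
```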
-
removeNodes
public static void removeNodes(NodeList nl)
Removes a set of nodes in a NodeList from the document.
- Parameters:
nl - The nodes to remove
-
removeNodes
public static void removeNodes(Collection<? extends Node> nl)
Removes a set of nodes in a Collection from the document.
- Parameters:
nl - The nodes to remove
-
selectAllNodes
public static List<Node> selectAllNodes(Node n)
Selects all nodes under and including the given node
-
selectChildNodeBreadthFirst
public static Node selectChildNodeBreadthFirst(Node n, Predicate<Node> predicate)
Selects the first node passing a given predicate using a breadth first search
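A breadth-first search over a DOM tree with a predicate can be sketched as follows, using only the JDK. The class `BreadthFirstSketch` and its methods are hypothetical, and whether the library tests the starting node itself is not stated in the source; this sketch does test it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BreadthFirstSketch {
    // Visits nodes level by level and returns the first one the predicate
    // accepts, or null if no node matches.
    static Node firstMatchBreadthFirst(Node start, Predicate<Node> predicate) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            if (predicate.test(current)) {
                return current;
            }
            NodeList children = current.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                queue.add(children.item(i));
            }
        }
        return null;
    }

    // Returns the "id" attribute of the first element named `tag`
    // (case-insensitive) found breadth-first, or null if none is found.
    static String idOfFirst(String xml, String tag) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        Node found = firstMatchBreadthFirst(root, n -> n.getNodeName().equalsIgnoreCase(tag));
        return found == null ? null : ((Element) found).getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><a><x id=\"deep\"/></a><x id=\"shallow\"/></root>";
        System.out.println(idOfFirst(xml, "x"));
    }
}
```

In the sample input, a depth-first traversal would reach the nested `x` first, while the breadth-first order reaches the `x` that is a direct child of the root.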
-
selectChildNodes
public static List<Node> selectChildNodes(Node n, Predicate<Node> predicate)
Selects nodes under and including the given node that pass the given predicate
-
selectNodes
public static List<Node> selectNodes(Node n, Predicate<Node> predicate)
Selects nodes under and including the given node that pass the given predicate
-
tagMatchPredicate
public static Predicate<Node> tagMatchPredicate(String tagName)
Creates a Predicate that evaluates to true if the input node's name is the same as the given node name. String matches are case-insensitive.
- Parameters:
tagName - The name to match
- Returns:
- The Predicate
-
-