Using ooRexx and JSoup for XML and HTML Processing and Conversions

Publication: Chapter in book/Conference proceeding > Contribution to conference proceedings

Abstract

Text documents from the Internet or the local file system that use XML or HTML markup can be processed with parsers. JSoup is a powerful Java parser and transformer that can fetch marked-up text directly from the Internet, convert it (for example from HTML to XHTML), and make it easy to analyze the DOM elements in the parse tree. With BSF4ooRexx, the ooRexx-Java bridge (an external function and class library for ooRexx), ooRexx programs can take advantage of JSoup without any knowledge of Java. Because Java class libraries run on all operating systems out of the box, such ooRexx programs run unchanged on operating systems such as Windows, macOS, Linux, AIX, and many more.
Original language: English
Title of host publication: 2024 International Rexx Language Symposium Proceedings
Editors: René Vincent Jansen
Place of Publication: Amsterdam
Publisher: Rexx Language Association
Pages: 423-436
Number of pages: 14
ISBN (Print): 978-94-037-3776-8
Publication status: Published - 2024
Event: 2024 International Rexx Symposium - Brisbane, Australia
Duration: 3 Mar 2024 to 6 Mar 2024
https://rexxla.org/events/schedule.rsp?year=2024

Publication series

Series: Proceedings of the Rexx Symposium for Developers and Users
ISSN: 1534-8954

Conference

Conference: 2024 International Rexx Symposium
Country/Territory: Australia
City: Brisbane
Period: 3/03/24 to 6/03/24

Austrian Classification of Fields of Science and Technology (ÖFOS)

  • 102015 Information systems
  • 102022 Software development
  • 502050 Business informatics

Keywords

  • ooRexx
  • Java
  • JSoup
  • BSF4ooRexx850
  • web scraping
