Researchers from Xerox Corporation recently demonstrated a software technology that can link text and general images together, marking a breakthrough in how online and paper-based information is categorized.
Current tools classify or "tag" either text or images so they can be processed, but until now no one has combined the two effectively, according to Marco Bressan, a computer scientist who led the research team at Xerox Research Centre Europe (XRCE). By linking image and text-based content, Xerox's new software technology significantly improves fundamental document management tasks like retrieving information from a database or automatically routing documents. The result? More complete searches and streamlined business processes.
The research aligns with Xerox's goal of developing smarter documents to make information-based work easier, more efficient and more effective. Bressan believes there are many uses for the new categorization software.
One application could be at Xerox's own imaging centers, where the company scans and digitizes documents to create secure, accessible and searchable online information archives for its customers. Currently the process of scanning, labeling and indexing documents is partially supervised by operators. Hybrid categorization can streamline document management in this application, improving accuracy and eliminating manual operations.
Xerox's hybrid categorizer is made possible by recent advances in machine learning, pattern recognition and computer vision, along with the large body of hybrid content now available. XRCE has extensive experience with text categorization and, in 2005, demonstrated the industry's first generic image categorizer. The new categorizer combines the earlier text and image categorizers to handle hybrid content, with powerful results.
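Xerox has not described how the text and image categorizers are combined. As a purely illustrative sketch, one common way to merge two categorizers is "late fusion": each produces a score per category, and the scores are blended with a weight. The function names, the category labels and the weighting scheme below are all hypothetical and are not taken from Xerox's software.

```python
from typing import Dict


def fuse_scores(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> Dict[str, float]:
    """Blend per-category scores from two categorizers (late fusion).

    Assumption: both inputs map category labels to scores in [0, 1].
    A label missing from one input is treated as a score of 0.0.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    labels = set(text_scores) | set(image_scores)
    return {
        label: text_weight * text_scores.get(label, 0.0)
        + (1.0 - text_weight) * image_scores.get(label, 0.0)
        for label in labels
    }


def categorize(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> str:
    """Return the category with the highest blended score."""
    fused = fuse_scores(text_scores, image_scores, text_weight)
    return max(fused, key=fused.get)


# Hypothetical scores for one scanned page with both text and a picture.
text_scores = {"invoice": 0.6, "brochure": 0.4}
image_scores = {"invoice": 0.2, "brochure": 0.8}

print(categorize(text_scores, image_scores))                   # equal weights
print(categorize(text_scores, image_scores, text_weight=0.9))  # trust text more
```

In this made-up example the text alone suggests an invoice while the picture suggests a brochure, so the final label depends on how much weight each source is given. That trade-off is the kind of decision a hybrid categorizer has to make.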
The software remains under development. Xerox has filed a number of patents on the technology.