
An Approach to Visual Text Correction Using RoBERTa

Updated: Jan 30, 2022


Abstract


Visual Text Correction (VTC) refers to a family of methods for finding and replacing an inaccurate word in a sentence given both textual and visual information. In this paper, we propose a novel solution to the VTC problem by applying stacked generalization [12]. Our solution combines the outputs of several neural networks built upon RoBERTa and MnasNet. We use image features processed by VGG19, together with English natural-language text, as input to two backbone models: an image backbone, MnasNet, and an NLP Transformer, RoBERTa. We train these models separately (RoBERTa alone, and RoBERTa with MnasNet) and then combine them into a weighted ensemble. Evaluated on a realistic falsified dataset, this ensemble demonstrates the strength of ensembling neural networks: our experiments show that a linear combination of the two networks, RoBERTa alone and RoBERTa with MnasNet, achieves higher accuracy than either model on its own.


KEYWORDS: NLP, BERT, RoBERTa, MnasNet, Stacked Generalization, Visual Text Correction
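The weighted ensemble described in the abstract can be sketched as a linear combination of the two backbones' per-word prediction probabilities. This is a minimal illustration, not the paper's actual implementation: the function names, the toy logits, and the mixing weight `alpha` are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_ensemble(probs_text, probs_multimodal, alpha=0.5):
    # Linear combination of per-word probabilities from the
    # text-only model (RoBERTa alone) and the multimodal model
    # (RoBERTa with MnasNet). `alpha` is a hypothetical mixing
    # weight; in practice it would be tuned on a validation set.
    return alpha * probs_text + (1.0 - alpha) * probs_multimodal

# Toy example: each model scores four candidate replacement words.
p_text = softmax(np.array([2.0, 0.5, 0.1, -1.0]))
p_mm   = softmax(np.array([0.2, 1.8, 0.3, -0.5]))
p_ens  = weighted_ensemble(p_text, p_mm, alpha=0.5)
best   = int(np.argmax(p_ens))  # index of the ensemble's chosen word
```

Because the combination is a convex mixture of two probability distributions, the ensemble output is itself a valid distribution over candidate words, so the standard argmax decoding used by either single model applies unchanged.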

