Afreen Shaikh - Academia.edu
Papers by Afreen Shaikh
Cornell University - arXiv, Oct 18, 2022
This paper proposes an OTSU-based differential evolution (DE) method for satellite image segmentation and compares it with four other methods: Modified Artificial Bee Colony Optimizer (MABC)[1], Artificial Bee Colony (ABC), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO), all using the objective function proposed by Otsu for optimal multilevel thresholding. The experiments illustrate that the proposed DE+OTSU algorithm can effectively and precisely segment the input image, with results close to those obtained by the other methods. In the proposed DE+OTSU algorithm, instead of passing the fitness function variables, the entire image is passed as input to the DE algorithm after Otsu's algorithm has produced the threshold values for the requested number of levels. The segmentation results are thus obtained by learning about the image rather than about the fitness variables. Compared with the other segmentation methods examined, the proposed DE+OTSU algorithm yields promising results, with lower computational time than some of them.
Cornell University - arXiv, Oct 22, 2022
Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel in summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques to summarize financial documents, including facts and figures, have largely been unexplored, mainly due to the unavailability of suitable datasets. Here, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by public companies, as documents, and short expert-written, telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark ECTSum with state-of-the-art summarizers across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple-yet-effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
ArXiv, 2021
The determination of the reading sequence of text is fundamental to document understanding. This problem is easily solved in pages where the text is organized into a sequence of lines and vertical alignment runs the height of the page (producing multiple columns which can be read from left to right). We present a situation – the directory page parsing problem – where information is presented on the page in an irregular, visually organized, two-dimensional format. Directory pages are fairly common in financial prospectuses and carry information about organizations, their addresses and relationships that is key to business tasks in client onboarding. Interestingly, directory pages sometimes have hierarchical structure, motivating the need to generalize the reading sequence to a reading tree. We present solutions to the problem of identifying directory pages and constructing the reading tree, using (learnt) classifiers for text segments and a bottom-up (right to left, bottom-to-top) tr...