Mital Kinderkhedia - Academia.edu


Papers by Mital Kinderkhedia

LEARNING REPRESENTATIONS OF GRAPH DATA: A SURVEY

Deep Neural Networks have shown tremendous success in the areas of object recognition, image classification and natural language processing. However, designing optimal Neural Network architectures that can learn and output arbitrary graphs is an ongoing research problem. The objective of this survey is to summarize and discuss the latest advances in methods to learn representations of graph data. We start by identifying commonly used types of graph data and review the basics of graph theory. This is followed by a discussion of the relationships between graph kernel methods and neural networks. Next, we identify the major approaches used for learning representations of graph data, namely: kernel approaches, convolutional approaches, graph neural network approaches, graph embedding approaches and probabilistic approaches. A variety of methods under each of these approaches are discussed, and the survey concludes with a brief discussion of the future of learning representations of graph data.

Learning Representations of Graph Data - A Survey

ArXiv, 2019

Deep Neural Networks have shown tremendous success in the areas of object recognition, image classification and natural language processing. However, designing optimal Neural Network architectures that can learn and output arbitrary graphs is an ongoing research problem. The objective of this survey is to summarize and discuss the latest advances in methods to learn representations of graph data. We start by identifying commonly used types of graph data and review the basics of graph theory. This is followed by a discussion of the relationships between graph kernel methods and neural networks. Next, we identify the major approaches used for learning representations of graph data, namely: kernel approaches, convolutional approaches, graph neural network approaches, graph embedding approaches and probabilistic approaches. A variety of methods under each of these approaches are discussed, and the survey concludes with a brief discussion of the future of learning representations of graph data.

Mind the Gap – E-Transitioning to University

EDULEARN19 Proceedings, 2019

Widening access to higher education causes a re-calibration of expectations for students, academics and Universities. Recruitment from different areas, educational backgrounds and ethnicities requires a reconsideration of assumptions about the skills and competencies that students have on arrival. This paper presents a report and early findings from a study funded by the UK Government Office for Students and coordinated by four leading Universities. In particular, it focuses on the preparedness for study of students from non-traditional educational backgrounds, but it also sheds light on the preparedness of students generally, of academics and of Universities. The project had three distinct phases that reflect the underlying theory-of-change methodology:
1. Qualitative research interviews and focus groups with recent entrants and the staff teaching them. This helped the project focus on key interventions that could respond to perceived gaps in provision.
2. Planning, design and deployment of interventions. This paper focuses on an online resource for University preparation known as "Warm_up", offered to students in the weeks before arrival at University. Warm_up is an interactive resource covering numeracy, essay writing and some transferable skills. It encourages development of these skills, offers exercises and readings to help improve them, and provides an opportunity to reflect on progress.
3. Qualitative and quantitative analysis of the online intervention, including the interactions with students.
The project produced convincing evidence of the "gaps" in student study skills before they entered University. It also highlighted gaps in academic enthusiasm for online interventions and in the ability of university VLE systems to deliver good-quality online/e-learning resources. The research itself provided some evidence of impact on students. It provided better evidence of impact on the institution.

Benchmarking JSON BinPack

arXiv (Cornell University), Nov 23, 2022

In this paper, we present benchmark results for a pre-production implementation of a novel serialization specification: JSON BinPack. JSON BinPack is a schema-driven and schema-less sequential binary serialization specification based on JSON Schema. It is rich in diverse encodings, and is developed to improve network performance and reduce the operational costs of Internet-based software systems. We present benchmark results for 27 JSON documents and, for each plot, we show the schema-driven and schema-less serialization specifications that produce the smallest bit-strings. Through extensive plots and statistical comparisons, we show that JSON BinPack in schema-driven mode is as space-efficient as or more space-efficient than every other serialization specification for the 27 documents under consideration. In comparison to JSON, JSON BinPack in schema-driven mode provides median and average size reductions of 86.7% and 78.7%, respectively. We also show that the schema-less mode of the JSON BinPack binary serialization specification is as space-efficient as or more space-efficient than every other schema-less serialization specification for the 27 documents under consideration. In comparison to JSON, JSON BinPack in schema-less mode provides median and average size reductions of 30.6% and 30.5%, respectively. Unlike the other schema-driven binary serialization specifications considered, JSON BinPack in schema-driven mode is space-efficient even in comparison to best-case compressed JSON, with median and average size reductions of 76.1% and 66.8%, respectively. We have made our benchmark results available at jviotti/binary-json-size-benchmark on GitHub.
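The size-reduction percentages reported above can be computed from raw byte counts as sketched below. The byte sizes here are illustrative placeholders, not the paper's measurements; the real benchmark draws on the 27 documents in the jviotti/binary-json-size-benchmark repository:

```python
import statistics

# Placeholder byte sizes for a handful of documents -- NOT measured results.
json_sizes = [1024, 2048, 512, 4096]
binpack_sizes = [140, 300, 60, 900]

def size_reduction(original: int, encoded: int) -> float:
    """Percentage by which `encoded` is smaller than `original`."""
    return (1 - encoded / original) * 100

reductions = [size_reduction(j, b) for j, b in zip(json_sizes, binpack_sizes)]
print(f"median reduction:  {statistics.median(reductions):.1f}%")
print(f"average reduction: {statistics.mean(reductions):.1f}%")
```

A negative value from `size_reduction` would indicate that the encoding produced a larger payload than the original, which is how the paper's comparisons against compressed JSON are framed.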

JSON Stats Analyzer

arXiv (Cornell University), Nov 21, 2022

A Survey of JSON-compatible Binary Serialization Specifications

In this paper, we present the recent advances that highlight the characteristics of JSON-compatible binary serialization specifications. We motivate the discussion by covering the history and evolution of binary serialization specifications from the 1960s through the 2000s and onwards. We analyze the use cases of the most popular serialization specifications across industries. Drawing on the schema-driven (ASN.1, Apache Avro, Microsoft Bond, Cap'n Proto, FlatBuffers, Protocol Buffers, and Apache Thrift) and schema-less (BSON, CBOR, FlexBuffers, MessagePack, Smile, and UBJSON) JSON-compatible binary serialization specifications, we compare and contrast their inner workings through our analysis. We explore a set of non-standardized binary integer encoding techniques (ZigZag integer encoding and Little Endian Base 128 variable-length integer encoding) that are essential to understanding the various JSON-compatible binary serialization specifications. We systematically...
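The two integer encoding techniques named in the abstract can be sketched briefly. This is an illustrative Python sketch of generic 64-bit ZigZag and unsigned LEB128 encoding as commonly described (e.g. in the Protocol Buffers encoding documentation), not code from the paper:

```python
def zigzag_encode(n: int) -> int:
    """ZigZag: interleave signed ints into unsigned ones (0,-1,1,-2 -> 0,1,2,3).

    Assumes n fits in a signed 64-bit integer."""
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)

def leb128_encode(u: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # continuation bit
        else:
            out.append(byte)
            return bytes(out)

def leb128_decode(data: bytes) -> int:
    """Invert leb128_encode."""
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result
```

Composing the two (ZigZag first, then LEB128) yields the compact signed-integer wire format used by several of the specifications surveyed; for example, 300 encodes to the two bytes `AC 02`.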

A Benchmark of JSON-compatible Binary Serialization Specifications

arXiv (Cornell University), Jan 9, 2022

In this paper, we present a comprehensive benchmark of JSON-compatible binary serialization specifications using the SchemaStore open-source test suite, a collection of over 400 JSON documents matching their respective schemas and representative of their use across industries. We benchmark a set of schema-driven (ASN.1, Apache Avro, Microsoft Bond, Cap'n Proto, FlatBuffers, Protocol Buffers, and Apache Thrift) and schema-less (BSON, CBOR, FlexBuffers, MessagePack, Smile, and UBJSON) JSON-compatible binary serialization specifications. Existing literature on benchmarking JSON-compatible binary serialization specifications demonstrates extensive gaps when it comes to coverage of binary serialization specifications, reproducibility and representativeness, the role of data compression in binary serialization, and the choice and use of obsolete versions of binary serialization specifications. We believe our work is the first of its kind to introduce a tiered taxonomy for JSON documents, consisting of 36 categories classified as Tier 1, Tier 2 and Tier 3, as a common basis to classify JSON documents based on their size, type of content, characteristics of their structure and redundancy criteria. We built and published a free-to-use online tool that automatically categorizes JSON documents according to our taxonomy and generates related summary statistics. In the interest of fairness and transparency, we adhere to reproducible software development standards and publicly host the benchmark software and results on GitHub.
Our findings support a number of conclusions: sequential binary serialization specifications are typically more space-efficient than pointer-based binary serialization specifications, independently of whether they are schema-less or schema-driven; in comparison to compressed JSON, both compressed and uncompressed schema-less binary serialization specifications result in negative median and average size reductions. Through our analysis, we find that both compressed and uncompressed schema-driven binary serialization specifications result in positive median and average size reductions. Furthermore, compressed sequential schema-driven binary serialization specifications are strictly superior to compressed JSON in all cases in the input data.
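A size-based bucketing step in the spirit of the taxonomy described above can be sketched as follows. The 100-byte and 1000-byte thresholds here are assumptions made for illustration only; the paper's actual tier boundaries and its content, structure, and redundancy criteria are not reproduced:

```python
import json

def classify_tier(document: dict) -> str:
    """Bucket a JSON document by its minified UTF-8 byte size.

    Thresholds are illustrative assumptions, not the published tier
    definitions, and only the size criterion is modeled here."""
    minified = json.dumps(document, separators=(",", ":"))
    size = len(minified.encode("utf-8"))
    if size < 100:
        return "Tier 1"
    if size < 1000:
        return "Tier 2"
    return "Tier 3"
```

Minifying before measuring keeps the classification independent of incidental whitespace in the source document.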
