A. Hanjalic | TU Delft Repository

Still Making Noise

Improving Deep-Learning-Based Side-Channel Analysis

Journal article (2025) - Jaehun Kim (author), Stjepan Picek (author), Annelie Heuser (author), Shivam Bhasin (author), Alan Hanjalic (author)

Editor’s notes: Side-channel attacks have been undermining cryptosystems for almost three decades. Advances in machine learning techniques have shown great promise in improving the performance and efficiency of side-channel attacks, even on systems with countermeasures. This arti ...

Multi-label Node Classification On Graph-Structured Data

Journal article (2024) - T. Zhao (author), Ngan Thi Dong (author), Alan Hanjalic (author), Megha Khosla (author)

Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have mult ...

Preface

Journal article (2024) - Stevan Rudinac (author), Alan Hanjalic (author), C.C.S. Liem (author), Marcel Worring (author), Björn Þór Jónsson (author), Bei Liu (author), Yoko Yamakata (author)

MultiMedia Modeling

30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 – February 2, 2024, Proceedings, Part I

Predicting nodal influence via local iterative metrics

Journal article (2024) - Shilun Zhang (author), Alan Hanjalic (author), Huijuan Wang (author)

Nodal spreading influence is the capability of a node to activate the rest of the network when it is the seed of spreading. Combining nodal properties (centrality metrics) derived from local and global topological information, respectively, has been shown to predict nodal influence better than using a single metric. In this work, we investigate to what extent local and global topological information around a node contributes to the prediction of nodal influence and whether relatively local information is sufficient for the prediction. We show that by leveraging the iterative process used to derive a classical nodal centrality such as eigenvector centrality, we can define an iterative metric set that progressively incorporates more global information around the node. We propose to predict nodal influence using an iterative metric set that consists of the iterative metrics of orders 1 to K produced in an iterative process, encoding gradually more global information as K increases. Three iterative metrics are considered, which converge to three classical node centrality metrics, respectively. In various real-world networks and synthetic networks with community structures, we find that the prediction quality of each iterative-metric-based model converges to its optimum when metrics of relatively low orders (K∼4) are included, and increases only marginally when K is increased further. This fast convergence of prediction quality with K is further explained by analyzing the correlation between the iterative metric and nodal influence, the convergence rate of each iterative process, and network properties. The prediction quality of the best-performing iterative metric set with K=4 is comparable with that of the benchmark method that combines seven centrality metrics: their prediction quality ratio is within the range [91%,106%] across all three quality measures and networks. In two spatially embedded networks with an extremely large diameter, however, iterative metrics of higher orders, and thus a larger K, are needed to achieve prediction quality comparable to the benchmark.
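
The idea of an iterative metric set can be illustrated with a minimal sketch. It assumes that the order-k metric is the unnormalized power iteration behind eigenvector centrality, started from an all-ones vector, so that order 1 equals the node degree. The function name `iterative_metric_set` and the adjacency-list input are hypothetical, and the exact definitions used in the article may differ.

```python
def iterative_metric_set(adj, K):
    """Return [x_1, ..., x_K], where x_k = A^k * 1 (assumed definition).

    adj is an adjacency list: adj[i] lists the neighbours of node i.
    Each pass sums the previous-order values over a node's neighbours,
    so order k draws on information from nodes up to k hops away.
    """
    n = len(adj)
    x = [1.0] * n  # order 0: every node starts with the same value
    orders = []
    for _ in range(K):
        x = [sum(x[j] for j in adj[i]) for i in range(n)]
        orders.append(x)
    return orders


# Path graph 0 - 1 - 2 - 3 (illustrative example, not from the article)
path = [[1], [0, 2], [1, 3], [2]]
metrics = iterative_metric_set(path, 3)
print(metrics[0])  # order 1 is the degree of each node
print(metrics[2])  # order 3 already separates inner from outer nodes more strongly
```

The K vectors would then serve as the feature set of a prediction model. In practice each order is usually normalized to keep the values bounded, which this sketch omits.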

Preface

Journal article (2024) - Stevan Rudinac (author), Alan Hanjalic (author), C.C.S. Liem (author), Marcel Worring (author), Björn Þór Jónsson (author), Bei Liu (author), Yoko Yamakata (author)

MultiMedia Modeling

30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 – February 2, 2024, Proceedings, Part II

Preface

Journal article (2024) - Stevan Rudinac (author), Alan Hanjalic (author), C.C.S. Liem (author), Marcel Worring (author), Björn Þór Jónsson (author), Bei Liu (author), Yoko Yamakata (author)

MultiMedia Modeling

30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 – February 2, 2024, Proceedings, Part III

Mitigating Mainstream Bias in Recommendation via Cost-sensitive Learning

Conference paper (2023) - Roger Zhe Li (author), Julián Urbano (author), Alan Hanjalic (author)

Mainstream bias, where some users receive poor recommendations because their preferences are uncommon or simply because they are less active, is an important aspect to consider regarding fairness in recommender systems. Existing methods to mitigate mainstream bias do not explicit ...

Weakly-supervised Learning for Fine-grained Emotion Recognition using Physiological Signals

Journal article (2023) - Tianyi Zhang (author), Abdallah El Ali (author), Chen Wang (author), Alan Hanjalic (author), Pablo Cesar (author)

Instead of predicting just one emotion for one activity (e.g., video watching), fine-grained emotion recognition enables more temporally precise recognition. Previous works on fine-grained emotion recognition require segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are costly and time-consuming compared with collecting only one emotion label after the user has watched the stimulus (i.e., the post-stimuli emotion labels). To recognize emotions at a finer granularity level when trained with only post-stimuli labels, we propose an emotion recognition algorithm based on Deep Multiple Instance Learning (EDMIL) using physiological signals. EDMIL recognizes fine-grained valence and arousal (V-A) labels by identifying which instances represent the post-stimuli V-A annotated by users after watching the videos. Instead of being trained in a fully supervised manner, the instances are weakly supervised by the post-stimuli labels in the training stage. The V-A of instances is estimated by the instance gains, which indicate the probability of instances to predict the post-stimuli labels. We tested EDMIL on three different datasets, CASE, MERCA and CEAP-360VR, collected in three different environments: desktop, mobile and HMD-based Virtual Reality, respectively. Recognition results validated with the fine-grained V-A self-reports show that for subject-independent 3-class classification (high/neutral/low), EDMIL obtains promising recognition accuracies: 75.63% and 79.73% for V-A on CASE, 70.51% and 67.62% for V-A on MERCA and 65.04% and 67.05% for V-A on CEAP-360VR. Our ablation study shows that all components of EDMIL contribute to both the classification and regression tasks. Our experiments also show that (1) compared with fully-supervised learning, weakly-supervised learning can reduce the problem of overfitting caused by the temporal mismatch between fine-grained annotations and physiological signals, (2) instance segment lengths of 1-2 s result in the highest recognition accuracies and (3) EDMIL performs best if post-stimuli annotations consist of less than 30% or more than 60% of the entire video watching.

Topological-Temporal properties of evolving networks

Journal article (2022) - Alberto Ceria (author), Shlomo Havlin (author), Alan Hanjalic (author), Huijuan Wang (author)

Many real-world complex systems, including human interactions, can be represented by temporal (or evolving) networks, where links activate or deactivate over time. Characterizing temporal networks is crucial to compare different real-world networks and to detect their common patterns or differences. A systematic method that can simultaneously characterize the temporal and topological relations of the time-specific interactions (also called contacts or events) of a temporal network is still missing. In this article, we propose a method to characterize to what extent contacts that happen close in time also occur close in topology. Specifically, we study the interrelation between temporal and topological properties of the contacts from three perspectives: (1) the correlation (among the elements) of the activity time series, which records the total number of contacts in a network that happen at each time step; (2) the interplay between the topological distance and time difference of two arbitrary contacts; (3) the temporal correlation of contacts within the local neighbourhood centred at each link (so-called ego-network) to explore whether such contacts that happen close in topology are also close in time. By applying our method to 13 real-world temporal networks, we found that temporal-topological correlation of contacts is more evident in virtual contact networks than in physical contact networks. This could be due to the lower cost and easier access of online communications compared with physical interactions, allowing and possibly facilitating social contagion, that is, interactions of one individual may influence the activity of its neighbours. We also identify different patterns between virtual and physical networks and among physical contact networks at, for example, school and workplace, in the formation of correlation in local neighbourhoods. Patterns and differences detected via our method may further inspire the development of more realistic temporal network models that could jointly reproduce temporal and topological properties of contacts.
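
Perspective (1) above can be sketched in a few lines. The sketch assumes that contacts are given as `(u, v, t)` triples with integer time steps and that the correlation among elements of the activity time series is measured as a lagged Pearson autocorrelation. Both the data layout and the choice of Pearson correlation are assumptions for illustration, and the function names are hypothetical rather than taken from the article.

```python
def activity_series(contacts, num_steps):
    """Count the total number of contacts at each time step."""
    series = [0] * num_steps
    for _u, _v, t in contacts:
        series[t] += 1
    return series


def autocorrelation(series, lag):
    """Pearson correlation between the series and its copy shifted by `lag`."""
    x, y = series[:-lag], series[lag:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant segment has no defined correlation
    return cov / (var_x * var_y) ** 0.5


# Illustrative contacts (node u, node v, time step t); not data from the article
contacts = [(0, 1, 0), (1, 2, 0), (0, 1, 1), (2, 3, 3)]
activity = activity_series(contacts, 4)
print(activity)
print(autocorrelation([1, 2, 3, 4, 5], 1))
```

A high autocorrelation at small lags would indicate that busy time steps tend to follow busy time steps, which is the temporal half of the picture; the topological perspectives (2) and (3) need the network structure as well.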

Task-Aware Connectivity Learning for Incoming Nodes Over Growing Graphs

Journal article (2022) - Bishwadeep Das (author), Alan Hanjalic (author), Elvin Isufi (author)

Data processing over graphs is usually done on graphs of fixed size. However, graphs often grow with new nodes arriving over time. Knowing the connectivity information of these nodes, and thus, the expanded graph is crucial for processing data over the expanded graph. In its abse ...

Influence of clustering coefficient on network embedding in link prediction

Journal article (2022) - Omar F. Fernández Robledo (author), Xiuxiu Zhan (author), Alan Hanjalic (author), Huijuan Wang (author)

Multiple network embedding algorithms have been proposed to perform the prediction of missing or future links in complex networks. However, we lack an understanding of how network topology affects their performance, or which algorithms are more likely to perform better given the topological properties of the network. In this paper, we investigate how the clustering coefficient of a network, i.e., the probability that the neighbours of a node are also connected, affects network embedding algorithms’ performance in link prediction, in terms of the AUC (area under the ROC curve). We evaluate classic embedding algorithms, i.e., Matrix Factorisation, Laplacian Eigenmaps and node2vec, in both synthetic networks and (rewired) real-world networks with variable clustering coefficient. Specifically, a rewiring algorithm is applied to each real-world network to change the clustering coefficient while keeping key network properties. We find that a higher clustering coefficient tends to lead to a higher AUC in link prediction, except for Matrix Factorisation, which is not sensitive to changes in the clustering coefficient. To understand this influence of the clustering coefficient, we (1) explore the relation between the link rating (probability that a node pair is the missing link) derived from the aforementioned algorithms and the number of common neighbours of the node pair, and (2) evaluate these embedding algorithms’ ability to reconstruct the original training (sub)network. All the network embedding algorithms that we tested tend to assign a higher likelihood of connection to node pairs that share an intermediate or high number of common neighbours, independently of the clustering coefficient of the training network. As a result, the predicted networks will have more triangles, and thus a higher clustering coefficient. As the clustering coefficient increases, all the algorithms but Matrix Factorisation could also better reconstruct the training network. These two observations may partially explain why increasing the clustering coefficient improves the prediction performance.
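
The two quantities at the centre of this analysis, the clustering coefficient and the number of common neighbours of a node pair, can be computed with a short sketch. It assumes an undirected graph stored as a dictionary of neighbour sets and uses the average local clustering coefficient; the article may use a different variant (for example, global transitivity), and the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of neighbour pairs of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to check
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def common_neighbours(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


# Triangle 0-1-2 with a pendant node 3 attached to node 2 (illustrative)
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(graph))
print(common_neighbours(graph, 0, 1))
```

Every common neighbour of an unlinked pair becomes a new triangle once that pair is connected, which is why predictors that favour pairs with many common neighbours produce networks with a higher clustering coefficient.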

Temporal Network Prediction and Interpretation

Journal article (2022) - Li Zou (author), Xiu Xiu Zhan (author), Jie Sun (author), Alan Hanjalic (author), Huijuan Wang (author)

Temporal networks refer to networks, such as physical contact networks, whose topology changes over time. Predicting the future temporal network is crucial, e.g., to forecast epidemics. Existing prediction methods are either relatively accurate but black-box, or white-box but less accu ...

Subjective QoE Evaluation of User-Centered Adaptive Streaming of Dynamic Point Clouds

Conference paper (2022) - Shishir Subramanyam (author), Irene Viola (author), Jack Jansen (author), Evangelos Alexiou (author), Alan Hanjalic (author), Pablo Cesar (author)

Technological advances in head-mounted displays and novel real-time 3D acquisition and reconstruction solutions have fostered the development of 6 Degrees of Freedom (6DoF) teleimmersive systems for social VR applications. Point clouds have emerged as a popular format for such ap ...

Joint Feature Synthesis and Embedding

Adversarial Cross-Modal Retrieval Revisited

Journal article (2022) - Xing Xu (author), Kaiyi Lin (author), Yang Yang (author), Alan Hanjalic (author), Heng Tao Shen (author)

Recently, the generative adversarial network (GAN) has shown a strong ability to model data distributions via adversarial learning. Cross-modal GAN, which attempts to utilize the power of GAN to model the cross-modal joint distribution and to learn compatible cross-modal features, is becoming a research hotspot. However, the existing cross-modal GAN approaches typically 1) require labeled multimodal data, obtained at massive labor cost, to establish cross-modal correlation; 2) utilize the vanilla GAN model, which results in an unstable training procedure and meaningless synthetic features; and 3) lack extensibility for retrieving cross-modal data of new classes. In this article, we revisit the adversarial learning in existing cross-modal GAN methods and propose Joint Feature Synthesis and Embedding (JFSE), a novel method that jointly performs multimodal feature synthesis and common embedding space learning to overcome the above three shortcomings. Specifically, JFSE deploys two coupled conditional Wasserstein GAN modules for the input data of two modalities, to synthesize meaningful and correlated multimodal features under the guidance of the word embeddings of class labels. Moreover, three advanced distribution alignment schemes with advanced cycle-consistency constraints are proposed to preserve the semantic compatibility and enable the knowledge transfer in the common embedding space for both the true and synthetic cross-modal features. All these add-ons in JFSE not only help to learn a more effective common embedding space that captures the cross-modal correlation but also facilitate knowledge transfer to multimodal data of new classes. Extensive experiments are conducted on four widely used cross-modal datasets, and the comparisons with more than ten state-of-the-art approaches show that our JFSE method achieves remarkable accuracy improvements on both standard retrieval and the newly explored zero-shot and generalized zero-shot retrieval tasks.

Guest Editorial Learning From Noisy Multimedia Data

Review (2022) - Jian Zhang (author), Alan Hanjalic (author), Ramesh Jain (author), Xiansheng Hua (author), Shin'ichi Satoh (author), Yazhou Yao (author), Dan Zeng (author)

This special issue provides a premier forum for researchers in multimedia big data to share challenges and recent advancements in learning from noisy multimedia data. The multimedia age and its proliferation of devices and platforms is fueling exponential data growth. As computat ...

Few-shot Learning for Fine-grained Emotion Recognition using Physiological Signals

Journal article (2022) - Tianyi Zhang (author), Abdallah El Ali (author), Alan Hanjalic (author), Pablo Cesar (author)

Fine-grained emotion recognition can model the temporal dynamics of emotions, which is more precise than predicting one emotion retrospectively for an activity (e.g., video clip watching). Previous works require large amounts of continuously annotated data to train an accurate re ...

Generating Images from Spoken Descriptions

Journal article (2021) - Xinsheng Wang (author), Tingting Qiao (author), Jihua Zhu (author), Alan Hanjalic (author), Odette Scharenborg (author)

Text-based technologies, such as text translation from one language to another, and image captioning, are gaining popularity. However, approximately half of the world's languages are estimated to be lacking a commonly used written form. Consequently, these languages cannot benefi ...

New Insights into Metric Optimization for Ranking-based Recommendation

Conference paper (2021) - Roger Zhe Li (author), Julián Urbano (author), Alan Hanjalic (author)

Direct optimization of IR metrics has often been adopted as an approach to devise and develop ranking-based recommender systems. Most methods following this approach (e.g. TFMAP, CLiMF, Top-N-Rank) aim at optimizing the same metric being used for evaluation, under the assumption ...

Cross-modal hybrid feature fusion for image-sentence matching

Journal article (2021) - Xing Xu (author), Yifan Wang (author), Yixuan He (author), Yang Yang (author), Alan Hanjalic (author), Heng Tao Shen (author)

Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into a common space to calcula ...

Accuracy-diversity trade-off in recommender systems via graph convolutions

Journal article (2021) - Elvin Isufi (author), Matteo Pocchiari (author), Alan Hanjalic (author)

Graph convolutions, in both their linear and neural network forms, have reached state-of-the-art accuracy on recommender system (RecSys) benchmarks. However, recommendation accuracy is tied with diversity in a delicate trade-off and the potential of graph convolutions to improve ...