Publications by authors named "Jinqiao Wang"

21 Publications


Semi-Supervised Scene Text Recognition.

IEEE Trans Image Process 2021;30:3005-3016. Epub 2021 Feb 18.

Scene text recognition has been widely researched with supervised approaches. Most existing algorithms require a large amount of labeled data, and some methods even require character-level or pixel-wise supervision. However, labeled data is expensive, whereas unlabeled data is relatively easy to collect, especially for the many languages with fewer resources. In this paper, we propose a novel semi-supervised method for scene text recognition. Specifically, we design two global metrics, i.e., an edit reward and an embedding reward, to evaluate the quality of a generated string, and adopt reinforcement learning techniques to directly optimize these rewards. The edit reward measures the distance between the ground-truth label and the generated string. In addition, the image feature and string feature are embedded into a common space, and the embedding reward is defined by the similarity between the input image and the generated string. It is natural that a generated string should be nearest to the image it is generated from; therefore, the embedding reward can be obtained without any ground-truth information. In this way, we can effectively exploit a large number of unlabeled images to improve recognition performance without additional laborious annotations. Extensive experimental evaluations on five challenging benchmarks, the Street View Text, IIIT5K, and ICDAR datasets, demonstrate the effectiveness of the proposed approach: our method significantly reduces annotation effort while maintaining competitive recognition performance.
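The edit reward described above can be sketched directly. This is a minimal, hypothetical Python version (function names are illustrative, not from the paper) that scores a generated string against its label via normalized Levenshtein distance:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic one-row dynamic program."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution
            prev = cur
    return dp[n]

def edit_reward(generated: str, ground_truth: str) -> float:
    """Reward in [0, 1]: 1 for an exact match, decreasing as edits accumulate."""
    if not generated and not ground_truth:
        return 1.0
    return 1.0 - edit_distance(generated, ground_truth) / max(len(generated), len(ground_truth))
```

During semi-supervised training, a reward of this kind is only computable for labeled images; the embedding reward is what covers the unlabeled ones.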
http://dx.doi.org/10.1109/TIP.2021.3051485
February 2021

Siamese Regression Tracking With Reinforced Template Updating.

IEEE Trans Image Process 2021;30:628-640. Epub 2020 Dec 4.

Siamese networks are prevalent in visual tracking because of their efficient localization. These networks take both a search patch and a target template as inputs, where the target template is usually taken from the initial frame. Meanwhile, Siamese trackers do not update network parameters online, for real-time efficiency. The fixed target template and CNN parameters make Siamese trackers ineffective at capturing target appearance variations. In this paper, we propose a template updating method via reinforcement learning for Siamese regression trackers. We collect a series of templates and learn to maintain them with an actor-critic framework. Within this framework, the actor network, trained by deep reinforcement learning, effectively updates the templates based on the tracking result on each frame. Besides the target template, we update the Siamese regression tracker online to adapt to target appearance variations. Experimental results on standard benchmarks show the effectiveness of both template and network updating. The proposed tracker, SiamRTU, performs favorably against state-of-the-art approaches.
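As a rough illustration of template-pool maintenance, the toy sketch below replaces the paper's learned actor-critic policy with a hard-coded per-frame action; the swap-out-the-weakest rule is one plausible simplification, not the authors' exact update:

```python
import numpy as np

def update_templates(templates, scores, new_template, new_score, action):
    """Apply the actor's per-frame decision to the template pool: either keep
    the pool unchanged or swap the weakest template for the current tracking
    result. In the paper this decision comes from a trained actor network."""
    if action == 'replace':
        worst = int(np.argmin(scores))   # lowest-scoring template in the pool
        templates[worst] = new_template
        scores[worst] = new_score
    return templates, scores
```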
http://dx.doi.org/10.1109/TIP.2020.3036723
December 2020

Antidecay LSTM for Siamese Tracking With Adversarial Learning.

IEEE Trans Neural Netw Learn Syst 2020 Oct 8;PP. Epub 2020 Oct 8.

Visual tracking is one of the fundamental tasks in computer vision, and its many challenges are mainly due to changes in the target's appearance in the temporal and spatial domains. Recently, numerous trackers have modeled the appearance of targets in the spatial domain well by utilizing deep convolutional features. However, most of these CNN-based trackers only take into consideration the appearance variations between two consecutive frames of a video sequence. Some trackers model the appearance of targets over the long term by applying an RNN, but the decay of the target's features degrades tracking performance. In this article, we propose the antidecay long short-term memory (AD-LSTM) for Siamese tracking. Specifically, we extend the architecture of the standard LSTM in two aspects for the visual tracking task. First, we replace all fully connected layers with convolutional layers to extract features with spatial structure. Second, we improve the architecture of the cell unit, so that information about the target appearance can flow through the AD-LSTM without decay for as long as possible in the temporal domain. Meanwhile, since there is no ground truth for the feature maps generated by the AD-LSTM, we propose an adversarial learning algorithm to optimize it. With the help of adversarial learning, the Siamese network can generate response maps more accurately, and the AD-LSTM can generate feature maps of the target more robustly. Experimental results show that our tracker performs favorably against state-of-the-art trackers on six challenging benchmarks: OTB-100, TC-128, VOT2016, VOT2017, GOT-10k, and TrackingNet.
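The first modification, gates computed by convolutions instead of fully connected layers, is the standard convolutional-LSTM construction. A plain-NumPy sketch for single-channel maps with toy kernels (this is a generic ConvLSTM step, not the antidecay cell itself):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation on one single-channel map."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x, h_prev, c_prev, kernels):
    """One convolutional LSTM step: every gate is a convolution rather than
    a fully connected layer, so spatial structure survives.
    kernels maps gate name -> (input kernel, hidden kernel)."""
    g = {name: conv2d_same(x, kx) + conv2d_same(h_prev, kh)
         for name, (kx, kh) in kernels.items()}
    c = sigmoid(g['f']) * c_prev + sigmoid(g['i']) * np.tanh(g['g'])
    h = sigmoid(g['o']) * np.tanh(c)
    return h, c
```

The paper's second change reworks the cell so the forget path does not attenuate the stored features; that part is not reproduced here.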
http://dx.doi.org/10.1109/TNNLS.2020.3018025
October 2020

Nanoparticle-Based Drug Delivery System: A Patient-Friendly Chemotherapy for Oncology.

Dose Response 2020 Jul-Sep;18(3):1559325820936161. Epub 2020 Jul 10.

Department of Rehabilitation Medicine, The First People's Hospital of Wenling, Wenzhou Medical University, Wenling, Zhejiang, China.

Chemotherapy is widely used to treat cancer. The toxic effect of conventional chemotherapeutic drugs on healthy cells leads to the serious side effects of conventional chemotherapy. The application of nanotechnology in tumor chemotherapy can increase the specificity of anticancer agents, enhance their tumor-killing effect, and reduce toxic side effects. Currently, a variety of formulations based on nanoparticles (NPs) for delivering chemotherapeutic drugs have been put into clinical use, and several others are in development or clinical trials. In this review, after briefly introducing current cancer chemotherapeutic methods and their limitations, we describe the clinical applications, advantages, and disadvantages of several different types of NP-based chemotherapeutic agents. We summarize, in tables and figures, information related to the delivery of chemotherapeutic drugs based on NPs and the design of NPs with active targeting capabilities.
http://dx.doi.org/10.1177/1559325820936161
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7357073
July 2020

An end-to-end exemplar association for unsupervised person Re-identification.

Neural Netw 2020 Sep;129:43-54. Epub 2020 May 23.

Department of Computer Science, Edge Hill University, Ormskirk, United Kingdom. Electronic address:

Tracklet association methods learn cross-camera retrieval ability by associating underlying cross-camera positive samples, and have proven successful in the unsupervised person re-identification task. However, most of them use inefficient association strategies that cost long training hours yet yield low performance. To solve this, we propose an effective end-to-end exemplar association (EEA) framework. EEA mainly adopts three strategies to improve efficiency: (1) end-to-end exemplar-based training, (2) exemplar association, and (3) a dynamic selection threshold. The first accelerates the training process, while the others aim to improve tracklet association precision. Compared with existing tracklet association methods, EEA markedly reduces the training cost and achieves higher performance. Extensive experiments and ablation studies on seven Re-ID datasets demonstrate the superiority of the proposed EEA over most state-of-the-art unsupervised and domain adaptation Re-ID methods.
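A toy version of threshold-based association might look like the following; the epoch-dependent schedule, default values, and function name are assumptions for illustration, not the paper's actual dynamic selection threshold:

```python
import numpy as np

def associate_exemplars(features, epoch, base_thr=0.8, decay=0.05):
    """Associate each exemplar with its most similar peer when the similarity
    clears a threshold that relaxes as training progresses, so more
    cross-camera pairs get pulled together in later epochs.

    features: (N, D) array of L2-normalized exemplar embeddings."""
    thr = max(base_thr - decay * epoch, 0.3)   # relax over time, with a floor
    sims = features @ features.T               # cosine similarity for unit rows
    np.fill_diagonal(sims, -1.0)               # forbid self-association
    pairs = [(i, int(j)) for i, j in enumerate(sims.argmax(1)) if sims[i, j] >= thr]
    return thr, pairs
```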
http://dx.doi.org/10.1016/j.neunet.2020.05.015
September 2020

Neuroprotective Effect of Activated Protein C on Blood-Brain Barrier Injury During Focal Cerebral Ischemia/Reperfusion.

Dose Response 2020 Apr-Jun;18(2):1559325820917288. Epub 2020 May 4.

Department of Rehabilitation Medicine, The First People's Hospital of Wenling, Wenzhou Medical University, Wenling, China.

Although the effect of activated protein C (APC) on neuronal injury and neuroinflammatory responses has been extensively studied, the detailed mechanism underlying APC's protective effect against blood-brain barrier (BBB) injury during ischemia is still unclear. In this study, the effect of APC against neuroinflammatory responses was evaluated in a model of right middle cerebral artery occlusion in male Sprague-Dawley rats, with 2 hours of ischemia and 22 hours of reperfusion. The results showed that APC can significantly improve neurological function scores and reduce infarct volume and BBB permeability. Moreover, the expression of nuclear factor-kappa B (NF-κB), both in the cytoplasm and in the nuclei, was reduced, and the downstream effects of NF-κB activation, including tumor necrosis factor-α and interleukin-1β secretion, were inhibited. In summary, APC exerts a neuroprotective effect in focal cerebral ischemia-reperfusion in rats by inhibiting the activation and nuclear translocation of NF-κB, which may indicate a therapeutic approach for ischemic brain injury.
http://dx.doi.org/10.1177/1559325820917288
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7218308
May 2020

Dynamic Collaborative Tracking.

IEEE Trans Neural Netw Learn Syst 2019 Oct;30(10):3035-3046

Correlation filters have demonstrated remarkable success in visual tracking recently. However, most existing methods often face model drift caused by several factors, such as unlimited boundary effects, heavy occlusion, fast motion, and distracter perturbation. To address this issue, this paper proposes a unified dynamic collaborative tracking framework that can perform more flexible and robust position prediction. Specifically, the framework learns the object appearance model by jointly training an objective function with three components: a target regression submodule, a distracter suppression submodule, and a maximum-margin relation submodule. The first submodule mainly takes advantage of the circulant structure of training samples to distinguish the target from its surrounding background. The second submodule optimizes the label response of possible distracting regions toward zero, reducing the peak value of the confidence map in those regions. Inspired by structured-output support vector machines, the third submodule utilizes the differences between the target and distracter appearance representations in a discriminative mapping space, alleviating the disturbance of the most likely hard negative samples. In addition, a CUR filter is embedded as an assistant detector to provide effective object candidates, further alleviating the model drift problem. Comprehensive experimental results show that the proposed approach achieves state-of-the-art performance on several public benchmark datasets.
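The circulant-structure trick used by the first submodule is the classic correlation-filter formulation: ridge regression over all cyclic shifts of the sample, solved element-wise in the Fourier domain. A minimal linear-kernel sketch in the spirit of MOSSE/KCF (not the paper's full three-part objective):

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Ridge regression solved element-wise in the Fourier domain, which is
    exactly what the circulant structure of shifted samples permits.
    x: training patch, y: desired (peaked) response, lam: regularizer."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(w_hat, z):
    """Correlate the learned filter with a search patch; the response peak
    is the predicted target position."""
    resp = np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))
    return tuple(int(v) for v in np.unravel_index(resp.argmax(), resp.shape))
```

In practice the label y is a Gaussian centered on the target rather than a delta, and features are multi-channel; both are omitted here for brevity.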
http://dx.doi.org/10.1109/TNNLS.2018.2861838
October 2019

Effect of 3-Aminobenzamide on the Ultrastructure of Astrocytes and Microvessels After Focal Cerebral Ischemia in Rats.

Dose Response 2020 Jan-Mar;18(1):1559325819901242. Epub 2020 Jan 22.

Department of Rehabilitation Medicine, The First People's Hospital of Wenling, Wenzhou Medical University, Wenling, Zhejiang, China.

The disruption of the blood-brain barrier (BBB) is a critical event in the formation of brain edema during the early phases of ischemic brain injury. Poly(ADP-ribose) polymerase (PARP) activation, which contributes to BBB damage, has been reported in ischemia-reperfusion and traumatic brain injury. Here, we investigated the effect of 3-aminobenzamide (3-AB), a PARP-1 inhibitor, on the ultrastructure of the BBB. Male Sprague-Dawley rats were subjected to 90 minutes of middle cerebral artery occlusion, followed by 4.5 hours or 22.5 hours of reperfusion (R). Vehicle or 3-AB (10 mg/kg) was administered intraperitoneally 60 minutes after the onset of ischemia (I). Tissue Evans Blue (EB) levels, the ultrastructure of astrocytes and microvessels, and areas of perivascular edema were examined in the penumbra and core at I 1.5 hours/R 4.5 hours and I 1.5 hours/R 22.5 hours, respectively. The severity of ultrastructural changes was graded with a scoring system in each group. We showed that 3-AB treatment significantly decreased tissue EB levels and ultrastructural scores, attenuated damage to astrocytes and microvessels, and reduced areas of perivascular edema. In conclusion, PARP inhibition may provide a novel therapeutic approach to ischemic brain injury.
http://dx.doi.org/10.1177/1559325819901242
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6977239
January 2020

Exosomes: A Novel Therapeutic Agent for Cartilage and Bone Tissue Regeneration.

Dose Response 2019 Oct-Dec;17(4):1559325819892702. Epub 2019 Dec 13.

Department of Rehabilitation Medicine, The First People's Hospital of Wenling, Wenzhou Medical University, Zhejiang, People's Republic of China.

Although autologous and allogeneic transplantation and emerging tissue engineering (TE)-based therapies, which are commonly performed in the clinic for skeletal diseases, are traditionally regarded as the "gold standard" of care, undesirably low efficacy and other complications remain. Therefore, exploring new strategies with better therapeutic outcomes and a lower incidence of unfavorable side effects is imperative. Recently, exosomes, secreted microvesicles of endocytic origin, have caught researchers' attention in tissue regeneration, especially cartilage- and bone-related regeneration. Multiple studies have demonstrated the crucial roles of exosomes throughout every developmental stage of cartilage and bone tissue regeneration, indicating a potential therapeutic application of exosomes in future clinical use. Herein, we summarize the function of exosomes derived from the primary cells involved in skeletal diseases and their restoration processes, therapeutic exosomes used to promote cartilage and bone repair in recent research, and applications of exosomes within the setting of the TE matrix.
http://dx.doi.org/10.1177/1559325819892702
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913055
December 2019

Isolation and Detection Technologies of Extracellular Vesicles and Application on Cancer Diagnostic.

Dose Response 2019 Oct-Dec;17(4):1559325819891004. Epub 2019 Dec 9.

Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, OH, USA.

The vast majority of cancers are treatable when diagnosed early. However, owing to their elusive traces and the limitations of traditional biopsies, most cancers have already spread widely and are at advanced stages when first diagnosed, causing ever-increasing mortality over the past decades. Hence, developing reliable methods for the early detection and diagnosis of cancer is indispensable. Recently, extracellular vesicles (EVs), circulating phospholipid vesicles secreted by cells, have been found to play significant roles in intercellular communication and in shaping tumor microenvironments, and have been identified as a key factor in next-generation techniques for cancer diagnosis. However, EVs are present in complex biofluids containing various contaminants, such as non-vesicle proteins and nonspecific EVs, which interfere with screening for desired biomarkers. Therefore, applicable isolation and enrichment methods that guarantee scale-up of sample volume, purity, speed, yield, and tumor specificity are necessary. In this review, we introduce current technologies for EV separation and summarize biomarkers for EV-based cancer liquid biopsy. In conclusion, a systematic isolation method that guarantees high purity, recovery rate, and tumor specificity is still missing. Moreover, a dual-mode EV-based clinical system that includes both isolation and detection is a future trend, given the need for efficient point-of-care testing. In addition, the discovery of cancer-related biomarkers and the establishment of biomarker databases remain essential objectives for diagnostic settings.
http://dx.doi.org/10.1177/1559325819891004
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6902397
December 2019

Real-Time Multi-Scale Face Detector on Embedded Devices.

Sensors (Basel) 2019 May;19(9). Epub 2019 May 9.

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.

Face detection is the basic step in video face analysis and has been studied for many years. However, achieving real-time performance on computation-resource-limited embedded devices remains an open challenge. To address this problem, we propose a face detector, EagleEye, which shows a good trade-off between high accuracy and fast speed on popular embedded devices with low computation power (e.g., the Raspberry Pi 3B+). EagleEye is designed to have a low floating-point operation count (FLOPs) as well as sufficient capacity, and its accuracy is further improved without adding many FLOPs. Specifically, we design five strategies for building efficient face detectors with a good balance of accuracy and running speed. The first two strategies help build a detector with low computational complexity and sufficient capacity: we use convolution factorization to change traditional convolutions into sparser depth-wise convolutions to save computation, and we use successive downsampling convolutions at the beginning of the face detection network. The latter three strategies significantly improve the accuracy of the lightweight detector without adding much computation: we design an efficient context module to exploit context information, adopt an information-preserving activation function to increase network capacity, and use focal loss to better handle the class-imbalance problem. Experiments show that EagleEye outperforms other face detectors with the same order of computation cost in both runtime efficiency and accuracy.
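The saving from convolution factorization is easy to quantify. A small sketch comparing the multiply-accumulate count of a standard convolution with its depth-wise separable counterpart (the feature-map and channel sizes below are illustrative, not EagleEye's):

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulates of a standard k x k convolution over an h x w map."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_flops(h, w, c_in, c_out, k):
    """Depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

# Example: 56x56 feature map, 64 -> 128 channels, 3x3 kernel
std = conv_flops(56, 56, 64, 128, 3)
sep = depthwise_separable_flops(56, 56, 64, 128, 3)
ratio = std / sep  # roughly k*k-fold savings when c_out is large
```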
http://dx.doi.org/10.3390/s19092158
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6539187
May 2019

Two-Level Attention Network With Multi-Grain Ranking Loss for Vehicle Re-Identification.

IEEE Trans Image Process 2019 Sep;28(9):4328-4338. Epub 2019 Apr 16.

Vehicle re-identification (re-ID) aims to identify the same vehicle across multiple non-overlapping cameras, which is a rather challenging task. On the one hand, subtle changes in viewpoint and illumination can make the same vehicle look very different; on the other hand, different vehicles, even different vehicle models, may look quite similar. In this paper, we propose a novel Two-level Attention network supervised by a Multi-grain Ranking loss (TAMR) to learn an efficient feature embedding for the vehicle re-ID task. The two-level attention network, consisting of hard part-level attention and soft pixel-level attention, can adaptively extract discriminative features from the visual appearance of vehicles. The former localizes salient vehicle parts, such as the windscreen and car head; the latter provides additional attention refinement at the pixel level to focus on the distinctive characteristics within each part. In addition, we present a multi-grain ranking loss to further enhance the discriminative ability of the learned features, creatively taking the multi-grain relationship between vehicles into consideration. Thus, not only the discrimination between different vehicles but also the distinction between different vehicle models is constrained. Finally, the proposed network learns a feature space in which both intra-class compactness and inter-class discrimination are well guaranteed. Extensive experiments demonstrate the effectiveness of our approach, and we achieve state-of-the-art results on two challenging datasets, VehicleID and Vehicle-1M.
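A multi-grain ranking constraint can be illustrated with a two-margin, triplet-style loss; the margins and structure below are illustrative assumptions, not the exact TAMR formulation:

```python
import numpy as np

def ranking_loss(anchor, pos, neg_same_model, neg_other_model, m1=0.2, m2=0.4):
    """Two-grain ranking sketch: images of the same vehicle should be closer
    than a different vehicle of the same model (margin m1), which should in
    turn be separated from a different model by a larger margin m2 > m1."""
    d = lambda a, b: float(np.sum((a - b) ** 2))   # squared Euclidean distance
    l_fine = max(0.0, d(anchor, pos) - d(anchor, neg_same_model) + m1)
    l_coarse = max(0.0, d(anchor, pos) - d(anchor, neg_other_model) + m2)
    return l_fine + l_coarse
```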
http://dx.doi.org/10.1109/TIP.2019.2910408
September 2019

Fine-grained Human-centric Tracklet Segmentation with Single Frame Supervision.

IEEE Trans Pattern Anal Mach Intell 2019 Apr 17. Epub 2019 Apr 17.

In this paper, we target the Fine-grAined human-Centric Tracklet Segmentation (FACTS) problem, where 12 human parts, e.g., face, pants, and left leg, are segmented. To reduce heavy and tedious labeling efforts, FACTS requires only one labeled frame per video during training. The small size of human parts and the scarcity of labels make FACTS very challenging. Considering that adjacent frames of a video are continuous and that humans usually do not change clothes in a short time, we explicitly consider pixel-level and frame-level context in the proposed Temporal Context segmentation Network (TCNet). On the one hand, optical flow is calculated online to propagate pixel-level segmentation results to neighboring frames; on the other hand, frame-level classification likelihood vectors are also propagated to nearby frames. By fully exploiting pixel-level and frame-level context, TCNet indirectly uses the large number of unlabeled frames during training and produces smooth segmentation results during inference. Experimental results on four video datasets show the superiority of TCNet over state-of-the-art methods. The newly annotated datasets can be downloaded via http://liusi-group.com/projects/FACTS for further study.
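Pixel-level propagation via optical flow can be sketched with nearest-neighbour warping; the flow convention and sampling scheme here are simplifying assumptions (real pipelines typically use bilinear sampling and an estimated flow field):

```python
import numpy as np

def propagate_labels(labels, flow):
    """Warp a part-label map from one frame to the next by sampling along
    the flow (nearest-neighbour; flow[..., 0] is dx, flow[..., 1] is dy,
    pointing from each target pixel back to its source pixel)."""
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return labels[src_y, src_x]
```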
http://dx.doi.org/10.1109/TPAMI.2019.2911936
April 2019

Attention CoupleNet: Fully Convolutional Attention Coupling Network for Object Detection.

IEEE Trans Image Process 2019 Jan;28(1):113-126. Epub 2018 Aug 13.

The field of object detection has made great progress in recent years, with most improvements derived from using more sophisticated convolutional neural networks. However, for humans, the attention mechanism, global structure information, and local details of objects all play an important role in detecting an object. In this paper, we propose a novel fully convolutional network, named Attention CoupleNet, that incorporates attention-related information together with global and local information of objects to improve detection performance. Specifically, we first design a cascade attention structure to perceive the global scene of the image and generate class-agnostic attention maps. The attention maps are then encoded into the network to acquire object-aware features. Next, we propose a unique fully convolutional coupling structure to couple the global structure and local parts of the object into a discriminative feature representation. To fully explore the global and local properties, we also design different coupling strategies and normalization schemes to make full use of the complementary advantages of global and local information. Extensive experiments demonstrate the effectiveness of our approach: we achieve state-of-the-art results on three challenging datasets, i.e., a mAP of 85.7% on VOC07, 84.3% on VOC12, and 35.4% on COCO. Code is publicly available at https://github.com/tshizys/CoupleNet.
http://dx.doi.org/10.1109/TIP.2018.2865280
January 2019

Feature Distilled Tracking.

IEEE Trans Cybern 2019 Feb;49(2):440-452. Epub 2017 Dec 7.

Feature extraction and representation is one of the most important components of fast, accurate, and robust visual tracking. Very deep convolutional neural networks (CNNs) provide effective tools for feature extraction with good generalization ability. However, extracting features using very deep CNN models requires high-performance hardware due to their large computational complexity, which prohibits their use in real-time applications. To alleviate this problem, we aim to obtain small, fast-to-execute shallow models based on model compression for visual tracking. Specifically, we propose a small feature-distilled network (FDN) for tracking that imitates the intermediate representations of a much deeper network. The FDN extracts rich visual features at a higher speed than the original deeper network. To speed things up further, we introduce a shift-and-stitch method that reduces arithmetic operations while keeping the spatial resolution of the distilled feature maps unchanged. Finally, a scale-adaptive discriminative correlation filter is learned on the distilled features to handle scale variation of the target. Comprehensive experimental results on object tracking benchmark datasets show that the proposed approach achieves a 5x speed-up with performance competitive with state-of-the-art deep trackers.
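The imitation objective behind feature distillation is essentially a regression from student features onto teacher features, usually through a small adaptation layer. A NumPy sketch (the 1x1 channel adaptation and L2 loss are common choices, assumed here rather than taken from the paper):

```python
import numpy as np

def adapt(student_feat, w):
    """1x1 convolution (pure channel mixing) to match the teacher's channel
    count: (C_s, H, W) mixed by a (C_t, C_s) matrix -> (C_t, H, W)."""
    return np.einsum('ts,shw->thw', w, student_feat)

def distill_loss(student_feat, teacher_feat, w):
    """Imitation loss: the shallow student regresses the deep teacher's
    intermediate feature maps after channel adaptation."""
    diff = adapt(student_feat, w) - teacher_feat
    return float(np.mean(diff ** 2))
```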
http://dx.doi.org/10.1109/TCYB.2017.2776977
February 2019

Multi-View 3D Object Retrieval With Deep Embedding Network.

IEEE Trans Image Process 2016 Dec;25(12):5526-5537. Epub 2016 Sep 15.

In multi-view 3D object retrieval, each object is characterized by a group of 2D images captured from different views. Rather than using hand-crafted features, in this paper we take advantage of the strong discriminative power of convolutional neural networks to learn an effective 3D object representation tailored for this retrieval task. Specifically, we propose a deep embedding network jointly supervised by a classification loss and a triplet loss to map the high-dimensional image space into a low-dimensional feature space, where the Euclidean distance between features directly corresponds to the semantic similarity of images. By effectively reducing intra-class variations while increasing inter-class ones, the network guarantees that similar images are closer than dissimilar ones in the learned feature space. In addition, we extensively investigate the effectiveness of deep features extracted from different layers of the embedding network and find that an efficient 3D object representation should trade off global semantic information against discriminative local characteristics. Then, with the set of deep features extracted from different views, we generate a comprehensive description for each 3D object and formulate multi-view 3D object retrieval as a set-to-set matching problem. Extensive experiments on the SHREC'15 dataset demonstrate the superiority of our proposed method over previous state-of-the-art approaches, with over 12% performance improvement.
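Set-to-set matching between two multi-view objects can be sketched as follows; averaging each view's closest match is one simple, symmetric choice of set distance, not necessarily the paper's:

```python
import numpy as np

def set_to_set_distance(views_a, views_b):
    """Distance between two 3D objects, each represented by a set of
    per-view deep features (one row per view). Each view is matched to its
    nearest view in the other set, and the two directions are averaged."""
    d = np.linalg.norm(views_a[:, None, :] - views_b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```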
http://dx.doi.org/10.1109/TIP.2016.2609814
December 2016

Improving Visual Saliency Computing With Emotion Intensity.

IEEE Trans Neural Netw Learn Syst 2016 Jun;27(6):1201-1213

Saliency maps, which integrate individual feature maps into a global measure of visual attention, are widely used to estimate human gaze density. Most existing methods consider low-level visual features and object locations, and/or emphasize spatial position with a center prior. Recent psychology research suggests that emotions strongly influence human visual attention. In this paper, we explore the influence of emotional content on visual attention. On top of traditional bottom-up saliency map generation, our saliency map is generated in cooperation with three emotion factors: general emotional content, facial expression intensity, and emotional object locations. Experiments carried out on the National University of Singapore Eye Fixation dataset (a public eye-tracking dataset) demonstrate that incorporating emotion improves the quality of visual saliency maps computed by bottom-up approaches for gaze density estimation. Our method improves the average area under the ROC curve by about 0.1 compared with four baseline bottom-up approaches (Itti's model, attention based on information maximization, saliency using natural statistics, and graph-based visual saliency).
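One simple way to fold emotion factors into a bottom-up map is a weighted sum followed by renormalization; the fusion rule and weights below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def emotional_saliency(bottom_up, emotion_maps, weights):
    """Fuse a bottom-up saliency map with emotion-factor maps (e.g. general
    emotional content, facial expression intensity, emotional object
    locations) via a weighted sum, then rescale the result to [0, 1]."""
    s = bottom_up + sum(w * m for w, m in zip(weights, emotion_maps))
    s = s - s.min()
    return s / s.max() if s.max() > 0 else s
```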
http://dx.doi.org/10.1109/TNNLS.2016.2553579
June 2016

Weighted Part Context Learning for Visual Tracking.

IEEE Trans Image Process 2015 Dec;24(12):5140-5151. Epub 2015 Sep 16.

Context information is widely used in computer vision for tracking arbitrary objects. Most existing studies focus on how to distinguish the object of interest from the background, or on how to use keypoint-based supporters as auxiliary information to assist tracking. However, how to discover and represent both the intrinsic properties inside the object and the surrounding context remains, in most cases, an open problem. In this paper, we propose a unified context learning framework that can effectively capture spatiotemporal relations, prior knowledge, and motion consistency to enhance a tracker's performance. The proposed weighted part context tracker (WPCT) consists of an appearance model, an internal relation model, and a context relation model. The appearance model represents the appearances of the object and its parts. The internal relation model utilizes the parts inside the object to directly describe its spatiotemporal structure, while the context relation model takes advantage of the latent intersection between the object and background regions. The three models are then embedded in a max-margin structured learning framework. Furthermore, a prior label distribution is added, which can effectively exploit spatial prior knowledge for learning the classifier and inferring the object state during tracking. Meanwhile, we define online update functions to decide when to update WPCT and how to reweight the parts. Extensive experiments and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed approach.
http://dx.doi.org/10.1109/TIP.2015.2479460
December 2015

Bilayer sparse topic model for scene analysis in imbalanced surveillance videos.

IEEE Trans Image Process 2014 Dec;23(12):5198-5208. Epub 2014 Oct 14.

Dynamic scene analysis has become a popular research area, especially in video surveillance. The goal of this paper is to mine semantic motion patterns and detect abnormalities deviating from the normal ones occurring in complex dynamic scenarios. To address this problem, we propose a data-driven, scene-independent approach, the bilayer sparse topic model (BiSTM), in which a given surveillance video is represented by a word-document hierarchical generative process. In this model, motion patterns are treated as latent topics sparsely distributed over low-level motion vectors, while a video clip can be sparsely reconstructed by a mixture of topics (motion patterns). To capture the extreme imbalance between numerous typical normal activities and few rare abnormalities in surveillance video data, a one-class constraint is directly imposed on the distribution of documents as a discriminative prior. By jointly learning topics and the one-class document representation within a discriminative framework, the topic (pattern) space becomes more specific and explicit. An effective alternating iteration algorithm is presented for model learning. Experimental results and comparisons on various public datasets demonstrate the promise of the proposed approach.
Source
http://dx.doi.org/10.1109/TIP.2014.2363408
December 2014

Spatiotemporal grid flow for video retargeting.

IEEE Trans Image Process 2014 Apr;23(4):1615-28

Video retargeting is a useful technique for adapting a video to a desired display resolution. It aims to preserve the information contained in the original video and the shapes of salient objects while maintaining the temporal coherence of the video content. Existing video retargeting schemes achieve temporal coherence by constraining each region/pixel to be deformed consistently with its corresponding region/pixel in neighboring frames. However, these methods often distort the shapes of salient objects, since they do not ensure content consistency for the regions/pixels constrained to deform coherently along the time axis. In this paper, we propose a video retargeting scheme that simultaneously meets both requirements. Our method first segments a video clip into spatiotemporal grids called grid flows, where the consistency of the content associated with a grid flow is maintained while retargeting it. Even so, due to the coarse granularity of the grids, content inconsistency may still exist in some grid flows. We exploit the temporal redundancy in a grid flow to prevent grids with inconsistent content from being incorrectly constrained to deform coherently. In particular, we use grid flows to select a set of key-frames that summarize a video clip and resize the subgrid-flows in these key-frames. We then resize the remaining non-key-frames by simply interpolating their grid contents from the two nearest retargeted key-frames. With this key-frame-based scheme, we only need to solve a small-scale quadratic programming problem to resize the subgrid-flows and perform grid interpolation, leading to low computation and memory costs. The experimental results demonstrate the superior performance of our scheme.
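The non-key-frame step, filling in each remaining frame from the two nearest retargeted key-frames, can be sketched as piecewise-linear interpolation of grid-vertex positions. This is a simplified stand-in for the paper's grid-content interpolation; the key-frame grids themselves are assumed to come from solving the small QP, and the function name is invented:

```python
import numpy as np

def retarget_nonkey_frames(key_grids, key_times, all_times):
    """Given retargeted grid-vertex arrays at the key-frames, compute
    the grid of every frame by interpolating linearly between the two
    nearest key-frames; key-frames keep their solved grids."""
    key_times = np.asarray(key_times, dtype=float)
    out = []
    for t in all_times:
        if t in key_times:  # key-frame: use its QP-solved grid directly
            out.append(key_grids[int(np.where(key_times == t)[0][0])])
            continue
        b = np.searchsorted(key_times, t)  # right neighbor key-frame
        a = b - 1                          # left neighbor key-frame
        alpha = (t - key_times[a]) / (key_times[b] - key_times[a])
        out.append((1 - alpha) * key_grids[a] + alpha * key_grids[b])
    return out
```

Because only the key-frame grids are optimized, the per-frame cost of the remaining frames is a single blend of two small arrays, which matches the low computation and memory costs the abstract claims.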
Source
http://dx.doi.org/10.1109/TIP.2014.2305843
April 2014

Real-time probabilistic covariance tracking with efficient model update.

IEEE Trans Image Process 2012 May 2;21(5):2824-37. Epub 2012 Jan 2.

School of Information and Control Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China.

The recently proposed covariance region descriptor has proven robust and versatile at a modest computational cost. The covariance matrix enables efficient fusion of different types of features, characterizing their spatial and statistical properties as well as their correlations. The similarity between two covariance descriptors is measured on Riemannian manifolds. Based on the same metric but within a probabilistic framework, we propose a novel tracking approach on Riemannian manifolds with incremental covariance tensor learning (ICTL). To address appearance variations, ICTL incrementally learns a low-dimensional covariance tensor representation and efficiently adapts online to appearance changes of the target with only O(1) computational complexity, resulting in real-time performance. The covariance-based representation and ICTL are then combined with a particle filter framework to better handle background clutter as well as temporary occlusions. We test the proposed probabilistic ICTL tracker on numerous benchmark sequences involving different types of challenges, including occlusions and variations in illumination, scale, and pose. The proposed approach demonstrates excellent real-time performance, both qualitatively and quantitatively, in comparison with several previously proposed trackers.
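The two building blocks the abstract relies on, the covariance region descriptor and the Riemannian similarity between descriptors, can be sketched directly. This uses the standard affine-invariant metric on SPD matrices as one common choice of Riemannian distance; the ICTL update and the particle filter are not shown, and the function names are ours:

```python
import numpy as np
from scipy.linalg import eigh

def covariance_descriptor(feats):
    """Covariance region descriptor: the d x d covariance of per-pixel
    feature vectors (rows of `feats`, shape (n_pixels, d)), fusing
    e.g. position, intensity, and gradient features in one matrix."""
    return np.cov(feats, rowvar=False)

def riemannian_distance(C1, C2):
    """Affine-invariant distance between SPD covariance matrices:
    sqrt(sum_i ln^2 lambda_i), where lambda_i are the generalized
    eigenvalues of the pencil (C1, C2)."""
    lam = eigh(C1, C2, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

The distance is zero for identical descriptors and invariant to any common invertible linear transform of the features, which is what makes the covariance representation robust to global appearance changes.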
Source
http://dx.doi.org/10.1109/TIP.2011.2182521
May 2012