Publications by authors named "Luming Zhang"

50 Publications

Prognostic Value of Blood Urea Nitrogen/Creatinine Ratio for Septic Shock: An Analysis of the MIMIC-III Clinical Database.

Biomed Res Int 2021 22;2021:5595042. Epub 2021 May 22.

Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangdong Province 510630, China.

Background: Previous research has examined the risk factors for mortality in septic shock patients. However, no epidemiological study has investigated the effect of the blood urea nitrogen/creatinine ratio (BCR) on the prognosis of critically ill septic shock patients. This study aimed to determine the relationship between BCR and all-cause mortality in adult septic shock patients.

Methods: Data were extracted from the MIMIC-III database. The clinical endpoints were 28-, 90-, and 365-day all-cause mortality rates in critically ill septic shock patients. Cox proportional hazards models and subgroup analyses were used to analyze the relationship between BCR quartiles and all-cause mortality in septic shock patients. Receiver operator characteristic (ROC) curves and areas under the ROC curves (AUCs) were calculated to evaluate how accurately BCR predicts the mortality of septic shock patients.

Results: Among the 2484 septic shock patients extracted from the database, 619, 563, 677, and 625 fell into the first (<14.4 mg/dL), second (≥14.4 mg/dL and <20.0 mg/dL), third (≥20.0 mg/dL and <27.3 mg/dL), and fourth (≥27.3 mg/dL) quartiles of BCR, respectively. Male and white patients accounted for 53.8% (1336 patients) and 74.8% (1857 patients) of the population, respectively. The mean age of the population was 67.7 ± 15.8 years. An inverse M-shaped relationship between BCR and mortality in septic shock patients was identified, with values of ≥27.3 mg/dL conferring the highest risk (HR = 1.596, 95% CI: 1.396-1.824, P < 0.001). In the Cox regression model adjusted for different confounding variables, BCR values in the fourth quartile were significantly associated with increased mortality, using the first quartile as a reference. The areas under the ROC curves (AUCs) for BCR plus the Sequential Organ Failure Assessment (SOFA) score and BCR plus the Acute Physiology Score III (APSIII) were 0.694 (95% CI: 0.673-0.716) and 0.724 (95% CI: 0.703-0.744), respectively.

Conclusion: An inverse M-shaped curve was determined between BCR and the mortality of septic shock patients. BCR was identified as a readily available and independent prognostic biomarker for septic shock patients, and higher BCRs were associated with increased mortality in these patients.
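The AUC figures above summarize how well BCR separates survivors from non-survivors. As a minimal, hypothetical sketch (not the study's code), the AUC can be computed directly from its rank-statistic definition, i.e., the probability that a randomly chosen deceased patient has a higher score than a randomly chosen survivor:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive case scores higher
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical BCR-like scores with 28-day mortality labels (1 = died):
scores = [12.0, 15.5, 19.0, 22.0, 28.5, 31.0]
labels = [0, 0, 1, 0, 1, 1]
print(round(roc_auc(scores, labels), 3))  # → 0.889
```

In practice the study combines BCR with SOFA or APSIII before computing the AUC; the sketch only shows the metric itself.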
Source: http://dx.doi.org/10.1155/2021/5595042
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8164535
May 2021

Effects of Stress Hyperglycemia on Short-Term Prognosis of Patients Without Diabetes Mellitus in Coronary Care Unit.

Front Cardiovasc Med 2021 19;8:683932. Epub 2021 May 19.

Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China.

Diabetes mellitus (DM) has high morbidity and mortality worldwide and is a risk factor for cardiovascular diseases. Non-diabetic stress hyperglycemia is common in severely ill patients and can affect prognosis. This study aimed to analyze the influence of different blood glucose levels on prognosis from the perspective of stress hyperglycemia by comparing them with normal blood glucose levels and those of patients with DM. A retrospective study of 1,401 patients in the coronary care unit (CCU) from the critical care database Medical Information Mart for Intensive Care IV was performed. Patients were assigned to groups 1-4 based on their history of DM, random blood glucose, and HbA1c levels: the normal blood glucose group, moderate stress hyperglycemia group, severe stress hyperglycemia group, and DM group. The main outcomes of this study were the 30- and 90-day mortality rates. The associations between groups and outcomes were analyzed using Kaplan-Meier survival analysis, a Cox proportional hazards regression model, and a competing risk regression model. A total of 1,401 CCU patients were enrolled in this study. The Kaplan-Meier survival curve showed that group 1 had a higher survival probability than groups 3 and 4 in terms of 30- and 90-day mortality. After controlling for potential confounders in the Cox regression, groups 3 and 4 had a statistically significantly higher risk of both mortalities than group 1, while no difference in mortality risk was found between groups 2 and 1. The hazard ratios [95% confidence interval (CI)] of the 30- and 90-day mortality rates for group 3 were 2.77 (1.39, 5.54) and 2.59 (1.31, 5.12), respectively, while those for group 4 were 1.92 (1.08, 3.40) and 1.94 (1.11, 3.37), respectively. Severe stress hyperglycemia (≥200 mg/dL) in CCU patients without DM may increase the risk of short-term death, and this effect is greater than the prognostic effect in patients with diabetes. Compared with normal blood glucose levels, moderate stress hyperglycemia (140 mg/dL ≤ RBG < 200 mg/dL) had no effect on short-term outcomes in CCU patients.
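The four-group assignment described above can be sketched as a simple classification rule. The RBG cut-offs (140 and 200 mg/dL) are quoted in the abstract; the HbA1c ≥ 6.5% criterion for DM is an assumption added for illustration, as the study's exact criterion is not stated here:

```python
def glucose_group(has_dm_history, rbg_mg_dl, hba1c_pct):
    """Assign a CCU patient to one of groups 1-4 (sketch only)."""
    if has_dm_history or hba1c_pct >= 6.5:
        return 4  # DM group
    if rbg_mg_dl < 140:
        return 1  # normal blood glucose
    if rbg_mg_dl < 200:
        return 2  # moderate stress hyperglycemia
    return 3      # severe stress hyperglycemia

print(glucose_group(False, 230, 5.6))  # → 3 (severe stress hyperglycemia)
```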
Source: http://dx.doi.org/10.3389/fcvm.2021.683932
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8169960
May 2021

Construction and Evaluation of a Sepsis Risk Prediction Model for Urinary Tract Infection.

Front Med (Lausanne) 2021 21;8:671184. Epub 2021 May 21.

Intensive Care Unit, The First Affiliated Hospital of Jinan University, Guangzhou, China.

Urinary tract infection (UTI) is one of the common causes of sepsis. However, nomograms predicting the sepsis risk in UTI patients have not been comprehensively researched. The goal of this study was to establish and validate a nomogram to predict the probability of sepsis in UTI patients. Patients diagnosed with UTI were extracted from the Medical Information Mart for Intensive Care III database. These patients were randomly divided into training and validation cohorts. Independent prognostic factors for UTI patients were determined using forward stepwise logistic regression. A nomogram containing these factors was established to predict the sepsis incidence in UTI patients. The validity of our nomogram model was determined using multiple indicators, including the area under the receiver operating characteristic curve (AUC), correction curve, Hosmer-Lemeshow test, integrated discrimination improvement (IDI), net reclassification improvement (NRI), and decision-curve analysis (DCA). This study included 6,551 UTI patients. Stepwise regression analysis revealed that the independent risk factors for sepsis in UTI patients were congestive heart failure, diabetes, liver disease, fluid and electrolyte disorders, APSIII, neutrophils, lymphocytes, red blood cell distribution width, urinary protein, urinary blood, and microorganisms. The nomogram was then constructed and validated. The AUC, NRI, IDI, and DCA of the nomogram all showed better performance than the traditional APSIII score. The calibration curve and Hosmer-Lemeshow test results indicated that the nomogram was well-calibrated. The improved NRI and IDI values indicate that our nomogram scoring system is superior to other commonly used ICU scoring systems. The DCA curve indicates that the nomogram has good clinical applicability. This study identified the independent risk factors for sepsis in UTI patients and used them to construct a prediction model. The present findings may provide clinical reference information for preventing sepsis in UTI patients.
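One of the validation metrics above, the net reclassification improvement (NRI), rewards a new model for moving events to higher predicted risk categories and non-events to lower ones. A short sketch with hypothetical data (not from the study):

```python
def net_reclassification_improvement(old_risk, new_risk, events):
    """Categorical NRI: (net fraction of events reclassified upward)
    + (net fraction of non-events reclassified downward)."""
    up_e = down_e = up_n = down_n = n_event = n_nonevent = 0
    for old, new, event in zip(old_risk, new_risk, events):
        if event:
            n_event += 1
            up_e += new > old
            down_e += new < old
        else:
            n_nonevent += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_event + (down_n - up_n) / n_nonevent

# Hypothetical risk categories (1 = low, 2 = medium, 3 = high):
old = [2, 1, 2, 3, 1, 2]
new = [3, 2, 1, 3, 1, 1]
events = [1, 1, 0, 1, 0, 0]
print(net_reclassification_improvement(old, new, events))  # → 4/3
```

A positive NRI, as reported for the nomogram versus APSIII, indicates net improvement in reclassification.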
Source: http://dx.doi.org/10.3389/fmed.2021.671184
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8175780
May 2021

Semi-supervised Perception Augmentation for Aerial Photo Topologies Understanding.

IEEE Trans Image Process 2021 May 18;PP. Epub 2021 May 18.

Intelligently understanding the sophisticated topological structures of aerial photographs is a useful technique in aerial image analysis. Conventional methods cannot fulfill this task due to the following challenges: 1) the number of topologies in an aerial photo increases exponentially with the topology size, which requires a fine-grained visual descriptor to discriminatively represent each topology; 2) visually/semantically salient topologies must be identified within each aerial photo in a weakly-labeled context, owing to the unaffordable human resources required for pixel-level annotation; and 3) a cross-domain knowledge transferal module must be designed to augment aerial photo perception, since multi-resolution aerial photos are taken asynchronously in practice. To handle the above problems, we propose a unified framework to understand aerial photo topologies, focusing on representing each aerial photo by a set of visually/semantically salient topologies based on human visual perception and further employing them for visual categorization. Specifically, we first extract multiple atomic regions from each aerial photo, and thereby graphlets are built to capture each aerial photo topologically. Then, a weakly-supervised ranking algorithm selects a few semantically salient graphlets by seamlessly encoding multiple image-level attributes. Toward a visualizable and perception-aware framework, we construct the gaze shifting path (GSP) by linking the top-ranking graphlets. Finally, we derive the deep GSP representation and formulate a semi-supervised and cross-domain SVM to partition each aerial photo into multiple categories. The SVM utilizes the global composition of low-resolution counterparts to enhance the deep GSP features of high-resolution aerial photos, which are partially annotated. Extensive visualization results and categorization performance comparisons have demonstrated the competitiveness of our approach.
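The GSP construction step, linking the top-ranking graphlets, can be sketched as follows. The region names and saliency scores below are hypothetical; in the real framework the ranking comes from the weakly-supervised algorithm described above, not from raw scores:

```python
def gaze_shifting_path(regions, k=3):
    """Order the k most salient regions by descending saliency to form
    a gaze shifting path (GSP). `regions` maps region id -> score."""
    ranked = sorted(regions, key=regions.get, reverse=True)
    return ranked[:k]

regions = {"runway": 0.91, "terminal": 0.77, "grass": 0.12, "taxiway": 0.55}
print(gaze_shifting_path(regions))  # → ['runway', 'terminal', 'taxiway']
```

The deep GSP feature is then computed over this ordered sequence rather than over unordered regions, which is what makes the representation perception-aware.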
Source: http://dx.doi.org/10.1109/TIP.2021.3079820
May 2021

Bioinspired Scene Classification by Deep Active Learning With Remote Sensing Applications.

IEEE Trans Cybern 2021 Feb 26;PP. Epub 2021 Feb 26.

Accurately classifying sceneries with different spatial configurations is an indispensable technique in computer vision and intelligent systems, for example, scene parsing, robot motion planning, and autonomous driving. Remarkable performance has been achieved by the deep recognition models in the past decade. As far as we know, however, these deep architectures are incapable of explicitly encoding the human visual perception, that is, the sequence of gaze movements and the subsequent cognitive processes. In this article, a biologically inspired deep model is proposed for scene classification, where the human gaze behaviors are robustly discovered and represented by a unified deep active learning (UDAL) framework. More specifically, to characterize objects' components with varied sizes, an objectness measure is employed to decompose each scenery into a set of semantically aware object patches. To represent each region at a low level, a local-global feature fusion scheme is developed which optimally integrates multimodal features by automatically calculating each feature's weight. To mimic the human visual perception of various sceneries, we develop the UDAL that hierarchically represents the human gaze behavior by recognizing semantically important regions within the scenery. Importantly, UDAL combines the semantically salient region detection and the deep gaze shifting path (GSP) representation learning into a principled framework, where only the partial semantic tags are required. Meanwhile, by incorporating the sparsity penalty, the contaminated/redundant low-level regional features can be intelligently avoided. Finally, the learned deep GSP features from the entire scene images are integrated to form an image kernel machine, which is subsequently fed into a kernel SVM to classify different sceneries. Experimental evaluations on six well-known scenery sets (including remote sensing images) have shown the competitiveness of our approach.
Source: http://dx.doi.org/10.1109/TCYB.2020.2981480
February 2021

Massive-Scale Aerial Photo Categorization by Cross-Resolution Visual Perception Enhancement.

IEEE Trans Neural Netw Learn Syst 2021 Feb 15;PP. Epub 2021 Feb 15.

Categorizing aerial photographs with varied weather/lighting conditions and sophisticated geomorphic factors is a key module in autonomous navigation, environmental evaluation, and so on. Previous image recognizers cannot fulfill this task due to three challenges: 1) localizing visually/semantically salient regions within each aerial photograph in a weakly annotated context, due to the unaffordable human resources required for pixel-level annotation; 2) aerial photographs generally have multiple informative attributes (e.g., clarity and reflectivity), which must be encoded for better aerial photograph modeling; and 3) designing a cross-domain knowledge transferal module to enhance aerial photograph perception, since multiresolution aerial photographs are taken asynchronously and are mutually complementary. To handle the above problems, we propose to optimize aerial photograph feature learning by leveraging the low-resolution spatial composition to enhance the deep learning of perceptual features with a high resolution. More specifically, we first extract many BING-based object patches (Cheng et al., 2014) from each aerial photograph. A weakly supervised ranking algorithm selects a few semantically salient ones by seamlessly incorporating multiple aerial photograph attributes. Toward an interpretable aerial photograph recognizer indicative of human visual perception, we construct a gaze shifting path (GSP) by linking the top-ranking object patches and, subsequently, derive the deep GSP feature. Finally, a cross-domain multilabel SVM is formulated to categorize each aerial photograph. It leverages the global feature from low-resolution counterparts to optimize the deep GSP feature from a high-resolution aerial photograph. Comparative results on our compiled million-scale aerial photograph set have demonstrated the competitiveness of our approach. Besides, the eye-tracking experiment has shown that our ranking-based GSPs are over 92% consistent with the real human gaze shifting sequences.
Source: http://dx.doi.org/10.1109/TNNLS.2021.3055548
February 2021

MiR-375 silencing attenuates pro-inflammatory macrophage response and foam cell formation by targeting KLF4.

Exp Cell Res 2021 Mar 3;400(1):112507. Epub 2021 Feb 3.

Department of Cardio-Pulmonary Function, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Henan University People's Hospital, Zhengzhou, Henan, 450003, China.

Macrophage-mediated inflammation and foam cell formation play crucial roles in the development of atherosclerosis. MiR-375 is a small noncoding RNA that is significantly implicated in the regulation of multiple tumors and has emerged as a novel biomarker for type 2 diabetes. However, the exact role of miR-375 in macrophage activation remains unknown. In the present study, we observed that miR-375 expression was up-regulated in atherosclerotic aortas, as well as in bone marrow-derived macrophages (BMDMs) and mouse peritoneal macrophages (MPMs) isolated from ApoE-deficient mice, and gradually increased with ox-LDL treatment time. Functionally, miR-375 inhibition significantly decreased foam cell formation, accompanied by up-regulated expression of genes involved in cholesterol efflux and reduced expression of genes implicated in cholesterol influx. Moreover, miR-375 silencing increased the expression of resolving M2 macrophage markers but reduced that of pro-inflammatory M1 macrophage markers. These effects were reversed by miR-375 overexpression. Mechanistically, we noticed that miR-375 knockdown promoted KLF4 expression, which was required for the ameliorating effect of miR-375 silencing on macrophage activation. Importantly, consistent results for the mRNA expression of M1 and M2 markers were observed in vivo, and miR-375-deficient ApoE-knockout mice showed significantly decreased atherosclerotic lesions in the whole aorta and aortic sinus. Taken together, this evidence suggests that miR-375 knockdown attenuated macrophage activation partially through a KLF4-dependent mechanism.
Source: http://dx.doi.org/10.1016/j.yexcr.2021.112507
March 2021

Community-Aware Photo Quality Evaluation by Deeply Encoding Human Perception.

IEEE Trans Cybern 2020 Jul 31;PP. Epub 2020 Jul 31.

Computational photo quality evaluation is a useful technique in many tasks of computer vision and graphics, for example, photo retargeting, 3-D rendering, and fashion recommendation. Conventional photo quality models are designed by characterizing the pictures from all communities (e.g., "architecture" and "colorful") indiscriminately, wherein community-specific features are not exploited explicitly. In this article, we develop a new community-aware photo quality evaluation framework. It uncovers the latent community-specific topics by a regularized latent topic model (LTM) and captures human visual quality perception by exploring multiple attributes. More specifically, given massive-scale online photographs from multiple communities, a novel ranking algorithm is proposed to measure the visual/semantic attractiveness of regions inside each photograph. Meanwhile, three attributes, namely: 1) photo quality scores; 2) weak semantic tags; and 3) inter-region correlations, are seamlessly and collaboratively incorporated during ranking. Subsequently, we construct the gaze shifting path (GSP) for each photograph by sequentially linking the top-ranking regions from each photograph, and an aggregation-based CNN calculates the deep representation for each GSP. Based on this, an LTM is proposed to model the GSP distribution from multiple communities in the latent space. To mitigate the overfitting problem caused by communities with very few photographs, a regularizer is incorporated into our LTM. Finally, given a test photograph, we obtain its deep GSP representation, and its quality score is determined by the posterior probability of the regularized LTM. Comparative studies on four image sets have shown the competitiveness of our method. Besides, the eye-tracking experiments have demonstrated that our ranking-based GSPs are highly consistent with real human gaze movements.
Source: http://dx.doi.org/10.1109/TCYB.2019.2937319
July 2020

Context-Aware Block Net for Small Object Detection.

IEEE Trans Cybern 2020 Jul 28;PP. Epub 2020 Jul 28.

State-of-the-art object detectors usually progressively downsample the input image until it is represented by small feature maps, which loses spatial information and compromises the representation of small objects. In this article, we propose a context-aware block net (CAB Net) to improve small object detection by building high-resolution and strong semantic feature maps. To internally enhance the representation capacity of feature maps with high spatial resolution, we delicately design the context-aware block (CAB). CAB exploits pyramidal dilated convolutions to incorporate multilevel contextual information without losing the original resolution of feature maps. Then, we append CAB to the end of the truncated backbone network (e.g., VGG16) with a relatively small downsampling factor (e.g., 8) and cast off all following layers. CAB Net can capture both basic visual patterns and semantic information of small objects, thus improving the performance of small object detection. Experiments conducted on the benchmark Tsinghua-Tencent 100K and the Airport dataset show that CAB Net outperforms other top-performing detectors by a large margin while keeping real-time speed, which demonstrates the effectiveness of CAB Net for small object detection.
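The benefit of pyramidal dilated convolutions is that they enlarge the receptive field without downsampling the feature map. A small sketch of the standard receptive-field arithmetic (the dilation rates below are illustrative, not necessarily those used in CAB):

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions, each given as
    (kernel, dilation): each layer adds (effective kernel - 1)."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# A hypothetical pyramid of 3x3 convolutions with dilations 1, 2, 4:
print(receptive_field([(3, 1), (3, 2), (3, 4)]))  # → 15
```

Three such layers see a 15-pixel context while the feature map keeps its original resolution, which is what lets CAB preserve small-object detail.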
Source: http://dx.doi.org/10.1109/TCYB.2020.3004636
July 2020

Validation and identification of reference genes in Chinese hamster ovary cells for Fc-fusion protein production.

Exp Biol Med (Maywood) 2020 04 26;245(8):690-702. Epub 2020 Mar 26.

Wuya College of Innovation; College of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, Shenyang 110016, China.

Impact Statement: To reveal potential genotype-phenotype relationships, RT-qPCR is frequently applied, which requires validated and reliable reference genes. Through the investigation of long-term passage and fed-batch cultivation of CHO cells producing an Fc-fusion protein, four new reference genes (Akr1a1, Gpx1, Aprt, and Rps16) were identified from 20 candidates with the aid of the geNorm, NormFinder, BestKeeper, and ΔCt programs and methods. This article provides more verified options for reference gene selection in related research on CHO cells.
Source: http://dx.doi.org/10.1177/1535370220914058
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372735
April 2020

Weakly Supervised Complets Ranking for Deep Image Quality Modeling.

IEEE Trans Neural Netw Learn Syst 2020 Dec 30;31(12):5041-5054. Epub 2020 Nov 30.

Despite their competitive prediction performance, recent deep image quality models suffer from the following limitations. First, they are deficient at interpreting and quantifying the region-level quality that contributes to global features during deep architecture training. Second, human visual perception is sensitive to compositional features (i.e., the sophisticated spatial configurations among regions), but explicitly incorporating them into a deep model is challenging. Third, the state-of-the-art deep quality models typically use rectangular image patches as inputs, but there is no evidence that these rectangles can reflect arbitrarily shaped objects, such as beaches and jungles. By defining the complet, a set of image segments that collaboratively characterizes the spatial/geometric distribution of multiple visual elements, we propose a novel quality-modeling framework that involves two key modules: a complet ranking algorithm and a spatially-aware dual aggregation network (SDA-Net). Specifically, to describe the region-level quality features, we build complets to characterize the high-order spatial interactions among the arbitrarily shaped segments in each image. To obtain complets that are highly descriptive of image compositions, a weakly supervised complet ranking algorithm is designed by quantifying the quality of each complet. The algorithm seamlessly encodes three factors: the image-level quality discrimination, the weakly supervised constraint, and the complet geometry of each image. Based on the top-ranking complets, a novel multi-column convolutional neural network (CNN) called SDA-Net is designed, which supports input segments with arbitrary shapes. The key is a dual-aggregation mechanism that fuses the intracomplet and intercomplet deep features under a unified framework. Thorough experimental validations on a series of benchmark data sets demonstrated the superiority of our method.
Source: http://dx.doi.org/10.1109/TNNLS.2019.2962548
December 2020

Deeply Encoding Stable Patterns From Contaminated Data for Scenery Image Recognition.

IEEE Trans Cybern 2019 Nov 28. Epub 2019 Nov 28.

Effectively recognizing different sceneries with complex backgrounds and varied lighting conditions plays an important role in modern AI systems. Competitive performance has recently been achieved by deep scene categorization models. However, these models implicitly hypothesize that the image-level labels are 100% correct, which is too restrictive. In practice, the image-level labels for massive-scale scenery sets are usually calculated by external predictors such as ImageNet-CN. These labels can easily become contaminated because no predictor is completely accurate. This article proposes a new deep architecture that calculates scene categories by hierarchically deriving stable templates, which are discovered using a generative model. Specifically, we first construct a semantic space by incorporating image-level labels using subspace embedding. Afterward, it is noticeable that in the semantic space, the superpixel distributions from identically labeled images remain unchanged, regardless of the image-level label noise. On the basis of this observation, a probabilistic generative model learns the stable templates for each scene category. To deeply represent each scenery category, a novel aggregation network is developed to statistically concatenate the CNN features learned from scene annotations predicted by HSA. Finally, the learned deep representations are integrated into an image kernel, which is subsequently incorporated into a multiclass SVM for distinguishing scene categories. Thorough experiments have demonstrated the effectiveness of our method. As a byproduct, an empirical study of 33 SIFT-flow categories shows that the learned stable templates remain almost unchanged under a nearly 36% image-label contamination rate.
Source: http://dx.doi.org/10.1109/TCYB.2019.2951798
November 2019

Scene Categorization by Deeply Learning Gaze Behavior in a Semisupervised Context.

IEEE Trans Cybern 2019 May 23. Epub 2019 May 23.

Accurately recognizing different categories of sceneries with sophisticated spatial configurations is a useful technique in computer vision and intelligent systems, e.g., scene understanding and autonomous driving. Competitive accuracies have been observed by the deep recognition models recently. Nevertheless, these deep architectures cannot explicitly characterize human visual perception, that is, the sequence of gaze allocation and the subsequent cognitive processes when viewing each scenery. In this paper, a novel spatially aware aggregation network is proposed for scene categorization, where the human gaze behavior is discovered in a semisupervised setting. In particular, as semantically labeling a large quantity of scene images is labor-intensive, a semisupervised and structure-preserved non-negative matrix factorization (NMF) is proposed to detect a set of visually/semantically salient regions from each scenery. Afterward, the gaze shifting path (GSP) is engineered to characterize the process of humans perceiving each scene picture. To deeply describe each GSP, a novel spatially aware CNN termed SA-Net is developed. It accepts input regions with various shapes and statistically aggregates all the salient regions along each GSP. Finally, the learned deep GSP features from the entire scene images are fused into an image kernel, which is subsequently integrated into a kernel SVM to categorize different sceneries. Comparative experiments on six scene image sets have shown the advantage of our method.
Source: http://dx.doi.org/10.1109/TCYB.2019.2913016
May 2019

The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers.

IEEE Trans Pattern Anal Mach Intell 2020 Sep 30;42(9):2287-2305. Epub 2019 Apr 30.

Solving mathematical word problems (MWPs) automatically is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logics. Despite a long history dating back to the 1960s, MWPs have regained intensive attention in the past few years with the advancement of Artificial Intelligence (AI). Solving MWPs successfully is considered a milestone toward general AI. Many systems have claimed promising results on self-crafted and small-scale datasets. However, when applied to large and diverse datasets, none of the proposed methods in the literature achieves high precision, revealing that current MWP solvers still have much room for improvement. This motivated us to present a comprehensive survey delivering a clear and complete picture of automatic math problem solvers. In this survey, we focus on algebraic word problems, summarize their extracted features and the techniques proposed to bridge the semantic gap, and compare their performance on publicly accessible datasets. We also cover automatic solvers for other types of math problems, such as geometric problems that require the understanding of diagrams. Finally, we identify several emerging research directions for readers with interests in MWPs.
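To make the "semantic gap" concrete, a toy rule-based solver for one narrow problem template might look like the sketch below. This is purely illustrative and not from the survey; the solvers surveyed above use far richer semantic parsing or learned models, and a single regex template is exactly the kind of brittle approach that fails on large, diverse datasets:

```python
import re

def solve_simple_mwp(problem):
    """Map one narrow 'has X ... gains/loses Y' template to arithmetic.
    Returns None when the template does not match (the common case)."""
    m = re.search(r"has (\d+) .*? (gains|loses) (\d+)", problem)
    if m is None:
        return None
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    return a + b if op == "gains" else a - b

print(solve_simple_mwp("Tom has 5 apples and gains 3 more. How many now?"))  # → 8
```

Any rewording outside the template ("Tom receives 3 apples") defeats the rule, which is the gap between surface words and underlying logic that the survey discusses.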
Source: http://dx.doi.org/10.1109/TPAMI.2019.2914054
September 2020

Aesthetics-Guided Graph Clustering With Absent Modalities Imputation.

IEEE Trans Image Process 2019 Jul 6;28(7):3462-3476. Epub 2019 Feb 6.

Accurately clustering Internet-scale users into multiple communities according to their aesthetic styles is a useful technique in image modeling and data mining. In this paper, we present a novel partially supervised model that seeks a sparse representation to capture photo aesthetics. It optimally fuses multi-channel features, i.e., human gaze behavior, quality scores, and semantic tags, each of which could be absent. Afterward, by leveraging the KL-divergence to distinguish the aesthetic distributions between photo sets, a large-scale graph is constructed to describe the aesthetic correlations between users. Finally, a dense subgraph mining algorithm that intrinsically supports outliers (i.e., unique users who do not belong to any community) is adopted to detect aesthetic communities. Comprehensive experimental results on a million-scale image set grabbed from Flickr have demonstrated the superiority of our method. As a byproduct, the discovered aesthetic communities can enhance photo retargeting and video summarization substantially.
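The KL-divergence used above to compare aesthetic distributions between photo sets can be sketched for discrete histograms; the two user histograms below are hypothetical:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever
    p > 0 (terms with p_i = 0 contribute nothing)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical aesthetic-style histograms of two users:
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(round(kl_divergence(p, q), 4))  # → 0.0851
```

Because KL is asymmetric, edge weights in a user-correlation graph are often built from a symmetrized variant (e.g., KL(p||q) + KL(q||p)); the abstract does not specify which form the paper uses.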
Source: http://dx.doi.org/10.1109/TIP.2019.2897940
July 2019

Learning Latent Stable Patterns for Image Understanding With Weak and Noisy Labels.

IEEE Trans Cybern 2019 Dec 5;49(12):4243-4252. Epub 2018 Oct 5.

This paper focuses on weakly supervised image understanding, in which the semantic labels are available only at the image level, without the specific object or scene locations in an image. Existing algorithms implicitly assume that image-level labels are error-free, which might be too restrictive. In practice, image labels obtained from pretrained predictors are easily contaminated. To solve this problem, we propose a novel algorithm for weakly supervised segmentation when only noisy image labels are available during training. More specifically, a semantic space is constructed first by encoding image labels through a graphlet (i.e., superpixel cluster) embedding process. Then, we observe that in the semantic space, the distribution of graphlets from images with the same label remains stable, regardless of the noise in image labels. Therefore, we propose a generative model, called latent stability analysis, to discover the stable patterns from images with noisy labels. Inferring graphlet semantics by making use of these mid-level stable patterns is much more secure and accurate than directly transferring noisy image-level labels into different regions. Finally, we calculate the semantics of each superpixel using maximum majority voting of its correlated graphlets. Comprehensive experimental results show that our algorithm performs impressively when the image labels are predicted by either hand-crafted or deeply learned image descriptors.
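The final step, maximum majority voting over correlated graphlets, can be sketched as follows; the labels below are hypothetical:

```python
from collections import Counter

def superpixel_label(graphlet_labels):
    """Assign a superpixel the most frequent label among its correlated
    graphlets, i.e., maximum majority voting."""
    return Counter(graphlet_labels).most_common(1)[0][0]

print(superpixel_label(["sky", "sky", "building", "sky", "tree"]))  # → sky
```

Voting over mid-level stable patterns rather than copying the (possibly noisy) image-level label into every region is what gives the method its robustness.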
http://dx.doi.org/10.1109/TCYB.2018.2861419
December 2019

Online Robust Low-Rank Tensor Modeling for Streaming Data Analysis.

IEEE Trans Neural Netw Learn Syst 2019 Apr 20;30(4):1061-1075. Epub 2018 Aug 20.

Tensor data (i.e., data having multiple dimensions) are quickly growing in scale in many practical applications, which poses new challenges for data modeling and analysis approaches, such as high-order relations of large complexity, gross noise, and varying data scale. Existing low-rank data analysis methods, which are effective at analyzing matrix data, may fail in the regime of tensor data due to these challenges, so a robust and scalable low-rank tensor modeling method is highly desirable. In this paper, we develop an online robust low-rank tensor modeling (ORLTM) method to address these challenges. The ORLTM method leverages the high-order correlations among all tensor modes to model an intrinsic low-rank structure of streaming tensor data online, and it can effectively analyze data residing in a mixture of multiple subspaces by virtue of dictionary learning. ORLTM consumes a very limited memory space that remains constant regardless of the increase in tensor data size, which facilitates processing tensor data at a large scale. More concretely, it models each mode unfolding of streaming tensor data using the bilinear formulation of tensor nuclear norms. With this reformulation, ORLTM employs a stochastic optimization algorithm to learn the tensor low-rank structure alternately for online updating. To obtain the final tensors, ORLTM uses an average pooling operation on the folded tensors of all modes. We also provide analysis regarding computational complexity, memory cost, and convergence. Moreover, we extend ORLTM to the image alignment scenario by incorporating geometrical transformations and linearizing the constraints. Extensive empirical studies on a synthetic dataset and three practical vision tasks, including video background subtraction, image alignment, and visual tracking, have demonstrated the superiority of the proposed method.
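The mode unfolding that the bilinear reformulation operates on can be sketched as follows; this is a generic unfold/fold pair, not the paper's optimization code:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold`: restore the original tensor shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full), 0, mode)

X = np.arange(24).reshape(2, 3, 4)
for k in range(3):
    # Folding an unfolding recovers the original tensor in every mode
    assert np.array_equal(fold(unfold(X, k), k, X.shape), X)
print(unfold(X, 1).shape)  # (3, 8)
```

Each mode unfolding turns the tensor into an ordinary matrix, so matrix nuclear-norm machinery can be applied per mode and the results pooled over the folded tensors.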
http://dx.doi.org/10.1109/TNNLS.2018.2860964
April 2019

Scene Categorization Using Deeply Learned Gaze Shifting Kernel.

IEEE Trans Cybern 2019 Jun 11;49(6):2156-2167. Epub 2018 May 11.

Accurately recognizing sophisticated sceneries from a rich variety of semantic categories is an indispensable component in many intelligent systems, e.g., scene parsing, video surveillance, and autonomous driving. Recently, a large number of deep architectures for scene categorization have emerged, and promising performance has been achieved. However, these models cannot explicitly encode human visual perception of different sceneries, i.e., the sequence in which humans allocate their gaze. To solve this problem, we propose a deep gaze shifting kernel to distinguish sceneries from different categories. Specifically, we first project regions from each scenery into a so-called perceptual space, which is established by combining color, texture, and semantic features. Then, a novel non-negative matrix factorization algorithm is developed that decomposes the regions' feature matrix into the product of a basis matrix and sparse codes, where the sparse codes indicate the saliency levels of different regions. In this way, the gaze shifting path of each scenery is derived, and an aggregation-based convolutional neural network is designed accordingly to learn its deep representation. Finally, the deep representations of the gaze shifting paths from all the scene images are incorporated into an image kernel, which is further fed into a kernel SVM for scene categorization. Comprehensive experiments on six scenery data sets have demonstrated the superiority of our method over a series of shallow/deep recognition models. Besides, eye tracking experiments have shown that our predicted gaze shifting paths are 94.6% consistent with real human gaze allocations.
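A factorization of the kind described above can be sketched with standard multiplicative-update NMF plus an L1 penalty on the codes; this is a generic stand-in under our own parameter choices, not the paper's novel algorithm:

```python
import numpy as np

def sparse_nmf(V, rank, n_iter=200, lam=0.01, eps=1e-9):
    """Multiplicative-update NMF with an L1 penalty on the codes H,
    so V ~= W @ H with nonnegative factors and H encouraged to be sparse."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)  # lam shrinks H toward sparsity
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# A nonnegative feature matrix stands in for the perceptual-space region features
V = np.random.default_rng(1).random((20, 30))
W, H = sparse_nmf(V, rank=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)  # relative reconstruction error of the rank-5 fit
```

In the paper's setting the magnitudes in the sparse codes would then rank regions by saliency to derive the gaze shifting path.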
http://dx.doi.org/10.1109/TCYB.2018.2820731
June 2019

Deep Active Learning with Contaminated Tags for Image Aesthetics Assessment.

IEEE Trans Image Process 2018 Apr 18. Epub 2018 Apr 18.

Image aesthetic quality assessment has become an indispensable technique that facilitates a variety of image applications, e.g., photo retargeting and non-realistic rendering. Conventional approaches suffer from the following limitations: 1) the inefficiency of semantically describing images due to the inherent tag noise and incompleteness; 2) the difficulty of accurately reflecting how humans actively perceive various regions inside each image; and 3) the challenge of incorporating the aesthetic experiences of multiple users. To solve these problems, we propose a novel semi-supervised deep active learning (SDAL) algorithm, which discovers how humans perceive semantically important regions from a large quantity of images partially assigned with contaminated tags. More specifically, as humans usually attend to foreground objects before understanding them, we extract a succinct set of BING (binarized normed gradients) [60]-based object patches from each image. To simulate human visual perception, SDAL hierarchically learns the human gaze shifting path (GSP) by sequentially linking semantically important object patches from each scenery. Noticeably, SDAL unifies the discovery of semantically important regions and deep GSP feature learning into a principled framework, wherein only a small proportion of tagged images are adopted. Moreover, based on a sparsity penalty, SDAL can optimally abandon noisy or redundant low-level image features. Finally, by leveraging the deeply learned GSP features, a probabilistic model is developed for image aesthetics assessment, where the experience of multiple professional photographers can be encoded. Besides, auxiliary quality-related features can be conveniently integrated into our probabilistic model. Comprehensive experiments on a series of benchmark image sets have demonstrated the superiority of our method. As a byproduct, eye tracking experiments have shown that the GSPs generated by our SDAL are about 93% consistent with real human gaze shifting paths.
http://dx.doi.org/10.1109/TIP.2018.2828326
April 2018

Camera-Assisted Video Saliency Prediction and Its Applications.

IEEE Trans Cybern 2018 Sep 21;48(9):2520-2530. Epub 2017 Dec 21.

Video saliency prediction is an indispensable yet challenging technique that can facilitate various applications, such as video surveillance, autonomous driving, and realistic rendering. Given the popularity of embedded cameras, we predict region-level saliency from videos by leveraging human gaze locations recorded using a camera (e.g., the one built into an iMac or a laptop PC). Our proposed camera-assisted mechanism improves saliency prediction by discovering human-attended regions inside a video clip. It is orthogonal to current saliency models, i.e., any existing video/image saliency model can be boosted by our mechanism. First, spatial- and temporal-level visual features are exploited collaboratively to calculate an initial saliency map. We notice that current saliency models are not sufficiently adaptable to variations in lighting, different view angles, and complicated backgrounds. Therefore, assisted by a camera tracking human gaze movements, a non-negative matrix factorization algorithm is designed to accurately localize the semantically/visually salient video regions perceived by humans. Finally, the learned human gaze locations as well as the initial saliency map are integrated to optimize video saliency calculation. Empirical results thoroughly demonstrate that: 1) our approach achieves state-of-the-art video saliency prediction accuracy by considerably outperforming 11 mainstream algorithms and 2) our method can conveniently and successfully enhance video retargeting, quality estimation, and summarization.
http://dx.doi.org/10.1109/TCYB.2017.2741498
September 2018

Engineering Deep Representations for Modeling Aesthetic Perception.

IEEE Trans Cybern 2018 Nov 11;48(11):3092-3104. Epub 2017 Dec 11.

Many aesthetic models in multimedia and computer vision suffer from two shortcomings: 1) the low descriptiveness and interpretability [1] of the hand-crafted aesthetic criteria (i.e., they fail to indicate region-level aesthetics) and 2) the difficulty of engineering aesthetic features adaptively and automatically toward different image sets. To remedy these problems, we develop a deep architecture to learn aesthetically relevant visual attributes from Flickr [2], which are localized by multiple textual attributes in a weakly supervised setting. More specifically, using a bag-of-words representation of the frequent Flickr image tags, a sparsity-constrained subspace algorithm discovers a compact set of textual attributes (i.e., each textual attribute is a sparse and linear representation of those frequent image tags) for each Flickr image. Then, a weakly supervised learning algorithm projects the image-level textual attributes onto the highly responsive image patches. These patches indicate where humans look at appealing regions with respect to each textual attribute, and they are employed to learn the visual attributes. Psychological and anatomical studies have demonstrated that humans perceive visual concepts in a hierarchical way. Therefore, we normalize these patches and further feed them into a five-layer convolutional neural network to mimic the hierarchical way humans perceive the visual attributes. We apply the learned deep features to applications such as image retargeting, aesthetics ranking, and retrieval. Both subjective and objective experimental results thoroughly demonstrate the superiority of our approach.
[1] In this paper, "descriptiveness" and "interpretability" mean the ability to seek a region-level representation of each mined textual attribute, i.e., a sparse and linear representation of those frequent image tags.
[2] https://www.flickr.com/
http://dx.doi.org/10.1109/TCYB.2017.2758350
November 2018

Exploring Web Images to Enhance Skin Disease Analysis Under A Computer Vision Framework.

IEEE Trans Cybern 2018 Nov 1;48(11):3080-3091. Epub 2017 Nov 1.

To benefit skin care, this paper aims to design an automatic and effective visual analysis framework, with the expectation of recognizing a skin disease from a given image of the disease-affected surface. This task is nontrivial, since it is hard to collect sufficient well-labeled samples. To address this problem, we present a novel transfer learning model, which is able to incorporate external knowledge obtained from the rich and relevant Web images contributed by grassroots users. In particular, we first construct a target domain by crawling a small set of images from vertical and professional dermatological websites. We then construct a source domain by collecting a large set of skin disease related images from commercial search engines. To reinforce the learning performance in the target domain, we initially build a learning model in the target domain, and then seamlessly leverage the training samples in the source domain to enhance this learning model. The distribution gap between these two domains is bridged by a linear combination of Gaussian kernels. Instead of training models with low-level features, we resort to deep models to learn succinct, invariant, and high-level image representations. Different from previous efforts that focus on a few types of skin diseases with a small and confidential set of images generated from hospitals, this paper targets thousands of commonly seen skin diseases with publicly accessible Web images. Hence the proposed model is easily repeatable by other researchers and extendable to other disease types. Extensive experiments on a real-world dataset have demonstrated the superiority of our proposed method over state-of-the-art competitors.
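A common way to measure a distribution gap with a linear combination of Gaussian kernels is the (squared) maximum mean discrepancy; the sketch below is our illustrative instantiation of that idea, with bandwidths and data chosen by us rather than taken from the paper:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Pairwise Gaussian kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigmas=(0.5, 1.0, 2.0)):
    """Squared maximum mean discrepancy under a linear combination of
    Gaussian kernels: an estimate of the source/target distribution gap."""
    gap = 0.0
    for s in sigmas:
        gap += (gaussian_kernel(X, X, s).mean()
                + gaussian_kernel(Y, Y, s).mean()
                - 2.0 * gaussian_kernel(X, Y, s).mean())
    return gap

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (100, 8))       # e.g., search-engine image features
tgt_near = rng.normal(0.0, 1.0, (100, 8))  # same underlying distribution
tgt_far = rng.normal(3.0, 1.0, (100, 8))   # shifted distribution
print(mmd2(src, tgt_near) < mmd2(src, tgt_far))  # True
```

Minimizing such a kernel-based gap while training is what lets source-domain samples usefully reinforce a model on the small target domain.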
http://dx.doi.org/10.1109/TCYB.2017.2765665
November 2018

Perceptually Aware Image Retargeting for Mobile Devices.

IEEE Trans Image Process 2018 May;27(5):2301-2313

Retargeting aims at adapting an original high-resolution photograph/video to a low-resolution screen with an arbitrary aspect ratio. Conventional approaches are generally based on desktop PCs, since the computation might be intolerable for mobile platforms (especially when retargeting videos). Typically, only low-level visual features are exploited, and human visual perception is not well encoded. In this paper, we propose a novel retargeting framework that rapidly shrinks a photograph/video by leveraging human gaze behavior. Specifically, we first derive a geometry-preserving graph ranking algorithm, which efficiently selects a few salient object patches to mimic the human gaze shifting path (GSP) when viewing a scene. Afterward, an aggregation-based CNN is developed to hierarchically learn the deep representation for each GSP. Based on this, a probabilistic model is developed to learn the priors of the training photographs that are marked as aesthetically pleasing by professional photographers. We utilize the learned priors to efficiently shrink the corresponding GSP of a retargeted photograph/video to maximize its similarity to those from the training photographs. Extensive experiments have demonstrated that: 1) our method requires less than 35 ms to retarget a photograph (or a video frame) on popular iOS/Android devices, which is orders of magnitude faster than the conventional retargeting algorithms; 2) the retargeted photographs/videos produced by our method significantly outperform those of its competitors based on a paired-comparison-based user study; and 3) the learned GSPs are highly indicative of human visual attention according to the human eye tracking experiments.
http://dx.doi.org/10.1109/TIP.2017.2779272
May 2018

Robust Web Image Annotation via Exploring Multi-Facet and Structural Knowledge.

IEEE Trans Image Process 2017 Oct 19;26(10):4871-4884. Epub 2017 Jun 19.

Driven by the rapid development of the Internet and digital technologies, we have witnessed the explosive growth of Web images in recent years. Seeing that labels can reflect the semantic contents of the images, automatic image annotation, which can further facilitate image semantic indexing, retrieval, and other image management tasks, has become one of the most crucial research directions in multimedia. Most existing annotation methods heavily rely on well-labeled training data (expensive to collect) and/or a single view of visual features (insufficient representative power). In this paper, inspired by the promising advances in feature engineering (e.g., CNN features and scale-invariant feature transform features) and the inexhaustible image data (associated with noisy and incomplete labels) on the Web, we propose an effective and robust scheme, termed robust multi-view semi-supervised learning (RMSL), for the image annotation task. Specifically, we exploit both labeled and unlabeled images to uncover the intrinsic structural information of the data. Meanwhile, to comprehensively describe an individual datum, we take advantage of the correlated and complementary information derived from multiple facets of the image data (i.e., multiple views or features). We devise a robust pairwise constraint on the outcomes of different views to achieve annotation consistency. Furthermore, we integrate a robust classifier learning component via an ℓ loss, which can provide effective noise identification power during the learning process. Finally, we devise an efficient iterative algorithm to solve the optimization problem in RMSL. We conduct comprehensive experiments on three different data sets, and the results illustrate that our proposed approach is promising for automatic image annotation.
http://dx.doi.org/10.1109/TIP.2017.2717185
October 2017

GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media.

KDD 2016 Aug;2016:1305-1314

Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data (e.g., GPS traces), the recent growth of geo-tagged social media (GeoSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GeoSM data remains a challenge that cannot be readily addressed by existing techniques. We propose GMove, a group-level mobility modeling method that uses GeoSM data. Our insight is that GeoSM data usually contain multiple user groups, where the users within the same group share significant movement regularity. Meanwhile, user grouping and mobility modeling are two intertwined tasks: (1) better user grouping offers better within-group data consistency and thus leads to more reliable mobility models; and (2) better mobility models serve as useful guidance that helps infer the group a user belongs to. GMove thus alternates between user grouping and mobility modeling, and generates an ensemble of Hidden Markov Models (HMMs) to characterize group-level movement regularity. Furthermore, to reduce the text sparsity of GeoSM data, GMove also features a text augmenter. The augmenter computes keyword correlations by examining their spatiotemporal distributions. With such correlations as auxiliary knowledge, it performs sampling-based augmentation to alleviate text sparsity and produce high-quality HMMs. Our extensive experiments on two real-life data sets demonstrate that GMove can effectively generate meaningful group-level mobility models. Moreover, with context-aware location prediction as an example application, we find that GMove significantly outperforms baseline mobility models in terms of prediction accuracy.
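The "better mobility models help infer a user's group" direction can be sketched with the standard HMM forward algorithm: score a user's trace under each group's HMM and assign the user to the best-scoring group. The two toy group HMMs below are our own invention, not GMove's learned models:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete observation
    sequence under an HMM (pi: initial, A: transition, B: emission probs)."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik

pi = np.array([0.5, 0.5])
B = np.array([[0.9, 0.1], [0.1, 0.9]])        # two observable place types
A_stay = np.array([[0.9, 0.1], [0.1, 0.9]])   # group that lingers at one place
A_hop = np.array([[0.1, 0.9], [0.9, 0.1]])    # group that alternates places
trace = [0, 0, 0, 0, 1, 1, 1, 1]
# The "staying" group's HMM explains this trace better, so the user would be
# assigned to that group
print(forward_loglik(trace, pi, A_stay, B) > forward_loglik(trace, pi, A_hop, B))  # True
```

Alternating this assignment step with re-estimating each group's HMM on its assigned traces gives the kind of grouping/modeling loop the abstract describes.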
http://dx.doi.org/10.1145/2939672.2939793
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5288006
August 2016

Weakly Supervised Multimodal Kernel for Categorizing Aerial Photographs.

IEEE Trans Image Process 2017 Aug 14;26(8):3748-3758. Epub 2016 Dec 14.

Accurately distinguishing aerial photographs from different categories is a promising technique in computer vision. It can facilitate a series of applications, such as video surveillance and vehicle navigation. In this paper, a new image kernel is proposed for effectively recognizing aerial photographs. The key is to encode high-level semantic cues into local image patches in a weakly supervised way, and to integrate multimodal visual features using a newly developed hashing algorithm. The flowchart can be elaborated as follows. Given an aerial photo, we first extract a number of graphlets to describe its topological structure. For each graphlet, we utilize color and texture to capture its appearance, and a weakly supervised algorithm to capture its semantics. Thereafter, aerial photo categorization can be naturally formulated as graphlet-to-graphlet matching. As the number of graphlets from each aerial photo is huge, to accelerate matching, we present a hashing algorithm that seamlessly fuses the multiple visual features into binary codes. Finally, an image kernel is calculated by fast matching of the binary codes corresponding to each graphlet, and a multi-class SVM is learned for aerial photo categorization. We demonstrate the advantage of our proposed model by comparing it with state-of-the-art image descriptors. Moreover, an in-depth study of the descriptiveness of the hash-based graphlets is presented.
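The speed-up from binary codes comes from replacing real-valued feature comparison with Hamming distance. The paper's hashing algorithm is not detailed here, so the sketch below uses generic sign-of-random-projection hashing as a stand-in; all dimensions and names are our assumptions:

```python
import numpy as np

def hash_codes(x, projections):
    """Sign-of-random-projection hashing: a real feature vector -> binary code."""
    return (x @ projections > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
P = rng.normal(size=(16, 32))            # 16-d fused features -> 32-bit codes
x = rng.normal(size=16)                  # a graphlet's fused feature vector
x_sim = x + 0.05 * rng.normal(size=16)   # a slightly perturbed graphlet
x_diff = rng.normal(size=16)             # an unrelated graphlet
cx, cs, cd = (hash_codes(v, P) for v in (x, x_sim, x_diff))
print(hamming(cx, cs) < hamming(cx, cd))  # True: similar graphlets collide more
```

Because Hamming distance is a few machine instructions per code pair, graphlet-to-graphlet matching over huge graphlet sets becomes tractable.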
http://dx.doi.org/10.1109/TIP.2016.2639438
August 2017

Constrained Low-Rank Learning Using Least Squares-Based Regularization.

IEEE Trans Cybern 2017 Dec 10;47(12):4250-4262. Epub 2016 Nov 10.

Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing a low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of the original data can be paired with some informative structure by imposing an appropriate constraint, e.g., a Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.
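The workhorse of inexact-ALM solvers for nuclear norm minimization is singular value thresholding (the proximal operator of the nuclear norm). The sketch below shows that single step on synthetic data; it is not the paper's full algorithm, and the threshold and sizes are our own choices:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*,
    the low-rank subproblem solved at each inexact-ALM iteration."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values by tau

rng = np.random.default_rng(0)
L = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 30))  # rank-4 ground truth
noisy = L + 0.01 * rng.normal(size=(30, 30))             # small dense noise
recovered = svt(noisy, tau=0.5)
print(np.linalg.matrix_rank(recovered, tol=1e-6))  # 4: noise directions shrunk away
```

Shrinking singular values below the threshold to exactly zero is what drives the iterates toward a genuinely low-rank solution.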
http://dx.doi.org/10.1109/TCYB.2016.2623638
December 2017

SnapVideo: Personalized Video Generation for a Sightseeing Trip.

IEEE Trans Cybern 2017 Nov 19;47(11):3866-3878. Epub 2016 Jul 19.

Leisure tourism is an indispensable activity in urban people's lives. Due to the popularity of intelligent mobile devices, a large number of photos and videos are recorded during a trip. Therefore, the ability to vividly and interestingly display these media data is a useful technique. In this paper, we propose SnapVideo, a new method that intelligently converts a personal album describing a trip into a comprehensive, aesthetically pleasing, and coherent video clip. The proposed framework contains three main components. The scenic spot identification model first personalizes the video clips based on multiple prespecified audience classes. We then search for auxiliary related videos from YouTube [1] according to the selected photos. To comprehensively describe a scenery, the view generation module clusters the crawled video frames into a number of views. Finally, a probabilistic model is developed to fit the frames from multiple views into an aesthetically pleasing and coherent video clip, which optimally captures the semantics of a sightseeing trip. Extensive user studies demonstrated the competitiveness of our method from an aesthetic point of view. Moreover, quantitative analysis reflects that semantically important spots are well preserved in the final video clip.
[1] https://www.youtube.com/
http://dx.doi.org/10.1109/TCYB.2016.2585764
November 2017

Multiview Physician-Specific Attributes Fusion for Health Seeking.

IEEE Trans Cybern 2017 Nov 21;47(11):3680-3691. Epub 2016 Jun 21.

Community-based health services have risen as important online resources for resolving users' health concerns. Despite their value, the gap between what health seekers with specific health needs require and what busy physicians with particular attitudes and expertise can offer is widening. To bridge this gap, we present a question routing scheme that is able to connect health seekers to the right physicians. In this scheme, we first bridge the expertise matching gap via a probabilistic fusion of the physician-expertise distribution and the expertise-question distribution. The distributions are calculated by hypergraph-based learning and kernel density estimation. We then measure physicians' attitudes toward answering general questions from the perspectives of activity, responsibility, reputation, and willingness. Finally, we adaptively fuse the expertise modeling and attitude modeling by considering the personal needs of the health seekers. Extensive experiments have been conducted on a real-world dataset to validate our proposed scheme.
http://dx.doi.org/10.1109/TCYB.2016.2577590
November 2017

On the convergence of a high-accuracy compact conservative scheme for the modified regularized long-wave equation.

Springerplus 2016 18;5:474. Epub 2016 Apr 18.

Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016 China.

In this article, we develop a high-order efficient numerical scheme to solve the initial-boundary problem of the MRLW equation. The method is based on combining the finite difference method with the requirement that the scheme preserve a discrete counterpart of the conservation of the system's physical "energy". The scheme consists of a fourth-order compact finite difference approximation in space and a version of the leap-frog scheme in time. The unique solvability of the numerical solutions is shown. An a priori estimate and fourth-order convergence of the finite difference approximate solution are established using the discrete energy method and some techniques of matrix theory. Numerical results are given to show the validity and the accuracy of the proposed method.
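To give a feel for the spatial part, the sketch below implements the classical fourth-order compact approximation to u'' on a periodic grid and checks its convergence order on a smooth test function. It is only an illustration of the compact-scheme idea; the paper's actual scheme additionally couples a leap-frog time discretization, the MRLW nonlinearity, and boundary treatment:

```python
import numpy as np

def compact_d2(u, h):
    """Fourth-order compact approximation of u'' on a periodic grid:
    d_{i-1} + 10*d_i + d_{i+1} = 12*(u_{i-1} - 2*u_i + u_{i+1}) / h^2."""
    n = len(u)
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, idx] = 10.0
    A[idx, (idx - 1) % n] = 1.0  # periodic tridiagonal (cyclic) system
    A[idx, (idx + 1) % n] = 1.0
    rhs = 12.0 * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / h**2
    return np.linalg.solve(A, rhs)

def max_error(n):
    """Max error of the approximation for u = sin(x), whose u'' = -sin(x)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.abs(compact_d2(np.sin(x), x[1] - x[0]) + np.sin(x)).max()

# Halving h should shrink the error by about 2^4 = 16 for a fourth-order scheme
e1, e2 = max_error(32), max_error(64)
print(e1 / e2)  # close to 16
```

The compact stencil achieves fourth-order accuracy while only coupling nearest neighbors, at the cost of a tridiagonal solve per application; in production one would use a dedicated cyclic tridiagonal solver rather than a dense `solve`.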
http://dx.doi.org/10.1186/s40064-016-2085-9
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4835426
May 2016