Tomorrow is the workshop; we will be covering material for both 2D images and 3D point clouds. I have put together lecture material that tells a good story in the context of remote sensing applications. I am very excited to meet some new people tomorrow and inspire scientists to apply deep learning in their research! Tomorrow will be great.
Labeling point clouds is very labor-expensive work. In the image domain, much effort has been put into developing tools and algorithms for annotation, e.g. VoTT by Microsoft (https://github.com/microsoft/VoTT), Pixel Perfect (https://blogs.nvidia.com/blog/2020/08/28/v7-labs-image-annotation/), LabelMe (http://labelme2.csail.mit.edu/Release3.0/browserTools/php/publications.php), and many, many more if you just do a Google search.
Drawing 3D bounding boxes is even more tedious than drawing image-based ones because a 3D box has six degrees of freedom. From one perspective, e.g. the bird's-eye view (BEV), your box could look perfect, but when you rotate the view, it is not.
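To make the six degrees of freedom concrete, here is a minimal sketch (my own illustration, not taken from any of the labeling tools below) of how a 3D box annotation is typically parameterized: three translations plus three rotations, applied to a box of a given size. Getting all six right at once is exactly what makes manual 3D annotation so tedious.

```python
import numpy as np

def box_corners(center, size, yaw, pitch=0.0, roll=0.0):
    """Return the 8 corners of a 3D box given its 6-DOF pose
    (3 translations + 3 rotations) and its size (l, w, h)."""
    l, w, h = size
    # Corners in the box's local frame, centered at the origin.
    x = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * l / 2
    y = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * w / 2
    z = np.array([ 1, -1,  1, -1,  1, -1,  1, -1]) * h / 2
    pts = np.stack([x, y, z])                       # shape (3, 8)
    # Rotation matrices about z (yaw), y (pitch), and x (roll).
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    # Rotate, then translate to the box center.
    return (Rz @ Ry @ Rx @ pts).T + np.asarray(center)

corners = box_corners(center=(10.0, 5.0, 1.0), size=(4.0, 2.0, 1.5), yaw=np.pi / 4)
```

A fully axis-aligned BEV view only shows you the x, y, and yaw terms; errors in z, pitch, or roll stay invisible until you rotate.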
So far I have tried two tools: one from MATLAB, an app called "Ground Truth Labeler", and another from pointly.ai.
Both are very easy to use, but Pointly.ai is much easier to navigate; however, it is also very costly.
After the buildings are built in SketchUp, they need to be georeferenced before they can be displayed on a map in the proper location. Otherwise they will show up in local coordinates, which in most cases are centered at (0, 0, 0).
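At its core, this is what georeferencing does to the geometry: the model's local, (0, 0, 0)-centered vertices get shifted to a real-world coordinate origin. Here is a minimal sketch assuming a pure translation (real workflows also handle rotation, scale, and coordinate-system metadata); the OBJ vertex values and the UTM-like offsets are made up for illustration.

```python
# A tiny OBJ fragment in local coordinates, centered near (0, 0, 0).
obj_lines = [
    "v -5.0 -5.0 0.0",
    "v  5.0 -5.0 0.0",
    "v  0.0  0.0 12.0",
]
# Hypothetical projected-coordinate origin (e.g. UTM easting/northing) for the site.
easting, northing, elevation = 620000.0, 4840000.0, 190.0

def georeference(lines, dx, dy, dz):
    """Shift every vertex line ('v x y z') by a world-coordinate offset."""
    out = []
    for line in lines:
        if line.startswith("v "):                  # vertex line: v x y z
            _, x, y, z = line.split()
            out.append(f"v {float(x) + dx} {float(y) + dy} {float(z) + dz}")
        else:
            out.append(line)                       # keep faces, normals, etc.
    return out

shifted = georeference(obj_lines, easting, northing, elevation)
```

The three approaches below differ only in which tool applies this transformation and how much of it you have to do by hand.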
Traditionally, you can take two approaches:
Approach 1: Georeference in sketchup and export in .kmz
Approach 2: From sketchup, export in whichever format you want, then georeference in mapping software (e.g. ArcGIS)
My approach is a bit out of the box. Approach 3: use OSM (OpenStreetMap)
SketchUp's "Add Location" documentation can be found here.
Pros: The georeferencing procedure is very easy; in my opinion it is easier than using ArcGIS.
Cons: The only export format is .kmz, and the exported model does not look as detailed as other formats.
The second approach is to export the SketchUp model first, then georeference it in ArcGIS Pro. The documentation can be found here.
This link is also useful https://www.esri.com/arcgis-blog/products/arcgis-pro/3d-gis/geolocating-revit-files-in-arcgis-pro/
The procedure is also straightforward, but translating the model from (0, 0, 0) to the appropriate place is rather cumbersome; personally, I did not like it.
Pros: SketchUp files can be exported with better graphics (I tried .obj); compared to .kmz (Approach 1), the .obj results look better visually.
Cons: The georeferencing procedure is not as easy as Approach 1; it took me a lot more time to put the building in the right location using this method.
Approach 3: Out-of-the-box thinking
- Export the SketchUp building model as .obj
- Use CityEngine to download OSM building models and export them as a .gdb. Note that the downloaded buildings have no texture.
- Open the model in ArcGIS Pro and use the "Replace Multipatch" function to replace the OSM model with your SketchUp building model (.obj, from step 1).
Pros: I did not have to do any manual georeferencing in SketchUp or ArcGIS Pro; it's much faster.
I also did not have to use .kmz (which does not look good).
Benchmarking datasets are essential for standardizing performance evaluation in machine learning.
Work in progress:
What I am working on right now is a LiDAR-centric benchmarking dataset. The motivation is that deep learning relies on a large amount of good-quality training data for semantic segmentation, object detection, and object classification.
Stedman Lecture Halls
Steacie Science and Engineering Library
Accolade West Building and Centre for Film and Theatre
Date: June 21, 2021 – 9:00 am, Mountain Daylight Time (GMT-6)
Instructors: Dr. Connie Ko, Maryam Jameela, Andrew Chadwick
Description: Deep learning is one of the fastest growing areas of machine learning and has been successfully applied in many applications including speech recognition, object detection and classification for autonomous navigation. In the remote sensing community, deep learning has been applied to a variety of data types (e.g. spectral, hyperspectral, LiDAR) for the applications of image classification, anomaly detection, terrain surface classification, object detection and many more (Zhu et al., 2017).
There are two key parts to deep learning: training and inference. Training is the process of inputting a large amount of labelled data into the deep learning model. During the training phase, the model automatically learns the characteristics of the data. Unlike traditional machine learning, where features are designed by the user, deep learning algorithms learn features from the training data themselves. During the inference phase, deep learning algorithms apply the features learned during training to make predictions on new data. Because these processes are automated, deep learning is often referred to as an end-to-end solution for tasks that traditionally required user supervision.
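The two phases can be illustrated with a toy example (my own sketch, not part of the workshop materials): "training" fits the model's parameters to labelled data by gradient descent, and "inference" applies the learned parameters to new, unseen data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))          # inputs
y = 3.0 * X[:, 0] + 1.0                        # labels generated by y = 3x + 1

w, b = 0.0, 0.0                                # model parameters, start untrained
for _ in range(500):                           # TRAINING: gradient descent on MSE
    pred = w * X[:, 0] + b
    err = pred - y
    w -= 0.1 * 2 * np.mean(err * X[:, 0])      # d(MSE)/dw
    b -= 0.1 * 2 * np.mean(err)                # d(MSE)/db

new_x = 0.5                                    # INFERENCE: apply learned w, b
prediction = w * new_x + b                     # close to 3 * 0.5 + 1 = 2.5
```

A deep network does the same thing at vastly larger scale, with the learned "features" playing the role of w and b here.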
The purpose of this workshop is to introduce deep learning approaches through theory, examples, and experiences. The workshop will include two sessions, the first will focus on image datasets (2D) and the second will focus on LiDAR datasets (3D). Each session will include a 40-minute lecture and 80-minute demonstration, and/or hands-on exercise.
The first session will provide an overview of deep learning, explaining some of the important theories and terminologies in convolutional neural networks. Following this, we will review Mask R-CNN (He et al., 2017), which will be the focus of this session's demonstration. This demonstration will provide a walk-through of adapting Mask R-CNN to the task of individual tree crown delineation. The second session will cover a lecture that discusses LiDAR point-cloud processing and especially encoding. We will be reviewing PointNet (Qi et al., 2017) and PointPillars (Lang et al., 2018) for the purpose of understanding the demonstration of single tree detection in LiDAR point clouds.
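The core idea behind the PointPillars encoder can be sketched in a few lines (a simplified illustration of the grouping step only; the grid size and example points are my own choices): points are binned into vertical columns ("pillars") on a 2D grid, collapsing the z dimension so that a 2D convolutional network can process the result.

```python
import numpy as np

# A handful of LiDAR points (x, y, z), made up for illustration.
points = np.array([
    [0.2, 0.3, 1.5],
    [0.4, 0.1, 0.2],
    [3.7, 2.9, 0.8],
])
pillar_size = 1.0   # metres per pillar in x and y

# Assign each point to a pillar by its (x, y) cell index; z is ignored,
# which is what collapses the cloud onto a 2D bird's-eye-view grid.
ix = np.floor(points[:, 0] / pillar_size).astype(int)
iy = np.floor(points[:, 1] / pillar_size).astype(int)

pillars = {}
for idx, key in enumerate(zip(ix, iy)):
    pillars.setdefault(key, []).append(idx)

# pillars maps (cell_x, cell_y) -> list of point indices in that column.
```

In the full method, a small network then summarizes the points in each pillar into a feature vector, producing a 2D feature map for a standard detection backbone.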
Zhu, X. X., Tuia, D., Mou, L., Xia, G. S., Zhang, L., Xu, F., and Fraundorfer, F. (2017). Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine, 5(4), 8–36.
He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017). Mask R-CNN. arXiv:1703.06870.
Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017). PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, 2017.
Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2018). PointPillars: Fast encoders for object detection from point clouds. arXiv preprint arXiv:1812.05784.
Dr. Connie Ko is an adjunct faculty member and research associate at York University. She received her doctorate from the Department of Earth and Space Science and Engineering at the Lassonde School of Engineering, York University. Connie has 13 years of experience with LiDAR data, and her current research interest involves the development and application of 3D object detection with LiDAR data.
Maryam Jameela is a third-year PhD student in the Department of Earth and Space Science and Engineering at the Lassonde School of Engineering, York University. She has been focusing on computer vision and deep learning for geomatics applications. Her current research theme is the development of noise classification algorithms and a deep utility semantic neural network for large-scale 3D point clouds. She is also a hardcore programmer on the side, with a major in computer science.
Andrew Chadwick is a PhD student working under supervisor Dr. Nicholas Coops at the University of British Columbia. His work is focused on the development of operational forest monitoring tools that leverage deep learning and remotely sensed data to increase the speed and scale with which conventional inventory data is collected.
Under Open Data Toronto, 3D building models are available to download within the city of Toronto (https://open.toronto.ca/dataset/3d-massing/). According to the website, "The Open Data site will enable access to application developers, designers, urban planners and architects, and the public. Ideally this will enable the creation of a visual portal and access to a large collection of city building ideas." I visualized part of the York University campus with ArcGIS Pro; the buildings show up as multipatch features.
Here is an interactive version:
Scroll to zoom in and out, left click to grab, right click to rotate.
One thing you will notice is that the buildings are basically white blocks. The process of adding colors and details (e.g. windows, doors, building materials, etc.) is called texturizing. If you Google "texture mapping" you can find many methods for doing this task.
For the purpose of learning, I am using SketchUp to perform the task.
It is a manual process, but the results make such a visual improvement:
This is the original 3D model from the Toronto Massing data.
This is a screen capture of Google Earth Model.
This is the texturized 3D model.
The model isn’t perfect, and I didn’t spend a lot of time on it, but it is definitely an improvement over the massing data. It can then be uploaded back to ArcGIS Pro for georeferencing.
Once in a while I get invited to talk about how to make media in academia; this time I am talking about making posters to a class of third-year university students. Through a hands-on workshop, we will discuss the components of an academic poster and ways to present your work effectively and in a visually pleasing manner to your audience. At the end of the workshop, I will show them how to make one using PowerPoint. Tips and tricks too!