#LyX 2.3 created this file. For more info see http://www.lyx.org/ \lyxformat 544 \begin_document \begin_header \save_transient_properties true \origin unavailable \textclass article \use_default_options true \begin_modules customHeadersFooters minimalistic todonotes \end_modules \maintain_unincluded_children false \language english \language_package default \inputencoding auto \fontencoding global \font_roman "default" "default" \font_sans "default" "default" \font_typewriter "default" "default" \font_math "auto" "auto" \font_default_family default \use_non_tex_fonts false \font_sc false \font_osf false \font_sf_scale 100 100 \font_tt_scale 100 100 \use_microtype false \use_dash_ligatures true \graphics default \default_output_format default \output_sync 0 \bibtex_command biber \index_command default \paperfontsize default \spacing single \use_hyperref false \pdf_title "Holoportation" \pdf_author "Andy Pack" \pdf_subject "The use of Kinect cameras to stream 3D video from client to server" \pdf_bookmarks true \pdf_bookmarksnumbered false \pdf_bookmarksopen false \pdf_bookmarksopenlevel 1 \pdf_breaklinks false \pdf_pdfborder false \pdf_colorlinks false \pdf_backref false \pdf_pdfusetitle true \papersize default \use_geometry true \use_package amsmath 1 \use_package amssymb 1 \use_package cancel 1 \use_package esint 1 \use_package mathdots 1 \use_package mathtools 1 \use_package mhchem 1 \use_package stackrel 1 \use_package stmaryrd 1 \use_package undertilde 1 \cite_engine biblatex \cite_engine_type authoryear \biblio_style plain \biblatex_bibstyle ieee \biblatex_citestyle ieee \use_bibtopic false \use_indices false \paperorientation portrait \suppress_date true \justification true \use_refstyle 1 \use_minted 0 \index Index \shortcut idx \color #008000 \end_index \leftmargin 2cm \topmargin 2cm \rightmargin 2cm \bottommargin 2cm \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \paragraph_indentation default \is_math_indent 0 \math_numbering_side default 
\quotes_style english \dynamic_quotes 0 \papercolumns 1 \papersides 1 \paperpagestyle fancy \bullet 1 0 9 -1 \tracking_changes false \output_changes false \html_math_output 0 \html_css_as_file 0 \html_be_strict false \end_header \begin_body \begin_layout Title \size giant Multi-Source Holoportation \end_layout \begin_layout Author Andy Pack \end_layout \begin_layout Standard \align center \size largest Mid-Term Report \end_layout \begin_layout Standard \begin_inset VSpace bigskip \end_inset \end_layout \begin_layout Standard \align center \begin_inset Graphics filename ../surreylogo.png lyxscale 30 width 60col% \end_inset \end_layout \begin_layout Standard \begin_inset VSpace vfill \end_inset \end_layout \begin_layout Standard \align center \size large Department of Electrical and Electronic Engineering \begin_inset Newline newline \end_inset Faculty of Engineering and Physical Sciences \begin_inset Newline newline \end_inset University of Surrey \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Abstract abstract \end_layout \begin_layout Standard \begin_inset CommandInset toc LatexCommand tableofcontents \end_inset \end_layout \begin_layout List of TODOs \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Right Footer Andy Pack / 6420013 \end_layout \begin_layout Left Footer January 2020 \end_layout \begin_layout Section Introduction \end_layout \begin_layout Standard The aim of this project is to develop a piece of software capable of supporting multi-source holoportation (hologram teleportation) using the \emph on \noun on LiveScan3D \emph default \noun default \begin_inset CommandInset citation LatexCommand cite key "livescan3d" literal "false" \end_inset suite of software as a base. 
\end_layout \begin_layout Standard As the spaces of augmented and virtual reality mature and become more commonplace, the ability to capture and stream 3D renderings of objects and people over the internet using consumer-grade hardware has many possible applications. \end_layout \begin_layout Standard This represents one of the most direct evolutions of traditional video streaming when applied to this new technological space. \end_layout \begin_layout Standard The \noun on LiveScan3D \noun default suite uses \noun on Xbox Kinect \noun default cameras to capture and stream 3D renders of objects from one or many angles simultaneously; however, the destination server is only able to process and reconstruct one object or set of surroundings at a time. \end_layout \begin_layout Standard The capability to concurrently receive and reconstruct streams of different objects further broadens the landscape of possible applications, analogous to the movement from 1-to-1 phone calls to conference calling. \end_layout \begin_layout Section Literature Review \end_layout \begin_layout Standard The significance of the 3D video captured and relayed with the \noun on LiveScan \noun default suite is closely related to the development of new technologies able to immersively display such video content. Therefore, before discussing the specific extension that this project will make to the \noun on LiveScan \noun default software, it is important to contextualise it within the space of 3D video capture while also considering its implications for AR and VR applications. 
\end_layout \begin_layout Subsection Augmented and Virtual Reality \end_layout \begin_layout Subsection Traditional Optical 3D Reconstruction \end_layout \begin_layout Subsection Kinect and RGB-D Cameras \end_layout \begin_layout Subsection Holoportation and Telepresence \end_layout \begin_layout Standard The term Holoportation is defined and exemplified in the \noun on Microsoft Research \noun default paper \begin_inset CommandInset citation LatexCommand cite key "holoportation" literal "false" \end_inset , where an end-to-end pipeline is laid out for the acquisition, transmission and display of 3D video facilitating real-time AR and VR experiences. The \noun on Microsoft Research \noun default paper builds on works such as \begin_inset CommandInset citation LatexCommand cite key "Immersive-telepresence" literal "false" \end_inset from two years earlier, which describes attempts at achieving \begin_inset Quotes eld \end_inset telepresence \begin_inset Quotes erd \end_inset , a term coined by Marvin Minsky to describe the transparent and intuitive remote control of robot arms as if they were the controller's own \begin_inset CommandInset citation LatexCommand cite key "marvin-minksy" literal "false" \end_inset . The term was broadened by Bill Buxton \begin_inset CommandInset citation LatexCommand cite key "buxton-telepresence" literal "false" \end_inset to cover telecommunications, describing technology used to make someone feel present in a different environment. In the context of holoportation this is achieved through 3D video reconstruction. The aforementioned \begin_inset CommandInset citation LatexCommand cite key "Immersive-telepresence" literal "false" \end_inset used 10 \noun on Microsoft Kinect \noun default cameras to capture a room before virtually reconstructing the models. 
\end_layout \begin_layout Standard To demonstrate its applicability to achieving telepresence, a figure was isolated from the surroundings and stereoscopically rear-projected onto a screen for a single participant; a result of this can be seen in figure \begin_inset CommandInset ref LatexCommand ref reference "fig:stereoscopic" plural "false" caps "false" noprefix "false" \end_inset . \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/telepresence-stereoscopic.png lyxscale 30 width 40col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout An example of stereoscopic projection of depth-aware footage captured during \begin_inset CommandInset citation LatexCommand cite key "Immersive-telepresence" literal "false" \end_inset \begin_inset CommandInset label LatexCommand label name "fig:stereoscopic" \end_inset \end_layout \end_inset \end_layout \begin_layout Plain Layout \end_layout \end_inset \end_layout \begin_layout Standard The \noun on Microsoft Research \noun default paper demonstrates a system using 8 cameras surrounding a space. Each camera captured both Near Infra-Red and colour images to construct a colour-depth video stream. \end_layout \begin_layout Subsection Multi-Source Holoportation \end_layout \begin_layout Standard Multi-source holoportation has been explored in works such as \begin_inset CommandInset citation LatexCommand cite key "group-to-group-telepresence" literal "false" \end_inset in the context of shared architectural design spaces in virtual reality, similar to a conference call. Two groups of people were captured in 3D using clusters of \noun on Kinect \noun default cameras before having these renders transmitted to the other group. 
Each group reconstructs the other's render for display in virtual reality in conjunction with their own. In doing so a shared virtual space for the two groups is created, and it can be seen to implement the process of holoportation. The shared architectural design experience emerges from the semantics of the virtual space, where a World in Miniature (WIM) metaphor is used. \end_layout \begin_layout Subsubsection Worlds in Miniature \end_layout \begin_layout Standard The Worlds in Miniature metaphor is described in the paper \begin_inset CommandInset citation LatexCommand cite key "wim" literal "false" \end_inset as a set of interfaces between the user and the virtual space they experience using tactile and visual tools. The interface involves providing the user with a miniature render of the world they are inhabiting. This model can be interacted with in order to affect the full-scale environment around them. \end_layout \begin_layout Standard This navigation tool maps well to the architecture groupware structure of \begin_inset CommandInset citation LatexCommand cite key "group-to-group-telepresence" literal "false" \end_inset ; an image captured during the work can be seen in figure \begin_inset CommandInset ref LatexCommand ref reference "fig:World-in-Miniature-group-by-group" plural "false" caps "false" noprefix "false" \end_inset . 
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/group-by-group.png lyxscale 30 width 50col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout World in Miniature render demonstrated in a multi-source holoportation context during \begin_inset CommandInset citation LatexCommand cite key "group-to-group-telepresence" literal "false" \end_inset \begin_inset CommandInset label LatexCommand label name "fig:World-in-Miniature-group-by-group" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section LiveScan3D \end_layout \begin_layout Standard \noun on LiveScan3D \noun default is a suite of software developed by Marek Kowalski, Jacek Naruniec and Michal Daniluk of the Warsaw University of Technology in 2015 \begin_inset CommandInset citation LatexCommand cite key "livescan3d" literal "false" \end_inset . The suite utilises the \noun on Xbox Kinect \noun default v2 camera to record and transmit 3D renders over an IP network. A server can manage multiple clients simultaneously and is responsible for processing, reconstructing and displaying the renderings in real-time. \end_layout \begin_layout Standard These renderings take the form of a point cloud, a collection of 3D co-ordinates indicating the position of each voxel (3D pixel) and its associated RGB colour value. As a result of its analogous nature to a traditional frame of 2D video, the terms \begin_inset Quotes eld \end_inset render \begin_inset Quotes erd \end_inset , \begin_inset Quotes eld \end_inset point cloud \begin_inset Quotes erd \end_inset and \begin_inset Quotes eld \end_inset frame \begin_inset Quotes erd \end_inset are used interchangeably from here on. 
\end_layout \begin_layout Subsection \noun on LiveScan \noun default Client \end_layout \begin_layout Standard The \noun on LiveScan \noun default Client is responsible for interfacing with the \noun on Kinect \noun default sensor via the \noun on Kinect \noun default v2 SDK and transmitting frames to the \noun on LiveScan \noun default Server. Body detection takes place client side, as does calibration when using multiple sensors. \end_layout \begin_layout Subsection \noun on LiveScan \noun default Server \end_layout \begin_layout Standard The server component of the \noun on LiveScan \noun default suite is responsible for managing and receiving 3D renders from connected clients. These renderings are reconstructed in an \noun on OpenGL \noun default window; the structure of the \noun on LiveScan \noun default server can be seen in figure \begin_inset CommandInset ref LatexCommand ref reference "fig:server-structure" plural "false" caps "false" noprefix "false" \end_inset . \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/initial-state.png lyxscale 30 width 50col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout Initial structure of the \noun on LiveScan3D \noun default server \begin_inset CommandInset label LatexCommand label name "fig:server-structure" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard The \noun on KinectServer \noun default is responsible for the network layer of the program, managing client connections via \noun on KinectSocket \noun default s and frame reception. Received frames, in the form of lists of vertices, RGB values, camera poses and bodies, overwrite the variables shared between the main window and the \noun on OpenGL \noun default window. 
\end_layout \begin_layout Subsection Frame Geometry & Multi-View Configurations \end_layout \begin_layout Standard When using a single-client setup, frames are transmitted in their own co-ordinate space: the sensor is made the origin, with the scene being rendered in front of it. \end_layout \begin_layout Standard When using multiple sensors, the server would be unable to combine these unique Euclidean spaces without knowledge of the sensors' relative positions. \end_layout \begin_layout Standard In order to make a composite frame, a calibration process is completed client side following instruction by the server. \end_layout \begin_layout Section Current Work \end_layout \begin_layout Standard The required development to take the existing \noun on LiveScan \noun default codebase to the desired multi-source result can be split into two parts. \end_layout \begin_layout Standard First, the network layer of the \noun on LiveScan \noun default server must be updated in order to accommodate multiple clients logically grouped into \begin_inset Quotes eld \end_inset sources \begin_inset Quotes erd \end_inset for which separate frames are collected for display. \end_layout \begin_layout Standard Second, the display element of the server should be extended to allow the simultaneous presentation of multiple point clouds. These objects should be individually arrangeable in the display space, allowing both movement and rotation. \end_layout \begin_layout Standard As of January 2020 the method for displaying renderings, the server's \noun on OpenGL \noun default window, has been modified such that it can construct and render point clouds from multiple sources. To do so, a dynamic sub-system of geometric transformations has been included such that the renders of individual sources are arranged coherently within the space when reconstructed. 
The default arrangements can be overridden with keyboard controls facilitating arbitrary placement and rotation of separate sources within the \noun on OpenGL \noun default window's co-ordinate space. \end_layout \begin_layout Subsection Geometric Transformations \end_layout \begin_layout Standard Within the \noun on LiveScan3D \noun default server source code are utility structures and classes which were extended in order to develop a wider geometric manipulation system. Structures defining Cartesian co-ordinates in both 3D and 2D spaces, called \noun on Point3f \noun default and \noun on Point2f \noun default respectively, are used in drawing skeletons. There is also a class defining an affine transformation. \end_layout \begin_layout Standard Affine transformations are a family of geometric transformations that preserve parallel lines within geometric spaces. Some examples of affine transformations include scaling, reflection, rotation, translation and shearing. \end_layout \begin_layout Standard The class definition is made up of a three-by-three transformation matrix and a single 3D vector for translation; within the initial code it is used for both camera poses and world transformations. \end_layout \begin_layout Standard A camera pose is the affine transformation defining the position and orientation of the \noun on Kinect \noun default camera when drawn in the \noun on OpenGL \noun default space as a green cross. The world transformations are used when operating multiple sensors simultaneously. When completing the calibration process, the origin of the \noun on OpenGL \noun default space shifts from being the position of the single \noun on Kinect \noun default sensor to being the calibration markers that each camera now orbits. The server, however, still receives renders from each sensor defined by their own Euclidean space and as such the server must transform each view into a composite one. 
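As an illustrative sketch, with notation and an order of operations that are assumed here and may differ from the source code, if the world transform of sensor \begin_inset Formula $i$ \end_inset is written as a rotation matrix \begin_inset Formula $R_{i}$ \end_inset and a translation vector \begin_inset Formula $\mathbf{t}_{i}$ \end_inset , a received vertex \begin_inset Formula $\mathbf{p}$ \end_inset would be mapped into the composite space as \begin_inset Formula \[ \mathbf{p}'=R_{i}\mathbf{p}+\mathbf{t}_{i} \] \end_inset 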
The world transforms define the transformations for each sensor that correctly construct a calibrated 3D render. \end_layout \begin_layout Standard When considering how each source's render would be arranged in the space, the use of this class definition of affine transformations was extended. As the use of the class is fairly limited within the base source code, some utility classes and functions were required in order to make full use of it. \end_layout \begin_layout Standard The \noun on Transformer \noun default class has static methods to apply \noun on AffineTransform \noun default s to both \noun on Point3f \noun default structures and raw vertices when received from \noun on LiveScan \noun default clients. \end_layout \begin_layout Standard It also has static methods to generate affine transformations for rotations in each axis given an arbitrary angle. This provided a foundation on which to define how the \noun on OpenGL \noun default space would arrange separate sources within its combined co-ordinate space. \end_layout \begin_layout Subsection Separation of Network and Presentation Layer \end_layout \begin_layout Standard During initial testing, frames received from a live sensor were intercepted and serialized to XML files in local storage. These frames were loaded back as the server started and the values were merged with those received live before display. \end_layout \begin_layout Standard The composite frame can be seen in figure \begin_inset CommandInset ref LatexCommand ref reference "fig:Initial-composite-frame" plural "false" caps "false" noprefix "false" \end_inset . 
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/pretransform.jpg lyxscale 10 width 50col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout Initial composite testing frame \begin_inset CommandInset label LatexCommand label name "fig:Initial-composite-frame" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard The objects can be seen to be occupying the same space due to their similar positions in the frame. This is not a sufficient solution for displaying separate sources and so geometric transformations like those mentioned above were employed. This process can be seen in figure \begin_inset CommandInset ref LatexCommand ref reference "fig:Initial-testing-layout" plural "false" caps "false" noprefix "false" \end_inset . A rotation of 180° in the \begin_inset Formula $y$ \end_inset axis pivoted the recorded frames such that they faced those being received live; the results can be seen in figure \begin_inset CommandInset ref LatexCommand ref reference "fig:180-degree-rotation" plural "false" caps "false" noprefix "false" \end_inset . 
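\end_layout \begin_layout Standard As a worked illustration of this rotation, assuming the conventional right-handed form of the rotation matrix about the \begin_inset Formula $y$ \end_inset axis (the sign convention used in the source code may differ), a rotation by an angle \begin_inset Formula $\theta$ \end_inset can be written as \begin_inset Formula \[ R_{y}\left(\theta\right)=\begin{pmatrix}\cos\theta & 0 & \sin\theta\\ 0 & 1 & 0\\ -\sin\theta & 0 & \cos\theta \end{pmatrix},\qquad R_{y}\left(180^{\circ}\right)=\begin{pmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1 \end{pmatrix} \] \end_inset so that a vertex \begin_inset Formula $\left(x,y,z\right)$ \end_inset is mapped to \begin_inset Formula $\left(-x,y,-z\right)$ \end_inset . The height of each point is unchanged while its horizontal position is mirrored through the vertical axis, a result that holds for either sign convention since \begin_inset Formula $\sin180^{\circ}=0$ \end_inset . 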
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/local-testing.png lyxscale 30 width 70col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout Initial testing process transforming frames loaded from local storage \begin_inset CommandInset label LatexCommand label name "fig:Initial-testing-layout" \end_inset \end_layout \end_inset \end_layout \begin_layout Plain Layout \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/180flip.jpg lyxscale 10 width 50col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout Composite frame following 180° rotation of recorded frame in \begin_inset Formula $y$ \end_inset axis \begin_inset CommandInset label LatexCommand label name "fig:180-degree-rotation" \end_inset \end_layout \end_inset \end_layout \begin_layout Plain Layout \end_layout \end_inset \end_layout \begin_layout Standard At this point it was noted that transforming and arranging figures within the main window before passing the \noun on OpenGL \noun default window a complete point cloud spreads responsibility for the display process logic to the main window. \end_layout \begin_layout Standard \noun on LiveScan3D \noun default is capable of supporting more display methods than just the native \noun on OpenGL \noun default implementation with versions available for both \noun on Microsoft Hololens \noun default and Mobile AR applications. Therefore when designing the multi-source capabilities the separation of logic between the network and presentation layer will be important. 
The way in which the \noun on OpenGL \noun default window arranges the figures within should be defined by the \noun on OpenGL \noun default window. The network layer should be display agnostic and not make assumptions about how the display will process figures. \end_layout \begin_layout Standard In order to follow this design, the transformations were moved to instead occur within the \noun on OpenGL \noun default window class. To allow this, the shared variables between the \noun on MainWindow \noun default and \noun on OpenGL \noun default were changed. A Frame structure was defined to wrap an individual point cloud with a client ID to allow differentiation. The structure holds fields for each of the lists previously shared between the two objects, including a list of vertices or co-ordinates and the RGB values for each as well as the camera poses and bodies. \end_layout \begin_layout Standard The original \noun on LiveScan3D \noun default cleared each of these variables for each newly retrieved frame; when moving to a multi-source architecture, the ability to individually update source point clouds was required. \end_layout \begin_layout Standard To accomplish this a dictionary was used as the shared variable, with each client's frame being keyed by its client ID. In doing so only one frame per client is kept and each new frame overrides the last. During rendering the dictionary is iterated through and each point cloud combined. Before combination a client-specific transformation is retrieved from an instance of the \noun on DisplayFrameTransformer \noun default class. This object is a member of the \noun on OpenGL \noun default window and is responsible for defining the orientation and position of each point cloud. 
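\end_layout \begin_layout Standard The rendering step described above can be summarised in a single expression; the notation is assumed here for illustration and does not appear in the source code. Writing \begin_inset Formula $F_{c}$ \end_inset for the set of vertices in the frame stored for client \begin_inset Formula $c$ \end_inset and \begin_inset Formula $T_{c}$ \end_inset for the transformation retrieved for that client, the combined point cloud is \begin_inset Formula \[ P=\bigcup_{c\in C}\left\{ T_{c}\left(\mathbf{p}\right):\mathbf{p}\in F_{c}\right\} \] \end_inset where \begin_inset Formula $C$ \end_inset is the set of client IDs currently held as keys of the dictionary. 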
\end_layout \begin_layout Subsection DisplayFrameTransformer \end_layout \begin_layout Standard The \noun on DisplayFrameTransformer \noun default is responsible for generating transformations for the sources displayed within the \noun on OpenGL \noun default window. \end_layout \begin_layout Standard Each client is assigned a default transformation which can be overridden using keyboard controls. \end_layout \begin_layout Standard Clients are initially arranged in a circle around the origin at the centre of the space. This is done by retrieving a transformation for a rotation in the \begin_inset Formula $y$ \end_inset axis for each client number, \begin_inset Formula $n$ \end_inset , using the below, \end_layout \begin_layout Standard \begin_inset Formula \[ \alpha\left(n\right)=\frac{n}{client\:total}\cdotp360\textdegree \] \end_inset \end_layout \begin_layout Standard Similar to the shared variables between the \noun on MainWindow \noun default and \noun on OpenGL \noun default window, client transformations are stored within a dictionary indexed by client ID. \end_layout \begin_layout Standard The \noun on DisplayFrameTransformer \noun default also provides the RotateClient() and TranslateClient() methods to override these initial transforms. When these methods are called for the first time for a client, an object defining the position and rotation is initialised from the default rotation. From then on, the presence of a client override causes the returned transforms to be defined by these values instead. 
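\end_layout \begin_layout Standard As a worked instance of the default circular arrangement, assuming for illustration that clients are numbered from zero and that four clients are connected, the expression for \begin_inset Formula $\alpha\left(n\right)$ \end_inset gives \begin_inset Formula \[ \alpha\left(0\right)=0^{\circ},\quad\alpha\left(1\right)=90^{\circ},\quad\alpha\left(2\right)=180^{\circ},\quad\alpha\left(3\right)=270^{\circ} \] \end_inset so the four sources would be spaced evenly around the origin at right angles to one another. 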
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Graphics filename ../media/december-state.png lyxscale 30 width 60col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout Current state of \noun on LiveScan \noun default server structure with \noun on OpenGL \noun default window-based transformer \begin_inset CommandInset label LatexCommand label name "fig:current-state-diagram" \end_inset \end_layout \end_inset \end_layout \begin_layout Plain Layout \end_layout \end_inset \end_layout \begin_layout Section Future Work \end_layout \begin_layout Standard Following the extension of the \noun on OpenGL \noun default window, the network layer of the \noun on KinectServer \noun default can now be developed and tested using a fully functional display method. \end_layout \begin_layout Section Summary \end_layout \begin_layout Section Conclusions \end_layout \begin_layout Standard \begin_inset Newpage pagebreak \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset bibtex LatexCommand bibtex btprint "btPrintCited" bibfiles "references" options "bibtotoc" \end_inset \end_layout \begin_layout Standard \start_of_appendix \begin_inset FloatList figure \end_inset \end_layout \end_body \end_document