Learning Robot Vision under Insufficient Data

Linköping Studies in Science and Technology. Dissertations, Book 1 · Linköping University Electronic Press
E-book, 57 pages

About this e-book

Machine learning is used today in a wide variety of applications, especially within computer vision, robotics, and autonomous systems. Example use cases include detecting people or other objects using cameras in autonomous vehicles, or navigating robots through collision-free paths to solve different tasks. The flexibility of machine learning is attractive as it can be applied to a wide variety of challenging tasks, without detailed prior knowledge of the problem domain. However, training machine learning models requires vast amounts of data, which leads to a significant manual effort, both for collecting the data and for annotating it. 

In this thesis, we study and develop methods for training machine learning models under insufficient data within computer vision, robotics, and autonomous systems, for the purpose of reducing the manual effort. In summary, we study (1) weakly-supervised learning for reducing the annotation cost, (2) methods for reducing model bias under highly imbalanced training data, (3) methods for obtaining trustworthy uncertainty estimates, and (4) the use of simulated and semi-virtual environments for reducing the amount of real-world data in reinforcement learning.

In the first part of this thesis, we investigate how weakly-supervised learning can be used within image segmentation. In contrast to fully supervised learning, weakly-supervised learning uses a weaker form of annotation, which reduces the annotation effort. Typically, in image segmentation, each object needs to be precisely annotated in every image at the pixel level. Creating this type of annotation is both time-consuming and costly. In weakly-supervised segmentation, however, the only information required is which objects are depicted in the images. This significantly reduces the annotation time. In Papers A and B, we propose two loss functions for improving the predicted object segmentations, especially their contours, in weakly-supervised segmentation.

In the next part of the thesis, we tackle class imbalance in image classification. During data collection, some classes naturally occur more frequently than others, which leads to an imbalance in the amount of data between the different classes. Models trained on such datasets may become biased towards the more common classes. Overcoming this effect by collecting more data of the rare classes may take a very long time. Instead, we develop an ensemble method for image classification in Paper C, which is unbiased despite being trained on highly imbalanced data. 

When machine learning models are used within autonomous systems, it is desirable that they produce trustworthy uncertainty estimates. This is especially important when the training data is limited, as the probability of encountering previously unseen cases is high. In short, a model making a prediction with a certain confidence should be correct with the corresponding probability. This is not the case in general, as machine learning models are notorious for predicting overconfident uncertainty estimates. We apply methods for improving the uncertainty estimates for classification in Paper C and for regression in Paper D.
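The calibration property described above can be checked numerically. The following is a minimal sketch, not the method of Papers C or D: it computes a binned expected calibration error, a common way of measuring the gap between stated confidence and observed accuracy. The function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted confidences in [0, 1] (illustrative input).
    correct: booleans, True where the prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Each bin contributes in proportion to the number of samples it holds.
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# A model that claims 90% confidence but is right only half the time.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
# A model that claims 80% confidence and is right 4 times out of 5.
calibrated = expected_calibration_error([0.8] * 5, [True] * 4 + [False])
```

In this sketch, the overconfident model scores close to 0.4, while the model whose accuracy matches its confidence scores close to 0.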

In the final part of this thesis, we utilize reinforcement learning for teaching a robot to perform coverage path planning, e.g. for lawn mowing or search-and-rescue. In reinforcement learning, the robot interacts with an environment and gets rewards based on how well it solves the task. Initially, its actions are random, but they improve over time as it explores the environment and gathers data. It typically takes a long time for this learning process to converge. This is problematic in real-world environments, where the robot needs to operate throughout the full training duration, which may require human supervision. At the same time, a large variety in the training data is important for generalisation, which is difficult to achieve in real-world environments. Instead, we utilize a simulated environment in Paper E for accelerating the training process, where we procedurally generate random environments. To simplify the transfer from simulation to reality, we fine-tune the model in a semi-virtual indoor environment on the real robot in Paper F.
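The interaction loop described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the simulator or agent of Paper E: the grid world, the obstacle probability, the reward of 1 per newly covered cell, and all function names are hypothetical. It shows only a randomly generated environment and an untrained, random policy collecting coverage rewards. No learning takes place.

```python
import random

# Four grid moves: right, left, down, up (an assumed action set).
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def generate_grid(width, height, obstacle_prob, rng):
    """Procedurally generate a random grid; True marks an obstacle."""
    grid = [[rng.random() < obstacle_prob for _ in range(width)]
            for _ in range(height)]
    grid[0][0] = False  # keep the start cell free
    return grid


def random_policy(state, rng):
    """Untrained behaviour: pick any action uniformly at random."""
    return rng.randrange(len(MOVES))


def run_episode(grid, policy, max_steps, rng):
    """Roll out one episode; reward 1 for each newly covered free cell."""
    height, width = len(grid), len(grid[0])
    row, col = 0, 0
    covered = {(row, col)}
    total_reward = 0.0
    for _ in range(max_steps):
        d_row, d_col = MOVES[policy((row, col), rng)]
        new_row, new_col = row + d_row, col + d_col
        inside = 0 <= new_row < height and 0 <= new_col < width
        # Moves into walls or obstacles leave the robot where it is.
        if inside and not grid[new_row][new_col]:
            row, col = new_row, new_col
            if (row, col) not in covered:
                covered.add((row, col))
                total_reward += 1.0
    return total_reward, covered


rng = random.Random(0)
grid = generate_grid(8, 8, 0.2, rng)
reward, covered = run_episode(grid, random_policy, 200, rng)
```

A learning agent would replace `random_policy` with a policy that is updated from the collected rewards, and a fresh grid would be generated for each episode to vary the training data.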
