
Normalization and parsing algorithms for uncertain input


Academic year: 2021


University of Groningen

Normalization and parsing algorithms for uncertain input

van der Goot, Rob Matthijs

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

van der Goot, R. M. (2019). Normalization and parsing algorithms for uncertain input. University of Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.


Propositions

accompanying the dissertation

Normalization and Parsing Algorithms

for Uncertain Input

by

Rob van der Goot

1. The hardest part of the normalization problem is knowing when to normalize.

2. When applying a POS tagger to social media data, normalizing the input before tagging is beneficial. If the tagger is also trained on social media data, normalizing the training data leads to further improvements.

3. Current state-of-the-art syntactic parsers perform well on news texts (> 90% accuracy), but experience a huge performance drop when applied to social media texts (≈ 65% accuracy).

4. Using normalization as a pre-processing step is effective for constituency parsing and dependency parsing of tweets.

5. For a constituency parser, integrating the normalization yields even better performance than using the normalized text directly. This can be done by representing the top-n normalization candidates as a word graph, and then using this word graph as input to the parser.

6. When integrating normalization, paraphrasing certain words with incorrect normalizations leads to higher parser performance.

7. For a neural network parser, integration of normalization can be done by merging the vectors of the top-n normalization candidates, weighted by the probability from the normalization model.

8. Even when using gold normalization, parser performance on tweets is still far from what is achieved on news texts. Complementary methods are necessary.

9. lmao kause I kan it ain’t English klass, its twittr — lia, 2018

10. The ability to speak does not make you intelligent. — Qui-Gon Jinn
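The merging step described in proposition 7 can be sketched as a probability-weighted average of the candidates' embedding vectors. This is a minimal illustration, not the thesis implementation: the function name, the toy vectors, and the example candidates are assumptions made here for demonstration.

```python
import numpy as np

def merge_candidate_vectors(candidate_vectors, candidate_probs):
    """Merge the word vectors of the top-n normalization candidates into
    a single input vector, weighted by the probabilities assigned by the
    normalization model. Probabilities are renormalized to sum to 1."""
    vecs = np.asarray(candidate_vectors, dtype=float)   # shape (n, dim)
    probs = np.asarray(candidate_probs, dtype=float)    # shape (n,)
    weights = probs / probs.sum()
    return weights @ vecs                               # shape (dim,)

# Toy example: three hypothetical candidates for the token "u",
# with 4-dimensional one-hot embeddings for readability.
vectors = [[1.0, 0.0, 0.0, 0.0],   # "you"
           [0.0, 1.0, 0.0, 0.0],   # "u" (left unchanged)
           [0.0, 0.0, 1.0, 0.0]]   # "your"
probs = [0.7, 0.2, 0.1]
merged = merge_candidate_vectors(vectors, probs)
# merged is [0.7, 0.2, 0.1, 0.0]
```

In a neural parser this merged vector would replace the single word embedding at the input layer, so the parser sees a soft mixture of candidates rather than committing to one normalization.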
