Code: 06842339
The amount of information available on the web and in other electronic formats is increasing rapidly, and e-mail is becoming the preferred mode of communication. This thesis investigates several Information Extraction techniques (tokenization, POS tagging, chunking, NER, and co-reference resolution) and develops a system that infers calendar appointments from a user's e-mail account. More specifically, the system identifies the subject, date, and time of an appointment and, upon user confirmation, enters it into a calendar service. An intelligent user feedback mechanism helps tailor the system to individual users. A novel approach to constructing rules that identify entities in the absence of a domain-relevant corpus reaffirms the importance of the rule-based approach to building a Named Entity Recognizer. It allows the system to be easily extended and helps identify unseen patterns without much domain expertise. Finally, the thesis attempts to provide a data format that could be used in future systems, paving the way for a world in which devices could truly communicate with each other.
1475 Kč